Fiumara_BasePrimeEd2022_RNAseq
Fiumara M, Ferrari F, Omer-Javed A, Beretta S et al. Genotoxic effects of base and prime editing in human hematopoietic stem cells. Nature Biotechnology 2023.
- PMID: 37679541
- DOI: 10.1038/s41587-023-01915-4
- RNA-seq Base Editing: GSE218462
- RNA-seq Prime Editing: GSE218463
Analyses
RNA-seq analysis to compare Treated (BE4, ABE8, Cas9, Mock electro) vs Untreated samples at Day1 or Day3 (GSE218462), and Prime edited vs Mock electro samples (GSE218463):
- Input pre-processing with FastQC and quality trimming with Trimmomatic
- Read alignment to the human genome assembly (GRCh38) with STAR using standard parameters
- Gene quantification computed with featureCounts
- Differential Gene Expression (DGE) analysis with the R/Bioconductor package DESeq2: genes with FDR < 0.05 were considered differentially expressed
- Post-analyses with the R/Bioconductor package clusterProfiler using the Hallmark collection from MSigDB as reference database
- Visualization of the (spliced) alignments on the TP73 gene with the Integrative Genomics Viewer (IGV)
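
As an illustration of the FDR < 0.05 criterion used in the DGE step, the following minimal Python sketch applies the Benjamini-Hochberg adjustment (the correction DESeq2 reports by default as adjusted p-values) to a set of raw p-values and keeps the genes below the cutoff. It is not the code used for the analysis: it omits the DESeq2 statistical model and its independent filtering, and the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(gene_pvalues, fdr_cutoff=0.05):
    """Return the sorted gene names whose adjusted p-value is below the cutoff."""
    genes = list(gene_pvalues)
    padj = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return sorted(g for g, q in zip(genes, padj) if q < fdr_cutoff)


# Hypothetical raw p-values, for illustration only
example = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.04, "GENE_D": 0.5}
print(significant_genes(example))
```

With these four hypothetical values, GENE_C has a raw p-value below 0.05 but an adjusted value of about 0.053, so it is not retained.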
Variant calling analysis on RNA-Seq base editing data (GSE218462):
- Merging of reads from replicates of each condition and downsampling to 120M reads with Seqtk
- Alignment to the human genome assembly (GRCh38) with STAR
- Duplicate marking with Picard MarkDuplicates and splitting of reads containing Ns with GATK SplitNCigarReads
- Variant calling using three different tools: HaplotypeCaller (with options --min-base-quality-score 20, --dont-use-soft-clipped-bases, and --standard-min-confidence-threshold-for-calling 20), Mutect2 (in tumor-only mode, with option --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter), and FreeBayes.
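
The GATK options listed above could be assembled into command lines roughly as follows. This is a sketch under assumptions, not the commands actually run: the `gatk` launcher and the `-R`/`-I`/`-O` arguments are the usual GATK4 conventions rather than details stated here, all file names are hypothetical, and tumor-only mode is represented simply by not passing a matched normal sample to Mutect2.

```python
import shlex


def haplotypecaller_cmd(reference, bam, out_vcf):
    """Build a HaplotypeCaller command with the options listed above."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference, "-I", bam, "-O", out_vcf,  # assumed standard GATK4 I/O arguments
        "--min-base-quality-score", "20",
        "--dont-use-soft-clipped-bases",
        "--standard-min-confidence-threshold-for-calling", "20",
    ]


def mutect2_cmd(reference, bam, out_vcf):
    """Build a tumor-only Mutect2 command (no matched normal is supplied)."""
    return [
        "gatk", "Mutect2",
        "-R", reference, "-I", bam, "-O", out_vcf,  # assumed standard GATK4 I/O arguments
        "--disable-read-filter", "MateOnSameContigOrNoMappedMateReadFilter",
    ]


# Hypothetical file names, for illustration only
for cmd in (
    haplotypecaller_cmd("GRCh38.fa", "sample.split.bam", "sample.hc.vcf.gz"),
    mutect2_cmd("GRCh38.fa", "sample.split.bam", "sample.mutect2.vcf.gz"),
):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a system where GATK is installed.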
Nucleotide composition of each position was also assessed using REDItools on each sample, discarding all positions with coverage lower than 20 and base quality lower than 30 to avoid errors due to low sampling. Variants called by each tool in the untreated controls were filtered out of the treated samples to enrich for private mutations. This procedure retained only variants at high-quality genomic positions in both the treated and untreated samples, for which the untreated sample showed ≥ 99% of reads supporting the reference (non-mutant) base at the position of the mutation (based on REDItools). The final list of variants for each sample consisted of those called by all three tools and passing the filtering procedure (intersection).
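
The filtering and intersection logic described above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the code used in the study. It assumes that control variants are subtracted per tool before intersecting, and that the coverage values passed in were already computed from bases passing the quality threshold. The data structures, function name, and example values are hypothetical.

```python
def private_variants(treated_calls, untreated_calls, treated_cov, untreated_cov,
                     untreated_ref_frac, min_cov=20, min_ref_frac=0.99):
    """Return variants called by every tool in the treated sample that pass filtering.

    treated_calls / untreated_calls: dict tool -> set of (chrom, pos, ref, alt)
    treated_cov / untreated_cov: dict (chrom, pos) -> high-quality coverage
    untreated_ref_frac: dict (chrom, pos) -> fraction of reads with the reference base
    """
    per_tool = []
    for tool, calls in treated_calls.items():
        # Assumption: drop variants the same tool also called in the untreated control
        per_tool.append(calls - untreated_calls.get(tool, set()))
    # Keep only variants reported by all tools
    shared = set.intersection(*per_tool) if per_tool else set()

    kept = set()
    for chrom, pos, ref, alt in shared:
        site = (chrom, pos)
        if treated_cov.get(site, 0) < min_cov:
            continue  # low coverage in the treated sample
        if untreated_cov.get(site, 0) < min_cov:
            continue  # low coverage in the untreated sample
        if untreated_ref_frac.get(site, 0.0) < min_ref_frac:
            continue  # the control does not clearly support the reference base
        kept.add((chrom, pos, ref, alt))
    return kept


# Hypothetical example data
v1, v2 = ("chr1", 100, "C", "T"), ("chr1", 200, "A", "G")
v3, v4 = ("chr2", 300, "C", "T"), ("chr2", 400, "G", "A")
treated = {"HaplotypeCaller": {v1, v2, v3, v4}, "Mutect2": {v1, v2, v3, v4},
           "FreeBayes": {v1, v2, v4}}
untreated = {"HaplotypeCaller": {v2}, "Mutect2": set(), "FreeBayes": set()}
cov_t = {("chr1", 100): 50, ("chr1", 200): 60, ("chr2", 300): 40, ("chr2", 400): 45}
cov_u = {("chr1", 100): 55, ("chr1", 200): 60, ("chr2", 300): 40, ("chr2", 400): 10}
ref_u = {("chr1", 100): 1.0, ("chr1", 200): 0.9, ("chr2", 300): 1.0, ("chr2", 400): 1.0}

result = private_variants(treated, untreated, cov_t, cov_u, ref_u)
print(sorted(result))
```

In this example only the variant at chr1:100 is kept: the one at chr1:200 is also called in the control, the one at chr2:300 is missed by one tool, and the one at chr2:400 has insufficient coverage in the untreated sample.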