# Fiumara_BasePrimeEd2022_RNAseq

Fiumara M, Ferrari F, Omer-Javed A, Beretta S et al. **Genotoxic effects of base and prime editing in human hematopoietic stem cells.** _Nature Biotechnology_ 2023.

- PMID: [37679541](https://www.ncbi.nlm.nih.gov/pubmed/37679541)
- DOI: [10.1038/s41587-023-01915-4](https://doi.org/10.1038/s41587-023-01915-4)
- RNA-seq Base Editing: [GSE218462](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE218462)
- RNA-seq Prime Editing: [GSE218463](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE218463)

---

### Analyses ###

RNA-seq analysis to compare Treated (BE4, ABE8, Cas9, Mock electro) vs Untreated samples at Day1 or Day3 (GSE218462), and Prime edited vs Mock electro samples (GSE218463):

- Input pre-processing with _FastQC_ and quality trimming with _Trimmomatic_
- Read alignment to the human genome assembly (GRCh38) with _STAR_ using standard parameters
- Gene quantification computed with _featureCounts_
- Differential Gene Expression (DGE) analysis with the R/Bioconductor package _DESeq2_: genes with FDR < 0.05 were considered differentially expressed
- Post-analyses with the R/Bioconductor package _clusterProfiler_ using the Hallmark collection from _MSigDB_ as reference database
- Visualization of the (spliced) alignments on the TP73 gene with the Integrative Genomics Viewer (_IGV_)

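
As a rough illustration of the FDR < 0.05 criterion in the DGE step above, the sketch below applies a Benjamini-Hochberg adjustment to raw p-values and keeps the genes under the threshold. It is a simplified stand-in written in plain Python, not the DESeq2 code used for the analysis: it skips DESeq2's model fitting and independent filtering, and the function names and gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(pvalue_by_gene, fdr=0.05):
    """Return the sorted gene names whose adjusted p-value is below `fdr`."""
    genes = list(pvalue_by_gene)
    adjusted = benjamini_hochberg([pvalue_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, adjusted) if q < fdr)


# Hypothetical raw p-values for four placeholder genes.
example = {"GENE_A": 0.001, "GENE_B": 0.2, "GENE_C": 0.03, "GENE_D": 0.5}
print(significant_genes(example))  # prints ['GENE_A']
```

Note that `GENE_C` has a raw p-value below 0.05 but is excluded, because the threshold applies to the adjusted value.
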
Variant calling analysis on RNA-seq base editing data (GSE218462):

- Merging of reads from replicates of each condition and downsampling to 120M reads with _Seqtk_
- Alignment to the human genome assembly (GRCh38) with _STAR_
- Duplicate marking with _Picard MarkDuplicates_ and splitting of reads containing Ns with _GATK SplitNCigarReads_
- Variant calling using three different tools: _HaplotypeCaller_ (with options `--min-base-quality-score 20`, `--dont-use-soft-clipped-bases`, and `--standard-min-confidence-threshold-for-calling 20`), _Mutect2_ (in tumor-only mode, with option `--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter`), and _FreeBayes_

Nucleotide composition of each position was also assessed using [_REDItools_](https://github.com/tflati/reditools2.0) on each sample, discarding all positions with coverage lower than 20 and base quality lower than 30 to avoid errors due to low sampling.

Variants called by each tool in the untreated controls were filtered out of the treated samples to enrich for private mutations. This procedure retained only variants at high-quality genomic positions in both the treated and untreated samples, for which the untreated sample showed ≥ 99% of reads supporting the reference (non-mutant) base at the position of the mutation (based on REDItools).

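
The filtering logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, and the REDItools output is assumed to have been reduced beforehand to a set of high-quality positions for the treated sample and a per-position reference-read fraction for the untreated sample.

```python
def keep_private_variants(treated_calls, control_calls,
                          treated_hq_positions, control_ref_fraction,
                          min_ref_fraction=0.99):
    """Keep treated-sample variants that look private to the treated sample.

    treated_calls / control_calls: sets of (chrom, pos, ref, alt) tuples
        called by one tool in the treated and untreated samples.
    treated_hq_positions: set of (chrom, pos) passing the coverage and
        base-quality cutoffs in the treated sample (assumed precomputed).
    control_ref_fraction: dict mapping (chrom, pos) to the fraction of
        untreated reads supporting the reference base; positions that
        failed the cutoffs in the untreated sample are simply absent.
    """
    kept = set()
    for variant in treated_calls:
        chrom, pos, _ref, _alt = variant
        site = (chrom, pos)
        if variant in control_calls:
            continue  # also called in the untreated control
        if site not in treated_hq_positions:
            continue  # not a high-quality position in the treated sample
        ref_fraction = control_ref_fraction.get(site)
        if ref_fraction is None:
            continue  # not a high-quality position in the control
        if ref_fraction >= min_ref_fraction:
            kept.add(variant)
    return kept


# Hypothetical toy data.
treated = {("chr1", 100, "C", "T"), ("chr1", 200, "A", "G"),
           ("chr2", 50, "C", "T"), ("chr3", 10, "G", "A")}
control = {("chr1", 200, "A", "G")}
treated_hq = {("chr1", 100), ("chr1", 200), ("chr2", 50)}
control_ref = {("chr1", 100): 0.995, ("chr1", 200): 0.60, ("chr2", 50): 0.97}
print(keep_private_variants(treated, control, treated_hq, control_ref))
```

In the toy data only the `chr1:100` variant survives: the others are either called in the control, fall below the 99% reference threshold in the control, or lack a high-quality position in the treated sample.
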
The final list of variants for each sample consisted of the variants called by all three tools that also passed the filtering procedure (intersection).
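
A minimal sketch of this intersection step, assuming each tool's filtered calls for one sample have already been loaded as a set of `(chrom, pos, ref, alt)` tuples (a hypothetical representation; the function name and the toy variants are illustrative only):

```python
def intersect_callers(calls_by_tool):
    """Return the variants present in every tool's filtered call set."""
    call_sets = [set(calls) for calls in calls_by_tool.values()]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Hypothetical filtered calls for one sample.
calls = {
    "HaplotypeCaller": {("chr1", 100, "C", "T"), ("chr2", 50, "A", "G"),
                        ("chr5", 7, "G", "A")},
    "Mutect2": {("chr1", 100, "C", "T"), ("chr2", 50, "A", "G")},
    "FreeBayes": {("chr2", 50, "A", "G"), ("chr9", 3, "T", "C")},
}
print(intersect_callers(calls))  # prints {('chr2', 50, 'A', 'G')}
```

Requiring agreement between all callers trades sensitivity for specificity: a variant missed by any one tool is dropped from the final list.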