# Fiumara_BasePrimeEd2022_RNAseq

Fiumara M, Ferrari F, Omer-Javed A, Beretta S et al.
**Discovery and mitigation of genotoxic effects of base and prime editing in human hematopoietic stem cells.**
_Nature Biotechnology_ 2023.
- RNA-seq Base Editing: [GSE218462](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE218462)
- RNA-seq Prime Editing: [GSE218463](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE218463)

---

### Analyses ###

RNA-seq analysis to compare Treated (BE4, ABE8, Cas9, Mock electro) vs Untreated samples at Day1 or Day3 (GSE218462), and Prime edited vs Mock electro samples (GSE218463):
- input pre-processing with `FastQC` and quality trimming with `Trimmomatic`;
- read alignment to the human genome assembly (GRCh38) with `STAR` using standard parameters;
- gene quantification computed with `featureCounts`;
- Differential Gene Expression (DGE) analysis with the R/Bioconductor package `DESeq2`: genes having FDR < 0.05 were considered as differentially expressed;
- post-analyses with the R/Bioconductor package `clusterProfiler` using the Hallmark collection from MSigDB as reference database;
- visualization of the (spliced) alignments on the TP73 gene with the Integrative Genomics Viewer (`IGV`).
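
To make the FDR < 0.05 criterion concrete, here is a minimal Python sketch of Benjamini-Hochberg adjustment, which is the default multiple-testing correction behind the adjusted p-values reported by `DESeq2`. It is an illustration only and not the code used in the analysis: the function names and gene labels are hypothetical, and `DESeq2` features such as independent filtering are not reproduced.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(pvalue_by_gene: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the sorted names of genes whose adjusted p-value is below alpha."""
    genes = list(pvalue_by_gene)
    adjusted = benjamini_hochberg([pvalue_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, adjusted) if q < alpha)


if __name__ == "__main__":
    # Hypothetical raw p-values, for illustration only.
    example = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
    print(significant_genes(example))
```

Note that a raw p-value below 0.05 is not sufficient on its own: in the example, `geneB` and `geneC` have raw p-values under 0.05 but adjusted values above it.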

Variant calling analysis on RNA-Seq base editing data:
- merging of reads from replicates of each condition and downsampling to 120M reads with `SeqTK`;
- alignment to the human genome assembly (GRCh38) with `STAR`;
- marking of duplicates with Picard `MarkDuplicates` and splitting of reads containing Ns with GATK `SplitNCigarReads`;
- variant calling using three different tools: `HaplotypeCaller` (with options `--min-base-quality-score 20`, `--dont-use-soft-clipped-bases`, and `--standard-min-confidence-threshold-for-calling 20`), `Mutect2` (in tumor-only mode, with option `--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter`), and `FreeBayes`.
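
As a rough illustration of how the GATK options listed above fit into a command line, the Python sketch below assembles argument lists for `HaplotypeCaller` and `Mutect2`. It assumes a GATK4-style `gatk` launcher on the `PATH`. The file names and function names are hypothetical, and the exact invocations used in the study may differ. Tumor-only mode for `Mutect2` simply means that no matched normal sample is supplied.

```python
import shlex
from typing import List


def haplotypecaller_cmd(reference: str, bam: str, out_vcf: str) -> List[str]:
    """Build a HaplotypeCaller command with the options listed in this README."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out_vcf,
        "--min-base-quality-score", "20",
        "--dont-use-soft-clipped-bases",
        "--standard-min-confidence-threshold-for-calling", "20",
    ]


def mutect2_tumor_only_cmd(reference: str, bam: str, out_vcf: str) -> List[str]:
    """Build a tumor-only Mutect2 command (no matched normal is given)."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", bam,
        "-O", out_vcf,
        "--disable-read-filter", "MateOnSameContigOrNoMappedMateReadFilter",
    ]


if __name__ == "__main__":
    # Hypothetical input and output paths, for illustration only.
    ref, bam = "GRCh38.fa", "sample.split.bam"
    print(shlex.join(haplotypecaller_cmd(ref, bam, "sample.hc.vcf.gz")))
    print(shlex.join(mutect2_tumor_only_cmd(ref, bam, "sample.mutect2.vcf.gz")))
```

The lists can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.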

Nucleotide composition of each position was also assessed with REDItools (https://github.com/tflati/reditools2.0) on each sample, discarding all positions having coverage lower than 20 and base quality lower than 30 to avoid errors due to low sampling. Next, variants called by each tool in the untreated controls were filtered out of the treated samples to enrich for private mutations. This procedure retained only variants at high-quality genomic positions in both the treated and the untreated samples, for which the untreated sample showed ≥ 99% of reads supporting the reference (non-mutant) base at the position of the mutation (based on REDItools). The final list of variants for each sample consisted of those called by all three tools and passing the filtering procedure (intersection).
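
The filtering and intersection logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used for the analysis. Variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples. Per-position statistics are assumed to have been extracted from REDItools output into a hypothetical `SiteStats` record, and the base-quality filter is assumed to have been applied upstream.

```python
from typing import Dict, NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Site = Tuple[str, int]               # (chrom, pos)


class SiteStats(NamedTuple):
    coverage: int        # number of reads covering the position
    ref_fraction: float  # fraction of reads supporting the reference base


def passes_site_filter(
    variant: Variant,
    treated_sites: Dict[Site, SiteStats],
    untreated_sites: Dict[Site, SiteStats],
    min_cov: int = 20,
    min_ref_frac: float = 0.99,
) -> bool:
    """Require sufficient coverage in both samples and a reference-like control."""
    site = (variant[0], variant[1])
    treated = treated_sites.get(site)
    untreated = untreated_sites.get(site)
    if treated is None or untreated is None:
        return False
    return (
        treated.coverage >= min_cov
        and untreated.coverage >= min_cov
        and untreated.ref_fraction >= min_ref_frac
    )


def final_variants(
    treated_calls: Dict[str, Set[Variant]],
    untreated_calls: Dict[str, Set[Variant]],
    treated_sites: Dict[Site, SiteStats],
    untreated_sites: Dict[Site, SiteStats],
) -> Set[Variant]:
    """Intersect, across tools, the treated-private variants passing the filter."""
    per_tool = []
    for tool, calls in treated_calls.items():
        # Remove variants that the same tool also called in the untreated control.
        private = calls - untreated_calls.get(tool, set())
        per_tool.append(
            {v for v in private if passes_site_filter(v, treated_sites, untreated_sites)}
        )
    return set.intersection(*per_tool) if per_tool else set()
```

In this sketch a variant is dropped as soon as one tool misses it, one tool also calls it in the untreated control, or the position fails the coverage or reference-fraction thresholds.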