Commit 28a1d6db authored by Stefano Beretta

Update README.md

parent bc10093f
@@ -11,17 +11,17 @@ _Nature Biotechnology_ 2023.
### Analyses ###
RNA-seq analysis comparing Treated (BE4, ABE8, Cas9, Mock electro) vs Untreated samples at Day1 or Day3 (GSE218462), and Prime edited vs Mock electro samples (GSE218463):
- Input quality control with _FastQC_ and quality trimming with _Trimmomatic_;
- Read alignment to the human genome assembly (GRCh38) with _STAR_ using standard parameters;
- Gene quantification with _featureCounts_;
- Differential Gene Expression (DGE) analysis with the R/Bioconductor package _DESeq2_: genes with FDR < 0.05 were considered differentially expressed;
- Downstream analyses with the R/Bioconductor package _clusterProfiler_ using the Hallmark gene set collection from _MSigDB_ as the reference database;
- Visualization of the (spliced) alignments on the TP73 gene with the Integrative Genomics Viewer (_IGV_).
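
To make the FDR < 0.05 selection rule concrete: _DESeq2_ `results()` reports Benjamini-Hochberg (BH) adjusted p-values (`padj`) by default. The sketch below reimplements the BH adjustment in plain Python and applies the 0.05 cutoff. It is an illustration, not _DESeq2_'s code, and it omits _DESeq2_'s independent filtering step; the gene names and p-values are hypothetical.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so that the running minimum
    # keeps the adjusted values monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted p-values are capped at 1
    for offset, i in enumerate(order):
        rank = n - offset  # 1-based rank in ascending order of p-value
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(pvalues: Dict[str, float], fdr: float = 0.05) -> List[str]:
    """Genes whose BH-adjusted p-value is below the FDR threshold."""
    genes = list(pvalues)
    padj = benjamini_hochberg([pvalues[g] for g in genes])
    return [g for g, q in zip(genes, padj) if q < fdr]


# Hypothetical example: four genes with raw p-values.
example = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.04, "GENE_D": 0.5}
print(differentially_expressed(example))  # ['GENE_A', 'GENE_B']
```

Here GENE_C has a raw p-value below 0.05 but an adjusted value of about 0.053, so it is not selected.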

Variant calling analysis on RNA-seq base editing data:
- Merging of the reads from the replicates of each condition and downsampling to 120M with _Seqtk_;
- Alignment to the human genome assembly (GRCh38) with _STAR_;
- Duplicate marking with _Picard MarkDuplicates_ and splitting of reads containing N operators (spliced segments) in their CIGAR string with _GATK SplitNCigarReads_;
- Variant calling with three different tools: _HaplotypeCaller_ (with options `--min-base-quality-score 20`, `--dont-use-soft-clipped-bases`, and `--standard-min-confidence-threshold-for-calling 20`), _Mutect2_ (in tumor-only mode, with option `--disable-read-filter MateOnSameContigOrNoMappedMateReadFilter`), and _FreeBayes_.
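
The downsampling step can be illustrated with reservoir sampling (Algorithm R), which draws a uniform random subset of fixed size from a stream of unknown length in a single pass. This is a swapped-in illustration, not necessarily what _Seqtk_ does internally. The default seed and the idea of sampling mate pairs as tuples, so that both mates are kept or dropped together, are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 11) -> List[T]:
    """Uniformly sample k records from a stream of unknown length (Algorithm R).

    The seed value is a hypothetical default; a fixed seed makes the
    subsample reproducible.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Hypothetical paired-end stream: each record is a (mate1, mate2) tuple,
# so both mates of a pair are always sampled together.
pairs = [(f"read{n}/1", f"read{n}/2") for n in range(1000)]
subsample = reservoir_sample(pairs, 10)
print(len(subsample))  # 10
```

Merging replicates then amounts to chaining their read streams (e.g. with `itertools.chain`) before sampling. Note that the reservoir keeps all k sampled records in memory.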

Nucleotide composition at each position was also assessed on each sample using REDItools (https://github.com/tflati/reditools2.0), discarding all positions with coverage lower than 20 and base quality lower than 30 to avoid errors due to low sampling. Next, the variants called by each tool in the untreated controls were filtered out of the treated samples to enrich for private mutations. This procedure retained only variants at high-quality genomic positions in both the treated and untreated samples, for which the untreated sample showed ≥ 99% of reads supporting the reference (non-mutant) base at the position of the mutation (based on REDItools). The final list of variants for each sample consisted of those called by all three tools (intersection) and passing the filtering procedure.
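
The filtering and intersection logic described above can be sketched as set operations in Python. The data structures (variant tuples, per-site coverage and reference-read counts) are hypothetical stand-ins for parsed caller VCFs and REDItools tables; parsing is not shown. The sketch also assumes that variants are matched by exact (chrom, pos, ref, alt) and that the base quality ≥ 30 filter was already applied when the REDItools counts were produced.

```python
from typing import Dict, Set, Tuple

# Hypothetical representations, not the actual output formats of the tools.
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Site = Tuple[str, int]  # (chrom, pos)
# Per-site (coverage, reads supporting the reference base), assumed to be
# parsed from REDItools output that already excludes bases with quality < 30.
SiteStats = Dict[Site, Tuple[int, int]]

MIN_COVERAGE = 20
MIN_REF_FRACTION_UNTREATED = 0.99


def passes_site_filter(site: Site, treated: SiteStats, untreated: SiteStats) -> bool:
    """Well-covered in both samples and >= 99% reference reads in the control."""
    if site not in treated or site not in untreated:
        return False
    treated_cov, _ = treated[site]
    untreated_cov, untreated_ref = untreated[site]
    if treated_cov < MIN_COVERAGE or untreated_cov < MIN_COVERAGE:
        return False
    return untreated_ref / untreated_cov >= MIN_REF_FRACTION_UNTREATED


def private_variants(
    treated_calls: Dict[str, Set[Variant]],
    untreated_calls: Dict[str, Set[Variant]],
    treated: SiteStats,
    untreated: SiteStats,
) -> Set[Variant]:
    """Per tool: drop variants also called in the control and apply the site
    filter; then keep only the variants that survive for every tool."""
    kept_per_tool = []
    for tool, calls in treated_calls.items():
        private = calls - untreated_calls.get(tool, set())
        kept_per_tool.append(
            {v for v in private if passes_site_filter(v[:2], treated, untreated)}
        )
    return set.intersection(*kept_per_tool) if kept_per_tool else set()


# Hypothetical toy data.
v1 = ("chr1", 100, "A", "G")  # private and at a clean site: kept
v2 = ("chr1", 200, "C", "T")  # also called by HaplotypeCaller in the control
v3 = ("chr2", 300, "A", "G")  # control shows only 87.5% reference reads
v4 = ("chr1", 400, "G", "A")  # not called by FreeBayes
treated_calls = {
    "HaplotypeCaller": {v1, v2, v3, v4},
    "Mutect2": {v1, v2, v3, v4},
    "FreeBayes": {v1, v2, v3},
}
untreated_calls = {"HaplotypeCaller": {v2}, "Mutect2": set(), "FreeBayes": set()}
treated = {("chr1", 100): (50, 30), ("chr1", 200): (50, 25),
           ("chr2", 300): (60, 30), ("chr1", 400): (50, 25)}
untreated = {("chr1", 100): (40, 40), ("chr1", 200): (40, 40),
             ("chr2", 300): (40, 35), ("chr1", 400): (40, 40)}
print(private_variants(treated_calls, untreated_calls, treated, untreated))
# {('chr1', 100, 'A', 'G')}
```

Applying the control subtraction per tool before intersecting means a variant seen by any one caller in the untreated sample cannot reach the final list.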