Bulk RNAseq data analysis workflow includes the following steps:
- Quality control by FastQC
- Trimming of bad quality reads with TrimGalore
- Alignment with STAR
Running command
"STAR " + "--runThreadN {threads} " + "--genomeDir {input.genome} " + "--readFilesIn {params.trim_seq} " + "--outSAMstrandField intronMotif " + "--outFileNamePrefix {params.aln_seq_prefix} " + "--outSAMtype BAM SortedByCoordinate " + "--outSAMmultNmax 1 " + "--outFilterMismatchNmax 10 " + "--outReadsUnmapped Fastx " + "--readFilesCommand zcat " - Gene expression quantification with FeatureCounts
Running command
"featureCounts " + "-a {input.annot} " + "-o {output.fcount} " + "-g gene_name " + "-p -B -C " + "-s {params.strand} " + "--minOverlap 10 " + "-T {threads} " + "{input.bams} " - Differential Expression analysis with Deseq2 and edgeR.
For Thal vs HD comparison intra population (HSC or MPP - n = 2) we used the simple design with counts ~ condition, whereas for assessing differencies between Thal and HD in HSC and MPP togheter we included the celltype covariate : count ~ celltype + condition. Genes with an FDR < 0.05 or < 0.001 were considered DEGs for intra population comparisons and combined (HSC + MPP) with covariate respectively.
- Downstream analysis
We performed Over Representation Analyses (ORA) and Gene Set Enrichment Analyses (GSEA) using R/Bioconductor package clusterProfiler (v.3.8.1). Pre-ranked gene list according to log2FC was used to perform Gene Set Enrichment Analysis (GSEA). Terms with adjusted p-value (Benjiamini-Hochber correction) less than 0.05 were considered significantly enriched.