README.md 2.34 KB

Bulk RNAseq data analysis workflow includes the following steps:

  1. Quality control by FastQC
  2. Trimming of bad quality reads with TrimGalore
  3. Alignment with STAR
    Running command "STAR " + "--runThreadN {threads} " + "--genomeDir {input.genome} " + "--readFilesIn {params.trim_seq} " + "--outSAMstrandField intronMotif " + "--outFileNamePrefix {params.aln_seq_prefix} " + "--outSAMtype BAM SortedByCoordinate " + "--outSAMmultNmax 1 " + "--outFilterMismatchNmax 10 " + "--outReadsUnmapped Fastx " + "--readFilesCommand zcat "
  4. Gene expression quantification with FeatureCounts
    Running command "featureCounts " + "-a {input.annot} " + "-o {output.fcount} " + "-g gene_name " + "-p -B -C " + "-s {params.strand} " + "--minOverlap 10 " + "-T {threads} " + "{input.bams} "
  5. Differential Expression analysis with Deseq2 and edgeR.

For Thal vs HD comparison intra population (HSC or MPP - n = 2) we used the simple design with counts ~ condition, whereas for assessing differencies between Thal and HD in HSC and MPP togheter we included the celltype covariate : count ~ celltype + condition. Genes with an FDR < 0.05 or < 0.001 were considered DEGs for intra population comparisons and combined (HSC + MPP) with covariate respectively.

  1. Downstream analysis

We performed Over Representation Analyses (ORA) and Gene Set Enrichment Analyses (GSEA) using R/Bioconductor package clusterProfiler (v.3.8.1). Pre-ranked gene list according to log2FC was used to perform Gene Set Enrichment Analysis (GSEA). Terms with adjusted p-value (Benjiamini-Hochber correction) less than 0.05 were considered significantly enriched.