## Introduction. 

Bulk RNA-seq analysis was performed both for each disease separately and by combining all samples in an integrated analysis (GFP_L vs GFP_H).  
For the latter, the starting point of the analysis is the gene expression count matrix deposited at [GSE236138](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE236138).  
Moreover, for the second dataset, available under GEO accession [GSE236141](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE236141), we performed an additional analysis focused on the interaction between condition and treatment.  
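
As a rough illustration of what the interaction analysis tests, the sketch below builds the rows of a two-by-two design of the form `~ Condition + Treatment + Condition:Treatment` and computes the interaction as a difference of differences. The 0/1 level coding and the log2 cell means are hypothetical values chosen for the example; this is not the code used in the analysis.

```python
from itertools import product


def design_row(condition, treatment):
    """Design-matrix row: intercept, Condition, Treatment, Condition:Treatment."""
    return [1, condition, treatment, condition * treatment]


def interaction_effect(cell_means):
    """Treatment effect in the non-reference condition minus that in the reference."""
    effect_reference = cell_means[(0, 1)] - cell_means[(0, 0)]
    effect_other = cell_means[(1, 1)] - cell_means[(1, 0)]
    return effect_other - effect_reference


# One row per (condition, treatment) cell; 0 is the reference level (assumed coding).
design = {cell: design_row(*cell) for cell in product((0, 1), repeat=2)}

# Hypothetical mean log2 expression of one gene in each cell.
log2_means = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 5.5, (1, 1): 8.0}
effect = interaction_effect(log2_means)

for cell, row in design.items():
    print(cell, row)
print("interaction (log2 scale):", effect)
```

A non-zero interaction means that the treatment changes expression by a different amount depending on the condition, which is the question the interaction term addresses.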

## Workflow and steps. 

The most important steps are listed below:  
1. Quality control by [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
2. Trimming of low-quality reads with [Trim Galore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)<details><summary>Running command</summary>trim_galore --quality 20 --fastqc --length 25 --output_dir {outdir} --paired {input.r1} {input.r2}</details>
3. Alignment with [STAR](https://github.com/alexdobin/STAR)
    <details><summary>Running command</summary>
            "STAR " +
            "--runThreadN {threads} " +
            "--genomeDir {input.genome} " +
            "--readFilesIn {params.trim_seq} " +
            "--outSAMstrandField intronMotif " +
            "--outFileNamePrefix {params.aln_seq_prefix} " +
            "--outSAMtype BAM SortedByCoordinate " +
            "--outSAMmultNmax 1 " +
            "--outFilterMismatchNmax 10 " +
            "--outReadsUnmapped Fastx " +
            "--readFilesCommand zcat "
    </details>
4. Gene expression quantification with [featureCounts](https://academic.oup.com/bioinformatics/article/30/7/923/232889)
    <details><summary>Running command</summary>
            "featureCounts " +
            "-a {input.annot} " +
            "-o {output.fcount} " +
            "-g gene_name " +
            "-p -B -C " +
            "-s {params.strand} " +
            "--minOverlap 10 " +
            "-T {threads} " +
            "{input.bams} "
    </details>
5. Differential expression analysis with [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html).  
    For differential gene expression analysis we followed the standard workflow provided by the package.  
    <details><summary>Detail</summary>
    results(DESeq.ds, pAdjustMethod = "BH", independentFiltering = TRUE, contrast = c("groups", Group1, Group2), alpha = 0.05)
    </details>
    For the interaction analysis we applied the design described in the DESeq2 vignette:  
    <details><summary>Interaction</summary>
    design = as.formula("~ Condition + Treatment + Condition:Treatment")
    </details>
6. Downstream functional analysis with [clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html).  
    To retrieve functional annotation from the DE results, we performed **O**ver **R**epresentation **A**nalysis and **G**ene **S**et **E**nrichment **A**nalysis using the functions `enrichGO` and `GSEA` provided by the package.
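
To make the over-representation step more concrete: ORA is commonly computed as a one-sided hypergeometric test, asking how likely it is to see at least the observed number of DE genes in a gene set by chance. The sketch below is a minimal, standalone illustration of that test. The function name `ora_pvalue` and all counts are hypothetical, and it does not reproduce the clusterProfiler implementation or any result from this analysis.

```python
from math import comb


def ora_pvalue(universe, set_size, n_de, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: number of background genes
    set_size: genes in the universe annotated to the gene set
    n_de:     differentially expressed genes drawn from the universe
    overlap:  DE genes that are also in the gene set
    """
    upper = min(set_size, n_de)
    total = comb(universe, n_de)
    tail = sum(
        comb(set_size, i) * comb(universe - set_size, n_de - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the gene set,
# 4 DE genes, 3 of which belong to the gene set.
p_value = ora_pvalue(universe=20, set_size=5, n_de=4, overlap=3)
print(f"ORA p-value: {p_value:.4f}")
```

In a real analysis this p-value is computed for every gene set and then adjusted for multiple testing, for example with the BH method used in the DESeq2 step.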