CostaVerdera Natcom2025

This repository contains the scripts used to produce the final figures of the following manuscript:  
**Costa Verdera, et al.,** _AAV vectors trigger DNA damage response-dependent pro-inflammatory signalling in human iPSC-derived CNS models and mouse brain_ 

PubMed: 
DOI:  
GEO:  GSE253824

---

The repository is divided into five folders: one for the bulk RNAseq analysis, three for the scRNAseq analyses and one for the snRNAseq analysis.  
Below is a brief description of the bulk and scRNAseq workflows used in this work.

**Bulk RNAseq** analysis was performed using a standard pipeline that includes the following steps:
1. Quality control by [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
2. Trimming of low-quality reads with [TrimGalore](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)<details><summary>Running command</summary>trim_galore --quality 20 --fastqc --length 25 --output_dir {outdir} --paired {input.r1} {input.r2}</details>
3. Alignment with [STAR](https://github.com/alexdobin/STAR)
    <details><summary>Running command</summary>
            "STAR " +
            "--runThreadN {threads} " +
            "--genomeDir {input.genome} " +
            "--readFilesIn {params.trim_seq} " +
            "--outSAMstrandField intronMotif " +
            "--outFileNamePrefix {params.aln_seq_prefix} " +
            "--outSAMtype BAM SortedByCoordinate " +
            "--outSAMmultNmax 1 " +
            "--outFilterMismatchNmax 10 " +
            "--outReadsUnmapped Fastx " +
            "--readFilesCommand zcat "
    </details>
4. Gene expression quantification with [featureCounts](https://academic.oup.com/bioinformatics/article/30/7/923/232889)
    <details><summary>Running command</summary>
            "featureCounts " +
            "-a {input.annot} " +
            "-o {output.fcount} " +
            "-g gene_name " +
            "-p -B -C " +
            "-s {params.strand} " +
            "--minOverlap 10 " +
            "-T {threads} " +
            "{input.bams} "
    </details>
5. Differential Expression analysis with [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html).  
    For differential expression analysis we followed the standard workflow provided by the package.  
   
6. Downstream functional analysis with [clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html).  
    To functionally annotate the DE results, we performed **G**ene **S**et **E**nrichment **A**nalysis (GSEA) using the _GSEA_ function provided by the package.  
    GSEA was run on the Hallmark gene sets from the MSigDB database (version 7.2).
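
DESeq2's standard workflow corrects for library size with median-of-ratios size factors (the default behaviour of `estimateSizeFactors`). The stdlib-only Python sketch below is an illustration, not the repository's R code: it parses a featureCounts table of the kind produced by the command above and computes those factors. The helper names `read_featurecounts` and `size_factors` are hypothetical, and the parser assumes the standard featureCounts layout (one `#` comment line, six annotation columns, then one count column per BAM).

```python
import csv
import io
import math
import statistics


def read_featurecounts(text):
    """Parse featureCounts output into (sample_names, {gene: [counts...]}).

    Assumed layout: a '#' comment line, then a header with Geneid, Chr,
    Start, End, Strand, Length followed by one count column per BAM file.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t")
            if r and not r[0].startswith("#")]
    header, body = rows[0], rows[1:]
    samples = header[6:]  # count columns follow the 6 annotation columns
    counts = {r[0]: [int(x) for x in r[6:]] for r in body}
    return samples, counts


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (DESeq2-style)."""
    n_samples = len(next(iter(counts.values())))
    # Genes with a zero in any sample have a geometric mean of 0: skip them.
    usable = [c for c in counts.values() if all(x > 0 for x in c)]
    log_ratios = [[] for _ in range(n_samples)]
    for c in usable:
        log_geo_mean = sum(math.log(x) for x in c) / n_samples
        for j, x in enumerate(c):
            # log of (count / geometric mean of this gene across samples)
            log_ratios[j].append(math.log(x) - log_geo_mean)
    # The size factor of a sample is the median ratio over usable genes.
    return [math.exp(statistics.median(r)) for r in log_ratios]
```

Dividing each sample's counts by its factor gives the normalized counts that DESeq2 returns with `counts(dds, normalized = TRUE)`.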

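The core quantity of GSEA is the enrichment score: a running sum over the ranked gene list that steps up at genes belonging to the set (weighted by their ranking statistic) and steps down elsewhere, reporting its largest deviation from zero. The Python sketch below computes it with weight exponent p = 1, which matches the default `exponent = 1` of clusterProfiler's `GSEA`. It is an illustration only: it omits the permutation-based normalized scores and p-values that clusterProfiler obtains through fgsea, and which ranking statistic the scripts use is not stated here, so the `ranked` input is an assumption.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Signed GSEA enrichment score of gene_set on a ranked list.

    ranked: list of (gene, statistic) pairs sorted from highest to lowest.
    gene_set: set of gene symbols, e.g. one Hallmark gene set.
    p: weight exponent applied to |statistic| at genes in the set.
    """
    is_hit = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(is_hit)
    n_miss = len(ranked) - n_hits
    # Normalizer for the upward steps: total weight of the genes in the set.
    hit_weight = sum(abs(stat) ** p
                     for (_, stat), hit in zip(ranked, is_hit) if hit)
    if n_hits == 0 or n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must partly overlap the list with nonzero weights")
    running = best = 0.0
    for (_, stat), hit in zip(ranked, is_hit):
        # Step up at genes in the set, step down uniformly elsewhere.
        running += abs(stat) ** p / hit_weight if hit else -1.0 / n_miss
        # Keep the deviation from zero with the largest magnitude.
        if abs(running) > abs(best):
            best = running
    return best
```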
---
    
**scRNAseq** analysis was performed with [Seurat](https://satijalab.org/seurat/). Below are the main steps of the basic workflow, from a minimal object created after loading the 10X data through to marker identification:  

1. Quality control and filtering
2. Cell cycle scoring
3. Normalization (default Seurat settings)
4. Scaling, regressing out percent.mt, nCount_RNA and CC.Difference (the difference between the S and G2M phase scores, calculated as shown in the [vignette](https://satijalab.org/seurat/articles/cell_cycle_vignette.html#alternate-workflow-1))
5. Dimensionality reduction: PCA
6. Clustering
7. Marker identification
    - 7.1 Cluster markers
    - 7.2 Within-cluster differential expression analysis for each comparison of interest
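
Steps 1 and 3 can be sketched for a single cell as follows. This minimal Python illustration shows the quantities involved and is not the repository's code: `percent.mt` is the percentage of a cell's counts coming from mitochondrial genes (the `MT-` prefix for human gene symbols is an assumption; mouse symbols use `mt-`), and Seurat's default `LogNormalize` method computes `ln(1 + count / total * 10000)`. The thresholds in the hypothetical `passes_qc` helper are placeholders, not the values used in this work.

```python
import math


def percent_mt(cell, prefix="MT-"):
    """Percentage of a cell's counts from mitochondrial genes.

    cell: dict gene -> UMI count. The 'MT-' prefix is an assumption
    (human symbols); mouse symbols use 'mt-'.
    """
    total = sum(cell.values())
    mt = sum(c for gene, c in cell.items() if gene.startswith(prefix))
    return 100.0 * mt / total if total else 0.0


def log_normalize(cell, scale_factor=10_000):
    """LogNormalize-style values: ln(1 + count / total * scale_factor)."""
    total = sum(cell.values())
    if total == 0:
        return {gene: 0.0 for gene in cell}
    return {gene: math.log1p(c / total * scale_factor) for gene, c in cell.items()}


def passes_qc(cell, min_features=200, max_percent_mt=10.0):
    """Hypothetical QC filter; both thresholds are illustrative placeholders."""
    n_features = sum(1 for c in cell.values() if c > 0)
    return n_features >= min_features and percent_mt(cell) <= max_percent_mt
```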

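Step 4 corresponds to Seurat's `ScaleData` with `vars.to.regress`. The NumPy sketch below illustrates the idea under stated assumptions and is not Seurat's implementation: each gene is regressed on the covariates with an ordinary least-squares linear model (Seurat's default `model.use = "linear"`), and the residuals are centered, scaled to unit variance and capped at 10 (Seurat's default `scale.max`). `CC.Difference` is computed as `S.Score - G2M.Score`, as in the vignette; the function and argument names are hypothetical.

```python
import numpy as np


def cc_difference(s_score, g2m_score):
    """CC.Difference as in the Seurat cell-cycle vignette: S.Score - G2M.Score."""
    return np.asarray(s_score, dtype=float) - np.asarray(g2m_score, dtype=float)


def regress_and_scale(expr, covariates, scale_max=10.0):
    """Regress covariates out of each gene, then center, scale and cap.

    expr: genes x cells array of log-normalized expression.
    covariates: cells x k array (e.g. percent.mt, nCount_RNA, CC.Difference).
    """
    n_cells = expr.shape[1]
    # Design matrix with an intercept column plus the covariates.
    design = np.column_stack([np.ones(n_cells), covariates])
    # One least-squares fit for all genes at once: design @ beta ~ expr.T
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ beta  # cells x genes
    centered = resid - resid.mean(axis=0)
    sd = centered.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    scaled = centered / sd
    # Cap large positive values, as ScaleData does with scale.max.
    return np.minimum(scaled, scale_max).T
```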
---