## Introduction

**Whole Exome** data analysis - from FASTQ data to variant calling and annotation

As stated in the supplementary methods, the WES analysis largely followed the GATK Best Practices.

The main steps and the corresponding commands are listed below:

# Alignment #

FASTQ files were aligned with [BWA](https://github.com/lh3/bwa) (v0.7.17) to the GRCh38 reference genome (GRCh38.p13 from GENCODE) using default parameters, except for the `-M` option, which is required for compatibility with [Picard](https://broadinstitute.github.io/picard/) duplicate marking.

```
# Align paired-end reads; -M marks shorter split hits as secondary (Picard compatibility)
bwa mem -t 12 -R '@RG\tID:SampleID_L1\tSM:SampleID\tPL:ILLUMINA' -M GRCh38.p13.genome.fa 3_S8_L001_R1_001.fastq.gz 3_S8_L001_R2_001.fastq.gz > SampleID_L1.sam
# Sort each lane by coordinate, merge the lanes, then index the merged BAM
picard SortSam INPUT=SampleID_L1.sam OUTPUT=SampleID_L1_mouse.bam SORT_ORDER=coordinate
picard MergeSamFiles I=SampleID_L1_mouse.bam I=SampleID_L2_mouse.bam OUTPUT=SampleID_mouse.bam
samtools index SampleID_mouse.bam
# Mark duplicate reads
picard MarkDuplicates INPUT=SampleID_mouse.bam OUTPUT=SampleID.dedup_reads_mouse.bam METRICS_FILE=SampleID.metrics_mouse.txt
# Base quality score recalibration: build the model, apply it, then rebuild on the recalibrated BAM
gatk BaseRecalibrator --input SampleID.dedup_reads_mouse.bam --reference $genome --known-sites $vreference --output SampleID_recal_data_mouse.table
gatk ApplyBQSR --reference $genome --input SampleID.dedup_reads_mouse.bam --output SampleID.dedup_reads_mouse_recal.bam --bqsr-recal-file SampleID_recal_data_mouse.table --static-quantized-quals 10 --static-quantized-qual
gatk BaseRecalibrator --input SampleID.dedup_reads_mouse_recal.bam --reference $genome --known-sites $vreference --output SampleID_post_recal_data_mouse.table
```
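
The per-lane alignment step could be scripted along the following lines. This is only a sketch: the `read_group` helper, the lane labels `L1`/`L2` and the `SampleID_L1_R1.fastq.gz` naming pattern are assumptions made for illustration, not names used in the actual analysis. The loop prints each command rather than running it, so the output can be checked before execution.

```shell
#!/bin/sh
# Sketch only: function name, lane labels and FASTQ naming are illustrative assumptions.

# Build the read-group string for `bwa mem -R`.
# The \t sequences are kept literal: bwa mem converts them to tabs itself.
read_group() {
    printf '@RG\\tID:%s_%s\\tSM:%s\\tPL:ILLUMINA' "$1" "$2" "$1"
}

sample=SampleID
for lane in L1 L2; do
    rg=$(read_group "$sample" "$lane")
    # Dry run: print the command instead of executing it.
    printf '%s\n' "bwa mem -t 12 -M -R '$rg' GRCh38.p13.genome.fa ${sample}_${lane}_R1.fastq.gz ${sample}_${lane}_R2.fastq.gz > ${sample}_${lane}.sam"
done
```

Quoting the read-group string keeps the shell from splitting it or stripping the backslashes before it reaches `bwa mem`.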

## Disambiguation ##

Reads were aligned to both the human and the mouse reference genomes so that [disambiguate](https://pubmed.ncbi.nlm.nih.gov/27990269/) could assign each read to its species of origin. Reads in the human alignment that originate from mouse cells were then discarded.

<details><summary>Code</summary>
`python disambiguate.py -a bwa -s Sample_10_395_3 02-Alignment_human/SampleID.dedup_reads_rehead_recal.bam 02-Alignment_mouse/SampleID.dedup_reads_mouse_recal.bam`
</details>
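
When several samples are processed, the disambiguation command can be generated per sample. The sketch below assumes the `02-Alignment_human/` and `02-Alignment_mouse/` layout and the file suffixes from the command in this section; the `disambiguate_cmd` helper and the sample names `SampleA` and `SampleB` are hypothetical. It prints the commands as a dry run rather than executing them.

```shell
#!/bin/sh
# Sketch only: the helper name and the sample list are illustrative assumptions.
# Flags, directories and file suffixes are copied from the command in this section.

# Print the disambiguate command for one sample (human BAM first, then mouse BAM).
disambiguate_cmd() {
    printf '%s\n' "python disambiguate.py -a bwa -s $1 02-Alignment_human/$1.dedup_reads_rehead_recal.bam 02-Alignment_mouse/$1.dedup_reads_mouse_recal.bam"
}

for sample in SampleA SampleB; do
    disambiguate_cmd "$sample"   # dry run: prints the command
done
```

Piping the printed lines to `sh` would execute them once the paths have been verified.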

# Variant calling #
