## Introduction

**Whole Exome** data analysis: from FASTQ data to variant calling and annotation

As stated in the supplementary methods, we performed WES analysis relying mostly on the GATK Best Practices.

Below are the main steps performed and the corresponding commands:

# Alignment #

FASTQ files were aligned with the [BWA aligner](https://github.com/lh3/bwa) (v0.7.17) to the GRCh38 reference genome (GRCh38.p13, gencodegenes) using default parameters, except for the `-M` option, which is required for [Picard](https://broadinstitute.github.io/picard/) compatibility when marking duplicates.

```
bwa mem -t 12 -R '@RG\tID:BALL_10_395_3_L1\tSM:BALL_10_395_3\tPL:ILLUMINA' -M GRCh38.p13.genome.fa 3_S8_L001_R1_001.fastq.gz 3_S8_L001_R2_001.fastq.gz > BALL_10_395_3_L1.sam
picard SortSam INPUT=BALL_10_395_3_L1.sam OUTPUT=BALL_10_395_3_L1_mouse.bam SORT_ORDER=coordinate
picard MergeSamFiles I=BALL_10_395_3_L1_mouse.bam I=BALL_10_395_3_L2_mouse.bam OUTPUT=BALL_10_395_3_mouse.bam
samtools index BALL_10_395_3_mouse.bam
picard MarkDuplicates INPUT=BALL_10_395_3_mouse.bam OUTPUT=BALL_10_395_3.dedup_reads_mouse.bam METRICS_FILE=BALL_10_395_3.metrics_mouse.txt
gatk BaseRecalibrator --input BALL_10_395_3.dedup_reads_mouse.bam --reference $genome --known-sites $vreference --output BALL_10_395_3_recal_data_mouse.table
gatk ApplyBQSR --reference $genome --input BALL_10_395_3.dedup_reads_mouse.bam --output BALL_10_395_3.dedup_reads_mouse_recal.bam --bqsr-recal-file BALL_10_395_3_recal_data_mouse.table --static-quantized-quals 10 --static-quantized-qual
gatk BaseRecalibrator --input BALL_10_395_3.dedup_reads_mouse_recal.bam --reference $genome --known-sites $vreference --output BALL_10_395_3_post_recal_data_mouse.table
```
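
The read-group string passed to `bwa mem -R` must be a single quoted argument whose fields are separated by literal `\t` sequences. As a minimal sketch, the string can be built per sample and lane with a small helper. The function name `make_read_group` and its argument order are hypothetical and not part of the original pipeline.

```shell
#!/bin/sh
# Hypothetical helper: build the @RG header for one sample/lane.
# It prints literal backslash-t sequences, which bwa mem converts to tabs.
# The default platform (ILLUMINA) mirrors the alignment command above.
make_read_group() {
    sample="$1"
    lane="$2"
    platform="${3:-ILLUMINA}"
    printf '@RG\\tID:%s_%s\\tSM:%s\\tPL:%s\n' "$sample" "$lane" "$sample" "$platform"
}

rg=$(make_read_group BALL_10_395_3 L1)
# printf is used instead of echo because some shells expand \t in echo.
printf '%s\n' "$rg"
```

The resulting value can then be supplied as `-R "$rg"`; the double quotes keep it as one argument.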

## Disambiguation ##

We aligned the reads to both the human and the mouse reference genomes, then used [disambiguate](https://pubmed.ncbi.nlm.nih.gov/27990269/) to assign each read to its species of origin and to discard from the human alignment the reads that belong to mouse cells.

<details><summary>Code</summary>
`python disambiguate.py -a bwa -s Sample_10_395_3 02-Alignment_human/BALL_10_395_3.dedup_reads_rehead_recal.bam 02-Alignment_mouse/BALL_10_395_3.dedup_reads_mouse_recal.bam`
</details>
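
When several samples are processed, the same command can be generated per sample. The sketch below only prints the command so it can be inspected before running. The helper name `disambiguate_cmd` is hypothetical, and the file naming pattern is an assumption copied from the single example above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the disambiguate command for one
# sample ID, assuming every sample follows the naming used in the example.
disambiguate_cmd() {
    id="$1"
    printf 'python disambiguate.py -a bwa -s Sample_%s 02-Alignment_human/BALL_%s.dedup_reads_rehead_recal.bam 02-Alignment_mouse/BALL_%s.dedup_reads_mouse_recal.bam\n' "$id" "$id" "$id"
}

# Add further sample IDs to this list as needed.
for id in 10_395_3; do
    disambiguate_cmd "$id"
done
```

Piping the printed lines to `sh` would execute them once the paths have been checked.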

# Variant calling #
