## Introduction

**Whole Exome** data analysis: from FASTQ data to variant calling and annotation

As stated in the supplementary methods, we performed the WES analysis relying mostly on the GATK best practices for WES analysis. The main steps and the corresponding commands are listed below.

# Alignment #

FASTQ files were aligned with the [BWA aligner](https://github.com/lh3/bwa) (v0.7.17) to the GRCh38 reference genome (GRCh38.p13, gencodegenes) using default parameters, except for the -M option, which is required for [Picard](https://broadinstitute.github.io/picard/) compatibility when marking duplicates.
Code
```
# Align paired-end reads; the read group is quoted so the shell keeps the \t separators intact
bwa mem -t 12 -R '@RG\tID:BALL_10_395_3_L1\tSM:BALL_10_395_3\tPL:ILLUMINA' -M GRCh38.p13.genome.fa 3_S8_L001_R1_001.fastq.gz 3_S8_L001_R2_001.fastq.gz > BALL_10_395_3_L1.sam

# Sort each lane by coordinate
picard SortSam INPUT=BALL_10_395_3_L1.sam OUTPUT=BALL_10_395_3_L1_mouse.bam SORT_ORDER=coordinate

# Merge the lanes of one sample and index the result
picard MergeSamFiles I=BALL_10_395_3_L1_mouse.bam I=BALL_10_395_3_L2_mouse.bam OUTPUT=BALL_10_395_3_mouse.bam
samtools index BALL_10_395_3_mouse.bam

# Mark duplicates
picard MarkDuplicates INPUT=BALL_10_395_3_mouse.bam OUTPUT=BALL_10_395_3.dedup_reads_mouse.bam METRICS_FILE=BALL_10_395_3.metrics_mouse.txt

# Base quality score recalibration: build the model, then apply it
# (the trailing --static-quantized-qual flag is truncated in the source)
gatk BaseRecalibrator --input BALL_10_395_3.dedup_reads_mouse.bam --reference $genome --known-sites $vreference --output BALL_10_395_3_recal_data_mouse.table
gatk ApplyBQSR --reference $genome --input BALL_10_395_3.dedup_reads_mouse.bam --output BALL_10_395_3.dedup_reads_mouse_recal.bam --bqsr-recal-file BALL_10_395_3_recal_data_mouse.table --static-quantized-quals 10 --static-quantized-qual

# Second pass on the recalibrated BAM, to evaluate the effect of recalibration
gatk BaseRecalibrator --input BALL_10_395_3.dedup_reads_mouse_recal.bam --reference $genome --known-sites $vreference --output BALL_10_395_3_post_recal_data_mouse.table
```
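The read group passed to `-R` must reach `bwa mem` as a single argument with literal `\t` separators, which BWA itself converts to tabs. As a minimal sketch, a helper along the following lines could build that string per sample and lane. The function name `make_read_group`, the `ID = <sample>_<lane>` convention, and the `R1.fastq.gz`/`R2.fastq.gz` file names are assumptions for illustration, not part of the original pipeline. The loop only prints the commands and does not run them.

```shell
# Hypothetical helper: build the @RG string for one sample/lane pair.
# Assumes ID is "<sample>_<lane>" and the platform is ILLUMINA.
make_read_group() {
    # printf '%s' keeps the backslash-t sequences literal; bwa mem expands them itself.
    printf '%s' "@RG\\tID:${1}_${2}\\tSM:${1}\\tPL:ILLUMINA"
}

# Dry run: print the alignment command for each lane.
# R1.fastq.gz and R2.fastq.gz are placeholder file names.
for lane in L1 L2; do
    rg="$(make_read_group BALL_10_395_3 "$lane")"
    printf '%s\n' "bwa mem -t 12 -R '${rg}' -M GRCh38.p13.genome.fa R1.fastq.gz R2.fastq.gz > BALL_10_395_3_${lane}.sam"
done
```

Building the string in one place keeps the `ID` and `SM` fields consistent across lanes, which matters when the lanes are merged later.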
## Disambiguation ##

We align the reads to both the human and the mouse reference genomes so that they can be disambiguated with [disambiguate](https://pubmed.ncbi.nlm.nih.gov/27990269/), and we discard the reads in the human mapping that belong to mouse cells.
Code `python disambiguate.py -a bwa -s Sample_10_395_3 02-Alignment_human/BALL_10_395_3.dedup_reads_rehead_recal.bam 02-Alignment_mouse/BALL_10_395_3.dedup_reads_mouse_recal.bam`
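To make the idea concrete, the sketch below assigns a read pair to the species whose alignment scores higher and reports a tie as ambiguous. It is a deliberately simplified illustration, assuming the comparison is based on the BWA alignment score (`AS` tag). The function name `assign_species` and the example scores are hypothetical, and the actual tool applies further tie-breaking rules that are not reproduced here.

```shell
# Hypothetical, simplified decision rule: compare the alignment score (AS tag)
# of the same read pair in the human and in the mouse alignment.
assign_species() {
    human_as="$1"
    mouse_as="$2"
    if [ "$human_as" -gt "$mouse_as" ]; then
        printf '%s\n' human
    elif [ "$mouse_as" -gt "$human_as" ]; then
        printf '%s\n' mouse
    else
        printf '%s\n' ambiguous
    fi
}

assign_species 140 95    # prints: human
assign_species 60 148    # prints: mouse
assign_species 120 120   # prints: ambiguous
```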
# Variant calling #