## Introduction **Whole Exome** data analysis - from fastq data to variants calling and annotation As stated in supplementary methods we performed WES analysis relying mostly on the GATK best practices for WES analysis. Below the main steps performed and the relative running commands: # Alignment # Fastq files were aligned with [BWA aligner](https://github.com/lh3/bwa) (v0.7.17) to GRCh38 reference genome (GRCh38.p13 gencodegenes) using default parameters, except for the -M option for [Picard](https://broadinstitute.github.io/picard/) compatibility necessary for marking of duplicates. ``` - bwa mem -t 12 -R @RG\tID:SampleID_L1\tSM:SampleID PL:ILLUMINA -M GRCh38.p13.genome.fa 3_S8_L001_R1_001.fastq.gz 3_S8_L001_R2_001.fastq.gz` - picard SortSam INPUT=SampleID_L1.sam OUTPUT=BALL_12_27_1_L1_mouse.bam SORT_ORDER=coordinate - picard MergeSamFiles I=SampleID_L1_mouse.bam I=SampleID_L2_mouse.bam OUTPUT=SampleID_mouse.bam - samtools index SampleID_mouse.bam - picard MarkDuplicates INPUT=SampleID_mouse.bam OUTPUT=SampleID.dedup_reads_mouse.bam METRICS_FILE=SampleID.metrics_mouse.txt - gatk BaseRecalibrator --input SampleID.dedup_reads_mouse.bam --reference $genome --known-sites $vreference --output SampleID_recal_data_mouse.table - gatk ApplyBQSR --reference $genome --input SampleID.dedup_reads_mouse.bam --output SampleID.dedup_reads_mouse_recal.bam --bqsr-recal-file SampleID_recal_data_mouse.table --static-quantized-quals 10 --static-quantized-qual - gatk BaseRecalibrator --input SampleID.dedup_reads_mouse_recal.bam --reference $genome --known-sites $vreference --output SampleID_post_recal_data_mouse.table ``` ## Disambiguation ## We perform alignment using both human and mouse reference genomes in order to perform reads disambiguation using [disambiguate](https://pubmed.ncbi.nlm.nih.gov/27990269/) and discard reads from human mapping that belong to mouse cells.
Code `python disambiguate.py -a bwa -s Sample_10_395_3 02-Alignment_human/SampleID.dedup_reads_rehead_recal.bam 02-Alignment_mouse/SampleID.dedup_reads_mouse_recal.bam`
# Variant calling #