diff --git a/WES/README.md b/WES/README.md
index 6266fafda08630e039a7d91466254c2f0545849e..adea06a903d60a04cfe3d904fd87f41b6cffd07a 100644
--- a/WES/README.md
+++ b/WES/README.md
@@ -2,6 +2,7 @@
 **Whole Exome** data analysis - from fastq data to variants calling and annotation
 As stated in supplementary methods we performed WES analysis relying mostly on the GATK best practices for WES analysis.
+Raw fastq data are available at the following ENA repository.
 Below the main steps performed and the relative running commands:
@@ -28,5 +29,54 @@ We perform alignment using both human and mouse reference genomes in order to pe
 # Variant calling #
+For variant calling we used the Mutect2 algorithm (more sensitive for somatic variant identification) in tumor-only mode.
+Below are the workflow and the commands used:
+
+```
+- gatk Mutect2 -I SampleID.disambiguatedSpeciesA.bam.sorted.bam -O Mutect2_raw_call_SampleID_exome_refseq_light.vcf.gz -R GRCh38.p13.genome.fa -L exome_light.bed -G StandardMutectAnnotation -OVI true -OVM true -ip 50
+- gatk FilterMutectCalls -R GRCh38.p13.genome.fa -V Mutect2_raw_call_SampleID_exome_refseq_light.vcf.gz -O Mutect2_raw_call_SampleID_exome_refseq_light_filtered.vcf.gz
+- bcftools view -i 'FORMAT/DP>10' Mutect2_raw_call_SampleID_exome_refseq_light_filtered.vcf.gz -O z > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz
+- bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t[%DP\t]\t[%AD\t]\t[%AF\t]\t[%GT\t]\n' Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_fields.txt
+- bcftools view -f PASS Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz -O z > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS.vcf.gz
+- bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t[%DP\t]\t[%AD\t]\t[%AF\t]\t[%GT\t]\n' Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS.vcf.gz > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS_fields.txt
+
+```
+
+# Annotation #
+
+We applied several layers of annotation using different resources, including [SnpEff](http://pcingola.github.io/SnpEff/) (v.86), [dbSNP](https://www.ncbi.nlm.nih.gov/snp/) (v.152), and [dbNSFP4](http://database.liulab.science/dbNSFP) (4.0a for academic).
+
+```
+snpEff -v -csvStats ${sampleid}\_rawstats_canon.csv -s ${sampleid}\_rawstats_canon.html -canon GRCh38.86 Mutect2_raw_call_${sampleid}\_exome_refseq_light_filtered.vcf.gz > $sampleid\_raw_annot_refseq_light_filtered.vcf
+
+# dbSNP annotation
+
+SnpSift annotate $dbsnp ${sampleid}\_raw_annot_refseq_light_filtered.vcf > ${sampleid}\_dbsnp_refseq_light_filtered.vcf
+gatk IndexFeatureFile -I ${sampleid}\_dbsnp_refseq_light_filtered.vcf
+
+# Cosmic annotation (coding)
+
+SnpSift annotate $cosmic ${sampleid}\_dbsnp_refseq_light_filtered.vcf > ${sampleid}\_dbsnp_cc_refseq_light_filtered.vcf
+gatk IndexFeatureFile -I ${sampleid}\_dbsnp_cc_refseq_light_filtered.vcf
+
+# Cosmic annotation (non coding)
+
+SnpSift annotate $cosmicnc ${sampleid}\_dbsnp_cc_refseq_light_filtered.vcf > ${sampleid}\_dbsnp_cc_cnc_refseq_light_filtered.vcf
+gatk IndexFeatureFile -I ${sampleid}\_dbsnp_cc_cnc_refseq_light_filtered.vcf
+
+# dbNSFP4 annotation
+
+SnpSift dbnsfp -db dbNSFP4.0a/dbNSFP4.0a.txt.gz -v ${sampleid}\_dbsnp_cc_cnc_refseq_light_filtered.vcf > ${sampleid}\_dbsnp_cc_cnc_dbnsfp_refseq_light_filtered.vcf
+
+```
+
+# Field extraction for AF comparison #
+
+We use the SnpSift `extractFields` function to select the fields of interest for Allele Fraction (AF) analysis.
+
+`SnpSift extractFields ${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_filtered.vcf CHROM POS REF ALT GEN[0].AF GEN[0].DP GEN[0].AD[0] GEN[0].AD[1] GEN[0].GT ID FILTER ANN[0].HGVS_P ANN[0].GENE ANN[0].BIOTYPE ANN[0].RANK ANN[0].EFFECT ANN[0].IMPACT COMMON G5 dbNSFP_ExAC_Adj_AF dbNSFP_1000Gp3_AF dbNSFP_ExAC_AF dbNSFP_phastCons100way_vertebrate dbNSFP_FATHMM_pred dbNSFP_GERP___RS dbNSFP_GERP___NR dbNSFP_CADD_phred dbNSFP_MetaSVM_pred dbNSFP_LRT_pred dbNSFP_PROVEAN_pred dbNSFP_MutationTaster_pred dbNSFP_MutationAssessor_pred dbNSFP_SIFT_pred dbNSFP_Polyphen2_HVAR_pred dbNSFP_Polyphen2_HDIV_pred > ${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_raw_filtered.fields.txt`
+
+
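
A possible follow-up to the field extraction step, given only as a minimal sketch: the table written by `SnpSift extractFields` could be filtered by allele fraction before the AF comparison. The function name `filter_by_af`, the threshold, and the output file name are hypothetical and not part of the original workflow. The sketch also assumes the table is tab-separated and that its header line carries the AF column under the same name used in the command (`GEN[0].AF`), which may need adjusting for the SnpSift version in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original pipeline.
# Keeps the header plus every row whose allele fraction is >= min_af.
# Assumptions: tab-separated input, first line is a header, and the AF
# column is named "GEN[0].AF" unless a third argument says otherwise.
filter_by_af() {
    local fields_file="$1"
    local min_af="$2"
    local af_name="${3:-GEN[0].AF}"
    awk -F'\t' -v min_af="$min_af" -v af_name="$af_name" '
        NR == 1 {
            # Locate the AF column by comparing header names literally.
            for (i = 1; i <= NF; i++) if ($i == af_name) af_col = i
            if (!af_col) {
                print "AF column not found: " af_name > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        # Rows with a non-numeric AF (for example comma-separated values
        # at multi-allelic sites) are skipped rather than guessed at.
        $af_col ~ /^[0-9.eE+-]+$/ && ($af_col + 0) >= (min_af + 0) { print }
    ' "$fields_file"
}

# Example call (file names follow the naming used above; 0.05 is arbitrary):
# filter_by_af "${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_raw_filtered.fields.txt" 0.05 \
#     > "${sampleid}_AF05.fields.txt"
```

Looking the column up by header name rather than by position keeps the sketch usable if the list of extracted fields changes.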