custom / casertanucera_leukemia2023
Commit 294c2e0c, authored Aug 04, 2023 by Matteo Barcella
Update README.md
Parent: a3bee2ce
Changes: 1 file (WES/README.md)
@@ -2,6 +2,7 @@
**Whole Exome** data analysis - from FASTQ data to variant calling and annotation
As stated in the supplementary methods, we performed the WES analysis relying mostly on the GATK best practices.
Raw FASTQ data are available at the following ENA repository.
Below are the main steps performed and the corresponding commands:
@@ -28,5 +29,54 @@ We perform alignment using both human and mouse reference genomes in order to pe
# Variant calling #
For variant calling we opted for the Mutect2 algorithm (more sensitive for somatic variant identification) in tumor-only mode.
Below are the workflow and the commands used:
```
# 1. Call variants in tumor-only mode on the exome targets (50 bp interval padding; write index and MD5)
gatk Mutect2 -I SampleID.disambiguatedSpeciesA.bam.sorted.bam -O Mutect2_raw_call_SampleID_exome_refseq_light.vcf.gz -R GRCh38.p13.genome.fa -L exome_light.bed -G StandardMutectAnnotation -OVI true -OVM true -ip 50
# 2. Apply the Mutect2 filters (annotates the FILTER column)
gatk FilterMutectCalls -R GRCh38.p13.genome.fa -V Mutect2_raw_call_SampleID_exome_refseq_light.vcf.gz -O Mutect2_raw_call_SampleID_exome_refseq_light_filtered.vcf.gz
# 3. Keep records with sample depth (FORMAT/DP) greater than 10
bcftools view -i 'FORMAT/DP>10' Mutect2_raw_call_SampleID_exome_refseq_light_filtered.vcf.gz -O z > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz
# 4. Export position, alleles, DP, AD, AF and GT of the depth-filtered calls as a tab-separated table
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t[%DP\t]\t[%AD\t]\t[%AF\t]\t[%GT\t]\n' Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_fields.txt
# 5. Keep only the records whose FILTER is PASS
bcftools view -f PASS Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10.vcf.gz -O z > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS.vcf.gz
# 6. Export the same fields for the PASS calls
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t[%DP\t]\t[%AD\t]\t[%AF\t]\t[%GT\t]\n' Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS.vcf.gz > Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_PASS_fields.txt
```
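The `bcftools query` format string above writes a tab after every per-sample value and another tab between the bracketed groups, so a single-sample table contains empty columns between DP, AD, AF and GT. The following is a minimal sketch of how such a table could be tidied with `awk`; the function names and the example rows are hypothetical and invented for illustration, they are not part of the original workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical single-sample rows in the layout produced by the query format
# string above (values are invented for illustration only).
make_example_rows() {
    printf 'chr1\t1000\tA\tT\t50\t\t30,20\t\t0.4\t\t0/1\t\n'
    printf 'chr2\t2000\tG\tC\t12\t\t10,2\t\t0.167\t\t0/1\t\n'
}

# Drop the empty columns and keep CHROM POS REF ALT DP AD AF GT.
# With a single sample these sit in fields 1-5, 7, 9 and 11.
clean_fields() {
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4, $5, $7, $9, $11 }'
}

make_example_rows | clean_fields
```

On a real table the same filter could be applied as `clean_fields < Mutect2_raw_call_SampleID_exome_refseq_light_filtered_DP10_fields.txt`, assuming a single-sample VCF; with more samples the column positions would differ.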
# Annotation #
We implemented several layers of annotation using different datasets, including [SnpEff](http://pcingola.github.io/SnpEff/) (v.86), [dbSNP](https://www.ncbi.nlm.nih.gov/snp/) (v.152), and [dbNSFP4](http://database.liulab.science/dbNSFP) (4.0a, academic version).
```
# SnpEff functional annotation (canonical transcripts only)
snpEff -v -csvStats "${sampleid}_rawstats_canon.csv" -s "${sampleid}_rawstats_canon.html" -canon GRCh38.86 "Mutect2_raw_call_${sampleid}_exome_refseq_light_filtered.vcf.gz" > "${sampleid}_raw_annot_refseq_light_filtered.vcf"
# dbSNP annotation
SnpSift annotate "$dbsnp" "${sampleid}_raw_annot_refseq_light_filtered.vcf" > "${sampleid}_dbsnp_refseq_light_filtered.vcf"
gatk IndexFeatureFile -I "${sampleid}_dbsnp_refseq_light_filtered.vcf"
# COSMIC annotation (coding)
SnpSift annotate "$cosmic" "${sampleid}_dbsnp_refseq_light_filtered.vcf" > "${sampleid}_dbsnp_cc_refseq_light_filtered.vcf"
gatk IndexFeatureFile -I "${sampleid}_dbsnp_cc_refseq_light_filtered.vcf"
# COSMIC annotation (non-coding)
SnpSift annotate "$cosmicnc" "${sampleid}_dbsnp_cc_refseq_light_filtered.vcf" > "${sampleid}_dbsnp_cc_cnc_refseq_light_filtered.vcf"
gatk IndexFeatureFile -I "${sampleid}_dbsnp_cc_cnc_refseq_light_filtered.vcf"
# dbNSFP4 annotation
SnpSift dbnsfp -db dbNSFP4.0a/dbNSFP4.0a.txt.gz -v "${sampleid}_dbsnp_cc_cnc_refseq_light_filtered.vcf" > "${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_filtered.vcf"
```
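The `SnpSift annotate` steps above form a chain: each step reads the previous step's output and adds a tag (`dbsnp`, `cc`, `cnc`) to the file name. The sketch below only prints the commands it would run, to make that naming pattern explicit. The function `plan_annotation` is a hypothetical helper, not part of the original workflow, and `$dbsnp`, `$cosmic` and `$cosmicnc` are printed literally as placeholders for the database files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (do not run) the chained SnpSift annotate commands for one sample.
# Each step reads the previous output and appends a tag to the file name.
plan_annotation() {
    local sampleid="$1"
    local suffix="refseq_light_filtered.vcf"
    local input="${sampleid}_raw_annot_${suffix}"
    local prefix="${sampleid}"
    local step tag db output
    # tag:variable pairs; the variable names are kept literal on purpose
    for step in 'dbsnp:$dbsnp' 'cc:$cosmic' 'cnc:$cosmicnc'; do
        tag="${step%%:*}"
        db="${step#*:}"
        prefix="${prefix}_${tag}"
        output="${prefix}_${suffix}"
        printf 'SnpSift annotate %s %s > %s\n' "$db" "$input" "$output"
        printf 'gatk IndexFeatureFile -I %s\n' "$output"
        input="$output"
    done
}

plan_annotation SampleID
```

Printing the plan first is a cheap way to check the intermediate file names before launching the actual annotation.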
# Field extraction for AF comparison #
We use the SnpSift `extractFields` function to select the fields of interest for the allele fraction (AF) analysis.
`SnpSift extractFields ${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_filtered.vcf CHROM POS REF ALT GEN[0].AF GEN[0].DP GEN[0].AD[0] GEN[0].AD[1] GEN[0].GT ID FILTER ANN[0].HGVS_P ANN[0].GENE ANN[0].BIOTYPE ANN[0].RANK ANN[0].EFFECT ANN[0].IMPACT COMMON G5 dbNSFP_ExAC_Adj_AF dbNSFP_1000Gp3_AF dbNSFP_ExAC_AF dbNSFP_phastCons100way_vertebrate dbNSFP_FATHMM_pred dbNSFP_GERP___RS dbNSFP_GERP___NR dbNSFP_CADD_phred dbNSFP_MetaSVM_pred dbNSFP_LRT_pred dbNSFP_PROVEAN_pred dbNSFP_MutationTaster_pred dbNSFP_MutationAssessor_pred dbNSFP_SIFT_pred dbNSFP_Polyphen2_HVAR_pred dbNSFP_Polyphen2_HDIV_pred > ${sampleid}_dbsnp_cc_cnc_dbnsfp_refseq_light_raw_filtered.fields.txt`
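
As an illustration of what can be done with the extracted table, the sketch below recomputes a naive allele fraction, alt / (ref + alt), from the two `GEN[0].AD` columns and prints it next to the reported `GEN[0].AF`. It assumes a tab-separated table whose first line is a header and whose first nine columns follow the order given in the command above. The function names and example values are hypothetical. The AF reported by Mutect2 is a model-based estimate, so small differences from the naive ratio are expected.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical excerpt of the extracted table (first nine columns only);
# values are invented for illustration.
make_example_table() {
    printf 'CHROM\tPOS\tREF\tALT\tGEN[0].AF\tGEN[0].DP\tGEN[0].AD[0]\tGEN[0].AD[1]\tGEN[0].GT\n'
    printf 'chr1\t1000\tA\tT\t0.40\t50\t30\t20\t0/1\n'
    printf 'chr2\t2000\tG\tC\t0.17\t12\t10\t2\t0/1\n'
}

# Columns: 5 = reported AF, 7 = ref read count, 8 = alt read count.
# Rows with no ref or alt reads are skipped to avoid division by zero.
naive_af() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        NR == 1 { print "CHROM", "POS", "REPORTED_AF", "NAIVE_AF"; next }
        ($7 + $8) > 0 { printf "%s\t%s\t%s\t%.3f\n", $1, $2, $5, $8 / ($7 + $8) }'
}

make_example_table | naive_af
```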