@@ -20,7 +20,7 @@ WES data were analyzed following the GATK "Best Practice Workflows".
- Alignment: reads were aligned to the human genome assembly (GRCh38) using _BWA_, and alignments were processed to mark duplicates using _Picard MarkDuplicates_
- GATK: _BaseRecalibrator_ + _ApplyBQSR_ were used to recalibrate base quality scores on dbSNP known sites
- Variant Calling: _HaplotypeCaller_ in GVCF mode was used to call variants in each sample, which were then combined using _CombineGVCFs_ and genotyped with _GenotypeGVCFs_
- Variant Filter: variants were filtered using VariantFiltration based on their 'QualityByDepth (i.e., `--filter-expression 'QD < 2.0'`) and overall coverage 'DP' (i.e., `--filter-expression 'DP < 50'`), while to identify private variants belonging to each sample additional filters were applied, i.e., variants having low genotype quality (i.e., GQ < 80) and low coverage (i.e., DP < 100 stringent and DP < 10 relaxed, respectively) were removed
- Variant Filter: variants were filtered using _VariantFiltration_ based on their 'QualityByDepth' (i.e., `--filter-expression 'QD < 2.0'`) and overall coverage 'DP' (i.e., `--filter-expression 'DP < 500'`); to identify private variants belonging to each sample, additional filters were applied to remove variants with low genotype quality (i.e., GQ < 80) and low coverage (i.e., DP < 50 and DP < 10)
- Variant Post-process: the "Mock electro" in vitro sample of each experiment was used as the germline reference; its variants were filtered out from all other samples, along with multi-allelic variants (mainly involving repetitive sequences)
- Variant Annotation: variants were annotated using _SnpEff_ on the canonical isoform from the GRCh38.p13.RefSeq reference database
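
The filtering and post-processing logic described above can be sketched in plain Python. This is an illustrative sketch only, not the pipeline's actual code: the `Call` record, the function names and the in-memory representation are hypothetical, the cutoffs are passed in explicitly, and matching variants against the "Mock electro" reference by exact chromosome, position, REF and ALT is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified view of one variant call in one sample."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]  # ALT alleles; more than one means multi-allelic
    qd: float              # site-level QualityByDepth (INFO/QD)
    site_dp: int           # overall coverage at the site (INFO/DP)
    gq: int                # genotype quality of this sample (FORMAT/GQ)
    sample_dp: int         # coverage of this sample (FORMAT/DP)

    def key(self) -> tuple:
        # Assumption: two calls are "the same variant" if these fields match.
        return (self.chrom, self.pos, self.ref, self.alts)


def passes_site_filters(call: Call, min_qd: float, min_site_dp: int) -> bool:
    """Mirror of the 'QD < x' and 'DP < y' filter expressions (failing = removed)."""
    return call.qd >= min_qd and call.site_dp >= min_site_dp


def passes_genotype_filters(call: Call, min_gq: int, min_sample_dp: int) -> bool:
    """Per-sample filters used to identify private variants."""
    return call.gq >= min_gq and call.sample_dp >= min_sample_dp


def private_variants(
    sample_calls: Iterable[Call],
    mock_calls: Iterable[Call],
    min_qd: float,
    min_site_dp: int,
    min_gq: int,
    min_sample_dp: int,
) -> List[Call]:
    """Keep calls that pass all filters, are biallelic, and are absent from the mock."""
    mock_keys = {c.key() for c in mock_calls}
    kept = []
    for call in sample_calls:
        if len(call.alts) != 1:          # drop multi-allelic sites
            continue
        if call.key() in mock_keys:      # drop variants seen in the germline reference
            continue
        if not passes_site_filters(call, min_qd, min_site_dp):
            continue
        if not passes_genotype_filters(call, min_gq, min_sample_dp):
            continue
        kept.append(call)
    return kept


# Toy data (invented for illustration, not real results).
mock = [Call("chr1", 100, "A", ("G",), 30.0, 900, 99, 200)]
sample = [
    Call("chr1", 100, "A", ("G",), 30.0, 900, 99, 200),      # present in mock
    Call("chr1", 200, "C", ("T",), 1.5, 900, 99, 200),       # QD too low
    Call("chr2", 300, "G", ("A", "T"), 30.0, 900, 99, 200),  # multi-allelic
    Call("chr2", 400, "T", ("C",), 30.0, 900, 60, 200),      # GQ too low
    Call("chr3", 500, "A", ("T",), 30.0, 900, 99, 120),      # kept
]
private = private_variants(
    sample, mock, min_qd=2.0, min_site_dp=500, min_gq=80, min_sample_dp=50
)
```

Passing the cutoffs as arguments makes it easy to run the same selection twice with different per-sample coverage thresholds (for example 50 and 10) and compare the resulting sets.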