Abstract
Whole exome sequencing (WES)-based assays undergo rigorous validation before being implemented in diagnostic laboratories. This validation process generates experimental evidence that allows laboratories to predict the performance of the intended assay. The NA12878 Genome in a Bottle (GIAB) HapMap reference sample is commonly used for validation in diagnostic laboratories. We investigated what data points should be taken into consideration when validating WES-based assays using the GIAB reference in a diagnostic setting. We delineate specific factors that require special consideration and identify OMIM genes associated with diseases that may ‘bypass’ validation. Four replicates of the NA12878 sample were sequenced at the CHEO Genetics Diagnostic Laboratory on a NextSeq 500; the data were analyzed using the bcbio-nextgen v1.1.2 pipeline. The hap.py validation engine, Real Time Genomics vcfeval tool, and high confidence (HC) variant calls in HC regions available for the GIAB sample were used to validate the obtained variant calls. The same validation process was then used to evaluate variant calls obtained for the same sample by two other clinical diagnostic laboratories. We showed that variant calls in NA12878 can be confidently measured only in the regions that intersect between the GIAB HC regions and the target regions of exome capture. Of the 4139 (as of October 2019) OMIM genes associated with a phenotype and having a known molecular basis of disease, 84 were fully outside of the GIAB HC regions and many of the remaining OMIM genes were only partially covered by the HC regions. A significant proportion of variants identified in the NA12878 sample outside of the HC regions have unknown (UNK) status due to the absence of HC reference alleles. Verification of such calls is possible either by an alternative truth set or by orthogonal testing.
Similarly, many variants outside of exome capture regions, if not accounted for, will be deemed false negatives due to insufficient probe coverage. Our results demonstrate the importance of the intersection between genomic regions of interest, capture regions, and the high confidence regions. If this intersection is not considered, false and ambiguous variant calls could have a negative impact on the diagnostic accuracy of the intended WES-based diagnostic assay and increase the need for confirmatory testing. To enable laboratories to identify ‘problematic’ regions and optimize validation efforts, we have made our VCF and BED files available in the UCSC Genome Browser: NA12878 WES Benchmark. Because relevant genes and genome annotations are evolving, we implemented a general-purpose algorithm to cross-reference OMIM genes with the genomic regions of interest; it can be applied to capture genes/regions outside HC regions (see the Repository of data material section).
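The role of the intersection described above can be illustrated with a short sketch. This is not the pipeline used in the study (which relied on hap.py and vcfeval); it is a minimal Python illustration, assuming BED-style 0-based half-open intervals, and the function names `merge`, `intersect`, and `assess` and the status labels are hypothetical. It computes the regions shared by an HC BED and a capture BED and labels a position according to which region sets contain it.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or adjacent ones per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome sort order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        start, end = max(sa, sb), min(ea, eb)
        if start < end:
            out.append((ca, start, end))
        # Advance the interval that ends first.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return out


def contains(regions: List[Interval], chrom: str, pos: int) -> bool:
    """True if the 0-based position lies inside any region."""
    return any(c == chrom and s <= pos < e for c, s, e in regions)


def assess(chrom: str, pos: int, hc: List[Interval], capture: List[Interval]) -> str:
    """Label a position by the region sets that contain it (labels are illustrative)."""
    in_hc = contains(hc, chrom, pos)
    in_capture = contains(capture, chrom, pos)
    if in_hc and in_capture:
        return "assessable"      # truth calls and probe coverage both present
    if in_capture:
        return "UNK"             # captured, but no HC reference allele
    if in_hc:
        return "not captured"    # truth call exists; a miss would count as FN
    return "outside both"


# Toy coordinates for illustration only, not taken from the study's BED files.
hc_regions = [("chr1", 100, 200), ("chr1", 300, 400)]
capture_regions = [("chr1", 150, 350), ("chr2", 10, 50)]
print(intersect(hc_regions, capture_regions))
print(assess("chr1", 250, hc_regions, capture_regions))
```

Only positions labeled "assessable" in this sketch fall in the intersection where performance can be measured against the HC calls; in practice the same operation is usually done with established interval tools on the full BED files.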
Abbreviations
- WES: Whole exome sequencing
- CHEO: Children’s Hospital of Eastern Ontario
- ROI: Region of interest
- IGV: Integrative Genomics Viewer; https://software.broadinstitute.org/software/igv
- SRA: Short read archive
- NCBI: National Center for Biotechnology Information
- GIAB: Genome in a bottle
- PG: Platinum genomes
- NIST: National Institute of Standards and Technology
- NA12878: HapMap individual whose genome serves as reference materials
- RM: Reference materials
- SNV: Single nucleotide variant
- TP: True positive
- FN: False negative
- FP: False positive
- NGS: Next generation sequencing
- HC: High confidence
- non-HC: Non high confidence
- CRE: Clinical research exome
- FDR: False discovery rate
- GATK: Genome analysis toolkit
- UCSC: University of California, Santa Cruz
- VCF: Variant call format
- BED: Browser extensible definition
- ARUP: Associated Regional and University Pathologists, Inc.; ARUP Laboratories and University of California
- UCSF: University of California, San Francisco, Department of Laboratory Medicine
References
Afgan E, Baker D, Van den Beek M, Blankenberg D, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Eberhard C, Grüning B (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44(W1):W3–10
Aziz N, Zhao Q, Bry L, Driscoll DK, Funke B, Gibson JS, Grody WW, Hegde MR, Hoeltge GA, Leonard DG, Merker JD (2014) College of American Pathologists' laboratory standards for next-generation sequencing clinical tests. Arch Pathol Lab Med 139(4):481–493
Chapman B, Kirchner R, Pantano L, Khotiainsteva T, De Smet M, Beltrame L et al (2019) bcbio/bcbio-nextgen: v1.1.9 (Version v1.1.9). Zenodo. https://doi.org/10.5281/zenodo.3564939
Cleveland MH, Zook JM, Salit M, Vallone PM (2018) Determining performance metrics for targeted next-generation sequencing panels using reference materials. J Mol Diagn 20(5):583–590
Eberle MA, Fritzilas E, Krusche P, Källberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY, Humphray SJ, Halpern AL, Kruglyak S (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res 27(1):157–164
Gargis AS, Kalman L, Berry MW, Bick DP, Dimmock DP, Hambuch T, Lu F, Lyon E, Voelkerding KV, Zehnbauer BA, Agarwala R (2012) Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol 30(11):1033
Gargis AS, Kalman L, Bick DP, Da Silva C, Dimmock DP, Funke BH, Gowrisankar S, Hegde MR, Kulkarni S, Mason CE, Nagarajan R (2015) Good laboratory practice for clinical next-generation sequencing informatics pipelines. Nat Biotechnol 33(7):689
Gibson KM, Nesbitt A, Cao K, Yu Z, Denenberg E, DeChene E, Guan Q, Bhoj E, Zhou X, Zhang B, Wu C (2018) Novel findings with reassessment of exome data: implications for validation testing and interpretation of genomic data. Genet Med 20(3):329
Goldfeder RL, Priest JR, Zook JM, Grove ME, Waggott D, Wheeler MT, Salit M, Ashley EA (2016) Medical implications of technical accuracy in genome sequencing. Genome Med 8(1):24
Hegde M, Santani A, Mao R, Ferreira-Gonzalez A, Weck KE, Voelkerding KV (2017) Development and validation of clinical whole-exome and whole-genome sequencing for detection of germline variants in inherited disease. Arch Pathol Lab Med 141(6):798–805
Hwang S, Kim E, Lee I, Marcotte EM (2015) Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci Rep 7(5):17875
Kalman LV, Datta V, Williams M, Zook JM, Salit ML, Han JY (2016) Development and characterization of reference materials for genetic testing: focus on public partnerships. Ann Lab Med 36(6):513–520
Krusche P, Trigg L, Boutros PC, Mason CE, Francisco M, Moore BL, Gonzalez-Porta M, Eberle MA, Tezak Z, Lababidi S, Truty R (2019) Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37(5):555
Laurie S, Fernandez-Callejo M, Marco-Sola S, Trotta JR, Camps J, Chacón A, Espinosa A, Gut M, Gut I, Heath S, Beltran S (2016) From wet-lab to variations: concordance and speed of bioinformatics pipelines for whole genome and whole exome sequencing. Hum Mutat 37(12):1263–1271
Lelieveld SH, Spielmann M, Mundlos S, Veltman JA, Gilissen C (2015) Comparison of exome and genome sequencing technologies for the complete capture of protein-coding regions. Hum Mutat 36(8):815–822
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760
Lincoln SE, Truty R, Lin CF, Zook JM, Paul J, Ramey VH, Salit M, Rehm HL, Nussbaum RL, Lebo MS (2019) A rigorous interlaboratory examination of the need to confirm next-generation sequencing—detected variants with an orthogonal method in clinical genetic testing. J Mol Diagn 21(2):318–329
Linderman MD, Brandt T, Edelmann L, Jabado O, Kasai Y, Kornreich R, Mahajan M, Shah H, Kasarskis A, Schadt EE (2014) Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med Genom 7(1):20
Neph S, Reynolds AP, Kuehn MS, Stamatoyannopoulos JA (2016) Operating on genomic ranges using BEDOPS. In: Statistical genomics. Humana Press, New York, pp 267–281
Niazi R, Gonzalez MA, Balciuniene J, Evans P, Sarmady M, Tayoun AN (2018) The development and validation of clinical exome-based panels using exomeslicer: considerations and proof of concept using an epilepsy panel. J Mol Diagn 20(5):643–652
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM (2015) Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 7(6):235
Patwardhan A, Harris J, Leng N, Bartha G, Church DM, Luo S, Haudenschild C, Pratt M, Zook J, Salit M, Tirch J (2015) Achieving high-sensitivity for clinical applications using augmented exome sequencing. Genome Med 7(1):71
Pranckeviciene E, Potter R, Huang L, Jarinova O (2019) Validation of bcbio-nextgen pipeline based on NextSeq500 Exome sequencing. In: 2019 IEEE EMBS international conference on biomedical and health informatics (BHI). IEEE, pp 1–6
SoRelle JA, Wachsmann M, Cantarel BL (2020) Assembling and validating bioinformatic pipelines for next-generation sequencing clinical assays. Arch Pathol Lab Med. https://doi.org/10.5858/arpa.2019-0476-RA
Zook J, Salit M (2015) Genomic reference materials for clinical applications. In: Clinical genomics. Academic Press, Cambridge, pp 393–402
Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, Francisco M (2019) An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37(5):561
Acknowledgements
We thank Dr. Hussein Daoud from Illumina for advice on the use of Illumina instrument software with the Agilent capture kit. We gratefully acknowledge Dr. Sergey Naumenko from Harvard Chan School of Public Health for his help with the bcbio-nextgen pipeline configuration for WES data analysis. We thank the anonymous reviewers for their comments, which helped to clarify the interpretation of the genomic regions used in validation.
Funding
Supported by the Innovation Fund of the Alternative Funding Plan for the Academic Health Sciences Centers of Ontario and the CHEO Genetics Diagnostic Laboratory operating funds.
Author information
Contributions
EP: Conceptual design, study design, data collection, computational analysis, and manuscript writing. LR, MG, and LN: Study design, data collection, computational analysis, and manuscript writing. RP and ES-B: Sequencing and data collection. GM: Project management and coordination. AS: Conceptual design and manuscript writing. LB: Conceptual design, study design, and manuscript writing. LH and OJ: Conceptual design, study design, data collection, manuscript writing, and project management. LR, MG, and LN contributed equally.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethics approval
Not required (Continuous Quality Assurance study).
Consent for publication
Not applicable.
Repository of data material
The BED and VCF files supporting this study are available from their respective web sites and from Zenodo: https://doi.org/10.5281/zenodo.3597727. These data are browsable in the public session “NA12878 WES Benchmark” in the UCSC Genome Browser. A complementary Galaxy page presenting some use cases for this dataset, titled "Procedure and datasets to cross-reference OMIM genes with the genomic regions of interest", is freely available to users who are registered and logged in to the usegalaxy.org public Galaxy server (Afgan et al. 2016) through Shared Data > Pages: https://usegalaxy.org/u/erinija/p/omim-genes-in-na12878-wes-benchmark. The list of 84 OMIM genes fully outside of GIAB HC regions is available in Supplementary Table 1 and the details of all validation results are presented in Supplementary Table 1.
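The kind of cross-referencing mentioned above can be sketched as follows. This is not the Galaxy procedure itself, only a minimal Python illustration under stated assumptions: each gene is reduced to a single genomic span (a real analysis would more likely use exon or coding intervals), the HC regions are non-overlapping as in a merged BED file, and the gene names, coordinates, and function names `covered_bases` and `hc_status` are hypothetical.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Region = Tuple[str, int, int]


def covered_bases(span: Region, regions: List[Region]) -> int:
    """Count bases of `span` covered by `regions`.

    Assumes `regions` do not overlap one another (e.g. a merged BED),
    otherwise shared bases would be counted more than once.
    """
    chrom, start, end = span
    return sum(
        max(0, min(end, e) - max(start, s))
        for c, s, e in regions
        if c == chrom
    )


def hc_status(span: Region, regions: List[Region]) -> Tuple[str, float]:
    """Classify a gene span relative to HC regions and report the covered fraction."""
    length = span[2] - span[1]
    covered = covered_bases(span, regions)
    if covered == 0:
        label = "fully outside"
    elif covered == length:
        label = "fully covered"
    else:
        label = "partially covered"
    return label, covered / length


# Hypothetical gene spans and HC regions, for illustration only.
genes: Dict[str, Region] = {
    "GENE_A": ("chr1", 100, 200),
    "GENE_B": ("chr1", 150, 250),
    "GENE_C": ("chr1", 500, 600),
}
hc_regions: List[Region] = [("chr1", 50, 220)]

for name, span in genes.items():
    label, fraction = hc_status(span, hc_regions)
    print(f"{name}\t{label}\t{fraction:.2f}")
```

Genes reported as "fully outside" in such a listing correspond to the category of OMIM genes that cannot be validated against the GIAB HC calls, and the covered fraction gives a rough indication of how much of a partially covered gene can be assessed.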
Cite this article
Pranckeviciene, E., Racacho, L., Ghani, M. et al. Interplay between probe design and test performance: overlap between genomic regions of interest, capture regions and high quality reference calls influence performance of WES-based assays. Hum Genet 140, 289–297 (2021). https://doi.org/10.1007/s00439-020-02201-y
DOI: https://doi.org/10.1007/s00439-020-02201-y