Interplay between probe design and test performance: overlap between genomic regions of interest, capture regions and high quality reference calls influence performance of WES-based assays

  • Original Investigation
Human Genetics

Abstract

Whole exome sequencing (WES)-based assays undergo rigorous validation before being implemented in diagnostic laboratories. This validation process generates experimental evidence that allows laboratories to predict the performance of the intended assay. The NA12878 Genome in a Bottle (GIAB) HapMap reference sample is commonly used for validation in diagnostic laboratories. We investigated which data points should be taken into consideration when validating WES-based assays using the GIAB reference in a diagnostic setting. We delineate specific factors that require special consideration and identify OMIM genes associated with diseases that may ‘bypass’ validation. Four replicates of the NA12878 sample were sequenced at the CHEO Genetics Diagnostic Laboratory on a NextSeq 500; the data were analyzed using the bcbio-nextgen v1.1.2 pipeline. The hap.py validation engine, the Real Time Genomics vcfeval tool, and high confidence (HC) variant calls in HC regions available for the GIAB sample were used to validate the obtained variant calls. The same validation process was then used to evaluate variant calls obtained for the same sample by two other clinical diagnostic laboratories. We showed that variant calls in NA12878 can be confidently measured only in the regions where the GIAB HC regions and the target regions of exome capture intersect. Of the 4139 (as of October 2019) OMIM genes associated with a phenotype and having a known molecular basis of disease, 84 were fully outside of the GIAB HC regions, and many of the remaining OMIM genes were only partially covered by the HC regions. A significant proportion of variants identified in the NA12878 sample outside of the HC regions have unknown (UNK) status due to the absence of HC reference alleles. Verification of such calls is possible either with an alternative truth set or by orthogonal testing. Similarly, many variants outside of exome capture regions, if not accounted for, will be deemed false negatives due to insufficient probe coverage. Our results demonstrate the importance of the intersection between genomic regions of interest, capture regions, and the high confidence regions. If this intersection is not considered, false and ambiguous variant calls could have a negative impact on the diagnostic accuracy of the intended WES-based diagnostic assay and increase the need for confirmatory testing. To enable laboratories to identify ‘problematic’ regions and optimize validation efforts, we have made our VCF and BED files available in the UCSC Genome Browser session NA12878 WES Benchmark. Because relevant genes and genome annotations are evolving, we implemented a general-purpose algorithm to cross-reference OMIM genes with the genomic regions of interest; it can be applied to capture genes/regions outside HC regions (see the Repository of data material section).
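The region logic described in the abstract can be illustrated with a minimal Python sketch. It is not the hap.py or vcfeval implementation; the function names, labels, and toy coordinates are assumptions made for illustration only. Intervals are half-open (start, end) pairs on a single chromosome, sorted and non-overlapping, as in a merged BED file.

```python
from bisect import bisect_right

def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def contains(intervals, pos):
    """True if position pos lies inside one of the sorted intervals."""
    k = bisect_right(intervals, (pos, float("inf"))) - 1
    return k >= 0 and intervals[k][0] <= pos < intervals[k][1]

def region_class(pos, hc, capture):
    """Label a variant position by where it falls (hypothetical labels)."""
    in_hc, in_cap = contains(hc, pos), contains(capture, pos)
    if in_hc and in_cap:
        return "measurable"   # HC truth exists and probes cover the site
    if in_cap:
        return "UNK"          # captured, but no HC reference call to compare with
    if in_hc:
        return "FN-prone"     # HC truth exists, but no probe coverage
    return "outside"

# Toy coordinates, not real GIAB or capture-kit regions.
hc = [(100, 200), (300, 400)]
capture = [(150, 350), (500, 600)]
measurable = intersect(hc, capture)
print(measurable)
for pos in (160, 250, 120, 450):
    print(pos, region_class(pos, hc, capture))
```

Restricting the benchmark to `measurable` keeps UNK and FN-prone sites from distorting sensitivity and precision estimates; in practice the same intersection is usually computed on BED files with a dedicated tool such as BEDOPS.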

Abbreviations

WES: Whole exome sequencing

CHEO: Children’s Hospital of Eastern Ontario

ROI: Region of interest

IGV: Integrative Genomics Viewer; https://software.broadinstitute.org/software/igv

SRA: Sequence Read Archive

NCBI: National Center for Biotechnology Information

GIAB: Genome in a Bottle

PG: Platinum Genomes

NIST: National Institute of Standards and Technology

NA12878: HapMap individual whose genome serves as reference material

RM: Reference materials

SNV: Single nucleotide variant

TP: True positive

FN: False negative

FP: False positive

NGS: Next generation sequencing

HC: High confidence

non-HC: Non high confidence

CRE: Clinical research exome

FDR: False discovery rate

GATK: Genome Analysis Toolkit

UCSC: University of California, Santa Cruz

VCF: Variant call format

BED: Browser Extensible Data

ARUP: Associated Regional and University Pathologists, Inc. (ARUP Laboratories) and University of California

UCSF: University of California, San Francisco, Department of Laboratory Medicine

References

  • Afgan E, Baker D, Van den Beek M, Blankenberg D, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Eberhard C, Grüning B (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44(W1):W3–10

  • Aziz N, Zhao Q, Bry L, Driscoll DK, Funke B, Gibson JS, Grody WW, Hegde MR, Hoeltge GA, Leonard DG, Merker JD (2014) College of American Pathologists' laboratory standards for next-generation sequencing clinical tests. Arch Pathol Lab Med 139(4):481–493

  • Chapman B, Kirchner R, Pantano L, Khotiainsteva T, De Smet M, Beltrame L et al (2019) bcbio/bcbio-nextgen: v1.1.9 (Version v1.1.9). Zenodo. https://doi.org/10.5281/zenodo.3564939

  • Cleveland MH, Zook JM, Salit M, Vallone PM (2018) Determining performance metrics for targeted next-generation sequencing panels using reference materials. J Mol Diagn 20(5):583–590

  • Eberle MA, Fritzilas E, Krusche P, Källberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY, Humphray SJ, Halpern AL, Kruglyak S (2017) A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res 27(1):157–164

  • Gargis AS, Kalman L, Berry MW, Bick DP, Dimmock DP, Hambuch T, Lu F, Lyon E, Voelkerding KV, Zehnbauer BA, Agarwala R (2012) Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol 30(11):1033

  • Gargis AS, Kalman L, Bick DP, Da Silva C, Dimmock DP, Funke BH, Gowrisankar S, Hegde MR, Kulkarni S, Mason CE, Nagarajan R (2015) Good laboratory practice for clinical next-generation sequencing informatics pipelines. Nat Biotechnol 33(7):689

  • Gibson KM, Nesbitt A, Cao K, Yu Z, Denenberg E, DeChene E, Guan Q, Bhoj E, Zhou X, Zhang B, Wu C (2018) Novel findings with reassessment of exome data: implications for validation testing and interpretation of genomic data. Genet Med 20(3):329

  • Goldfeder RL, Priest JR, Zook JM, Grove ME, Waggott D, Wheeler MT, Salit M, Ashley EA (2016) Medical implications of technical accuracy in genome sequencing. Genome Med 8(1):24

  • Hegde M, Santani A, Mao R, Ferreira-Gonzalez A, Weck KE, Voelkerding KV (2017) Development and validation of clinical whole-exome and whole-genome sequencing for detection of germline variants in inherited disease. Arch Pathol Lab Med 141(6):798–805

  • Hwang S, Kim E, Lee I, Marcotte EM (2015) Systematic comparison of variant calling pipelines using gold standard personal exome variants. Sci Rep 7(5):17875

  • Kalman LV, Datta V, Williams M, Zook JM, Salit ML, Han JY (2016) Development and characterization of reference materials for genetic testing: focus on public partnerships. Ann Lab Med 36(6):513–520

  • Krusche P, Trigg L, Boutros PC, Mason CE, Francisco M, Moore BL, Gonzalez-Porta M, Eberle MA, Tezak Z, Lababidi S, Truty R (2019) Best practices for benchmarking germline small-variant calls in human genomes. Nat Biotechnol 37(5):555

  • Laurie S, Fernandez-Callejo M, Marco-Sola S, Trotta JR, Camps J, Chacón A, Espinosa A, Gut M, Gut I, Heath S, Beltran S (2016) From wet-lab to variations: concordance and speed of bioinformatics pipelines for whole genome and whole exome sequencing. Hum Mutat 37(12):1263–1271

  • Lelieveld SH, Spielmann M, Mundlos S, Veltman JA, Gilissen C (2015) Comparison of exome and genome sequencing technologies for the complete capture of protein-coding regions. Hum Mutat 36(8):815–822

  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25(14):1754–1760

  • Lincoln SE, Truty R, Lin CF, Zook JM, Paul J, Ramey VH, Salit M, Rehm HL, Nussbaum RL, Lebo MS (2019) A rigorous interlaboratory examination of the need to confirm next-generation sequencing—detected variants with an orthogonal method in clinical genetic testing. J Mol Diagn 21(2):318–329

  • Linderman MD, Brandt T, Edelmann L, Jabado O, Kasai Y, Kornreich R, Mahajan M, Shah H, Kasarskis A, Schadt EE (2014) Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med Genom 7(1):20

  • Neph S, Reynolds AP, Kuehn MS, Stamatoyannopoulos JA (2016) Operating on genomic ranges using BEDOPS. In: Statistical genomics. Humana Press, New York, pp 267–281

  • Niazi R, Gonzalez MA, Balciuniene J, Evans P, Sarmady M, Tayoun AN (2018) The development and validation of clinical exome-based panels using exomeslicer: considerations and proof of concept using an epilepsy panel. J Mol Diagn 20(5):643–652

  • Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM (2015) Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 7(6):235

  • Patwardhan A, Harris J, Leng N, Bartha G, Church DM, Luo S, Haudenschild C, Pratt M, Zook J, Salit M, Tirch J (2015) Achieving high-sensitivity for clinical applications using augmented exome sequencing. Genome Med 7(1):71

  • Pranckeviciene E, Potter R, Huang L, Jarinova O (2019) Validation of bcbio-nextgen pipeline based on NextSeq500 Exome sequencing. In: 2019 IEEE EMBS international conference on biomedical and health informatics (BHI). IEEE, pp 1–6

  • SoRelle JA, Wachsmann M, Cantarel BL (2020) Assembling and validating bioinformatic pipelines for next-generation sequencing clinical assays. Arch Pathol Lab Med. https://doi.org/10.5858/arpa.2019-0476-RA

  • Zook J, Salit M (2015) Genomic reference materials for clinical applications. In: Clinical genomics. Academic Press, Cambridge, pp 393–402

  • Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, Francisco M (2019) An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 37(5):561

Acknowledgements

We thank Dr. Hussein Daoud from Illumina for advice on using Illumina instrument software with the Agilent capture kit. We gratefully acknowledge Dr. Sergey Naumenko from the Harvard Chan School of Public Health for his help with configuring the bcbio-nextgen pipeline for WES data analysis. We thank the anonymous reviewers for their comments, which helped to clarify the interpretation of the genomic regions used in validation.

Funding

Supported by the Innovation Fund of the Alternative Funding Plan for the Academic Health Sciences Centers of Ontario and the CHEO Genetics Diagnostic Laboratory operating funds.

Author information

Contributions

EP: Conceptual design, study design, data collection, computational analysis, and manuscript writing. LR, MG, and LN: Study design, data collection, computational analysis, and manuscript writing. RP and ES-B: Sequencing and data collection. GM: Project management and coordination. AS: Conceptual design and manuscript writing. LB: Conceptual design, study design, and manuscript writing. LH and OJ: Conceptual design, study design, data collection, manuscript writing, and project management. LR, MG, and LN contributed equally.

Corresponding authors

Correspondence to Erinija Pranckeviciene or Olga Jarinova.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics approval

Not required (Continuous Quality Assurance study).

Consent for publication

Not applicable.

Repository of data material

The BED and VCF files supporting this study are available from their respective web sites and from Zenodo as a data set: https://doi.org/10.5281/zenodo.3597727. These data are browsable in the public session “NA12878 WES Benchmark” in the UCSC Genome Browser. A complementary Galaxy page presenting some use cases, titled "Procedure and datasets to cross-reference OMIM genes with the genomic regions of interest", is freely available to users registered and logged in to the usegalaxy.org public Galaxy server (Afgan et al. 2016) under Shared Data > Pages: https://usegalaxy.org/u/erinija/p/omim-genes-in-na12878-wes-benchmark. The list of 84 OMIM genes fully outside of GIAB HC regions is available in Supplementary Table 1 and the details of all validation results are presented in Supplementary Table 1.
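As a rough illustration of what cross-referencing genes with genomic regions involves, the following Python sketch computes how much of each gene interval is covered by a set of regions and labels the gene accordingly. It is not the Galaxy procedure referenced above: the function names, gene names, and coordinates are hypothetical, and the choice of whole-gene intervals (rather than, for example, coding exons) is an assumption.

```python
def covered_bases(start, end, regions):
    """Bases of the half-open interval [start, end) overlapped by non-overlapping regions."""
    return sum(max(0, min(end, r_end) - max(start, r_start))
               for r_start, r_end in regions)

def classify_genes(genes, regions_by_chrom):
    """Map each gene to (status, covered fraction) relative to the given regions."""
    result = {}
    for name, (chrom, start, end) in genes.items():
        length = end - start
        cov = covered_bases(start, end, regions_by_chrom.get(chrom, []))
        if cov == 0:
            status = "fully outside"
        elif cov == length:
            status = "fully inside"
        else:
            status = "partial"
        result[name] = (status, round(cov / length, 3))
    return result

# Toy data: hypothetical HC regions and gene coordinates.
hc_regions = {"chr1": [(100, 200), (300, 400)]}
genes = {
    "GENE_A": ("chr1", 120, 180),
    "GENE_B": ("chr1", 180, 320),
    "GENE_C": ("chr1", 210, 290),
    "GENE_D": ("chr2", 10, 50),
}
report = classify_genes(genes, hc_regions)
for name, (status, frac) in report.items():
    print(name, status, frac)
```

Genes reported as "fully outside" correspond to the kind of list given in the supplementary material, while "partial" genes are those whose validation evidence covers only part of the gene.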

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 17 kb)

Supplementary file2 (XLSX 24 kb)

About this article

Cite this article

Pranckeviciene, E., Racacho, L., Ghani, M. et al. Interplay between probe design and test performance: overlap between genomic regions of interest, capture regions and high quality reference calls influence performance of WES-based assays. Hum Genet 140, 289–297 (2021). https://doi.org/10.1007/s00439-020-02201-y

  • DOI: https://doi.org/10.1007/s00439-020-02201-y
