An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data

Abstract

Next generation sequencing (NGS) has been applied in fields ranging from agricultural to clinical research, and many sequencing platforms are available for obtaining accurate and consistent results. However, these platforms show amplification bias, which affects variant calling in personal genomes. Here, we sequenced whole genomes and whole exomes from ten Korean individuals using Illumina and Ion Proton, respectively, to assess the vulnerability and accuracy of each NGS platform in GC-rich and GC-poor regions. Overall, a total of 1013 Gb of reads from Illumina and ~39.1 Gb of reads from Ion Proton were analyzed using the BWA-GATK variant calling pipeline. Furthermore, in conjunction with the VQSR tool and detailed filtering strategies, we obtained high-quality variants. Finally, ten variants each from the Illumina-only, Ion Proton-only, and intersection sets were selected for Sanger validation. The validation results revealed that the Illumina platform showed higher accuracy than Ion Proton. The described filtering methods are advantageous for large population-based whole genome studies designed to identify common and rare variations associated with complex diseases.
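The comparison described in the abstract, in which calls are split into Illumina-only, Ion Proton-only, and shared sets before a handful are chosen for Sanger validation, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the `(chrom, pos, ref, alt)` variant key, the seeded random selection, and the toy data are all assumptions made for illustration, and the sketch assumes both call sets have already been filtered and restricted to regions covered by both assays. The `gc_fraction` helper shows one simple way to label a sequence window as GC-rich or GC-poor.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def partition_calls(
    illumina: Set[Variant], ion_proton: Set[Variant]
) -> Dict[str, Set[Variant]]:
    """Split two call sets into platform-specific and shared variants."""
    return {
        "illumina_only": illumina - ion_proton,
        "ion_proton_only": ion_proton - illumina,
        "intersection": illumina & ion_proton,
    }


def pick_for_validation(
    groups: Dict[str, Set[Variant]], n: int = 10, seed: int = 0
) -> Dict[str, List[Variant]]:
    """Pick up to n variants per group (assumed here: seeded random choice)."""
    rng = random.Random(seed)
    picked = {}
    for name, variants in groups.items():
        ordered = sorted(variants)  # sort first so the choice is reproducible
        picked[name] = rng.sample(ordered, min(n, len(ordered)))
    return picked


if __name__ == "__main__":
    # Toy call sets, invented for this example.
    illumina_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
    ion_calls = {("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}

    groups = partition_calls(illumina_calls, ion_calls)
    for name, chosen in pick_for_validation(groups).items():
        print(name, chosen)
    print("GC fraction of GGCCAT:", gc_fraction("GGCCAT"))
```

Keying variants on position and alleles treats a call as shared only when both platforms report the same alternate allele at the same site. A looser position-only match would enlarge the intersection set, so the choice of key is itself a tunable assumption.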




Author information

Corresponding author

Correspondence to Kyudong Han.

Ethics declarations

Conflict of interest

Young Ju Ahn declares that he has no conflict of interest. Kesavan Markkandan declares that he has no conflict of interest. In-Pyo Baek declares that he has no conflict of interest. Seyoung Mun declares that he has no conflict of interest. Wooseok Lee declares that he has no conflict of interest. Heui-Soo Kim declares that he has no conflict of interest. Kyudong Han declares that he has no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.


Cite this article

Ahn, Y.J., Markkandan, K., Baek, IP. et al. An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data. Genes Genom 40, 39–47 (2018). https://doi.org/10.1007/s13258-017-0608-6


Keywords

  • Whole genome sequencing
  • Whole exome sequencing
  • Illumina
  • Ion Proton
  • Variant calling