Genes & Genomics, Volume 40, Issue 1, pp 39–47

An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data

  • Yong Ju Ahn
  • Kesavan Markkandan
  • In-Pyo Baek
  • Seyoung Mun
  • Wooseok Lee
  • Heui-Soo Kim
  • Kyudong Han
Research Article

Abstract

Next-generation sequencing (NGS) has been applied in fields ranging from agriculture to clinical research, and many sequencing platforms are available for obtaining accurate and consistent results. However, these platforms show amplification bias that affects variant calling in personal genomes. Here, we sequenced whole genomes and whole exomes from ten Korean individuals using Illumina and Ion Proton, respectively, to assess the vulnerability and accuracy of each NGS platform in GC-rich and GC-poor regions. In total, 1013 Gb of reads from Illumina and ~39.1 Gb of reads from Ion Proton were analyzed using the BWA-GATK variant calling pipeline. Furthermore, in conjunction with the VQSR tool and detailed filtering strategies, we obtained high-quality variants. Finally, ten variants each from the Illumina-only, Ion Proton-only, and intersection sets were selected for Sanger validation. The validation results revealed that the Illumina platform showed higher accuracy than Ion Proton. The described filtering methods are advantageous for large population-based whole genome studies designed to identify common and rare variations associated with complex diseases.
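The three validation sets named in the abstract (Illumina only, Ion Proton only, and intersection) can be illustrated with a minimal Python sketch. It assumes two call sets are compared by exact match on chromosome, position, reference allele, and alternate allele; the abstract does not state the matching rule the authors used, and the function names and toy records below are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chrom, pos, ref, alt).
# Assumption: exact-match comparison; the paper's rule is not stated.
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per alternate allele from VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # split multi-allelic records
            keys.add((chrom, pos, ref, alt))
    return keys


def partition_calls(
    illumina: Set[VariantKey], ion_proton: Set[VariantKey]
) -> Dict[str, Set[VariantKey]]:
    """Split two call sets into platform-specific and shared variants."""
    return {
        "illumina_only": illumina - ion_proton,
        "ion_proton_only": ion_proton - illumina,
        "intersection": illumina & ion_proton,
    }


# Toy records for illustration only; these are not data from the study.
ILLUMINA_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT,G\t60\tPASS\t.",
    "chr2\t300\t.\tG\tA\t40\tPASS\t.",
]
ION_PROTON_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t45\tPASS\t.",
    "chr1\t200\t.\tC\tT\t55\tPASS\t.",
    "chr3\t400\t.\tT\tC\t30\tPASS\t.",
]

if __name__ == "__main__":
    parts = partition_calls(variant_keys(ILLUMINA_VCF), variant_keys(ION_PROTON_VCF))
    for name, calls in parts.items():
        print(name, sorted(calls))
```

Candidates for Sanger validation could then be sampled from each of the three sets. Exact matching is the simplest choice; indels in particular may need normalization before comparison, which this sketch does not attempt.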

Keywords

Whole genome sequencing; Whole exome sequencing; Illumina; Ion Proton; Variant calling

Notes

Compliance with ethical standards

Conflict of interest

Yong Ju Ahn declares that he has no conflict of interest. Kesavan Markkandan declares that he has no conflict of interest. In-Pyo Baek declares that he has no conflict of interest. Seyoung Mun declares that he has no conflict of interest. Wooseok Lee declares that he has no conflict of interest. Heui-Soo Kim declares that he has no conflict of interest. Kyudong Han declares that he has no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Supplementary material

Supplementary material 1: 13258_2017_608_MOESM1_ESM.vcf (VCF, 26222 KB)
Supplementary material 2: 13258_2017_608_MOESM2_ESM.vcf (VCF, 42675 KB)
Supplementary material 3: 13258_2017_608_MOESM3_ESM.tsv (TSV, 690 KB)
Supplementary material 4: 13258_2017_608_MOESM4_ESM.xlsx (XLSX, 16 KB)

Copyright information

© The Genetics Society of Korea and Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, Republic of Korea
  2. Theragen Etex Inc., Suwon, Republic of Korea
  3. DKU-Theragen Institute for NGS analysis (DTiNa), Cheonan, Republic of Korea
  4. Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan, Republic of Korea
