Human Genetics, Volume 135, Issue 5, pp 499–511

Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies

  • E. Damiati
  • G. Borsani
  • Edoardo Giacopuzzi (corresponding author)
Original Investigation

Abstract

The Ion Proton platform enables whole exome sequencing (WES) at low cost, with rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but leaves 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein-coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Variant detection performance was highest at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low-, medium- and high-stringency filters reduced the number of false positives by 10.2, 21.2 and 40.4 % for indels and by 21.2, 41.9 and 68.2 % for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
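
The sensitivity, FDR and F1 figures quoted above are linked by the standard definitions: precision is 1 − FDR, and F1 is the harmonic mean of precision and sensitivity. The following is a minimal sketch of that arithmetic, not the authors' evaluation pipeline. The function name is hypothetical, and the inputs are the rounded values from the abstract, so the indel result differs slightly from the reported ~0.66.

```python
def f1_from_sensitivity_and_fdr(sensitivity: float, fdr: float) -> float:
    """Return the F1 score given sensitivity (recall) and false discovery rate.

    Uses the standard definitions: precision = 1 - FDR and
    F1 = 2 * precision * recall / (precision + recall).
    """
    precision = 1.0 - fdr
    if precision + sensitivity == 0:
        return 0.0  # avoid division by zero when both terms are zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rounded values taken from the abstract (assumed exact here for illustration).
indel_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.71, fdr=0.37)
snp_f1 = f1_from_sensitivity_and_fdr(sensitivity=0.975, fdr=0.02)

print(f"indel F1 = {indel_f1:.3f}")
print(f"SNP F1   = {snp_f1:.3f}")
```

With these rounded inputs the sketch gives roughly 0.67 for indels and 0.98 for SNPs, consistent with the reported values once rounding of the published sensitivity and FDR is taken into account.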

Keywords

Variant Identification, Exome Sequencing, Whole Exome Sequencing, Torrent Variant Caller, High Confident Region
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

We acknowledge Prof. Massimo Gennarelli, Prof. Emilio Sacchetti, Prof. Marina Colombi and Dr. Chiara Magri for providing materials used in the study and for allowing us to include their sample data. Publication costs were covered by the Grant “New Opportunities and Ways towards ERC” (NOW ERC, Project: 2014-2256) from Fondazione Cariplo and Regione Lombardia.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

For this type of study formal consent is not required.

Supplementary material

439_2016_1656_MOESM1_ESM.xls (10 kb)
Detailed results of the comparison of exome capture kits assessed in the study (XLS 9 kb)
439_2016_1656_MOESM2_ESM.xls (33.4 mb)
Supplementary tables 1–6. Lists of CDS regions not fully addressed in the 6 exome enrichment kits compared in the study. Gene symbol and ClinVar annotation are also reported (XLS 34183 kb)
439_2016_1656_MOESM3_ESM.pdf (40 kb)
Supplementary tables 7–8. Detailed results of the 27 sequencing runs and 34 exome sequencing data analyzed in the study (PDF 40 kb)
439_2016_1656_MOESM4_ESM.xls (128 kb)
Supplementary table 9. List of regions within human CDS exons that emerged as hard to sequence intervals based on our analysis. Gene symbols and ClinVar annotations are also reported (XLS 127 kb)
439_2016_1656_MOESM5_ESM.xls (11 kb)
Supplementary table 10. Detailed results on variant identification performances on the NA12878 sample, determined by comparison of WES datasets with gold standard variants provided by the GIAB consortium (XLS 11 kb)
439_2016_1656_MOESM6_ESM.pdf (1.6 mb)
Supplementary figures cited in the paper with figure legends (PDF 1661 kb)

Copyright information

© The Author(s) 2016

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Unit of Genetics, Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
