Abstract
As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate of these technologies limits the ability to reconstruct a heterogeneous viral population composed of rare, closely related mutant variants. In this paper, we present 2SNV, a method that tolerates the high error rate of the single-molecule protocol and reconstructs mutant variants. 2SNV uses linkage between single nucleotide variations (SNVs) to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing titrated levels of known viral mutant variants. Our method accurately reconstructs clones with a frequency of 0.2 % and distinguishes clones that differ in only two nucleotides located far apart on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open-source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv.
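The abstract says only that 2SNV uses linkage between SNVs to separate true variants from read errors. The sketch below illustrates that general idea under our own assumptions and should not be read as 2SNV's actual statistic or code. If minor alleles at two positions were independent read errors, the number of reads carrying both would be roughly n · f_a · f_b. A pair of true linked variants co-occurs far more often than that. The function name `linkage_pvalue`, the input format (pre-aligned, equal-length read strings with '-' for uncovered positions), and the choice of a Poisson upper-tail test are all hypothetical.

```python
import math


def linkage_pvalue(reads, i, j, a, b):
    """Hypothetical linkage test (not the 2SNV implementation).

    Returns P(X >= observed) for X ~ Poisson(expected), where `observed` is the
    number of reads carrying allele `a` at position `i` AND allele `b` at
    position `j`, and `expected` is that count predicted if the two alleles
    occurred independently, as independent read errors would.

    Assumed input: `reads` is a list of equal-length aligned strings in which
    '-' marks a position the read does not cover.
    """
    # Only reads covering both positions carry linkage information.
    covering = [r for r in reads if r[i] != "-" and r[j] != "-"]
    n = len(covering)
    if n == 0:
        return 1.0

    freq_a = sum(r[i] == a for r in covering) / n
    freq_b = sum(r[j] == b for r in covering) / n
    observed = sum(r[i] == a and r[j] == b for r in covering)
    expected = n * freq_a * freq_b

    # If either allele is absent, no co-occurrence is possible: no evidence.
    if expected == 0.0:
        return 1.0

    # Poisson CDF for k = 0 .. observed - 1, each term computed in log space
    # so that large counts do not overflow.
    log_lam = math.log(expected)
    cdf = sum(
        math.exp(-expected + k * log_lam - math.lgamma(k + 1))
        for k in range(observed)
    )
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, 1.0 - cdf))


if __name__ == "__main__":
    # Minor alleles C (position 0) and G (position 2) always together vs. never together.
    linked = ["CAG"] * 10 + ["AAA"] * 90
    unlinked = ["CAA"] * 10 + ["AAG"] * 10 + ["AAA"] * 80
    print(linkage_pvalue(linked, 0, 2, "C", "G"))    # ~1.1e-07
    print(linkage_pvalue(unlinked, 0, 2, "C", "G"))  # 1.0
```

A Poisson tail is just one simple choice here. A binomial or hypergeometric test would work as well. A practical tool would also need a multiple-testing correction over all position pairs and a way to assemble linked SNV pairs into full-length variants, and this sketch does neither.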
A. Artyomenko, N.C. Wu and S. Mangul contributed equally.
Acknowledgments
We would like to thank H. Hao for performing the PacBio sequencing at the Johns Hopkins Deep Sequencing & Microarray Core Facility. A.A. was supported by the GSU Molecular Basis of Disease Fellowship. S.M. and E.E. were supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448 and 1320589, and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-MH101782 and R01-ES022282. S.M. was supported in part by the Institute for Quantitative & Computational Biosciences Fellowship, UCLA.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Artyomenko, A., Wu, N.C., Mangul, S., Eskin, E., Sun, R., Zelikovsky, A. (2016). Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol. 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_12
DOI: https://doi.org/10.1007/978-3-319-31957-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31956-8
Online ISBN: 978-3-319-31957-5
eBook Packages: Computer Science, Computer Science (R0)