Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants

  • Alexander Artyomenko
  • Nicholas C. Wu
  • Serghei Mangul
  • Eleazar Eskin
  • Ren Sun
  • Alex Zelikovsky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)


As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate of these technologies limits the ability to reconstruct a heterogeneous viral population composed of rare, closely related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct a clone with a frequency of 0.2% and to distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at
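The linkage idea in the abstract can be illustrated with a minimal sketch. This is not the 2SNV implementation: the function names (`binomial_tail`, `linked_pairs`), the binomial independence test, and the `alpha` threshold are all assumptions chosen for illustration. The sketch assumes reads are already aligned to a consensus of equal length, and it flags pairs of positions whose non-consensus alleles co-occur in the same reads more often than independent read errors would predict.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def linked_pairs(reads, consensus, alpha=0.01):
    """Return position pairs (i, j) whose minor alleles look linked.

    Hypothetical helper, not part of 2SNV. Assumes every read has the
    same length as `consensus` (i.e. reads are already aligned).
    No multiple-testing correction is applied in this sketch.
    """
    n = len(reads)
    length = len(consensus)
    # For each position, the set of reads carrying a non-consensus base.
    minor = [
        {r for r, read in enumerate(reads) if read[pos] != consensus[pos]}
        for pos in range(length)
    ]
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            both = len(minor[i] & minor[j])
            if both == 0:
                continue
            # Expected co-occurrence rate if the two mismatches were
            # independent errors.
            p_both = (len(minor[i]) / n) * (len(minor[j]) / n)
            if binomial_tail(n, both, p_both) < alpha:
                hits.append((i, j))
    return hits
```

Under these assumptions, a rare variant that carries two mutations far apart on the genome shows up as a pair of positions mutated together in the same reads, whereas isolated sequencing errors rarely coincide in one read. A practical tool would also need to handle insertions and deletions, alignment, and correction for the number of position pairs tested.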


Keywords

  • SMRT reads
  • RNA viral variants
  • Single nucleotide variation



We would like to thank H. Hao for performing the PacBio sequencing at the Johns Hopkins Deep Sequencing & Microarray Core Facility. A.A. was supported by a GSU Molecular Basis of Disease Fellowship. S.M. and E.E. were supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448 and 1320589, and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-MH101782 and R01-ES022282. S.M. was supported in part by an Institute for Quantitative & Computational Biosciences Fellowship, UCLA.




Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexander Artyomenko (1)
  • Nicholas C. Wu (2)
  • Serghei Mangul (3)
  • Eleazar Eskin (3)
  • Ren Sun (4)
  • Alex Zelikovsky (1)

  1. Computer Science Department, Georgia State University, Atlanta, USA
  2. Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
  3. Computer Science Department, University of California, Los Angeles, Los Angeles, USA
  4. Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, USA
