Journal of Statistical Physics, Volume 172, Issue 1, pp 143–155

On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

  • Raul Rabadan
  • Gyan Bhanot
  • Sonia Marsilio
  • Nicholas Chiorazzi
  • Laura Pasqualucci
  • Hossein Khiabanian


One cause of cancer mortality is tumor evolution to therapy-resistant disease. First-line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Mutations carried by such clones may be identified with targeted sequencing assays by analyzing noise in high-depth data. Here, we develop a comprehensive, unbiased model for the sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above the noise level may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may help identify significant mutations that can cause recurrence. They are relevant in the pretreatment clinical setting for determining appropriate therapy and preparing for potential recurrence.
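As a rough illustration of the idea of a negative binomial error background, the sketch below scores an observed variant read count against such a background and checks numerically that negative binomial counts sharing the same success probability aggregate into another negative binomial. It is not the authors' implementation: the parameterization, the function names, and all numerical values are assumptions chosen for illustration, and the general case of unequal success probabilities is not covered.

```python
import math


def nb_log_pmf(k: int, r: float, p: float) -> float:
    """Log P(X = k) for a negative binomial with shape r and success probability p.

    Assumed parameterization (hypothetical choice for this sketch):
    P(X = k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k
    """
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(X = k) under the parameterization above."""
    return math.exp(nb_log_pmf(k, r, p))


def nb_upper_tail(k_obs: int, r: float, p: float) -> float:
    """P(X >= k_obs): chance that background error alone yields k_obs or more reads.

    Computed as 1 minus the lower sum, so it is only accurate down to
    roughly double-precision rounding error.
    """
    lower = sum(nb_pmf(k, r, p) for k in range(k_obs))
    return max(0.0, 1.0 - lower)


def convolved_pmf(k: int, r1: float, r2: float, p: float) -> float:
    """P(X1 + X2 = k) for independent NB(r1, p) and NB(r2, p), by direct convolution."""
    return sum(nb_pmf(j, r1, p) * nb_pmf(k - j, r2, p) for j in range(k + 1))


if __name__ == "__main__":
    # Hypothetical background parameters and observed variant read count.
    r_bg, p_bg, observed = 2.0, 0.5, 12
    print("tail probability:", nb_upper_tail(observed, r_bg, p_bg))

    # With a shared p, the sum of NB(r1, p) and NB(r2, p) is NB(r1 + r2, p).
    print("convolution:", convolved_pmf(4, 1.5, 2.5, 0.3))
    print("closed form:", nb_pmf(4, 4.0, 0.3))
```

A small tail probability suggests that a read count is unlikely to arise from background error alone under the assumed parameters. In practice the parameters would be estimated from data, and testing many positions would call for a multiple-testing correction.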


87.23.Kg 87.18.Tt 87.18.Wd 87.18.Vf 02.50.-r 



The authors gratefully acknowledge the constructive feedback of Mohammad Hadigol and Alexandra Jacunski. R.R. acknowledges funding from the NIH (U54CA193313, R01CA185486, and R01CA179044). H.K. acknowledges support from the ACS (IRG-15-168-01), Rutgers Cancer Institute (P30CA072720), and Rutgers Office of Advanced Research Computing (NIH 1S10OD012346-01A1).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Systems Biology, Columbia University, New York, USA
  2. Department of Physics and Astronomy, Rutgers University, Piscataway, USA
  3. The Feinstein Institute for Medical Research, Northwell Health, Manhasset, USA
  4. Institute for Cancer Genetics, Columbia University, New York, USA
  5. Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, USA
