Applications of probability and statistics in cancer genomics

Abstract

Background

The past decade has witnessed rapid progress in our understanding of the genetics of cancer and its progression. Probabilistic and statistical modeling has played a pivotal role in discovering general patterns in cancer genomics datasets and continues to be of central importance for personalized medicine.

Results

In this review, we introduce cancer genomics from a probabilistic and statistical perspective. We start with (1) the functional classification of genes into oncogenes and tumor suppressor genes, then (2) demonstrate the importance of comprehensive analysis of different mutation types in individual cancer genomes, followed by (3) tumor purity analysis, which in turn leads to (4) the concepts of ploidy and clonality, which are then connected to (5) tumor evolution under treatment pressure, yielding insights into cancer drug resistance. We also discuss future challenges, including non-coding genomic regions, integrative analysis of genomics and epigenomics, and early cancer detection.

Conclusion

We believe probabilistic and statistical modeling will continue to play an important role in novel discoveries in cancer genomics and personalized medicine.

Acknowledgements

X.M. is partly supported by The Innovation in Cancer Informatics (ICI) Fund. The authors are grateful for the editorial support of Makeda Porter-Carr.

Author information

Corresponding author

Correspondence to Xiaotu Ma.

Ethics declarations

The authors Xiaotu Ma, Sasi Arunachalam and Yanling Liu declare that they have no conflicts of interest.

This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Author summary: With rapid technological development and extensive research efforts over the past decade, genomic approaches are playing an increasingly important role in addressing human health problems such as cancer. A significant challenge in this endeavor is the analytical complexity that comes with big data, which calls for scientists with quantitative backgrounds. In this review, we aim to provide an introduction to genomic analyses that covers essential biological concepts and highlights exciting new frontiers for novel discoveries.

About this article

Cite this article

Ma, X., Arunachalam, S. & Liu, Y. Applications of probability and statistics in cancer genomics. Quant Biol 8, 95–108 (2020). https://doi.org/10.1007/s40484-020-0203-8

  • DOI: https://doi.org/10.1007/s40484-020-0203-8

Keywords

  • cancer genomics
  • sequence analysis
  • probability and statistics