Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data

  • Benjamin J. Raphael
  • Fabio Vandin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)


Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from these data is whether the temporal order of somatic mutations in a cancer follows a common progression. Since we usually obtain only one sample from each patient, such inferences are commonly made from cross-sectional data across different patients. This analysis is complicated by the extensive variation in somatic mutations across patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways.

In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the Pathway Linear Progression Model and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that while this problem is NP-hard, with enough samples its optimal solution uniquely identifies the correct model with high probability, even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with a large number of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) with large numbers of colorectal cancer and glioblastoma samples. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights into tumor progression at the pathway level.
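To make the two ingredients named above concrete (exclusivity within a pathway, and a linear order across pathways), the following is a minimal sketch of how a candidate model could be scored against mutation data. It is an illustration under stated assumptions, not the formulation or the ILP from the paper: the cost function (fewest gene-level changes needed for a sample to fit), the function names sample_cost and model_cost, and the toy genes A, B, C are all hypothetical.

```python
from typing import Iterable, Sequence, Set


def sample_cost(mutated: Set[str], pathways: Sequence[Sequence[str]]) -> int:
    """Fewest gene-level changes that make one sample fit the assumed model.

    Assumed model (illustrative only): a sample at stage s has exactly one
    mutated gene in each of the first s pathways and none in the rest.
    Genes that belong to no pathway are ignored.
    """
    counts = [sum(gene in mutated for gene in pathway) for pathway in pathways]
    costs = []
    for stage in range(len(pathways) + 1):
        # Pathways before `stage` must carry exactly one mutation:
        # add one if there is none, otherwise remove the extras.
        reached = sum(1 if c == 0 else c - 1 for c in counts[:stage])
        # Pathways from `stage` onward must carry no mutation at all.
        not_reached = sum(counts[stage:])
        costs.append(reached + not_reached)
    return min(costs)


def model_cost(samples: Iterable[Set[str]], pathways: Sequence[Sequence[str]]) -> int:
    """Total cost of an ordered list of pathways over all samples."""
    return sum(sample_cost(sample, pathways) for sample in samples)


# Toy cross-sectional data: A and B are exclusive, C appears only with A or B.
samples = [{"A"}, {"B"}, {"A", "C"}, {"B", "C"}, set()]
early_then_late = [["A", "B"], ["C"]]
late_then_early = [["C"], ["A", "B"]]

print(model_cost(samples, early_then_late))  # 0
print(model_cost(samples, late_then_early))  # 2
```

On this toy input the ordering that places the exclusive pair {A, B} before C fits every sample without changes, while the reversed ordering does not. Searching over all ordered partitions of the genes is the hard part that the abstract addresses with an ILP; this sketch only evaluates a model that is already given.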


Keywords: Bayesian Network, Somatic Mutation, KRAS Mutation, TP53 Mutation, Cancer Pathway
These keywords were added by machine and not by the authors.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Benjamin J. Raphael (1)
  • Fabio Vandin (1, 2)
  1. Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, USA
  2. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
