Abstract
Recent cancer sequencing studies provide a wealth of somatic mutation data from a large number of patients. One of the most intriguing and challenging questions arising from this data is to determine whether the temporal order of somatic mutations in a cancer follows any common progression. Since we usually obtain only one sample from a patient, such inferences are commonly made from cross-sectional data from different patients. This analysis is complicated by the extensive variation in the somatic mutations across different patients, variation that is reduced by examining combinations of mutations in various pathways. Thus far, methods to reconstruct tumor progression at the pathway level have restricted attention to known, a priori defined pathways.
In this work we show how to simultaneously infer pathways and the temporal order of their mutations from cross-sectional data, leveraging the exclusivity property of driver mutations within a pathway. We define the Pathway Linear Progression Model, and derive a combinatorial formulation for the problem of finding the optimal model from mutation data. We show that while this problem is NP-hard, with enough samples its optimal solution uniquely identifies the correct model with high probability even when errors are present in the mutation data. We then formulate the problem as an integer linear program (ILP), which allows the analysis of datasets from recent studies with large numbers of samples. We use our algorithm to analyze somatic mutation data from three cancer studies, including two studies from The Cancer Genome Atlas (TCGA) with large numbers of colorectal cancer and glioblastoma samples. The models reconstructed with our method capture most of the current knowledge of the progression of somatic mutations in these cancer types, while also providing new insights on the tumor progression at the pathway level.
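To make the idea of scoring a candidate model concrete, the following is a minimal illustrative sketch and not the formulation from the paper. It assumes that "exclusivity" means a mutated pathway carries exactly one mutated gene in a patient, and that "linear progression" means the mutated pathways of a patient form a prefix of the pathway order. Under those assumptions it counts how many entries of a binary mutation matrix would have to be flipped for every patient to fit a given ordered partition of genes. The function names `row_flips` and `total_flips` and the toy data are hypothetical.

```python
from typing import List, Sequence


def row_flips(row: Sequence[int], pathways: List[List[int]]) -> int:
    """Minimum entry flips so one patient's row fits the ordered pathways.

    Assumption (illustrative only): a row fits if, for some k, each of the
    first k pathways has exactly one mutated gene and all later pathways
    have none. Each pathway is a non-empty list of gene (column) indices.
    """
    # Number of mutated genes the patient has in each pathway.
    counts = [sum(row[g] for g in p) for p in pathways]
    # Cost to make a pathway "mutated exactly once": drop the extra 1s,
    # or add a single 1 if the pathway is currently unmutated.
    cost_on = [c - 1 if c >= 1 else 1 for c in counts]
    # Cost to make a pathway "unmutated": clear every 1.
    cost_off = counts
    # Try every progression stage k = 0..K and keep the cheapest.
    return min(
        sum(cost_on[:k]) + sum(cost_off[k:])
        for k in range(len(pathways) + 1)
    )


def total_flips(matrix: Sequence[Sequence[int]],
                pathways: List[List[int]]) -> int:
    """Sum of per-patient flip counts for a candidate ordered partition."""
    return sum(row_flips(row, pathways) for row in matrix)


if __name__ == "__main__":
    # Toy data: 5 patients x 4 genes (hypothetical values).
    toy = [
        [1, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(total_flips(toy, [[0, 1], [2, 3]]))  # pathway {0,1} first
    print(total_flips(toy, [[2, 3], [0, 1]]))  # pathway {2,3} first
```

On this toy matrix the order with genes {0, 1} first needs 2 flips and the reversed order needs 3, so the first order would be preferred under this simplified score. Searching over all partitions and orders is the hard part, which is where an ILP formulation such as the one mentioned above becomes relevant.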
This work is supported by NIH grant R01HG007069-01 and by NSF grant IIS-1247581.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Raphael, B.J., Vandin, F. (2014). Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_20
DOI: https://doi.org/10.1007/978-3-319-05269-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science (R0)