On the Sample Complexity of Cancer Pathways Identification

  • Fabio Vandin
  • Benjamin J. Raphael
  • Eli Upfal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)


In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis of mutations from large cancer sequencing studies. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.
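As a rough illustration of the mutual exclusivity signal mentioned in the abstract, the sketch below counts, for a candidate gene set, how many samples carry at least one mutation in the set (coverage) and how many carry exactly one (exclusivity). It is not one of the two algorithms the paper describes. The function name `coverage_and_exclusivity`, the dictionary-of-sets data layout, and the toy data are assumptions made for this example only.

```python
from typing import Dict, Iterable, Set, Tuple


def coverage_and_exclusivity(
    mutations: Dict[str, Set[str]], gene_set: Iterable[str]
) -> Tuple[int, int]:
    """Count covered and exclusively covered samples for a gene set.

    `mutations` maps a sample id to the set of genes mutated in that sample
    (hypothetical layout, chosen for illustration).
    Returns (covered, exclusive):
      covered   = samples with at least one mutated gene from gene_set
      exclusive = samples with exactly one mutated gene from gene_set
    """
    genes = set(gene_set)
    covered = 0
    exclusive = 0
    for sample_genes in mutations.values():
        hits = len(genes & sample_genes)
        if hits >= 1:
            covered += 1
        if hits == 1:
            exclusive += 1
    return covered, exclusive


# Toy data (invented): four samples, three genes.
toy = {
    "s1": {"A"},
    "s2": {"B"},
    "s3": {"A", "B"},
    "s4": {"C"},
}
covered, exclusive = coverage_and_exclusivity(toy, ["A", "B"])
print(covered, exclusive)
```

A gene set whose mutations are close to mutually exclusive has `exclusive` close to `covered`. Deciding how many samples are needed before such a pattern can be trusted is the sample complexity question the abstract raises.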


Thyroid Cancer, Papillary Thyroid Carcinoma, Sample Complexity, Cancer Pathway, Range Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Fabio Vandin (1, 2, 3)
  • Benjamin J. Raphael (2, 3)
  • Eli Upfal (2)
  1. Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
  2. Department of Computer Science, Brown University, Providence, USA
  3. Center for Computational Molecular Biology, Brown University, Providence, USA
