Novel Multi-sample Scheme for Inferring Phylogenetic Markers from Whole Genome Tumor Profiles

  • Ayshwarya Subramanian
  • Stanley Shackney
  • Russell Schwartz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)


Computational cancer phylogenetics seeks to enumerate the temporal sequence of aberrations in tumor evolution, thereby delineating possible tumor progression pathways, molecular subtypes, and mechanisms of action. We previously developed a pipeline for constructing phylogenies that describe evolution between major recurring cell types computationally inferred from whole-genome tumor profiles. The accuracy and detail of these phylogenies, however, depend on identifying accurate, high-resolution molecular markers of progression, i.e., reproducible regions of aberration that robustly distinguish subtypes and stages of progression. Here we present a novel hidden Markov model (HMM) scheme for inferring such phylogenetically significant markers through joint segmentation and calling of multi-sample tumor data. Our method takes a set of genome-wide DNA copy number profiles and, at each probe, partitions the samples into normal (diploid) and amplified. It differs from similar HMM methods in being designed specifically for the needs of tumor phylogenetics: it seeks robust markers of progression that are conserved across a set of copy number profiles. We compare our method with other methods on both synthetic and real tumor data; the analysis confirms its effectiveness for tumor phylogeny inference and suggests avenues for future advances.
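The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of joint segmentation and calling, not the authors' scheme. It assumes a toy HMM whose hidden state at each probe is a partition of the samples into normal (0) and amplified (1), Gaussian noise on log2 copy number ratios, and a single transition penalty whenever the partition changes, so that breakpoints shared across samples are cheaper than scattered ones. The function name `joint_calls` and all parameter values are hypothetical.

```python
import math
from itertools import product


def joint_calls(profiles, mu_normal=0.0, mu_amp=0.58, sigma=0.25, p_change=0.01):
    """Viterbi decoding of a toy multi-sample HMM (illustrative assumptions only).

    profiles: one list of log2 ratios per sample, all of equal length.
    Returns one tuple per probe with a 0 (normal) or 1 (amplified) per sample.
    mu_amp=0.58 is roughly log2(3/2), i.e. one extra copy in a diploid genome.
    """
    n_samples = len(profiles)
    n_probes = len(profiles[0])
    # Each hidden state is one partition of the samples: 2**n_samples states.
    states = list(product((0, 1), repeat=n_samples))
    n_states = len(states)
    means = (mu_normal, mu_amp)

    def log_emit(state, t):
        # Independent Gaussian noise per sample; constant terms are dropped.
        return sum(
            -((profiles[s][t] - means[state[s]]) ** 2) / (2.0 * sigma ** 2)
            for s in range(n_samples)
        )

    # One penalty per change of partition, however many samples switch.
    log_stay = math.log(1.0 - p_change)
    log_move = math.log(p_change / (n_states - 1))

    score = [log_emit(st, 0) for st in states]  # uniform prior over states
    backptrs = []
    for t in range(1, n_probes):
        new_score, ptr = [], []
        for j in range(n_states):
            cand = [
                score[i] + (log_stay if i == j else log_move)
                for i in range(n_states)
            ]
            best_i = max(range(n_states), key=cand.__getitem__)
            new_score.append(cand[best_i] + log_emit(states[j], t))
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Trace the best path back from the last probe to the first.
    j = max(range(n_states), key=score.__getitem__)
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


# Synthetic example: two samples share an amplified region (probes 4 to 7),
# and a third sample has a single isolated spike at probe 10.
amp, norm = 0.6, 0.0
sample_a = [norm] * 4 + [amp] * 4 + [norm] * 4
sample_b = [norm] * 4 + [amp] * 4 + [norm] * 4
sample_c = [norm] * 10 + [amp] + [norm]
calls = joint_calls([sample_a, sample_b, sample_c])
```

In this toy setting the shared region is called as amplified in the first two samples, while the isolated spike is absorbed as noise because two partition changes cost more than one mismatched probe. The state space grows as 2^S in the number of samples S, so a sketch like this is only practical for a handful of profiles.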


Keywords: Bioinformatics; cancer phylogenetics; multi-sample; array comparative genomic hybridization (aCGH)





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ayshwarya Subramanian (1)
  • Stanley Shackney (2)
  • Russell Schwartz (1, 3)

  1. Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, USA
  2. Oncotherapeutics, Pittsburgh, USA
  3. Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, USA
