Phylogenetic Cladograms: Tools for Analyzing Biomedical Data

Abstract

This chapter provides an introduction to phylogenetic cladograms, a systems biology, evolution-based computational methodology that emphasizes the importance of considering multilevel heterogeneity in living systems when mining data related to these systems. We start by defining intelligence as the ability to predict, because prediction is a central objective in data mining, especially in mining biomedical data (Sect. 16.1). We then briefly review artificial intelligence (AI) and computational intelligence (CI) (Sects. 16.2, 16.3), provide a conciliatory overview of CI, and suggest that phylogenetic cladograms, which provide hypotheses about speciation and inheritance relationships, should be considered a CI methodology. Next, we discuss heterogeneity in biomedical data, covering data types, how statistical methods blur heterogeneity, and the differing results obtained with more traditional (phenetic) CI methodologies and with phylogenetic techniques. Finally, we give an example of constructing and interpreting a phylogenetic cladogram.
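The parsimony criterion behind cladogram construction can be illustrated with a minimal sketch of Fitch's algorithm. The algorithm counts the minimum number of character-state changes that a given tree requires. This is an illustration under stated assumptions, not the chapter's own procedure. We assume that specimens have already been coded as binary characters (0 = state shared with normal reference specimens, 1 = altered state). The specimen names, the matrix and the two candidate trees are all hypothetical. Under maximum parsimony (MP), the tree with the lower score is preferred.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a specimen name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_character(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one character: return (candidate states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_character(tree[0], states)
    right_set, right_cost = fitch_character(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state, so no extra change is needed here.
        return shared, left_cost + right_cost
    # No common state: one change must occur on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total number of state changes over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {specimen: row[i] for specimen, row in matrix.items()}
        total += fitch_character(tree, column)[1]
    return total


# Hypothetical data: N* = normal specimens, T* = tumor specimens,
# each coded as four binary characters (0 = normal-like, 1 = altered).
matrix = {
    "N1": "0000",
    "N2": "0000",
    "T1": "1100",
    "T2": "1110",
    "T3": "1011",
}

# Two hypothetical candidate cladograms: one groups the tumors, one mixes them with normals.
tree_a: Tree = (("N1", "N2"), ("T1", ("T2", "T3")))
tree_b: Tree = (("N1", "T3"), ("N2", ("T1", "T2")))

print(parsimony_score(tree_a, matrix))  # 5
print(parsimony_score(tree_b, matrix))  # 6
```

Tree A needs five changes and tree B needs six, so parsimony prefers the cladogram in which the altered specimens form their own clade. Scoring one fixed tree is cheap. Finding the most parsimonious tree among all possible topologies is NP-hard, so practical analyses rely on heuristic tree searches in dedicated software such as PHYLIP.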

Keywords

Computational Intelligence, Prophylactic Mastectomy, Omics Data, Reduction Mammoplasty, Turing Test

Abbreviations

NP-hard: nondeterministic polynomial-time hard
AI: artificial intelligence
ANN: artificial neural network
CART: classification and regression trees
CI: computational intelligence
CNV: copy number variation
DNA: deoxyribonucleic acid
ER−: estrogen receptor-negative
ER+: estrogen receptor-positive
ER: endoplasmic reticulum
GWAS: genome-wide association scan
GePS: Genomatix Pathway System
MP: maximum parsimony
NCBI: National Center for Biotechnology Information
NIEpi: normal epithelial
PM: prophylactic mastectomy
RM: reduction mammoplasty
RNA: ribonucleic acid
ROC: receiver operating characteristic
SNP: single-nucleotide polymorphism
miRNA: microRNA


Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. National Eye Institute, National Institutes of Health, Bethesda, USA
  2. Laboratory for Informatics Development, National Institutes of Health Clinical Center, Bethesda, USA
