Hierarchical Clustering, Languages and Cancer

  • Pritha Mahata
  • Wagner Costa
  • Carlos Cotta
  • Pablo Moscato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)


In this paper, we introduce a novel objective function for the hierarchical clustering of data from distance matrices, a highly relevant task in Bioinformatics. To assess the robustness of the method, we apply it in two areas: (a) deriving a phylogeny of languages and (b) classifying cancer subtypes from microarray data. For comparison, we also consider ultrametric trees (generated by a two-phase evolutionary approach that first creates a large number of hypothesis trees and then takes their consensus) as well as the best-known results from the literature.
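The abstract does not define the new objective function, so the sketch below does not attempt to reproduce it. It only illustrates two standard notions the comparison relies on: the ultrametric (three-point) condition that every ultrametric tree's distances satisfy, and a consensus taken over many hypothesis trees. The paper does not say which consensus rule is used; majority-rule consensus is an assumption here, and the function names `is_ultrametric` and `majority_rule_consensus` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Set

# A cluster (clade) is represented as the frozen set of its leaf labels.
Cluster = FrozenSet[str]


def is_ultrametric(d: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check that d is symmetric, has a zero diagonal, and satisfies the
    three-point condition d(i, k) <= max(d(i, j), d(j, k)) for all i, j, k."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:
                return False
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][k] > max(d[i][j], d[j][k]) + tol:
                    return False
    return True


def majority_rule_consensus(trees: List[Set[Cluster]]) -> Set[Cluster]:
    """Keep every cluster that occurs in strictly more than half of the trees.
    (Assumed consensus rule; the paper does not specify one.)"""
    counts = Counter(cluster for tree in trees for cluster in tree)
    return {c for c, n in counts.items() if n > len(trees) / 2}


if __name__ == "__main__":
    # Two tight pairs {0,1} and {2,3}, joined at height 4: ultrametric.
    d_ok = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 3], [4, 4, 3, 0]]
    # d(0,2) = 3 exceeds max(d(0,1), d(1,2)) = 1: not ultrametric.
    d_bad = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
    print(is_ultrametric(d_ok), is_ultrametric(d_bad))

    trees = [
        {frozenset("AB"), frozenset("ABC")},
        {frozenset("AB"), frozenset("ABD")},
        {frozenset("AB"), frozenset("ABC")},
    ]
    print(sorted("".join(sorted(c)) for c in majority_rule_consensus(trees)))
```

A useful property of the majority rule is that any two clusters it keeps each occur in more than half of the trees, so they must co-occur in at least one tree and are therefore compatible; the kept clusters always form a valid tree.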

We used a dataset of measured 'separation times' among 84 Indo-European languages. The hierarchy we produce agrees very well with existing data about these languages across a wide range of levels, and it helps to clarify existing hypotheses, and to raise new ones, about the evolution of these languages.

Our method also generated a classification tree for the different cancers in the NCI60 microarray dataset (gene expression data for 60 cancer cell lines). In this case, the method appears to support the current belief that ovarian, breast and non-small cell lung cancers are heterogeneous, whereas other cancer types are relatively homogeneous. However, our method reveals a close relationship between the melanoma and CNS cell lines. This is consistent with the observation that metastatic melanoma first appears in the central nervous system (CNS).







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pritha Mahata (1, 2)
  • Wagner Costa (1)
  • Carlos Cotta (3)
  • Pablo Moscato (1, 2)

  1. Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
  2. Australian Research Centre in Bioinformatics
  3. Dept. Lenguajes y Ciencias de la Computación, ETSI Informática, University of Málaga, Málaga, Spain
