Hierarchical Clustering, Languages and Cancer

  • Pritha Mahata
  • Wagner Costa
  • Carlos Cotta
  • Pablo Moscato
Conference paper

DOI: 10.1007/11732242_7

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)
Cite this paper as:
Mahata P., Costa W., Cotta C., Moscato P. (2006) Hierarchical Clustering, Languages and Cancer. In: Rothlauf F. et al. (eds) Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg

Abstract

In this paper, we introduce a novel objective function for the hierarchical clustering of data from distance matrices, a very relevant task in Bioinformatics. To assess the robustness of the method, we apply it in two areas: (a) deriving a phylogeny of languages and (b) classifying cancer subtypes from microarray data. For comparison, we also consider ultrametric trees (generated via a two-phase evolutionary approach that creates a large number of hypothesis trees and then takes a consensus) and the best-known results from the literature.

We used a dataset of measured ‘separation time’ among 84 Indo-European languages. The hierarchy we produce agrees very well with existing data about these languages across a wide range of levels, and it helps to clarify and raise new hypotheses about the evolution of these languages.

Our method also generated a classification tree for the different cancers in the NCI60 microarray dataset (comprising gene expression data for 60 cancer cell lines). In this case, the method seems to support the current belief about the heterogeneous nature of ovarian, breast and non-small-cell lung cancers, as opposed to the relative homogeneity of other types of cancer. However, our method also reveals a close relationship between the melanoma and central nervous system (CNS) cell lines. This is consistent with the fact that metastatic melanoma first appears in the CNS.
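
As background for the ultrametric trees mentioned above, the following is a minimal sketch of how a hierarchy can be built from a distance matrix. It uses standard average-linkage (UPGMA) clustering, which is not the objective function or the evolutionary approach proposed in the paper. The function names `upgma` and `leaves`, the labels A to D, and the toy distances are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Leaves are label strings; internal nodes are (left, right, height) tuples.
    This is a textbook method, shown only as an illustration.
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        tree_a, na = clusters.pop(a)
        tree_b, nb = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        # Drop every entry that involves the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        # The node height is half the merge distance, so leaf-to-leaf path
        # lengths through this node equal the merge distance
        clusters[next_id] = ((tree_a, tree_b, dab / 2), na + nb)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


def leaves(tree):
    """Return the leaf labels of a tree produced by upgma()."""
    if isinstance(tree, str):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)


# Toy, made-up distances between four hypothetical items
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(labels, dist)
print(tree)
```

In this toy matrix, A and B are merged first, then C joins them, and D joins last. Every leaf ends up at the same distance from the root, which is the defining property of an ultrametric tree.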

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pritha Mahata (1, 2)
  • Wagner Costa (1)
  • Carlos Cotta (3)
  • Pablo Moscato (1, 2)

  1. Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
  2. Australian Research Centre in Bioinformatics
  3. Dept. Lenguajes y Ciencias de la Computación, University of Málaga, ETSI Informática, Málaga, Spain