Mining Structural Databases: An Evolutionary Multi-Objective Conceptual Clustering Methodology

  • R. Romero-Zaliz
  • C. Rubio-Escudero
  • O. Cordón
  • O. Harari
  • C. del Val
  • I. Zwir
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)


The increased availability of biological databases containing representations of complex objects permits access to vast amounts of data. In spite of the recent renewed interest in knowledge-discovery techniques (or data mining), there is a dearth of data analysis methods intended to facilitate understanding of the represented objects and related systems by their most representative features and the relationships derived from these features (i.e., structural data). In this paper we propose a conceptual clustering methodology termed EMO-CC, for Evolutionary Multi-Objective Conceptual Clustering, that uses multi-objective and multi-modal optimization techniques based on Evolutionary Algorithms to uncover representative substructures from structural databases. In addition, EMO-CC provides annotations of the uncovered substructures and, based on them, applies an unsupervised classification approach to retrieve new members of previously discovered substructures. We apply EMO-CC to the Gene Ontology database to recover interesting substructures that describe problems from different points of view, and we use them to explain immuno-inflammatory responses measured as gene expression profiles derived from longitudinal blood samples of human volunteers treated with intravenous endotoxin compared to placebo.
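The abstract rests on Pareto-based multi-objective optimization. As a minimal sketch, and not the authors' EMO-CC implementation, the Python code below shows the non-dominance test that defines a Pareto front. It assumes each candidate substructure has already been scored on two objectives, both to be maximized; the candidate names, the scores, and the helper names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate: (name, objective vector). EMO-CC's real objectives
# are not specified in this abstract; here every objective is maximized.
Candidate = Tuple[str, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates.
    A vector never dominates itself, so no self-exclusion is needed."""
    return [
        (name, obj)
        for name, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


# Hypothetical scores for four candidate substructures.
subs: List[Candidate] = [
    ("A", (0.9, 0.2)),
    ("B", (0.5, 0.5)),
    ("C", (0.4, 0.4)),  # dominated by B on both objectives
    ("D", (0.1, 0.9)),
]

print([name for name, _ in pareto_front(subs)])  # ['A', 'B', 'D']
```

A, B and D trade one objective against the other, so none of them is discarded; only C is worse than B on both. Evolutionary multi-objective algorithms such as NSGA-II rank a whole population by repeatedly applying this kind of filter; whether EMO-CC uses exactly this ranking is not stated here.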


Keywords: Pareto Front; Structural Database; Pareto Optimal Front; Origin Recognition Complex; Conceptual Cluster


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • R. Romero-Zaliz (1)
  • C. Rubio-Escudero (1)
  • O. Cordón (1)
  • O. Harari (1)
  • C. del Val (1)
  • I. Zwir (1, 2)

  1. Dept. Computer Science and Artificial Intelligence, University of Granada, Spain
  2. Howard Hughes Medical Institute, Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, USA