Relational Subgroup Discovery for Descriptive Analysis of Microarray Data

  • Igor Trajkovski
  • Filip Železný
  • Jakub Tolar
  • Nada Lavrač
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)


This paper presents a method that uses gene ontologies, together with the paradigm of relational subgroup discovery, to help find descriptions of groups of genes differentially expressed in specific cancers. The descriptions are represented by means of relational features extracted from gene ontology information, and they are straightforwardly interpretable by medical experts. We applied the proposed method to two known data sets: acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), and classification of fourteen types of cancer. A significant number of the discovered groups of genes had a description, confirmed by the medical expert, that highlighted the underlying biological process responsible for distinguishing one class from the other classes. We view our methodology not just as a prototypical example of applying sophisticated machine learning algorithms to microarray data, but also as a motivation for developing more sophisticated functional annotations and ontologies that can be processed by such learning algorithms.
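To make the idea of ontology-derived relational features more concrete, here is a toy sketch in Python. It is not the authors' RSD implementation. The GO-like term hierarchy, gene names, target gene set, and the choice of weighted relative accuracy (WRAcc) as the subgroup quality score are all hypothetical assumptions for illustration. A gene satisfies the feature for term T if it is annotated with T or with any descendant of T, and a candidate subgroup description is a conjunction of such features.

```python
from itertools import combinations

# Hypothetical toy fragment of a GO-like DAG: term -> set of parent terms (is_a edges).
GO_PARENTS = {
    "metabolism": set(),
    "protein_catabolism": {"metabolism"},
    "ubiquitin_dependent_catabolism": {"protein_catabolism"},
    "response_to_stimulus": set(),
    "immune_response": {"response_to_stimulus"},
}

# Hypothetical gene annotations and target class (e.g. genes differentially expressed).
ANNOTATIONS = {
    "g1": {"ubiquitin_dependent_catabolism"},
    "g2": {"protein_catabolism"},
    "g3": {"ubiquitin_dependent_catabolism", "immune_response"},
    "g4": {"immune_response"},
    "g5": {"metabolism"},
    "g6": {"response_to_stimulus"},
}
TARGET = {"g1", "g2", "g3"}


def ancestors(term):
    """Return the term together with all of its ancestors in the toy DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(GO_PARENTS.get(t, ()))
    return seen


def feature_cover(term):
    """Genes satisfying: exists A such that annotated(Gene, A) and A is_a* term."""
    return {g for g, terms in ANNOTATIONS.items()
            if any(term in ancestors(a) for a in terms)}


def wracc(cover):
    """Weighted relative accuracy: p(Cond) * (p(Class | Cond) - p(Class))."""
    if not cover:
        return 0.0
    n = len(ANNOTATIONS)
    p_cond = len(cover) / n
    p_class = len(TARGET) / n
    p_class_given_cond = len(cover & TARGET) / len(cover)
    return p_cond * (p_class_given_cond - p_class)


def best_subgroups(max_len=2, k=3):
    """Score every conjunction of up to max_len features; return the top k."""
    covers = {t: feature_cover(t) for t in GO_PARENTS}
    scored = []
    for length in range(1, max_len + 1):
        for conj in combinations(sorted(covers), length):
            # Genes covered by a conjunction = intersection of the feature covers.
            cover = set.intersection(*(covers[t] for t in conj))
            scored.append((wracc(cover), conj, cover))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for score, conj, cover in best_subgroups():
        print(f"{score:.3f}  {' AND '.join(conj)}  -> {sorted(cover)}")
```

In this toy data the feature for `protein_catabolism` covers exactly the three target genes, so it receives the highest WRAcc (0.25). Because each feature is defined through the ontology hierarchy rather than a single annotation, a description stated in terms of a general term still covers genes annotated only with more specific terms.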


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Igor Trajkovski (1)
  • Filip Železný (2)
  • Jakub Tolar (3)
  • Nada Lavrač (1)
  1. Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
  2. Department of Cybernetics, Czech Technical University in Prague, Praha 6, Czech Republic
  3. Department of Pediatrics, University of Minnesota Medical School, Minneapolis, USA
