Explaining Subgroups through Ontologies

  • Anže Vavpetič
  • Vid Podpečan
  • Stijn Meganck
  • Nada Lavrač
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7458)


Subgroup discovery (SD) methods can be used to find interesting subsets of objects of a given class. Subgroup descriptions (rules) are themselves good explanations of the subgroups. Domain ontologies provide additional descriptions of the data and can provide alternative explanations of the discovered rules; such explanations in terms of higher-level ontology concepts have the potential to provide new insights into the domain of investigation. We show that this additional explanatory power can be ensured by using recently developed semantic SD methods. We present a new approach to explaining subgroups through ontologies and demonstrate its utility on a gene expression profiling use case in which groups of patients, identified through SD in terms of gene expression, are further explained through concepts from the Gene Ontology and KEGG orthology.
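
The idea of re-describing a subgroup in terms of ontology concepts can be illustrated with a small sketch. It does not reproduce the semantic SD methods used in the paper: it assumes weighted relative accuracy (WRAcc), a commonly used subgroup quality measure, as the scoring function, and every patient record, gene name (gA, gB, gC), concept name and class label below is hypothetical toy data.

```python
from typing import Callable, Dict, List, Set

Patient = Dict[str, object]

# Hypothetical toy data: class label plus the set of differentially expressed genes.
PATIENTS: List[Patient] = [
    {"id": 1, "label": "grade3", "genes": {"gA"}},
    {"id": 2, "label": "grade3", "genes": {"gB"}},
    {"id": 3, "label": "grade3", "genes": {"gA", "gB"}},
    {"id": 4, "label": "grade3", "genes": {"gC"}},
    {"id": 5, "label": "grade1", "genes": {"gC"}},
    {"id": 6, "label": "grade1", "genes": {"gC"}},
    {"id": 7, "label": "grade1", "genes": {"gC"}},
    {"id": 8, "label": "grade1", "genes": set()},
]

# Hypothetical ontology annotations: concept -> genes annotated with that concept.
ONTOLOGY: Dict[str, Set[str]] = {
    "cell_cycle": {"gA", "gB"},
    "immune_response": {"gC"},
}


def wracc(patients: List[Patient], covers: Callable[[Patient], bool], target: str) -> float:
    """WRAcc = p(cond) * (p(class | cond) - p(class)); 0.0 for an empty subgroup."""
    n = len(patients)
    covered = [p for p in patients if covers(p)]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_class = sum(p["label"] == target for p in patients) / n
    p_class_given_cond = sum(p["label"] == target for p in covered) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


def gene_rule(gene: str) -> Callable[[Patient], bool]:
    """Rule at the gene level: the patient expresses this specific gene."""
    return lambda p: gene in p["genes"]


def concept_rule(concept: str) -> Callable[[Patient], bool]:
    """Rule at the ontology level: the patient expresses any gene annotated with the concept."""
    return lambda p: bool(p["genes"] & ONTOLOGY[concept])


if __name__ == "__main__":
    for name, rule in [
        ("gene gA", gene_rule("gA")),
        ("gene gB", gene_rule("gB")),
        ("concept cell_cycle", concept_rule("cell_cycle")),
    ]:
        print(name, wracc(PATIENTS, rule, "grade3"))
```

In this toy setting the concept-level rule covers more patients of the target class than either gene-level rule, which is the kind of higher-level description the abstract refers to.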


Keywords: data mining, subgroup discovery, ontologies, microarray data





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Anže Vavpetič (1)
  • Vid Podpečan (1)
  • Stijn Meganck (2)
  • Nada Lavrač (1, 3)

  1. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
  2. Computational Modeling Lab, Vrije Universiteit Brussel, Brussel, Belgium
  3. University of Nova Gorica, Nova Gorica, Slovenia
