Community-Based Semantic Subgroup Discovery

  • Blaž Škrlj
  • Jan Kralj
  • Anže Vavpetič
  • Nada Lavrač
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10785)


Modern data mining algorithms frequently need to learn from heterogeneous data and knowledge sources, including ontologies. A data mining task in which ontologies are used as background knowledge is referred to as semantic data mining. A special form of semantic data mining is semantic subgroup discovery, where ontology terms are used in the rules that describe subgroups. We propose to enhance ontology-based subgroup identification with Community-Based Semantic Subgroup Discovery (CBSSD), which also takes into account the structural properties of complex networks related to the studied phenomenon. The application of the developed CBSSD approach is demonstrated on two use cases from the field of molecular biology.
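The general idea in the abstract (group network nodes by structure, then describe each group with ontology terms) can be illustrated with a minimal sketch. This is not the authors' CBSSD implementation: connected components stand in for a real community detection algorithm, a hypergeometric tail test stands in for rule learning, and the gene names, term names, and function names are all hypothetical.

```python
from collections import defaultdict
from math import comb


def connected_components(edges):
    """Group nodes into connected components.

    Assumption: a crude stand-in for a real community detection algorithm.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(members)
    return communities


def enrichment_p_value(community, annotated, population_size):
    """Hypergeometric tail P(X >= k): probability of drawing at least k
    annotated nodes when sampling len(community) nodes at random."""
    n = len(community)
    big_k = len(annotated)
    k = len(community & annotated)
    total = comb(population_size, n)
    tail = sum(
        comb(big_k, i) * comb(population_size - big_k, n - i)
        for i in range(k, min(big_k, n) + 1)
    )
    return tail / total


# Hypothetical toy network: two disconnected triangles of genes.
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
         ("g4", "g5"), ("g5", "g6"), ("g4", "g6")]
# Hypothetical ontology annotations: term -> annotated genes.
annotations = {"term_A": {"g1", "g2", "g3"}, "term_B": {"g3", "g4"}}

communities = connected_components(edges)
population = set().union(*communities)

# One (community, term, p-value) triple per community/term pair.
results = []
for community in communities:
    for term, annotated in sorted(annotations.items()):
        p = enrichment_p_value(community, annotated & population, len(population))
        results.append((sorted(community), term, p))

for members, term, p in results:
    print(members, term, round(p, 3))
```

In this toy case term_A covers exactly the first triangle, so it receives the smallest p-value there, while term_B is spread across both groups and is not specific to either. A realistic pipeline would also need multiple-testing correction and an ontology hierarchy, both omitted here.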


Keywords: Semantic data mining, Bioinformatics, Community detection, Network analysis, Term enrichment analysis



This research was funded by the Slovenian Research Agency through the project HinLife: Analysis of Heterogeneous Information Networks for Knowledge Discovery in Life Sciences (J7-7303), as well as by the Human Brain Project (FET Flagship grant FP7-ICT-604102). The authors also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan-XP GPU used for this research.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Blaž Škrlj (1)
  • Jan Kralj (2)
  • Anže Vavpetič (2)
  • Nada Lavrač (2, 3)

  1. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
  2. Jožef Stefan Institute, Ljubljana, Slovenia
  3. University of Nova Gorica, Nova Gorica, Slovenia
