Network Ranking Assisted Semantic Data Mining

  • Jan Kralj
  • Anže Vavpetič
  • Michel Dumontier
  • Nada Lavrač
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9656)

Abstract

Semantic data mining (SDM) uses annotated data and interconnected background knowledge to generate rules that are easily interpreted by the end user. However, SDM algorithms have high computational complexity, which leads to long running times even on relatively small data sets. Network analysis algorithms, by contrast, are among the most scalable data mining algorithms. This paper proposes an effective SDM approach that combines semantic data mining with network analysis: network analysis first extracts the most relevant part of the interconnected background knowledge, and a semantic data mining algorithm is then applied to the pruned background knowledge. An application to an acute lymphoblastic leukemia data set demonstrates that the approach is well motivated, is more efficient, and yields rules that are comparable to or better than those obtained by applying the incorporated SDM algorithm without network-based reduction in data preprocessing.
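
The pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which network ranking method is used, so the sketch assumes personalised PageRank computed by power iteration over an undirected graph that links data examples to background-knowledge terms. The function names (personalized_pagerank, prune_background_knowledge), the keep_fraction parameter, and the toy node names are all hypothetical, and the subsequent SDM step is not shown.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Power-iteration PageRank whose random walk restarts at the seed nodes.

    Assumption: every seed appears in at least one edge.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)  # assumption: links are treated as undirected
    nodes = list(neighbours)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            # Each node passes its damped score equally to its neighbours.
            share = damping * rank[u] / len(neighbours[u])
            for v in neighbours[u]:
                new_rank[v] += share
        rank = new_rank
    return rank


def prune_background_knowledge(edges, seeds, terms, keep_fraction=0.5):
    """Keep only the highest-ranked fraction of background-knowledge terms."""
    scores = personalized_pagerank(edges, seeds)
    ranked = sorted(terms, key=lambda t: scores.get(t, 0.0), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:keep])


# Toy data (hypothetical): examples g1..g4 annotated with terms t_a..t_c,
# which are in turn linked to a shared parent term t_root.
edges = [
    ("g1", "t_a"), ("g2", "t_a"), ("g2", "t_b"),
    ("g3", "t_c"), ("g4", "t_c"),
    ("t_a", "t_root"), ("t_b", "t_root"), ("t_c", "t_root"),
]
terms = {"t_a", "t_b", "t_c", "t_root"}
seeds = {"g1", "g2"}  # examples of the target class

kept = prune_background_knowledge(edges, seeds, terms, keep_fraction=0.5)
```

In this toy graph, terms annotated to the seed examples receive the highest scores, while t_c, which is reachable from the seeds only through the shared parent, is pruned. An SDM algorithm would then search for rules over the terms in kept rather than over the full background knowledge.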

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jan Kralj (1, 2)
  • Anže Vavpetič (1, 2)
  • Michel Dumontier (4)
  • Nada Lavrač (1, 2, 3)

  1. Jožef Stefan Institute, Ljubljana, Slovenia
  2. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
  3. University of Nova Gorica, Nova Gorica, Slovenia
  4. Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
