Network Ranking Assisted Semantic Data Mining

Kralj, Jan; Vavpetič, Anže; Dumontier, Michel; Lavrač, Nada

doi:10.1007/978-3-319-31744-1_65

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9656)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Semantic data mining (SDM) uses annotated data and interconnected background knowledge to generate rules that are easily interpreted by the end user. However, SDM algorithms have high computational complexity, resulting in long running times even on relatively small data sets. Network analysis algorithms, by contrast, are among the most scalable data mining algorithms. This paper proposes an effective SDM approach that combines semantic data mining with network analysis. The proposed approach uses network analysis to extract the most relevant part of the interconnected background knowledge, and then applies a semantic data mining algorithm to the pruned background knowledge. An application to an acute lymphoblastic leukemia data set demonstrates that the approach is well motivated, is more efficient, and yields rules that are comparable to or better than those obtained by applying the incorporated SDM algorithm without network-based reduction in data preprocessing.
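
The two-stage idea in the abstract (rank the background knowledge with a network analysis algorithm, then keep only the top-ranked part for rule learning) can be illustrated with a minimal sketch. The abstract does not say which ranking algorithm or pruning threshold is used, so everything below is an assumption made for illustration: the sketch uses PageRank with restarts to the target-class examples, and the toy graph, the node names, the function names, and the `keep_fraction` parameter are all hypothetical.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Power-iteration PageRank whose random walk restarts at the seed nodes."""
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    new_rank[s] += damping * rank[n] / len(seeds)
        rank = new_rank
    return rank


def prune_background_knowledge(edges, seeds, terms, keep_fraction=0.5):
    """Return the top-ranked fraction of background-knowledge terms."""
    rank = personalized_pagerank(edges, seeds)
    ordered = sorted(sorted(terms), key=lambda t: rank[t], reverse=True)
    keep = max(1, int(len(ordered) * keep_fraction))
    return set(ordered[:keep])


# Hypothetical toy data: examples point to their annotations,
# and annotation terms point to their parent term.
edges = [
    ("gene_a", "term_1"), ("gene_b", "term_1"), ("gene_b", "term_2"),
    ("gene_c", "term_3"),
    ("term_1", "root"), ("term_2", "root"), ("term_3", "root"),
]
seeds = {"gene_a", "gene_b"}          # assumed target-class examples
terms = {"term_1", "term_2", "term_3", "root"}

kept = prune_background_knowledge(edges, seeds, terms, keep_fraction=0.5)
print(sorted(kept))
```

In this sketch a semantic data mining algorithm would then search for rules using only the terms in `kept`, which shrinks the hypothesis space; terms reachable only from non-target examples, such as `term_3` here, receive no rank and are pruned first.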

Notes

1. The new variable needs to be ‘consumed’ by a literal to be added as a conjunction to this clause in the next step of rule refinement.

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj, Anže Vavpetič & Nada Lavrač
Jožef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj, Anže Vavpetič & Nada Lavrač
University of Nova Gorica, Vipavska 13, 5000, Nova Gorica, Slovenia
Nada Lavrač
Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
Michel Dumontier

Corresponding author

Correspondence to Jan Kralj.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Francisco Ortuño
Universidad de Granada, Granada, Spain
Ignacio Rojas

About this paper

Cite this paper

Kralj, J., Vavpetič, A., Dumontier, M., Lavrač, N. (2016). Network Ranking Assisted Semantic Data Mining. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_65

DOI: https://doi.org/10.1007/978-3-319-31744-1_65
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science, Computer Science (R0)