Towards Linked Open Data Enabled Data Mining

Strategies for Feature Generation, Propositionalization, Selection, and Consolidation
  • Petar Ristoski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9088)


Background knowledge from Linked Open Data sources can improve the results of a data mining task: predictive models can become more accurate, and descriptive models can reveal more interesting findings. However, collecting and integrating background knowledge is tedious manual work. In this paper we propose a set of desiderata and identify the challenges for developing a framework for unsupervised generation of data mining features from Linked Data.
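To make the idea of generating features from Linked Data concrete, the following is a minimal sketch of one simple propositionalization strategy: turning each outgoing predicate-object pair of an entity into a binary feature. It is an illustration under stated assumptions, not the framework proposed in the paper. The toy triples, the `ex:` prefix, and the function name `generate_binary_features` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("ex:Mannheim", "rdf:type", "ex:City"),
    ("ex:Mannheim", "ex:country", "ex:Germany"),
    ("ex:Skopje", "rdf:type", "ex:City"),
    ("ex:Skopje", "rdf:type", "ex:Capital"),
    ("ex:Skopje", "ex:country", "ex:NorthMacedonia"),
]


def generate_binary_features(entities, triples):
    """Build one 0/1 feature per distinct 'predicate=object' pair.

    Returns (feature_names, rows), where rows maps each entity to a
    vector aligned with feature_names.
    """
    # Collect the outgoing predicate-object pairs of every subject.
    outgoing = defaultdict(set)
    for subject, predicate, obj in triples:
        outgoing[subject].add(f"{predicate}={obj}")

    # The feature space is the union of pairs seen for the given entities.
    feature_names = sorted(set().union(*(outgoing[e] for e in entities)))

    # Binary representation: 1 if the entity has the pair, else 0.
    rows = {
        e: [1 if name in outgoing[e] else 0 for name in feature_names]
        for e in entities
    }
    return feature_names, rows


if __name__ == "__main__":
    names, rows = generate_binary_features(["ex:Mannheim", "ex:Skopje"], TRIPLES)
    print(names)
    for entity, vector in rows.items():
        print(entity, vector)
```

The resulting table can be passed to any standard learner. In practice the feature space grows quickly with the number of triples, which is why the subtitle lists selection and consolidation alongside generation.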


Linked Open Data, Data mining, Feature generation



This thesis is supervised by Prof. Dr. Heiko Paulheim. The work presented in this paper has been partly funded by the German Research Foundation (DFG) under grant number PA 2373/1-1 (Mine@LOD).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Mannheim, Mannheim, Germany
