Machine Learning with and for Semantic Web Knowledge Graphs

  • Heiko Paulheim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11078)


Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are among the most popular and widely used datasets of the Semantic Web. In this paper, we introduce some of the most prominent knowledge graphs on the Semantic Web. We discuss how machine learning is used to improve those knowledge graphs, and how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.
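One common way to use a knowledge graph as background knowledge is to turn each entity's triples into features (often called propositionalization) and pass those features to a standard learner. The sketch below is a minimal content-based recommender built on that idea. The toy triples, entity names, and the helper functions `propositionalize`, `cosine`, and `recommend` are hypothetical. The scoring (summed cosine similarity over binary `predicate=object` features) is one simple choice for illustration, not the method of this paper.

```python
from math import sqrt

# Hypothetical toy knowledge graph of (subject, predicate, object) triples,
# loosely in the style of DBpedia. Illustrative only, not real data.
TRIPLES = [
    ("Book_A", "author", "Author_X"),
    ("Book_A", "genre", "Fantasy"),
    ("Book_B", "author", "Author_X"),
    ("Book_B", "genre", "Fantasy"),
    ("Book_C", "author", "Author_Y"),
    ("Book_C", "genre", "Science_fiction"),
    ("Book_D", "author", "Author_Y"),
    ("Book_D", "genre", "Fantasy"),
]


def propositionalize(triples):
    """Map each subject to a set of binary 'predicate=object' features."""
    features = {}
    for subj, pred, obj in triples:
        features.setdefault(subj, set()).add(f"{pred}={obj}")
    return features


def cosine(a, b):
    """Cosine similarity between two binary feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(liked, features, k=2):
    """Rank items the user has not liked by summed similarity to the liked items."""
    scores = {}
    for item, feats in features.items():
        if item in liked:
            continue
        scores[item] = sum(
            cosine(feats, features[seen]) for seen in liked if seen in features
        )
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


if __name__ == "__main__":
    feats = propositionalize(TRIPLES)
    print(recommend({"Book_A"}, feats))
```

With these toy triples, a user who liked `Book_A` is recommended `Book_B` (same author and genre) ahead of `Book_D` (same genre only). Richer variants might use entity types, incoming links, or graph embeddings instead of raw binary features.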


Keywords: Knowledge graphs · Semantic web · Machine learning · Background knowledge



I would like to thank (in alphabetical order) Aldo Gangemi, André Melo, Christian Bizer, Daniel Ringler, Eneldo Loza Mencía, Heiner Stuckenschmidt, Jessica Rosati, Johanna Völker, Julian Seitner, Kai Eckert, Michael Cochez, Nicolas Heist, Petar Ristoski, Renato De Leone, Robert Meusel, Simone Paolo Ponzetto, Stefano Faralli, Sven Hertling, and Tommaso Di Noia for their valuable input to this paper.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
