Abstract
Python is currently the most widely used platform for data science and machine learning. At the same time, public knowledge graphs have been identified as a valuable source of background knowledge in many data science tasks. In this paper, we introduce the kgextension package for Python, which makes it possible to use knowledge graphs in data science pipelines built in Python. The demo shows how data from public knowledge graphs such as DBpedia and Wikidata can be used in data mining pipelines based on the popular Python package scikit-learn. We demonstrate the package’s utility by showing that the prediction accuracy on a popular Kaggle task can be significantly increased by using background knowledge from DBpedia.
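The idea of plugging knowledge graph features into a scikit-learn pipeline can be illustrated with a minimal sketch. It does not use the kgextension API: the `ToyKGFeatures` transformer and the in-memory `TOY_KG` dictionary are hypothetical stand-ins for a real lookup against DBpedia or Wikidata, and the feature names are invented for illustration. Only the scikit-learn and NumPy calls are real. Entity labels are linked to URIs with a simple pattern in the DBpedia resource namespace, and the linked entities are then turned into numeric features that a downstream classifier consumes.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for a knowledge graph: URI -> binary features.
# A real pipeline would query a public graph such as DBpedia instead.
TOY_KG = {
    "http://dbpedia.org/resource/Berlin": {"is_capital": 1, "is_city": 1},
    "http://dbpedia.org/resource/Paris": {"is_capital": 1, "is_city": 1},
    "http://dbpedia.org/resource/Mannheim": {"is_capital": 0, "is_city": 1},
    "http://dbpedia.org/resource/Heidelberg": {"is_capital": 0, "is_city": 1},
}


class ToyKGFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: pattern-based linking plus feature lookup."""

    def __init__(self, uri_pattern="http://dbpedia.org/resource/{}"):
        self.uri_pattern = uri_pattern

    def fit(self, X, y=None):
        # Fix the output columns: every feature name known to the toy graph.
        self.feature_names_ = sorted(
            {name for feats in TOY_KG.values() for name in feats}
        )
        return self

    def transform(self, X):
        rows = []
        for label in X:
            # Link the label to a URI by filling it into the pattern.
            uri = self.uri_pattern.format(label.replace(" ", "_"))
            # Entities missing from the graph get all-zero features.
            feats = TOY_KG.get(uri, {})
            rows.append([feats.get(name, 0) for name in self.feature_names_])
        return np.array(rows, dtype=float)


# Toy task: predict whether a city is a capital from its label alone.
labels = ["Berlin", "Paris", "Mannheim", "Heidelberg"]
y = [1, 1, 0, 0]

pipeline = Pipeline([("kg", ToyKGFeatures()), ("clf", LogisticRegression())])
pipeline.fit(labels, y)
predictions = pipeline.predict(["Paris", "Mannheim"])
```

Because the transformer follows the usual `fit`/`transform` protocol, it can be combined with any other scikit-learn step, which is the kind of integration the abstract describes.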
Notes
- 5. Such as http://dbpedia.org/resource/*ENTITY*.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bucher, TC., Jiang, X., Meyer, O., Waitz, S., Hertling, S., Paulheim, H. (2021). scikit-learn Pipelines Meet Knowledge Graphs. In: Verborgh, R., et al. The Semantic Web: ESWC 2021 Satellite Events. ESWC 2021. Lecture Notes in Computer Science, vol 12739. Springer, Cham. https://doi.org/10.1007/978-3-030-80418-3_2
DOI: https://doi.org/10.1007/978-3-030-80418-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80417-6
Online ISBN: 978-3-030-80418-3
eBook Packages: Computer Science (R0)