
Sustainable Linked Data Generation: The Case of DBpedia

  • Wouter Maroy
  • Anastasia Dimou
  • Dimitris Kontokostas
  • Ben De Meester
  • Ruben Verborgh
  • Jens Lehmann
  • Erik Mannens
  • Sebastian Hellmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10588)

Abstract

The DBpedia Extraction Framework (DBpedia EF), the generation framework behind one of the Linked Open Data cloud’s central interlinking hubs, has limitations with regard to the quality, coverage and sustainability of the generated dataset. DBpedia can be further improved at both the schema and the data level. Errors and inconsistencies can be addressed by amending (i) the DBpedia EF; (ii) the DBpedia mapping rules; or (iii) Wikipedia itself, from which the framework extracts information. However, even though the DBpedia EF and the mapping rules are continuously evolving and several changes have been applied to both, the DBpedia dataset has not improved significantly since its limitations were identified. To address these shortcomings, we propose adopting a different, semantic-driven approach that decouples, in a declarative manner, the execution of the extraction, transformation and mapping rules. In this paper, we provide details regarding the new DBpedia EF: its architecture, technical implementation and extraction results. This way, we achieve an enhanced data generation process that can be broadly adopted and that improves the quality, coverage and sustainability of the generated dataset.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Wouter Maroy (1)
  • Anastasia Dimou (1)
  • Dimitris Kontokostas (2)
  • Ben De Meester (1)
  • Ruben Verborgh (1)
  • Jens Lehmann (3, 4)
  • Erik Mannens (1)
  • Sebastian Hellmann (2)
  1. imec – IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium
  2. Leipzig University – AKSW/KILT, Leipzig, Germany
  3. University of Bonn, Smart Data Analytics Group, Bonn, Germany
  4. Fraunhofer IAIS, Sankt Augustin, Germany
