Challenges of Linking Organizational Information in Open Government Data to Knowledge Graphs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12387)


Open Government Data (OGD) is being published by various public administration organizations around the globe. Within the metadata of OGD data catalogs, the publishing organizations (1) are not uniquely and unambiguously identifiable and, even worse, (2) change over time as public administration units are merged or restructured. To enable fine-grained analyses or searches on Open Government Data at the level of publishing organizations, linking those organizations from OGD portals to publicly available knowledge graphs (KGs) such as Wikidata and DBpedia seems like an obvious solution. Still, as we show in this position paper, organization linking faces significant challenges in terms of the data quality and completeness of both the available portal metadata and the KGs. We specifically highlight five main challenges, namely (1) temporal changes in organizations and in the portal metadata, (2) the lack of a base ontology for describing organizational structures and changes in public knowledge graphs, (3) metadata and KG data quality, (4) multilinguality, and (5) disambiguating public sector organizations. Based on available OGD portal metadata from the Open Data Portal Watch, we provide an in-depth analysis of these issues, suggest concrete starting points for tackling them, and call on the community to work jointly on these open challenges.
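To make the identification problem concrete, the following is a minimal sketch, not the method of the paper, of how publisher strings from portal metadata might be grouped under a crude normalization key. The function names and the sample strings are hypothetical and chosen only for illustration. The sketch collapses spelling variants of one name, but it keeps translated or renamed variants apart, which is where challenges (1), (4), and (5) begin.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_publisher(name: str) -> str:
    """Return a crude matching key for a publisher string.

    Strips accents, lowercases, and collapses punctuation and whitespace.
    This is an illustrative heuristic, not an entity-linking method.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def group_publishers(names):
    """Group raw publisher strings by their normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_publisher(name)].append(name)
    return dict(groups)


# Hypothetical publisher strings, not taken from any real portal.
sample = [
    "Ministry of Finance",
    "ministry of finance ",
    "Ministry-of-Finance",
    "Bundesministerium für Finanzen",
    "Federal Ministry of Finance",
]

groups = group_publishers(sample)
for key, variants in groups.items():
    print(key, "<-", variants)
```

String normalization alone merges the first three variants but leaves the German and the "Federal" variants as separate groups, even if a human reader might suspect that they denote the same body. Deciding that requires external knowledge, for instance from a knowledge graph.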


Open data, Dataset evolution, Entity linking, Knowledge graphs, Knowledge graph evolution



The authors thank Vincent Emonet, Paola Espinoza-Arias, and Bilal Koteich, who contributed preliminary analyses regarding the challenges addressed in this paper. We also thank the organizers of the International Semantic Web Summer School (ISWS) 2019: the idea for this paper originated in discussions at the school.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
  2. Information School, The University of Sheffield, Sheffield, UK
  3. Vienna University of Economics and Business, Vienna, Austria
  4. Complexity Science Hub Vienna, Vienna, Austria
  5. L3S Research Center, Leibniz University Hannover, Hanover, Germany
