Challenges of Linking Organizational Information in Open Government Data to Knowledge Graphs

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2020)

Abstract

Open Government Data (OGD) is being published by various public administration organizations around the globe. Within the metadata of OGD data catalogs, the publishing organizations (1) are not uniquely and unambiguously identifiable and, even worse, (2) change over time as public administration units are merged or restructured. To enable fine-grained analyses or searches on Open Government Data at the level of publishing organizations, linking those organizations from OGD portals to publicly available knowledge graphs (KGs) such as Wikidata and DBpedia seems like an obvious solution. Still, as we show in this position paper, organization linking faces significant challenges with respect to data quality and completeness, both in the available portal metadata and in the KGs. We herein specifically highlight five main challenges, namely (1) temporal changes in organizations and in the portal metadata, (2) the lack of a base ontology for describing organizational structures and changes in public knowledge graphs, (3) metadata and KG data quality, (4) multilinguality, and (5) disambiguating public sector organizations. Based on available OGD portal metadata from the Open Data Portal Watch, we provide an in-depth analysis of these issues and suggest concrete starting points for tackling them, along with a call to the community to jointly work on these open challenges.

Notes

  1. URL prefixes such as dbo:, dbp:, wdt:, or schema: can be looked up on prefix.cc.

  2. German spelling of the English word “parliament”.

  3. As https://data.gv.at/ is the Austrian national data portal, the label “Parlament” refers to the Austrian parliament.

  4. cf. for instance http://opendatamonitor.eu or http://europeandataportal.eu/dashboard.

  5. https://data.wu.ac.at/portalwatch.

  6. https://data.europa.eu/euodp/de/data.

  7. http://risis.eu/orgreg/.

  8. The ODPW metadata already maps different schemata uniformly to DCAT, cf. [17].

  9. https://github.com/YaserJaradeh/LinkingODPublishers/blob/master/GoldStandard.csv.

  10. https://github.com/YaserJaradeh/LinkingODPublishers/tree/master/Scripts.

  11. dbr:London_Fire_and_Emergency_Planning_Authority.

  12. dbr:London_Fire_and_Civil_Defence_Authority.

  13. A SPARQL query for dbp:governingBody resulted in \(\sim 6,000\) usages with only 930 distinct objects over all of DBpedia.

  14. https://web.archive.org/web/20190403150124/https://www.wikidata.org/wiki/Q2624680.

  15. Note that, to a certain extent, up-to-date metadata is available, e.g. through the ODPW database that was also used for our analysis: https://data.wu.ac.at/portalwatch/data.
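The kind of count reported in note 13 can be illustrated with a minimal Python sketch. It is not the query the authors ran, which is not given here. The namespace IRIs are the usual expansions of the prefixes named in note 1. The helper names (expand, usage_query, usage_stats) and the toy triples are hypothetical and introduced only for illustration; the counts are computed over that in-memory toy list, not over DBpedia.

```python
# Usual namespace IRIs for the prefixes mentioned in note 1.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
    "dbr": "http://dbpedia.org/resource/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "schema": "http://schema.org/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbp:governingBody' to a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def usage_query(curie: str) -> str:
    """Build a SPARQL 1.1 query counting usages and distinct objects of a property."""
    return (
        "SELECT (COUNT(*) AS ?usages) (COUNT(DISTINCT ?o) AS ?objects)\n"
        f"WHERE {{ ?s <{expand(curie)}> ?o }}"
    )


def usage_stats(triples, curie):
    """Compute the same two counts over an in-memory list of (s, p, o) tuples."""
    prop = expand(curie)
    objects = [o for _, p, o in triples if p == prop]
    return len(objects), len(set(objects))


# Toy triples, invented purely for illustration (not taken from DBpedia).
toy = [
    ("ex:a", expand("dbp:governingBody"), "ex:council"),
    ("ex:b", expand("dbp:governingBody"), "ex:council"),
    ("ex:c", expand("dbp:governingBody"), "ex:board"),
    ("ex:c", expand("dbo:location"), "ex:city"),
]

print(usage_query("dbp:governingBody"))
print(usage_stats(toy, "dbp:governingBody"))  # (3, 2)
```

A large gap between the two numbers, as in note 13, means that many subjects share few distinct object values for that property.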

References

  1. Extract meaning from your text. https://www.textrazor.com/

  2. Text analytics - meaningcloud text mining solutions (2016). https://www.meaningcloud.com/

  3. Assaf, A., Troncy, R., Senart, A.: HDL - towards a harmonized dataset model for open data portals. In: Workshop on Using the Web in the Age of Data (USEWOD ’15) Co-located with (ESWC 2015), pp. 62–74 (2015)

  4. Brickley, D., Burgess, M., Noy, N.F.: Google dataset search: building a search engine for datasets in an open web ecosystem. In: The World Wide Web Conference, WWW, pp. 1365–1375. ACM (2019)

  5. Delpeuch, A.: OpenTapioca: lightweight entity linking for Wikidata. CoRR abs/1904.09131 (2019). http://arxiv.org/abs/1904.09131

  6. Dubey, M., Banerjee, D., Chaudhuri, D., Lehmann, J.: EARL: joint entity and relation linking for question answering over knowledge graphs. In: Vrandečić, D., et al. (eds.) ISWC 2018. LNCS, vol. 11136, pp. 108–126. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00671-6_7

  7. Ermilov, I., Auer, S., Stadler, C.: User-driven semantic mapping of tabular data. In: I-SEMANTICS 2013, pp. 105–112. ACM (2013)

  8. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. Language, Speech, and Communication. MIT Press, Cambridge (1998)

  9. Ferragina, P., Scaiella, U.: TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM, pp. 1625–1628 (2010)

  10. Kacprzak, E., Koesten, L., Ibáñez, L.D., Blount, T., Tennison, J., Simperl, E.: Characterising dataset search - an analysis of search logs and data requests. J. Web Semant. 55, 37–55 (2019)

  11. Kremen, P., Necaský, M.: Improving discoverability of open government data with rich metadata descriptions using semantic government vocabulary. J. Web Semant. 55, 1–20 (2019)

  12. Maali, F., Erickson, J.: Data catalog vocabulary (DCAT). W3C Recommendation (2014). http://www.w3.org/TR/vocab-dcat/

  13. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In: 7th International Conference on Semantic Systems, I-SEMANTICS 2011, Graz, Austria, 7–9 September 2011, pp. 1–8 (2011)

  14. Neumaier, S.: Semantic enrichment of open data on the web. Ph.D. thesis, Vienna University of Technology (2019)

  15. Neumaier, S., Thurnay, L., Lampoltshammer, T.J., Knap, T.: Search, filter, fork, and link open data: the adequate platform: data- and community-driven quality improvements. In: Companion of The Web Conference 2018, pp. 1523–1526 (2018)

  16. Neumaier, S., Umbrich, J., Polleres, A.: Automated quality assessment of metadata across open data portals. J. Data Inf. Qual. 8(1), 2:1–2:29 (2016)

  17. Neumaier, S., Umbrich, J., Polleres, A.: Lifting data portals to the web of data. In: 10th Workshop on Linked Data on the Web (LDOW2017) (2017)

  18. Sakor, A., et al.: Old is gold: linguistic driven approach for entity and relation linking of short text. In: Proceedings of NAACL-HLT 2019, pp. 2336–2346 (2019)

  19. Tygel, A., Auer, S., Debattista, J., Orlandi, F., Campos, M.L.M.: Towards cleaning-up open data portals: a metadata reconciliation approach. In: 10th IEEE International Conference on Semantic Computing, ICSC 2016, pp. 71–78 (2016)

Acknowledgements

The authors thank Vincent Emonet, Paola Espinoza-Arias, and Bilal Koteich, who contributed preliminary analyses regarding the challenges addressed in this paper. We also thank the organizers of the International Semantic Web Summer School (ISWS) 2019: the idea for this paper originated in discussions at the school.

Author information

Corresponding author

Correspondence to Jan Portisch.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Portisch, J., Fallatah, O., Neumaier, S., Jaradeh, M.Y., Polleres, A. (2020). Challenges of Linking Organizational Information in Open Government Data to Knowledge Graphs. In: Keet, C.M., Dumontier, M. (eds) Knowledge Engineering and Knowledge Management. EKAW 2020. Lecture Notes in Computer Science, vol 12387. Springer, Cham. https://doi.org/10.1007/978-3-030-61244-3_19

  • DOI: https://doi.org/10.1007/978-3-030-61244-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61243-6

  • Online ISBN: 978-3-030-61244-3

  • eBook Packages: Computer Science, Computer Science (R0)
