Detecting Similar Linked Datasets Using Topic Modelling

  • Michael Röder
  • Axel-Cyrille Ngonga Ngomo
  • Ivan Ermilov
  • Andreas Both
Conference paper

DOI: 10.1007/978-3-319-34129-3_1

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)
Cite this paper as:
Röder M., Ngonga Ngomo AC., Ermilov I., Both A. (2016) Detecting Similar Linked Datasets Using Topic Modelling. In: Sack H., Blomqvist E., d'Aquin M., Ghidini C., Ponzetto S., Lange C. (eds) The Semantic Web. Latest Advances and New Domains. ESWC 2016. Lecture Notes in Computer Science, vol 9678. Springer, Cham

Abstract

The Web of Data is growing continuously in both the size and the number of published datasets. Porting a dataset to five-star Linked Data, however, requires its publisher to link it with already available linked datasets. Given the size and growth of the Linked Data Cloud, the current, mostly manual approach to detecting relevant datasets for linking is obsolete. We study the use of topic modelling for dataset search experimentally and present Tapioca, a linked dataset search engine that automatically provides data publishers with similar existing datasets. Our search engine uses a novel approach for determining the topical similarity of datasets: it applies probabilistic topic modelling solely to the metadata of datasets to determine which datasets are related. We evaluate our approach on a manually created gold standard and with a user study. Our evaluation shows that our algorithm significantly outperforms a set of comparable baseline algorithms, including standard search engines, by 6% F1-score. Moreover, we show that it can be used on a large real-world dataset with comparable performance.
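As a rough illustration of what topical similarity between datasets can look like in practice, the sketch below ranks candidate datasets against a query dataset. It is not Tapioca's implementation. It assumes that a topic model has already mapped each dataset's metadata to a topic distribution, and it uses cosine similarity as one plausible comparison measure. The dataset names, the topic vectors, and the functions `cosine_similarity` and `rank_similar` are hypothetical and chosen only for this example.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        # A zero vector carries no topical information.
        return 0.0
    return dot / (norm_p * norm_q)


def rank_similar(query_topics, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_topics, topics))
        for name, topics in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic distributions over three topics (assumed to come
# from a previously trained topic model; values are made up).
candidates = {
    "geo_dataset": [0.7, 0.2, 0.1],
    "bio_dataset": [0.1, 0.1, 0.8],
    "gov_dataset": [0.3, 0.6, 0.1],
}
query = [0.6, 0.3, 0.1]

ranking = rank_similar(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up vectors, the dataset whose topic distribution is closest to the query is listed first. Other measures for comparing probability distributions could be substituted for the cosine without changing the overall structure.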

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Michael Röder (1)
  • Axel-Cyrille Ngonga Ngomo (1)
  • Ivan Ermilov (1)
  • Andreas Both (2)
  1. AKSW, Leipzig University, Leipzig, Germany
  2. Mercateo AG, Leipzig, Germany
