Beyond Established Knowledge Graphs-Recommending Web Datasets for Data Linking

  • Mohamed Ben Ellefi
  • Zohra Bellahsene
  • Stefan Dietze
  • Konstantin Todorov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9671)


With the explosive growth of the Web of Data in terms of size and complexity, identifying suitable datasets to be linked has become a challenging problem for data publishers. To understand the nature of the content of specific datasets, we adopt the notion of dataset profiles, where datasets are characterized through a set of topic annotations. In this paper, we adopt a collaborative filtering-like recommendation approach, which exploits both existing dataset profiles and traditional dataset connectivity measures in order to link arbitrary, non-profiled datasets into a global dataset-topic-graph. Our experiments, applied to all available Linked Datasets in the Linked Open Data (LOD) cloud, show an average recall of up to 81%, which translates to an average reduction of the size of the original candidate dataset search space to up to 86%. An additional contribution of this work is the provision of benchmarks for dataset interlinking recommendation systems.
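The abstract reports two evaluation figures, recall and search space reduction, without giving their exact definitions. The sketch below illustrates one common way such figures can be computed for a ranked list of recommended target datasets that is cut off after the top k entries. The function names, the cut-off k, and the toy dataset names are assumptions made for illustration only; they are not taken from the paper and may differ from the authors' evaluation protocol.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the truly linked datasets that appear in the top-k recommendations."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant must contain at least one dataset")
    hits = set(ranked[:k]) & relevant
    return len(hits) / len(relevant)


def search_space_reduction(total_candidates, k):
    """Share of candidate datasets that can be skipped when only the top-k are inspected."""
    if total_candidates <= 0:
        raise ValueError("total_candidates must be positive")
    kept = min(k, total_candidates)
    return (total_candidates - kept) / total_candidates


# Hypothetical ranking produced by a recommender for one source dataset.
ranked = ["dbpedia", "geonames", "yago", "dblp"]
# Hypothetical ground truth: datasets the source dataset is actually linked to.
relevant = {"dbpedia", "yago"}

print(recall_at_k(ranked, relevant, k=2))
print(search_space_reduction(len(ranked), k=2))
```

A smaller k prunes more of the candidate space but risks missing relevant datasets, so the two quantities are usually reported together.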


Keywords (machine-generated, not supplied by the authors): Latent Dirichlet Allocation, Rank Score, Link Open Data, Target Dataset, Source Dataset



This research has been partially funded under the Datalyse project and by the European Commission as part of the DURAARK project, FP7 Grant Agreement No. 600908.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mohamed Ben Ellefi (1)
  • Zohra Bellahsene (1)
  • Stefan Dietze (2)
  • Konstantin Todorov (1)

  1. LIRMM, University of Montpellier, Montpellier, France
  2. L3S Research Center, Leibniz University Hannover, Hannover, Germany
