Dataset Recommendation for Data Linking: An Intensional Approach

  • Mohamed Ben Ellefi
  • Zohra Bellahsene
  • Stefan Dietze
  • Konstantin Todorov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


With the growing quantity and diversity of publicly available web datasets, most notably Linked Open Data, recommending datasets that meet specific criteria has become an increasingly important yet challenging problem. This task is of particular interest for entity retrieval, semantic search and data linking. Here, we focus on the last of these. We introduce a dataset recommendation approach that identifies linking candidates based on schema overlap between datasets. Because understanding the nature of a dataset's content is a crucial prerequisite, we adopt the notion of dataset profiles, where a dataset is characterized by a set of schema concept labels that best describe it and that can be enriched by retrieving their textual descriptions. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the Linked Open Data cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. As an additional contribution, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step.
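The tf*idf cosine ranking mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the profile contents, the function names, and the smoothed idf formula are assumptions made for illustration, and concept labels are matched by exact string equality rather than by the semantico-frequential similarity measure the abstract describes.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Map each dataset name to a {label: tf*idf weight} dictionary."""
    n = len(profiles)
    # Document frequency: in how many profiles each label occurs.
    df = Counter()
    for labels in profiles.values():
        df.update(set(labels))
    vectors = {}
    for name, labels in profiles.items():
        tf = Counter(labels)
        # Smoothed idf (an assumption, not taken from the paper):
        # log((1 + n) / (1 + df)) + 1 is always positive.
        vectors[name] = {
            label: (count / len(labels)) * (math.log((1 + n) / (1 + df[label])) + 1)
            for label, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(source, profiles):
    """Rank all other datasets by cosine similarity to the source profile."""
    vectors = tfidf_vectors(profiles)
    scores = [
        (name, cosine(vectors[source], vec))
        for name, vec in vectors.items()
        if name != source
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical profiles: each dataset is a bag of schema concept labels.
profiles = {
    "source": ["person", "organization", "publication"],
    "candidate_a": ["person", "publication", "conference"],
    "candidate_b": ["gene", "protein", "disease"],
    "candidate_c": ["person", "place", "event"],
}
ranking = recommend("source", profiles)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting, a candidate sharing two labels with the source ranks above one sharing a single label, and a candidate with no shared labels scores zero. A semantic label-matching step, as the abstract describes, would additionally let related but non-identical labels contribute to the overlap.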


Keywords (machine-generated, not supplied by the authors): Recommender System, Schema Concept, Cosine Similarity, Link Open Data, Recommendation Process



This research has been partially funded under the Datalyse project (FSN-AAP Big Data n3), by the European Commission-funded DURAARK project (FP7 Grant Agreement No. 600908), and by the COST Action IC1302 (KEYSTONE).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mohamed Ben Ellefi (1)
  • Zohra Bellahsene (1)
  • Stefan Dietze (2)
  • Konstantin Todorov (1)

  1. LIRMM, University of Montpellier, Montpellier, France
  2. L3S Research Center, Leibniz University Hannover, Hannover, Germany
