Empirical Analysis of Ranking Models for an Adaptable Dataset Search

  • Angelo B. Neves
  • Rodrigo G. G. de Oliveira
  • Luiz André P. Paes Leme
  • Giseli Rabello Lopes
  • Bernardo P. Nunes
  • Marco A. Casanova
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)


Currently available datasets still have a large unexplored potential for interlinking. Ranking techniques contribute to this task by scoring datasets according to the likelihood of finding entities related to those of a target dataset. Ranked datasets can either be manually selected for standalone link discovery tasks or automatically inspected by programs that go through the ranking looking for entity links. This work presents empirical comparisons between different ranking models and argues that different algorithms could be used depending on whether the ranking is handled manually or automatically and on the available metadata of the datasets. Experiments indicate that the ranking algorithms that perform best with respect to nDCG do not always have the best Recall at Position k at high recall levels. To reach 90% recall, the best ranking model for the manual use case (with respect to nDCG) may need about 13 percentage points more of the ranking: almost 47% of the datasets, instead of the top 34% slice that suffices for the best model for the automatic use case (with respect to recall@k).
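The contrast between the two measures can be illustrated with a minimal sketch. It assumes binary relevance and the common log2(rank + 1) discount for DCG; the exact formulation used in the paper is not given on this page, and the two toy rankings below are hypothetical, not data from the experiments.

```python
import math

def dcg(gains):
    """Discounted cumulative gain; gains are listed in ranked order.
    Assumes the log2(rank + 1) discount, with ranks starting at 1."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def recall_at_k(relevant_flags, k):
    """Share of all relevant datasets found in the top k positions."""
    total = sum(relevant_flags)
    return sum(relevant_flags[:k]) / total if total else 0.0

def fraction_needed(relevant_flags, target):
    """Smallest fraction of the ranking that reaches the target recall."""
    n = len(relevant_flags)
    for k in range(1, n + 1):
        if recall_at_k(relevant_flags, k) >= target:
            return k / n
    return 1.0

# Hypothetical rankings of 10 datasets (1 = relevant, 0 = not relevant).
ranking_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # strong top, one relevant item last
ranking_b = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # weaker top, compact relevant block

for name, r in (("A", ranking_a), ("B", ranking_b)):
    print(name, round(ndcg(r), 3), fraction_needed(r, 0.9))
```

In this toy case ranking A has the higher nDCG, which favors a user who inspects only the top positions, while ranking B reaches 90% recall after half of the list, which favors a program that scans the ranking until most links are found.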


Linked Data, Entity linking, Recommendation, Dataset, Ranking, Empirical evaluation



This work has been funded by FAPERJ/BR under grants E-26/010.000794/2016, E-26/201.000337/2014 and CNPq under grant 303332/2013-1.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Fluminense Federal University, Niterói, Brazil
  2. Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  3. PUC-Rio, Rio de Janeiro, Brazil
  4. Federal University of the State of Rio de Janeiro, Rio de Janeiro, Brazil
