Automatic Key Selection for Data Linking

  • Manel Achichi
  • Mohamed Ben Ellefi
  • Danai Symeonidou
  • Konstantin Todorov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10024)


The paper proposes an RDF key ranking approach that aims to close the gap between automatic key discovery and data linking, and thus to reduce the user effort required to configure linking tools. Configuring a data linking tool is laborious: the user is often required to manually select the properties to compare, which presupposes in-depth expert knowledge of the data. Key discovery techniques attempt to facilitate this task, but in a number of cases do not fully succeed, because they produce a large number of keys without any confidence indicator. Moreover, since keys are extracted from each dataset independently, their effectiveness for a matching task, which involves two datasets, is undermined. The approach proposed in this work aims to unlock the potential of both key discovery techniques and data linking tools by providing the user with a limited number of merged and ranked keys that are well suited to a particular matching task. In addition, the complementarity of a small number of top-ranked keys is explored, showing that their combined use significantly improves recall. We report our experiments on data from the Ontology Alignment Evaluation Initiative, as well as on real-world benchmark data about music.
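The idea of merging keys discovered on two datasets, ranking them, and combining the top-ranked ones can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy music data, the merge by pairwise union, the coverage-based score, and the helper names (merge_keys, coverage, rank, link, recall) are hypothetical and are not the ranking function or the implementation used in the paper.

```python
from itertools import product

# Toy datasets (invented): resource id -> {property: value}
source = {
    "s1": {"title": "Boléro", "composer": "Ravel", "year": "1928"},
    "s2": {"title": "La Mer", "composer": "Debussy"},  # no year
    "s3": {"title": "Nocturnes", "composer": "Debussy", "year": "1899"},
}
target = {
    "t1": {"title": "Bolero", "composer": "Ravel", "year": "1928"},  # spelling differs
    "t2": {"title": "La Mer", "composer": "Debussy", "year": "1905"},
    "t3": {"title": "Nocturnes", "composer": "Debussy", "year": "1899"},
}
reference = {("s1", "t1"), ("s2", "t2"), ("s3", "t3")}

# Keys assumed to have been discovered on each dataset independently
source_keys = [frozenset({"title"}), frozenset({"composer", "year"})]
target_keys = [frozenset({"title"}), frozenset({"composer", "year"}), frozenset({"year"})]


def merge_keys(keys_a, keys_b):
    """Union every pair of keys; a superset of a key is still a key in that dataset."""
    return {a | b for a, b in product(keys_a, keys_b)}


def coverage(key, dataset):
    """Fraction of resources that carry every property of the key."""
    return sum(key.issubset(props) for props in dataset.values()) / len(dataset)


def rank(keys, src, tgt):
    """Order keys by an assumed score (product of coverages), then by size and name."""
    return sorted(
        keys,
        key=lambda k: (-coverage(k, src) * coverage(k, tgt), len(k), sorted(k)),
    )


def link(key, src, tgt):
    """Link two resources when they agree on all properties of the key."""
    return {
        (s, t)
        for s, sp in src.items()
        for t, tp in tgt.items()
        if all(p in sp and p in tp and sp[p] == tp[p] for p in key)
    }


def recall(links, ref):
    """Share of reference links that were found."""
    return len(links & ref) / len(ref)


ranked = rank(merge_keys(source_keys, target_keys), source, target)
top1_links = link(ranked[0], source, target)
top2_links = top1_links | link(ranked[1], source, target)

print([sorted(k) for k in ranked])
print(recall(top1_links, reference), recall(top2_links, reference))
```

In this toy setting the top-ranked key misses the pair whose titles are spelled differently, while the second key misses the resource that has no year. Taking the union of the links produced by both keys recovers all reference links, which is the kind of complementarity the abstract refers to.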


Ranking Function, Benchmark Dataset, Link Specification, Ranking Approach, Reference Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work has been partially supported by the French National Research Agency (ANR) within the DOREMUS Project, under grant number ANR-14-CE24-0020.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Manel Achichi (1)
  • Mohamed Ben Ellefi (1)
  • Danai Symeonidou (2)
  • Konstantin Todorov (1)
  1. LIRMM/University of Montpellier, Montpellier, France
  2. INRA, MISTEA Joint Research Unit, UMR729, Montpellier, France
