Network Metrics for Assessing the Quality of Entity Resolution Between Multiple Datasets

  • Al Koudous Idrissou
  • Frank van Harmelen
  • Peter van den Besselaar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11313)

Abstract

Matching entities between datasets is a crucial step for combining multiple datasets on the semantic web. A rich literature exists on different approaches to this entity resolution problem. However, much less work has been done on how to assess the quality of such entity links once they have been generated. Evaluation methods for link quality are typically limited to either comparison with a ground truth dataset (which is often not available), manual work (which is cumbersome and prone to error), or crowd sourcing (which is not always feasible, especially if expert knowledge is required). Furthermore, the problem of link evaluation is greatly exacerbated for links between more than two datasets, because the number of possible links grows rapidly with the number of datasets. In this paper, we propose a method to estimate the quality of entity links between multiple datasets. We exploit the fact that the links between entities from multiple datasets form a network, and we show how simple metrics on this network can reliably predict their quality. We verify our results in a large experimental study using six datasets from the domain of science, technology and innovation studies, for which we created a gold standard. This gold standard, available online, is an additional contribution of this paper. In addition, we evaluate our metric on a recently published gold standard to confirm our findings.
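The idea that entity links form a network can be sketched as follows. This is a minimal illustration, not the method of the paper: the abstract does not name the metrics used, so link density is an assumption chosen here as one example of a "simple metric", and the function names `link_clusters` and `density` and the entity identifiers are hypothetical.

```python
from collections import defaultdict


def link_clusters(links):
    """Group linked entities into connected components (clusters).

    `links` is an iterable of (entity_a, entity_b) pairs; the pairs are
    treated as undirected links. Returns the clusters and the adjacency map.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        if a == b:
            continue  # ignore self-links in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting every entity reachable from `start`.
        stack, nodes = [start], set()
        while stack:
            node = stack.pop()
            if node in nodes:
                continue
            nodes.add(node)
            stack.extend(adjacency[node] - nodes)
        seen |= nodes
        clusters.append(nodes)
    return clusters, adjacency


def density(nodes, adjacency):
    """Fraction of possible undirected links that are present in a cluster."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each link is counted once from each endpoint, hence the division by 2.
    edges = sum(len(adjacency[v] & nodes) for v in nodes) // 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical links between entities from three datasets.
    links = [
        ("ds1:a", "ds2:a"), ("ds2:a", "ds3:a"), ("ds1:a", "ds3:a"),  # fully linked
        ("ds1:b", "ds2:b"), ("ds2:b", "ds3:b"),                      # chain only
    ]
    clusters, adjacency = link_clusters(links)
    for cluster in clusters:
        print(sorted(cluster), round(density(cluster, adjacency), 3))
```

In this toy input the fully linked cluster has every possible link confirmed, while the chain is missing one; whether such a difference actually predicts link quality is the question the paper studies empirically.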

Keywords

Entity resolution, Data integration, Network metrics

Notes

Acknowledgement

We kindly thank Paul Groth for his constructive comments and proofreading, Alieh Saeedi for sharing her experimental data and supporting the reproducibility of their experiments, and the EKAW reviewers for constructive comments. This work was supported by the European Union’s 7th Framework Programme under the project RISIS (GA no. 313082).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Al Koudous Idrissou (1, 2), email author
  • Frank van Harmelen (1)
  • Peter van den Besselaar (2)
  1. Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
  2. Department of Organization Sciences, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
