Blocking for Entity Resolution in the Web of Data: Challenges and Algorithms

Conference paper
Part of the Springer Proceedings in Business and Economics book series (SPBE)

Abstract

In the Web of data, entities are described by interlinked data rather than by documents on the Web. In this talk, we focus on entity resolution in the Web of data, i.e., on the problem of identifying descriptions that refer to the same real-world entity, either within a single knowledge base (KB) or across several KBs. To reduce the number of pairwise comparisons among descriptions, entity resolution methods typically perform a preprocessing step, called blocking, which places similar entity descriptions into blocks and compares only descriptions that share a block. The objective of this talk is to present challenges and algorithms for blocking for entity resolution. These challenges stem from the openness of the Web, where an unbounded number of KBs describe a multitude of entity types across domains, and from the high semantic and structural heterogeneity of descriptions, even for entities of the same type.
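
The blocking step described above can be illustrated with a minimal sketch. It uses token blocking, one simple schema-agnostic strategy, and is not necessarily the algorithm presented in the talk. The function names (`token_blocking`, `candidate_pairs`), the identifiers, and the sample descriptions are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocking(descriptions):
    """Build one block per token: token -> ids of descriptions containing it.

    Attribute names are ignored, so descriptions with different
    schemas can still end up in the same block.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single description yields no comparison; drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs of descriptions that share a block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical descriptions from two KBs with different attribute names.
descriptions = {
    "kb1:e1": {"name": "Stanley Kubrick", "born": "New York"},
    "kb2:e7": {"label": "Kubrick, Stanley", "occupation": "director"},
    "kb2:e9": {"label": "Tampere", "country": "Finland"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
print(pairs)
```

In this toy input only the two Kubrick descriptions share tokens, so one comparison is suggested instead of the three that an exhaustive pairwise check would need. Because the sketch never consults attribute names, it tolerates the structural heterogeneity mentioned in the abstract, at the cost of producing redundant blocks on realistic data.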

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Tampere, Tampere, Finland
