Object Matching: New Challenges for Record Linkage

Scannapieco, Monica

doi:10.1007/978-3-319-07121-3_6

Part of the book series: Synthese Library (SYLI, volume 358)

Abstract

Record Linkage is the problem of identifying pairs of records that come from different sources and represent the same real-world object. Available record linkage techniques give a satisfactory answer when the data are “traditional” records, that is, well-structured information with clearly identified metadata describing the values. When this condition does not hold, record linkage is more properly called Object Matching. In this paper, we focus on objects that have “some degree of structure”, which is the case for most of the data available on the Web. We describe the challenges of Object Matching for objects of this kind, and we provide several examples of techniques that address some of these challenges.
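To make the traditional setting concrete, here is a minimal sketch of pairwise record linkage over two small, well-structured sources. It is an illustration under stated assumptions, not the method described in the chapter: the records, the field names, the `link_records` helper, the averaging of per-field similarities, and the 0.85 threshold are all hypothetical, and the string comparison uses `difflib.SequenceMatcher` from the Python standard library rather than any comparison function named by the author.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(source_a, source_b, fields, threshold=0.85):
    """Compare every cross-source pair and keep those whose mean
    per-field similarity reaches the (hypothetical) threshold."""
    matches = []
    for rec_a, rec_b in product(source_a, source_b):
        score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((rec_a["id"], rec_b["id"], score))
    return matches


# Hypothetical records with clearly identified fields ("traditional" records).
source_a = [
    {"id": "a1", "name": "Maria Rossi", "city": "Roma"},
    {"id": "a2", "name": "John Smith", "city": "London"},
]
source_b = [
    {"id": "b1", "name": "Maria Rosi", "city": "Roma"},
    {"id": "b2", "name": "Jane Doe", "city": "Oxford"},
]

matches = link_records(source_a, source_b, fields=["name", "city"])
for id_a, id_b, score in matches:
    print(id_a, id_b, f"{score:.3f}")
```

With these sample records, only the pair `a1`/`b1` clears the threshold, despite the misspelled surname. The sketch depends on both sources sharing the same named fields; when objects have only some degree of structure, that assumption fails, and this is the situation the abstract calls Object Matching.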

Notes

1. A uniform resource identifier (URI) is a string of characters used to identify a web resource.

Author information

Authors and Affiliations

Information and Communication Technology Directorate, ISTAT, Italian National Institute of Statistics, Via Balbo 16, 00185, Roma, Italia
Monica Scannapieco

Corresponding author

Correspondence to Monica Scannapieco.

Editor information

Editors and Affiliations

Oxford Internet Institute, University of Oxford, Oxford, Oxfordshire, UK
Luciano Floridi
Department of Science and Technology Studies, University College London, London, UK
Phyllis Illari

About this chapter

Cite this chapter

Scannapieco, M. (2014). Object Matching: New Challenges for Record Linkage. In: Floridi, L., Illari, P. (eds) The Philosophy of Information Quality. Synthese Library, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-07121-3_6

DOI: https://doi.org/10.1007/978-3-319-07121-3_6
Published: 15 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07120-6
Online ISBN: 978-3-319-07121-3
eBook Packages: Humanities, Social Sciences and Law; Philosophy and Religion (R0)
