Abstract
Record Linkage is the problem of identifying pairs of records that come from different sources and represent the same real-world object. Available techniques for record linkage give satisfactory results when data are “traditional” records, that is, well-structured information with clearly identified metadata describing the values. When this condition does not hold, record linkage is more properly called Object Matching. In this paper, we focus on objects that have “some degree of structure”, which is the case for most of the data available on the Web. We describe the challenges of Object Matching when objects are understood in this sense, and we provide several examples of techniques that address some of these challenges.
Notes
1. A uniform resource identifier (URI) is a string of characters used to identify a name of a web resource.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Scannapieco, M. (2014). Object Matching: New Challenges for Record Linkage. In: Floridi, L., Illari, P. (eds) The Philosophy of Information Quality. Synthese Library, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-07121-3_6
DOI: https://doi.org/10.1007/978-3-319-07121-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07120-6
Online ISBN: 978-3-319-07121-3
eBook Packages: Humanities, Social Sciences and Law; Philosophy and Religion (R0)