
Object Matching: New Challenges for Record Linkage

Chapter in The Philosophy of Information Quality

Part of the book series: Synthese Library (SYLI, volume 358)

Abstract

Record linkage is the problem of identifying pairs of records that come from different sources and represent the same real-world object. Available record linkage techniques provide a satisfactory answer when data are “traditional” records, that is, well-structured information with clearly identified metadata describing the values. When this condition does not hold, record linkage is more properly called object matching. In this chapter, we focus on objects that have “some degree of structure”, which is the case for most of the data available on the Web. We describe the challenges of object matching for objects of this kind, and we provide several examples of techniques that make it possible to address some of these challenges.
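To make the “traditional” case concrete, here is a minimal sketch of record linkage between two sources whose records share clearly identified attributes. It is not the technique described in the chapter. It assumes hypothetical records, hypothetical field names (`name`, `city`), Python's `difflib.SequenceMatcher` as the string similarity, a plain average across fields, and an arbitrary match threshold of 0.85. Every cross-source pair is compared exhaustively.

```python
from difflib import SequenceMatcher
from itertools import product


def field_similarity(a, b):
    """String similarity in [0, 1] after lower-casing and collapsing whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def record_similarity(r1, r2, fields):
    """Average similarity over attributes that both sources name identically."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def link_records(source_a, source_b, fields, threshold=0.85):
    """Return (index_a, index_b, score) for each cross-source pair judged a match.

    The threshold is an illustrative assumption, not a recommended value.
    """
    matches = []
    for (i, r1), (j, r2) in product(enumerate(source_a), enumerate(source_b)):
        score = record_similarity(r1, r2, fields)
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical records from two sources that share the same schema.
source_a = [
    {"name": "Anna Rossi", "city": "Rome"},
    {"name": "Paolo Bianchi", "city": "Milan"},
]
source_b = [
    {"name": "anna  rossi", "city": "Rome"},
    {"name": "Luca Verdi", "city": "Turin"},
]

print(link_records(source_a, source_b, ["name", "city"]))
```

The sketch only works because both sources expose the same named fields, so each value can be compared with its counterpart. That is exactly the condition that no longer holds for the semi-structured objects discussed above. Comparing all pairs also costs time quadratic in the source sizes, which is rarely acceptable at Web scale.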


Notes

  1. A uniform resource identifier (URI) is a string of characters used to identify a web resource.


Author information

Correspondence to Monica Scannapieco.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Scannapieco, M. (2014). Object Matching: New Challenges for Record Linkage. In: Floridi, L., Illari, P. (eds) The Philosophy of Information Quality. Synthese Library, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-07121-3_6
