A Declarative Approach to Entity Resolution

Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 132)

Abstract

As companies gather and process more data from disparate sources, they rely more heavily on entity resolution. Building an entity resolution system today is a largely procedural task: blocking, transitive closure, and matching must all be pieced together, whether with an Extract, Transform, and Load (ETL) tool or with a custom program (Galhardas et al. 2000). This resembles the state of data querying before the advent of the Structured Query Language (SQL). This chapter presents a declarative approach to entity resolution that lets the user specify what should be resolved while a code generator determines the best way to resolve it. The chapter does not explore algorithms for blocking, transitive closure, clustering, or matching; for those it refers to papers by other authors (Baxter et al. 2003; Gu and Baxter 2004; Winkler 2000, 2003; Jaro 1989; Bhattacharya and Getoor 2006). Instead, it gives background on, and a defense of, entity resolution and declarative languages, followed by a declarative solution and a possible representation.
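The contrast between a declarative specification and the procedural steps behind it can be illustrated with a minimal Python sketch. It is not taken from the chapter: the `SPEC` format, its keys `block_on` and `match_any`, the `resolve` function, and the sample records are all hypothetical, and the matching rule (exact agreement on any listed field) is a deliberately simple stand-in for the matching algorithms the chapter cites.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical declarative spec: states WHAT to resolve, not HOW.
SPEC = {
    "block_on": "zip",               # only records sharing this field are compared
    "match_any": ["email", "phone"], # two records match if any of these fields agree
}

# Illustrative records (invented for this sketch).
RECORDS = [
    {"name": "Ann Lee", "zip": "72201", "email": "ann@example.org",   "phone": "555-0100"},
    {"name": "A. Lee",  "zip": "72201", "email": "ann@example.org",   "phone": "555-0199"},
    {"name": "Ann L.",  "zip": "72201", "email": "a.lee@example.org", "phone": "555-0199"},
    {"name": "Bob Ray", "zip": "72201", "email": "bob@example.org",   "phone": "555-0111"},
    {"name": "Ann Lee", "zip": "10001", "email": "ann@example.org",   "phone": "555-0100"},
]

def resolve(records, spec):
    """Expand the spec into blocking, matching, and transitive closure."""
    # Blocking: group record indices by the blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[spec["block_on"]]].append(idx)

    # Matching rule derived from the spec.
    def is_match(a, b):
        return any(a[f] == b[f] for f in spec["match_any"])

    # Transitive closure via union-find over matched pairs.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if is_match(records[i], records[j]):
                parent[find(i)] = find(j)

    # Collect clusters of record indices.
    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())

print(resolve(RECORDS, SPEC))
```

In this sketch, records 0 and 2 share neither an email nor a phone number, yet they land in the same cluster because each matches record 1; that is the transitive closure step. Record 4 agrees with record 0 on both fields but sits in a different block, so the two are never compared, which shows the kind of trade-off a code generator would have to weigh when choosing a blocking strategy.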

References

  • Baxter R, Christen P, Churches T (2003) A Comparison of Fast Blocking Methods for Record Linkage. Proceedings of the ACM SIGKDD’03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation.

  • Benjelloun O, et al. (2006a) Swoosh: A Generic Approach to Entity Resolution. Stanford University Technical Report.

  • Benjelloun O, et al. (2006b) DSwoosh: A Family of Algorithms for Generic, Distributed Entity Resolution. Stanford University Technical Report.

  • Bhattacharya I, Getoor L (2005) Entity Resolution in Graphs. University of Maryland Technical Report CS-TR-4758.

  • Bhattacharya I, Getoor L (2006) Collective Entity Resolution in Relational Data. IEEE Data Engineering Bulletin, Special Issue on Data Quality, June 2006.

  • Cohen W, Richman J (2002) Learning to Match and Cluster Large High-Dimensional Data Sets for Data Integration. Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

  • Fellegi I, Sunter A (1969) A Theory for Record Linkage. Journal of the American Statistical Association.

  • Galhardas H et al. (2000) An Extensible Framework for Data Cleansing. International Conference on Data Engineering.

  • Gu L, Baxter R (2004) Adaptive Filtering for Efficient Record Linkage. Proceedings of the Fourth SIAM International Conference on Data Mining.

  • Hernandez M, Stolfo S (1995) The Merge/Purge Problem for Large Databases. Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data.

  • Jaro M (1989) Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association.

  • Jin L, Li C, Mehrotra S (2003) Efficient Record Linkage in Large Data Sets. Proceedings of the 8th International Conference on Database Systems for Advanced Applications.

  • McCallum A, Nigam K, Ungar L (2000) Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. Knowledge Discovery and Data Mining.

  • Navarro G (2001) A Guided Tour to Approximate String Matching. ACM Computing Surveys.

  • Singla P, Domingos P (2006) Entity Resolution with Markov Logic. Proceedings of the Sixth International Conference on Data Mining.

  • Spring (2008) retrieved 2008 from http://www.springframework.org.

  • Winkler W (1988) Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage. Proceedings of the Section on Survey Research Methods, American Statistical Association.

  • Winkler W (1990) String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. Proceedings of the Section on Survey Research Methods, American Statistical Association.

  • Winkler W (2000) Machine Learning, Information Retrieval, and Record Linkage. Proceedings of the Section on Survey Research Methods, American Statistical Association.

  • Winkler W (2003) Data Cleaning Methods. Proceedings of the ACM Workshop on Data Cleaning, Record Linkage, and Object Identification.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Gibbs, T.H. (2009). A Declarative Approach to Entity Resolution. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_2

  • DOI: https://doi.org/10.1007/978-1-4419-0176-7_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-0175-0

  • Online ISBN: 978-1-4419-0176-7

  • eBook Packages: Computer Science (R0)
