A Method for Match Key Blocking in Probabilistic Matching

(Research-in-Progress)

  • Conference paper
Information Technology: New Generations

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 448))

Abstract

The pair-wise nature of Entity Resolution makes it impractical to perform on large datasets without blocking. Many blocking techniques have been researched and applied to reduce the number of pair-wise comparisons in Boolean rule-based systems while still providing 100% match recall. However, these approaches do not always work when applied to probabilistic matching. This paper discusses an approach to blocking for probabilistic scoring rules through the use of match key indexing.
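
As a rough illustration of the general idea of match key blocking (not the method developed in the paper, whose details are not given in this abstract), the sketch below builds an inverted index from match keys to record identifiers and compares only records that share at least one key. The records, the field names, and the two key generators in `match_keys` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; field names are assumptions for this sketch.
records = {
    "r1": {"first": "John", "last": "Smith", "zip": "72204"},
    "r2": {"first": "Jon", "last": "Smith", "zip": "72204"},
    "r3": {"first": "Mary", "last": "Jones", "zip": "10001"},
    "r4": {"first": "John", "last": "Smyth", "zip": "72204"},
}

def match_keys(rec):
    """Return the set of match keys for one record.

    Both key definitions are hypothetical: last name + ZIP, and
    first initial + ZIP. A real system would derive its keys from
    its own matching rules.
    """
    return {
        "LZ:" + rec["last"].upper() + "|" + rec["zip"],
        "FZ:" + rec["first"][:1].upper() + "|" + rec["zip"],
    }

def build_index(recs):
    """Build an inverted index: match key -> set of record ids."""
    index = defaultdict(set)
    for rid, rec in recs.items():
        for key in match_keys(rec):
            index[key].add(rid)
    return index

def candidate_pairs(index):
    """Collect each pair of records that shares at least one match key."""
    pairs = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs

index = build_index(records)
pairs = candidate_pairs(index)
total_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {total_pairs}")
```

With these four sample records, only the three pairs among r1, r2, and r4 are passed on for scoring, rather than all six possible pairs. Whether such keys preserve recall under a probabilistic scoring rule depends on how the keys relate to the scoring weights, which is the question the paper addresses.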

Author information

Corresponding author

Correspondence to Pei Wang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, P., Pullen, D., Talburt, J.R., Chen, C. (2016). A Method for Match Key Blocking in Probabilistic Matching. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_73

  • DOI: https://doi.org/10.1007/978-3-319-32467-8_73

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32466-1

  • Online ISBN: 978-3-319-32467-8

  • eBook Packages: Engineering, Engineering (R0)
