Abstract
The pair-wise nature of Entity Resolution makes it impractical to perform on large datasets without blocking. Many blocking techniques have been researched and applied to reduce the number of pair-wise comparisons in Boolean rule-based systems while still providing 100% match recall. However, these approaches do not always carry over to probabilistic matching. This paper discusses an approach to blocking for probabilistic scoring rules through the use of match key indexing.
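The following is a minimal sketch of generic match key blocking in front of a probabilistic scorer. It is not the specific method proposed in this paper, which the abstract does not detail. The records, the two match keys (`last+year`, `first2+last`), and the per-field weights are all hypothetical. Each record is indexed under every match key value it produces, only records that share a key value become candidate pairs, and only those pairs are scored with a Fellegi-Sunter-style sum of agreement and disagreement weights.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (record_id, first_name, last_name, birth_year).
RECORDS = [
    (1, "Mary", "Smith", 1990),
    (2, "Marie", "Smith", 1990),
    (3, "John", "Doe", 1985),
    (4, "Jon", "Doe", 1985),
    (5, "Ann", "Lee", 1970),
    (6, "Mary", "Smith", 1991),
]

# Hypothetical per-field (agreement, disagreement) weights, in the spirit of
# Fellegi-Sunter weights; real weights would be estimated from data.
WEIGHTS = {"first": (2.0, -1.0), "last": (3.0, -2.0), "year": (2.5, -1.5)}


def match_keys(rec):
    """Hypothetical match keys: each key is a coarse projection of a record.
    A record is indexed under every key it produces."""
    _, first, last, year = rec
    return [
        ("last+year", last.upper(), year),
        ("first2+last", first[:2].upper(), last.upper()),
    ]


def build_index(records):
    """Inverted index from match key value to the set of record ids sharing it."""
    index = defaultdict(set)
    for rec in records:
        for key in match_keys(rec):
            index[key].add(rec[0])
    return index


def candidate_pairs(index):
    """Union of all pairs that share at least one match key value (a block)."""
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def score(a, b):
    """Sum of agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, x, y in (("first", a[1].upper(), b[1].upper()),
                        ("last", a[2].upper(), b[2].upper()),
                        ("year", a[3], b[3])):
        agree_w, disagree_w = WEIGHTS[field]
        total += agree_w if x == y else disagree_w
    return total


if __name__ == "__main__":
    by_id = {rec[0]: rec for rec in RECORDS}
    pairs = candidate_pairs(build_index(RECORDS))
    n = len(RECORDS)
    print(f"exhaustive pairs: {n * (n - 1) // 2}, candidate pairs: {len(pairs)}")
    for a, b in sorted(pairs):
        print((a, b), score(by_id[a], by_id[b]))
```

On this toy data, blocking reduces the 15 exhaustive comparisons to 4 candidate pairs. Pair (1, 6) is reached only through the `first2+last` key, which shows why several keys are usually indexed. Conversely, any pair that shares no match key value is never scored at all, so the choice of keys bounds the recall that the probabilistic scorer can achieve.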
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, P., Pullen, D., Talburt, J.R., Chen, C. (2016). A Method for Match Key Blocking in Probabilistic Matching. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_73
DOI: https://doi.org/10.1007/978-3-319-32467-8_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32466-1
Online ISBN: 978-3-319-32467-8
eBook Packages: Engineering (R0)