Using Entropy Measures for Evaluating the Quality of Entity Resolution

  • Conference paper
Advances in Data Science and Information Engineering

Abstract

This research describes results from an unsupervised entity resolution (ER) process that uses cluster entropy to self-regulate linking. The purpose is to allow ER processes to regulate their own linking decisions based on cluster entropy. The experiments were performed on synthetic person references of varying quality. The process obtained a linking accuracy of 93% for samples with moderate to high data quality. Results for low-quality references were much lower: the best overall result obtained from the low-quality sample was just over 50% linking accuracy. The results are therefore very promising for entity references of relatively high quality, while applying the process to low-quality data needs further improvement, and there are many possible avenues of research that could improve it.
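The abstract does not say how cluster entropy is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each reference is a tuple of attribute values, scores a cluster by the mean Shannon entropy of its attribute columns, and accepts a cluster only when that score stays below a threshold. The names `cluster_entropy` and `accept_cluster` and the `max_entropy` value are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over the distinct values
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def cluster_entropy(cluster):
    """Hypothetical cluster score: mean entropy over the attribute columns.

    `cluster` is a list of equal-length tuples, one tuple per reference.
    A score of 0.0 means every reference agrees on every attribute.
    """
    if not cluster:
        return 0.0
    columns = list(zip(*cluster))
    return sum(shannon_entropy(col) for col in columns) / len(columns)


def accept_cluster(cluster, max_entropy=0.5):
    """Assumed self-regulation rule: keep a cluster only if it is orderly enough."""
    return cluster_entropy(cluster) <= max_entropy


# Illustrative data only; these references are made up for the example.
consistent = [("mary", "smith"), ("mary", "smith"), ("mary", "smith")]
mixed = [("mary", "smith"), ("maria", "smith"), ("john", "doe")]

print(cluster_entropy(consistent), accept_cluster(consistent))
print(cluster_entropy(mixed), accept_cluster(mixed))
```

Under these assumptions a cluster whose references agree on every attribute scores zero, and disagreement raises the score, which is the kind of signal an ER process could use to decide whether a proposed link has made a cluster too disorderly.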

Author information

Corresponding author

Correspondence to Awaad Al Sarkhi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Al Sarkhi, A., Talburt, J.R. (2021). Using Entropy Measures for Evaluating the Quality of Entity Resolution. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, CY., Arabnia, H.R., Deligiannidis, L. (eds) Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71704-9_69
