Abstract
This research describes results from an unsupervised entity resolution (ER) process that uses cluster entropy to self-regulate linking. The experiments were performed on synthetic person references of varying quality. The process achieved a linking accuracy of 93% for samples with moderate to high data quality. Results for low-quality references were much lower, and applying the process to low-quality data needs further improvement, although there are many possible avenues of research that could improve these results. The best overall result obtained from the sample was just over 50% linking accuracy.
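The abstract does not give the entropy formula or the linking rule, so the following is only a minimal sketch of how an entropy-based self-regulation check might look. It assumes that cluster entropy is the Shannon entropy of the token distribution over a cluster's references, and that a merge is accepted only when it raises entropy by no more than a threshold. The function names, the whitespace tokenization, and the `max_entropy_increase` value are all hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def cluster_entropy(references):
    """Shannon entropy (in bits) of the token distribution in a cluster.

    Assumption: tokens are lowercase whitespace-separated words; the
    chapter may define cluster entropy differently.
    """
    tokens = [tok for ref in references for tok in ref.lower().split()]
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def should_link(cluster_a, cluster_b, max_entropy_increase=0.5):
    """Accept a merge only if it does not raise entropy too much.

    Hypothetical rule: compare the merged cluster's entropy with the
    larger of the two input entropies. The 0.5-bit threshold is an
    illustrative value, not one reported in the source.
    """
    before = max(cluster_entropy(cluster_a), cluster_entropy(cluster_b))
    after = cluster_entropy(cluster_a + cluster_b)
    return after - before <= max_entropy_increase


if __name__ == "__main__":
    same = should_link(["mary smith"], ["mary smith"])
    different = should_link(["mary smith"], ["john doe oak street"])
    print(same, different)
```

Under these assumptions, merging two references that share all their tokens leaves entropy unchanged, while merging references with disjoint tokens increases it. That is the intuition behind using entropy as a signal that a cluster has become too heterogeneous.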
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Al Sarkhi, A., Talburt, J.R. (2021). Using Entropy Measures for Evaluating the Quality of Entity Resolution. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, CY., Arabnia, H.R., Deligiannidis, L. (eds) Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71704-9_69
DOI: https://doi.org/10.1007/978-3-030-71704-9_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71703-2
Online ISBN: 978-3-030-71704-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)