Private Record Linkage: Comparison of Selected Techniques for Name Matching

  • Pawel Grzebala
  • Michelle Cheatham
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)

Abstract

The rise of Big Data Analytics has shown the utility of analyzing all aspects of a problem by bringing together disparate data sets. Achieving this requires efficient and accurate record linkage. However, records are often linked on personally identifiable information, so protecting the privacy of individuals is critical, and the linkage itself must be private. This paper contributes to this field by studying an important component of the private record linkage problem: linking based on names while keeping those names encrypted, both on disk and in memory. We explore the applicability, accuracy and speed of three primary approaches to this problem (along with several variations) and compare the results to common name-matching metrics on unprotected data. While these approaches are not new, this paper provides a thorough analysis of them on a range of datasets containing systematically introduced flaws common to name-based data entry, such as typographical errors, optical character recognition errors, and phonetic errors.
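The abstract does not name the three approaches, so the following is only a minimal sketch of one widely used family of techniques for matching protected names, not necessarily the method evaluated in the paper. It assumes that names are lowercased and padded with spaces, split into character bigrams, and that each bigram is replaced by an HMAC-SHA256 digest under a secret key shared by the data owners (`KEY` is hypothetical). Because a keyed hash is deterministic, the Dice coefficient computed on the hashed bigram sets equals the one computed on the plaintext bigrams, so approximate matching survives the protection step. Note that this is a one-way keyed hash rather than reversible encryption.

```python
import hashlib
import hmac


def bigrams(name: str) -> set[str]:
    """Return the set of character bigrams of a lowercased, space-padded name."""
    s = f" {name.strip().lower()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def protect(name: str, key: bytes) -> set[str]:
    """Replace each bigram by its HMAC-SHA256 hex digest (assumed protection scheme)."""
    return {
        hmac.new(key, g.encode("utf-8"), hashlib.sha256).hexdigest()
        for g in bigrams(name)
    }


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical secret agreed on by the two data owners; never sent to the matcher.
KEY = b"example-shared-key"

# A single dropped letter (a typical typographical error) still scores highly,
# and the score is unchanged when only the hashed bigrams are compared.
plain_score = dice(bigrams("Michelle"), bigrams("Michele"))
protected_score = dice(protect("Michelle", KEY), protect("Michele", KEY))
print(f"{plain_score:.3f} {protected_score:.3f}")  # prints "0.941 0.941"
```

The design trades some privacy for simplicity: because the same bigram always maps to the same digest, an adversary who sees many protected names can attempt frequency analysis, so schemes of this kind protect the literal strings rather than every statistical property of the data.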


Acknowledgments

This work was partially supported by the LexisNexis corporation.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. DaSe Lab, Wright State University, Dayton, USA
