Fake Injection Strategies for Private Phonetic Matching

  • Alexandros Karakasidis
  • Vassilios S. Verykios
  • Peter Christen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7122)


In many aspects of everyday life, from education to health care and from economics to homeland security, information exchange involving companies or government agencies has become common. Locating the same real-world entities within this information, however, is far from trivial because of insufficient identifying information, misspellings, and similar errors. The problem becomes even more complicated when privacy considerations arise. This introduction describes an informal approach to the privacy preserving record linkage problem. In this paper we provide a solution to this problem by examining the alternatives offered by phonetic codes, a family of algorithms which, despite their age, are still used for record linkage. The main contribution of our work, as our extensive experimental evaluation indicates, is a methodology that offers privacy guarantees for Privacy Preserving Record Linkage without the need for computationally expensive cryptographic methods.
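
To illustrate the kind of phonetic code the abstract refers to, the following is a minimal sketch of the classic Soundex encoding, one of the best-known algorithms in this family. It is not the authors' methodology and does not show the fake injection step; the function name `soundex` and the sample names are assumptions chosen for illustration only.

```python
# Minimal sketch of classic (American) Soundex, for illustration only.
# Letter-to-digit groups of the standard Soundex scheme.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of `name` ("" if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        digit = CODES.get(ch, "")      # vowels and Y map to ""
        if digit and digit != prev:
            code += digit
        prev = digit                   # a vowel resets the previous digit
    return (code + "000")[:4]          # pad with zeros, truncate to 4


if __name__ == "__main__":
    # Spelling variants that sound alike receive the same code,
    # so they can be matched without comparing the raw strings.
    for a, b in (("Smith", "Smyth"), ("Robert", "Rupert")):
        print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

Because variants such as "Smith" and "Smyth" collapse to one code, two parties could compare codes rather than names, which is the property that makes phonetic encoding attractive for approximate matching in the presence of misspellings.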


Keywords: Privacy, Record Linkage, Approximate Matching, Phonetic Encoding, Relative Information Gain





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexandros Karakasidis (1)
  • Vassilios S. Verykios (2)
  • Peter Christen (3)
  1. Department of Computer and Communication Engineering, University of Thessaly, Volos, Greece
  2. School of Science and Technology, Hellenic Open University, Patras, Greece
  3. ANU College of Engineering and Computer Science, The Australian National University, Canberra, Australia
