The Effects of Location Access Behavior on Re-identification Risk in a Distributed Environment

  • Bradley Malin
  • Edoardo Airoldi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4258)


In this paper, we investigate how location access patterns influence the re-identification of seemingly anonymous data. In the real world, individuals visit different locations that gather similar information. For instance, multiple hospitals collect health information on the same patient. To protect anonymity, hospitals share sensitive data for research purposes, such as DNA sequences, stripped of explicit identifiers. Separately, for administrative functions, identified data, stripped of DNA, is made available. On a hospital-by-hospital basis, each pair of DNA and identified databases appears unlinkable; however, links can be established when multiple locations’ databases are studied together. This problem, known as trail re-identification, is a general phenomenon and occurs because an individual’s location access pattern can be matched across the shared databases.
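The matching idea described above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a simplified setting in which every location releases both a de-identified DNA set and an identified set, and it links only trails that match exactly and are unique on both sides. The hospital names, record labels, and function names are hypothetical.

```python
from collections import defaultdict

def build_trails(releases, locations):
    """Map each record to a tuple of 0/1 flags, one flag per location."""
    records = set().union(*releases.values())
    return {
        rec: tuple(int(rec in releases[loc]) for loc in locations)
        for rec in records
    }

def group_by_trail(trails):
    """Invert a record -> trail mapping into trail -> list of records."""
    groups = defaultdict(list)
    for rec, trail in trails.items():
        groups[trail].append(rec)
    return groups

def link_unique_trails(dna_releases, id_releases):
    """Link a DNA record to an identity when both have the same trail
    and that trail is unique in each collection (simplifying assumption)."""
    locations = sorted(dna_releases)
    dna_groups = group_by_trail(build_trails(dna_releases, locations))
    id_groups = group_by_trail(build_trails(id_releases, locations))
    links = {}
    for trail, dna_records in dna_groups.items():
        names = id_groups.get(trail, [])
        if len(dna_records) == 1 and len(names) == 1:
            links[dna_records[0]] = names[0]
    return links

# Hypothetical releases from three hospitals (illustrative data only).
dna_releases = {
    "H1": {"d1", "d2", "d3"},
    "H2": {"d1", "d4", "d5"},
    "H3": {"d3"},
}
id_releases = {
    "H1": {"Alice", "Bob", "Carol"},
    "H2": {"Alice", "Dan", "Eve"},
    "H3": {"Carol"},
}

links = link_unique_trails(dna_releases, id_releases)
```

Within H1 alone, three DNA records and three names offer no way to pair them. Across all three hospitals, d1, d2, and d3 each have a trail shared by exactly one name, so they are linked, while d4 and d5 share the same trail and remain ambiguous.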

Data holders cannot exchange data to find and suppress trails that would be re-identified. Thus, it is important to assess the re-identification risk in a system in order to develop techniques to mitigate it. In this research, we evaluate several real-world datasets and observe that trail re-identification is related to the number of people relative to places. To study this phenomenon in more detail, we develop a generative model for location access patterns that simulates observed behavior. We evaluate trail re-identification risk over a range of simulated patterns, and our findings suggest that the skew of the distribution of people to places is one of the main factors driving trail re-identification.
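To make the role of skew concrete, the following is a minimal simulation sketch. It is not the generative model developed in the paper: it assumes each person visits each place independently, with a visit probability that decays as a power of the place's popularity rank. The parameter names (`skew`, `base`) and the use of the fraction of unique, non-empty trails as a rough risk proxy are assumptions made for illustration.

```python
import random
from collections import Counter

def place_probabilities(n_places, skew, base=0.5):
    """Visit probability per place; skew=0 gives a uniform pattern,
    larger skew concentrates visits on the top-ranked places."""
    return [base / (rank ** skew) for rank in range(1, n_places + 1)]

def simulate_trails(n_people, n_places, skew, seed=0):
    """Draw one 0/1 trail per person under the assumed independence model."""
    rng = random.Random(seed)
    probs = place_probabilities(n_places, skew)
    return [
        tuple(int(rng.random() < p) for p in probs)
        for _ in range(n_people)
    ]

def unique_trail_fraction(trails):
    """Fraction of people whose non-empty trail is shared by nobody else.
    A unique trail is a necessary condition for an exact-match link."""
    counts = Counter(trails)
    unique = sum(1 for t in trails if counts[t] == 1 and any(t))
    return unique / len(trails)

# Compare a uniform pattern with increasingly skewed ones.
for skew in (0.0, 1.0, 2.0):
    trails = simulate_trails(n_people=500, n_places=10, skew=skew, seed=1)
    print(f"skew={skew}: unique fraction = {unique_trail_fraction(trails):.3f}")
```

Varying `skew` while holding the numbers of people and places fixed gives a quick way to explore how the shape of the access distribution changes the share of distinguishable trails under these assumptions.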


Sensitive Data, Record Linkage, Reserved System, Location Access, Biomedical Informatics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bradley Malin (1)
  • Edoardo Airoldi (2)
  1. Department of Biomedical Informatics, Vanderbilt University, Nashville, USA
  2. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
