A Privacy-Preserving Framework for Integrating Person-Specific Databases

  • Murat Kantarcioglu
  • Wei Jiang
  • Bradley Malin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5262)


Many organizations capture personal information, but the quantity of records needed to detect statistically significant patterns is often beyond the grasp of a single data collector. In the biomedical realm, this problem has pressed regulatory agencies to require funded investigators to share research-derived data in public repositories. The challenge, however, is that shared records must not reveal the identities of the subjects. In this paper, we extend a secure framework in which data holders contribute and query encrypted person-specific data stored on a third party’s server. Specifically, we develop protocols that enable data holders to merge personal records, thus creating larger profiles and reducing duplication. The repository administrator can merge records via encrypted identifiers without decrypting or inferring the contents of the joined records. Our model is more practical than prior secure join methods because each data holder needs only a single interaction with the central repository. We further present an extension to the protocol that permits the revelation of k-anonymous demographics, so that the administrator can perform joins more efficiently while guaranteeing that each record can be linked to no fewer than k individuals in the population. We prove the privacy-preserving features of our protocols and experimentally evaluate their efficiency on a real-world Census dataset.
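The merge step described above can be illustrated with a small sketch. It assumes, hypothetically, that the data holders share a secret key and derive join tokens with HMAC-SHA256. This is a swapped-in stand-in, not the encryption scheme used in the paper. Under that assumption the repository sees only tokens, generalized demographics, and opaque payloads. The k-anonymity check and demographic blocking suggest why revealing k-anonymous demographics can reduce the number of comparisons the administrator performs. All names (`HOLDER_KEY`, `token`, `is_k_anonymous`, `blocked_join`) and the sample records are illustrative assumptions.

```python
import hashlib
import hmac
from collections import Counter, defaultdict

# Hypothetical secret known to every data holder but never sent to the repository.
HOLDER_KEY = b"hypothetical-shared-holder-key"


def token(identifier):
    """Deterministic keyed pseudonym (HMAC-SHA256) of a normalized identifier.

    Equal identifiers give equal tokens, so the repository can match records
    without seeing the identifier itself.
    """
    data = identifier.strip().lower().encode("utf-8")
    return hmac.new(HOLDER_KEY, data, hashlib.sha256).hexdigest()


def is_k_anonymous(blocks, k):
    """True if every generalized demographic value is shared by at least k records."""
    return all(count >= k for count in Counter(blocks).values())


def blocked_join(left, right):
    """Join two lists of (block, token, payload) triples on token.

    Only records in the same demographic block are compared. Returns the
    matched payload pairs and the number of comparisons performed.
    """
    by_block = defaultdict(list)
    for block, tok, payload in right:
        by_block[block].append((tok, payload))

    matches, comparisons = [], 0
    for block, tok, payload in left:
        for other_tok, other_payload in by_block.get(block, []):
            comparisons += 1  # stands in for a costly comparison of encrypted identifiers
            if hmac.compare_digest(tok, other_tok):
                matches.append((payload, other_payload))
    return matches, comparisons


# Payloads would be encrypted in practice; plain labels keep the sketch readable.
holder_a = [
    ("30-39|372**", token("id-1"), "a1"),
    ("30-39|372**", token("id-2"), "a2"),
    ("40-49|372**", token("id-3"), "a3"),
    ("40-49|372**", token("id-4"), "a4"),
]
holder_b = [
    ("30-39|372**", token("ID-1 "), "b1"),
    ("30-39|372**", token("id-9"), "b9"),
    ("40-49|372**", token("id-4"), "b4"),
    ("40-49|372**", token("id-8"), "b8"),
]

print(is_k_anonymous([rec[0] for rec in holder_a], 2))  # → True
matches, comparisons = blocked_join(holder_a, holder_b)
print(matches, comparisons)  # → [('a1', 'b1'), ('a4', 'b4')] 8
print(len(holder_a) * len(holder_b))  # → 16 comparisons without blocking
```

In this toy run, blocking halves the work (8 comparisons instead of 16). This only works if the same person carries the same generalized demographics at every data holder. A mismatch would silently drop a true match. A deterministic token also necessarily reveals to the repository which records share an identifier. That is the information the join requires, and the sketch relies on the key never reaching the repository.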







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Murat Kantarcioglu, Department of Computer Science, University of Texas at Dallas
  • Wei Jiang, Department of Computer Science, Purdue University
  • Bradley Malin, Department of Biomedical Informatics, Vanderbilt University
