Threats and Solutions for Genomic Data Privacy

  • Chapter
Medical Data Privacy Handbook

Abstract

Rapid advances in technology are making DNA sequencing less expensive. As a consequence, genomics research is accelerating and paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to sustain this pace. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also led to the launch of health-related websites and online social networks (OSNs) in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries a great deal of sensitive information about its owner. By analyzing an individual’s DNA, it is now possible to learn about their disease predispositions (e.g., for Alzheimer’s or Parkinson’s), ancestry, and physical attributes. The threat to genomic privacy is magnified by the fact that a person’s genome is correlated with those of their family members, which leads to interdependent privacy risks. In this chapter, focusing on our existing and ongoing work on genomic privacy carried out at EPFL/LCA1, we first highlight the threats to genomic privacy. We then present high-level descriptions of our solutions for protecting the privacy of genomic data and discuss future research directions. For a description of the contributions of other research groups, the reader is referred to Chaps. 16 and 17 of the present volume.

Notes

  1. http://www.nytimes.com/2013/03/24/opinion/sunday/the-immortal-life-of-henrietta-lacks-the-sequel.html?pagewanted=all

  2. A SNP (single nucleotide polymorphism) occurs when a nucleotide (at a specific position on the DNA) varies between individuals of a given population. SNPs carry privacy-sensitive information about individuals’ health. Recent discoveries show that the susceptibility of an individual to several diseases can be computed from his or her SNPs.

  3. LD (linkage disequilibrium) can be thought of as a correlation between two variables.

  4. The exact sequence of the family members (whose SNPs are revealed) is indicated for each evaluation.

  5. Alignment is with respect to the reference genome, which is assembled by scientists.

  6. The position of a short read gives the position of its first nucleotide on the DNA sequence. The CIGAR string of a short read denotes the deletions and insertions in the short read. The content of a short read comprises its nucleotides.

  7. In this study, we focused only on diseases that can be analyzed using SNPs. We acknowledge that other diseases depend on other forms of mutations or on environmental factors.

  8. It is public knowledge that a real SNP includes at least one minor allele, and the curious party uses this background information in the attack.

  9. Depending on the privacy-sensitivity of the clinical and environmental data, the patient can choose which clinical and environmental attributes to reveal to the MU, and which ones to encrypt and keep at the SPU.

  10. Our solution may also be used for GWAS, but it scales better for replication/fine-mapping association studies, which are based on a priori knowledge generated with GWAS.

  11. A patient can choose a low-entropy password that is easier for him/her to remember, which is common in the real world [12].

  12. https://www.counsyl.com/.

  13. https://www.23andme.com/.

  14. http://opensnp.wordpress.com/2011/11/17/first-results-of-the-survey-on-sharing-genetic-information/.

  15. Later, researchers used correlations in the genome to unveil Watson’s predisposition to Alzheimer’s [35]. In this work, we also consider such correlations.
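
As a minimal illustration of note 6, the sketch below represents a short read by its position, CIGAR string, and content, and derives the reference positions it covers. It assumes the CIGAR operation codes of the SAM format (M, I, D, N, S, H, P, =, X); the `ShortRead` class and the helper function names are hypothetical and are not taken from the chapter.

```python
import re
from dataclasses import dataclass

# Assumption: CIGAR operation codes as defined by the SAM format.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


@dataclass
class ShortRead:
    """Hypothetical container for the three fields described in note 6."""
    position: int  # position of the read's first aligned nucleotide on the reference
    cigar: str     # e.g. "3M1I2M2D3M"
    content: str   # the nucleotides of the read


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_TOKEN.findall(cigar)]


def reference_span(read):
    """Return the (first, last) reference positions covered by the read."""
    # M, D, N, = and X advance along the reference; I, S, H and P do not.
    ref_len = sum(n for n, op in parse_cigar(read.cigar) if op in "MDN=X")
    return read.position, read.position + ref_len - 1


def indel_counts(read):
    """Return (inserted, deleted) nucleotide counts relative to the reference."""
    ops = parse_cigar(read.cigar)
    inserted = sum(n for n, op in ops if op == "I")
    deleted = sum(n for n, op in ops if op == "D")
    return inserted, deleted


if __name__ == "__main__":
    read = ShortRead(position=100, cigar="3M1I2M2D3M", content="ACGTACGTA")
    print(reference_span(read), indel_counts(read))
```

For example, a read at position 100 with CIGAR string 3M1I2M2D3M covers reference positions 100 to 109 and contains one inserted and two deleted nucleotides.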

References

  1. Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 563–574 (2004)

  2. Ateniese, G., Fu, K., Green, M., Hohenberger, S.: Improved proxy re-encryption schemes with applications to secure distributed storage. ACM Trans. Inf. Syst. Secur. 9, 1–30 (2006)

  3. Ayday, E., De Cristofaro, E., Tsudik, G., Hubaux, J.-P.: Whole genome sequencing: revolutionary medicine or privacy nightmare. IEEE Computer 48(2), 58–66 (2015)

  4. Ayday, E., Raisaro, J.L., Hengartner, U., Molyneaux, A., Hubaux, J.-P.: Privacy-preserving processing of raw genomic data. In: Proceedings of the 8th International Workshop on Data Privacy Management (DPM). Egham, UK (2013)

  5. Ayday, E., Raisaro, J.L., McLaren, P.J., Fellay, J., Hubaux, J.-P.: Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. In: Proceedings of USENIX Security Workshop on Health Information Technologies (HealthTech) (2013)

  6. Ayday, E., Raisaro, J.L., Rougemont, J., Hubaux, J.-P.: Protecting and evaluating genomic privacy in medical tests and personalized medicine. In: ACM Workshop on Privacy in the Electronic Society (WPES). Berlin, Germany (2013)

  7. Bresson, E., Catalano, D., Pointcheval, D.: A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. In: Proceedings of Asiacrypt (2003)

  8. Caulfield, T., Cook-Deegan, R.M., Kieff, F.S., Walsh, J.P.: Evidence and anecdotes: an analysis of human gene patenting controversies. Nat. Biotechnol. 24(9), 1091–1094 (2006)

  9. Clayton, D.: On inferring presence of an individual in a mixture: a Bayesian approach. Biostatistics 11(4), 661–673 (2010)

  10. Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., et al.: Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961), 78–81 (2010)

  11. Erlich, Y., Narayanan, A.: Routes for breaching and protecting genetic privacy. Nat. Rev. Genet. 15(6), 409–421 (2014)

  12. Florencio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pp. 657–666. ACM, New York (2007). doi:10.1145/1242572.1242661. http://doi.acm.org/10.1145/1242572.1242661

  13. Francke, U., Dijamco, C., Kiefer, A.K., Eriksson, N., Moiseff, B., Tung, J.Y., Mountain, J.L.: Dealing with the unexpected: consumer responses to direct-access BRCA mutation testing. PeerJ 1 (2013)

  14. Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., Ristenpart, T.: Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In: Proceedings of the 23rd USENIX Security Symposium (2014)

  15. Fréville, A.: The multidimensional 0–1 knapsack problem: an overview. Eur. J. Oper. Res. 155(1), 1–21 (2004)

  16. Gitschier, J.: Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project. Am. J. Hum. Genet. 84(2), 251–258 (2009)

  17. Google Genomics. https://cloud.google.com/genomics/ (2015)

  18. Gymrek, M., McGuire, A.L., Golan, D., Halperin, E., Erlich, Y.: Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013)

  19. Hawkins, N.: The impact of human gene patents on genetic testing in the UK. J. Gene Med. 13(4), 320–324 (2011)

  20. Hayden, E.C.: Privacy protections: the genome hacker. Nature 497, 172–174 (2013)

  21. Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4 (2008)

  22. Huang, Z., Ayday, E., Hubaux, J.-P., Fellay, J., Juels, A.: GenoGuard: protecting genomic data against brute-force attacks. In: Proceedings of IEEE Symposium on Security and Privacy (2015)

  23. Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In: Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS) (2013)

  24. Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Reconciling utility with privacy in genomics. In: Proceedings of ACM Workshop on Privacy in the Electronic Society (WPES) (2014)

  25. Im, H.K., Gamazon, E.R., Nicolae, D.L., Cox, N.J.: On sharing quantitative trait GWAS results in an era of multiple-omics data and the limits of genomic privacy. Am. J. Hum. Genet. 90(4), 591–598 (2012)

  26. Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1079–1087 (2013)

  27. Juels, A., Ristenpart, T.: Honey encryption: security beyond the brute-force bound. In: Advances in Cryptology–EUROCRYPT, pp. 293–310 (2014)

  28. Kamm, L., Bogdanov, D., Laur, S., Vilo, J.: A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29(7), 886–893 (2013)

  29. Kantarcioglu, M., Jiang, W., Liu, Y., Malin, B.: A cryptographic approach to securely share and query genomic sequences. IEEE Trans. Inf. Technol. Biomed. 12(5), 606–617 (2008). doi:10.1109/TITB.2007.908465

  30. Kschischang, F., Frey, B., Loeliger, H.A.: Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001)

  31. Lin, Z., Owen, A.B., Altman, R.B.: Genomic research and human subject privacy. Science 305(5681), 183 (2004)

  32. Loukides, G., Gkoulalas-Divanis, A., Malin, B.: Anonymization of electronic medical records for validating genome-wide association studies. PNAS 107(17), 7898–7903 (2010)

  33. Malin, B.A., Sweeney, L.: How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems. J. Biomed. Inform. 37(3), 179–192 (2004)

  34. National Human Genome Research Institute: Intellectual Property and Genomics. http://www.genome.gov/19016590 (2015)

  35. Nyholt, D., Yu, C., Visscher, P.: On Jim Watson’s APOE status: genetic information is hard to hide. Eur. J. Hum. Genet. 17, 147–149 (2009)

  36. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo (1988)

  37. Popa, R.A., Li, F.H., Zeldovich, N.: An ideal-security protocol for order-preserving encoding. In: Proceedings of the 2013 IEEE Symposium on Security and Privacy (2013)

  38. Raisaro, J.L., Ayday, E., McLaren, P., Telenti, A., Hubaux, J.-P.: On a novel privacy-preserving framework for both personalized medicine and genetic association studies. In: Privacy-Aware Computational Genomics (PRIVAGEN) (2015)

  39. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)

  40. Shih, W.: A branch and bound method for the multiconstraint zero-one knapsack problem. J. Oper. Res. Soc. 30, 369–378 (1979)

  41. Stajano, F., Bianchi, L., Liò, P., Korff, D.: Forensic genomics: kin privacy, driftnets and other open questions. In: Proceedings of the 7th ACM Workshop on Privacy in the Electronic Society (2008)

  42. Sweeney, L., Abu, A., Winn, J.: Identifying Participants in the Personal Genome Project by Name. Harvard University, Cambridge (2013)

  43. Wang, R., Li, Y.F., Wang, X., Tang, H., Zhou, X.: Learning your identity and disease from research papers: information leaks in genome wide association study. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 534–544 (2009)

  44. Yu, F., Fienberg, S.E., Slavkovic, A.B., Uhler, C.: Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50, 133–141 (2014)

  45. Zhou, X., Peng, B., Li, Y.F., Chen, Y., Tang, H., Wang, X.: To release or not to release: evaluating information leaks in aggregate human-genome data. In: Proceedings of the 16th European Conference on Research in Computer Security (ESORICS’11), pp. 607–627 (2011)

Acknowledgements

The authors would like to express their gratitude to Mathias Humbert, Jean Louis Raisaro, Zhicong Huang, Emiliano De Cristofaro, Gene Tsudik, Jacques Fellay, Amalio Telenti, and Paul McLaren.

Author information

Corresponding author

Correspondence to Erman Ayday.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ayday, E., Hubaux, JP. (2015). Threats and Solutions for Genomic Data Privacy. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_18

  • DOI: https://doi.org/10.1007/978-3-319-23633-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23632-2

  • Online ISBN: 978-3-319-23633-9

  • eBook Packages: Computer Science (R0)
