Abstract
Rapidly developing technology is making DNA sequencing less expensive. As a consequence, genomics research has accelerated, paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to sustain this pace. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also led to the launch of health-related websites and online social networks (OSNs) in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries a great deal of sensitive information about its owner. By analyzing the DNA of an individual, it is now possible to learn about his or her disease predispositions (e.g., for Alzheimer’s or Parkinson’s), ancestry, and physical attributes. The threat to genomic privacy is magnified by the fact that a person’s genome is correlated with the genomes of his or her family members, which leads to interdependent privacy risks. In this chapter, focusing on our existing and ongoing work on genomic privacy carried out at EPFL/LCA1, we first highlight the threats to genomic privacy. We then present high-level descriptions of our solutions for protecting the privacy of genomic data and discuss future research directions. For a description of the contributions of other research groups, the reader is referred to Chaps. 16 and 17 of the present volume.
Notes
- 1.
- 2.
A single nucleotide polymorphism (SNP) occurs when a nucleotide (at a specific position on the DNA) varies between individuals of a given population. SNPs carry privacy-sensitive information about individuals’ health. Recent discoveries show that an individual’s susceptibility to several diseases can be computed from his or her SNPs.
- 3.
Linkage disequilibrium (LD) can be thought of as a correlation between two variables.
- 4.
The exact sequence of the family members (whose SNPs are revealed) is indicated for each evaluation.
- 5.
Alignment is with respect to the reference genome, which is assembled by scientists.
- 6.
The position of a short read gives the position of its first nucleotide on the DNA sequence. The CIGAR string of a short read denotes the deletions and insertions in the short read. The content of a short read consists of its nucleotides.
- 7.
In this study, we focus only on diseases that can be analyzed using SNPs. We acknowledge that other diseases depend on other forms of mutation or on environmental factors.
- 8.
It is public knowledge that a real SNP includes at least one minor allele, and the curious party uses this background information in the attack.
- 9.
Depending on the privacy-sensitivity of the clinical and environmental data, the patient can choose which clinical and environmental attributes to reveal to the MU, and which ones to encrypt and keep at the SPU.
- 10.
Our solution may also be used for GWAS, but it scales better for replication/fine-mapping association studies, which are based on the a priori knowledge generated with GWAS.
- 11.
A patient can choose a low-entropy password that is easier for him/her to remember, which is a common case in the real world [12].
- 12.
- 13.
- 14.
- 15.
Later, researchers used correlations in the genome to unveil Watson’s predisposition to Alzheimer’s [35]. In this work, we also consider such correlations.
References
Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 563–574 (2004)
Ateniese, G., Fu, K., Green, M., Hohenberger, S.: Improved proxy re-encryption schemes with applications to secure distributed storage. ACM Trans. Inf. Syst. Secur. 9, 1–30 (2006)
Ayday, E., Cristofaro, E.D., Tsudik, G., Hubaux, J.-P.: Whole genome sequencing: revolutionary medicine or privacy nightmare? IEEE Computer 48(2), 58–66 (2015)
Ayday, E., Raisaro, J.L., Hengartner, U., Molyneaux, A., Hubaux, J.-P.: Privacy-preserving processing of raw genomic data. In: Proceedings of the 8th International Workshop on Data Privacy Management (DPM). Egham, UK (2013)
Ayday, E., Raisaro, J.L., McLaren, P.J., Fellay, J., Hubaux, J.-P.: Privacy-preserving computation of disease risk by using genomic, clinical, and environmental data. In: Proceedings of USENIX Security Workshop on Health Information Technologies (HealthTech) (2013)
Ayday, E., Raisaro, J.L., Rougemont, J., Hubaux, J.-P.: Protecting and evaluating genomic privacy in medical tests and personalized medicine. In: ACM Workshop on Privacy in the Electronic Society (WPES). Berlin, Germany (2013)
Bresson, E., Catalano, D., Pointcheval, D.: A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. In: Proceedings of Asiacrypt (2003)
Caulfield, T., Cook-Deegan, R.M., Kieff, F.S., Walsh, J.P.: Evidence and anecdotes: an analysis of human gene patenting controversies. Nat. Biotechnol. 24(9), 1091–1094 (2006)
Clayton, D.: On inferring presence of an individual in a mixture: a Bayesian approach. Biostatistics 11(4), 661–673 (2010)
Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., et al.: Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961), 78–81 (2010)
Erlich, Y., Narayanan, A.: Routes for breaching and protecting genetic privacy. Nat. Rev. Genet. 15(6), 409–421 (2014)
Florencio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pp. 657–666. ACM, New York (2007). doi:10.1145/1242572.1242661. url:http://doi.acm.org/10.1145/1242572.1242661
Francke, U., Dijamco, C., Kiefer, A.K., Eriksson, N., Moiseff, B., Tung, J.Y., Mountain, J.L.: Dealing with the unexpected: consumer responses to direct-access BRCA mutation testing. PeerJ 1 (2013)
Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., Ristenpart, T.: Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In: Proceedings of the 23rd USENIX Security Symposium (2014)
Fréville, A.: The multidimensional 0–1 knapsack problem: an overview. Eur. J. Oper. Res. 155(1), 1–21 (2004)
Gitschier, J.: Inferential genotyping of Y chromosomes in Latter-Day Saints founders and comparison to Utah samples in the HapMap project. Am. J. Hum. Genet. 84(2), 251–258 (2009)
Google Genomics: (2015) https://cloud.google.com/genomics/
Gymrek, M., McGuire, A.L., Golan, D., Halperin, E., Erlich, Y.: Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013)
Hawkins, N.: The impact of human gene patents on genetic testing in the UK. J. Gene Med. 13(4), 320–324 (2011)
Hayden, E.C.: Privacy protections: the genome hacker. Nature 497, 172–174 (2013)
Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W.: Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4 (2008)
Huang, Z., Ayday, E., Hubaux, J.-P., Fellay, J., Juels, A.: GenoGuard: protecting genomic data against brute-force attacks. In: Proceedings of IEEE Symposium on Security and Privacy (2015)
Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In: Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS) (2013)
Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Reconciling utility with privacy in genomics. In: Proceedings of ACM Workshop on Privacy in the Electronic Society (WPES) (2014)
Im, H.K., Gamazon, E.R., Nicolae, D.L., Cox, N.J.: On sharing quantitative trait GWAS results in an era of multiple-omics data and the limits of genomic privacy. Am. J. Hum. Genet. 90(4), 591–598 (2012)
Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1079–1087 (2013)
Juels, A., Ristenpart, T.: Honey encryption: security beyond the brute-force bound. In: Advances in Cryptology–EUROCRYPT, pp. 293–310 (2014)
Kamm, L., Bogdanov, D., Laur, S., Vilo, J.: A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29(7), 886–893 (2013)
Kantarcioglu, M., Jiang, W., Liu, Y., Malin, B.: A cryptographic approach to securely share and query genomic sequences. IEEE Trans. Inf. Technol. Biomed. 12(5), 606–617 (2008). doi: 10.1109/TITB.2007.908465
Kschischang, F., Frey, B., Loeliger, H.A.: Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001)
Lin, Z., Owen, A.B., Altman, R.B.: Genomic research and human subject privacy. Science 305(5681), 183 (2004)
Loukides, G., Gkoulalas-Divanis, A., Malin, B.: Anonymization of electronic medical records for validating genome-wide association studies. PNAS 107(17), 7898–7903 (2010)
Malin, B.A., Sweeney, L.: How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems. J. Biomed. Inform. 37(3), 179–192 (2004)
National Human Genome Research Institute: Intellectual Property and Genomics. (2015) http://www.genome.gov/19016590
Nyholt, D., Yu, C., Visscher, P.: On Jim Watson’s APOE status: genetic information is hard to hide. Eur. J. Hum. Genet. 17, 147–149 (2009)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo (1988)
Popa, R.A., Li, F.H., Zeldovich, N.: An ideal-security protocol for order-preserving encoding. In: Proceedings of the 2013 IEEE Symposium on Security and Privacy (2013)
Raisaro, J.L., Ayday, E., McLaren, P., Telenti, A., Hubaux, J.-P.: On a novel privacy-preserving framework for both personalized medicine and genetic association studies. In: Privacy-Aware Computational Genomics (PRIVAGEN) (2015)
Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979)
Shih, W.: A branch and bound method for the multiconstraint zero-one knapsack problem. J. Oper. Res. Soc. 30, 369–378 (1979)
Stajano, F., Bianchi, L., Liò, P., Korff, D.: Forensic genomics: kin privacy, driftnets and other open questions. In: Proceedings of the 7th ACM Workshop on Privacy in the Electronic Society (2008)
Sweeney, L., Abu, A., Winn, J.: Identifying Participants in the Personal Genome Project by Name. Harvard University, Cambridge (2013)
Wang, R., Li, Y.F., Wang, X., Tang, H., Zhou, X.: Learning your identity and disease from research papers: information leaks in genome wide association study. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 534–544 (2009)
Yu, F., Fienberg, S.E., Slavkovic, A.B., Uhler, C.: Scalable privacy-preserving data sharing methodology for genome-wide association studies. J. Biomed. Inform. 50, 133–141 (2014)
Zhou, X., Peng, B., Li, Y.F., Chen, Y., Tang, H., Wang, X.: To release or not to release: evaluating information leaks in aggregate human-genome data. In: Proceedings of the 16th European Conference on Research in Computer Security (ESORICS’11), pp. 607–627 (2011)
Acknowledgements
The authors would like to express their gratitude to Mathias Humbert, Jean Louis Raisaro, Zhicong Huang, Emiliano De Cristofaro, Gene Tsudik, Jacques Fellay, Amalio Telenti, and Paul McLaren.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ayday, E., Hubaux, JP. (2015). Threats and Solutions for Genomic Data Privacy. In: Gkoulalas-Divanis, A., Loukides, G. (eds) Medical Data Privacy Handbook. Springer, Cham. https://doi.org/10.1007/978-3-319-23633-9_18
DOI: https://doi.org/10.1007/978-3-319-23633-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23632-2
Online ISBN: 978-3-319-23633-9
eBook Packages: Computer Science (R0)