Private Computation on Encrypted Genomic Data

  • Conference paper
Progress in Cryptology - LATINCRYPT 2014 (LATINCRYPT 2014)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8895)

Abstract

A number of databases around the world currently host a wealth of genomic data that is invaluable to researchers conducting a variety of genomic studies. However, patients who volunteer their genomic data run the risk of privacy invasion. In this work, we give a cryptographic solution to this problem: to maintain patient privacy, we propose encrypting all genomic data in the database. To allow meaningful computation on the encrypted data, we propose using a homomorphic encryption scheme.

Specifically, we take basic genomic algorithms that are commonly used in genetic association studies and show how they can be made to work on encrypted genotype and phenotype data. In particular, we consider the Pearson Goodness-of-Fit test, the \(D'\) and \(r^2\) measures of linkage disequilibrium, the Expectation-Maximization (EM) algorithm for haplotyping, and the Cochran-Armitage Test for Trend. We also provide performance numbers for running these algorithms on encrypted data.

Adriana López-Alt—Research conducted while visiting Microsoft Research.

Notes

  1. The running time is linear in the population size for a fixed parameter set. For larger population sizes, parameters need to be increased and performance degrades, but not by a large factor (see Table 1 for a comparison of the running times for two typical parameter sets).

  2. 1 degree of freedom = 3 genotypes \(-\) 2 alleles.

  3. Common choices for the set of weights \({{\varvec{w}}} = (w_0, w_1, w_2)\) are: \({{\varvec{w}}} = (0,1,2)\) for the additive (co-dominant) model, \({{\varvec{w}}} = (0,1,1)\) for the dominant model (\(A\) is dominant over \(a\)), and \({{\varvec{w}}} = (0,0,1)\) for the recessive model (\(A\) is recessive to allele \(a\)).

  4. For a bi-allelic gene with alleles \(A\) and \(a\), the value 0 corresponds to the genotype \(AA\), the value 1 corresponds to the genotype \(Aa\), and the value 2 corresponds to the genotype \(aa\).

  5. An arithmetic circuit over \(\mathbb {F}_t\) has addition and multiplication gates modulo \(t\).

  6. The algorithms in Sect. 2 include divisions. In Sect. 4, we show how to get around this issue.

  7. The only modification we make to the scheme of López-Alt and Naehrig is removing a step called “relinearization” or “key switching”, needed to make decryption independent of the function that was homomorphically evaluated. In our implementation, decryption depends on the number of homomorphic multiplications that were performed. We make this change for efficiency reasons, as relinearization is very costly.

  8. Informally, a function has degree \(D\) if it can be represented as a (possibly multivariate) polynomial of degree \(D\). See Sect. 4.4 for more details.

  9. Recall from Sect. 3 that we cannot perform homomorphic divisions.

  10. Admittedly, the size of the parameters needed does depend on the magnitude of the genotype and phenotype counts, which can be as large as the size of the population sample. This is because the size of the message encrypted at any given time (i.e. the size of the counts and all the intermediate values in the computation) cannot grow too large relative to the modulus \(q\). Larger population sizes (and hence larger counts) therefore require a larger modulus \(q\), which in turn requires a larger dimension \(n\) for security. However, for a fixed parameter set, it is possible to compute an upper bound on the size of the population sample, and the homomorphic computations detailed in this work are correct for any population sample smaller than that bound.
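The genotype encoding of note 4 and the weight vectors of note 3 can be illustrated on unencrypted data. The following Python sketch is not taken from the paper: it uses the standard textbook form of the Cochran-Armitage trend statistic on a 2x3 case/control table, it uses divisions freely (unlike the homomorphic setting), and the names `WEIGHTS`, `encode_genotype`, `genotype_counts`, and `trend_statistic` are hypothetical helpers chosen for illustration.

```python
from collections import Counter

# Weight vectors from note 3, keyed by genetic model.
WEIGHTS = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def encode_genotype(genotype):
    """Map 'AA' -> 0, 'Aa' -> 1, 'aa' -> 2 by counting copies of allele a (note 4)."""
    return genotype.count("a")


def genotype_counts(genotypes):
    """Return [N_0, N_1, N_2], the number of samples with each encoded genotype."""
    counts = Counter(encode_genotype(g) for g in genotypes)
    return [counts[v] for v in (0, 1, 2)]


def trend_statistic(case_counts, control_counts, weights):
    """Plaintext Cochran-Armitage statistic T^2 / Var(T) for a 2x3 table."""
    r1, r2 = sum(case_counts), sum(control_counts)  # row totals
    n = r1 + r2                                     # population size
    cols = [a + b for a, b in zip(case_counts, control_counts)]  # column totals
    # T = sum_j w_j * (N_1j * R_2 - N_2j * R_1)
    t = sum(w * (a * r2 - b * r1)
            for w, a, b in zip(weights, case_counts, control_counts))
    squares = sum(w * w * c * (n - c) for w, c in zip(weights, cols))
    cross = sum(weights[i] * weights[j] * cols[i] * cols[j]
                for i in range(3) for j in range(i + 1, 3))
    var_numerator = r1 * r2 * (squares - 2 * cross)
    if n == 0 or var_numerator == 0:
        raise ValueError("degenerate table: variance of T is zero")
    return t * t / (var_numerator / n)
```

For example, `trend_statistic([10, 20, 30], [30, 20, 10], WEIGHTS["additive"])` evaluates to 20.0. Under the null hypothesis of no association, the statistic is usually compared against a chi-square distribution with 1 degree of freedom.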

References

  1. Ayday, E., De Cristofaro, E., Hubaux, J.-P., Tsudik, G.: The Chills and Thrills of Whole Genome Sequencing. Technical report (2013). http://infoscience.epfl.ch/record/186866/files/survey.pdf

  2. Ayday, E., Raisaro, J.L., Hubaux, J.-P.: Personal use of the genomic data: Privacy vs. storage cost. In: Proceedings of IEEE Global Communications Conference, Exhibition and Industry Forum (Globecom) (2013)


  3. Blanton, M., Atallah, M.J., Frikken, K.B., Malluhi, Q.: Secure and efficient outsourcing of sequence comparisons. In: Foresti, S., Yung, M., Martinelli, F. (eds.) ESORICS 2012. LNCS, vol. 7459, pp. 505–522. Springer, Heidelberg (2012)


  4. Bosma, W., Cannon, J., Playoust, C.: The magma algebra system. I. The user language. J. Symbolic Comput. 24(3–4), 235–265 (1997). Computational algebra and number theory (London, 1993)


  5. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: Fully homomorphic encryption without bootstrapping. In: ITCS (2012)


  6. Bos, J.W., Lauter, K., Loftus, J., Naehrig, M.: Improved security for a ring-based fully homomorphic encryption scheme. In: Stam, M. (ed.) IMACC 2013. LNCS, vol. 8308, pp. 45–64. Springer, Heidelberg (2013)


  7. Bos, J.W., Lauter, K., Naehrig, M.: Private predictive analysis on encrypted medical data. J. Biomed. Inform. 50, 234–243 (2014). MSR-TR-2013-81


  8. Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012)


  9. Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: Ostrovsky, R. (ed.) FOCS, pp. 97–106. IEEE (2011)


  10. Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from Ring-LWE and security for key dependent messages. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 505–524. Springer, Heidelberg (2011)

  11. Brakerski, Z., Vaikuntanathan, V.: Lattice-based FHE as secure as PKE. In: Naor, M. (ed.) ITCS, pp. 1–12. ACM (2014)


  12. Database of Genotypes and Phenotypes (dbGaP). http://www.ncbi.nlm.nih.gov/gap/

  13. De Cristofaro, E., Faber, S., Tsudik, G.: Secure genomic testing with size-and position-hiding private substring matching. In: Proceedings of the 2013 ACM Workshop on Privacy in the Electronic Society (WPES 2013). ACM (2013)


  14. European Bioinformatics Institute. http://www.ebi.ac.uk/ (Accessed 30 October 2013)

  15. Fienberg, S.E., Slavkovic, A., Uhler, C.: Privacy preserving GWAS data sharing. In: 2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), pp. 628–635. IEEE (2011)


  16. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)


  17. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Mitzenmacher, M. (ed.) STOC, pp. 169–178. ACM (2009)


  18. Graepel, T., Lauter, K., Naehrig, M.: ML confidential: machine learning on encrypted data. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol. 7839, pp. 1–21. Springer, Heidelberg (2013)


  19. Creating a global alliance to enable responsible sharing of genomic and clinical data, White Paper (2013). http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf

  20. Gymrek, M., McGuire, A.L., Golan, D., Halperin, E., Erlich, Y.: Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013)


  21. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part I. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013)


  22. Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Addressing the concerns of the Lacks family: quantification of kin genomic privacy. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 1141–1152. ACM (2013)

  23. International cancer genome consortium (ICGC). http://www.icgc.org

  24. International rare diseases research consortium (IRDiRC). http://www.irdirc.org

  25. DNA Data Bank Of Japan. http://www.ddbj.nig.ac.jp/

  26. Johnson, A., Shmatikov, V.: Privacy-preserving data exploration in genome-wide association studies. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1079–1087. ACM (2013)


  27. Lepoint, T., Naehrig, M.: A comparison of the homomorphic encryption schemes \({\sf {FV}}\) and \({\sf {YASHE}}\). In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT. LNCS, vol. 8469, pp. 318–335. Springer, Heidelberg (2014)


  28. López-Alt, A., Naehrig, M.: Large integer plaintexts in ring-based fully homomorphic encryption. In preparation (2014)


  29. Lauter, K., Naehrig, M., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of the 3rd ACM Cloud Computing Security Workshop, pp. 113–124. ACM (2011)


  30. López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Karloff, H.J., Pitassi, T. (eds.) STOC, pp. 1219–1234. ACM (2012)


  31. McCarty, C.A., Chisholm, R.L., Chute, C.G., Kullo, I.J., Jarvik, G.P., Larson, E.B., Li, R., Masys, D.R., Ritchie, M.D., Roden, D.M., et al.: The eMERGE network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genomics 4(1), 13 (2011)

  32. Park, M.Y., Hastie, T.: Penalized logistic regression for detecting gene interactions. Biostatistics 9(1), 30–50 (2008)


  33. Stehlé, D., Steinfeld, R.: Making NTRU as secure as worst-case problems over ideal lattices. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 27–47. Springer, Heidelberg (2011)


  34. A map of human genome variation from population-scale sequencing. Nature, 467:1061–1073. http://www.1000genomes.org

  35. van Dijk, M., Gentry, C., Halevi, S., Vaikuntanathan, V.: Fully homomorphic encryption over the integers. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 24–43. Springer, Heidelberg (2010)


  36. Wang, R., Li, Y.F., Wang, X.F., Tang, H., Zhou, X.: Learning your identity and disease from research papers: Information leaks in genome wide association study. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS 2009, pp. 534–544. ACM, New York (2009)


  37. Yasuda, M., Shimoyama, T., Kogure, J., Yokoyama, K., Koshiba, T.: Secure pattern matching using somewhat homomorphic encryption. In: Proceedings of the 2013 ACM Cloud Computing Security Workshop, pp. 65–76. ACM (2013)


Acknowledgments

We thank Tancrède Lepoint for suggesting the encoding in Sect. 4.1.

Author information

Corresponding author

Correspondence to Michael Naehrig.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lauter, K., López-Alt, A., Naehrig, M. (2015). Private Computation on Encrypted Genomic Data. In: Aranha, D., Menezes, A. (eds) Progress in Cryptology - LATINCRYPT 2014. LATINCRYPT 2014. Lecture Notes in Computer Science, vol 8895. Springer, Cham. https://doi.org/10.1007/978-3-319-16295-9_1

  • DOI: https://doi.org/10.1007/978-3-319-16295-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16294-2

  • Online ISBN: 978-3-319-16295-9

  • eBook Packages: Computer Science, Computer Science (R0)
