Privacy-Preserving Ridge Regression with only Linearly-Homomorphic Encryption

Part of the Lecture Notes in Computer Science book series (LNSC, volume 10892)

Abstract

Linear regression with 2-norm regularization (i.e., ridge regression) is an important statistical technique that models the relationship between some explanatory values and an outcome value using a linear function. In many applications (e.g., predictive modeling in personalized health-care), these values represent sensitive data owned by several different parties who are unwilling to share them. In this setting, training a linear regression model becomes challenging and needs specific cryptographic solutions. This problem was elegantly addressed by Nikolaenko et al. in S&P (Oakland) 2013. They suggested a two-server system that uses linearly-homomorphic encryption (LHE) and Yao’s two-party protocol (garbled circuits). In this work, we propose a novel system that can train a ridge linear regression model using only LHE (i.e., without using Yao’s protocol). This greatly improves the overall performance (both in computation and communication) as Yao’s protocol was the main bottleneck in the previous solution. The efficiency of the proposed system is validated both on synthetically-generated and real-world datasets.
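As a rough illustration of what "linearly-homomorphic" means here, the sketch below implements a toy Paillier scheme [24] and uses it to evaluate an inner product under encryption. Ridge regression amounts to solving \((X^\top X + \lambda I)\,{\varvec{w}} = X^\top {\varvec{y}}\), whose entries are sums of products of data values; when one factor is known in the clear and the other is encrypted, such a sum can be computed on ciphertexts. The primes, the function names (`keygen`, `encrypt`, `decrypt`, `add`, `scale`, `encrypted_dot`) and the choice \(g = n + 1\) are assumptions made for this example. It is not the protocol of the paper, and the parameters are far too small to be secure.

```python
import math
import random

# Toy primes (assumption): far too small to be secure, chosen for readability.
P, Q = 1009, 1013


def keygen(p, q):
    """Return (public modulus n, secret key (lam, mu)) for Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt the integer m (reduced mod n) as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Recover m from c using L(u) = (u - 1) / n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add(n, c1, c2):
    """Ciphertext of m1 + m2: multiply the two ciphertexts mod n^2."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Ciphertext of k * m for a plaintext constant k: raise c to the power k."""
    return pow(c, k % n, n * n)


def encrypted_dot(n, xs, enc_ys):
    """Ciphertext of sum(x_i * y_i) for plaintext xs and encrypted ys."""
    acc = 1  # 1 is a (non-randomized) encryption of 0
    for x, c in zip(xs, enc_ys):
        acc = add(n, acc, scale(n, c, x))
    return acc


n, sk = keygen(P, Q)
rng = random.Random(0)
enc_ys = [encrypt(n, y, rng) for y in [10, 3, 7]]
result = decrypt(n, sk, encrypted_dot(n, [2, 5, 1], enc_ys))
print(result)  # 2*10 + 5*3 + 1*7 = 42
```

Only additions and multiplications by known constants are available on ciphertexts, which is why a product of two encrypted values cannot be formed this way. Results are correct only while the true value stays below the modulus \(n\).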

Keywords

  • Ridge regression
  • Linear regression
  • Privacy
  • Homomorphic encryption

Notes

  1. Size of the messages exchanged among the parties running the system.

  2. Timing on a 2.6 GHz 8 GB RAM machine running Linux 16.04; 80-bit security.

  3. Timing on a 1.9 GHz 64 GB RAM machine running Linux 12.04; 80-bit security.

  4. In other words, \(\delta =\max \{\Vert X\Vert _\infty , \Vert {\varvec{y}}\Vert _\infty \}\) for the original X and \({\varvec{y}}\).

  5. We assume that \(\lambda \in \mathbb {R}\) has at most \(2\ell \) digits in the fractional part.

  6. \(\mathrm {GL}(d,\mathcal {M})\) denotes the general linear group of degree d over the ring \(\mathcal {M}\); namely, the group of \(d\times d\) invertible matrices with entries from \(\mathcal {M}\).

  7. Notice that the system presented in [26] fails because no techniques are used to make the arithmetic over \(\mathbb {Q}\) compatible with the modular arithmetic used by the underlying LHE (i.e., Paillier’s scheme). See [7] for more details on this.

  8. That is, trusted to be non-colluding.

  9. If \({\varvec{x}}_t[i]\) and \({\varvec{x}}_t[j]\) are both held by one \(\mathrm {DO}_k\), then that \(\mathrm {DO}_k\) can send \({\mathsf {Enc}}_{pk}({\varvec{x}}_t[i]{\varvec{x}}_t[j])\) to \(\mathrm {MLE}\), who updates the formulas in Step 3 of \(\varPi _{1,\mathrm {arb}}\) accordingly.

  10. In this section, for our system we assume \(\ell =3\) and Paillier’s scheme with 80-bit security as underlying LHE.

  11. https://archive.ics.uci.edu/ml/datasets.html.

  12. According to the NIST standard, an RSA modulus of 2048 bits gives 112-bit security.

  13. http://python-paillier.readthedocs.io.

  14. https://pypi.python.org/pypi/gmpy2.

  15. For larger values of n and d, using Damgård and Jurik’s scheme instead of Paillier’s scheme reduces the running time of operations on ciphertexts. See [14, Appendix A.5].
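Note 7 points at the gap between arithmetic over \(\mathbb {Q}\) and the modular arithmetic of the LHE scheme. One standard way to bridge it, and the topic of reference [27], is rational reconstruction: a fraction \(r/s\) with small numerator and denominator is recovered from its image \(r s^{-1} \bmod N\). The sketch below is a minimal version of that idea, assuming the denominator is coprime to \(N\) and that \(|r|, |s| \le B\) with \(2B^2 < N\). The function name `rational_reconstruct` and the toy modulus are made up for illustration, and this excerpt does not show how the paper itself applies the technique.

```python
from fractions import Fraction
from math import gcd, isqrt


def rational_reconstruct(c, N):
    """Find r/s with |r|, |s| <= B and r = c*s (mod N), where 2*B*B < N.

    Runs the extended Euclidean algorithm on (N, c) and stops at the first
    remainder that drops to B or below.
    """
    bound = isqrt((N - 1) // 2)  # guarantees 2 * bound**2 < N
    # Invariant: r_i = t_i * c (mod N) for both tracked pairs.
    r0, t0 = N, 0
    r1, t1 = c % N, 1
    while r1 > bound:
        quotient = r0 // r1
        r0, r1 = r1, r0 - quotient * r1
        t0, t1 = t1, t0 - quotient * t1
    if t1 == 0 or abs(t1) > bound or gcd(r1, abs(t1)) != 1:
        raise ValueError("no rational with small numerator and denominator")
    return Fraction(r1, t1)  # Fraction moves the sign to the numerator


# Toy example: the scalar equation 7 * w = -3 has the solution w = -3/7 over Q.
N = 10007                       # hypothetical small modulus
w_mod = (-3 * pow(7, -1, N)) % N  # the same solution computed modulo N
print(w_mod, rational_reconstruct(w_mod, N))
```

The same step applies entry by entry to the solution vector of a linear system solved modulo \(N\), provided \(N\) is large enough for the bound to cover every numerator and denominator that can occur.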

References

  1. Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: 2000 ACM SIGMOD International Conference on Management of Data, pp. 439–450. ACM Press (2000)

  2. Aono, Y., Hayashi, T., Phong, L.T., Wang, L.: Fast and secure linear regression and biometric authentication with security update. Cryptology ePrint Archive, Report 2015/692 (2015)

  3. Bar-Ilan, J., Beaver, D.: Non-cryptographic fault-tolerant computing in constant number of rounds of interaction. In: Eighth Annual ACM Symposium on Principles of Distributed Computing, pp. 201–209. ACM Press (1989)

  4. Barbosa, M., Catalano, D., Fiore, D.: Labeled homomorphic encryption: scalable and privacy-preserving processing of outsourced data. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10492, pp. 146–166. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66402-6_10

  5. Beaver, D.: Efficient multiparty protocols using circuit randomization. In: Feigenbaum, J. (ed.) CRYPTO 1991. LNCS, vol. 576, pp. 420–432. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-46766-1_34

  6. Ben-Or, M., Goldwasser, S., Wigderson, A.: Completeness theorems for non-cryptographic fault-tolerant distributed computation. In: 20th Annual ACM Symposium on Theory of Computing, STOC, pp. 1–10. ACM Press (1988)

  7. Cao, Z., Liu, L.: Comment on “harnessing the cloud for securely outsourcing large-scale systems of linear equations”. IEEE Trans. Parallel Distrib. Syst. 27(5), 1551–1552 (2016)

  8. Cock, M.D., Dowsley, R., Nascimento, A.C.A., Newman, S.C.: Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In: 8th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM Press (2015)

  9. Damgård, I., Jurik, M.: A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In: Kim, K. (ed.) PKC 2001. LNCS, vol. 1992, pp. 119–136. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44586-2_9

  10. Du, W., Han, Y.S., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Fourth SIAM International Conference on Data Mining, pp. 222–233. SIAM (2004)

  11. Fouque, P.-A., Stern, J., Wackers, G.-J.: CryptoComputing with rationals. In: Blaze, M. (ed.) FC 2002. LNCS, vol. 2357, pp. 136–146. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36504-4_10

  12. Gascón, A., Schoppmann, P., Balle, B., Raykova, M., Doerner, J., Zahur, S., Evans, D.: Privacy-preserving distributed linear regression on high-dimensional data. PoPETS 2017(4), 248–267 (2017)

  13. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: 41st Annual ACM Symposium on Theory of Computing, STOC, pp. 169–178. ACM Press (2009)

  14. Giacomelli, I., Jha, S., Joye, M., Page, C.D., Yoon, K.: Privacy-preserving ridge regression with only linearly-homomorphic encryption. Cryptology ePrint Archive, Report 2017/979 (2017)

  15. Hall, R., Fienberg, S.E., Nardi, Y.: Secure multiple linear regression based on homomorphic encryption. J. Off. Stat. 27(4), 669–691 (2011)

  16. Kamara, S., Mohassel, P., Raykova, M.: Outsourcing multi-party computation. Cryptology ePrint Archive, Report 2011/272 (2011)

  17. Karr, A.F., Lin, X., Sanil, A.P., Reiter, J.P.: Regression on distributed databases via secure multi-party computation. In: 2004 Annual National Conference on Digital Government Research, pp. 108:1–108:2 (2004)

  18. Karr, A.F., Lin, X., Sanil, A.P., Reiter, J.P.: Secure regression on distributed databases. J. Comput. Graph. Stat. 14(2), 263–279 (2005)

  19. Karr, A.F., Lin, X., Sanil, A.P., Reiter, J.P.: Privacy-preserving analysis of vertically partitioned data using secure matrix products. J. Off. Stat. 25(1), 125–138 (2009)

  20. Lindell, Y., Pinkas, B.: Privacy preserving data mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44598-6_3

  21. McDonald, G.C.: Ridge regression. Wiley Interdiscip. Rev.: Comput. Stat. 1(1), 93–100 (2009)

  22. Mohassel, P., Zhang, Y.: SecureML: a system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy, pp. 19–38. IEEE Computer Society (2017)

  23. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: 2013 IEEE Symposium on Security and Privacy, pp. 334–348. IEEE Computer Society (2013)

  24. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_16

  25. Sanil, A.P., Karr, A.F., Lin, X., Reiter, J.P.: Privacy preserving regression modelling via distributed computation. In: Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 677–682. ACM Press (2004)

  26. Wang, C., Ren, K., Wang, J., Wang, Q.: Harnessing the cloud for securely outsourcing large-scale systems of linear equations. IEEE Trans. Parallel Distrib. Syst. 24(6), 1172–1181 (2013)

  27. Wang, P.S., Guy, M.J.T., Davenport, J.H.: \(P\)-adic reconstruction of rational numbers. ACM SIGSAM Bull. 16(2), 2–3 (1982)

  28. The International Warfarin Pharmacogenetics Consortium: Estimation of the Warfarin dose with clinical and pharmacogenetic data. N. Engl. J. Med. 360(8), 753–764 (2009)

  29. Yao, A.C.C.: How to generate and exchange secrets. In: 27th Annual Symposium on Foundations of Computer Science, FOCS, pp. 162–167. IEEE Computer Society (1986)

Acknowledgments

This work was partially supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS) grant UL1TR002373, and by the NIH BD2K Initiative grant U54 AI117924.

Corresponding author

Correspondence to Irene Giacomelli.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Giacomelli, I., Jha, S., Joye, M., Page, C.D., Yoon, K. (2018). Privacy-Preserving Ridge Regression with only Linearly-Homomorphic Encryption. In: Preneel, B., Vercauteren, F. (eds) Applied Cryptography and Network Security. ACNS 2018. Lecture Notes in Computer Science, vol 10892. Springer, Cham. https://doi.org/10.1007/978-3-319-93387-0_13

  • DOI: https://doi.org/10.1007/978-3-319-93387-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93386-3

  • Online ISBN: 978-3-319-93387-0

  • eBook Packages: Computer Science, Computer Science (R0)