Supervised machine learning using encrypted training data

  • Regular Contribution
International Journal of Information Security

Abstract

Preservation of privacy in data mining and machine learning has emerged as an absolute prerequisite in many practical scenarios, especially when the processing of sensitive data is outsourced to an external third party. Current privacy-preservation methods are mainly based on randomization and/or perturbation, secure multiparty computation, and cryptographic techniques. In this paper, we exploit the partially homomorphic property of some cryptosystems to train simple machine learning models on encrypted data. Our basic scenario involves three parties: multiple Data Owners, who provide encrypted training examples; the Algorithm Owner (or Application), which processes them to adjust the parameters of its models; and a semi-trusted third party, which provides privacy and secure computation services to the Application for those operations that the homomorphic cryptosystem does not support. In particular, we focus on two issues: the use of multiple-key cryptosystems, and the impact of the quantization of real-valued input data that is required before encryption. In addition, we develop primitives based on outsourcing a reduced set of operations, which allow general machine learning algorithms to be implemented using efficient dedicated hardware. As applications, we consider the training of classifiers on privacy-protected data and the tracking of a moving target using encrypted distance measurements.
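The abstract rests on two ingredients: an additively homomorphic cryptosystem and the quantization of real-valued inputs before encryption. The following is a minimal Python sketch under clearly stated assumptions, not the authors' construction. It uses Paillier as the example cryptosystem (the abstract does not name one), toy keys built from the Mersenne primes 2^31 - 1 and 2^61 - 1 that are not secure, a hypothetical fixed-point scale `S = 2**16`, and a generic additive-blinding protocol for the one operation Paillier cannot do alone (multiplying two ciphertexts). The semi-trusted party is modelled, as an assumption, as the holder of the private key; all function names are illustrative.

```python
import math
import random

# ASSUMPTION: toy Paillier keys from the Mersenne primes 2^31 - 1 and 2^61 - 1.
# They are far too small and structured to be secure; real moduli need >= 2048 bits.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # with generator g = N + 1, L(g^LAM mod N^2) = LAM mod N
S = 2**16             # hypothetical fixed-point scale used for quantization


def quantize(x: float) -> int:
    """Map a real value to the integer round(x * S) before encryption."""
    return round(x * S)


def encrypt(m: int) -> int:
    """Paillier encryption; negative integers are encoded modulo N."""
    # random is not a CSPRNG; real code should draw r with secrets.randbelow.
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Paillier decryption, decoded to a signed integer in [-(N-1)/2, (N-1)/2]."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def add(c1: int, c2: int) -> int:
    """Homomorphic addition: Enc(a) * Enc(b) mod N^2 decrypts to a + b."""
    return c1 * c2 % N2


def scale(c: int, k: int) -> int:
    """Multiplication by a plaintext integer: Enc(a)^k mod N^2 decrypts to k * a."""
    return pow(c, k % N, N2)


def encrypted_dot(enc_x: list[int], w_q: list[int]) -> int:
    """Application side: Enc(sum_i w_i * x_i) from encrypted inputs and plaintext weights."""
    acc = encrypt(0)
    for c, w in zip(enc_x, w_q):
        acc = add(acc, scale(c, w))
    return acc


def third_party_multiply(c1: int, c2: int) -> int:
    """HYPOTHETICAL semi-trusted party holding the private key; it only sees blinded values."""
    return encrypt(decrypt(c1) * decrypt(c2))


def secure_multiply(ca: int, cb: int) -> int:
    """Application side: Enc(a * b) via additive blinding (a generic protocol, assumed here)."""
    ra, rb = random.randrange(N), random.randrange(N)
    c_ab = third_party_multiply(add(ca, encrypt(ra)), add(cb, encrypt(rb)))
    # (a + ra)(b + rb) - rb*a - ra*b - ra*rb = a*b   (mod N)
    c_ab = add(c_ab, scale(ca, -rb))
    c_ab = add(c_ab, scale(cb, -ra))
    return add(c_ab, encrypt(-ra * rb))


# Illustrative data (made-up values, not from the paper).
x = [0.52, -1.30, 2.75]   # one Data Owner's real-valued features
w = [0.10, 0.40, -0.25]   # the Application's plaintext model weights
enc_x = [encrypt(quantize(v)) for v in x]
# Decryption here only checks the result; a product of two quantized values has scale S^2.
dot = decrypt(encrypted_dot(enc_x, [quantize(v) for v in w])) / S**2
print(dot, sum(a * b for a, b in zip(x, w)))

prod = decrypt(secure_multiply(encrypt(quantize(1.5)), encrypt(quantize(-2.0)))) / S**2
print(prod)  # -3.0
```

Because every product of two quantized values carries scale S^2, the choice of S and the size of N together bound how many multiplications can be chained before the result overflows the signed message range, which is one concrete way the quantization step affects an encrypted training pipeline.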

Author information

Correspondence to Francisco-Javier González-Serrano.

Additional information

This work is partially funded by the Project CASI-CAM P2013/ICE2845 of the Regional Government of Madrid, Spain.

About this article

Cite this article

González-Serrano, FJ., Amor-Martín, A. & Casamayón-Antón, J. Supervised machine learning using encrypted training data. Int. J. Inf. Secur. 17, 365–377 (2018). https://doi.org/10.1007/s10207-017-0381-1

  • DOI: https://doi.org/10.1007/s10207-017-0381-1
