# Unsupervised Machine Learning on Encrypted Data

## Abstract

Fully Homomorphic Encryption (FHE) allows computations on encrypted data, and Machine Learning has recently been one of its most popular applications. Existing work in this area, however, has focused on supervised learning, where a labeled training set is used to configure the model. In this work, we take the first step into unsupervised learning, an important area of Machine Learning with many real-world applications, by addressing the clustering problem. To this end, we show how to implement the \(K\)-Means-Algorithm. This algorithm poses several challenges in the FHE context, including a division, which we tackle by using a natural encoding that allows division and may be of independent interest. While this solves the problem in theory, performance in practice is not optimal, so we then propose some changes to the clustering algorithm that make it executable under more conventional encodings. We show that our new algorithm achieves a clustering accuracy comparable to that of the original \(K\)-Means-Algorithm, while requiring less than \(5\%\) of its runtime.
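To make the role of the division concrete, the following is a minimal plaintext sketch of the standard \(K\)-Means iteration. It is not the paper's encrypted implementation, and the function names (`assign`, `update`, `kmeans`) are hypothetical. As an illustrative assumption, centroids are stored with Python's `Fraction` type, which keeps a numerator and denominator instead of a rounded quotient; this only hints at why an encoding that supports division is convenient and is not claimed to match the encoding used in the paper.

```python
from fractions import Fraction


def assign(points, centroids):
    """Return, for each point, the index of the nearest centroid
    (squared Euclidean distance, so no square root is needed)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, old_centroids):
    """Recompute each centroid as the mean of its assigned points.
    The mean is the step that needs a division by the cluster size."""
    dim = len(points[0])
    new_centroids = []
    for j, old in enumerate(old_centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        sums = [sum(p[d] for p in members) for d in range(dim)]
        # Fraction stores numerator and denominator exactly.
        new_centroids.append(tuple(Fraction(s, len(members)) for s in sums))
    return new_centroids


def kmeans(points, centroids, rounds=10):
    """Run a fixed number of assignment/update rounds."""
    labels = assign(points, centroids)
    for _ in range(rounds):
        centroids = update(points, labels, centroids)
        labels = assign(points, centroids)
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` groups the first two and the last two points together. A fixed number of rounds is used here because a data-dependent stopping condition cannot be checked without looking at the data.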

## Keywords

Machine Learning, Clustering, Fully Homomorphic Encryption
