# Secure Naïve Bayesian Classification over Encrypted Data in Cloud

## Abstract

To enjoy the advantages of cloud services while preserving security and privacy, large volumes of data are increasingly outsourced to the cloud in encrypted form. Unfortunately, encryption may impede analysis and computation over the outsourced dataset. Naïve Bayesian classification is an effective algorithm for predicting the class labels of unlabeled samples. In this paper, we investigate naïve Bayesian classification over encrypted datasets in the cloud and propose a secure scheme for this challenging problem. In our scheme, all computation tasks of naïve Bayesian classification are completed by the cloud, which can dramatically reduce the burden on the data owner and users. According to our theoretical proof, the scheme guarantees the security of both the input dataset and the output classification results, and the cloud learns nothing useful about the data owner's training data or the users' test samples throughout the computation. Additionally, we evaluate the computation complexity and communication overhead of the scheme in detail.

## Keywords

Cloud security, Naïve Bayesian classification, Privacy

## Notes

### Acknowledgements

We thank the anonymous reviewers and our shepherd, Prof. Xun Yi, for their valuable feedback. This work is partly supported by the Natural Science Foundation of Jiangsu Province of China (No. BK20150760), the Fundamental Research Funds for the Central Universities (Nos. NZ2015108, NS2016094), the China Postdoctoral Science Foundation funded project (No. 2015M571752), and the Natural Science Foundation of China (No. 61472470).
