Cluster Computing, Volume 22, Supplement 1, pp 1581–1593

Outsourced privacy-preserving C4.5 decision tree algorithm over horizontally and vertically partitioned dataset among multiple parties

  • Ye Li
  • Zoe L. Jiang
  • Lin Yao
  • Xuan Wang
  • S. M. Yiu
  • Zhengan Huang

Abstract

Many companies want to share data for data-mining tasks, but privacy and security concerns have become a bottleneck for data sharing. Privacy-preserving data mining based on secure multiparty computation (SMC) has emerged as a solution to this problem. However, traditional SMC solutions impose a heavy computation cost on the user side. This study introduces an outsourcing method to reduce that cost. We also preserve the privacy of the shared data by proposing an outsourced privacy-preserving C4.5 algorithm over horizontally and vertically partitioned data for multiple parties, based on the outsourced privacy-preserving weighted average protocol (OPPWAP) and the outsourced secure set intersection protocol (OSSIP). We find that our method achieves the same result as the original C4.5 decision tree algorithm while preserving data privacy. We also implement the proposed protocols and algorithms; the results show a sublinear relationship between the computational cost on the user side and the number of participating parties.
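As a rough illustration of why a distributed construction over horizontally partitioned data can reproduce the centralized C4.5 result, the sketch below computes the C4.5 gain ratio purely from (attribute value, class label) counts, which are sums of per-party counts. It is a plaintext sketch only: it performs no encryption or outsourcing and is not the OPPWAP or OSSIP protocol of the paper. The function names (`entropy`, `local_counts`, `gain_ratio`) and the toy rows are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def local_counts(rows, attr_index, class_index=-1):
    """Counts of (attribute value, class label) pairs held by one party."""
    return Counter((row[attr_index], row[class_index]) for row in rows)


def gain_ratio(joint_counts):
    """C4.5 gain ratio of one attribute, computed from joint counts only."""
    total = sum(joint_counts.values())
    class_totals = Counter()
    value_totals = Counter()
    for (value, label), c in joint_counts.items():
        class_totals[label] += c
        value_totals[value] += c

    base = entropy(class_totals.values())
    conditional = 0.0
    for value, n in value_totals.items():
        per_class = [c for (v, _), c in joint_counts.items() if v == value]
        conditional += n / total * entropy(per_class)

    split_info = entropy(value_totals.values())
    gain = base - conditional
    return gain / split_info if split_info > 0 else 0.0


# Toy horizontally partitioned data: same schema, different rows per party.
party_a = [("sunny", "no"), ("sunny", "no"), ("rain", "yes")]
party_b = [("rain", "yes"), ("overcast", "yes"), ("sunny", "yes")]

# In plaintext, aggregation is simply the sum of the per-party counts.
merged = local_counts(party_a, 0) + local_counts(party_b, 0)
centralized = local_counts(party_a + party_b, 0)

print(gain_ratio(merged), gain_ratio(centralized))
```

Because the gain ratio depends on the data only through these summed counts, any protocol that lets the parties evaluate the same sums without revealing their individual counts would select the same split as the centralized algorithm. How the paper achieves that privately is specified by its protocols, not by this sketch.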

Keywords

Secure multiparty computation · Outsourced computation · C4.5 decision tree · Privacy preserving data mining · PPWAP · SSIP

Notes

Acknowledgements

This work is supported by the National High Technology Research and Development Program of China (No. 2015AA016008), the National Natural Science Foundation of China (No. 61402136), the Natural Science Foundation of Guangdong Province, China (No. 2014A030313697), the National Natural Science Foundation of China (No. 61472091), the Natural Science Foundation of Guangdong Province for Distinguished Young Scholars (2014A030306020), the Guangzhou scholars project for universities of Guangzhou (No. 1201561613), the Science and Technology Planning Project of Guangdong Province, China (2015B010129015) and the Guangdong Province Key Laboratory of High Performance Computing (No. [2013]82).

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  2. Harbin Institute of Technology Shenzhen Graduate School and Guangdong Provincial Key Laboratory of High Performance Computing, Shenzhen, China
  3. Harbin Institute of Technology School of Software, Harbin, China
  4. The University of Hong Kong, Hong Kong SAR, China
  5. Guangzhou University, Guangzhou, China
