Practical Secure Decision Tree Learning in a Teletreatment Application

  • Sebastiaan de Hoogh
  • Berry Schoenmakers
  • Ping Chen
  • Harm op den Akker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8437)

Abstract

In this paper we develop a range of practical cryptographic protocols for secure decision tree learning, a primary problem in privacy-preserving data mining. We focus on particular variants of the well-known ID3 algorithm that allow a high level of security and performance at the same time. Our approach is to design special-purpose secure multiparty computations; hence privacy is guaranteed as long as the honest parties form a sufficiently large quorum.

Our main ID3 protocol will ensure that the entire database of transactions remains secret except for the information leaked from the decision tree output by the protocol. We instantiate the underlying ID3 algorithm such that the performance of the protocol is enhanced considerably, while at the same time limiting the information leakage from the decision tree. Concretely, we apply a threshold for the number of transactions below which the decision tree will consist of a single leaf, which limits information leakage. We base the choice of the “best” predicting attribute for the root of a decision tree on the Gini index rather than the well-known information gain based on Shannon entropy, and we develop a particularly efficient protocol for securely finding the attribute of highest Gini index. Moreover, we present advanced secure ID3 protocols, which generate the decision tree as a secret output, and which allow secure lookup of predictions (even hiding the transaction for which the prediction is made). In all cases, the resulting decision trees are of the same quality as commonly obtained for the ID3 algorithm.
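To make the instantiation concrete, the following is a minimal plaintext sketch of ID3 with a Gini-based attribute choice and a leaf threshold. It is not the secure protocol of the paper: nothing here is secret-shared, and the function names, the toy data, and the exact form of the score (the sum over attribute values of the squared class counts divided by the value count, which is largest exactly when the weighted Gini impurity of the split is smallest) are assumptions made for illustration.

```python
from collections import Counter

def gini_score(rows, attr, label):
    """Sum over values v of attr of (sum over classes c of n_vc^2) / n_v.
    A larger score means a lower weighted Gini impurity after the split."""
    counts = {}
    for r in rows:
        counts.setdefault(r[attr], Counter())[r[label]] += 1
    return sum(sum(n * n for n in c.values()) / sum(c.values())
               for c in counts.values())

def majority(rows, label):
    """Most frequent class label among rows (rows must be non-empty)."""
    return Counter(r[label] for r in rows).most_common(1)[0][0]

def id3(rows, attrs, label, threshold):
    """Plaintext ID3: returns a class label (leaf) or (attribute, children)."""
    labels = {r[label] for r in rows}
    # Single leaf if too few transactions remain, the node is pure,
    # or no attributes are left to split on.
    if len(rows) < threshold or len(labels) == 1 or not attrs:
        return majority(rows, label)
    best = max(attrs, key=lambda a: gini_score(rows, a, label))
    rest = [a for a in attrs if a != best]
    children = {}
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        children[v] = id3(subset, rest, label, threshold)
    return (best, children)

# Hypothetical toy transactions (not data from the paper).
ROWS = [
    {"weather": "sun",  "busy": "no",  "comply": "yes"},
    {"weather": "sun",  "busy": "yes", "comply": "yes"},
    {"weather": "sun",  "busy": "no",  "comply": "yes"},
    {"weather": "rain", "busy": "no",  "comply": "no"},
    {"weather": "rain", "busy": "yes", "comply": "no"},
]

tree = id3(ROWS, ["weather", "busy"], "comply", threshold=1)
stump = id3(ROWS, ["weather", "busy"], "comply", threshold=10)
```

With a low threshold the sketch splits on `weather`; with a threshold above the number of transactions it collapses to a single majority leaf, which is the behaviour the abstract describes as limiting leakage.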

We have implemented our protocols in Python using VIFF, where the underlying protocols are based on Shamir secret sharing. Due to a judicious use of secret indexing and masking techniques, we are able to code the protocols in a recursive manner without any loss of efficiency. To demonstrate practical feasibility we apply the secure ID3 protocols to an automated health care system of a real-life rehabilitation organization.
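As background for the underlying primitive, here is a small self-contained sketch of Shamir secret sharing over a prime field. It does not use or imitate the VIFF API; the modulus `P` and the function names `share`, `reconstruct`, and `add_shares` are assumptions chosen for illustration. It shows that any t+1 shares recover the secret and that shares can be added locally, which is the kind of operation needed to aggregate counts without revealing them.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, chosen here only for illustration

def share(secret, n, t):
    """Split secret into n shares using a random degree-t polynomial.
    Any t+1 shares reconstruct the secret; t or fewer reveal nothing."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t)]

    def f(x):
        # Horner evaluation of the polynomial modulo P.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % P
        return y

    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from at least t+1 shares."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def add_shares(a, b):
    """Each party adds its two shares locally; the result shares the sum."""
    return [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a, b)]

# Three parties, threshold t = 1: any two shares suffice.
s = share(42, n=3, t=1)
total = reconstruct(add_shares(share(5, 3, 1), share(7, 3, 1)))
```

Multiplication of shared values requires interaction between the parties and is not covered by this sketch.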

Notes

Acknowledgements

This work was supported by the Dutch national program COMMIT.

Copyright information

© International Financial Cryptography Association 2014

Authors and Affiliations

  • Sebastiaan de Hoogh (1)
  • Berry Schoenmakers (2)
  • Ping Chen (3)
  • Harm op den Akker (4)
  1. TU Delft, Delft, The Netherlands
  2. TU Eindhoven, Eindhoven, The Netherlands
  3. KU Leuven, Leuven, Belgium
  4. Roessingh R&D and U Twente, Enschede, The Netherlands