Privacy-preserving boosting

Abstract

We describe two algorithms, BiBoost (Bipartite Boosting) and MultBoost (Multiparty Boosting), that allow two or more participants to construct a boosting classifier without explicitly sharing their data sets. We analyze both the computational and the security aspects of the algorithms. The algorithms inherit the excellent generalization performance of AdaBoost. Experiments indicate that the algorithms are better than AdaBoost executed separately by the participants, and that, independently of the number of participants, they perform close to AdaBoost executed using the entire data set.
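
For orientation, the AdaBoost baseline that the abstract compares against can be sketched as follows. This is a minimal single-site sketch of standard AdaBoost with decision stumps, not BiBoost or MultBoost, whose protocols are not described on this page. The function names (`train_stump`, `adaboost`, `predict`) and the toy data are assumptions made for illustration only.

```python
import math


def train_stump(X, y, w):
    """Exhaustively pick the stump (feature, threshold, sign) with the
    lowest weighted error. Returns (error, feature, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                # Weighted error: total weight of misclassified points.
                err = sum(
                    wi
                    for xi, yi, wi in zip(X, y, w)
                    if (sign if xi[j] >= thr else -sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, x):
    """Predict +1 or -1 for one point with a trained stump."""
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign


def adaboost(X, y, rounds=10):
    """Standard AdaBoost on labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, xi))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data (hypothetical): no single threshold separates the classes.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
predictions = [predict(ensemble, x) for x in X]
```

On this toy set, a single stump always misclassifies at least two points, while three boosting rounds are enough for the weighted vote to fit all six. In the setting of the abstract, each participant holds only part of such a data set, and the comparison is between running this procedure on each part separately and running it on the pooled data.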

Author information

Correspondence to Balázs Kégl.

Additional information

Responsible Editor: Charu Aggarwal.

About this article

Cite this article

Gambs, S., Kégl, B. & Aïmeur, E. Privacy-preserving boosting. Data Min Knowl Disc 14, 131–170 (2007). https://doi.org/10.1007/s10618-006-0051-9

Keywords

  • Privacy-preserving data mining
  • Boosting
  • AdaBoost
  • Distributed learning
  • Secure multiparty computation