Data Mining and Knowledge Discovery, Volume 14, Issue 1, pp 131–170

Privacy-preserving boosting


Abstract

We describe two algorithms, BiBoost (Bipartite Boosting) and MultBoost (Multiparty Boosting), that allow two or more participants to construct a boosting classifier without explicitly sharing their data sets. We analyze both the computational and the security aspects of the algorithms. The algorithms inherit the strong generalization performance of AdaBoost. Experiments indicate that the algorithms outperform AdaBoost run separately by each participant on its own data and that, regardless of the number of participants, they perform close to AdaBoost run on the entire (pooled) data set.
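The abstract builds on AdaBoost without restating it. For readers unfamiliar with that baseline, the following is a minimal sketch of standard discrete AdaBoost (Freund and Schapire) with decision stumps as the weak learner. It is not BiBoost or MultBoost and provides no privacy protection. The choice of stumps, the helper names (`best_stump`, `adaboost`, `predict`) and the toy two-participant data are illustrative assumptions, not taken from the paper. The usage lines contrast one participant training alone with training on the pooled data, the two non-private baselines the abstract compares against.

```python
import math
from typing import List, Tuple

# Hypothetical weak learner: a decision stump (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return +polarity if x[j] <= threshold, else -polarity (labels in {-1, +1})."""
    j, thr, pol = stump
    return pol if x[j] <= thr else -pol


def best_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Try every feature, every observed value as threshold, and both polarities;
    return the stump with the lowest weighted training error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                s = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(s, xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost: reweight the sample each round so the next stump
    focuses on the points the current ensemble gets wrong."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: two participants each hold part of a 1-D data set.
alice_X, alice_y = [[0.0], [3.0]], [-1, 1]
bob_X, bob_y = [[1.0], [2.0]], [-1, 1]

# Baseline 1: a participant running AdaBoost on its own data only.
alice_only = adaboost(alice_X, alice_y, rounds=5)
# Baseline 2: AdaBoost on the pooled data (requires sharing the data outright).
pooled = adaboost(alice_X + bob_X, alice_y + bob_y, rounds=5)
```

On this toy data the pooled model classifies all four points correctly, while Alice's local model, having seen only x = 0 and x = 3, labels Bob's point x = 1 as +1. The private protocols described in the paper aim to approach the pooled result without the data sharing that baseline requires.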

Keywords

Privacy-preserving data mining; Boosting; AdaBoost; Distributed learning; Secure multiparty computation



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Computer Science and Operations Research, University of Montreal, Montréal, Canada
