Privacy-preserving boosting
Article
Abstract
We describe two algorithms, BiBoost (Bipartite Boosting) and MultBoost (Multiparty Boosting), that allow two or more participants to construct a boosting classifier without explicitly sharing their data sets. We analyze both the computational and the security aspects of the algorithms. The algorithms inherit the excellent generalization performance of AdaBoost. Experiments indicate that the algorithms outperform AdaBoost executed separately by each participant on its own data, and that, regardless of the number of participants, they perform close to AdaBoost executed on the entire data set.
Keywords
Privacy-preserving data mining, Boosting, AdaBoost, Distributed learning, Secure multiparty computation
Copyright information
© Springer Science+Business Media, LLC 2007