Abstract
Privacy-preserving data mining, that is, developing models without seeing the data, is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model, but must not disclose their data to others. Secure distributed classification is an important problem. In many situations, data is split between multiple organizations. These organizations may want to use all of the data to create more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier, consistently providing reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.
Additional information
This material is based upon work supported by the National Science Foundation under Grant No. 0312357. Portions of this work were also supported by a Rutgers Research Resources committee award.
About this article
Cite this article
Vaidya, J., Kantarcıoğlu, M. & Clifton, C. Privacy-preserving Naïve Bayes classification. The VLDB Journal 17, 879–898 (2008). https://doi.org/10.1007/s00778-006-0041-y
DOI: https://doi.org/10.1007/s00778-006-0041-y