Privacy-preserving Naïve Bayes classification

  • Regular Paper
  • The VLDB Journal

Abstract

Privacy-preserving data mining, that is, developing models without seeing the data, is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model, but must not disclose their data to one another. Secure distributed classification is an important problem in this setting. In many situations, data is split between multiple organizations. These organizations may want to use all of the data to create more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier and consistently provides reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.
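To make the horizontally partitioned setting concrete, here is a minimal Python sketch. The abstract does not describe the protocols themselves, so everything below is an illustrative assumption rather than the authors' method: it assumes semi-honest parties, at least three of them, categorical attributes with publicly known value domains, and a simple ring-style masked sum. The names `secure_sum`, `local_counts`, `global_counts`, `classify`, and `MODULUS` are hypothetical. Each party counts its own rows, only masked running totals are passed along, and the pooled counts are enough to evaluate Naïve Bayes.

```python
import random
from collections import Counter

# Assumed public modulus, chosen larger than any possible total count.
MODULUS = 2**61 - 1


def secure_sum(local_values, rng):
    """Ring-style masked sum (simulated in one process).

    The initiator adds a random mask, each party adds its own value
    modulo MODULUS, and the initiator removes the mask at the end, so a
    party on the ring only ever sees a masked running total.
    """
    mask = rng.randrange(MODULUS)
    running = mask
    for value in local_values:  # one iteration stands for one party's turn
        running = (running + value) % MODULUS
    return (running - mask) % MODULUS


def local_counts(rows):
    """Count labels and (attribute index, value, label) triples for one party."""
    class_counts, joint_counts = Counter(), Counter()
    for *attrs, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(attrs):
            joint_counts[(i, v, label)] += 1
    return class_counts, joint_counts


def global_counts(parties, rng):
    """Pool every count across parties using the masked sum."""
    per_party = [local_counts(rows) for rows in parties]
    # Assumption: the set of possible keys (value domains) is public.
    class_keys = set().union(*(c for c, _ in per_party))
    joint_keys = set().union(*(j for _, j in per_party))
    class_totals = {k: secure_sum([c[k] for c, _ in per_party], rng)
                    for k in class_keys}
    joint_totals = {k: secure_sum([j[k] for _, j in per_party], rng)
                    for k in joint_keys}
    return class_totals, joint_totals


def classify(instance, class_totals, joint_totals):
    """Pick the label maximising P(label) * prod_i P(value_i | label)."""
    n = sum(class_totals.values())
    best_label, best_score = None, -1.0
    for label, count in class_totals.items():
        score = count / n
        for i, v in enumerate(instance):
            score *= joint_totals.get((i, v, label), 0) / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: each party holds complete rows (outlook, temperature, label).
party_a = [("sunny", "hot", "no"), ("rainy", "mild", "yes")]
party_b = [("sunny", "mild", "yes"), ("rainy", "hot", "no"), ("sunny", "hot", "no")]
party_c = [("rainy", "mild", "yes")]

rng = random.Random(0)
class_totals, joint_totals = global_counts([party_a, party_b, party_c], rng)
prediction = classify(("sunny", "hot"), class_totals, joint_totals)
```

The sketch omits smoothing, collusion resistance, and protection of the instance being classified, all of which a real protocol would need to address. The vertically partitioned case, where each party holds different attributes of the same entities, needs a different approach and is not covered here.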

Author information

Corresponding author

Correspondence to Jaideep Vaidya.

Additional information

This material is based upon work supported by the National Science Foundation under Grant No. 0312357. Portions of this work were also supported by a Rutgers Research Resources committee award.

About this article

Cite this article

Vaidya, J., Kantarcıoğlu, M. & Clifton, C. Privacy-preserving Naïve Bayes classification. The VLDB Journal 17, 879–898 (2008). https://doi.org/10.1007/s00778-006-0041-y

  • DOI: https://doi.org/10.1007/s00778-006-0041-y
