The VLDB Journal

Volume 17, Issue 4, pp 879–898

Privacy-preserving Naïve Bayes classification

  • Jaideep Vaidya
  • Murat Kantarcıoğlu
  • Chris Clifton
Regular Paper

Abstract

Privacy-preserving data mining, which develops models without seeing the underlying data, is receiving growing attention. This paper assumes a privacy-preserving distributed data mining scenario: data sources collaborate to develop a global model but must not disclose their data to one another. Secure distributed classification is an important instance of this problem. In many situations, data is split among multiple organizations that want to use all of it to build more accurate predictive models while revealing neither their training data nor the instances to be classified. Naïve Bayes is often used as a baseline classifier and consistently provides reasonable classification performance. This paper brings privacy preservation to that baseline, presenting protocols to develop a Naïve Bayes classifier on both vertically and horizontally partitioned data.
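
To make the horizontally partitioned setting concrete, the sketch below shows one way parties that each hold different rows could pool the counts a Naïve Bayes classifier needs without exchanging the rows themselves. It is an illustration only and is not taken from the paper's protocols: the ring-based masked secure sum, the semi-honest (non-colluding) parties, the publicly known class labels, the function names, and the toy weather data are all assumptions made for this example. The parties are simulated inside one process, and the sketch reveals the global counts, which a full protocol may also want to hide.

```python
import random
from collections import Counter

MODULUS = 2**61 - 1  # assumed modulus, far larger than any realistic count


def secure_sum(local_values, modulus=MODULUS, rng=random):
    """Simulate a ring-based secure sum (illustrative assumption).

    The initiating party adds a random mask to its value, every other
    party adds its own value to the running total, and the initiator
    finally subtracts the mask. Intermediate totals look random.
    """
    mask = rng.randrange(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def local_counts(rows):
    """Count class labels and (attribute index, value, label) triples locally."""
    class_counts, feature_counts = Counter(), Counter()
    for features, label in rows:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(index, value, label)] += 1
    return class_counts, feature_counts


def global_count(tables, key):
    """Combine one count across all parties; missing keys count as 0."""
    return secure_sum([table[key] for table in tables])


def classify(parties, instance):
    """Pick the label maximizing P(label) * prod_i P(value_i | label).

    No smoothing is applied, so an unseen value zeroes out that label.
    """
    tables = [local_counts(rows) for rows in parties]
    class_tables = [t[0] for t in tables]
    feature_tables = [t[1] for t in tables]
    labels = sorted({label for t in class_tables for label in t})
    total = sum(global_count(class_tables, label) for label in labels)
    best_label, best_score = None, -1.0
    for label in labels:
        n_label = global_count(class_tables, label)
        score = n_label / total
        for index, value in enumerate(instance):
            score *= global_count(feature_tables, (index, value, label)) / n_label
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: each party holds different rows over the same attributes.
party_a = [(("sunny", "hot"), "no"), (("rain", "mild"), "yes")]
party_b = [(("sunny", "mild"), "no"), (("rain", "hot"), "yes"), (("rain", "mild"), "yes")]

prediction = classify([party_a, party_b], ("rain", "mild"))
```

Only masked running totals pass between parties in this scheme, so no party sees another's individual counts as long as neighbours in the ring do not collude. Vertically partitioned data, where parties hold different attributes of the same rows, needs different building blocks and is not covered by this sketch.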

Keywords

Data mining, Privacy, Security, Naïve Bayes, Distributed computing

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Jaideep Vaidya (1)
  • Murat Kantarcıoğlu (2)
  • Chris Clifton (3)
  1. Rutgers University, Newark, USA
  2. University of Texas at Dallas, Dallas, USA
  3. Purdue University, West Lafayette, USA
