Knowledge and Information Systems, Volume 14, Issue 2, pp 161–178

Privacy-preserving SVM classification

Regular Paper

Abstract

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.
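
As an illustration of why a global model can be built without pooling raw records, consider the vertically partitioned case with a linear kernel: each party holds different features of the same records, and the global Gram (kernel) matrix is the sum of the parties' local Gram matrices. The sketch below is a minimal toy under stated assumptions, not the protocol of this paper: the data, the function names (`local_gram`, `add_matrices`, `masked_sum`), the modulus, and the single random mask are all hypothetical, and the masking assumes non-negative integer features and parties that follow the protocol honestly.

```python
import random

def local_gram(rows):
    """Linear-kernel Gram matrix computed from one party's feature slice."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def add_matrices(m1, m2, modulus):
    """Entry-wise sum of two equally sized matrices, reduced modulo `modulus`."""
    return [[(x + y) % modulus for x, y in zip(r1, r2)]
            for r1, r2 in zip(m1, m2)]

def masked_sum(local_matrices, modulus=2**61 - 1, rng=None):
    """Toy masked sum (hypothetical, for illustration only).

    The initiator starts from a random mask, each party adds its local
    matrix to the running total, and the initiator finally subtracts the
    mask. Intermediate totals are offset by the mask, so they do not
    directly show any single party's matrix.
    """
    rng = rng or random.Random()
    n = len(local_matrices[0])
    mask = [[rng.randrange(modulus) for _ in range(n)] for _ in range(n)]
    running = mask
    for matrix in local_matrices:
        running = add_matrices(running, matrix, modulus)
    return [[(v - k) % modulus for v, k in zip(row, mask_row)]
            for row, mask_row in zip(running, mask)]

# Hypothetical toy data: three records, vertically partitioned.
party_a = [[1, 2], [0, 1], [3, 1]]   # party A holds features 1 and 2
party_b = [[4], [2], [1]]            # party B holds feature 3
pooled = [a + b for a, b in zip(party_a, party_b)]  # never formed in practice

global_gram = masked_sum([local_gram(party_a), local_gram(party_b)],
                         rng=random.Random(0))
print(global_gram == local_gram(pooled))
```

Only Gram matrices are exchanged here, never feature values, and an SVM with a linear kernel can be trained from the Gram matrix and the labels alone. A real deployment would need a protocol with stated security guarantees in place of this toy mask.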

Keywords

Support vector machine, Classification, Privacy, Security

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  1. Management Science and Information Systems Department, Rutgers University, Newark, USA
  2. Department of Computer Science, University of Iowa, Iowa City, USA