Privacy-Preserving Evaluation of Generalization Error and Its Application to Model and Attribute Selection

  • Jun Sakuma
  • Rebecca N. Wright
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)


Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. Existing studies in privacy-preserving classification have focused primarily on designing privacy-preserving versions of particular data mining algorithms. However, in classification problems, preprocessing and postprocessing (such as model selection or attribute selection) play a prominent role in achieving higher classification accuracy. In this paper, we show that the generalization error of classifiers in privacy-preserving classification can be securely evaluated without sharing prediction results. Our main technical contribution is a new generalized Hamming distance protocol that is universally applicable to preprocessing and postprocessing of various privacy-preserving classification problems, such as model selection in support vector machines and attribute selection in naive Bayes classification.
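The abstract does not spell out the protocol itself. As a rough illustration of the underlying idea only: the number of misclassified test examples is the Hamming distance between the true-label vector and the prediction vector, so a two-party Hamming distance yields the empirical error without either side revealing its vector. The sketch below is a generic textbook construction using a toy Paillier (additively homomorphic) cryptosystem for binary labels, assuming semi-honest parties; it is NOT the authors' generalized Hamming distance protocol. The function names (`keygen`, `encrypt`, `decrypt`, `encrypted_labels`, `hamming_from_ciphertexts`) and the tiny fixed primes are hypothetical choices for illustration and are not secure.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic).
# WARNING: tiny fixed primes for illustration only; this is NOT secure.

def keygen(p=1009, q=1013):
    """Return a toy Paillier key pair (public, secret)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^{-1} mod n.
    mu = pow(lam, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """Enc(m) = g^m * r^n mod n^2 for a random unit r."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(pk, sk, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

# Party A (holds the true labels) encrypts them and sends only ciphertexts.
def encrypted_labels(pk, labels):
    return [encrypt(pk, y) for y in labels]

# Party B (holds the predictions) combines the ciphertexts homomorphically.
# For bits y, p:  y XOR p = y if p == 0, and 1 - y if p == 1.
def hamming_from_ciphertexts(pk, enc_labels, predictions):
    n, _ = pk
    n2 = n * n
    acc = encrypt(pk, 0)  # fresh Enc(0) re-randomizes the final ciphertext
    for c, p in zip(enc_labels, predictions):
        if p == 0:
            term = c  # Enc(y)
        else:
            term = encrypt(pk, 1) * pow(c, -1, n2) % n2  # Enc(1 - y)
        acc = acc * term % n2  # ciphertext product = plaintext sum
    return acc

if __name__ == "__main__":
    pk, sk = keygen()
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # A's private labels
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]  # B's private predictions
    c = hamming_from_ciphertexts(pk, encrypted_labels(pk, y_true), y_pred)
    errors = decrypt(pk, sk, c)  # only A holds the secret key
    print(errors, errors / len(y_true))
```

Because B returns only a single re-randomized ciphertext of the sum, A learns the error count (here 3 of 8, an error rate of 0.375) but not which test points were misclassified, while B sees only ciphertexts of A's labels.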







Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jun Sakuma (1)
  • Rebecca N. Wright (2)
  1. University of Tsukuba, Tsukuba, Japan
  2. Rutgers University, Piscataway, USA
