
A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining

  • Keke Chen
  • Ling Liu
Part of the Advances in Database Systems book series (ADBS, volume 34)

The major challenge of data perturbation is to achieve the desired balance between the level of privacy guarantee and the level of data utility. Data privacy and data utility are commonly considered a pair of conflicting requirements in privacy-preserving data mining systems and applications. Multiplicative perturbation algorithms aim to improve data privacy while maintaining the desired level of data utility by selectively preserving task- and model-specific information during the perturbation process. Because this information is preserved, a set of “transformation-invariant data mining models” can be applied directly to the perturbed data and still achieve the required model accuracy. A multiplicative perturbation algorithm can often find multiple data transformations that preserve the required data utility, so the next major challenge is to find one that also provides a satisfactory level of privacy guarantee. In this chapter, we review three representative multiplicative perturbation methods (rotation perturbation, projection perturbation, and geometric perturbation) and discuss their technical issues and research challenges. We first describe the task- and model-specific information for a class of data mining models, and the transformations that can (approximately) preserve this information. We then discuss the design of privacy evaluation models for multiplicative perturbations and give an overview of how such a model is used to measure the level of privacy guarantee under different types of attacks.
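As a rough illustration of why distance-based (“transformation-invariant”) models such as nearest-neighbor classifiers can run directly on multiplicatively perturbed data, here is a minimal sketch. It assumes the geometric perturbation has the form G(x) = Rx + t + Δ, with R a random orthogonal matrix, t a random translation vector, and Δ independent Gaussian noise; setting t = 0 and the noise to zero gives plain rotation perturbation. All function names and parameters (`random_orthogonal`, `geometric_perturb`, `noise_sigma`, and so on) are hypothetical and chosen for illustration, not taken from the chapter.

```python
import math
import random


def random_orthogonal(d, rng):
    """Return a random d x d orthogonal matrix as a list of orthonormal rows.

    Built by modified Gram-Schmidt on i.i.d. Gaussian vectors.
    """
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:
            # Remove the component of v along each row accepted so far.
            proj = sum(vi * ri for vi, ri in zip(v, r))
            v = [vi - proj * ri for vi, ri in zip(v, r)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # retry in the (very unlikely) degenerate case
            rows.append([vi / norm for vi in v])
    return rows


def mat_vec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def geometric_perturb(records, rotation, translation, noise_sigma, rng):
    """Map each record x to R x + t + noise, noise ~ N(0, noise_sigma^2) per attribute.

    With translation = 0 and noise_sigma = 0 this is rotation perturbation.
    """
    perturbed = []
    for x in records:
        y = mat_vec(rotation, x)
        perturbed.append([yi + ti + rng.gauss(0.0, noise_sigma)
                          for yi, ti in zip(y, translation)])
    return perturbed


def euclidean(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_neighbor(train, query):
    """Index of the training record closest to query (a distance-based model)."""
    return min(range(len(train)), key=lambda i: euclidean(train[i], query))


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    data = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(20)]
    R = random_orthogonal(d, rng)
    t = [rng.uniform(-1.0, 1.0) for _ in range(d)]

    for sigma in (0.0, 0.05):
        pert = geometric_perturb(data, R, t, sigma, rng)
        # Largest change in any pairwise distance caused by the perturbation.
        drift = max(abs(euclidean(data[i], data[j]) - euclidean(pert[i], pert[j]))
                    for i in range(len(data)) for j in range(i + 1, len(data)))
        # Leave-one-out check: does each record keep the same nearest neighbor?
        same_nn = sum(
            nearest_neighbor(data[:i] + data[i + 1:], data[i])
            == nearest_neighbor(pert[:i] + pert[i + 1:], pert[i])
            for i in range(len(data)))
        print(f"noise_sigma={sigma}: max distance change {drift:.2e}, "
              f"unchanged nearest neighbors {same_nn}/{len(data)}")
```

With no noise, the rotation and translation leave every pairwise distance unchanged up to floating-point error, so a nearest-neighbor model gives the same answers on the perturbed data. Raising `noise_sigma` distorts distances slightly, which is one knob in the privacy/utility balance described above.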

Keywords

Multiplicative perturbation, random projection, sketches



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Keke Chen (1)
  • Ling Liu (2)
  1. College of Computing, Georgia Institute of Technology, Santa Monica, USA
  2. College of Computing, Georgia Institute of Technology, Arlington, USA
