A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms

  • Special Issue Paper
The VLDB Journal

Abstract

Privacy-preserving data mining has become increasingly popular because it allows privacy-sensitive data to be shared for analysis. However, existing techniques such as random perturbation fare poorly with Euclidean distance-based mining algorithms, which are simple, efficient, and widely used. Although the original data distributions can be reconstructed fairly accurately from the perturbed data, distances between individual data points are not preserved, which leads to poor accuracy for distance-based mining methods. Nor do these techniques generally address data reduction. Other studies, based on secure multi-party computation, often concentrate on techniques tailored to very specific mining algorithms and scenarios; they require modifying the mining algorithms and are often difficult to generalize to other algorithms or scenarios. This paper proposes a novel, generalized approach that uses the well-known energy compaction property of Fourier-related transforms to hide sensitive data values while approximately preserving Euclidean distances, to a high degree of accuracy, in both centralized and distributed scenarios. Three algorithms for selecting the most important transform coefficients are presented: one for a centralized database, one for a horizontally partitioned database, and one for a vertically partitioned database. Experimental results demonstrate the effectiveness of the proposed approach.
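
The core idea in the abstract can be illustrated with a small Python sketch. It is not one of the paper's three selection algorithms, whose details are not given on this page. It assumes an orthonormal DCT-II as the Fourier-related transform, synthetic data, and a simple hypothetical selection rule (keep the `k` coefficient positions with the largest total energy across all records). Because the transform is orthonormal, pairwise Euclidean distances are unchanged in the full coefficient space, and dropping coefficients can only shrink them, so the reduced distances approximate the originals from below.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # rescale the constant row so all rows have unit norm
    return m

def select_high_energy(coeffs, k):
    """Hypothetical rule: positions of the k coefficients with most total energy."""
    energy = (coeffs ** 2).sum(axis=0)
    return np.sort(np.argsort(energy)[::-1][:k])

def pairwise_distances(x):
    """All pairwise Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic records (an assumption): smooth rows, so energy compacts well.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16)
params = rng.normal(size=(50, 3))
data = np.array([a * np.sin(2 * np.pi * t) + b * t + c for a, b, c in params])
data += 0.01 * rng.normal(size=data.shape)

C = dct_matrix(data.shape[1])
coeffs = data @ C.T                    # transform every record
keep = select_high_energy(coeffs, 4)   # same positions for every record
reduced = coeffs[:, keep]              # the values that would be released

d_orig = pairwise_distances(data)
d_full = pairwise_distances(coeffs)
d_reduced = pairwise_distances(reduced)

retained = (reduced ** 2).sum() / (coeffs ** 2).sum()
print(f"fraction of energy retained: {retained:.4f}")
print(f"max absolute distance error: {np.abs(d_orig - d_reduced).max():.4f}")
```

Using the same coefficient positions for every record keeps the reduced vectors comparable, so a distance-based algorithm such as k-means or nearest-neighbor classification could run on `reduced` unmodified. How much privacy the truncation provides depends on what an adversary knows about the transform and the discarded coefficients, which this sketch does not model.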

Corresponding author

Correspondence to Zhiyuan Chen.

About this article

Cite this article

Mukherjee, S., Chen, Z. & Gangopadhyay, A. A privacy-preserving technique for Euclidean distance-based mining algorithms using Fourier-related transforms. The VLDB Journal 15, 293–315 (2006). https://doi.org/10.1007/s00778-006-0010-5

  • DOI: https://doi.org/10.1007/s00778-006-0010-5
