Reconstructing Data Perturbed by Random Projections When the Mixing Matrix Is Known

  • Yingpeng Sang
  • Hong Shen
  • Hui Tian
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5782)


Random Projection (\(\mathcal{RP}\)) has drawn great interest in privacy-preserving data mining research because of its high efficiency and security. It was proposed in [27]: the original data set, composed of m attributes, is multiplied by a k×m mixing matrix (m > k) that is random and orthogonal on expectation, and the resulting k series of perturbed data are released for mining purposes. To our knowledge, little work has been done from the attacker's point of view, namely on reconstructing the original data to obtain sensitive information, given the data perturbed by \(\mathcal{RP}\) and some prior knowledge, e.g. the mixing matrix and the means and variances of the original data. When the attributes of the original data are mutually independent and sparse, the reconstruction can be treated as a problem of Underdetermined Independent Component Analysis (UICA), but UICA suffers from permutation and scaling ambiguities. In this paper we propose a reconstruction framework based on UICA, together with techniques to reduce these ambiguities. Cases in which the attributes of the original data are correlated and not sparse are also common in data mining. For the typical such case, the Multivariate Gaussian Distribution, we propose a reconstruction method based on Maximum A Posteriori (MAP) estimation. Our experiments show that our reconstructions achieve high recovery rates and outperform reconstructions based on Principal Component Analysis (PCA).
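The Gaussian case described above can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that the mixing matrix R has i.i.d. N(0, 1/k) entries (one common way to make it orthogonal on expectation), that the attacker knows R, the mean vector mu and the full covariance matrix Sigma, and that the release is noise-free. Under these assumptions the MAP estimate coincides with the Gaussian conditional mean, mu + Sigma Rᵀ (R Sigma Rᵀ)⁻¹ (y − R mu). The function names `random_projection` and `map_reconstruct` are hypothetical.

```python
import numpy as np


def random_projection(X, k, rng):
    """Perturb an (m, n) data matrix X with a random (k, m) mixing matrix, k < m."""
    m = X.shape[0]
    # Assumed construction: i.i.d. N(0, 1/k) entries, so that E[R.T @ R] = I_m,
    # i.e. R is orthogonal on expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
    return R, R @ X


def map_reconstruct(Y, R, mu, Sigma):
    """MAP estimate of X from Y = R @ X under the prior x ~ N(mu, Sigma).

    For a Gaussian prior and a noise-free linear observation, the posterior is
    Gaussian, so its mode equals its mean:
        x_hat = mu + Sigma R^T (R Sigma R^T)^{-1} (y - R mu)
    """
    S = R @ Sigma @ R.T                    # (k, k) covariance of the released data
    residual = Y - (R @ mu)[:, None]       # centre each released column
    return mu[:, None] + Sigma @ R.T @ np.linalg.solve(S, residual)


# Illustrative data only: correlated Gaussian attributes with known mu and Sigma.
rng = np.random.default_rng(0)
m, k, n = 6, 3, 1000
A = rng.normal(size=(m, m))
Sigma = A @ A.T + m * np.eye(m)            # positive definite by construction
mu = rng.normal(size=m)
X = rng.multivariate_normal(mu, Sigma, size=n).T   # shape (m, n)

R, Y = random_projection(X, k, rng)
X_hat = map_reconstruct(Y, R, mu, Sigma)

# The estimate reproduces the released data exactly; what it cannot recover
# is the component of X lying in the null space of R.
print("consistent with release:", np.allclose(R @ X_hat, Y))
print("mean squared error:", np.mean((X_hat - X) ** 2))
```

The sketch solves a k×k linear system rather than forming an explicit inverse, which tends to be numerically safer. It requires R Sigma Rᵀ to be invertible, which holds when Sigma is positive definite and R has full row rank.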


Privacy-preserving Data Mining; Data Perturbation; Data Reconstruction; Underdetermined Independent Component Analysis; Maximum A Posteriori; Principal Component Analysis


  1. Adam, N., Worthmann, J.: Security-control methods for statistical databases: a comparative study. ACM Computing Surveys 21(4), 515–556 (1989)
  2. Aggarwal, C., Yu, P.S. (eds.): Privacy-Preserving Data Mining: Models and Algorithms. Springer, Heidelberg (2008)
  3. Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proc. of the 2000 ACM SIGMOD Conference on Management of Data, pp. 439–450. ACM, New York (2000)
  4. Agrawal, S., Haritsa, J.R.: A Framework for High-Accuracy Privacy-Preserving Mining. In: Proc. 21st Int’l Conf. Data Eng. (ICDE 2005), pp. 193–204 (2005)
  5. Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., Verykios, V.: Disclosure Limitation of Sensitive Rules. In: Proc. of IEEE Knowledge and Data Engineering Workshop, pp. C45–C52 (1999)
  6. Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal Processing 81(11), 2353–2362 (2001)
  7. Cao, X., Liu, R.: General Approach to Blind Source Separation. IEEE Transactions on Signal Processing 44(3), 562–571 (1996)
  8. Chen, K., Sun, G., Liu, L.: Towards Attack-resilient Geometric Data Perturbation. In: Proceedings of the 2007 SIAM International Conference on Data Mining (SDM 2007), Minneapolis, MN (April 2007)
  9. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic Decomposition by Basis Pursuit. SIAM Review 43(1), 129–159 (2001)
  10. Dalenius, T., Reiss, S.P.: Data-swapping: A Technique for Disclosure Control. Journal of Statistical Planning and Inference 6, 73–85 (1982)
  11. Dasgupta, S., Hsu, D., Verma, N.: A Concentration Theorem for Projections. In: Proc. the 22nd Conference in Uncertainty in Artificial Intelligence, pp. 1–17. AUAI Press (2006)
  12. Evfimievski, A., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proc. 22nd ACM Symposium on Principles of Database Systems (PODS 2003), pp. 211–222 (2003)
  13. Fienberg, S.E., McIntyre, J.: Data Swapping: Variations on a Theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 14–29. Springer, Heidelberg (2004)
  14. Goldreich, O.: Foundations of Cryptography: Basic Applications, vol. 2. Cambridge University Press, Cambridge (2004)
  15. Gretton, A., Fukumizu, K., Teo, C., Song, L., Scholkopf, B., Smola, A.: A Kernel Statistical Test of Independence. In: Proc. Advances in Neural Information Processing Systems (NIPS 2007), pp. 585–592. MIT Press, Cambridge (2007)
  16. Guo, S., Wu, X.: Deriving private information from arbitrarily projected data. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 84–95. Springer, Heidelberg (2007)
  17. Huang, Z., Du, W., Chen, B.: Deriving Private Information from Randomized Data. In: SIGMOD 2005, pp. 37–48. ACM, New York (2005)
  18. Hyvärinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications. Neural Networks 13, 411–430 (2000)
  19. Jha, S., Kruger, L., McDaniel, P.: Privacy Preserving Clustering. In: de di Vimercati, S.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)
  20. Kantarcioglu, M., Clifton, C.: Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE Transactions on Knowledge and Data Engineering 16(9), 1026–1037 (2004)
  21. Kankainen, A., Ushakov, N.: A consistent modification of a test for independence based on the empirical characteristic function. Journal of Mathematical Sciences, 1–10 (1998)
  22. Kargupta, H., Datta, S., Wang, Q., Sivakumar, K.: On the privacy preserving properties of random data perturbation techniques. In: Proc. 3rd IEEE International Conference on Data Mining (ICDM 2003), p. 99 (2003)
  23. Lefons, E., Silvestri, A., Tangorra, F.: An analytic approach to statistical databases. In: Proceedings of the 9th VLDB Conference (1983)
  24. Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and l-diversity. In: Proc. ICDE 2007, pp. 106–115 (2007)
  25. Liew, C.K., Choi, U.J., Liew, C.J.: A data distortion by probability distribution. ACM Transactions on Database Systems 10(3), 395–411 (1985)
  26. Lindell, Y., Pinkas, B.: Privacy Preserving Data Mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000)
  27. Liu, K., Kargupta, H., Ryan, J.: Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data Engineering 18(1), 92–106 (2006)
  28. Liu, K., Giannella, C., Kargupta, H.: An Attacker’s View of Distance Preserving Maps for Privacy Preserving Data Mining. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 297–308. Springer, Heidelberg (2006)
  29. Liu, K.: Multiplicative Data Perturbation for Privacy Preserving Data Mining. PhD thesis, University of Maryland, Baltimore County, Baltimore, MD (January 2007)
  30. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-Diversity: Privacy Beyond k-Anonymity. In: Proc. of ICDE 2006, p. 24 (2006)
  31. O’Grady, P.D., Pearlmutter, B.A., Rickard, S.T.: Survey of Sparse and Non-Sparse Methods in Source Separation. International Journal of Imaging Systems and Technology 15(1), 18–33 (2005)
  32. Oliveira, S.R.M., Zaïane, O.R.: A privacy-preserving clustering approach toward secure and effective data analysis for business collaboration. Computers & Security 26(1), 81–93 (2007)
  33. Rizvi, S., Haritsa, J.: Maintaining Data Privacy in Association Rule Mining. In: Proc. of 28th Intl. Conf. on Very Large Databases (VLDB) (August 2002)
  34. Saygin, Y., Verykios, V.S., Clifton, C.: Using unknowns to prevent discovery of association rules. ACM SIGMOD Record 30(4), 45–54 (2001)
  35. Sweeney, L.: k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002)
  36. Szekely, G.J., Rizzo, M.L.: Testing for Equal Distributions in High Dimension, InterStat, November (5)
  37. Theis, F.J., Lang, E.W., Puntonet, C.G.: A Geometric Algorithm for Overcomplete Linear ICA. Neurocomputing 56, 381–398 (2004)
  38. Turgay, E.O., Pedersen, T.B., Saygin, Y., Savas, E., Levi, A.: Disclosure Risks of Distance Preserving Data Transformations. In: Ludäscher, B., Mamoulis, N. (eds.) SSDBM 2008. LNCS, vol. 5069, pp. 79–94. Springer, Heidelberg (2008)
  39. Verykios, V., Elmagarmid, A., Elisa, B., Elena, D., Saygin, Y., Dasseni, E.: Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering 16(4), 434–447 (2004)
  40. Yang, Z., Zhong, S., Wright, R.N.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. In: Proc. of the 2005 SIAM International Conference on Data Mining, SDM (2005)
  41. Zibulevsky, M., Pearlmutter, B.A.: Blind Source Separation by Sparse Decomposition in a Signal Dictionary. Neural Computation 13(4), 863–882 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Yingpeng Sang (1)
  • Hong Shen (1)
  • Hui Tian (2)
  1. School of Computer Science, The University of Adelaide, Australia
  2. School of Mathematical Science, The University of Adelaide, Australia
