Reconstructing Data Perturbed by Random Projections When the Mixing Matrix Is Known

Sang, Yingpeng; Shen, Hong; Tian, Hui

doi:10.1007/978-3-642-04174-7_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5782)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Random Projection (\(\mathcal{RP}\)) has drawn great interest in privacy-preserving data mining research because of its high efficiency and security. It was proposed in [27]: the original data set, composed of m attributes, is multiplied by a k×m mixing matrix (m > k) that is random and orthogonal in expectation, and the resulting k series of perturbed data are released for mining purposes. To our knowledge, little work has been done from the attacker's point of view, i.e. reconstructing the original data to obtain sensitive information, given the data perturbed by \(\mathcal{RP}\) and some a priori knowledge, e.g. the mixing matrix and the means and variances of the original data. When the attributes of the original data are mutually independent and sparse, the reconstruction can be treated as a problem of Underdetermined Independent Component Analysis (UICA), but UICA suffers from permutation and scaling ambiguities. In this paper we propose a reconstruction framework based on UICA, together with techniques to reduce these ambiguities. Cases in which the attributes of the original data are correlated and not sparse are also common in data mining. For the typical case of a Multivariate Gaussian Distribution, we also propose a reconstruction method based on Maximum A Posteriori (MAP) estimation. Our experiments show that our reconstructions achieve high recovery rates and outperform reconstructions based on Principal Component Analysis (PCA).
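The two attack settings in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation and rests on stated assumptions: each record x (m attributes) is released as y = R x, where R is assumed to have i.i.d. N(0, 1/k) entries (one common choice that makes R orthogonal in expectation, E[RᵀR] = I); for the correlated Gaussian case the attacker is assumed to know R, the mean μ and the full covariance Σ, and the MAP estimate under the constraint y = R x is x̂ = μ + ΣRᵀ(RΣRᵀ)⁻¹(y − Rμ); for the sparse case, minimum-L1 recovery (basis pursuit, cf. [6], [9]) is swapped in as a simple stand-in for the source-recovery step, not the paper's UICA framework. All function names are hypothetical.

```python
import itertools
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.

    Returns None if A is (numerically) singular.
    """
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def random_projection(k, m, rng):
    """Assumed mixing matrix: k x m, i.i.d. N(0, 1/k) entries, so E[R^T R] = I_m."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(m)] for _ in range(k)]


def perturb(R, x):
    """Release y = R x: k perturbed values for one m-attribute record x."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]


def map_gaussian(R, mu, Sigma, y):
    """MAP estimate of x ~ N(mu, Sigma) given noise-free y = R x.

    x_hat = mu + Sigma R^T (R Sigma R^T)^{-1} (y - R mu)
    """
    SRt = matmul(Sigma, transpose(R))  # m x k
    G = matmul(R, SRt)                 # k x k, invertible if Sigma is PD and rank(R) = k
    resid = [yi - ri for yi, ri in zip(y, perturb(R, mu))]
    w = solve(G, resid)
    return [mu[i] + sum(SRt[i][j] * w[j] for j in range(len(w))) for i in range(len(mu))]


def l1_sparse(R, y):
    """Minimum-L1 solution of R x = y (basis pursuit) by enumerating basic solutions.

    Exact when R has full row rank k, because an optimal vertex of the L1 linear
    program has at most k nonzeros; the enumeration is only practical for small m.
    """
    k, m = len(R), len(R[0])
    best, best_norm = None, math.inf
    for cols in itertools.combinations(range(m), k):
        sub = [[R[i][j] for j in cols] for i in range(k)]
        xs = solve(sub, y)
        if xs is None:
            continue
        norm = sum(abs(v) for v in xs)
        if norm < best_norm:
            x = [0.0] * m
            for j, v in zip(cols, xs):
                x[j] = v
            best, best_norm = x, norm
    return best
```

For example, with R = [[1, 1]], μ = [0, 0], Σ = diag(1, 3) and y = [2], `map_gaussian` returns [0.5, 1.5]: the observed sum is split in proportion to the prior variances. Any MAP estimate reproduces the released data exactly (R x̂ = y), so the quality of the reconstruction comes entirely from how well the assumed prior matches the original data.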

This work is partially supported by Australian Research Council Discovery Project grant #DP0985063.

References

[1] Adam, N., Worthmann, J.: Security-control methods for statistical databases: a comparative study. ACM Computing Surveys 21(4), 515–556 (1989)
[2] Aggarwal, C., Yu, P.S. (eds.): Privacy-Preserving Data Mining: Models and Algorithms. Springer, Heidelberg (2008)
[3] Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proc. of the 2000 ACM SIGMOD Conference on Management of Data, pp. 439–450. ACM, New York (2000)
[4] Agrawal, S., Haritsa, J.R.: A Framework for High-Accuracy Privacy-Preserving Mining. In: Proc. 21st Int’l Conf. Data Eng. (ICDE 2005), pp. 193–204 (2005)
[5] Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., Verykios, V.: Disclosure Limitation of Sensitive Rules. In: Proc. of IEEE Knowledge and Data Engineering Workshop, pp. C45–C52 (1999)
[6] Bofill, P., Zibulevsky, M.: Underdetermined blind source separation using sparse representations. Signal Processing 81(11), 2353–2362 (2001)
[7] Cao, X., Liu, R.: General Approach to Blind Source Separation. IEEE Transactions on Signal Processing 44(3), 562–571 (1996)
[8] Chen, K., Sun, G., Liu, L.: Towards Attack-resilient Geometric Data Perturbation. In: Proceedings of the 2007 SIAM International Conference on Data Mining (SDM 2007), Minneapolis, MN (April 2007)
[9] Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic Decomposition by Basis Pursuit. SIAM Review 43(1), 129–159 (2001)
[10] Dalenius, T., Reiss, S.P.: Data-swapping: A Technique for Disclosure Control. Journal of Statistical Planning and Inference 6, 73–85 (1982)
[11] Dasgupta, S., Hsu, D., Verma, N.: A Concentration Theorem for Projections. In: Proc. the 22nd Conference in Uncertainty in Artificial Intelligence, pp. 1–17. AUAI Press (2006)
[12] Evfimievski, A., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proc. 22nd ACM Symposium on Principles of Database Systems (PODS 2003), pp. 211–222 (2003)
[13] Fienberg, S.E., McIntyre, J.: Data Swapping: Variations on a Theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 14–29. Springer, Heidelberg (2004)
[14] Goldreich, O.: Foundations of Cryptography: Basic Applications, vol. 2. Cambridge University Press, Cambridge (2004)
[15] Gretton, A., Fukumizu, K., Teo, C., Song, L., Scholkopf, B., Smola, A.: A Kernel Statistical Test of Independence. In: Proc. Advances in Neural Information Processing Systems (NIPS 2007), pp. 585–592. MIT Press, Cambridge (2007)
[16] Guo, S., Wu, X.: Deriving private information from arbitrarily projected data. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 84–95. Springer, Heidelberg (2007)
[17] Huang, Z., Du, W., Chen, B.: Deriving Private Information from Randomized Data. In: SIGMOD 2005, pp. 37–48. ACM, New York (2005)
[18] Hyvärinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications. Neural Networks 13, 411–430 (2000)
[19] Jha, S., Kruger, L., McDaniel, P.: Privacy Preserving Clustering. In: de di Vimercati, S.C., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 397–417. Springer, Heidelberg (2005)
[20] Kantarcioglu, M., Clifton, C.: Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. IEEE Transactions on Knowledge and Data Engineering 16(9), 1026–1037 (2004)
[21] Kankainen, A., Ushakov, N.: A consistent modification of a test for independence based on the empirical characteristic function. Journal of Mathematical Sciences, 1–10 (1998)
[22] Kargupta, H., Datta, S., Wang, Q., Sivakumar, K.: On the privacy preserving properties of random data perturbation techniques. In: Proc. 3rd IEEE International Conference on Data Mining (ICDM 2003), p. 99 (2003)
[23] Lefons, E., Silvestri, A., Tangorra, F.: An analytic approach to statistical databases. In: Proceedings of the 9th VLDB Conference (1983)
[24] Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and l-diversity. In: Proc. ICDE 2007, pp. 106–115 (2007)
[25] Liew, C.K., Choi, U.J., Liew, C.J.: A data distortion by probability distribution. ACM Transactions on Database Systems 10(3), 395–411 (1985)
[26] Lindell, Y., Pinkas, B.: Privacy Preserving Data Mining. In: Bellare, M. (ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 36–54. Springer, Heidelberg (2000)
[27] Liu, K., Kargupta, H., Ryan, J.: Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Transactions on Knowledge and Data Engineering 18(1), 92–106 (2006)
[28] Liu, K., Giannella, C., Kargupta, H.: An Attacker’s View of Distance Preserving Maps for Privacy Preserving Data Mining. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 297–308. Springer, Heidelberg (2006)
[29] Liu, K.: Multiplicative Data Perturbation for Privacy Preserving Data Mining. PhD thesis, University of Maryland, Baltimore County, Baltimore, MD (January 2007)
[30] Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-Diversity: Privacy Beyond k-Anonymity. In: Proc. of ICDE 2006, p. 24 (2006)
[31] O’Grady, P.D., Pearlmutter, B.A., Rickard, S.T.: Survey of Sparse and Non-Sparse Methods in Source Separation. International Journal of Imaging Systems and Technology 15(1), 18–33 (2005)
[32] Oliveira, S.R.M., Zaïane, O.R.: A privacy-preserving clustering approach toward secure and effective data analysis for business collaboration. Computers & Security 26(1), 81–93 (2007)
[33] Rizvi, S., Haritsa, J.: Maintaining Data Privacy in Association Rule Mining. In: Proc. of 28th Intl. Conf. on Very Large Databases (VLDB) (August 2002)
[34] Saygin, Y., Verykios, V.S., Clifton, C.: Using unknowns to prevent discovery of association rules. ACM SIGMOD Record 30(4), 45–54 (2001)
[35] Sweeney, L.: k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002)
[36] Szekely, G.J., Rizzo, M.L.: Testing for Equal Distributions in High Dimension, InterStat, November (5)
[37] Theis, F.J., Lang, E.W., Puntonet, C.G.: A Geometric Algorithm for Overcomplete Linear ICA. Neurocomputing 56, 381–398 (2004)
[38] Turgay, E.O., Pedersen, T.B., Saygin, Y., Savas, E., Levi, A.: Disclosure Risks of Distance Preserving Data Transformations. In: Ludäscher, B., Mamoulis, N. (eds.) SSDBM 2008. LNCS, vol. 5069, pp. 79–94. Springer, Heidelberg (2008)
[39] Verykios, V., Elmagarmid, A., Bertino, E., Saygin, Y., Dasseni, E.: Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering 16(4), 434–447 (2004)
[40] Yang, Z., Zhong, S., Wright, R.N.: Privacy-Preserving Classification of Customer Data without Loss of Accuracy. In: Proc. of the 2005 SIAM International Conference on Data Mining, SDM (2005)
[41] Zibulevsky, M., Pearlmutter, B.A.: Blind Source Separation by Sparse Decomposition in a Signal Dictionary. Neural Computation 13(4), 863–882 (2001)

Author information

Authors and Affiliations

School of Computer Science, The University of Adelaide, SA, 5005, Australia
Yingpeng Sang & Hong Shen
School of Mathematical Science, The University of Adelaide, SA, 5005, Australia
Hui Tian

Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić
The Centre for Computational Statistics and Machine Learning, Department of Computer Science, University College London, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

About this paper

Cite this paper

Sang, Y., Shen, H., Tian, H. (2009). Reconstructing Data Perturbed by Random Projections When the Mixing Matrix Is Known. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_22

DOI: https://doi.org/10.1007/978-3-642-04174-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7
eBook Packages: Computer Science, Computer Science (R0)