Abstract
Data privacy is an issue of increasing importance for big data mining, especially for micro-level data. A popular approach to protecting such data is perturbation. Techniques for recovering the statistical information of the original data from the perturbed data have therefore become indispensable in data mining.
This paper reviews and examines the existing techniques for estimating (alternatively, reconstructing) the density function of the original data from data perturbed using the additive or multiplicative noise method. Our studies show that the techniques developed for noise-added data cannot replace those developed for noise-multiplied data, even though the two types of masked data can be mutually converted through data transformation. This conclusion may be of interest to data providers.
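As a minimal sketch of the two masking schemes compared above, the Python below masks a few values with additive noise and with multiplicative noise, then applies the log transformation that turns noise-multiplied data into noise-added data on the log scale. The noise distributions (normal additive noise, uniform multiplicative noise with mean 1), the function names, and the sample values are hypothetical illustrations, not the settings used in the paper.

```python
import math
import random

def mask_additive(values, sigma, seed=0):
    """Noise-added data: y_i = x_i + e_i, with e_i ~ N(0, sigma^2) i.i.d."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in values]

def mask_multiplicative(values, low=0.8, high=1.2, seed=0):
    """Noise-multiplied data: y_i = x_i * e_i, with e_i ~ Uniform(low, high).
    The default bounds give the noise a mean of 1."""
    rng = random.Random(seed)
    return [x * rng.uniform(low, high) for x in values]

def to_additive_form(masked):
    """Log-transform positive noise-multiplied data, so that
    log y_i = log x_i + log e_i (an additive-noise model on the log scale)."""
    return [math.log(y) for y in masked]

if __name__ == "__main__":
    original = [52.0, 61.5, 47.2, 70.3]  # hypothetical positive values
    print(mask_additive(original, sigma=5.0))
    masked = mask_multiplicative(original)
    # On the log scale, log y - log x is exactly the log of the multiplicative noise.
    print([ly - math.log(x) for ly, x in zip(to_additive_form(masked), original)])
```

The transformation only changes the form of the noise model; as the abstract notes, being able to convert between the two types of masked data does not make the estimation techniques for them interchangeable.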
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
1. Some literature uses the term reconstructing instead of estimating; we use the two terms interchangeably in this paper.
2. See the discussion of the KEtal2003 Approach.
3. Other multiplicative noise distributions might be considered. Identifying the best multiplicative noise for masking the underlying data, in terms of minimising both the risk of value disclosure and the loss of utility of the original data, is a subject for future work.
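Note 3 leaves the choice of multiplicative noise open. As one hypothetical choice, the sketch below assumes lognormal noise e = exp(Z) with Z ~ N(-s^2/2, s^2), which has E[e] = 1 and makes log e exactly normal. Assuming the noise parameters are known to the data user and the noise is independent of the data, the mean and variance of log x can be estimated by subtracting the noise moments from those of log y. This is a generic moment argument for illustration only, not one of the density estimation methods reviewed in the paper; the simulated data, the parameter values, and the function names are all made up.

```python
import math
import random
import statistics

def recover_log_moments(masked, noise_log_mean, noise_log_var):
    """Estimate the mean and variance of log x from positive masked values
    y = x * e, assuming log e is independent of log x with known moments."""
    log_masked = [math.log(y) for y in masked]
    mean = statistics.fmean(log_masked) - noise_log_mean
    var = statistics.variance(log_masked) - noise_log_var
    return mean, var

def simulate(n=50_000, seed=1):
    rng = random.Random(seed)
    s_e = 0.2               # hypothetical noise scale on the log scale
    mu_e = -s_e ** 2 / 2    # E[exp(Z)] = exp(mu_e + s_e^2 / 2) = 1
    # Hypothetical original data: log x ~ N(3, 0.5^2).
    original = [math.exp(rng.gauss(3.0, 0.5)) for _ in range(n)]
    masked = [x * math.exp(rng.gauss(mu_e, s_e)) for x in original]
    return recover_log_moments(masked, mu_e, s_e ** 2)

if __name__ == "__main__":
    mean, var = simulate()
    print(f"estimated E[log x] = {mean:.3f}, Var[log x] = {var:.3f}")
```

With these made-up settings the estimates should land close to the true values 3 and 0.25; a different noise distribution would require its own log-scale moments.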
References
Agrawal, R., Srikant, R.: Privacy-preserving data mining. ACM SIGMOD Rec. 29, 439–450 (2000)
Agrawal, D., Aggarwal, C.C.: On the design and quantification of privacy preserving data mining algorithms. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 247–255. ACM (2001)
Kargupta, H., Datta, S., Wang, Q., Sivakumar, K.: On the privacy preserving properties of random data perturbation techniques. In: 2003 Third IEEE International Conference on Data Mining, ICDM 2003, pp. 99–106. IEEE (2003)
Lin, Y.X.: Density approximant based on noise multiplied data. In: Domingo-Ferrer, J. (ed.) PSD 2014. LNCS, vol. 8744, pp. 89–104. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11257-2_8
Lin, Y.X., Fielding, M.J.: MaskDensity14: an R package for the density approximant of a univariate based on noise multiplied data. SoftwareX 3, 37–43 (2015)
Lin, Y.X.: Mining the statistical information of confidential data from noise-multiplied data. In: Proceedings of the 3rd IEEE International Conference on Big Data Intelligence and Computing (2017)
Domingo-Ferrer, J., Sebé, F., Castellà-Roca, J.: On the security of noise addition for privacy in statistical databases. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 149–161. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-25955-8_12
Lin, Y.X., Mazur, L., Sarathy, R., Muralidhar, K.: Statistical information recovery from multivariate noise-multiplied data, a computational approach. Trans. Data Priv. 11, 23–45 (2018)
Kim, J.J.: A method for limiting disclosure in microdata based on random noise and transformation. In: Proceedings of the Section on Survey Research Methods, pp. 303–308. American Statistical Association (1986)
Kim, J., Winkler, W.: Multiplicative noise for masking continuous data. Statistics 2003-01 (2003)
Mivule, K.: Utilizing noise addition for data privacy, an overview. In: Proceedings of the International Conference on Information and Knowledge Engineering (IKE), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 1 (2012)
Torra, V.: Data Privacy: Foundations, New Developments and the Big Data Challenge. SBD, vol. 28. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57358-8
Nayak, T.K., Sinha, B., Zayatz, L.: Statistical properties of multiplicative noise masking for confidentiality protection. J. Off. Stat. 27(3), 527–544 (2011)
Muralidhar, K., Batra, D., Kirs, P.J.: Accessibility, security, and accuracy in statistical databases: the case for the multiplicative fixed data perturbation approach. Manag. Sci. 41(9), 1549–1564 (1995)
Provost, S.B.: Moment-based density approximants. Math. J. 9(4), 727–756 (2005)
Lin, Y.X.: A computational Bayesian approach for estimating density functions based on noise-multiplied data. Int. J. Big Data Intell. (2018). (in press)
Ma, Y., Lin, Y.X., Sarathy, R.: The vulnerability of multiplicative noise protection to correlational attacks on continuous microdata. Technical report, National Institute for Applied Statistics Research Australia, School of Mathematics and Applied Statistics, University of Wollongong, Australia (2017)
United States Census Bureau: United States census dataset (2000). Accessed 27 July 2000
Acknowledgements
Part of the R code implementing the AS2000 Approach was developed by Miss A. Fernando, supported by a Winter Project Scholarship 2016, School of Mathematics and Applied Statistics, UoW.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, YX., Krivitsky, P.N. (2018). Reviewing the Methods of Estimating the Density Function Based on Masked Data. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science, vol. 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-99771-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99770-4
Online ISBN: 978-3-319-99771-1
eBook Packages: Computer Science, Computer Science (R0)