Abstract
When multiplicative noise is used to protect the values of a sensitive attribute in microdata, it is frequently assumed that data intruders use the noise-multiplied value to estimate the corresponding unobservable original value of a target record. In this paper, we show that data intruders could easily construct a different estimate, instead of using the noise-multiplied value, to attack an original value. The new estimate, termed the “correlation-attack” estimate, is obtained by exploiting the potentially high correlation between the noise-multiplied data and the original data. We compare the two estimates (the noise-multiplied value and the correlation-attack estimate) in detail through the mean squared errors of the two underlying estimators, and we recommend that data providers always assess the disclosure risks from both estimators when generating noise-multiplied data. Correspondingly, we propose a disclosure risk measure that data providers could use to select the noise-generating variable during the data masking stage. A simulation study illustrates how the disclosure risk measure could help with selecting the noise-generating variable for masking a set of original data.
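The abstract does not give the exact form of the correlation-attack estimate. As a minimal sketch, assuming it behaves like a linear predictor of the original value built from moments an intruder can recover from the masked data, and assuming the intruder knows the noise has mean 1 and a known standard deviation, the Python below compares the empirical mean squared error of the two estimators on simulated data. The function names (`mask`, `correlation_attack`, `mse`), the normal noise model, and the log-normal test data are hypothetical choices for illustration, not the authors' method.

```python
import random
import statistics


def mask(xs, noise_sd, rng):
    """Multiply each original value by independent noise e ~ N(1, noise_sd^2).

    Hypothetical noise model: any noise with E[e] = 1 fits this sketch.
    """
    return [x * rng.gauss(1.0, noise_sd) for x in xs]


def correlation_attack(ys, noise_sd):
    """Hypothetical linear predictor of each original x from its masked value y.

    Assumes the intruder knows E[e] = 1 and Var(e) = noise_sd**2, and that the
    noise is independent of the original data.
    """
    mu = statistics.fmean(ys)            # E[Y] = E[X] because E[e] = 1
    var_y = statistics.pvariance(ys)     # Var(Y), estimated from masked data
    # E[Y^2] = E[X^2] * E[e^2] = E[X^2] * (1 + noise_sd**2)
    ex2 = statistics.fmean(y * y for y in ys) / (1.0 + noise_sd ** 2)
    var_x = ex2 - mu ** 2                # Var(X) = E[X^2] - E[X]^2
    slope = var_x / var_y                # Cov(X, Y) / Var(Y), with Cov(X, Y) = Var(X)
    return [mu + slope * (y - mu) for y in ys]


def mse(estimates, truth):
    """Empirical mean squared error of estimates against the original values."""
    return statistics.fmean((a - b) ** 2 for a, b in zip(estimates, truth))


if __name__ == "__main__":
    rng = random.Random(0)
    noise_sd = 0.3
    # Hypothetical skewed, positive "sensitive attribute".
    xs = [rng.lognormvariate(3.0, 0.5) for _ in range(20000)]
    ys = mask(xs, noise_sd, rng)
    print("MSE of noise-multiplied value:", mse(ys, xs))
    print("MSE of correlation-attack sketch:", mse(correlation_attack(ys, noise_sd), xs))
```

Because the identity map y ↦ y is itself a linear function of y, the population best linear predictor can never have a larger mean squared error than the noise-multiplied value. When the moments are estimated from a finite masked sample, this guarantee holds only approximately.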
Acknowledgements
We thank all anonymous reviewers for their careful readings and constructive comments on the paper. This research has been conducted with the support of the Australian Government Research Training Program Scholarship.
Cite this article
Ma, Y., Lin, YX. & Sarathy, R. The Vulnerability of Multiplicative Noise Protection to Correlation-Attacks on Continuous Microdata. Sankhya B 82, 305–327 (2020). https://doi.org/10.1007/s13571-019-00191-0
Keywords
- Data confidentiality
- Noise multiplication masking
- Continuous microdata
- Disclosure risk
- Attacking strategy