Abstract
When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Common approaches such as adding random noise, top-coding variables, and swapping data values distort the relationships in the original data. To achieve both goals, we consider the synthetic data approach, in which we release multiply imputed, partially synthetic data sets that comprise the original data values, with values at high disclosure risk replaced by synthetic values. To generate such synthetic data, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. In addition, we propose a new algorithm for identifying the original data values that need to be replaced with synthetic data. With our proposed methods, data privacy can be preserved, since individuals are difficult to identify when the released synthetic data are not entirely similar to the original data. Moreover, valid inferences about the data can be made using simple combining rules that take into account the uncertainty due to the presence of synthetic values. To evaluate the performance of our proposed methods in terms of the risk of disclosure and the utility of the released synthetic data, we conduct an experiment on a data set taken from the 1987 National Indonesia Contraceptive Prevalence Survey.
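The abstract does not spell out the combining rules. As a minimal sketch, assuming they are the rules for partially synthetic data given by Reiter (2003) in the reference list, the point estimates q_i and variance estimates u_i from m synthetic data sets could be combined as follows. The function name and return layout are hypothetical and chosen only for illustration.

```python
from math import inf
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine per-data-set results, assuming Reiter's (2003) rules.

    estimates: point estimates q_i, one per synthetic data set
    variances: estimated variances u_i of those point estimates
    Returns (q_bar, t_p, df). Hypothetical helper, not the authors' code.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two data sets and matching lengths")

    q_bar = mean(estimates)    # combined point estimate
    b_m = variance(estimates)  # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)    # average within-data-set variance

    # Variance estimate for partially synthetic data: u_bar + b_m / m
    t_p = u_bar + b_m / m

    # Degrees of freedom of the reference t distribution
    df = (m - 1) * (1 + u_bar / (b_m / m)) ** 2 if b_m > 0 else inf
    return q_bar, t_p, df
```

An interval estimate would then use q_bar plus or minus a t quantile with df degrees of freedom times the square root of t_p.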
References
Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88(422), 669–679 (1993)
Fienberg, S.E., McIntyre, J.: Data swapping: variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 14–29. Springer, Heidelberg (2004)
Fuller, W.A.: Masking procedures for microdata disclosure limitation. J. Official Stat. 9(2), 383–406 (1993)
Drechsler, J., Reiter, J.P.: Accounting for intruder uncertainty due to sampling when estimating identification disclosure risks in partially synthetic data. In: Domingo-Ferrer, J., Saygın, Y. (eds.) PSD 2008. LNCS, vol. 5262, pp. 227–238. Springer, Heidelberg (2008)
Lim, T.-S., Loh, W.-Y., Shih, Y.-S.: A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach. Learn. 40(3), 203–228 (2000)
Lee, M.C., Mitra, R.: Multiply imputing missing values in data sets with measurement scales using a sequence of generalised linear models. Comput. Stat. Data Anal. 95, 24–38 (2016)
Little, R.J.A.: Statistical analysis of masked data. J. Official Stat. 9(2), 407–426 (1993)
Reiter, J.P.: Inference for partially synthetic, public use microdata sets. Surv. Method. 29, 181–188 (2003)
Reiter, J.P.: Using CART to generate partially synthetic public use microdata. J. Official Stat. 21(3), 441–461 (2005)
Reiter, J.P., Mitra, R.: Estimating risks of identification disclosure in partially synthetic data. J. Priv. Confidentiality 1(1), 99–110 (2009)
Rubin, D.B.: Statistical disclosure limitation. J. Official Stat. 9(2), 461–468 (1993)
Schafer, J.L.: Analysis of incomplete multivariate data. Chapman & Hall/CRC, London (1997)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)
Woo, M.-J., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1), 111–124 (2009)
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lee, M.C., Mitra, R., Lazaridis, E., Lai, A.C., Goh, Y.K., Yap, W.-S. (2016). Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_5
DOI: https://doi.org/10.1007/978-3-319-40253-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40252-9
Online ISBN: 978-3-319-40253-6
eBook Packages: Computer Science (R0)