Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models

Lee, Min Cherng; Mitra, Robin; Lazaridis, Emmanuel; Lai, An Chow; Goh, Yong Kheng; Yap, Wun-She

doi:10.1007/978-3-319-40253-6_5

Conference paper
First Online: 30 June 2016

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9722)

Abstract

When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Common approaches such as adding random noise, top-coding variables, and swapping data values distort the relationships in the original data. To achieve both low disclosure risk and high utility, we consider the synthetic data approach in this paper: we release multiply imputed, partially synthetic data sets that comprise original data values, with values at high disclosure risk replaced by synthetic values. To generate such synthetic data, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. In addition, we propose a new algorithm for identifying the original data values that need to be replaced with synthetic data. With our proposed methods, data privacy can be preserved because it is difficult to identify an individual when the released synthetic data are not entirely the same as the original data. Moreover, valid inferences about the data can be made using simple combining rules that take into account the uncertainty due to the presence of synthetic values. To evaluate the performance of our proposed methods in terms of the risk of disclosure and the utility of the released synthetic data, we conduct an experiment on a data set taken from the 1987 National Indonesia Contraceptive Prevalence Survey.
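The abstract does not spell out the combining rules. As a minimal sketch, assuming they are the rules for partially synthetic data given by Reiter (2003) in the reference list, the point estimates and variance estimates from the m released data sets could be pooled as follows. The function name and its arguments are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def combine_partially_synthetic(estimates, variances):
    """Pool results from m partially synthetic data sets.

    Assumption: the combining rules are those of Reiter (2003);
    the abstract itself does not state them.

    estimates: point estimates q_i, one per synthetic data set
    variances: matching variance estimates u_i
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)  # pooled point estimate
    # Between-data-set variance of the point estimates
    b_m = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    u_bar = mean(variances)  # average within-data-set variance
    # Total variance for partially synthetic data: u_bar + b_m / m
    t_p = u_bar + b_m / m
    # Degrees of freedom of the reference t distribution
    if b_m > 0:
        df = (m - 1) * (1 + u_bar / (b_m / m)) ** 2
    else:
        df = math.inf
    return q_bar, t_p, df


q_bar, t_p, df = combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

Under this assumption, the total variance adds only b_m / m to the average within-data-set variance, which differs from the (1 + 1/m) b_m term used when multiply imputing missing data.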

References

Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88(422), 669–679 (1993)
Fienberg, S.E., McIntyre, J.: Data swapping: variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 14–29. Springer, Heidelberg (2004)
Fuller, W.A.: Masking procedures for microdata disclosure limitation. J. Official Stat. 9(2), 383–406 (1993)
Drechsler, J., Reiter, J.P.: Accounting for intruder uncertainty due to sampling when estimating identification disclosure risks in partially synthetic data. In: Domingo-Ferrer, J., Saygın, Y. (eds.) PSD 2008. LNCS, vol. 5262, pp. 227–238. Springer, Heidelberg (2008)
Lim, T.-S., Loh, W.-Y., Shih, Y.-S.: A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach. Learn. 40(3), 203–228 (2000)
Lee, M.C., Mitra, R.: Multiply imputing missing values in data sets with measurement scales using a sequence of generalised linear models. Comput. Stat. Data Anal. 95, 24–38 (2016)
Little, R.J.A.: Statistical analysis of masked data. J. Official Stat. 9(2), 407–426 (1993)
Reiter, J.P.: Inference for partially synthetic, public use microdata sets. Surv. Method. 29, 181–188 (2003)
Reiter, J.P.: Using CART to generate partially synthetic public use microdata. J. Official Stat. 21(3), 441–461 (2005)
Reiter, J.P., Mitra, R.: Estimating risks of identification disclosure in partially synthetic data. J. Priv. Confidentiality 1(1), 99–110 (2009)
Rubin, D.B.: Statistical disclosure limitation. J. Official Stat. 9(2), 461–468 (1993)
Schafer, J.L.: Analysis of incomplete multivariate data. Chapman & Hall/CRC, London (1997)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)
Woo, M.-J., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1), 111–124 (2009)


Author information

Authors and Affiliations

Universiti Tunku Abdul Rahman, Bandar Sungai Long, 43300, Kajang, Malaysia
Min Cherng Lee, An Chow Lai, Yong Kheng Goh & Wun-She Yap
Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, UK
Robin Mitra
National Institute for Cardiovascular Outcomes Research, University College London, London, UK
Emmanuel Lazaridis

Corresponding author

Correspondence to Wun-She Yap.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Joseph K. Liu
Monash University, Melbourne, Victoria, Australia
Ron Steinfeld

About this paper

Cite this paper

Lee, M.C., Mitra, R., Lazaridis, E., Lai, A.C., Goh, Y.K., Yap, W.-S. (2016). Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_5

DOI: https://doi.org/10.1007/978-3-319-40253-6_5
Published: 30 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40252-9
Online ISBN: 978-3-319-40253-6
eBook Packages: Computer Science, Computer Science (R0)
