Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9722)

Abstract

When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Common approaches such as adding random noise, top-coding variables and swapping data values distort the relationships in the original data. To balance low disclosure risk against high utility, we consider the synthetic data approach, in which we release multiply imputed, partially synthetic data sets that comprise original data values, with values at high disclosure risk replaced by synthetic values. To generate such synthetic data, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. In addition, we propose a new algorithm for identifying the original data values that need to be replaced with synthetic data. With our proposed methods, data privacy can be preserved, since individuals are difficult to identify when the released synthetic data are not entirely the same as the original data. Moreover, valid inferences about the data can be made using simple combining rules that take into account the uncertainty due to the presence of synthetic values. To evaluate the performance of our proposed methods in terms of the risk of disclosure and the utility of the released synthetic data, we conduct an experiment on a dataset taken from the 1987 National Indonesia Contraceptive Prevalence Survey.
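
The abstract does not spell out the combining rules. As a minimal sketch, assuming they are the rules for partially synthetic data given by Reiter (reference 8 below), an analyst would estimate the quantity of interest on each of the m released data sets and pool the results: the point estimate is the mean of the per-data-set estimates, and the variance is the mean within-data-set variance plus the between-data-set variance divided by m. The function name and argument names are hypothetical and do not come from the paper.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Pool results from m partially synthetic data sets.

    Assumes the combining rules for partially synthetic data
    (Reiter, 2003); this is an illustration, not the paper's code.

    estimates: point estimate q_i from each synthetic data set
    variances: estimated variance u_i of q_i from each data set
    Returns (pooled estimate, pooled variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching inputs from at least two data sets")

    q_bar = mean(estimates)    # pooled point estimate
    b = variance(estimates)    # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)    # average within-data-set variance

    # Partially synthetic data: T_p = u_bar + b / m
    total_var = u_bar + b / m

    # Degrees of freedom of the reference t-distribution;
    # infinite when the estimates do not vary across data sets.
    if b > 0:
        df = (m - 1) * (1 + u_bar / (b / m)) ** 2
    else:
        df = float("inf")
    return q_bar, total_var, df
```

For example, `combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three estimates into a point estimate of 2.0 with variance 0.5 + 1/3 and 12.5 degrees of freedom. Note that the variance term differs from the rule used for multiply imputed missing data, so the two should not be interchanged.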

References

  1. Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88(422), 669–679 (1993)

  2. Fienberg, S.E., McIntyre, J.: Data swapping: variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 14–29. Springer, Heidelberg (2004)

  3. Fuller, W.A.: Masking procedures for microdata disclosure limitation. J. Official Stat. 9(2), 383–406 (1993)

  4. Drechsler, J., Reiter, J.P.: Accounting for intruder uncertainty due to sampling when estimating identification disclosure risks in partially synthetic data. In: Domingo-Ferrer, J., Saygın, Y. (eds.) PSD 2008. LNCS, vol. 5262, pp. 227–238. Springer, Heidelberg (2008)

  5. Lim, T.-S., Loh, W.-Y., Shih, Y.-S.: A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach. Learn. 40(3), 203–228 (2000)

  6. Lee, M.C., Mitra, R.: Multiply imputing missing values in data sets with measurement scales using a sequence of generalised linear models. Comput. Stat. Data Anal. 95, 24–38 (2016)

  7. Little, R.J.A.: Statistical analysis of masked data. J. Official Stat. 9(2), 407–426 (1993)

  8. Reiter, J.P.: Inference for partially synthetic, public use microdata sets. Surv. Method. 29, 181–188 (2003)

  9. Reiter, J.P.: Using CART to generate partially synthetic public use microdata. J. Official Stat. 21(3), 441–461 (2005)

  10. Reiter, J.P., Mitra, R.: Estimating risks of identification disclosure in partially synthetic data. J. Priv. Confidentiality 1(1), 99–110 (2009)

  11. Rubin, D.B.: Statistical disclosure limitation. J. Official Stat. 9(2), 461–468 (1993)

  12. Schafer, J.L.: Analysis of incomplete multivariate data. Chapman & Hall/CRC, London (1997)

  13. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics. Springer, New York (2001)

  14. Woo, M.-J., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1), 111–124 (2009)

Author information

Corresponding author

Correspondence to Wun-She Yap.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lee, M.C., Mitra, R., Lazaridis, E., Lai, A.C., Goh, Y.K., Yap, WS. (2016). Statistical Disclosure Control for Data Privacy Using Sequence of Generalised Linear Models. In: Liu, J., Steinfeld, R. (eds) Information Security and Privacy. ACISP 2016. Lecture Notes in Computer Science, vol 9722. Springer, Cham. https://doi.org/10.1007/978-3-319-40253-6_5

  • DOI: https://doi.org/10.1007/978-3-319-40253-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40252-9

  • Online ISBN: 978-3-319-40253-6

  • eBook Packages: Computer Science, Computer Science (R0)
