Statistical Disclosure Limitation for Health Data: A Statistical Agency Perspective

Abstract

Statistical agencies release health data collected in surveys, censuses and registers. This chapter presents statistical disclosure limitation (SDL) from the perspective of statistical agencies. Traditional outputs, in the form of survey microdata and tabular data, are presented first, covering the quantification of disclosure risk, common SDL techniques for protecting the data, and the measurement of information loss. In recent years, however, demand for data has grown, including through government ‘open data’ initiatives, and this has led statistical agencies to examine additional forms of disclosure risk related to the concept of differential privacy in the computer science literature. The chapter discusses whether the SDL practices that statistical agencies apply to traditional outputs are differentially private. It concludes by presenting some innovative data dissemination strategies that statistical agencies are currently assessing for settings where stricter privacy guarantees are necessary.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Social Statistics, School of Social Sciences, University of Manchester, Manchester, UK
