Abstract
For micro-datasets considered for release as scientific or public use files, statistical agencies face a dilemma: they must guarantee the confidentiality of survey respondents while still offering sufficiently detailed data. For that reason, a variety of disclosure control methods are discussed in the literature. In this paper, we present an application of Rubin’s (J. Off. Stat. 9, 462–468, 1993) idea to generate synthetic datasets from existing confidential survey data for public release.
We use a set of variables from the 1997 wave of the German IAB Establishment Panel and evaluate the quality of the approach by comparing the results of an analysis by Zwick (Ger. Econ. Rev. 6(2), 155–184, 2005) on the original data with the results of the same analysis run on the synthetic datasets produced by the imputation procedure. The comparison shows that valid inferences can be obtained using the synthetic datasets in this context, while confidentiality is guaranteed for the survey participants.
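The abstract does not say how the results from the individual synthetic datasets are pooled into one inference. As a minimal sketch, assuming the combining rules for fully synthetic data from Raghunathan, Reiter and Rubin (2003, listed in the references) apply, the point estimate is the average over the m synthetic datasets and the variance estimate is T_s = (1 + 1/m) b_m - u_bar_m. The function name and the input values below are hypothetical and serve only as illustration; they are not results from the paper.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Pool one scalar estimate across m synthetic datasets.

    estimates: point estimates q_i, one per synthetic dataset
    variances: estimated variances u_i of those point estimates
    Returns (q_bar, b_m, u_bar, t_s). Assumes the fully synthetic
    combining rules of Raghunathan, Reiter and Rubin (2003).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(estimates)      # pooled point estimate
    b_m = variance(estimates)    # between-dataset variance (denominator m - 1)
    u_bar = mean(variances)      # average within-dataset variance
    # Variance estimate for fully synthetic data; note the minus sign,
    # which differs from the rule for multiply imputed missing data.
    t_s = (1 + 1 / m) * b_m - u_bar
    return q_bar, b_m, u_bar, t_s


if __name__ == "__main__":
    # Hypothetical regression coefficients from m = 3 synthetic datasets.
    q_bar, b_m, u_bar, t_s = combine_fully_synthetic(
        [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
    )
    print(q_bar, b_m, u_bar, t_s)
```

Because u_bar is subtracted, T_s can come out negative, especially when m is small. In that case it cannot be used as a variance as-is, so a caller should check its sign before building confidence intervals.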
References
Abowd, J.M., Lane, J.: New approaches to confidentiality protection: synthetic data, remote access and research data centers. In: Privacy in Statistical Databases, pp. 282–289. Springer, New York (2004)
Abowd, J.M., Woodcock, S.D.: Disclosure limitation in longitudinal linked data. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 215–277. North-Holland, Amsterdam (2001)
Abowd, J.M., Woodcock, S.D.: Multiply-imputing confidential characteristics and file links in longitudinal linked data. In: Privacy in Statistical Databases, pp. 290–297. Springer, New York (2004)
Barnard, J., Rubin, D.B.: Small-sample degrees of freedom with multiple imputation. Biometrika 86, 948–955 (1999)
Brand, R.: Anonymität von Betriebsdaten—Verfahren zur Erfassung und Maßnahmen zur Verringerung des Reidentifikationsrisikos. Beiträge zur Arbeitsmarkt- und Berufsforschung, Bd. 237 (2000)
Brand, R.: Masking through noise addition. In: Inference Control in Statistical Databases, pp. 97–116. Springer, Berlin (2002)
Brand, R., Bender, S., Kohaut, S.: Possibilities for the creation of a scientific-use file for the IAB-establishment-panel. In: Statistical Data Confidentiality Proceedings of the Joint Eurostat/UN-ECE Work Session on Statistical Data Confidentiality Held in Thessaloniki in March 1999, pp. 57–74. Eurostat, Brüssel (1999)
Fischer, G., Janik, F., Müller, D., Schmucker, A.: The IAB establishment panel—from sample to survey to projection. FDZ-Methodenreport, No. 1 (2008)
Gottschalk, S.: Unternehmensdaten zwischen Datenschutz und Analysepotenzial. ZEW Wirtschaftsanalysen, Bd. 76. Nomos Verlag, Baden Baden (2005)
Karr, A.F., Kohen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. Am. Stat. 60, 224–232 (2006)
Kennickell, A.B.: Multiple imputation and disclosure protection: the case of the 1995 survey of consumer finances. In: Record Linkage Techniques, pp. 248–267. National Academy Press, Washington (1997)
Kölling, A.: The IAB-establishment panel. J. Appl. Soc. Sci. Stud. 120, 291–300 (2000)
Lane, J.: Optimizing the use of micro-data: an overview of the issues. Paper presented at the Joint Statistical Meetings. http://client.norc.org/jole/SOLEweb/Accesstomicrodata%5B1%5D.pdf (2005)
Little, R.J.A.: Statistical analysis of masked data. J. Off. Stat. 9, 407–426 (1993)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, Hoboken (2002)
Meng, X.-L.: Multiple-imputation inferences with uncongenial sources of input. Stat. Sci. 9, 538–558 (1994)
Raghunathan, T.E., Lepkowski, J.M., van Hoewyk, J., Solenberger, P.: A multivariate technique for multiply imputing missing values using a series of regression models. Surv. Methodol. 27, 85–96 (2001)
Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple imputation for statistical disclosure limitation. J. Off. Stat. 19, 1–16 (2003)
Reiter, J.P.: Satisfying disclosure restrictions with synthetic data sets. J. Off. Stat. 18, 531–544 (2002)
Reiter, J.P.: Inference for partially synthetic, public use microdata sets. Surv. Methodol. 29, 181–188 (2003)
Reiter, J.P.: Simultaneous use of multiple imputation for missing data and disclosure limitation. Surv. Methodol. 30, 235–242 (2004)
Reiter, J.P.: Releasing multiply-imputed, synthetic public use microdata: an illustration and empirical study. J. R. Stat. Soc. Ser. A 168, 185–205 (2005)
Reiter, J.P., Drechsler, J.: Releasing multiply-imputed, synthetic data generated in two stages to protect confidentiality. Tech. rep., IAB Discussion Paper, No. 20 (2007)
Ronning, G., Rosemann, M.: Estimation of the probit model from anonymized micro data. In: Work Session on Statistical Data Confidentiality, Geneva, 9–11 November 2005. Monograph of Official Statistics, pp. 207–216. Eurostat, Luxemburg (2006)
Ronning, G., Rosemann, M., Strotmann, H.: Post-randomization under test: estimation of a probit model. J. Econ. Stat. 225, 544–566 (2005)
Rosemann, M.: Auswirkungen datenverändernder Anonymisierungsverfahren auf die Analyse von Mikrodaten. IAW (2006)
Rubin, D.B.: Multiple imputation in sample surveys—a phenomenological Bayesian approach to nonresponse. In: American Statistical Association Proceedings of the Section on Survey Research Methods, pp. 20–40 (1978)
Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys. Wiley, New York (1987)
Rubin, D.B.: Discussion: statistical disclosure limitation. J. Off. Stat. 9, 462–468 (1993)
Rubin, D.B.: The design of a general and flexible system for handling nonresponse in sample surveys. Am. Stat. 58, 298–302 (2004)
Rubin, D.B., Schenker, N.: Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J. Am. Stat. Assoc. 81, 366–374 (1986)
Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Springer, New York (2001)
Zwick, T.: Continuing vocational training forms and establishment productivity in Germany. Ger. Econ. Rev. 6(2), 155–184 (2005)
Additional information
The research provided in this paper is part of the project “Wirtschaftsstatistische Paneldaten und faktische Anonymisierung” financed by the Federal Ministry for Education and Research (BMBF) and conducted by the following institutes: Federal Statistical Office Germany, Statistical Offices of the Länder, Institute for Applied Economic Research (IAW), Centre for European Economic Research (ZEW), Institute for Employment Research (IAB). For more information about this project, see for instance Ronning and Rosemann (2006) or Ronning et al. (2005). We thank our project partners and the participants of the “UNECE Conference on Data Editing and Imputation,” 25.09.2006–27.09.2006 in Bonn and “The Conference on Privacy in Statistical Databases ’06,” 13.12.2006–15.12.2006 in Rome, and especially J.M. Abowd, T.E. Raghunathan, D.B. Rubin, J.P. Reiter, and two anonymous referees for their helpful comments on the paper.
About this article
Cite this article
Drechsler, J., Dundler, A., Bender, S. et al. A new approach for disclosure control in the IAB establishment panel—multiple imputation for a better data access. AStA Adv Stat Anal 92, 439–458 (2008). https://doi.org/10.1007/s10182-008-0090-1
DOI: https://doi.org/10.1007/s10182-008-0090-1