Abstract
We argue that any microdata protection strategy is based on a formal reference model. The extent to which the model is specified yields “parametric”, “semiparametric”, or “nonparametric” strategies. Following this classification, one can specify a parametric probability model, such as a normal regression model or a multivariate distribution for simulation. Matrix masking (Cox [2]), which covers local suppression, coarsening, microaggregation (Domingo-Ferrer [8]), noise injection, and perturbation (e.g. Kim [15]; Fuller [12]), provides examples of the second and third classes of models. Finally, a nonparametric approach can be adopted, e.g. the use of bootstrap procedures for generating synthetic microdata (e.g. Dandekar et al. [4]).
In this paper we discuss the application of a regression-based imputation procedure for business microdata to the Italian sample from the Community Innovation Survey. A set of regressions (Franconi and Stander [11]) is used to generate flexible perturbation, in that the protection varies according to the identifiability of the enterprise; a spatial aggregation strategy, based on principal components analysis, is also proposed. The inferential usefulness of the released data and the protection achieved by the strategy are evaluated.
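As a rough illustration of regression-based perturbation with record-dependent protection, the sketch below fits an ordinary least squares model and blends each true value with a model-based synthetic value. It is not the procedure of Franconi and Stander [11]: the function name `regression_perturb`, the per-record `weights` (a stand-in for an identifiability score in [0, 1]), and the linear blending rule are all assumptions made for this example.

```python
import numpy as np

def regression_perturb(X, y, weights, seed=0):
    """Blend true values with regression-based synthetic values.

    Hypothetical sketch: `weights` is an assumed per-record protection
    level in [0, 1]; 0 keeps the true value, 1 releases a purely
    model-based value (fitted value plus Gaussian noise).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # OLS coefficients
    fitted = X1 @ beta
    resid = y - fitted
    dof = max(len(y) - X1.shape[1], 1)              # residual degrees of freedom
    sigma = np.sqrt(resid @ resid / dof)            # residual standard deviation
    synthetic = fitted + rng.normal(0.0, sigma, size=len(y))
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    return (1.0 - w) * y + w * synthetic
```

Under these assumptions, a record judged highly identifiable would be given a weight near 1 and released almost entirely as a model-based value, while a record with weight 0 is left unchanged.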
References
1. Brand, R.: Microdata protection through noise addition. In: “Inference Control in Statistical Databases”, LNCS 2316, Springer-Verlag (2002), 97–116.
2. Cox, L.H.: Matrix masking methods for disclosure limitation in microdata. Surv. Method. 20 (1994) 165–169.
3. Cox, L.H.: Towards a Bayesian Perspective on Statistical Disclosure Limitation. Paper presented at ISBA 2000, The Sixth World Meeting of the International Society for Bayesian Analysis (2000).
4. Dandekar, R., Cohen, M., Kirkendall, N.: Applicability of Latin Hypercube Sampling to create multi variate synthetic micro data. In: ETK-NTTS 2001 Pre-proceedings of the Conference. European Communities Luxembourg (2001) 839–847.
5. Dandekar, R., Cohen, M., Kirkendall, N.: Sensitive micro data protection using Latin Hypercube Sampling technique. In: “Inference Control in Statistical Databases”, LNCS 2316, Springer-Verlag (2002), 117–125.
6. Duncan, G.T., Mukherjee, S.: Optimal disclosure limitation strategy in statistical databases: deterring tracker attacks through additive noise. J. Am. Stat. Ass. 95 (2000) 720–729.
7. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39 (1977) 1–38.
8. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering, in press (2001).
9. Fienberg, S.E., Makov, U., Steele, R.J.: Disclosure limitation using perturbation and related methods for categorical data (with discussion). J. Off. Stat. 14 (1998) 485–502.
10. Franconi, L., Stander, J.: Model based disclosure limitation for business microdata. In: Proceedings of the International Conference on Establishment Surveys-II, June 17–21, 2000 Buffalo, New York (2000) 887–896.
11. Franconi, L., Stander, J.: A model based method for disclosure limitation of business microdata. J. Roy. Stat. Soc. D Statistician 51 (2002) 1–11.
12. Fuller, W.A.: Masking procedures for microdata disclosure limitation. J. Off. Stat. 9 (1993) 383–406.
13. Grim, J., Bocek, P., Pudil, P.: Safe dissemination of census results by means of Interactive Probabilistic Models. In: ETK-NTTS 2001 Pre-proceedings of the Conference. European Communities Luxembourg (2001) 849–856.
14. Kennickell, A.B.: Multiple imputation and disclosure protection. In: Proceedings of the Conference on Statistical Data Protection, March 25–27, 1998 Lisbon (1999) 381–400.
15. Kim, J.: A method for limiting disclosure of microdata based on random noise and transformation. In: Proceedings of the Survey Research Methods Section, American Statistical Association (1986) 370–374.
16. Little, R.J.A.: Statistical analysis of masked data. J. Off. Stat. 9 (1993) 407–426.
17. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. John Wiley New York (1987).
18. Raghunathan, T., Rubin, D.B.: Bayesian multiple imputation to Preserve Confidentiality in Public-Use Data Sets. In: Proceedings of ISBA 2000, The Sixth World Meeting of the International Society for Bayesian Analysis. European Communities Luxembourg (2000).
19. Rubin, D.B.: Discussion of “Statistical disclosure limitation”. J. Off. Stat. 9 (1993) 461–468.
20. Winkler, W.E., Yancey, W.E., Creecy, R.H.: Disclosure risk assessment in perturbative microdata protection via record linkage. In: “Inference Control in Statistical Databases”, LNCS 2316, Springer-Verlag (2002), 135–152.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Polettini, S., Franconi, L., Stander, J. (2002). Model Based Disclosure Protection. In: Domingo-Ferrer, J. (eds) Inference Control in Statistical Databases. Lecture Notes in Computer Science, vol 2316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47804-3_7
DOI: https://doi.org/10.1007/3-540-47804-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43614-0
Online ISBN: 978-3-540-47804-1
eBook Packages: Springer Book Archive