Abstract
The problem of limiting the disclosure of information gathered on a set of companies or individuals (the “respondents”) is considered, the aim being to provide useful information while preserving the confidentiality of sensitive information. The paper proposes a method which explicitly preserves certain information contained in the data. The data are assumed to consist of two sets of information on each respondent: public data and specific survey data. It is assumed in this paper that both sets of data are liable to be released for a subset of respondents. However, the public data will be altered in some way to preserve confidentiality, whereas the specific survey data are to be disclosed without alteration. The paper proposes a model-based approach to this problem that uses the information contained in the sufficient statistics obtained from fitting a model to the public data, conditioning on the survey data. Deterministic and stochastic variants of the method are considered.
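As a rough illustration of what preserving sufficient statistics can mean, the sketch below assumes a Gaussian linear regression of a single public variable `y` on the survey variables `X`. For that model the sufficient statistics are `Z'y` and `y'y`, where `Z` is `X` with an intercept column. The model choice, the function name `obfuscate_public`, and the simulated data are assumptions made for illustration; they are not taken from the paper and should not be read as its algorithm.

```python
import numpy as np


def obfuscate_public(X, y, rng):
    """Return altered public data y_star with the same Z'y and y'y as y.

    Hypothetical helper: assumes a Gaussian linear model of y on X.
    """
    # Design matrix: intercept plus the (unaltered) survey variables.
    Z = np.column_stack([np.ones(len(y)), X])

    # Fit the model to the public data, conditioning on the survey data.
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta
    resid = y - fitted

    # Draw fresh noise and remove its component in the column space of Z,
    # so that Z' e_perp = 0 and Z'y is unchanged.
    e = rng.standard_normal(len(y))
    gamma, *_ = np.linalg.lstsq(Z, e, rcond=None)
    e_perp = e - Z @ gamma

    # Rescale the noise to the original residual norm, so y'y is unchanged.
    e_perp *= np.linalg.norm(resid) / np.linalg.norm(e_perp)

    return fitted + e_perp


# Simulated example data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))                      # survey data, released as is
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.standard_normal(50)  # public data
y_star = obfuscate_public(X, y, rng)                  # altered public data
```

Under these assumptions, an analyst who refits the same regression to `(X, y_star)` would obtain the same coefficient estimates and residual variance as from `(X, y)`, even though the individual public values differ. Replacing the random draw with a fixed vector would give a deterministic version of the same construction.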
Cite this article
Burridge, J. Information preserving statistical obfuscation. Statistics and Computing 13, 321–327 (2003). https://doi.org/10.1023/A:1025658621216