Information preserving statistical obfuscation

Burridge, Jim

doi:10.1023/A:1025658621216

Published: October 2003

Volume 13, pages 321–327, (2003)

Statistics and Computing

Abstract

The problem of limiting the disclosure of information gathered on a set of companies or individuals (the “respondents”) is considered, the aim being to provide useful information while preserving the confidentiality of sensitive information. The paper proposes a method which explicitly preserves certain information contained in the data. The data are assumed to consist of two sets of information on each “respondent”: public data and specific survey data. It is assumed in this paper that both sets of data are liable to be released for a subset of respondents. However, the public data will be altered in some way to preserve confidentiality, whereas the specific survey data are to be disclosed without alteration. The paper proposes a model-based approach to this problem by utilizing the information contained in the sufficient statistics obtained from fitting a model to the public data by conditioning on the survey data. Deterministic and stochastic variants of the method are considered.
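To make the idea of preserving sufficient statistics concrete, the following is a minimal sketch under assumptions that are not stated in the abstract: the model is taken to be a Gaussian linear regression of the public data `Y` on the survey data `X`, for which the sufficient statistics given `X` are `X'Y` and `Y'Y`. The function name `obfuscate_preserving_stats` and all variable names are hypothetical, and this is one possible reading of the approach rather than the paper's own algorithm. The sketch replaces the residuals with fresh noise that is made orthogonal to the design matrix and recoloured so that the residual cross-product matrix is reproduced exactly.

```python
import numpy as np

def obfuscate_preserving_stats(X, Y, rng):
    """Return altered public data Y_star whose regression on X has the same
    least-squares coefficients and residual cross-products as Y.

    Assumption (not from the source): Gaussian linear model with intercept.
    X has shape (n, p), Y has shape (n, q), and n > p + 1 + q.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, Y, rcond=None)  # fitted coefficients
    resid = Y - Xd @ beta                          # original residuals

    # Fresh noise, projected so that it is orthogonal to the columns of Xd.
    Z = rng.standard_normal(Y.shape)
    gamma, *_ = np.linalg.lstsq(Xd, Z, rcond=None)
    E = Z - Xd @ gamma

    # Recolour E so that E_star.T @ E_star equals resid.T @ resid exactly.
    L_target = np.linalg.cholesky(resid.T @ resid)
    L_new = np.linalg.cholesky(E.T @ E)
    E_star = E @ np.linalg.inv(L_new).T @ L_target.T

    return Xd @ beta + E_star

# Illustrative synthetic data (fabricated purely for the demonstration).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))                   # "survey" data, released as-is
Y = X @ np.array([[1.0, 0.5], [-2.0, 0.3]]) + rng.standard_normal((50, 2))
Y_star = obfuscate_preserving_stats(X, Y, rng)     # altered "public" data
```

An analyst who refits the same regression to `(X, Y_star)` would obtain the same coefficient estimates and residual covariance as from `(X, Y)`, while the individual rows of `Y_star` differ from those of `Y`. Whether that level of alteration gives adequate protection is a separate question that the sketch does not address.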

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Plymouth, UK
Jim Burridge

About this article

Cite this article

Burridge, J. Information preserving statistical obfuscation. Statistics and Computing 13, 321–327 (2003). https://doi.org/10.1023/A:1025658621216

Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1025658621216
