Information preserving statistical obfuscation

Statistics and Computing

Abstract

The problem of limiting the disclosure of information gathered on a set of companies or individuals (the “respondents”) is considered, the aim being to provide useful information while preserving the confidentiality of sensitive information. The paper proposes a method which explicitly preserves certain information contained in the data. The data are assumed to consist of two sets of information on each “respondent”: public data and specific survey data. It is assumed in this paper that both sets of data are liable to be released for a subset of respondents. However, the public data will be altered in some way to preserve confidentiality, whereas the specific survey data are to be disclosed without alteration. The paper proposes a model-based approach to this problem that uses the information contained in the sufficient statistics obtained from fitting a model to the public data, conditional on the survey data. Deterministic and stochastic variants of the method are considered.
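
The idea of altering the public data while keeping the sufficient statistics of a fitted model can be pictured with a small sketch. This is an illustration only, not the paper's algorithm, whose details are not given in the abstract. It assumes a Gaussian linear regression of a single public variable on the survey data, for which the sufficient statistics given the design matrix X are X'y and y'y. The function name `obfuscate` and the simulated data are hypothetical. The sketch corresponds loosely to a stochastic variant: the fitted values are kept, and the residuals are replaced by random noise that is orthogonal to X and has the same length as the original residuals.

```python
import numpy as np

def obfuscate(public, survey, rng):
    """Return altered public values with the same least-squares fit on survey.

    Assumption (not from the paper): a Gaussian linear model with intercept.
    """
    n = len(public)
    X = np.column_stack([np.ones(n), survey])          # design matrix
    beta, *_ = np.linalg.lstsq(X, public, rcond=None)  # fitted coefficients
    fitted = X @ beta
    resid = public - fitted

    # Draw fresh noise and remove its component in the column space of X,
    # so that X' noise = 0 and the fitted coefficients are unchanged.
    noise = rng.standard_normal(n)
    coef, *_ = np.linalg.lstsq(X, noise, rcond=None)
    noise = noise - X @ coef

    # Rescale so the residual sum of squares matches the original.
    noise *= np.linalg.norm(resid) / np.linalg.norm(noise)
    return fitted + noise

# Simulated example data (purely illustrative).
rng = np.random.default_rng(0)
n = 50
survey = rng.normal(size=n)
public = 2.0 + 3.0 * survey + rng.normal(size=n)

released = obfuscate(public, survey, rng)
X = np.column_stack([np.ones(n), survey])

# The sufficient statistics agree, but the individual values do not.
print(np.allclose(X.T @ released, X.T @ public))
print(np.allclose(released @ released, public @ public))
print(np.allclose(released, public))
```

Under these assumptions, anyone fitting the same regression to the released values would recover the same coefficient estimates and residual variance as from the original public data, although no individual public value is disclosed exactly.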

About this article

Burridge, J. Information preserving statistical obfuscation. Statistics and Computing 13, 321–327 (2003). https://doi.org/10.1023/A:1025658621216
