Cell Bounds in Two-Way Contingency Tables Based on Conditional Frequencies

  • Conference paper
Privacy in Statistical Databases (PSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5262)

Abstract

Statistical methods for disclosure limitation (or control) have seen a coupling of tools from statistical methodology and operations research. For the summary and release of data in the form of a contingency table, some methods have focused on evaluating bounds on cell entries in k-way tables given sets of marginal totals. Less attention has been paid to evaluating disclosure risk given other summaries such as conditional probabilities, that is, tables of rates derived from the observed contingency table. Narrow intervals, especially for cells with low counts, could pose a privacy risk. In this paper we derive closed-form solutions for the linear relaxation bounds on cell counts of a two-way contingency table given observed conditional probabilities. We also compute the corresponding sharp integer bounds via integer programming and show that the widths of the two sets of bounds can differ substantially, suggesting that the linear relaxation is often an unacceptable shortcut for estimating the sharp bounds and the disclosure risk.
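To make the contrast between relaxed and sharp bounds concrete, here is a minimal Python sketch. It is not the paper's closed-form result or its integer-programming formulation, neither of which appears in the abstract. It assumes a hypothetical setup in which the released information is the exact (unrounded) conditional frequencies of rows given columns plus the grand total, and in which every column is non-empty. The function names and the example table are illustrative only, and brute-force enumeration stands in for an integer-programming solver.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # multi-argument lcm needs Python 3.9+


def column_conditionals(table):
    """Exact P(row i | column j) as Fractions, indexed [i][j]."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[Fraction(n, c) for n, c in zip(row, col_totals)] for row in table]


def relaxed_bounds(cond, total, i, j):
    """Bounds on cell (i, j) when column totals may be any real number >= 1.

    Assumption: each column total is at least 1, so column j's total lies
    between 1 and total - (number of other columns).
    """
    n_cols = len(cond[0])
    return cond[i][j] * 1, cond[i][j] * (total - (n_cols - 1))


def sharp_bounds(cond, total, i, j):
    """Bounds on cell (i, j) over all integer tables, by enumeration."""
    n_cols = len(cond[0])
    # Every cell in column k is an integer only if the column total is a
    # multiple of the lcm of the denominators of that column's conditionals.
    steps = [lcm(*(row[k].denominator for row in cond)) for k in range(n_cols)]
    candidates = product(*(range(s, total + 1, s) for s in steps))
    values = [cond[i][j] * cols[j] for cols in candidates if sum(cols) == total]
    return min(values), max(values)


# Hypothetical 2x2 table with column totals 4 and 5 (grand total 9).
table = [[2, 1],
         [2, 4]]
cond = column_conditionals(table)
total = sum(map(sum, table))

print("relaxed:", relaxed_bounds(cond, total, 0, 0))
print("sharp:  ", sharp_bounds(cond, total, 0, 0))
```

For this toy table the relaxed interval for the first cell runs from 1/2 to 4, while the integer constraints admit only the observed table, so the sharp interval collapses to the single value 2. An intruder relying on the relaxation would therefore badly understate the disclosure risk in this instance.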

Editor information

Josep Domingo-Ferrer Yücel Saygın

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Smucker, B., Slavković, A.B. (2008). Cell Bounds in Two-Way Contingency Tables Based on Conditional Frequencies. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_6

  • DOI: https://doi.org/10.1007/978-3-540-87471-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87470-6

  • Online ISBN: 978-3-540-87471-3

  • eBook Packages: Computer Science, Computer Science (R0)
