Abstract
The growing demand for information, coupled with the increasing capability of computer systems, has compelled information providers to reassess their procedures for preventing disclosure of confidential information. This paper considers the problem of protecting an unpublished, sensitive table by suppressing cells in related, published tables. A conventional integer programming (IP) technique for two-dimensional tables is extended to find an optimal suppression set for the published tables. The extended model can be used to protect the confidentiality of sensitive data in tables of three or more dimensions. More importantly, heuristics that exploit the structure of the problem are presented to mitigate the computational difficulty of the integer program. An example is drawn from healthcare management. Randomly generated data tables are used to assess the time and space limits of the IP model and to evaluate the heuristics.
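To illustrate why suppressing a sensitive cell on its own is rarely enough, the following minimal sketch simulates the simplest attack on a two-dimensional table published with row and column totals: any suppressed cell that is alone in its row or column can be recovered by subtraction, and each recovery may expose further cells. This is not the paper's IP model or one of its heuristics. The function name `recover_by_subtraction` and the sample table are hypothetical, and passing this check is only a necessary condition for protection, since it ignores the interval bounds an attacker could still derive.

```python
def recover_by_subtraction(table, suppressed):
    """Return {(row, col): value} for suppressed cells that an attacker can
    recover exactly from published cells plus row and column totals.

    Illustrative assumption: all marginal totals are published.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

    hidden = set(suppressed)   # cells still unknown to the attacker
    recovered = {}
    progress = True
    while progress:
        progress = False
        # A row with exactly one unknown cell reveals that cell.
        for i in range(n_rows):
            unknown = [j for j in range(n_cols) if (i, j) in hidden]
            if len(unknown) == 1:
                j = unknown[0]
                known = sum(table[i][k] for k in range(n_cols) if k != j)
                recovered[(i, j)] = row_tot[i] - known
                hidden.discard((i, j))
                progress = True
        # The same argument applies to columns.
        for j in range(n_cols):
            unknown = [i for i in range(n_rows) if (i, j) in hidden]
            if len(unknown) == 1:
                i = unknown[0]
                known = sum(table[k][j] for k in range(n_rows) if k != i)
                recovered[(i, j)] = col_tot[j] - known
                hidden.discard((i, j))
                progress = True
    return recovered


# Hypothetical 3x3 table of counts.
table = [[5, 3, 2],
         [4, 6, 1],
         [7, 2, 8]]

lone_cell = recover_by_subtraction(table, {(0, 0)})
rectangle = recover_by_subtraction(table, {(0, 0), (0, 1), (1, 0), (1, 1)})
l_shape = recover_by_subtraction(table, {(0, 0), (0, 1), (1, 0)})
```

In this sketch a lone suppressed cell and an L-shaped pattern are both fully recovered, while a rectangular pattern of four cells survives the subtraction attack. Choosing such complementary suppressions at minimum cost is the kind of problem the integer program addresses.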
Acknowledgements
This research was supported in part by the National Science Foundation, NSF IRI-9312143, and by the US Army Research Office under Grant DAAH04-94-6-0239. We thank Anthony Colatrella for invaluable help in programming the heuristics and collecting the statistics presented here.
Cite this article
Roehrig, S., Padman, R., Krishnan, R. et al. Exact and heuristic methods for cell suppression in multi-dimensional linked tables. J Oper Res Soc 62, 291–304 (2011). https://doi.org/10.1057/jors.2010.133