Abstract
To obtain full cooperation from respondents, statistical offices must guarantee that confidential data will not be disclosed when their reports are published. For tabular data, cell suppression is one of the preferred techniques for statistical disclosure control. When suppressing only the confidential values does not provide the required protection, some non-confidential cells must also be suppressed. The problem of finding an optimal set of these complementary suppressions, known as the cell suppression problem (CSP), is NP-hard. We present a three-phase algorithm for the CSP based on a binary relaxation derived from row and column protection conditions. To enforce violated single cell conditions, integer cuts are added to the CSP relaxation. Numerical results on 1410 instances, the largest with more than 250 000 cells, generated to reproduce two classes of real-world data, indicate that the algorithm is quite effective for both classes of instances and that it outperforms state-of-the-art algorithms for one of them.
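The Python sketch below illustrates why complementary suppressions are needed in a two-dimensional table whose row and column totals are published. It assumes non-negative integer cells and uses hypothetical helpers (`margins`, `lone_lines`, `feasible_range`) that do not come from the paper. A row or column containing exactly one suppressed cell leaks that cell by subtraction, so every row or column with a suppression needs at least two. A pattern can pass that check and still reveal only a bounded interval for a cell, which is what single cell conditions examine. It is not the authors' three-phase algorithm. The interval is found by brute force, which only works for toy tables (practical audits solve linear programs instead), and protection levels are not modelled.

```python
from itertools import product


def margins(table):
    """Row and column totals of a 2D table given as a list of lists."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols


def lone_lines(pattern, nrows, ncols):
    """Rows and columns that contain exactly one suppressed cell.

    With published totals such a cell equals its total minus the other
    (published) cells of that line, so a safe pattern leaves this list empty.
    """
    row_count = [0] * nrows
    col_count = [0] * ncols
    for i, j in pattern:
        row_count[i] += 1
        col_count[j] += 1
    return ([("row", i) for i, c in enumerate(row_count) if c == 1]
            + [("col", j) for j, c in enumerate(col_count) if c == 1])


def feasible_range(table, pattern, target):
    """Smallest and largest value an intruder can infer for cell `target`.

    Enumerates non-negative integer values for the suppressed cells and keeps
    those that reproduce every published row and column total. Assumption:
    brute force, suitable only for very small tables.
    """
    rows, cols = margins(table)
    cells = sorted(pattern)
    # A non-negative cell cannot exceed its row total or its column total.
    ranges = [range(min(rows[i], cols[j]) + 1) for i, j in cells]
    lo, hi = None, None
    for values in product(*ranges):
        guess = [row[:] for row in table]  # unsuppressed cells stay published
        for (i, j), v in zip(cells, values):
            guess[i][j] = v
        if margins(guess) == (rows, cols):
            v = guess[target[0]][target[1]]
            lo = v if lo is None else min(lo, v)
            hi = v if hi is None else max(hi, v)
    return lo, hi


if __name__ == "__main__":
    # Hypothetical 3 x 3 table; only its totals and unsuppressed cells are published.
    table = [[4, 5, 2], [3, 1, 6], [2, 7, 3]]

    single = {(0, 0)}
    print(lone_lines(single, 3, 3))               # [('row', 0), ('col', 0)]
    print(feasible_range(table, single, (0, 0)))  # (4, 4): fully disclosed

    box = {(0, 0), (0, 2), (1, 0), (1, 2)}
    print(lone_lines(box, 3, 3))                  # []
    print(feasible_range(table, box, (0, 0)))     # (0, 6): true value is 4
```

The 2 × 2 rectangle of suppressions is the smallest pattern that passes the row and column check. Whether an interval such as 0 to 6 around the true value 4 is wide enough is judged against protection levels, which this sketch does not model.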
Acknowledgements
The authors thank two anonymous referees for their comments and suggestions. Thanks are also due to Ann Henshall for her assistance in editing the final version of the paper.
Cite this article
Carvalho, F., Almeida, M. A three-phase algorithm for the cell suppression problem in two-dimensional statistical tables. J Oper Res Soc 59, 556–562 (2008). https://doi.org/10.1057/palgrave.jors.2602389