Privacy protection and aggregate health data: a review of tabular cell suppression methods (not) employed in public health data systems

Health Services and Outcomes Research Methodology

Abstract

Public health research often relies on individuals’ confidential medical data. Data-collecting entities, such as states, therefore seek to disseminate these data as widely as possible while still maintaining individual privacy for legal and ethical reasons. One common way in which these data are released is through Web-based Data Query Systems (WDQS). In this article, we examined the WDQS listed by the National Association for Public Health Statistics and Information Systems (NAPHSIS), specifically reviewing how they prevent statistical disclosure in queries that produce a tabular response. One of the most common methods of combating this type of disclosure is suppression: if a cell count in a table is below a certain threshold, the true value is suppressed. This technique does prevent the direct disclosure of small cell counts; however, primary suppression by itself is not always enough to preserve privacy in tabular data. Here, we present several real examples of tabular response queries that employ suppression but in which we are nevertheless able to infer the values of the suppressed cells, including cells with a count of 1. Such cells could be linked to auxiliary data sources and thus have the potential to create an identity disclosure. We seek to stimulate awareness of the potential for disclosure, through an online query system, of information that individuals may wish to keep private. This research is undertaken in the hope that privacy concerns can be dealt with preemptively rather than only after a major disclosure has taken place. In the wake of such an event, a major concern is that state and local officials would react by permanently shutting down these sites, cutting off a valuable source of research data.
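The weakness of primary suppression described above can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems: the threshold of 5, the counts, and the function names `suppress` and `recover` are all hypothetical, and the sketch assumes that the row total is published alongside the cells. Under those assumptions, a single suppressed cell can be recovered by simple subtraction.

```python
from typing import Optional

THRESHOLD = 5  # hypothetical suppression threshold; real systems vary


def suppress(row: list[int], threshold: int = THRESHOLD) -> list[Optional[int]]:
    """Primary suppression: hide any count below the threshold."""
    return [None if c < threshold else c for c in row]


def recover(published: list[Optional[int]], total: int) -> list[Optional[int]]:
    """Fill in a single suppressed cell by subtracting visible cells from the total."""
    hidden = [i for i, c in enumerate(published) if c is None]
    if len(hidden) != 1:
        # Zero or several hidden cells: this simple subtraction cannot pin them down.
        return published
    visible_sum = sum(c for c in published if c is not None)
    filled = list(published)
    filled[hidden[0]] = total - visible_sum
    return filled


true_row = [12, 1, 30]        # made-up counts for illustration only
row_total = sum(true_row)     # assumed to be published with the table
published_row = suppress(true_row)
recovered_row = recover(published_row, row_total)
print(published_row, recovered_row)
```

When two or more cells in the same row are hidden, this subtraction no longer yields exact values, which is the motivation for suppressing additional cells beyond those that fall below the threshold.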

Acknowledgments

This project was partially supported by Award Number K01MH087219 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health.

Author information

Corresponding author

Correspondence to Ofer Harel.

Ethics declarations

Conflict of interest

None.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Matthews, G.J., Harel, O. & Aseltine, R.H. Privacy protection and aggregate health data: a review of tabular cell suppression methods (not) employed in public health data systems. Health Serv Outcomes Res Method 16, 258–270 (2016). https://doi.org/10.1007/s10742-016-0162-8

  • DOI: https://doi.org/10.1007/s10742-016-0162-8
