Abstract
As the amount of data generated continues to grow, protecting individuals’ privacy is an increasing concern, and a large body of research on methods of statistical disclosure control has developed in response. Some of these methods release a randomized version of the data rather than the actual data. While such methods offer some protection, private information can still be disclosed, and quantifying the level of privacy they provide is often difficult. A method for assessing privacy using the receiver operating characteristic (ROC) curve, based on ideas related to differential privacy, was proposed previously. However, that method was only demonstrated for univariate randomized releases. Here, the ROC-based privacy measure is extended to the release of randomized vectors.
Acknowledgments
This project was partially supported by Award Number K01MH087219 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health.
Cite this article
Matthews, G.J., Harel, O. Assessing the privacy of randomized vector-valued queries to a database using the area under the receiver operating characteristic curve. Health Serv Outcomes Res Method 12, 141–155 (2012). https://doi.org/10.1007/s10742-012-0093-y
DOI: https://doi.org/10.1007/s10742-012-0093-y