Abstract
One of the most pressing issues in the confidentiality literature is the quantification of privacy. One proposal, \( \epsilon \)-differential privacy, moves away from absolute guarantees of privacy to relative guarantees. However, selecting an appropriate \( \epsilon \) is difficult because its interpretation is unclear. Further, different privacy-preserving techniques cannot be compared directly by simply comparing their respective values of \( \epsilon \). The aim of this work is to provide a measure that is more easily interpreted and allows direct comparison across different privacy schemes. In turn, this will inform policy debate about how much privacy is acceptable. Our proposal sets the problem in a hypothesis testing framework and uses the area under the receiver-operator characteristic (ROC) curve as a measure of privacy.
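As a rough illustration of the idea, the sketch below treats an attacker's task as a test between two hypotheses (a target record is absent from, or present in, the database) and estimates the area under the ROC curve of that test. It is not the procedure from the article, whose details are not given in the abstract. It assumes a counting query released through the Laplace mechanism with scale \( 1/\epsilon \), and the names `laplace_noise`, `empirical_auc`, `auc_for_epsilon`, and the counts 10 and 11 are all hypothetical.

```python
import bisect
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def empirical_auc(scores_h0, scores_h1):
    # Mann-Whitney estimate of P(score under H1 > score under H0),
    # counting ties as one half. This equals the area under the ROC curve
    # of the test that thresholds the released value.
    ordered = sorted(scores_h0)
    total = 0.0
    for s in scores_h1:
        below = bisect.bisect_left(ordered, s)
        ties = bisect.bisect_right(ordered, s) - below
        total += below + 0.5 * ties
    return total / (len(scores_h0) * len(scores_h1))


def auc_for_epsilon(epsilon, n_draws=2000, seed=0):
    # Assumption: a counting query has sensitivity 1, so the Laplace
    # mechanism adds noise with scale 1 / epsilon.
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    count_without, count_with = 10, 11  # hypothetical true counts
    h0 = [count_without + laplace_noise(scale, rng) for _ in range(n_draws)]
    h1 = [count_with + laplace_noise(scale, rng) for _ in range(n_draws)]
    return empirical_auc(h0, h1)


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0):
        print(f"epsilon={eps}: estimated AUC = {auc_for_epsilon(eps):.3f}")
```

Under these assumptions an AUC near 0.5 means the attacker can do little better than guessing, while an AUC near 1 means the release almost always reveals whether the record is present. Unlike \( \epsilon \), this scale is bounded and reads the same way whichever mechanism produced the release.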
Cite this article
Matthews, G.J., Harel, O. & Aseltine, R.H. Assessing database privacy using the area under the receiver-operator characteristic curve. Health Serv Outcomes Res Method 10, 1–15 (2010). https://doi.org/10.1007/s10742-010-0061-3
DOI: https://doi.org/10.1007/s10742-010-0061-3