Assessing database privacy using the area under the receiver-operator characteristic curve

Health Services and Outcomes Research Methodology

Abstract

One of the most pressing issues in the confidentiality literature is the quantification of privacy. One proposal, \( \epsilon \)-differential privacy, moves away from absolute guarantees of privacy to relative guarantees. However, selecting an appropriate \( \epsilon \) is difficult because its interpretation is unclear. Further, different privacy-preserving techniques cannot be compared directly by simply comparing their respective values of \( \epsilon \). The aim of this work is to provide a measure that allows direct comparison across different privacy schemes and is more easily interpreted. In turn, this will inform policy debate about how much privacy is acceptable. Our proposal sets the problem in a hypothesis testing framework and uses the area under the receiver-operator characteristic (ROC) curve as a measure of privacy.
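The abstract does not spell out the construction, so the following is only an illustrative sketch of how a hypothesis-testing view of privacy might yield an AUC. It assumes things that are not in the preview text: a count query released through the Laplace mechanism with noise scale \( 1/\epsilon \), a hypothetical true count of 10, and an adversary who thresholds the released value to decide whether one particular record is absent (null) or present (alternative). All function names are made up for the example.

```python
import bisect
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(true_count, epsilon, rng):
    # Assumed release mechanism: Laplace noise with scale 1/epsilon,
    # the standard choice for a count query with sensitivity 1.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def empirical_auc(null_draws, alt_draws):
    # Mann-Whitney estimate of P(alt > null), counting ties as one half.
    ordered = sorted(null_draws)
    wins = 0.0
    for value in alt_draws:
        below = bisect.bisect_left(ordered, value)
        ties = bisect.bisect_right(ordered, value) - below
        wins += below + 0.5 * ties
    return wins / (len(null_draws) * len(alt_draws))


def privacy_auc(epsilon, n_draws=20000, seed=0):
    # Simulate releases from two neighboring databases and measure how
    # well the released value separates them.
    rng = random.Random(seed)
    count_without = 10  # hypothetical count when the target record is absent
    null_draws = [release(count_without, epsilon, rng) for _ in range(n_draws)]
    alt_draws = [release(count_without + 1, epsilon, rng) for _ in range(n_draws)]
    return empirical_auc(null_draws, alt_draws)


if __name__ == "__main__":
    for eps in (0.1, 1.0, 5.0):
        print(f"epsilon={eps}: AUC ~ {privacy_auc(eps):.3f}")
```

Under these assumptions, an AUC near 0.5 means the adversary does little better than guessing, while an AUC near 1 means the presence of the record is nearly revealed. Because the AUC sits on the same scale for any release mechanism that can be simulated this way, it offers one possible common yardstick across schemes.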

Corresponding author

Correspondence to Ofer Harel.

Cite this article

Matthews, G.J., Harel, O. & Aseltine, R.H. Assessing database privacy using the area under the receiver-operator characteristic curve. Health Serv Outcomes Res Method 10, 1–15 (2010). https://doi.org/10.1007/s10742-010-0061-3

  • DOI: https://doi.org/10.1007/s10742-010-0061-3
