Data Mining and Knowledge Discovery, Volume 11, Issue 2, pp 181–193

Probabilistic Information Loss Measures in Confidentiality Protection of Continuous Microdata

  • Josep M. Mateo-Sanz
  • Josep Domingo-Ferrer
  • Francesc Sebé

Abstract

Inference control for protecting the privacy of microdata (individual data) should try to optimize the tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). Whereas risk measures are bounded between 0 and 1, information loss measures proposed in the literature for continuous data are unbounded, which makes it awkward to trade off information loss for disclosure risk. We propose in this paper to use probabilities to define bounded information loss measures for continuous microdata.
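The abstract does not spell out the construction, so the following is only a minimal sketch of how a probability can turn an unbounded discrepancy into a measure bounded in [0, 1]. It assumes the statistic of interest is the sample mean, that the estimator is approximately normal, and that its standard error is estimated from the masked data; the function name `probabilistic_loss_mean` is hypothetical and not taken from the paper.

```python
import math
from statistics import mean, variance


def probabilistic_loss_mean(original, masked):
    """Bounded loss in [0, 1] for the sample mean (illustrative sketch).

    Assumption: the standardized discrepancy between the masked and
    original means is compared against a standard normal variable Z,
    and the loss is 2 * P(0 <= Z <= z).
    """
    n = len(masked)
    # Estimated standard error of the mean, computed from the masked data.
    se = math.sqrt(variance(masked) / n)
    diff = abs(mean(masked) - mean(original))
    if se == 0.0:
        # Degenerate case: constant masked data.
        return 0.0 if diff == 0.0 else 1.0
    z = diff / se
    # For standard normal Z, 2 * P(0 <= Z <= z) equals erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))


original = [1.0, 2.0, 3.0, 4.0, 5.0]
slightly_masked = [1.1, 2.0, 2.9, 4.2, 5.0]
heavily_masked = [x + 100.0 for x in original]

print(probabilistic_loss_mean(original, original))         # no discrepancy
print(probabilistic_loss_mean(original, slightly_masked))  # small loss
print(probabilistic_loss_mean(original, heavily_masked))   # loss near 1
```

Because the result always lies between 0 and 1, it sits on the same scale as a disclosure risk measure, which is the property the abstract argues unbounded loss measures lack.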

Keywords

database security, privacy, statistical disclosure control, microdata protection, information loss measures

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Josep M. Mateo-Sanz (1)
  • Josep Domingo-Ferrer (1)
  • Francesc Sebé (1)
  1. Department of Computer Engineering and Mathematics, Rovira i Virgili University of Tarragona, Tarragona, Spain
