Re-identification Methods for Masked Microdata

  • Conference paper
Privacy in Statistical Databases (PSD 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3050)

Abstract

Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly no distortion in a few aggregate or marginal statistics. In record linkage (as in nearest neighbor methods), metrics are used to determine how close the value of a variable in one record is to the value of the corresponding variable in another record. If a sufficient number of variables in one record have values that are close to values in another record, then the records may be a match and correspond to the same entity. This paper shows that it is possible to create metrics for which re-identification is straightforward in many situations where masking is currently done. We begin by demonstrating how to quickly construct metrics for continuous variables that have been micro-aggregated one at a time using conventional methods. We extend the methods to situations where rank swapping is performed and discuss the situation where several continuous variables are micro-aggregated simultaneously. We close by indicating how metrics might be created for situations of synthetic microdata satisfying several sets of analytic constraints.
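
To make the first scenario in the abstract concrete, the following is a minimal Python sketch of one-variable-at-a-time micro-aggregation followed by a nearest-neighbor linkage attempt. It is an illustration under stated assumptions, not the metric constructed in the paper: the function names (`microaggregate`, `mask_records`, `link`), the fixed group size `k`, the range-scaled absolute-difference distance, and the premise that the intruder holds the original values are all hypothetical choices made for this example.

```python
from statistics import mean


def microaggregate(values, k):
    """Mask one variable: sort it, cut it into runs of k, replace each value by its run mean.

    Assumed scheme (individual ranking with fixed group size k); a short
    final run is folded into the previous one so no group is smaller than k.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + k] for s in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    masked = [0.0] * len(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
    return masked


def mask_records(records, k):
    """Micro-aggregate every variable separately, keeping record order."""
    columns = list(zip(*records))
    masked_columns = [microaggregate(list(col), k) for col in columns]
    return [tuple(row) for row in zip(*masked_columns)]


def link(originals, masked):
    """For each original record, return the index of the nearest masked record.

    Hypothetical metric: sum over variables of |difference| divided by the
    variable's range in the original file (1.0 if the range is zero).
    """
    spreads = [(max(col) - min(col)) or 1.0 for col in zip(*originals)]

    def distance(a, b):
        return sum(abs(x - y) / s for x, y, s in zip(a, b, spreads))

    return [
        min(range(len(masked)), key=lambda j: distance(rec, masked[j]))
        for rec in originals
    ]


# Toy file: four records, two continuous variables, groups of size two.
records = [(1, 5), (2, 50), (10, 6), (11, 51)]
masked = mask_records(records, k=2)
print(masked)                 # [(1.5, 5.5), (1.5, 50.5), (10.5, 5.5), (10.5, 50.5)]
print(link(records, masked))  # [0, 1, 2, 3]: every record links to its own masked version
```

In this toy file each variable on its own hides a record among two candidates, but the groups differ from variable to variable, so the combination of masked values is unique to each record and a simple distance recovers every link. Real files are larger and noisier, and the rate of correct links depends on the number of variables, the group size, and the correlation among variables.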

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Winkler, W.E. (2004). Re-identification Methods for Masked Microdata. In: Domingo-Ferrer, J., Torra, V. (eds) Privacy in Statistical Databases. PSD 2004. Lecture Notes in Computer Science, vol 3050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25955-8_17

  • DOI: https://doi.org/10.1007/978-3-540-25955-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22118-0

  • Online ISBN: 978-3-540-25955-8

  • eBook Packages: Springer Book Archive
