Data Swapping: Variations on a Theme by Dalenius and Reiss

  • Stephen E. Fienberg
  • Julie McIntyre
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3050)

Abstract

Data swapping, a term introduced in 1978 by Dalenius and Reiss for a new method of statistical disclosure protection in confidential data bases, has taken on new meanings and been linked to new statistical methodologies over the intervening twenty-five years. This paper revisits the original (1982) published version of the Dalenius-Reiss data swapping paper and then traces the development of statistical disclosure limitation methods that can be thought of as rooted in the original concept. The emphasis here, as in the original contribution, is on both disclosure protection and the release of statistically usable data bases.
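The abstract describes data swapping only in outline. As a minimal sketch of the basic idea, assuming a toy data base of categorical records with attributes (A, B, C) and a simple swapping rule chosen for illustration rather than taken from the paper, exchanging the value of one attribute between two records that agree on another attribute leaves some marginal tables unchanged while perturbing others. The helpers `margin`, `swap_attribute`, and `random_swap` and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def margin(records, attrs):
    """Marginal table: counts of each combination of values at positions attrs."""
    return Counter(tuple(r[k] for k in attrs) for r in records)

def swap_attribute(records, i, j, attr):
    """Copy of records with the value of attribute attr exchanged between records i and j."""
    out = [list(r) for r in records]
    out[i][attr], out[j][attr] = out[j][attr], out[i][attr]
    return [tuple(r) for r in out]

def random_swap(records, attr, match, rng):
    """Hypothetical rule: swap attr between a random pair of records that agree on
    attribute match and differ on attr; returns an unchanged copy if no pair qualifies."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if records[i][match] == records[j][match] and records[i][attr] != records[j][attr]
    ]
    if not pairs:
        return list(records)
    i, j = rng.choice(pairs)
    return swap_attribute(records, i, j, attr)

# Toy records (A, B, C); positions 0, 1, 2 index A, B, C.
data = [("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c2"), ("a2", "b2", "c1")]
swapped = swap_attribute(data, 0, 1, attr=2)  # records 0 and 1 share A = "a1"
print(margin(data, (0, 2)) == margin(swapped, (0, 2)))  # True: [AC] margin preserved
print(margin(data, (1, 2)) == margin(swapped, (1, 2)))  # False: [BC] margin changed

rng = random.Random(0)
perturbed = random_swap(data, attr=2, match=0, rng=rng)
```

Under this rule, swapping C between two records that share the same value of A preserves every one-way margin and the [AB] and [AC] two-way margins, but can change [BC]. This is one way a swap can alter individual records while keeping selected summary tables intact.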

Keywords

Bounds table cell entries, Constrained perturbation, Contingency tables, Marginal releases, Minimal sufficient statistics, Rank swapping

References

  1. Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proceedings of the 2000 IEEE Symposium on Security and Privacy, pp. 439–450 (2000)
  2. Aoki, S., Takemura, A.: Minimal basis for connected Markov chain over 3×3×K contingency tables with fixed two-dimensional marginals. Australian and New Zealand Journal of Statistics 45, 229–249 (2003)
  3. Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge (1975)
  4. Burridge, J.: Information preserving statistical obfuscation. Journal of Official Statistics 13, 321–327 (2003)
  5. Carlson, M., Salabasis, M.: A data-swapping technique for generating synthetic samples; A method for disclosure control. Research in Official Statistics 5, 35–64 (2002)
  6. Dalenius, T.: Towards a methodology for statistical disclosure control. Statistisk Tidskrift 5, 429–444 (1977)
  7. Dalenius, T.: Controlling Invasion of Privacy in Surveys. Statistics Sweden, Stockholm (1988)
  8. Dalenius, T., Reiss, S.P.: Data-swapping: A technique for disclosure control (extended abstract). In: American Statistical Association, Proceedings of the Section on Survey Research Methods, Washington, DC, pp. 191–194 (1978)
  9. Dalenius, T., Reiss, S.P.: Data-swapping: A technique for disclosure control. Journal of Statistical Planning and Inference 6, 73–85 (1982)
  10. Diaconis, P., Sturmfels, B.: Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 26, 363–397 (1998)
  11. Dobra, A.: Markov bases for decomposable graphical models. Bernoulli 9, 1–16 (2003)
  12. Dobra, A., Fienberg, S.E.: Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proceedings of the National Academy of Sciences 97, 11885–11892 (2000)
  13. Dobra, A., Fienberg, S.E.: Bounds for cell entries in contingency tables induced by fixed marginal totals. Statistical Journal of the United Nations ECE 18, 363–371 (2001)
  14. Domingo-Ferrer, J., Mateo-Sanz, J.M.: On resampling for statistical confidentiality in contingency tables. Computers & Mathematics with Applications 38, 13–32 (1999)
  15. Domingo-Ferrer, J., Torra, V.: Disclosure control methods and information loss for microdata. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access. Theory and Practical Applications for Statistical Agencies, pp. 91–110. North-Holland, Amsterdam (2001)
  16. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (eds.) Confidentiality, Disclosure, and Data Access. Theory and Practical Applications for Statistical Agencies, pp. 111–133. North-Holland, Amsterdam (2001)
  17. Doyle, P., Lane, J.I., Theeuwes, J.J.M., Zayatz, L.V. (eds.): Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies. North-Holland, Amsterdam (2001)
  18. Duncan, G.T., Fienberg, S.E., Krishnan, R., Padman, R., Roehrig, S.F.: Disclosure Limitation Methods and Information Loss for Tabular Data. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 135–166. North-Holland, Amsterdam (2001)
  19. Evfimievski, A., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. In: Proceedings 2003 ACM PODS Symposium on Principles of Database Systems (2003)
  20. Federal Committee on Statistical Methodology: Report on Statistical Disclosure and Disclosure-Avoidance Techniques. Statistical Policy Working Paper 2. Subcommittee on Disclosure-Avoidance Techniques. U.S. Department of Commerce, Washington, DC (1978)
  21. Federal Committee on Statistical Methodology: Report on Statistical Disclosure Limitation Methodology. Statistical Policy Working Paper 22. Subcommittee on Disclosure Limitation Methodology. Office of Management and Budget, Executive Office of the President, Washington, DC (1994)
  22. Fienberg, S.E.: Comment on a paper by M. Carlson and M. Salabasis: 'A data-swapping technique using ranks - A method for disclosure control'. Research in Official Statistics 5, 65–70 (2002)
  23. Fienberg, S.E., Makov, U.E., Meyer, M.M., Steele, R.J.: Computing the exact distribution for a multi-way contingency table conditional on its marginal totals. In: Saleh, A.K.E. (ed.) Data Analysis from Statistical Foundations: Papers in Honor of D.A.S. Fraser, pp. 145–165. Nova Science Publishing, New York (2001)
  24. Fienberg, S.E., Steele, R.J., Makov, U.E.: Statistical notions of data disclosure avoidance and their relationship to traditional statistical methodology: Data swapping and loglinear models. In: Proceedings of the Bureau of the Census Annual Research Conference, pp. 87–105. US Bureau of the Census, Washington (1996)
  25. Fienberg, S.E., Steele, R.J., Makov, U.E.: Disclosure limitation using perturbation and related methods for categorical data (with discussion). Journal of Official Statistics 14, 485–511 (1998)
  26. Gomatam, S., Karr, A.F.: Distortion measures for categorical data swapping. Technical Report 132, National Institute of Statistical Sciences, Research Triangle Park, NC (2003)
  27. Gomatam, S., Karr, A.F., Sanil, A.: A risk-utility framework for categorical data swapping. Technical Report 132, National Institute of Statistical Sciences, Research Triangle Park, NC (2003)
  28. Gomatam, S., Karr, A.F., Chunhua, C.L., Sanil, A.: Data swapping: A risk-utility framework and web service implementation. Technical Report 134, National Institute of Statistical Sciences, Research Triangle Park, NC (2003)
  29. Gomatam, S., Karr, A.F., Sanil, A.: Data swapping as a decision problem. Technical Report 140, National Institute of Statistical Sciences, Research Triangle Park, NC (2004)
  30. Gouweleeuw, J.M., Kooiman, P., Willenborg, L.C.R.J., de Wolf, P.P.: Post randomization for statistical disclosure control: Theory and implementation. Journal of Official Statistics 14, 463–478 (1998)
  31. Griffin, R., Navarro, A., Flores-Baez, L.: Disclosure avoidance for the 1990 census. In: Proceedings of the Section on Survey Research, American Statistical Association, pp. 516–521 (1989)
  32. Karr, A.F., Dobra, A., Sanil, A.P.: Table servers protect confidentiality in tabular data releases. Communications of the ACM 46, 57–58 (2003)
  33. Muralidhar, K., Sarathy, R.: Masking numerical data: Past, present, and future. In: Presentation to Confidentiality and Data Access Committee of the Federal Committee on Statistical Methodology, Washington, DC, April 2003 (2003a)
  34. Muralidhar, K., Sarathy, R.: Access, data utility and privacy. Summary from NSF Workshop on Confidentiality, Washington, DC, May 2003 (2003b)
  35. Moore, R.A.: Controlled data-swapping techniques for masking public use microdata sets. Statistical Research Division Report Series, RR96-04, U.S. Bureau of the Census (1996)
  36. Navarro, A., Flores-Baez, L., Thompson, J.: Results of Data Switching Simulation. Presented at the Spring meeting of the American Statistical Association and Population Statistics Census Advisory Committees (1988)
  37. Office of National Statistics: 2001 census disclosure control. Memorandum AG(01)06 dated November 27 (2001)
  38. Reiss, S.P.: Practical data-swapping: The first steps. ACM Transactions on Database Systems 9, 20–37 (1984)
  39. Reiss, S.P., Post, M.J., Dalenius, T.: Non-reversible privacy transformations. In: Proceedings of the ACM Symposium on Principles of Database Systems, Los Angeles, California, March 29-31, pp. 139–146 (1982)
  40. Schlörer, J.: Security of statistical databases: multidimensional transformation. ACM Transactions on Database Systems 6, 95–112 (1981)
  41. Takemura, A.: Local recoding and record swapping by maximum weight matching for disclosure control of microdata sets. Journal of Official Statistics 18, 275–289 (2002)
  42. Trottini, M.: Decision Models for Disclosure Limitation. Unpublished Ph.D. Dissertation, Department of Statistics, Carnegie Mellon University (2003)
  43. Warner, S.L.: Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 60, 63–69 (1965)
  44. Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics, vol. 155. Springer, New York (2001)
  45. Zayatz, L.: SDC in the 2000 U.S. Decennial census. In: Domingo-Ferrer, J. (ed.) Inference Control in Statistical Databases. LNCS, vol. 2316, pp. 183–202. Springer, Heidelberg (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Stephen E. Fienberg (1)
  • Julie McIntyre (2)
  1. Department of Statistics, Center for Automated Learning and Discovery, Center for Computer Communications and Security, Carnegie Mellon University, Pittsburgh, USA
  2. Department of Statistics, Carnegie Mellon University, Pittsburgh, USA