Statistics and Computing, Volume 13, Issue 4, pp 343–354

Disclosure risk assessment in statistical microdata protection via advanced record linkage

  • Josep Domingo-Ferrer
  • Vicenç Torra

Abstract

The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical methodology that is applicable to every conceivable masking method. The intruder is assumed to know an external data set, whose records are to be linked to those in the protected data set; the percentage of correctly linked record pairs is a measure of disclosure risk. This paper reviews conventional record linkage, which assumes shared variables between the external and the protected data sets, and then shows that record linkage, and thus disclosure, is still possible without shared variables.

statistical disclosure control; record linkage; disclosure risk for microdata; re-identification
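
The risk measure described in the abstract (the percentage of correctly linked record pairs) can be illustrated with a minimal sketch. It assumes the conventional setting with shared variables and uses one simple variant, nearest-neighbour linkage under Euclidean distance, which the abstract does not specify. The function names `link_records` and `disclosure_risk` and the toy data are hypothetical, and the true correspondence is assumed to be known by position (record i in the external set corresponds to record i in the protected set), as it would be for the data protector running the assessment.

```python
import math

def link_records(external, protected):
    """Link each external record to its nearest protected record.

    Assumption: both data sets share the same numeric variables,
    already on comparable scales, and Euclidean distance is used.
    Returns, for each external record, the index of the linked protected record.
    """
    links = []
    for ext in external:
        nearest = min(range(len(protected)),
                      key=lambda j: math.dist(ext, protected[j]))
        links.append(nearest)
    return links

def disclosure_risk(external, protected):
    """Percentage of external records linked to their true counterpart.

    Assumption: record i in `external` and record i in `protected`
    describe the same respondent (ground truth known to the protector).
    """
    links = link_records(external, protected)
    correct = sum(1 for i, j in enumerate(links) if i == j)
    return 100.0 * correct / len(external)

# Hypothetical toy data: the intruder's external records and a masked version.
external = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 3.0)]
protected = [(0.1, -0.1), (1.2, 0.9), (0.4, 0.2), (2.9, 3.2)]

links = link_records(external, protected)
risk = disclosure_risk(external, protected)
print(links)  # [0, 1, 1, 3]
print(risk)   # 75.0
```

In this toy case the third record was perturbed strongly enough that it is linked to the wrong protected record, so three of four links are correct. A stronger masking method would lower this percentage, typically at some cost in data utility.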

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Josep Domingo-Ferrer (1)
  • Vicenç Torra (2)
  1. Dept. Comput. Eng. and Maths, ETSE, Universitat Rovira i Virgili, Tarragona, Catalonia
  2. Institut d'Investigació en Intel·ligència Artificial, CSIC, Campus UAB, Bellaterra, Catalonia