Disclosure risk assessment in statistical microdata protection via advanced record linkage

Statistics and Computing

Abstract

The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical methodology that is applicable to every conceivable masking method. The intruder is assumed to know an external data set whose records are to be linked to those in the protected data set; the percentage of correctly linked record pairs is a measure of disclosure risk. This paper reviews conventional record linkage, which assumes that the external and protected data sets share variables, and then shows that record linkage, and thus disclosure, is still possible without shared variables.
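As a rough illustration of the risk measure described above, the following Python sketch implements distance-based record linkage. This is one conventional variant that relies on shared variables. It does not reproduce the paper's method for linkage without shared variables. The sketch makes three assumptions of its own, none of which come from the paper:

- The intruder's external file holds the original values of the shared numerical variables.
- `external[i]` and `protected[i]` describe the same individual, so correct links can be counted.
- Variables are standardized with the external file's mean and standard deviation.

The names `standardize` and `linkage_risk` are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows, means, sds):
    """Scale each variable using the given means and standard deviations.
    Constant variables (sd == 0) are mapped to 0.0."""
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]


def linkage_risk(external, protected):
    """Percentage of external records whose nearest protected record
    (Euclidean distance over the shared, standardized variables) is the true match.
    Assumption: external[i] and protected[i] refer to the same individual."""
    if not external or len(external) != len(protected):
        raise ValueError("need two non-empty files of equal length")
    # Standardize both files with the external file's statistics (a design choice).
    cols = list(zip(*external))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    ext = standardize(external, means, sds)
    prot = standardize(protected, means, sds)

    correct = 0
    for i, x in enumerate(ext):
        # Link to the closest protected record; ties go to the lowest index.
        j = min(range(len(prot)), key=lambda k: math.dist(x, prot[k]))
        if j == i:
            correct += 1
    return 100.0 * correct / len(ext)


# Hypothetical toy data: (age, income) before and after mild additive noise.
original = [[30, 2100.0], [45, 3400.0], [52, 5200.0], [28, 1800.0]]
masked = [[31, 2150.0], [44, 3350.0], [53, 5100.0], [29, 1850.0]]
print(linkage_risk(original, masked))  # → 100.0
```

With such light masking every record is re-identified. Stronger masking moves protected records further from their originals, and the percentage of correct links should fall accordingly. Euclidean distance on standardized values is only one possible choice of linkage rule.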

About this article

Cite this article

Domingo-Ferrer, J., Torra, V. Disclosure risk assessment in statistical microdata protection via advanced record linkage. Statistics and Computing 13, 343–354 (2003). https://doi.org/10.1023/A:1025666923033
