Abstract
The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical methodology that is applicable to every conceivable masking method. The intruder is assumed to know an external data set whose records are to be linked to those in the protected data set; the percentage of correctly linked record pairs is a measure of disclosure risk. This paper reviews conventional record linkage, which assumes variables shared between the external and the protected data sets, and then shows that record linkage, and thus disclosure, is still possible without shared variables.
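The record-linkage risk measure described above can be illustrated with a minimal sketch of distance-based record linkage. It assumes that the external and protected files share the same numerical variables in the same column order, that record i of the protected file is the masked version of record i of the external file (a correspondence known only to the data protector, who uses it to score the links), and that distances are Euclidean on variables standardized with the external file's means and standard deviations. The function names are hypothetical, and the sketch covers only the conventional shared-variables case, not linkage without shared variables.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

Record = Sequence[float]


def _standardize(data: List[Record], ref: List[Record]) -> List[List[float]]:
    """Z-score each column of `data` using the mean and population std of `ref`."""
    n_vars = len(ref[0])
    mus = [mean(r[j] for r in ref) for j in range(n_vars)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [pstdev([r[j] for r in ref]) or 1.0 for j in range(n_vars)]
    return [[(r[j] - mus[j]) / sds[j] for j in range(n_vars)] for r in data]


def distance_based_linkage(external: List[Record], protected: List[Record]) -> List[int]:
    """Link each external record to the index of its nearest protected record."""
    ext = _standardize(external, external)
    prot = _standardize(protected, external)
    links = []
    for x in ext:
        # Nearest neighbour under Euclidean distance; ties go to the lowest index.
        best = min(range(len(prot)), key=lambda k: math.dist(x, prot[k]))
        links.append(best)
    return links


def linkage_disclosure_risk(external: List[Record], protected: List[Record]) -> float:
    """Percentage of external records linked to their true (same-index) counterpart."""
    links = distance_based_linkage(external, protected)
    correct = sum(1 for i, j in enumerate(links) if i == j)
    return 100.0 * correct / len(external)


if __name__ == "__main__":
    # Hypothetical (age, income) microdata and a lightly noise-masked release.
    original = [[30, 1200], [45, 3100], [52, 2500], [28, 900]]
    masked = [[31, 1180], [44, 3150], [50, 2480], [29, 950]]
    # Small perturbations leave every record closest to its original: prints 100.0
    print(linkage_disclosure_risk(original, masked))
```

The higher the percentage, the less the masking method protects against an intruder who holds the external file; a method that perturbs records enough to push them closer to other records lowers the score, typically at some cost in utility.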
Cite this article
Domingo-Ferrer, J., Torra, V. Disclosure risk assessment in statistical microdata protection via advanced record linkage. Statistics and Computing 13, 343–354 (2003). https://doi.org/10.1023/A:1025666923033