Using Mahalanobis Distance-Based Record Linkage for Disclosure Risk Assessment

  • Conference paper
Privacy in Statistical Databases (PSD 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4302)

Abstract

Distance-based record linkage (DBRL) is a common approach to empirically assessing the disclosure risk in microdata protected by statistical disclosure control (SDC). Usually, the Euclidean distance is used. In this paper, we explore the potential advantages of using the Mahalanobis distance for DBRL. We illustrate our point for partially synthetic microdata and show that, in some cases, Mahalanobis DBRL can yield a very high re-identification percentage, far higher than that achieved by other record linkage methods.
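
To make the idea concrete, the following is a minimal sketch of distance-based record linkage under either metric. It is not the paper's implementation: the function names (`link_records`, `reidentification_rate`), the toy data, and in particular the choice to scale differences by the sum of the two files' covariance matrices (ignoring any cross-covariance between the original and protected files) are assumptions made for illustration. Each protected record is linked to its nearest original record, and the re-identification percentage is the share of links that hit the true source record.

```python
import numpy as np


def link_records(original, protected, metric="mahalanobis"):
    """Link each protected record to the index of its nearest original record."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    # Pairwise differences, shape (n_protected, n_original, n_attributes).
    diffs = protected[:, None, :] - original[None, :, :]
    if metric == "euclidean":
        weight = np.eye(original.shape[1])
    else:
        # Assumption for this sketch: scale by Var(original) + Var(protected),
        # i.e. the cross-covariance between the two files is taken as zero.
        cov = np.atleast_2d(np.cov(original, rowvar=False))
        cov = cov + np.atleast_2d(np.cov(protected, rowvar=False))
        weight = np.linalg.pinv(cov)
    # Squared distance d^2 = diff^T * weight * diff for every record pair.
    d2 = np.einsum("mnp,pq,mnq->mn", diffs, weight, diffs)
    return d2.argmin(axis=1)


def reidentification_rate(original, protected, metric="mahalanobis"):
    """Fraction of protected records linked to their true original record.

    Assumes row i of `protected` was derived from row i of `original`.
    """
    links = link_records(original, protected, metric)
    return float(np.mean(links == np.arange(len(links))))


# Hypothetical toy data: two strongly correlated attributes, with the
# "protected" file produced by adding independent noise to each attribute.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
original = np.column_stack([x, x + 0.1 * rng.normal(size=200)])
protected = original + 0.05 * rng.normal(size=original.shape)

for metric in ("euclidean", "mahalanobis"):
    print(metric, reidentification_rate(original, protected, metric))
```

The Euclidean case is the same computation with the identity matrix as the weight, so the two metrics differ only in whether attribute variances and correlations are taken into account when ranking candidate matches.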

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Torra, V., Abowd, J.M., Domingo-Ferrer, J. (2006). Using Mahalanobis Distance-Based Record Linkage for Disclosure Risk Assessment. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_20

  • DOI: https://doi.org/10.1007/11930242_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49330-3

  • Online ISBN: 978-3-540-49332-7

  • eBook Packages: Computer Science, Computer Science (R0)
