Analysis of the Univariate Microaggregation Disclosure Risk
Microaggregation is a protection method used by statistical agencies to limit the disclosure risk of confidential information. Formally, microaggregation assigns each original datum to a small cluster and then replaces it with the centroid of that cluster. Because each cluster contains at least k records, microaggregation can be regarded as preserving k-anonymity. However, this holds only when multivariate microaggregation is applied and all variables are microaggregated at the same time.
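To make the mechanism concrete, the following is a minimal sketch of univariate microaggregation on a single numeric variable. It assumes a simple fixed-size grouping (sort the values, cut them into consecutive groups of k, let the last group absorb any remainder) and uses the group mean as the centroid. The function name `microaggregate` and the grouping rule are illustrative assumptions, not necessarily the procedure used in the paper.

```python
from statistics import mean


def microaggregate(values, k):
    """Fixed-size univariate microaggregation (illustrative sketch).

    Sorts the values, splits them into consecutive groups of k records
    (the last group absorbs the remainder, so every group has >= k records),
    and replaces each value with the mean of its group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Indices of the records in ascending order of their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k

    protected = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            protected[i] = centroid
    return protected


if __name__ == "__main__":
    original = [10, 1, 3, 2, 12, 11, 20]
    print(microaggregate(original, k=3))
```

In the protected output every distinct value is shared by at least k records, which is the sense in which a single microaggregated variable satisfies k-anonymity.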
When different variables are protected using univariate microaggregation, k-anonymity is ensured only at the variable level. As a result, the effective k-anonymity decreases for most records, and private information may leak. For this reason, analyzing the disclosure risk remains meaningful for microaggregation.
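The drop in effective k-anonymity can be illustrated with a small sketch. The two protected columns below are made-up toy data: each one, taken alone, has every value shared by three records (k = 3 at the variable level), but the combinations of values across both columns are shared by far fewer records. The helper name `effective_k` is hypothetical.

```python
from collections import Counter


def effective_k(*protected_columns):
    """Return (smallest group size, group size per record) over the
    combination of all protected columns.

    A record's group is the set of records sharing exactly the same
    tuple of protected values across every column.
    """
    combos = list(zip(*protected_columns))
    counts = Counter(combos)
    return min(counts.values()), [counts[c] for c in combos]


if __name__ == "__main__":
    # Toy data (assumed): each column is 3-anonymous on its own.
    a_protected = [2, 2, 2, 11, 11, 11]
    b_protected = [5, 8, 5, 8, 5, 8]

    k_real, per_record = effective_k(a_protected, b_protected)
    print("effective k:", k_real)
    print("group size per record:", per_record)
```

Here two of the six records end up alone in their combination of protected values, so they are no longer hidden among k records even though each variable was protected with k = 3.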
This paper proposes a new record linkage method for univariate microaggregation based on finding the optimal alignment between the sorted original and sorted protected variables. We show that our method, which uses a DTW distance to compute the optimal alignment, in many cases gives the intruder enough information to decide whether a link is correct. Note that standard record linkage methods never ensure the correctness of a link. Furthermore, we present experiments on two well-known data sets, which show that our method achieves better results (a larger number of correct links) than the best standard record linkage method.
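As a rough illustration of the alignment idea, the sketch below computes a textbook dynamic time warping (DTW) distance and warping path between a sorted original variable and a sorted protected variable. It assumes the absolute difference as the local cost and the standard three-step recurrence; the function name `dtw_alignment` and the toy data are assumptions, and the abstract does not specify the exact DTW variant or how the paper derives links from the alignment.

```python
def dtw_alignment(x, y):
    """Classic DTW between two numeric sequences (illustrative sketch).

    Returns the DTW distance and the warping path as a list of
    (index in x, index in y) pairs, from the first to the last element.
    """
    n, m = len(x), len(y)
    inf = float("inf")

    # cost[i][j] = minimal accumulated cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # match both elements
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
            )

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    original_sorted = [1, 2, 3, 10, 11, 12]
    protected_sorted = [2, 2, 2, 11, 11, 11]  # toy centroids, k = 3
    distance, path = dtw_alignment(original_sorted, protected_sorted)
    print("DTW distance:", distance)
    print("warping path:", path)
```

In this toy case the path pairs the three smallest original values with the first centroid and the three largest with the second, which is the kind of group-level correspondence an intruder could use as a starting point for linkage.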
Keywords: Privacy on Statistical Databases, Privacy Preserving Data Mining, Record Linkage, Microaggregation, DTW Distance