On Optimizing the k-Ward Micro-aggregation Technique for Secure Statistical Databases
We consider the problem of securing a statistical database by utilizing the well-known micro-aggregation strategy, and in particular, the k-Ward strategy introduced in  and utilized in . The latter scheme, which represents the state of the art, coalesces the sorted data attribute values into groups and, on being queried, reports the means of the corresponding groups. We demonstrate that such a scheme can be optimized on two fronts. First, we reduce the work done in evaluating the between-class distance matrix, so that each update requires only a constant number of distance computations. Second, and more importantly, we propose that the data set be partitioned recursively before the k-Ward strategy is invoked, and that the latter be invoked only on the "primitive" sub-groups that terminate the recursion. Our experimental results, obtained on two benchmark data sets, demonstrate a marked improvement. While the information loss is comparable to that of the k-Ward micro-aggregation technique proposed by Domingo-Ferrer et al., the computations required to achieve this loss are a fraction of those required by the latter, providing a computational advantage that sometimes exceeds 80% when only one of the enhancements is used, and more than 90% when both enhancements are invoked simultaneously.
Keywords: Distance Matrix, Data Vector, Information Loss, Statistical Database, Recursive Call
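The basic micro-aggregation idea mentioned in the abstract (sort the attribute values, coalesce them into groups, and report group means) can be illustrated with a minimal sketch. This is not the k-Ward algorithm or either of the paper's enhancements: it assumes a single numeric attribute and simple fixed-size groups of at least k records. The function names `microaggregate` and `information_loss` are hypothetical, and measuring information loss as the ratio of within-group to total sum of squares is an assumption, since the abstract does not define the measure.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group (illustrative only).

    Assumption: univariate data and fixed-size groups of k consecutive
    sorted values; the last group absorbs any remainder so that every
    group holds at least k records. This is NOT the k-Ward procedure.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n == 0:
        return []
    # Indices of the records in ascending order of their values.
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    num_groups = max(1, n // k)
    for g in range(num_groups):
        start = g * k
        end = (g + 1) * k if g < num_groups - 1 else n
        members = order[start:end]
        group_mean = mean(values[i] for i in members)
        for i in members:
            masked[i] = group_mean
    return masked


def information_loss(values, masked):
    """Within-group sum of squares divided by total sum of squares.

    Assumption: this ratio is used here only as a plausible loss measure.
    """
    overall = mean(values)
    sse = sum((v - m) ** 2 for v, m in zip(values, masked))
    sst = sum((v - overall) ** 2 for v in values)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [10, 1, 2, 11, 3, 12, 13]
    masked = microaggregate(data, k=3)
    print(masked)
    print(information_loss(data, masked))
```

With k = 3, the seven sample values fall into one group of three and one group of four, so every reported value is shared by at least three records.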
- 4. Baeyens, Y., Defays, D.: Estimation of variance loss following microaggregation by the individual ranking method. In: Proceedings of Statistical Data Protection 1998, pp. 101–108. Office for Official Publications of the European Communities, Luxembourg (1999)
- 5. Cuppen, M.: Source Data Perturbation in Statistical Disclosure Control. PhD thesis, Statistics Netherlands (2000)
- 6. Mateo-Sanz, J.M., Domingo-Ferrer, J.: A method for data-oriented multivariate microaggregation. In: Proceedings of Statistical Data Protection 1998, pp. 89–99. Office for Official Publications of the European Communities, Luxembourg (1999)
- 10. Solanas, A., Martínez-Ballesté, A., Domingo-Ferrer, J., Mateo-Sanz, J.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (2006)
- 11. Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada, Ottawa (1993)
- 12. Defays, D., Anwar, N.: Micro-aggregation: A generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, pp. 69–78. Office for Official Publications of the European Communities, Luxembourg (1995)
- 13. Solanas, A., Martínez-Ballesté, A.: V-MDAV: A multivariate microaggregation with variable group size. In: 17th COMPSTAT Symposium of the IASC, Rome (2006)
- 16. Fayyoumi, E., Oommen, B.J.: Enhancing k-Ward micro-aggregation for secure statistical databases using distance-based and recursive optimizations (unabridged version of this paper)
- 17. Brucker, P.: On the complexity of clustering problems. In: Hehn, R., Korte, B., Oettli, W. (eds.) Optimization and Operations Research, pp. 45–54 (1977)
- 18. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 113–134. Springer, Berlin (2002)
- 19. Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. Technical report, CASC PROJECT, Computational Aspects of Statistical Confidentiality (2002)