On Optimizing the k-Ward Micro-aggregation Technique for Secure Statistical Databases

  • Ebaa Fayyoumi
  • B. John Oommen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4058)


We consider the problem of securing a statistical database by means of the well-known micro-aggregation strategy, and in particular the k-Ward strategy introduced in [1] and utilized in [2]. The latter scheme, which represents the state of the art, coalesces the sorted data attribute values into groups and, on being queried, reports the means of the corresponding groups. We demonstrate that such a scheme can be optimized on two fronts. First, we minimize the computation done in evaluating the between-class distance matrix, so that updating it requires only a constant number of distance computations. Second, and more importantly, we propose that the data set be partitioned recursively before the k-Ward strategy is invoked, and that k-Ward be invoked only on the “primitive” sub-groups that terminate the recursion. Our experimental results, obtained on two benchmark data sets, demonstrate a marked improvement. While the information loss is comparable to that of the k-Ward micro-aggregation technique proposed by Domingo-Ferrer and Mateo-Sanz [2], the computations required to achieve this loss are a fraction of those required by the latter. The computational advantage sometimes exceeds 80% when only one of the two enhancements is used, and 90% when both are invoked simultaneously.
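To make the grouping-and-means idea concrete, the following is a minimal sketch of univariate micro-aggregation. It is not the k-Ward method of [1, 2] and not the optimized scheme of this paper: it uses plain fixed-size grouping of the sorted values as a simpler stand-in. The function names `microaggregate` and `information_loss` are hypothetical, and the loss measure shown (within-group sum of squares divided by total sum of squares) is an assumption about how information loss might be quantified, since the abstract does not define it.

```python
from statistics import mean


def microaggregate(values, k):
    """Mask values by the mean of their group (fixed-size grouping, NOT k-Ward).

    Values are sorted, cut into consecutive groups of k, and every value is
    replaced by its group mean. The last group absorbs the leftover values,
    so every group holds at least k records.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the records in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        group_mean = mean(values[i] for i in members)
        for i in members:
            masked[i] = group_mean
    return masked


def information_loss(values, masked):
    """Assumed loss measure: within-group SSE divided by total SST."""
    grand_mean = mean(values)
    sse = sum((v - m) ** 2 for v, m in zip(values, masked))
    sst = sum((v - grand_mean) ** 2 for v in values)
    return sse / sst if sst else 0.0
```

Calling `microaggregate([1, 2, 3, 10, 11, 12, 13], 3)` forms the groups {1, 2, 3} and {10, 11, 12, 13}, so a query sees only the two group means. A smaller value returned by `information_loss` indicates that the masked data stays closer to the original.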


Keywords (added by machine, not by the authors): Distance Matrix, Data Vector, Information Loss, Statistical Database, Recursive Call




References

  1. Ward, J.H.: Hierarchical grouping to optimize an objective function. J. American Statistical Association 58, 236–245 (1963)
  2. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 189–201 (2002)
  3. Adam, N.R., Wortmann, J.C.: Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 515–556 (1989)
  4. Baeyens, Y., Defays, D.: Estimation of variance loss following microaggregation by the individual ranking method. In: Proceedings of Statistical Data Protection 1998, pp. 101–108. Office for Official Publications of the European Communities, Luxembourg (1999)
  5. Cuppen, M.: Source Data Perturbation in Statistical Disclosure Control. PhD thesis, Statistics Netherlands (2000)
  6. Mateo-Sanz, J.M., Domingo-Ferrer, J.: A method for data-oriented multivariate microaggregation. In: Proceedings of Statistical Data Protection 1998, pp. 89–99. Office for Official Publications of the European Communities, Luxembourg (1999)
  7. Hansen, S.L., Mukherjee, S.: A polynomial algorithm for univariate optimal microaggregation. IEEE Trans. on Know. and Data Eng. 15, 1043–1044 (2003)
  8. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. on Know. and Data Eng. 17, 902–911 (2005)
  9. Mateo-Sanz, J.M., Domingo-Ferrer, J.: A comparative study of microaggregation methods. Questiio 22, 511–526 (1998)
  10. Solanas, A., Martínez-Ballesté, A., Domingo-Ferrer, J., Mateo-Sanz, J.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (2006)
  11. Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada, Ottawa (1993)
  12. Defays, D., Anwar, N.: Micro-aggregation: A generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, pp. 69–78. Office for Official Publications of the European Communities, Luxembourg (1995)
  13. Solanas, A., Martínez-Ballesté, A.: V-mdav: A multivariate microaggregation with variable group size. In: 17th COMPSTAT Symposium of the IASC, Rome (2006)
  14. Li, Y., Zhu, S., Wang, L., Jajodia, S.: A privacy-enhanced microaggregation method. In: Eiter, T., Schewe, K.-D. (eds.) FoIKS 2002. LNCS, vol. 2284, pp. 148–159. Springer, Heidelberg (2002)
  15. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Resampling for statistical confidentiality in contingency tables. Comp. and Math. with App. 38, 13–32 (1999)
  16. Fayyoumi, E., Oommen, B.J.: Enhancing k-Ward micro-aggregation for secure statistical databases using distance-based and recursive optimizations. Unabridged version of this paper
  17. Brucker, P.: On the complexity of clustering problems. In: Hehn, R., Korte, B., Oettli, W. (eds.) Optimization and Operations Research, pp. 45–54 (1977)
  18. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 113–134. Springer, Berlin (2002)
  19. Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. Technical report, CASC PROJECT, Computational Aspects of Statistical Confidentiality (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ebaa Fayyoumi (1)
  • B. John Oommen (2)
  1. School of Computer Science, Carleton University, Ottawa, Canada
  2. Professor and Fellow of the IEEE, School of Computer Science, Carleton University, Ottawa, Canada
