On Optimizing the k-Ward Micro-aggregation Technique for Secure Statistical Databases

  • Ebaa Fayyoumi
  • B. John Oommen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4058)


We consider the problem of securing a statistical database by utilizing the well-known micro-aggregation strategy, and in particular the k-Ward strategy introduced in [1] and utilized in [2]. The latter scheme, which represents the state of the art, coalesces the sorted data attribute values into groups and, on being queried, reports the means of the corresponding groups. We demonstrate that such a scheme can be optimized on two fronts. First, we minimize the computations done in evaluating the between-class distance matrix, so that updating it requires only a constant number of distance computations. Second, and more importantly, we propose that the data set be partitioned recursively before a k-Ward strategy is invoked, and that the latter be invoked only on the “primitive” sub-groups that terminate the recursion. Our experimental results, obtained on two benchmark data sets, demonstrate a marked improvement. While the information loss is comparable to that of the k-Ward micro-aggregation technique proposed by Domingo-Ferrer et al. [2], the computations required to achieve this loss are a fraction of those required by the latter. The resulting computational advantage sometimes exceeds 80% if one enhancement is used by itself, and 90% if both enhancements are invoked simultaneously.
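The two ideas in the abstract, reporting group means over sorted values and partitioning the data recursively before aggregating, can be illustrated with a minimal Python sketch. It is not the k-Ward algorithm of [2]: the Ward-based merging step is replaced here by a much simpler fixed-size grouping of sorted values, and the halving rule used for the recursion is an assumption, since the abstract does not say how the partitioning is done. The names `microaggregate_sorted`, `recursive_microaggregate`, `information_loss`, and the parameter `max_size` are hypothetical, and the SSE/SST ratio is one common way to quantify information loss, assumed here rather than taken from the abstract.

```python
from statistics import mean


def microaggregate_sorted(values, k):
    """Replace each value by the mean of its group of >= k sorted neighbours.

    Stand-in for k-Ward (assumption): groups are consecutive runs of k
    sorted values, and the last group absorbs any remainder.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = max(1, n // k)
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else n
        idx = order[start:end]
        group_mean = mean(values[i] for i in idx)
        for i in idx:
            masked[i] = group_mean
    return masked


def recursive_microaggregate(values, k, max_size):
    """Halve the sorted data until a block is "primitive", then aggregate it.

    A block is treated as primitive (assumption) when it holds at most
    max_size values or cannot be split into two blocks of at least k.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)

    def solve(idx):
        if len(idx) <= max_size or len(idx) < 2 * k:
            block = [values[i] for i in idx]
            for i, m in zip(idx, microaggregate_sorted(block, k)):
                masked[i] = m
            return
        mid = len(idx) // 2
        solve(idx[:mid])
        solve(idx[mid:])

    solve(order)
    return masked


def information_loss(values, masked):
    """Within-group squared error divided by total squared error (SSE/SST)."""
    grand_mean = mean(values)
    sse = sum((v - m) ** 2 for v, m in zip(values, masked))
    sst = sum((v - grand_mean) ** 2 for v in values)
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [13, 1, 11, 2, 12, 3, 10]
    flat = microaggregate_sorted(data, k=3)
    nested = recursive_microaggregate(data, k=3, max_size=4)
    print(flat, information_loss(data, flat))
    print(nested, information_loss(data, nested))
```

In this sketch the recursion only limits how many values the grouping step sees at once. With a costlier grouping step, such as one that maintains a between-class distance matrix, working on small primitive blocks is where a saving in computation would be expected to come from.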




  1. Ward, J.H.: Hierarchical grouping to optimize an objective function. J. American Statistical Association 58, 236–245 (1963)
  2. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Transactions on Knowledge and Data Engineering 14, 189–201 (2002)
  3. Adam, N.R., Wortmann, J.C.: Security-control methods for statistical databases: A comparative study. ACM Computing Surveys 21, 515–556 (1989)
  4. Baeyens, Y., Defays, D.: Estimation of variance loss following microaggregation by the individual ranking method. In: Proceedings of Statistical Data Protection 1998, pp. 101–108. Office for Official Publications of the European Communities, Luxembourg (1999)
  5. Cuppen, M.: Source Data Perturbation in Statistical Disclosure Control. PhD thesis, Statistics Netherlands (2000)
  6. Mateo-Sanz, J.M., Domingo-Ferrer, J.: A method for data-oriented multivariate microaggregation. In: Proceedings of Statistical Data Protection 1998, pp. 89–99. Office for Official Publications of the European Communities, Luxembourg (1999)
  7. Hansen, S.L., Mukherjee, S.: A polynomial algorithm for univariate optimal microaggregation. IEEE Trans. on Know. and Data Eng. 15, 1043–1044 (2003)
  8. Laszlo, M., Mukherjee, S.: Minimum spanning tree partitioning algorithm for microaggregation. IEEE Trans. on Know. and Data Eng. 17, 902–911 (2005)
  9. Mateo-Sanz, J.M., Domingo-Ferrer, J.: A comparative study of microaggregation methods. Questiio 22, 511–526 (1998)
  10. Solanas, A., Martínez-Ballesté, A., Domingo-Ferrer, J., Mateo-Sanz, J.: A 2d-tree-based blocking method for microaggregating very large data sets. In: The First International Conference on Availability, Reliability and Security (2006)
  11. Defays, D., Nanopoulos, P.: Panels of enterprises and confidentiality: the small aggregates method. In: Proceedings of 92 Symposium on Design and Analysis of Longitudinal Surveys, pp. 195–204. Statistics Canada, Ottawa (1993)
  12. Defays, D., Anwar, N.: Micro-aggregation: A generic method. In: Proceedings of the 2nd International Symposium on Statistical Confidentiality, pp. 69–78. Office for Official Publications of the European Communities, Luxembourg (1995)
  13. Solanas, A., Martínez-Ballesté, A.: V-mdav: A multivariate microaggregation with variable group size. In: 17th COMPSTAT Symposium of the IASC, Rome (2006)
  14. Li, Y., Zhu, S., Wang, L., Jajodia, S.: A privacy-enhanced microaggregation method. In: Eiter, T., Schewe, K.-D. (eds.) FoIKS 2002. LNCS, vol. 2284, pp. 148–159. Springer, Heidelberg (2002)
  15. Domingo-Ferrer, J., Mateo-Sanz, J.M.: Resampling for statistical confidentiality in contingency tables. Comp. and Math. with App. 38, 13–32 (1999)
  16. Fayyoumi, E., Oommen, B.J.: Enhancing k-ward micro-aggregation for secure statistical databases using distance-based and recursive optimizations. Unabridged version of this paper
  17. Brucker, P.: On the complexity of clustering problems. In: Hehn, R., Korte, B., Oettli, W. (eds.) Optimization and Operations Research, pp. 45–54 (1977)
  18. Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies, pp. 113–134. Springer, Berlin (2002)
  19. Brand, R., Domingo-Ferrer, J., Mateo-Sanz, J.M.: Reference data sets to test and compare SDC methods for protection of numerical microdata. Technical report, CASC PROJECT, Computational Aspects of Statistical Confidentiality (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ebaa Fayyoumi (1)
  • B. John Oommen (2)
  1. School of Computer Science, Carleton University, Ottawa, Canada
  2. Professor and Fellow of the IEEE, School of Computer Science, Carleton University, Ottawa, Canada
