Journal of Global Optimization, Volume 60, Issue 2, pp 165–182

Column generation bounds for numerical microaggregation

  • Daniel Aloise
  • Pierre Hansen
  • Caroline Rocha
  • Éverton Santi

Abstract

The main challenge in disclosing private data is to share the information contained in databases while protecting individuals from being identified. Microaggregation is a family of methods for statistical disclosure control. Its principle is that confidentiality rules permit the publication of individual records if they are partitioned into groups of size greater than or equal to a fixed threshold value, where no record is more representative than the others in the same group. Applying such rules leads to replacing individual values with values computed from small groups (microaggregates) before the data are published. This work proposes a column generation algorithm for numerical microaggregation whose pricing problem is solved by a specialized branch-and-bound algorithm. The algorithm finds, for the first time, lower bounds for instances of three real-world datasets commonly used in the literature. Furthermore, new best-known solutions are obtained for these instances by applying a simple heuristic to the generated columns.
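To make the microaggregation principle concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that the value published for each group is the group mean (centroid) and that information loss is measured by the within-group sum of squared errors. Both are common choices in numerical microaggregation, but neither is stated in the abstract. The function name `microaggregate` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def microaggregate(
    records: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
    k: int,
) -> Tuple[List[List[float]], float]:
    """Replace each record by its group centroid; return (masked, sse).

    Assumptions (not taken from the abstract): the aggregate is the mean,
    and information loss is the within-group sum of squared errors.
    """
    n = len(records)
    # The groups must form a partition: every record index appears exactly once.
    if sorted(i for g in groups for i in g) != list(range(n)):
        raise ValueError("groups must partition the record indices")

    masked: List[List[float]] = [[] for _ in range(n)]
    sse = 0.0
    for g in groups:
        # Confidentiality rule: each group holds at least k records.
        if len(g) < k:
            raise ValueError("every group must contain at least k records")
        dim = len(records[g[0]])
        centroid = [sum(records[i][d] for i in g) / len(g) for d in range(dim)]
        for i in g:
            masked[i] = list(centroid)  # published value for record i
            sse += sum((records[i][d] - centroid[d]) ** 2 for d in range(dim))
    return masked, sse


# Toy data: four two-dimensional records, threshold k = 2.
records = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
masked, sse = microaggregate(records, [[0, 1], [2, 3]], k=2)
print(masked)  # -> [[2.0, 3.0], [2.0, 3.0], [11.0, 12.0], [11.0, 12.0]]
print(sse)     # -> 14.0
```

Under these assumptions, an optimization method for microaggregation would search over feasible partitions for one with the smallest loss; the sketch only evaluates a partition that is given.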

Keywords

Microaggregation, Column generation, Cuts, Branch-and-bound

Acknowledgments

Research of the first author has been supported by the National Council for Scientific and Technological Development (CNPq/Brazil), grant numbers 474231/2010-0 and 305070/2011-8. The authors also thank Prof. Costas Panagiotakis for providing the Tarragona, Census and Eia datasets.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Daniel Aloise (1)
  • Pierre Hansen (2)
  • Caroline Rocha (1)
  • Éverton Santi (1)

  1. Universidade Federal do Rio Grande do Norte, Natal, Brazil
  2. GERAD and HEC Montréal, Montreal, Canada