On the Complexity of Clustering with Relaxed Size Constraints

  • Massimiliano Goldwurm
  • Jianyi Lin
  • Francesco Saccà
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9778)

Abstract

We study the computational complexity of computing an optimal clustering \(\{A_1,A_2,\ldots ,A_k\}\) of a set of points, assuming that every cluster size \(|A_i|\) belongs to a given set M of positive integers. We present a polynomial-time algorithm that solves the problem in dimension 1, i.e., when the points are simply rational values, for an arbitrary set M of size constraints; this extends to the \(\ell _1\)-norm an analogous procedure known for the \(\ell _2\)-norm. Moreover, we prove that in the Euclidean plane, i.e., assuming dimension 2 and the \(\ell _2\)-norm, the problem is NP-hard even when the set of size constraints is reduced to \(M=\{2,3\}\).
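To make the one-dimensional case concrete, the following is a minimal sketch of a dynamic program for the \(\ell _1\) setting. It is not the authors' procedure, which the abstract does not spell out. It assumes that some optimal solution consists of clusters that are contiguous in sorted order, and that the cost of a cluster is the sum of absolute deviations from one of its medians. The function name `size_constrained_1d_l1` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def size_constrained_1d_l1(
    points: Sequence[float], k: int, sizes: Set[int]
) -> Optional[Tuple[float, List[List[float]]]]:
    """Split 1-D points into k contiguous clusters whose sizes lie in `sizes`.

    Assumption of this sketch: an optimal clustering can be chosen with
    clusters that are contiguous runs of the sorted points.
    Returns (total l1 cost, clusters), or None if no feasible split exists.
    """
    pts = sorted(points)
    n = len(pts)

    # prefix[t] = sum of the first t sorted points
    prefix = [0]
    for x in pts:
        prefix.append(prefix[-1] + x)

    def cost(i: int, j: int) -> float:
        # Sum of |x - median| over pts[i:j], computed in O(1) from prefix sums.
        m = (i + j) // 2
        med = pts[m]
        right = (prefix[j] - prefix[m]) - (j - m) * med
        left = (m - i) * med - (prefix[m] - prefix[i])
        return left + right

    INF = float("inf")
    # best[t][c] = minimum cost of covering the first t points with c clusters
    best = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for t in range(1, n + 1):
        for c in range(1, k + 1):
            for s in sizes:
                if s <= t and best[t - s][c - 1] < INF:
                    cand = best[t - s][c - 1] + cost(t - s, t)
                    if cand < best[t][c]:
                        best[t][c] = cand
                        back[t][c] = s  # size of the last cluster

    if best[n][k] == INF:
        return None

    # Walk the back-pointers to recover the clusters, last cluster first.
    clusters: List[List[float]] = []
    t = n
    for c in range(k, 0, -1):
        s = back[t][c]
        clusters.append(pts[t - s:t])
        t -= s
    clusters.reverse()
    return best[n][k], clusters
```

For n points this sketch runs in O(n · k · |M|) time after sorting. For example, calling it on the points 1, 2, 3, 10, 11 with k = 2 and M = {2, 3} splits them into {1, 2, 3} and {10, 11}.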

Keywords

Geometric clustering problems; Cluster size constraints; Computational complexity; Constrained k-means


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Massimiliano Goldwurm (2)
  • Jianyi Lin (1)
  • Francesco Saccà (1)
  1. Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
  2. Dipartimento di Matematica, Università degli Studi di Milano, Milan, Italy
