NP-hardness of Euclidean sum-of-squares clustering


A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.


  • Aloise, D., & Hansen, P. (2007). On the complexity of minimum sum-of-squares clustering. Cahiers du GERAD, G-2007-50, July 2007, available online at

  • Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. In 2007 ACM-SIAM symposium on discrete algorithms (SODA’07).

  • Beringer, J., & Hüllermeier, E. (2006). Online clustering of parallel data streams. Data & Knowledge Engineering, 58, 180–204.

  • Cilibrasi, R., van Iersel, L., Kelk, S., & Tromp, J. (2005). On the complexity of several haplotyping problems. Lecture Notes in Computer Science, 3692, 128–139.

  • Dasgupta, S. (2008). The hardness of k-means clustering (Technical Report CS2008-0916). University of California, 17 January 2008.

  • Deshpande, A., & Popat, P. (2008). Email sent to Ravi Kannan et al. and transmitted by Nina Mishra to the first and third authors. 22 January 2008.

  • Drineas, P., Frieze, A., Kannan, R., Vempala, S., & Vinay, V. (2004). Clustering large graphs via the singular value decomposition. Machine Learning, 56, 9–33.

  • Hansen, P., & Jaumard, B. (1997). Cluster analysis and mathematical programming. Mathematical Programming, 79, 191–215.

  • Kanade, G., Nimbhorkar, P., & Varadarajan, K. (2008). On the NP-hardness of the 2-means problem. Manuscript of 14 February 2008.

  • Matula, D., & Shahrokhi, F. (1990). Sparsest cuts and bottlenecks in graphs. Discrete Applied Mathematics, 27, 113–123.

  • MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley symposium on mathematical statistics and probability (Vol. 2, pp. 281–297), Berkeley, CA.

  • Ostrovsky, R., Rabani, Y., Schulman, L. J., & Swamy, C. (2006). The effectiveness of Lloyd-type methods for the k-means problem. In Proceedings of the 47th annual IEEE symposium on foundations of computer science (FOCS’06).

Author information


Corresponding author

Correspondence to Daniel Aloise.

Additional information

Editor: Nina Mishra.

About this article

Cite this article

Aloise, D., Deshpande, A., Hansen, P. et al. NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75, 245–248 (2009).


  • Clustering
  • Sum-of-squares
  • Complexity