Machine Learning, Volume 75, Issue 2, pp 245–248

NP-hardness of Euclidean sum-of-squares clustering

  • Daniel Aloise
  • Amit Deshpande
  • Pierre Hansen
  • Preyas Popat


A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.
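For readers unfamiliar with the problem, the following is a minimal sketch of the objective in question, assuming the usual formulation: partition n points in Euclidean space into k clusters so that the sum of squared distances from each point to the centroid of its cluster is minimal. The function names `sum_of_squares` and `brute_force_mssc` are illustrative and do not come from the paper, and the exhaustive search only illustrates the objective on tiny inputs; it is unrelated to the proof technique.

```python
from itertools import product


def sum_of_squares(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def brute_force_mssc(points, k):
    """Enumerate all k**n label assignments and keep the cheapest one.

    Exponential in the number of points, so usable only on toy instances.
    """
    best_cost, best_labels = None, None
    for labels in product(range(k), repeat=len(points)):
        cost = sum_of_squares(points, labels, k)
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated pairs of points in the plane.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(brute_force_mssc(pts, 2))
```

The enumeration visits k^n assignments, which is why it is practical only for a handful of points.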


Keywords: Clustering, Sum-of-squares, Complexity


  1. Aloise, D., & Hansen, P. (2007). On the complexity of minimum sum-of-squares clustering. Cahiers du GERAD, G-2007-50, July 2007, available online at
  2. Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. In 2007 ACM-SIAM symposium on discrete algorithms (SODA’07).
  3. Beringer, J., & Hüllermeier, E. (2006). Online clustering of parallel data streams. Data & Knowledge Engineering, 58, 180–204.
  4. Cilibrasi, R., van Iersel, L., Kelk, S., & Tromp, J. (2005). On the complexity of several haplotyping problems. Lecture Notes in Computer Science, 3692, 128–139.
  5. Dasgupta, S. (2008). The hardness of k-means clustering (Technical Report CS2008-0916). University of California, 17 January 2008.
  6. Deshpande, A., & Popat, P. (2008). Email sent to Ravi Kannan et al. and transmitted by Nina Mishra to the first and third authors. 22 January 2008.
  7. Drineas, P., Frieze, A., Kannan, R., Vempala, S., & Vinay, V. (2004). Clustering large graphs via the singular value decomposition. Machine Learning, 56, 9–33.
  8. Hansen, P., & Jaumard, B. (1997). Cluster analysis and mathematical programming. Mathematical Programming, 79, 191–215.
  9. Kanade, G., Nimbhorkar, P., & Varadarajan, K. (2008). On the NP-hardness of the 2-means problem. Manuscript of 14 February 2008.
  10. Matula, D., & Shahrokhi, F. (1990). Sparsest cuts and bottlenecks in graphs. Discrete Applied Mathematics, 27, 113–123.
  11. MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley symposium on mathematical statistics and probability (Vol. 2, pp. 281–297), Berkeley, CA.
  12. Ostrovsky, R., Rabani, Y., Schulman, L. J., & Swamy, C. (2006). The effectiveness of Lloyd-type methods for the k-means problem. In Proceedings of the 47th annual IEEE symposium on foundations of computer science (FOCS’06).

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Daniel Aloise (1), corresponding author
  • Amit Deshpande (2)
  • Pierre Hansen (3)
  • Preyas Popat (4)

  1. École Polytechnique de Montréal, Montreal, Canada
  2. Microsoft Research India, Bangalore, India
  3. GERAD and HEC Montréal, Montreal, Canada
  4. Chennai Mathematical Institute, Siruseri, India
