Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5755)

Abstract

In this paper, we propose a new approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via minimum sum-of-squares Euclidean distance. The so-called Minimum Sum-of-Squares Clustering (MSSC for short) is first formulated as a hard combinatorial optimization problem. It is then recast as a continuous DC program by means of exact penalty techniques in DC programming. A DCA scheme is then investigated. The resulting DCA is original and very inexpensive because each iteration amounts to computing projections of points onto a simplex and/or onto a ball, all of which are given in explicit form. Numerical results on real-world data sets show the efficiency of DCA and its great superiority over K-means, a standard clustering method.
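The two closed-form operations named in the abstract, together with the MSSC objective itself, can be illustrated with a minimal Python sketch. This is not the authors' DCA scheme: the abstract does not give the DC decomposition or the penalty parameter, so none is reproduced here. The function names (mssc_objective, project_onto_simplex, project_onto_ball) are hypothetical, the simplex is assumed to be the unit simplex, and the ball is assumed to be centered at the origin. The sort-based simplex projection is one standard technique, chosen here for brevity.

```python
import math
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mssc_objective(points: Sequence[Sequence[float]],
                   centers: Sequence[Sequence[float]]) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def project_onto_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}.

    Assumption: unit simplex. Uses the standard sort-and-threshold method.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        # The last index at which u_j exceeds the candidate shift
        # determines the threshold theta.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def project_onto_ball(x: Sequence[float], radius: float) -> List[float]:
    """Euclidean projection of x onto the origin-centered ball of given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)  # already inside the ball
    scale = radius / norm
    return [scale * c for c in x]


if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0]]
    print(mssc_objective(data, [[1.0, 0.0]]))
    print(project_onto_simplex([2.0, 0.0]))
    print(project_onto_ball([3.0, 4.0], 1.0))
```

Both projections cost at most a sort plus a linear pass per point, which is consistent with the abstract's remark that each iteration is very inexpensive.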

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoai An, L.T., Tao, P.D. (2009). Minimum Sum-of-Squares Clustering by DC Programming and DCA. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_35

  • DOI: https://doi.org/10.1007/978-3-642-04020-7_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04019-1

  • Online ISBN: 978-3-642-04020-7

  • eBook Packages: Computer Science, Computer Science (R0)
