
Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for k-Means Clustering


Abstract

In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step, or backward Euler step, is solved via stochastic fixed-point iteration, in which a mini-batch gradient is randomly sampled at every iteration. The average of the fixed-point trajectory, rather than its final iterate, is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent (Entropy-SGD) method for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm yields better clustering results than standard k-means algorithms, in the sense that it attains a lower objective value (the clustering energy) and is much more robust to initialization.
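To make the procedure concrete, the following is a minimal sketch of one way to implement the stochastic backward Euler update described above. It is a reconstruction from this description only, not the authors' code, and the names step_size, inner_iters, batch_size, and outer_iters are illustrative assumptions.

    import numpy as np

    def minibatch_grad(batch, C):
        # Mini-batch gradient of the k-means objective w.r.t. the centroids C (k x d):
        # each point in the batch is assigned to its nearest centroid, and the gradient
        # for centroid j is (n_j / m) * (c_j - mean of its assigned points).
        d2 = ((batch[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        G = np.zeros_like(C)
        for j in range(C.shape[0]):
            pts = batch[labels == j]
            if len(pts) > 0:
                G[j] = (C[j] - pts.mean(axis=0)) * (len(pts) / len(batch))
        return G

    def stochastic_backward_euler(X, C0, step_size=1.0, inner_iters=20,
                                  outer_iters=100, batch_size=64, seed=0):
        # Implicit (backward Euler) step C_new = C - step_size * grad(C_new), solved
        # approximately by a fixed-point iteration in which the gradient is replaced
        # by a freshly sampled mini-batch gradient at every inner iteration; the
        # average of the fixed-point trajectory becomes the next outer iterate.
        rng = np.random.default_rng(seed)
        C = C0.copy()
        for _ in range(outer_iters):
            Y, Y_avg = C.copy(), np.zeros_like(C)
            for _ in range(inner_iters):
                batch = X[rng.choice(len(X), size=batch_size, replace=False)]
                Y = C - step_size * minibatch_grad(batch, Y)
                Y_avg += Y
            C = Y_avg / inner_iters
        return C

Here X is the n-by-d data matrix and C0 the initial k-by-d centroids (for example, from k-means++ seeding); the schedule of step sizes and inner iterations used in the paper's experiments is not reproduced in this sketch.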




Acknowledgements

This work was partially supported by AFOSR Grant FA9550-15-1-0073 and ONR Grant N00014-16-1-2157. We would like to thank Dr. Bao Wang for helpful discussions. We also thank the anonymous reviewers for their constructive comments.

Author information


Corresponding author

Correspondence to Penghang Yin.


About this article


Cite this article

Yin, P., Pham, M., Oberman, A. et al. Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for k-Means Clustering. J Sci Comput 77, 1133–1146 (2018). https://doi.org/10.1007/s10915-018-0744-4

