
SABRINA: A Stochastic Subspace Majorization-Minimization Algorithm


Abstract

A wide class of problems involves the minimization of a coercive and differentiable function F on \({\mathbb {R}}^N\) whose gradient cannot be evaluated exactly. In such a context, many convergence results from the standard gradient-based optimization literature do not apply directly, and robustness to errors in the gradient is not necessarily guaranteed. This work investigates the convergence of Majorization-Minimization (MM) schemes when stochastic errors affect the gradient terms. We introduce a general stochastic optimization framework, called the StochAstic suBspace majoRIzation-miNimization Algorithm (SABRINA), that encompasses quadratic MM schemes, possibly enhanced with a subspace acceleration strategy. New asymptotic results are established for the stochastic process generated by SABRINA. Two sets of numerical experiments in machine learning and image processing support our theoretical results and illustrate the good performance of SABRINA with respect to state-of-the-art gradient-based stochastic optimization methods.
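
To make the setting above concrete, here is a minimal, purely illustrative sketch of a stochastic majorize-minimize iteration restricted to a two-dimensional memory-gradient subspace, run on a toy least-squares problem. It is not the paper's actual SABRINA update (those rules are given in the full text); it only assumes a simple Lipschitz-type quadratic majorant \(F({{\varvec{x}}}) \le F({{\varvec{x}}}_k) + \nabla F({{\varvec{x}}}_k)^\top ({{\varvec{x}}}-{{\varvec{x}}}_k) + \tfrac{L}{2}\Vert {{\varvec{x}}}-{{\varvec{x}}}_k\Vert ^2\), with the exact gradient replaced by a mini-batch estimate. All names and parameter choices below are hypothetical.

```python
import numpy as np

# Illustrative stochastic MM subspace step (NOT the paper's exact SABRINA rules):
# subspace D_k = [-g_k, x_k - x_{k-1}] (memory-gradient directions),
# quadratic majorant with curvature L * I, mini-batch gradient estimate.

rng = np.random.default_rng(0)

# Toy least-squares problem: F(x) = (1/2n) ||H x - y||^2
n_samples, n_features = 1000, 20
H = rng.standard_normal((n_samples, n_features))
x_true = rng.standard_normal(n_features)
y = H @ x_true + 0.01 * rng.standard_normal(n_samples)

L = np.linalg.norm(H, 2) ** 2 / n_samples   # Lipschitz constant of the averaged gradient
batch_size = 64

def stochastic_gradient(x):
    """Mini-batch estimate of the gradient of F(x) = (1/2n)||Hx - y||^2."""
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    Hb, yb = H[idx], y[idx]
    return Hb.T @ (Hb @ x - yb) / batch_size

x_prev = np.zeros(n_features)
x = np.zeros(n_features)

for k in range(200):
    g = stochastic_gradient(x)
    # Memory-gradient subspace: current (negative) gradient and previous step.
    D = np.column_stack([-g, x - x_prev])
    # Minimize the majorant q(u) = g^T D u + (L/2) ||D u||^2 over the subspace weights u.
    u = -np.linalg.pinv(L * (D.T @ D)) @ (D.T @ g)
    x_prev, x = x, x + D @ u

print("final squared error:", np.linalg.norm(x - x_true) ** 2)
```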


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. A preliminary version of this work was presented in the conference proceedings [36]. The convergence result there was weaker and stated without proof, and the experimental validation was limited to a single, simpler numerical scenario.

  2. If \(A\preceq B\) and D is a (not necessarily square) matrix of compatible dimensions, then \(D^\top A D \preceq D^\top B D\) (see the short justification after these notes).

  3. Let \(({{\varvec{z}}} _k)_{k\in {\mathbb {N}}}\) be a bounded sequence of \({\mathbb {R}} ^N\) such that \({{\varvec{z}}} _{k+1} - {{\varvec{z}}} _k \underset{k\rightarrow +\infty }{\longrightarrow } 0\). Then the set of cluster points of \(({{\varvec{z}}} _k)_{k\in {\mathbb {N}}}\) is connected.
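
For completeness, the inequality recalled in Note 2 admits a one-line justification: since \(A\preceq B\) means \(B-A\succeq 0\), for every vector \({{\varvec{x}}}\) of suitable dimension,
\[
{{\varvec{x}}}^\top D^\top (B-A) D\,{{\varvec{x}}} = (D{{\varvec{x}}})^\top (B-A)(D{{\varvec{x}}}) \ge 0,
\]
so that \(D^\top (B-A) D \succeq 0\), that is, \(D^\top A D \preceq D^\top B D\).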

References

  1. Absil, P.-A., Gallivan, K.A.: Accelerated line-search and trust-region methods. SIAM J. Numer. Anal. 47(2), 997–1018 (2009)


  2. Akyildiz, Ö.D., Chouzenoux, E., Elvira, V., Míguez, J.: A probabilistic incremental proximal gradient method. IEEE Signal Process. Lett. 26(8), 1257–1261 (2019)


  3. Allain, M., Idier, J., Goussard, Y.: On global and local convergence of half-quadratic algorithms. IEEE Trans. Image Process. 15(5), 1130–1142 (2006)


  4. Atchadé, Y.F., Fort, G., Moulines, E.: On perturbed proximal gradient algorithms. J. Mach. Learn. Res. 18, 1–33 (2017)


  5. Bell, T., Xu, J., Zhang, S.: Method for out-of-focus camera calibration. Appl. Opt. 55(9), 2346–2352 (2016)


  6. Bertsekas, D.P.: Nonlinear Programming, 3rd edn. Athena Scientific, Belmont, Massachusetts (2016)


  7. Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3), 627–642 (2000)


  8. Bhatia, R.: Matrix Analysis, vol. 169. Springer (2013)

  9. Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. 41(2), 442–465 (2016)


  10. Bonnans, J.-F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects. Springer (2006)

  11. Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: careful quasi-Newton stochastic gradient descent. J. Mach. Learn. Res. 10, 1737–1754 (2009)


  12. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In Y. Lechevallier and G. Saporta, (eds.), Proceedings of COMPSTAT 2010, pp. 177–186. Springer, Heidelberg (2010)

  13. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. Siam Rev. 60(2), 223–311 (2018)


  14. Bouchard, G.: Efficient bounds for the softmax function and applications to approximate inference in hybrid models. In Proceedings of the Neural Information Processing Systems (NIPS 2008), vol. 31. Vancouver, Canada (2008)

  15. Briceño-Arias, L.M., Chierchia, G., Chouzenoux, E., Pesquet, J.-C.: A random block-coordinate Douglas-Rachford splitting method with low computational complexity for binary logistic regression. Comput. Optim. Appl. 72(3), 707–726 (2019)


  16. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)


  17. Cadoni, S., Chouzenoux, E., Pesquet, J.-C., Chaux, C.: A block parallel majorize-minimize memory gradient algorithm. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP 2016), pp. 3194–3198. Phoenix, AZ (2016)

  18. Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)

  19. Castera, C., Bolte, J., Fevotte, C., Pauwels, E.: An inertial Newton algorithm for deep learning. Technical report, (2019). arxiv:1905.12278

  20. Chalvidal, M., Chouzenoux, E.: Block distributed 3MG algorithm and its application to 3D image restoration. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2020), pp. 938–942. Abu Dhabi, UAE (virtual) (2020)

  21. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)


  22. Chouzenoux, E., Idier, J., Moussaoui, S.: A majorize-minimize strategy for subspace optimization applied to image restoration. IEEE Trans. Image Process. 20(6), 1517–1528 (2010)


  23. Chouzenoux, E., Jezierska, A., Pesquet, J.-C., Talbot, H.: A majorize-minimize subspace approach for \(\ell _2-\ell _0\) image regularization. SIAM J. Imaging Sci. 6(1), 563–591 (2013)


  24. Chouzenoux, E., Pesquet, J.-C.: Convergence rate analysis of the majorize-minimize subspace algorithm. IEEE Signal Process. Lett. 23(9), 1284–1288 (2016)


  25. Chouzenoux, E., Pesquet, J.-C.: A stochastic majorize-minimize subspace algorithm for online penalized least squares estimation. IEEE Trans. Signal Process. 65(18), 4770–4783 (2017)


  26. Combettes, P.L., Pesquet, J.-C.: Stochastic approximations and perturbations in forward-backward splitting for monotone operators. Pure Appl. Funct. Anal. 1(1), 13–37 (2016)


  27. Delyon, B., Lavielle, M., Moulines, E.: Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 27(1), 94–128 (1999)


  28. Dieuleveut, A., Durmus, A., Bach, F.: Bridging the gap between constant step size stochastic gradient descent and Markov chains. Technical report (2017). arxiv:1707.06386

  29. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7), (2011)

  30. Dudar, V., Chierchia, G., Chouzenoux, E., Pesquet, J.-C., Semenov, V.: A two-stage subspace trust region approach for deep neural network training. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017), Kos Island, Greece (2017)

  31. Duflo, M.: Random iterative models. Springer, Berlin (2013)


  32. Elvira, V., Chouzenoux, E.: Optimized population Monte Carlo. IEEE Trans. Signal Process. (2022)

  33. Ermolie, J. M., Nekrylova, Z.V.: The method of stochastic gradients and its application. In Seminar: Theory of Optimal Solutions. No. 1 (Russian)

  34. Fehrman, B., Gess, B., Jentzen, A.: Convergence rates for the stochastic gradient descent method for non-convex objective functions. J. Mach. Learn. Res. 21 (2020)

  35. Fernandez-Bes, J., Elvira, V., Van Vaerenbergh, S.: A probabilistic least-mean-squares filter. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), pp. 2199–2203, Brisbane, Australia (2015)

  36. Fest, J.-B., Chouzenoux, E.: Stochastic majorize-minimize subspace algorithm with application to binary classification. In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland (virtual) (2021)

  37. Florescu, A., Chouzenoux, E., Pesquet, J.-C., Ciuciu, P., Ciochina, S.: A majorize-minimize memory gradient method for complex-valued inverse problems. Signal Process. 103, 285–295 (2014)


  38. Gadat, S.: Stochastic optimization algorithms, non asymptotic and asymptotic behaviour. University of Toulouse (2017)


  39. Gadat, S., Gavra, I.: Asymptotic study of stochastic adaptive algorithm in non-convex landscape. Technical report, (2021). arxiv:2012.05640

  40. Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 4(7), 932–946 (1995)


  41. Gharbi, M., Chouzenoux, E., Pesquet, J.-C., Duval, L.: GPU-based implementations of MM algorithms. Application to spectroscopy signal restoration. In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland (2021)

  42. Gitman, I., Lang, H., Zhang, P., Xiao, L.: Understanding the role of momentum in stochastic gradient methods. Adv. Neural Inf. Process. Syst., pp. 9633–9643 (2019)

  43. Huang, Y., Chouzenoux, E., Elvira, V.: Probabilistic modeling and inference for sequential space-varying blur identification. IEEE Trans. Comput. Imaging 7, 531–546 (2021)


  44. Jacobson, M.W., Fessler, J.A.: An expanded theoretical treatment of iteration-dependent Majorize-Minimize algorithms. IEEE Trans. Image Process. 16(10), 2411–2422 (2007)


  45. Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization. Technical report (2014). arxiv:1412.6980

  46. Konečnỳ, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signal Process. 10(2), 242–255 (2015)


  47. Li, C., Chen, C., Carlson, D., Carin, L.: Preconditioned stochastic gradient langevin dynamics for deep neural networks. Technical report (2015). arxiv:1512.07666

  48. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)


  49. Loizou, N., Richtárik, P.: Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. 77(3), 653–710 (2020)


  50. Mairal, J.: Stochastic majorization-minimization algorithms for large-scale optimization. Technical report (2013). arxiv:1306.4650

  51. Marnissi, Y., Chouzenoux, E., Benazza-Benyahia, A., Pesquet, J.-C.: Majorize-minimize adapted Metropolis–Hastings algorithm. IEEE Trans. Signal Process. 68, 2356–2369 (2020)


  52. Meyer, P.-A.: Martingales and stochastic integrals I, 1st edn. Springer, Berlin, Heidelberg (2006)


  53. Miele, A., Cantrell, J.: Study on a memory gradient method for the minimization of functions. J. Optim. Theory Appl. 3(6), 459–470 (1969)


  54. Moulines, E., Bach, F.: Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv. Neural Inf. Process. Syst., vol. 24 (2011)

  55. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)


  56. Nocedal, J., Wright, S.: Numerical optimization. Springer (2006)

  57. Ostrowski, A. N.: Solutions of equations in Euclidean and Banach spaces. Academic Press, (1973)

  58. Pereyra, M., Schniter, P., Chouzenoux, E., Pesquet, J.-C., Tourneret, J.-Y., Hero, A.O., McLaughlin, S.: A survey of stochastic simulation and optimization methods in signal processing. IEEE J. Sel. Top. Signal Process. 10(2), 224–241 (2015)


  59. Repetti, A., Wiaux,Y.: A forward-backward algorithm for reweighted procedures: application to radio-astronomical imaging. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), pp. 1434–1438. (2020)

  60. Richardson, E., Herskovitz, R., Ginsburg, B., Zibulevsky, M.: SEBOOST: boosting stochastic learning using subspace optimization techniques. Technical report (2016). arxiv:1609.00629

  61. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  62. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pp. 233–257. Academic Press, (1971)

  63. Robini, M.C., Zhu, Y.: Generic half-quadratic optimization for image reconstruction. SIAM J. Imaging Sci. 8(3), 1752–1797 (2015)


  64. Rosasco, L., Villa, S., Vu, B.: Convergence of stochastic proximal gradient algorithm. Appl. Math. Optim. 82, 891–917 (2020)


  65. Ruder, S.: An overview of gradient descent optimization algorithms. Technical report (2016). arxiv:1609.04747

  66. Sghaier, M., Chouzenoux, E., Pesquet, J.-C., Muller, S.: A novel task-based reconstruction approach for digital breast tomosynthesis. Med. Image Anal. 77, 102341 (2022)


  67. Shi, Z.-J., Shen, J.: Convergence of supermemory gradient method. J. Appl. Math. Comput. 24(1), 367–376 (2007)


  68. Sonneveld, P.: CGS: A fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 10(1), 36–52 (1989)


  69. Sun, Y., Babu, P., Palomar, D.P.: Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process. 65(3), 794–816 (2016)


  70. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In Proceedings of the International conference on machine learning (ICML 2013), pp. 1139–1147, Atlanta, USA (2013)

  71. Tieleman, T., Hinton, G.: RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4(2), 26–31 (2012)


  72. Yuan, Y.-X.: Subspace techniques for nonlinear optimization, In R. Jeltsch, D. Li and I.H. Sloan, (eds.), Some Topics in Industrial and Applied Mathematics (Series in Contemporary Applied Mathematics CAM 8), pp. 206–218, Higher Education Press, Beijing, (2007)

  73. Zhang, Z., Kwok, J.T., Yeung, D.-Y.: Surrogate maximization/minimization algorithms and extensions. Mach. Learn. 69, 1–33 (2007)


  74. Zibulevsky, M.: SESOP-TN: combining sequential subspace optimization with truncated newton method. Technical report (2008). http://www.optimization-online.org/DB_FILE/2008/09/2098.pdf


Acknowledgements

The authors thank Jean-Christophe Pesquet (Univ. Paris Saclay, France) and Matthieu Terris (Heriot-Watt University, UK) for the initial motivation of this work and for thoughtful discussions.

Funding

This research work received funding support from the European Research Council Starting Grant MAJORIS ERC-2019-STG-850925.

Author information

Corresponding author

Correspondence to Jean-Baptiste Fest.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Communicated by Nguyen Dong Yen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chouzenoux, E., Fest, JB. SABRINA: A Stochastic Subspace Majorization-Minimization Algorithm. J Optim Theory Appl 195, 919–952 (2022). https://doi.org/10.1007/s10957-022-02122-y
