Abstract
A wide class of problems involves the minimization of a coercive and differentiable function F on \({\mathbb {R}}^N\) whose gradient cannot be evaluated exactly. In such a context, many convergence results from the standard gradient-based optimization literature cannot be applied directly, and robustness to errors in the gradient is not necessarily guaranteed. This work investigates the convergence of Majorization-Minimization (MM) schemes when stochastic errors affect the gradient terms. We introduce a general stochastic optimization framework, the StochAstic suBspace majoRIzation-miNimization Algorithm (SABRINA), that encompasses quadratic MM schemes possibly enhanced with a subspace acceleration strategy. New asymptotic results are established for the stochastic process generated by SABRINA. Two sets of numerical experiments in machine learning and image processing support our theoretical results and illustrate the good performance of SABRINA with respect to state-of-the-art gradient-based stochastic optimization methods.
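The abstract's core idea, a quadratic MM step restricted to a low-dimensional subspace and fed with a noisy gradient, can be sketched numerically. The toy least-squares problem, the memory-gradient subspace spanned by \([-g_k, x_k - x_{k-1}]\), and the decaying noise model below are illustrative assumptions, not the paper's exact SABRINA scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem (an assumption): F(x) = 0.5 * ||H x - y||^2.
N, M = 20, 50
H = rng.standard_normal((M, N))
x_true = rng.standard_normal(N)
y = H @ x_true

A = H.T @ H  # curvature matrix of the quadratic majorant (here, the exact Hessian)

def grad(x, noise):
    """Stochastic gradient: exact gradient plus a zero-mean perturbation."""
    return H.T @ (H @ x - y) + noise * rng.standard_normal(N)

x = np.zeros(N)
x_prev = x.copy()
for k in range(200):
    g = grad(x, noise=0.1 / (k + 1))  # decaying error level (assumption)
    # Memory-gradient subspace: current (noisy) gradient and previous step.
    d_mem = x - x_prev
    D = np.column_stack([-g, d_mem]) if np.linalg.norm(d_mem) > 0 else (-g)[:, None]
    # Minimize the quadratic majorant restricted to x + span(D):
    #   u* = -(D^T A D)^+ D^T g,  then  x <- x + D u*.
    u = -np.linalg.pinv(D.T @ A @ D) @ (D.T @ g)
    x_prev, x = x, x + D @ u

err = np.linalg.norm(H @ x - y) / np.linalg.norm(y)
print(f"relative residual after 200 iterations: {err:.3e}")
```

On this strongly convex toy problem, the subspace step reduces to a (perturbed) memory-gradient iteration, so the residual shrinks rapidly once the gradient noise decays.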
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Notes
A preliminary version of this work was presented in the conference proceedings [36]. The convergence result there was weaker and stated without proof, and the experimental validation was limited to a single, simpler numerical scenario.
If \(A\preceq B\) and D is a matrix that is not necessarily square, then \(D^\top A D \preceq D^\top B D\).
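This congruence property of the Loewner order can be checked numerically; the sketch below, with an arbitrary \(4\times 3\) rectangular matrix D, is only a sanity check of the stated fact.

```python
import numpy as np

rng = np.random.default_rng(1)

def is_psd(S, tol=1e-10):
    """Check S >= 0 in the Loewner order via its smallest eigenvalue (S symmetric)."""
    return np.linalg.eigvalsh(S).min() >= -tol

# Build A <= B by adding a positive semidefinite perturbation P to A.
A = rng.standard_normal((4, 4)); A = A + A.T   # symmetric
P = rng.standard_normal((4, 4)); P = P @ P.T   # positive semidefinite
B = A + P
D = rng.standard_normal((4, 3))                # rectangular, non-square

print("B - A is PSD:", is_psd(B - A))
print("D^T B D - D^T A D is PSD:", is_psd(D.T @ B @ D - D.T @ A @ D))
```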
Let \(({{\varvec{z}}} _k)_{k\in {\mathbb {N}}}\) be a bounded sequence of \({\mathbb {R}} ^N\) satisfying \({{\varvec{z}}} _{k+1} - {{\varvec{z}}} _k \underset{k\rightarrow +\infty }{\longrightarrow } 0\). Then the set of cluster points of \(({{\varvec{z}}} _k)_{k\in {\mathbb {N}}}\) is connected.
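A classical illustration of this lemma (an example chosen for this sketch, not taken from the paper): the sequence \(z_k = \sin \sqrt{k}\) is bounded, its increments vanish, and its cluster set is the whole connected interval \([-1,1]\).

```python
import numpy as np

# z_k = sin(sqrt(k)): bounded, with z_{k+1} - z_k -> 0 (since
# |sqrt(k+1) - sqrt(k)| ~ 1/(2 sqrt(k))), and cluster set [-1, 1].
k = np.arange(1, 200_001, dtype=float)
z = np.sin(np.sqrt(k))

increments = np.abs(np.diff(z))
print("max |z_{k+1} - z_k| over the last 1000 steps:", increments[-1000:].max())

# The tail of the sequence visits every subinterval of [-1, 1]:
tail = z[100_000:]
hist, _ = np.histogram(tail, bins=20, range=(-1, 1))
print("all 20 bins of [-1, 1] hit by the tail:", bool((hist > 0).all()))
```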
References
Absil, P.-A., Gallivan, K.A.: Accelerated line-search and trust-region methods. SIAM J. Numer. Anal. 47(2), 997–1018 (2009)
Akyildiz, Ö.D., Chouzenoux, E., Elvira, V., Míguez, J.: A probabilistic incremental proximal gradient method. IEEE Signal Process. Lett. 26(8), 1257–1261 (2019)
Allain, M., Idier, J., Goussard, Y.: On global and local convergence of half-quadratic algorithms. IEEE Trans. Image Process. 15(5), 1130–1142 (2006)
Atchadé, Y.F., Fort, G., Moulines, E.: On perturbed proximal gradient algorithms. J. Mach. Learn. Res. 18, 1–33 (2017)
Bell, T., Xu, J., Zhang, S.: Method for out-of-focus camera calibration. Appl. Opt. 55(9), 2346–2352 (2016)
Bertsekas, D.P.: Nonlinear Programming, 3rd edn. Athena Scientific, Belmont, Massachusetts (2016)
Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3), 627–642 (2000)
Bhatia, R.: Matrix Analysis, vol. 169. Springer (2013)
Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs. Math. Oper. Res. 41(2), 442–465 (2016)
Bonnans, J.-F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects. Springer (2006)
Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: careful quasi-Newton stochastic gradient descent. J. Mach. Learn. Res. 10, 1737–1754 (2009)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In Y. Lechevallier and G. Saporta, (eds.), Proceedings of COMPSTAT 2010, pp. 177–186. Springer, Heidelberg (2010)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Bouchard, G.: Efficient bounds for the softmax function and applications to approximate inference in hybrid models. In Proceedings of the Neural Information Processing Systems (NIPS 2008), vol. 31. Vancouver, Canada (2008)
Briceño-Arias, L.M., Chierchia, G., Chouzenoux, E., Pesquet, J.-C.: A random block-coordinate Douglas-Rachford splitting method with low computational complexity for binary logistic regression. Comput. Optim. Appl. 72(3), 707–726 (2019)
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
Cadoni, S., Chouzenoux, E., Pesquet, J.-C., Chaux, C.: A block parallel majorize-minimize memory gradient algorithm. In Proceedings of the 23rd IEEE International Conference on Image Processing (ICIP 2016), pp. 3194–3198. Phoenix, AZ (2016)
Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)
Castera, C., Bolte, J., Fevotte, C., Pauwels, E.: An inertial Newton algorithm for deep learning. Technical report, (2019). arxiv:1905.12278
Chalvidal, M., Chouzenoux, E.: Block distributed 3MG algorithm and its application to 3D image restoration. In Proceedings of the IEEE International Conference on Image Processing (ICIP 2020), pp. 938–942. Abu Dhabi, UAE (virtual) (2020)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)
Chouzenoux, E., Idier, J., Moussaoui, S.: A majorize-minimize strategy for subspace optimization applied to image restoration. IEEE Trans. Image Process. 20(6), 1517–1528 (2010)
Chouzenoux, E., Jezierska, A., Pesquet, J.-C., Talbot, H.: A majorize-minimize subspace approach for \(\ell _2-\ell _0\) image regularization. SIAM J. Imaging Sci. 6(1), 563–591 (2013)
Chouzenoux, E., Pesquet, J.-C.: Convergence rate analysis of the majorize-minimize subspace algorithm. IEEE Signal Process. Lett. 23(9), 1284–1288 (2016)
Chouzenoux, E., Pesquet, J.-C.: A stochastic majorize-minimize subspace algorithm for online penalized least squares estimation. IEEE Trans. Signal Process. 65(18), 4770–4783 (2017)
Combettes, P.L., Pesquet, J.-C.: Stochastic approximations and perturbations in forward-backward splitting for monotone operators. Pure Appl. Funct. Anal. 1(1), 13–37 (2016)
Delyon, B., Lavielle, M., Moulines, E.: Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 27(1), 94–128 (1999)
Dieuleveut, A., Durmus, A., Bach, F.: Bridging the gap between constant step size stochastic gradient descent and Markov chains. Technical report (2017). arxiv:1707.06386
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7) (2011)
Dudar, V., Chierchia, G., Chouzenoux, E., Pesquet, J.-C., Semenov, V.: A two-stage subspace trust region approach for deep neural network training. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO 2017), Kos Island, Greece (2017)
Duflo, M.: Random iterative models. Springer, Berlin (2013)
Elvira, V., Chouzenoux, E.: Optimized population Monte Carlo. IEEE Trans. Signal Process. (2022)
Ermolie, J. M., Nekrylova, Z.V.: The method of stochastic gradients and its application. In Seminar: Theory of Optimal Solutions. No. 1 (Russian)
Fehrman, B., Gess, B., Jentzen, A.: Convergence rates for the stochastic gradient descent method for non-convex objective functions. J. Mach. Learn. Res. 21 (2020)
Fernandez-Bes, J., Elvira, V., Van Vaerenbergh, S.: A probabilistic least-mean-squares filter. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), pp. 2199–2203, Brisbane, Australia (2015)
Fest, J.-B., Chouzenoux, E.: Stochastic majorize-minimize subspace algorithm with application to binary classification. In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland (virtual) (2021)
Florescu, A., Chouzenoux, E., Pesquet, J.-C., Ciuciu, P., Ciochina, S.: A majorize-minimize memory gradient method for complex-valued inverse problems. Signal Process. 103, 285–295 (2014)
Gadat, S.: Stochastic optimization algorithms, non asymptotic and asymptotic behaviour. University of Toulouse (2017)
Gadat, S., Gavra, I.: Asymptotic study of stochastic adaptive algorithm in non-convex landscape. Technical report, (2021). arxiv:2012.05640
Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 4(7), 932–946 (1995)
Gharbi, M., Chouzenoux, E., Pesquet, J.-C., Duval, L.: GPU-based implementations of MM algorithms. Application to spectroscopy signal restoration. In Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland (2021)
Gitman, I., Lang, H., Zhang, P., Xiao, L.: Understanding the role of momentum in stochastic gradient methods. Adv. Neural Inf. Process. Syst., pp. 9633–9643 (2019)
Huang, Y., Chouzenoux, E., Elvira, V.: Probabilistic modeling and inference for sequential space-varying blur identification. IEEE Trans. Comput. Imaging 7, 531–546 (2021)
Jacobson, M.W., Fessler, J.A.: An expanded theoretical treatment of iteration-dependent Majorize-Minimize algorithms. IEEE Trans. Image Process. 16(10), 2411–2422 (2007)
Kingma, D. P., Ba, J.: Adam: A method for stochastic optimization. Technical report (2014). arxiv:1412.6980
Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Signal Process. 10(2), 242–255 (2015)
Li, C., Chen, C., Carlson, D., Carin, L.: Preconditioned stochastic gradient langevin dynamics for deep neural networks. Technical report (2015). arxiv:1512.07666
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)
Loizou, N., Richtárik, P.: Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. 77(3), 653–710 (2020)
Mairal, J.: Stochastic majorization-minimization algorithms for large-scale optimization. Technical report (2013). arxiv:1306.4650
Marnissi, Y., Chouzenoux, E., Benazza-Benyahia, A., Pesquet, J.-C.: Majorize-minimize adapted Metropolis–Hastings algorithm. IEEE Trans. Signal Process. 68, 2356–2369 (2020)
Meyer, P.-A.: Martingales and stochastic integrals I, 1st edn. Springer, Berlin, Heidelberg (2006)
Miele, A., Cantrell, J.: Study on a memory gradient method for the minimization of functions. J. Optim. Theory Appl. 3(6), 459–470 (1969)
Moulines, E., Bach, F.: Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Adv. Neural Inf. Process. Syst., vol. 24 (2011)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Nocedal, J., Wright, S.: Numerical optimization. Springer (2006)
Ostrowski, A. N.: Solutions of equations in Euclidean and Banach spaces. Academic Press, (1973)
Pereyra, M., Schniter, P., Chouzenoux, E., Pesquet, J.-C., Tourneret, J.-Y., Hero, A.O., McLaughlin, S.: A survey of stochastic simulation and optimization methods in signal processing. IEEE J. Sel. Top. Signal Process. 10(2), 224–241 (2015)
Repetti, A., Wiaux, Y.: A forward-backward algorithm for reweighted procedures: application to radio-astronomical imaging. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), pp. 1434–1438 (2020)
Richardson, E., Herskovitz, R., Ginsburg, B., Zibulevsky, M.: SEBOOST: boosting stochastic learning using subspace optimization techniques. Technical report (2016). arxiv:1609.00629
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, pp. 233–257. Academic Press, (1971)
Robini, M.C., Zhu, Y.: Generic half-quadratic optimization for image reconstruction. SIAM J. Imaging Sci. 8(3), 1752–1797 (2015)
Rosasco, L., Villa, S., Vu, B.: Convergence of stochastic proximal gradient algorithm. Appl. Math. Optim. 82, 891–917 (2020)
Ruder, S.: An overview of gradient descent optimization algorithms. Technical report (2016). arxiv:1609.04747
Sghaier, M., Chouzenoux, E., Pesquet, J.-C., Muller, S.: A novel task-based reconstruction approach for digital breast tomosynthesis. Med. Image Anal. 77, 102341 (2022)
Shi, Z.-J., Shen, J.: Convergence of supermemory gradient method. J. Appl. Math. Comput. 24(1), 367–376 (2007)
Sonneveld, P.: CGS: A fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 10(1), 36–52 (1989)
Sun, Y., Babu, P., Palomar, D.P.: Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process. 65(3), 794–816 (2016)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In Proceedings of the International conference on machine learning (ICML 2013), pp. 1139–1147, Atlanta, USA (2013)
Tieleman, T., Hinton, G.: RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Yuan, Y.-X.: Subspace techniques for nonlinear optimization, In R. Jeltsch, D. Li and I.H. Sloan, (eds.), Some Topics in Industrial and Applied Mathematics (Series in Contemporary Applied Mathematics CAM 8), pp. 206–218, Higher Education Press, Beijing, (2007)
Zhang, Z., Kwok, J.T., Yeung, D.-Y.: Surrogate maximization/minimization algorithms and extensions. Mach. Learn. 69, 1–33 (2007)
Zibulevsky, M.: SESOP-TN: combining sequential subspace optimization with truncated Newton method. Technical report (2008). http://www.optimization-online.org/DB_FILE/2008/09/2098.pdf
Acknowledgements
The authors thank Jean-Christophe Pesquet (Univ. Paris Saclay, France) and Matthieu Terris (Heriot-Watt University, UK) for initial motivations and thoughtful discussions.
Funding
This research work received funding support from the European Research Council Starting Grant MAJORIS ERC-2019-STG-850925.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by Nguyen Dong Yen.
Cite this article
Chouzenoux, E., Fest, JB. SABRINA: A Stochastic Subspace Majorization-Minimization Algorithm. J Optim Theory Appl 195, 919–952 (2022). https://doi.org/10.1007/s10957-022-02122-y