Abstract
Fast incremental expectation maximization (FIEM) is a version of the EM framework designed for large datasets. In this paper, we first recast FIEM and other incremental EM-type algorithms in the Stochastic Approximation within EM framework. We then provide nonasymptotic bounds on the convergence in expectation as a function of the number of examples n and of the maximal number of iterations \(K_\mathrm {max}\). We propose two strategies for achieving an \(\epsilon \)-approximate stationary point, with \(K_\mathrm {max}= O(n^{2/3}/\epsilon )\) and \(K_\mathrm {max}= O(\sqrt{n}/\epsilon ^{3/2})\) respectively, both relying on a random termination rule before \(K_\mathrm {max}\) and on a constant step size in the Stochastic Approximation step. Our bounds improve on the literature in two ways. First, they allow \(K_\mathrm {max}\) to scale as \(\sqrt{n}\), which improves on \(n^{2/3}\), the best rate obtained so far; this comes at the cost of a stronger dependence on the tolerance \(\epsilon \), making this control relevant for small to medium accuracy relative to the number of examples n. Second, for the \(n^{2/3}\)-rate, numerical illustrations show that, thanks to an optimized choice of the step size and to bounds expressed in terms of quantities characterizing the optimization problem at hand, our results yield a less conservative choice of the step size and a better control of the convergence in expectation.
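The ingredients named in the abstract — a per-example memory of sufficient statistics, a variance-reduced drift, a constant step size in the Stochastic Approximation step, and a random termination rule before \(K_\mathrm {max}\) — can be sketched as follows. This is a minimal Python illustration on a toy two-component Gaussian mixture (equal weights, unit variances); all function and variable names are ours, and the recursion is a simplified reading of a FIEM-style update, not the authors' MATLAB implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two-component 1-D Gaussian mixture, equal weights, unit variances.
n = 500
y = np.concatenate([rng.normal(-2.0, 1.0, n // 2), rng.normal(2.0, 1.0, n // 2)])

def suff_stat(theta, idx):
    """Per-example E-step statistic s_i(theta) = (r, r*y_i, (1-r)*y_i),
    where r is the responsibility of component 1."""
    mu1, mu2 = theta
    yi = y[idx]
    log_odds = -0.5 * (yi - mu1) ** 2 + 0.5 * (yi - mu2) ** 2
    r = 0.5 * (1.0 + np.tanh(0.5 * log_odds))   # numerically stable sigmoid
    return np.array([r, r * yi, (1.0 - r) * yi])

def m_step(S):
    """Map the averaged statistic to new means (weights and variances held fixed)."""
    r_bar, ry_bar, qy_bar = S
    return np.array([ry_bar / max(r_bar, 1e-12), qy_bar / max(1.0 - r_bar, 1e-12)])

def fiem(theta0, K_max, gamma):
    theta = np.asarray(theta0, dtype=float)
    mem = np.stack([suff_stat(theta, i) for i in range(n)])  # per-example memory
    S_tilde = mem.mean(axis=0)    # running average of the memory
    S_hat = S_tilde.copy()        # statistic driven by the SA recursion
    R = rng.integers(K_max)       # random termination index, uniform on {0,...,K_max-1}
    theta_R = theta.copy()
    for k in range(K_max):
        if k == R:
            theta_R = theta.copy()                     # iterate reported on termination
        i, j = rng.integers(n), rng.integers(n)        # two independent indices
        fresh_i = suff_stat(theta, i)                  # refresh the memory at index i
        S_tilde = S_tilde + (fresh_i - mem[i]) / n
        mem[i] = fresh_i
        drift = suff_stat(theta, j) - mem[j] + S_tilde # variance-reduced estimate
        S_hat = S_hat + gamma * (drift - S_hat)        # SA step, constant step size
        theta = m_step(S_hat)
    return theta_R, theta

theta_R, theta_final = fiem([-0.5, 0.5], K_max=20000, gamma=0.05)
```

With a constant step size \(\gamma\), the recursion drives \(\widehat{S}\) toward the full-sample mean statistic while the memory keeps the per-iteration noise of the drift term small; the paper's bounds concern the iterate `theta_R` returned at the random index, not the last iterate.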
Notes
The numerical applications were developed in MATLAB by the first author of the paper. The code files are publicly available at https://github.com/gfort-lab/OpSiMorE/tree/master/FIEM.
Available at http://yann.lecun.com/exdb/mnist/
This work is partially supported by the Fondation Simone et Cino Del Duca through the project OpSiMorE; by the French Agence Nationale de la Recherche (ANR), project under reference ANR-PRC-CE23 MASDOL and Chair ANR of research and teaching in artificial intelligence - SCAI Statistics and Computation for AI; and by the Russian Academic Excellence Project ‘5-100’.
Cite this article
Fort, G., Gach, P. & Moulines, E. Fast incremental expectation maximization for finite-sum optimization: nonasymptotic convergence. Stat Comput 31, 48 (2021). https://doi.org/10.1007/s11222-021-10023-9
Keywords
- Computational statistical learning
- Large scale learning
- Incremental expectation maximization algorithm
- Momentum stochastic approximation
- Finite-sum optimization