
Fast incremental expectation maximization for finite-sum optimization: nonasymptotic convergence

Published in: Statistics and Computing (2021)

Abstract

Fast incremental expectation maximization (FIEM) is a version of the EM framework designed for large datasets. In this paper, we first recast FIEM and other incremental EM-type algorithms in the Stochastic Approximation within EM framework. We then provide nonasymptotic bounds on the convergence in expectation as a function of the number of examples n and of the maximal number of iterations \(K_\mathrm{max}\). We propose two strategies for reaching an \(\epsilon\)-approximate stationary point, with \(K_\mathrm{max} = O(n^{2/3}/\epsilon)\) and \(K_\mathrm{max} = O(\sqrt{n}/\epsilon^{3/2})\), respectively; both rely on a random termination rule before \(K_\mathrm{max}\) and on a constant step size in the Stochastic Approximation step. Our bounds improve on the literature in two ways. First, they allow \(K_\mathrm{max}\) to scale as \(\sqrt{n}\), which improves on the \(n^{2/3}\) rate that was the best obtained so far; this comes at the cost of a stronger dependence on the tolerance \(\epsilon\), so this control is most relevant for small-to-medium accuracy relative to the number of examples n. Second, for the \(n^{2/3}\) rate, numerical illustrations show that, thanks to an optimized choice of the step size and to bounds expressed in terms of quantities characterizing the optimization problem at hand, our results yield a less conservative choice of the step size and a tighter control of the convergence in expectation.
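
To make the framework described above concrete, here is a minimal, self-contained sketch (in Python/NumPy, not the authors' MATLAB code referenced in Note 1) of an incremental EM iteration run in the sufficient-statistic space with a SAGA-like control variate, a constant step size, and a random termination rule, in the spirit of FIEM. The toy model (a two-component one-dimensional Gaussian mixture with known unit variances), the functions suff_stat and m_step, and the values of gamma and K_max are illustrative assumptions; the exact FIEM recursion, the ordering of the random draws, and the step-size constants are those given in the paper.

    # Illustrative sketch only (assumed toy model, step size and draw ordering);
    # NOT the authors' MATLAB implementation (see Note 1).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data from the mixture 0.3 * N(-2, 1) + 0.7 * N(2, 1).
    n = 5000
    z = rng.random(n) < 0.7
    x = np.where(z, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n))

    def suff_stat(xi, theta):
        # Per-example E-step statistic: (E[z|x], E[z|x] * x, E[1-z|x] * x).
        w, mu0, mu1 = theta
        r1 = w * np.exp(-0.5 * (xi - mu1) ** 2)
        r0 = (1.0 - w) * np.exp(-0.5 * (xi - mu0) ** 2)
        p = r1 / (r0 + r1)
        return np.array([p, p * xi, (1.0 - p) * xi])

    def m_step(s):
        # M-step: map an averaged statistic to theta = (weight, mu0, mu1).
        p, s1, s0 = s
        return np.array([p, s0 / max(1.0 - p, 1e-12), s1 / max(p, 1e-12)])

    gamma, K_max = 0.01, 20000               # constant step size, iteration budget
    theta = np.array([0.5, -1.0, 1.0])       # initial guess (weight, mu0, mu1)
    memory = np.stack([suff_stat(xi, theta) for xi in x])   # per-example table
    s_bar = memory.mean(axis=0)              # running average of the table
    s_hat = s_bar.copy()                     # Stochastic Approximation iterate
    K = int(rng.integers(K_max))             # random termination before K_max

    for k in range(K + 1):
        i, j = int(rng.integers(n)), int(rng.integers(n))    # two independent draws
        # SAGA-like control variate: unbiased estimate of the mean field at s_hat.
        drift = suff_stat(x[i], theta) - memory[i] + s_bar - s_hat
        s_hat = s_hat + gamma * drift                        # SA step, constant gamma
        new_j = suff_stat(x[j], theta)                       # refresh one table entry
        s_bar = s_bar + (new_j - memory[j]) / n
        memory[j] = new_j
        theta = m_step(s_hat)                                # M-step mapping

    print("parameters at the random stopping index:", theta)

As a quick sanity check on the two rates quoted above (simple algebra, not a result taken from the paper): \(\sqrt{n}/\epsilon^{3/2} \le n^{2/3}/\epsilon\) exactly when \(\epsilon \ge n^{-1/3}\), which is why the \(\sqrt{n}\) strategy is the relevant one for small-to-medium accuracy relative to the sample size n.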


Notes

  1. The numerical applications were developed in MATLAB by the first author of the paper. The code files are publicly available at https://github.com/gfort-lab/OpSiMorE/tree/master/FIEM.

  2. Available at http://yann.lecun.com/exdb/mnist/


Author information

Correspondence to G. Fort.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by the Fondation Simone et Cino Del Duca through the project OpSiMorE; by the French Agence Nationale de la Recherche (ANR), project under reference ANR-PRC-CE23 MASDOL and Chair ANR of research and teaching in artificial intelligence - SCAI Statistics and Computation for AI; and by the Russian Academic Excellence Project ‘5-100’.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 426 KB)

About this article


Cite this article

Fort, G., Gach, P. & Moulines, E. Fast incremental expectation maximization for finite-sum optimization: nonasymptotic convergence. Stat Comput 31, 48 (2021). https://doi.org/10.1007/s11222-021-10023-9

