Properties of the stochastic approximation EM algorithm with mini-batch sampling

Abstract

To deal with very large datasets, a mini-batch version of the Monte Carlo Markov Chain Stochastic Approximation Expectation–Maximization algorithm for general latent variable models is proposed. For exponential models, the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in a substantial speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights into the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.
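
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy Gaussian model (z_i ~ N(theta, 1), y_i = z_i + noise with unit variance), a random-walk Metropolis-Hastings kernel with unit proposal scale, and a step-size schedule chosen for illustration. The names `minibatch_saem`, `log_target`, `alpha`, and `burn_in` are hypothetical. At each iteration only a fraction `alpha` of the latent variables is refreshed, the sufficient statistic is updated by stochastic approximation, and the parameter is re-estimated.

```python
import numpy as np


def log_target(z, y, theta):
    # Log density (up to a constant) of z_i given y_i and theta in the
    # assumed toy model: y_i | z_i ~ N(z_i, 1) and z_i ~ N(theta, 1).
    return -0.5 * (y - z) ** 2 - 0.5 * (z - theta) ** 2


def minibatch_saem(y, alpha=0.2, n_iter=2000, burn_in=500, seed=0):
    """Sketch of a mini-batch MCMC-SAEM loop for the toy model above."""
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    batch_size = max(1, int(round(alpha * n)))
    z = np.zeros(n)   # current latent variables, one per observation
    s = 0.0           # running estimate of the sufficient statistic mean(z)
    theta = 0.0       # current parameter estimate

    for k in range(1, n_iter + 1):
        # Mini-batch step: pick the latent components to refresh.
        idx = rng.choice(n, size=batch_size, replace=False)

        # Simulation step: one random-walk Metropolis-Hastings move for the
        # selected components only; the others keep their previous values.
        proposal = z[idx] + rng.normal(size=batch_size)
        log_ratio = (log_target(proposal, y[idx], theta)
                     - log_target(z[idx], y[idx], theta))
        accept = np.log(rng.uniform(size=batch_size)) < log_ratio
        z[idx] = np.where(accept, proposal, z[idx])

        # Stochastic approximation step: constant step size during burn-in,
        # then a decreasing one (assumed schedule, exponent 0.6).
        gamma = 1.0 if k <= burn_in else (k - burn_in) ** -0.6
        s += gamma * (z.mean() - s)

        # Maximization step: in this toy model the maximizer is s itself.
        theta = s

    return theta


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    y = 2.0 + data_rng.normal(size=500) + data_rng.normal(size=500)
    print("mini-batch SAEM estimate:", minibatch_saem(y, alpha=0.2))
    print("sample mean of y (MLE in this toy model):", y.mean())
```

In this toy model the maximum likelihood estimate of theta is the sample mean of y, which gives a simple reference value for checking the output. Varying `alpha` under a fixed iteration budget is one way to explore the trade-off between cost per iteration and convergence speed that the abstract mentions.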

Acknowledgements

Work partly supported by the Grant ANR-18-CE02-0010 of the French National Research Agency ANR (Project EcoNet).

Author information

Corresponding author

Correspondence to Tabea Rebafka.

About this article

Cite this article

Kuhn, E., Matias, C. & Rebafka, T. Properties of the stochastic approximation EM algorithm with mini-batch sampling. Stat Comput 30, 1725–1739 (2020). https://doi.org/10.1007/s11222-020-09968-0

Keywords

  • EM algorithm
  • Mini-batch sampling
  • Stochastic approximation
  • Monte Carlo Markov chain

Mathematics Subject Classification

  • 65C60
  • 62F12