
Mini-batch learning of exponential family finite mixture models

Abstract

Mini-batch algorithms have become increasingly popular because of the need to solve optimization problems on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and we propose a scheme for the stochastic stabilization of the constructed mini-batch algorithms. We present theoretical results regarding the convergence of the mini-batch EM algorithms. We then demonstrate how the mini-batch framework may be applied to conduct maximum likelihood (ML) estimation of mixtures of exponential family distributions, with emphasis on ML estimation for mixtures of normal distributions. Via a simulation study, we demonstrate that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm. Further evidence of the performance of the mini-batch framework is provided via an application to the famous MNIST data set.
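As a rough illustration of the idea summarized above, the following Python sketch runs a mini-batch, online-EM-style update for a univariate normal mixture: each iteration draws a random mini-batch, computes the posterior component probabilities on it, and blends the resulting averaged sufficient statistics into running averages with a decreasing step size. This is not the authors' algorithm as published. The function names, the step size (t + 1)^(-0.6), the batch size, the variance floor, and the simulated data are all assumptions chosen for illustration, and the stabilization scheme mentioned in the abstract is not included.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def minibatch_em(data, weights, means, variances,
                 batch_size=100, n_iter=300, seed=0):
    """Illustrative mini-batch EM for a univariate normal mixture.

    All names and default values here are hypothetical choices for this
    sketch, not taken from the article.
    """
    rng = random.Random(seed)
    k = len(means)
    weights, means, variances = list(weights), list(means), list(variances)

    # Running averages of the sufficient statistics E[z], E[z x], E[z x^2],
    # initialized so that they reproduce the starting parameter values.
    s0 = list(weights)
    s1 = [w * m for w, m in zip(weights, means)]
    s2 = [w * (v + m * m) for w, m, v in zip(weights, means, variances)]

    for t in range(1, n_iter + 1):
        # Assumed step size: decreasing, with exponent in (0.5, 1].
        gamma = (t + 1) ** -0.6
        batch = rng.sample(data, batch_size)

        # E-step on the mini-batch: averaged responsibility-weighted statistics.
        b0, b1, b2 = [0.0] * k, [0.0] * k, [0.0] * k
        for x in batch:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            for j in range(k):
                r = dens[j] / total
                b0[j] += r / batch_size
                b1[j] += r * x / batch_size
                b2[j] += r * x * x / batch_size

        # Stochastic-approximation update of the running statistics.
        for j in range(k):
            s0[j] += gamma * (b0[j] - s0[j])
            s1[j] += gamma * (b1[j] - s1[j])
            s2[j] += gamma * (b2[j] - s2[j])

        # M-step: map the running statistics back to mixture parameters.
        weights = list(s0)
        means = [s1[j] / s0[j] for j in range(k)]
        variances = [max(s2[j] / s0[j] - means[j] ** 2, 1e-6)  # assumed floor
                     for j in range(k)]

    return weights, means, variances


# Usage on simulated data: two well-separated normal components.
data_rng = random.Random(1)
data = ([data_rng.gauss(-2.0, 1.0) for _ in range(1000)]
        + [data_rng.gauss(3.0, 1.0) for _ in range(1000)])
weights, means, variances = minibatch_em(
    data, weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0])
print(weights, means, variances)
```

Because each mini-batch contributes statistics that average to one across components, the running weights remain a convex combination and continue to sum to one. Each iteration touches only `batch_size` observations, which is the property that makes this style of update attractive for large data sets.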



Acknowledgements

The authors are indebted to the Co-ordinating Editor and two Reviewers for their insightful comments that have improved the exposition of the manuscript. HDN is personally funded by Australian Research Council (ARC) Grant DE170101134. GJM and HDN are also funded under ARC Grant DP180101192. The work is supported by Inria project LANDER.

Author information

Corresponding author

Correspondence to Hien D. Nguyen.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1745 KB)


About this article


Cite this article

Nguyen, H.D., Forbes, F. & McLachlan, G.J. Mini-batch learning of exponential family finite mixture models. Stat Comput 30, 731–748 (2020). https://doi.org/10.1007/s11222-019-09919-4


  • DOI: https://doi.org/10.1007/s11222-019-09919-4

Keywords

  • Expectation–maximization algorithm
  • Exponential family distributions
  • Finite mixture models
  • Mini-batch algorithm
  • Normal mixture models
  • Online algorithm