MAP approximation to the variational Bayes Gaussian mixture model and application
Learning in variational inference can broadly be seen as first estimating the class assignment variable and then using it to estimate the parameters of the mixture model. These estimates are mainly obtained by computing expectations of the prior models. However, learning is not restricted to expectations. Several authors report other possible configurations that use different combinations of maximization and expectation for the estimation. For instance, variational inference is generalized under the expectation–expectation (EE) algorithm. Inspired by this, another variant known as the maximization–maximization (MM) algorithm has recently been applied to various models such as the Gaussian mixture, the Field-of-Gaussians mixture, and the sparse-coding-based Fisher vector. Despite this recent success, MM is not without issues. Firstly, theoretical studies comparing MM to EE are very rare. Secondly, the computational efficiency and accuracy of MM are seldom compared to those of EE. Hence, it is difficult to justify the use of MM over a mainstream learner such as EE or even Gibbs sampling. In this work, we revisit the learning of EE and MM on a simple Bayesian GMM case. We also make a theoretical comparison of MM with EE and find that they in fact obtain near-identical solutions. In the experiments, we perform unsupervised classification, comparing the computational efficiency and accuracy of MM and EE on two datasets. We also perform unsupervised feature learning, comparing a Bayesian approach such as MM with maximum likelihood approaches on two datasets.
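The contrast between an expectation-style and a maximization-style assignment step can be illustrated with a minimal Python sketch. It is not the authors' derivation: it assumes a one-dimensional mixture with unit variances, equal mixing weights, and a weak Gaussian prior on each component mean, and the names `assign`, `fit`, `prior_mean`, and `prior_strength` are hypothetical. Under these assumptions the posterior over each mean is Gaussian, so its mean and mode coincide and the two variants differ only in whether the assignment is soft or hard.

```python
import math
import random


def assign(x, means, hard):
    """Return per-component assignment weights for one point x.

    hard=False: expectation-style step (soft responsibilities).
    hard=True:  maximization-style step (one-hot at the most probable component).
    Assumes unit variance and equal mixing weights (a simplification).
    """
    logp = [-0.5 * (x - m) ** 2 for m in means]
    if hard:
        k = max(range(len(means)), key=lambda j: logp[j])
        return [1.0 if j == k else 0.0 for j in range(len(means))]
    top = max(logp)  # subtract the maximum for numerical stability
    w = [math.exp(lp - top) for lp in logp]
    total = sum(w)
    return [v / total for v in w]


def fit(data, means, hard, prior_mean=0.0, prior_strength=1e-3, iters=50):
    """Alternate the assignment step and the mean update.

    The mean update is the posterior mean (equal to the posterior mode)
    under an assumed prior N(prior_mean, 1 / prior_strength) on each mean.
    """
    means = list(means)
    for _ in range(iters):
        z = [assign(x, means, hard) for x in data]
        for j in range(len(means)):
            n_j = sum(row[j] for row in z)
            s_j = sum(row[j] * x for row, x in zip(z, data))
            means[j] = (prior_strength * prior_mean + s_j) / (prior_strength + n_j)
    return means


# Synthetic, well-separated data (illustrative only, not from the paper).
random.seed(0)
data = [random.gauss(-4.0, 1.0) for _ in range(200)]
data += [random.gauss(4.0, 1.0) for _ in range(200)]

soft_means = fit(data, [-1.0, 1.0], hard=False)
hard_means = fit(data, [-1.0, 1.0], hard=True)
print("soft assignment means:", soft_means)
print("hard assignment means:", hard_means)
```

On well-separated clusters like these, the soft responsibilities are close to one-hot, so both variants should recover nearly the same means. This toy setting says nothing about overlapping clusters or the full Bayesian GMM treated in the article.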
Keywords: Variational Bayes, Gaussian mixture model, Expectation maximization, Image classification
We are grateful to Dr Shiping Wang for his helpful discussion and guidance.
Compliance with ethical standards
Conflict of interest
Author Kart-Leong Lim declares no conflict of interest. Co-author Han Wang declares no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.