Free Energy of Stochastic Context Free Grammar on Variational Bayes
Variational Bayesian learning has been proposed as an approximation method for Bayesian learning. Despite its efficiency and good experimental performance, its mathematical properties have not yet been clarified. In this paper we analyze variational Bayesian learning of stochastic context-free grammars in the case where the model includes the true distribution and is therefore non-identifiable. We derive the asymptotic free energy. It is shown that, under some conditions on the prior, the free energy is much smaller than that of identifiable models and that redundant non-terminals are eliminated.
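For orientation, the quantities named in the abstract can be written in standard variational Bayes notation. This is a sketch of the usual textbook definitions, not the paper's own notation. The symbols are assumptions introduced here: $X^n$ (the $n$ observed sentences), $Y$ (hidden derivation trees), $\theta$ (rule probabilities), $\varphi$ (prior), $q$ (trial distribution), $d$ (parameter dimension) and $S_n$ (empirical entropy of the true distribution).

```latex
% Bayesian free energy: negative log marginal likelihood
F(X^n) = -\log \sum_{Y} \int p(X^n, Y \mid \theta)\, \varphi(\theta)\, d\theta

% Variational free energy: minimum over factorized trial distributions
% q(Y, \theta) = q(Y)\, q(\theta); it upper-bounds the Bayesian free energy
\bar{F}(X^n) = \min_{q(Y)\, q(\theta)}
  \mathbb{E}_{q}\!\left[ \log \frac{q(Y)\, q(\theta)}
                                  {p(X^n, Y \mid \theta)\, \varphi(\theta)} \right]
  \;\ge\; F(X^n)

% Reference point: a regular (identifiable) model with d parameters
F(X^n) = n S_n + \frac{d}{2} \log n + O_p(1)
```

Read against the last line, "much smaller than identifiable models" presumably means that the coefficient of $\log n$ in the variational free energy falls below $d/2$ for some prior settings. The exact coefficient and conditions are given in the paper itself, not here.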
Keywords: Hidden Markov Model, Generalization Error, Bayesian Learning, Terminal Symbol, Nonterminal Symbol