On Model Selection, Bayesian Networks, and the Fisher Information Integral
We study BIC-like model selection criteria and, in particular, their refinements that include a constant term involving the Fisher information matrix. We perform numerical simulations that enable increasingly accurate approximation of this constant in the case of Bayesian networks. We observe that for complex Bayesian network models, the constant term is a negative number with a very large absolute value that dominates the other terms for small and moderate sample sizes. For networks with a fixed number of parameters, d, the leading term in the complexity penalty, which is proportional to d, is the same. However, as we show, the constant term can vary significantly depending on the network structure even when the number of parameters is fixed. Based on our experiments, we conjecture that the distribution of the nodes’ outdegrees is a key factor. Furthermore, we demonstrate that the constant term can have a dramatic effect on model selection performance for small sample sizes.
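To make the role of the constant term concrete, the following sketch compares the classic BIC penalty, (d/2) log n, with the Fisher information approximation (FIA) penalty, (d/2) log(n/2π) + log ∫ √det I(θ) dθ, for the simplest case of a single k-category multinomial, where the Fisher information integral has the known closed form π^(k/2)/Γ(k/2). This is an illustrative special case only, not the Bayesian-network computation studied in the paper.

```python
import math

def bic_penalty(n, k):
    """Classic BIC complexity penalty (d/2) * log n for a
    k-category multinomial with d = k - 1 free parameters."""
    return ((k - 1) / 2) * math.log(n)

def fia_penalty(n, k):
    """Fisher information approximation to the NML complexity of a
    single k-category multinomial:
        (d/2) * log(n / (2*pi)) + log INT,
    where INT = integral of sqrt(det I(theta)) over the simplex,
    which for the multinomial equals pi^(k/2) / Gamma(k/2)."""
    d = k - 1
    log_integral = (k / 2) * math.log(math.pi) - math.lgamma(k / 2)
    return (d / 2) * math.log(n / (2 * math.pi)) + log_integral

# The constant term shifts the penalty by a sample-size-independent
# amount, which matters most when n is small:
for n in (10, 100, 10000):
    print(n, bic_penalty(n, 4), fia_penalty(n, 4))
```

For a single multinomial the gap between the two penalties is modest; the point of the paper is that for Bayesian networks the analogous constant can be very large in absolute value, so its omission distorts model selection at small sample sizes.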
Keywords: Model selection, Bayesian networks, Fisher information approximation, NML, BIC
An earlier version of this paper was presented at the Second Workshop on Advanced Methodologies for Bayesian Networks (AMBN 2015) in Yokohama. The authors thank the anonymous reviewers for insightful comments and suggestions, and the organizers of AMBN 2015 for their invitation to submit this work to this special issue. This work was funded in part by the Academy of Finland (Centre of Excellence COIN).