Abstract
This paper deals with prior uncertainty in the parameter learning procedure for Bayesian networks. In most studies in the literature, parameter learning is based on two well-known criteria, namely maximum likelihood and maximum a posteriori (MAP). In the presence of prior information, the literature abounds with situations in which a MAP estimate is computed as the desired estimate, but in those studies its use does not appear to be motivated by a loss-function-based viewpoint. In this paper, we recall that the MAP estimator is the Bayes estimator under the zero-one loss function and, criticizing the zero-one loss, we suggest the general entropy loss (GEL) function as a useful alternative when overlearning and underlearning need serious attention. We take prior uncertainty into account and extend the parameter learning task to the case in which the prior information is contaminated. Addressing a real-world problem, we conduct a simulation study of the behavior of the proposed estimates. Finally, to assess the effect of changing the hyperparameters of a chosen prior on the learning procedure, we carry out a sensitivity analysis w.r.t. some chosen hyperparameters.
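For concreteness, a common parametrization of the GEL function is shown below; the paper's Eq. (2) is not reproduced on this page, so this display is an assumption chosen to be consistent with the Bayes estimates derived in the Appendix:

\[L_q(\delta ,\theta )=\left( \frac{\theta }{\delta }\right) ^{q}-q\ln \left( \frac{\theta }{\delta }\right) -1,\qquad q\ne 0,\]

whose Bayes estimate is \(\delta ^{\pi ,q}({\varvec{x}})=\left( E_{\pi }[\theta ^{q}\,|\,{\varvec{x}}]\right) ^{1/q}\). Unlike the zero-one loss underlying the MAP estimator, which treats all non-zero errors alike, the sign and magnitude of \(q\) control whether overestimation or underestimation is penalized more heavily.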
References
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer, New York (1985)
Boratynska, A.: Posterior regret \(\Gamma \)-minimax estimation of insurance premium in collective risk model. ASTIN Bull. 38, 277–291 (2008)
Buntine, W.L.: Theory refinement on Bayesian networks. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pp. 52–60. Morgan Kaufmann Publishers (1991)
Calabria, R., Pulcini, G.: An engineering approach to Bayes estimation for the Weibull distribution. Microelectron. Reliab. 34, 789–802 (1994)
Cheng, J., Greiner, R., Kelly, J., Bell, D., Liu, W.: Learning Bayesian networks from data: an information-theory based approach. Artif. Intell. 137, 43–90 (2002)
Cooper, G.F., Herskovits, E.H.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)
Cowell, R.G.: Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 91–97. Morgan Kaufmann Publishers Inc. (2001)
de Campos, L.M.: A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J. Mach. Learn. Res. 7, 2149–2187 (2006)
de Campos, C.P., Ji, Q.: Improving Bayesian network parameter learning using constraints. In: 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)
Dey, D.K., Ghosh, M., Srinivasan, C.: Simultaneous estimation of parameters under entropy loss. J. Stat. Plan. Inference 15, 347–363 (1987)
Dey, D.K., Liu, P.L.: On comparison of estimators in a generalized life model. Microelectron. Reliab. 32, 207–221 (1992)
Eaton, D., Murphy, K.P.: Exact Bayesian structure learning from uncertain interventions. In: International Conference on Artificial Intelligence and Statistics, pp. 107–114 (2007)
Fan, X., Malone, B., Yuan, C.: Finding optimal Bayesian network structures with constraints learned from data. In: Proceedings of the 30th Annual Conference on Uncertainty in Artificial Intelligence (2014)
Grünwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
Heckerman, D., Meek, C., Cooper, G.: A Bayesian approach to causal discovery. In: Glymour, C., Cooper, G.F. (eds.) Computation, Causation, and Discovery, pp. 141–166. AAAI Press, Menlo Park (1999)
Jafari Jozani, M., Parsian, A.: Posterior regret \(\Gamma \)-minimax estimation and prediction with applications on \(k\)-records data under entropy loss function. Commun. Stat. Theory Methods 37, 2202–2212 (2008)
Jensen, F.V.: An Introduction to Bayesian Networks. Springer-Verlag, New York (1996)
Koski, T., Noble, J.M.: Bayesian Networks: An Introduction. John Wiley and Sons, New York (2011)
Lauritzen, S.L., Spiegelhalter, D.J.: Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. Series B 50, 157–224 (1988)
Parsian, A., Kirmani, S.N.U.A.: Estimation under LINEX loss function. In: Ullah, A., Wan, A.T.K., Chaturvedi, A. (eds.) Handbook of Applied Econometrics and Statistical Inference, pp. 53–76. Marcel Dekker, New York (2002)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo (1988)
Ramoni, M., Sebastiani, P.: Bayesian methods. In: Berthold, M., Hand, D.J. (eds.) Intelligent Data Analysis: An Introduction, pp. 131–168. Springer, Berlin, Heidelberg (2003)
Silander, T., Roos, T., Myllymäki, P.: Learning locally minimax optimal Bayesian networks. Int. J. Approx. Reason. 51(5), 544–557 (2010)
Sivaganesan, S.: Range of posterior measures for priors with arbitrary contaminations. Commun. Stat. -Theor. Methods 17, 1591–1612 (1988)
Sivaganesan, S., Berger, J.O.: Ranges of posterior measures for priors with unimodal contaminations. Ann. Stat. 17, 868–889 (1989)
Soliman, A.: Estimation of parameters of life from progressively censored data using Burr-XII model. IEEE Trans. Reliab. 54, 34–42 (2005)
Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)
Tamada, Y., Imoto, S., Miyano, S.: Parallel algorithm for learning optimal Bayesian network structure. J. Mach. Learn. Res. 12, 2437–2459 (2011)
Ueno, M.: Learning networks determined by the ratio of prior and data. In: Grünwald, P., Spirtes, P. (eds.) Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pp. 598–605 (2010)
Acknowledgments
The authors are indebted to Professor Ahmad Parsian for his constructive comments and suggestions during preparation of this work. The authors are also grateful to an anonymous reviewer for several helpful suggestions.
Appendix
Proof of Proposition 1
First note that, as a direct consequence of Calabria and Pulcini [4], the Bayes estimate of \(\theta _{jil}\) w.r.t. the prior \(\pi _0\) under the GEL function (2) is

\[\delta _{jil}^{\pi _0,q}({\varvec{x}})=\left( E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] \right) ^{\frac{1}{q}}.\]
Now, by using the fact that \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},\,n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\), it is easy to verify that

\[E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] =\frac{\Gamma (n_{jil}+\alpha _{jil}+q)\,\Gamma (n_{jl}+\alpha _{jl})}{\Gamma (n_{jil}+\alpha _{jil})\,\Gamma (n_{jl}+\alpha _{jl}+q)}.\]
This completes the proof.
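As a numerical illustration of Proposition 1, the following minimal Python sketch evaluates the resulting closed-form estimate through log-gamma functions; the counts and hyperparameters are illustrative and not taken from the paper's application.

```python
# Minimal sketch of the Proposition 1 estimate (illustrative values only).
from math import lgamma, exp

def gel_bayes_estimate(n_i, alpha_i, n, alpha, q):
    """Bayes estimate of theta_{jil} under the GEL function, given the
    Beta(n_i + alpha_i, n + alpha - n_i - alpha_i) posterior:
    delta = (E[theta^q | x])^(1/q), with q != 0 and n_i + alpha_i + q > 0."""
    a, s = n_i + alpha_i, n + alpha   # posterior shape and total concentration
    log_moment = lgamma(a + q) - lgamma(a) + lgamma(s) - lgamma(s + q)
    return exp(log_moment / q)

# Illustrative counts: 7 of 20 observations in state i, Dirichlet prior weights.
for q in (-1.0, -0.5, 0.5, 1.0):
    print(q, gel_bayes_estimate(7, 1.0, 20, 2.0, q))
# q = 1 recovers the posterior mean (n_i + alpha_i) / (n + alpha);
# q -> 0 tends to exp(E[log theta | x]).
```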
Proof of Lemma 1
The proof of (i) is easily obtained by following Lemma 4.2 of Berger [1] with some adjustments in notation. To prove (ii), notice that the Bayes estimate of \(\theta _{jil}\) w.r.t. the prior \(\pi _\epsilon \) under the GEL function (2) is given by

\[\delta _{jil}^{\pi _\epsilon ,q}({\varvec{x}})=\left( \lambda _{jil}({\varvec{x}})\,E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] +\left[ 1-\lambda _{jil}({\varvec{x}})\right] E_{\pi }\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] \right) ^{\frac{1}{q}},\]
where

\[\lambda _{jil}({\varvec{x}})=\frac{(1-\epsilon )\,m_{\pi _0}({\varvec{x}})}{(1-\epsilon )\,m_{\pi _0}({\varvec{x}})+\epsilon \,m_{\pi }({\varvec{x}})}\]

and \(m_{\pi _0}({\varvec{x}})\) and \(m_{\pi }({\varvec{x}})\) denote the marginal densities of \({\varvec{X}}\) w.r.t. the priors \(\pi _0\) and \(\pi \), respectively.
Now, using the facts that \(E_{\pi _0}[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left[ \delta _{jil}^{\pi _0,q}({\varvec{x}})\right] ^q\) and \(E_{\pi }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left[ \delta _{jil}^{\pi ,q}({\varvec{x}})\right] ^q\), the proof is complete.
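The mixture representation in part (i) can be checked numerically; the brute-force Beta-Binomial sketch below, with illustrative hyperparameters, compares a direct quadrature of the contaminated posterior mean against the \(\lambda \)-mixture of the component posterior means.

```python
# Numerical check of the lambda-mixture posterior (illustrative values only).
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

k, n, eps = 7, 20, 0.2                      # data and contamination level
a0, b0, a1, b1 = 1.0, 1.0, 2.0, 0.5         # hyperparameters of pi0 and pi

# mixing weight lambda built from the two marginal likelihoods
m0 = exp(log_beta(a0 + k, b0 + n - k) - log_beta(a0, b0))
m1 = exp(log_beta(a1 + k, b1 + n - k) - log_beta(a1, b1))
lam = (1 - eps) * m0 / ((1 - eps) * m0 + eps * m1)

# direct posterior mean under pi_eps via a midpoint Riemann sum
grid = [(i + 0.5) / 100000 for i in range(100000)]
def prior(t):
    p0 = t ** (a0 - 1) * (1 - t) ** (b0 - 1) / exp(log_beta(a0, b0))
    p1 = t ** (a1 - 1) * (1 - t) ** (b1 - 1) / exp(log_beta(a1, b1))
    return (1 - eps) * p0 + eps * p1
w = [prior(t) * t ** k * (1 - t) ** (n - k) for t in grid]
direct = sum(wi * t for wi, t in zip(w, grid)) / sum(w)

# lambda-mixture of the two component posterior means
mixture = lam * (a0 + k) / (a0 + b0 + n) + (1 - lam) * (a1 + k) / (a1 + b1 + n)
print(direct, mixture)                      # agree to quadrature accuracy
```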
Proof of Corollary 1
First note that the posterior distributions w.r.t. the priors \(\pi _0\) and \(\pi \) are \(Dir(n_{j1l}+\alpha _{j1l},\ldots ,n_{jk_jl}+\alpha _{jk_jl})\) and \(Dir(n_{j1l}+\alpha _{j1l}^{'},\ldots ,n_{jk_jl}+\alpha _{jk_jl}^{'})\), respectively. Accordingly, the marginal posteriors w.r.t. \(\pi _0\) and \(\pi \) are \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},\,n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\) and \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil}^{'},\,n_{jl}+\alpha _{jl}^{'}-n_{jil}-\alpha _{jil}^{'})\). These yield \(\left[ \delta _{jil}^{\pi _0,q}({\varvec{x}})\right] ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}+q)\Gamma (n_{jl}+\alpha _{jl})}{\Gamma (n_{jil}+\alpha _{jil})\Gamma (n_{jl}+\alpha _{jl}+q)}=\gamma _{jil}^{\pi _0,q}\) and \(\left[ \delta _{jil}^{\pi ,q}({\varvec{x}})\right] ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}^{'}+q)\Gamma (n_{jl}+\alpha _{jl}^{'})}{\Gamma (n_{jil}+\alpha _{jil}^{'})\Gamma (n_{jl}+\alpha _{jl}^{'}+q)}=\gamma _{jil}^{\pi ,q}\). Also, note that

\[m_{\pi _0}({\varvec{x}})=\int _{\Theta }\prod _{i=1}^{k_j}\theta _{jil}^{n_{jil}}\,\pi _0({\varvec{\theta }})\,d{\varvec{\theta }}=\frac{\Gamma (\alpha _{jl})}{\Gamma (n_{jl}+\alpha _{jl})}\prod _{i=1}^{k_j}\frac{\Gamma (n_{jil}+\alpha _{jil})}{\Gamma (\alpha _{jil})}=\vartheta _{jil}^{\pi _0},\]
where \(\Theta =\left\{ (\theta _{j1l},\ldots ,\theta _{jk_jl}): 0<\theta _{jil}<1,\ \sum _{i=1}^{k_j}\theta _{jil}=1\right\} \). Similarly,

\[m_{\pi }({\varvec{x}})=\int _{\Theta }\prod _{i=1}^{k_j}\theta _{jil}^{n_{jil}}\,\pi ({\varvec{\theta }})\,d{\varvec{\theta }}=\frac{\Gamma (\alpha _{jl}^{'})}{\Gamma (n_{jl}+\alpha _{jl}^{'})}\prod _{i=1}^{k_j}\frac{\Gamma (n_{jil}+\alpha _{jil}^{'})}{\Gamma (\alpha _{jil}^{'})}=\vartheta _{jil}^{\pi }.\]
Now, substituting these marginals into \(\lambda _{jil}({\varvec{x}})\) gives \(\lambda _{jil}({\varvec{x}})=\frac{(1-\epsilon )\vartheta _{jil}^{\pi _0}}{(1-\epsilon )\vartheta _{jil}^{\pi _0}+\epsilon \vartheta _{jil}^{\pi }}=\vartheta _{jil}\). Combining all these facts results in the Bayes estimate \(\delta _{jil}^{{\pi _\epsilon },q}({\varvec{x}}) = \left( \vartheta _{jil}\gamma _{jil}^{\pi _0,q}+[1-\vartheta _{jil}]\gamma _{jil}^{\pi ,q}\right) ^\frac{1}{q}\), which completes the proof.
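A minimal sketch of how the Corollary 1 estimate can be computed, assuming Dirichlet-multinomial marginals for \(\vartheta _{jil}^{\pi _0}\) and \(\vartheta _{jil}^{\pi }\); all counts and hyperparameters below are illustrative, not from the paper's application.

```python
# Sketch of the Corollary 1 estimate for one parameter theta_{jil}.
from math import lgamma, exp

def log_dirichlet_marginal(counts, alphas):
    """log of the Dirichlet-multinomial marginal likelihood of the counts
    (the multinomial coefficient cancels in the mixing weight)."""
    s_a, s_n = sum(alphas), sum(counts)
    out = lgamma(s_a) - lgamma(s_a + s_n)
    for c, a in zip(counts, alphas):
        out += lgamma(a + c) - lgamma(a)
    return out

def gel_moment(n_i, a_i, n, a, q):
    """gamma = E[theta^q | x] for the Beta(n_i + a_i, n + a - n_i - a_i) posterior."""
    return exp(lgamma(n_i + a_i + q) - lgamma(n_i + a_i)
               + lgamma(n + a) - lgamma(n + a + q))

counts  = [7, 9, 4]            # n_{j1l}, ..., n_{jk_j l}
alphas0 = [1.0, 1.0, 1.0]      # hyperparameters of the elicited prior pi0
alphas1 = [2.0, 1.0, 0.5]      # hyperparameters of the contaminating prior pi
eps, q, i = 0.2, -0.5, 0       # contamination level, GEL shape, component index

m0 = exp(log_dirichlet_marginal(counts, alphas0))       # vartheta^{pi0}
m1 = exp(log_dirichlet_marginal(counts, alphas1))       # vartheta^{pi}
w = (1 - eps) * m0 / ((1 - eps) * m0 + eps * m1)        # vartheta_{jil}

g0 = gel_moment(counts[i], alphas0[i], sum(counts), sum(alphas0), q)
g1 = gel_moment(counts[i], alphas1[i], sum(counts), sum(alphas1), q)
delta = (w * g0 + (1 - w) * g1) ** (1 / q)              # Corollary 1 estimate
print(delta)
```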
Cite this article
Karimnezhad, A., Moradi, F. Bayesian parameter learning with an application. METRON 74, 61–74 (2016). https://doi.org/10.1007/s40300-015-0077-0