
Bayesian parameter learning with an application


Abstract

This paper deals with prior uncertainty in the parameter learning procedure for Bayesian networks. In most studies in the literature, parameter learning is based on two well-known criteria, the maximum likelihood and the maximum a posteriori. In the presence of prior information, the literature abounds with situations in which a maximum a posteriori estimate is computed as the desired estimate, but those studies rarely motivate its use from a loss function-based viewpoint. In this paper, we recall that the maximum a posteriori estimator is the Bayes estimator under the zero-one loss function and, criticizing the zero-one loss, we suggest the general entropy loss function as a useful alternative when overlearning and underlearning need serious attention. We take prior uncertainty into account and extend the parameter learning task to the case in which the prior information is contaminated. Addressing a real-world problem, we conduct a simulation study of the behavior of the proposed estimates. Finally, to assess the effect of changing the hyperparameters of a chosen prior on the learning procedure, we carry out a sensitivity analysis w.r.t. some chosen hyperparameters.


References

  1. Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer, New York (1985)


  2. Boratynska, A.: Posterior regret \(\Gamma \)-minimax estimation of insurance premium in collective risk model. Astin Bull. 38, 277–291 (2008)

  3. Buntine, W.L.: Theory refinement on Bayesian networks. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pp. 52–60. Morgan Kaufmann Publishers (1991)

  4. Calabria, R., Pulcini, G.: An engineering approach to Bayes estimation for the Weibull distribution. Microelectron. Reliab. 34, 789–802 (1994)

  5. Cheng, J., Greiner, R., Kelly, J., Bell, D., Liu, W.: Learning Bayesian networks from data: an information-theory based approach. Artif. Intell. 137, 43–90 (2002)

  6. Cooper, G.F., Herskovits, E.H.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1991)


  7. Cowell, R.G.: Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 91–97. Morgan Kaufmann Publishers Inc. (2001)

  8. de Campos, L.M.: A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J. Mach. Learn. Res. 7, 2149–2187 (2006)

  9. de Campos, C.P., Qiang, J.: Improving Bayesian network parameter learning using constraints. In: 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)

  10. Dey, D.K., Ghosh, M., Srinivasan, C.: Simultaneous estimation of parameters under entropy loss. J. Stat. Plan. Inference 15, 347–363 (1987)


  11. Dey, D.K., Liu, P.L.: On comparison of estimators in a generalized life model. Microelectron. Reliab. 32, 207–221 (1992)

  12. Eaton, D., Murphy, K. P.: Exact Bayesian structure learning from uncertain interventions. In: International Conference on Artificial Intelligence and Statistics, pp. 107–114 (2007)

  13. Fan, X., Malone, B., Yuan, C.: Finding optimal Bayesian network structures with constraints learned from data. In: Proceedings of the 30th Annual Conference on Uncertainty in Artificial Intelligence (2014)

  14. Grünwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)


  15. Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)

  16. Heckerman, D., Meek, C., Cooper, G.: A Bayesian approach to causal discovery. In: Glymour, C., Cooper, G.F. (eds.) Computation, Causation, and Discovery, pp. 141–166. AAAI Press, Menlo Park (1999)

  17. Jafari Jozani, M., Parsian, A.: Posterior regret \(\Gamma \)-minimax estimation and prediction with applications on \(k\)-records data under entropy loss function. Commun. Stat. Theory Methods 37, 2202–2212 (2008)

  18. Jensen, F.V.: An Introduction to Bayesian Networks. Springer-Verlag, New York (1996)


  19. Koski, T., Noble, J.M.: Bayesian Networks—An Introduction. John Wiley and Sons, New York (2011)

  20. Lauritzen, S.L., Spiegelhalter, D.J.: Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. Series B 50, 157–224 (1988)


  21. Parsian, A., Kirmani, S.N.U.A.: Estimation under LINEX loss function. In: Ullah, A., Wan, A.T.K., Chaturvedi, A. (eds.), pp. 53–76 (2002)

  22. Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo (1988)


  23. Ramoni, M., Sebastiani, P.: Bayesian methods. In: Intelligent Data Analysis: An Introduction, pp. 131–168. Springer, Berlin, Heidelberg (2003)

  24. Silander, T., Roos, T., Myllymäki, P.: Learning locally minimax optimal Bayesian networks. Int. J. Approx. Reason. 51(5), 544–557 (2010)

  25. Sivaganesan, S.: Range of posterior measures for priors with arbitrary contaminations. Commun. Stat. Theory Methods 17, 1591–1612 (1988)

  26. Sivaganesan, S., Berger, J.O.: Ranges of posterior measures for priors with unimodal contaminations. Ann. Stat. 17, 868–889 (1989)


  27. Soliman, A.: Estimation of parameters of life from progressively censored data using Burr-XII model. IEEE Trans. Reliab. 54, 34–42 (2005)

  28. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)


  29. Tamada, Y., Imoto, S., Miyano, S.: Parallel algorithm for learning optimal Bayesian network structure. J. Mach. Learn. Res. 12, 2437–2459 (2011)

  30. Ueno, M.: Learning networks determined by the ratio of prior and data. In: Grünwald, P., Spirtes, P. (eds.) Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pp. 598–605 (2010)


Acknowledgments

The authors are indebted to Professor Ahmad Parsian for his constructive comments and suggestions during preparation of this work. The authors are also grateful to an anonymous reviewer for several helpful suggestions.


Corresponding author

Correspondence to Ali Karimnezhad.

Appendix

Proof of Proposition 1

First note that, as a direct consequence of Calabria and Pulcini [4], we have

$$\begin{aligned}\delta _{jil}^{\pi ,q}({\varvec{x}}) = \left( {E_\pi [\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]}\right) ^\frac{1}{q}. \end{aligned}$$

Now, by using the fact that \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\), it is easy to verify that

$$\begin{aligned} E_\pi [\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\frac{\Gamma (n_{jl}+\alpha _{jl})\Gamma (n_{jil}+\alpha _{jil}+q)}{\Gamma (n_{jil}+\alpha _{jil})\Gamma (n_{jl}+\alpha _{jl}+q)}=\gamma _{jil}^{\pi ,q}. \end{aligned}$$

This completes the proof.
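As a quick numerical illustration (not part of the original proof), the closed form above can be checked against a Monte Carlo approximation of \(E_\pi [\theta _{jil}^q|{\varvec{x}}]\) for a Beta posterior. The sketch below uses hypothetical counts and hyperparameters; only the Beta-moment identity is taken from the proposition.

```python
# Minimal sketch: check the GEL Bayes estimate (E[theta^q | x])^(1/q) for a
# Beta(a, b) posterior against Monte Carlo sampling. Counts and hyperparameters
# below are hypothetical, not taken from the paper.
import numpy as np
from scipy.special import gammaln
from scipy.stats import beta

def gel_bayes_estimate(a, b, q):
    """Closed form (E[theta^q])^(1/q) for theta ~ Beta(a, b)."""
    log_moment = gammaln(a + b) + gammaln(a + q) - gammaln(a) - gammaln(a + b + q)
    return np.exp(log_moment / q)

# Hypothetical: n_jil = 7, alpha_jil = 2, n_jl = 20, alpha_jl = 6.
a = 7 + 2                  # n_jil + alpha_jil
b = 20 + 6 - 7 - 2         # n_jl + alpha_jl - n_jil - alpha_jil
q = -0.5                   # hypothetical GEL shape parameter

closed_form = gel_bayes_estimate(a, b, q)
mc = np.mean(beta.rvs(a, b, size=200_000, random_state=0) ** q) ** (1 / q)
print(closed_form, mc)     # the two values should agree to a few decimals
```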

Proof of Lemma 1

The proof of (i) follows from Lemma 4.2 of Berger [1] with minor adjustments in notation. To prove (ii), note that the Bayes estimate of \(\theta _{jil}\) w.r.t. the prior \(\pi _\epsilon \) under the GEL function (2) is given by

$$\begin{aligned} \delta _{jil}^{\pi _\epsilon ,q}({\varvec{x}}) = \left( {E_{\pi _\epsilon }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]}\right) ^\frac{1}{q}, \end{aligned}$$

where

$$\begin{aligned} E_{\pi _\epsilon }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]= & {} \int _{\Theta _{jil}}\theta _{jil}^q\pi _\epsilon (\theta _{jil}|{\varvec{x}})d\nu (\theta _{jil})\\ {[}\text {using part (i)}{]}= & {} \lambda _{jil}({\varvec{x}})\int _{\Theta _{jil}}\theta _{jil}^q\pi _0(\theta _{jil}|{\varvec{x}})d\nu (\theta _{jil})\\&+[1-\lambda _{jil}({\varvec{x}})]\int _{\Theta _{jil}}\theta _{jil}^q\pi (\theta _{jil}|{\varvec{x}})d\nu (\theta _{jil})\\= & {} \lambda _{jil}({\varvec{x}})E_{\pi _0}[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}] +[1-\lambda _{jil}({\varvec{x}})]E_{\pi }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]. \end{aligned}$$

Now, using the facts that \(E_{\pi _0}[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left( \delta _{jil}^{\pi _0,q}({\varvec{x}})\right) ^q\) and \(E_{\pi }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left( \delta _{jil}^{\pi ,q}({\varvec{x}})\right) ^q\), the proof is complete.
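Operationally, the identity just established says that the contaminated-prior Bayes estimate is obtained by mixing the two \(q\)-th posterior moments with weight \(\lambda _{jil}({\varvec{x}})\) and raising the result to the power \(1/q\). The following minimal sketch, with made-up inputs, encodes only that combination step.

```python
# Sketch of part (ii): mix the q-th posterior moments under pi_0 and pi with
# weight lam = lambda_jil(x), then raise to 1/q. All inputs are hypothetical.
def contaminated_gel_estimate(lam, moment_pi0, moment_pi, q):
    """(lam * E_{pi_0}[theta^q|x] + (1 - lam) * E_pi[theta^q|x]) ** (1/q)."""
    return (lam * moment_pi0 + (1 - lam) * moment_pi) ** (1 / q)

# Example with made-up numbers: lam = 0.7, moments 1.12 and 1.25, q = -0.5.
print(contaminated_gel_estimate(0.7, 1.12, 1.25, -0.5))
```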

Proof of Corollary 1

First note that the posterior distributions w.r.t. the priors \(\pi _0\) and \(\pi \) are \(Dir(n_{j1l}+\alpha _{j1l},\ldots ,n_{jk_jl}+\alpha _{jk_jl})\) and \(Dir(n_{j1l}+\alpha _{j1l}^{'},\ldots ,n_{jk_jl}+\alpha _{jk_jl}^{'})\), respectively. Accordingly, the marginal posteriors w.r.t. \(\pi _0\) and \(\pi \) are \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\) and \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil}^{'},n_{jl}+\alpha _{jl}^{'}-n_{jil}-\alpha _{jil}^{'})\). These yield \(\left( \delta _{jil}^{\pi _0,q}({\varvec{x}})\right) ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}+q)\Gamma (n_{jl}+\alpha _{jl})}{\Gamma (n_{jil}+\alpha _{jil})\Gamma (n_{jl}+\alpha _{jl}+q)}=\gamma _{jil}^{\pi _0,q}\) and \(\left( \delta _{jil}^{\pi ,q}({\varvec{x}})\right) ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}^{'}+q)\Gamma (n_{jl}+\alpha _{jl}^{'})}{\Gamma (n_{jil}+\alpha _{jil}^{'})\Gamma (n_{jl}+\alpha _{jl}^{'}+q)}=\gamma _{jil}^{\pi ,q}\). Also, note that

$$\begin{aligned} m_{jil}^{\pi _0}({\varvec{x}})= & {} \int _{\Theta _{jil}}p({\varvec{x}}|\theta _{jil})\pi _0(\theta _{jil})d\theta _{jil}\\= & {} \int _{\Theta }p({\varvec{x}}|\theta _{j1l},\ldots ,\theta _{jk_jl})\pi _0(\theta _{j1l},\ldots ,\theta _{jk_jl})\,d\theta _{j1l}\cdots d\theta _{jk_jl}\\= & {} \int _{\Theta } \prod _{i=1}^{k_j}\theta _{jil}^{n_{jil}}\frac{\Gamma (\alpha _{jl})}{\prod _{i=1}^{k_j}\Gamma (\alpha _{jil})}\prod _{i=1}^{k_j}\theta _{jil}^{\alpha _{jil}-1}\,d\theta _{j1l}\cdots d\theta _{jk_jl}\\= & {} \frac{\Gamma (\alpha _{jl})}{\prod _{i=1}^{k_j}\Gamma (\alpha _{jil})}\frac{\prod _{i=1}^{k_j}\Gamma (n_{jil}+\alpha _{jil})}{\Gamma (n_{jl}+\alpha _{jl})}=\vartheta _{jil}^{\pi _0}, \end{aligned}$$

where \(\Theta =\left\{ (\theta _{j1l},\ldots ,\theta _{jk_jl}): 0<\theta _{jil}<1, \sum _{i=1}^{k_j}\theta _{jil}=1\right\} \). Similarly,

$$\begin{aligned} m_{jil}^{\pi }({\varvec{x}})= & {} \frac{\Gamma (\alpha _{jl}^{'})}{\prod _{i=1}^{k_j}\Gamma (\alpha _{jil}^{'})}\frac{\prod _{i=1}^{k_j}\Gamma (n_{jil}+\alpha _{jil}^{'})}{\Gamma (n_{jl}+\alpha _{jl}^{'})}=\vartheta _{jil}^{\pi }. \end{aligned}$$

Now, substituting \(\lambda _{jil}({\varvec{x}})\) with \(\frac{(1-\epsilon )\vartheta _{jil}^{\pi _0}}{(1-\epsilon )\vartheta _{jil}^{\pi _0}+\epsilon \vartheta _{jil}^{\pi }}=\vartheta _{jil}\) and combining all these facts results in the Bayes estimate \(\delta _{jil}^{{\pi _\epsilon },q}({\varvec{x}}) = \left( \vartheta _{jil}\gamma _{jil}^{\pi _0,q}+[1-\vartheta _{jil}]\gamma _{jil}^{\pi ,q}\right) ^\frac{1}{q}\).
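To make the corollary concrete, the sketch below (hypothetical counts, hyperparameters and contamination level \(\epsilon \)) computes the two marginal likelihoods \(\vartheta _{jil}^{\pi _0}\) and \(\vartheta _{jil}^{\pi }\), the mixing weight \(\vartheta _{jil}\), and the resulting contaminated-prior GEL estimate.

```python
# Minimal sketch of Corollary 1 with hypothetical inputs: Dirichlet marginal
# likelihoods -> mixing weight vartheta_jil -> contaminated GEL estimate.
import numpy as np
from scipy.special import gammaln

def log_dirichlet_marginal(n, alpha):
    """log m(x) for counts n under a Dirichlet(alpha) prior (1-D arrays)."""
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + gammaln(n + alpha).sum() - gammaln((n + alpha).sum()))

def gel_moment(a, b, q):
    """E[theta^q] for theta ~ Beta(a, b), i.e. the gamma_jil quantities."""
    return np.exp(gammaln(a + b) + gammaln(a + q) - gammaln(a) - gammaln(a + b + q))

# Hypothetical node/parent configuration with k_j = 3 states.
n = np.array([7.0, 9.0, 4.0])        # counts n_j1l, n_j2l, n_j3l
alpha0 = np.array([2.0, 2.0, 2.0])   # hyperparameters of the elicited prior pi_0
alpha1 = np.array([1.0, 4.0, 1.0])   # hyperparameters of the contaminating prior pi
eps, q, i = 0.2, -0.5, 0             # contamination level, GEL parameter, state index

# Mixing weight vartheta_jil built from the two marginal likelihoods.
m0 = np.exp(log_dirichlet_marginal(n, alpha0))
m1 = np.exp(log_dirichlet_marginal(n, alpha1))
w = (1 - eps) * m0 / ((1 - eps) * m0 + eps * m1)

# gamma^{pi_0,q} and gamma^{pi,q}: q-th marginal posterior moments of theta_jil.
g0 = gel_moment(n[i] + alpha0[i], n.sum() + alpha0.sum() - n[i] - alpha0[i], q)
g1 = gel_moment(n[i] + alpha1[i], n.sum() + alpha1.sum() - n[i] - alpha1[i], q)

print((w * g0 + (1 - w) * g1) ** (1 / q))   # delta_{jil}^{pi_eps, q}(x)
```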


Cite this article

Karimnezhad, A., Moradi, F. Bayesian parameter learning with an application. METRON 74, 61–74 (2016). https://doi.org/10.1007/s40300-015-0077-0
