Abstract
This paper deals with prior uncertainty in the parameter learning procedure for Bayesian networks. In most studies in the literature, parameter learning is based on two well-known criteria, namely maximum likelihood and maximum a posteriori (MAP). In the presence of prior information, the literature abounds with situations in which a MAP estimate is computed as the desired estimate, but in those studies its use does not appear to be motivated by a loss-function-based viewpoint. In this paper, we recall that the MAP estimator is the Bayes estimator under the zero-one loss function and, criticizing the zero-one loss, we suggest the general entropy loss (GEL) function as a useful alternative when overlearning and underlearning need serious attention. We take prior uncertainty into account and extend the parameter learning task to the case in which the prior information is contaminated. Addressing a real-world problem, we conduct a simulation study of the behavior of the proposed estimates. Finally, to assess the effect of changing the hyperparameters of a chosen prior on the learning procedure, we carry out a sensitivity analysis w.r.t. some chosen hyperparameters.
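For concreteness, a common parametrization of the GEL function is shown below; the paper's Eq. (2) is not reproduced on this page, so this display is an assumption chosen to be consistent with the Bayes estimates derived in the Appendix:

\[L_q(\delta ,\theta )=\left( \frac{\theta }{\delta }\right) ^{q}-q\ln \left( \frac{\theta }{\delta }\right) -1,\qquad q\ne 0,\]

whose Bayes estimate is \(\delta ^{\pi ,q}({\varvec{x}})=\left( E_{\pi }[\theta ^{q}\,|\,{\varvec{x}}]\right) ^{1/q}\). Unlike the zero-one loss underlying the MAP estimator, which treats all non-zero errors alike, the sign and magnitude of \(q\) control whether overestimation or underestimation is penalized more heavily.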
References
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer, New York (1985)
Boratynska, A.: Posterior regret \(\Gamma \)-minimax estimation of insurance premium in collective risk model. ASTIN Bull. 38, 277–291 (2008)
Buntine, W.L.: Theory refinement on Bayesian networks. In: Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pp. 52–60. Morgan Kaufmann Publishers (1991)
Calabria, R., Pulcini, G.: An engineering approach to Bayes estimation for the Weibull distribution. Microelectron. Reliab. 34, 789–802 (1994)
Cheng, J., Greiner, R., Kelly, J., Bell, D., Liu, W.: Learning Bayesian networks from data: an information-theory based approach. Artif. Intell. 137, 43–90 (2002)
Cooper, G.F., Herskovits, E.H.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)
Cowell, R.G.: Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 91–97. Morgan Kaufmann Publishers Inc. (2001)
de Campos, L.M.: A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J. Mach. Learn. Res. 7, 2149–2187 (2006)
de Campos, C.P., Ji, Q.: Improving Bayesian network parameter learning using constraints. In: 19th International Conference on Pattern Recognition, pp. 1–4. IEEE (2008)
Dey, D.K., Ghosh, M., Srinivasan, C.: Simultaneous estimation of parameters under entropy loss. J. Stat. Plan. Inference 15, 347–363 (1987)
Dey, D.K., Liu, P.L.: On comparison of estimators in a generalized life model. Microelectron. Reliab. 32, 207–221 (1992)
Eaton, D., Murphy, K.P.: Exact Bayesian structure learning from uncertain interventions. In: International Conference on Artificial Intelligence and Statistics, pp. 107–114 (2007)
Fan, X., Malone, B., Yuan, C.: Finding optimal Bayesian network structures with constraints learned from data. In: Proceedings of the 30th Annual Conference on Uncertainty in Artificial Intelligence (2014)
Grünwald, P.: The Minimum Description Length Principle. MIT Press, Cambridge (2007)
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
Heckerman, D., Meek, C., Cooper, G.: A Bayesian approach to causal discovery. In: Glymour, C., Cooper, G.F. (eds.) Computation, Causation, and Discovery, pp. 141–166. AAAI Press, Menlo Park (1999)
Jafari Jozani, M., Parsian, A.: Posterior regret \(\Gamma \)-minimax estimation and prediction with applications on \(k\)-records data under entropy loss function. Commun. Stat. Theory Methods 37, 2202–2212 (2008)
Jensen, F.V.: An Introduction to Bayesian Networks. Springer-Verlag, New York (1996)
Koski, T., Noble, J.M.: Bayesian Networks: An Introduction. John Wiley and Sons, New York (2011)
Lauritzen, S.L., Spiegelhalter, D.J.: Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. Series B 50, 157–224 (1988)
Parsian, A., Kirmani, S.N.U.A.: Estimation under LINEX loss function. In: Ullah, A., Wan, A.T.K., Chaturvedi, A. (eds.) Handbook of Applied Econometrics and Statistical Inference, pp. 53–76. Marcel Dekker, New York (2002)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo (1988)
Ramoni, M., Sebastiani, P.: Bayesian methods. In: Berthold, M., Hand, D.J. (eds.) Intelligent Data Analysis: An Introduction, pp. 131–168. Springer, Berlin, Heidelberg (2003)
Silander, T., Roos, T., Myllymäki, P.: Learning locally minimax optimal Bayesian networks. Int. J. Approx. Reason. 51(5), 544–557 (2010)
Sivaganesan, S.: Range of posterior measures for priors with arbitrary contaminations. Commun. Stat. -Theor. Methods 17, 1591–1612 (1988)
Sivaganesan, S., Berger, J.O.: Ranges of posterior measures for priors with unimodal contaminations. Ann. Stat. 17, 868–889 (1989)
Soliman, A.: Estimation of parameters of life from progressively censored data using Burr-XII model. IEEE Trans. Reliab. 54, 34–42 (2005)
Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)
Tamada, Y., Imoto, S., Miyano, S.: Parallel algorithm for learning optimal Bayesian network structure. J. Mach. Learn. Res. 12, 2437–2459 (2011)
Ueno, M.: Learning networks determined by the ratio of prior and data. In: Grünwald, P., Spirtes, P. (eds.) Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, pp. 598–605 (2010)
Acknowledgments
The authors are indebted to Professor Ahmad Parsian for his constructive comments and suggestions during preparation of this work. The authors are also grateful to an anonymous reviewer for several helpful suggestions.
Appendix
Proof of Proposition 1
First note that, as a direct consequence of Calabria and Pulcini [4], the Bayes estimate of \(\theta _{jil}\) w.r.t. the prior \(\pi _0\) under the GEL function (2) is

\[\delta _{jil}^{\pi _0,q}({\varvec{x}})=\left( E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] \right) ^{\frac{1}{q}}.\]
Now, by using the fact that \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},\,n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\), it is easy to verify that

\[E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] =\frac{\Gamma (n_{jil}+\alpha _{jil}+q)\,\Gamma (n_{jl}+\alpha _{jl})}{\Gamma (n_{jil}+\alpha _{jil})\,\Gamma (n_{jl}+\alpha _{jl}+q)}.\]
This completes the proof.
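As a numerical illustration of Proposition 1, the following minimal Python sketch evaluates the resulting closed-form estimate through log-gamma functions; the counts and hyperparameters are illustrative and not taken from the paper's application.

```python
# Minimal sketch of the Proposition 1 estimate (illustrative values only).
from math import lgamma, exp

def gel_bayes_estimate(n_i, alpha_i, n, alpha, q):
    """Bayes estimate of theta_{jil} under the GEL function, given the
    Beta(n_i + alpha_i, n + alpha - n_i - alpha_i) posterior:
    delta = (E[theta^q | x])^(1/q), with q != 0 and n_i + alpha_i + q > 0."""
    a, s = n_i + alpha_i, n + alpha   # posterior shape and total concentration
    log_moment = lgamma(a + q) - lgamma(a) + lgamma(s) - lgamma(s + q)
    return exp(log_moment / q)

# Illustrative counts: 7 of 20 observations in state i, Dirichlet prior weights.
for q in (-1.0, -0.5, 0.5, 1.0):
    print(q, gel_bayes_estimate(7, 1.0, 20, 2.0, q))
# q = 1 recovers the posterior mean (n_i + alpha_i) / (n + alpha);
# q -> 0 tends to exp(E[log theta | x]).
```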
Proof of Lemma 1
The proof of (i) is easily obtained by following Lemma 4.2 of Berger [1] with some adjustments in notation. To prove (ii), notice that the Bayes estimate of \(\theta _{jil}\) w.r.t. the prior \(\pi _\epsilon \) under the GEL function (2) is given by

\[\delta _{jil}^{\pi _\epsilon ,q}({\varvec{x}})=\left( \lambda _{jil}({\varvec{x}})\,E_{\pi _0}\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] +\left[ 1-\lambda _{jil}({\varvec{x}})\right] E_{\pi }\left[ \theta _{jil}^q\,|\,{\varvec{X}}={\varvec{x}}\right] \right) ^{\frac{1}{q}},\]
where

\[\lambda _{jil}({\varvec{x}})=\frac{(1-\epsilon )\,m_{\pi _0}({\varvec{x}})}{(1-\epsilon )\,m_{\pi _0}({\varvec{x}})+\epsilon \,m_{\pi }({\varvec{x}})}\]

and \(m_{\pi _0}({\varvec{x}})\) and \(m_{\pi }({\varvec{x}})\) denote the marginal densities of \({\varvec{X}}\) w.r.t. the priors \(\pi _0\) and \(\pi \), respectively.
Now, using the facts that \(E_{\pi _0}[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left[ \delta _{jil}^{\pi _0,q}({\varvec{x}})\right] ^q\) and \(E_{\pi }[\theta _{jil}^q|{\varvec{X}}={\varvec{x}}]=\left[ \delta _{jil}^{\pi ,q}({\varvec{x}})\right] ^q\), the proof is complete.
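The mixture representation in part (i) can be checked numerically; the brute-force Beta-Binomial sketch below, with illustrative hyperparameters, compares a direct quadrature of the contaminated posterior mean against the \(\lambda \)-mixture of the component posterior means.

```python
# Numerical check of the lambda-mixture posterior (illustrative values only).
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

k, n, eps = 7, 20, 0.2                      # data and contamination level
a0, b0, a1, b1 = 1.0, 1.0, 2.0, 0.5         # hyperparameters of pi0 and pi

# mixing weight lambda built from the two marginal likelihoods
m0 = exp(log_beta(a0 + k, b0 + n - k) - log_beta(a0, b0))
m1 = exp(log_beta(a1 + k, b1 + n - k) - log_beta(a1, b1))
lam = (1 - eps) * m0 / ((1 - eps) * m0 + eps * m1)

# direct posterior mean under pi_eps via a midpoint Riemann sum
grid = [(i + 0.5) / 100000 for i in range(100000)]
def prior(t):
    p0 = t ** (a0 - 1) * (1 - t) ** (b0 - 1) / exp(log_beta(a0, b0))
    p1 = t ** (a1 - 1) * (1 - t) ** (b1 - 1) / exp(log_beta(a1, b1))
    return (1 - eps) * p0 + eps * p1
w = [prior(t) * t ** k * (1 - t) ** (n - k) for t in grid]
direct = sum(wi * t for wi, t in zip(w, grid)) / sum(w)

# lambda-mixture of the two component posterior means
mixture = lam * (a0 + k) / (a0 + b0 + n) + (1 - lam) * (a1 + k) / (a1 + b1 + n)
print(direct, mixture)                      # agree to quadrature accuracy
```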
Proof of Corollary 1
First note that the posterior distributions w.r.t. the priors \(\pi _0\) and \(\pi \) are \(Dir(n_{j1l}+\alpha _{j1l},\ldots ,n_{jk_jl}+\alpha _{jk_jl})\) and \(Dir(n_{j1l}+\alpha _{j1l}^{'},\ldots ,n_{jk_jl}+\alpha _{jk_jl}^{'})\), respectively. Accordingly, the marginal posteriors w.r.t. \(\pi _0\) and \(\pi \) are \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil},\,n_{jl}+\alpha _{jl}-n_{jil}-\alpha _{jil})\) and \(\theta _{jil}|{\varvec{x}}\sim Beta(n_{jil}+\alpha _{jil}^{'},\,n_{jl}+\alpha _{jl}^{'}-n_{jil}-\alpha _{jil}^{'})\). These yield \(\left[ \delta _{jil}^{\pi _0,q}({\varvec{x}})\right] ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}+q)\Gamma (n_{jl}+\alpha _{jl})}{\Gamma (n_{jil}+\alpha _{jil})\Gamma (n_{jl}+\alpha _{jl}+q)}=\gamma _{jil}^{\pi _0,q}\) and \(\left[ \delta _{jil}^{\pi ,q}({\varvec{x}})\right] ^q=\frac{\Gamma (n_{jil}+\alpha _{jil}^{'}+q)\Gamma (n_{jl}+\alpha _{jl}^{'})}{\Gamma (n_{jil}+\alpha _{jil}^{'})\Gamma (n_{jl}+\alpha _{jl}^{'}+q)}=\gamma _{jil}^{\pi ,q}\). Also, note that

\[m_{\pi _0}({\varvec{x}})=\int _{\Theta }\prod _{i=1}^{k_j}\theta _{jil}^{n_{jil}}\,\pi _0({\varvec{\theta }})\,d{\varvec{\theta }}=\frac{\Gamma (\alpha _{jl})}{\Gamma (n_{jl}+\alpha _{jl})}\prod _{i=1}^{k_j}\frac{\Gamma (n_{jil}+\alpha _{jil})}{\Gamma (\alpha _{jil})}=\vartheta _{jil}^{\pi _0},\]
where \(\Theta =\left\{ (\theta _{j1l},\ldots ,\theta _{jk_jl}): 0<\theta _{jil}<1,\ \sum _{i=1}^{k_j}\theta _{jil}=1\right\} \). Similarly,

\[m_{\pi }({\varvec{x}})=\int _{\Theta }\prod _{i=1}^{k_j}\theta _{jil}^{n_{jil}}\,\pi ({\varvec{\theta }})\,d{\varvec{\theta }}=\frac{\Gamma (\alpha _{jl}^{'})}{\Gamma (n_{jl}+\alpha _{jl}^{'})}\prod _{i=1}^{k_j}\frac{\Gamma (n_{jil}+\alpha _{jil}^{'})}{\Gamma (\alpha _{jil}^{'})}=\vartheta _{jil}^{\pi }.\]
Now, substituting these marginals into \(\lambda _{jil}({\varvec{x}})\) gives \(\lambda _{jil}({\varvec{x}})=\frac{(1-\epsilon )\vartheta _{jil}^{\pi _0}}{(1-\epsilon )\vartheta _{jil}^{\pi _0}+\epsilon \vartheta _{jil}^{\pi }}=\vartheta _{jil}\). Combining all these facts results in the Bayes estimate \(\delta _{jil}^{{\pi _\epsilon },q}({\varvec{x}}) = \left( \vartheta _{jil}\gamma _{jil}^{\pi _0,q}+[1-\vartheta _{jil}]\gamma _{jil}^{\pi ,q}\right) ^\frac{1}{q}\), which completes the proof.
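A minimal sketch of how the Corollary 1 estimate can be computed, assuming Dirichlet-multinomial marginals for \(\vartheta _{jil}^{\pi _0}\) and \(\vartheta _{jil}^{\pi }\); all counts and hyperparameters below are illustrative, not from the paper's application.

```python
# Sketch of the Corollary 1 estimate for one parameter theta_{jil}.
from math import lgamma, exp

def log_dirichlet_marginal(counts, alphas):
    """log of the Dirichlet-multinomial marginal likelihood of the counts
    (the multinomial coefficient cancels in the mixing weight)."""
    s_a, s_n = sum(alphas), sum(counts)
    out = lgamma(s_a) - lgamma(s_a + s_n)
    for c, a in zip(counts, alphas):
        out += lgamma(a + c) - lgamma(a)
    return out

def gel_moment(n_i, a_i, n, a, q):
    """gamma = E[theta^q | x] for the Beta(n_i + a_i, n + a - n_i - a_i) posterior."""
    return exp(lgamma(n_i + a_i + q) - lgamma(n_i + a_i)
               + lgamma(n + a) - lgamma(n + a + q))

counts  = [7, 9, 4]            # n_{j1l}, ..., n_{jk_j l}
alphas0 = [1.0, 1.0, 1.0]      # hyperparameters of the elicited prior pi0
alphas1 = [2.0, 1.0, 0.5]      # hyperparameters of the contaminating prior pi
eps, q, i = 0.2, -0.5, 0       # contamination level, GEL shape, component index

m0 = exp(log_dirichlet_marginal(counts, alphas0))       # vartheta^{pi0}
m1 = exp(log_dirichlet_marginal(counts, alphas1))       # vartheta^{pi}
w = (1 - eps) * m0 / ((1 - eps) * m0 + eps * m1)        # vartheta_{jil}

g0 = gel_moment(counts[i], alphas0[i], sum(counts), sum(alphas0), q)
g1 = gel_moment(counts[i], alphas1[i], sum(counts), sum(alphas1), q)
delta = (w * g0 + (1 - w) * g1) ** (1 / q)              # Corollary 1 estimate
print(delta)
```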
Cite this article
Karimnezhad, A., Moradi, F. Bayesian parameter learning with an application. METRON 74, 61–74 (2016). https://doi.org/10.1007/s40300-015-0077-0