Abstract
In this paper, we investigate Bayesian and robust Bayesian estimation of a wide range of parameters of interest in Bayesian nonparametrics under a broad class of loss functions. To deal with uncertainty regarding the prior, we consider the Dirichlet and Dirichlet invariant priors, and we provide explicit forms of the resulting Bayes and robust Bayes estimators. Tractability of the results is illustrated through numerous examples involving well-known loss functions. The practical utility of the proposed Bayes and robust Bayes estimators is examined on a real data set.
Notes
It is important to realize that in Lemma A, the posterior distribution of \(\theta _P\) given \({\varvec{X}}\) has two related parameters, namely aH and \(a(1-H)\). There would be no need for Taylor expansions if the two parameters were unrelated. For example, if \(\theta _P|{\varvec{X}}\) were assumed to follow a \(Beta(a_1,a_2)\) distribution with unrelated parameters, one could derive \(E[\ln \theta _P|{\varvec{X}}]\) as follows
$$\begin{aligned} E[\ln \theta _P|{\varvec{X}}]= & {} \int _0^1\frac{\ln v}{B(a_1,a_2)} v^{a_1-1}(1-v)^{a_2-1}dv\\= & {} \frac{1}{B(a_1,a_2)} \int _0^1 \frac{\partial }{\partial a_1} v^{a_1-1}(1-v)^{a_2-1}dv\\= & {} \frac{\partial }{\partial a_1} \ln B(a_1,a_2)\\= & {} \frac{\partial }{\partial a_1}\ln \varGamma (a_1) -\frac{\partial }{\partial a_1}\ln \varGamma (a_1+a_2)\\= & {} \psi (a_1)-\psi (a_1+a_2), \end{aligned}$$where \(\psi (\cdot )\) is the digamma function, i.e., \(\psi (u)=\frac{\partial }{\partial u}\ln \varGamma (u)\). In the same way, one could show that \(E[\ln (1-\theta _P)|{\varvec{X}}]=\psi (a_2) -\psi (a_1+a_2)\). However, these simple and well-known relations cannot be used in deriving \(E[\ln \theta _P]\) and \(E[\ln (1-\theta _P)]\) in Lemma A, because \(a_2=a(1-H{(S)})=a-a_1\) is a function of \(a_1=aH{(S)}\).
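As a quick numerical sanity check of the digamma identity above (an illustration added here, not part of the original derivation), the exact value \(\psi (a_1)-\psi (a_1+a_2)\) can be compared with a Monte Carlo average of \(\ln V\) for \(V\sim Beta(a_1,a_2)\). The parameter values are arbitrary, and the digamma function is approximated by a central difference of \(\ln \varGamma \):

```python
import math
import random

def digamma(u, h=1e-6):
    # Central-difference approximation to psi(u) = d/du ln Gamma(u)
    return (math.lgamma(u + h) - math.lgamma(u - h)) / (2 * h)

def beta_sample(a1, a2, rng):
    # Beta(a1, a2) draw via the Gamma ratio X / (X + Y)
    x = rng.gammavariate(a1, 1.0)
    y = rng.gammavariate(a2, 1.0)
    return x / (x + y)

a1, a2 = 2.5, 4.0                        # illustrative, unrelated parameters
exact = digamma(a1) - digamma(a1 + a2)   # E[ln V] for V ~ Beta(a1, a2)

rng = random.Random(0)
n = 200_000
mc = sum(math.log(beta_sample(a1, a2, rng)) for _ in range(n)) / n

print(abs(mc - exact) < 0.01)  # the two estimates agree closely
```

The same comparison with the roles of \(a_1\) and \(a_2\) swapped checks \(E[\ln (1-\theta _P)|{\varvec{X}}]=\psi (a_2)-\psi (a_1+a_2)\).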
References
Al-Labadi L, Evans M (2016) Prior based model checking. arXiv preprint. arXiv:1606.08106
Al-Labadi L, Evans M (2017) Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Anal 12(3):705–728
Al-Labadi L, Zarepour M (2013) On asymptotic properties and almost sure approximation of the normalized inverse-Gaussian process. Bayesian Anal 8(3):553–568
Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2(6):1152–1174
Arias-Nicolás JP, Martín J, Ruggeri F, Suárez-Llorens A (2009) Optimal actions in problems with convex loss functions. Int J Approx Reason 50(2):303–314
Banerjee A, Merugu S, Dhillon IS, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6:1705–1749
Basu S (2000) Bayesian robustness and Bayesian nonparametrics. In: Insua DR, Ruggeri F (eds) Robust Bayesian analysis. Springer, New York, pp 223–240
Benavoli A, Mangili F, Ruggeri F, Zaffalon M (2015) Imprecise Dirichlet process with application to the hypothesis test on the probability that \(X\le Y\). J Stat Theory Pract 9(3):658–684
Berger JO (1990) Robust Bayesian analysis: sensitivity to the prior. J Stat Plan Inference 25(3):303–328
Berger JO (2013) Statistical decision theory and Bayesian analysis, 2nd edn. Springer, New York
Berger JO, Moreno E, Pericchi LR, Bayarri MJ, Bernardo JM, Cano JA, De la Horra J, Martín J, Ríos-Insúa D, Betrò B et al (1994) An overview of robust Bayesian analysis. Test 3(1):5–124
Bose S (2017) Robustness in Bayesian nonparametrics. Int J Approx Reason 82:161–169
Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7(3):200–217
Brown L (1968) Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. Ann Math Stat 39(1):29–48
Cagno E, Caron F, Mancini M, Ruggeri F (2000) Using AHP in determining the prior distributions on gas pipeline failures in a robust Bayesian approach. Reliab Eng Syst Saf 67(3):275–284
Calabria R, Pulcini G (1994) An engineering approach to Bayes estimation for the Weibull distribution. Microelectron Reliab 34(5):789–802
Carlton MA (1999) Applications of the two-parameter Poisson–Dirichlet distribution. Ph.D. thesis, Department of Statistics, University of California, Los Angeles
Dalal SR (1979) Dirichlet invariant processes and applications to nonparametric estimation of symmetric distribution functions. Stoch Process Their Appl 9(1):99–107
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1(2):209–230
Ferguson TS, Klass MJ (1972) A representation of independent increment processes without Gaussian components. Ann Math Stat 43(5):1634–1643
Févotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput 21(3):793–830
Golparver L, Karimnezhad A, Parsian A (2013) Optimal rules and robust Bayes estimation of a gamma scale parameter. Metrika 76(5):595–622
Gupta AK, Nadarajah S (2004) Mathematical properties of the Beta distribution. In: Gupta AK, Nadarajah S (eds) Handbook of beta distribution and its applications. Marcel Dekker Inc., New York, pp 33–53
Hjort NL (1990) Nonparametric Bayes estimators based on beta processes in models for life history data. Ann Stat 18(3):1259–1294
Hosseini R, Zarepour M (2018) A note on Bayesian nonparametric inference for spherically symmetric distribution. arXiv preprint. arXiv:1807.11066v2
Ishwaran H, James LF (2001) Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc 96(453):161–173
Ishwaran H, Zarepour M (2000) Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87(2):371–390
Jagers P (1974) Aspects of random measures and point processes. Adv Probab Relat Top 3:179–239
James LF, Lijoi A, Prünster I (2006) Conjugacy as a distinctive feature of the Dirichlet process. Scand J Stat 33(1):105–120
James W, Stein C (1992) Estimation with quadratic loss. In: Kotz S, Johnson NL (eds) Breakthroughs in statistics. Springer Series in Statistics (Perspectives in Statistics). Springer, New York, pp 443–460
Jang GH, Lee J, Lee S (2010) Posterior consistency of species sampling priors. Stat Sinica 20:581–593
Kalbfleisch JD (1978) Non-parametric Bayesian analysis of survival time data. J R Stat Soc Ser B (Methodol) 40(2):214–221
Karimnezhad A, Parsian A (2014) Robust Bayesian methodology with applications in credibility premium derivation and future claim size prediction. AStA Adv Stat Anal 98(3):287–303
Karimnezhad A, Parsian A (2018) Most stable sample size determination in clinical trials. Stat Methods Appl. https://doi.org/10.1007/s10260-017-0419-6
Karimnezhad A, Parsian A (2019) Bayesian and robust Bayesian analysis in a general setting. Commun Stat Theory Methods 48(15):3899–3920
Karimnezhad A, Lucas PJ, Parsian A (2017) Constrained parameter estimation with uncertain priors for Bayesian networks. Electron J Stat 11(2):4000–4032
Kiapour A, Nematollahi N (2011) Robust Bayesian prediction and estimation under a squared log error loss function. Stat Probab Lett 81(11):1717–1724
Lehmann EL, Casella G (1998) Theory of point estimation. Springer, New York
Lo AY (1984) On a class of Bayesian nonparametric estimates: I. Density estimates. Ann Stat 12(1):351–357
Lijoi A, Mena RH, Prünster I (2005) Hierarchical mixture modeling with normalized inverse-Gaussian priors. J Am Stat Assoc 100(472):1278–1291
Makov UE (1995) Loss robustness via Fisher-weighted squared-error loss function. Insur Math Econ 16(1):1–6
Masi I, Tran AT, Hassner T, Leksut JT, Medioni G (2016) Do we really need to collect millions of faces for effective face recognition? In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016. Lecture Notes in Computer Science, vol 9909. Springer, Cham
Müller P, Quintana FA, Jara A, Hanson T (2016) Bayesian nonparametric data analysis. Springer, New York
Norstrom JG (1996) The use of precautionary loss functions in risk analysis. IEEE Trans Reliab 45(3):400–403
Parsian A, Kirmani S (2002) Estimation under LINEX loss function. In: Ullah A, Wan ATK, Chaturvedi A (eds) Handbook of applied econometrics and statistical inference. Marcel Dekker Inc., New York, pp 53–75
Phadia EG (1973) Minimax estimation of a cumulative distribution function. Ann Stat 1(6):1149–1157
Phadia EG (2016) Prior processes and their applications, 2nd edn. Springer, Cham
Pitman J (1996) Some developments of the Blackwell–MacQueen urn scheme. In: Ferguson TS, Shapley LS, MacQueen JB (eds) Statistics, probability and game theory: papers in honor of David Blackwell. IMS Lecture Notes–Monograph Series, vol 30. Institute of Mathematical Statistics, Hayward, pp 245–267
Pitman J, Yor M (1997) The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann Probab 25(2):855–900
Regazzini E, Lijoi A, Prünster I (2003) Distributional results for means of normalized random measures with independent increments. Ann Stat 31(2):560–585
Ruggeri F (2010) Nonparametric Bayesian robustness. Chil J Stat 2:51–68
Ruggeri F (2014) On some optimal Bayesian nonparametric rules for estimating distribution functions. Econ Rev 33(1–4):289–304
Ríos Insua D, Ruggeri F, Wiper M (2012) Bayesian analysis of stochastic process models. Wiley, Chichester
Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sinica 4:639–650
Soliman AA (2005) Estimation of parameters of life from progressively censored data using Burr-XII model. IEEE Trans Reliab 54(1):34–42
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82:528–550
Teh YW (2006) A hierarchical Bayesian language model based on Pitman–Yor processes. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics. Association for Computational Linguistics pp 985–992
Thibaux R, Jordan MI (2007) Hierarchical beta processes and the Indian buffet process. In: Meila M, Shen X (eds) Proceedings of the 11th international conference on artificial intelligence and statistics. Society for Artificial Intelligence and Statistics, pp 564–571
van Dyk DA, Meng XL (2001) The art of data augmentation. J Comput Graph Stat 10(1):1–50
Varian HR (1975) A Bayesian approach to real estate assessment. In: Fienberg SE, Zellner A (eds) Studies in Bayesian econometrics and statistics in honor of Leonard J. Savage. North-Holland, Amsterdam, pp 195–208
Walker S, Muliere P (1997) Beta-Stacy processes and a generalization of the Pólya-urn scheme. Ann Stat 25(4):1762–1780
Zarepour M, Al-Labadi L (2012) On a rapid simulation of the Dirichlet process. Stat Probab Lett 82(5):916–924
Zellner A (1986) Bayesian estimation and prediction using asymmetric loss functions. J Am Stat Assoc 81(394):446–451
Acknowledgements
The authors are cordially grateful to the Editor-in-Chief and two anonymous reviewers for their helpful comments and suggestions, which led to a substantial improvement in the quality of this work. The research of the second author was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) (RGPIN-2018-04008).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Lemma A
Let \({\varvec{X}}=({\varvec{X}}_{1},\ldots ,{\varvec{X}}_{m})^{T}\) be a sample of size m of random vectors from an unknown distribution \(P\in {\mathcal {F}}\), where \({\mathcal {F}}\) is a general class of beliefs on \((\mathbb {R},{\mathcal {B}})\). Suppose \(\theta _P=P(X_1\in S)\), where S is an arbitrary subset of the real line. Assuming that \(P\sim DP(a,H)\ [\)or \(P\sim DIP(a,H)]\), suppose that \(P|{\varvec{X}}\sim DP(\tilde{a},\tilde{H})\ [\)or \(P|{\varvec{X}}\sim DIP(\tilde{a},\tilde{H})]\). Then, for known constants \(\alpha \), \(\beta \) and \(\gamma \),
- (i)$$\begin{aligned} E[\theta _P^{\alpha }(1-\theta _P)^{\beta }|{\varvec{X}}] =\frac{B\left( \tilde{a}\tilde{H}{(S)}+\alpha ,\tilde{a}(1-\tilde{H}{(S)}) +\beta \right) }{B\left( \tilde{a}\tilde{H}{(S)}, \tilde{a}(1-\tilde{H}{(S)})\right) }, \end{aligned}$$
- (ii)$$\begin{aligned} E[e^{\gamma \theta _P}|{\varvec{X}}]= 1+\sum _{l=1}^{\infty } \frac{\gamma ^l}{l!} \prod _{j=0}^{l-1} \frac{\tilde{a}\tilde{H}{(S)}+j}{\tilde{a}+j}, \end{aligned}$$
- (iii)$$\begin{aligned} E[\ln \theta _P|{\varvec{X}}]=-\sum _{l=1}^{\infty }\frac{1}{l} \frac{B\left( \tilde{a}\tilde{H}{(S)},\tilde{a}(1-\tilde{H}{(S)})+l\right) }{B\left( \tilde{a}\tilde{H}{(S)},\tilde{a}(1-\tilde{H}{(S)})\right) }, \end{aligned}$$
- (iv)$$\begin{aligned} E[\ln (1-\theta _P)|{\varvec{X}}]=-\sum _{l=1}^{\infty }\frac{1}{l} \frac{B\left( \tilde{a}\tilde{H}{(S)}+l,\tilde{a}(1-\tilde{H}{(S)})\right) }{B\left( \tilde{a}\tilde{H}{(S)},\tilde{a}(1-\tilde{H}{(S)})\right) }, \end{aligned}$$
where \(B(\alpha ,\beta )=\frac{\varGamma (\alpha )\varGamma (\beta )}{\varGamma (\alpha +\beta )}\), and \(\varGamma (u)=\int _0^{\infty } v^{u-1} e^{-v}\,dv\) is the gamma function.
Proof
In Definition 1, take \(k=2\) and consider \({\mathcal {X}}\) as the union of two disjoint sets. This yields \(\theta _P\sim Beta(aH{(S)},a(1-H{(S)}))\) and, a posteriori, \(\theta _P|{\varvec{X}}\sim Beta(\tilde{a}\tilde{H}{(S)},\tilde{a}(1-\tilde{H}{(S)}))\). Part (i) then follows by direct algebraic manipulation. Part (ii) is obtained from the moment generating function of the Beta distribution; see Gupta and Nadarajah (2004). Part (iii) (Footnote 1) is verified using the Taylor expansion of \(\ln \theta _P\), i.e., \(\ln \theta _P=-\sum _{l=1}^{\infty }\frac{1}{l}(1-\theta _P)^l\), together with part (i) with the choices \(\alpha =0\) and \(\beta =l\). Part (iv) is verified using the Taylor expansion of \(\ln (1-\theta _P)\), i.e., \(\ln (1-\theta _P)=-\sum _{l=1}^{\infty } \frac{1}{l}{\theta _P^l}\), together with part (i) with the choices \(\alpha =l\) and \(\beta =0\).
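The closed forms in Lemma A can be checked numerically. The sketch below is our illustration: the values assigned to \(\tilde{a}\) and \(\tilde{H}(S)\) are arbitrary stand-ins, and the infinite series of parts (iii) and (iv) are truncated. It compares part (i) and the two truncated series against Monte Carlo averages under \(\theta _P|{\varvec{X}}\sim Beta(\tilde{a}\tilde{H}(S),\tilde{a}(1-\tilde{H}(S)))\):

```python
import math
import random

def logB(a, b):
    # log of the Beta function, via log-gamma for numerical stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

a_post, H_S = 5.0, 0.3                 # illustrative stand-ins for a~ and H~(S)
a1, a2 = a_post * H_S, a_post * (1.0 - H_S)

# Part (i): E[theta^alpha (1-theta)^beta] = B(a1+alpha, a2+beta) / B(a1, a2)
alpha, beta = 2.0, 1.5
m_i = math.exp(logB(a1 + alpha, a2 + beta) - logB(a1, a2))

# Parts (iii) and (iv): truncations of the two series
L = 5000
m_iii = -sum(math.exp(logB(a1, a2 + l) - logB(a1, a2)) / l for l in range(1, L))
m_iv = -sum(math.exp(logB(a1 + l, a2) - logB(a1, a2)) / l for l in range(1, L))

# Monte Carlo comparison: theta ~ Beta(a1, a2) via the Gamma ratio
rng = random.Random(1)
n = 100_000
s_i = s_iii = s_iv = 0.0
for _ in range(n):
    x, y = rng.gammavariate(a1, 1.0), rng.gammavariate(a2, 1.0)
    t = x / (x + y)
    s_i += t ** alpha * (1.0 - t) ** beta
    s_iii += math.log(t)
    s_iv += math.log(1.0 - t)

print(abs(s_i / n - m_i) < 0.01,
      abs(s_iii / n - m_iii) < 0.05,
      abs(s_iv / n - m_iv) < 0.05)
```

Since the truncated series in `m_iii` converges at rate \(l^{-(1+a_1)}\) here, a few thousand terms already agree with the Monte Carlo averages to well within sampling error.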
Proof of Theorem 3
- (i) The function \(\eta _j^{\mathcal {F}}(a,\theta _0)\) is increasing in \(\theta _0\); thus \(\eta _j^{\mathcal {F}} (a,\theta _0)\) attains its infimum at \(\theta _0=\underline{\theta }_0\) and its supremum at \(\theta _0=\overline{\theta }_0\).
- (ii) The function \( \eta _j^{\mathcal {F}}(a,\theta _0)\) is increasing in a when \(\theta _0>\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), decreasing in a when \(\theta _0<\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), and constant in a otherwise. Thus, if \(\theta _0>\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), \(\underline{\eta }_j^{\mathcal {F}}= \eta _j^{\mathcal {F}}(a_1,\theta _0)\), and if \(\theta _0<\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), \(\underline{\eta }_j^{\mathcal {F}}= \eta _j^{\mathcal {F}}(a_2,\theta _0)\). The same discussion leads to the determination of \(\overline{\eta }_j^{\mathcal {F}}\).
- (iii) First, suppose \( \underline{\theta }_0 \ge \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\). Since \(\theta _0\in [\underline{\theta }_0,\overline{\theta }_0]\), we have \(\theta _0 \ge \underline{\theta }_0 \ge \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), which implies \(\eta _j^{\mathcal {F}}(a,\theta _0)\ge \eta _j^{\mathcal {F}}(a_1,\theta _0)\), as well as \(\theta _0-\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \ge \underline{\theta }_0 -\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\ge 0\). The latter inequality implies \(\eta _j^{\mathcal {F}}(a_1,\theta _0)\ge \eta _j^{\mathcal {F}}(a_1,\underline{\theta }_0)\). Combining the two inequalities, for every a and \(\theta _0\), \( \eta _j^{\mathcal {F}}(a,\theta _0)\ge \eta _j^{\mathcal {F}}(a_1,\underline{\theta }_0)\), or equivalently \(\underline{\eta }_j^{\mathcal {F}} =\eta _j^{\mathcal {F}}(a_1,\underline{\theta }_0)\). The same discussion leads to \(\overline{\eta }_j^{\mathcal {F}} =\eta _j^{\mathcal {F}}(a_2,\overline{\theta }_0)\). Next, suppose \(\overline{\theta }_0 \le \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\). Then \(\underline{\theta }_0 \le \theta _0 \le \overline{\theta }_0 \le \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), which implies \( \eta _j^{\mathcal {F}}(a,\theta _0)\ge \eta _j^{\mathcal {F}}(a_2,\theta _0)\) (here \(\eta _j^{\mathcal {F}}\) is decreasing in a), as well as \(\underline{\theta }_0-\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \le \theta _0-\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \le \overline{\theta }_0 -\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \le 0\). The latter chain implies \(\eta _j^{\mathcal {F}}(a_2,\theta _0)\ge \eta _j^{\mathcal {F}}(a_2,\underline{\theta }_0)\). Combining the two inequalities, for every a and \(\theta _0\), \( \eta _j^{\mathcal {F}}(a,\theta _0)\ge \eta _j^{\mathcal {F}}(a_2,\underline{\theta }_0)\), or equivalently \(\underline{\eta }_j^{\mathcal {F}}=\eta _j^{\mathcal {F}}(a_2, \underline{\theta }_0)\). The same discussion leads to \(\overline{\eta }_j^{\mathcal {F}}= \eta _j^{\mathcal {F}}(a_1,\overline{\theta }_0)\). Finally, suppose \(\underline{\theta }_0< \frac{j+{v^{\mathcal {F}} (Z,{\varvec{X}})}}{m+j} < \overline{\theta }_0\). Since \(\theta _0\in [\underline{\theta }_0,\overline{\theta }_0]\), either \(\underline{\theta }_0 \le \theta _0<\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} < \overline{\theta }_0\) or \( \underline{\theta }_0<\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} < \theta _0 \le \overline{\theta }_0\). In both cases, \(\underline{\theta }_0- \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \le \theta _0- \frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j} \), which yields \( \eta _j^{\mathcal {F}}(a,\theta _0)\ge \eta _j^{\mathcal {F}}(a,\underline{\theta }_0)\). On the other hand, since \(\underline{\theta }_0<\frac{j+{v^{\mathcal {F}}(Z,{\varvec{X}})}}{m+j}\), \(\eta _j^{\mathcal {F}}(a,\underline{\theta }_0)\ge \eta _j^{\mathcal {F}}(a_2,\underline{\theta }_0)\). Combining these facts, for every a and \(\theta _0\), \(\eta _j^{\mathcal {F}} (a,\theta _0)\ge \eta _j^{\mathcal {F}}(a_2,\underline{\theta }_0)\), or equivalently \(\underline{\eta }_j^{\mathcal {F}}=\eta _j^{\mathcal {F}} (a_2,\underline{\theta }_0)\). The same discussion proves that \(\overline{\eta }_j^{\mathcal {F}}= \eta _j^{\mathcal {F}}(a_2, \overline{\theta }_0)\).
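The corner structure established in the proof can be illustrated numerically. The function below is not the paper's \(\eta _j^{\mathcal {F}}\); it is a hypothetical stand-in chosen only to satisfy the monotonicity properties of parts (i) and (ii): increasing in \(\theta _0\), increasing in a above the threshold and decreasing below it. A brute-force grid search then confirms that the infimum and supremum over \([a_1,a_2]\times [\underline{\theta }_0,\overline{\theta }_0]\) occur at the corners identified in part (iii):

```python
import itertools

# Stand-in for eta_j^F(a, theta0): NOT the paper's function, just one with
# the monotonicity used in the proof. c stands in for (j + v)/(m + j).
c = 0.4
eta = lambda a, t: a * (t - c) + 2.0 * t   # d/dt > 0; d/da = t - c

def extremes(a_lo, a_hi, t_lo, t_hi, steps=201):
    # Brute-force inf/sup of eta over the rectangle [a_lo,a_hi] x [t_lo,t_hi]
    grid_a = [a_lo + (a_hi - a_lo) * i / (steps - 1) for i in range(steps)]
    grid_t = [t_lo + (t_hi - t_lo) * i / (steps - 1) for i in range(steps)]
    vals = [eta(a, t) for a, t in itertools.product(grid_a, grid_t)]
    return min(vals), max(vals)

a1, a2, tol = 1.0, 3.0, 1e-9

# Case 1: interval above the threshold -> inf at (a1, t_lo), sup at (a2, t_hi)
lo, hi = extremes(a1, a2, 0.5, 0.8)
case1 = abs(lo - eta(a1, 0.5)) < tol and abs(hi - eta(a2, 0.8)) < tol

# Case 2: interval below the threshold -> inf at (a2, t_lo), sup at (a1, t_hi)
lo, hi = extremes(a1, a2, 0.1, 0.3)
case2 = abs(lo - eta(a2, 0.1)) < tol and abs(hi - eta(a1, 0.3)) < tol

# Case 3: threshold interior -> inf at (a2, t_lo), sup at (a2, t_hi)
lo, hi = extremes(a1, a2, 0.2, 0.7)
case3 = abs(lo - eta(a2, 0.2)) < tol and abs(hi - eta(a2, 0.7)) < tol

print(case1, case2, case3)
```

The three cases mirror the three parts of the case analysis above; any other function with the same monotonicity pattern yields the same corner behavior.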
Cite this article
Karimnezhad, A., Zarepour, M. A general guide in Bayesian and robust Bayesian estimation using Dirichlet processes. Metrika 83, 321–346 (2020). https://doi.org/10.1007/s00184-019-00737-2