Abstract
A characterization of Jeffreys’ prior for a parameter of a distribution in the exponential family is given by the asymptotic equivalence of the posterior mean of the canonical parameter to the maximum likelihood estimator. A promising role of the posterior mean, owing to its optimality property, is discussed. Further, methods for improving estimators are explored for cases in which neither the posterior mean nor the maximum likelihood estimator performs favorably. The possible advantages of conjugate analysis based on a suitably chosen prior are examined.
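Schematically, in the notation of Appendix A, the characterization may be summarized as follows (the symbol \(d(\theta _0, b)\) here is shorthand for the O(1/n) coefficient made explicit in the appendix): under a prior density proportional to \(b(\theta )\), the posterior mean of the canonical parameter admits an expansion
\[ E ( \theta \mid {\varvec{x}}_n ) = \hat{\theta }_{ML} ({\varvec{x}}_n) + \frac{d (\theta _0, b)}{n} + o ( n^{-1} ), \]
and the coefficient vector \(d (\theta _0, b)\) vanishes for every \(\theta _0\) if and only if \(b (\theta ) \propto \{ \det I (\theta ) \}^{1/2}\), that is, Jeffreys’ prior.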
Acknowledgements
The authors thank a reviewer and the editors for their comments on points that needed clarification.
Appendices
Appendix A. Proof of Theorem 6.1
Before presenting the proof, we set out notation more rigorous than that in the text. Write the density of a sample of size n in the exponential family as
\[ f ({\varvec{x}}; \theta ) = \exp \Bigl \{ \Bigl \langle \mathop {\textstyle {\sum }}t_i ,\, \theta \Bigr \rangle - n M (\theta ) \Bigr \}\, a ({\varvec{x}}), \]
where \(\mathop {\textstyle {\sum }}t_i \in \mathcal {X} \subset \mathbb {R}^p\) is the sufficient statistic, \(\theta \in \Theta \subset \mathbb {R}^p\) with \(\theta =(\theta _1,\ldots ,\theta _p)\) is the canonical parameter, and \(M (\theta )\) is the cumulant function.
For a given \(\theta _0\), the corresponding mean parameter is written as \(\mu _0 = \nabla M (\theta _0)\). The Kullback–Leibler divergence is expressed as
\[ \text{ D }(\theta _0, \theta ) = M (\theta ) - M (\theta _0) - \langle \mu _0 ,\, \theta - \theta _0 \rangle , \]
and the Fisher information matrix is written as
\[ I (\theta ) = \bigl ( M_{ij} (\theta ) \bigr )_{i,j = 1, \ldots , p}, \]
where \(M_{ij} (\theta ) = \partial ^2 M (\theta )/\partial \theta _i \partial \theta _j\). For notational convenience, the partial derivatives of a function of a vector variable with respect to its components are denoted by the corresponding suffixes.
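For orientation, a familiar one-parameter instance of this notation (the Poisson model, used here only as an illustration) is
\[ t(x) = x, \quad \theta = \log \lambda , \quad M (\theta ) = e^{\theta }, \quad \mu = e^{\theta }, \quad I (\theta ) = e^{\theta }, \]
for which \(\text{ D }(\theta _0, \theta ) = e^{\theta } - e^{\theta _0} - e^{\theta _0} (\theta - \theta _0)\) and Jeffreys’ prior is proportional to \(e^{\theta /2}\).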
We begin the proof by presenting an expression of the posterior mean under a prior density proportional to \(b (\theta )\). Since the likelihood is proportional to \(\exp \{ -n \text{ D }(\theta _0, \theta ) \}\), with \(\theta _0\) the maximum likelihood estimate, the i-th component of the posterior mean is written as
\[ \hat{\theta }_i = \frac{\int \theta _i\, b (\theta ) \exp \{ -n \text{ D }(\theta _0, \theta ) \}\, d\theta }{\int b (\theta ) \exp \{ -n \text{ D }(\theta _0, \theta ) \}\, d\theta }. \]
Thus, we may evaluate
\[ \int g (\theta ) \exp \{ -n \text{ D }(\theta _0, \theta ) \}\, d\theta \qquad (6.16) \]
for cases in which \(g (\theta )\) is \( \theta _i b (\theta )\) and \(b (\theta )\).
When \(\theta \approx \theta _0\), the following formal approximation is possible:
where \(\theta _{0i}\) denotes the i-th component of \(\theta _0\).
Since \(I (\theta _0)\) is assumed to be positive definite, the a-th power can be defined for \(a=1/2\) and \(-1/2\). Both matrices are positive definite, and one is the inverse of the other. We consider here the following parameter transformation of \(\theta \) to z:
\[ z = \sqrt{n}\, I^{1/2} (\theta _0) (\theta - \theta _0). \]
The Jacobian of this transformation is
\[ \frac{\partial \theta }{\partial z} = n^{-p/2} \{ \det I (\theta _0) \}^{-1/2}. \]
Then the asymptotic expansion of the Kullback–Leibler divergence up to the order O(1/n) is given by
\[ n \text{ D }(\theta _0, \theta ) = \frac{1}{2} \sum _j z_j^2 + \frac{1}{6 \sqrt{n}} \sum a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} + \frac{1}{24 n} \sum a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} + O ( n^{-3/2} ), \]
where \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) are defined as
\[ a_{j_1 j_2 j_3}^{(1)} = \sum M_{k_1 k_2 k_3} (\theta _0)\, ( I^{-1/2} )_{k_1 j_1} ( I^{-1/2} )_{k_2 j_2} ( I^{-1/2} )_{k_3 j_3} \qquad (6.17) \]
and
\[ a_{j_1 j_2 j_3 j_4}^{(2)} = \sum M_{k_1 k_2 k_3 k_4} (\theta _0)\, ( I^{-1/2} )_{k_1 j_1} ( I^{-1/2} )_{k_2 j_2} ( I^{-1/2} )_{k_3 j_3} ( I^{-1/2} )_{k_4 j_4}. \qquad (6.18) \]
Note that \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) remain unchanged under permutation of the suffixes, since \(M (\theta )\) is assumed to be of \(C^4\) class. To evaluate the integral in (6.16), we evaluate the asymptotic expansion of \(\exp \{ -n \text{ D }(\theta _0, \theta ) \}\). Writing the density of the standard p-dimensional normal as \(\phi (z)\), we can give the asymptotic expansion up to the order O(1/n) as
\[ \exp \{ -n \text{ D }(\theta _0, \theta ) \} = (2\pi )^{p/2} \phi (z) \Bigl [ 1 - \frac{1}{6 \sqrt{n}} \sum a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} + \frac{1}{n} \Bigl \{ \frac{1}{72} \Bigl ( \sum a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} \Bigr )^2 - \frac{1}{24} \sum a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} \Bigr \} \Bigr ] + O ( n^{-3/2} ). \]
In the sequel, we regard the domain of \(\theta \) as \(\mathbb {R}^p\).
Next, we calculate the asymptotic expansion of \(g (\theta )\) up to the order O(1/n) by
\[ g (\theta ) = g (\theta _0) + \frac{1}{\sqrt{n}} \sum c_{j_1}^{(1)} z_{j_1} + \frac{1}{2 n} \sum c_{j_1 j_2}^{(2)} z_{j_1} z_{j_2} + O ( n^{-3/2} ), \]
where \(c_{j_1}^{(1)}\) and \(c_{j_1 j_2}^{(2)}\) denote, respectively,
\[ c_{j_1}^{(1)} = \sum g_{k_1} (\theta _0)\, ( I^{-1/2} )_{k_1 j_1} \]
and
\[ c_{j_1 j_2}^{(2)} = \sum g_{k_1 k_2} (\theta _0)\, ( I^{-1/2} )_{k_1 j_1} ( I^{-1/2} )_{k_2 j_2}. \]
Note that these coefficients also remain unchanged under permutation of the suffixes, as do \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) in (6.17) and (6.18).
Combining these asymptotic expansions, we obtain that of the integrand in (6.16) as follows:
Since this approximated integrand contains \(\phi (z)\), we may discard the odd-order terms of the polynomial in z to give
Let \(Z = (Z_1, \ldots , Z_p)^T\) be a random vector having the density \(\phi (z)\). Then the second moment is written as \(E \{ Z_i Z_j \} = \delta _{ij}\). To evaluate the fourth moment, set \(\gamma _{ijkl} := E \{ Z_i Z_j Z_k Z_l\}\). It follows that \(\gamma _{iiii} = 3\) for every i, and that \(\gamma _{iikk} = 1 \) for \(i \ne k\), that is, when the four indices form two pairs taking different integers. The sixth moment remains in the asymptotic expansion of the integral in (6.16), but disappears in the asymptotic expansion of the posterior mean.
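These values follow from Isserlis’ theorem (Wick’s formula) for moments of the multivariate standard normal,
\[ E \{ Z_i Z_j Z_k Z_l \} = \delta _{ij} \delta _{kl} + \delta _{ik} \delta _{jl} + \delta _{il} \delta _{jk}, \]
which gives \(\gamma _{iiii} = 3\) and \(\gamma _{iikk} = 1\) for \(i \ne k\), and shows that any product in which some index appears an odd number of times has zero expectation.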
The asymptotic expansion of the integral in (6.16) up to the order O(1/n) is expressed as
Set the sum of the second and third terms as A, that is,
Note that A is independent of \(b(\theta )\). Next, we simplify the fourth term by applying the properties of \(a_{j_1 j_2 j_3}^{(1)}\), as follows:
The fifth term is rewritten as
Therefore, the asymptotic expansion of the integral is of the form
Thus, both the numerator and the denominator of the posterior mean are expressed in similar forms as
\[ \int \theta _i\, b (\theta ) \exp \{ -n \text{ D }(\theta _0, \theta ) \}\, d\theta = K_n\, \theta _{0i}\, b (\theta _0) \Bigl \{ 1 + \frac{d_{Ni}}{n} + o ( n^{-1} ) \Bigr \} \]
and
\[ \int b (\theta ) \exp \{ -n \text{ D }(\theta _0, \theta ) \}\, d\theta = K_n\, b (\theta _0) \Bigl \{ 1 + \frac{d_D}{n} + o ( n^{-1} ) \Bigr \}, \]
where \(K_n = (2\pi /n)^{p/2} \{ \det I (\theta _0) \}^{-1/2}\). Consequently, we obtain the asymptotic expansion of the posterior mean as
\[ \hat{\theta }_i = \theta _{0i} \Bigl \{ 1 + \frac{d_{Ni} - d_D}{n} \Bigr \} + o ( n^{-1} ). \]
Next, we prove the necessity, and suppose that \(d_{Ni} = d_D\) for every i. Since the coefficients \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) are independent of \(b(\theta )\), the difference \(d_{Ni} - d_D\) is independent of A for every i. Thus, the difference depends only on \(c_{j_1}^{(1)}\) and \(c_{j_1 j_1}^{(2)}\). To evaluate the difference, we decompose it into two terms \( F_1 + F_2\) such that \(F_1\) is a function of \(c_{j_1}^{(1)}\) and \(F_2\) is a function of \(c_{j_1 j_1}^{(2)}\).
Since \(M(\theta )\) is assumed to be of \(C^4\) class, \(\text{ I }(\theta )\) is of \(C^2\) class. Using the equality
we find that the coefficient of \(c_{j_1}^{(1)}\) in the difference \(d_{Ni} - d_D\) is written as
This implies that \(F_1\) can be expressed as follows:
Since \(I^{-1/2} (\theta _0)\) is symmetric, it follows that
and also that
Consequently, the former term \(F_1\) is given by
Next, we evaluate the latter term \(F_2\). It holds that
Hence, the coefficient of \(c_{j_1 j_1}^{(2)}\) in the difference \(d_{Ni} - d_D\) is written as
Thus, the latter term \(F_2\) is given by
Combining these results, we obtain that
Using the equality
and replacing the index \(j_4\) in the summation of the difference (6.21) by \(j_3\), we can rewrite the difference as follows:
Applying the differentiation formula for the determinant of a differentiable and invertible matrix A(t), \(d \{\det A (t)\}/dt = \det A (t)\, \text{ tr }\{ A^{-1} (t)\, d\{ A(t)\}/dt\}\), we can express the right-hand side in terms of the derivative of the determinant of the matrix \(I(\theta _0)\):
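Spelled out for \(I (\theta )\), with t replaced by the component \(\theta _i\), the formula yields
\[ \frac{\partial }{\partial \theta _i} \log \det I (\theta ) = \text{ tr } \Bigl \{ I^{-1} (\theta )\, \frac{\partial I (\theta )}{\partial \theta _i} \Bigr \} = \sum _{j_1, j_2} \bigl ( I^{-1} (\theta ) \bigr )_{j_1 j_2} M_{j_1 j_2 i} (\theta ), \]
since \(I_{j_1 j_2} (\theta ) = M_{j_1 j_2} (\theta )\).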
This implies that
The condition for this equality to hold for every i is expressed as
\[ \frac{\partial }{\partial \theta _i} \log b (\theta _0) = \frac{1}{2}\, \frac{\partial }{\partial \theta _i} \log \det I (\theta _0). \]
Since \(\theta _0\) is arbitrary, this means that \(b (\theta )\) is proportional to \(\{ \det I (\theta ) \}^{1/2}\), that is, Jeffreys’ prior. This completes the proof.
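As a numerical illustration of the theorem (a sketch under our own choice of model and priors, not taken from the chapter), consider Poisson sampling with \(s = \sum x_i\). Under Jeffreys’ prior \(b (\theta ) \propto e^{\theta /2}\), the posterior of \(\lambda = e^{\theta }\) is Gamma\((s + 1/2,\, n)\) and the posterior mean of \(\theta \) is \(\psi (s + 1/2) - \log n\), with \(\psi \) the digamma function; under a flat prior in \(\theta \) it is \(\psi (s) - \log n\). The MLE is \(\log (s/n)\). The following sketch compares the scaled differences:

    # Illustrative check for the Poisson model (our example, not the
    # chapter's): theta = log(lambda), M(theta) = exp(theta), and
    # I(theta) = exp(theta), so Jeffreys' prior is proportional to
    # exp(theta/2), i.e. a Gamma(1/2) prior in the lambda scale.
    import numpy as np
    from scipy.special import digamma

    lam = 2.0  # true Poisson mean (arbitrary illustrative choice)
    for n in [10, 100, 1000, 10000]:
        s = lam * n  # expected value of the sufficient statistic sum(x_i)
        mle = np.log(s / n)  # MLE of theta = log(lambda)
        # posterior means of theta under the two priors:
        pm_jeffreys = digamma(s + 0.5) - np.log(n)  # Gamma(s + 1/2, n) posterior
        pm_flat = digamma(s) - np.log(n)            # Gamma(s, n) posterior
        print(f"n={n:6d}  Jeffreys: {n * (pm_jeffreys - mle):+.5f}  "
              f"flat: {n * (pm_flat - mle):+.5f}")

The scaled difference vanishes under Jeffreys’ prior, since \(\psi (s + 1/2) - \log s = O (s^{-2})\), while under the flat prior it approaches \(-1/(2\lambda ) = -0.25\), exhibiting the persistent O(1/n) discrepancy that the characterization excludes.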
Appendix B. Proof of Corollary 6.1
To apply Theorem 6.1, set \(s=\nabla N(t)\), and define the function \(f_n (s)\) as the posterior mean of the canonical parameter regarded as a function of s.
Theorem 6.1 yields that, for an arbitrary fixed s,
\[ f_n (s) = s + \frac{a_n (s)}{n}, \qquad (6.22) \]
where the coefficient \(a_n (s)\) is continuous and is of the order O(1).
Write the MLE of \(\theta \) for a sample of size n, \({\varvec{x}}_n\), as \(\hat{\theta }_{ML}({\varvec{x}}_n) \). Since the sample density is assumed to be in the exponential family, \(\hat{\theta }_{ML}({\varvec{x}}_n)\) can be expressed as \(\nabla N(\bar{t})\), which is written as \(s_n\). Then, the posterior mean \(\hat{\theta }({\varvec{x}}_n)\) is expressed as \(\hat{\theta }({\varvec{x}}_n)=f_n (s_n)\). The assumption that the sampling density is in the regular exponential family implies that the true parameter \(\theta _T\) lies in the interior of \(\Theta \). The law of large numbers then implies that, for an arbitrarily small positive value \(\epsilon \) and for sufficiently large n, the probability of the subspace of samples \(\mathcal{{X}}_{\epsilon }(n) = \{{\varvec{x}}_n|\, |s_n -\theta _T| \le \epsilon \}\) is greater than \(1-\epsilon \). From the continuity of the coefficient \(a_n(s_n)\) in (6.22), it follows that \(a_n(s_n)\) is bounded for \({\varvec{x}}_n \in \mathcal{{X}}_{\epsilon }(n)\). Thus, it holds that
\[ \sup _{{\varvec{x}}_n \in \mathcal{{X}}_{\epsilon }(n)} | a_n (s_n) | \le c_n , \]
where \(c_n\) is a finite value. Combining these results, we find that
\[ \hat{\theta }({\varvec{x}}_n) - \hat{\theta }_{ML}({\varvec{x}}_n) = O_p ( n^{-1} ). \]