A Characterization of Jeffreys’ Prior with Its Implications to Likelihood Inference

Chapter in Pioneering Works on Distribution Theory

Part of the book series: SpringerBriefs in Statistics (JSSRES)

Abstract

A characterization of Jeffreys’ prior for a parameter of a distribution in the exponential family is given by the asymptotic equivalence of the posterior mean of the canonical parameter to the maximum likelihood estimator. A promising role of the posterior mean is discussed because of its optimality property. Further, methods for improving estimators are explored for cases in which neither the posterior mean nor the maximum likelihood estimator performs favorably. The possible advantages of conjugate analysis based on a suitably chosen prior are examined.


Acknowledgements

The authors express their thanks to a reviewer and the editors for their comments on points requiring clarification.

Author information

Correspondence to Takemi Yanagimoto.


Appendices

Appendix A. Proof of Theorem 6.1

Before presenting the proof, we introduce notation that is more rigorous than that used in the text. Write a density in the exponential family as

$$ p ({\boldsymbol{x}} | \theta ) = \prod _i \exp \{ \theta \cdot t_i - M (\theta ) \} a (x_i), $$

where \(\sum t_i \in \mathcal {X} \subset \mathbb {R}^p\) is the sufficient statistic, and \(\theta = (\theta _1,\ldots ,\theta _p) \in \Theta \subset \mathbb {R}^p\) is the canonical parameter.

For a given \(\theta _0\), the corresponding mean parameter is written as \(\mu _0 = \nabla M (\theta _0)\). The Kullback–Leibler divergence is expressed as

$$\begin{aligned} D (\theta _0, \theta ) \,=\, M (\theta ) + N (\mu _0) - \mu _0 \cdot \theta , \end{aligned}$$

where \(N\) denotes the Legendre transform of \(M\), and the Fisher information matrix is written as

$$\begin{aligned} I (\theta ) \,=\, \left\{ M_{ij} (\theta ) \right\} _{1 \le i, j \le p}, \end{aligned}$$

where \(M_{ij} (\theta ) = \partial ^2 M (\theta )/\partial \theta _i \partial \theta _j\). For notational convenience, partial derivatives of a function of a vector variable with respect to its components are denoted by the corresponding suffixes.
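
For concreteness, the Poisson model in canonical form is a standard example (ours, not part of the chapter) that instantiates the above notation:

$$\begin{aligned} p (x | \theta )&= \exp \{ \theta x - e^{\theta } \} \frac{1}{x!}, \qquad t_i = x_i, \qquad M (\theta ) = e^{\theta }, \\ \mu _0&= e^{\theta _0}, \qquad I (\theta ) = e^{\theta }, \qquad D (\theta _0, \theta ) = e^{\theta } - e^{\theta _0} - e^{\theta _0} (\theta - \theta _0), \end{aligned}$$

so that Jeffreys’ prior is \(\pi _J (\theta ) \propto \sqrt{\det I (\theta )} = e^{\theta /2}\). This example is reused in the numerical sketches below.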

We begin the proof by presenting an expression for the posterior mean,

$$\begin{aligned} \mathrm{E} \bigl \{ \theta \,;\, \pi (\theta ; \theta _0, n) \bigr \} = \frac{\int _\Theta \theta b (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta }{\int _\Theta b (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta }, \end{aligned}$$

where \(b (\theta )\) denotes the prior density under consideration.

Thus, we may evaluate

$$\begin{aligned} \int _\Theta g (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta \end{aligned}$$
(6.16)

for cases in which \(g (\theta )\) is \( \theta _i b (\theta )\) and \(b (\theta )\).
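
The following minimal numerical sketch (ours, assuming the Poisson example above; the quadrature routine and integration limits are numerical conveniences, not part of the chapter) evaluates this ratio and contrasts Jeffreys’ prior with a flat prior:

```python
# Posterior mean of the canonical parameter, computed by quadrature for the
# Poisson case M(theta) = exp(theta).  Under Jeffreys' prior the gap from
# theta_0 shrinks like 1/n^2; under a flat prior it shrinks like 1/n.
import numpy as np
from scipy.integrate import quad

def posterior_mean(theta0, n, b):
    # D(theta0, theta) = M(theta) - M(theta0) - M'(theta0) (theta - theta0)
    D = lambda th: np.exp(th) - np.exp(theta0) - np.exp(theta0) * (th - theta0)
    w = lambda th: b(th) * np.exp(-n * D(th))
    lo, hi = theta0 - 10 / np.sqrt(n), theta0 + 10 / np.sqrt(n)  # mass concentrates here
    num, _ = quad(lambda th: th * w(th), lo, hi)
    den, _ = quad(w, lo, hi)
    return num / den

jeffreys = lambda th: np.exp(th / 2)   # sqrt(det I(theta)), since I(theta) = exp(theta)
flat = lambda th: 1.0

for n in [10, 100, 1000]:
    print(n,
          posterior_mean(0.3, n, jeffreys) - 0.3,   # ~ 1/n^2
          posterior_mean(0.3, n, flat) - 0.3)       # ~ 1/n
```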

When \(\theta \approx \theta _0\), the following formal approximation is possible:

$$\begin{aligned} D (\theta _0, \theta )&\approx \frac{1}{2} (\theta - \theta _0)^T I (\theta _0) (\theta - \theta _0) \\&\qquad + \frac{1}{3!} \sum _{j_1, j_2, j_3} M_{j_1 j_2 j_3} (\theta _0) (\theta _{j_1} - \theta _{0 j_1}) (\theta _{j_2} - \theta _{0 j_2}) (\theta _{j_3} - \theta _{0 j_3}) \\&\qquad + \frac{1}{4!} \sum _{j_1, j_2, j_3, j_4} M_{j_1 j_2 j_3 j_4} (\theta _0) (\theta _{j_1} - \theta _{0 j_1}) (\theta _{j_2} - \theta _{0 j_2}) (\theta _{j_3} - \theta _{0 j_3}) (\theta _{j_4} - \theta _{0 j_4}), \end{aligned}$$

where \(\theta _{0i}\) denotes the i-th component of \(\theta _0\).

Since \(I (\theta _0)\) is assumed to be positive definite, its a-th power is well defined for \(a=1/2\) and \(a=-1/2\); both powers are positive definite, and each is the inverse matrix of the other. We consider the following parameter transformation from \(\theta \) to z:

$$\begin{aligned} z \,=\, \sqrt{n} I^{1/2} (\theta _0) (\theta - \theta _0). \end{aligned}$$

The Jacobian of this transformation is

$$\begin{aligned} \frac{1}{n^{p/2} \sqrt{\det I (\theta _0)}}. \end{aligned}$$

Then the asymptotic expansion of the Kullback–Leibler divergence up to the order O(1/n) is given by

$$\begin{aligned} n D (\theta _0, \theta ) \approx \frac{| z |^2}{2} + \frac{1}{\sqrt{n}} \sum _{j_1, j_2, j_3} a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} + \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4}, \end{aligned}$$

where \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) are defined as

$$\begin{aligned} a_{j_1 j_2 j_3}^{(1)} := \frac{1}{3!} \sum _{j_4, j_5, j_6} M_{j_4 j_5 j_6} (\theta _0) I^{-1/2}_{j_4 j_1} (\theta _0) I^{-1/2}_{j_5 j_2} (\theta _0) I^{-1/2}_{j_6 j_3} (\theta _0); \end{aligned}$$
(6.17)
$$\begin{aligned} a_{j_1 j_2 j_3 j_4}^{(2)} := \frac{1}{4!} \sum _{j_5, j_6, j_7, j_8} M_{j_5 j_6 j_7 j_8} (\theta _0) I^{-1/2}_{j_5 j_1} (\theta _0) I^{-1/2}_{j_6 j_2} (\theta _0) I^{-1/2}_{j_7 j_3} (\theta _0) I^{-1/2}_{j_8 j_4} (\theta _0). \end{aligned}$$
(6.18)
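
As a sanity check of (6.17) and (6.18) in one dimension (our sketch for the Poisson case \(M (\theta ) = e^{\theta }\), where \(a^{(1)} = e^{-\theta _0/2}/3!\) and \(a^{(2)} = e^{-\theta _0}/4!\)), a symbolic expansion reproduces these coefficients:

```python
# Expand n D(theta_0, theta) after substituting theta = theta_0 + u z / sqrt(I),
# where u stands for 1/sqrt(n) and I = M''(theta_0) = exp(theta_0).
import sympy as sp

theta0, z = sp.symbols('theta0 z', real=True)
u = sp.symbols('u', positive=True)          # u plays the role of 1/sqrt(n)
I0 = sp.exp(theta0)                         # Fisher information at theta_0
theta = theta0 + u * z / sp.sqrt(I0)
D = sp.exp(theta) - sp.exp(theta0) - sp.exp(theta0) * (theta - theta0)
print(sp.series(D / u**2, u, 0, 3))
# z**2/2 + u*z**3*exp(-theta0/2)/6 + u**2*z**4*exp(-theta0)/24 + O(u**3)
```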

Note that \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) remain unchanged under permutations of their suffixes, since \(M (\theta )\) is assumed to be of \(C^4\) class. To evaluate the integral in (6.16), we evaluate the asymptotic expansion of \(\exp \{ -n D (\theta _0, \theta ) \}\). Writing the density of the standard p-dimensional normal distribution as \(\phi (z)\), we can give the asymptotic expansion up to the order O(1/n) as

$$\begin{aligned}&\exp \{ -n D (\theta _0, \theta ) \} \\&= (2 \pi )^{p/2} \phi (z) \exp \left( - \frac{1}{\sqrt{n}} \sum _{j_1, j_2, j_3} a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} \right) \\&\approx (2 \pi )^{p/2} \phi (z) \Biggl ( 1 - \frac{1}{\sqrt{n}} \sum _{j_1, j_2, j_3} a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} \\&\qquad \qquad \qquad \qquad \qquad \qquad + \frac{1}{2 n} \sum _{j_1, j_2, j_3, j_4, j_5, j_6} a_{j_1 j_2 j_3}^{(1)} a_{j_4 j_5 j_6}^{(1)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} z_{j_5} z_{j_6} \Biggr ). \end{aligned}$$

In the sequel, we regard the domain of \(\theta \) as \(\mathbb {R}^p\).

Next, we calculate the asymptotic expansion of \(g (\theta )\) up to the order O(1/n):

$$\begin{aligned} g (\theta )&\approx g (\theta _0) + \frac{1}{\sqrt{n}} \sum _{j_1, j_2} g_{j_2} (\theta _0) I^{-1/2}_{j_2 j_1} (\theta _0) z_{j_1} \\&\qquad \qquad + \frac{1}{2 n} \sum _{j_1, j_2, j_3, j_4} g_{j_3 j_4} (\theta _0) I^{-1/2}_{j_3 j_1} (\theta _0) I^{-1/2}_{j_4 j_2} (\theta _0) z_{j_1} z_{j_2} \\&= g (\theta _0) \left( 1 + \frac{1}{\sqrt{n}} \sum _{j_1} c_{j_1}^{(1)} z_{j_1} + \frac{1}{n} \sum _{j_1, j_2} c_{j_1 j_2}^{(2)} z_{j_1} z_{j_2} \right) , \end{aligned}$$

where \(c_{j_1}^{(1)}\) and \(c_{j_1 j_2}^{(2)}\) denote, respectively,

$$\begin{aligned} c_{j_1}^{(1)} := \sum _{j_2} \frac{g_{j_2} (\theta _0)}{g (\theta _0)} I^{-1/2}_{j_2 j_1} (\theta _0) \end{aligned}$$
(6.19)

and

$$\begin{aligned} c_{j_1 j_2}^{(2)} := \frac{1}{2} \sum _{j_3, j_4} \frac{g_{j_3 j_4} (\theta _0)}{g (\theta _0)} I^{-1/2}_{j_3 j_1} (\theta _0) I^{-1/2}_{j_4 j_2} (\theta _0). \end{aligned}$$
(6.20)

Note that these coefficients also remain unchanged under permutations of their suffixes, as do \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) in (6.17) and (6.18).

Combining these asymptotic expansions, we obtain that of the integrand in (6.16) as follows:

$$\begin{aligned}&g (\theta ) \exp \{ - n D (\theta _0, \theta ) \} \\&\approx (2 \pi )^{p/2} g (\theta _0) \phi (z) \left( 1 + \frac{1}{\sqrt{n}} \sum _{j_1} c_{j_1}^{(1)} z_{j_1} + \frac{1}{n} \sum _{j_1, j_2} c_{j_1 j_2}^{(2)} z_{j_1} z_{j_2} \right) \\&\qquad \times \Biggl ( 1 - \frac{1}{\sqrt{n}} \sum _{j_1, j_2, j_3} a_{j_1 j_2 j_3}^{(1)} z_{j_1} z_{j_2} z_{j_3} - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} \\&\qquad \qquad \qquad \qquad \qquad \qquad + \frac{1}{2 n} \sum _{j_1, j_2, j_3, j_4, j_5, j_6} a_{j_1 j_2 j_3}^{(1)} a_{j_4 j_5 j_6}^{(1)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} z_{j_5} z_{j_6} \Biggr ). \end{aligned}$$

Since this approximated integrand contains \(\phi (z)\), we may discard the odd-order terms of the polynomial in z to give

$$\begin{aligned}&(2 \pi )^{p/2} g (\theta _0) \phi (z) \\&\quad \times \Biggl \{ 1 - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} \\&\qquad \qquad + \frac{1}{2 n} \sum _{j_1, j_2, j_3, j_4, j_5, j_6} a_{j_1 j_2 j_3}^{(1)} a_{j_4 j_5 j_6}^{(1)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} z_{j_5} z_{j_6} \\&\qquad \qquad - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} c_{j_1}^{(1)} a_{j_2 j_3 j_4}^{(1)} z_{j_1} z_{j_2} z_{j_3} z_{j_4} + \frac{1}{n} \sum _{j_1, j_2} c_{j_1 j_2}^{(2)} z_{j_1} z_{j_2} \Biggr \}. \end{aligned}$$

Let \(Z = (Z_1, \ldots , Z_p)^T\) be a random vector having the density \(\phi (z)\). The second moments are written as \(\mathrm{E} \{ Z_i Z_j \} = \delta _{ij}\). To evaluate the fourth moments, set \(\gamma _{ijkl} := \mathrm{E} \{ Z_i Z_j Z_k Z_l\}\). It follows that \(\gamma _{iiii} = 3\) for every i, and that \(\gamma _{iikk} = 1\) for \(i \ne k\), that is, whenever the four indices form two distinct pairs. The sixth moments appear in the asymptotic expansion of the integral (6.16), but cancel in the asymptotic expansion of the posterior mean.
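
These moment values can be confirmed with a quick Monte Carlo check (ours; the sample size and seed are arbitrary):

```python
# E{Z_i Z_j} = delta_ij, gamma_iiii = 3, and gamma_iikk = 1 for i != k.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((1_000_000, 2))
print((Z[:, 0] ** 2).mean())                 # ~ 1  (delta_ii)
print((Z[:, 0] ** 4).mean())                 # ~ 3  (gamma_iiii)
print((Z[:, 0] ** 2 * Z[:, 1] ** 2).mean())  # ~ 1  (gamma_iikk)
```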

The asymptotic expansion of the integral of (6.16) up to the order O(1/n) is expressed as

$$\begin{aligned}&\int _\Theta g (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta \\&\approx \left( \frac{2 \pi }{n} \right) ^{p/2} \frac{g (\theta _0)}{\sqrt{\det I (\theta _0)}} \\&\qquad \times \Biggl \{ 1 - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} \gamma _{j_1 j_2 j_3 j_4} + \frac{1}{2 n} \sum _{j_1, j_2, j_3, j_4, j_5, j_6} a_{j_1 j_2 j_3}^{(1)} a_{j_4 j_5 j_6}^{(1)} \gamma _{j_1 j_2 j_3 j_4 j_5 j_6} \\&\qquad \qquad \qquad \qquad \qquad - \frac{1}{n} \sum _{j_1, j_2, j_3, j_4} c_{j_1}^{(1)} a_{j_2 j_3 j_4}^{(1)} \gamma _{j_1 j_2 j_3 j_4} + \frac{1}{n} \sum _{j_1, j_2} c_{j_1 j_2}^{(2)} \delta _{j_1 j_2} \Biggr \}. \end{aligned}$$

Set the sum of the second and third terms as A, that is,

$$\begin{aligned} A \,=\, - \sum _{j_1, j_2, j_3, j_4} a_{j_1 j_2 j_3 j_4}^{(2)} \gamma _{j_1 j_2 j_3 j_4} \,+\, \frac{1}{2} \sum _{j_1, j_2, j_3, j_4, j_5, j_6} a_{j_1 j_2 j_3}^{(1)} a_{j_4 j_5 j_6}^{(1)} \gamma _{j_1 j_2 j_3 j_4 j_5 j_6}. \end{aligned}$$

Note that A does not depend on \(b(\theta )\). Next, we simplify the fourth term by applying the permutation symmetry of \(a_{j_1 j_2 j_3}^{(1)}\), as follows:

$$\begin{aligned}&\sum _{j_1, j_2, j_3, j_4} c_{j_1}^{(1)} a_{j_2 j_3 j_4}^{(1)} \gamma _{j_1 j_2 j_3 j_4} \\&= 3 \sum _{j_1} c_{j_1}^{(1)} a_{j_1 j_1 j_1}^{(1)} + \sum _{j_1 \ne j_2} c_{j_1}^{(1)} a_{j_1 j_2 j_2}^{(1)} + \sum _{j_1 \ne j_2} c_{j_1}^{(1)} a_{j_2 j_1 j_2}^{(1)} + \sum _{j_1 \ne j_2} c_{j_1}^{(1)} a_{j_2 j_2 j_1}^{(1)} \\&= 3 \sum _{j_1} c_{j_1}^{(1)} a_{j_1 j_1 j_1}^{(1)} + 3 \sum _{j_1 \ne j_2} c_{j_1}^{(1)} a_{j_1 j_2 j_2}^{(1)} \\&= 3 \sum _{j_1, j_2} c_{j_1}^{(1)} a_{j_1 j_2 j_2}^{(1)}. \end{aligned}$$

The fifth term is rewritten as

$$\begin{aligned} \sum _{j_1, j_2} c_{j_1 j_2}^{(2)} \delta _{j_1 j_2} = \sum _{j_1} c_{j_1 j_1}^{(2)}. \end{aligned}$$

Therefore, the asymptotic expansion of the integral is of the form

$$\begin{aligned}&\int _\Theta g (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta \\&\approx \left( \frac{2 \pi }{n} \right) ^{p/2} \frac{g (\theta _0)}{\sqrt{\det I (\theta _0)}} \Biggl \{ 1 + \frac{1}{n} \left( A - 3 \sum _{j_1, j_2} c_{j_1}^{(1)} a_{j_1 j_2 j_2}^{(1)} + \sum _{j_1} c_{j_1 j_1}^{(2)} \right) \Biggr \}. \end{aligned}$$

Thus, both the numerator and the denominator of the posterior mean are expressed in similar forms as

$$\begin{aligned} \int _\Theta \theta _i b (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta&\approx \left( \frac{2 \pi }{n} \right) ^{p/2} \frac{\theta _{0i} b (\theta _0)}{\sqrt{\det I (\theta _0)}} \left( 1 + \frac{d_{Ni}}{n} \right) , \\ \int _\Theta b (\theta ) \exp \{ - n D (\theta _0, \theta ) \} d \theta&\approx \left( \frac{2 \pi }{n} \right) ^{p/2} \frac{b (\theta _0)}{\sqrt{\det I (\theta _0)}} \left( 1 + \frac{d_D}{n} \right) . \end{aligned}$$

Consequently, we obtain the asymptotic expansion of the posterior mean as

$$\begin{aligned} \mathrm{E} \bigl \{ \theta _i \,;\, \pi (\theta ; \theta _0, n) \bigr \} \approx \theta _{0i} \left( 1 + \frac{d_{Ni} - d_D}{n} \right) . \end{aligned}$$

Next, we prove the necessity part: suppose that \(d_{Ni} = d_D\) for every i. Since the coefficients \(a_{j_1 j_2 j_3}^{(1)}\) and \(a_{j_1 j_2 j_3 j_4}^{(2)}\) do not depend on \(b(\theta )\), the term A is common to \(d_{Ni}\) and \(d_D\), and hence cancels in the difference \(d_{Ni} - d_D\) for every i. Thus, the difference depends only on \(c_{j_1}^{(1)}\) and \(c_{j_1 j_1}^{(2)}\). To evaluate the difference, we decompose it into two terms \(F_1 + F_2\) such that \(F_1\) collects the contributions of \(c_{j_1}^{(1)}\) and \(F_2\) those of \(c_{j_1 j_1}^{(2)}\).

Since \(M(\theta )\) is assumed to be of \(C^4\) class, \(I (\theta )\) is of \(C^2\) class. Using the equality

$$\begin{aligned} \frac{1}{\theta _i b (\theta )} \frac{\partial \{ \theta _i b (\theta ) \}}{\partial \theta _{j_2}} - \frac{b_{j_2} (\theta )}{b (\theta )}&= \frac{\delta _{j_2 i}}{\theta _i}, \end{aligned}$$

we find that the coefficient of \(c_{j_1}^{(1)}\) in the difference \(d_{Ni} - d_D\) is written as

$$\begin{aligned} \sum _{j_3} \frac{\delta _{j_3 i}}{\theta _{0i}} I^{-1/2}_{j_3 j_1} (\theta _0) = \frac{I^{-1/2}_{i j_1} (\theta _0)}{\theta _{0i}}. \end{aligned}$$

This implies that \(F_1\) can be expressed as follows:

$$\begin{aligned} F_1&= -3 \sum _{j_1 j_2} \frac{I^{-1/2}_{i j_1} (\theta _0)}{\theta _{0i}} a_{j_1 j_2 j_2}^{(1)} \\&= -3 \sum _{j_1 j_2} \frac{I^{-1/2}_{i j_1} (\theta _0)}{\theta _{0i}} \frac{1}{3!} \sum _{j_4, j_5, j_6} M_{j_4 j_5 j_6} (\theta _0) I^{-1/2}_{j_4 j_1} (\theta _0) I^{-1/2}_{j_5 j_2} (\theta _0) I^{-1/2}_{j_6 j_2} (\theta _0) \\&= -\frac{1}{2 \theta _{0i}} \sum _{j_1, j_2, j_4, j_5, j_6} I^{-1/2}_{i j_1} (\theta _0) I^{-1/2}_{j_4 j_1} (\theta _0) I^{-1/2}_{j_5 j_2} (\theta _0) I^{-1/2}_{j_6 j_2} (\theta _0) M_{j_4 j_5 j_6} (\theta _0). \end{aligned}$$

Since \(I^{-1/2} (\theta _0)\) is symmetric, it follows that

$$\begin{aligned} \sum _{j_1} I^{-1/2}_{i j_1} (\theta _0) I^{-1/2}_{j_4 j_1} (\theta _0) = \sum _{j_1} I^{-1/2}_{i j_1} (\theta _0) I^{-1/2}_{j_1 j_4} (\theta _0) = I^{-1}_{i j_4} (\theta _0) \end{aligned}$$

and also that

$$\begin{aligned} \sum _{j_2} I^{-1/2}_{j_5 j_2} (\theta _0) I^{-1/2}_{j_6 j_2} (\theta _0)&= I^{-1}_{j_5 j_6} (\theta _0). \end{aligned}$$

Consequently, the former term is given by

$$\begin{aligned} F_1 \,=\, -\frac{1}{2 \theta _{0i}} \sum _{j_4, j_5, j_6} I^{-1}_{i j_4} (\theta _0) I^{-1}_{j_5 j_6} (\theta _0) M_{j_4 j_5 j_6} (\theta _0). \end{aligned}$$

Next, we evaluate the latter term \(F_2\). It holds that

$$\begin{aligned} \frac{1}{\theta _i b (\theta )} \frac{\partial ^2 \{ \theta _i b (\theta ) \}}{\partial \theta _{j_3} \partial \theta _{j_4}} \,-\, \frac{b_{j_3 j_4} (\theta )}{b (\theta )}&= \frac{\delta _{j_4 i} b_{j_3} (\theta ) + \delta _{j_3 i} b_{j_4} (\theta )}{\theta _i b (\theta )}. \end{aligned}$$

Hence, the coefficient of \(c_{j_1 j_1}^{(2)}\) in the difference \(d_{Ni} - d_D\) is written as

$$\begin{aligned}&\frac{1}{2} \sum _{j_3, j_4} \frac{\delta _{j_4 i} b_{j_3} (\theta _0) + \delta _{j_3 i} b_{j_4} (\theta _0)}{\theta _{0i} b (\theta _0)} I^{-1/2}_{j_3 j_1} (\theta _0) I^{-1/2}_{j_4 j_1} (\theta _0) \\&= \frac{I^{-1/2}_{i j_1} (\theta _0)}{2 \theta _{0i}} \sum _{j_3} \frac{b_{j_3} (\theta _0)}{b (\theta _0)} I^{-1/2}_{j_3 j_1} (\theta _0) + \frac{I^{-1/2}_{i j_1} (\theta _0)}{2 \theta _{0i}} \sum _{j_4} \frac{b_{j_4} (\theta _0)}{b (\theta _0)} I^{-1/2}_{j_4 j_1} (\theta _0) \\&= \frac{I^{-1/2}_{i j_1} (\theta _0)}{\theta _{0i}} \sum _{j_3} \frac{b_{j_3} (\theta _0)}{b (\theta _0)} I^{-1/2}_{j_3 j_1} (\theta _0). \end{aligned}$$

Thus, the latter term \(F_2\) is given by

$$\begin{aligned} F_2 = \frac{1}{\theta _{0i}} \sum _{j_1, j_3} I^{-1/2}_{i j_1} (\theta _0) \frac{b_{j_3} (\theta _0)}{b (\theta _0)} I^{-1/2}_{j_3 j_1} (\theta _0) = \frac{1}{\theta _{0i}} \sum _{j_3} I^{-1}_{i j_3} (\theta _0) \frac{b_{j_3} (\theta _0)}{b (\theta _0)}. \end{aligned}$$

Combining these results, we obtain that

$$\begin{aligned} d_{Ni} - d_D \,=\, -\frac{1}{2 \theta _{0i}} \sum _{j_4, j_5, j_6} I^{-1}_{i j_4} (\theta _0) I^{-1}_{j_5 j_6} (\theta _0) M_{j_4 j_5 j_6} (\theta _0) + \frac{1}{\theta _{0i}} \sum _{j_3} I^{-1}_{i j_3} (\theta _0) \frac{b_{j_3} (\theta _0)}{b (\theta _0)}. \end{aligned}$$
(6.21)
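
In one dimension, (6.21) reduces to \(d_{N} - d_D = -M'''(\theta _0)/\{2 \theta _0 I^2 (\theta _0)\} + b'(\theta _0)/\{\theta _0 I (\theta _0) b (\theta _0)\}\), and substituting \(b = \sqrt{I}\) makes it vanish identically. A symbolic check of this (ours, with \(M\) left as an undefined function):

```python
# Verify that b = sqrt(M'') annihilates the one-dimensional version of (6.21).
import sympy as sp

th = sp.symbols('theta', positive=True)
M = sp.Function('M')
I = sp.diff(M(th), th, 2)                   # Fisher information M''
b = sp.sqrt(I)                              # Jeffreys' prior, up to a constant
diff_1d = (-sp.diff(M(th), th, 3) / (2 * I**2)
           + sp.diff(b, th) / (b * I)) / th
print(sp.simplify(diff_1d))                 # prints 0
```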

Using the equality

$$ M_{j_3 j_5 j_6} (\theta _0) = \frac{\partial I_{j_5 j_6} (\theta _0)}{\partial \theta _{0 j_3}} $$

and replacing the index \(j_4\) in the summation of the difference (6.21) by \(j_3\), we can rewrite the difference as follows:

$$\begin{aligned} d_{Ni} - d_D&= \frac{1}{\theta _{0i}} \sum _{j_3} I^{-1}_{i j_3} (\theta _0) \left\{ \frac{b_{j_3} (\theta _0)}{b (\theta _0)} -\frac{1}{2} \sum _{j_5, j_6} I^{-1}_{j_5 j_6} (\theta _0) \frac{\partial I_{j_5 j_6} (\theta _0)}{\partial \theta _{0 j_3}} \right\} . \end{aligned}$$

Applying Jacobi’s differentiation formula for the determinant of a differentiable and invertible matrix A(t), \(d \{\det A (t)\}/dt = \det A (t)\, \text{ tr }\{ A^{-1} (t)\, d\{ A(t)\}/dt\}\), we can express the right-hand side in terms of the derivative of the determinant of the matrix \(I(\theta _0)\):

$$\begin{aligned} \sum _{j_5, j_6} I^{-1}_{j_5 j_6} (\theta _0) \frac{\partial I_{j_5 j_6} (\theta _0)}{\partial \theta _{0 j_3}} = \frac{\partial }{\partial \theta _{0 j_3}} \log \det I (\theta _0). \end{aligned}$$
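
Jacobi’s formula itself can be verified symbolically on a small example (ours; the particular 2×2 matrix is arbitrary):

```python
# Check d/dt log det A(t) = tr(A^{-1}(t) dA(t)/dt) on a 2x2 matrix.
import sympy as sp

t = sp.symbols('t', positive=True)
A = sp.Matrix([[sp.exp(t), sp.sin(t)],
               [sp.sin(t), 2 + t**2]])
lhs = sp.diff(sp.log(A.det()), t)
rhs = (A.inv() * A.diff(t)).trace()
print(sp.simplify(lhs - rhs))   # prints 0
```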

This implies that

$$\begin{aligned} d_{Ni} - d_D \,=\, \frac{1}{\theta _{0i}} \sum _{j_3} I^{-1}_{i j_3} (\theta _0) \frac{\partial }{\partial \theta _{0 j_3}} \left\{ \log b (\theta _0) -\frac{1}{2} \log \det I (\theta _0) \right\} . \end{aligned}$$

Since \(I^{-1} (\theta _0)\) is nonsingular and \(\theta _0\) is arbitrary, this difference vanishes for every i if and only if

$$ \nabla \log \frac{b (\theta )}{\sqrt{ \det I (\theta )}} \,=\, 0, $$

that is, \(b (\theta ) \propto \sqrt{\det I (\theta )}\), which is Jeffreys’ prior. This completes the proof.

Appendix B. Proof of Corollary 6.1

To apply Theorem 6.1, set \(s=\nabla N(t)\), and define the function \(f_n (s)\) as

$$ f_n (s) = \frac{\int \theta \exp \{ -n D (s, \theta ) \} \pi _J (\theta ) d \theta }{\int \exp \{ -n D (s, \theta ) \} \pi _J (\theta ) d \theta }. $$

Theorem 6.1 yields that, for an arbitrary fixed s,

$$\begin{aligned} f_n (s) = s + a_n (s) / n^2, \end{aligned}$$
(6.22)

where the coefficient \(a_n (s)\) is continuous and of the order O(1).

Write the MLE of \(\theta \) for a sample of size n, \({\boldsymbol{x}}_n\), as \(\hat{\theta }_{ML}({\boldsymbol{x}}_n)\). Since the sample density is assumed to be in the exponential family, \(\hat{\theta }_{ML}({\boldsymbol{x}}_n)\) can be expressed as \(\nabla N(\bar{t})\), which we write as \(s_n\). Then, the posterior mean \(\hat{\theta }({\boldsymbol{x}}_n)\) is expressed as \(\hat{\theta }({\boldsymbol{x}}_n)=f_n (s_n)\). The assumption that the sampling density is in the regular exponential family implies that the true parameter \(\theta _T\) lies in the interior of \(\Theta \); the law of large numbers implies that, for an arbitrarily small positive value \(\epsilon \) and all sufficiently large n, the probability of the set of samples \(\mathcal{{X}}_{\epsilon }(n) = \{{\boldsymbol{x}}_n \mid |s_n -\theta _T| \le \epsilon \}\) is greater than \(1-\epsilon \). From the continuity of the coefficient \(a_n(\cdot )\) in (6.22), it follows that \(a_n(s_n)\) is bounded for \({\boldsymbol{x}}_n \in \mathcal{{X}}_{\epsilon }(n)\). Thus, it holds that

$$ f_n (s_n) = s_n + c_n / n^2, $$

where \(c_n\) is bounded. Combining these results, we find that

$$ \hat{\theta }({\boldsymbol{x}}_n) = \hat{\theta }_{ML}({\boldsymbol{x}}_n) + O_P ( 1 / n^2 ). $$
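
A short simulation sketch (ours, again assuming the Poisson example; the sample sizes, seed, and quadrature limits are numerical conveniences) illustrates this \(O_P (1/n^2)\) agreement on simulated data:

```python
# Posterior mean of theta = log(lambda) under Jeffreys' prior versus the MLE.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(1)
theta_T = 0.5                                # true canonical parameter

for n in [20, 200, 2000]:
    x = rng.poisson(np.exp(theta_T), size=n)
    s_n = np.log(x.mean())                   # MLE: nabla N(t-bar) = log of the sample mean
    D = lambda th, s=s_n: np.exp(th) - np.exp(s) - np.exp(s) * (th - s)
    w = lambda th: np.exp(th / 2 - n * D(th))          # Jeffreys weight
    lo, hi = s_n - 10 / np.sqrt(n), s_n + 10 / np.sqrt(n)
    num, _ = quad(lambda th: th * w(th), lo, hi)
    den, _ = quad(w, lo, hi)
    print(n, num / den - s_n)                # gap shrinks roughly like 1/n^2
```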

Copyright information

© 2020 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Yanagimoto, T., Ohnishi, T. (2020). A Characterization of Jeffreys’ Prior with Its Implications to Likelihood Inference. In: Hoshino, N., Mano, S., Shimura, T. (eds) Pioneering Works on Distribution Theory. SpringerBriefs in Statistics. Springer, Singapore. https://doi.org/10.1007/978-981-15-9663-6_6
