Abstract
A family of estimators that adjust the maximum likelihood estimator by a higher-order term maximizing the asymptotic predictive expected log-likelihood is introduced under possible model misspecification. The negative predictive expected log-likelihood is interpreted as the Kullback–Leibler distance between the adjusted estimator and its population counterpart, up to an additive constant. The vector of coefficients in the correction term of the adjusted estimator is given explicitly by maximizing a quadratic form. Examples using distributions typical in statistics are presented.
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) Proceedings of the 2nd international symposium on information theory. Académiai Kiado, Budapest, pp 267–281
Bjornstad JF (1990) Predictive likelihood: a review. Stat Sci 5:242–265
DeGroot MH, Schervish MJ (2002) Probability and statistics, 3rd edn. Addison-Wesley, Boston
Fisher RA (1956) Statistical methods and scientific inference. Oliver and Boyd, Edinburgh
Giles DEA, Rayner AC (1979) The mean squared errors of the maximum likelihood and natural-conjugate Bayes regression estimators. J Econometr 11:319–334
Gruber MHJ (1998) Improving efficiency by shrinkage: The James-Stein and ridge regression estimators. Marcel Dekker, New York
Hinkley D (1979) Predictive likelihood. Ann Stat 7:718–728
Konishi S, Kitagawa G (1996) Generalized information criteria in model selection. Biometrika 83:875–890
Konishi S, Kitagawa G (2003) Asymptotic theory for information criteria in model selection—functional approach. J Stat Plan Inference 114:45–61
Konishi S, Kitagawa G (2008) Information criteria and statistical modeling. Springer, New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Lawless JF, Fredette M (2005) Frequentist prediction intervals and predictive distributions. Biometrika 92:529–542
Lejeune M, Faulkenberry GD (1982) A simple predictive density function. J Am Stat Assoc 77:654–657
Leonard T (1982) Comment (on Lejeune & Faulkenberry, 1982). J Am Stat Assoc 77:657–658
Ogasawara H (2010) Asymptotic expansions for the pivots using log-likelihood derivatives with an application in item response theory. J Multivar Anal 101:2149–2167
Ogasawara H (2013) Asymptotic cumulants of the estimator of the canonical parameter in the exponential family. J Stat Plan Inference 143:2142–2150
Ogasawara H (2014a) Supplement to the paper “Asymptotic cumulants of the estimator of the canonical parameter in the exponential family”. Econ Rev (Otaru University of Commerce) 65(2–3):3–16. Permalink: http://hdl.handle.net/10252/5399. Accessed 20 Nov 2016
Ogasawara H (2014b) Optimization of the Gaussian and Jeffreys power priors with emphasis on the canonical parameters in the exponential family. Behaviormetrika 41:195–223
Ogasawara H (2015) Bias adjustment minimizing the asymptotic mean square error. Commun Stat Theory Methods 44:3503–3522
Ogasawara H (2016) Optimal information criteria minimizing their asymptotic mean square errors. Sankhyā B 78:152–182
Ogasawara H (2017) Asymptotic cumulants of some information criteria. J Jpn Soc Comput Stat (to appear)
Sakamoto Y, Ishiguro M, Kitagawa G (1986) Akaike information criterion statistics. Reidel, Dordrecht
Stone M (1977) An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. J R Stat Soc B 39:44–47
Takeuchi K (1976) Distributions of information statistics and criteria of the goodness of models. Math Sci 153:12–18 (in Japanese)
Takezawa K (2012) A revision of AIC for normal error models. Open J Stat 2:309–312
Takezawa K (2014) Estimation of the exponential distribution from a viewpoint of prediction. In: Proceedings of 2014 Japanese Joint Statistical Meeting, 305. University of Tokyo, Tokyo, Japan (in Japanese)
Takezawa K (2015) Estimation of the exponential distribution in the light of future data. Br J Math Comput Sci 5:128–132
Acknowledgements
This work was partially supported by a Grant-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (JSPS KAKENHI, Grant No. 26330031).
Communicated by Shuichi Kawano.
Appendix
1.1 The asymptotic predictive expected log-likelihood for the maximum likelihood estimator
In this appendix, \( {\text{E}}_{\text{T}} (\hat{\bar{l}}_{\text{ML}}^{*} ) \) in (14) is obtained up to \( O(n^{ - 2} ) \), where
\( {\text{E}}_{\text{T}} (\hat{\bar{l}}_{\text{ML}}^{*} ) = {\text{E}}_{\text{T}} \{ \bar{l}^{*} ({\hat{\varvec{\uptheta }}}_{\text{ML}} )\} = \int_{{R({\mathbf{X}})}} {\bar{l}^{*} \{ {\varvec{\uptheta}}_{\text{ML}} ({\mathbf{X}})\} f_{\text{T}} ({\mathbf{X}}|{\varvec{\upzeta}}_{0} ){\text{d}}{\mathbf{X}}} \) (see (10)). For this expectation, we use the expansion of \( {\hat{\varvec{\uptheta }}}_{\text{ML}} \) by Ogasawara (2010, p. 2151) as follows:
where \( {\text{v}}( \cdot ) \) is the vectorizing operator taking the non-duplicated elements of a symmetric matrix and \( {\text{v}}^{{\prime}} ( \cdot ) = \{ {\text{v}}( \cdot )\}^{{\prime}} \).
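As a concrete illustration (not part of the paper), the vectorizing operator \( {\text{v}}( \cdot ) \) can be sketched in NumPy as the half-vectorization that stacks the non-duplicated (lower-triangular) elements of a symmetric matrix; the exact element ordering assumed in the paper may differ:

```python
import numpy as np

def vech(M):
    """v(.): stack the non-duplicated elements of a symmetric matrix.

    Here the lower-triangular elements are taken row by row; other
    orderings are possible and the paper's convention may differ.
    """
    r, c = np.tril_indices(M.shape[0])
    return M[r, c]

M = np.array([[1., 2.],
              [2., 3.]])
print(vech(M))  # [1. 2. 3.]
```

For a symmetric q × q matrix this yields a vector of length q(q + 1)/2, matching the column indexing "ab, c, d" with a ≥ b used below.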
Using (129), the matrices \( {\varvec{\Lambda}}^{(2 - j)} \,\,(j = 1,2) \) and \( {\varvec{\Lambda}}^{(3 - j)} \,\,(j = 1, \ldots ,4) \) are implicitly defined by
The expectation to be derived is
In (131), the asymptotic expectation is derived term by term. In the following, notation such as \( ({\varvec{\Lambda}}^{(2 - 1)} )_{(e:ab,c,d)} \) denotes the element of \( {\varvec{\Lambda}}^{(2 - 1)} \) in the e-th row and the column indexed by “ab, c, d”, which corresponds to \( ({\mathbf{M}})_{ab} \,\,(a \ge b) \), \( \partial \bar{l}/\partial ({\varvec{\uptheta}}_{0} )_{c} \) and \( \partial \bar{l}/\partial ({\varvec{\uptheta}}_{0} )_{d} \) in \( {\mathbf{l}}_{0}^{(2 - 1)} \). Notation such as \( \sum\nolimits_{(g,\,h)}^{(2)} {} \) denotes the sum of the two terms obtained by exchanging g and h; \( \sum\limits_{a \ge b}^{{}} {( \cdot )} \equiv \sum\limits_{b = 1}^{q} {\sum\limits_{a = b}^{q} {( \cdot )} } \); and \( \lambda^{ab} = ({\varvec{\Lambda}}^{ - 1} )_{ab} \).
1.
$$ \begin{aligned} {\text{E}}_{\text{T}} \left\{ {\frac{1}{2}\frac{{\partial^{2} \bar{l}^{*} }}{{(\partial {\varvec{\uptheta}}_{0}^{\prime} )^{ \langle 2 \rangle } }}({\hat{\varvec{\uptheta }}}_{\text{ML}} - {\varvec{\uptheta}}_{0} )^{ \langle 2 \rangle } } \right\} \hfill \\ = \frac{1}{2}n^{ - 1} {\text{vec}}^{{\prime}} ({\varvec{\Lambda}})n{\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(1)} {\mathbf{l}}_{0}^{(1)} )^{ \langle 2 \rangle } \} \hfill \\ \,\,\, + \frac{1}{2}n^{ - 2} {\text{vec}}^{{\prime}} ({\varvec{\Lambda}})\bigg[ {2n^{2} } {\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(2)} {\mathbf{l}}_{0}^{(2)} ) \otimes ({\varvec{\Lambda}}^{(1)} {\mathbf{l}}_{0}^{(1)} )\} \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, + 2n^{2} {\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(3)} {\mathbf{l}}_{0}^{(3)} ) \otimes ({\varvec{\Lambda}}^{(1)} {\mathbf{l}}_{0}^{(1)} )\} + n^{2} {{\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(2)} {\mathbf{l}}_{0}^{(2)} )^{ \langle 2 \rangle } \} } \bigg] \hfill \\ \,\,\, + O(n^{ - 3} ) \hfill \\ \end{aligned} $$(132)
where paired brackets labeled, e.g., \( \mathop [\limits_{{({\text{A}})}} \cdot \mathop ]\limits_{{({\text{A}})}} \) are used for ease of finding the correspondence of terms.
2.
$$ \begin{aligned} {\text{E}}_{\text{T}} \left\{ {\frac{1}{6}\frac{{\partial^{3} \bar{l}^{*} }}{{(\partial {\varvec{\uptheta}}_{0}^{\prime} )^{ \langle 3 \rangle } }}({\hat{\varvec{\uptheta }}}_{\text{ML}} - {\varvec{\uptheta}}_{0} )^{ \langle 3 \rangle } } \right\} \hfill \\ = n^{ - 2} \frac{{\partial^{3} \bar{l}^{*} }}{{(\partial {\varvec{\uptheta}}_{0}^{\prime} )^{ \langle 3 \rangle } }}\left[ {\frac{{n^{2} }}{6}{\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(1)} {\mathbf{l}}_{0}^{(1)} )^{ \langle 3 \rangle } \} + \frac{{n^{2} }}{2}{\text{E}}_{\text{T}} \{ ({\varvec{\Lambda}}^{(2)} {\mathbf{l}}_{0}^{(2)} ) \otimes ({\varvec{\Lambda}}^{(1)} {\mathbf{l}}_{0}^{(1)} )^{ \langle 2 \rangle } \} } \right] \hfill \\ \,\,\, + O(n^{ - 3} ) \hfill \\ \end{aligned} $$(133)
3.
$$ \begin{aligned} {\text{E}}_{\text{T}} \left\{ {\frac{1}{24}\frac{{\partial^{4} \bar{l}^{*} }}{{(\partial {\varvec{\uptheta}}_{0}^{\prime} )^{ \langle 4 \rangle } }}({\hat{\varvec{\uptheta }}}_{\text{ML}} - {\varvec{\uptheta}}_{0} )^{ \langle 4 \rangle } } \right\} \hfill \\ \,\,\,\,\,\,\,\, = \frac{{n^{ - 2} }}{8}{\text{vec}}^{{\prime}} ({\varvec{\Lambda}}^{ - 1} {\varvec{\Gamma \Lambda }}^{ - 1} )\frac{{\partial^{4} \bar{l}^{*} }}{{(\partial {\varvec{\uptheta}}_{0} )^{ \langle 2 \rangle } (\partial {\varvec{\uptheta}}_{0}^{\prime} )^{ \langle 2 \rangle } }}\,{\text{vec}}({\varvec{\Lambda}}^{ - 1} {\varvec{\Gamma \Lambda }}^{ - 1} )\,\, +\, O(n^{ - 3} ). \hfill \\ \end{aligned} $$(134)
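As a numerical aside (an illustration under assumed conditions, not part of the derivation), the sandwich form \( {\varvec{\Lambda}}^{ - 1} {\varvec{\Gamma \Lambda }}^{ - 1} \) in (134) simplifies when the model is correctly specified, since the score covariance \( {\varvec{\Gamma}} \) then equals \( - {\varvec{\Lambda}} \). This can be checked for an exponential model fitted to exponential data:

```python
import numpy as np

# Hypothetical check: for the correctly specified exponential model
# f(x|theta) = theta * exp(-theta * x), the per-observation score at theta0
# is 1/theta0 - x, whose variance (Gamma) equals the negative expected
# Hessian (-Lambda = 1/theta0^2), so the sandwich collapses to -Lambda^{-1}.
rng = np.random.default_rng(1)
theta0 = 2.0
x = rng.exponential(scale=1.0 / theta0, size=500_000)

score = 1.0 / theta0 - x        # d/dtheta log f(x|theta) evaluated at theta0
Gamma = score.var()             # Monte Carlo estimate of the score covariance
neg_Lambda = 1.0 / theta0**2    # -E[d^2/dtheta^2 log f] (exact here)
print(Gamma, neg_Lambda)        # both approximately 0.25
```

Under misspecification (e.g., fitting the exponential model to non-exponential data), the two quantities generally differ and the full sandwich must be retained.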
In 1, 2 and 3, when the model is true, \( {\text{E}}_{\text{T}} ( \cdot ) = {\text{E}}_{{\theta_{0} }} ( \cdot ) \) and \( - {\varvec{\Lambda}} = {\varvec{\Gamma}} = {\mathbf{I}}_{0} \). In particular, the term of order \( O(n^{ - 1} ) \) becomes \( - n^{ - 1} q/2 \). That is, the expectation is asymptotically smaller than \( \bar{l}^{*} ({\varvec{\uptheta}}_{0} ) = \bar{l}_{0}^{*} \) by \( n^{ - 1} q/2 \) up to this order.
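The \( n^{ - 1} q/2 \) deficit can be seen numerically in a minimal sketch (a hypothetical example, not from the paper): for the N(μ, 1) model with known variance, q = 1 and the predictive expected log-likelihood per observation has the closed form used below, so the shortfall of the ML plug-in relative to the true parameter is approximately q/(2n):

```python
import numpy as np

# For X ~ N(mu, 1) with q = 1 unknown parameter, the predictive expected
# log-likelihood per observation is
#   lbar*(theta) = -log(2*pi)/2 - {1 + (theta - mu)^2}/2,
# so E_T{lbar*(theta_hat_ML)} falls short of lbar*(mu) by about q/(2n).
rng = np.random.default_rng(0)
n, q, reps, mu = 50, 1, 100_000, 0.0

def lbar_star(theta):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (1.0 + (theta - mu) ** 2)

samples = rng.normal(mu, 1.0, size=(reps, n))
theta_hat = samples.mean(axis=1)                 # ML estimator of mu
deficit = lbar_star(mu) - lbar_star(theta_hat).mean()
print(deficit, q / (2 * n))                      # deficit approximately 0.01
```

The Monte Carlo deficit matches \( q/(2n) = 0.01 \) closely, consistent with the AIC-type penalty noted above.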
Cite this article
Ogasawara, H. A family of the adjusted estimators maximizing the asymptotic predictive expected log-likelihood. Behaviormetrika 44, 57–95 (2017). https://doi.org/10.1007/s41237-016-0004-6