
Pitman closeness domination in predictive density estimation for two-ordered normal means under \(\alpha\)-divergence loss

  • Yuan-Tsung Chang
  • Nobuo Shinozaki
  • William E. Strawderman
Original Paper

Abstract

We consider Pitman closeness domination in predictive density estimation problems when the underlying loss metric is \(\alpha\)-divergence, \(\{D(\alpha )\}\), a loss introduced by Csiszár (Stud Sci Math Hung 2:299–318, 1967). The underlying distributions considered are normal location-scale models, including the distribution of the observables, the distribution of the variable whose density is to be predicted, and the estimated predictive density, which will be taken to be of the plug-in type. The scales may be known or unknown. Chang and Strawderman (J Multivar Anal 128:1–9, 2014) have derived a general expression for the \(\alpha\)-divergence loss in this setup, and have shown that it is a concave monotone function of quadratic loss and also a function of the variances (predicand and plug-in). We demonstrate \(\{D(\alpha )\}\) Pitman closeness domination of certain plug-in predictive densities over others for the entire class of metrics simultaneously when modified Pitman closeness domination holds in the related problem of estimating the mean. We also establish \(\{D(\alpha )\}\) Pitman closeness results for certain generalized Bayesian (best invariant) predictive density estimators. The examples of \(\{D(\alpha )\}\) Pitman closeness domination presented relate to the problem of estimating the predictive density of the variable with the larger mean. We also consider the case of two-ordered normal means with a known covariance matrix.

Keywords

Predictive density · \(\alpha\)-Divergence · Stochastic dominance · Ordered normal means · Pitman’s closeness criterion

1 Introduction

We consider Pitman closeness domination in predictive density estimation problems when the underlying loss metric is \(\alpha\)-divergence \(\{D(\alpha )\}\), a loss introduced by Csiszár (1967). The underlying distributions considered are normal, including the distribution of the observables, the distribution of the variable whose density is to be predicted, and the estimated predictive density, which will be taken to be of the plug-in type. We demonstrate \(\{D(\alpha )\}\) Pitman closeness domination of certain plug-in predictive densities over others for the entire class of metrics simultaneously when related Pitman closeness domination holds in the problem of estimating the mean. We also consider \(\{D(\alpha )\}\) Pitman domination of certain generalized Bayesian (best invariant) procedures.

Examples of Pitman closeness domination presented relate to the problem of estimating the predictive density of the variable with the larger mean. More precisely, let \(X_1 \sim N( \mu _1, \sigma _1^2)\) and \(X_2 \sim N( \mu _2, \sigma _2^2)\) be two independent normal random variables, where \(\mu _1 \le \mu _2\). Under this restriction, we wish to predict the density of a normal population with mean equal to the larger mean, \(\mu _2\), and variance equal to \(\sigma ^2\), that is, \({\tilde{Y}} \sim N( \mu _2 , \sigma ^2)\). We consider different versions of this problem, depending on whether the \(\sigma _i^2, i=1,2,\) are known or are unknown but satisfy the additional order restriction \(\sigma _1^2 \le \sigma _2^2\). The case of two-ordered normal means with a known covariance matrix is also considered.

Kullback–Leibler (KL) loss, the most studied loss in the predictive density estimation problem, is given by
$$\begin{aligned} D_{{\rm KL}}\{ {\hat{p}}( \tilde{y} | y), p( \tilde{y} | \psi ) \} = \int p( \tilde{y} | \psi ) \log \frac{ p( \tilde{y} | \psi )}{{\hat{p}}( \tilde{y} | y)}{{\rm d}}\tilde{y}, \end{aligned}$$
(1)
where \(p( \tilde{y} | \psi )\) is the true density to be estimated and \({\hat{p}}( \tilde{y} | y)\) is the estimated predictive density based on observing \(Y=y\), where \(Y \sim P(y| \psi )\).
The associated KL risk is defined as
$$\begin{aligned} R_{{\rm KL}} = \int D_{{\rm KL}}\{ {\hat{p}}( \tilde{y} | y), p( \tilde{y} | \psi ) \} p( y| \psi ) {{\rm d}}y, \end{aligned}$$
where \(p( y| \psi )\) is the density of \(Y\).
As pointed out in Maruyama and Strawderman (2012), KL loss is essentially contained in the class of \(\alpha\)-divergence losses \(D_\alpha\), introduced by Csiszár (1967), given by
$$\begin{aligned} D_{\alpha }\{ {\hat{p}}( \tilde{y} | y), p( \tilde{y} | \psi ) \} = \int f_{\alpha } \left( \frac{ {\hat{p}}( \tilde{y} | y)}{p( \tilde{y} | \psi )} \right) p( \tilde{y} | \psi ) {{\rm d}}\tilde{y}, \end{aligned}$$
(2)
where for \(-1 \le \alpha \le 1\)
$$\begin{aligned} f_{\alpha }(z)= \left\{ \begin{array}{ll} \frac{4}{1 - \alpha ^2} ( 1- z^{(1+\alpha )/2}) , &{}\quad | \alpha | < 1 \\ z \log z, &{} \quad \alpha = 1 \\ - \log z, &{} \quad \alpha = -1 . \end{array} \right. \end{aligned}$$
(3)
Here, KL loss corresponds to \(\alpha = -1\). The case \(\alpha = 1\) is sometimes referred to as reverse KL loss.
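To fix ideas, the following short Python sketch (ours, for illustration only; the function names f_alpha and d_alpha_numeric are hypothetical and not from the paper) evaluates the \(\alpha\)-divergence in (2)–(3) between two univariate normal densities by numerical quadrature. The closed form of Lemma 1 below makes such integration unnecessary in the normal case.

```python
# Illustrative sketch (not the authors' code): alpha-divergence D_alpha{phat, p}
# of (2)-(3) between two normal densities, computed by numerical quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f_alpha(z, alpha):
    """The function f_alpha(z) of (3)."""
    if alpha == 1.0:
        return z * np.log(z)
    if alpha == -1.0:
        return -np.log(z)
    return 4.0 / (1.0 - alpha**2) * (1.0 - z ** ((1.0 + alpha) / 2.0))

def d_alpha_numeric(mu_hat, sig2_hat, mu, sig2, alpha):
    """D_alpha{ N(mu_hat, sig2_hat), N(mu, sig2) } as in (2), by quadrature."""
    def integrand(y):
        p_hat = norm.pdf(y, mu_hat, np.sqrt(sig2_hat))
        p_true = norm.pdf(y, mu, np.sqrt(sig2))
        return f_alpha(p_hat / p_true, alpha) * p_true
    val, _ = quad(integrand, mu - 12 * np.sqrt(sig2), mu + 12 * np.sqrt(sig2))
    return val

# Example: KL loss (alpha = -1) of the plug-in N(0.5, 1) for the true N(0, 1)
print(d_alpha_numeric(0.5, 1.0, 0.0, 1.0, alpha=-1.0))   # ~0.125 = 0.5^2 / 2
```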

Chang and Strawderman (2014) have derived the general form of \(D_\alpha\) loss for the case of normal models and have shown that it is a concave monotone function of quadratic loss and is also a function of the variances (observed, predicand, and plug-in). This is reviewed in Sect. 2.

An alternative criterion to evaluate the goodness of estimators was introduced by Pitman (1937) as follows:

Let \(T_1\) and \(T_2\) be two estimators of \(\theta\). Then, \(T_1\) is closer to \(\theta\) than \(T_2\) (or \(T_1\) is preferred to \(T_2\)) if the Pitman nearness (PN) of \(T_1\) relative to \(T_2\) satisfies
$$\begin{aligned} {{\rm PN}}_{\theta }(T_1, T_2) = P\{ | T_1 - \theta | < | T_2 - \theta | \} > 1/2. \end{aligned}$$
For the case when the estimators are equal with positive probability, Nayak (1990) modified Pitman’s criterion as follows:
\(T_1\) is said to be closer to \(\theta\) than \(T_2\) if
$$\begin{aligned} P\{ | T_1 - \theta | < | T_2 - \theta | \} > \frac{1}{2} P\{ T_1 \ne T_2 \}. \end{aligned}$$
Motivated by Nayak (1990), Gupta and Singh (1992) defined the modified Pitman nearness (MPN) of \(T_1\) relative to \(T_2\). Setting
$$\begin{aligned} {{\rm MPN}}_\theta ( T_1, T_2) = P \{ | T_1 - \theta |< | T_2 - \theta | {\vert } T_1 \ne T_2\} = \frac{P \{ | T_1 - \theta | < | T_2 - \theta | ,T_1 \ne T_2 \}}{P\{ T_1 \ne T_2 \}}, \end{aligned}$$
\(T_1\) is closer to \(\theta\) than \(T_2\) if \({{\rm MPN}}_\theta ( T_1, T_2)> 1/2\). Many works related to Pitman’s criterion were published in the special issue of Communications in Statistics—Theory and Methods A20 (11) in 1992 and were unified in the monograph by Keating et al. (1993).
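The following Monte Carlo sketch (ours, for illustration only; the truncated-mean example and all function names are hypothetical and not the estimators studied below) shows how PN and MPN can be approximated by simulation.

```python
# Illustrative sketch (not the authors' code): Monte Carlo estimate of the modified
# Pitman nearness MPN_theta(T1, T2) = P(|T1 - theta| < |T2 - theta| | T1 != T2).
import numpy as np

rng = np.random.default_rng(0)

def mpn(t1_fn, t2_fn, theta, sampler, n_rep=100_000):
    stat = sampler(n_rep)                   # one sufficient statistic per replication
    t1, t2 = t1_fn(stat), t2_fn(stat)
    differ = t1 != t2
    closer = np.abs(t1 - theta) < np.abs(t2 - theta)
    return (closer & differ).sum() / max(differ.sum(), 1)

# Hypothetical example: theta >= 0 known; T1 truncates the sample mean at 0, T2 is the mean.
theta, n = 0.2, 10
sampler = lambda reps: rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
print(mpn(lambda xb: np.maximum(xb, 0.0), lambda xb: xb, theta, sampler))
# equals 1 here: whenever the two estimators differ, the truncated one is strictly closer
```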

In Sect. 3, we review modified Pitman closeness domination results for the estimation of ordered means with ordered variances, previously established by Chang and Shinozaki (2015) for a broader class of estimators. As noted above, in Sect. 4 we apply the form of the \(D_\alpha\) loss metric for plug-in predictive density estimates in normal models to obtain Pitman closeness domination for plug-in predictive density estimates. Section 5 considers \(\{D(\alpha )\}\) Pitman closeness domination of the best invariant (generalized Bayes) predictive density estimator. Section 6 contains some concluding remarks. An Appendix contains some needed technical results.

Here is a brief review of some of the relevant literature for the problem of estimating the mean.

Let
$$\begin{aligned} {\bar{X}}_i = \sum _{j=1}^{n_i} X_{ij} / n_i, \quad s_i^2 = \sum _{j=1}^{n_i} ( X_{ij}- {\bar{X}}_i)^2 / ( n_i -1) \end{aligned}$$
be the unbiased estimators of \(\mu _i\) and \(\sigma _i^2\), respectively, based on samples of size \(n_i\) from the two normal populations \(N( \mu _1, \sigma _1^2)\) and \(N( \mu _2, \sigma _2^2)\). When the means are equal (the common mean problem), \(\mu _1 = \mu _2 = \mu\), and the variances are known, the UMVUE of \(\mu\) is
$$\begin{aligned} \hat{\mu } = \frac{n_1\sigma _2^2}{n_1\sigma _2^2+ n_2\sigma _1^2}{\bar{X}}_1 + \frac{n_2\sigma _1^2}{n_1\sigma _2^2+ n_2\sigma _1^2}{\bar{X}}_2. \end{aligned}$$
When the variances are unknown, the unbiased estimator
$$\begin{aligned} \hat{\mu }^{{\rm GD}} = \frac{n_1s_2^2}{n_1s_2^2+ n_2s_1^2}{\bar{X}}_1 + \frac{n_2s_1^2}{n_1s_2^2+ n_2s_1^2}{\bar{X}}_2 \end{aligned}$$
was proposed by Graybill and Deal (1959), who gave a necessary and sufficient condition on \(n_1\) and \(n_2\) for \(\hat{\mu }^{{\rm GD}}\) to have a smaller variance than both \({\bar{X}}_1\) and \({\bar{X}}_2\).
When estimating the ordered means \(\mu _1 \le \mu _2\), Oono and Shinozaki (2005) proposed truncated estimators of \(\mu _i, i=1,2\),
$$\begin{aligned} \hat{\mu }_1^{{\rm OS}} = \min \{ {\bar{X}}_1, \hat{\mu }^{{\rm GD}} \}, \quad \hat{\mu }_2^{{\rm OS}} = \max \{ {\bar{X}}_2, \hat{\mu }^{{\rm GD}} \}, \end{aligned}$$
(4)
and showed that \(\hat{\mu }_i^{{\rm OS}}\) dominates \({\bar{X}}_i\) in terms of MSE if and only if the MSE of \(\hat{\mu }^{{\rm GD}}\) is not larger than that of \({\bar{X}}_i\) for estimating \(\mu _i\) when \(\mu _1= \mu _2\).
When there are order restrictions given on both means and variances, \(\mu _1 \le \mu _2, \sigma _1^2 \le \sigma _2^2\), Chang et al. (2012) have proposed
$$\begin{aligned}&\hat{\mu }_1^{{\rm CS}}= \left\{ \begin{array}{ll} \hat{\mu }_1^{{\rm OS}} , &{} {{\rm if }} \, s_1^2 \le s_2^2 \\ \min \left\{ {\bar{X}}_1,\frac{n_1}{n_1+n_2}{\bar{X}}_1 + \frac{n_2}{n_1+n_2}{\bar{X}}_2 \right\} , &{} {{\rm if }} \, s_1^2 > s_2^2 \end{array} \right. \end{aligned}$$
(5)
and
$$\begin{aligned}&\hat{\mu }_2^{{\rm CS}}= \left\{ \begin{array}{ll} \hat{\mu }_2^{{\rm OS}} , &{} \, {{\rm if }} \, s_1^2 \le s_2^2 \\ \max \left\{ {\bar{X}}_2,\frac{n_1}{n_1+n_2}{\bar{X}}_1 + \frac{n_2}{n_1+n_2}{\bar{X}}_2 \right\} , &{}\, {{\rm if }} \, s_1^2 > s_2^2. \end{array} \right. \end{aligned}$$
(6)
They show that \(\hat{\mu }_2^{{\rm CS}}\) stochastically dominates \(\hat{\mu }_2^{{\rm OS}}\), but that \(\hat{\mu }_1^{{\rm CS}}\) does not dominate \(\hat{\mu }_1^{{\rm OS}}\), even in terms of MSE, when \(\mu _2 - \mu _1\) is sufficiently large. We will show in Sect. 3 that \(\hat{\mu }_2^{{\rm CS}}\) is Pitman closer to \(\mu _2\) than \(\hat{\mu }_2^{{\rm OS}}\).
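For concreteness, the following Python sketch (ours, for illustration only; the function name `estimators` and the simulated samples are hypothetical) computes \(\hat{\mu }^{{\rm GD}}\) and the truncated estimators (4)–(6) from two samples.

```python
# Illustrative sketch (not the authors' code) of the Graybill-Deal estimator and
# the truncated estimators (4)-(6) for two ordered normal means.
import numpy as np

def estimators(x1, x2):
    n1, n2 = len(x1), len(x2)
    xb1, xb2 = x1.mean(), x2.mean()
    s1, s2 = x1.var(ddof=1), x2.var(ddof=1)        # unbiased variance estimates s_i^2
    gd = (n1 * s2 * xb1 + n2 * s1 * xb2) / (n1 * s2 + n2 * s1)   # Graybill-Deal
    os1, os2 = min(xb1, gd), max(xb2, gd)          # Oono-Shinozaki, (4)
    pooled = (n1 * xb1 + n2 * xb2) / (n1 + n2)
    if s1 <= s2:                                   # (5)-(6): use the variance ordering
        cs1, cs2 = os1, os2
    else:
        cs1, cs2 = min(xb1, pooled), max(xb2, pooled)
    return dict(GD=gd, OS=(os1, os2), CS=(cs1, cs2))

rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=8)     # mu1 <= mu2 and sigma1^2 <= sigma2^2
x2 = rng.normal(0.5, 1.5, size=12)
print(estimators(x1, x2))
```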
When considering the estimation of ordered means of a normal distribution with a known covariance matrix, it has been recognized that the restricted MLEs do not always behave properly for general order restrictions and covariance matrices; see, for example, Lee (1981), Shinozaki and Chang (1999), Fernández et al. (2000), and Cohen and Sackrowitz (2004). Let \({\varvec{X}}_i = ( X_{1i}, X_{2i})^\prime , i=1, \dots , n,\) be independent observations from \(N({\varvec{\mu }}, \varSigma )\), where \({\varvec{\mu }} =(\mu _1, \mu _2)^\prime ,\) and
$$\begin{aligned} \varSigma =\left( \begin{array}{cc} \sigma _1^2 &{} \rho \sigma _1 \sigma _2 \\ \rho \sigma _1 \sigma _2 &{} \sigma _2^2 \end{array} \right) , \end{aligned}$$
(7)
is a known covariance matrix. We assume that \(|\rho |\ne 1\) and consider the estimation of \(\mu _i,i=1,2,\) under the order restriction \(\mu _1 \le \mu _2\). Using \({\bar{X}}_1=\sum _{i=1}^n X_{1i}/n\) and \({\bar{X}}_2=\sum _{i=1}^n X_{2i}/n\), the restricted maximum likelihood estimators (MLEs) of \(\mu _1\) and \(\mu _2\) are given by
$$\begin{aligned} \hat{\mu }_1^{{\rm MLE}}= {\bar{X}}_1 - \beta ( {\bar{X}}_1- {\bar{X}}_2)^+\ \, \text{ and }\ \ \hat{\mu }_2^{{\rm MLE}}= {\bar{X}}_2 + \alpha ( {\bar{X}}_1- {\bar{X}}_2)^+, \end{aligned}$$
(8)
where \(\alpha = \omega _1/(\omega _1+ \omega _2)\) and \(\beta =\omega _2/(\omega _1+ \omega _2)\) with \(\omega _1= \sigma _2^2- \rho \sigma _1\sigma _2\), \(\omega _2= \sigma _1^2- \rho \sigma _1\sigma _2\). We note that \(\omega _1+\omega _2= \sigma _1^2-2\rho \sigma _1 \sigma _2+\sigma _2^2>0\), although \(\omega _1\) or \(\omega _2\) may be negative. Hwang and Peddada (1994) have proposed alternative estimators which are motivated by the case when a covariance matrix is diagonal. In our two-dimensional case, the proposed estimators of \(\mu _1\) and \(\mu _2\) are given as
$$\begin{aligned} \hat{\mu }_1^{{\rm HP}}= \min ( {\bar{X}}_1, \alpha {\bar{X}}_1 + \beta {\bar{X}}_2 )\quad {{\rm and}} \quad \hat{\mu }_2^{{\rm HP}}= \max ( {\bar{X}}_2, \alpha {\bar{X}}_1 + \beta {\bar{X}}_2 ). \end{aligned}$$
(9)
Clearly, \(\hat{\mu }_2^{{\rm HP}} \ge \hat{\mu }_1^{{\rm HP}}\), and Hwang and Peddada (1994) have shown that \(\hat{\mu }_i^{{\rm HP}}\) stochastically dominates the unrestricted MLE \({\bar{X}}_i, i=1,2\). Chang et al. (2017) have shown that \(\hat{\mu }_i^{{\rm MLE}}\) dominates \(\hat{\mu }_i^{{\rm HP}}, i=1,2,\) not only stochastically but also in the sense of Pitman closeness.
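The following sketch (ours, for illustration only; the function name `mle_hp` and the numerical inputs are hypothetical) computes the restricted MLE (8) and the Hwang–Peddada estimators (9) for a given known covariance matrix (7).

```python
# Illustrative sketch (not the authors' code) of the restricted MLE (8) and the
# Hwang-Peddada estimators (9) under the known covariance matrix (7).
import numpy as np

def mle_hp(xbar1, xbar2, sigma1, sigma2, rho):
    w1 = sigma2**2 - rho * sigma1 * sigma2          # omega_1
    w2 = sigma1**2 - rho * sigma1 * sigma2          # omega_2
    alpha, beta = w1 / (w1 + w2), w2 / (w1 + w2)
    d_plus = max(xbar1 - xbar2, 0.0)                # (Xbar_1 - Xbar_2)^+
    mle = (xbar1 - beta * d_plus, xbar2 + alpha * d_plus)    # (8)
    mix = alpha * xbar1 + beta * xbar2
    hp = (min(xbar1, mix), max(xbar2, mix))                  # (9)
    return mle, hp

# When xbar1 > xbar2, both restricted MLEs collapse to the weighted mean alpha*xbar1 + beta*xbar2.
print(mle_hp(1.2, 1.0, sigma1=1.0, sigma2=1.5, rho=0.3))
```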

Broader reviews of statistical inference under order restrictions are given in Barlow et al. (1972), Robertson et al. (1988), and the two monographs Silvapulle and Sen (2004) and van Eeden (2006).

2 The form of \(D(\alpha )\) loss for normal distributions

In this section, we review the form of the \(\{D(\alpha )\}\) loss when the density to be predicted and the predictive density estimate are both normal.

The aim here is to show that when \({\hat{p}}( \tilde{y} | y)\) and \(p( \tilde{y} | \psi )\) are both normal densities, the \(D(\alpha )\) loss can be expressed as a concave monotone function of squared error loss.

Lemma 1

(Chang and Strawderman (2014), Theorem 2.1) If the true density function of \(Y\) is \(N(\mu , \sigma ^2)\) and the estimated predictive density of \(Y\) is \(N( \hat{\mu } , \hat{\sigma }^2)\), then

(a) for \(-1< \alpha <1\),
$$\begin{aligned} D_{\alpha }( N( \tilde{y} | \hat{\mu } , \hat{\sigma }^2), N( \tilde{y} | \mu , \sigma ^2) ) = \frac{4}{1-\alpha ^2} \left( 1-d( \sigma ^ 2, \hat{\sigma }^2 ) \mathrm{e}^{-A( \sigma ^2, \hat{\sigma } ^2 ) \frac{( \hat{\mu } - \mu ) ^2}{2}} \right) , \end{aligned}$$
(10)
where
$$\begin{aligned}&d(\sigma ^2, \hat{\sigma }^2)= \frac{ \sigma ^{(\alpha -1)/2}\tau }{ \hat{\sigma }^{ (\alpha +1)/2} } , \quad A(\sigma ^2, \hat{\sigma }^2) = \left( \frac{1-\alpha }{2\sigma ^2} \right) \left( 1-\frac{(1-\alpha )\tau ^2}{2\sigma ^2} \right) >0\\ \text{ and }&\quad \frac{1}{\tau ^2} = \left( \frac{1+\alpha }{2\hat{\sigma }^2}+ \frac{1-\alpha }{2 \sigma ^2} \right) . \end{aligned}$$
Furthermore, \(d(\sigma ^2, \hat{\sigma }^2)<1\) and \(A(\sigma ^2, \hat{\sigma }^2)>0\).
(b) (Reverse KL)
$$\begin{aligned} D_{+1} ( N( \tilde{y} | \hat{\mu } , \hat{\sigma }^2), N( \tilde{y} | \mu , \sigma ^2) ) = \frac{1}{2} \left[ \left( \frac{\hat{\sigma }^2}{\sigma ^2} - \log \frac{\hat{\sigma }^2}{\sigma ^2} -1 \right) + \frac{(\hat{\mu } - \mu )^2}{\sigma ^2} \right] . \end{aligned}$$
(11)
(c) (KL)
$$\begin{aligned} D_{-1} ( N( \tilde{y} | \hat{\mu } , \hat{\sigma }^2), N( \tilde{y} | \mu , \sigma ^2) ) = \frac{1}{2} \left[ \left( \frac{ \sigma ^2}{\hat{\sigma } ^2} - \log \frac{\sigma ^2}{\hat{\sigma }^2} -1 \right) + \frac{(\hat{\mu } - \mu )^2}{\hat{\sigma }^2} \right] . \end{aligned}$$
(12)

Note: The first parts of the RHS of (11) and (12) are a form of Stein’s loss for estimating variances, and the second parts are the squared error loss \(( \hat{\mu } - \mu ) ^2\) divided by either the true or the estimated variance. In addition, note that in each case the \(\{D(\alpha )\}\) loss is a concave monotone function of the squared error loss \(( \hat{\mu } - \mu )^2\) and is also a function of the variances.
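The closed forms (10)–(12) are easy to code; the following sketch (ours, for illustration only; the function name is hypothetical) implements them and can be cross-checked against the quadrature sketch given in Sect. 1.

```python
# Illustrative sketch (not the authors' code) of the closed-form D_alpha of Lemma 1.
import numpy as np

def d_alpha_closed(mu_hat, sig2_hat, mu, sig2, alpha):
    if alpha == 1.0:                                   # reverse KL, (11)
        r = sig2_hat / sig2
        return 0.5 * ((r - np.log(r) - 1.0) + (mu_hat - mu) ** 2 / sig2)
    if alpha == -1.0:                                  # KL, (12)
        r = sig2 / sig2_hat
        return 0.5 * ((r - np.log(r) - 1.0) + (mu_hat - mu) ** 2 / sig2_hat)
    tau2 = 1.0 / ((1.0 + alpha) / (2.0 * sig2_hat) + (1.0 - alpha) / (2.0 * sig2))
    d = sig2 ** ((alpha - 1.0) / 4.0) * np.sqrt(tau2) / sig2_hat ** ((alpha + 1.0) / 4.0)
    A = (1.0 - alpha) / (2.0 * sig2) * (1.0 - (1.0 - alpha) * tau2 / (2.0 * sig2))
    return 4.0 / (1.0 - alpha**2) * (1.0 - d * np.exp(-A * (mu_hat - mu) ** 2 / 2.0))  # (10)

print(d_alpha_closed(0.5, 1.0, 0.0, 1.0, alpha=-1.0))   # 0.125, agreeing with the quadrature
```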

3 Results for estimation of two-ordered normal means with unknown, but ordered variances under modified Pitman closeness criterion

In this section, we consider the problem of estimating the ordered means of two normal distributions with unknown but ordered variances under the modified Pitman closeness criterion. We show that, in estimating the mean with the larger variance, the proposed estimator \(\hat{\mu }_2^{{\rm CS}}\), given in (6), is closer to the true mean than the usual one, \(\hat{\mu }_2^{{\rm OS}}\), given in (4), which ignores the order restriction on the variances. However, in estimating the mean with the smaller variance, the usual estimator \(\hat{\mu }_1^{{\rm OS}}\) is not improved upon by \(\hat{\mu }_1^{{\rm CS}}\). We also discuss simultaneous estimation of the two-ordered means when the unknown variances are ordered.

First, we show that \(\hat{\mu }_2^{{\rm CS}}\) is closer to \(\mu _2\) than \(\hat{\mu }_2^{{\rm OS}}\) under the modified Pitman closeness criterion. Indeed, if we set
$$\begin{aligned} \gamma = \frac{n_1s_2^2}{n_1s_2^2+ n_2s_1^2} \end{aligned}$$
(13)
in Theorem 2 of Chang and Shinozaki (2015), then we have the following.

Theorem 1

The estimator \(\hat{\mu }_2^{{\rm CS}}\) is closer to \(\mu _2\) than \(\hat{\mu }_2^{{\rm OS}}\), i.e., for all \(\mu _1 \le \mu _2\) and \(\sigma _1^2 \le \sigma _2^2\),
$$\begin{aligned} {{\rm MPN}}_{\mu _2}( \hat{\mu }_2^{{\rm CS}}, \hat{\mu }_2^{{\rm OS}}) \ge 1/2, \end{aligned}$$
with strict inequality for some \(\mu _1 \le \mu _2\) and \(\sigma _1^2 \le \sigma _2^2\).

Next, we consider estimating \(\mu _1\), the mean with the smaller variance; we show that \(\hat{\mu }_1^{{\rm CS}}\) cannot be closer to \(\mu _1\) than \(\hat{\mu }_1^{{\rm OS}}\) when \(\mu _2-\mu _1\) is sufficiently large. Similarly, if we set \(\gamma\) as in (13) in Theorem 3 of Chang and Shinozaki (2015), then we have the following.

Theorem 2

When \(\mu _2-\mu _1\) is sufficiently large, the estimator \(\hat{\mu }_1^{{\rm CS}}\) cannot be closer to \(\mu _1\) than \(\hat{\mu }_1^{{\rm OS}}\), that is,
$$\begin{aligned} {{\rm MPN}}_{\mu _1}(\hat{\mu }_1^{{\rm CS}}, \hat{\mu }_1^{{\rm OS}}) < 1/2. \end{aligned}$$

Chang and Shinozaki (2015) have obtained a broader class of results including the above.

Although the estimator \(\hat{\mu }_1^{{\rm CS}}\) is not closer to \(\mu _1\) than \(\hat{\mu }_1^{{\rm OS}}\) when \(\mu _2-\mu _1\) is sufficiently large, in the simultaneous estimation problem with \(\mu _1 \le \mu _2\) when the variances are also ordered, the next theorem shows that if \(n_1 \ge n_2\), then \(\hat{\varvec{\mu }}^{{\rm CS}} =(\hat{\mu }_1^{{\rm CS}}, \hat{\mu }_2^{{\rm CS}})^\prime\) improves upon \(\hat{\varvec{\mu }}^{{\rm OS}} =(\hat{\mu }_1^{{\rm OS}}, \hat{\mu }_2^{{\rm OS}})^\prime\) under Pitman closeness based on the sum of normalized squared errors:
$$\begin{aligned} \sum _{i=1}^2 (\hat{\mu }_i- \mu _i)^2/\sigma _i^2. \end{aligned}$$

Theorem 3

If \(n_1 \ge n_2\), then \(\hat{\varvec{\mu }}^{{\rm CS}} =(\hat{\mu }_1^{{\rm CS}}, \hat{\mu }_2^{{\rm CS}})^\prime\) is closer to \((\mu _1, \mu _2)^\prime\) than \(\hat{\varvec{\mu }}^{{\rm OS}} =(\hat{\mu }_1^{{\rm OS}},\hat{\mu }_2^{{\rm OS}})^\prime\), in the sense that
$$\begin{aligned}&{{\rm MPN}}_ {\varvec{\mu }}( \hat{\varvec{\mu }}^{{\rm CS}}, \hat{\varvec{\mu }}^{{\rm OS}}) \nonumber \\&\quad = \frac{P \{ \sum _{i=1}^2 (\hat{\mu }_i^{{\rm CS}}- \mu _i)^2/\sigma _i^2 \le \sum _{i=1}^2 (\hat{\mu }_i^{{\rm OS}}- \mu _i)^2/\sigma _i^2, \hat{\varvec{\mu }}^{{\rm CS}} \ne \hat{\varvec{\mu }}^{{\rm OS}} \}}{P\{ \hat{\varvec{\mu }}^{{\rm CS}} \ne \hat{\varvec{\mu }}^{{\rm OS}} \}} \nonumber \\&\quad > 1/2. \end{aligned}$$
(14)

For completeness, the proof is given in the Appendix.
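A simple Monte Carlo check of (14) can be carried out as in the following sketch (ours, for illustration only; the sample sizes, means, and variances are hypothetical but satisfy \(n_1 \ge n_2\), \(\mu _1 \le \mu _2\), and \(\sigma _1^2 \le \sigma _2^2\)).

```python
# Illustrative sketch (not the authors' code): Monte Carlo estimate of the
# simultaneous modified Pitman nearness (14) for the CS and OS estimator pairs.
import numpy as np

rng = np.random.default_rng(2)

def os_cs(x1, x2):
    n1, n2 = len(x1), len(x2)
    xb1, xb2 = x1.mean(), x2.mean()
    s1, s2 = x1.var(ddof=1), x2.var(ddof=1)
    gd = (n1 * s2 * xb1 + n2 * s1 * xb2) / (n1 * s2 + n2 * s1)
    os_ = (min(xb1, gd), max(xb2, gd))                               # (4)
    pooled = (n1 * xb1 + n2 * xb2) / (n1 + n2)
    cs = os_ if s1 <= s2 else (min(xb1, pooled), max(xb2, pooled))   # (5)-(6)
    return np.array(os_), np.array(cs)

mu, sig2, n = np.array([0.0, 0.3]), np.array([1.0, 2.0]), (12, 8)    # n1 >= n2
num = den = 0
for _ in range(20_000):
    x1 = rng.normal(mu[0], np.sqrt(sig2[0]), n[0])
    x2 = rng.normal(mu[1], np.sqrt(sig2[1]), n[1])
    os_, cs = os_cs(x1, x2)
    if not np.allclose(os_, cs):
        den += 1
        num += np.sum((cs - mu) ** 2 / sig2) <= np.sum((os_ - mu) ** 2 / sig2)
print(num / max(den, 1))            # Monte Carlo estimate of the MPN in (14)
```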

4 Pitman closeness in predicting the density function under the \(D(\alpha )\) loss metric

In this section, we will establish Pitman closeness results under the \(\{D(\alpha )\}\) loss metric for certain predictive density estimation problems involving two normal populations when the means are ordered. We handle the known and unknown variance cases in separate subsections.

First, we give a formal definition.

Definition 1

Given two predictive density estimates \({\hat{f}}_1( \tilde{y} | x)\) and \({\hat{f}}_2( \tilde{y} | x )\) of a density \(f( \tilde{y} | \psi )\) based on data x from a distribution \(X \sim g(x | \psi )\), \(\psi \in \varOmega\), we say that \({\hat{f}}_2( \tilde{y} | x)\) is closer to \(f( \tilde{y} | \psi )\) than \({\hat{f}}_1( \tilde{y} | x)\) with respect to the \(D(\alpha )\) metric under the modified Pitman closeness criterion if, \(\forall \psi \in \varOmega\),
$$\begin{aligned} P_\psi \{ D_\alpha ( {\hat{f}}_2( \tilde{y} | x), f(\tilde{y} | \psi ) )< D_\alpha ( {\hat{f}}_1( \tilde{y} | x), f(\tilde{y} | \psi ) ) \vert {\hat{f}}_2( \tilde{y} | x) \ne {\hat{f}}_1( \tilde{y} | x) \} \ge 1/2, \end{aligned}$$
with strict inequality for some \(\psi \in \varOmega\).

We first consider the case when variances are known.

4.1 Case when variances are known

The data are
$$\begin{aligned} X_{ij} \sim N( \mu _i, \sigma _i^2),\quad i=1,2 , \, j = 1, \dots ,n_i, \end{aligned}$$
(15)
with independent sufficient statistics
$$\begin{aligned} {\bar{X}}_{i} \sim N( \mu _i, \sigma _i^2/ n_i), \quad i=1,2, \end{aligned}$$
(16)
where \(\mu _1 \le \mu _2\).

First, we show that the MLEs are closer to \(\mu _i\) than the sample means under the modified Pitman closeness criterion.

Theorem 4

The MLE of \(\mu _i\) is
$$\begin{aligned} \hat{\mu }_i^{{\rm MLE}}= \left\{ \begin{array}{ll} {\bar{X}}_i , &{}\quad {{\rm if }} \, {\bar{X}}_1 \le {\bar{X}}_2 \\ \frac{n_1\sigma _2^2}{n_1\sigma _2^2+ n_2\sigma _1^2}{\bar{X}}_1 + \frac{n_2\sigma _1^2}{n_1\sigma _2^2+ n_2\sigma _1^2}{\bar{X}}_2 , &{}\quad {{\rm if }} \, {\bar{X}}_1 > {\bar{X}}_2. \end{array} \right. \end{aligned}$$
Then
\(\hat{\mu }_i^{{\rm MLE}}\) is Pitman closer to \(\mu _i\) than \({\bar{X}}_i\), that is,
$$\begin{aligned} P_{\mu _1, \mu _2} \{ | \hat{\mu }_i^{{\rm MLE}} - \mu _i | < | \bar{X}_i - \mu _i |\, | \hat{\mu }_i^{{\rm MLE}} \ne {\bar{X}}_i\} \ge 1/2 \end{aligned}$$
for all \(\mu _1 \le \mu _2\), with strict inequality for some \(\mu _1 \le \mu _2\).

For completeness, the proof is given in the Appendix.

We wish to predict the density of a future observation from some population with mean equal to the larger mean \(\mu _2\), that is
$$\begin{aligned} {\tilde{Y}} \sim N(\mu _2, \sigma ^2), \end{aligned}$$
where \(\sigma ^2\) is known. We now consider comparison of plug-in estimators of the density of \({\tilde{Y}} \sim N( \mu _2, \sigma ^2)\) of the form:
$$\begin{aligned} {\hat{f}}_1( \tilde{y} | X_1, X_2) \sim N( {\bar{X}}_2, \nu ^2 ) \end{aligned}$$
and
$$\begin{aligned} {\hat{f}}_2( \tilde{y} | X_1, X_2) \sim N( \hat{\mu }_2^{{\rm MLE}}, \nu ^2 ), \end{aligned}$$
where \(\nu ^2\) is fixed (and not necessarily equal to \(\sigma ^2\)). See Fourdrinier et al. (2011) for a discussion of why a choice of \(\nu ^2\) different from (typically larger than) \(\sigma ^2\) is reasonable. Our main result in this subsection is the following.

Theorem 5

In estimating the predictive density \(f( \tilde{y} | \mu _i) \sim N( \mu _i, \sigma ^2)\), the density \({\hat{f}}_2 ( \tilde{y} | X_1,X_2) \sim N( \hat{\mu }_i^{{\rm MLE}}, \nu ^2)\) is Pitman closer to \(f( \tilde{y} | \mu _i)\) than \({\hat{f}}_1( \tilde{y} | X_1,X_2) \sim N( {\bar{X}}_i, \nu ^2)\) with respect to the \(\{D(\alpha )\}\) metric for every \(\alpha\) \(( -1 \le \alpha \le 1)\), \(\sigma ^2\), and \(\nu ^2\).

The analogous result for estimating a predictive density for a population with mean \(\mu _1\) follows immediately. Hence, in the case of known variances, Pitman closeness domination holds both for mean estimation and prediction under \(D_{\alpha }\) metrics for populations corresponding to either the larger or smaller mean.

Proof

The proof follows immediately from Lemma 1, since
$$\begin{aligned}&D_\alpha ( N(\hat{\mu }_i^{{\rm MLE}}, \nu ^2), N( \mu _i, \sigma ^2))< D_\alpha ( N({\bar{X}}_i, \nu ^2), N( \mu _i, \sigma ^2)) \\&\quad \Leftrightarrow (\hat{\mu }_i^{{\rm MLE}} -\mu _i)^2< ({\bar{X}}_i -\mu _i)^2\\&\quad \Leftrightarrow |\hat{\mu }_i^{{\rm MLE}} -\mu _i| < |{\bar{X}}_i-\mu _i|. \end{aligned}$$
From Theorem 4, we have
$$\begin{aligned} P\{ | \hat{\mu }_i^{{\rm MLE}} - \mu _i | < | {\bar{X}}_i - \mu _i | \, \vert \hat{\mu }_i^{{\rm MLE}} \ne {\bar{X}}_i \} \ge 1/2. \end{aligned}$$
This completes the proof.

In the next subsection, we extend the above results to two-ordered means when the covariance matrix is known.

4.2 Case when the covariance matrix is known

In this subsection, we consider the case when the two normal means are ordered and the covariance matrix defined in (7) is known. We show that the plug-in predictive density with \(\hat{\mu }_i^{{\rm MLE}}\), given in (8), is Pitman closer to the true predictive density than the plug-in predictive density with \(\hat{\mu }_i^{{\rm HP}}\), given in (9), under \(D_\alpha\) loss. The following result from Chang et al. (2017) is the basis for our study.

Theorem 6

(Chang et al. (2017), Theorem 3.1) \(\hat{\mu }_i^{{\rm MLE}}\) is Pitman closer to \(\mu _i\) than \(\hat{\mu }_i^{{\rm HP}}, i = 1,2.\)

Based on the above theorem, we have the following main result.

Theorem 7

In estimating the predictive density \(f_i( \tilde{y} | \mu _i) \sim N( \mu _i, \sigma ^2)\), the density \({\hat{f}}_i^{{\rm MLE}} ( \tilde{y} | X_1,X_2) \sim N( \hat{\mu }_i^{{\rm MLE}}, \nu ^2)\) is Pitman closer to the true predictive density than \({\hat{f}}_i^{{\rm HP}}( \tilde{y} | X_1,X_2) \sim N( \hat{\mu }_i^{{\rm HP}}, \nu ^2)\) with respect to the \(\{D(\alpha )\}\) metric for every \(\alpha\) \(( -1 \le \alpha \le 1)\), \(\sigma ^2\), and \(\nu ^2\).

Proof

This follows directly from Theorem 6, since, from Lemma 1, \(D_{\alpha }( N( \tilde{y} | \hat{\mu }, \hat{\sigma }^2),\) \(N( \tilde{y} | \mu , \sigma ^2) )\) is a monotone function of \(|\hat{\mu } - \mu |\), where \(\hat{\mu }\) is an estimator of \(\mu\).

Next, we consider the cases of unknown variances.

4.3 Case when variances are unknown and unrestricted

In this subsection, we consider the same setup as in (15) and (16). It is assumed that \(\mu _1 \le \mu _2\) and that no restriction is placed on the unknown \(\sigma _i^2\). It is shown that the plug-in predictive density with \(\hat{\mu }_i^{{\rm OS}}\), given in (4), is Pitman closer to the true predictive density than the plug-in predictive density with \({\bar{X}}_i\) under \(D_\alpha\) loss.

From the results of Chang and Shinozaki (2015), we note that, with respect to the modified Pitman criterion, the most critical case for \(\hat{\mu }_i^{{\rm OS}}\) to be closer to \(\mu _i\) than \({\bar{X}}_i\) is the case when \(\mu _1=\mu _2=\mu\). Surprisingly, this result reduces the dominance problem in estimating two-ordered means to that of estimating the common mean; that is, \(\hat{\mu }_i^{{\rm OS}}\) improves upon \({\bar{X}}_i\) under the modified Pitman closeness criterion if and only if \(\hat{\mu }^{{\rm GD}}\) is closer to \(\mu\) than \({\bar{X}}_i\) under the Pitman closeness criterion. Kubokawa (1989) has given a sufficient condition on \(n_i\) \((n_i \ge 5)\) so that \(\hat{\mu }^{{\rm GD}}\) is closer to \(\mu\) than both \({\bar{X}}_1\) and \({\bar{X}}_2\).
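The common-mean case to which the problem reduces is easy to examine numerically, as in the following sketch (ours, for illustration only; the chosen sample sizes and variances are hypothetical).

```python
# Illustrative sketch (not the authors' code): Monte Carlo estimate of
# PN_mu(GD, Xbar_i) at the common mean mu1 = mu2 = mu, the critical case of Theorem 8.
import numpy as np

rng = np.random.default_rng(3)
mu, sig = 0.0, (1.0, 3.0)                 # common mean, unequal standard deviations
n1, n2, reps = 10, 10, 100_000            # n_i >= 5, cf. Kubokawa (1989)
x1 = rng.normal(mu, sig[0], (reps, n1))
x2 = rng.normal(mu, sig[1], (reps, n2))
xb1, xb2 = x1.mean(1), x2.mean(1)
s1, s2 = x1.var(1, ddof=1), x2.var(1, ddof=1)
gd = (n1 * s2 * xb1 + n2 * s1 * xb2) / (n1 * s2 + n2 * s1)     # Graybill-Deal
for i, xb in enumerate((xb1, xb2), start=1):
    pn = np.mean(np.abs(gd - mu) < np.abs(xb - mu))
    print(f"PN_mu(GD, Xbar_{i}) ~= {pn:.3f}")   # compare with 1/2; cf. Kubokawa's condition
```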

As a matter of fact, if we set \(\gamma\) as in (13) in Theorem 1 of Chang and Shinozaki (2015), then we have the following.

Theorem 8

\({{\rm MPN}}_{\mu _i}(\hat{\mu }_i^{{\rm OS}}, {\bar{X}}_i) \ge 1/2\) for all \(\mu _1 \le \mu _2\) and for all \(\sigma _1^2\) and \(\sigma _2^2\) if and only if, for all \(\sigma _1 ^2\) and \(\sigma _2^2\), \({{\rm PN}}_{\mu }(\hat{\mu }^{{\rm GD}}, {\bar{X}}_i) \ge 1/2\) when \(\mu _1= \mu _2= \mu\).

We wish to predict the density of a future observation from a normal population with mean \(\mu _i\) and unknown variance \(\sigma ^2\), that is, we wish to predict the density:
$$\begin{aligned} f(\tilde{y}) \sim N( \mu _i, \sigma ^2). \end{aligned}$$
Let
$$\begin{aligned} {\bar{X}}_{i} \sim N( \mu _i, \sigma _i^2/ n_i), \quad S_i^2 \sim \sigma ^2_i \chi _{n_i-1} ^2,\,i=1,2 \end{aligned}$$
(17)
be independent, where \(\mu _1 \le \mu _2\). It is desired to predict the density of a future independent variable \(Y_i \sim N( \mu _i, a \sigma ^2), i=1,2,\) where \(a >0\) is known.

Based on Theorem 8, we have the following main result.

Theorem 9

The plug-in predictive density estimate \({\hat{f}}_i^{{\rm OS}}(\tilde{y}) \sim N( \hat{\mu }_i^{{\rm OS}}, a\widehat{\sigma ^2})\) is Pitman closer to \(f(\tilde{y} | \mu _i, a \sigma ^2)\) than \({\hat{f}}_i^{{\bar{X}}_i} (\tilde{y}) \sim N({\bar{X}}_i, a\widehat{\sigma ^2})\) for all \(\mu _1 \le \mu _2\) and \(\sigma _i^2, i=1,2,\) under the \(D(\alpha )\) metric for all \(-1 \le \alpha \le 1\) and every estimator \(\widehat{\sigma ^2}\) if and only if \(\hat{\mu }^{{\rm GD}}\) is Pitman closer to \(\mu\) than \({\bar{X}}_i\) for all \(\sigma _1^2\) and \(\sigma _2^2\) when \(\mu _1 = \mu _2=\mu\).

Proof

The proof follows immediately from
$$\begin{aligned}&D_\alpha ( N(\hat{\mu }_i^{{\rm OS}}, a\widehat{\sigma ^2}), N( \mu _i, a\sigma ^2)) \le D_\alpha ( N({\bar{X}}_i, a\widehat{\sigma ^2}), N( \mu _i, a\sigma ^2)) \\&\quad \Leftrightarrow (\hat{\mu }_i^{{\rm OS}} -\mu _i)^2 \le ({\bar{X}}_i -\mu _i)^2\\&\quad \Leftrightarrow |\hat{\mu }_i^{{\rm OS}} -\mu _i| \le |{\bar{X}}_i -\mu _i|. \end{aligned}$$
From Theorem 8, we have for all \(\mu _1 \le \mu _2\) and \(\sigma _i^2, i=1,2\),
$$\begin{aligned} P\{ | \hat{\mu }_i^{{\rm OS}} - \mu _i | < | {\bar{X}}_i - \mu _i | \, \vert \hat{\mu }_i^{{\rm OS}} \ne {\bar{X}}_i \} \ge 1/2, \end{aligned}$$
if and only if
$$\begin{aligned} P\{ | \hat{\mu }^{{\rm GD}} - \mu | < | {\bar{X}}_i - \mu | \} \ge 1/2 \end{aligned}$$
for all \(\sigma _1^2\) and \(\sigma _2^2\) when \(\mu _1 = \mu _2=\mu\). This completes the proof.

4.4 Case when variances are ordered

In this subsection, we consider the same setup as in (17). We give Pitman closeness domination results in predictive density estimation when the unknown variances \(\sigma _1^2\) and \(\sigma _2^2\) satisfy the order restriction \(\sigma _1^2 \le \sigma _2^2\). We note that Oono and Shinozaki (2005) and Chang et al. (2012) have given MSE domination results for estimating the means when an order restriction on \(\sigma _1^2\) and \(\sigma _2^2\) is present or absent.

First, we estimate the density of a future observation from a normal population with mean \(\mu _2\) and variance \(\sigma ^2= a \sigma _2^2\), where a is known, i.e., we estimate the density:
$$\begin{aligned} f(\tilde{y}) \sim N( \mu _2, a\sigma _2^2). \end{aligned}$$

Based on Theorem 1 and Lemma 1, we have the following.

Theorem 10

The plug-in predictive density estimate \({\hat{f}}^{{\rm CS}}(\tilde{y}) \sim N( \hat{\mu }_2^{{\rm CS}}, a\widehat{\sigma _2^2})\) is Pitman closer to \(f(\tilde{y} | \mu _2, a \sigma _2^2)\) than \({\hat{f}}^{{\rm OS}}(\tilde{y}) \sim N(\hat{\mu }_2^{{\rm OS}}, a\widehat{\sigma _2^2})\) under the \(D(\alpha )\) metric for all \(-1 \le \alpha \le 1\) and for any estimator \(\widehat{\sigma _2^2}\).

Proof

The proof follows immediately from
$$\begin{aligned}&D_\alpha ( N(\hat{\mu }_2^{{\rm CS}}, a\widehat{\sigma _2^2}), N( \mu _2, a\sigma _2^2)) \le D_\alpha ( N(\hat{\mu }_2^{{\rm OS}}, a\widehat{\sigma _2^2}), N( \mu _2, a\sigma _2^2)) \\&\quad \Leftrightarrow (\hat{\mu }_2^{{\rm CS}} -\mu _2)^2 \le (\hat{\mu }_2^{{\rm OS}} -\mu _2)^2\\&\quad \Leftrightarrow |\hat{\mu }_2^{{\rm CS}} -\mu _2| \le |\hat{\mu }_2^{{\rm OS}} -\mu _2|. \end{aligned}$$
From Theorem 1, we have
$$\begin{aligned} P\{ | \hat{\mu }_2^{{\rm CS}} - \mu _2 | < | \hat{\mu }_2^{{\rm OS}} - \mu _2 | \, \vert \hat{\mu }_2^{{\rm CS}} \ne \hat{\mu }_2^{{\rm OS}}\} \ge 1/2, \end{aligned}$$
with strict inequality for some \(\mu _1 \le \mu _2\) and \(\sigma _1^2 \le \sigma _2^2\). This completes the proof.

Next, we consider estimating the predictive density with the smaller variance \(\sigma _1^2\), namely \(N(\mu _1, a\sigma _1^2)\). Based on Theorem 2 and Lemma 1, we have the following.

Theorem 11

The plug-in predictive density estimate \({\hat{f}}^{{\rm CS}}(\tilde{y}) \sim N( \hat{\mu }_1^{{\rm CS}}, a\widehat{\sigma _1^2})\) cannot be Pitman closer to \(f(\tilde{y}| \mu _1, a \sigma _1^2)\) than \({\hat{f}}^{{\rm OS}}(\tilde{y}) \sim N(\hat{\mu }_1^{{\rm OS}}, a\widehat{\sigma _1^2})\) when \(\mu _2-\mu _1\) is sufficiently large, under the \(\{D(\alpha )\}\) metric for all \(-1 \le \alpha \le 1\) and for any estimator \(\widehat{\sigma _1^2}\).

Proof

The proof follows immediately from Theorem 2.

Finally, we consider estimation of the predictive density function \(p( {\varvec{y}} | {\varvec{\mu }}, \varSigma ) = N \left( {\varvec{\mu }}, \varSigma \right)\) and show that \(N \left( \hat{\varvec{\mu }}^{{\rm CS}}, {\hat{\varSigma }} \right)\) dominates \(N \left( \hat{\varvec{\mu }}^{{\rm OS}}, {\hat{\varSigma }} \right)\) in terms of Pitman closeness under reverse Kullback–Leibler loss \(D_{+1}\), where
$$\begin{aligned} {\varvec{\mu }}=(\mu _1, \mu _2)^{\prime }, \, \, \varSigma = \left( \begin{array}{cc} \sigma _1^2 &{} 0 \\ 0 &{} \sigma _2^2 \end{array} \right) \text{ and } \, \, {\hat{\varSigma }}= \left( \begin{array}{cc} {\hat{\sigma }}_1^2 &{} 0 \\ 0 &{} {\hat{\sigma }}_2^2 \end{array} \right) . \, \end{aligned}$$

We need the following Lemma.

Lemma 2

The reverse Kullback–Leibler loss \(D_{+1}\) when we predict \(p(\tilde{\varvec{y}} | {\varvec{\mu }}, \varSigma ) = N \left( {\varvec{\mu }} , \varSigma \right)\) by \(p(\tilde{\varvec{y}} | \hat{\varvec{\mu }}, {\hat{\varSigma }} ) = N \left( \hat{\varvec{\mu }} , {\hat{\varSigma }} \right)\) is given as
$$\begin{aligned} D_{+1} \left( N \left( \hat{\varvec{\mu }} , {\hat{\varSigma }} \right) , N \left( {\varvec{\mu }} , \varSigma \right) \right) = 1/2 \left[ \sum _{i=1}^2 \left( \frac{ \hat{\sigma }_i^2}{ \sigma _i^2} -\log \frac{\hat{\sigma }_i^2}{ \sigma _i^2} -1 + \frac{(\mu _i - \hat{\mu }_i)^2}{ \sigma _i^2} \right) \right] . \end{aligned}$$

Proof

Straightforward calculation.
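For completeness, the following small sketch (ours, for illustration only; the function name is hypothetical) evaluates the expression of Lemma 2 as a sum of two univariate reverse KL terms.

```python
# Illustrative sketch (not the authors' code) of Lemma 2: with diagonal covariance
# matrices, the reverse KL loss D_{+1} is the sum of two univariate terms.
import numpy as np

def reverse_kl_diag(mu_hat, sig2_hat, mu, sig2):
    mu_hat, sig2_hat, mu, sig2 = map(np.asarray, (mu_hat, sig2_hat, mu, sig2))
    r = sig2_hat / sig2
    return 0.5 * np.sum(r - np.log(r) - 1.0 + (mu - mu_hat) ** 2 / sig2)

print(reverse_kl_diag([0.1, 0.4], [1.0, 2.0], [0.0, 0.5], [1.0, 2.0]))   # 0.0075
```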

Theorem 12

If \(n_1 \ge n_2\), then \({\hat{f}}^{{\rm CS}}(\tilde{\varvec{y}}) \sim N \left( \hat{\varvec{\mu }}^{{\rm CS}}, {\hat{\varSigma }} \right)\) dominates \({\hat{f}}^{{\rm OS}}(\tilde{\varvec{y}}) \sim N \left( \hat{\varvec{\mu }}^{{\rm OS}}, {\hat{\varSigma }} \right)\) in terms of Pitman closeness under the reverse Kullback–Leibler metric.

Proof

From Lemma 2 and Theorem 3, the result follows.

5 Extension to generalized Bayesian predictive densities

In this section, we discuss improving the generalized Bayesian predictive densities suggested by Corcuera and Giummolè (1999) under \(D(\alpha )\) loss.

Based on the data
$$\begin{aligned} X_{ij} \sim N(\mu _i, \sigma _i^2), \quad i = 1, 2, \ j = 1, \cdots , n_i, \end{aligned}$$
we predict the density of \({\tilde{Y}} \sim N( \mu _i, \sigma _i^2), i =1,2\). We denote its density function by \(p(\tilde{y}; \mu _i, \sigma _i)\), where \(\mu _i\) and \(\sigma _i^2\) are unknown.
When \(-1 \le \alpha < 1\), Corcuera and Giummolè (1999) have established that the best invariant predictive density of \(p(\tilde{y}; \mu _i, \sigma _i)\) based solely on \(x_{i1}, \ldots , x_{in_i}\) is
$$\begin{aligned} {\hat{p}}_\alpha (\tilde{y}; \bar{x}_i, \tilde{\sigma }_i ) \propto \left[ 1+ \frac{1-\alpha }{2n_i+1-\alpha }\left( \frac{y-\bar{x}_i}{\tilde{\sigma }_i} \right) ^2 \right] ^{-(2n_i-1-\alpha )/2(1-\alpha )} , \end{aligned}$$
(18)
where \(\bar{x}_i\) is the sample mean and \(\tilde{\sigma }_i^2= ((n_i -1 )/n_i)s_i^2\) is the sample variance. Corcuera and Giummolè (1999) have also shown that \({\hat{p}}_\alpha (\tilde{y}; \bar{x}_i, \tilde{\sigma }_i )\) is the generalized Bayesian predictive density for the prior density \(f(\mu _i, \sigma _i) \propto 1/\sigma _i, 0< \sigma _i < \infty\). It is to be noted that \({\hat{p}}_\alpha (\tilde{y}; \bar{x}_i, \tilde{\sigma }_i)\) is not a normal distribution, although the plug-in density \(N(\bar{x}_i, s_i^2)\) is the generalized Bayes rule when \(\alpha = 1\).
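The density (18) is straightforward to evaluate numerically, as in the following sketch (ours, for illustration only; the normalizing constant, which is not displayed in (18), is obtained here by quadrature, and the function name is hypothetical).

```python
# Illustrative sketch (not the authors' code): evaluating the best invariant
# predictive density (18); the normalizing constant is computed by quadrature.
import numpy as np
from scipy.integrate import quad

def kernel_18(y, xbar, sigma_tilde, n, alpha):
    """Unnormalized kernel of (18)."""
    expo = -(2 * n - 1 - alpha) / (2.0 * (1.0 - alpha))
    return (1.0 + (1.0 - alpha) / (2 * n + 1 - alpha)
            * ((y - xbar) / sigma_tilde) ** 2) ** expo

xbar, sigma_tilde, n, alpha = 0.0, 1.0, 10, -1.0        # alpha = -1: the KL case
const, _ = quad(lambda y: kernel_18(y, xbar, sigma_tilde, n, alpha), -50, 50)
p_hat = lambda y: kernel_18(y, xbar, sigma_tilde, n, alpha) / const
print(p_hat(0.0), p_hat(2.0))   # a Student-t-like density, heavier tailed than a normal plug-in
```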
We consider the following two cases separately, where order restrictions on \(\mu _i\) and/or \(\sigma _i^2\) are present:
  (i) Case when \(\mu _1 \le \mu _2\).
  (ii) Case when \(\mu _1 \le \mu _2\) and \(\sigma _1^2 \le \sigma _2^2\).

We consider improving \({\hat{p}}_\alpha (\tilde{y}; \bar{x}_i, \tilde{\sigma }_i )\) or \({\hat{p}}_\alpha (\tilde{y}; \hat{\mu }_i^{{\rm OS}}, \tilde{\sigma }_i )\) by replacing \(\bar{x}_i\) with \(\hat{\mu }_i^{{\rm OS}}\) or \(\hat{\mu }_i^{{\rm OS}}\) with \(\hat{\mu }_i^{{\rm CS}}\), respectively.

The next lemma is useful for improving the generalized Bayesian predictive densities (18). We give its proof for completeness.

Lemma 3

Let \(f(\cdot )\) be the probability density function of \(X \sim N(0, \tau ^2)\). Assume that \(g(t) \ge 0\) is symmetric about the origin and is a strictly decreasing function of |t| such that \(\int _{-\infty } ^\infty g(x)f(x)\,{\text{d}}x < \infty\). Then
$$\begin{aligned} \int _{-\infty }^{\infty } g(y-x) f(y-\mu ) {\text{d}}y \end{aligned}$$
is a strictly decreasing function of \(|x-\mu |\).

Proof

By making the transformation \(z= y-\mu\), we see that
$$\begin{aligned} \int _{-\infty }^{\infty } g(y-x) f(y-\mu ) {\text{d}}y = \int _{-\infty }^{\infty } g(z-v) f(z) {{\rm d}}z = h(v), \end{aligned}$$
where \(v= x-\mu\). Then, h(v) satisfies
  (i) \(h(v) = h(-v)\) (since f and g are symmetric about the origin);
  (ii) h(v) is a strictly decreasing function of |v|.
We prove (ii) here. We need only to show that \(h(v) -h(v + \varDelta )> 0\) for any \(v\ge 0\) and for any \(\varDelta >0\). We have
$$\begin{aligned} h(v) -h(v + \varDelta ) = \int _{-\infty }^{\infty } k(z;v, \varDelta ) f(z) {{\rm d}}z, \end{aligned}$$
where
$$\begin{aligned} k(z;v, \varDelta ) = g(z-v) - g(z-v-\varDelta ). \end{aligned}$$
We notice that \(k(z;v, \varDelta )\) satisfies
  (1) \(k(v+ \varDelta /2;v, \varDelta )= 0\);
  (2) when \(z >v+ \varDelta /2\), \(k(z;v, \varDelta )<0\);
  (3) when \(z <v+ \varDelta /2\), \(k(z;v, \varDelta )>0\);
  and
  (4) \(k(v+ \varDelta /2+(z-v-\varDelta /2);v, \varDelta ) = - k(v+ \varDelta /2-(z-v-\varDelta /2);v, \varDelta ).\)
Since f is symmetric about the origin and strictly decreasing in |z|, and \(v+ \varDelta /2 >0\), pairing points symmetrically about \(v+ \varDelta /2\) shows that the positive values of \(k(z;v, \varDelta )\) receive greater weight under f than the matching negative values. Thus, we see that \(h(v) -h(v + \varDelta ) > 0.\)

Note: Lemma 3 can be generalized to the p-dimensional case; see Lemma A.6 of Fourdrinier et al. (2018), which is an extension of Anderson’s theorem due to Chou and Strawderman (1990).

Let \(\hat{\mu }_i\) denote an estimator of \(\mu _i, i = 1,2,\) in general. Now, we show that, for any \(-1 \le \alpha < 1\), \(D_\alpha ({\hat{p}}_\alpha ( \tilde{y}; \hat{\mu }_i, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i) )\) is a strictly increasing function of \(|\hat{\mu }_i - \mu _i |\).

From Lemma 3, we see that for \(|\alpha | <1\)
$$\begin{aligned} D_\alpha ({\hat{p}}_\alpha ( \tilde{y}; \hat{\mu }_i, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i) ) \propto 1- \int _{-\infty }^{\infty } g( \tilde{y}- \hat{\mu }_i)f(\tilde{y}-\mu _i) {{\rm d}}\tilde{y} \end{aligned}$$
is a strictly increasing function of \(|\hat{\mu }_i- \mu _i|\), where
$$\begin{aligned} g(y-x) = \left[ 1+ \frac{1-\alpha }{2n_i+1-\alpha }\left( \frac{y- x}{\hat{\sigma }_i} \right) ^2 \right] ^{-(2n_i-1-\alpha )(1+ \alpha )/(4(1-\alpha ))} \end{aligned}$$
and
$$\begin{aligned} f(y-\mu ) \propto \exp \left\{ -\frac{(1-\alpha )(y-\mu )^2}{4\sigma ^2} \right\} . \end{aligned}$$
For \(\alpha = -1\), to show that
$$\begin{aligned} D_{-1} ({\hat{p}}_{-1} (\tilde{y}; \hat{\mu }_i, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i) ) = - E_{\tilde{y}} \left\{ \log \left[ \frac{{\hat{p}}_{-1} ( \tilde{y}; \hat{\mu }_i, \hat{\sigma }_i)}{p(\tilde{y} ; \mu _i, \sigma _i) } \right] \right\} \end{aligned}$$
is a strictly increasing function of \(|\hat{\mu }_i - \mu _i|\), we need only to notice that
$$\begin{aligned}&\int _{-\infty }^{\infty } \log \left[ 1+ \frac{1}{n_i+1}\left( \frac{\tilde{y}- \hat{\mu }_i}{\hat{\sigma }_i} \right) ^2 \right] \exp \left\{ - \frac{(\tilde{y}- \mu _i)^2}{2\sigma ^2} \right\} {{\rm d}}\tilde{y} \\&\quad = \int _{-\infty }^{\infty } \log \left[ 1+ \frac{1}{n_i+1}\left( \frac{z- v}{\hat{\sigma }_i} \right) ^2 \right] \exp \left\{ - \frac{z^2}{2\sigma ^2} \right\} d z \end{aligned}$$
is a strictly increasing function of \(v=|\hat{\mu }_i - \mu _i|\) from Lemma 3.
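The monotonicity just established is also easy to check numerically, as in the following sketch (ours, for illustration only; the parameter values are hypothetical), which evaluates the KL-case integral above for several values of \(v\).

```python
# Illustrative sketch (not the authors' code): the integral in the alpha = -1 case
# above, evaluated by quadrature, increases with v = |mu_hat_i - mu_i| (Lemma 3).
import numpy as np
from scipy.integrate import quad

n_i, sig_hat, sig = 10, 1.0, 1.0

def h(v):
    integrand = lambda z: (np.log(1.0 + ((z - v) / sig_hat) ** 2 / (n_i + 1))
                           * np.exp(-z ** 2 / (2.0 * sig ** 2)))
    val, _ = quad(integrand, -40.0, 40.0)
    return val

print([round(h(v), 4) for v in (0.0, 0.5, 1.0, 2.0)])   # strictly increasing in v
```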

5.1 Case when \(\, \mu _1 \le \mu _2\)

In this case, we have the following result.

Theorem 13

The predictive density estimate \({\hat{p}}_\alpha (\tilde{y}; \hat{\mu }_i^{{\rm OS}}, \hat{\sigma }_i), i = 1,2,\) is closer to the predictive density \(p ( \tilde{y} ; \mu _i, \sigma _i)\) than \({\hat{p}}_\alpha (\tilde{y}; \bar{x}_i, \hat{\sigma }_i)\), respectively, under the \(\{D(\alpha )\}\) metric for all \(-1 \le \alpha < 1\) and for every estimator \(\hat{\sigma }_i\) if and only if \(\hat{\mu }^{{\rm GD}}\) is Pitman closer to \(\mu\) than \({\bar{X}}_i\) for all \(\sigma _1^2\) and \(\sigma _2^2\) when \(\mu _1 = \mu _2=\mu\).

Proof

Let \(\hat{\mu }_i\) denote an estimator of \(\mu _i, i = 1,2\) in general. Since \(D_\alpha ({\hat{p}}_\alpha ( \tilde{y}; \hat{\mu }_i, \hat{\sigma }_i),\)\(p(\tilde{y} ; \mu _i, \sigma _i) )\) is a strictly increasing function of \(|\hat{\mu }_i- \mu _i|\), we see that
$$\begin{aligned} D_{\alpha } ({\hat{p}}_{\alpha } ( \tilde{y}; \hat{\mu }_i^{{\rm OS}}, \hat{\sigma }_i) , p(\tilde{y} ; \mu _i, \sigma _i)) < D_{\alpha } ({\hat{p}}_{\alpha } ( \tilde{y}; \bar{x}_i, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i) ) \end{aligned}$$
if and only if
$$\begin{aligned} |\hat{\mu }_i^{{\rm OS}}- \mu _i| < |\bar{x}_i - \mu _i|. \end{aligned}$$
Thus, from Theorem 8 we have the desired result.

5.2 Case when \(\mu _1 \le \mu _2\) and \(\sigma _1^2 \le \sigma _2^2\)

Here, we give domination results in predictive density estimation when the unknown variances \(\sigma _1^2\) and \(\sigma _2^2\) satisfy the order restriction \(\sigma _1^2 \le \sigma _2^2\). Then, we have the following results.

Theorem 14

The predictive density estimate \({\hat{p}}_\alpha ( \tilde{y} ; \hat{\mu }_2^{{\rm CS}}, \hat{\sigma }_2)\) is closer to the predictive density \(p ( \tilde{y} ; \mu _2, \sigma _2)\) than \({\hat{p}}_\alpha ( \tilde{y} ; \hat{\mu }_2^{{\rm OS}}, \hat{\sigma }_2)\) under the \(\{D(\alpha )\}\) metric for all \(-1 \le \alpha <1\) and for every estimator \(\hat{\sigma }_2^2\).

Theorem 15

The predictive density estimate \({\hat{p}}_\alpha (\tilde{y}; \hat{\mu }_1^{{\rm CS}}, \hat{\sigma }_1)\) is not Pitman closer to \(p(\tilde{y}; \mu _1, \sigma _1)\) than \({\hat{p}}_\alpha (\tilde{y};\hat{\mu }_1^{{\rm OS}}, \hat{\sigma }_1)\) when \(\mu _2-\mu _1\) is sufficiently large, under the \(\{D(\alpha )\}\) metric for all \(-1 \le \alpha < 1\) and for any estimator \(\hat{\sigma }_1^2\).

Proof of Theorems 14 and 15. Since for \(-1 \le \alpha < 1\)
$$\begin{aligned} D_\alpha ( {\hat{p}}_\alpha ( \tilde{y}; \hat{\mu }_i^{{\rm CS}}, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i) ) < D_\alpha ( {\hat{p}}_\alpha ( \tilde{y}; \hat{\mu }_i^{{\rm OS}}, \hat{\sigma }_i), p(\tilde{y} ; \mu _i, \sigma _i)) \end{aligned}$$
if and only if
$$\begin{aligned} |\hat{\mu }_i^{{\rm CS}} -\mu _i | < |\hat{\mu }_i^{{\rm OS}} -\mu _i | , \end{aligned}$$
from Theorems 1 and 2, Theorems 14 and 15 are established, respectively.

6 Conclusion

In this paper, we have considered Pitman closeness domination in predictive density estimation problems when the underlying loss metric is the \(\alpha\)-divergence (\(D(\alpha )\)) loss introduced by Csiszár (1967). When all underlying distributions considered are normal, including the distribution of the observables, the distribution of the variable whose density is to be predicted, and the estimated predictive density, we have given a general expression for the \(\alpha\)-divergence loss and noted that it is a concave monotone function of quadratic loss.

Using modified Pitman closeness domination results of Chang and Shinozaki (2015) for the point estimation problems of the means (when the variances are known or when they are unknown but ordered), we are able to obtain Pitman closeness domination results for the corresponding predictive density estimation problems under the \(\{D(\alpha )\}\) metric.


Acknowledgements

We would like to thank the Editor, the Associate Editor, and anonymous reviewers for quite thoughtful and constructive comments which led to an improved version of this paper. This work was supported by Grant-in-Aid for Scientific Research (C) nos. 26330047 and 18K11196, Japan (to Yuan-Tsung Chang and Nobuo Shinozaki). This work was partially supported by a grant from the Simons Foundation (#418098 to William Strawderman).


References

  1. Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.
  2. Chang, Y.-T., Fukuda, K., & Shinozaki, N. (2017). Estimation of two ordered normal means when a covariance matrix is known. Statistics, 5, 1095–1104.
  3. Chang, Y.-T., Oono, Y., & Shinozaki, N. (2012). Improved estimators for the common mean and ordered means of two normal distributions with ordered variances. Journal of Statistical Planning and Inference, 142, 2619–2628.
  4. Chang, Y.-T., & Shinozaki, N. (2015). Estimation of two ordered normal means under modified Pitman nearness criterion. Annals of the Institute of Statistical Mathematics, 67, 863–883.
  5. Chang, Y.-T., & Strawderman, W. E. (2014). Stochastic domination in predictive density estimation for ordered normal means under \(\alpha\)-divergence loss. Journal of Multivariate Analysis, 128, 1–9.
  6. Chou, J.-P., & Strawderman, W. E. (1990). Minimax estimation of means of multivariate normal mixtures. Journal of Multivariate Analysis, 35, 141–150.
  7. Cohen, A., & Sackrowitz, B. (2004). A discussion of some inference issues in order restricted models. Canadian Journal of Statistics, 32(2), 199–205.
  8. Corcuera, J. M., & Giummolè, F. (1999). A generalized Bayes rule for prediction. Scandinavian Journal of Statistics, 26, 265–279.
  9. Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2, 299–318.
  10. Fernández, M. A., Rueda, C., & Salvador, B. (2000). Parameter estimation under orthant restrictions. Canadian Journal of Statistics, 28(1), 171–181.
  11. Fourdrinier, D., Marchand, E., Righi, A., & Strawderman, W. E. (2011). On improved predictive density with parametric constraints. Electronic Journal of Statistics, 5, 172–191.
  12. Fourdrinier, D., Strawderman, W. E., & Wells, M. T. (2018). Shrinkage estimation. Springer Series in Statistics. Berlin: Springer.
  13. Graybill, F. A., & Deal, R. B. (1959). Combining unbiased estimators. Biometrics, 15, 543–550.
  14. Gupta, R. D., & Singh, H. (1992). Pitman nearness comparisons of estimates of two ordered normal means. Australian Journal of Statistics, 34(3), 407–414.
  15. Hwang, J. T., & Peddada, S. D. (1994). Confidence interval estimation subject to order restrictions. The Annals of Statistics, 22(1), 67–93.
  16. Keating, J. P., Mason, R. L., & Sen, P. K. (1993). Pitman’s measure of closeness: A comparison of statistical estimators. Philadelphia: SIAM.
  17. Kubokawa, T. (1989). Closer estimation of a common mean in the sense of Pitman. Annals of the Institute of Statistical Mathematics, 41(3), 477–484.
  18. Lee, C. I. C. (1981). The quadratic loss of isotonic regression under normality. The Annals of Statistics, 9(3), 686–688.
  19. Maruyama, Y., & Strawderman, W. E. (2012). Bayesian predictive densities for linear regression models under \(\alpha\)-divergence loss: Some results and open problems. In Contemporary developments in Bayesian analysis and statistical decision theory: A Festschrift for William E. Strawderman (Vol. 8, pp. 42–56). Institute of Mathematical Statistics.
  20. Nayak, T. K. (1990). Estimation of location and scale parameters using generalized Pitman nearness criterion. Journal of Statistical Planning and Inference, 24, 259–268.
  21. Oono, Y., & Shinozaki, N. (2005). Estimation of two order restricted normal means with unknown and possibly unequal variances. Journal of Statistical Planning and Inference, 131(2), 349–363.
  22. Pitman, E. J. G. (1937). The closest estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society, 33, 212–222.
  23. Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. New York: Wiley.
  24. Shinozaki, N., & Chang, Y.-T. (1999). A comparison of maximum likelihood and the best unbiased estimators in the estimation of linear combinations of positive normal means. Statistics and Decisions, 17, 125–136.
  25. Silvapulle, M. J., & Sen, P. K. (2004). Constrained statistical inference. New York: Wiley.
  26. van Eeden, C. (2006). Restricted parameter space estimation problems. Lecture Notes in Statistics 188. Berlin: Springer.

Copyright information

© Japanese Federation of Statistical Science Associations 2019

Authors and Affiliations

  1. Department of Social Information, Faculty of Studies on Contemporary Society, Mejiro University, Tokyo, Japan
  2. Faculty of Science and Technology, Keio University, Yokohama, Japan
  3. Department of Statistics and Biostatistics, Rutgers University, NJ, USA
