Non-minimaxity of debiased shrinkage estimators

We consider estimation of the $p$-variate normal mean of $X\sim N_p(\theta,I)$ under the quadratic loss function. We investigate the decision-theoretic properties of debiased shrinkage estimators, namely estimators which shrink towards the origin for smaller $\|x\|^2$ and which are exactly equal to the unbiased estimator $X$ for larger $\|x\|^2$. Such a debiased shrinkage estimator appears superior to the unbiased estimator $X$, which would imply minimaxity. However, we show that it is not minimax under mild conditions.


Introduction
Let $X$ have a $p$-variate normal distribution $N_p(\theta, I_p)$. We consider the problem of estimating the mean vector $\theta$ under the quadratic loss function
$$ L(\theta, \hat\theta) = \|\hat\theta - \theta\|^2. \quad (1.1) $$
The risk function of an estimator $\hat\theta(X)$ is $R(\theta, \hat\theta) = E[\|\hat\theta(X) - \theta\|^2]$. The usual unbiased estimator $X$ has the constant risk $p$ and is minimax for every $p \in \mathbb{N}$. Stein (1956) showed that there are orthogonally equivariant estimators of the form
$$ \hat\theta_\phi(X) = \left(1 - \frac{\phi(\|X\|^2)}{\|X\|^2}\right) X \quad (1.2) $$
which dominate $X$ when $p \geq 3$. James and Stein (1961) gave an explicit dominating procedure
$$ \hat\theta_{\mathrm{JS}}(X) = \left(1 - \frac{p-2}{\|X\|^2}\right) X, \quad (1.3) $$
called the James--Stein estimator. Further, as shown in Baranchik (1964), the James--Stein estimator is inadmissible since the positive-part estimator
$$ \hat\theta_{\mathrm{JS}}^{+}(X) = \max\left(0,\; 1 - \frac{p-2}{\|X\|^2}\right) X \quad (1.4) $$
dominates $\hat\theta_{\mathrm{JS}}$. For the class of general shrinkage estimators $\hat\theta_\phi(X)$ given by (1.2), Baranchik (1970) proposed a sufficient condition for minimaxity, {B.1 and B.2}, where

B.1 $0 \leq \phi(w) \leq 2(p-2)$ for all $w \geq 0$,
B.2 $\phi'(w) \geq 0$ for all $w \geq 0$.
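The estimators (1.3) and (1.4) are straightforward to implement; the following is a minimal Python sketch (the function names are ours, not from the paper).

```python
import numpy as np

def james_stein(x):
    """James-Stein estimator (1.3): shrink x towards the origin by
    the factor 1 - (p - 2)/||x||^2, which can be negative."""
    p = x.size
    w = float(np.sum(x ** 2))          # w = ||x||^2
    return (1.0 - (p - 2) / w) * x

def james_stein_plus(x):
    """Positive-part James-Stein estimator (1.4): the shrinkage
    factor is truncated at zero, so the sign of x is never flipped."""
    p = x.size
    w = float(np.sum(x ** 2))
    return max(0.0, 1.0 - (p - 2) / w) * x
```

For $\|x\|^2 < p-2$ the James--Stein factor is negative, whereas (1.4) maps such $x$ to the origin; this is the source of the domination noted by Baranchik (1964).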
Further, Stein (1974) expressed the risk of $\hat\theta_\phi(X)$ as
$$ R(\theta, \hat\theta_\phi) = p + E[r_\phi(\|X\|^2)], \quad (1.5) $$
where
$$ r_\phi(w) = \frac{\phi(w)}{w}\{\phi(w) - 2(p-2)\} - 4\phi'(w). \quad (1.6) $$
Hence a shrinkage factor $\phi(w)$ satisfying the inequality $r_\phi(w) \leq 0$ for all $w \geq 0$ implies minimaxity of $\hat\theta_\phi$. We see that {B.1 and B.2} is a tractable sufficient condition for $r_\phi(w) \leq 0$ for all $w \geq 0$. A series of papers, Efron and Morris (1971, 1972a,b, 1973), showed that the James--Stein estimator can be interpreted as an empirical Bayes estimator under $\theta \sim N_p(0, \tau I_p)$. Hence shrinkage estimators, including the James--Stein estimator, utilize the prior information that $\|\theta\|^2$ is relatively small. In fact, the risk function of the James--Stein estimator is
$$ R(\theta, \hat\theta_{\mathrm{JS}}) = p - (p-2)^2\, E\left[\frac{1}{\|X\|^2}\right], $$
which is increasing in $\|\theta\|^2$. On the other hand, a larger $\|x\|^2$ suggests that the prior information ($\|\theta\|^2$ is relatively small) is incorrect. Although the James--Stein estimator uniformly dominates $X$ under the quadratic risk, for larger $\|x\|^2$ the unbiased estimator $X$ seems superior to the shrinkage estimators, whose bias is of order $O(1/\|\theta\|)$ provided $\phi(w)$ is bounded. Note that many popular shrinkage estimators have $\phi$ with $\liminf_{w\to\infty} \phi(w) \geq p - 2$.
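Stein's integrand (1.6) can be checked numerically. The following sketch (with an assumed dimension $p = 5$) evaluates $r_\phi$ for the James--Stein choice $\phi(w) \equiv p - 2$:

```python
import numpy as np

def r_phi(w, phi, dphi, p):
    """Stein's integrand (1.6):
    r_phi(w) = (phi(w)/w) * (phi(w) - 2(p-2)) - 4 phi'(w).
    The risk of theta_phi is p + E[r_phi(||X||^2)], so r_phi <= 0
    for all w >= 0 implies minimaxity of theta_phi."""
    return phi(w) / w * (phi(w) - 2 * (p - 2)) - 4 * dphi(w)

# James-Stein: phi(w) = p - 2 and phi'(w) = 0, so (1.6) collapses to
# r_phi(w) = -(p - 2)^2 / w, which is <= 0 everywhere: minimaxity.
p = 5
w = np.linspace(0.1, 50.0, 500)
r_js = r_phi(w,
             lambda v: (p - 2) * np.ones_like(v),
             lambda v: np.zeros_like(v),
             p)
```

The constant choice $\phi(w) = p-2$ satisfies B.1 and B.2, consistent with Baranchik's sufficient condition.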
Hence we consider shrinkage factors satisfying, for some fixed $a > 0$,

DS.1 $\phi(w)$ is continuous with $\phi(w) > 0$ for $0 < w < a$,
DS.2 $\phi(w) = 0$ for all $w \geq a$.

The corresponding debiased shrinkage estimator shrinks towards the origin for smaller $\|x\|^2$ and is exactly equal to the unbiased estimator $X$ for larger $\|x\|^2$. Such debiased shrinkage estimators seem superior to the unbiased estimator $X$, which would imply minimaxity. In this paper, we are interested in whether the debiased shrinkage estimators are minimax or not.
In the literature, there are some debiased estimators, including SCAD (Smoothly Clipped Absolute Deviation) by Fan and Li (2001) and the nearly unbiased estimator based on MCP (Minimax Concave Penalty) by Zhang (2010), although these were not necessarily designed with the conventional minimaxity in mind.
The organization of this paper is as follows. By (1.5), the risk difference between $\hat\theta_\phi$ and the minimax estimator $X$ is given by
$$ R(\theta, \hat\theta_\phi) - R(\theta, X) = E[r_\phi(\|X\|^2)] = E[r_\phi(\|X\|^2)\, I(\|X\|^2 \leq a)], \quad (1.10) $$
where $r_\phi(w)$ is given by (1.6) and the second equality follows from DS.2. In Section 2, we give a useful result, Theorem 2.1, on the asymptotic behavior of this type of expected value as $\|\theta\|^2 \to \infty$. In Section 3, we review SCAD and MCP as solutions of penalized least squares problems and investigate how the corresponding $\phi(w)$ approaches $0$ as $w \uparrow a$. In Section 4, using Theorem 2.1, we show that the debiased shrinkage estimators satisfying DS.1 and DS.2, together with mild conditions on how $\phi(w)$ approaches $0$ as $w \uparrow a$, are not minimax, which is not necessarily expected.

Asymptotic behavior of an expected value
For fixed $a > 0$, we investigate the asymptotic behavior, as $\|\theta\|^2 \to \infty$, of the expected value
$$ E[g(\|X\|^2)\, I(\|X\|^2 \leq a)], $$
where $g(w)$ satisfies A.1 and A.2:

A.1 $w^{(p-1)/2}|g(w)|$ is bounded on $[0, a]$.
A.2 There exists a nonnegative real $b$ such that $\lim_{w \uparrow a} g(w)/(a-w)^b = 1$. \quad (2.2)

Notice that, in A.2, we do not lose generality by assuming that the limit of $g(w)/(a-w)^b$ is $1$: if the limit equals some $g_* (\neq 0)$, we have only to consider $g(w)/g_*$ instead. Then we have the following result.
Theorem 2.1. Assume $p \geq 2$ and that $g(w)$ satisfies A.1 and A.2. Then, as $\nu = \|\theta\|^2 \to \infty$, the expected value $E[g(\|X\|^2)\, I(\|X\|^2 \leq a)]$ admits an exact asymptotic expression whose coefficient $c(a, b, p)$ is given by (2.3).

Proof. We first prove the theorem under the following proper subset of A.1:

A.1.1 $|g(w)|$ is bounded on $[0, a]$.

Note that $\|X\|^2$ can be decomposed as a sum of two components $U$ and $V$, where $U$ and $V$ are mutually independent. Then the expected value can be written in terms of an integral $G(\nu; a)$, where $q = (p-1)/2$ and $H(\cdot)$ is given by (2.4). Since the asymptotic behavior of $G(\nu; a)$ as $\nu \to \infty$ is of interest, $\nu > a$ is assumed in the following. For $G(\nu; a)$, we apply a change of variables and rewrite the result in the form (2.5). From Part 2 of Lemma 2.1 below, $H(y)$ is bounded on $[0, a]$ under A.1.1. Hence, for any $\nu$, the bound (2.6) holds. Further, by (2.5) and Part 1 of Lemma 2.1, we have (2.7). By (2.6), (2.7) and the dominated convergence theorem, we obtain the stated limit, which completes the proof under A.1.1.

Now we assume A.1, that is, that $w^{(p-1)/2}|g(w)|$ is bounded on $[0, a]$. Let $f_p(w, \nu)$ be the density of $W = \|X\|^2$. Note that $f_p(w, \nu)/f_p(w, 0)$ is increasing in $w$ for any fixed $\nu > 0$, and that $w^{-(p-1)/2}$ is decreasing in $w$. By the correlation inequality, the expectation is bounded by a product of two terms, both of which are bounded. Then, by the result under A.1.1, the limit holds with the coefficient $c(a, b, p)$ given by (2.3). Hence Theorem 2.1 is also valid in this case.

The following lemma gives some properties of $H(y)$, given by (2.4), which are needed in the proof of Theorem 2.1.
Lemma 2.1. We assume that $|g(w)|$ is bounded on $[0, a]$ as in A.1.1. Then we have the following results.
Proof. By (2.4), we have an explicit expression for $H(y)$, which is bounded under A.1.1. By the continuity of $H(y)$ and Part 1 of this lemma, Part 2 follows.
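The decay of the truncated expectation studied in this section can be seen in simulation. The following Monte Carlo sketch (our own illustration, with assumed values $p = 5$, $a = 5$, $b = 1$) estimates $E[g(\|X\|^2)\, I(\|X\|^2 \leq a)]$ for $g(w) = (a - w)^b$, which satisfies A.2:

```python
import numpy as np

def truncated_expectation(nu, a, b, p, n=200_000, seed=0):
    """Monte Carlo estimate of E[g(||X||^2) 1(||X||^2 <= a)] for
    g(w) = (a - w)^b, where X ~ N_p(theta, I_p) and nu = ||theta||^2.
    By orthogonal invariance we may take theta = (sqrt(nu), 0, ..., 0)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    theta[0] = np.sqrt(nu)
    x = theta + rng.standard_normal((n, p))
    w = np.sum(x ** 2, axis=1)
    g = np.zeros_like(w)
    mask = w <= a                      # g vanishes for w > a
    g[mask] = (a - w[mask]) ** b
    return float(np.mean(g))
```

As $\nu$ grows, the event $\{\|X\|^2 \leq a\}$ becomes very rare, so the estimate drops sharply; Theorem 2.1 quantifies the exact rate of this decay.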

Review of existing debiased shrinkage estimators
As mentioned in Section 1, the literature contains some "debiased shrinkage" estimators, including SCAD (Smoothly Clipped Absolute Deviation) by Fan and Li (2001) and the nearly unbiased estimator based on MCP (Minimax Concave Penalty) by Zhang (2010), although they do not necessarily aim at the conventional minimaxity. In this section, we assume $p = 1$ and review existing estimators as solutions of the penalized least squares problem
$$ \min_\theta \left\{ \tfrac{1}{2}(x - \theta)^2 + P(|\theta|; \lambda) \right\}. \quad (3.1) $$
Table 1 summarizes three popular penalty functions $P(|\theta|; \lambda)$ and the corresponding minimizers: "ridge", "soft thresholding" and "hard thresholding". For the three estimators, the corresponding shrinkage factors $\phi(x^2)$ follow from the form (1.2). We see that DS.2, DS.2 and DS.1 are not satisfied by $\phi_R(w)$, $\phi_{ST}(w)$ and $\phi_{HT}(w)$, respectively.

SCAD (Smoothly Clipped Absolute Deviation) by Fan and Li (2001) is the minimizer of (3.1) with a continuously differentiable penalty function, where $\alpha > 2$. The resulting solution is
$$ \hat\theta_{\mathrm{SCAD}}(x) = \begin{cases} \operatorname{sign}(x)(|x| - \lambda)_+ & |x| \leq 2\lambda, \\ \dfrac{(\alpha-1)x - \operatorname{sign}(x)\alpha\lambda}{\alpha - 2} & 2\lambda < |x| \leq \alpha\lambda, \\ x & |x| > \alpha\lambda, \end{cases} $$
where the corresponding shrinkage factor on $4\lambda^2 < w \leq \alpha^2\lambda^2$ is
$$ \phi_{\mathrm{SCAD}}(w) = \frac{\alpha\lambda\sqrt{w} - w}{\alpha - 2}. \quad (3.5) $$
We see that $\phi_{\mathrm{SCAD}}(w)$ satisfies both DS.1 and DS.2 with $a = \alpha^2\lambda^2$. Further, by (3.5), the derivative at $w = \alpha^2\lambda^2$ is
$$ \phi'_{\mathrm{SCAD}}(\alpha^2\lambda^2) = -\frac{1}{2(\alpha - 2)} < 0. \quad (3.6) $$
As pointed out in Strawderman and Wells (2012), the nearly unbiased estimator by MCP (Minimax Concave Penalty) considered in Zhang (2010) is equivalent to the minimizer of (3.1) with a continuously differentiable penalty function, where $\alpha > 1$. The resulting solution is
$$ \hat\theta_{\mathrm{MCP}}(x) = \begin{cases} \dfrac{\alpha \operatorname{sign}(x)(|x| - \lambda)_+}{\alpha - 1} & |x| \leq \alpha\lambda, \\ x & |x| > \alpha\lambda, \end{cases} $$
where the corresponding shrinkage factor on $\lambda^2 \leq w \leq \alpha^2\lambda^2$ is
$$ \phi_{\mathrm{MCP}}(w) = \frac{\alpha\lambda\sqrt{w} - w}{\alpha - 1}. \quad (3.9) $$
We see that $\phi_{\mathrm{MCP}}(w)$ satisfies both DS.1 and DS.2 with $a = \alpha^2\lambda^2$. Further, by (3.9), the derivative at $w = \alpha^2\lambda^2$ is
$$ \phi'_{\mathrm{MCP}}(\alpha^2\lambda^2) = -\frac{1}{2(\alpha - 1)} < 0. \quad (3.10) $$
By (3.6) and (3.10), both $\phi_{\mathrm{SCAD}}(w)$ and $\phi_{\mathrm{MCP}}(w)$ approach $0$ as $w \uparrow \alpha^2\lambda^2$ with a negative slope. Aside from the justification as a solution of the penalized least squares problem (3.1), let us also consider the shrinkage factor $\phi_Q(w)$ given by (3.11). We see that $\phi_Q(w)$ satisfies both DS.1 and DS.2, and its derivative approaches $0$ as $w \uparrow a$, as in (3.12). When $\phi(w)$ of a debiased shrinkage estimator approaches $0$ from above as $w \uparrow a$, it seems that both {(3.6) and (3.10)} and (3.12) are typical behaviors, characterized by $\phi'(w)$.
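The two thresholding rules can be written compactly. The following Python sketch (function names ours) implements the standard SCAD and MCP solutions of (3.1) and recovers the shrinkage factor via $\phi(x^2) = x\,(x - \hat\theta(x))$, which follows from the form (1.2):

```python
import numpy as np

def theta_scad(x, lam, alpha):
    """SCAD thresholding rule (Fan and Li, 2001), alpha > 2:
    soft thresholding for |x| <= 2*lam, a linear interpolation for
    2*lam < |x| <= alpha*lam, and exactly x (debiased) beyond."""
    ax, s = np.abs(x), np.sign(x)
    return np.where(ax <= 2 * lam,
                    s * np.maximum(ax - lam, 0.0),
                    np.where(ax <= alpha * lam,
                             ((alpha - 1) * x - s * alpha * lam) / (alpha - 2),
                             x))

def theta_mcp(x, lam, alpha):
    """MCP / firm thresholding rule (Zhang, 2010), alpha > 1:
    exactly x (debiased) for |x| > alpha*lam."""
    ax, s = np.abs(x), np.sign(x)
    return np.where(ax <= alpha * lam,
                    s * alpha * np.maximum(ax - lam, 0.0) / (alpha - 1),
                    x)

def phi_from_theta(x, theta_hat):
    """Recover the shrinkage factor phi(x^2) = x * (x - theta_hat)
    from theta_hat = (1 - phi(x^2)/x^2) * x."""
    return x * (x - theta_hat)
```

For example, with $\lambda = 1$ and $\alpha = 3$ (so $a = \alpha^2\lambda^2 = 9$), both rules return $x$ unchanged for $|x| > 3$, i.e. $\phi(w) = 0$ for $w \geq 9$, in line with DS.2.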

Main result
In this section, we investigate the minimaxity of the debiased shrinkage estimators satisfying DS.1 and DS.2. Recall, as in (1.10), that the risk difference between $\hat\theta_\phi$ and the minimax estimator $X$ is
$$ R(\theta, \hat\theta_\phi) - R(\theta, X) = E[r_\phi(\|X\|^2)\, I(\|X\|^2 \leq a)], $$
where $r_\phi(w)$ is given by (1.6). Under the assumptions DS.1 and DS.2 on $\phi(w)$, $r_\phi(w)$ given by (1.6) is bounded; that is, there exists an $M$ such that $|r_\phi(w)| \leq M$ for all $w \geq 0$. For $\phi(w)$ with $\lim_{w \uparrow a} \phi(w) = 0$ as well as $\phi(w) > 0$ for $w < a$, we consider two cases as a generalization of {(3.6) and (3.10)} and (3.12):

Case 1 $\limsup_{w \uparrow a} \phi'(w) < 0$.
Case 2 $\lim_{w \uparrow a} \phi'(w) = 0$.
Theorem 4.1. The debiased shrinkage estimator satisfying DS.1 and DS.2 is not minimax under either Case 1 or Case 2.
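A small numerical check makes Case 1 concrete for SCAD. Using the shrinkage factor $\phi_{\mathrm{SCAD}}(w) = (\alpha\lambda\sqrt{w} - w)/(\alpha - 2)$ on the outer interval (our sketch, with assumed values $p = 5$, $\lambda = 1$, $\alpha = 3$, so $a = 9$), Stein's integrand (1.6) stays strictly positive just below $a$, since $\phi(w) \to 0$ while $-4\phi'(w) \to 2/(\alpha - 2) > 0$:

```python
import numpy as np

lam, alpha, p = 1.0, 3.0, 5
a = (alpha * lam) ** 2        # phi vanishes at w = a = 9

def phi(w):
    # SCAD shrinkage factor on the outer interval 4*lam^2 < w <= a
    return (alpha * lam * np.sqrt(w) - w) / (alpha - 2)

def dphi(w):
    return (alpha * lam / (2.0 * np.sqrt(w)) - 1.0) / (alpha - 2)

def r(w):
    # Stein's integrand (1.6); a positive value contributes
    # positively to the risk difference in (1.10)
    return phi(w) / w * (phi(w) - 2 * (p - 2)) - 4.0 * dphi(w)

# As w -> a: phi(w) -> 0 and dphi(w) -> -1/(2*(alpha-2)) = -0.5,
# so r(w) -> 2/(alpha - 2) = 2 > 0 just below the debiasing point a.
```

Of course, positivity of $r_\phi$ near $a$ alone does not disprove minimaxity; roughly speaking, Theorem 4.1 rests on the fact that, as $\|\theta\|^2 \to \infty$, the truncated expectation in (1.10) is dominated by the behavior of $r_\phi$ near $a$.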
The results of Hansen (2022) and Robert (1988) do not seem to directly provide the exact asymptotic order of the leading term of (4.13) with the exact coefficient. Using Theorem 2.1, we can obtain it as follows: since $w^{(p-1)/2}|g(w)|$ is bounded on $[0, a]$ (condition A.1) and there exists a nonnegative real $b$ such that $\lim_{w \uparrow a} g(w)/(a - w)^b = 1$ (condition A.2), Theorem 2.1 applies.