Appendix: Proofs and Derivations
Proof of Lemma 1
We can write \(\boldsymbol{\theta }=\boldsymbol{\mu } +\boldsymbol{Z}_{1}\)
and \(\boldsymbol{Y } =\boldsymbol{\theta } +\boldsymbol{Z}_{2}\),
where \(\boldsymbol{Z}_{1} \sim \mathcal{N}_{p}(\boldsymbol{0},\boldsymbol{B})\)
and \(\boldsymbol{Z}_{2} \sim \mathcal{N}_{p}(\boldsymbol{0},\boldsymbol{A})\)
are independent. Jointly \(\left (\begin{array}{c} \boldsymbol{Y}\\ \boldsymbol{\theta } \end{array} \right )\)
is still multivariate normal with mean vector \(\left (\begin{array}{c} \boldsymbol{\mu }\\ \boldsymbol{\mu } \end{array} \right )\)
and covariance matrix \(\left (\begin{array}{cc} \boldsymbol{A} +\boldsymbol{ B}&\boldsymbol{B}\\ \boldsymbol{B} &\boldsymbol{B} \end{array} \right )\).
The result follows immediately from the standard formula for the conditional
distribution of a multivariate normal.
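As a numerical sanity check (not part of the formal proof; a minimal sketch assuming Python with numpy and a small random instance of the matrices above), one can verify that the conditional law of \(\boldsymbol{\theta }\) given \(\boldsymbol{Y }\) read off the joint covariance agrees with the one obtained from the joint precision matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# Small random instance: A diagonal positive (sampling covariance),
# B symmetric positive definite (prior covariance), as in Lemma 1
A = np.diag(rng.uniform(0.5, 2.0, p))
M = rng.standard_normal((p, p))
B = M @ M.T + 0.1 * np.eye(p)
mu = rng.standard_normal(p)
Y = rng.standard_normal(p)

# Conditional law of theta given Y from the joint covariance:
# mean = mu + B (A+B)^{-1} (Y - mu),  cov = B - B (A+B)^{-1} B
S = np.linalg.inv(A + B)
mean_cov_formula = mu + B @ S @ (Y - mu)
cov_cov_formula = B - B @ S @ B

# The same conditional law from the joint precision matrix
Sigma = np.block([[A + B, B], [B, B]])
Lam = np.linalg.inv(Sigma)
cov_prec = np.linalg.inv(Lam[p:, p:])
mean_prec = mu - cov_prec @ Lam[p:, :p] @ (Y - mu)

assert np.allclose(mean_cov_formula, mean_prec)
assert np.allclose(cov_cov_formula, cov_prec)
```

Both routes produce the same posterior mean and covariance, which is exactly the content of Lemma 1.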
Proof of Theorem 1
We start by decomposing the difference between the URE and the actual loss as
$$\displaystyle\begin{array}{rcl} & & \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\mu }}\right ) \\ & =& \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{0}_{p}\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{0}_{p} }\right ) -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right ){}\end{array}$$
(18)
$$\displaystyle\begin{array}{rcl} & =& \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ A} -\boldsymbol{\theta \theta }^{T}\right ) -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ Y \theta }^{T} -\boldsymbol{ A}\right )\right ) \\ & & -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right ) \\ & =& \left (\mathrm{I}\right ) + \left (\mathrm{II}\right ) + \left (\mathrm{III}\right ). {}\end{array}$$
(19)
To verify the first equality (18), note that
$$\displaystyle\begin{array}{rcl} & & \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) -\mathrm{ URE}\left (\boldsymbol{B},\boldsymbol{0}_{p}\right ) {}\\ & & = \dfrac{1} {p}\left \Vert \boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\left (\boldsymbol{Y }-\boldsymbol{\mu }\right )\right \Vert ^{2} -\dfrac{1} {p}\left \Vert \boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{Y }\right \Vert ^{2} {}\\ & & = -\dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{\mu }^{T}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )^{T}\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\left (2\boldsymbol{Y }-\boldsymbol{\mu }\right )\right ), {}\\ & & \quad l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\mu }}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{0}_{p} }\right ) {}\\ & & = \dfrac{1} {p}\left \Vert \left (\boldsymbol{I}_{p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )\boldsymbol{Y } +\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{\mu }-\boldsymbol{\theta }\right \Vert ^{2} {}\\ & & \quad -\dfrac{1} {p}\left \Vert \left (\boldsymbol{I}_{p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )\boldsymbol{Y }-\boldsymbol{\theta }\right \Vert ^{2} {}\\ & & = \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{\mu }^{T}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )^{T}\left (2\left (\left (\boldsymbol{I}_{ p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )\boldsymbol{Y -\theta }\right ) +\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{\mu }\right )\right ). {}\\ \end{array}$$
Equation (18) then follows by rearranging the terms. To verify the second equality (19), note
$$\displaystyle\begin{array}{rcl} & & \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{0}_{p}\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{0}_{p} }\right ) {}\\ & & = \dfrac{1} {p}\left \Vert \boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{Y }\right \Vert ^{2} -\dfrac{1} {p}\left \Vert \left (\boldsymbol{I}_{p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )\boldsymbol{Y }-\boldsymbol{\theta }\right \Vert ^{2} {}\\ & & \quad + \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{A} - 2\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{A}\right ) {}\\ & & = \dfrac{1} {p}\mathrm{tr}\left (\left (\boldsymbol{Y } - 2\left (\boldsymbol{I}_{p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\right )\boldsymbol{Y }+\boldsymbol{\theta }\right )^{T}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right ) {}\\ & & \quad + \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{A} - 2\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{A}\right ) {}\\ & & = \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ A} -\boldsymbol{\theta \theta }^{T}\right ) -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\left (\boldsymbol{Y }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T} -\boldsymbol{ A}\right )\right ). {}\\ \end{array}$$
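The pointwise identity behind (18) and (19) can be checked numerically on a random instance. The sketch below assumes Python with numpy and the form \(\mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) = \frac{1} {p}\left \Vert \boldsymbol{A}\left (\boldsymbol{A}+\boldsymbol{B}\right )^{-1}\left (\boldsymbol{Y }-\boldsymbol{\mu }\right )\right \Vert ^{2} + \frac{1} {p}\mathrm{tr}\left (\boldsymbol{A} - 2\boldsymbol{A}\left (\boldsymbol{A}+\boldsymbol{B}\right )^{-1}\boldsymbol{A}\right )\), an assumption consistent with the differences displayed above:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5

A = np.diag(rng.uniform(0.5, 2.0, p))   # heteroscedastic variances
M = rng.standard_normal((p, p))
B = M @ M.T + 0.1 * np.eye(p)           # prior covariance
mu = rng.standard_normal(p)
theta = rng.standard_normal(p)
Y = rng.standard_normal(p)              # any realization: the identity is pointwise

S = np.linalg.inv(A + B)
AS = A @ S                              # A (A+B)^{-1}

# Shrinkage estimator, its loss, and the URE (form assumed above)
theta_hat = (np.eye(p) - AS) @ Y + AS @ mu
loss = np.sum((theta_hat - theta) ** 2) / p
ure = (np.sum((AS @ (Y - mu)) ** 2) + np.trace(A - 2 * AS @ A)) / p

# The three terms (I), (II), (III) of decomposition (19)
t1 = np.trace(np.outer(Y, Y) - A - np.outer(theta, theta)) / p
t2 = -2.0 / p * np.trace(B @ S @ (np.outer(Y, Y) - np.outer(Y, theta) - A))
t3 = -2.0 / p * np.trace(AS @ np.outer(mu, Y - theta))

assert np.isclose(ure - loss, t1 + t2 + t3)
```

The check passes for any draw of the inputs, since (18)-(19) is an algebraic identity rather than a statement in expectation.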
With the decomposition, we want to prove separately the uniform \(L^{1}\) convergence of the three terms \(\left (\mathrm{I}\right )\), \(\left (\mathrm{II}\right )\), and \(\left (\mathrm{III}\right )\).
Proof for the case of Model I.
The uniform \(L^{2}\) convergence of \(\left (\mathrm{I}\right )\) and \(\left (\mathrm{II}\right )\) has been shown in Theorem 3.1 of [29] under our assumptions \(\left (\mathrm{A}\right )\) and \(\left (\mathrm{B}\right )\), so we focus on \(\left (\mathrm{III}\right )\), i.e., we want to show that \(\sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )\right \vert \rightarrow 0\) in \(L^{1}\) as \(p \rightarrow \infty \).
Without loss of generality, let us assume \(A_{1} \leq A_{2} \leq \cdots \leq A_{p}\). We have
$$\displaystyle\begin{array}{rcl} \sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )\right \vert & =& \dfrac{2} {p}\sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \sum _{i=1}^{p} \dfrac{A_{i}} {A_{i}+\lambda }\mu _{i}\left (Y _{i} -\theta _{i}\right )\right \vert {}\\ & \leq & \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sup \limits _{0\leq c_{1}\leq \cdots \leq c_{p}\leq 1}\left \vert \sum _{i=1}^{p}c_{ i}\mu _{i}\left (Y _{i} -\theta _{i}\right )\right \vert {}\\ & =& \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\max \limits _{1\leq j\leq p}\left \vert \sum _{i=j}^{p}\mu _{ i}\left (Y _{i} -\theta _{i}\right )\right \vert, {}\\ \end{array}$$
where the last equality follows from Lemma 2.1 of [13]. For a generic p-dimensional vector \(\boldsymbol{v}\), we denote \([\boldsymbol{v}]_{j:p} = (0,\ldots,0,v_{j},v_{j+1},\ldots,v_{p})\). Let \(\boldsymbol{P}_{\boldsymbol{X}} =\boldsymbol{ X}^{T}\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{X}\) be the projection matrix onto \(\mathcal{L}_{\mathrm{row}}\left (\boldsymbol{X}\right )\). Then since \(\mathcal{L}\subset \mathcal{L}_{\mathrm{row}}\left (\boldsymbol{X}\right )\), we have
$$\displaystyle\begin{array}{rcl} & & \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\max \limits _{1\leq j\leq p}\left \vert \sum _{i=j}^{p}\mu _{ i}\left (Y _{i} -\theta _{i}\right )\right \vert = \dfrac{2} {p}\max \limits _{1\leq j\leq p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \boldsymbol{\mu }^{T}[\boldsymbol{Y }-\boldsymbol{\theta }]_{ j:p}\right \vert {}\\ & =& \dfrac{2} {p}\max \limits _{1\leq j\leq p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \boldsymbol{\mu }^{T}\boldsymbol{P}_{\boldsymbol{ X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \vert \leq \dfrac{2} {p}\max \limits _{1\leq j\leq p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \Vert \boldsymbol{\mu }\right \Vert \times \left \Vert \boldsymbol{P}_{\boldsymbol{X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \Vert {}\\ & =& \dfrac{2} {p}\max \limits _{1\leq j\leq p}Mp^{\kappa }\left \Vert \boldsymbol{Y }\right \Vert \times \left \Vert \boldsymbol{P}_{\boldsymbol{X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \Vert. {}\\ \end{array}$$
The Cauchy-Schwarz inequality thus gives
$$\displaystyle{ \mathbb{E}\left (\sup \limits _{0\leq \lambda \leq \infty,\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )\right \vert \right ) \leq 2Mp^{\kappa -1}\sqrt{\mathbb{E}\left (\left \Vert \boldsymbol{Y } \right \Vert ^{2 } \right )} \times \sqrt{\mathbb{E}\left (\max \limits _{ 1\leq j\leq p}\left \Vert \boldsymbol{P}_{\boldsymbol{X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \Vert ^{2}\right )}. }$$
(20)
It is straightforward to see that, by conditions (A) and (C),
$$\displaystyle{\sqrt{\mathbb{E}\left (\left \Vert \boldsymbol{Y } \right \Vert ^{2 } \right )} = \sqrt{\mathbb{E}( \sum \nolimits _{i=1 }^{p }Y _{i }^{2 })} = \sqrt{\sum \nolimits _{i=1 }^{p }\left (\theta _{i }^{2 } + A_{i } \right )} = O\left (p^{1/2}\right ).}$$
For the second term on the right-hand side of (20), let \(\boldsymbol{P}_{\boldsymbol{X}} =\boldsymbol{\varGamma D\varGamma }^{T}\) denote the spectral decomposition. Clearly,
$$\displaystyle{\boldsymbol{D} =\mathrm{ diag}\left (\underbrace{1,\ldots,1}_{k\text{ copies}},\underbrace{0,\ldots,0}_{p - k\text{ copies}}\right ).}$$
It follows that
$$\displaystyle\begin{array}{rcl} & & \mathbb{E}\left (\max \limits _{1\leq j\leq p}\left \Vert \boldsymbol{P}_{\boldsymbol{X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \Vert ^{2}\right ) = \mathbb{E}\left (\max \limits _{ 1\leq j\leq p}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}^{T}\boldsymbol{P}_{\boldsymbol{ X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right ) {}\\ & & = \mathbb{E}\left (\max \limits _{1\leq j\leq p}\mathrm{tr}\left (\boldsymbol{D\varGamma }^{T}[\boldsymbol{Y }-\boldsymbol{\theta }]_{ j:p}\left (\boldsymbol{\varGamma }^{T}[\boldsymbol{Y }-\boldsymbol{\theta }]_{ j:p}\right )^{T}\right )\right ) {}\\ & & = \mathbb{E}\left (\max \limits _{1\leq j\leq p}\sum _{l=1}^{k}\left [\boldsymbol{\varGamma }^{T}[\boldsymbol{Y }-\boldsymbol{\theta }]_{ j:p}\right ]_{l}^{2}\right ) {}\\ & & = \mathbb{E}\left (\max \limits _{1\leq j\leq p}\sum _{l=1}^{k}\left (\sum _{ m=j}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{ lm}\left (Y _{m} -\theta _{m}\right )\right )^{2}\right ) {}\\ & & \leq \mathbb{E}\left (\sum _{l=1}^{k}\max \limits _{ 1\leq j\leq p}\left (\sum _{m=j}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{ lm}\left (Y _{m} -\theta _{m}\right )\right )^{2}\right ) {}\\ & & =\sum _{ l=1}^{k}\mathbb{E}\left (\max \limits _{ 1\leq j\leq p}\left (\sum _{m=j}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{ lm}\left (Y _{m} -\theta _{m}\right )\right )^{2}\right ). {}\\ \end{array}$$
For each l, \(M_{j}^{\left (l\right )} =\sum _{ m=p-j+1}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{lm}\left (Y _{m} -\theta _{m}\right )\) forms a martingale, so by Doob's \(L^{p}\) maximum inequality,
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\max \limits _{1\leq j\leq p}\left (M_{j}^{\left (l\right )}\right )^{2}\right )& \leq & 4\mathbb{E}\left (M_{ p}^{\left (l\right )}\right )^{2} = 4\mathbb{E}\left (\sum _{ m=1}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{ lm}\left (Y _{m} -\theta _{m}\right )\right )^{2} {}\\ & =& 4\sum _{m=1}^{p}\left [\boldsymbol{\varGamma }^{T}\right ]_{ lm}^{2}A_{ m} = 4\left [\boldsymbol{\varGamma }^{T}\boldsymbol{A\varGamma }\right ]_{ ll}. {}\\ \end{array}$$
Therefore,
$$\displaystyle\begin{array}{rcl} & & \mathbb{E}\left (\max \limits _{1\leq j\leq p}\left \Vert \boldsymbol{P}_{\boldsymbol{X}}[\boldsymbol{Y }-\boldsymbol{\theta }]_{j:p}\right \Vert ^{2}\right ) \leq \sum _{ l=1}^{k}4\left [\boldsymbol{\varGamma }^{T}\boldsymbol{A\varGamma }\right ]_{ ll} {}\\ & & = 4\sum _{l=1}^{p}\left [\boldsymbol{D}\right ]_{ ll}\left [\boldsymbol{\varGamma }^{T}\boldsymbol{A\varGamma }\right ]_{ ll} = 4\ \mathrm{tr}\left (\boldsymbol{D\varGamma }^{T}\boldsymbol{A\varGamma }\right ) = 4\ \mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{ X}}\boldsymbol{A}\right ) {}\\ & & = 4\ \mathrm{tr}\left (\boldsymbol{X}^{T}\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{XA}\right ) = 4\ \mathrm{tr}\left (\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{XAX}^{T}\right ) = O\left (1\right ), {}\\ \end{array}$$
where the last equality uses conditions \(\left (\mathrm{D}\right )\) and \(\left (\mathrm{E}\right )\). We finally obtain
$$\displaystyle{\mathbb{E}\left (\sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )\right \vert \right ) \leq o\left (p^{-1/2}\right ) \times O\left (p^{1/2}\right ) \times O\left (1\right ) = o\left (1\right ).}$$
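The trace identity \(\mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{X}}\boldsymbol{A}\right ) =\mathrm{ tr}\left (\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{XAX}^{T}\right )\) and the eigenvalue structure of \(\boldsymbol{P}_{\boldsymbol{X}}\) used in this step admit a quick numerical sanity check (a sketch assuming Python with numpy and a random full-row-rank \(\boldsymbol{X}\); the \(O(1)\) rate itself relies on conditions (D) and (E) and is not checked here):

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 3, 20

X = rng.standard_normal((k, p))         # full row rank almost surely
A = np.diag(rng.uniform(0.5, 2.0, p))

# Projection onto the row space of X
P = X.T @ np.linalg.inv(X @ X.T) @ X

# tr(P_X A) collapses to the k x k trace tr((XX^T)^{-1} X A X^T)
assert np.isclose(np.trace(P @ A),
                  np.trace(np.linalg.inv(X @ X.T) @ X @ A @ X.T))

# P_X has exactly k unit eigenvalues and p-k zeros (the matrix D above)
eig = np.sort(np.linalg.eigvalsh(P))
assert np.allclose(eig[-k:], 1.0)
assert np.allclose(eig[:-k], 0.0)
```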
Proof for the case of Model II.
Under Model II, we know that
$$\displaystyle{\sum _{i=1}^{p}A_{ i}\theta _{i}^{2} =\boldsymbol{\theta } ^{T}\boldsymbol{A\theta } =\boldsymbol{\beta } ^{T}(\boldsymbol{XAX}^{T})\boldsymbol{\beta } = O\left (p\right )}$$
by condition \(\left (\mathrm{D}\right )\). In other words, condition \(\left (\mathrm{D}\right )\) implies condition \(\left (\mathrm{B}\right )\). Therefore, we know that the term \(\left (\mathrm{I}\right ) \rightarrow 0\) in \(L^{2}\) as shown in Theorem 3.1 of [29], and we only need to show the uniform \(L^{1}\) convergence of the other two terms, \(\left (\mathrm{II}\right )\) and \(\left (\mathrm{III}\right )\).
Recall that \(\boldsymbol{B} \in \mathcal{B} = \left \{\lambda \boldsymbol{X}^{T}\boldsymbol{WX}:\lambda > 0\right \}\) has only rank k under Model II, so we can reexpress \(\left (\mathrm{II}\right )\) and \(\left (\mathrm{III}\right )\) in terms of low-rank matrices. Let \(\boldsymbol{V } = \left (\boldsymbol{XA}^{-1}\boldsymbol{X}^{T}\right )^{-1}\). The Woodbury formula gives
$$\displaystyle\begin{array}{rcl} \left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}& =& \left (\boldsymbol{A} +\lambda \boldsymbol{ X}^{T}\boldsymbol{WX}\right )^{-1} =\boldsymbol{ A}^{-1} -\boldsymbol{ A}^{-1}\lambda \boldsymbol{X}^{T}\left (\boldsymbol{W}^{-1} +\lambda \boldsymbol{ V }^{-1}\right )^{-1}\boldsymbol{XA}^{-1} {}\\ & =& \boldsymbol{A}^{-1} -\boldsymbol{ A}^{-1}\lambda \boldsymbol{X}^{T}\boldsymbol{W}\left (\lambda \boldsymbol{W} +\boldsymbol{ V }\right )^{-1}\boldsymbol{V XA}^{-1}, {}\\ \end{array}$$
which tells us
$$\displaystyle{\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1} =\boldsymbol{ I}_{ p} -\boldsymbol{ A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1} =\lambda \boldsymbol{ X}^{T}\boldsymbol{W}\left (\lambda \boldsymbol{W} +\boldsymbol{ V }\right )^{-1}\boldsymbol{V XA}^{-1}.}$$
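This low-rank expression for \(\boldsymbol{B}\left (\boldsymbol{A}+\boldsymbol{B}\right )^{-1}\) can be verified numerically on a random instance (a sketch assuming Python with numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
k, p, lam = 3, 12, 0.7

A = np.diag(rng.uniform(0.5, 2.0, p))
X = rng.standard_normal((k, p))
G = rng.standard_normal((k, k))
W = G @ G.T + 0.1 * np.eye(k)           # symmetric positive definite weights

B = lam * X.T @ W @ X                    # the rank-k prior covariance
V = np.linalg.inv(X @ np.linalg.inv(A) @ X.T)

# B (A+B)^{-1}  versus  lambda X^T W (lambda W + V)^{-1} V X A^{-1}
lhs = B @ np.linalg.inv(A + B)
rhs = lam * X.T @ W @ np.linalg.inv(lam * W + V) @ V @ X @ np.linalg.inv(A)

assert np.allclose(lhs, rhs)
# Equivalently, B (A+B)^{-1} = I_p - A (A+B)^{-1}
assert np.allclose(lhs, np.eye(p) - A @ np.linalg.inv(A + B))
```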
Let \(\boldsymbol{U}\boldsymbol{\varLambda U}^{T}\) be the spectral decomposition of \(\boldsymbol{W}^{-1/2}\boldsymbol{V W}^{-1/2}\), where \(\boldsymbol{\varLambda }=\mathrm{ diag}\left (d_{1},\ldots,d_{k}\right )\) with \(d_{1} \leq \cdots \leq d_{k}\). Then \(\left (\lambda \boldsymbol{W} +\boldsymbol{ V }\right )^{-1} =\boldsymbol{ W}^{-1/2}\left (\lambda \boldsymbol{I}_{k} +\boldsymbol{ W}^{-1/2}\boldsymbol{V W}^{-1/2}\right )^{-1}\boldsymbol{W}^{-1/2} =\boldsymbol{ W}^{-1/2}\boldsymbol{U}\left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{U}^{T}\boldsymbol{W}^{-1/2}\), from which we obtain
$$\displaystyle{\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1} =\lambda \boldsymbol{ X}^{T}\boldsymbol{W}\left (\lambda \boldsymbol{W} +\boldsymbol{ V }\right )^{-1}\boldsymbol{V XA}^{-1} =\lambda \boldsymbol{ X}^{T}\boldsymbol{W}^{1/2}\boldsymbol{U}\left (\lambda \boldsymbol{I}_{ k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda U}^{T}\boldsymbol{W}^{1/2}\boldsymbol{XA}^{-1}.}$$
If we denote \(\boldsymbol{Z} =\boldsymbol{ U}^{T}\boldsymbol{W}^{1/2}\boldsymbol{X}\), i.e., \(\boldsymbol{Z}\) is the transformed covariate matrix, then \(\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1} =\lambda \boldsymbol{ Z}^{T}\left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\). It follows that
$$\displaystyle\begin{array}{rcl} \left (\mathrm{II}\right )& =& -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{B}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ Y \theta }^{T} -\boldsymbol{ A}\right )\right ) {}\\ & =& -\dfrac{2} {p}\mathrm{tr}\left (\lambda \boldsymbol{Z}^{T}\left (\lambda \boldsymbol{I}_{ k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ Y \theta }^{T} -\boldsymbol{ A}\right )\right ) {}\\ & =& -\dfrac{2} {p}\mathrm{tr}\left (\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ Y \theta }^{T} -\boldsymbol{ A}\right )\boldsymbol{Z}^{T}\right ), {}\\ \left (\mathrm{III}\right )& =& -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right ) {}\\ & =& -\dfrac{2} {p}\mathrm{tr}\left (\left (\boldsymbol{I}_{p} -\lambda \boldsymbol{ Z}^{T}\left (\lambda \boldsymbol{I}_{ k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\right )\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right ) {}\\ & =& -\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right ) + \dfrac{2} {p}\mathrm{tr}\left (\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\boldsymbol{Z}^{T}\right ) {}\\ & =& \left (\mathrm{III}\right )_{1} + \left (\mathrm{III}\right )_{2}. {}\\ \end{array}$$
We will next show that \(\left (\mathrm{II}\right )\), \(\left (\mathrm{III}\right )_{1}\), and \(\left (\mathrm{III}\right )_{2}\) all uniformly converge to zero in \(L^{1}\), which will then complete our proof.
Let \(\boldsymbol{\varXi }=\boldsymbol{ ZA}^{-1}\left (\boldsymbol{Y Y }^{T} -\boldsymbol{ Y \theta }^{T} -\boldsymbol{ A}\right )\boldsymbol{Z}^{T}\). Then
$$\displaystyle\begin{array}{rcl} \sup \limits _{0\leq \lambda \leq \infty }\left \vert \left (\mathrm{II}\right )\right \vert & =& \dfrac{2} {p}\sup \limits _{0\leq \lambda \leq \infty }\left \vert \sum _{i=1}^{k} \dfrac{\lambda d_{i}} {\lambda +d_{i}}\left [\boldsymbol{\varXi }\right ]_{ii}\right \vert {}\\ & \leq & \dfrac{2} {p}\sup \limits _{0\leq c_{1}\leq \cdots \leq c_{k}\leq d_{k}}\left \vert \sum _{i=1}^{k}c_{ i}\left [\boldsymbol{\varXi }\right ]_{ii}\right \vert = \dfrac{2} {p}\max \limits _{1\leq j\leq k}\left \vert \sum _{i=j}^{k}d_{ k}\left [\boldsymbol{\varXi }\right ]_{ii}\right \vert, {}\\ \end{array}$$
where the last equality follows as in Lemma 2.1 of [13]. As there are only a finite number of terms in the summation and the maximization, it suffices to show that
$$\displaystyle{d_{k}\left [\boldsymbol{\varXi }\right ]_{ii}/p \rightarrow 0\text{ in }L^{2}\ \ \ \ \text{for all }1 \leq i \leq k.}$$
To establish this, we note that \(\left [\boldsymbol{\varXi }\right ]_{ii} =\sum _{ n=1}^{p}\sum _{m=1}^{p}\left (A_{n}^{-1}Y _{n}\left (Y _{m} -\theta _{m}\right ) -\delta _{nm}\right )\left [\boldsymbol{Z}\right ]_{in}\left [\boldsymbol{Z}\right ]_{im}\),
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\left [\boldsymbol{\varXi }\right ]_{ii}^{2}\right )& =& \sum _{ n,m,n^{{\prime}},m^{{\prime}}}\mathbb{E}\left (\left (A_{n}^{-1}Y _{ n}\left (Y _{m} -\theta _{m}\right ) -\delta _{nm}\right )\left (A_{n^{{\prime}}}^{-1}Y _{ n^{{\prime}}}\left (Y _{m^{{\prime}}}-\theta _{m^{{\prime}}}\right ) -\delta _{n^{{\prime}}m^{{\prime}}}\right )\right ) {}\\ & & \times \left [\boldsymbol{Z}\right ]_{in}\left [\boldsymbol{Z}\right ]_{im}\left [\boldsymbol{Z}\right ]_{in^{{\prime}}}\left [\boldsymbol{Z}\right ]_{im^{{\prime}}}. {}\\ \end{array}$$
Depending on whether \(n\), \(m\), \(n^{{\prime}}\), \(m^{{\prime}}\) take the same or distinct values, we can break the summation into 15 disjoint cases:
$$\displaystyle\begin{array}{rcl} & & \sum _{\text{all distinct}} +\sum _{\text{three distinct, }n=m} +\sum _{\text{three distinct, }n=n^{{\prime}}} +\sum _{\text{three distinct, }n=m^{{\prime}}} {}\\ & +& \sum _{\text{three distinct, }m=n^{{\prime}}} +\sum _{\text{three distinct, }m=m^{{\prime}}} +\sum _{\text{three distinct, }n^{{\prime}}=m^{{\prime}}} +\sum _{\text{two distinct, }n=m\text{, }n^{{\prime}}=m^{{\prime}}} {}\\ & +& \sum _{\text{two distinct, }n=n^{{\prime}}\text{, }m=m^{{\prime}}} +\sum _{\text{two distinct, }n=m^{{\prime}}\text{, }n^{{\prime}}=m} +\sum _{\text{two distinct, }n=m=n^{{\prime}}} +\sum _{\text{two distinct, }n=m=m^{{\prime}}} {}\\ & +& \sum _{\text{two distinct, }n=n^{{\prime}}=m^{{\prime}}} +\sum _{\text{two distinct, }m=n^{{\prime}}=m^{{\prime}}} +\sum _{n=m=n^{{\prime}}=m^{{\prime}}}. {}\\ \end{array}$$
Many terms are zero. Straightforward evaluation of each summation gives
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\left [\boldsymbol{\varXi }\right ]_{ii}^{2}\right )& =& \sum _{ n=1}^{p}\mathbb{E}\left (\left (A_{ n}^{-1}Y _{ n}\left (Y _{n} -\theta _{n}\right ) - 1\right )^{2}\right )\left [\boldsymbol{Z}\right ]_{ in}^{4} {}\\ & & +\sum _{n=1}^{p}\sum _{ m\neq n}\mathbb{E}\left (\left (A_{n}^{-1}Y _{ n}\left (Y _{m} -\theta _{m}\right )\right )^{2}\right )\left [\boldsymbol{Z}\right ]_{ in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} {}\\ & & +\sum _{n=1}^{p}\sum _{ m\neq n}\mathbb{E}\left (\left (A_{n}^{-1}Y _{ n}\left (Y _{m} -\theta _{m}\right )\right )\left (A_{m}^{-1}Y _{ m}\left (Y _{n} -\theta _{n}\right )\right )\right )\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} {}\\ & & +2\sum _{n=1}^{p}\sum _{ m\neq n}\mathbb{E}\left (\left (A_{n}^{-1}Y _{ n}\left (Y _{n} -\theta _{n}\right ) - 1\right )\left (A_{m}^{-1}Y _{ m}\left (Y _{n} -\theta _{n}\right )\right )\right )\left [\boldsymbol{Z}\right ]_{in}^{3}\left [\boldsymbol{Z}\right ]_{ im} {}\\ & & +\sum _{n=1}^{p}\sum _{ m\neq n^{{\prime}},n^{{\prime}}\neq n,m\neq n}\mathbb{E}\left (\left (A_{m}^{-1}Y _{ m}\left (Y _{n} -\theta _{n}\right )\right )\left (A_{n^{{\prime}}}^{-1}Y _{ n^{{\prime}}}\left (Y _{n} -\theta _{n}\right )\right )\right )\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}\left [\boldsymbol{Z}\right ]_{in^{{\prime}}} {}\\ & =& \sum _{n=1}^{p}\dfrac{2A_{n} +\theta _{ n}^{2}} {A_{n}} \left [\boldsymbol{Z}\right ]_{in}^{4} +\sum _{ n=1}^{p}\sum _{ m\neq n}\dfrac{A_{n}A_{m} + A_{n}\theta _{m}^{2}} {A_{m}^{2}} \left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} +\sum _{ n=1}^{p}\sum _{ m\neq n}\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} {}\\ & & +2\sum _{n=1}^{p}\sum _{ m\neq n} \dfrac{\theta _{n}\theta _{m}} {A_{m}}\left [\boldsymbol{Z}\right ]_{in}^{3}\left [\boldsymbol{Z}\right ]_{ im} +\sum _{ n=1}^{p}\sum _{ m\neq n^{{\prime}},n^{{\prime}}\neq n,m\neq n} \dfrac{A_{n}\theta _{m}\theta _{n^{{\prime}}}} {A_{m}A_{n^{{\prime}}}}\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}\left [\boldsymbol{Z}\right ]_{in^{{\prime}}} {}\\ & =& \sum _{n,m=1}^{p} \dfrac{A_{n}} {A_{m}}\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} +\sum _{ n,m=1}^{p}\left [\boldsymbol{Z}\right ]_{ in}^{2}\left [\boldsymbol{Z}\right ]_{ im}^{2} +\sum _{ n,m,n^{{\prime}}=1}^{p} \dfrac{A_{n}\theta _{m}\theta _{n^{{\prime}}}} {A_{m}A_{n^{{\prime}}}}\left [\boldsymbol{Z}\right ]_{in}^{2}\left [\boldsymbol{Z}\right ]_{ im}\left [\boldsymbol{Z}\right ]_{in^{{\prime}}}.{}\\ \end{array}$$
Using matrix notation, we can reexpress the above equation as
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\left [\boldsymbol{\varXi }\right ]_{ii}^{2}\right )& =& \left [\boldsymbol{ZAZ}^{T}\right ]_{ ii}\left [\boldsymbol{ZA}^{-1}\boldsymbol{Z}^{T}\right ]_{ ii} + \left [\boldsymbol{ZZ}^{T}\right ]_{ ii}^{2} + \left [\boldsymbol{ZAZ}^{T}\right ]_{ ii}\left [\boldsymbol{ZA}^{-1}\boldsymbol{\theta }\right ]_{ i}^{2} {}\\ & \leq & \mathrm{tr}\left (\boldsymbol{ZAZ}^{T}\right )\mathrm{tr}\left (\boldsymbol{ZA}^{-1}\boldsymbol{Z}^{T}\right ) +\mathrm{ tr}\left (\boldsymbol{ZZ}^{T}\right )^{2} +\mathrm{ tr}\left (\boldsymbol{ZAZ}^{T}\right )\mathrm{tr}\left (\boldsymbol{\theta }^{T}\boldsymbol{A}^{-1}\boldsymbol{Z}^{T}\boldsymbol{ZA}^{-1}\boldsymbol{\theta }\right ) {}\\ & =& \mathrm{tr}\left (\boldsymbol{WXAX}^{T}\right )\mathrm{tr}\left (\boldsymbol{WXA}^{-1}\boldsymbol{X}^{T}\right ) +\mathrm{ tr}\left (\boldsymbol{WXX}^{T}\right )^{2} {}\\ & & +\mathrm{tr}\left (\boldsymbol{WXAX}^{T}\right )\mathrm{tr}\left (\boldsymbol{\beta }^{T}\left (\boldsymbol{XA}^{-1}\boldsymbol{X}^{T}\right )\boldsymbol{W}\left (\boldsymbol{XA}^{-1}\boldsymbol{X}^{T}\right )\boldsymbol{\beta }\right ), {}\\ \end{array}$$
which is \(O\left (p\right )O\left (p\right ) + O\left (p\right )^{2} + O\left (p\right )O\left (p^{2}\right ) = O\left (p^{3}\right )\) by conditions \(\left (\mathrm{D}\right )\)-\(\left (\mathrm{F}\right )\). Note also that condition \(\left (\mathrm{F}\right )\) implies
$$\displaystyle{d_{k} \leq \sum _{i=1}^{k}d_{ i} =\mathrm{ tr}\left (\boldsymbol{W}^{-1/2}\boldsymbol{V W}^{-1/2}\right ) =\mathrm{ tr}\left (\boldsymbol{W}^{-1}\boldsymbol{V }\right ) =\mathrm{ tr}\left (\boldsymbol{W}^{-1}(\boldsymbol{XA}^{-1}\boldsymbol{X}^{T})^{-1}\right ) = O\left (p^{-1}\right ).}$$
Therefore, we have
$$\displaystyle{\mathbb{E}\left (d_{k}^{2}\left [\boldsymbol{\varXi }\right ]_{ ii}^{2}/p^{2}\right ) = O\left (p^{-2}\right )O\left (p^{3}\right )/p^{2} = O\left (p^{-1}\right ) \rightarrow 0,}$$
which proves
$$\displaystyle{\sup \limits _{0\leq \lambda \leq \infty }\left \vert \left (\mathrm{II}\right )\right \vert \rightarrow 0\text{ in }L^{2},\ \ \ \ \text{as }p \rightarrow \infty.}$$
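The deterministic ingredients of this step, in particular the bound \(d_{k} \leq \sum _{i=1}^{k}d_{i} =\mathrm{ tr}\left (\boldsymbol{W}^{-1}\boldsymbol{V }\right )\), can be sanity-checked numerically. The sketch below assumes Python with numpy; the \(O\left (p^{-1}\right )\) rate itself depends on condition (F) and is not checked here:

```python
import numpy as np

rng = np.random.default_rng(4)
k, p = 3, 30

A = np.diag(rng.uniform(0.5, 2.0, p))
X = rng.standard_normal((k, p))
G = rng.standard_normal((k, k))
W = G @ G.T + 0.1 * np.eye(k)

V = np.linalg.inv(X @ np.linalg.inv(A) @ X.T)

# Symmetric inverse square root of W via its eigendecomposition
w_eval, w_evec = np.linalg.eigh(W)
W_inv_half = w_evec @ np.diag(w_eval ** -0.5) @ w_evec.T

# Eigenvalues d_1 <= ... <= d_k of W^{-1/2} V W^{-1/2} are all positive
d = np.sort(np.linalg.eigvalsh(W_inv_half @ V @ W_inv_half))
assert d[0] > 0

# d_k <= sum_i d_i = tr(W^{-1/2} V W^{-1/2}) = tr(W^{-1} V)
assert np.isclose(d.sum(), np.trace(np.linalg.inv(W) @ V))
assert d[-1] <= d.sum() + 1e-12
```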
To prove the uniform convergence of \(\left (\mathrm{III}\right )_{1}\) to zero in \(L^{1}\), we note that
$$\displaystyle\begin{array}{rcl} \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{1}\right \vert & =& \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \boldsymbol{\mu }^{T}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \vert = \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \boldsymbol{\mu }^{T}\boldsymbol{P}_{\boldsymbol{ X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \vert {}\\ & \leq & \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \Vert \boldsymbol{\mu }\right \Vert \times \left \Vert \boldsymbol{P}_{\boldsymbol{X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \Vert = \dfrac{2} {p}Mp^{\kappa }\left \Vert \boldsymbol{Y }\right \Vert \times \left \Vert \boldsymbol{P}_{\boldsymbol{X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \Vert, {}\\ \end{array}$$
so by the Cauchy-Schwarz inequality
$$\displaystyle{ \mathbb{E}\left (\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{1}\right \vert \right ) \leq 2Mp^{\kappa -1}\sqrt{\mathbb{E}\left (\left \Vert \boldsymbol{Y } \right \Vert ^{2 } \right )}\sqrt{\mathbb{E}\left (\left \Vert \boldsymbol{P}_{\boldsymbol{ X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \Vert ^{2}\right )}. }$$
(21)
Under Model II, \(\boldsymbol{\theta }=\boldsymbol{ X}^{T}\boldsymbol{\beta }\), so it follows that \(\sum _{i=1}^{p}\theta _{i}^{2} = \left \Vert \boldsymbol{\theta }\right \Vert ^{2} =\mathrm{ tr}\left (\boldsymbol{\beta \beta }^{T}\boldsymbol{XX}^{T}\right ) = O\left (p\right )\) by condition \(\left (\mathrm{E}\right )\). Hence \(\sqrt{\mathbb{E}\left (\left \Vert \boldsymbol{Y } \right \Vert ^{2 } \right )} = \sqrt{\sum \nolimits _{i=1 }^{p }\left (\theta _{i }^{2 } + A_{i } \right )} = O\left (p^{1/2}\right )\). For the second term on the right-hand side of (21), note that
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\left \Vert \boldsymbol{P}_{\boldsymbol{X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right \Vert ^{2}\right )& =& \mathbb{E}\left (\mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{ X}}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right )\right ) {}\\ & =& \mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{X}}\boldsymbol{A}\right ) =\mathrm{ tr}\left (\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{XAX}^{T}\right ) = O\left (1\right ) {}\\ \end{array}$$
by conditions \(\left (\mathrm{D}\right )\) and \(\left (\mathrm{E}\right )\). Thus, in aggregate, we have
$$\displaystyle{\mathbb{E}\left (\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{1}\right \vert \right ) \leq 2Mp^{\kappa -1}O\left (p^{1/2}\right )O\left (1\right ) = o\left (1\right ).}$$
We finally consider the \(\left (\mathrm{III}\right )_{2}\) term. We have
$$\displaystyle\begin{array}{rcl} \sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{2}\right \vert & =& \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sup \limits _{0\leq \lambda \leq \infty }\left \vert \sum _{i=1}^{k} \dfrac{\lambda d_{i}} {\lambda +d_{i}}\left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\boldsymbol{Z}^{T}\right ]_{ ii}\right \vert {}\\ & \leq & \dfrac{2} {p}\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\max \limits _{1\leq j\leq k}\left \vert \sum _{i=j}^{k}d_{ k}\left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\boldsymbol{Z}^{T}\right ]_{ ii}\right \vert {}\\ & \leq & \dfrac{2d_{k}} {p} \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sum _{i=1}^{k}\left \vert \left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\boldsymbol{Z}^{T}\right ]_{ ii}\right \vert {}\\ & =& \dfrac{2d_{k}} {p} \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sum _{i=1}^{k}\left \vert \left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\right ]_{ i}\left [\boldsymbol{Z}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right ]_{i}\right \vert {}\\ & \leq & \dfrac{2d_{k}} {p} \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sqrt{\sum _{i=1 }^{k }\left [\boldsymbol{ZA}^{-1 } \boldsymbol{\mu } \right ] _{i }^{2}} \times \sqrt{\sum _{i=1 }^{k }\left [\boldsymbol{Z}\left (\boldsymbol{Y }-\boldsymbol{\theta } \right ) \right ] _{i }^{2}}. {}\\ \end{array}$$
Thus, by the Cauchy-Schwarz inequality,
$$\displaystyle{\mathbb{E}\left (\sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{2}\right \vert \right ) \leq \dfrac{2d_{k}} {p} \sqrt{\mathbb{E}\left (\sup \limits _{\boldsymbol{\mu }\in \mathcal{L} } \sum _{i=1 }^{k }\left [\boldsymbol{ZA}^{-1 } \boldsymbol{\mu } \right ] _{i }^{2 } \right )} \times \sqrt{\mathbb{E}\left (\sum _{i=1 }^{k }\left [\boldsymbol{Z}\left (\boldsymbol{Y }-\boldsymbol{\theta } \right ) \right ] _{i }^{2 } \right )}.}$$
Note that
$$\displaystyle\begin{array}{rcl} & & \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sum _{i=1}^{k}\left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\right ]_{ i}^{2} =\sup \limits _{\boldsymbol{\mu } \in \mathcal{L}}\sum _{i=1}^{k}\left (\sum _{ m=1}^{p}\left [\boldsymbol{ZA}^{-1}\right ]_{ im}\left [\boldsymbol{\mu }\right ]_{m}\right )^{2} {}\\ & & \leq \sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sum _{i=1}^{k}\left (\sum _{ m=1}^{p}\left [\boldsymbol{ZA}^{-1}\right ]_{ im}^{2} \times \sum _{ m=1}^{p}\left [\boldsymbol{\mu }\right ]_{ m}^{2}\right ) =\sup \limits _{\boldsymbol{\mu } \in \mathcal{L}}\sum _{i=1}^{k}\left (\left [\boldsymbol{ZA}^{-2}\boldsymbol{Z}^{T}\right ]_{ ii}\left \Vert \boldsymbol{\mu }\right \Vert ^{2}\right ) {}\\ & & =\mathrm{ tr}\left (\boldsymbol{ZA}^{-2}\boldsymbol{Z}^{T}\right )\sup \limits _{\boldsymbol{\mu } \in \mathcal{L}}\left \Vert \boldsymbol{\mu }\right \Vert ^{2} =\mathrm{ tr}\left (\boldsymbol{WXA}^{-2}\boldsymbol{X}^{T}\right )\left (Mp^{\kappa }\left \Vert \boldsymbol{Y }\right \Vert \right )^{2} = o\left (p^{2}\right )\left \Vert \boldsymbol{Y }\right \Vert ^{2}, {}\\ \end{array}$$
where the last equality uses condition \(\left (\mathrm{G}\right )\). Thus,
$$\displaystyle{\mathbb{E}\left (\sup \limits _{\boldsymbol{\mu }\in \mathcal{L}}\sum _{i=1}^{k}\left [\boldsymbol{ZA}^{-1}\boldsymbol{\mu }\right ]_{ i}^{2}\right ) = o\left (p^{3}\right ).}$$
Also note that
$$\displaystyle\begin{array}{rcl} \mathbb{E}\left (\sum _{i=1}^{k}\left [\boldsymbol{Z}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\right ]_{ i}^{2}\right )& =& \mathbb{E}\left (\mathrm{tr}\left (\boldsymbol{Z}^{T}\boldsymbol{Z}\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )\left (\boldsymbol{Y }-\boldsymbol{\theta }\right )^{T}\right )\right ) {}\\ & =& \mathrm{tr}\left (\boldsymbol{Z}^{T}\boldsymbol{ZA}\right ) =\mathrm{ tr}\left (\boldsymbol{WXAX}^{T}\right ) = O\left (p\right ) {}\\ \end{array}$$
by condition \(\left (\mathrm{D}\right )\). Recall that \(d_{k} = O\left (p^{-1}\right )\) by condition \(\left (\mathrm{F}\right )\). It follows that
$$\displaystyle{\mathbb{E}\left (\sup \limits _{0\leq \lambda \leq \infty,\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \left (\mathrm{III}\right )_{2}\right \vert \right ) \leq \dfrac{2} {p}O\left (p^{-1}\right )o\left (p^{3/2}\right )O\left (p^{1/2}\right ) = o\left (1\right ),}$$
which completes our proof.
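Two elementary facts used in bounding the expectation above are the identity \(\sum _{i=1}^{k}\left [\boldsymbol{Z}v\right ]_{i}^{2} =\mathrm{ tr}\left (\boldsymbol{Z}^{T}\boldsymbol{Z}\,vv^{T}\right )\) and, for \(v =\boldsymbol{ Y } -\boldsymbol{\theta }\sim \mathcal{N}_{p}\left (\boldsymbol{0},\boldsymbol{A}\right )\), \(\mathbb{E}\left [vv^{T}\right ] =\boldsymbol{ A}\). The deterministic identity is easy to confirm on random matrices (illustrative sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 3, 8
Z = rng.standard_normal((k, p))
v = rng.standard_normal(p)

lhs = float(np.sum((Z @ v) ** 2))                  # sum_i [Zv]_i^2
rhs = float(np.trace(Z.T @ Z @ np.outer(v, v)))    # tr(Z^T Z v v^T)
ok_identity = np.isclose(lhs, rhs)
```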
Proof of Lemma 2
The fact that \(\boldsymbol{\hat{\mu }}^{\mathrm{OLS}} \in \mathcal{L}\) is immediate, since
$$\displaystyle{\boldsymbol{\hat{\mu }}^{\mathrm{OLS}} =\boldsymbol{ X}^{T}\left (\boldsymbol{XX}^{T}\right )^{-1}\boldsymbol{XY } =\boldsymbol{ P}_{\boldsymbol{ X}}\boldsymbol{Y },}$$
while the projection matrix \(\boldsymbol{P}_{\boldsymbol{X}}\) has induced matrix 2-norm \(\left \Vert \boldsymbol{P}_{\boldsymbol{X}}\right \Vert _{2} = 1\). Thus, \(\left \Vert \boldsymbol{\hat{\mu }}^{\mathrm{OLS}}\right \Vert \leq \left \Vert \boldsymbol{P}_{\boldsymbol{X}}\right \Vert _{2}\left \Vert \boldsymbol{Y }\right \Vert = \left \Vert \boldsymbol{Y }\right \Vert\). For \(\boldsymbol{\hat{\mu }}^{\mathrm{WLS}}\), note that
$$\displaystyle\begin{array}{rcl} \boldsymbol{\hat{\mu }}^{\mathrm{WLS}}& =& \boldsymbol{X}^{T}\left (\boldsymbol{XA}^{-1}\boldsymbol{X}^{T}\right )^{-1}\boldsymbol{XA}^{-1}\boldsymbol{Y } {}\\ & =& \boldsymbol{A}^{1/2}\left (\boldsymbol{XA}^{-1/2}\right )^{T}\left (\boldsymbol{XA}^{-1/2}\left (\boldsymbol{XA}^{-1/2}\right )^{T}\right )^{-1}\left (\boldsymbol{XA}^{-1/2}\right )\boldsymbol{A}^{-1/2}\boldsymbol{Y } {}\\ & =& \boldsymbol{A}^{1/2}\left (\boldsymbol{P}_{\boldsymbol{ XA}^{-1/2}}\right )\boldsymbol{A}^{-1/2}\boldsymbol{Y }, {}\\ \end{array}$$
where \(\boldsymbol{P}_{\boldsymbol{XA}^{-1/2}}\) is the ordinary projection matrix onto the row space of \(\boldsymbol{XA}^{-1/2}\) and has induced matrix 2-norm 1. It follows that
$$\displaystyle{\left \Vert \boldsymbol{\hat{\mu }}^{\mathrm{WLS}}\right \Vert \leq \left \Vert \boldsymbol{A}^{1/2}\right \Vert _{ 2}\left \Vert \boldsymbol{P}_{\boldsymbol{XA}^{-1/2}}\right \Vert _{2}\left \Vert \boldsymbol{A}^{-1/2}\right \Vert _{ 2}\left \Vert \boldsymbol{Y }\right \Vert =\max \limits _{1\leq i\leq p}A_{i}^{1/2} \times \max \limits _{ 1\leq i\leq p}A_{i}^{-1/2} \times \left \Vert \boldsymbol{Y }\right \Vert.}$$
Condition \(\left (\mathrm{A}\right )\) gives
$$\displaystyle{\max \limits _{1\leq i\leq p}A_{i}^{1/2} = (\max \limits _{ 1\leq i\leq p}A_{i}^{2})^{1/4} \leq (\sum _{ i=1}^{p}A_{ i}^{2})^{1/4} = O\left (p^{1/4}\right ).}$$
Similarly, condition \(\left (\mathrm{A}^{{\prime}}\right )\) gives
$$\displaystyle{\max \limits _{1\leq i\leq p}A_{i}^{-1/2} = (\max \limits _{ 1\leq i\leq p}A_{i}^{-2-\delta })^{1/\left (4+2\delta \right )} \leq (\sum _{ i=1}^{p}A_{ i}^{-2-\delta })^{1/\left (4+2\delta \right )} = O\left (p^{1/\left (4+2\delta \right )}\right ).}$$
Combining the two bounds, we obtain
$$\displaystyle{\left \Vert \boldsymbol{\hat{\mu }}^{\mathrm{WLS}}\right \Vert \leq O\left (p^{1/4}\right )O\left (p^{1/\left (4+2\delta \right )}\right )\left \Vert \boldsymbol{Y }\right \Vert = O\left (p^{\kappa }\right )\left \Vert \boldsymbol{Y }\right \Vert.}$$
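Both norm bounds in this lemma are easy to confirm numerically. The sketch below checks \(\left \Vert \boldsymbol{\hat{\mu }}^{\mathrm{OLS}}\right \Vert \leq \left \Vert \boldsymbol{Y }\right \Vert\), the WLS bound \(\left \Vert \boldsymbol{\hat{\mu }}^{\mathrm{WLS}}\right \Vert \leq \max _{i}A_{i}^{1/2}\max _{i}A_{i}^{-1/2}\left \Vert \boldsymbol{Y }\right \Vert\), and the elementary step \(\max _{i}x_{i} \leq (\sum _{i}x_{i}^{q})^{1/q}\) behind conditions \(\left (\mathrm{A}\right )\) and \(\left (\mathrm{A}^{{\prime}}\right )\) (random design and variances, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 3, 50
X = rng.standard_normal((k, p))                  # k x p design matrix
Y = rng.standard_normal(p)
A = rng.uniform(0.5, 2.0, size=p)                # heteroscedastic variances A_1, ..., A_p

# OLS: projection onto the row space of X contracts the norm
mu_ols = X.T @ np.linalg.inv(X @ X.T) @ X @ Y
ok_ols = np.linalg.norm(mu_ols) <= np.linalg.norm(Y) + 1e-10

# WLS: X^T (X A^{-1} X^T)^{-1} X A^{-1} Y
Ainv = np.diag(1.0 / A)
mu_wls = X.T @ np.linalg.inv(X @ Ainv @ X.T) @ X @ Ainv @ Y
bound = np.sqrt(A.max()) * np.sqrt((1.0 / A).max()) * np.linalg.norm(Y)
ok_wls = np.linalg.norm(mu_wls) <= bound + 1e-10

# max_i x_i <= (sum_i x_i^q)^{1/q} for positive entries
x, q = rng.uniform(0.1, 3.0, size=p), 2.0
ok_max = x.max() <= np.sum(x ** q) ** (1.0 / q)
```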
Proof of Theorem 2
To prove the first assertion, note that
$$\displaystyle{\mathrm{URE}\left (\boldsymbol{\hat{B} }^{\mathrm{URE}},\boldsymbol{\hat{\mu }}^{\mathrm{URE}}\right ) \leq \mathrm{ URE}\left (\boldsymbol{\tilde{B}}^{\mathrm{OL}},\boldsymbol{\tilde{\mu }}^{\mathrm{OL}}\right )}$$
by the definition of \(\boldsymbol{\hat{B} }^{\mathrm{URE}}\) and \(\boldsymbol{\hat{\mu }}^{\mathrm{URE}}\), so Theorem 1 implies that
$$\displaystyle\begin{array}{rcl} & & l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right ) \\ & & \leq l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right ) -\mathrm{ URE}\left (\boldsymbol{\hat{B} }^{\mathrm{URE}},\boldsymbol{\hat{\mu }}^{\mathrm{URE}}\right ) +\mathrm{ URE}\left (\boldsymbol{\tilde{B}}^{\mathrm{OL}},\boldsymbol{\tilde{\mu }}^{\mathrm{OL}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right ) \\ & & \leq 2\sup \limits _{\boldsymbol{B}\in \mathcal{B},\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\mu }}\right )\right \vert \mathop{ \rightarrow }\limits_{ p \rightarrow \infty }0\text{ in }L^{1}\text{ and in probability,}{}\end{array}$$
(22)
where the second inequality uses the condition that \(\boldsymbol{\hat{\mu }}^{\mathrm{URE}} \in \mathcal{L}\). Thus, for any ε > 0,
$$\displaystyle\begin{array}{rcl} & & \mathbb{P}\left (l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right ) \geq l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right )+\epsilon \right ) {}\\ & & \leq \mathbb{P}\left (2\sup \limits _{\boldsymbol{B}\in \mathcal{B},\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\mu }}\right )\right \vert \geq \epsilon \right ) \rightarrow 0. {}\\ \end{array}$$
To prove the second assertion, note that
$$\displaystyle{l_{p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right ) \leq l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right )}$$
by the definition of \(\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\) and the condition \(\boldsymbol{\hat{\mu }}^{\mathrm{URE}} \in \mathcal{L}\). Thus, taking expectations in Eq. (22) gives the second assertion.
Proof of Corollary 1
Simply note that
$$\displaystyle{l_{p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right ) \leq l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{\hat{B} }_{p},\boldsymbol{\hat{\mu }}_{p} }\right )}$$
by the definition of \(\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\). Thus,
$$\displaystyle{l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{\hat{B} }_{p},\boldsymbol{\hat{\mu }}_{p} }\right ) \leq l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\mathrm{URE}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\tilde{\theta }}^{\mathrm{OL}}\right ).}$$
Theorem 2 then implies the desired result.
Proof of Theorem 3
We observe that
$$\displaystyle\begin{array}{rcl} \mathrm{URE}_{\boldsymbol{M}}\left (\boldsymbol{B}\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\hat{\mu }}^{\boldsymbol{M}} }\right )& =& \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\hat{\mu }}^{\boldsymbol{M}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\hat{\mu }}^{\boldsymbol{M}} }\right ) {}\\ & & +\dfrac{2} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right ). {}\\ \end{array}$$
Since
$$\displaystyle\begin{array}{rcl} \sup \limits _{\boldsymbol{B}\in \mathcal{B}}\left \vert \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\hat{\mu }}^{\boldsymbol{M}}\right ) - l_{ p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\hat{\mu }}^{\boldsymbol{M}} }\right )\right \vert & \leq & \sup \limits _{\boldsymbol{B}\in \mathcal{B},\;\boldsymbol{\mu }\in \mathcal{L}}\left \vert \mathrm{URE}\left (\boldsymbol{B},\boldsymbol{\mu }\right ) - l_{p}\left (\boldsymbol{\theta },\boldsymbol{\hat{\theta }}^{\boldsymbol{B},\boldsymbol{\mu }}\right )\right \vert {}\\ & \rightarrow & 0\text{ in }L^{1} {}\\ \end{array}$$
by Theorem 1, we only need to show that
$$\displaystyle{\sup \limits _{\boldsymbol{B}\in \mathcal{B}}\left \vert \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right )\right \vert \rightarrow 0\ \ \ \ \text{as }p \rightarrow \infty.}$$
Under Model I,
$$\displaystyle\begin{array}{rcl} \mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right )& =& \sum _{i=1}^{p} \frac{A_{i}} {A_{i}+\lambda }[\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{A}]_{ii} {}\\ & \leq & \left (\sum _{i=1}^{p}\left ( \frac{A_{i}} {A_{i}+\lambda }\right )^{2} \times \sum _{ i=1}^{p}[\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}]_{ii}^{2}\right )^{1/2} {}\\ & \leq & \left (p \times \sum _{i=1}^{p}\left [\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right ]_{ii}^{2}\right )^{1/2} {}\\ & \leq & p^{1/2}\sqrt{\mathrm{tr }\left (\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}(\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{A})^{T}\right )},\ \ \ \ \text{for all }\lambda \geq 0, {}\\ \end{array}$$
but
$$\displaystyle\begin{array}{rcl} \mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{AAP}_{\boldsymbol{M},\boldsymbol{X}}^{T}\right )& =& \mathrm{tr}\left (\boldsymbol{X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMA}^{2}\boldsymbol{MX}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{X}\right ) {}\\ & =& \mathrm{tr}\left (\left (\boldsymbol{XMX}^{T}\right )^{-1}(\boldsymbol{XMA}^{2}\boldsymbol{MX}^{T})\left (\boldsymbol{XMX}^{T}\right )^{-1}(\boldsymbol{XX}^{T})\right ) = O(1) {}\\ \end{array}$$
by (13) and condition (E). Therefore,
$$\displaystyle{\sup \limits _{\boldsymbol{B}\in \mathcal{B}}\left \vert \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right )\right \vert = \dfrac{1} {p}O\left (p^{1/2}\right )O(1) = O(\,p^{-1/2}) \rightarrow 0.}$$
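This chain of inequalities can be spot-checked numerically. The sketch below assumes the oblique projection form \(\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}} =\boldsymbol{ X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XM}\) (consistent with the trace computations above) and a symmetric positive definite \(\boldsymbol{M}\); random inputs, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 3, 40
X = rng.standard_normal((k, p))
A = np.diag(rng.uniform(0.5, 2.0, size=p))       # diagonal variance matrix
M = np.diag(rng.uniform(0.5, 2.0, size=p))       # assumed symmetric weight matrix

P = X.T @ np.linalg.inv(X @ M @ X.T) @ X @ M     # P_{M,X} (assumed form)
rhs = np.sqrt(p) * np.sqrt(np.trace(P @ A @ (P @ A).T))

# tr(A (A + lambda I)^{-1} P A) <= p^{1/2} tr^{1/2}(PA (PA)^T) for every lambda >= 0
ok_model1 = all(
    np.trace(A @ np.linalg.inv(A + lam * np.eye(p)) @ P @ A) <= rhs + 1e-8
    for lam in (0.0, 0.1, 1.0, 10.0, 1e4)
)
```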
Under Model II, \(\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1} =\boldsymbol{ I}_{p} -\lambda \boldsymbol{ Z}^{T}\left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\), where
\(\boldsymbol{W}^{-1/2}\boldsymbol{V W}^{-1/2} =\boldsymbol{ U\varLambda }\boldsymbol{U}^{T}\), \(\boldsymbol{\varLambda }=\mathrm{ diag}\left (d_{1},\ldots,d_{k}\right )\) with \(d_{1} \leq \cdots \leq d_{k}\), and \(\boldsymbol{Z} =\boldsymbol{ U}^{T}\boldsymbol{W}^{1/2}\boldsymbol{X}\) as defined in the proof of Theorem 1. Thus,
$$\displaystyle{\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right ) =\mathrm{ tr}\left (\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{A}\right ) -\mathrm{ tr}\left (\lambda \boldsymbol{Z}^{T}\left (\lambda \boldsymbol{I}_{ k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right ).}$$
We know that \(\mathrm{tr}\left (\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{A}\right ) =\mathrm{ tr}\left (\left (\boldsymbol{XMX}^{T}\right )^{-1}(\boldsymbol{XMAX}^{T})\right ) = O(1)\) by the assumption (13). For the second term,
$$\displaystyle\begin{array}{rcl} \mathrm{tr}\left (\lambda \boldsymbol{Z}^{T}\left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{A}\right )& =& \mathrm{tr}\left (\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\boldsymbol{P}_{\boldsymbol{M},\boldsymbol{X}}\boldsymbol{AZ}^{T}\right ) {}\\ & =& \mathrm{tr}\left (\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\boldsymbol{Z}\boldsymbol{A}^{-1}\boldsymbol{X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMAZ}^{T}\right ). {}\\ \end{array}$$
The Cauchy-Schwarz inequality for the matrix trace gives
$$\displaystyle\begin{array}{rcl} & & \left \vert \mathrm{tr}\left (\left (\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda }\right )\left (\boldsymbol{ZA}^{-1}\boldsymbol{X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMAZ}^{T}\right )\right )\right \vert {}\\ & & \leq \mathrm{ tr}^{1/2}\left ((\lambda \left (\lambda \boldsymbol{I}_{ k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda })^{2}\right ) {}\\ & & \quad \times \mathrm{ tr}^{1/2}\left (\boldsymbol{ZA}^{-1}\boldsymbol{X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMAZ}^{T}\boldsymbol{ZAMX}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XA}^{-1}\boldsymbol{Z}^{T}\right ). {}\\ \end{array}$$
Since
$$\displaystyle\begin{array}{rcl} \mathrm{tr}\left ((\lambda \left (\lambda \boldsymbol{I}_{k}+\boldsymbol{\varLambda }\right )^{-1}\boldsymbol{\varLambda })^{2}\right ) =\sum _{ i=1}^{k}\left ( \dfrac{\lambda d_{i}} {\lambda +d_{i}}\right )^{2} \leq kd_{ k}^{2} = O\left (p^{-2}\right )\ \ \ \ \text{for all }\lambda \geq 0& & {}\\ \end{array}$$
as shown in the proof of Theorem 1 and
$$\displaystyle\begin{array}{rcl} & & \mathrm{tr}\left (\boldsymbol{ZA}^{-1}\boldsymbol{X}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMAZ}^{T}\boldsymbol{ZAMX}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XA}^{-1}\boldsymbol{Z}^{T}\right ) {}\\ & & =\mathrm{ tr}\left (\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XMAZ}^{T}\boldsymbol{ZAMX}^{T}\left (\boldsymbol{XMX}^{T}\right )^{-1}\boldsymbol{XA}^{-1}\boldsymbol{Z}^{T}\boldsymbol{ZA}^{-1}\boldsymbol{X}^{T}\right ) {}\\ & & =\mathrm{ tr}\left (\left (\boldsymbol{XMX}^{T}\right )^{-1}(\boldsymbol{XMAX}^{T})\boldsymbol{W}(\boldsymbol{XAMX}^{T})\left (\boldsymbol{XMX}^{T}\right )^{-1}(\boldsymbol{XA}^{-1}\boldsymbol{X}^{T})\boldsymbol{W}(\boldsymbol{XA}^{-1}\boldsymbol{X}^{T})\right ) {}\\ & & = O(\,p^{2}) {}\\ \end{array}$$
from (13) and condition (F), we have
$$\displaystyle{\sup \limits _{\boldsymbol{B}\in \mathcal{B}}\left \vert \dfrac{1} {p}\mathrm{tr}\left (\boldsymbol{A}\left (\boldsymbol{A} +\boldsymbol{ B}\right )^{-1}\boldsymbol{P}_{\boldsymbol{ M},\boldsymbol{X}}\boldsymbol{A}\right )\right \vert = \dfrac{1} {p}\left (O(1) + \sqrt{O\left (p^{-2 } \right ) \times O(\,p^{2 } )}\right ) = O(\,p^{-1}) \rightarrow 0.}$$
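The matrix-trace Cauchy-Schwarz inequality invoked above, \(\left \vert \mathrm{tr}\left (\boldsymbol{CD}\right )\right \vert \leq \mathrm{tr}^{1/2}\left (\boldsymbol{CC}^{T}\right )\mathrm{tr}^{1/2}\left (\boldsymbol{DD}^{T}\right )\), is the Cauchy-Schwarz inequality for the Frobenius inner product. A quick numerical confirmation (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
C = rng.standard_normal((n, n))
D = rng.standard_normal((n, n))

lhs = abs(np.trace(C @ D))                       # |tr(CD)| = |<C, D^T>_F|
rhs = np.sqrt(np.trace(C @ C.T)) * np.sqrt(np.trace(D @ D.T))
ok_trace_cs = lhs <= rhs + 1e-10
```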
This completes our proof of (14). With this established, the rest of the proof is identical to that of Theorem 2 and Corollary 1.