Abstract
This paper concerns error density estimation in high-dimensional sparse linear models, where the number of variables may exceed the sample size. An improved two-stage refitted cross-validation procedure based on random data splitting is used to obtain the residuals of the model, and the traditional kernel density method is then applied to estimate the error density. Under suitable sparsity conditions, we establish the large-sample properties of the estimator, including consistency, asymptotic normality, and the law of the iterated logarithm. In particular, we give the relationship between the sparsity and the convergence rate of the kernel density estimator. Simulation results show that the proposed error density estimator performs well, and a real data example illustrates the methods.
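The two-stage refitted cross-validation idea can be sketched in a few lines of plain Python. This is a minimal illustration under assumed toy settings: one active covariate, marginal-correlation screening standing in for the lasso/SIS screening used in practice, and a single-variable refit for brevity; all names and sample sizes here are illustrative, not the paper's.

```python
import math
import random

random.seed(1)

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(points, x, h):
    # Classical kernel density estimator evaluated at x with bandwidth h.
    return sum(gauss_kernel((x - e) / h) for e in points) / (len(points) * h)

# Toy sparse model: p covariates, only the first one is active.
n, p = 200, 50
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * xi[0] + random.gauss(0, 1) for xi in X]

# Stage 1: random split of the sample into two halves.
idx = list(range(n))
random.shuffle(idx)
I1, I2 = idx[:n // 2], idx[n // 2:]

def screen_top1(I):
    # Marginal-correlation screening (a stand-in for lasso/SIS selection).
    return max(range(p), key=lambda j: abs(sum(X[i][j] * y[i] for i in I)))

def refit_residuals(I, j):
    # Stage 2: least-squares refit on the *other* half using the variable
    # selected on the first half, then collect residuals.
    b = sum(X[i][j] * y[i] for i in I) / sum(X[i][j] ** 2 for i in I)
    return [y[i] - b * X[i][j] for i in I]

res = refit_residuals(I2, screen_top1(I1)) + refit_residuals(I1, screen_top1(I2))

# Kernel density estimate of the error density from the RCV residuals.
sigma = math.sqrt(sum(r * r for r in res) / len(res))
h = 1.06 * sigma * len(res) ** (-0.2)   # Silverman's rule of thumb
f0 = kde(res, 0.0, h)                   # true N(0,1) density at 0 is about 0.399
print(round(f0, 3))
```

Because the variable is selected on one half and refit on the other, the residuals are not contaminated by selection bias, and the kernel estimate at 0 lands near the true density value.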
References
Bai, Z., Yin, Y. (1993). Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix. Annals of Probability, 21(3), 1275–1294.
Candes, E., Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35, 2313–2351.
Chai, G., Li, Z. (1993). Asymptotic theory for estimation of error distribution in linear model. Science in China: Series A, 36, 408–419.
Cheng, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. Journal of Statistical Planning and Inference, 128, 327–349.
Chiang, A. P., Beck, J. S., Yen, H. J., Tayeh, M. K., Scheetz, T. E., Swiderski, R., Nishimura, D., Braun, T. A., Kim, K. Y., Huang, J., Elbedour, K., Carmi, R., Slusarski, D. C., Casavant, T. L., Stone, E. M., Sheffield, V. C. (2006). Homozygosity mapping with SNP arrays identifies a novel gene for Bardet–Biedl syndrome (BBS10). Proceedings of the National Academy of Sciences of the United States of America, 103, 6287–6292.
Cui, H., Li, R., Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849–911.
Fan, J., Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.
Fan, J., Guo, S., Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B, 74, 37–65.
Hall, P. (1981). Laws of the iterated logarithm for nonparametric density estimators. Probability Theory and Related Fields, 56, 47–61.
Huang, J., Ma, S., Zhang, C. H. (2008). Adaptive lasso for sparse high dimensional regression. Statistica Sinica, 18, 1603–1618.
Liang, H., Hardle, W. (1999). Large sample theory of the estimation of the error distribution for a semiparametric model. Communication in Statistics Theory and Methods, 28, 2025–2036.
Li, R., Zhong, W., Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.
Marčenko, V. A., Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 507–536.
Meinshausen, N., Meier, L., Bühlmann, P. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671–1681.
Pollard, D. (1984). Convergence of stochastic processes. New York: Springer.
Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303–325.
Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., Dibona, G. F., Huang, J., Casavant, T. L. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences of the United States of America, 103, 14429–14434.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.
Yang, Y. (1997). Large sample properties of estimation of the error distribution in nonparametric regression. Acta Scientiarum Naturalium Universitatis Pekinensis, 33, 298–304.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894–942.
Zhong, W. (2014). Robust sure independence screening for ultrahigh dimensional non-normal data. Acta Mathematica Sinica, English Series, 30, 1885–1896.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Acknowledgements
This project was supported partly by the National Natural Science Foundation of China (Grant Nos. 11071022, 11231010, 11471223) and “Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds”(No. 025185305000/204). The authors thank the Editor, the AE and reviewers for their constructive comments, which have led to an improvement of the earlier version of this paper.
Electronic supplementary material
Appendix: Proofs of main results
Lemma 1
(Theorem 37 in Chapter 2, Pollard 1984) For each n, let \(\mathscr {F}_n\) be a permissible class of functions (Definition 1, in Appendix C, Pollard 1984) whose covering numbers (Definition 23, in Chapter 2, Pollard 1984) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {F}_n)\le A\epsilon ^{-W},0<\epsilon <1\), with constants A and W not depending on n. Let \(\{\alpha _n\}\) be a non-increasing sequence of positive numbers for which \(n\delta _n^{2}\alpha _n^{2}\gg \log n\). If \(|f|\le 1\) and \(\sqrt{P_0f^{2}}\le \delta _n\) for all \(f\in \mathscr {F}_n\), then
$$\begin{aligned} \sup _{f\in \mathscr {F}_n}|P_nf-P_0f| = o\left( \delta _n^{2}\alpha _n\right) \,\,\, a.s., \end{aligned}$$
where \(P_0f=\int f\mathrm{d}P_0\) and \(P_nf=\int f\mathrm{d}P_n=\frac{1}{n}\sum _{i=1}^{n}f(X_i)\).
For simplicity, we only prove the large sample properties of \(\hat{f}_{n_1}(x)\) with respect to f(x). For \(j=1,2\), we write n, \(\hat{s}\) and \(\hat{M}\) for \(n_j\), \(\hat{s}_j\) and \(\hat{M}_j\), respectively, to keep the proofs concise, where \(\hat{s}_j\) and \(\hat{M}_j\) are defined in condition \({\mathbf{C_0}}\). By the screening consistency, we have \(\hat{s}=O(n^\gamma )\), \(0\le \gamma <1\).
Lemma 2
Suppose the assumptions \(\mathbf{C}_2\) and \(\mathbf{C}_3\) hold, then we have \(\max _{1\le i\le n}|\hat{e}_i-e_i|=o(c_n)\,\, a.s.\), where \(c_n=n^{-\frac{1}{2}(1-\frac{1}{k}-\gamma )}\, \log n \) with \(0\le \gamma <1-\frac{1}{k}\).
Proof
Denote \(X_{\hat{M}} =(X_{1\hat{M}},\ldots , X_{n\hat{M}})^\mathrm{T}\), where \(X_{i\hat{M}}=(X_{ij_1},\ldots , X_{ij_{\hat{s}}})^\mathrm{T}\) with \(\hat{M}=\{j_1,\ldots , j_{\hat{s}}\}\), \( (1\le j_1<\cdots < j_{\hat{s}}\le p)\). By the definition of P, we have \(P_{ij}=X_{i\hat{M}}^\mathrm{T}(X_{\hat{M}}^\mathrm{T}X_{\hat{M}})^{-1}X_{j\hat{M}}\) and \(\hbox {Var}(\sum _{j=1}^{n}P_{ij}e_j|X_{\hat{M}})=\sigma ^2P_{ii}\). To prove Lemma 2, the first step is to compute the order of \(\max _{1\le i\le n}P_{ii}\) (for a more detailed proof, see the supplementary material). By the definition of \(P_{ii}\) and condition \(\mathbf{C}_2\), we have
so it suffices to compute the order of \(\max _{1\le i\le n}\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\). For simplicity, set \(Z_\mathrm{in}=\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\), \(i=1,\ldots ,n\), noting that \(\hat{s}=O(n^\gamma )\) with \(0\le \gamma <1\). Since \(Z_\mathrm{in}\), \(i=1,\ldots ,n\), is a sequence of i.i.d. random variables, then
where \(A_0>2\). \(\forall x>0\), denote \(h(x)=x^{2k}(\log ^{+}(x))^2\), then we have
For fixed k and the condition \(\sup _{1\le j\le p}E[X_{1j}^{4k}(\log ^{+}(X_{1j}^2))^2]\le C<\infty \), then
and thus \(\max _{1\le i\le n}Z_\mathrm{in}=O(n^{1/k})\,a.s.\) By the Cauchy–Schwarz inequality, we have \(\max _{i,j}|P_{ij}|\le \max _{1\le i,j\le n}\sqrt{P_{ii}P_{jj}} \le \max _{1\le i\le n}P_{ii}\). In addition, by condition \(\mathbf{C}_3\), we have \(\max _{1\le i\le n}|e_i|\le c_0\log n \) for some constant \(c_0>0\). Denote \(e_{1i} = e_iI\{|e_i|\le c_0\log n \}\) and \(e_{2i} = e_iI\{|e_i|> c_0\log n \}\); then \( E(e_{2j})=o(n^{-3})\) and
Since for any \(\epsilon >0\), \( \{\max _{1\le i\le n} \sum _{j=1}^n P_{ij}e_{2j}>\epsilon c_n \} \subset \{ \max _{1\le i\le n} |e_i|>c_0\log n \} \), it follows that \( \max _{1\le i \le n} \sum _{j=1}^n P_{ij}e_{2j}= o(c_n)\ \ a.s.\) Therefore,
Furthermore, for some constant \(c_1>0\), by Bernstein’s inequality, we have
Let \(t=\epsilon t_n \) with \(\epsilon >0\), \( t_n=\sqrt{\hat{s}}n^{-\alpha }\log n\), \( \alpha =\frac{3}{2}-\frac{1}{2k}\), then
for n large enough. It then follows from the Borel–Cantelli lemma that
Then \(\max _{1\le i \le n}|\hat{e}_i-e_i|=o(c_n)\; \ a.s.\) follows from (5) and (6). \(\square \)
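On the event that the screened set contains the true model, \(\hat{e}_i-e_i=-\sum _{j=1}^{n}P_{ij}e_j\), so Lemma 2 amounts to uniform smallness of the projected errors. The following pure-Python simulation (illustrative sizes only; a Gram–Schmidt basis of the selected columns is used, so the \(n\times n\) matrix P is never formed) shows the maximum shrinking as n grows with \(\hat{s}\) held fixed:

```python
import math
import random

random.seed(0)

def max_hat_error(n, s_hat):
    """max_i |(P e)_i| for the projection P onto s_hat random Gaussian columns."""
    cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(s_hat)]
    e = [random.gauss(0, 1) for _ in range(n)]
    # Gram-Schmidt: orthonormal basis q_1..q_s of the column span, so that
    # (P e)_i = sum_k <q_k, e> q_k[i] without forming the n x n matrix P.
    qs = []
    for v in cols:
        w = v[:]
        for q in qs:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        nrm = math.sqrt(sum(wi * wi for wi in w))
        qs.append([wi / nrm for wi in w])
    Pe = [0.0] * n
    for q in qs:
        c = sum(qi * ei for qi, ei in zip(q, e))
        for i in range(n):
            Pe[i] += c * q[i]
    return max(abs(v) for v in Pe)

def avg(f, reps=20):
    # Average over replications so the comparison is stable.
    return sum(f() for _ in range(reps)) / reps

small_n = avg(lambda: max_hat_error(100, 5))
large_n = avg(lambda: max_hat_error(1600, 5))
print(round(small_n, 3), round(large_n, 3))
```

Each \((Pe)_i\) has conditional variance \(\sigma ^2P_{ii}\approx \hat{s}/n\), so the maximum behaves roughly like \(\sqrt{\hat{s}\log n/n}\), consistent with the rate \(c_n\) in the lemma.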
Lemma 3
Suppose that assumptions \(\mathbf{C}_0\)–\(\mathbf{C}_3\) hold and \(\gamma <1-1/k\); then we have
- (i).
If \(f(\cdot )\) is continuous at u, then \( \hat{f}_n(u)-f_n(u) =o(1) \, \, a.s. \)
- (ii).
If \(f(\cdot )\) is uniformly continuous, then
$$\begin{aligned} \sup _u |\hat{f}_n(u)-f_n(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{ \frac{c_n}{h_n} }\right) \right) +2\sup _u[I_{n+}(u) + I_{n-}(u)] \,\, a.s., \end{aligned}$$where \(I_{n+}(u)=\int K(y)|f(u+Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) and \(I_{n-}(u)=\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) for some constant \(C>0\).
Proof
Since \(K(\cdot )\) is a function of bounded variation, K can be written as \(K=K_1-K_2\), where \(K_1\) and \(K_2\) are two monotonically increasing functions. By the definitions of \(\hat{f}_n(u)\) and \(f_n(u)\), we have
On the set \( \{\hat{e}_i|\,\max |\hat{e}_i-e_i|\le Cc_n\}\) with constant \(C>0\), it can be derived that \(I_{2n}(u)\le \varDelta _{1n}(u)\le I_{1n}(u)\) due to the fact that \(K_1\) is a monotonically increasing function, where
Let \(\mathscr {G}_n =:\{\,g_{nu}=K_1(\frac{(e+Cc_n)-u}{h_n})-K_1(\frac{e-u}{h_n}): u\in R\}\) (for more details about \(\mathscr {G}_n\), see the supplementary material); then \(\mathscr {G}_n\) is a permissible class of functions with polynomial discrimination and
Since K has compact support, we may suppose without loss of generality that \(K_j\) has compact support \([-M, M]\) with \(M>0\) and that \(K_j'\) is bounded except at one jump point. For fixed u, we have
and
- (i).
If \(f(\cdot )\) is continuous at u and \(h_n=o(1)\), then \(f(u+h_ny) \le f(u)+1\) for \(|y|\le M + Cc_n/h_n\), and \(P_0g_{n,u}^{2}\le 4(f(u)+1)MC_1^2h_n = O(h_n)\) for \(c_n/h_n \ge 1\). If \(c_n/h_n < 1\), let \(y_1\) be a jump point of \(K_1\) in \([-M, M]\), \(B=:[y_1-Cc_n/h_n, y_1+Cc_n/h_n]\) and \( B^c=: [-M-Cc_n/h_n, M+Cc_n/h_n]-B\), then
$$\begin{aligned} P_0g_{n,u}^{2}\le & {} h_n \left[ \left( \int _B + \int _{B^c}\right) \left[ K_1\left( y+C\frac{c_n}{h_n}\right) -K_1(y)\right] ^{2}f(u+h_ny)\hbox {d}y\right] \\\le & {} h_n\left[ 2C_1^2(f(u)+1)\int _B \hbox {d}y+ C_2(f(u)+1)(c_n/h_n)^2\int _{B^c}\hbox {d}y \right] \\= & {} (f(u)+1)O(c_n), \end{aligned}$$where \(C_1=\sup K\), \(C_2>0\) is some constant. Thus, \(P_0g_{n,u}^{2}\le (f(u)+1)O(c_n \wedge h_n)\) and \(I_{n-}(u) =o(1)\). By using Bernstein’s inequality, we obtain \( |P_ng_{nu}-P_0g_{nu}| = o(h_n)\, a.s.\) and \( |I_{1n}(u)|\le |P_ng_{nu}-P_0g_{nu}|/h_n + I_{n-}(u) = o(1) \ \ a.s. \) Similarly, we also have \( I_{2n}(u) =o(1)\,\, a.s.\). It means that \(\varDelta _{1n}(u) =o(1)\ a.s.\). By the same derivation way, \(\varDelta _{2n}(u) = o(1)\ a.s.\) still holds. Therefore, we have \(\hat{f}_n(u)-f_n(u) = o(1)\,\,\, a.s.\) by (7). \(\square \)
- (ii).
If \(f(\cdot )\) is uniformly continuous, then f(u) is bounded and \( \sup _{u} P_0g_{n,u}^{2}\le 4 \sup _u (f(u)+1)MC_1^2(c_n \wedge h_n) = O(c_n \wedge h_n)\). By Lemma 36 (ii) in Chapter 2, Pollard (1984), the covering numbers of \(\mathscr {G}_n\) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {G}_n)\le A\epsilon ^{-W}\), \(0<\epsilon <1\), where the constants A and W do not depend on n, and \(\sup _{u}|g_{nu}|\le C_1\). Denote \(\alpha _n=\frac{\log n}{\sqrt{n}\delta _n}, \delta _n^{2}=O(c_n\wedge h_n)\); then the conditions of Lemma 1 hold. Furthermore, we have
$$\begin{aligned} \sup _{u}|P_ng_{nu}-P_0g_{nu}|=o\left( \delta _n^{2}\alpha _n\right) \,\, \, a.s. \end{aligned}$$Therefore, \(\sup _u|I_{1n}(u)|\le o(\frac{\delta _n^{2}\alpha _n}{h_n})+\sup _u I_{n-}(u)=o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}})) + \sup _u I_{n-}(u)\) a.s. Similarly, we have \( \sup _u |I_{2n}(u)| \le o(\frac{\log n}{\sqrt{nh_n}}) + \sup _u I_{n+}(u)\, \, a.s. \) and
$$\begin{aligned} \sup _u|\varDelta _{1n}(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + \sup _u[I_{n+}(u)+ I_{n-}(u)] \, \, a.s. \end{aligned}$$(8)Similar to the proof of Eq. (8), we also have \(\sup _u|\varDelta _{2n}(u)|\le o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}}))+\sup _u [I_{n+}(u)+I_{n-}(u)] \, a.s.\). We know that \(\sup _u |\hat{f}_n(u)-f_n(u)|\) can be dominated by \( \sup _u|\varDelta _{1n}(u)|+ \sup _u |\varDelta _{2n}(u)|\) almost surely by (7). Thus, we have
$$\begin{aligned} \sup _u|\hat{f}_n(u)-f_n(u)|\le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + 2\sup _u [I_{n+}(u)+I_{n-}(u)] \;\; a.s. \end{aligned}$$This completes the proof of Lemma 3. \(\square \)
Proof of Theorem 1
Note that
By condition \(\mathbf{C}_1\) and Lemma 3 (i), we have \(\hat{f}_n(u)-f_n(u) = o(1) \; a.s.\) due to the continuity of f at u. For \(f_n(u)-f(u) = o(1) \; a.s.\), please refer to pages 35–36 of Pollard (1984). Therefore, \(|\hat{f}_n(u)-f(u)|=o(1)\) holds a.s. \(\square \)
Proof of Theorem 2
According to triangle inequality,
From Lemma 3 (ii),
Since f(u) is uniformly continuous in u and conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, we have \(\sup _uI_{n-}(u)=\sup _u\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\hbox {d}y = o(1) \) and \(\sup _uI_{n+}(u)= o(1) \). By Lemma 3 (ii), we immediately have \(\sup _u|\hat{f}_n(u)-f_n(u)|=o(1)\,\, a.s.\) Moreover, \(\sup _u|f_n(u)-f(u)|=o(1) \,\, a.s.\) holds according to condition \(\mathbf{C}_1\). Therefore, we obtain \(\sup _u|\hat{f}_n(u)-f(u)|=o(1)\, \, a.s.\)\(\square \)
Proof of Theorem 3
According to Theorem 2 and Lemma 3 (ii),
According to the Lipschitz condition on f, we have \(\sup _u|Ef_n(u)-f(u)|=O(h_n)\) and \(\sup _u|f_n(u)-Ef_n(u)| = o(\frac{\log n}{\sqrt{nh_n}}) \, \, a.s. \) Therefore, by the triangle inequality,
Since f(u) satisfies the first-order Lipschitz condition,
Similarly, we have \(I_{n+}(u)=O(c_n)\). Thus, it can be derived that
\(\square \)
Lemma 4
Assume that conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, \(f(\cdot )\) satisfies the first-order Lipschitz condition and \(\lim \nolimits _{n\rightarrow \infty }nh_n^3 =0\); then for fixed \( u \in R \) with \( f(u)>0 \), we have
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}\,[f_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1). \end{aligned}$$
\(\square \)
Proof of Theorem 4
-
(i).
Since \(f(\cdot )\) satisfies the first-order Lipschitz condition, we have \(\sup _u I_{n+}(u)+ \sup _u I_{n-}(u) = O(c_n)\). By Lemma 3 and \(c_n= o(h_n/\log ^2 n) \), we know
$$\begin{aligned} \sqrt{nh_n}\,|\hat{f}_n(u)-f_n(u)| \,\,\,=\, \sqrt{nh_n}\,\left[ o\left( \sqrt{\frac{c_n}{h_n}}\frac{\log n}{\sqrt{nh_n}}\wedge \frac{\log n}{\sqrt{nh_n}}\right) +O(c_n)\right] \,=o(1) \ a.s. \end{aligned}$$and
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}\,[\hat{f}_n(u)-f(u)]= & {} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f_n(u)] + \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] \\= & {} \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] + o(1) \,\, a.s. \end{aligned}$$Due to condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\) and Lemma 4, then we have
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1). \end{aligned}$$\(\square \)
-
(ii).
By \(c_n= o(h_n/\log ^2 n) \), we have \(\sqrt{nh_n}\,[\hat{f}_n(u)-f_n(u)]\,= o(1) \;\ a.s. \) and \( Ef_n(u)-f(u)=O(h_n). \) In addition, by employing the law of the iterated logarithm (Theorem 2, Hall 1981), we have
$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)]\,=\sqrt{2}\,\, a.s. \end{aligned}$$(9)Furthermore, by condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\), it can be derived that
$$\begin{aligned} \sqrt{\frac{nh_n}{v\log \log n}}\,[\hat{f}_n(u)-f(u)]\,= \sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)] + o(1) \,\,a.s.\qquad \end{aligned}$$(10)Finally, we have
$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[{{\hat{f}}_n(u)}-f(u)]\,=\sqrt{2}\; a.s. \end{aligned}$$
\(\square \)
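Theorem 4 (i) reduces the asymptotic law of \(\hat{f}_n(u)\) to that of the infeasible estimator \(f_n(u)\) built from the true errors. A rough Monte Carlo check (a sketch with illustrative sample sizes; it takes \(v = f(u)\int K^2(y)\hbox {d}y\) for a Gaussian kernel, the standard kernel-density asymptotic variance) compares the empirical spread of \(\sqrt{nh_n}\,f_n(u)\) with \(\sqrt{v}\):

```python
import math
import random

random.seed(2)

def kde_at(points, x, h):
    # Gaussian-kernel density estimate at a single point x.
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - e) / h) ** 2) for e in points) / (len(points) * h)

n, h, u, reps = 2000, 0.1, 0.0, 300
vals = []
for _ in range(reps):
    e = [random.gauss(0, 1) for _ in range(n)]      # true N(0,1) errors
    vals.append(math.sqrt(n * h) * kde_at(e, u, h))

mean = sum(vals) / reps
sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (reps - 1))

# sqrt(v) with v = f(0) * int K^2: for N(0,1) errors and a Gaussian kernel,
# f(0) = 1/sqrt(2*pi) and int K^2 = 1/(2*sqrt(pi)).
sqrt_v = math.sqrt((1.0 / math.sqrt(2.0 * math.pi)) * (1.0 / (2.0 * math.sqrt(math.pi))))
print(round(sd, 3), round(sqrt_v, 3))
```

The empirical standard deviation sits close to \(\sqrt{v}\), up to the finite-sample correction of order \(h_n\) that vanishes as \(h_n\rightarrow 0\).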
Proof of Theorem 5
-
(i)
$$\begin{aligned} |T(f_n)-T(f)|= & {} \left| \int H(u)f_n(u)\hbox {d}u-EH(e_1)\right| \\\le & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int [H(e_i+h_ny)-H(e_i)] K(y)\hbox {d}y\right| + \left| \frac{1}{n}\sum _{i=1}^n H(e_i)-EH(e_1)\right| \\= & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int H'(e_i+\theta _i h_ny)h_ny K(y)\hbox {d}y\right| +o(1)\\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)| h_n\int |y|K(y)\hbox {d}y + o(1) = o(1) \ \ a.s. \end{aligned}$$
Next, we explain why the last "\(\le \)" holds. For given i, \(H(e_i+h_ny)-H(e_i)=H'(e_i+\theta _i h_ny)h_ny\) by Taylor's expansion of \(H(\cdot )\) at \(e_i\) with \(|\theta _i|\le 1\). Since \(h_n\rightarrow 0\), there exists a constant \(\delta >0\) such that \(|\theta _ih_ny|\le Mh_n\le \delta \), and hence \(|H(e_i+h_ny)-H(e_i)|\le \sup _{|z|\le \delta }|H'(e_i+z)|h_n|y|\) for n large enough. To prove that \(T({\hat{f}}_n)\) is a consistent estimator of T(f), it suffices to show \(T({\hat{f}}_n)-T(f_n)= o(1)\ \ a.s.\) By Taylor's expansion,
$$\begin{aligned} T({\hat{f}}_n)-T(f_n)= & {} \frac{1}{n}\sum _{i=1}^{n}\int [H(\hat{e}_i +h_ny)- H(e_i+h_ny)]K(y)\hbox {d}y \\= & {} \frac{1}{n}\sum _{i=1}^{n} \int \left[ H'(e_i+h_ny +\theta _i(\hat{e}_i-e_i))K(y)\hbox {d}y (\hat{e}_i-e_i)\right] \\&I\{\max _i|\hat{e}_i -e_i|\le c_n\}+\, o(1) \\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)|c_n +o(1) =o(1)\ \ a.s. \end{aligned}$$\(\square \)
-
(ii).
By Taylor’s expansion, we have
$$\begin{aligned}&\left| \sqrt{n}[ T(f_n)-T(f)]-\frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1))\right| \\&\quad =\left| \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\int \left[ H'(e_i)h_ny + \frac{1}{2} H''(e_i + \theta _i h_n y)h_n^2y^2\right] K(y)\hbox {d}y \right| \\&\quad =\left| \frac{h_n^2}{2\sqrt{n}}\sum _{i=1}^{n}\int H''(e_i + \theta _i h_n y)y^2 K(y)\hbox {d}y\right| \\&\quad \le \frac{\sqrt{n} h_n^2}{2n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H''(e_i +z)|\int y^2 K(y)\hbox {d}y = o(1)\ \ a.s., \end{aligned}$$and
$$\begin{aligned}&\sqrt{n}\left| T(\hat{f}_n)-T(f_n)\right| \le \sqrt{n}|T(\hat{f}_n)-T(f_n)|I\{ \max _i |\hat{e}_i -e_i|\le c_n \} + o(1) \nonumber \\&\quad = \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n [H(\hat{e}_i)-H(e_i)]\right| I\{ \max _i |\hat{e}_i -e_i|\le c_n \} \nonumber \\&\qquad + \left| \frac{1}{\sqrt{n}}\sum _{i=1}^n [ H''(\hat{e}_i +\theta _{1i} h_ny) - H''(e_i+\theta _{2i}h_ny) ]h_n^2\int y^2K(y)\hbox {d}y\right| \nonumber \\&\qquad I\{ \max _i |\hat{e}_i -e_i|\le c_n \} +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| + \frac{1}{2\sqrt{n}} \sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|c_n^2 \nonumber \\&\qquad + \frac{2h_n^2}{\sqrt{n}}\sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|\int y^2K(y)\hbox {d}y +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| +\left| \frac{E(H'(e_1))}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_j\right| +o(1) \nonumber \\&\quad =\left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| + O_p(\sqrt{\hat{s}/n}) +o(1). \end{aligned}$$(11)Let \(e_i^*=H'(e_i)- E(H'(e_i))\), then
$$\begin{aligned} E\left\{ \left[ \frac{1}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_i^*e_j\right] ^2|X_{\hat{M}}\right\}= & {} \frac{1}{n} \sum _{i=1}^nP_{ii}^2E\left( e_1^{*2}e_1^2\right) \\&+\frac{1}{n} \sum _{i\ne j}P_{ii}P_{jj}E\left( e_1^*e_1\right) E\left( e_2^*e_2\right) \\&+ \frac{2}{n} \sum _{i\ne j}P_{ij}^2E\left[ e_1^{*2}e_2^2\right] \\= & {} O(\hat{s}/n)+ O(\hat{s}^2/n) =O(\hat{s}^2/n) \ \ a.s. \end{aligned}$$It can be derived from (11) and condition \(\gamma +\,1/k <1/2\) that \( \sqrt{n}[T(\hat{f}_n)-T(f_n)]= o_p(1).\) Therefore,
$$\begin{aligned} \sqrt{n}[ T(\hat{f}_n)-T(f)]= & {} \sqrt{n}[ T(\hat{f}_n)-T(f_n)] + \sqrt{n}[ T(f_n)-T(f)] \\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1)) +o_p(1) {\mathop {\longrightarrow }\limits ^{d}}N(0, \hbox {Var}(H(e_1))). \end{aligned}$$This completes the proof of Theorem 5. \(\square \)
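The expansion behind Theorem 5 rests on \(\int H(e_i+h_ny)K(y)\hbox {d}y\) differing from \(H(e_i)\) only at order \(h_n^2\). For the illustrative choice \(H(x)=x^2\) (not the general H of the theorem) and a mean-zero, unit-variance kernel this is exact: \(T(f_n)=\frac{1}{n}\sum _i e_i^2+h_n^2\), so \(\sqrt{n}[T(f_n)-\frac{1}{n}\sum _i H(e_i)]=\sqrt{n}h_n^2\) is negligible when \(nh_n^4\rightarrow 0\). A quadrature check in plain Python confirms the identity:

```python
import math
import random

random.seed(3)

n, h = 200, 0.5
e = [random.gauss(0, 1) for _ in range(n)]

def f_n(x):
    # Gaussian-kernel density estimate built from the errors e.
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - ei) / h) ** 2) for ei in e) / (n * h)

# T(f_n) = int H(x) f_n(x) dx with H(x) = x^2, by trapezoidal quadrature
# over a range wide enough that the truncated tails are negligible.
a, b, m = -10.0, 10.0, 2000
step = (b - a) / m
T = 0.0
for k in range(m + 1):
    x = a + k * step
    w = 0.5 if k in (0, m) else 1.0
    T += w * x * x * f_n(x) * step

m2 = sum(ei * ei for ei in e) / n
# For H(x)=x^2 and a mean-zero, unit-variance kernel:
# int (e_i + h y)^2 K(y) dy = e_i^2 + h^2, hence T(f_n) = mean(e_i^2) + h^2.
print(round(T, 4), round(m2 + h * h, 4))
```

The quadrature value and the closed form agree to high accuracy, illustrating why the smoothing contributes only an \(O(h_n^2)\) term to the plug-in functional.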
Zou, F., Cui, H. Error density estimation in high-dimensional sparse linear model. Ann Inst Stat Math 72, 427–449 (2020). https://doi.org/10.1007/s10463-018-0699-0