
Error density estimation in high-dimensional sparse linear model

Annals of the Institute of Statistical Mathematics

Abstract

This paper is concerned with error density estimation in the high-dimensional sparse linear model, where the number of variables may be larger than the sample size. An improved two-stage refitted cross-validation procedure based on a random splitting technique is used to obtain the residuals of the model, and the traditional kernel density method is then applied to estimate the error density. Under suitable sparsity conditions, the large sample properties of the estimator, including consistency and asymptotic normality as well as the law of the iterated logarithm, are obtained. In particular, we give the relationship between the sparsity and the convergence rate of the kernel density estimator. The simulation results show that our error density estimator performs well. A real data example is presented to illustrate our methods.
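To make the estimation pipeline concrete, the following is a minimal sketch, not the authors' code: it obtains refitted cross-validation (RCV) residuals by random splitting and then applies kernel density estimation. LassoCV as the stage-one selector and scipy's gaussian_kde as the kernel estimator are our illustrative assumptions; the theory allows any selector with the screening consistency property and any kernel satisfying the stated conditions.

```python
# Minimal sketch (not the authors' code) of two-stage RCV residuals
# followed by kernel density estimation of the error density.
import numpy as np
from sklearn.linear_model import LassoCV
from scipy.stats import gaussian_kde

def rcv_error_density(X, y, seed=0):
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                      # random split of the data
    half1, half2 = idx[: n // 2], idx[n // 2 :]

    residuals = []
    for sel, fit in [(half1, half2), (half2, half1)]:
        # Stage 1: variable selection on one half of the data.
        support = np.flatnonzero(LassoCV(cv=5).fit(X[sel], y[sel]).coef_)
        # Stage 2: refit by least squares on the other half, restricted
        # to the selected variables, and keep those residuals.
        Z = np.column_stack([np.ones(fit.size), X[fit][:, support]])
        beta, *_ = np.linalg.lstsq(Z, y[fit], rcond=None)
        residuals.append(y[fit] - Z @ beta)

    e_hat = np.concatenate(residuals)             # RCV residuals from both halves
    return gaussian_kde(e_hat), e_hat             # kernel estimate of f, residuals
```

The kernel estimators \(\hat{f}_{n_1}\) and \(\hat{f}_{n_2}\) analyzed in the appendix correspond to the last step applied to the residuals of each half separately.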


References

  • Bai, Z., Yin, Y. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Annals of Probability, 21(3), 1275–1294.

  • Candes, E., Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 35, 2313–2351.

  • Chai, G., Li, Z. (1993). Asymptotic theory for estimation of error distribution in linear model. Science in China: Series A, 36, 408–419.

  • Cheng, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. Journal of Statistical Planning and Inference, 128, 327–349.

  • Chiang, A. P., Beck, J. S., Yen, H. J., Tayeh, M. K., Scheetz, T. E., Swiderski, R., Nishimura, D., Braun, T. A., Kim, K. Y., Huang, J., Elbedour, K., Carmi, R., Slusarski, D. C., Casavant, T. L., Stone, E. M., Sheffield, V. C. (2006). Homozygosity mapping with SNP arrays identifies a novel gene for Bardet–Biedl syndrome (BBS10). Proceedings of the National Academy of Sciences of the United States of America, 103, 6287–6292.

  • Cui, H., Li, R., Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641.

  • Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849–911.

  • Fan, J., Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.

  • Fan, J., Guo, S., Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B, 74, 37–65.

  • Hall, P. (1981). Laws of the iterated logarithm for nonparametric density estimators. Probability Theory and Related Fields, 56, 47–61.

  • Huang, J., Ma, S., Zhang, C. H. (2008). Adaptive lasso for sparse high dimensional regression. Statistica Sinica, 18, 1603–1618.

  • Li, R., Zhong, W., Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.

  • Liang, H., Härdle, W. (1999). Large sample theory of the estimation of the error distribution for a semiparametric model. Communications in Statistics: Theory and Methods, 28, 2025–2036.

  • Marčenko, V. A., Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 507–536.

  • Meinshausen, N., Meier, L., Bühlmann, P. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671–1681.

  • Pollard, D. (1984). Convergence of stochastic processes. New York: Springer.

  • Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303–325.

  • Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., Dibona, G. F., Huang, J., Casavant, T. L. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences of the United States of America, 103, 14429–14434.

  • Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.

  • Yang, Y. (1997). Large sample properties of estimation of the error distribution in nonparametric regression. Acta Scientiarum Naturalium Universitatis Pekinensis, 33, 298–304.

  • Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894–942.

  • Zhong, W. (2014). Robust sure independence screening for ultrahigh dimensional non-normal data. Acta Mathematica Sinica, 30, 1885–1896.

  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.


Acknowledgements

This project was supported partly by the National Natural Science Foundation of China (Grant Nos. 11071022, 11231010, 11471223) and the “Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds” (No. 025185305000/204). The authors thank the Editor, the Associate Editor and the reviewers for their constructive comments, which led to an improvement over the earlier version of this paper.

Author information


Corresponding author

Correspondence to Hengjian Cui.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 524 KB)

Appendix: Proofs of main results

Lemma 1

(Theorem 37 in Chapter 2 of Pollard 1984) For each n, let \(\mathscr {F}_n\) be a permissible class of functions (Definition 1 in Appendix C of Pollard 1984) whose covering numbers (Definition 23 in Chapter 2 of Pollard 1984) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {F}_n)\le A\epsilon ^{-W}\) for \(0<\epsilon <1\), with constants A and W not depending on n. Let \({\alpha _n}\) be a non-increasing sequence of positive numbers for which \(n\delta _n^{2}\alpha _n^{2}\gg \log n\). If \(|f|\le 1\) and \(\sqrt{P_0f^{2}}\le \delta _n\) for all \(f\in \mathscr {F}_n\), then

$$\begin{aligned} \sup \limits _{\mathscr {F}_n}|P_nf-P_0f|\ll \delta _n^{2}\alpha _n \, \, a.s. , \end{aligned}$$

where \(P_0f=\int fdP_0\), \(P_nf=\int fdP_n=\frac{1}{n}\sum _{i=1}^{n}f(X_i)\).

For simplicity, we only prove the large sample properties of \(\hat{f}_{n_1}(x)\) as an estimator of f(x). For \(j=1,2\), we write n, \(\hat{s}\) and \(\hat{M}\) for \(n_j\), \(\hat{s}_j\) and \(\hat{M}_j\), respectively, to keep the proofs concise, where \(\hat{s}_j\) and \(\hat{M}_j\) are defined in condition \({\mathbf{C_0}}\). By the screening consistency, we have \(\hat{s}=O(n^\gamma )\) with \(0\le \gamma <1\).

Lemma 2

Suppose the assumptions \(\mathbf{C}_2\) and \(\mathbf{C}_3\) hold, then we have \(\max _{1\le i\le n}|\hat{e}_i-e_i|=o(c_n)\,\, a.s.\), where \(c_n=n^{-\frac{1}{2}(1-\frac{1}{k}-\gamma )}\, \log n \) with \(0\le \gamma <1-\frac{1}{k}\).

Proof

Denote \(X_{\hat{M}} =(X_{1\hat{M}},\ldots , X_{n\hat{M}})^\mathrm{T}\), where \(X_{i\hat{M}}=(X_{ij_1},\ldots , X_{ij_{\hat{s}}})^\mathrm{T}\) with \(\hat{M}=\{j_1,\ldots , j_{\hat{s}}\}\) \((1\le j_1<\cdots < j_{\hat{s}}\le p)\). By the definition of P, we have \(P_{ij}=X_{i\hat{M}}^\mathrm{T}(X_{\hat{M}}^\mathrm{T}X_{\hat{M}})^{-1}X_{j\hat{M}}\) and \(\hbox {Var}(\sum _{j=1}^{n}P_{ij}e_j|X_{\hat{M}})=\sigma ^2P_{ii}\). To prove Lemma 2, the first step is to bound \(\max _{1\le i\le n}P_{ii}\) (for a more detailed proof, see the supplementary material). By the definition of \(P_{ii}\) and condition \(\mathbf{C}_2\), we have

$$\begin{aligned} \max _{1\le i\le n}P_{ii}\le \frac{2\hat{s}}{n\lambda _0}\max _{1\le i\le n}\frac{1}{\hat{s}}\sum \limits _{j\in \hat{M}}X_{ij}^2, \end{aligned}$$

so it suffices to compute the order of \(\max _{1\le i\le n}\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\). For simplicity, set \(Z_\mathrm{in}=\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\), \(i=1,\ldots ,n\), recalling that \(\hat{s}=O(n^\gamma )\) with \(0\le \gamma <1\). Since the \(Z_\mathrm{in}\) are i.i.d. random variables,

$$\begin{aligned} P\left( \max _{1\le i\le n}Z_\mathrm{in}>A_0n^{1/k}\right) =1-P(Z_\mathrm{in}<A_0n^{1/k})^n=1-[1-P(Z_\mathrm{in}>A_0n^{1/k})]^n, \end{aligned}$$

where \(A_0>2\). For \(x>0\), define \(h(x)=x^{2k}(\log ^{+}(x))^2\); then we have

$$\begin{aligned} P\left( Z_\mathrm{in}>A_0n^{1/k}\right) \le E\frac{h(Z_\mathrm{in})}{h\left( A_0n^{1/k}\right) }\le \frac{k^2}{A_0^{2k}}\frac{\sup _{1\le j\le p} Eh\left( X_{1j}^2\right) }{n^2\log ^2n}. \end{aligned}$$

For fixed k, under the condition \(\sup _{1\le j\le p}E[X_{1j}^{4k}(\log ^{+}(X_{1j}^2))^2]\le C<\infty \), we obtain

$$\begin{aligned} \sum _{n=2}^{\infty }P\left( \max _{1\le i\le n}Z_\mathrm{in}>A_0n^{1/k}\right) <\infty , \end{aligned}$$

and therefore \(\max _{1\le i\le n}Z_\mathrm{in}=O(n^{1/k})\,a.s.\) by the Borel–Cantelli lemma. By the Cauchy–Schwarz inequality, we have \(\max _{i,j}|P_{ij}|\le \max _{1\le i,j\le n}\sqrt{P_{ii}P_{jj}} \le \max _{1\le i\le n}P_{ii}\). In addition, by condition \(\mathbf{C}_3\), we have \(\max _{1\le i\le n}|e_i|\le c_0\log n \) for some constant \(c_0>0\). Denote \(e_{1i} = e_iI\{|e_i|\le c_0\log n \}\) and \(e_{2i} = e_iI\{|e_i|> c_0\log n \}\); then \( E(e_{2j})=o(n^{-3})\) and

$$\begin{aligned} \sum _{j=1}^n P_{ij}e_j = \sum _{j=1}^n P_{ij}[e_{1j} -E(e_{1j})] + \sum _{j=1}^n P_{ij} [e_{2j} -E(e_{2j})]. \end{aligned}$$

For any \(\epsilon >0\), \( \{\max _{1\le i\le n} \sum _{j=1}^n P_{ij}e_{2j}>\epsilon c_n \} \subset \{ \max _{1\le i\le n} |e_i|>c_0\log n \} \), so \( \max _{1\le i \le n} \sum _{j=1}^n P_{ij}e_{2j}= o(c_n)\ \ a.s.\) Therefore,

$$\begin{aligned} \max _{1\le i\le n} \sum _{j=1}^n P_{ij}[e_{2j}-E(e_{2j})] = o(c_n)\ \ \ a.s. \end{aligned}$$
(5)

Furthermore, for some constant \(c_1>0\), by Bernstein’s inequality, we have

$$\begin{aligned}&P\left\{ \max _{1\le i\le n} \left| \sum _{j=1}^n P_{ij}[e_{1j}- E(e_{1j})]\right| \ge nt, \max _{1\le i\le n} P_{ii}\le c_1 \hat{s}n^{1/k-1} |X_{\hat{M}}\right\} \\&\quad \le \sum _{i=1}^n P\left\{ \left| \sum _{j=1}^{n}P_{ij}[e_{1j} -E(e_{1j})]\right| \ge nt, \max _{1\le i\le n} P_{ii}\le c_1 \hat{s}n^{1/k-1}|X_{\hat{M}}\right\} \\&\quad \le 2n \exp \left\{ -\frac{n^2t^2}{2\sum _{j=1}^{n}\hbox {Var}(P_{ij}e_{j})+ \frac{2}{3}c_0\max _{1\le i\le n} P_{ii}(\log n)nt}\right\} \\&\qquad I\left\{ \max _{1\le i\le n} P_{ii}\le c_1 \hat{s}n^{1/k-1}\right\} \\&\quad \le 2n \exp \left\{ -\frac{n^2t^2}{2c_1\hat{s}n^{1/k-1}(\sigma ^2 + \frac{t}{3}c_0n\log n)}\right\} . \end{aligned}$$

Let \(t=\epsilon t_n \) with \(\epsilon >0\), \( t_n=\sqrt{\hat{s}}n^{-\alpha }\log n\), \( \alpha =\frac{3}{2}-\frac{1}{2k}\), then

$$\begin{aligned} -\frac{n^2t^2}{2c_1\hat{s}n^{1/k-1}(\sigma ^2 + \frac{t}{3}c_0n\log n)}=\frac{-\epsilon ^2 \log ^2 n}{2c_1(\sigma ^2+\frac{\epsilon c_0}{3} c_n \log n)} \le -\frac{\epsilon ^2}{4c_1\sigma ^2} \log ^2 n \le -3\log n, \end{aligned}$$

for n large enough. It then follows from the Borel–Cantelli lemma that

$$\begin{aligned} \max _{1\le i\le n} \sum _{j=1}^n P_{ij}[e_{1j}-E(e_{1j})] = o(nt_n) = o(c_n)\ \ \ a.s. \end{aligned}$$
(6)

Combining (5) and (6), we obtain \(\max _{1\le i \le n}|\hat{e}_i-e_i|=o(c_n)\; \ a.s.\) \(\square \)
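As a side check, not part of the paper, the rate in Lemma 2 can be probed by simulation. The sketch below assumes the selected set \(\hat{M}\) equals the true support (exact screening consistency), in which case \(\hat{e}-e=-Pe\) with P the least-squares projection used above.

```python
# Sketch (our addition): compare max_i |e_hat_i - e_i| with the Lemma 2
# rate c_n = n^{-(1 - 1/k - gamma)/2} * log(n); the true support stands
# in for M-hat, i.e., the screening step is assumed exactly consistent.
import numpy as np

def max_residual_error(n, p=1000, s=5, k=4, gamma=0.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p); beta[:s] = 2.0
    e = rng.standard_normal(n)                    # true errors e_i
    y = X @ beta + e
    XM = X[:, :s]                                 # "selected" design X_M
    P = XM @ np.linalg.solve(XM.T @ XM, XM.T)     # projection onto col(X_M)
    e_hat = y - P @ y                             # refitted residuals
    c_n = n ** (-0.5 * (1 - 1 / k - gamma)) * np.log(n)
    return np.max(np.abs(e_hat - e)), c_n

for n in (200, 800, 3200):
    err, c_n = max_residual_error(n)
    print(f"n={n}: max|e_hat - e| = {err:.4f},  c_n = {c_n:.4f}")
```

Here gamma=0 because the selected model size is held fixed; with \(\hat{s}\) growing like \(n^\gamma \), the rate deteriorates accordingly.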

Lemma 3

Suppose that assumptions \(\mathbf{C}_0\)–\(\mathbf{C}_3\) hold and \(\gamma <1-1/k\); then we have

(i) If \(f(\cdot )\) is continuous at u, then \( \hat{f}_n(u)-f_n(u) =o(1) \, \, a.s. \)

(ii) If \(f(\cdot )\) is uniformly continuous, then

    $$\begin{aligned} \sup _u |\hat{f}_n(u)-f_n(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{ \frac{c_n}{h_n} }\right) \right) +2\sup _u[I_{n+}(u) + I_{n-}(u)] \,\, a.s., \end{aligned}$$

    where \(I_{n+}(u)=\int K(y)|f(u+Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) and \(I_{n-}(u)=\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) for some constant \(C>0\).

Proof

Since \(K(\cdot )\) is of bounded variation, K can be written as \(K=K_1-K_2\), where \(K_1\) and \(K_2\) are two monotonically increasing functions (a concrete decomposition for the Epanechnikov kernel is sketched after this proof). By the definitions of \(\hat{f}_n(u)\) and \(f_n(u)\), we have

$$\begin{aligned} \hat{f_n}(u)-f_n(u)= & {} \frac{1}{nh_n}\left[ \sum _{i=1}^{n}K\left( \frac{\hat{e}_i-u}{h_n}\right) -\sum _{i=1}^{n}K\left( \frac{e_i-u}{h_n}\right) \right] \nonumber \\= & {} \frac{1}{nh_n}\left[ \sum _{i=1}^{n}K_1\left( \frac{\hat{e}_i-u}{h_n}\right) -\sum _{i=1}^{n}K_1\left( \frac{e_i-u}{h_n}\right) \right] \nonumber \\&+\frac{1}{nh_n}\left[ \sum _{i=1}^{n}K_2\left( \frac{\hat{e}_i-u}{h_n}\right) -\sum _{i=1}^{n}K_2\left( \frac{e_i-u}{h_n}\right) \right] \nonumber \\= & {} \varDelta _{1n}(u)+\varDelta _{2n}(u). \end{aligned}$$
(7)

On the set \( \{\hat{e}_i\,|\,\max |\hat{e}_i-e_i|\le Cc_n\}\) with constant \(C>0\), it can be derived that \(I_{2n}(u)\le \varDelta _{1n}(u)\le I_{1n}(u)\), since \(K_1\) is a monotonically increasing function, where

$$\begin{aligned} I_{1n}(u)= & {} \frac{1}{nh_n}\sum _{i=1}^{n}\,\left[ \,K_1\left( \frac{e_i+Cc_n-u}{h_n}\right) -K_1\left( \frac{e_i-u}{h_n}\right) \,\right] ,\\ I_{2n}(u)= & {} \frac{1}{nh_n}\sum _{i=1}^{n}\,\left[ \,K_1\left( \frac{e_i-Cc_n-u}{h_n}\right) -K_1\left( \frac{e_i-u}{h_n}\right) \,\right] . \end{aligned}$$

Let \(\mathscr {G}_n :=\{\,g_{nu}=K_1(\frac{(e+Cc_n)-u}{h_n})-K_1(\frac{e-u}{h_n}): u\in R\}\) (for more details about \(\mathscr {G}_n\), see the supplementary material); then \(\mathscr {G}_n\) is a class of permissible functions with a polynomial discriminant and

$$\begin{aligned} I_{1n}(u)=\frac{1}{h_n}[P_ng_{nu}-P_0g_{nu}]+\frac{1}{h_n}P_0g_{nu}. \end{aligned}$$

Since K has compact support, we may assume without loss of generality that \(K_j\) has compact support \([-M, M]\) with \(M>0\) and that \(K_j'\) is bounded except at one jump point. For fixed u, we have

$$\begin{aligned} \frac{|P_0g_{nu}|}{h_n}= & {} \frac{1}{h_n} \left| E\left[ K_1\left( \frac{e-u+Cc_n}{h_n}\right) -K_1\left( \frac{e-u}{h_n}\right) \right] \right| \\= & {} \frac{1}{h_n}\left| \int K_1\left( \frac{x-u+Cc_n}{h_n}\right) f(x)\hbox {d}x-\int K_1\left( \frac{x-u}{h_n}\right) f(x)\hbox {d}x\right| \\= & {} \left| \int K_1(y)[f(u-Cc_n+h_ny)-f(u+h_ny)]\hbox {d}y \right| \\\le & {} \int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\hbox {d}y= I_{n-}(u), \end{aligned}$$

and

$$\begin{aligned} P_0g_{n,u}^{2}= & {} \int \left[ K_1\left( \frac{x-u+Cc_n}{h_n}\right) -K_1\left( \frac{x-u}{h_n}\right) \right] ^{2}f(x)\hbox {d}x \\= & {} h_n\int \left[ K_1\left( y+C\frac{c_n}{h_n}\right) -K_1(y)\right] ^{2}f(u+h_ny)\hbox {d}y. \end{aligned}$$
(i) If \(f(\cdot )\) is continuous at u and \(h_n=o(1)\), then \(f(u+h_ny) \le f(u)+1\) for \(|y|\le M + Cc_n/h_n\), and \(P_0g_{n,u}^{2}\le 4(f(u)+1)MC_1^2h_n = O(h_n)\) for \(c_n/h_n \ge 1\). If \(c_n/h_n < 1\), let \(y_1\) be a jump point of \(K_1\) in \([-M, M]\), \(B:=[y_1-Cc_n/h_n, y_1+Cc_n/h_n]\) and \( B^c:= [-M-Cc_n/h_n, M+Cc_n/h_n]-B\); then

    $$\begin{aligned} P_0g_{n,u}^{2}\le & {} h_n \left[ \left( \int _B + \int _{B^c}\right) \left[ K_1\left( y+C\frac{c_n}{h_n}\right) -K_1(y)\right] ^{2}f(u+h_ny)\hbox {d}y\right] \\\le & {} h_n\left[ 2C_1^2(f(u)+1)\int _B \hbox {d}y+ C_2(f(u)+1)(c_n/h_n)^2\int _{B^c}\hbox {d}y \right] \\= & {} (f(u)+1)O(c_n), \end{aligned}$$

where \(C_1=\sup K\) and \(C_2>0\) is some constant. Thus, \(P_0g_{n,u}^{2}\le (f(u)+1)O(c_n \wedge h_n)\) and \(I_{n-}(u) =o(1)\). By Bernstein's inequality, we obtain \( |P_ng_{nu}-P_0g_{nu}| = o(h_n)\, a.s.\) and \( |I_{1n}(u)|\le |P_ng_{nu}-P_0g_{nu}|/h_n + I_{n-}(u) = o(1) \ \ a.s. \) Similarly, \( I_{2n}(u) =o(1)\,\, a.s.\), so \(\varDelta _{1n}(u) =o(1)\ a.s.\). By the same argument, \(\varDelta _{2n}(u) = o(1)\ a.s.\) also holds. Therefore, we have \(\hat{f}_n(u)-f_n(u) = o(1)\,\,\, a.s.\) by (7). \(\square \)

(ii) If \(f(\cdot )\) is uniformly continuous, then f(u) is bounded and \( \sup _{u} P_0g_{n,u}^{2}\le 4 \sup _u (f(u)+1)MC_1^2(c_n \wedge h_n) = O(c_n \wedge h_n)\). By Lemma 36 (ii) in Chapter 2, Pollard (1984), the covering numbers of \(\mathscr {G}_n\) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {G}_n)\le A\epsilon ^{-W}\), \(0<\epsilon <1\), where the constants A and W do not depend on n, and \(\sup _{u}|g_{nu}|\le C_1\). Set \(\alpha _n=\frac{\log n}{\sqrt{n}\delta _n}\), \(\delta _n^{2}=O(c_n\wedge h_n)\); then the conditions of Lemma 1 hold. Furthermore, we have

    $$\begin{aligned} \sup _{u}|P_ng_{nu}-P_0g_{nu}|=o\left( \delta _n^{2}\alpha _n\right) \,\, \, a.s. \end{aligned}$$

Therefore, \(\sup _u|I_{1n}(u)|\le o(\frac{\delta _n^{2}\alpha _n}{h_n})+\sup _u I_{n-}(u)=o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}})) + \sup _u I_{n-}(u)\)  a.s.  Similarly, we have \( \sup _u |I_{2n}(u)| \le o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}})) + \sup _u I_{n+}(u)\, \, a.s. \) and

    $$\begin{aligned} \sup _u|\varDelta _{1n}(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + \sup _u[I_{n+}(u)+ I_{n-}(u)] \, \, a.s. \end{aligned}$$
    (8)

Similar to the proof of Eq. (8), we also have \(\sup _u|\varDelta _{2n}(u)|\le o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}}))+\sup _u [I_{n+}(u)+I_{n-}(u)] \, a.s.\). By (7), \(\sup _u |\hat{f}_n(u)-f_n(u)|\) is dominated almost surely by \( \sup _u|\varDelta _{1n}(u)|+ \sup _u |\varDelta _{2n}(u)|\). Thus, we have

    $$\begin{aligned} \sup _u|\hat{f}_n(u)-f_n(u)|\le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + 2\sup _u [I_{n+}(u)+I_{n-}(u)] \;\; a.s. \end{aligned}$$

    This completes the proof of Lemma 3. \(\square \)
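The decomposition \(K=K_1-K_2\) invoked at the start of the preceding proof is easy to make explicit for standard kernels. The following sketch, our illustration rather than anything from the paper, gives such a decomposition for the Epanechnikov kernel, which increases on \([-1,0]\) and decreases on [0, 1].

```python
# Sketch (our illustration): Jordan decomposition K = K1 - K2 of a
# bounded-variation kernel into two nondecreasing functions, here for
# the Epanechnikov kernel K(x) = 0.75*(1 - x^2) on [-1, 1].
import numpy as np

def K(x):
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def K1(x):
    # Follows K up to its peak at 0, then stays flat at K(0): nondecreasing.
    return K(np.minimum(x, 0.0))

def K2(x):
    # Zero up to the peak, then the accumulated descent of K: nondecreasing.
    return K(0.0) - K(np.maximum(x, 0.0))

x = np.linspace(-2.0, 2.0, 401)
assert np.allclose(K(x), K1(x) - K2(x))           # K = K1 - K2 pointwise
```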

Proof of Theorem 1

Note that

$$\begin{aligned} |\hat{f}_n(u)-f(u)|\le |\hat{f}_n(u)-f_n(u)|+|f_n(u)-f(u)|. \end{aligned}$$

By condition \(\mathbf{C}_1\) and Lemma 3 (i), we have \(\hat{f}_n(u)-f_n(u) = o(1) \; a.s.\) by the continuity of f at u. For \(f_n(u)-f(u) = o(1) \; a.s.\), see pages 35–36 of Pollard (1984). Therefore, \(|\hat{f}_n(u)-f(u)|=o(1)\) holds a.s. \(\square \)

Proof of Theorem 2

According to triangle inequality,

$$\begin{aligned} \sup \limits _u|\hat{f}_n(u)-f(u)|\le \sup \limits _u|\hat{f}_n(u)-f_n(u)| +\sup \limits _u|f_n(u)-f(u)|. \end{aligned}$$

From Lemma 3 (ii),

$$\begin{aligned} \sup \limits _u|\hat{f}_n(u)-f_n(u)|\le o\left( \frac{\log n}{\sqrt{nh_n}}\right) +2\sup \limits _u[I_{n+}(u) + I_{n-}(u)]. \end{aligned}$$

Since f is uniformly continuous and conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, we have \(\sup _uI_{n-}(u)=\sup _u\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\hbox {d}y = o(1) \) and \(\sup _uI_{n+}(u)= o(1) \). By Lemma 3 (ii), we immediately have \(\sup _u|\hat{f}_n(u)-f_n(u)|=o(1)\,\, a.s.\). Moreover, \(\sup _u|f_n(u)-f(u)|=o(1) \,\, a.s.\) holds by condition \(\mathbf{C}_1\). Therefore, we obtain \(\sup _u|\hat{f}_n(u)-f(u)|=o(1)\, \, a.s.\)\(\square \)

Proof of Theorem 3

According to Theorem 2 and Lemma 3 (ii),

$$\begin{aligned} \sup \limits _u|\hat{f}_n(u)-f(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\right) +2\sup _u [I_{n+}(u) +I_{n-}(u)]+\sup \limits _u|f_n(u)-f(u)|. \end{aligned}$$

By the Lipschitz condition on f, we have \(\sup _u|Ef_n(u)-f(u)|=O(h_n)\) and \(\sup _u|f_n(u)-Ef_n(u)| = o(\frac{\log n}{\sqrt{nh_n}}) \, \, a.s. \) Therefore, by the triangle inequality,

$$\begin{aligned} \sup \limits _u|f_n(u)-f(u)|\le & {} \sup \limits _u|f_n(u)-Ef_n(u)|+ \sup \limits _u|Ef_n(u)-f(u)|\\= & {} o\left( \frac{\log n}{\sqrt{nh_n}}\right) +O(h_n)\,\, a.s. \end{aligned}$$

Since f satisfies the first-order Lipschitz condition,

$$\begin{aligned} I_{n-}(u)=\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\hbox {d}y =O(c_n). \end{aligned}$$

Similarly, we have \(I_{n+}(u)=O(c_n)\). Thus, it can be derived that

$$\begin{aligned} \sup \limits _u|\hat{f}_n(u)-f(u)|=O\left( \frac{\log n}{\sqrt{nh_n}}+c_n+h_n\right) \, \, a.s. \end{aligned}$$

\(\square \)
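A remark not made explicit in the paper: ignoring constants, the two \(h_n\)-dependent terms in this bound balance when

$$\begin{aligned} \frac{\log n}{\sqrt{nh_n}} \asymp h_n \Longleftrightarrow h_n \asymp \left( \frac{\log ^2 n}{n}\right) ^{1/3}, \quad \text {so that}\quad \sup \limits _u|\hat{f}_n(u)-f(u)|=O\left( \left( \frac{\log ^2 n}{n}\right) ^{1/3}+c_n\right) \, \, a.s. \end{aligned}$$

With \(c_n=n^{-\frac{1}{2}(1-\frac{1}{k}-\gamma )}\log n\), the residual term \(c_n\) is of smaller order, up to logarithmic factors, exactly when \(\gamma +1/k<1/3\), which makes explicit how the sparsity level \(\gamma \) and the moment index k enter the convergence rate.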

Lemma 4

Assume that conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, \(f(\cdot )\) satisfies the first-order Lipschitz condition and \(\lim \nolimits _{n\rightarrow \infty }nh_n^3 =0\); then for fixed \( u \in R \) with \( f(u)>0 \), we have

$$\begin{aligned} \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1),\, \, v= f(u) \int K^{2}(y)\mathrm{d}y. \end{aligned}$$

\(\square \)

Proof of Theorem 4

(i) Since \(f(\cdot )\) satisfies the first-order Lipschitz condition, \(\sup _u I_{n+}(u)+ \sup _u I_{n-}(u) = O(c_n)\). By Lemma 3 and \(c_n= o(h_n/\log ^2 n) \), we know

$$\begin{aligned} \sqrt{nh_n}\,|\hat{f}_n(u)-f_n(u)| \,\le \, \sqrt{nh_n}\,\left[ o\left( \sqrt{\frac{c_n}{h_n}}\frac{\log n}{\sqrt{nh_n}}\wedge \frac{\log n}{\sqrt{nh_n}}\right) +O(c_n)\right] \,=o(1) \ a.s. \end{aligned}$$

    and

    $$\begin{aligned} \sqrt{\frac{nh_n}{v}}\,[\hat{f}_n(u)-f(u)]= & {} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f_n(u)] + \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] \\= & {} \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] + o(1) \,\, a.s. \end{aligned}$$

    Due to condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\) and Lemma 4, then we have

    $$\begin{aligned} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1). \end{aligned}$$

    \(\square \)

(ii) By \(c_n= o(h_n/\log ^2 n) \), we have \(\sqrt{nh_n}\,[\hat{f}_n(u)-f_n(u)]\,= o(1) \;\ a.s. \) and \( Ef_n(u)-f(u)=O(h_n). \) In addition, by the law of the iterated logarithm (Theorem 2, Hall 1981), we have

$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)]\,=\sqrt{2}\,\, a.s. \end{aligned}$$
    (9)

    Furthermore, by condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\), it can be derived that

    $$\begin{aligned} \sqrt{\frac{nh_n}{v\log \log n}}\,[\hat{f}_n(u)-f(u)]\,= \sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)] + o(1) \,\,a.s.\qquad \end{aligned}$$
    (10)

    Finally, we have

    $$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[{{\hat{f}}_n(u)}-f(u)]\,=\sqrt{2}\; a.s. \end{aligned}$$

    This completes the proof of Theorem 4 by (9) and (10).

\(\square \)
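In practice, Theorem 4 (i) yields a pointwise asymptotic confidence interval for f(u). The following is a minimal sketch, our addition: it plugs \(\hat{f}_n(u)\) into v, uses the Epanechnikov kernel, for which \(\int K^{2}(y)\mathrm{d}y=3/5\), and relies on the undersmoothing condition \(nh_n^{3}\rightarrow 0\), so no bias correction is applied.

```python
# Sketch (our addition): pointwise confidence interval for f(u) based on
# Theorem 4(i): sqrt(n*h/v) * (f_hat(u) - f(u)) -> N(0,1), with
# v = f(u) * int K^2(y) dy; f_hat(u) is plugged into v.
import numpy as np
from scipy.stats import norm

def kde_pointwise_ci(e_hat, u, h, level=0.95):
    n = len(e_hat)
    y = (np.asarray(e_hat) - u) / h
    Ky = np.where(np.abs(y) <= 1.0, 0.75 * (1.0 - y**2), 0.0)  # Epanechnikov
    f_hat = Ky.sum() / (n * h)                  # kernel density estimate at u
    v_hat = max(f_hat, 1e-12) * 0.6             # plug-in v = f_hat(u) * 3/5
    half_width = norm.ppf(0.5 + level / 2) * np.sqrt(v_hat / (n * h))
    return f_hat, (f_hat - half_width, f_hat + half_width)
```

For instance, with the RCV residuals e_hat from the sketch after the abstract, a bandwidth such as h = n**(-0.4) satisfies both \(nh_n\rightarrow \infty \) and \(nh_n^{3}\rightarrow 0\).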

Proof of Theorem 5

(i)
$$\begin{aligned} |T(f_n)-T(f)|= & {} \left| \int H(u)f_n(u)\hbox {d}u-EH(e_1)\right| \\\le & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int [H(e_i+h_ny)-H(e_i)] K(y)\hbox {d}y\right| + \left| \frac{1}{n}\sum _{i=1}^n H(e_i)-EH(e_1)\right| \\= & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int H'(e_i+\theta _i h_ny)h_ny K(y)\hbox {d}y\right| +o(1)\\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)| h_n\int |y|K(y)\hbox {d}y + o(1) = o(1) \ \ a.s. \end{aligned}$$

We now explain why the last “\(\le \)” holds. For given i, \(H(e_i+h_ny)-H(e_i)=H'(e_i+\theta _i h_ny)h_ny\) by Taylor's expansion of \(H(\cdot )\) at \(e_i\) with \(|\theta _i|\le 1\). Since \(h_n\rightarrow 0\), there exists a constant \(\delta >0\) such that \(|\theta _ih_ny|\le Mh_n\le \delta \) as long as n is large enough, and hence \(|H(e_i+h_ny)-H(e_i)|\le \sup _{|z|\le \delta }|H'(e_i+z)|h_n|y|\). To prove that \(T({\hat{f}}_n)\) is a consistent estimator of T(f), it suffices to prove \(T({\hat{f}}_n)-T(f_n)= o(1)\ \ a.s.\). By Taylor's expansion,

    $$\begin{aligned} T({\hat{f}}_n)-T(f_n)= & {} \frac{1}{n}\sum _{i=1}^{n}\int [H(\hat{e}_i +h_ny)- H(e_i+h_ny)]K(y)\hbox {d}y \\= & {} \frac{1}{n}\sum _{i=1}^{n} \int \left[ H'(e_i+h_ny +\theta _i(\hat{e}_i-e_i))K(y)\hbox {d}y (\hat{e}_i-e_i)\right] \\&I\{\max _i|\hat{e}_i -e_i|\le c_n\}+\, o(1) \\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)|c_n +o(1) =o(1)\ \ a.s. \end{aligned}$$

    \(\square \)

(ii) By Taylor's expansion, we have

    $$\begin{aligned}&\left| \sqrt{n}[ T(f_n)-T(f)]-\frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1))\right| \\&\quad =\left| \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\int \left[ H'(e_i)h_ny + \frac{1}{2} H''(e_i + \theta _i h_n y)h_n^2y^2\right] K(y)\hbox {d}y \right| \\&\quad =\left| \frac{h_n^2}{2\sqrt{n}}\sum _{i=1}^{n}\int H''(e_i + \theta _i h_n y)y^2 K(y)\hbox {d}y\right| \\&\quad \le \frac{\sqrt{n} h_n^2}{2n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H''(e_i +z)|\int y^2 K(y)\hbox {d}y = o(1)\ \ a.s., \end{aligned}$$

    and

$$\begin{aligned}&\sqrt{n}\left| T(\hat{f}_n)-T(f_n)\right| \le \sqrt{n}|T(\hat{f}_n)-T(f_n)|I\{ \max _i |\hat{e}_i -e_i|\le c_n \} + o(1) \nonumber \\&\quad = \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n [H(\hat{e}_i)-H(e_i)]\right| I\{ \max _i |\hat{e}_i -e_i|\le c_n \} \nonumber \\&\qquad + \left| \frac{1}{\sqrt{n}}\sum _{i=1}^n [ H''(\hat{e}_i +\theta _{1i} h_ny) - H''(e_i+\theta _{2i}h_ny) ]h_n^2\int y^2K(y)\hbox {d}y\right| \nonumber \\&\qquad I\{ \max _i |\hat{e}_i -e_i|\le c_n \} +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| + \frac{1}{2\sqrt{n}} \sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|c_n^2 \nonumber \\&\qquad + \frac{2h_n^2}{\sqrt{n}}\sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|\int y^2K(y)\hbox {d}y +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| +\left| \frac{E(H'(e_1))}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_j\right| +o(1) \nonumber \\&\quad =\left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| + O_p(\sqrt{\hat{s}/n}) +o(1). \end{aligned}$$
    (11)

    Let \(e_i^*=H'(e_i)- E(H'(e_i))\), then

    $$\begin{aligned} E\left\{ \left[ \frac{1}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_i^*e_j\right] ^2|X_{\hat{M}}\right\}= & {} \frac{1}{n} \sum _{i=1}^nP_{ii}^2E\left( e_1^{*2}e_1^2\right) \\&+\frac{1}{n} \sum _{i\ne j}P_{ii}P_{jj}E\left( e_1^*e_1\right) E\left( e_2^*e_2\right) \\&+ \frac{2}{n} \sum _{i\ne j}P_{ij}^2E\left[ e_1^{*2}e_2^2\right] \\= & {} O(\hat{s}/n)+ O(\hat{s}^2/n) =O(\hat{s}^2/n) \ \ a.s. \end{aligned}$$

    It can be derived from (11) and condition \(\gamma +\,1/k <1/2\) that \( \sqrt{n}[T(\hat{f}_n)-T(f_n)]= o_p(1).\) Therefore,

    $$\begin{aligned} \sqrt{n}[ T(\hat{f}_n)-T(f)]= & {} \sqrt{n}[ T(\hat{f}_n)-T(f_n)] + \sqrt{n}[ T(f_n)-T(f)] \\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1)) +o_p(1) {\mathop {\longrightarrow }\limits ^{d}}N(0, \hbox {Var}(H(e_1))). \end{aligned}$$

    This completes the proof of Theorem 5. \(\square \)
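To illustrate Theorem 5 (ii) numerically, the following sketch, our addition, computes the plug-in functional \(T(\hat{f}_n)=\int H(u)\hat{f}_n(u)\mathrm{d}u\) together with the standard error \(\sqrt{\hbox {Var}(H(e_1))/n}\) suggested by the limiting distribution, with the variance estimated from the residuals. The choice \(H(u)=u^{2}\), targeting the error variance, is illustrative and assumes the smoothness required of H.

```python
# Sketch (our addition): plug-in functional T(f_hat) = int H(u) f_hat(u) du
# and its Theorem 5 standard error sqrt(Var(H(e_1))/n), with H(u) = u^2.
import numpy as np

def plug_in_functional(e_hat, h, H=lambda u: u**2, n_grid=2001):
    # T(f_hat) = (1/n) * sum_i int H(e_hat_i + h*y) K(y) dy for the
    # Epanechnikov kernel, computed by a Riemann sum over y in [-1, 1].
    e_hat = np.asarray(e_hat)
    y = np.linspace(-1.0, 1.0, n_grid)
    Ky = 0.75 * (1.0 - y**2)
    dy = y[1] - y[0]
    inner = (H(e_hat[:, None] + h * y[None, :]) * Ky).sum(axis=1) * dy
    se = np.std(H(e_hat), ddof=1) / np.sqrt(len(e_hat))   # plug-in std. error
    return inner.mean(), se

# A nominal 95% interval for T(f) = E H(e_1) is then T_hat +/- 1.96 * se.
```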

About this article


Cite this article

Zou, F., Cui, H. Error density estimation in high-dimensional sparse linear model. Ann Inst Stat Math 72, 427–449 (2020). https://doi.org/10.1007/s10463-018-0699-0
