Abstract
This paper concerns error density estimation in high-dimensional sparse linear models, where the number of variables may exceed the sample size. An improved two-stage refitted cross-validation procedure based on random data splitting is used to obtain the residuals of the model, and the traditional kernel density method is then applied to estimate the error density. Under suitable sparsity conditions, we establish the large-sample properties of the estimator, including consistency, asymptotic normality, and the law of the iterated logarithm. In particular, we give the relationship between the sparsity and the convergence rate of the kernel density estimator. Simulation results show that the proposed error density estimator performs well, and a real data example illustrates the methods.
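The two-stage refitted cross-validation idea can be sketched in a few lines of plain Python. This is a minimal illustration under assumed toy settings: one active covariate, marginal-correlation screening standing in for the lasso/SIS screening used in practice, and a single-variable refit for brevity; all names and sample sizes here are illustrative, not the paper's.

```python
import math
import random

random.seed(1)

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(points, x, h):
    # Classical kernel density estimator evaluated at x with bandwidth h.
    return sum(gauss_kernel((x - e) / h) for e in points) / (len(points) * h)

# Toy sparse model: p covariates, only the first one is active.
n, p = 200, 50
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * xi[0] + random.gauss(0, 1) for xi in X]

# Stage 1: random split of the sample into two halves.
idx = list(range(n))
random.shuffle(idx)
I1, I2 = idx[:n // 2], idx[n // 2:]

def screen_top1(I):
    # Marginal-correlation screening (a stand-in for lasso/SIS selection).
    return max(range(p), key=lambda j: abs(sum(X[i][j] * y[i] for i in I)))

def refit_residuals(I, j):
    # Stage 2: least-squares refit on the *other* half using the variable
    # selected on the first half, then collect residuals.
    b = sum(X[i][j] * y[i] for i in I) / sum(X[i][j] ** 2 for i in I)
    return [y[i] - b * X[i][j] for i in I]

res = refit_residuals(I2, screen_top1(I1)) + refit_residuals(I1, screen_top1(I2))

# Kernel density estimate of the error density from the RCV residuals.
sigma = math.sqrt(sum(r * r for r in res) / len(res))
h = 1.06 * sigma * len(res) ** (-0.2)   # Silverman's rule of thumb
f0 = kde(res, 0.0, h)                   # true N(0,1) density at 0 is about 0.399
print(round(f0, 3))
```

Because the variable is selected on one half and refit on the other, the residuals are not contaminated by selection bias, and the kernel estimate at 0 lands near the true density value.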
References
Bai, Z., Yin, Y. (1993). Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix. Annals of Probability, 21(3), 1275–1294.
Candes, E., Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35, 2313–2351.
Chai, G., Li, Z. (1993). Asymptotic theory for estimation of error distribution in linear model. Science in China: Series A, 36, 408–419.
Cheng, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. Journal of Statistical Planning and Inference, 128, 327–349.
Chiang, A. P., Beck, J. S., Yen, H. J., Tayeh, M. K., Scheetz, T. E., Swiderski, R., Nishimura, D., Braun, T. A., Kim, K. Y., Huang, J., Elbedour, K., Carmi, R., Slusarski, D. C., Casavant, T. L., Stone, E. M., Sheffield, V. C. (2006). Homozygosity mapping with SNP arrays identifies a novel gene for Bardet–Biedl syndrome (BBS10). Proceedings of the National Academy of Sciences of the United States of America, 103, 6287–6292.
Cui, H., Li, R., Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641.
Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space. Journal of the Royal Statistical Society: Series B, 70, 849–911.
Fan, J., Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.
Fan, J., Guo, S., Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B, 74, 37–65.
Hall, P. (1981). Laws of the iterated logarithm for nonparametric density estimators. Probability Theory and Related Fields, 56, 47–61.
Huang, J., Ma, S., Zhang, C. H. (2008). Adaptive lasso for sparse high dimensional regression. Statistica Sinica, 18, 1603–1618.
Liang, H., Hardle, W. (1999). Large sample theory of the estimation of the error distribution for a semiparametric model. Communication in Statistics Theory and Methods, 28, 2025–2036.
Li, R., Zhong, W., Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.
Marčenko, V. A., Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 507–536.
Meinshausen, N., Meier, L., Bühlmann, P. (2009). P-values for high-dimensional regression. Journal of the American Statistical Association, 104, 1671–1681.
Pollard, D. (1984). Convergence of stochastic processes. New York: Springer.
Powell, J. L. (1984). Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303–325.
Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., Dibona, G. F., Huang, J., Casavant, T. L. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proceedings of the National Academy of Sciences of the United States of America, 103, 14429–14434.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.
Yang, Y. (1997). Large sample properties of estimation of the error distribution in nonparametric regression. Acta Scientiarum Naturalium Universitatis Pekinensis, 33, 298–304.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894–942.
Zhong, W. (2014). Robust sure independence screening for ultrahigh dimensional non-normal data. Acta Mathematica Sinica, English Series, 30, 1885–1896.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Acknowledgements
This project was supported partly by the National Natural Science Foundation of China (Grant Nos. 11071022, 11231010, 11471223) and “Capacity Building for Sci-Tech Innovation-Fundamental Scientific Research Funds”(No. 025185305000/204). The authors thank the Editor, the AE and reviewers for their constructive comments, which have led to an improvement of the earlier version of this paper.
Electronic supplementary material
Appendix: Proofs of main results
Lemma 1
(Theorem 37 in Chapter 2, Pollard 1984) For each n, let \(\mathscr {F}_n\) be a permissible class of functions (Definition 1, in Appendix C, Pollard 1984) whose covering numbers (Definition 23, in Chapter 2, Pollard 1984) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {F}_n)\le A\epsilon ^{-W},0<\epsilon <1\), with constants A and W not depending on n. Let \(\{\alpha _n\}\) be a non-increasing sequence of positive numbers for which \(n\delta _n^{2}\alpha _n^{2}\gg \log n\). If \(|f|\le 1\) and \(\sqrt{P_0f^{2}}\le \delta _n\) for all \(f\in \mathscr {F}_n\), then
$$\begin{aligned} \sup _{f\in \mathscr {F}_n}|P_nf-P_0f| = o\left( \delta _n^{2}\alpha _n\right) \,\,\, a.s., \end{aligned}$$
where \(P_0f=\int f\mathrm{d}P_0\) and \(P_nf=\int f\mathrm{d}P_n=\frac{1}{n}\sum _{i=1}^{n}f(X_i)\).
For simplicity, we only prove the large sample properties of \(\hat{f}_{n_1}(x)\) with respect to f(x). For \(j=1,2\), we write n, \(\hat{s}\) and \(\hat{M}\) for \(n_j\), \(\hat{s}_j\) and \(\hat{M}_j\), respectively, to keep the proofs concise, where \(\hat{s}_j\) and \(\hat{M}_j\) are defined in condition \({\mathbf{C_0}}\). By the screening consistency, we have \(\hat{s}=O(n^\gamma )\), \(0\le \gamma <1\).
Lemma 2
Suppose the assumptions \(\mathbf{C}_2\) and \(\mathbf{C}_3\) hold, then we have \(\max _{1\le i\le n}|\hat{e}_i-e_i|=o(c_n)\,\, a.s.\), where \(c_n=n^{-\frac{1}{2}(1-\frac{1}{k}-\gamma )}\, \log n \) with \(0\le \gamma <1-\frac{1}{k}\).
Proof
Denote \(X_{\hat{M}} =(X_{1\hat{M}},\ldots , X_{n\hat{M}})^\mathrm{T}\), where \(X_{i\hat{M}}=(X_{ij_1},\ldots , X_{ij_{\hat{s}}})^\mathrm{T}\) with \(\hat{M}=\{j_1,\ldots , j_{\hat{s}}\}\), \( (1\le j_1<\cdots < j_{\hat{s}}\le p)\). By the definition of P, we have \(P_{ij}=X_{i\hat{M}}^\mathrm{T}(X_{\hat{M}}^\mathrm{T}X_{\hat{M}})^{-1}X_{j\hat{M}}\) and \(\hbox {Var}(\sum _{j=1}^{n}P_{ij}e_j|X_{\hat{M}})=\sigma ^2P_{ii}\). To prove Lemma 2, the first step is to compute the order of \(\max _{1\le i\le n}P_{ii}\) (for a more detailed proof, see the supplementary material). By the definition of \(P_{ii}\) and condition \(\mathbf{C}_2\), we have
so it suffices to compute the order of \(\max _{1\le i\le n}\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\). For simplicity, set \(Z_\mathrm{in}=\frac{1}{\hat{s}}\sum _{j\in \hat{M}}X_{ij}^2\), \(i=1,\ldots ,n\), noting that \(\hat{s}=O(n^\gamma )\) with \(0\le \gamma <1\). Since \(Z_\mathrm{in}\), \(i=1,\ldots ,n\), is a sequence of i.i.d. random variables, then
where \(A_0>2\). \(\forall x>0\), denote \(h(x)=x^{2k}(\log ^{+}(x))^2\), then we have
For fixed k and the condition \(\sup _{1\le j\le p}E[X_{1j}^{4k}(\log ^{+}(X_{1j}^2))^2]\le C<\infty \), then
and thus \(\max _{1\le i\le n}Z_\mathrm{in}=O(n^{1/k})\,a.s.\) By the Cauchy–Schwarz inequality, we have \(\max _{i,j}|P_{ij}|\le \max _{1\le i,j\le n}\sqrt{P_{ii}P_{jj}} \le \max _{1\le i\le n}P_{ii}\). In addition, by condition \(\mathbf{C}_3\), we have \(\max _{1\le i\le n}|e_i|\le c_0\log n \) for some constant \(c_0>0\). Denote \(e_{1i} = e_iI\{|e_i|\le c_0\log n \}\) and \(e_{2i} = e_iI\{|e_i|> c_0\log n \}\); then \( E(e_{2j})=o(n^{-3})\) and
Since for any \(\epsilon >0\), \( \{\max _{1\le i\le n} \sum _{j=1}^n P_{ij}e_{2j}>\epsilon c_n \} \subset \{ \max _{1\le i\le n} |e_i|>c_0\log n \} \), it follows that \( \max _{1\le i \le n} \sum _{j=1}^n P_{ij}e_{2j}= o(c_n)\ \ a.s.\) Therefore,
Furthermore, for some constant \(c_1>0\), by Bernstein’s inequality, we have
Let \(t=\epsilon t_n \) with \(\epsilon >0\), \( t_n=\sqrt{\hat{s}}n^{-\alpha }\log n\), \( \alpha =\frac{3}{2}-\frac{1}{2k}\), then
for n large enough. It then follows from the Borel–Cantelli lemma that
Then \(\max _{1\le i \le n}|\hat{e}_i-e_i|=o(c_n)\; \ a.s.\) follows from (5) and (6). \(\square \)
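On the event that the screened set contains the true model, \(\hat{e}_i-e_i=-\sum _{j=1}^{n}P_{ij}e_j\), so Lemma 2 amounts to uniform smallness of the projected errors. The following pure-Python simulation (illustrative sizes only; a Gram–Schmidt basis of the selected columns is used, so the \(n\times n\) matrix P is never formed) shows the maximum shrinking as n grows with \(\hat{s}\) held fixed:

```python
import math
import random

random.seed(0)

def max_hat_error(n, s_hat):
    """max_i |(P e)_i| for the projection P onto s_hat random Gaussian columns."""
    cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(s_hat)]
    e = [random.gauss(0, 1) for _ in range(n)]
    # Gram-Schmidt: orthonormal basis q_1..q_s of the column span, so that
    # (P e)_i = sum_k <q_k, e> q_k[i] without forming the n x n matrix P.
    qs = []
    for v in cols:
        w = v[:]
        for q in qs:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        nrm = math.sqrt(sum(wi * wi for wi in w))
        qs.append([wi / nrm for wi in w])
    Pe = [0.0] * n
    for q in qs:
        c = sum(qi * ei for qi, ei in zip(q, e))
        for i in range(n):
            Pe[i] += c * q[i]
    return max(abs(v) for v in Pe)

def avg(f, reps=20):
    # Average over replications so the comparison is stable.
    return sum(f() for _ in range(reps)) / reps

small_n = avg(lambda: max_hat_error(100, 5))
large_n = avg(lambda: max_hat_error(1600, 5))
print(round(small_n, 3), round(large_n, 3))
```

Each \((Pe)_i\) has conditional variance \(\sigma ^2P_{ii}\approx \hat{s}/n\), so the maximum behaves roughly like \(\sqrt{\hat{s}\log n/n}\), consistent with the rate \(c_n\) in the lemma.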
Lemma 3
Suppose that assumptions \(\mathbf{C}_0\)–\(\mathbf{C}_3\) hold and \(\gamma <1-1/k\); then we have
- (i).
If \(f(\cdot )\) is continuous at u, then \( \hat{f}_n(u)-f_n(u) =o(1) \, \, a.s. \)
- (ii).
If \(f(\cdot )\) is uniformly continuous, then
$$\begin{aligned} \sup _u |\hat{f}_n(u)-f_n(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{ \frac{c_n}{h_n} }\right) \right) +2\sup _u[I_{n+}(u) + I_{n-}(u)] \,\, a.s., \end{aligned}$$where \(I_{n+}(u)=\int K(y)|f(u+Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) and \(I_{n-}(u)=\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\mathrm{d}y\) for some constant \(C>0\).
Proof
Since \(K(\cdot )\) is a function of bounded variation, K can be written as \(K=K_1-K_2\), where \(K_1\) and \(K_2\) are two monotonically increasing functions. By the definitions of \(\hat{f}_n(u)\) and \(f_n(u)\), we have
On the set \( \{\hat{e}_i|\,\max |\hat{e}_i-e_i|\le Cc_n\}\) with constant \(C>0\), it can be derived that \(I_{2n}(u)\le \varDelta _{1n}(u)\le I_{1n}(u)\) due to the fact that \(K_1\) is a monotonically increasing function, where
Let \(\mathscr {G}_n =:\{\,g_{nu}=K_1(\frac{(e+Cc_n)-u}{h_n})-K_1(\frac{e-u}{h_n}): u\in R\}\) (for more details about \(\mathscr {G}_n\), see the supplementary material); then \(\mathscr {G}_n\) is a permissible class of functions with polynomial discrimination and
Since K has compact support, we may suppose without loss of generality that \(K_j\) has compact support \([-M, M]\) with \(M>0\) and that \(K_j'\) is bounded except at one jump point. For fixed u, we have
and
- (i).
If \(f(\cdot )\) is continuous at u and \(h_n=o(1)\), then \(f(u+h_ny) \le f(u)+1\) for \(|y|\le M + Cc_n/h_n\), and \(P_0g_{n,u}^{2}\le 4(f(u)+1)MC_1^2h_n = O(h_n)\) for \(c_n/h_n \ge 1\). If \(c_n/h_n < 1\), let \(y_1\) be a jump point of \(K_1\) in \([-M, M]\), \(B=:[y_1-Cc_n/h_n, y_1+Cc_n/h_n]\) and \( B^c=: [-M-Cc_n/h_n, M+Cc_n/h_n]-B\), then
$$\begin{aligned} P_0g_{n,u}^{2}\le & {} h_n \left[ \left( \int _B + \int _{B^c}\right) \left[ K_1\left( y+C\frac{c_n}{h_n}\right) -K_1(y)\right] ^{2}f(u+h_ny)\hbox {d}y\right] \\\le & {} h_n\left[ 2C_1^2(f(u)+1)\int _B \hbox {d}y+ C_2(f(u)+1)(c_n/h_n)^2\int _{B^c}\hbox {d}y \right] \\= & {} (f(u)+1)O(c_n), \end{aligned}$$where \(C_1=\sup K\), \(C_2>0\) is some constant. Thus, \(P_0g_{n,u}^{2}\le (f(u)+1)O(c_n \wedge h_n)\) and \(I_{n-}(u) =o(1)\). By using Bernstein’s inequality, we obtain \( |P_ng_{nu}-P_0g_{nu}| = o(h_n)\, a.s.\) and \( |I_{1n}(u)|\le |P_ng_{nu}-P_0g_{nu}|/h_n + I_{n-}(u) = o(1) \ \ a.s. \) Similarly, we also have \( I_{2n}(u) =o(1)\,\, a.s.\). It means that \(\varDelta _{1n}(u) =o(1)\ a.s.\). By the same derivation way, \(\varDelta _{2n}(u) = o(1)\ a.s.\) still holds. Therefore, we have \(\hat{f}_n(u)-f_n(u) = o(1)\,\,\, a.s.\) by (7). \(\square \)
- (ii).
If \(f(\cdot )\) is uniformly continuous, then f(u) is bounded and \( \sup _{u} P_0g_{n,u}^{2}\le 4 \sup _u (f(u)+1)MC_1^2(c_n \wedge h_n) = O(c_n \wedge h_n)\). By Lemma 36 (ii) in Chapter 2, Pollard (1984), the covering numbers of \(\mathscr {G}_n\) satisfy \(\sup _{Q}N_1(\epsilon ,Q,\mathscr {G}_n)\le A\epsilon ^{-W}\), \(0<\epsilon <1\), where the constants A and W do not depend on n, and \(\sup _{u}|g_{nu}|\le C_1\). Denote \(\alpha _n=\frac{\log n}{\sqrt{n}\delta _n}, \delta _n^{2}=O(c_n\wedge h_n)\); then the conditions of Lemma 1 hold. Furthermore, we have
$$\begin{aligned} \sup _{u}|P_ng_{nu}-P_0g_{nu}|=o\left( \delta _n^{2}\alpha _n\right) \,\, \, a.s. \end{aligned}$$Therefore, \(\sup _u|I_{1n}(u)|\le o(\frac{\delta _n^{2}\alpha _n}{h_n})+\sup _u I_{n-}(u)=o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}})) + \sup _u I_{n-}(u)\) a.s. Similarly, we have \( \sup _u |I_{2n}(u)| \le o(\frac{\log n}{\sqrt{nh_n}}) + \sup _u I_{n+}(u)\, \, a.s. \) and
$$\begin{aligned} \sup _u|\varDelta _{1n}(u)| \le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + \sup _u[I_{n+}(u)+ I_{n-}(u)] \, \, a.s. \end{aligned}$$(8)Similar to the proof of Eq. (8), we also have \(\sup _u|\varDelta _{2n}(u)|\le o(\frac{\log n}{\sqrt{nh_n}}(1 \wedge \sqrt{ \frac{c_n}{h_n}}))+\sup _u [I_{n+}(u)+I_{n-}(u)] \, a.s.\). We know that \(\sup _u |\hat{f}_n(u)-f_n(u)|\) can be dominated by \( \sup _u|\varDelta _{1n}(u)|+ \sup _u |\varDelta _{2n}(u)|\) almost surely by (7). Thus, we have
$$\begin{aligned} \sup _u|\hat{f}_n(u)-f_n(u)|\le o\left( \frac{\log n}{\sqrt{nh_n}}\left( 1 \wedge \sqrt{\frac{c_n}{h_n}}\right) \right) + 2\sup _u [I_{n+}(u)+I_{n-}(u)] \;\; a.s. \end{aligned}$$This completes the proof of Lemma 3. \(\square \)
Proof of Theorem 1
Note that
By condition \(\mathbf{C}_1\) and Lemma 3 (i), we have \(\hat{f}_n(u)-f_n(u) = o(1) \; a.s.\) due to the continuity of f at u. For \(f_n(u)-f(u) = o(1) \; a.s.\), please refer to pages 35–36 of Pollard (1984). Therefore, \(|\hat{f}_n(u)-f(u)|=o(1)\) holds a.s. \(\square \)
Proof of Theorem 2
According to triangle inequality,
From Lemma 3 (ii),
Since f(u) is uniformly continuous in u and conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, we have \(\sup _uI_{n-}(u)=\sup _u\int K(y)|f(u-Cc_n+h_ny)-f(u+h_ny)|\hbox {d}y = o(1) \) and \(\sup _uI_{n+}(u)= o(1) \). By Lemma 3 (ii), we immediately have \(\sup _u|\hat{f}_n(u)-f_n(u)|=o(1)\,\, a.s.\) Moreover, \(\sup _u|f_n(u)-f(u)|=o(1) \,\, a.s.\) holds according to condition \(\mathbf{C}_1\). Therefore, we obtain \(\sup _u|\hat{f}_n(u)-f(u)|=o(1)\, \, a.s.\)\(\square \)
Proof of Theorem 3
According to Theorem 2 and Lemma 3 (ii),
According to the Lipschitz condition on f, we have \(\sup _u|Ef_n(u)-f(u)|=O(h_n)\) and \(\sup _u|f_n(u)-Ef_n(u)| = o(\frac{\log n}{\sqrt{nh_n}}) \, \, a.s. \) Therefore, by the triangle inequality,
Since f(u) satisfies the first-order Lipschitz condition,
Similarly, we have \(I_{n+}(u)=O(c_n)\). Thus, it can be derived that
\(\square \)
Lemma 4
Assume that conditions \(\mathbf{C}_1\)–\(\mathbf{C}_3\) hold, \(f(\cdot )\) satisfies the first-order Lipschitz condition and \(\lim \nolimits _{n\rightarrow \infty }nh_n^3 =0\); then for fixed \( u \in R \) with \( f(u)>0 \), we have
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}\,[f_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1). \end{aligned}$$
\(\square \)
Proof of Theorem 4
-
(i).
Since \(f(\cdot )\) satisfies the first-order Lipschitz condition, we have \(\sup _u I_{n+}(u)+ \sup _u I_{n-}(u) = O(c_n)\). By Lemma 3 and \(c_n= o(h_n/\log ^2 n) \), we know
$$\begin{aligned} \sqrt{nh_n}\,|\hat{f}_n(u)-f_n(u)| \,\,\,=\, \sqrt{nh_n}\,\left[ o\left( \sqrt{\frac{c_n}{h_n}}\frac{\log n}{\sqrt{nh_n}}\wedge \frac{\log n}{\sqrt{nh_n}}\right) +O(c_n)\right] \,=o(1) \ a.s. \end{aligned}$$and
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}\,[\hat{f}_n(u)-f(u)]= & {} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f_n(u)] + \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] \\= & {} \sqrt{\frac{nh_n}{v}}[f_n(u)-f(u)] + o(1) \,\, a.s. \end{aligned}$$Due to condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\) and Lemma 4, then we have
$$\begin{aligned} \sqrt{\frac{nh_n}{v}}[\hat{f}_n(u)-f(u)]{\mathop {\longrightarrow }\limits ^{d}}N(0,1). \end{aligned}$$\(\square \)
-
(ii).
By \(c_n= o(h_n/\log ^2 n) \), we have \(\sqrt{nh_n}\,[\hat{f}_n(u)-f_n(u)]\,= o(1) \;\ a.s. \) and \( Ef_n(u)-f(u)=O(h_n). \) In addition, by employing the law of the iterated logarithm (Theorem 2, Hall 1981), we have
$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)]\,=\sqrt{2}\,\, a.s. \end{aligned}$$(9)Furthermore, by condition \(\lim \nolimits _{n\rightarrow \infty }{nh_n^{3}}=0\), it can be derived that
$$\begin{aligned} \sqrt{\frac{nh_n}{v\log \log n}}\,[\hat{f}_n(u)-f(u)]\,= \sqrt{\frac{nh_n}{v\log \log n}}\,[f_n(u)-Ef_n(u)] + o(1) \,\,a.s.\qquad \end{aligned}$$(10)Finally, we have
$$\begin{aligned} \limsup \limits _{n\rightarrow \infty }\sqrt{\frac{nh_n}{v\log \log n}}\,[{{\hat{f}}_n(u)}-f(u)]\,=\sqrt{2}\; a.s. \end{aligned}$$
\(\square \)
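Theorem 4 (i) reduces the asymptotic law of \(\hat{f}_n(u)\) to that of the infeasible estimator \(f_n(u)\) built from the true errors. A rough Monte Carlo check (a sketch with illustrative sample sizes; it takes \(v = f(u)\int K^2(y)\hbox {d}y\) for a Gaussian kernel, the standard kernel-density asymptotic variance) compares the empirical spread of \(\sqrt{nh_n}\,f_n(u)\) with \(\sqrt{v}\):

```python
import math
import random

random.seed(2)

def kde_at(points, x, h):
    # Gaussian-kernel density estimate at a single point x.
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - e) / h) ** 2) for e in points) / (len(points) * h)

n, h, u, reps = 2000, 0.1, 0.0, 300
vals = []
for _ in range(reps):
    e = [random.gauss(0, 1) for _ in range(n)]      # true N(0,1) errors
    vals.append(math.sqrt(n * h) * kde_at(e, u, h))

mean = sum(vals) / reps
sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (reps - 1))

# sqrt(v) with v = f(0) * int K^2: for N(0,1) errors and a Gaussian kernel,
# f(0) = 1/sqrt(2*pi) and int K^2 = 1/(2*sqrt(pi)).
sqrt_v = math.sqrt((1.0 / math.sqrt(2.0 * math.pi)) * (1.0 / (2.0 * math.sqrt(math.pi))))
print(round(sd, 3), round(sqrt_v, 3))
```

The empirical standard deviation sits close to \(\sqrt{v}\), up to the finite-sample correction of order \(h_n\) that vanishes as \(h_n\rightarrow 0\).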
Proof of Theorem 5
-
(i)
$$\begin{aligned} |T(f_n)-T(f)|= & {} \left| \int H(u)f_n(u)\hbox {d}u-EH(e_1)\right| \\\le & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int [H(e_i+h_ny)-H(e_i)] K(y)\hbox {d}y\right| + \left| \frac{1}{n}\sum _{i=1}^n H(e_i)-EH(e_1)\right| \\= & {} \left| \frac{1}{n}\sum _{i=1}^{n}\int H'(e_i+\theta _i h_ny)h_ny K(y)\hbox {d}y\right| +o(1)\\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)| h_n\int |y|K(y)\hbox {d}y + o(1) = o(1) \ \ a.s. \end{aligned}$$
Next, we explain why the last "\(\le \)" holds. For given i, \(H(e_i+h_ny)-H(e_i)=H'(e_i+\theta _i h_ny)h_ny\) by Taylor's expansion of \(H(\cdot )\) at \(e_i\) with \(|\theta _i|\le 1\). Since \(h_n\rightarrow 0\), there exists a constant \(\delta >0\) such that \(|\theta _ih_ny|\le Mh_n\le \delta \), and hence \(|H(e_i+h_ny)-H(e_i)|\le \sup _{|z|\le \delta }|H'(e_i+z)|h_n|y|\) for n large enough. To prove that \(T({\hat{f}}_n)\) is a consistent estimator of T(f), it suffices to show \(T({\hat{f}}_n)-T(f_n)= o(1)\ \ a.s.\) By Taylor's expansion,
$$\begin{aligned} T({\hat{f}}_n)-T(f_n)= & {} \frac{1}{n}\sum _{i=1}^{n}\int [H(\hat{e}_i +h_ny)- H(e_i+h_ny)]K(y)\hbox {d}y \\= & {} \frac{1}{n}\sum _{i=1}^{n} \int \left[ H'(e_i+h_ny +\theta _i(\hat{e}_i-e_i))K(y)\hbox {d}y (\hat{e}_i-e_i)\right] \\&I\{\max _i|\hat{e}_i -e_i|\le c_n\}+\, o(1) \\\le & {} \frac{1}{n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H'(e_i +z)|c_n +o(1) =o(1)\ \ a.s. \end{aligned}$$\(\square \)
-
(ii).
By Taylor’s expansion, we have
$$\begin{aligned}&\left| \sqrt{n}[ T(f_n)-T(f)]-\frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1))\right| \\&\quad =\left| \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\int \left[ H'(e_i)h_ny + \frac{1}{2} H''(e_i + \theta _i h_n y)h_n^2y^2\right] K(y)\hbox {d}y \right| \\&\quad =\left| \frac{h_n^2}{2\sqrt{n}}\sum _{i=1}^{n}\int H''(e_i + \theta _i h_n y)y^2 K(y)\hbox {d}y\right| \\&\quad \le \frac{\sqrt{n} h_n^2}{2n}\sum _{i=1}^{n}\sup _{|z|\le \delta }|H''(e_i +z)|\int y^2 K(y)\hbox {d}y = o(1)\ \ a.s., \end{aligned}$$and
$$\begin{aligned}&\sqrt{n}\left| T(\hat{f}_n)-T(f_n)\right| \le \sqrt{n}|T(\hat{f}_n)-T(f_n)|I\{ \max _i |\hat{e}_i -e_i|\le c_n \} + o(1) \nonumber \\&\quad = \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n [H(\hat{e}_i)-H(e_i)]\right| I\{ \max _i |\hat{e}_i -e_i|\le c_n \} \nonumber \\&\qquad + \left| \frac{1}{\sqrt{n}}\sum _{i=1}^n [ H''(\hat{e}_i +\theta _{1i} h_ny) - H''(e_i+\theta _{2i}h_ny) ]h_n^2\int y^2K(y)\hbox {d}y\right| \nonumber \\&\qquad I\{ \max _i |\hat{e}_i -e_i|\le c_n \} +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| + \frac{1}{2\sqrt{n}} \sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|c_n^2 \nonumber \\&\qquad + \frac{2h_n^2}{\sqrt{n}}\sum _{i=1}^n \sup _{|z|\le \delta }|H''(e_i +z)|\int y^2K(y)\hbox {d}y +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n H'(e_i)(\hat{e}_i -e_i)\right| +o(1) \nonumber \\&\quad \le \left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| +\left| \frac{E(H'(e_1))}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_j\right| +o(1) \nonumber \\&\quad =\left| \frac{1}{\sqrt{n}} \sum _{i=1}^n \sum _{j=1}^n P_{ij}[H'(e_i)- E(H'(e_i))]e_j\right| + O_p(\sqrt{\hat{s}/n}) +o(1). \end{aligned}$$(11)Let \(e_i^*=H'(e_i)- E(H'(e_i))\), then
$$\begin{aligned} E\left\{ \left[ \frac{1}{\sqrt{n}} \sum _{i=1}^n\sum _{j=1}^n P_{ij}e_i^*e_j\right] ^2|X_{\hat{M}}\right\}= & {} \frac{1}{n} \sum _{i=1}^nP_{ii}^2E\left( e_1^{*2}e_1^2\right) \\&+\frac{1}{n} \sum _{i\ne j}P_{ii}P_{jj}E\left( e_1^*e_1\right) E\left( e_2^*e_2\right) \\&+ \frac{2}{n} \sum _{i\ne j}P_{ij}^2E\left[ e_1^{*2}e_2^2\right] \\= & {} O(\hat{s}/n)+ O(\hat{s}^2/n) =O(\hat{s}^2/n) \ \ a.s. \end{aligned}$$It can be derived from (11) and condition \(\gamma +\,1/k <1/2\) that \( \sqrt{n}[T(\hat{f}_n)-T(f_n)]= o_p(1).\) Therefore,
$$\begin{aligned} \sqrt{n}[ T(\hat{f}_n)-T(f)]= & {} \sqrt{n}[ T(\hat{f}_n)-T(f_n)] + \sqrt{n}[ T(f_n)-T(f)] \\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n (H(e_i)-EH(e_1)) +o_p(1) {\mathop {\longrightarrow }\limits ^{d}}N(0, \hbox {Var}(H(e_1))). \end{aligned}$$This completes the proof of Theorem 5. \(\square \)
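The expansion behind Theorem 5 rests on \(\int H(e_i+h_ny)K(y)\hbox {d}y\) differing from \(H(e_i)\) only at order \(h_n^2\). For the illustrative choice \(H(x)=x^2\) (not the general H of the theorem) and a mean-zero, unit-variance kernel this is exact: \(T(f_n)=\frac{1}{n}\sum _i e_i^2+h_n^2\), so \(\sqrt{n}[T(f_n)-\frac{1}{n}\sum _i H(e_i)]=\sqrt{n}h_n^2\) is negligible when \(nh_n^4\rightarrow 0\). A quadrature check in plain Python confirms the identity:

```python
import math
import random

random.seed(3)

n, h = 200, 0.5
e = [random.gauss(0, 1) for _ in range(n)]

def f_n(x):
    # Gaussian-kernel density estimate built from the errors e.
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - ei) / h) ** 2) for ei in e) / (n * h)

# T(f_n) = int H(x) f_n(x) dx with H(x) = x^2, by trapezoidal quadrature
# over a range wide enough that the truncated tails are negligible.
a, b, m = -10.0, 10.0, 2000
step = (b - a) / m
T = 0.0
for k in range(m + 1):
    x = a + k * step
    w = 0.5 if k in (0, m) else 1.0
    T += w * x * x * f_n(x) * step

m2 = sum(ei * ei for ei in e) / n
# For H(x)=x^2 and a mean-zero, unit-variance kernel:
# int (e_i + h y)^2 K(y) dy = e_i^2 + h^2, hence T(f_n) = mean(e_i^2) + h^2.
print(round(T, 4), round(m2 + h * h, 4))
```

The quadrature value and the closed form agree to high accuracy, illustrating why the smoothing contributes only an \(O(h_n^2)\) term to the plug-in functional.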
Zou, F., Cui, H. Error density estimation in high-dimensional sparse linear model. Ann Inst Stat Math 72, 427–449 (2020). https://doi.org/10.1007/s10463-018-0699-0