Abstract
Temporally correlated error processes are commonly encountered in practice and pose significant challenges for high-dimensional statistical analysis. This paper conducts low-dimensional inference for high-dimensional linear models with stationary errors. We adopt the framework of the functional dependence measure to adequately accommodate the error correlation. A new desparsified-Lasso-based testing procedure is developed by incorporating a banded estimator of the error autocovariance matrix. Asymptotic normality of the proposed estimator is established by demonstrating the consistency of the banded autocovariance matrix estimator. The result shows how the admissible range of p narrows substantially as the moment condition on the errors weakens or the dependence becomes stronger. We further develop a data-driven choice of the banding parameter. Simulation studies illustrate the satisfactory finite-sample performance of the proposed procedure, and a real data example is presented for illustration.
Abbreviations
- i.i.d.: Independent and identically distributed
- CDF: Cumulative distribution function
- AR: Autoregressive model
- MA: Moving average model
- Cov: Coverage probability of the confidence intervals
- Len: Length of the confidence intervals
- Ave: Averaged estimated parameters
- Esd: Empirical standard deviation
- ACov: Averaged coverage probability of the confidence intervals
- ALen: Averaged length of the confidence intervals
References
Adamek R, Smeekes S, Wilms I (2020) Lasso inference for high-dimensional time series. arXiv preprint arXiv:2007.10952v1
Babii A, Ghysels E, Striaukas J (2020) Inference for high-dimensional regressions with heteroskedasticity and autocorrelation. arXiv preprint arXiv:1912.06307v2
Basu S, Michailidis G (2015) Regularized estimation in sparse high-dimensional time series models. Ann Stat 43:1535–1567
Bickel PJ, Levina E (2008) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, New York
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2404
Chernozhukov V, Härdle WK, Huang C, Wang W (2020) Lasso-driven inference in time and space. arXiv preprint arXiv:1806.05081v4
de Mol C, Giannone D, Reichlin L (2008) Forecasting using a large number of predictors: is Bayesian shrinkage a valid alternative to principal components? J Econom 146:318–328
Deshpande Y, Mackey L, Syrgkanis V, Taddy M (2018) Accurate inference for adaptive linear models. In: Proceedings of the 35th international conference on machine learning, pp 1202–1211
Deshpande Y, Javanmard A, Mehrabi M (2020) Online debiasing for adaptively collected high-dimensional data with applications to time series analysis. arXiv preprint arXiv:1911.01040v3
Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan JQ, Lv JC (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20:101–148
Fan JQ, Qi L, Tong X (2016) Penalized least squares estimation with weakly dependent data. Sci China Math 59:2335–2354
Grenander U, Szegö G (1958) Toeplitz forms and their applications. Cambridge University Press, London
Gupta S (2012) A note on the asymptotic distribution of lasso estimator for correlated data. Sankhya A 74:10–28
Han Y, Tsay R (2020) High-dimensional linear regression for dependent data with applications to nowcasting. Stat Sin, to appear
Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the Lasso and generalizations. Taylor & Francis Group, New York
Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909
Liu WD, Wu WB (2010) Asymptotics of spectral density estimates. Econom Theory 26:1218–1245
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York
Raskutti G, Wainwright MJ, Yu B (2010) Restricted eigenvalue properties for correlated Gaussian designs. J Mach Learn Res 11:2241–2259
Shao X, Wu WB (2007) Asymptotic spectral theory for nonlinear time series. Ann Stat 35:1773–1801
Smith SM (2012) The future of FMRI connectivity. NeuroImage 62:1257–1266
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
van de Geer S, Bühlmann P (2009) On the conditions used to prove oracle results for the Lasso. Electron J Stat 3:1360–1392
van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202
Wainwright MJ (2019) High-dimensional statistics: a non-asymptotic viewpoint. Cambridge University Press, Cambridge
Wang H, Li G, Tsai CL (2007) Regression coefficient and autoregressive order shrinkage and selection via the lasso. J R Stat Soc B 69:63–78
Wong K, Li Z, Tewari A (2020) Lasso guarantees for \(\beta \)-mixing heavy-tailed time series. Ann Stat 48:1124–1142
Wu WB (2005) Nonlinear system theory: another look at dependence. Proc Natl Acad Sci USA 102:14150–14154
Wu WB, Pourahmadi M (2009) Banding sample autocovariance matrices of stationary processes. Stat Sin 19:1755–1768
Wu WB, Wu YN (2016) Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron J Stat 10:352–379
Xie F, Xiao ZJ (2018) Square-root lasso for high-dimensional sparse linear systems with weakly dependent errors. J Time Ser Anal 39:212–238
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang CH, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76:217–242
Zhang K, Janson L, Murphy S (2020) Inference for batched bandits. arXiv preprint arXiv:2002.03217v2
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
Xiao Guo’s research is supported by the National Natural Science Foundation of China, grants 12071452, 72091212, 11601500, and USTC Research Funds of the Double First-Class Initiative, grant YD2040002013. The authors also thank the Editor, Associate Editor, and two anonymous referees for their constructive comments that have led to a substantial improvement of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix: Proofs of main results
Technical lemmas
Recall that \(\varepsilon _i=g(\ldots ,\xi _{i-1},\xi _i)=g({\mathcal {F}}_i)\) in (6). Define
$$\begin{aligned} \varepsilon _i^*={\mathbb {E}}(\varepsilon _i\mid {\mathcal {F}}_{i-m,i}), \end{aligned}$$
where \({\mathcal {F}}_{i-m,i}\) is the \(\sigma \)-algebra generated by \((\xi _{i-m},\ldots ,\xi _i)\). Then \(\{\varepsilon _i^*\}\) is a sequence of m-dependent random variables with mean zero. To prove the main theorems of this paper, we first give some lemmas.
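To make the m-dependent approximation concrete, here is a small numerical sketch (our illustration, not part of the original argument). It simulates the causal linear process \(\varepsilon _i=\sum _{j\ge 0}\rho ^j\xi _{i-j}\), for which \(\varepsilon _i^*={\mathbb {E}}(\varepsilon _i\mid {\mathcal {F}}_{i-m,i})\) is simply the lag-m truncation of the series; the AR(1) form and all constants are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, burn, rho = 2000, 500, 0.6

# Innovations xi_i driving the causal process eps_i = g(F_i).
xi = rng.standard_normal(n + burn)

# Linear process eps_i = sum_{j>=0} rho^j xi_{i-j}, truncated at `burn` lags.
def causal_linear(xi, rho, burn):
    eps = np.zeros(len(xi))
    for i in range(len(xi)):
        j = np.arange(min(i + 1, burn))
        eps[i] = np.sum(rho ** j * xi[i - j])
    return eps

eps = causal_linear(xi, rho, burn)[burn:]

# m-dependent approximation eps_i^* = E(eps_i | xi_{i-m}, ..., xi_i):
# for this linear process it is the lag-m truncation of the series.
for m in (1, 3, 5, 10):
    eps_star = np.array([np.sum(rho ** np.arange(m + 1)
                                * xi[burn + i - np.arange(m + 1)])
                         for i in range(n)])
    print(m, np.max(np.abs(eps - eps_star)))  # shrinks like rho^(m+1)
```

The printed coupling errors decay geometrically in m, which is the mechanism the m-dependent approximation exploits below.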
Lemma A.1
(Concentration inequalities under dependence)
(i) Suppose \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \), where \(q>2\) and \(\alpha >0\), and \(\sum _{i=1}^na_i^2=n\). Let \({\mathbf {a}}=(a_1,\ldots ,a_n)^\top \), and let \(\zeta _n=1\) (resp. \(\zeta _n=(\log n)^{1+2q}\) or \(\zeta _n= n^{q/2-1-\alpha q}\)) if \(\alpha >1/2-1/q\) (resp. \(\alpha =1/2-1/q\) or \(\alpha <1/2-1/q\)). Then, with \(S_n=\sum _{i=1}^na_i\varepsilon _i\), for all \(x>0\) we have
$$\begin{aligned} {\mathbb {P}}(|S_n|\ge x)\le K_1\frac{\zeta _n\Vert {\mathbf {a}}\Vert _q^q{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{x^q}+K_2\exp \Big (-\frac{K_3x^2}{n{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ), \end{aligned}$$where \(K_1, K_2, K_3\) are constants that depend only on q and \(\alpha \).
(ii) Suppose \({\mathcal {D}}_z<\infty \) and \(\sum _{i=1}^na_i^2=n\). Let \(\alpha =2/(1+2z)\), and let \(c_\alpha \) be a constant depending only on \(\alpha \). Then, with \(S_n=\sum _{i=1}^na_i\varepsilon _i\), for all \(x>0\) we have
$$\begin{aligned} {\mathbb {P}}(|S_n|\ge nx)\le (2+\sqrt{2}c_\alpha )\exp \Big ( -\frac{(\sqrt{n}x/{\mathcal {D}}_z)^\alpha }{2e\alpha } \Big ). \end{aligned}$$
Proof
See Theorems 2 and 3 of Wu and Wu (2016). Details are omitted. \(\square \)
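As a sanity check on the shape of the bound in part (i), the following Monte Carlo sketch (our illustration; the AR(1) error model and the Gaussian reference term are arbitrary choices, and no attempt is made to match the constants \(K_1, K_2, K_3\)) compares the empirical tail of \(S_n=\sum _{i=1}^na_i\varepsilon _i\) with a Gaussian-type term.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho = 500, 2000, 0.5
a = np.ones(n)                       # weights with sum a_i^2 = n

def ar1_errors(n, rho, rng):
    eps = np.zeros(n)
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + rng.standard_normal()
    return eps

S = np.array([a @ ar1_errors(n, rho, rng) for _ in range(reps)])

lrv = 1.0 / (1 - rho) ** 2           # long-run variance of the AR(1) errors
for x in (1.0, 2.0, 3.0):
    emp = np.mean(np.abs(S) >= x * np.sqrt(n * lrv))
    gauss = 2 * np.exp(-x ** 2 / 2)  # Gaussian-type term, illustrative constants
    print(x, emp, gauss)
```

The empirical tail stays below the Gaussian-type term here, as expected when the polynomial term of the bound is negligible.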
Lemma A.2
Suppose that \(\varDelta _{0,q}<\infty \) for \(q\ge 2\). Let \(a_1,a_2,\ldots \in {\mathbb {R}} \), \(A_n=(\sum _{i=1}^n a_i^2)^{1/2}\), and \(C_q=18q^{3/2}(q-1)^{-1/2}.\) Then (i) \({\vert \vert \vert \sum _{i=1}^n a_i\varepsilon _i \vert \vert \vert }_q\le C_qA_n\varDelta _{0,q}\), and (ii) \({\vert \vert \vert \sum _{i=1}^n a_i(\varepsilon _i-\varepsilon _i^*) \vert \vert \vert }_q\le C_qA_n\varDelta _{m+1,q}.\)
Proof
This result can be proved by using the arguments of Lemma 1 in Liu and Wu (2010). \(\square \)
Lemma A.3
Suppose \( \varDelta _{m,q}<\infty \) for \(2 <q\le 4\) and let \(d=q/2\). Then for any \(j\in {\mathbb {Z}}\),
where
Proof
See Lemma 1 of Wu and Pourahmadi (2009). \(\square \)
Lemma A.4
For \(j\in \{1,\ldots ,p \}\), we have
Proof
This lemma follows from the KKT conditions for the nodewise Lasso.
First, notice that \(\widehat{z}_j^\top \widehat{\varvec{\gamma }}_j=\Vert \widehat{\varvec{\gamma }}_j\Vert _1\), where \(\widehat{z}_j\) is a subgradient of \(\Vert \varvec{\gamma }\Vert _1\) at \(\widehat{\varvec{\gamma }}_j\), so that \(\Vert \widehat{z}_j\Vert _\infty \le 1\). By the KKT conditions for the nodewise Lasso (5), we have
thus
Dividing each side of the above display by \(\widehat{\tau }_j^2\) yields
so that
Moreover, note that the KKT conditions for the nodewise lasso (5) can be written as
using \(\Vert \widehat{z}_j\Vert _\infty \le 1\) yields
which is equivalent to
since \((X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)={\mathbf {X}}\widehat{C}_j.\) Then, dividing both sides of the above display by \(\widehat{\tau }_j\) and using that \(\widehat{\varvec{\varTheta }}_j=\widehat{ C}_j/\widehat{\tau }_j^2\), we have
and combining this with \(X_j^\top {\mathbf {X}}\widehat{\varvec{\varTheta }}_j/n=1\), we obtain
\(\square \)
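The conclusion of Lemma A.4, namely the entrywise bound \(\Vert \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j-e_j\Vert _\infty \le \lambda _j/\widehat{\tau }_j^2\), can be verified numerically. A minimal sketch, assuming the nodewise objective \(\Vert X_j-{\mathbf {X}}_{-j}\varvec{\gamma }\Vert _2^2/n+2\lambda _j\Vert \varvec{\gamma }\Vert _1\), which sklearn's Lasso with alpha \(=\lambda _j\) solves; the data and tuning values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, j, lam = 200, 50, 0, 0.1
X = rng.standard_normal((n, p))

# Nodewise Lasso: regress X_j on the remaining columns.
# sklearn minimizes (1/(2n))||y - Xw||^2 + alpha*||w||_1, which matches
# (1/n)||X_j - X_{-j} g||^2 + 2*lam*||g||_1 when alpha = lam.
fit = Lasso(alpha=lam, fit_intercept=False).fit(np.delete(X, j, axis=1), X[:, j])
gamma = fit.coef_

resid = X[:, j] - np.delete(X, j, axis=1) @ gamma
tau2 = resid @ resid / n + lam * np.abs(gamma).sum()  # tau_j^2 as in the paper

C_j = np.insert(-gamma, j, 1.0)   # 1 at position j, -gamma elsewhere
Theta_j = C_j / tau2              # j-th row of the relaxed inverse

Sigma_hat = X.T @ X / n
e_j = np.eye(p)[j]
# KKT-based bound of Lemma A.4 (holds up to solver tolerance):
print(np.max(np.abs(Sigma_hat @ Theta_j - e_j)), "<=", lam / tau2)
```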
Lemma A.5
Suppose Assumptions 3–4 hold and \(\log p=o(\sqrt{n})\). Then there exist constants c and C such that, for all j,
Proof
By the definition of \(\widehat{C}_j\), we have \(\Vert {\mathbf {X}}\widehat{C}_j\Vert _2^2/n=\Vert X_{j}-{\mathbf {X}}_{-j} \widehat{\varvec{\gamma }}_{j}\Vert _{2}^2 / n :=\widetilde{\tau }_{j}^2,\) and \(\widehat{\varvec{\varTheta }}_j=\widehat{ C}_j/\widehat{\tau }_j^2\). Thus
Note that \(\widehat{\tau }_j^2=\widetilde{\tau }_j^2+\lambda _j\Vert \widehat{\varvec{\gamma }}_j\Vert _1\); thus we get \(\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _2^2/n\le C\) since \(\widetilde{\tau }_j^2\le \widehat{\tau }_j^2\le C\). Furthermore, by Lemma A.4, Assumptions 3–4 and \(\log p=o(\sqrt{n})\), we get
Thus
\(\square \)
Lemma A.6
Consider the linear model in (2) and the Lasso in (3).
(i) Suppose the error series \(\{\varepsilon _i\}\) has finite q-th moment for some \(q>2\), and \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \) for \( \alpha \ge 0.\) Define
$$\begin{aligned} v=\left\{ \begin{array}{ll} {1}/{2}, &{} { \text{ if } \alpha >1 / 2-1 / q} ,\\ {{1}/{q}+\alpha }, &{} { \text{ if } \alpha <1 / 2-1 / q}, \end{array} \right. \end{aligned}$$and let \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-v}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\},\) where A is a sufficiently large constant. Then, with probability at least \(1- C_1(\log p)^{-1}-C_2p^{1-K_3A^2}\), we have
$$\begin{aligned} \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1\le Cs_0\lambda \quad \mathrm{and }\quad \Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2/n\le Cs_0\lambda ^2. \end{aligned}$$(A.1)
(ii) Suppose the error series \(\{\varepsilon _i\}\) has finite exponential moment, that is, \({\mathcal {D}}_z<\infty \). Let \(\alpha =2/(1+2z)\), let \(c_\alpha \) be a constant depending only on \(\alpha \), and let \(\lambda =An^{-1/2}(\log p)^{1/\alpha }{\mathcal {D}}_z\), where A is a sufficiently large constant. Then, with probability at least \(1- C_1p^{1-C_2A^\alpha }\), the bounds (A.1) hold.
Proof
Since \(\widehat{\varvec{\beta }}\) minimizes (3), we have
Define the event \({\mathcal {T}}:=\{\max _{1\le j\le p}|\varGamma _j|\le \lambda /c\}\) for some constant \(c>0\), where \(\varGamma _j=n^{-1}\sum _{i=1}^nx_{ij}\varepsilon _i.\) Then on the event \({\mathcal {T}}\), using the fact
inequality (A.2) implies that
On the left-hand side of (A.3), using the triangle inequality,
whereas on the right-hand side of (A.3), we can use
Thus, we have
In particular, the above implies
Then Assumption 1 shows that
combining (A.4), we get
Furthermore,
we have
Similarly,
Now we need to control the probability \({\mathbb {P}}({\mathcal {T}}).\) We consider two cases.
(i) Suppose the error series \(\{\varepsilon _i\}\) has finite q-th moment for some \(q>2\), and \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \) for \( \alpha \ge 0.\) For \(\alpha >1/2-1/q,\) let \(v=1/2\); then, by the inequality of Lemma A.1 (i) with \(\zeta _n=1\), we have
$$\begin{aligned} \begin{aligned} {\mathbb {P}}(|\varGamma _j|\ge \lambda /c)&\le K_1{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q\frac{\sum _{i=1}^n|x_{ij}|^q}{(n\lambda )^q}+K_2\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big )\\&\le K_1\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ) \end{aligned} \end{aligned}$$Hence
$$\begin{aligned} \begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)&\le \sum _{j=1}^p K_1\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2p\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big )\\&= K_1p\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2p\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ), \end{aligned} \end{aligned}$$under our choice of \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-1/2}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\}\) where A is a sufficiently large constant, we have
$$\begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le C_1(\log p)^{-1}+C_2p^{1-K_3A^2}. \end{aligned}$$The case \(\alpha <1/2-1/q\) is proved similarly.
(ii) Suppose the error series \(\{\varepsilon _i\}\) satisfies the stronger moment condition \({\mathcal {D}}_z<\infty \). By Lemma A.1 (ii), we have
$$\begin{aligned} {\mathbb {P}}(|\varGamma _j|\ge \lambda /c)\le (2+\sqrt{2}c_\alpha )\exp \Big ( -\frac{(\sqrt{n}\lambda /(c{\mathcal {D}}_z) )^\alpha }{2e\alpha }\Big ). \end{aligned}$$Hence
$$\begin{aligned} \begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le \sum _{j=1}^p{\mathbb {P}}(|\varGamma _j|\ge \lambda /c)= (2+\sqrt{2}c_\alpha )p\exp \Big ( -\frac{(\sqrt{n}\lambda /(c {\mathcal {D}}_z ) )^\alpha }{2e\alpha }\Big ). \end{aligned} \end{aligned}$$Letting \(\lambda =An^{-1/2}(\log p)^{1/\alpha }{\mathcal {D}}_z\), we get
$$\begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le C_1p^{1-C_2A^\alpha }. \end{aligned}$$
This completes the proof. \(\square \)
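To see the rates of (A.1) at work, here is a simulation sketch (ours; the design, the AR(1) error model, and the constant A are arbitrary illustrative choices). It scales \(\lambda \) by the long-run standard deviation of the errors, in the spirit of part (i), and reports the two losses next to their nominal orders \(s_0\lambda \) and \(s_0\lambda ^2\):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p, s0, rho, A = 300, 500, 5, 0.5, 1.0

X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s0] = 1.0

eps = np.zeros(n)                   # AR(1): stationary, short-range dependent errors
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + rng.standard_normal()
y = X @ beta + eps

lrv = 1.0 / (1 - rho) ** 2          # long-run variance, standing in for |||eps|||_{2,alpha}^2
lam = A * np.sqrt(lrv * np.log(p) / n)

b = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

print("l1 error:", np.abs(b - beta).sum(), "  s0*lam =", s0 * lam)
print("pred error:", ((X @ (b - beta)) ** 2).sum() / n, "  s0*lam^2 =", s0 * lam ** 2)
```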
Lemma A.7
Assume the conditions of Lemma A.6 hold and that \(s_0\lambda ^2=o(1)\). Then for any fixed k, we have
Proof
By using the results in Lemma A.6, we have
thus \(e_k{\mathop {\longrightarrow }\limits ^{P}}\varepsilon _k\) for \(k=0,\ldots ,n-1\). Then for any fixed k,
by the ergodicity condition, which completes the proof. \(\square \)
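Lemma A.7 is, in essence, the statement that the Lasso residuals track the true errors uniformly once \(s_0\lambda ^2=o(1)\), since \(|e_k-\varepsilon _k|\le \Vert {\mathbf {x}}_k\Vert _\infty \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1\). A quick self-contained check (ours; i.i.d. errors and all constants are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p, s0 = 300, 500, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s0] = 1.0
eps = rng.standard_normal(n)
y = X @ beta + eps

lam = np.sqrt(np.log(p) / n)
b = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

resid = y - X @ b
# |e_k - eps_k| <= max_j |x_kj| * ||b - beta||_1, small when s0*lam^2 = o(1)
print(np.max(np.abs(resid - eps)), np.abs(b - beta).sum())
```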
Technical proofs
1.1 Proof of Theorem 1
Proof
Based on the definition of the desparsifying Lasso (4), and using
simple algebra yields
where \(\varvec{\varLambda }=-\sqrt{n}(\widehat{\varvec{\varTheta }}\widehat{\varvec{\varSigma }}-{\mathbf {I}}_p)(\widehat{\varvec{\beta }}-\varvec{\beta }^*).\) Thus,
By using the fact that for any matrix \(\mathbf{A}\in {\mathbb {R}}^{m\times n}\) and any vector \(x\in {\mathbb {R}}^{n\times 1}\),
$$\begin{aligned} \Vert \mathbf{A}x\Vert _\infty \le \Big (\max _{i,j}|A_{ij}|\Big )\Vert x\Vert _1, \end{aligned}$$
we get
By Lemma A.4, we have
(i) If the error sequence has finite q-th moment, then Lemma A.6 (i) with \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-v}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\}\) yields
$$\begin{aligned} {\mathbb {P}}\Big (\Vert \varvec{\varLambda }\Vert _\infty \ge C\sqrt{n}s_0\lambda \Big (\max \limits _j\frac{\lambda _j}{\widehat{\tau }_j^2}\Big ) \Big )\le C_1(\log p)^{-1}+C_2p^{1-K_3A^2}. \end{aligned}$$
(ii) If the error sequence has finite exponential moment, then Lemma A.6 (ii) with \(\lambda =An^{-1/2}(\log p)^{1/\alpha }\mathcal {D}_z\) yields
$$\begin{aligned} {\mathbb {P}}\Big (\Vert \varvec{\varLambda }\Vert _\infty \ge C\sqrt{n}s_0\lambda \Big (\max \limits _j\frac{\lambda _j}{\widehat{\tau }_j^2}\Big ) \Big )\le C_1p^{1-C_2A^\alpha }. \end{aligned}$$
The remaining argument applies whether the error sequence has a finite q-th moment or a finite exponential moment; we use the m-dependent approximation to prove that (7) holds.
For some constants \(2<d<\infty \) and \(2r<\eta <1/2\), where r is given in Assumption 4, let \(\zeta _n=\lfloor n^{\eta }/(\log n)^d\rfloor \), \(t_n=\lfloor n^{1/2-\eta }\rfloor \) and \(w_n=\lfloor n/\zeta _n-t_n\rfloor \). Define \(\varvec{\varepsilon }^*=(\varepsilon _1^*,\ldots ,\varepsilon _n^*)^\top \) with
Thus \(\{\varepsilon _i^*\}\) is a \(t_n\)-dependent sequence. For any \(k\in \{1,\ldots ,p\}\), according to Lemma A.2 (ii) and Lemma A.5, we have
since the process \(\{\varepsilon _i\}\) is short-range dependent. Therefore, showing
is equivalent to showing
Write \({\mathbf {v}}=n^{-1/2}\widehat{\varvec{\varTheta }}_k^\top \mathbf { X}^\top /{\sigma _k} =(v_1,\ldots ,v_n)\), where \(v_j=n^{-1/2}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_j/{\sigma _k}\). Define
Therefore
For n large enough, \(\{\varvec{\varXi }_i\}_{i=1}^{\zeta _n}\) are independent and \(\{\varvec{\varXi }_i^\prime \}_{i=1}^{\zeta _n}\) are independent, since \(\{\varepsilon _i^*\}\) is \(t_n\)-dependent. In the following, we will show that
By Lemma A.5, we get
and
where \(\lambda _{\min }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})\) and \(\lambda _{\max }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})\) are the minimum and maximum eigenvalues of \({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\), respectively, since the eigenvalues of \({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}=\mathrm{Cov}(\varvec{\varepsilon })\) are bounded away from zero and infinity (see Section 5.2 in Grenander and Szegö 1958). Thus we have
and
Therefore,
which, together with \({\mathbb {E}}(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime )=0\), implies \(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime =o_{{\mathbb {P}}}(1)\).
By Lemma A.2(ii), we have
Then
implying that \(\lim _{n\rightarrow \infty }\mathrm{Var}(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i)=1\), since \(\mathrm{Var}( \sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime )=o(1)\).
According to Lemma A.2 (ii), the Lyapunov condition follows:
for some constant \(c>0\). Hence, by the central limit theorem,
that is
which completes the proof. \(\square \)
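For reference, the estimator analyzed in this proof is the one-step correction \(\widehat{b}=\widehat{\varvec{\beta }}+\widehat{\varvec{\varTheta }}{\mathbf {X}}^\top (y-{\mathbf {X}}\widehat{\varvec{\beta }})/n\), with remainder \(\varvec{\varLambda }\) controlled via Lemmas A.4 and A.6. A schematic sketch of the debiasing step for a single coordinate (ours; all tuning choices are illustrative, and the variance step is deferred to the banded estimator below):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p, s0, k = 200, 100, 3, 0
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s0] = 1.0
y = X @ beta + rng.standard_normal(n)

lam = np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Nodewise Lasso for coordinate k gives the k-th row of Theta_hat (cf. Lemma A.4).
g = Lasso(alpha=lam, fit_intercept=False).fit(np.delete(X, k, axis=1), X[:, k]).coef_
r = X[:, k] - np.delete(X, k, axis=1) @ g
tau2 = r @ r / n + lam * np.abs(g).sum()
Theta_k = np.insert(-g, k, 1.0) / tau2

# Desparsified estimate of beta_k: Lasso plus a one-step bias correction,
# which removes the Lasso shrinkage bias on active coordinates.
b_k = beta_hat[k] + Theta_k @ (X.T @ (y - X @ beta_hat)) / n
print(beta_hat[k], b_k, beta[k])
```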
1.2 Proof of Lemma 1
Proof
By using the fact that for any matrix \(\mathbf{A}\in {\mathbb {R}}^{m\times n}\), \(\Vert \mathbf{A}\Vert _2\le \sqrt{\Vert \mathbf{A}\Vert _1\Vert \mathbf{A}\Vert _\infty }\),
we have
$$\begin{aligned} \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2\le \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _\infty , \end{aligned}$$since \(\widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\) is symmetric. Thus
Note that for \(i\ge 0\),
By Lemma A.3, there exists a constant \(c_q\) depending only on q such that
thus
Note that
according to Lemma A.1, we get
or
Combining this with Lemma A.6 yields \(D_2 = O_{{\mathbb {P}}}(s_0\lambda ^2).\) Moreover,
thus \(D_3 =O_{{\mathbb {P}}}(s_0^2\lambda ^2).\) Then we get
since \(D_2=o_{{\mathbb {P}}}(D_3)\) and \(D_4=O(l^2n^{-1})=o_{{\mathbb {P}}}(D_1)\). Therefore
Letting \(l\rightarrow \infty \) with \(l(n^{2/q-1}+s_0^2\lambda ^2)=o(1)\) for \(2<q\le 4\) yields
and the proof is complete. \(\square \)
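For completeness, the banded estimator \(\widehat{\varvec{\varSigma }}_{n,l}^{\varvec{\varepsilon }}\) treated in this lemma can be formed from residuals as a banded Toeplitz matrix of sample autocovariances. A minimal sketch (ours; the AR(1) pseudo-residuals and the choice \(l=10\) are purely illustrative, whereas the paper selects l in a data-driven way):

```python
import numpy as np
from scipy.linalg import toeplitz

def banded_autocov(e, l):
    """Banded Toeplitz estimate of Cov(eps): sample autocovariances up to lag l."""
    n = len(e)
    gamma = np.zeros(n)               # lags beyond l stay zero (the banding)
    for j in range(min(l + 1, n)):
        gamma[j] = e[: n - j] @ e[j:] / n
    return toeplitz(gamma)

# Toy usage with AR(1) pseudo-residuals:
rng = np.random.default_rng(6)
e = np.zeros(500)
for i in range(1, 500):
    e[i] = 0.5 * e[i - 1] + rng.standard_normal()
Sigma_hat = banded_autocov(e, l=10)
print(Sigma_hat[0, :4])   # roughly gamma(0), ..., gamma(3) of the AR(1)
```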
1.3 Proof of Theorem 2
Proof
Since \( \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2=o_{{\mathbb {P}}}(1)\) by Lemma 1, we get
Thus \(\widehat{\sigma }_k^2\) is a consistent estimator of \(\sigma _k^2\). Combining Theorem 1, Lemma 1 and Slutsky's theorem completes the proof. \(\square \)
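Putting the pieces together, Theorem 2 licenses the plug-in interval \(\widehat{b}_k\pm z_{1-\alpha /2}\,\widehat{\sigma }_k/\sqrt{n}\) with \(\widehat{\sigma }_k^2=\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \widehat{\varvec{\varSigma }}_{n,l}^{\varvec{\varepsilon }}{\mathbf {X}}\widehat{\varvec{\varTheta }}_k/n\). An end-to-end schematic sketch combining the previous pieces (ours; \(\lambda \), \(\lambda _k\) and l are illustrative fixed values rather than the paper's data-driven choices):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import norm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p, s0, k, l = 400, 200, 3, 0, 8
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s0] = 1.0
eps = np.zeros(n)
for i in range(1, n):                  # AR(1) stationary errors
    eps[i] = 0.5 * eps[i - 1] + rng.standard_normal()
y = X @ beta + eps

lam = 2 * np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Nodewise Lasso row Theta_k (Lemma A.4).
g = Lasso(alpha=lam, fit_intercept=False).fit(np.delete(X, k, axis=1), X[:, k]).coef_
r = X[:, k] - np.delete(X, k, axis=1) @ g
tau2 = r @ r / n + lam * np.abs(g).sum()
Theta_k = np.insert(-g, k, 1.0) / tau2

# Debiased estimate (Theorem 1) and banded long-run variance (Lemma 1 / Theorem 2).
b_k = beta_hat[k] + Theta_k @ (X.T @ (y - X @ beta_hat)) / n
e = y - X @ beta_hat
gamma = np.zeros(n)
for j in range(l + 1):
    gamma[j] = e[: n - j] @ e[j:] / n
sigma2_k = Theta_k @ X.T @ toeplitz(gamma) @ X @ Theta_k / n

half = norm.ppf(0.975) * np.sqrt(sigma2_k / n)
print(f"95% CI for beta_{k}: [{b_k - half:.3f}, {b_k + half:.3f}], true value {beta[k]}")
```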
About this article
Cite this article
Yuan, P., Guo, X. High-dimensional inference for linear model with correlated errors. Metrika 85, 21–52 (2022). https://doi.org/10.1007/s00184-021-00820-7