
High-dimensional inference for linear model with correlated errors


Abstract

Temporally correlated error processes are commonly encountered in practice and pose significant challenges in high-dimensional statistical analysis. This paper conducts low-dimensional inference for high-dimensional linear models with stationary errors. We adopt the framework of the functional dependence measure to adequately accommodate the error correlation. A new desparsifying-Lasso-based testing procedure is developed by incorporating a banded estimator of the error autocovariance matrix. Asymptotic normality of the proposed estimator is established by demonstrating the consistency of the banded autocovariance matrix estimator. The result indicates that the admissible range of p becomes substantially narrower when the moment condition on the errors weakens or the dependence becomes stronger. We further develop a data-driven choice of the banding parameter. Simulation studies illustrate the satisfactory finite-sample performance of the proposed procedure, and a real data example is also presented for illustration.



Abbreviations

i.i.d.:

Independent and identically distributed

CDF:

Cumulative distribution function

AR:

Autoregressive model

MA:

Moving average model

Cov:

Coverage probability of the confidence intervals

Len:

Length of the confidence intervals

Ave:

Averaged estimated parameters

Esd:

Empirical standard deviation

ACov:

Averaged coverage probability of the confidence intervals

ALen:

Averaged length of the confidence intervals

References

  • Adamek R, Smeekes S, Wilms I (2020) Lasso inference for high-dimensional time series. arXiv preprint arXiv:2007.10952v1

  • Babii A, Ghysels E, Striaukas J (2020) Inference for high-dimensional regressions with heteroskedasticity and autocorrelation. arXiv preprint arXiv:1912.06307v2

  • Basu S, Michailidis G (2015) Regularized estimation in sparse high-dimensional time series models. Ann Stat 43:1535–1567


  • Bickel PJ, Levina E (2008) Regularized estimation of large covariance matrices. Ann Stat 36:199–227


  • Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, New York


  • Candes E, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35:2313–2404


  • Chernozhukov V, Härdle WK, Huang C, Wang W (2020) Lasso-driven inference in time and space. arXiv preprint arXiv:1806.05081v4

  • de Mol C, Giannone D, Reichlin L (2008) Forecasting using a large number of predictors: is Bayesian shrinkage a valid alternative to principal components? J Econ 146:318–328


  • Deshpande Y, Mackey L, Syrgkanis V, Taddy M (2018) Accurate inference for adaptive linear models. In: Proceedings of the 35th international conference on machine learning, pp 1202–1211

  • Deshpande Y, Javanmard A, Mehrabi M (2020) Online debiasing for adaptively collected high-dimensional data with applications to time series analysis. arXiv preprint arXiv:1911.01040v3

  • Fan JQ, Li RZ (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360


  • Fan JQ, Lv JC (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20:101–148


  • Fan JQ, Qi L, Tong X (2016) Penalized least squares estimation with weakly dependent data. Sci China Math 59:2335–2354


  • Grenander U, Szegö G (1958) Toeplitz forms and their applications. Cambridge University Press, London


  • Gupta S (2012) A note on the asymptotic distribution of lasso estimator for correlated data. Sankhya A 74:10–28


  • Han Y, Tsay R (2020) High-dimensional linear regression for dependent data with applications to nowcasting. Stat Sin to appear

  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the Lasso and generalizations. Taylor & Francis Group, New York


  • Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15:2869–2909


  • Liu WD, Wu WB (2010) Asymptotics of spectral density estimates. Econ Theory 26:1218–1245


  • Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York


  • Raskutti G, Wainwright MJ, Yu B (2010) Restricted eigenvalue properties for correlated Gaussian designs. J Mach Learn Res 11:2241–2259


  • Shao X, Wu WB (2007) Asymptotic spectral theory for nonlinear time series. Ann Stat 35:1773–1801


  • Smith SM (2012) The future of FMRI connectivity. NeuroImage 62:1257–1266


  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288


  • van de Geer S, Bühlmann P (2009) On the conditions used to prove oracle results for the Lasso. Electron J Stat 3:1360–1392


  • van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann Stat 42:1166–1202


  • Wainwright MJ (2019) High-dimensional statistics: a non-asymptotic viewpoint. Cambridge University Press, Cambridge


  • Wang H, Li G, Tsai CL (2007) Regression coefficient and autoregressive order shrinkage and selection via the lasso. J R Stat Soc B 69:63–78


  • Wong K, Li Z, Tewari A (2020) Lasso guarantees for \(\beta \)-mixing heavy tailed time series. Ann Stat 48:1124–1142


  • Wu WB (2005) Nonlinear system theory: another look at dependence. Proc Natl Acad Sci USA 102:14150–14154


  • Wu WB, Pourahmadi M (2009) Banding sample autocovariance matrices of stationary processes. Stat Sin 19:1755–1768


  • Wu WB, Wu YN (2016) Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron J Stat 10:352–379


  • Xie F, Xiao ZJ (2018) Square-root lasso for high-dimensional sparse linear systems with weakly dependent errors. J Time Ser Anal 39:212–238


  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942


  • Zhang CH, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76:217–242


  • Zhang K, Janson L, Murphy S (2020) Inference for batched bandits. arXiv preprint arXiv:2002.03217v2

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429



Acknowledgements

Xiao Guo’s research is supported by the National Natural Science Foundation of China, grants 12071452, 72091212 and 11601500, and the USTC Research Funds of the Double First-Class Initiative, grant YD2040002013. The authors also thank the Editor, Associate Editor, and two anonymous referees for their constructive comments, which have led to a substantial improvement of this paper.

Author information


Corresponding author

Correspondence to Xiao Guo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Proofs of main results

Technical lemmas

Recall that \(\varepsilon _i=g(\ldots ,\xi _{i-1},\xi _i)=g({\mathcal {F}}_i)\) in (6). Define

$$\begin{aligned} \varepsilon ^*_i:=\varepsilon _{i,m}={\mathbb {E}}(\varepsilon _i|\xi _{i-m},\ldots ,\xi _i)={\mathbb {E}}(\varepsilon _i|{\mathcal {F}}_{i-m,i}),\quad m\ge 0, \end{aligned}$$

where \({\mathcal {F}}_{i-m,i}\) is the \(\sigma \)-algebra generated by \((\xi _{i-m},\ldots ,\xi _i)\). Then \(\{\varepsilon _i^*\}\) is a sequence of m-dependent random variables with mean zero. To prove the main theorems in this paper, we first give some lemmas.
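
For a concrete illustration of the m-dependent approximation, the following Python sketch (not part of the paper; the AR(1) coefficient and the truncation length are arbitrary assumptions) simulates a causal linear process \(\varepsilon _i=\sum _{j\ge 0}a_j\xi _{i-j}\), for which \({\mathbb {E}}(\varepsilon _i|\xi _{i-m},\ldots ,\xi _i)\) is simply the truncated sum over \(j\le m\), and compares the empirical \(L^2\) distance \(\Vert \varepsilon _i-\varepsilon _i^*\Vert _2\) with the theoretical tail norm.

```python
import numpy as np

# Minimal illustration (not from the paper) of the m-dependent approximation
# for a causal linear process eps_i = sum_{j>=0} a_j xi_{i-j}: conditioning on
# (xi_{i-m}, ..., xi_i) simply truncates the sum at j = m.
rng = np.random.default_rng(0)
rho = 0.5                       # AR(1) coefficient, a_j = rho**j (an assumption)
K = 200                         # truncation length used only for simulation
a = rho ** np.arange(K)

n, m = 1000, 10
xi = rng.standard_normal(n + K)            # i.i.d. innovations
idx = np.arange(K)
eps = np.array([a @ xi[i + K - 1 - idx] for i in range(n)])
eps_star = np.array([a[:m + 1] @ xi[i + K - 1 - idx[:m + 1]] for i in range(n)])

# empirical ||eps_i - eps*_i||_2 versus the theoretical tail norm
print(np.sqrt(np.mean((eps - eps_star) ** 2)),
      np.sqrt(np.sum(a[m + 1:] ** 2)))
```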

Lemma A.1

(Concentration inequalities under dependence)

  1. (i)

    Suppose \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \) for some \(q>2\) and \(\alpha >0\), and \(\sum _{i=1}^na_i^2=n\). Let \({\mathbf {a}}=(a_1,\ldots ,a_n)^\top \) and \(S_n=\sum _{i=1}^na_i\varepsilon _i\), and set \(\zeta _n=1\) if \(\alpha >1/2-1/q\), \(\zeta _n=(\log n)^{1+2q}\) if \(\alpha =1/2-1/q\), and \(\zeta _n= n^{q/2-1-\alpha q}\) if \(\alpha <1/2-1/q\). Then for all \(x>0\), we have

    $$\begin{aligned} {\mathbb {P}}(|S_n|\ge x)\le K_1\frac{\zeta _n\Vert {\mathbf {a}}\Vert _q^q{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{x^q}+K_2\exp \Big (-\frac{K_3x^2}{n{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ), \end{aligned}$$

    where \(K_1, K_2, K_3\) are constants that depend only on \(q\) and \(\alpha \).

  2. (ii)

    Suppose \({\mathcal {D}}_z<\infty \) and \(\sum _{i=1}^na_i^2=n\). Let \(\alpha =2/(1+2z)\) and let \(c_\alpha \) be a constant depending only on \(\alpha \). Then for all \(x>0\), with \(S_n=\sum _{i=1}^na_i\varepsilon _i\), we have

    $$\begin{aligned} {\mathbb {P}}(|S_n|\ge nx)\le (2+\sqrt{2}c_\alpha )\exp \Big ( -\frac{(\sqrt{n}x/{\mathcal {D}}_z)^\alpha }{2e\alpha } \Big ). \end{aligned}$$

Proof

See Theorems 2 and 3 of Wu and Wu (2016). Details are omitted. \(\square \)

Lemma A.2

Suppose that \(\varDelta _{0,q}<\infty \) for \(q\ge 2\). Let \(a_1,a_2,\ldots \in {\mathbb {R}} \), \(A_n=(\sum _{i=1}^n a_i^2)^{1/2}\), and \(C_q=18q^{3/2}(q-1)^{-1/2}.\) Then (i) \({\vert \vert \vert \sum _{i=1}^n a_i\varepsilon _i \vert \vert \vert }_q\le C_qA_n\varDelta _{0,q}\), and (ii) \({\vert \vert \vert \sum _{i=1}^n a_i(\varepsilon _i-\varepsilon _i^*) \vert \vert \vert }_q\le C_qA_n\varDelta _{m+1,q}.\)

Proof

This result can be proved using the arguments in Lemma 1 of Liu and Wu (2010). \(\square \)

Lemma A.3

Suppose \( \varDelta _{m,q}<\infty \) for \(2 <q\le 4\). Let \(d=q/2\). Then for any \(j\in {\mathbb {Z}}\),

$$\begin{aligned} {\left| \left| \left| \sum _{i=1}^n\varepsilon _i\varepsilon _{i+j}-n\gamma _{j}^{\varvec{\varepsilon }} \right| \right| \right| }_d\le 2B_dn^{1/d}{\vert \vert \vert \varepsilon _1 \vert \vert \vert }_q\varDelta _{0,q}, \end{aligned}$$

where

$$\begin{aligned} B_{d}=\left\{ \begin{array}{ll} \frac{18 d^{3/2}}{(d-1)^{1/2}}, &{} \text{ if } d \ne 2, \\ 1, &{} \text{ if } d=2. \end{array}\right. \end{aligned}$$

Proof

See Lemma 1 of Wu and Pourahmadi (2009). \(\square \)

Lemma A.4

For \(j\in \{1,\ldots ,p \}\), we have

$$\begin{aligned} \Vert \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j-{\mathbf {e}}_j\Vert _\infty \le \frac{ \lambda _j}{\widehat{\tau }_j^2}. \end{aligned}$$

Proof

This lemma is easily proved using the KKT conditions for the nodewise Lasso.

First, notice that \(\widehat{z}_j^\top \widehat{\varvec{\gamma }}_j=\Vert \widehat{\varvec{\gamma }}_j\Vert _1\), where \(\widehat{z}_j\) is an element of the subdifferential of \(\Vert \varvec{\gamma }\Vert _1\) evaluated at \(\widehat{\varvec{\gamma }}_j\), so that \(\Vert \widehat{z}_j\Vert _\infty \le 1\). By the KKT conditions for the nodewise Lasso (5), we have

$$\begin{aligned} -\frac{1}{n}(X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)^\top {\mathbf {X}}_{-j}+\lambda _j \widehat{z}_j=0, \end{aligned}$$

thus

$$\begin{aligned} \frac{(X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)^\top (X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)}{n}+\lambda _j||\widehat{\varvec{\gamma }}_j||_1=\frac{X_j^T(X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)}{n}=\widehat{\tau }_j^2. \end{aligned}$$

Dividing each side of the above display by \(\widehat{\tau }_j^2\) yields

$$\begin{aligned} 1=\frac{X_j^T(X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)}{n\widehat{\tau }_j^2}=\frac{X_j^T{\mathbf {X}}\widehat{C}_j}{n\widehat{\tau }_j^2}, \end{aligned}$$

so that

$$\begin{aligned} \frac{X_j^T{\mathbf {X}}\widehat{\varvec{\varTheta }}_j}{n}=1. \end{aligned}$$

Moreover, note that the KKT conditions for the nodewise lasso (5) can be written as

$$\begin{aligned} \widehat{z}_j=\frac{{\mathbf {X}}_{-j}^\top (X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j) }{n\lambda _j}, \end{aligned}$$

using \(\Vert \widehat{z}_j\Vert _\infty \le 1\) yields

$$\begin{aligned} \Big \Vert \frac{{\mathbf {X}}_{-j}^\top (X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j) }{n\lambda _j}\Big \Vert _\infty \le 1, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \frac{\Vert {\mathbf {X}}_{-j}^\top {\mathbf {X}}\widehat{C}_j\Vert _\infty }{n}\le \lambda _j, \end{aligned}$$

since \((X_j-{\mathbf {X}}_{-j}\widehat{\varvec{\gamma }}_j)={\mathbf {X}}\widehat{C}_j.\) Then, dividing both sides of the above display by \(\widehat{\tau }_j^2\) and using that \(\widehat{\varvec{\varTheta }}_j=\widehat{ C}_j/\widehat{\tau }_j^2\), we have

$$\begin{aligned} \frac{\Vert {\mathbf {X}}_{-j}^T{\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _\infty }{n}\le \frac{\lambda _j}{\widehat{\tau }_j^2}, \end{aligned}$$

and combining this with \(X_j^\top {\mathbf {X}}\widehat{\varvec{\varTheta }}_j/n=1\), we obtain

$$\begin{aligned} \Vert \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j-{\mathbf {e}}_j\Vert _\infty \le \frac{ \lambda _j}{\widehat{\tau }_j^2}. \end{aligned}$$

\(\square \)
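
The bound of Lemma A.4 can be checked numerically. The following Python sketch (not the authors' code; the AR(1)-type design covariance, the dimensions and the penalty level are arbitrary assumptions) fits the nodewise Lasso for one column with scikit-learn, forms \(\widehat{\varvec{\varTheta }}_j=\widehat{C}_j/\widehat{\tau }_j^2\), and verifies \(\Vert \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j-{\mathbf {e}}_j\Vert _\infty \le \lambda _j/\widehat{\tau }_j^2\) up to solver tolerance.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Numerical check (not the authors' code) of the Lemma A.4 bound
# ||Sigma_hat Theta_hat_j - e_j||_inf <= lambda_j / tau_hat_j^2.
rng = np.random.default_rng(1)
n, p, j, lam_j = 200, 50, 0, 0.1

S = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(S).T
Sigma_hat = X.T @ X / n

# nodewise Lasso of X_j on the remaining columns; sklearn's objective
# ||y - Zw||^2/(2n) + alpha*||w||_1 matches the penalty with alpha = lambda_j
Z = np.delete(X, j, axis=1)
gamma_j = Lasso(alpha=lam_j, fit_intercept=False, max_iter=10000).fit(Z, X[:, j]).coef_

resid = X[:, j] - Z @ gamma_j
tau2_j = resid @ resid / n + lam_j * np.abs(gamma_j).sum()
Theta_j = np.insert(-gamma_j, j, 1.0) / tau2_j     # j-th column of Theta_hat

lhs = np.abs(Sigma_hat @ Theta_j - np.eye(p)[j]).max()
print(lhs, lam_j / tau2_j)                         # lhs should not exceed the bound
```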

Lemma A.5

Suppose Assumptions 3–4 hold and \(\log p=o(\sqrt{n})\). Then there exist constants c and C such that, for all j,

$$\begin{aligned} c\le n^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _2^2\le C. \end{aligned}$$

Proof

By the definition of \(\widehat{C}_j\), we have \(\Vert {\mathbf {X}}\widehat{C}_j\Vert _2^2/n=\Vert X_{j}-{\mathbf {X}}_{-j} \widehat{\varvec{\gamma }}_{j}\Vert _{2}^2 / n :=\widetilde{\tau }_{j}^2,\) and \(\widehat{\varvec{\varTheta }}_j=\widehat{ C}_j/\widehat{\tau }_j^2\). Thus

$$\begin{aligned} \widetilde{\tau }_{j}^2=\frac{1}{n}\Vert {\mathbf {X}}\widehat{C}_j\Vert _2^2=\frac{1}{n}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _2^2\cdot \widehat{\tau }_j^4. \end{aligned}$$

Note that \(\widehat{\tau }_j^2=\widetilde{\tau }_j^2+\lambda _j\Vert \widehat{\varvec{\gamma }}_j\Vert _1\), thus we get \(\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _2^2/n\le C\) since \(\widetilde{\tau }_j^2\le \widehat{\tau }_j^2\le C\). Furthermore, according to Lemma A.4, Assumptions 3–4 and \(\log p=o(\sqrt{n})\), we get

$$\begin{aligned} \begin{aligned} |\widehat{\varvec{\varTheta }}_j^T( \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j -{\mathbf {e}}_j)|&\le \Vert \widehat{\varvec{\varTheta }}_j\Vert _1\Vert \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j -{\mathbf {e}}_j\Vert _{\infty }\\&= O(a_n\lambda _j)\\&=o(n^r\sqrt{\log (p)/n})=o(1). \end{aligned} \end{aligned}$$

Thus

$$\begin{aligned} \begin{aligned} n^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_j\Vert _2^2&=\widehat{\varvec{\varTheta }}_j^T\widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j=\widehat{\varvec{\varTheta }}_j^T{\mathbf {e}}_j+\widehat{\varvec{\varTheta }}_j^T( \widehat{\varvec{\varSigma }}\widehat{\varvec{\varTheta }}_j -{\mathbf {e}}_j)\\&=1/\widehat{\tau }_j^2+o(1)\ge c. \end{aligned} \end{aligned}$$

\(\square \)

Lemma A.6

Consider the linear model in (2) and the Lasso in (3).

  1. (i)

    Suppose that the error series \(\{\varepsilon _i\}\) has finite q-th moment, \(q>2\), and \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \) for some \( \alpha \ge 0.\) Define

    $$\begin{aligned} v=\left\{ \begin{array}{ll} 1/2, &{} \text{ if } \alpha >1/2-1/q, \\ 1/q+\alpha , &{} \text{ if } \alpha <1/2-1/q, \end{array} \right. \end{aligned}$$

    and let \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-v}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\},\) where A is a sufficiently large constant. Then, with probability at least \(1- C_1(\log p)^{-1}-C_2p^{1-K_3A^2}\), we have

    $$\begin{aligned} \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1\le Cs_0\lambda \quad \mathrm{and }\quad \Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2/n\le Cs_0\lambda ^2. \end{aligned}$$
    (A.1)
  2. (ii)

    Suppose that the error series \(\{\varepsilon _i\}\) has a finite exponential moment, that is, \({\mathcal {D}}_z<\infty \). Let \(\alpha =2/(1+2z)\), let \(c_\alpha \) be a constant depending only on \(\alpha \), and let \(\lambda =An^{-1/2}(\log p)^{1/\alpha }{\mathcal {D}}_z\), where A is a sufficiently large constant. Then, with probability at least \(1- C_1p^{1-C_2A^\alpha }\), the bounds (A.1) hold.

Proof

Since \(\widehat{\varvec{\beta }}\) minimizes (3), we have

$$\begin{aligned} \frac{1}{n}\Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2+2\lambda \Vert \widehat{\varvec{\beta }}\Vert _1\le 2\lambda \Vert \varvec{\beta }^*\Vert _1+\frac{2}{n}\varvec{\varepsilon }^\top {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*). \end{aligned}$$
(A.2)

Define the event \({\mathcal {T}}:=\{\max _{1\le j\le p}|\varGamma _j|\le \lambda /c\}\) for some constant \(c>0\), where \(\varGamma _j=n^{-1}\sum _{i=1}^nx_{ij}\varepsilon _i.\) Then on the event \({\mathcal {T}}\), using the fact

$$\begin{aligned} \frac{1}{n}\Vert 2\varvec{\varepsilon }^\top {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _\infty \le 2 \Big (\max _{1\le j\le p}|\varGamma _j|\Big )\cdot \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1, \end{aligned}$$

inequality (A.2) implies that

$$\begin{aligned} \frac{1}{n}\Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2+2\lambda \Vert \widehat{\varvec{\beta }}\Vert _1\le 2\lambda \Vert \varvec{\beta }^*\Vert _1+\frac{2\lambda }{c}\Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1. \end{aligned}$$
(A.3)

On the left-hand side of (A.3), using the triangle inequality,

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}\Vert _1=\Vert \widehat{\varvec{\beta }}_{S_0}\Vert _1+\Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1\ge \Vert \varvec{\beta }_{S_0}^*\Vert _1-\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1+\Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1, \end{aligned}$$

whereas on the right-hand side of (A.3), we can use

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1=\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1+\Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1. \end{aligned}$$

Thus, we have

$$\begin{aligned} \frac{1}{n}\Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2+2\lambda \Big ( 1-\frac{1}{c} \Big ) \Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1\le 2\lambda \Big ( 1+\frac{1}{c} \Big )\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1. \end{aligned}$$
(A.4)

In particular, the above implies

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1\le \frac{c+1}{c-1}\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1. \end{aligned}$$

Then Assumption 1 shows that

$$\begin{aligned} \frac{s_0}{n\kappa _0}\Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2\ge \Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1^2, \end{aligned}$$

and combining this with (A.4), we get

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1\le Cs_0\lambda . \end{aligned}$$

Furthermore,

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1=\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1+\Vert \widehat{\varvec{\beta }}_{S_0^c}\Vert _1\le C\Vert \widehat{\varvec{\beta }}_{S_0}-\varvec{\beta }_{S_0}^*\Vert _1, \end{aligned}$$

so that

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1\le Cs_0\lambda . \end{aligned}$$

Similarly,

$$\begin{aligned} \Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _2^2/n\le Cs_0\lambda ^2. \end{aligned}$$

Now, we need to control the probability \({\mathbb {P}}({\mathcal {T}}).\) There are two cases.

  1. (i)

    Suppose the error series \(\{\varepsilon _i\}\) has finite q-th moment, \(q>2\), and \({\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }<\infty \) for some \( \alpha \ge 0.\) For \(\alpha >1/2-1/q,\) let \(v=1/2\); by the inequality of Lemma A.1 (i) with \(\zeta _n=1\), we have

    $$\begin{aligned} \begin{aligned} {\mathbb {P}}(|\varGamma _j|\ge \lambda /c)&\le K_1{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q\frac{\sum _{i=1}^n|x_{ij}|^q}{(n\lambda )^q}+K_2\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big )\\&\le K_1\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ) \end{aligned} \end{aligned}$$

    Hence

    $$\begin{aligned} \begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)&\le \sum _{j=1}^p K_1\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2p\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big )\\&= K_1p\frac{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }^q}{(\sqrt{n}\lambda )^q}+K_2p\exp \Big (\frac{-K_3n\lambda ^2}{{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }^2}\Big ), \end{aligned} \end{aligned}$$

    under our choice of \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-1/2}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\}\) where A is a sufficiently large constant, we have

    $$\begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le C_1(\log p)^{-1}+C_2p^{1-K_3A^2}. \end{aligned}$$

    The case of \(\alpha <1/2-1/q\) can be similarly proved.

  2. (ii)

    Suppose the error series \(\{\varepsilon _i\}\) satisfies the stronger moment condition \({\mathcal {D}}_z<\infty \). By Lemma A.1 (ii), we have

    $$\begin{aligned} {\mathbb {P}}(|\varGamma _j|\ge \lambda /c)\le (2+\sqrt{2}c_\alpha )\exp \Big ( -\frac{(\sqrt{n}\lambda /(c{\mathcal {D}}_z) )^\alpha }{2e\alpha }\Big ). \end{aligned}$$

    Hence

    $$\begin{aligned} \begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le \sum _{j=1}^p{\mathbb {P}}(|\varGamma _j|\ge \lambda /c)= (2+\sqrt{2}c_\alpha )p\exp \Big ( -\frac{(\sqrt{n}\lambda /(c {\mathcal {D}}_z ) )^\alpha }{2e\alpha }\Big ). \end{aligned} \end{aligned}$$

    Letting \(\lambda =An^{-1/2}(\log p)^{1/\alpha }{\mathcal {D}}_z\), we get

    $$\begin{aligned} {\mathbb {P}}({\mathcal {T}}^c)\le C_1p^{1-C_2A^\alpha }. \end{aligned}$$

This completes the proof. \(\square \)
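
To illustrate Lemma A.6, the following simulation sketch (not from the paper; the design, the sparsity level, the AR(1) error coefficient and the tuning constant are arbitrary assumptions) fits the Lasso on data with temporally correlated errors and compares the \(\ell _1\) estimation error with the \(s_0\lambda \) scale.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulation sketch (not from the paper) for the l1-error bound in Lemma A.6
# with AR(1) errors; dimensions, sparsity and tuning constant are assumptions.
rng = np.random.default_rng(2)
n, p, s0, rho = 400, 600, 5, 0.5

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s0] = 1.0

xi = rng.standard_normal(n)
eps = np.zeros(n)
eps[0] = xi[0] / np.sqrt(1 - rho ** 2)             # stationary AR(1) start
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + xi[i]

y = X @ beta + eps
lam = 2.0 * np.sqrt(np.log(p) / n)                 # the (n^{-1} log p)^{1/2} scale
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)

print(np.abs(fit.coef_ - beta).sum(), s0 * lam)    # l1 error vs. the s0*lambda scale
```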

Lemma A.7

Under the conditions of Lemma A.6, assume further that \(s_0\lambda ^2=o(1)\). Then for any fixed k, we have

$$\begin{aligned} \widehat{\gamma }^{\varvec{\varepsilon }}_k{\mathop {\longrightarrow }\limits ^{P}}\gamma ^{\varvec{\varepsilon }}_k. \end{aligned}$$

Proof

By using the results in Lemma A.6, we have

$$\begin{aligned} \Vert {\mathbf {e}}-\varvec{\varepsilon } \Vert _1=\Vert {\mathbf {X}}(\widehat{\varvec{\beta }}-\varvec{\beta }^*) \Vert _1=O_{{\mathbb {P}}}(s_0\lambda ^2)=o_{{\mathbb {P}}}(1), \end{aligned}$$

thus \(e_k{\mathop {\longrightarrow }\limits ^{P}}\varepsilon _k\) for \(k=0,\ldots ,n-1\). Then for any fixed k,

$$\begin{aligned} \widehat{\gamma }^{\varvec{\varepsilon }}_k=\frac{1}{n}\sum _{i=1}^{n-|k|}e_ie_{i+|k|}{\mathop {\longrightarrow }\limits ^{P}}\frac{1}{n}\sum _{i=1}^{n-|k|}\varepsilon _i\varepsilon _{i+|k|}{\mathop {\longrightarrow }\limits ^{P}}\gamma ^{\varvec{\varepsilon }}_k \end{aligned}$$

by the ergodicity condition, and thus the proof is complete. \(\square \)
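
The residual-based autocovariance estimator used in Lemma A.7 is straightforward to compute; a minimal Python sketch (a hypothetical helper, not the authors' code) is given below.

```python
import numpy as np

# Residual-based lag-k autocovariance, gamma_hat_k = (1/n) sum_i e_i e_{i+|k|},
# as used in Lemma A.7 (a hypothetical helper, not the authors' code).
def residual_autocov(e, k):
    n, k = len(e), abs(k)
    return float(np.dot(e[:n - k], e[k:]) / n)

# usage: e = y - X @ beta_hat are the Lasso residuals; residual_autocov(e, 1)
```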

Technical proofs

1.1 Proof of Theorem 1

Proof

Based on the definition of the desparsifying Lasso (4), and using

$$\begin{aligned} {\mathbf {Y}}={\mathbf {X}} \varvec{\beta }^*+\varvec{\varepsilon }, \end{aligned}$$

simple algebra yields

$$\begin{aligned} \widehat{{\varvec{b}}}=\varvec{\beta }^*+\widehat{\varvec{\varTheta }}{\mathbf {X}}^\top \varvec{\varepsilon }/n+\varvec{\varLambda }/\sqrt{n}, \end{aligned}$$

where \(\varvec{\varLambda }=-\sqrt{n}(\widehat{\varvec{\varTheta }}\widehat{\varvec{\varSigma }}-{\mathbf {I}}_p)(\widehat{\varvec{\beta }}-\varvec{\beta }^*).\) Thus,

$$\begin{aligned} \sqrt{n}(\widehat{{\varvec{b}}}-\varvec{\beta }^*)=\frac{1}{\sqrt{n}}\widehat{\varvec{\varTheta }}{\mathbf {X}}^\top \varvec{\varepsilon }+\varvec{\varLambda }. \end{aligned}$$

By using the fact that for any matrix \(\mathbf{A}\in {\mathbb {R}}^{m\times n}\) and any vector \(x\in {\mathbb {R}}^{n\times 1}\),

$$\begin{aligned} \Vert \mathbf{A}x\Vert _\infty \le \Vert \mathbf{A}\Vert _{\max }\Vert x\Vert _1, \end{aligned}$$

we get

$$\begin{aligned} \begin{aligned} \Vert \varvec{\varLambda }\Vert _\infty&= \sqrt{n}\Vert (\widehat{\varvec{\varTheta }}\widehat{\varvec{\varSigma }}-{\mathbf {I}}_p)(\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Vert _\infty \le \sqrt{n}\Vert \widehat{\varvec{\varTheta }}\widehat{\varvec{\varSigma }}-{\mathbf {I}}_p\Vert _{\max }\Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1. \end{aligned} \end{aligned}$$

By Lemma A.4, we have

$$\begin{aligned} \Vert \varvec{\varLambda }\Vert _\infty \le \sqrt{n}\Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1\max _{1\le j\le p}( \lambda _j/\widehat{\tau }_j^2). \end{aligned}$$
  1. (i)

    If the error sequence has finite q-th moment, then Lemma A.6 (i) with \(\lambda =A\max \{(n^{-1}\log p )^{1/2}{\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{2,\alpha }, n^{-v}(p\log p)^{\frac{1}{q}} {\vert \vert \vert \varvec{\varepsilon }. \vert \vert \vert }_{q,\alpha }\}\) yields

    $$\begin{aligned} {\mathbb {P}}\Big (\Vert \varvec{\varLambda }\Vert _\infty \ge C\sqrt{n}s_0\lambda \Big (\max \limits _j\frac{\lambda _j}{\widehat{\tau }_j^2}\Big ) \Big )\le C_1(\log p)^{-1}+C_2p^{1-K_3A^2}. \end{aligned}$$
  2. (ii)

    If the error sequence has a finite exponential moment, then Lemma A.6 (ii) with \(\lambda =An^{-1/2}(\log p)^{1/\alpha }\mathcal {D}_z\) yields

    $$\begin{aligned} {\mathbb {P}}\Big (\Vert \varvec{\varLambda }\Vert _\infty \ge C\sqrt{n}s_0\lambda \Big (\max \limits _j\frac{\lambda _j}{\widehat{\tau }_j^2}\Big ) \Big )\le C_1p^{1-C_2A^\alpha }. \end{aligned}$$

The remaining argument applies whether the error sequence has a finite q-th moment or a finite exponential moment; we use the m-dependent approximation to prove that (7) holds.

For some constants \(2<d<\infty \) and \(2r<\eta <1/2\), where r is given in Assumption 4, let \(\zeta _n=\lfloor n^{\eta }/(\log n)^d\rfloor \), \(t_n=\lfloor n^{1/2-\eta }\rfloor \) and \(w_n=\lfloor n/\zeta _n-t_n\rfloor \). Define \(\varvec{\varepsilon }^*=(\varepsilon _1^*,\ldots ,\varepsilon _n^*)^\top \) with

$$\begin{aligned} \varepsilon ^*_i:=\varepsilon _{i,t_n}={\mathbb {E}}(\varepsilon _i|\xi _{i-t_n},\ldots ,\xi _i)={\mathbb {E}}(\varepsilon _i|{\mathcal {F}}_{i-t_n,i}), \end{aligned}$$

Thus \(\{\varepsilon _i^*\}\) is a \(t_n\)-dependent sequence. For any \(k\in \{1,\ldots ,p\}\), according to Lemma A.2 (ii) and Lemma A.5, we have

$$\begin{aligned} \begin{aligned} {\left| \left| \left| n^{-1/2}\sum _{i=1}^n\widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_i(\varepsilon _i-\varepsilon _i^*) \right| \right| \right| }_2^2&= \mathrm{Var}\Big \{n^{-1/2} \sum _{i=1}^n\widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_i(\varepsilon _i-\varepsilon _i^*)\Big \}\\&\le Cn^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_k\Vert _2^2 \varDelta _{t_n+1,2}^2\\&= O(\varDelta _{t_n+1,2}^2)\\&=o(1), \end{aligned} \end{aligned}$$

since the process \(\{\varepsilon _i\}\) is short-range dependent. Therefore, showing

$$\begin{aligned} V_k=\frac{1}{\sqrt{n}\sigma _k}\sum _{i=1}^n \widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_i\varepsilon _i{\mathop {\longrightarrow }\limits ^{D}} N(0, 1) \end{aligned}$$

is equivalent to showing

$$\begin{aligned} \frac{1}{\sqrt{n}\sigma _k}\sum _{i=1}^n \widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_i\varepsilon _i^*{\mathop {\longrightarrow }\limits ^{D}}N(0,1). \end{aligned}$$

Denote by \({\mathbf {v}}=n^{-1/2}\widehat{\varvec{\varTheta }}_k^\top \mathbf { X}^\top /{\sigma _k} =(v_1,\ldots ,v_n)\), where \(v_j=n^{-1/2}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_j/{\sigma _k}\). Define

$$\begin{aligned} \varvec{\varXi }_i&= \sum _{j=(i-1)(w_n+t_n)+1}^{(i-1)(w_n+t_n)+w_n}v_j\varepsilon ^*_j, \quad i=1,\ldots , \zeta _n, \\ \varvec{\varXi }_i^\prime&= \left\{ \begin{array}{ll} \sum \limits _{j=(i-1)(w_n+t_n)+w_n+1}^{i(w_n+t_n)}v_j\varepsilon _j^*,&{} i=1,\ldots , \zeta _n-1,\\ \sum \limits _{j=(i-1)(w_n+t_n)+w_n+1}^n v_j\varepsilon _j^*,&{} i= \zeta _n. \end{array} \right. \end{aligned}$$

Therefore

$$\begin{aligned} \sum _{i=1}^{ \zeta _n }\varvec{\varXi }_i+\sum _{i=1}^{ \zeta _n}\varvec{\varXi }_i^\prime = {\mathbf {v}}\varvec{\varepsilon }^*. \end{aligned}$$

For n large enough, \(\{\varvec{\varXi }_i\}_{i=1}^{\zeta _n}\) are independent and \(\{\varvec{\varXi }_i^\prime \}_{i=1}^{\zeta _n}\) are independent, since the \(\varepsilon _i^*\) are \(t_n\)-dependent. In the following, we will show that

$$\begin{aligned} \sum _{i=1}^{\zeta _n}\varvec{\varXi }_i{\mathop {\longrightarrow }\limits ^{D}} N(0,1) \quad \mathrm{and}\quad \sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime =o_{{\mathbb {P}}}(1). \end{aligned}$$

By Lemma A.5, we get

$$\begin{aligned} \begin{aligned} \sigma _k^2= |n^{-1}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \varvec{\varSigma }_n{\mathbf {X}}\widehat{\varvec{\varTheta }}_k| \le n^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_k\Vert _2^2\cdot \lambda _{\max }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})=O(1) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \sigma _k^{2}=|n^{-1}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \varvec{\varSigma }_n{\mathbf {X}}\widehat{\varvec{\varTheta }}_k|\ge n^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_k\Vert _2^2\cdot \lambda _{\min }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})\ge c>0, \end{aligned} \end{aligned}$$

where \(\lambda _{\min }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})\) and \(\lambda _{\max }({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }})\) are the minimum and maximum eigenvalues of \({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\), respectively, since the eigenvalues of \({\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}=\mathrm{Cov}(\varvec{\varepsilon })\) are bounded away from zero and infinity (see Section 5.2 in Grenander and Szegö 1958). Thus we have

$$\begin{aligned} \begin{aligned} \max _{j}|v_j|&=\Vert n^{-1/2}\sigma _k^{-1} \widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \Vert _\infty \le Cn^{-1/2}\Vert {\mathbf {X}}\varvec{\widehat{\varTheta }}_k\Vert _\infty \\&\le Cn^{-1/2}\Vert {\mathbf {X}}\Vert _{\max }\Vert \varvec{\widehat{\varTheta }}_k \Vert _1\\&=O(n^{-1/2}a_n\log n), \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \Vert {\mathbf {v}}\Vert _2=\Vert n^{-1/2}\sigma _k^{-1} \widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \Vert _2\le Cn^{-1/2}\Vert {\mathbf {X}}\varvec{\widehat{\varTheta }}_k\Vert _2=O(1). \end{aligned} \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{aligned} {\mathrm{Var}}\Big (\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime \Big )&=\sum _{i=1}^{\zeta _n}{\mathrm{Var}}\Big (\varvec{\varXi }_i^\prime \Big )\le C\zeta _n {\mathrm{Var}}\Big (\sum _{j=w_n+1}^{w_n+t_n}v_j\varepsilon _j^*\Big )\\&\le C\frac{ \zeta _n (a_n\log n)^2}{n}\mathrm{Var}\Big (\sum _{j=w_n+1}^{w_n+t_n}\varepsilon _j^*\Big )\\&=O\Big (\frac{ \zeta _n t_n^2(a_n\log n)^2}{n}\Big )\\&= o\Big (\frac{ n^{2r}}{n^{\eta }(\log n)^{d-2}}\Big )\\&= o(1), \end{aligned} \end{aligned}$$

which, together with \(\mathbb { E}(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime )=0\) implies \(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime =o_{{\mathbb {P}}}(1)\).

By Lemma A.2(ii), we have

$$\begin{aligned} \mathrm{Var}\{{\mathbf {v}}(\varvec{\varepsilon }-\varvec{\varepsilon }^*)\}\le C\Vert {\mathbf {v}}\Vert _2^2\varDelta _{t_n+1,2}^2=O(\varDelta _{t_n+1,2}^2)=o(1) \quad \mathrm{as}\quad n\rightarrow \infty . \end{aligned}$$

Then

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathrm{Var}\Big \{\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i+\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime \Big \}=\lim _{n\rightarrow \infty }\mathrm{Var}({\mathbf {v}}\varvec{\varepsilon }^*)=\mathrm{Var}({\mathbf {v}}\varvec{\varepsilon })=1, \end{aligned}$$

implying that \(\lim _{n\rightarrow \infty }\mathrm{Var}(\sum _{i=1}^{\zeta _n}\varvec{\varXi }_i)=1\), since \(\mathrm{Var}( \sum _{i=1}^{\zeta _n}\varvec{\varXi }_i^\prime )=o(1)\).

According to Lemma A.2(i), the Liapounov condition holds:

$$\begin{aligned} \begin{aligned} \sum _{i=1}^{\zeta _n}{\mathbb {E}}(|\varvec{\varXi }_i|^{2+c})&\le C\sum _{i=1}^{\zeta _n}\Big \{ \sum _{j=(i-1)(w_n+t_n)+1}^{(i-1)(w_n+t_n)+w_n}v_j^2\Big \}^{1+c/2}\varDelta _{0,2+c}^{2+c}\\&=O\Big (\frac{\zeta _n w_n^{1+c/2}}{n^{1+c/2}}\cdot (a_n\log n)^{2+c}\Big )\\&= O\Big ( \frac{a_n^{2+c}(\log n)^{2+(1+d/2)c}}{n^{c\eta /2}} \Big )\\&=o(1) \end{aligned} \end{aligned}$$

for some constant \(c>0\). Hence, by the central limit theorem,

$$\begin{aligned} \frac{1}{\sqrt{n}\sigma _k}\sum _{i=1}^n \widehat{\varvec{\varTheta }}_k^\top {\mathbf {x}}_i\varepsilon _i^*{\mathop {\longrightarrow }\limits ^{D}} N(0, 1), \end{aligned}$$

that is

$$\begin{aligned} V_k=\frac{1}{\sqrt{n}\sigma _k}\sum _{i=1}^n \widehat{\varvec{\varTheta }}_k^\top \mathbf { x}_i\varepsilon _i{\mathop {\longrightarrow }\limits ^{D}} N(0, 1), \end{aligned}$$

and the proof is complete. \(\square \)
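
For completeness, the following Python sketch (not the authors' implementation; the tuning parameters lam and lam_k are user-chosen assumptions) assembles the desparsified estimator \(\widehat{b}_k=\widehat{\beta }_k+\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top ({\mathbf {Y}}-{\mathbf {X}}\widehat{\varvec{\beta }})/n\) for a single coordinate k, combining the Lasso fit with the nodewise Lasso of Lemma A.4.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch (not the authors' implementation) of the desparsified Lasso for one
# coordinate k: b_hat_k = beta_hat_k + Theta_hat_k' X'(y - X beta_hat)/n.
# The tuning parameters lam and lam_k are user-chosen assumptions.
def desparsified_coordinate(X, y, k, lam, lam_k):
    n, p = X.shape
    beta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_

    # nodewise Lasso for column k, as in Lemma A.4
    Z = np.delete(X, k, axis=1)
    gamma_k = Lasso(alpha=lam_k, fit_intercept=False,
                    max_iter=10000).fit(Z, X[:, k]).coef_
    resid = X[:, k] - Z @ gamma_k
    tau2_k = resid @ resid / n + lam_k * np.abs(gamma_k).sum()
    Theta_k = np.insert(-gamma_k, k, 1.0) / tau2_k

    b_k = beta_hat[k] + Theta_k @ (X.T @ (y - X @ beta_hat)) / n
    return b_k, Theta_k, beta_hat
```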

1.2 Proof of Lemma 1

Proof

By using the fact that for any matrix \(\mathbf{A}\in {\mathbb {R}}^{m\times n}\),

$$\begin{aligned} \Vert \mathbf{A}\Vert _2^2\le \Vert \mathbf{A} \Vert _{1} \Vert \mathbf{A} \Vert _{\infty }, \end{aligned}$$

we have

$$\begin{aligned} \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2 \le \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }} \Vert _{1}, \end{aligned}$$

since \(\widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\) is symmetric. Thus

$$\begin{aligned} \begin{aligned} \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2&\le \max _{1\le j\le n}\sum _{i=1}^n|\widehat{\gamma }_{i-j}^{\varvec{\varepsilon }}{\mathbf {1}}_{|i-j|\le l}-\gamma _{i-j}^{\varvec{\varepsilon }}|\\&\le \sum _{i=1-n}^{n-1}|\widehat{\gamma }_i^{\varvec{\varepsilon }}{\mathbf {1}}_{|i|\le l}-\gamma _{i}^{\varvec{\varepsilon }}|\\&\le 2\sum _{i=0}^{l} |\widehat{\gamma }_i^{\varvec{\varepsilon }}-\gamma _i^{\varvec{\varepsilon }}|+2\sum _{i=l+1}^n|\gamma _i^{\varvec{\varepsilon }}|\\&:=T_1+T_2. \end{aligned} \end{aligned}$$

Note that for \(i\ge 0\),

$$\begin{aligned} \begin{aligned}&\Big | \frac{1}{n}\sum _{j=1}^{n-i}e_je_{j+i} -\gamma _i^{\varvec{\varepsilon }} \Big |=\Big |\frac{1}{n} \sum _{j=1}^{n-i}[\varepsilon _j-{\mathbf {x}}_j^\top ( \widehat{\varvec{\beta }}-\varvec{\beta }^*)] [\varepsilon _{j+i}-{\mathbf {x}}_{j+i}^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*) ] -\gamma _i^{\varvec{\varepsilon }} \Big |\\&\quad \le \frac{1}{n}\Big | \sum _{j=1}^{n-i}\varepsilon _{j}\varepsilon _{j+i} - (n-i)\gamma _{i}^{\varvec{\varepsilon }} \Big | +\frac{1}{n}\Big |\sum _{j=1}^{n-i} (\varepsilon _j {\mathbf {x}}_{j+i}^\top + \varepsilon _{j+i}{\mathbf {x}}_{j}^\top ) (\widehat{\varvec{\beta }}-\varvec{\beta }^*) \Big |\\&\quad \quad +\frac{1}{n}\sum _{j=1}^{n-i} |{\mathbf {x}}_j^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*)\cdot {\mathbf {x}}_{j+i}^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*)| + \frac{i}{n}|\gamma _i^{\varvec{\varepsilon }}|\\&\quad := D_1 + D_2 + D_3 + D_4. \end{aligned} \end{aligned}$$

Using Lemma A.3, there exists a constant \(c_q\) depending only on q, such that

$$\begin{aligned} {\vert \vert \vert D_1 \vert \vert \vert }_{q/2} \le c_q{\vert \vert \vert \varepsilon _1 \vert \vert \vert }_{q}\varDelta _{0,q}\frac{(n-i)^{2/q}}{n}\le C n^{2/q-1}, \end{aligned}$$

thus

$$\begin{aligned} D_1=O_{{\mathbb {P}}}(n^{2/q-1}). \end{aligned}$$

Note that

$$\begin{aligned} \begin{aligned} \frac{1}{n}\Big |\sum _{j=1}^{n-i}\varepsilon _j{\mathbf {x}}_{i+j}^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*)\Big |&\le \frac{1}{n}\Vert \sum _{j=1}^{n-i} \varepsilon _j {\mathbf {x}}_{i+j}\Vert _\infty \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^* \Vert _1 \\&= \max _{k\le p} \frac{1}{n}\Big |\sum _{j=1}^{n-i}\varepsilon _jx_{j+i,k}\Big |\cdot \Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1, \end{aligned} \end{aligned}$$

according to Lemma A.1, we get

$$\begin{aligned} {\mathbb {P}}\Big ( \max _{k} \frac{1}{n}\sum _{j=1}^n|x_{j,k}\varepsilon _j|\ge \lambda \Big )\le C_1(\log p)^{-1}+C_2p^{1-K_3A^2} \end{aligned}$$

or

$$\begin{aligned} {\mathbb {P}}\Big ( \max _{k} \frac{1}{n}\sum _{j=1}^n|x_{j,k}\varepsilon _j|\ge \lambda \Big )\le C_1p^{1-C_2A^\alpha }. \end{aligned}$$

Combining this with Lemma A.6 yields \(D_2 = O_{{\mathbb {P}}}(s_0\lambda ^2).\) Moreover,

$$\begin{aligned} \begin{aligned} D_3 = \frac{1}{n}\sum _{j=1}^{n-i} |{\mathbf {x}}_j^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*)|\cdot |{\mathbf {x}}_{j+i}^\top (\widehat{\varvec{\beta }}-\varvec{\beta }^*)|\le C\Vert \widehat{\varvec{\beta }}-\varvec{\beta }^*\Vert _1^2, \end{aligned} \end{aligned}$$

thus \(D_3 =O_{{\mathbb {P}}}(s_0^2\lambda ^2).\) Then we get

$$\begin{aligned} T_1 =O_{{\mathbb {P}}}\Big ( l(n^{2/q-1} +s_0^2\lambda ^2) \Big ), \end{aligned}$$

since \(D_2=o_{{\mathbb {P}}}(D_3)\) and \(D_4=O(l^2n^{-1})=o_{{\mathbb {P}}}(D_1)\). Therefore

$$\begin{aligned} \begin{aligned} \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }} \Vert _2 = O_{{\mathbb {P}}} \Big ( l(n^{2/q-1} +s_0^2\lambda ^2) \Big )+2\sum _{i=l+1}^n|\gamma _i^{\varvec{\varepsilon }}|. \end{aligned} \end{aligned}$$

Letting \(l\rightarrow \infty \) with \(l(n^{2/q-1}+s_0^2\lambda ^2)=o(1)\) for \(2<q\le 4\) yields

$$\begin{aligned} \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }} \Vert _2 = o_{{\mathbb {P}}}(1), \end{aligned}$$

and the proof is complete. \(\square \)
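
A minimal sketch (not from the paper) of the banded autocovariance matrix estimator \(\widehat{\varvec{\varSigma }}_{n,l}^{\varvec{\varepsilon }}\) appearing in Lemma 1: a symmetric Toeplitz matrix built from the residual autocovariances, with all lags beyond the banding parameter l set to zero.

```python
import numpy as np
from scipy.linalg import toeplitz

# Banded autocovariance matrix estimator (a sketch, not the authors' code):
# a symmetric Toeplitz matrix from the residual autocovariances, with lags
# beyond the banding parameter l set to zero.
def banded_autocov_matrix(e, l):
    n = len(e)
    gamma = np.array([np.dot(e[:n - k], e[k:]) / n for k in range(n)])
    gamma[l + 1:] = 0.0                 # keep lags 0, ..., l only
    return toeplitz(gamma)              # n x n estimate of Sigma_n^eps
```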

1.3 Proof of Theorem 2

Proof

Since \( \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2=o_{{\mathbb {P}}}(1)\) by Lemma 1, we get

$$\begin{aligned} \begin{aligned} |\widehat{\sigma }_k^2-\sigma _k^2|&=|n^{-1}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top (\widehat{\varvec{\varSigma }}_{n,l}- \varvec{\varSigma }_n) {\mathbf {X}}\widehat{\varvec{\varTheta }}_k|\\&\le n^{-1}\Vert {\mathbf {X}}\widehat{\varvec{\varTheta }}_k\Vert _2^2 \cdot \Vert \widehat{\varvec{\varSigma }}_{n, l}^{\varvec{\varepsilon }} - {\varvec{\varSigma }}_{n}^{\varvec{\varepsilon }}\Vert _2\\&=o_{{\mathbb {P}}}(1). \end{aligned} \end{aligned}$$

Thus \(\widehat{\sigma }_k^2\) is a consistent estimator of \(\sigma _k^2\). Combining Theorem 1, Lemma 1 and Slutsky’s theorem completes the proof. \(\square \)
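
Finally, a hedged sketch (not the authors' code) of the plug-in variance estimator \(\widehat{\sigma }_k^2=n^{-1}\widehat{\varvec{\varTheta }}_k^\top {\mathbf {X}}^\top \widehat{\varvec{\varSigma }}_{n,l}^{\varvec{\varepsilon }}{\mathbf {X}}\widehat{\varvec{\varTheta }}_k\) and the resulting confidence interval for \(\beta _k^*\); it reuses the banded estimator and the nodewise quantities from the previous sketches, and the confidence level is a user-chosen assumption.

```python
import numpy as np
from scipy.stats import norm

# Plug-in variance estimate and confidence interval for beta_k (a sketch under
# the assumptions above; Sigma_hat_eps is the banded matrix, Theta_k and b_k
# come from the desparsified-Lasso sketch).
def confidence_interval(b_k, Theta_k, X, Sigma_hat_eps, level=0.95):
    n = X.shape[0]
    v = X @ Theta_k                               # X * Theta_hat_k
    sigma2_k = v @ Sigma_hat_eps @ v / n          # sigma_hat_k^2
    half = norm.ppf(0.5 + level / 2) * np.sqrt(sigma2_k / n)
    return b_k - half, b_k + half
```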


About this article


Cite this article

Yuan, P., Guo, X. High-dimensional inference for linear model with correlated errors. Metrika 85, 21–52 (2022). https://doi.org/10.1007/s00184-021-00820-7

