Abstract
In this paper we establish, under some mild restrictions, upper bounds on the rate of convergence for estimators of \(p\times p\) autocovariance and precision matrices of high dimensional linear processes. We show that these estimators are consistent in the operator norm: in the sub-Gaussian case when \(p={\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\), and in the general case when \( p^{2/\beta }(n^{-1} \log p)^{1/2}\rightarrow 0\) for some \(\beta >2\), as \( p=p(n)\rightarrow \infty \) and the sample size \(n\rightarrow \infty \). In particular, our results hold for multivariate AR processes. We compare our results with those previously obtained in the literature for independent and dependent data, and we also present non-asymptotic bounds on the error probability of these estimators.
1 Introduction
Estimation of covariance matrices in a high dimensional setting has been one of the fundamental statistical issues of the last decade. Statistical applications of covariance matrix estimation include ridge regression (Hoerl and Kennard 1970), regularized discriminant analysis (Friedman 1989) and principal component analysis (Johnstone and Lu 2009). For an overview of this topic and its applications see (Bickel and Levina 2008b; Birnbaum and Nadler 2012; Chen et al. 2013; Fan et al. 2006; Rothman et al. 2009). The problem of estimating covariance matrices for dependent data has recently been investigated by (Chen et al. 2013; Bhattacharjee and Bose 2014a, b; Guo et al. 2016; Jentsch and Politis 2015; McMurry and Politis 2010), and Wu and Pourahmadi (2009). Estimation of the inverse covariance matrix is used in the recovery of the true unknown structure of undirected graphical models, especially Gaussian graphical models, where a zero entry of the inverse covariance matrix corresponds to a missing edge between two vertices of the graph. The recovery of undirected graphs via estimation of precision matrices for a general class of nonstationary time series is considered in Xu et al. (2020).
Consider a p-dimensional linear process
$$\begin{aligned} {\mathbf {X}}_{t}=\sum _{j=0}^{\infty }{\varvec{\Psi }}_{j}{\varepsilon }_{t-j}, \end{aligned}$$
where the \({\varvec{\Psi }}_{j}\) are \(p\times p\) matrices, \({\varepsilon }_{i}=\left( \varepsilon _{i,1},\ldots ,\varepsilon _{i,p}\right) ^{^{\prime }}\), and \(\left( {\varepsilon }_{t}\right) \) are i.i.d. vectors in \({\mathbb {R}}^{p}\) with mean \({\mathbf {0}}\) and variance-covariance matrix \(\varvec{\Sigma }\). Under a causality condition, a vector ARMA process, which is a basic model in econometrics and finance, is a linear process (Brockwell and Davis 2002). We assume that \(\left( {\varepsilon }_{t}\right) \) satisfies one of the following conditions:
- (Gauss): \({\varepsilon }_{i}\) is Gaussian with mean \({\mathbf {0}}\) and variance-covariance matrix \({\varvec{\Sigma }}\).
- (SGauss): \(\left( \varepsilon _{i,l}\varepsilon _{j,s}\right) \) is sub-Gaussian with constant \(\sigma ^{2}\), that is,
  $$\begin{aligned} E\exp \left( u\varepsilon _{i,l}\varepsilon _{j,s}\right) \le \exp \left( \sigma ^{2}u^{2}/2\right) \end{aligned}$$
  for all \(u\in {\mathbb {R}}\), \(i,j=1,2,\ldots \) and \(l,s=1,\ldots ,p\).
- (NGa\(_{\beta }\)): \(E\left| \varepsilon _{i,j}\right| ^{\beta }<\infty \) for some \(\beta >2\), for \(i=1,2,\ldots \) and \(j=1,\ldots ,p\).
For example, condition (SGauss) is satisfied for bounded sequences \(\left( \varepsilon _{i,l}\right) \), i.e. when \(\sup _{i,l}\left| \varepsilon _{i,l}\right| \le M\) for some \(M>0\). Observe also that (SGauss) is implied by a sub-Gaussian condition for the vectorization \(vec\left( {\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\right) \) of the Kronecker product \({\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\) for all i, j, i.e. for all \(u\in {\mathbb {R}}^{p^{2}}\)
for some \(\sigma ^{2}>0\). Condition (NGa\(_{\beta }\)) is a moment condition for the innovation process without assumptions on the dependency of the coordinates of the innovation process \(\left( {\varepsilon }_{i}\right) \).
Let \({\varvec{\Gamma }}_{k}\) be the kth order autocovariance matrix,
The matrix \({\varvec{\Gamma }}_{k}\) will be estimated from the sample \({\mathbf {X}}_{1},\ldots ,{\mathbf {X}}_{n}\), because in practice we do not know the matrices \({\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\). Define the sample autocovariance matrix of order k as
for \(0\le k\le n-1\), and the banded version of \({\varvec{{\hat{\Gamma }}}}_{k}=({\hat{\gamma }}_{ij}^{k})\) (as in Bhattacharjee and Bose (2014b)) is given by
$$\begin{aligned} {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})=\left( {\hat{\gamma }}_{ij}^{k}\,{\mathbf {1}}\left( \left| i-j\right| \le l_{n}\right) \right) \end{aligned}$$
for some sequence of thresholds \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \), where \({\mathbf {1}}(\cdot )\) is the indicator function. We will assume that \(p=p(n)\rightarrow \infty \) as \(n\rightarrow \infty \).
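As a concrete illustration, the banding operation \({\mathbf {B}}_{l_{n}}(\cdot )\) applied to a sample autocovariance can be sketched in a few lines. The function names and the \(1/n\) normalization of the sample autocovariance are our own choices for this sketch, not taken from the paper.

```python
import numpy as np

def sample_autocov(X, k):
    """Sample autocovariance of order k from an (n, p) data matrix X.

    The 1/n normalization is one common convention; the paper's exact
    normalization is not reproduced here.
    """
    n, _ = X.shape
    return X[k:].T @ X[:n - k] / n

def band(M, l):
    """Banded version of a square matrix M: keep entries with |i - j| <= l."""
    p = M.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= l, M, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
G1_banded = band(sample_autocov(X, 1), 2)  # banded first-order autocovariance
```

Increasing the band `l` toward \(p-1\) recovers the full sample autocovariance matrix.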
The main contribution of our paper is that in Theorem 1 we obtain the rate of convergence in the operator norm of \({\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\) for a high dimensional linear process:
$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
for some \(c_{n}\rightarrow 0\) and some \(\alpha >0\). Under the sub-Gaussian conditions ((Gauss) and (SGauss)) we obtain the same rate \({\mathcal {O}}_{P }( (n^{-1} \log p)^{\alpha /2(\alpha +1)}) \) as Bhattacharjee and Bose (2014b), but under weaker assumptions on the coefficient matrices \( {\varvec{\Psi }}_{j} \) and \( {\varvec{\Sigma }} \). In particular, under causality, our results include vector autoregressive AR(r) processes. We obtain similar results (Corollary 1) for the precision matrix
An interesting problem is to obtain lower bounds and the optimal rate of convergence. Cai et al. (2010) obtained the minimax bound for i.i.d. observations for tapering estimators of \({\varvec{\Gamma }}_{0}\). For dependent data this problem is still open. Below we briefly present the state of research related to the estimation of the covariance matrix for independent and dependent observations.
The sample covariance matrix \(\varvec{\hat{\Gamma }}_{0}=(\hat{\gamma } _{ij}^{0})\) performs poorly in a high dimensional setting. In the Gaussian case when \({\varvec{\Gamma }}_{\mathbf {0}}={\mathbf {I}}\) is the identity matrix and \(p/n\rightarrow c \in (0,1)\), the empirical distribution of the eigenvalues of the sample covariance matrix \({\varvec{\hat{\Gamma }}}_{0}\) follows the Marchenko and Pastur (1967) law, which is supported on the interval \((( 1-\sqrt{c})^{2}, (1+\sqrt{c})^{2})\).
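This degradation is easy to reproduce numerically. The following sketch (with arbitrary dimensions and seed of our own choosing) shows the sample eigenvalues spreading over roughly \(((1-\sqrt{c})^{2},(1+\sqrt{c})^{2})\) instead of concentrating near 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 800, 400                      # c = p/n = 0.5
X = rng.standard_normal((n, p))      # true covariance is the identity
S = X.T @ X / n                      # sample covariance matrix
eig = np.linalg.eigvalsh(S)

c = p / n
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
# The eigenvalues approximately fill the Marchenko-Pastur support,
# even though every population eigenvalue equals 1.
print(f"support ~ ({lo:.3f}, {hi:.3f}), observed ({eig.min():.3f}, {eig.max():.3f})")
```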
For the i.i.d. case Bickel and Levina (2008a) proposed thresholding of the sample covariance matrix and obtained rates of convergence for the thresholding estimators for a proper choice of the threshold \(\lambda _{n}\), where the estimator is given by
Rothman et al. (2009) considered a class of universal thresholding rules with more general thresholding functions than hard thresholding. An interesting generalization of this method can be found in Cai and Liu (2011) for sparse covariance matrices, where an adaptive thresholding estimator is given by
where \(S_{\lambda _{ij}}(\cdot )\) is a general thresholding function with data-driven thresholds \(\lambda _{ij}\). For other interesting results in this area, see (Birnbaum and Nadler 2012; Cai et al. 2010; Fan et al. 2006; Furrer and Bengtsson 2007; Huang et al. 2006).
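For concreteness, hard thresholding (Bickel and Levina 2008a) and a soft-thresholding rule of the kind covered by Rothman et al. (2009) and Cai and Liu (2011) can be written as follows. This is an illustrative sketch: the thresholds are supplied by the user rather than computed by the papers' data-driven formulas.

```python
import numpy as np

def hard_threshold(S, lam):
    """Hard thresholding: zero every entry with |s_ij| <= lam."""
    return np.where(np.abs(S) > lam, S, 0.0)

def soft_threshold(S, lam):
    """Soft thresholding, one example of a general rule S_lam(.);
    lam may be a scalar or a matrix of entry-wise thresholds lambda_ij."""
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
```

In the i.i.d. theory the universal threshold is of order \(\sqrt{n^{-1}\log p}\); the constant in front is a tuning choice.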
There are few results for high-dimensional dependent data: (Bhattacharjee and Bose 2014a, b; Chen et al. 2013; Jentsch and Politis 2015) and Guo et al. (2016). Bhattacharjee and Bose (2014a) considered the estimation of the high dimensional variance-covariance matrix under a general Gaussian model with weak dependence in both rows and columns of the data matrix. They showed that the banded and tapered sample variance-covariance matrices are consistent under a suitable column dependence model. But their conditions do not allow control of the first few autocovariances; they control only higher order autocovariances. Bhattacharjee and Bose (2014b) showed that under suitable assumptions for the linear process, the banded sample autocovariance matrices are consistent in the high dimensional setting. Chen et al. (2013) obtained the rate of convergence for a banded autocovariance estimator in operator norm for a general dependent model. A similar result under more restrictive assumptions was obtained by Jentsch and Politis (2015). Guo et al. (2016) established similar results for sparse multivariate autoregressive AR processes.
The rest of the paper is organized as follows. In “The rate of convergence of autocovariance estimation” section we deal with the problem of estimating the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) by (3) in high dimensions (Theorem 1). The rate of convergence for the estimator of the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\) is given in Corollary 1.
In Proposition 1 we obtain bounds on the error probability of the estimation of the kth order autocovariance. In “Comparison of our results with previous studies” section we compare our results with those obtained by Bickel and Levina (2008a, b) for independent normal and nonnormal data, with the minimax upper bound for the tapering estimator in Cai et al. (2010), and with the results for dependent data obtained by (Chen et al. 2013; Bhattacharjee and Bose 2014b; Guo et al. 2016) and Jentsch and Politis (2015). In the special case of a multi-dimensional linear process we obtain a sharper bound (Theorem 1) than Chen et al. (2013) and Jentsch and Politis (2015). We also obtain a better rate for the estimation error for multivariate sparse AR processes than Guo et al. (2016).
Finally, the conclusions are presented in Sect. 2.5. All the proofs and auxiliary lemmas are given in the “Appendix”.
2 The rate of convergence of autocovariance estimation
For any \(p\times p\) matrix \({\mathbf {M}}=\left( m_{ij}\right) \) we define the following matrix norms, which are convenient for comparison with other results on autocovariance estimation:
$$\begin{aligned} \left\| {\mathbf {M}}\right\| _{2}=\sqrt{\lambda _{\max }\left( {\mathbf {M}}^{^{\prime }}{\mathbf {M}}\right) }, \end{aligned}$$
where \(\lambda _{\max }\left( {\mathbf {M}}^{^{\prime }}{\mathbf {M}}\right) \) is the maximum eigenvalue of the matrix \({\mathbf {M}}^{^{\prime }}{\mathbf {M}}\),
for some threshold \(t>0\),
We consider the following conditions on the matrices \(\varvec{\Psi }_{j}\) (see (1)), \( {\varvec{\Gamma }}_{k}\) and \(\varvec{\Sigma }\) for all \(t>0\) and \(j=0,1,...:\)
- (A1): there exists a sequence \(d_{j}\) with \(\sum _{j=1}^{\infty }d_{j}^{2}<\infty \) such that
  $$\begin{aligned} \max (T\left( \varvec{\Psi }_{j},t\right) ,T(\varvec{\Psi }_{j}^{^{\prime }},t))\le C_{1}d_{j}t^{-\alpha } \end{aligned}$$
  for some constants \(C_{1}>0\), \(\alpha >0\),
- (A2): \(T\left( \varvec{\Sigma },t\right) \le C_{2}t^{-\alpha }\) for some constants \(C_{2}>0\), \(\alpha >0\),
- (A3): \(\sum _{j=1}^{\infty }r_{j}^{2}<\infty \), where \(r_{j}=\max (\left\| \varvec{\Psi }_{j}\right\| _{(1,1)},\Vert \varvec{\Psi }_{j}^{^{\prime }}\Vert _{(1,1)})\),
- (A4): \(\sum _{j=1}^{\infty }r_{j}<\infty \), where \(r_{j}\) is as in (A3),
- (A5): \(\lambda _{\max }\left( \varvec{\Sigma }\right) \le C_{3}\) for some constant \(C_{3}>0\).
These conditions are restrictions on the parameter space. It is obvious that (A4) implies (A3), but the converse is not true. If the covariance matrix \({\varvec{\Sigma }}=( \gamma _{ij}) \) is such that \(\left| \gamma _{ij}\right| \le C\left| i-j\right| ^{-\alpha -1}\) for all i, j and some \(\alpha >0\), then \(T\left( {\varvec{\Sigma }},t\right) \le C_{2}t^{-\alpha }\) and (A2) holds. Conditions (A1)-(A2) are tapering conditions and specify the rate of decay of the matrices \({\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\) away from the diagonal.
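To make the last claim concrete, read the tapering functional in the standard way, with \(T({\mathbf {M}},t)\) the maximal absolute row (for \({\mathbf {M}}'\), column) sum of the entries lying further than \(t\) from the diagonal; this reading is an assumption on our part, since the definition is not restated here. Then, for \(t\ge 1\),

```latex
T\left( \varvec{\Sigma },t\right)
  = \max _{i}\sum _{j:\left| i-j\right| >t}\left| \gamma _{ij}\right|
  \le 2C\sum _{k>t}k^{-\alpha -1}
  \le C_{2}\,t^{-\alpha },
```

since \(\sum _{k>t}k^{-\alpha -1}={\mathcal {O}}(t^{-\alpha })\) by comparison with \(\int _{t}^{\infty }x^{-\alpha -1}\,dx=t^{-\alpha }/\alpha \); so (A2) holds with a constant \(C_{2}\) depending only on \(C\) and \(\alpha \).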
Of course, if (A3) holds then \(\left\| {{\varvec{\Gamma }}}_{k}\right\| _{(1,1)}<\infty \) (see (2)), and \(\sum _{j=0}^{\infty }\left\| {\varvec{\Psi }}_{j}\right\| _{(1,1)}^{2}<\infty \) implies that the series in (1) converges almost surely.
Next, in “Remarks on the condition (A1) for AR(1) processes” and “Remarks on the condition (A1) for AR(r) processes” sections we discuss in a few remarks when condition (A1) is fulfilled for vector AR(1) and AR(r) processes. In “Comparison of our results with previous studies” section we compare our main result, Theorem 1, with previous studies. In “Conclusions” section we summarize results related to Theorem 1 available in the literature.
2.1 The main results
The main results in this section concern the rate of convergence in operator norm for the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) and for the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\).
Theorem 1
Suppose (A1)-(A2) hold. Then
$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
for \(l_{n}=( c_{n}) ^{-1/(\alpha +1)}\). Here \(c_{n}=\sqrt{n^{-1} \log p}\) when (((Gauss) and (A5)) or (SGauss)) and (A3) hold, and \( p={\mathcal {O}}( n^{\gamma /2}) \) for some \(\gamma >1\) as \( n\rightarrow \infty \); and \(c_{n}=p^{2/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold, and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).
The proof of Theorem 1 is similar to the proofs of Theorems 1-2 in Bhattacharjee and Bose (2014b), but instead of their Lemmas 3, 5 we use a maximal inequality for the (SGauss) case and Pisier’s inequality (see (Pisier 1983)) for the (NGa\(_{\beta }\)) case.
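To get a feel for the bandwidth choice in Theorem 1, the following sketch evaluates \(c_{n}=\sqrt{n^{-1}\log p}\), the band \(l_{n}=c_{n}^{-1/(\alpha +1)}\) and the resulting rate \(c_{n}^{\alpha /(\alpha +1)}\) in the sub-Gaussian case; the particular values of \(n\), \(p\) and \(\alpha \) are arbitrary choices for illustration.

```python
import math

def theorem1_quantities(n, p, alpha):
    """Sub-Gaussian case of Theorem 1: returns (c_n, l_n, rate)."""
    c_n = math.sqrt(math.log(p) / n)
    l_n = c_n ** (-1.0 / (alpha + 1))    # bandwidth l_n = c_n^{-1/(alpha+1)}
    rate = c_n ** (alpha / (alpha + 1))  # operator-norm rate c_n^{alpha/(alpha+1)}
    return c_n, l_n, rate

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    p = int(n ** 0.75)                   # p = O(n^{gamma/2}) with gamma = 1.5
    c_n, l_n, rate = theorem1_quantities(n, p, alpha=1.0)
    print(f"n={n:6d} p={p:5d} c_n={c_n:.4f} l_n={l_n:6.2f} rate={rate:.4f}")
```

As \(n\) grows the band \(l_{n}\) widens slowly while the rate decreases, reflecting the bias-variance trade-off behind the choice \(l_{n}=c_{n}^{-1/(\alpha +1)}\).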
Corollary 1
Under the assumptions of Theorem 1, and when \({\varvec{\Gamma }}_{k}^{-1}\) exists and \(\Vert {\varvec{\Gamma }}_{k}^{-1}\Vert _{2}={\mathcal {O}}(1) \), we have
$$\begin{aligned} \big \Vert \left( {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})\right) ^{-1}-{\varvec{\Gamma }}_{k}^{-1}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
where \(l_{n}\) and \(c_{n}\) are defined in Theorem 1.
Corollary 1 is a simple consequence of Theorem 1. We also obtain the rate of convergence of \({\varvec{{\hat{\Gamma }}}}_{k}\) in supremum norm, which will be used in the proof of Theorem 1.
Lemma 1
We have
where \(c_{n}=\sqrt{n^{-1} \log p}\) if (SGauss) and (A3) hold and \(p= {\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\) as \( n\rightarrow \infty \), and \(c_{n}=p^{1/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).
Next, we obtain non-asymptotic bounds on the error probability of the estimation of the kth order autocovariance.
Proposition 1
Suppose (A1)-(A2) hold. Then for any \(\eta >\tilde{C} _{1}\left\| {\varvec{\Sigma }}\right\| _{(1,1)}+{\tilde{C}}_{2}\),
for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\) when (SGauss) and (A3) hold, and
for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\), \(C^{*}=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \(C_{\beta }\) depending on \(\beta \) when (NGa\(_{\beta }\)) and (A4) hold.
2.2 Remarks on the condition (A1) for AR(1) processes
As a special case, we consider a multivariate AR(1) process
for some \(p\times p\) matrix \({\mathbf {A}}\). Then \(\varvec{\Psi }_{j}={\mathbf {A}} ^{j}\) for \(j=1,2,\dots \). Let \(t>0\). We impose one of two conditions on the matrix \({\mathbf {A}}\):
- (w1): there exist \(\alpha >0\), a constant \(C>0\), and a sequence \(b_{j}\) such that \(\sum _{j=1}^{\infty }jb_{j}\max (\left\| \mathbf {A}\right\| _{(1,1)}^{j-1},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)}^{j-1})<\infty \), \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)})<1\) and \(\max (T(\mathbf {A},t/2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1}))\le Cb_{j}t^{-\alpha }\),
- (w2): \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)})<2^{-\alpha }\) and \(\max (T(\mathbf {A},t/2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1}))\le Ct^{-\alpha }\) for some \(\alpha >0\) and some constant \(C>0\).
Remark A
Condition (A1) is fulfilled for \(d_{j}=jb_{j}\max (\left\| {\mathbf {A}}\right\| _{(1,1)}^{j-1},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w1) holds, because from Lemma 5 (see “Appendix”)
Similarly, condition (A1) is fulfilled for \(d_{j}=j2^{(j-1)\alpha }\max (\Vert {\mathbf {A}}\Vert _{(1,1)}^{j-1},\Vert {\mathbf {A}} ^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w2) holds. Condition (A3) holds when \(\max (\left\| {\mathbf {A}}\right\| _{(1,1)},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)})<1\), because
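The last claim is easy to check numerically: with \(\varvec{\Psi }_{j}={\mathbf {A}}^{j}\), submultiplicativity gives \(r_{j}\le \max (\Vert {\mathbf {A}}\Vert _{(1,1)},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)})^{j}\), a geometric bound. In the sketch below we take \(\Vert \cdot \Vert _{(1,1)}\) to be the \(\ell _{1}\) operator norm (maximum absolute column sum); that reading of the norm, like the random test matrix, is our own assumption for illustration.

```python
import numpy as np

def norm_11(M):
    """Assumed (1,1)-norm: the l1 operator norm, i.e. maximum absolute column sum."""
    return np.abs(M).sum(axis=0).max()

rng = np.random.default_rng(2)
p = 20
A = rng.uniform(-1.0, 1.0, (p, p))
A *= 0.9 / max(norm_11(A), norm_11(A.T))  # enforce max(||A||, ||A'||) = 0.9 < 1

# r_j = max(||A^j||, ||(A^j)'||) decays at least geometrically,
# so sum_j r_j^2 < infinity, i.e. condition (A3) holds.
r = []
for j in range(1, 30):
    Aj = np.linalg.matrix_power(A, j)
    r.append(max(norm_11(Aj), norm_11(Aj.T)))
```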
2.3 Remarks on the condition (A1) for AR(r) processes
Now, we consider a multivariate AR(r) process of order r given by
\(t\ge 1\), where the \(p\times p\) matrices \({\mathbf {A}}_{i}\), \(i=1,\ldots ,r\), are called parameter matrices. Under some regularity condition (for more details see Bhattacharjee and Bose (2014b), (4), p. 264, and Brockwell and Davis (2002)) this process has the representation (1), where \({\varvec{\Psi }}_{0}={\mathbf {I}}\) and
We consider the class \({\mathcal {A}}\) of r-sequences of matrices, defined by
Observe that if \(T\left( {\mathbf {A}}_{i},t\right) \le C_{2}\delta ^{i}t^{-\alpha }\) for some \(0<\delta <1\) and
where \({\mathbf {A}}_{i}=\left( a_{i}^{s,k}\right) \), then \(\left\| {\mathbf {A}}_{i}\right\| _{(1,1)}\le \left( 2^{\alpha }C_{2}+C\right) \delta ^{i}\). Indeed, for any \( t>0\),
Therefore, putting \(t=1/2\), we get
Similarly, we may obtain \(\Vert {\mathbf {A}} _{i}^{^{\prime }} \Vert _{(1,1)}\le ( 2^{\alpha }C_{2}+C) \delta ^{i}\).
Proposition 2
Under the condition of class \({\mathcal {A}}\), we have
and
for \(j=1,2,\ldots \)
Remark B
Immediately from Proposition 2, we have that under the condition of class \({\mathcal {A}}\) condition (A1) holds for \( d_{j}=j\delta ^{j}\) for multivariate AR(r) processes.
2.4 Comparison of our results with previous studies
Similar results to those given in Theorem 1 have been presented in Bickel and Levina (2008b) for i.i.d. Gaussian observations \({{\mathbf {X}}}_{1},...,{\mathbf {X}}_{n}\) and in Bhattacharjee and Bose (2014b) for a p-dimensional linear process with \(n^{-1} \log p\rightarrow 0\). In Bhattacharjee and Bose (2014b) it is assumed that the matrix \(\varvec{ \Sigma }\) belongs to the class
where \(\varepsilon ,\alpha ,C>0\) and \(\lambda _{\min }\left( \varvec{\Sigma } \right) \), \(\lambda _{\max }\left( \varvec{\Sigma }\right) \) are respectively the minimum and maximum eigenvalues of \(\varvec{\Sigma }\), the coefficient matrices \(\left( \varvec{\Psi }_{j}\right) \) are in \({\mathcal {T}} _{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\) for some \(0<\beta <1 \), \(\lambda \ge 0\), \(\alpha \), \(\nu >0\), \(0<\eta <1\), where
with \(r_{j}=\max (\Vert \varvec{\Psi }_{j}\Vert _{(1,1)},\Vert {\varvec{\Psi }}_{j}{^{^{\prime }}}\Vert _{(1,1)})\). Additionally they assumed that for some \(\lambda _{0}>0\),
In our Theorem 1 we consider (Gauss), (SGauss) or (NGa\(_{\beta }\)) for \(\left( {\varepsilon }_{i}\right) \). In particular, condition (NGa \(_{\beta }\)) is much weaker than (13). The class \({\mathcal {U}}\) implies our conditions (A2) and (A5). Also conditions (A1) and (A3) are weaker than \({\mathcal {T}}_{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\). For example if \(r_{j}\sim j^{-a}\) for \(a>1\), then (A4) holds but the conditions in \({\mathcal {T}}_{\beta ,\lambda }\) are satisfied for \(a>\max ( 1/\beta ,(\lambda +1)/2( 1-\beta ))\), and those in \({\mathcal {G}}_{\alpha ,\eta ,\nu }\) are satisfied for \(r_{j}\sim b^{j}\) for some \(0<b<1\).
Bickel and Levina (2008a) considered asymptotic behavior of the threshold estimator of the covariance matrix \({\varvec{\Gamma }}_{0}\) of the form
For Gaussian data, where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and
they obtained
where \(u=C(\log p)^{1/2}n^{-1/2}\) for a sufficiently large constant C. For nonnormal data where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \), they obtained (16) for \(u=Cp^{2/q}n^{-1/2}\), for a sufficiently large constant C (see (Chen et al. 2013), (48)).
Cai et al. (2010) obtained the minimax upper bound for a special class of tapering estimators of \({\varvec{\Gamma }}_{0}\) for i.i.d. observations under (A2) and (A5). Their upper bound on the rate equals \({\mathcal {O}}_{P}\big ( \min \left\{ n^{-2\alpha /(2\alpha +1)}+n^{-1} \log p,p/n\right\} \big )\) and is better than our result for (Gauss) and (SGauss) from Theorem 1. This means that for i.i.d. observations our result is suboptimal.
A sharper rate than in (16) for a nonnormal case was obtained by Chen et al. (2013), where data come from a general weak dependence model. In particular, when data come from a linear process (1) with (NGa\( _{\beta }\)) for \(\beta =2q\) for some \(q>2\), and the coefficient matrices \( \varvec{\Psi }_{j}=\left( \psi _{k,s}\left( j\right) \right) _{1\le k,s\le p}\) satisfy
for some \(\gamma >1/2-1/q\), \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and \(M<p\), then one can obtain (16) for \(u=u_{*}\), where \(u_{*}=\max \left( u_{1},u_{2},u_{3}\right) \), \(u_{1}=M^{\frac{1}{q+r}}p^{\frac{1}{q+r}}n^{\frac{1-q}{q+r}}\), \(u_{2}=\sqrt{n^{-1} \log p}\), \(u_{3}=M^{-\frac{2}{q-2r}}p^{\frac{2}{q-2r}}n^{\frac{(1-q)}{q-2r}}\) (for more details see the discussion preceding Corollary 2.7 in Chen et al. (2013)).
Under condition (Gauss) or (SGauss), we obtain the same rate of convergence as in Bickel and Levina (2008a) for the covariance matrix \(\varvec{ \Gamma }_{0}\) for normal data and as in Bhattacharjee and Bose (2014b) for \({\varvec{\Gamma }}_{k}\). In particular, if \({{\varvec{\Gamma }}}_{\mathbf {0}}\in {\mathcal {G}}_{r}\left( M\right) \) then \(\left\| \varvec{\Sigma }\right\| _{(1,1)}\le M\), and from Theorem 1 we have
If \(M\sim p^{\eta }\) for some \(\eta \in [0,1)\), then the r.h.s. of (16) for \(u=u_{*}\) equals
which is greater than the r.h.s. of (17), \(p^{\eta }\left( n^{-1} \log p\right) ^{\frac{\alpha }{2(\alpha +1)}}\). Thus, our rate of convergence is better than the rate (16) for \(u=u_{*}\) in Chen et al. (2013) for a linear process.
Guo et al. (2016) worked with a bounded covariance estimator for \( l_{n}=C\log (n/\log p)\) for some \(C>0\) for a sparse multivariate AR model. The rate of convergence for operator norm was \({\mathcal {O}}_{P}(\log ( n/\log p)\sqrt{n^{-1} \log p})\). In a special case, from our Theorem 1 we obtain the better rate \({\mathcal {O}}_{P}(\log ^{\alpha }(n^{-1} \log p))\).
Jentsch and Politis (2015) dealt with so-called flat-top tapered covariance matrix estimation for a multivariate strictly stationary time series. In the special case when the observations come from a p-dimensional linear process \({\mathbf {X}}_{t}=\sum _{j=-\infty }^{\infty }\varvec{\Psi }_{j}{\varepsilon }_{t-j}\), where the sequence \(\left( \varvec{\Psi }_{j}\right) \) of coefficient matrices is component-wise absolutely summable (then \(\sum _{h=-\infty }^{\infty }\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| <\infty \), where \({\varvec{\Gamma }}_{h}=\left( \gamma _{i,j}\left( h\right) \right) \)) and the i.i.d. noise \(\left( {\varepsilon }_{t}\right) \) has finite fourth moments, they obtained
where \({\varvec{{\hat{\Gamma }}}}_{k,l}\) is a flat-top tapered covariance matrix estimator of \({\varvec{\Gamma }}_{k}\) and \(l=o\left( \sqrt{n}\right) \) is the banding parameter. If \(\left| \psi _{i,j}(h)\right| \le Ch^{-(1+\alpha )}\) for some \(C>0\) and \(\alpha >0\), where \(\varvec{\Psi }_{h}=\left( \psi _{i,j}(h)\right) \), then (A1) holds, and under (A2) and (SGauss) we deduce from Theorem 1 that the r.h.s. of (4) is of order \({\mathcal {O}}_{P} ((n^{-1}\log p)^{\alpha /2(\alpha +1)})\). Therefore \(\left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( h^{-\alpha -1}\right) \), and for \(l=n^{1/2-\kappa }\) with \(\kappa \in (0,1/2)\) we have \(\sum _{h=l+1}^{n-1}\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( l^{-\alpha }\right) \), so the r.h.s. of (19) is of order \({\mathcal {O}}\left( p^{2}n^{-\kappa }+l^{-\alpha }\right) ={\mathcal {O}}( p^{2}n^{-\kappa }+n^{-\left( \frac{1}{2}-\kappa \right) \alpha })\), which is a worse rate than in Theorem 1. Similarly, when (NGa\(_{\beta }\)) holds and \(p={\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\), then from Theorem 1 the r.h.s. of (4) is of order \({\mathcal {O}}_{P}( ( p^{2/\beta }\sqrt{n^{-1} \log p}) ^{\alpha /(\alpha +1)}) \), and this rate is sharper than the r.h.s. of (19).
2.5 Conclusions
Our main result (Theorem 1) was compared with related results available in the literature. In the special case of a multidimensional linear process we obtained a better rate of convergence of our covariance estimator in operator norm than (Chen et al. 2013; Jentsch and Politis 2015) and Guo et al. (2016). Our result is similar to that of Bhattacharjee and Bose (2014b), but under milder assumptions on the noise process \(\left( {\varepsilon }_{t}\right) \) and on the admissible class of matrices \({\varvec{\Gamma }}_{k}\).
Comparing results on covariance matrix estimation is difficult, because they are obtained under different assumptions on the class of covariance matrices and for independent or dependent data. For independent data Cai et al. (2010) obtained the optimal rate for tapering estimators of \(\varvec{\Gamma _0}\). In contrast, such results do not exist for dependent data, and the problem of finding the optimal rate of convergence in Theorem 1 is still open.
3 Appendix
Lemma 2
(Bhattacharjee and Bose 2014b). For any matrices \({\mathbf {A}}\), \({\mathbf {B}}\) and for all \(\alpha \), \(\beta \), \(t>0\),
- (i) \(\left\| \mathbf {AB}\right\| _{(1,1)}\le \left\| \mathbf {A}\right\| _{(1,1)}\left\| \mathbf {B}\right\| _{(1,1)}\),
- (ii) \(T\left( \mathbf {AB},\left( \alpha +\beta \right) t\right) \le \left\| \mathbf {A}\right\| _{(1,1)}T\left( \mathbf {B},\alpha t\right) +\left\| \mathbf {B}\right\| _{(1,1)}T\left( \mathbf {A},\beta t\right) \).
Lemma 3
Suppose (A1)–(A2) and (A3) hold. Then, for all \(t>0\),
for some \(\alpha >0\), where \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\) are some constants.
Proof
Observe
From Lemma 2(ii), we have
It follows from Lemma 2(i) that
Hence
Again using Lemma 2(ii), we obtain
and
By assumptions (A1)-(A2), we see that
From the Schwarz inequality, (A1) and (A3), we have
and similarly \(\sum _{j=k}^{\infty }r_{j-k}d_{j}<\infty \), \( \sum _{j=k}^{\infty }r_{j}r_{j-k}<\infty \). Therefore, from (20), we get
for some constants \({\tilde{C}}_{1}>0\) and \({\tilde{C}}_{2}>0\). \(\square \)
Lemma 4
Suppose (SGauss) and (A3) hold. Then for any \(\eta >0\),
Suppose (NGa\(_{\beta }\)) and (A4) hold. Then for any \(\eta >0\),
where \(C^{*}:=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \( C_{\beta }\) is some constant depending on \(\beta \).
Proof
By a simple calculation,
Hence
Put \(\zeta _{m,i,j}^{l,s}:={\varepsilon }_{m-j,l}{\varepsilon } _{m-i-k,s}\). Then
From (SGauss) and (A3) we conclude that
Since for fixed s, \((\zeta _{m,i,j}^{l,s})\) is sub-Gaussian with constant \( \sigma ^{2}\), we have
Hence, from the maximal inequality for sub-Gaussian r.v.’s we obtain
Hence, we have
Suppose (NGa\(_{\beta }\)) and (A4) hold. Applying to the r.h.s. of (24) Pisier’s maximal inequality (Pisier 1983)
$$\begin{aligned} E\max _{1\le i\le N}\left| Z_{i}\right| \le N^{1/Q}\max _{1\le i\le N}\left\| Z_{i}\right\| _{Q}, \end{aligned}$$
which holds for any random variables \(\left( Z_{i}\right) \) with \(\left\| Z_{i}\right\| _{Q}=E^{1/Q}\left| Z_{i}\right| ^{Q}<\infty \) for \(Q>1\), and putting \(Q=\beta \), we obtain
For fixed i, j, l, s the sequence \((\zeta _{m,i,j}^{l,s})_{m}\) is i.i.d., and using the moment bound (Petrov 1995) and (NGa\(_{\beta }\)), we obtain
Hence, we get (23), as desired. \(\square \)
Proof of Lemma 1
From Lemma 4 under (SGauss), it follows that
Since there exists \(\gamma >1\) such that \(\exp \left( -x\right) <x^{-\gamma }\) for all \(x>0\), it follows that
Putting \(\eta =\sqrt{n^{-1} \log p}\), we obtain
as \(n\rightarrow \infty \). Therefore, (6) holds for \(c_{n}=\sqrt{ n^{-1} \log p}\). Under (NGa\(_{\beta }\)) and from Lemma 4, we have
Putting \(\eta =p^{2/\beta }\sqrt{n^{-1} \log p}\), we obtain
as \(n\rightarrow \infty \). \(\square \)
Remark C
From Lemma A.3 (Bickel and Levina 2008b) under the assumption that \(({\varepsilon }_{t})\) is Gaussian and (A5), we may deduce that for some \(\delta >0\) and any \(\left| \eta \right| \le \delta \),
for some constants \(C_{1}^{*}\), \(C_{2}^{*}>0\). Reasoning as in the proof of Lemma 1, we may obtain (6) for \(c_{n}=\sqrt{ n^{-1} \log p}\).
Proof of Theorem 1
From the inequality in (Bhattacharjee and Bose (2014b), p. 280), we find that
From Lemma 3, we have
for any \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \). From Lemma 1 (see also Remark C when (Gauss) holds), we get
for \(c_{n}\) as in Lemma 1. Consequently, due to (25)-(27), we have
Putting \(l_{n}=c_{n}^{-\frac{1}{\alpha +1}}\), we obtain
\(\big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}=\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma } \big \Vert _{(1,1)}l_{n}^{-\alpha }\big ) =\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma }\big \Vert _{(1,1)}\big ( c_{n}\big ) ^{\frac{\alpha }{ \alpha +1}}\big ) \) and (4) holds.
Proof of Corollary 1
Reasoning as in (Bhattacharjee and Bose (2014b), Section 6.2), we have: if \({\varvec{A}}^{-1}\) exists and \(\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert < \big \Vert {\varvec{A}}^{-1}\big \Vert ^{-1}\), then \(\big \Vert {\varvec{B}}^{-1}-{\varvec{A}}^{-1}\big \Vert \le \frac{\big \Vert {\varvec{A}}^{-1}\big \Vert ^{2}\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }{1-\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }\). Put \({\varvec{A}}={\varvec{\Gamma }}_{k}\) and \({\varvec{B}}={\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})\). Since \(\big \Vert {\varvec{\Gamma }}_{k}^{-1}\big \Vert _{2}={\mathcal {O}}\left( 1\right) \) and, from Theorem 1, \(\big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\left( a_{n}\right) \) with \(a_{n}\rightarrow 0\), it follows that for large n we have \(\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert <1\). Therefore for some \(C>0\) and large n, we find that
Hence, directly from (4), we have (5).
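The matrix-inversion perturbation bound used above can be sanity-checked numerically; the matrices below are arbitrary test inputs of our own, and the spectral norm plays the role of the operator norm \(\Vert \cdot \Vert _{2}\).

```python
import numpy as np

rng = np.random.default_rng(3)
p = 8
A = np.eye(p) + 0.05 * rng.standard_normal((p, p))
B = A + 0.001 * rng.standard_normal((p, p))  # a small perturbation of A

op = lambda M: np.linalg.norm(M, 2)          # spectral (operator) norm
Ainv = np.linalg.inv(A)
d = op(A - B)

# The bound requires ||A^{-1}|| * ||A - B|| < 1.
assert op(Ainv) * d < 1
lhs = op(np.linalg.inv(B) - Ainv)
rhs = op(Ainv) ** 2 * d / (1 - op(Ainv) * d)
print(lhs, "<=", rhs)
```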
Proof of Proposition 1
From (25), we have for any \(\eta >0\),
Using (21), we obtain
Therefore, from Lemma 4, we deduce (7)-(8).
Proof of Proposition 2
First, we will show (11) by induction. For \(j=1\), under the condition of class \(\mathcal {A}\), \(\left\| \varvec{\Psi }_{j}\right\| _{(1,1)}=\left\| \varvec{\Psi }_{1}\right\| _{(1,1)}=\left\| \mathbf {A}_{1}\right\| _{(1,1)}\le C_{1}\delta \) and (11) holds for \(j=1\). From (10), \(C_{1}r<1\) and the induction assumption that \( \left\| \varvec{\Psi }_{i}\right\| _{(1,1)}\le C_{1}\delta ^{i}\) for all \(i\le j\), we have
In a similar manner, we have \(\big \Vert \varvec{\Psi }_{j}^{^{\prime }}\big \Vert _{(1,1)}\le C_{1}\delta ^{j}\). Hence, (11) is proved. Next, by induction we will show (12). For \(j=1\), \(T\left( \varvec{ \Psi }_{j},t\right) =T\left( \varvec{\Psi }_{1},t\right) =T\left( \mathbf {A} _{1},t\right) \le C_{2}\delta t^{-\alpha }\) under the condition of class \( \mathcal {A}\). Hence, (12) is satisfied for \(j=1\). By (10) and Lemma 2(ii),
Then, by the induction assumption that \(T\left( \varvec{\Psi }_{i},t\right) \le C_{2}i\delta ^{i}t^{-\alpha }\) for all \(i\le j\) and under the conditions of class \(\mathcal {A}\) (\(C_{1}=1/\left( 2^{\alpha }r\right) \)) it follows that
By a similar reasoning, we obtain \(T(\varvec{\Psi }_{j}^{^{\prime }},t)\le C_{2}j\delta ^{j}t^{-\alpha }\). Hence, by induction the proof of (12) is complete.
Lemma 5
For any matrix \(\mathbf {A}\),
for all \(t>0\) and \(j=1,2,...\)
Proof
Clearly for \(j=1\) inequality (28) holds. Now, we assume that (28) is true for some j. From the induction assumption and Lemma 2 (ii),
By induction, we see that (28) holds for all positive integers j.
References
Bhattacharjee M, Bose A (2014a) Consistency of large dimensional sample covariance matrix under dependence. Stat Methodol 20:11–26
Bhattacharjee M, Bose A (2014b) Estimation of autocovariance matrices for infinite dimensional vector linear process. J Time Ser Anal 35:262–281
Bickel PJ, Levina E (2008a) Covariance regularization by thresholding. Ann Stat 36:2577–2604
Bickel PJ, Levina E (2008b) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
Birnbaum A, Nadler B (2012) High dimensional sparse covariance estimation: accurate thresholds for the maximal diagonal entry and for the largest correlation coefficient. Technical report
Brockwell P, Davis R (2002) Introduction to time series and forecasting, 2nd edn. Springer Texts in Statistics. Springer, New York
Cai T, Liu WD (2011) Adaptive thresholding for sparse covariance matrix estimation. J Am Statist Assoc 106:672–684
Cai T, Zhang C, Zhou H (2010) Optimal rates of convergence for covariance matrix estimation. Ann Stat 38:2118–2144
Chen X, Xu M, Wu WB (2013) Covariance and precision matrix estimation for high-dimensional time series. Ann Stat 41:2994–3021
Fan J, Fan Y, Lv J (2006) High dimensional covariance matrix estimation using a factor model. Technical report, Princeton University
Friedman J (1989) Regularized discriminant analysis. J Am Statist Assoc 84:165–175
Furrer R, Bengtsson T (2007) Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J Multivar Anal 98:227–255
Guo S, Wang Y, Yao Q (2016) High-dimensional and banded vector autoregressions. Biometrika 103:889–903
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Huang J, Liu N, Pourahmadi M, Liu L (2006) Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93:85–98
Jentsch C, Politis D (2015) Covariance matrix estimation and linear process bootstrap for multivariate time series of possibly increasing dimension. Ann Stat 43:1117–1140
Johnstone IM, Lu AY (2009) On consistency and sparsity for principal components analysis in high dimensions. J Am Statist Assoc 104:682–693
Marchenko VA, Pastur LA (1967) Distributions of eigenvalues of some sets of random matrices. Math USSR-Sb 1:507–536
McMurry T, Politis D (2010) Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J Time Ser Anal 31:471–482
Petrov VV (1995) Limit theorems of probability theory. Oxford University Press, Oxford
Pisier G (1983) Some applications of the metric entropy condition to harmonic analysis. In: Banach spaces, harmonic analysis, and probability theory. Lecture Notes in Mathematics, vol 995. Springer, New York
Rothman AJ, Levina E, Zhu J (2009) Generalized thresholding of large covariance matrices. J Am Statist Assoc 104:177–186
Wu WB, Pourahmadi M (2009) Banding sample autocovariance matrices of stationary processes. Statist Sin 19:1755–1768
Xu M, Chen X, Wu WB (2020) Estimation of dynamic networks for high-dimensional nonstationary time series. Entropy 22:55
Acknowledgements
The author would like to thank the editor and the reviewers for useful comments and suggestions which have improved the presentation of the paper.
Cite this article
Furmańczyk, K. Estimation of autocovariance matrices for high dimensional linear processes. Metrika 84, 595–613 (2021). https://doi.org/10.1007/s00184-020-00790-2