Abstract
In this paper we establish, under some mild restrictions, upper bounds on the rate of convergence for estimators of \(p\times p\) autocovariance and precision matrices of high dimensional linear processes. We show that these estimators are consistent in the operator norm: in the sub-Gaussian case when \(p={\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\), and in the general case when \( p^{2/\beta }(n^{-1} \log p)^{1/2}\rightarrow 0\) for some \(\beta >2\), as \( p=p(n)\rightarrow \infty \) and the sample size \(n\rightarrow \infty \). In particular, our results hold for multivariate AR processes. We compare our results with those previously obtained in the literature for independent and dependent data, and we also present non-asymptotic bounds on the error probability of these estimators.
1 Introduction
Estimation of covariance matrices in a high dimensional setting has been one of the fundamental statistical issues of the last decade. Statistical applications of covariance matrix estimation include ridge regression (Hoerl and Kennard 1970), regularized discriminant analysis (Friedman 1989) and principal component analysis (Johnstone and Lu 2009). For an overview of this topic and its applications see (Bickel and Levina 2008b; Birnbaum and Nadler 2012; Chen et al. 2013; Fan et al. 2006; Rothman et al. 2009). The problem of estimating covariance matrices for dependent data has recently been investigated by (Chen et al. 2013; Bhattacharjee and Bose 2014a, b; Guo et al. 2016; Jentsch and Politis 2015; McMurry and Politis 2010), and Wu and Pourahmadi (2009). Estimation of the inverse covariance matrix is used in the recovery of the true unknown structure of undirected graphical models, especially Gaussian graphical models, where a zero entry of the inverse covariance matrix corresponds to a missing edge between two vertices of the graph. The recovery of undirected graphs via estimation of precision matrices for a general class of nonstationary time series is considered in Xu et al. (2020).
Consider a p-dimensional linear process
$$\begin{aligned} {\mathbf {X}}_{t}=\sum _{j=0}^{\infty }{\varvec{\Psi }}_{j}{\varepsilon }_{t-j}, \end{aligned}$$
where the \({\varvec{\Psi }}_{j}\) are \(p\times p\) matrices, \({\varepsilon }_{i}=\left( \varepsilon _{i,1},\ldots ,\varepsilon _{i,p}\right) ^{^{\prime }}\), and \(\left( {\varepsilon }_{t}\right) \) are i.i.d. vectors in \({\mathbb {R}}^{p}\) with mean \({\mathbf {0}}\) and variance-covariance matrix \(\varvec{\Sigma }\). Under a causality condition, a vector ARMA process, which is a basic model in econometrics and finance, is a linear process (Brockwell and Davis 2002). We assume that \(\left( {\varepsilon }_{t}\right) \) satisfies one of the following conditions:
- (Gauss): \({\varepsilon }_{i}\) is Gaussian with mean \({\mathbf {0}}\) and variance-covariance matrix \({\varvec{\Sigma }}\).
- (SGauss): \(\left( \varepsilon _{i,l}\varepsilon _{j,s}\right) \) is sub-Gaussian with constant \(\sigma ^{2}\), that is,
  $$\begin{aligned} E\exp \left( u\varepsilon _{i,l}\varepsilon _{j,s}\right) \le \exp \left( \sigma ^{2}u^{2}/2\right) \end{aligned}$$
  for all \(u\in {\mathbb {R}}\), \(i,j=1,2,\ldots \) and \(l,s=1,\ldots ,p\).
- (NGa\(_{\beta }\)): \(E\left| \varepsilon _{i,j}\right| ^{\beta }<\infty \) for some \(\beta >2\), for \(i=1,2,\ldots \) and \(j=1,\ldots ,p\).
For example, condition (SGauss) is satisfied for bounded sequences \(\left( \varepsilon _{i,l}\right) \), i.e. when \(\sup _{i,l}\left| \varepsilon _{i,l}\right| \le M\) for some \(M>0\). Observe also that (SGauss) is implied by a sub-Gaussian condition for the vectorization \(vec\left( {\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\right) \) of the Kronecker product \({\varepsilon }_{i}^{^{\prime }}\otimes {\varepsilon }_{j}\) for all i, j, i.e. for all \(u\in {\mathbb {R}}^{p^{2}}\)
for some \(\sigma ^{2}>0\). Condition (NGa\(_{\beta }\)) is a moment condition for the innovation process without assumptions on the dependency of the coordinates of the innovation process \(\left( {\varepsilon }_{i}\right) \).
Let \({\varvec{\Gamma }}_{k}\) be the kth order autocovariance matrix,
The matrix \({\varvec{\Gamma }}_{k}\) will be estimated from the sample \({\mathbf {X}}_{1},\ldots ,{\mathbf {X}}_{n}\), because in practice we do not know the matrices \({\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\). Define the sample autocovariance matrix of order k as
for \(0\le k\le n-1\), and the banded version of \({\varvec{{\hat{\Gamma }}}}_{k}=({\hat{\gamma }}_{ij}^{k})\) (as in Bhattacharjee and Bose (2014b)) is given by
$$\begin{aligned} {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})=\left( {\hat{\gamma }}_{ij}^{k}\,{\mathbf {1}}\left( \left| i-j\right| \le l_{n}\right) \right) \end{aligned}$$
for some sequence of thresholds \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \), where \({\mathbf {1}}(\cdot )\) is the indicator function. We will assume that \(p=p(n)\rightarrow \infty \) as \(n\rightarrow \infty \).
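As a concrete illustration, the banding operation \({\mathbf {B}}_{l_{n}}(\cdot )\) applied to a sample autocovariance can be sketched in a few lines. The function names and the \(1/n\) normalization of the sample autocovariance are our own choices for this sketch, not taken from the paper.

```python
import numpy as np

def sample_autocov(X, k):
    """Sample autocovariance of order k from an (n, p) data matrix X.

    The 1/n normalization is one common convention; the paper's exact
    normalization is not reproduced here.
    """
    n, _ = X.shape
    return X[k:].T @ X[:n - k] / n

def band(M, l):
    """Banded version of a square matrix M: keep entries with |i - j| <= l."""
    p = M.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= l, M, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
G1_banded = band(sample_autocov(X, 1), 2)  # banded first-order autocovariance
```

Increasing the band `l` toward \(p-1\) recovers the full sample autocovariance matrix.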
The main contribution of our paper is that in Theorem 1 we obtain the rate of convergence in the operator norm of \({\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\) for a high dimensional linear process:
$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
for some \(c_{n}\rightarrow 0\) and some \(\alpha >0\). Under the sub-Gaussian conditions ((Gauss) and (SGauss)) we obtain the same rate \({\mathcal {O}}_{P }( (n^{-1} \log p)^{\alpha /2(\alpha +1)}) \) as Bhattacharjee and Bose (2014b), but under weaker assumptions on the coefficient matrices \( {\varvec{\Psi }}_{j} \) and \( {\varvec{\Sigma }} \). In particular, under causality, our results include vector autoregressive AR(r) processes. We obtain similar results (Corollary 1) for the precision matrix
An interesting problem is to obtain lower bounds and the optimal rate of convergence. Cai et al. (2010) obtained the minimax bound for i.i.d. observations for tapering estimators of \({\varvec{\Gamma }}_{0}\). For dependent data this problem is still open. Below we briefly present the state of research related to the estimation of the covariance matrix for independent and dependent observations.
The sample covariance matrix \(\varvec{\hat{\Gamma }}_{0}=(\hat{\gamma } _{ij}^{0})\) performs poorly in a high dimensional setting. In the Gaussian case when \({\varvec{\Gamma }}_{\mathbf {0}}={\mathbf {I}}\) is the identity matrix and \(p/n\rightarrow c \in (0,1)\), the empirical distribution of the eigenvalues of the sample covariance matrix \({\varvec{\hat{\Gamma }}}_{0}\) follows the Marchenko and Pastur (1967) law, which is supported on the interval \((( 1-\sqrt{c})^{2}, (1+\sqrt{c})^{2})\).
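This degradation is easy to reproduce numerically. The following sketch (with arbitrary dimensions and seed of our own choosing) shows the sample eigenvalues spreading over roughly \(((1-\sqrt{c})^{2},(1+\sqrt{c})^{2})\) instead of concentrating near 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 800, 400                      # c = p/n = 0.5
X = rng.standard_normal((n, p))      # true covariance is the identity
S = X.T @ X / n                      # sample covariance matrix
eig = np.linalg.eigvalsh(S)

c = p / n
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
# The eigenvalues approximately fill the Marchenko-Pastur support,
# even though every population eigenvalue equals 1.
print(f"support ~ ({lo:.3f}, {hi:.3f}), observed ({eig.min():.3f}, {eig.max():.3f})")
```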
For the i.i.d. case Bickel and Levina (2008a) proposed thresholding of the sample covariance matrix and obtained rates of convergence for the thresholding estimators for a proper choice of the threshold \(\lambda _{n}\), where the estimator is given by
Rothman et al. (2009) considered a class of universal thresholding rules with more general thresholding functions than hard thresholding. An interesting generalization of this method can be found in Cai and Liu (2011) for sparse covariance matrices, where an adaptive thresholding estimator is given by
where \(S_{\lambda _{ij}}(\cdot )\) is a general thresholding function with data-driven thresholds \(\lambda _{ij}\). For other interesting results in this area, see (Birnbaum and Nadler 2012; Cai et al. 2010; Fan et al. 2006; Furrer and Bengtsson 2007; Huang et al. 2006).
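For concreteness, hard thresholding (Bickel and Levina 2008a) and a soft-thresholding rule of the kind covered by Rothman et al. (2009) and Cai and Liu (2011) can be written as follows. This is an illustrative sketch: the thresholds are supplied by the user rather than computed by the papers' data-driven formulas.

```python
import numpy as np

def hard_threshold(S, lam):
    """Hard thresholding: zero every entry with |s_ij| <= lam."""
    return np.where(np.abs(S) > lam, S, 0.0)

def soft_threshold(S, lam):
    """Soft thresholding, one example of a general rule S_lam(.);
    lam may be a scalar or a matrix of entry-wise thresholds lambda_ij."""
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
```

In the i.i.d. theory the universal threshold is of order \(\sqrt{n^{-1}\log p}\); the constant in front is a tuning choice.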
There are few results for high-dimensional dependent data: (Bhattacharjee and Bose 2014a, b; Chen et al. 2013; Jentsch and Politis 2015) and Guo et al. (2016). Bhattacharjee and Bose (2014a) considered the estimation of the high dimensional variance-covariance matrix under a general Gaussian model with weak dependence in both rows and columns of the data matrix. They showed that the banded and tapered sample variance-covariance matrices are consistent under a suitable column dependence model. But their conditions do not allow control of the first few autocovariances; they control only higher order autocovariances. Bhattacharjee and Bose (2014b) showed that under suitable assumptions for the linear process, the banded sample autocovariance matrices are consistent in the high dimensional setting. Chen et al. (2013) obtained the rate of convergence for a banded autocovariance estimator in operator norm for a general dependent model. A similar result under more restrictive assumptions was obtained by Jentsch and Politis (2015). Guo et al. (2016) established similar results for sparse multivariate autoregressive AR processes.
The rest of the paper is organized as follows. In “The rate of convergence of autocovariance estimation” section we deal with the problem of estimating the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) by (3) in high dimensions (Theorem 1). The rate of convergence for the estimator of the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\) is given in Corollary 1.
In Proposition 1 we obtain bounds on the error probability of the estimation of the kth order autocovariance. In “Comparison of our results with previous studies” section we compare our results with those obtained by Bickel and Levina (2008a, b) for independent normal and nonnormal data, with the minimax upper bound for the tapering estimator in Cai et al. (2010), and with the results for dependent data obtained by (Chen et al. 2013; Bhattacharjee and Bose 2014b; Guo et al. 2016) and Jentsch and Politis (2015). In the special case of a multi-dimensional linear process we obtain a sharper bound (Theorem 1) than Chen et al. (2013) and Jentsch and Politis (2015). We also obtain a better rate for the estimation error for multivariate sparse AR processes than Guo et al. (2016).
Finally, the conclusions are presented in Sect. 2.5. All the proofs and auxiliary lemmas are given in the “Appendix”.
2 The rate of convergence of autocovariance estimation
For any \(p\times p\) matrix \({\mathbf {M}}=\left( m_{ij}\right) \) we define the following matrix norms, which are convenient for comparison with other results on autocovariance estimation:
$$\begin{aligned} \left\| {\mathbf {M}}\right\| _{2}=\sqrt{\lambda _{\max }\left( {\mathbf {M}}^{^{\prime }}{\mathbf {M}}\right) }, \end{aligned}$$
where \(\lambda _{\max }\left( {\mathbf {M}}^{^{\prime }}{\mathbf {M}}\right) \) is the maximum eigenvalue of the matrix \({\mathbf {M}}^{^{\prime }}{\mathbf {M}}\),
for some threshold \(t>0\),
We consider the following conditions on the matrices \(\varvec{\Psi }_{j}\) (see (1)), \( {\varvec{\Gamma }}_{k}\) and \(\varvec{\Sigma }\) for all \(t>0\) and \(j=0,1,...:\)
- (A1): there exists a sequence \(d_{j}\) with \(\sum _{j=1}^{\infty }d_{j}^{2}<\infty \) such that
  $$\begin{aligned} \max (T\left( \varvec{\Psi }_{j},t\right) ,T(\varvec{\Psi }_{j}^{^{\prime }},t))\le C_{1}d_{j}t^{-\alpha } \end{aligned}$$
  for some constants \(C_{1}>0\), \(\alpha >0\),
- (A2): \(T\left( \varvec{\Sigma },t\right) \le C_{2}t^{-\alpha }\) for some constants \(C_{2}>0\), \(\alpha >0\),
- (A3): \(\sum _{j=1}^{\infty }r_{j}^{2}<\infty \), where \(r_{j}=\max (\left\| \varvec{\Psi }_{j}\right\| _{(1,1)},\Vert \varvec{\Psi }_{j}^{^{\prime }}\Vert _{(1,1)})\),
- (A4): \(\sum _{j=1}^{\infty }r_{j}<\infty \), where \(r_{j}\) is as in (A3),
- (A5): \(\lambda _{\max }\left( \varvec{\Sigma }\right) \le C_{3}\) for some constant \(C_{3}>0\).
These conditions are restrictions on the parameter space. It is obvious that (A4) implies (A3), but the converse is not true. If the covariance matrix \({\varvec{\Sigma }}=( \gamma _{ij}) \) is such that \(\left| \gamma _{ij}\right| \le C\left| i-j\right| ^{-\alpha -1}\) for all i, j and some \(\alpha >0\), then \(T\left( {\varvec{\Sigma }},t\right) \le C_{2}t^{-\alpha }\) and (A2) holds. Conditions (A1)-(A2) are tapering conditions and specify the rate of decay of the matrices \({\varvec{\Psi }}_{j}\) and \({\varvec{\Sigma }}\) away from the diagonal.
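To make the last claim concrete, read the tapering functional in the standard way, with \(T({\mathbf {M}},t)\) the maximal absolute row (for \({\mathbf {M}}'\), column) sum of the entries lying further than \(t\) from the diagonal; this reading is an assumption on our part, since the definition is not restated here. Then, for \(t\ge 1\),

```latex
T\left( \varvec{\Sigma },t\right)
  = \max _{i}\sum _{j:\left| i-j\right| >t}\left| \gamma _{ij}\right|
  \le 2C\sum _{k>t}k^{-\alpha -1}
  \le C_{2}\,t^{-\alpha },
```

since \(\sum _{k>t}k^{-\alpha -1}={\mathcal {O}}(t^{-\alpha })\) by comparison with \(\int _{t}^{\infty }x^{-\alpha -1}\,dx=t^{-\alpha }/\alpha \); so (A2) holds with a constant \(C_{2}\) depending only on \(C\) and \(\alpha \).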
Of course, if (A3) holds then \(\left\| {{\varvec{\Gamma }}}_{k}\right\| _{(1,1)}<\infty \) (see (2)), and \(\sum _{j=0}^{\infty }\left\| {\varvec{\Psi }}_{j}\right\| _{(1,1)}^{2}<\infty \) implies that the series in (1) converges almost surely.
Next, in “Remarks on the condition (A1) for AR(1) processes” and “Remarks on the condition (A1) for AR(r) processes” sections we discuss in a few remarks when condition (A1) is fulfilled for vector AR(1) and AR(r) processes. In “Comparison of our results with previous studies” section we compare our main result, Theorem 1, with previous studies. In “Conclusions” section we summarize results related to Theorem 1 available in the literature.
2.1 The main results
The main results in this section concern the rate of convergence in operator norm for the kth order autocovariance matrix \({\varvec{\Gamma }}_{k}\) and for the precision matrix \({\varvec{\Gamma }}_{k}^{-1}\).
Theorem 1
Suppose (A1)-(A2) hold. Then
$$\begin{aligned} \big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
for \(l_{n}=( c_{n}) ^{-1/(\alpha +1)}\). Here \(c_{n}=\sqrt{n^{-1} \log p}\) when (((Gauss) and (A5)) or (SGauss)) and (A3) hold, and \( p={\mathcal {O}}( n^{\gamma /2}) \) for some \(\gamma >1\) as \( n\rightarrow \infty \); and \(c_{n}=p^{2/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold, and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).
The proof of Theorem 1 is similar to the proofs of Theorems 1-2 in Bhattacharjee and Bose (2014b), but instead of their Lemmas 3, 5 we use a maximal inequality for the (SGauss) case and Pisier’s inequality (see (Pisier 1983)) for the (NGa\(_{\beta }\)) case.
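To get a feel for the bandwidth choice in Theorem 1, the following sketch evaluates \(c_{n}=\sqrt{n^{-1}\log p}\), the band \(l_{n}=c_{n}^{-1/(\alpha +1)}\) and the resulting rate \(c_{n}^{\alpha /(\alpha +1)}\) in the sub-Gaussian case; the particular values of \(n\), \(p\) and \(\alpha \) are arbitrary choices for illustration.

```python
import math

def theorem1_quantities(n, p, alpha):
    """Sub-Gaussian case of Theorem 1: returns (c_n, l_n, rate)."""
    c_n = math.sqrt(math.log(p) / n)
    l_n = c_n ** (-1.0 / (alpha + 1))    # bandwidth l_n = c_n^{-1/(alpha+1)}
    rate = c_n ** (alpha / (alpha + 1))  # operator-norm rate c_n^{alpha/(alpha+1)}
    return c_n, l_n, rate

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    p = int(n ** 0.75)                   # p = O(n^{gamma/2}) with gamma = 1.5
    c_n, l_n, rate = theorem1_quantities(n, p, alpha=1.0)
    print(f"n={n:6d} p={p:5d} c_n={c_n:.4f} l_n={l_n:6.2f} rate={rate:.4f}")
```

As \(n\) grows the band \(l_{n}\) widens slowly while the rate decreases, reflecting the bias-variance trade-off behind the choice \(l_{n}=c_{n}^{-1/(\alpha +1)}\).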
Corollary 1
Under the assumptions of Theorem 1, and when \({\varvec{\Gamma }}_{k}^{-1}\) exists and \(\Vert {\varvec{\Gamma }}_{k}^{-1}\Vert _{2}={\mathcal {O}}(1) \), we have
$$\begin{aligned} \big \Vert \left( {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})\right) ^{-1}-{\varvec{\Gamma }}_{k}^{-1}\big \Vert _{2}={\mathcal {O}}_{P}\big ( \big \Vert {\varvec{\Sigma }}\big \Vert _{(1,1)}c_{n}^{\alpha /(\alpha +1)}\big ) \end{aligned}$$
where \(l_{n}\) and \(c_{n}\) are defined in Theorem 1.
Corollary 1 is a simple consequence of Theorem 1. We also obtain the rate of convergence of \({\varvec{{\hat{\Gamma }}}}_{k}\) in supremum norm, which will be used in the proof of Theorem 1.
Lemma 1
We have
where \(c_{n}=\sqrt{n^{-1} \log p}\) if (SGauss) and (A3) hold and \(p= {\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\) as \( n\rightarrow \infty \), and \(c_{n}=p^{1/\beta }\sqrt{n^{-1} \log p}\) when (NGa\(_{\beta }\)) and (A4) hold and \(p^{2/\beta }\sqrt{n^{-1} \log p} \rightarrow 0\) as \(n\rightarrow \infty \).
Next, we obtain non-asymptotic bounds on the error probability of the estimation of the kth order autocovariance.
Proposition 1
Suppose (A1)-(A2) hold. Then for any \(\eta >\tilde{C} _{1}\left\| {\varvec{\Sigma }}\right\| _{(1,1)}+{\tilde{C}}_{2}\),
for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\) when (SGauss) and (A3) hold, and
for some constants \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\), \(C^{*}=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \(C_{\beta }\) depending on \(\beta \) when (NGa\(_{\beta }\)) and (A4) hold.
2.2 Remarks on the condition (A1) for AR(1) processes
As a special case, we consider a multivariate AR(1) process
for some \(p\times p\) matrix \({\mathbf {A}}\). Then \(\varvec{\Psi }_{j}={\mathbf {A}} ^{j}\) for \(j=1,2,\dots \). Let \(t>0\). We impose one of two conditions on the matrix \({\mathbf {A}}\):
- (w1): there exist \(\alpha >0\), a constant \(C>0\), and a sequence \(b_{j}\) such that \(\sum _{j=1}^{\infty }jb_{j}\max (\left\| \mathbf {A}\right\| _{(1,1)}^{j-1},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)}^{j-1})<\infty \), \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)})<1\) and \(\max (T(\mathbf {A},t/2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1}))\le Cb_{j}t^{-\alpha }\),
- (w2): \(\max (\left\| \mathbf {A}\right\| _{(1,1)},\Vert \mathbf {A}^{^{\prime }}\Vert _{(1,1)})<2^{-\alpha }\) and \(\max (T(\mathbf {A},t/2^{j-1}),T(\mathbf {A}^{^{\prime }},t/2^{j-1}))\le Ct^{-\alpha }\) for some \(\alpha >0\) and some constant \(C>0\).
Remark A
Condition (A1) is fulfilled for \(d_{j}=jb_{j}\max (\left\| {\mathbf {A}}\right\| _{(1,1)}^{j-1},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w1) holds, because from Lemma 5 (see “Appendix”)
Similarly, condition (A1) is fulfilled for \(d_{j}=j2^{(j-1)\alpha }\max (\Vert {\mathbf {A}}\Vert _{(1,1)}^{j-1},\Vert {\mathbf {A}} ^{^{\prime }}\Vert _{(1,1)}^{j-1})\) when (w2) holds. Condition (A3) holds when \(\max (\left\| {\mathbf {A}}\right\| _{(1,1)},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)})<1\), because
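The last claim is easy to check numerically: with \(\varvec{\Psi }_{j}={\mathbf {A}}^{j}\), submultiplicativity gives \(r_{j}\le \max (\Vert {\mathbf {A}}\Vert _{(1,1)},\Vert {\mathbf {A}}^{^{\prime }}\Vert _{(1,1)})^{j}\), a geometric bound. In the sketch below we take \(\Vert \cdot \Vert _{(1,1)}\) to be the \(\ell _{1}\) operator norm (maximum absolute column sum); that reading of the norm, like the random test matrix, is our own assumption for illustration.

```python
import numpy as np

def norm_11(M):
    """Assumed (1,1)-norm: the l1 operator norm, i.e. maximum absolute column sum."""
    return np.abs(M).sum(axis=0).max()

rng = np.random.default_rng(2)
p = 20
A = rng.uniform(-1.0, 1.0, (p, p))
A *= 0.9 / max(norm_11(A), norm_11(A.T))  # enforce max(||A||, ||A'||) = 0.9 < 1

# r_j = max(||A^j||, ||(A^j)'||) decays at least geometrically,
# so sum_j r_j^2 < infinity, i.e. condition (A3) holds.
r = []
for j in range(1, 30):
    Aj = np.linalg.matrix_power(A, j)
    r.append(max(norm_11(Aj), norm_11(Aj.T)))
```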
2.3 Remarks on the condition (A1) for AR(r) processes
Now, we consider a multivariate AR(r) process of order r given by
\(t\ge 1\), where the \(p\times p\) matrices \({\mathbf {A}}_{i}\), \(i=1,\ldots ,r\), are called parameter matrices. Under some regularity condition (for more details see Bhattacharjee and Bose (2014b), (4), p. 264, and Brockwell and Davis (2002)) this process has the representation (1), where \({\varvec{\Psi }}_{0}={\mathbf {I}}\) and
We consider the class \({\mathcal {A}}\) of r-sequences of matrices, defined by
Observe that if \(T\left( {\mathbf {A}}_{i},t\right) \le C_{2}\delta ^{i}t^{-\alpha }\) for some \(0<\delta <1\) and
where \({\mathbf {A}}_{i}=\left( a_{i}^{s,k}\right) \), then \(\left\| {\mathbf {A}}_{i}\right\| _{(1,1)}\le \left( 2^{\alpha }C_{2}+C\right) \delta ^{i}\). Indeed, for any \( t>0\),
Therefore, putting \(t=1/2\), we get
Similarly, we may obtain \(\Vert {\mathbf {A}} _{i}^{^{\prime }} \Vert _{(1,1)}\le ( 2^{\alpha }C_{2}+C) \delta ^{i}\).
Proposition 2
Under the condition of class \({\mathcal {A}}\), we have
and
for \(j=1,2,\ldots \)
Remark B
Immediately from Proposition 2, we have that under the condition of class \({\mathcal {A}}\) condition (A1) holds for \( d_{j}=j\delta ^{j}\) for multivariate AR(r) processes.
2.4 Comparison of our results with previous studies
Similar results to those given in Theorem 1 have been presented in Bickel and Levina (2008b) for i.i.d. Gaussian observations \({{\mathbf {X}}}_{1},...,{\mathbf {X}}_{n}\) and in Bhattacharjee and Bose (2014b) for a p-dimensional linear process with \(n^{-1} \log p\rightarrow 0\). In Bhattacharjee and Bose (2014b) it is assumed that the matrix \(\varvec{ \Sigma }\) belongs to the class
where \(\varepsilon ,\alpha ,C>0\) and \(\lambda _{\min }\left( \varvec{\Sigma } \right) \), \(\lambda _{\max }\left( \varvec{\Sigma }\right) \) are respectively the minimum and maximum eigenvalues of \(\varvec{\Sigma }\), the coefficient matrices \(\left( \varvec{\Psi }_{j}\right) \) are in \({\mathcal {T}} _{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\) for some \(0<\beta <1 \), \(\lambda \ge 0\), \(\alpha \), \(\nu >0\), \(0<\eta <1\), where
with \(r_{j}=\max (\Vert \varvec{\Psi }_{j}\Vert _{(1,1)},\Vert {\varvec{\Psi }}_{j}{^{^{\prime }}}\Vert _{(1,1)})\). Additionally they assumed that for some \(\lambda _{0}>0\),
In our Theorem 1 we consider (Gauss), (SGauss) or (NGa\(_{\beta }\)) for \(\left( {\varepsilon }_{i}\right) \). In particular, condition (NGa \(_{\beta }\)) is much weaker than (13). The class \({\mathcal {U}}\) implies our conditions (A2) and (A5). Also conditions (A1) and (A3) are weaker than \({\mathcal {T}}_{\beta ,\lambda }\cap {\mathcal {G}}_{\alpha ,\eta ,\nu }\). For example if \(r_{j}\sim j^{-a}\) for \(a>1\), then (A4) holds but the conditions in \({\mathcal {T}}_{\beta ,\lambda }\) are satisfied for \(a>\max ( 1/\beta ,(\lambda +1)/2( 1-\beta ))\), and those in \({\mathcal {G}}_{\alpha ,\eta ,\nu }\) are satisfied for \(r_{j}\sim b^{j}\) for some \(0<b<1\).
Bickel and Levina (2008a) considered asymptotic behavior of the threshold estimator of the covariance matrix \({\varvec{\Gamma }}_{0}\) of the form
For Gaussian data, where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and
they obtained
where \(u=C(\log p)^{1/2}n^{-1/2}\) for a sufficiently large constant C. For nonnormal data where \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \), they obtained (16) for \(u=Cp^{2/q}n^{-1/2}\), for a sufficiently large constant C (see (Chen et al. 2013), (48)).
Cai et al. (2010) obtained the minimax upper bound for a special class of tapering estimators of \({\varvec{\Gamma }}_{0}\) for i.i.d. observations under (A2) and (A5). Their upper bound on the rate equals \({\mathcal {O}}_{P}\big ( \min \left\{ n^{-2\alpha /(2\alpha +1)}+n^{-1} \log p,p/n\right\} \big )\) and is better than our result for (Gauss) and (SGauss) from Theorem 1. This means that for i.i.d. observations our result is suboptimal.
A sharper rate than in (16) for a nonnormal case was obtained by Chen et al. (2013), where data come from a general weak dependence model. In particular, when data come from a linear process (1) with (NGa\( _{\beta }\)) for \(\beta =2q\) for some \(q>2\), and the coefficient matrices \( \varvec{\Psi }_{j}=\left( \psi _{k,s}\left( j\right) \right) _{1\le k,s\le p}\) satisfy
for some \(\gamma >1/2-1/q\), \({\varvec{\Gamma }}_{0}\in {\mathcal {G}}_{r}\left( M\right) \) and \(M<p\), then one can obtain (16) for \(u=u_{*}\), where \(u_{*}=\max \left( u_{1},u_{2},u_{3}\right) \), \(u_{1}=M^{\frac{1}{q+r}}p^{\frac{1}{q+r}}n^{\frac{1-q}{q+r}}\), \(u_{2}=\sqrt{n^{-1} \log p}\), \(u_{3}=M^{-\frac{2}{q-2r}}p^{\frac{2}{q-2r}}n^{\frac{(1-q)}{q-2r}}\) (for more details see the discussion preceding Corollary 2.7 in Chen et al. (2013)).
Under condition (Gauss) or (SGauss), we obtain the same rate of convergence as in Bickel and Levina (2008a) for the covariance matrix \(\varvec{ \Gamma }_{0}\) for normal data and as in Bhattacharjee and Bose (2014b) for \({\varvec{\Gamma }}_{k}\). In particular, if \({{\varvec{\Gamma }}}_{\mathbf {0}}\in {\mathcal {G}}_{r}\left( M\right) \) then \(\left\| \varvec{\Sigma }\right\| _{(1,1)}\le M\), and from Theorem 1 we have
If \(M\sim p^{\eta }\) for some \(\eta \in [0,1)\), then the r.h.s. of (16) for \(u=u_{*}\) equals
which is greater than the r.h.s. of (17), \(p^{\eta }\left( n^{-1} \log p\right) ^{\frac{\alpha }{2(\alpha +1)}}\). Thus, our rate of convergence is better than the rate (16) for \(u=u_{*}\) in Chen et al. (2013) for a linear process.
Guo et al. (2016) worked with a bounded covariance estimator for \( l_{n}=C\log (n/\log p)\) for some \(C>0\) for a sparse multivariate AR model. The rate of convergence for operator norm was \({\mathcal {O}}_{P}(\log ( n/\log p)\sqrt{n^{-1} \log p})\). In a special case, from our Theorem 1 we obtain the better rate \({\mathcal {O}}_{P}(\log ^{\alpha }(n^{-1} \log p))\).
Jentsch and Politis (2015) dealt with so-called flat-top tapered covariance matrix estimation for a multivariate strictly stationary time series. In the special case when the observations come from a p-dimensional linear process \({\mathbf {X}}_{t}=\sum _{j=-\infty }^{\infty }\varvec{\Psi }_{j}{\varepsilon }_{t-j}\), where the sequence \(\left( \varvec{\Psi }_{j}\right) \) of coefficient matrices is component-wise absolutely summable (then \(\sum _{h=-\infty }^{\infty }\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| <\infty \), where \({\varvec{\Gamma }}_{h}=\left( \gamma _{i,j}\left( h\right) \right) \)) and the i.i.d. noise \(\left( {\varepsilon }_{t}\right) \) has finite fourth moments, they obtained
where \({\varvec{{\hat{\Gamma }}}}_{k,l}\) is a flat-top tapered covariance matrix estimator of \({\varvec{\Gamma }}_{k}\) and \(l=o\left( \sqrt{n}\right) \) is the banding parameter. If \(\left| \psi _{i,j}(h)\right| \le Ch^{-(1+\alpha )}\) for some \(C>0\) and \(\alpha >0\), where \(\varvec{\Psi }_{h}=\left( \psi _{i,j}(h)\right) \), then (A1) holds, and under (A2) and (SGauss) we deduce from Theorem 1 that the r.h.s. of (4) is of order \({\mathcal {O}}_{P} ((n^{-1}\log p)^{\alpha /2(\alpha +1)})\). Therefore \(\left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( h^{-\alpha -1}\right) \), and for \(l=n^{1/2-\kappa }\) with \(\kappa \in (0,1/2)\) we have \(\sum _{h=l+1}^{n-1}\sum _{i,j=1}^{p}\left| \gamma _{i,j}(h)\right| ={\mathcal {O}}\left( l^{-\alpha }\right) \), so the r.h.s. of (19) is of order \({\mathcal {O}}\left( p^{2}n^{-\kappa }+l^{-\alpha }\right) ={\mathcal {O}}( p^{2}n^{-\kappa }+n^{-\left( \frac{1}{2}-\kappa \right) \alpha })\), which is a worse rate than in Theorem 1. Similarly, when (NGa\(_{\beta }\)) holds and \(p={\mathcal {O}}\left( n^{\gamma /2}\right) \) for some \(\gamma >1\), then from Theorem 1 the r.h.s. of (4) is of order \({\mathcal {O}}_{P}( ( p^{2/\beta }\sqrt{n^{-1} \log p}) ^{\alpha /(\alpha +1)}) \), and this rate is sharper than the r.h.s. of (19).
2.5 Conclusions
Our main result (Theorem 1) was compared with related results available in the literature. In the special case of a multidimensional linear process we obtained a better rate of convergence of our covariance estimator in operator norm than (Chen et al. 2013; Jentsch and Politis 2015) and Guo et al. (2016). Our result is similar to that of Bhattacharjee and Bose (2014b), but under milder assumptions on the noise process \(\left( {\varepsilon }_{t}\right) \) and on the admissible class of matrices \({\varvec{\Gamma }}_{k}\).
Comparing results on covariance matrix estimation is difficult, because they are obtained under different assumptions on the class of covariance matrices and for independent or dependent data. For independent data Cai et al. (2010) obtained the optimal rate for tapering estimators of \(\varvec{\Gamma _0}\). In contrast, such results do not exist for dependent data, and the problem of finding the optimal rate of convergence in Theorem 1 is still open.
3 Appendix
Lemma 2
(Bhattacharjee and Bose 2014b). For any matrices \({\mathbf {A}}\), \({\mathbf {B}}\) and for all \(\alpha \), \(\beta \), \(t>0\),
- (i) \(\left\| \mathbf {AB}\right\| _{(1,1)}\le \left\| \mathbf {A}\right\| _{(1,1)}\left\| \mathbf {B}\right\| _{(1,1)}\),
- (ii) \(T\left( \mathbf {AB},\left( \alpha +\beta \right) t\right) \le \left\| \mathbf {A}\right\| _{(1,1)}T\left( \mathbf {B},\alpha t\right) +\left\| \mathbf {B}\right\| _{(1,1)}T\left( \mathbf {A},\beta t\right) \).
Lemma 3
Suppose (A1)–(A2) and (A3) hold. Then, for all \(t>0\),
for some \(\alpha >0\), where \({\tilde{C}}_{1}\), \({\tilde{C}}_{2}\) are some constants.
Proof
Observe
From Lemma 2(ii), we have
It follows from Lemma 2(i) that
Hence
Again using Lemma 2(ii), we obtain
and
By assumptions (A1)-(A2), we see that
From the Schwarz inequality, (A1) and (A3), we have
and similarly \(\sum _{j=k}^{\infty }r_{j-k}d_{j}<\infty \), \( \sum _{j=k}^{\infty }r_{j}r_{j-k}<\infty \). Therefore, from (20), we get
for some constants \({\tilde{C}}_{1}>0\) and \({\tilde{C}}_{2}>0\). \(\square \)
Lemma 4
Suppose (SGauss) and (A3) hold. Then for any \(\eta >0\),
Suppose (NGa\(_{\beta }\)) and (A4) hold. Then for any \(\eta >0\),
where \(C^{*}:=\left( \sum _{i=0}^{\infty }r_{i}\right) ^{2}\) and \( C_{\beta }\) is some constant depending on \(\beta \).
Proof
By a simple calculation,
Hence
Put \(\zeta _{m,i,j}^{l,s}:={\varepsilon }_{m-j,l}{\varepsilon } _{m-i-k,s}\). Then
From (SGauss) and (A3) we conclude that
Since for fixed s, \((\zeta _{m,i,j}^{l,s})\) is sub-Gaussian with constant \( \sigma ^{2}\), we have
Hence, from the maximal inequality for sub-Gaussian r.v.’s we obtain
Hence, we have
Suppose (NGa\(_{\beta }\)) and (A4) hold. Applying to the r.h.s. of (24) Pisier’s maximal inequality (Pisier 1983)
$$\begin{aligned} E\max _{1\le i\le N}\left| Z_{i}\right| \le N^{1/Q}\max _{1\le i\le N}\left\| Z_{i}\right\| _{Q}, \end{aligned}$$
which holds for any random variables \(\left( Z_{i}\right) \) with \(\left\| Z_{i}\right\| _{Q}=E^{1/Q}\left| Z_{i}\right| ^{Q}<\infty \) for \(Q>1\), and putting \(Q=\beta \), we obtain
For fixed i, j, l, s the sequence \((\zeta _{m,i,j}^{l,s})_{m}\) is i.i.d., and using the moment bound (Petrov 1995) and (NGa\(_{\beta }\)), we obtain
Hence, we get (23), as desired. \(\square \)
Proof of Lemma 1
From Lemma 4 under (SGauss), it follows that
Since there exists \(\gamma >1\) such that \(\exp \left( -x\right) <x^{-\gamma }\) for all \(x>0\), it follows that
Putting \(\eta =\sqrt{n^{-1} \log p}\), we obtain
as \(n\rightarrow \infty \). Therefore, (6) holds for \(c_{n}=\sqrt{ n^{-1} \log p}\). Under (NGa\(_{\beta }\)) and from Lemma 4, we have
Putting \(\eta =p^{2/\beta }\sqrt{n^{-1} \log p}\), we obtain
as \(n\rightarrow \infty \). \(\square \)
Remark C
From Lemma A.3 (Bickel and Levina 2008b) under the assumption that \(({\varepsilon }_{t})\) is Gaussian and (A5), we may deduce that for some \(\delta >0\) and any \(\left| \eta \right| \le \delta \),
for some constants \(C_{1}^{*}\), \(C_{2}^{*}>0\). Reasoning as in the proof of Lemma 1, we may obtain (6) for \(c_{n}=\sqrt{ n^{-1} \log p}\).
Proof of Theorem 1
From the inequality in (Bhattacharjee and Bose (2014b), p. 280), we find that
From Lemma 3, we have
for any \(l_{n}\rightarrow \infty \) as \(n\rightarrow \infty \). From Lemma 1 (see also Remark C when (Gauss) holds), we get
for \(c_{n}\) as in Lemma 1. Consequently, due to (25)-(27), we have
Putting \(l_{n}=c_{n}^{-\frac{1}{\alpha +1}}\), we obtain
\(\big \Vert \mathbf {B}_{l_{n}}(\varvec{\hat{\Gamma }}_{k})-\varvec{\Gamma } _{k}\big \Vert _{2}=\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma } \big \Vert _{(1,1)}l_{n}^{-\alpha }\big ) =\mathcal {O}_{P}\big ( \big \Vert \varvec{\Sigma }\big \Vert _{(1,1)}\big ( c_{n}\big ) ^{\frac{\alpha }{ \alpha +1}}\big ) \) and (4) holds.
Proof of Corollary 1
Reasoning as in (Bhattacharjee and Bose (2014b), Section 6.2), we have: if \({\varvec{A}}^{-1}\) exists and \(\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert < \big \Vert {\varvec{A}}^{-1}\big \Vert ^{-1}\), then \(\big \Vert {\varvec{B}}^{-1}-{\varvec{A}}^{-1}\big \Vert \le \frac{\big \Vert {\varvec{A}}^{-1}\big \Vert ^{2}\big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }{1-\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert }\). Put \({\varvec{A}}={\varvec{\Gamma }}_{k}\) and \({\varvec{B}}={\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})\). Since \(\big \Vert {\varvec{\Gamma }}_{k}^{-1}\big \Vert _{2}={\mathcal {O}}\left( 1\right) \) and, from Theorem 1, \(\big \Vert {\mathbf {B}}_{l_{n}}({\varvec{{\hat{\Gamma }}}}_{k})-{\varvec{\Gamma }}_{k}\big \Vert _{2}={\mathcal {O}}_{P}\left( a_{n}\right) \) with \(a_{n}\rightarrow 0\), it follows that for large n we have \(\big \Vert {\varvec{A}}^{-1}\big \Vert \big \Vert {\varvec{A}}-{\varvec{B}}\big \Vert <1\). Therefore for some \(C>0\) and large n, we find that
Hence, directly from (4), we have (5).
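The matrix-inversion perturbation bound used above can be sanity-checked numerically; the matrices below are arbitrary test inputs of our own, and the spectral norm plays the role of the operator norm \(\Vert \cdot \Vert _{2}\).

```python
import numpy as np

rng = np.random.default_rng(3)
p = 8
A = np.eye(p) + 0.05 * rng.standard_normal((p, p))
B = A + 0.001 * rng.standard_normal((p, p))  # a small perturbation of A

op = lambda M: np.linalg.norm(M, 2)          # spectral (operator) norm
Ainv = np.linalg.inv(A)
d = op(A - B)

# The bound requires ||A^{-1}|| * ||A - B|| < 1.
assert op(Ainv) * d < 1
lhs = op(np.linalg.inv(B) - Ainv)
rhs = op(Ainv) ** 2 * d / (1 - op(Ainv) * d)
print(lhs, "<=", rhs)
```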
Proof of Proposition 1
From (25), we have for any \(\eta >0\),
Using (21), we obtain
Therefore, from Lemma 4, we deduce (7)-(8).
Proof of Proposition 2
First, we will show (11) by induction. For \(j=1\), under the condition of class \(\mathcal {A}\), \(\left\| \varvec{\Psi }_{j}\right\| _{(1,1)}=\left\| \varvec{\Psi }_{1}\right\| _{(1,1)}=\left\| \mathbf {A}_{1}\right\| _{(1,1)}\le C_{1}\delta \) and (11) holds for \(j=1\). From (10), \(C_{1}r<1\) and the induction assumption that \( \left\| \varvec{\Psi }_{i}\right\| _{(1,1)}\le C_{1}\delta ^{i}\) for all \(i\le j\), we have
In a similar manner, we have \(\big \Vert \varvec{\Psi }_{j}^{^{\prime }}\big \Vert _{(1,1)}\le C_{1}\delta ^{j}\). Hence, (11) is proved. Next, by induction we will show (12). For \(j=1\), \(T\left( \varvec{ \Psi }_{j},t\right) =T\left( \varvec{\Psi }_{1},t\right) =T\left( \mathbf {A} _{1},t\right) \le C_{2}\delta t^{-\alpha }\) under the condition of class \( \mathcal {A}\). Hence, (12) is satisfied for \(j=1\). By (10) and Lemma 2(ii),
Then, by the induction assumption that \(T\left( \varvec{\Psi }_{i},t\right) \le C_{2}i\delta ^{i}t^{-\alpha }\) for all \(i\le j\) and under the conditions of class \(\mathcal {A}\) (\(C_{1}=1/\left( 2^{\alpha }r\right) \)) it follows that
By a similar reasoning, we obtain \(T(\varvec{\Psi }_{j}^{^{\prime }},t)\le C_{2}j\delta ^{j}t^{-\alpha }\). Hence, by induction the proof of (12) is complete.
Lemma 5
For any matrix \(\mathbf {A}\),
for all \(t>0\) and \(j=1,2,...\)
Proof
Clearly for \(j=1\) inequality (28) holds. Now, we assume that (28) is true for some j. From the induction assumption and Lemma 2 (ii),
By induction, we see that (28) holds for all positive integers j.
References
Bhattacharjee M, Bose A (2014a) Consistency of large dimensional sample covariance matrix under dependence. Stat Methodol 20:11–26
Bhattacharjee M, Bose A (2014b) Estimation of autocovariance matrices for infinite dimensional vector linear process. J Time Ser Anal 35:262–281
Bickel PJ, Levina E (2008a) Covariance regularization by thresholding. Ann Stat 36:2577–2604
Bickel PJ, Levina E (2008b) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
Birnbaum A, Nadler B (2012) High dimensional sparse covariance estimation: accurate thresholds for the maximal diagonal entry and for the largest correlation coefficient. Technical report
Brockwell P, Davis R (2002) Introduction to time series and forecasting, 2nd edn. Springer Texts in Statistics. Springer, New York
Cai T, Liu WD (2011) Adaptive thresholding for sparse covariance matrix estimation. J Am Statist Assoc 106:672–684
Cai T, Zhang C, Zhou H (2010) Optimal rates of convergence for covariance matrix estimation. Ann Stat 38:2118–2144
Chen X, Xu M, Wu WB (2013) Covariance and precision matrix estimation for high-dimensional time series. Ann Stat 41:2994–3021
Fan J, Fan Y, Lv J (2006) High dimensional covariance matrix estimation using a factor model. Technical report, Princeton University
Friedman J (1989) Regularized discriminant analysis. J Am Statist Assoc 84:165–175
Furrer R, Bengtsson T (2007) Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J Multivar Anal 98:227–255
Guo S, Wang Y, Yao Q (2016) High-dimensional and banded vector autoregressions. Biometrika 103:889–903
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Huang J, Liu N, Pourahmadi M, Liu L (2006) Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93:85–98
Jentsch C, Politis D (2015) Covariance matrix estimation and linear process bootstrap for multivariate time series of possibly increasing dimension. Ann Stat 43:1117–1140
Johnstone IM, Lu AY (2009) On consistency and sparsity for principal components analysis in high dimensions. J Am Statist Assoc 104:682–693
Marchenko VA, Pastur LA (1967) Distributions of eigenvalues of some sets of random matrices. Math USSR-Sb 1:507–536
McMurry T, Politis D (2010) Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J Time Ser Anal 31:471–482
Petrov VV (1995) Limit theorems of probability theory. Oxford University Press, Oxford
Pisier G (1983) Some applications of the metric entropy condition to harmonic analysis. In: Banach spaces, harmonic analysis, and probability theory. Lecture Notes in Mathematics, vol 995. Springer, New York
Rothman AJ, Levina E, Zhu J (2009) Generalized thresholding of large covariance matrices. J Am Statist Assoc 104:177–186
Wu WB, Pourahmadi M (2009) Banding sample autocovariance matrices of stationary processes. Statist Sin 19:1755–1768
Xu M, Chen X, Wu WB (2020) Estimation of dynamic networks for high-dimensional nonstationary time series. Entropy 22:55
Acknowledgements
The author would like to thank the editor and the reviewers for useful comments and suggestions which have improved the presentation of the paper.
Cite this article
Furmańczyk, K. Estimation of autocovariance matrices for high dimensional linear processes. Metrika 84, 595–613 (2021). https://doi.org/10.1007/s00184-020-00790-2