Abstract
Let \(\{X_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) be a stationary process with associated lag operators \({\varvec{\mathcal {C}}}_h\). Uniform asymptotic expansions of the corresponding empirical eigenvalues and eigenfunctions are established under almost optimal conditions on the lag operators in terms of the eigenvalues (spectral gap). In addition, the underlying dependence assumptions are optimal in a certain sense, including both short and long memory processes. This allows us to study the relative maximum deviation of the empirical eigenvalues under very general conditions. Among other things, convergence to an extreme value distribution is shown. We also discuss how the asymptotic expansions transfer to the long-run covariance operator \({\varvec{\mathcal {G}}}\) in a general framework.
1 Introduction
Principal component analysis (PCA) has emerged as one of the most important tools in multivariate and high-dimensional data analysis. In the functional setting, functional principal component analysis (FPCA) is becoming more and more important. A comprehensive overview and some leading examples can be found in [36, 43, 56]. Given a functional time series \(\mathbf{X}=\{X_k\}_{k \in \mathbb {Z}}\), it is typically assumed that \(\mathbf{X}\) lies in the Hilbert space \({\mathbb {L}}^2({\mathcal {T}})\), where \({\mathcal {T}} \subset \mathbb {R}^d\) is compact. The fundamental tool in the area of PCA and FPCA—both in theory and practice—is the use of (functional) principal components (FPC). To fix ideas, let us introduce some notation. If \(\mathbf{X}\) is stationary with \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]<\infty \), then the mean \(\mu ={\mathbb {E}}[X_k]\) and the covariance operator
exist. Here \(\langle \cdot ,\cdot \rangle \) denotes the inner product in \({\mathbb {L}}^2\), and \(\Vert \cdot \Vert _{{\mathbb {L}}^2}\) the corresponding norm. The eigenfunctions of \({\varvec{\mathcal {C}}}\) are called the functional principal components and denoted by \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), i.e., we have \({\varvec{\mathcal {C}}}(e_j) = \lambda _j e_j\), where \({\varvec{\lambda }}= \{\lambda _j\}_{j \in \mathbb {N}}\) denotes the eigenvalues. The eigenfunctions \(\mathbf{e}\) are usually estimated by the empirical eigenfunctions \(\widehat{\mathbf{e}} = \{{\widehat{e}}_j\}_{j \in \mathbb {N}}\), defined as the eigenfunctions of the empirical covariance operator
where \({\bar{X}}_n = \frac{1}{n} \sum _{k = 1}^n X_k\). Hence \(\widehat{\varvec{\mathcal {C}}}({\widehat{e}}_j) = {\widehat{\lambda }}_j {\widehat{e}}_j\), where \({\widehat{{\varvec{\lambda }}}} = \{{\widehat{\lambda }}_j\}_{j \in \mathbb {N}}\) denotes the empirical eigenvalues. Due to the fundamental importance of eigenfunctions and eigenvalues for FPCA and PCA, corresponding results on the asymptotic behavior of empirical eigenfunctions and eigenvalues are of high interest. Anderson [1] was among the first to give such results (see also [19]), and established a CLT for \({\widehat{\lambda }}_j\) (resp. \({\widehat{e}}_j\)) for fixed j. Fueled by high-dimensional applications, uniform bounds where j increases with the sample size n have become very important, leading to a significant rise in the complexity of the problem. Well-known pathwise bounds are provided in the lemma given below (cf. [7, 10]).
Lemma 1
If \(\mathbf{X} \in {\mathbb {L}}^2({\mathcal {T}})\) and \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]<\infty ,\) then
where \(\psi _j = \min \{\lambda _{j-1} - \lambda _{j}, \lambda _j - \lambda _{j+1}\}\) (with \(\psi _1 = \lambda _1 - \lambda _2\)) and \(\Vert \cdot \Vert _{{\mathcal {L}}}\) denotes the operator norm.
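Lemma 1 can be verified numerically after discretization. The following sketch (with an illustrative sine basis, eigenvalue sequence and grid, none of which are taken from the paper) simulates curves with known eigenstructure and checks the pathwise bounds \(|{\widehat{\lambda }}_j - \lambda _j| \le \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathcal {L}}}\) and \(\Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2} \le 2\sqrt{2}\,\psi _j^{-1}\Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathcal {L}}}\) with the standard constants (cf. [7, 10]), under the sign convention \(\langle \widehat{e}_j, e_j \rangle \ge 0\).

```python
import numpy as np

# Sanity check of Lemma 1 on a grid. The sine basis, eigenvalue sequence
# and grid size are illustrative choices, not taken from the paper.
rng = np.random.default_rng(0)
n, T = 1000, 101
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]                                     # quadrature weight
lam = np.array([1.0, 0.5, 0.25, 0.125])
basis = np.stack([np.sqrt(2) * np.sin((j + 1) * np.pi * t) for j in range(4)])

scores = rng.standard_normal((n, 4)) * np.sqrt(lam)
X = scores @ basis                                   # n curves on the grid

C = basis.T @ np.diag(lam) @ basis                   # population kernel
Xc = X - X.mean(axis=0)
C_hat = Xc.T @ Xc / n                                # empirical kernel

def eigpairs(K):
    vals, vecs = np.linalg.eigh(K * dt)              # discretized operator
    return vals[::-1], vecs[:, ::-1] / np.sqrt(dt)   # descending, L^2-normalized

lam_pop, e_pop = eigpairs(C)
lam_hat, e_hat = eigpairs(C_hat)
op = np.linalg.norm((C_hat - C) * dt, 2)             # operator norm of the error

for j in range(3):
    psi = (min(lam_pop[j - 1] - lam_pop[j], lam_pop[j] - lam_pop[j + 1])
           if j > 0 else lam_pop[0] - lam_pop[1])
    sgn = np.sign(e_hat[:, j] @ e_pop[:, j])         # sign convention
    err = np.sqrt(dt * np.sum((sgn * e_hat[:, j] - e_pop[:, j]) ** 2))
    assert abs(lam_hat[j] - lam_pop[j]) <= op + 1e-9     # eigenvalue bound
    assert err <= 2 * np.sqrt(2) * op / psi + 1e-9       # eigenfunction bound
print("Lemma 1 bounds hold; op-norm error:", round(op, 4))
```

The eigenvalue bound is an instance of Weyl's inequality and holds exactly for the discretized operators; the eigenfunction bound is the Davis–Kahan-type estimate underlying the lemma.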
The attractiveness of the above bounds lies in their simplicity, but unfortunately they are far from optimal from a probabilistic perspective. Indeed, the results of [19] tell us that in case of \({\widehat{\lambda }}_j - \lambda _j\), the correct bound should include the additional factor \(\lambda _j\), i.e., \(\lambda _j\Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathcal {L}}}\). A similar claim can be made for \(\Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2}\). In this spirit, based on Lemma 1, asymptotic expansions for \({\widehat{\lambda }}_j - \lambda _j\) and \(\widehat{e}_j - e_j\) which allow for increasing j have been established in [26–28] (see also [10, 13, 48]). These results have proved to be an indispensable tool in the literature, see for instance [8, 13, 26, 36, 43, 49]. But the corresponding (asymptotic) analysis is often based on heavy structural assumptions regarding \(\mathbf{X}\) and the spacings (spectral gap) \({\varvec{\Psi }} = \{\psi _j\}_{j \in \mathbb {N}}\) of the eigenvalues, limiting its applicability. In particular, often only the covariance operator \({\varvec{\mathcal {C}}}\) is considered, and a common key assumption is that \(\mathbf{X}\) is an IID sequence, which is rather restrictive, see [33, 36, 54] and also Sects. 2.2 and 6.2. In the presence of serial correlation, the lag operators \({\varvec{\mathcal {C}}}_h\) and the long-run covariance operator \({\varvec{\mathcal {G}}}\), formally defined as
serve as a generalization of \({\varvec{\mathcal {C}}} = {\varvec{\mathcal {C}}}_0\). They play a fundamental role for dependent functional time series, see for instance [29, 53, 54]. In this paper, we consider a general framework that contains both \({\varvec{\mathcal {C}}}_h\) and \({\varvec{\mathcal {G}}}\), avoiding the previously mentioned limitations. We derive exact asymptotic expansions of \({\widehat{\lambda }}_j\), \({\widehat{e}}_j\) under optimal dependence assumptions, allowing for short memory (weak dependence), but also for long memory (strong dependence) in case of \({\varvec{\mathcal {C}}}_h\), h finite. In addition, we only require a ‘natural condition’ concerning the spectral gap \({\varvec{\Psi }}\). It turns out that this condition is nearly optimal.
As a particular application, we study the relative maximum deviation of the empirical eigenvalues of \({\varvec{\mathcal {C}}}\), namely
where \(J_n^+ \rightarrow \infty \), see Proposition 1 for a precise definition of \(J_n^+\). Under mild assumptions, we show that
where \({\mathcal {V}}\) is a distribution of Gumbel type. The latter is based on a high-dimensional Gaussian approximation, which is of independent interest, see Theorem 8. Result (1.4) is particularly important for the construction of simultaneous confidence sets and tests for the relevant number of FPCs to be used for statistical inference or modelling (cf. [4, 43, 56]). The range of further applications is surveyed in Sect. 6. Here we also touch on the possibility of long memory in functional time series.
An outline of the paper can be given as follows. In Sect. 2 the key expansions of \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) are established in a general framework, alongside some additional results. In particular, we discuss in detail the optimality of the underlying assumptions. Asymptotic expansions of \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) in the context of \({\varvec{\mathcal {C}}}_h\) and \({\varvec{\mathcal {G}}}\) are established in Sects. 3 and 4, whereas Sect. 5 is devoted to the study of (1.4). Additional fields of application are surveyed in Sect. 6, with an emphasis on functional linear regression, ARH(1) processes and long memory in a functional context. The proofs of the eigen expansions are given in Sects. 7 and 8. In Sect. 9.1, a general high-dimensional Gaussian approximation under dependence is established. Based on this result, we prove (1.4) in Sect. 9.2. Finally, Sect. 10 presents the proofs of Sect. 6.
2 Preliminary notation and main asymptotic expansions
For \(p \ge 1\), denote with \(\Vert \cdot \Vert _p\) the \(L^p\)-norm \({\mathbb {E}}[|\cdot |^p]^{1/p}\). We write \(\lesssim \), \(\gtrsim \) (and \(\thicksim \) for two-sided bounds) to denote inequalities that hold up to a multiplicative constant, and set \(a \wedge b = \min \{a,b\}\) and \(a \vee b = \max \{a,b\}\). Given a set \({\mathcal {A}}\), we denote with \({\mathcal {A}}^c\) its complement. Moreover, we write \(\overline{X} = X - {\mathbb {E}}[X]\) for a random variable X.
In the sequel, it is convenient to first consider a more abstract framework. Assume that the operator \({\varvec{\mathcal {D}}}: {\mathbb {L}}^2({\mathcal {T}}) \mapsto {\mathbb {L}}^2({\mathcal {T}})\) has non-negative eigenvalues \({\varvec{\lambda }}= \{\lambda _j\}_{j \in \mathbb {N}}\) and eigenfunctions \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), and satisfies the spectral representation
For a sequence of non-negative numbers \(\{\widetilde{\lambda }_j\}_{j \in \mathbb {N}}\) with \(\sum _{j = 1}^{\infty } \widetilde{\lambda }_j < \infty \) and real-valued random variables \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\}_{i,j \in \mathbb {N}}\), \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\}_{i,j \in \mathbb {N}}\) consider the empirical version
The random variables \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\) denote the contributing random components, whereas \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\) denote the negligible parts. In the sequel, both random variables depend on a sequence \({m}\rightarrow \infty \), i.e., \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}({m})\) and \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}({m})\). To simplify the notation, we often suppress this dependence if it is of no immediate relevance. This class of (empirical) operators is rich enough to include the lag operators \({\varvec{\mathcal {C}}}_h\) (in fact only \({\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\), see Sect. 3), but also the more general long-run covariance operator \({\varvec{\mathcal {G}}}\) (see Sect. 4). In order to provide an intuition for this setup, let us discuss how this translates in case of the covariance operator \({\varvec{\mathcal {C}}}\), hence \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) and \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}\). Then obviously \(\widetilde{\lambda }_j = \lambda _j\) and for \({m}= n\) we have
Clearly, if \(\mathbf{X}\) is stationary, then so is \(\{\eta _{k,j}\}_{k \in \mathbb {Z}, j \in \mathbb {N}}\) and hence \({\varvec{\mathcal {C}}}\) does not depend on n in this case. We also note that \({\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {C}}}] = 1\) and \({\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {C}}}] = 0\) for \(i \ne j\) since \({\mathbb {E}}[\eta _{k,i} \eta _{k,j}] = 0\) by the classical Karhunen–Loève expansion (cf. [36]). This holds in greater generality. Since \(\mathbf{e}\) are the eigenfunctions of \({\varvec{\mathcal {D}}}\), the representations in (2.1) and (2.2) yield that \((\widetilde{\lambda }_i \widetilde{\lambda }_j)^{1/2}{\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}] = 0\) for \(i \ne j\). For the sake of reference, we formulate this simple observation as a lemma.
Lemma 2
Assume \({\varvec{\mathcal {D}}}\) satisfies (2.1) and (2.2) with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Then \((\widetilde{\lambda }_i \widetilde{\lambda }_j)^{1/2}{\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} ]=0\) for \(i \ne j\) and \(\lambda _j = \widetilde{\lambda }_j {\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {D}}}]\).
Most of our results will depend on the centered version of \({\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\), i.e.,
We now impose the following conditions.
Assumption 1
The operators \({\varvec{\mathcal {D}}}\), \(\widehat{\varvec{\mathcal {D}}}\) satisfy (2.1) and (2.2). Moreover, for a universal constant \(C^{\varvec{\mathcal {D}}}\), a universal sequence \(s_{{m}}^{\varvec{\mathcal {D}}}= \mathcal {O}(1)\), and parameters \({\mathfrak {a}}> 0\), \({\mathfrak {h}}, p \ge 1\), \(J_{{m}}^+ \in \mathbb {N}\), the following hold as \({m}\rightarrow \infty \):
- (D1):
-
\({m}^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}({m})\Vert _q \le C^{\varvec{\mathcal {D}}}\) and \({m}^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}({m})\Vert _q \le s_{{m}}^{\varvec{\mathcal {D}}}\) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \),
- (D2):
-
\(\max _{1 \le j \le J_{{m}}^+}\left\{ {{m}}^{-\frac{1}{2} + {\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{}}{|\lambda _j^{} - \lambda _i^{}|}, {{m}}^{-1 + 2{\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{} \lambda _j^{} }{(\lambda _j^{} - \lambda _i^{})^2}\right\} \le C^{\varvec{\mathcal {D}}}\) and \(\lambda _{J_{{m}}^+} \ge {{m}}^{-{\mathfrak {h}}}/C^{\varvec{\mathcal {D}}}\),
- (D3):
-
\(1/C^{\varvec{\mathcal {D}}} \le {\mathbb {E}}[{\varvec{\eta }}_{j,j}^{\varvec{\mathcal {D}}}({m})] \le {C}^{\varvec{\mathcal {D}}}\) for \(j \in \mathbb {N}\) and \(\sum _{j = 1}^{\infty } {\lambda }_j \le C^{\varvec{\mathcal {D}}}\).
Remark 1
Note that in the above assumptions, \({\varvec{\lambda }}\) may depend on \({m}\). We can deal with this case in the sequel due to the universal bounds provided by \(C^{\varvec{\mathcal {D}}}\).
Let us discuss these assumptions and compare them to the literature. As a general preliminary remark, we note that all of our results have analogues in a general Hilbert space setting \({\mathbb H}\). Working in \({\mathbb {L}}^2({\mathcal {T}})\) is notationally less burdensome though, and the proofs are simpler. In particular, the Fubini–Tonelli Theorem allows us to interchange the order of inner products and expectations. Since most relevant related results in the literature focus on the covariance operator \({\varvec{\mathcal {C}}}\), we also consider this setup for our discussion, i.e., \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) (and \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}\)). To this end, it is convenient to translate Assumption 1 to this special case to make the comparison transparent. Recall the notation introduced in (2.3). We then have the following result.
Proposition 1
Let \(\mathbf{X}\) be stationary with \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] \le C^{\varvec{\mathcal {C}}}\) for a universal constant \(C^{\varvec{\mathcal {C}}}\). Then \({\varvec{\mathcal {C}}}\) satisfies (2.1) and (2.2) with summable eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Assume in addition that for some \({\mathfrak {a}}> 0,{\mathfrak {h}}, p \ge 1\) and universal sequence \(s_{n}^{\varvec{\mathcal {C}}}= \mathcal {O}(1)\) we have that
- (C1):
-
\(n^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{{\varvec{\mathcal {C}}}}(n)\Vert _q < C^{\varvec{\mathcal {C}}},\) \(n^{\frac{1}{4}}\max _{j \in \mathbb {N}}\Vert \sum _{k = 1}^n \eta _{k,j}\Vert _{2q} \le s_{n}^{\varvec{\mathcal {C}}},\) for \(q = p 2^{{\mathfrak {p}}+ 4},\) \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil ,\)
- (C2):
-
(D2) holds with \(C^{\varvec{\mathcal {D}}} = C^{\varvec{\mathcal {C}}},\) \({m}= n,\) \(J_n^+ \in \mathbb {N}\) and \({\mathfrak {a}}\) as above.
Then Assumption 1 holds for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) with \({\mathfrak {a}}> 0,{\mathfrak {h}}, p \ge 1,{m}= n,J_n^+ \in \mathbb {N},\) \(s_{{m}}^{\varvec{\mathcal {D}}} = s_{n}^{\varvec{\mathcal {C}}}\) and \(C^{\varvec{\mathcal {D}}} = C^{\varvec{\mathcal {C}}}\) as above.
Let us now compare the literature with Proposition 1.
Dependence assumptions: Assumption (C1) implicitly imposes a dependence assumption on the scores \(\eta _{k,j}\). In contrast to the literature (cf. [18, 27, 28, 48]), we do not require the typical independence assumption. In fact, (C1) is much more general. In Sect. 2.2 we also discuss why looking at \({\varvec{\mathcal {C}}}\) under dependence can be relevant in practice. It can be shown that (C1) holds under general, sharp weak dependence conditions, in the sense that if these conditions fail, weak dependence itself is lost. However, much more is true. Suppose that \(\eta _{k,j} = \sum _{i = 0}^{\infty } \alpha _{i,j} \epsilon _{k-i,j}\), where \(\bigl \{\epsilon _{k,j}\bigr \}_{k \in \mathbb {Z},j \in \mathbb {N}}\) is IID standard Gaussian and \(\alpha _{i,j} \thicksim i^{-\alpha }\), \(\alpha > 1/2\). Then we show in Sect. 2.2 that
where \(\Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathbb {L}}^2}\) denotes the Hilbert–Schmidt norm. Hence the rate \(n^{-1/2}\) carries over and (C1) poses no restriction as long as we consider the CLT-domain (normalization with \(n^{-1/2}\)). In this sense, condition (C1) is optimal (in the CLT-domain). Interestingly, this also allows for long memory sequences, and we even obtain a CLT for \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) under long memory conditions, i.e., where \(\sum _{i = 1}^{\infty } \alpha _{i,j} = \infty \), see Theorem 3. Note that it is shown in [50] that \(\sum _{i = 1}^{\infty } |\alpha _i| <\infty \) is necessary for the validity of a CLT for \(\sum _{k = 1}^n X_k\) in an infinite dimensional Hilbert space, in contrast to the univariate case. Observe that the condition \(\max _{j \in \mathbb {N}}\Vert n^{-3/4}\sum _{k = 1}^n \eta _{k,j}\Vert _{2q} = \mathcal {O}(1)\) usually comes for ‘free’ due to the additional factor \(n^{-1/4}\), and is only needed to control the empirical mean correction \({\bar{X}}_n\). Finally, we remark that our method of proof can also be used to derive corresponding results in the non-central domain, i.e., where \(\Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathbb {L}}^2}\Vert _2 \thicksim b_n \) with \(\sqrt{n} = \mathcal {O}(b_n)\). To keep this exposition at reasonable length, this is not pursued here.
Structural conditions for eigenvalues: (C2) is the key condition regarding the structure of the eigenvalues \(\lambda _j\). Note that the special form of the terms appearing in (C2) is no coincidence, and is connected to the variance of the asymptotic distribution of the empirical eigenfunctions \(\widehat{e}_j\) (cf. [19]). The literature (cf. [13, 18, 26–28]) usually requires polynomial, exponential or convex structures regarding the decay rate of the eigenvalues and particularly the spacing \(\psi _j\). For instance, a common minimum assumption is that \(\psi _j \gtrsim \lambda _j j^{-1}\), which reflects a polynomial behavior of the eigenvalues \(\lambda _j\). As will be discussed below Theorem 2, (C2) turns out to be much weaker; in fact, we shall see that it is nearly optimal. To get a feel for the implications of (C2), let us consider the case where \(\lambda _j\) satisfies a convexity condition, i.e.,
If (2.5) holds, then one may verify (cf. Lemma 13) that
hence (C2) is valid if \(J_n^+ \lesssim n^{1/2 - {\mathfrak {a}}} (\log n)^{-1}\). Note that these bounds are not directly influenced by the decay of \({\varvec{\lambda }}\) or \({\varvec{\Psi }}\). The convexity condition (2.5) itself is mild and includes many cases encountered in the literature (cf. [18]), in particular polynomial or exponential cases
Also note that (C2) implies that the first \(J_n^+\) eigenvalues are distinct. See [19] for a flavour of results which allow for eigenspaces with rank greater than one.
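For a concrete impression of (C2), the two sums can be evaluated numerically for polynomially decaying eigenvalues. The sketch below uses \(\lambda _i = i^{-{\mathfrak {c}}}\) with the illustrative choices \({\mathfrak {c}} = 2\) and truncation level N; in line with the bound \(J_n^+ \lesssim n^{1/2 - {\mathfrak {a}}}(\log n)^{-1}\) discussed above, the first sum stays of order \(j \log j\):

```python
import numpy as np

# Numerical evaluation of the two sums in (C2)/(D2) for polynomially decaying
# eigenvalues lambda_i = i^{-c}. The exponent c = 2 and the truncation level N
# are illustrative choices. Consistent with the discussion around (2.5), the
# first sum grows at most like j*log(j) and the second at most like
# (j*log(j))^2, so (C2) holds for J_n^+ of order n^{1/2-a}/log(n).
c, N = 2.0, 100_000
i = np.arange(1, N + 1)
lam = i ** -c

def c2_sums(j):
    mask = i != j
    d = lam[j - 1] - lam[mask]                       # spectral gaps to lambda_j
    S1 = np.sum(lam[mask] / np.abs(d))
    S2 = np.sum(lam[mask] * lam[j - 1] / d ** 2)
    return S1, S2

for j in (5, 10, 20, 40):
    S1, S2 = c2_sums(j)
    print(j, round(S1 / (j * np.log(j)), 2), round(S2 / (j * np.log(j)) ** 2, 2))
```

The printed ratios remain bounded in j, illustrating that the sums in (C2) grow only mildly even though neither \({\varvec{\lambda }}\) nor \({\varvec{\Psi }}\) enters the bound directly.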
Moment assumptions: The existence of all moments (often with additional Gaussian-like growth conditions) is usually required in the literature (cf. [18, 27, 28, 48]) in the context of expansions for \({\widehat{\lambda }}_j, {\widehat{e}}_j\). In contrast, we only require a finite number of moments, which, however, may be large. On the other hand, all of our results will be expressed in terms of the \(\Vert \cdot \Vert _p\)-norm, and passing to the weaker \({\mathcal {O}}_P(\cdot )\) formulation, the moment assumptions can be lowered.
For stating our results, we introduce the quantity
which is one of the main contributing parts in the expansions given below. We first give the main results, followed by a discussion and comparison to the literature. For the empirical eigenvalues \({\widehat{\lambda }}_j\), we have the following.
Theorem 1
Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+\)
The above result provides an exact uniform first-order expansion for \({\widehat{\lambda }}_j\). For a nonuniform version, the factor \(J^{1/p}\) in the bound on the RHS can be dropped. Next, we state the companion result for the empirical eigenfunctions \({\widehat{e}}_j\).
Theorem 2
Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+\)
where \({\varLambda }_j = \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2},\) and we also have
Theorem 2 provides both uniform expansions for \({\widehat{e}}_j\) and the corresponding norm. As before, the factor \(J^{1/p}\) in the bound on the RHS can be dropped for a nonuniform version. We also have a slight modification of Theorems 1 and 2.
Proposition 2
Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+,\) one may replace \(\{I_{k,j}\}_{k \in \mathbb {N}}\) with \(\{(\widetilde{\lambda }_k \widetilde{\lambda }_j)^{1/2} \overline{\varvec{\eta }}_{k,j}^{\varvec{\mathcal {D}}}\}_{k \in \mathbb {N}}\) in Theorems 1 and 2. Recall also that \(\widetilde{\lambda }_j = \lambda _j/{\mathbb {E}}[{\varvec{\eta }}_{j,j}^{\varvec{\mathcal {D}}}]\) by Lemma 2.
As an immediate corollary, we obtain a probabilistic version of Lemma 1 of correct order.
Corollary 1
Assume that Assumption 1 holds. Then for \(1 \le j < J_{{m}}^+\)
2.1 Previous results and comparison
Let us now compare Theorems 1 and 2 to the literature in case of \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\). The best currently known expansions in this context appear to be those of [28]. Among other things, it is required that \(\{X_k\}_{k \in \mathbb {Z}}\) is IID, all moments exist, and the error term \(ER_{J_n^+}\) in the expansions of \({\widehat{\lambda }}_j - \lambda _j\) (not weighted with \(\lambda _j^{-1}\)) is of magnitude
and \(\xi _j \in (0,1)\) is defined as \(\xi _j = \inf _{k <j}(1 - \frac{\lambda _k}{\lambda _j})\). We emphasize that this is the overall error term, hence one requires for instance at least \(\sqrt{n}ER_{J_n^+} = \mathcal {O}(1)\) for the validity of a CLT, and \((n/\lambda _{J_n^+}^2)^{1/2}ER_{J_n^+} = \mathcal {O}(1)\) for a weighted version. If we assume the convexity condition (2.5), we see that (C2) is much weaker. In fact, taking for instance \(\lambda _j \thicksim j^{-{\mathfrak {c}}}\) we find that \(ER_{J_n^+} \gtrsim n^{-3/2} (J_n^+)^{3 + 7{\mathfrak {c}}/2}\). On the other hand, we see from (2.6) that if \(J_n^+ \thicksim n^{1/2 - {\mathfrak {a}}}\), \({\mathfrak {a}}> 0\), we still obtain valid asymptotic expansions, i.e., the expressions containing \(I_{k,j}\) are still the principal terms in our expansions, reflecting the exact asymptotic behavior. In stark contrast, \(ER_{J_n^+}\) already explodes for \({\mathfrak {a}}\) small enough (resp. \({\mathfrak {c}}\) large enough), rendering the result vacuous. Similarly, (C2) is valid if we only require
and again obtain valid asymptotic expansions. On the other hand, the actual approximation error \(ER_{J_n^+}\) in [28] may even be unbounded, since \(1/\lambda _j \rightarrow \infty \) as j increases. In this sense, Assumption 1 is substantially weaker.
2.2 Dependence assumptions: optimality
Throughout this section, we assume that \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\). We first present the following result.
Theorem 3
Assume that \(\mathbf{X}\) has zero mean such that for \(\alpha > 3/4\)
Then (C1) holds. Moreover, if we have in addition (C2) (for \(J_n^+\) possibly finite), then for any fixed \(1 \le j < J_n^+\)
where \(\xrightarrow {w}\) denotes weak convergence, and \(\sigma _{\lambda _j}^2\) the corresponding variance. Note that an analogous result can be established for \(\widehat{e}_j,\) see [41] for details.
The above result indicates that \(\alpha = 3/4\) is the boundary value for a CLT with normalization \(\sqrt{n}\), see also the discussion in [3]. In fact, given the linear structure of \(\eta _{k,j}\) one readily computes that
On the other hand, Lemma 6 below yields that (C1) implies \(\Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathbb {L}}^2}\Vert _2 \lesssim n^{-1/2}\). Hence we obtain the equivalence in (2.4). Finally, note that the regime \(1/2 < \alpha \le 1\) is generally considered as long memory. Hence by Theorem 3 above, we obtain a CLT for \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) even in the presence of long memory, where \(3/4 < \alpha \le 1\). If \(1/2 < \alpha \le 3/4\), non-central limit theorems arise. If \(\alpha \le 1/2\), then \({\mathbb {E}}[\Vert X_0\Vert _{{\mathbb {L}}^2}^2]=\infty \), which requires a completely different treatment.
2.3 Spectral gap: almost optimality
Next, we discuss the issue of ‘almost optimality’ of condition (C2). To this end, we draw heavily from the noteworthy results of [48]. Suppose that \(\{\eta _{i,j}\}_{i,j \in \mathbb {N}}\) are IID and satisfy \({\mathbb {E}}[|\eta _{i,j}|^{2p}] \le p! C^{p-1}\) for some constant \(C > 0\). If a structure condition like (\(\mathbf{E P}\)) holds, then it is shown in [48] that
As can be seen from Corollary 1, this bound deviates from the optimal one by the additional factor \((\log n)^2\). On the other hand, note that in the polynomial case in (\(\mathbf{E P}\)), this bound is also valid for \(j > J_n^+\) (we require \({\mathfrak {a}}> 0\)), which is a slightly larger region. In [48], a lower bound is also provided, which is \(\frac{j^2}{n} \wedge 1\). Strictly speaking, it is proven for the projection \(\widehat{\pi }_j = {\widehat{e}}_j \otimes {\widehat{e}}_j\), where \(\otimes \) denotes the rank-one operator
According to [48], it then holds that (recall that \({\mathcal {L}}\) denotes the operator norm)
On the other hand, Corollary 1 and elementary computations yield
(in the polynomial case) and thus the order of the upper and lower bounds match for \(j \le n^{1/2 - {\mathfrak {a}}} (\log n)^{-1}\). If \(j \ge n^{1/2}\), Cauchy–Schwarz yields the trivial optimal upper bound. Since \({\mathfrak {a}}> 0\) may be chosen arbitrarily small given sufficiently many (all) moments, we find that our conditions on the eigenvalues \({\varvec{\lambda }}\) are essentially optimal. In other words, we obtain exact expansions and the optimal error bound for almost the complete region of indices j where (2.12) still converges to zero.
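As a small finite-dimensional illustration of the projector \(\widehat{\pi }_j = {\widehat{e}}_j \otimes {\widehat{e}}_j\) appearing in the lower bound, the elementary identity \(\Vert \widehat{\pi }_j - \pi _j\Vert _{{\mathbb {L}}^2}^2 = 2(1 - \langle {\widehat{e}}_j, e_j\rangle ^2)\) (in the Hilbert–Schmidt norm) links the projector error directly to the eigenfunction alignment; the dimension and perturbation angle below are arbitrary illustrative choices:

```python
import numpy as np

# Rank-one projectors pi = e (x) e in R^d: verify that the Hilbert-Schmidt
# error of the projector equals sqrt(2)*sin(theta), where theta is the angle
# between e_hat and e. Dimension d and angle theta are illustrative.
rng = np.random.default_rng(5)
d = 50
e = rng.standard_normal(d); e /= np.linalg.norm(e)
v = rng.standard_normal(d); v -= (v @ e) * e; v /= np.linalg.norm(v)
theta = 0.3
e_hat = np.cos(theta) * e + np.sin(theta) * v        # perturbed eigenvector

pi = np.outer(e, e)
pi_hat = np.outer(e_hat, e_hat)
hs = np.linalg.norm(pi_hat - pi)                     # Hilbert-Schmidt norm
assert abs(hs ** 2 - 2 * (1 - (e_hat @ e) ** 2)) < 1e-10
print(hs, np.sqrt(2) * np.sin(theta))
```

This explains why bounds for \(\Vert \widehat{\pi }_j - \pi _j\Vert \) and for \(\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\) (with the sign convention \(\langle {\widehat{e}}_j, e_j\rangle \ge 0\)) are of the same order.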
3 Lag operator
While the covariance operator \({\varvec{\mathcal {C}}}\) is a key object for serially uncorrelated data \(\mathbf{X}\), the lag operator \({\varvec{\mathcal {C}}}_h\) and the long-run covariance operator \({\varvec{\mathcal {G}}}\) become more relevant in the presence of serial correlation, see Sects. 4 and 6 for a discussion. Here, we focus on \({\varvec{\mathcal {C}}}_h\), and then carry out a similar program for \({\varvec{\mathcal {G}}}\) in Sect. 4. To facilitate the discussion, let us first introduce a popular notion of weak dependence. In the remainder of this section, we assume that for each \(j \in \mathbb {N}\), the score sequence \(\{\eta _{k,j}\}_{k \in \mathbb {Z}}\) is a causal weak Bernoulli sequence, which can be written as
for some measurable functions \(g_j\) and IID sequences \(\{{\varvec{\epsilon }}_k\}_{k \in \mathbb {Z}}\) with \({\varvec{\epsilon }}_k = \{\epsilon _{k,j}\}_{j \in \mathbb {N}}\). We do not specify any crosswise dependence between \(\epsilon _{k,i}\), \(\epsilon _{k,j}\) for \(i \ne j\), allowing for a large flexibility. Let \({\mathcal {E}}_{k,j} = (\epsilon _{i,j}, \, i \le k)\). To quantify the dependence of \(\{\eta _{k,j}\}_{k \in \mathbb {Z}}\), we adopt the coupling idea. Let \(\{\epsilon _{k,j}'\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) be an IID copy of \(\{\epsilon _{k,j}\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) and \({\mathcal {E}}_{k,j}' = ({\mathcal {E}}_{-1,j},\epsilon _{0,j}', \epsilon _{1,j}, \ldots , \epsilon _{k,j})\) the coupled version of \({\mathcal {E}}_{k,j}\). Then we define
Roughly speaking, \({\varOmega }_p(k)\) measures the overall degree of dependence of \(\eta _{k,j} = g_j({\mathcal {E}}_{k,j})\) on \(\epsilon _{0,j}'\) and it is directly related to the data-generating mechanism of the underlying process ([57] refers to \({\varOmega }_p(k)\) as the physical dependence measure). This dependence concept is well established in the literature, and popular processes like ARMA, GARCH, iterated random functions etc. fit into this framework (cf. [57, 58]). Consider for example the linear process \(\eta _{k,j} = \sum _{l = 0}^{\infty } \alpha _l \epsilon _{k-l,j}\) where \(\{\epsilon _{k,j}\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) is IID with \(\Vert \epsilon _{k,j}\Vert _p < \infty \). Then
In this sense, (3.3) is necessary for a CLT. In fact, if it is violated, one can construct examples such that
and a different normalization than \(n^{-1/2}\) is required (cf. [55]). In the sequel, all dependence conditions will be expressed in terms of summability conditions of \({\varOmega }_p(k)\).
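The coupling construction behind \({\varOmega }_p(k)\) is easy to reproduce by simulation. For an AR(1) score process \(\eta _k = \phi \eta _{k-1} + \epsilon _k\) (a linear process with \(\alpha _l = \phi ^l\); all parameters below are illustrative choices), replacing \(\epsilon _0\) by an independent copy changes \(\eta _k\) by exactly \(\phi ^k(\epsilon _0 - \epsilon _0')\), so \({\varOmega }_2(k) = \sqrt{2}\,\phi ^k\) for standard Gaussian innovations:

```python
import numpy as np

# Coupling-based computation of the physical dependence measure Omega_2(k)
# for an AR(1) score process eta_k = phi*eta_{k-1} + eps_k (a linear process
# with alpha_l = phi^l). phi, burn, K and reps are illustrative choices.
rng = np.random.default_rng(7)
phi, burn, K, reps = 0.6, 100, 6, 20_000

eps = rng.standard_normal((reps, burn + K + 1))
eps_prime = eps.copy()
eps_prime[:, burn] = rng.standard_normal(reps)      # replace "eps_0" only

def ar1(e):
    # run the AR(1) recursion column by column, vectorized over replications
    x = np.zeros(reps)
    path = []
    for t in range(e.shape[1]):
        x = phi * x + e[:, t]
        path.append(x.copy())
    return np.stack(path, axis=1)

eta, eta_prime = ar1(eps), ar1(eps_prime)
# L2 coupling distance at lags k = 0, ..., K after the replaced innovation
omega = np.sqrt(np.mean((eta - eta_prime) ** 2, axis=0))[burn:]
print(np.round(omega, 3))                            # ~ sqrt(2) * phi**k
```

The geometric decay of the printed values matches \(\sqrt{2}\,\phi ^k\), so any summability condition on \({\varOmega }_p(k)\) is trivially satisfied here; long memory examples instead yield only polynomial decay.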
A major difference when dealing with \({\varvec{\mathcal {C}}}_h\) compared to \({\varvec{\mathcal {C}}}\) (and \({\varvec{\mathcal {G}}}\)) is that it only satisfies a singular-value decomposition (SVD) in general, i.e., there exist orthonormal bases \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), \(\mathbf{f} = \{f_j\}_{j \in \mathbb {N}}\) and a sequence of real numbers \({\varvec{\lambda }}= (\lambda _j)_{j \in \mathbb {N}}\) tending to zero such that for fixed \(h \in \mathbb {Z}\)
Hence a priori, \({\varvec{\mathcal {C}}}_h\) does not fit into our framework. However, by considering the symmetrized version \({\varvec{\mathcal {D}}}(\cdot ) = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h (\cdot )\), we end up with an operator that meets our requirements. Here, \({\varvec{\mathcal {C}}}_h^*\) denotes the adjoint operator of \({\varvec{\mathcal {C}}}_h\), given by
Routine computations (with \(\overline{X}_k = \sum _{j = 1}^{\infty }\widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\)) then indeed reveal that
Hence \({\varvec{\mathcal {D}}}\) has a spectral decomposition with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\) and satisfies (2.1). Representations (3.4), (3.5) motivate a natural plug-in estimator for \({\varvec{\mathcal {D}}}\) (cf. [10]), given as (for \(h \in \mathbb {N}\))
The empirical SVD components \({\widehat{{\varvec{\lambda }}}}=\{{\widehat{\lambda }}_j\}_{j \in \mathbb {N}}\), \(\widehat{\mathbf{e}} = \{{\widehat{e}}_j\}_{j \in \mathbb {N}}\) and \(\widehat{\mathbf{f}} = \{\widehat{f}_j\}_{j \in \mathbb {N}}\) are then defined via
where the empirical lag operator \(\widehat{\varvec{\mathcal {C}}}_h\) is given by
and analogously for \(-n+1 \le h < 0\). In order to apply Theorems 1 and 2 to \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\), the key objective is to validate (D1) for appropriate \(\overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\) and \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\). To this end, introduce
Recalling \(\overline{X}_k = \sum _{j = 1}^{\infty }\widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\), we then define \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\) for fixed \(h \in \mathbb {N}\) as
Note that this automatically defines \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\) via (2.2), see also (8.2) in the proof. We then have the following result.
Proposition 3
Let \(q \ge 2\) and assume \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \) and \({\varOmega }_{4q}(k) \lesssim k^{-{\mathfrak {b}}},\) \({\mathfrak {b}}>3/2\). Then \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\) and \(\widehat{\varvec{\mathcal {D}}}\) as in (3.7) satisfy (2.1) and (2.2) such that
Related results can be established under different weak dependence conditions, see for instance [20]. Using Proposition 3, it is now easy to transfer the results, which we summarize in the following theorem.
Theorem 4
Suppose that \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]\le C^{{\varvec{\mathcal {C}}}_h}\) for a universal constant \(C^{{\varvec{\mathcal {C}}}_h}\). Assume in addition that for some \({\mathfrak {a}}> 0,\) \({\mathfrak {h}}, p \ge 1\) we have that
- \(\mathrm{(C}_\mathrm{h}\mathrm{1)}\): \(\Omega _{4q}(k) \lesssim k^{-{\mathfrak {b}}}\), \({\mathfrak {b}}>3/2\) for \(q = p 2^{{\mathfrak {p}}+ 4},\) \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil ,\)
- \(\mathrm{(C}_\mathrm{h}\mathrm{2)}\): (D2) holds with \(C^{\varvec{\mathcal {D}}} = C^{{\varvec{\mathcal {C}}}_h},\) \({m}= n,\) \(J_n^+ \in \mathbb {N}\) and \({\mathfrak {a}}\) as above,
- \(\mathrm{(C}_\mathrm{h}\mathrm{3)}\): \(0 < \inf _{j \in \mathbb {N}} \sum _{r = 1}^{\infty } \widetilde{\lambda }_r {\mathbb {E}}[\eta _{h,r}\eta _{0,j}]^2\).
Then Assumption 1 holds for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\) and \(\widehat{\varvec{\mathcal {D}}}\) as in (3.7) with \({\mathfrak {a}}> 0,\) \({\mathfrak {h}}, p \ge 1,\) \({m}= n,\) \(J_n^+ \in \mathbb {N},\) \(s_{{m}}^{\varvec{\mathcal {D}}} = s_{n}^{\varvec{\mathcal {D}}} = n^{-1/2}\) and \(C^{\varvec{\mathcal {D}}} = C^{{\varvec{\mathcal {C}}}_h}\) as above. In particular, Theorems 1 and 2 apply to \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\).
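For intuition, the plug-in construction behind \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}_h^* \widehat{\varvec{\mathcal {C}}}_h\) can be sketched numerically. The snippet below is a minimal illustration on curves discretized to a common grid; all names and the AR-type data generator are illustrative assumptions, not from the paper, and grid weights are omitted for brevity.

```python
import numpy as np

def empirical_lag_cov(X, h):
    """Empirical lag-h covariance matrix (discretized kernel of C_h).

    X : (n, T) array whose k-th row is the curve X_k on a common grid.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center the curves
    # (1/n) sum over k of X_{k+h}(r) * X_k(s), over the valid index range
    return Xc[h:].T @ Xc[:n - h] / n

rng = np.random.default_rng(0)
n, T, h = 500, 30, 1
X = np.zeros((n, T))
for k in range(1, n):                        # AR(1)-type curves (illustrative)
    X[k] = 0.5 * X[k - 1] + rng.standard_normal(T)

C_h = empirical_lag_cov(X, h)
D_hat = C_h.T @ C_h                          # symmetrized operator C_h^* C_h
lam, E = np.linalg.eigh(D_hat)               # empirical eigenpairs of D-hat
lam, E = lam[::-1], E[:, ::-1]               # sort eigenvalues decreasingly
```

The symmetrization makes \(\widehat{\varvec{\mathcal {D}}}\) positive semi-definite by construction, so a standard symmetric eigensolver recovers the empirical SVD components.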
It remains to deal with \(\widehat{\mathbf{f}}\), which is the subject of Theorem 5 below.
Theorem 5
Grant the assumptions of Theorem 4, and let \(1 \le p' \le p\). Then
As the proof shows, Theorem 5 is essentially a concatenation of the previous results. Note in particular that the above expansion can be developed further in a straightforward manner by employing Theorems 1 and 2.
4 Long-run covariance operator
The long-run covariance operator is a natural generalization of the covariance operator in the presence of serial correlation. From a statistical perspective, this is particularly relevant in the context of the CLT, where under appropriate conditions on \(\mathbf{X}\), we have that
where \({\varvec{\mathcal {G}}}(\cdot )\) is the long-run covariance operator, (formally) defined as
Note that \({\varvec{\mathcal {G}}}\) in general only exists if \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}} < \infty \), which is usually referred to as a weak dependence condition. In view of (4.1), we see that \({\varvec{\mathcal {G}}}\) takes over the role of \({\varvec{\mathcal {C}}}\) if \(\mathbf{X}\) has serial correlation: in the ‘limit case’ where \(n^{-1/2}S_n\) is distributed as \({\mathcal {N}}\bigl (0, {\varvec{\mathcal {G}}}\bigr )\), the best (in \({\mathbb {L}}^2\)-sense) finite-dimensional approximations are provided by the classical Karhunen–Loève decomposition with respect to \({\varvec{\mathcal {G}}}\). Hence we can expect that for large enough n, finite-dimensional approximations of \(n^{-1/2}S_n\) based on appropriate estimates \(\widehat{\varvec{\mathcal {G}}}\) are close to optimal too. We refer to [29, 37, 53, 54], and more recently [15], for further discussions. A unifying, even more general object than \({\varvec{\mathcal {G}}}\) is the spectral density operator \(\varvec{\mathcal {F}}(\theta )\), first studied in [54], which has recently attracted a lot of attention (cf. [29, 53]). A detailed study is beyond the scope of the present note and is left open for future research. It appears though that at least some of the results can be transferred.
Estimation of \({\varvec{\mathcal {G}}}\) is a delicate issue, and already in the univariate/multivariate case a substantial body of literature has evolved around this problem, see for instance [2, 30] and the many references therein. In the context of functional data, we refer for instance to [29, 37, 53, 54]. The basic principle is plug-in estimation, which leads to the estimates
and \(|\omega _h| \le 1\) is a sequence of weight functions. In the sequel, the choice of \(\omega _h\) has little impact on the results, and we therefore set \(\omega _h = 1\) for the remainder of this section. For consistent estimates, it is necessary that \(b = b_n \rightarrow \infty \) as n increases. Even so, in contrast to \(\widehat{\varvec{\mathcal {C}}}_h\), the estimate \(\widehat{\varvec{\mathcal {G}}}^b\) is biased. Depending on the decay rate of \(\Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}}\), the optimal choice of \(b_n\) is \(b_n \thicksim \log n\) (geometric decay) or \(b_n \thicksim n^{1/(2s + 1)}\) (polynomial decay with exponent s), see [2]. Thus, the actual operator we are estimating is
Note that in general \({\mathbb {E}}[\widehat{\varvec{\mathcal {G}}}^b] \ne {\varvec{\mathcal {G}}}^b\) and hence \(\widehat{\varvec{\mathcal {G}}}^b\) is still biased, but this bias is negligible. We point out that subject to some regularity conditions (cf. [54])
which is the same rate as in the univariate case (cf. [2]). Moreover, under quite general assumptions (cf. [29, 54]), it follows that \({\varvec{\mathcal {G}}}^b\) satisfies the spectral decomposition
with eigenvalues \({\varvec{\lambda }}^b = \{\lambda _j^b\}_{j \in \mathbb {N}}\) and eigenfunctions \(\mathbf{e}^b = \{e_j^b\}_{j \in \mathbb {N}}\). Since the actual underlying operator of interest is \({\varvec{\mathcal {G}}}^b\), it is natural to (first) express our conditions in terms of \({\varvec{\lambda }}^b\) and \(\mathbf{e}^b\). We can decompose \(\overline{X}_k\) as
Observe that in general \({\mathbb {E}}[\eta _{k,j}^b \eta _{k,i}^b] \ne 0\) for \(i \ne j\), which is different from the Karhunen–Loève expansion. In analogy to (2.3), we also introduce the quantity
It is then easy to see that
for appropriate (degenerate) random variables \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\}_{i,j \in \mathbb {N}}\) (see (8.14)). Taking (4.5) into account, we see that both (4.5) and (4.8) match the setup in (2.1) and (2.2). We can thus appeal to the results of Sect. 2. To this end, we translate Assumption 1 to our present setup.
Assumption 2
The sequence \(\mathbf{X}\) is stationary such that \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}} < \infty \). Moreover, for \(b = \mathcal {O}(n)\), a universal constant \(C^{\varvec{\mathcal {G}}}<\infty \), a universal sequence \(s_n^{\varvec{\mathcal {G}}} = \mathcal {O}(1)\), and \({\mathfrak {a}}> 0\), \({\mathfrak {h}}, p \ge 1\), \(J_n^+ \in \mathbb {N}\), it holds that
- (G1)\(^{b}\): \((n/b)^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{b}(n)\Vert _q \le C^{\varvec{\mathcal {G}}}\), \(n^{-\frac{3}{4}} b^{\frac{1}{4}} \max _{j \in \mathbb {N}}\Vert \sum _{k = 1}^n \eta _{k,j}^b\Vert _{2q} \le s_n^{\varvec{\mathcal {G}}}\) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \),
- (G2)\(^{b}\): \(\max _{1 \le j \le J_n^+}\left\{ {(n/b)}^{-\frac{1}{2} + {\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{b}}{|\lambda _j^{b} - \lambda _i^{b}|}, {(n/b)}^{-1 + 2{\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{b} \lambda _j^{b} }{(\lambda _j^{b} - \lambda _i^{b})^2}\right\} \le C^{\varvec{\mathcal {G}}}\) and \(\lambda _{J_n^+}^{b} \gtrsim {(n/b)}^{-{\mathfrak {h}}}\),
- (G3)\(^{b}\): \(1/C^{\varvec{\mathcal {G}}} \le {\mathbb {E}}[\varvec{\eta }_{j,j}^{b}] \le C^{\varvec{\mathcal {G}}}\) for \(j \in \mathbb {N}\), \(\sum _{j = 1}^{\infty } \lambda _j^b \le C^{\varvec{\mathcal {G}}}\).
Let us discuss these conditions. In view of (4.4), the choice \(m = n/b\) is quite natural. Condition (G1)\(^{b}\) is a little more explicit than (D1), but of the same nature. (G2)\(^{b}\), (G3)\(^{b}\) are essentially translations of (D2), (D3). Note that in the present formulation, (G3)\(^{b}\) reflects the common non-degeneracy assumption encountered in the time series literature.
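To make the object \(\widehat{\varvec{\mathcal {G}}}^b\) concrete, here is a minimal numerical sketch of the truncated plug-in estimate with \(\omega _h = 1\) on grid-discretized curves; the function names and the MA(1) data generator are illustrative assumptions.

```python
import numpy as np

def lrv_operator(X, b):
    """Plug-in long-run covariance matrix: sum of lag covariances, |h| <= b."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    G = Xc.T @ Xc / n                        # lag h = 0
    for h in range(1, b + 1):
        C_h = Xc[h:].T @ Xc[:n - h] / n      # lag h
        G += C_h + C_h.T                     # lags h and -h (adjoint)
    return G

rng = np.random.default_rng(1)
n, T = 2000, 20
eps = rng.standard_normal((n + 1, T))
X = eps[1:] + 0.8 * eps[:-1]                 # MA(1) curves: 1-correlated
G_b = lrv_operator(X, b=5)
# population long-run covariance here is (1 + 0.8)**2 * Id = 3.24 * Id
```

Any \(b \ge 1\) removes the truncation bias entirely in this 1-correlated example, which is exactly the simplification exploited for \({\mathfrak {m}}\)-correlated processes below.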
The setup in Assumption 2 is quite general. Before looking at the possible range of applications, let us formulate the transferred results. To this end, in analogy to \(I_{i,j}\) in (2.7), we introduce \(I_{i,j}^b\) as
We then have the following general transfer result.
Theorem 6
Assume that Assumption 2 holds. Then for \(1 \le J< J_n^+,\) Theorems 1 and 2 remain valid if we substitute \(n/b,\lambda _j^b,\) \(e_j^b,{\widehat{\lambda }}_j^b,\) \({\widehat{e}}_j^b\) and \(I_{i,j}^b\) at the corresponding places. Moreover, corresponding versions of Proposition 2 and Corollary 1 hold.
Due to the uniform bounds provided by \(C^{\varvec{\mathcal {G}}}\) in Assumption 2, Theorem 6 can be used either pointwise or uniformly in b, n, depending on whether Assumption 2 holds pointwise or uniformly. The strength and the weakness of Theorem 6 both lie in the fact that everything is essentially expressed in terms of the operator \({\varvec{\mathcal {G}}}^b\). The positive aspect is that this makes the assumptions rather general (in fact, almost optimal in a certain sense, see below). On the other hand, the drawback is that these conditions can be difficult to verify, since they explicitly depend on b. If \(b = b_n\) is a function of n this is not so useful, and uniform bounds in terms of n would be more interesting. Let us mention here that the trouble mainly originates from (G2)\(^{b}\) and not (G1)\(^{b}\). It is therefore desirable to find simple conditions that depend in a more transparent way on b, and preferably mainly on \({\varvec{\mathcal {G}}}\). We first discuss a case where this can be accomplished rather easily.
\({\mathfrak {m}}\)-Correlated processes: We call \(\mathbf{X}\) an \({\mathfrak {m}}\)-correlated process if \({\varvec{\mathcal {C}}}_h = 0\) for \(|h| > {\mathfrak {m}}\), where \({\mathfrak {m}}\) is finite. Locally dependent processes are quite common in the literature and are often modeled as \({\mathfrak {m}}\)-dependent processes. Clearly, \({\mathfrak {m}}\)-dependence implies \({\mathfrak {m}}\)-correlation. Moreover, we get that
Note that \({\mathfrak {m}}\)-correlation also implies that representations (4.5) and (4.8) are valid. Hence we conclude the following.
Corollary 2
If \(\mathbf{X}\) is \({\mathfrak {m}}\)-correlated and \({\mathfrak {m}}\le b,\) then we can replace \(e_j^b,\eta _{k,j}^b\) with \(e_j^{{\mathfrak {m}}},\eta _{k,j}^{{\mathfrak {m}}}\) everywhere in (4.6) and (4.7) (which alters (G1)\(^{b}\)), and b with \({\mathfrak {m}}\) everywhere in (G2)\(^{b}\) and (G3)\(^{b}\).
Corollary 2 shows that Theorem 6 applies to a large class of processes under general and accessible conditions. Note in particular that the optimality criterion used in Sect. 2.3 also applies since \({\mathfrak {m}}\) is finite. In the presence of \({\mathfrak {m}}\)-dependence, the conditions can be further simplified. More precisely, routine calculations reveal that (G1)\(^{b}\) can be replaced with
- (G1)\(^{{\mathfrak {m}}}\): \(\max _{j \in \mathbb {N}}\Vert \eta _{k,j}^{{\mathfrak {m}}}\Vert _{2q}<\infty \) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \).
If the dependence in question is infinite, i.e., if a general notion of weak dependence applies, then the situation is more complicated. This is discussed in more detail in an extended version in [41].
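The effect of \({\mathfrak {m}}\)-correlation can also be seen empirically: for an \({\mathfrak {m}}\)-dependent moving average, estimated lag covariances beyond lag \({\mathfrak {m}}\) vanish up to sampling noise of order \(n^{-1/2}\). A sketch under illustrative assumptions (MA(2) data generator, spectral norm on the discretized kernel):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, m = 20_000, 15, 2
eps = rng.standard_normal((n + m, T))
# m-dependent (hence m-correlated) moving average of order m = 2
X = eps[m:] + 0.6 * eps[m - 1:n + m - 1] + 0.3 * eps[:n]
Xc = X - X.mean(axis=0)

def lag_cov_norm(h):
    C_h = Xc[h:].T @ Xc[:n - h] / n          # empirical lag-h covariance
    return np.linalg.norm(C_h, 2)            # spectral (operator) norm

norms = [lag_cov_norm(h) for h in range(1, 6)]
# lags 1 and 2 carry signal; lags 3-5 are pure estimation noise
```

In this example the population lag-1 and lag-2 kernels have spectral norms 0.78 and 0.3, while the higher lags only reflect estimation error.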
5 Maximum deviation of empirical eigenvalues
As already mentioned, Theorems 1 and 2 can be used to obtain various fluctuation results for eigenvalues or eigenfunctions. We exemplify this further in the case of \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\), mentioning that a similar program can be carried out for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^*{\varvec{\mathcal {C}}}_h\), \(h \in \mathbb {Z}\) fixed. To this end, we formally introduce the long-run covariance (recall that \(\overline{X} = X - {\mathbb {E}}[X]\)) as
In Sect. 9.1 we show that this is well-defined given Assumption 3 below. Moreover, for \(\sigma _{j}^2 = \gamma _{j,j}\) we have the usual representation \(\sigma _{j}^2 = \sum _{k \in \mathbb {Z}} \phi _{k,j}\), where \(\phi _{k,j} = {\mathbb {C}}\text {ov}[\eta _{0,j}\eta _{0,j},\eta _{k,j}\eta _{k,j}]\). Consider \({\varvec{\mathcal {C}}}\) with eigenvalues \({\varvec{\lambda }}\) and denote with
where \(\{Z_{j}\}_{1 \le j < J}\) is a zero mean sequence of Gaussian random variables with correlation structure \({\varSigma }_{J}^{Z_{}} = (\rho _{i,j})_{1 \le i,j < J}\), where \(\rho _{i,j} = \gamma _{i,j}/\sigma _{i} \sigma _{j}\). In the sequel, we show that \(T_{J_n^+}^{}\) is close to \(T_{J_n^+}^{Z}\) in probability. To this end, we work under the following assumption.
Assumption 3
For \(p \ge 1\) let \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \), and assume that
- (E1): \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \) and (C2) hold (with \({\mathfrak {a}},{\mathfrak {h}}\) as above) such that \((J_n^+)^{1/p} n^{-{\mathfrak {a}}} \lesssim n^{-\delta }\), \(\delta > 0\),
- (E2): \({\varOmega }_{2q}(k) \lesssim k^{-{\mathfrak {b}}}\), \({\mathfrak {b}}> 3/2\),
- (E3): \(\inf _j \sigma _{j}> 0\).
Note that these assumptions are mild. In particular, the decay rate \({\mathfrak {b}}\) in condition (E2) is completely independent of the underlying dimension \(J_n^+\).
Theorem 7
Grant Assumption 3. Then
The above result provides a Gaussian approximation with an algebraic rate. Note that no conditions on the underlying covariance structure are required. If we impose a very weak decay assumption on \(\gamma _{\lambda ,i,j}\), we obtain the limit distribution.
Corollary 3
Grant Assumption 3, and assume in addition
Then for \(x \in \mathbb {R}\)
where \(u_m(x) = x/a_m + b_m\) with \(a_m = (2 \log m)^{1/2}\) and \(b_m = (2 \log m)^{1/2} - (8 \log m)^{-1/2}(\log \log m + 4\pi - 4)\) for \(m \in \mathbb {N}\).
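For reference, the normalizing constants above are straightforward to compute. The sketch below transcribes \(a_m\), \(b_m\) and \(u_m\) exactly as stated; the Gumbel distribution function is included only as the classical benchmark for maxima of Gaussian sequences, not as a restatement of the limit law above.

```python
import numpy as np

def norming_constants(m):
    """Constants a_m, b_m transcribed from Corollary 3."""
    a_m = np.sqrt(2.0 * np.log(m))
    b_m = a_m - (np.log(np.log(m)) + 4.0 * np.pi - 4.0) / np.sqrt(8.0 * np.log(m))
    return a_m, b_m

def u_m(x, m):
    """Affine normalization u_m(x) = x / a_m + b_m."""
    a_m, b_m = norming_constants(m)
    return x / a_m + b_m

def gumbel_cdf(x):
    """Gumbel distribution function, the classical extreme value benchmark."""
    return np.exp(-np.exp(-np.asarray(x, dtype=float)))
```

Note the slow \(\sqrt{\log m}\) growth of the constants, which explains why extreme value convergence of maxima is typically slow in m.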
Remark 2
Note that condition (5.3) is essentially the weakest possible currently known, see [30, 45].
Uniform control measures are an important statistical tool and have many applications. In the present context, Corollary 3 allows for the construction of simultaneous confidence bands for \({\widehat{\lambda }}_j\). This in turn is very useful to assess parametric hypotheses and decay rates of the structure of \({\varvec{\lambda }}\). A particular and important case is the determination of relevant principal components. A huge number of stopping rules have been developed in the literature (cf. [40, 43]), which all require a uniform control of \({\widehat{{\varvec{\lambda }}}}\). As pointed out by a reviewer, Corollary 3 can be particularly useful in the case of threshold rules like the scree plot, see also [4] for related problems.
6 Applications
A large share of testing and estimation problems in FPCA is related to the normalized scores \(\{\eta _{k,j}\}_{k \in \mathbb {Z}, j \in \mathbb {N}}\) in one way or another, where the associated operator is either \({\varvec{\mathcal {C}}}_h\) or \({\varvec{\mathcal {G}}}\). Among others, we mention (two) sample mean tests and related problems [36, 37, 47], tests about potential serial correlation, stationarity and related issues [4, 22, 24, 35, 38, 44, 52, 54], various change point problems [6, 34], and many more. Given a sample of size n, the canonical estimator of the scores is their empirical version
Intuitively, it is clear that the power of tests or the estimation accuracy improves if \(J_n^+\) increases with the sample size, since more and more information is taken into account. From a theoretical statistical point of view, this can be made rigorous by minimax theory for estimates and Ingster’s (minimax) theory for tests (cf. [31, 39]). In [23], a striking example is presented where a very large number of principal components is required to adequately describe the data, see also [12]. Let us also mention that the necessity of uniform control of \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\) also arises in the completely different field of machine learning in the context of techniques based on reproducing kernel Hilbert spaces, see for instance [8]. All this highlights the importance of a uniform, accurate control of \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\) as \(J_n^+\) increases, and the usefulness of results like Theorems 1 and 2.
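The empirical scores are straightforward to compute once \(\widehat{\varvec{\mathcal {C}}}\) has been diagonalized. A minimal sketch on grid-discretized data follows; the names and the data generator are illustrative, and grid weights are omitted.

```python
import numpy as np

def empirical_scores(X, J):
    """Estimated normalized scores eta-hat_{k,j}, 1 <= j <= J.

    X : (n, T) matrix of discretized curves; returns an (n, J) score matrix.
    """
    Xc = X - X.mean(axis=0)
    C_hat = Xc.T @ Xc / X.shape[0]           # empirical covariance kernel
    lam, E = np.linalg.eigh(C_hat)
    lam, E = lam[::-1][:J], E[:, ::-1][:, :J]
    return (Xc @ E) / np.sqrt(lam)           # <X_k - mean, e_j> / lambda_j^{1/2}

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 10)) @ np.diag(np.sqrt([4.0, 2.0, 1.0] + [0.1] * 7))
S = empirical_scores(X, J=3)
```

By construction the empirical scores are exactly uncorrelated in-sample with unit empirical variance, which mirrors the normalization of the population scores.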
Let us briefly discuss how this relates to our main Assumption 1. Due to its general formulation, (D1) is very flexible. In particular, all the problems mentioned above can be reformulated in a (general) framework (depending on the problem and corresponding operator) such that (D1) is valid. Regarding (D2), the convexity assumption (2.5) leading to (2.6) provides a general and simple condition that is recommended for all the applications. In particular, the resulting range \(J_n^+\) of potentially allowed principal components is quite large. (D3) typically reflects a non-degeneracy condition, which is usually necessary anyway in the problem at hand. We do not take this discussion any further, but rather investigate two other applications in a little more detail. The first one is the functional linear model, which contains in particular first order autoregression in Hilbert spaces (coined ARH(1) or FAR(1)). As a second, very different application, we survey how and why long-memory situations can arise in a functional context and how this relates to our results.
6.1 Functional linear regression
A fundamental regression model in a high-dimensional context is the functional linear model. Given \(\mathbf{X} = \{X_k\}_{k \in \mathbb {Z}}\), \(\mathbf{Y} = \{Y_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\), the basic model is defined as
where \({{\varvec{\Phi }}}\) is a (bounded) linear operator, mapping from \({\mathbb {L}}^2({\mathcal {T}})\) to \({\mathbb {L}}^2({\mathcal {T}})\), and \({\varvec{\varepsilon }}= \{\epsilon _k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) is a noise sequence. The goal is to recover \({{\varvec{\Phi }}}\), given \(\mathbf{X}\) and \(\mathbf{Y}\), while the noise \({\varvec{\varepsilon }}\) is unknown. Observe that estimating \({{\varvec{\Phi }}}\) is an ill-posed problem, see e.g. [14] for a more detailed discussion. Model (6.1) and its many variations have been extensively studied in the literature, with research still active (see e.g. [32]), and it would be impossible to survey all the results. From a theoretical perspective, a significant part of the current literature (cf. [11, 13, 17, 26, 27, 49] and the extensive references therein) focuses on the case where \(\mathbf{Y}\) and \({\varvec{\varepsilon }}\) are mutually independent (which excludes ARH(1)), and in addition \(X_k, {{\varvec{\Phi }}}(Y_{k}), \epsilon _k\) are all real-valued. Hence, by the Riesz representation theorem, \({{\varvec{\Phi }}}(\cdot ) = \langle x^{\phi }, \cdot \rangle \) for some \(x^{\phi } \in {\mathbb {L}}^2({\mathcal {T}})\), and it all boils down to the estimation of \(x^{\phi }\). Let us touch on the main idea for estimating \({{\varvec{\Phi }}}\). Denote with \({\varvec{\mathcal {C}}}^y\) the covariance operator of \(\mathbf{Y}\) with eigenvalues \({\varvec{\lambda }}^{y}\) and eigenfunctions \(\mathbf{e}^{y}\). For the remainder of this section, we assume that \({\varvec{\varepsilon }}= \{\epsilon _k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) is an IID sequence, and for each \(k \in \mathbb {Z}\), \(\epsilon _k\) and \(Y_k\) are independent. Applying Fubini–Tonelli we get that for \(j \in \mathbb {N}\)
Hence we obtain the alternative representation
The advantage of this representation is that all involved quantities can be estimated. Given a truncation parameter \(b \in \mathbb {N}\), this motivates the estimate
In special cases, it is known that (a version of) \(\widehat{{{\varvec{\Phi }}}}^b\) is sharp minimax optimal (cf. [49]), and adaptive in slightly more general situations (cf. [17]). The construction of \(\widehat{{{\varvec{\Phi }}}}^b\) illustrates the necessity of an accurate control of \({\widehat{{\varvec{\lambda }}}}^y\) and \(\widehat{\mathbf{e}}^{y}\). We remark that Proposition 1 is very useful in this context. Not only can it be used to obtain precise bounds for prediction errors or the actual estimation error \(\Vert \widehat{{{\varvec{\Phi }}}}^b - {{{\varvec{\Phi }}}}\Vert _{\mathcal {L}}\) itself, but also to derive various limit theorems for functions of \(\widehat{{{\varvec{\Phi }}}}^b\), which require exact expansions. Limit theorems in turn are required for goodness of fit tests or the construction of confidence sets.
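To fix ideas, here is a minimal sketch of the truncated spectral estimator in the scalar-response case \(X_k = \langle x^{\phi }, Y_k\rangle + \epsilon _k\). Everything below (names, the uniform grid with weight 1/T, the data generator) is an illustrative assumption rather than the paper's construction verbatim.

```python
import numpy as np

def flr_estimate(Y, X, b):
    """Truncated spectral (PCR) estimate of x^phi from X_k = <x^phi, Y_k> + eps_k.

    Y : (n, T) curves on a uniform grid of [0, 1]; X : (n,) scalar responses.
    Inner products are approximated by grid averages, <f, g> ~ f @ g / T.
    """
    n, T = Y.shape
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean()
    lam, V = np.linalg.eigh(Yc.T @ Yc / (n * T))   # discretized eigenproblem of C^y
    lam, V = lam[::-1][:b], V[:, ::-1][:, :b]
    E = V * np.sqrt(T)                             # L2-normalized eigenfunctions
    delta = Yc.T @ Xc / n                          # cross-covariance E[X Y(s)]
    coef = (E.T @ delta / T) / lam                 # <delta, e_j> / lambda_j
    return E @ coef

rng = np.random.default_rng(4)
n, T = 4000, 8
Y = rng.standard_normal((n, T)) @ np.diag(np.linspace(2.0, 1.0, T))
x_true = np.sin(np.linspace(0.0, 3.0, T))
X = Y @ x_true / T + 0.05 * rng.standard_normal(n)
x_hat = flr_estimate(Y, X, b=T)                    # full inversion (well-posed here)
```

The truncation level b plays the role of a regularization parameter: in this toy example the covariance is well-conditioned, so b = T already recovers the slope function, while in the genuinely ill-posed infinite-dimensional problem b must grow slowly with n.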
Let us now consider the setup where \(Y_k = X_{k-1}\), which is exactly the case of an ARH(1) process. Note that for finite \(p\in \mathbb {N}\), any ARH(p) process can be reformulated as an ARH(1) process by changing the underlying Hilbert space, see [10] for details. Below in Corollary 4, we provide simple yet general conditions that imply the validity of Proposition 1 for ARH(1) processes. Since the convexity condition in (2.5) leading to (2.6) already provides a general and simple criterion, we only touch on the validity of (C1). Regarding the operator \({{\varvec{\Phi }}}\), we assume that it possesses the spectral decomposition
with eigenvalues \({\varvec{\lambda }}^{\phi }\) and eigenfunctions \(\mathbf{e}^{\phi }\). In the sequel, let \({\varvec{\Theta }}\) be any operator with eigenvalues \({\varvec{\lambda }}^{\theta }\) and eigenfunctions \(\mathbf{e}^{\theta }\) satisfying the spectral decomposition
Natural candidates for \({\varvec{\Theta }}\) in our framework are of course the operators \({\varvec{\mathcal {C}}}_h^*{\varvec{\mathcal {C}}}_h\) or \({\varvec{\mathcal {G}}}^b\). We have the associated usual decomposition of \(X_k\), given as
Similarly, denote with \({\varvec{\mathcal {C}}}^{\epsilon }\) the covariance operator of \(\epsilon _k\) with eigenvalues \({\varvec{\lambda }}^{\epsilon }\) and eigenfunctions \(\mathbf{e}^{\epsilon }\), and consider the decomposition \(\epsilon _k = \sum _{j = 1}^{\infty } \sqrt{{\lambda }_j^{\epsilon }} \epsilon _{k,j} e_j^{\epsilon }\), \(k \in \mathbb {Z}\). We make the following distributional assumption for \(\epsilon _k\). Given \(q \ge 1\), there exists a \(q' \ge q\) and a constant \(C_q > 0\) such that
Condition (6.6) is mild and allows for a certain invariance in our results, see below for more details. A general example satisfying (6.6) with \(q' = q\) is the following. Suppose that for each fixed \(k \in \mathbb {Z}\), \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\) forms a martingale difference sequence with respect to some filtration \({\mathcal {F}}_{k,j}^{\epsilon }\). Elementary calculations together with Burkholder’s inequality then yield the validity of (6.6). Note that since the scores of a covariance operator always have zero correlation, demanding an underlying martingale structure is a reasonable assumption. Observe that in the Gaussian case, we even have that \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\) is IID, which is a common assumption in the literature. Next, recall the notion of weak dependence introduced in Sect. 3. We then have the following result.
Proposition 4
Assume that \({{\varvec{\Phi }}},\) \({\varvec{\Theta }}\) satisfy representations (6.4), (6.5). If \({\mathbb {E}}[\Vert \epsilon _k\Vert _{{\mathbb {L}}^2}]<\infty ,\) then \(\mathbf{X}\) is a stationary Bernoulli-shift process which can be written as \(X_k = \sum _{i = 0}^{\infty } {{\varvec{\Phi }}}^{i}(\epsilon _{k-i})\). If in addition \(\{\epsilon _k\}_{k \in \mathbb {Z}}\) satisfies (6.6) for some \(2 \le q \le q',\) then
Note that the geometric contraction property in (6.7) is independent of the underlying orthonormal basis \(\mathbf{e}^{\theta }\), which is a desirable property. A check of the proof reveals that this essentially follows from condition (6.6). We also remark that Proposition 4 can be extended to more general ARH(p)-processes using the same method as in [10]. Denote with \({\varvec{\mathcal {C}}}^x\) the covariance operator of \(\mathbf{X}\), and let \({\varvec{\Theta }} = {\varvec{\mathcal {C}}}^x\). We then obtain the following result.
Corollary 4
Grant the assumptions of Proposition 4 and let \({\varvec{\Theta }} = {\varvec{\mathcal {C}}}^x\). Then there exists a universal constant \(C^{\varvec{\mathcal {C}}}\) and universal sequence \(s_n^{\varvec{\mathcal {C}}} \lesssim n^{-1/4}\) such that (C1) holds.
A related result can be established for \({\varvec{\Theta }} = {\varvec{\mathcal {G}}}^b\); we omit the details.
6.2 Weak and long memory in econometric and financial time series
In the presence of serial dependence, the covariance operator \({\varvec{\mathcal {C}}}\) as a single object is not so relevant in the context of a CLT, and the long-run operator \({\varvec{\mathcal {G}}}\) is the key object. However, this can be entirely different if only serial dependence is present, but essentially no serial correlation, which is often the case in financial or econometric time series. More recently, there has been considerable activity (see for instance [5, 25] and particularly [51]) to model financial or econometric time series with the help of FPCA. In this context, it is well known (cf. [9]) that (differenced) stock returns often display a martingale-like behavior, which forms the basis for many financial discrete time models (e.g. GARCH) and continuous time models (e.g. semimartingales). On the other hand, it is equally well known that the absolute or squared returns display a completely different behavior, and sometimes even exhibit long memory (cf. [21]). As a general example, let us consider the case where \(\{\epsilon _k\}_{k \in \mathbb {Z}}\) is an IID sequence in \({\mathbb {L}}^2({\mathcal {T}})\), and \(\{X_k\}_{k \in \mathbb {Z}}\), \(\{Y_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) are stationary and satisfy the structural equation
Note that the GARCH-model is a special case of (6.8), see also Example 2.4 in [33]. Observe that \(X_k\) is a martingale difference sequence with respect to \({\mathcal {E}}_k\). On the other hand, \(X_k^2\) (or \(|X_k|\)) can behave completely differently due to \(\{Y_k\}_{k \in \mathbb {Z}}\), as is desired from a modelling perspective. This becomes relevant for the estimator \(\widehat{\varvec{\mathcal {C}}}\). While we still have by the martingale CLT (up to mild regularity conditions)
the standard estimator \(\widehat{\varvec{\mathcal {C}}}\) as in (1.2) in contrast is based on \(X_k^2\). Depending on the behavior of \(\{Y_k\}_{k \in \mathbb {Z}}\), we may thus witness the full palette of dependence when employing \(\widehat{\varvec{\mathcal {C}}}\), ranging from independence to weak dependence or even a long memory behavior of \(X_k^2\). Due to the high degree of flexibility in (C1), our results thus provide the necessary tools for a more detailed analysis of the model in (6.8).
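As a toy illustration of this dichotomy, consider a scalar ARCH-type recursion, which is only one concrete instance of the general model (6.8) and whose parameters are purely illustrative: the sample lag-1 autocorrelation of \(X_k\) is negligible, while that of \(X_k^2\) is clearly positive.

```python
import numpy as np

rng = np.random.default_rng(7)
n, omega, alpha = 100_000, 0.2, 0.3
X = np.zeros(n)
for k in range(1, n):
    # martingale difference: E[X_k | past] = 0, but X_k^2 is serially dependent
    X[k] = rng.standard_normal() * np.sqrt(omega + alpha * X[k - 1] ** 2)

def acf1(z):
    """Sample lag-1 autocorrelation."""
    zc = z - z.mean()
    return (zc[1:] * zc[:-1]).mean() / zc.var()

rho_x, rho_x2 = acf1(X), acf1(X ** 2)
# rho_x is close to 0, while rho_x2 is clearly positive (about alpha here)
```

This is precisely the situation in which \(\widehat{\varvec{\mathcal {C}}}\), being built from squared quantities, inherits a dependence structure that the raw martingale differences do not show.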
7 Proofs of asymptotic expansions
We introduce the following additional notation. Given functions \(f,g \in {\mathbb {L}}^2({\mathcal {T}})\) and a kernel \(\mathbf {K}(r,s)\), we write
If we have \(f = g\), then we write \(f^2 = f(r)^2\) and otherwise \(f f = f(r) f(s)\) in the above notation. We interchangeably use \(\langle \cdot , \cdot \rangle \) and \(\int _{{\mathcal {T}}} \cdot \), the latter being more convenient when dealing with kernels. We also frequently apply Fubini–Tonelli without mentioning it any further. Next, we introduce the empirical kernel \(\widehat{\mathbf {D}}\) and its deterministic analogue \({\mathbf {D}}\) as
We first establish the transfer result of Proposition 1.
Proof of Proposition 1
Due to \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \), standard arguments (cf. [29]) reveal that \({\varvec{\mathcal {C}}}\) exists and satisfies (2.1) and (2.2) with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Moreover, we have that \({\varvec{\mathcal {C}}}\) is of trace class. Since \({m}= n\), by virtue of (C2) and since \({\mathbb {E}}[\eta _{k,j}^2]= 1\) for \(j \in \mathbb {N}\), we only need to verify (D1). Due to (C1), it suffices to establish a bound for \(\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\Vert _q\). However, using (2.3), Cauchy–Schwarz and (C1), the claim follows. \(\square \)
We now turn to the proofs of Theorems 1 and 2, which are developed in a series of lemmas. As starting point, we recall the following elementary preliminary result (cf. [10]).
Lemma 3
We have the decomposition
Rearranging terms, we obtain from the above that (provided \(\lambda _k \ne \lambda _j\))

and
Due to the frequent use of relations (7.4) and (7.5), it is convenient to use the abbreviation
in the sequel. We also recall the following lemma (cf. [10]).
Lemma 4
For any \(j \in \mathbb {N}\) we have
We proceed by deriving subsequent bounds for \(I_{k,j}, \textit{II}_{k,j}\) and \(\textit{III}_{k,j}\).
Lemma 5
Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+4}\) we have
Proof of Lemma 5
Using the orthogonality of \(e_j,e_k\) we have
hence the claim follows from (D1), Lemma 2 and (D3). \(\square \)
Lemma 6
Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+3}\) we have
Proof of Lemma 6
Since the Hilbert–Schmidt norm dominates the operator norm, Parseval’s identity and Lemma 5 yield the claim, using that (D3) supplies \(\sum _{j = 1}^{\infty }\lambda _j < \infty \). \(\square \)
Lemma 7
Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+4}\) and \(k \in \mathbb {N}\) we have
Proof of Lemma 7
It holds that
Since \(\sum _{i = 1}^{\infty } E_{i,j}^2 = \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\) by Parseval’s identity, the Cauchy–Schwarz inequality gives
Hence the triangle inequality, (D1) and Lemma 2 together with (D3) yield
\(\square \)
Lemma 8
Assume that Assumption 1 holds, and let \({\mathcal {A}}_j = \{|{\widehat{\lambda }}_j - \lambda _j| \le \psi _j/2\}\). Then
Proof of Lemma 8
Proceeding as in Lemmas E.2 and E.1 in the supplement of [31] (or likewise Lemmas 18 and 16 in [48]), it follows that for some absolute constant \(C > 0\)
Let \(p^* = p 2^{{\mathfrak {p}}+4}\). Then by the triangle inequality and Lemma 5
Similarly, we get that
Observe that due to (D2), (7.8) and (7.9) are bounded by \(\lesssim {m}^{-2{\mathfrak {a}}}\). Hence an application of Markov’s and the triangle inequality yields the claim. \(\square \)
The next result is our key technical lemma.
Lemma 9
Assume that Assumption 1 holds. Then uniformly for \(1 \le q \le p 2^{{\mathfrak {p}}/2 + 3}, k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)
Proof of Lemma 9
Note first that by construction of \({\mathcal {A}}_j\), we have that
Using the decomposition in (7.5) and bound (7.10), we obtain that
We now use a backward inductive argument. Let \(p_{i} = p 2^{i}\), \(\tau \ge 0\), and suppose we have uniformly for \(k \in \mathbb {N}\)
Then we obtain from (7.11), the triangle inequality and Lemma 5 that for \(l \ne j\)
Using decomposition (7.6), Cauchy–Schwarz and Lemma 2 together with (D3), we get
hence we obtain from Lemma 4, inequality (7.13) and (D1), (D2) that
and this bound holds uniformly for \(k \in \mathbb {N}\). Observe that we have now shown the validity of relation (7.12) with the updated value \(\tau = \tau + {\mathfrak {a}}\), but with respect to \(p_{{i}-1}\) instead of \(p_{i}\). Since \(\lambda _j \gtrsim {m}^{-{\mathfrak {h}}}\) with \({\mathfrak {h}}\ge 1\), it follows that after at most \({\mathfrak {p}}/2 + 1= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil /2 + 1\) iterations we have
where \(q^* = p 2^{{\mathfrak {p}}/2 + 3}\). By Lemma 7, relation (7.12) is true for \(\tau = 0\) (hence \({m}^{\tau } = 1\)) and \({i} = {\mathfrak {p}}+4\), which constitutes the induction basis; hence the proof is complete. Note that we have also shown
which is of further relevance in the sequel. \(\square \)
Proposition 5
Assume that Assumption 1 holds. Then for \(1 \le q \le p 2^{{\mathfrak {p}}/2+2}\) we have uniformly for \(1 \le j < J_{{m}}^+\)
Proof of Proposition 5
The triangle inequality and Cauchy–Schwarz give
We now invoke the ‘traditional’ way of bounding \(\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\), (cf. [10, 36]), which uses the inequality
Hence using (7.15) and the triangle inequality, we obtain from (D2) that
Combining this with (7.16) gives the first inequality; Lemma 8 and Assumption 1 yield the second part. \(\square \)
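The 'traditional' inequality invoked in the proof above is, in one standard form (cf. [10, 36]; the exact constants vary between references), a spectral-gap perturbation bound:

```latex
\|\widehat{e}_j - \widehat{c}_j e_j\|_{\mathbb{L}^2}
  \le \frac{2\sqrt{2}}{\alpha_j}\,
      \|\widehat{\boldsymbol{\mathcal{C}}} - \boldsymbol{\mathcal{C}}\|_{\mathcal{L}},
\qquad
\alpha_1 = \lambda_1 - \lambda_2, \quad
\alpha_j = \min(\lambda_{j-1} - \lambda_j,\ \lambda_j - \lambda_{j+1}) \ \ (j \ge 2),
```

where \(\widehat{c}_j = \operatorname{sign}\langle \widehat{e}_j, e_j\rangle \) aligns the signs of the empirical and true eigenfunctions.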
Note that \({\mathfrak {a}}\le 1/2\) and hence \({\mathfrak {p}}/2 \ge {\mathfrak {h}}\ge 1\) and \(2^{{\mathfrak {p}}/2 + 2} \ge 8\). Since
we obtain the following corollary to Lemma 9.
Corollary 5
Assume that Assumption 1 holds. Then for \(1 \le q \le 8p\) we have uniformly for \(k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)
Proof of Corollary 5
Lemmas 7–9 and Cauchy–Schwarz give
Since \({\mathfrak {a}}p 2^{{\mathfrak {p}}+ 3}/q \ge {\mathfrak {a}}2^{{\mathfrak {p}}} \ge {\mathfrak {h}}\), we have \({{m}}^{-{\mathfrak {a}}p 2^{{\mathfrak {p}}+ 3}/q} \lesssim \lambda _{J_{{m}}^+}\) by (D2) and the claim follows. \(\square \)
Lemma 10
Assume that Assumption 1 holds. Then for \(1 \le q \le 4p\)
Proof of Lemma 10
We have that
Since by Lemma 4
we obtain by rearranging terms (if \(\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 < 2\))
Let \({\mathcal {B}}_j = \{\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \le 1\}\). By Lemma 5, Proposition 5 and the Cauchy–Schwarz inequality we obtain
Similarly, Corollary 5 yields that
Let \({\mathcal {D}} = \{\Vert \widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}} \Vert _{{\mathcal {L}}} \le 1\}\). Lemma 6 and Markov’s inequality then yield that
On the other hand, Proposition 5 implies that \(P({\mathcal {B}}_j^c) \lesssim {{m}}^{-2{\mathfrak {a}}p 2^{{\mathfrak {p}}/2+2}}\). Since \({\mathfrak {h}}\ge 1\) and \({\mathfrak {a}}< 1/2\), we have \(2^{{\mathfrak {p}}/2} \ge 1/2 + 1/4{\mathfrak {a}}+ {\mathfrak {h}}/2{\mathfrak {a}}\) and hence \({{m}}^{-2{\mathfrak {a}}2^{{\mathfrak {p}}/2}} \lesssim {{m}}^{-1/2 - {\mathfrak {a}}} \lambda _{J_{{m}}^+}\) by (D2). Combining (7.18), (7.19), (7.20) and (7.21) we obtain from the Cauchy–Schwarz inequality, Lemma 1 (see [10] for a general version) and Lemma 6, that
which gives the first claim. The second claim follows from Lemma 5. \(\square \)
Lemma 11
Assume that Assumption 1 holds. Then for \(1 \le q \le 2p\) we have uniformly for \(k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)
Proof of Lemma 11
Recall that \({\textit{III}}_{k,j} = ({\widehat{\lambda }}_j - \lambda _j)E_{k,j}\). By the Cauchy–Schwarz inequality and Lemma 10, we have that
Hence the claims follow from inequality (7.15) and (D2). \(\square \)
For the sake of reference, we state Pisier’s inequality.
Lemma 12
Let \(p \ge 1\) and \(Y_{j},1 \le j \le J\) be a sequence of random variables. Then
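In the form used in this paper, Pisier-type maximal inequalities bound the expected maximum by a maximum of moments; one elementary standard version reads:

```latex
\mathbb{E}\Bigl[\max_{1 \le j \le J} |Y_j|\Bigr]
  \le J^{1/p} \max_{1 \le j \le J} \bigl(\mathbb{E}|Y_j|^p\bigr)^{1/p},
\qquad p \ge 1,
```

which follows from \(\max _j |Y_j|^p \le \sum _j |Y_j|^p\) together with Jensen's inequality.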
We are now ready to prove Theorems 1 and 2.
Proof of Theorem 1
This readily follows from Lemma 10 and Lemma 12. \(\square \)
Proof of Theorem 2
We treat the first claim. By Lemma 4 we have the decomposition
Note that by the triangle inequality
Let \(C_j = \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }e_k \frac{I_{k,j}}{\lambda _j - \lambda _k}\). Then another application of the triangle inequality gives
Hence by the Cauchy–Schwarz inequality and Lemma 5
which by Lemma 8 and (D2) (arguing as in the proof of Lemma 10) is bounded by
Lemma 12 and the inequality \({\varLambda }_j \ge \frac{\lambda _{j}}{\lambda _{j-1}} \gtrsim \lambda _{j} \wedge 1\) then show that it suffices to consider event \({\mathcal {A}}_j\). Corollary 5 and Lemma 11 give
hence the first claim follows from Lemma 12. Next, we treat the second claim. As before, Lemma 4 yields
Proceeding as in the first claim, one shows that it suffices to consider the event \({\mathcal {A}}_j\). Let \({\mathcal {D}}_j = \{\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \le {{m}}^{-{\mathfrak {a}}} \}\). Then proceeding as in Lemma 10 we obtain
We thus obtain from Lemma 5, Corollary 5, Lemma 11 and (7.23)
Iterating this inequality once and rearranging terms, Lemma 5 yields that
Since \({\varLambda }_j \ge \frac{\lambda _{j}}{\lambda _{j-1}} \gtrsim \lambda _{j} \wedge 1\), an application of Lemma 12 yields the desired result. \(\square \)
Proof of Proposition 2
Observe that since \({\mathbb {E}}[\varvec{\eta }_{k,j}^{\varvec{\mathcal {D}}}]=0\) for \(k \ne j\), we get that
Since \(\widetilde{\lambda }_j = \lambda _j/{\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {D}}}]\), the claim follows from (D1) and routine calculations. \(\square \)
Proof of Corollary 1
The claim follows from Proposition 2 and (D1). \(\square \)
7.1 Proofs of Lemma 13 and Theorem 3
We first provide the following result about the convexity relations of \(\lambda _x\).
Lemma 13
If (2.5) holds, then (2.6) is valid.
Proof of Lemma 13
For the proof, the following relations are useful, which can be found in [13, 18].
Now by (7.25) we have
In the same manner, one shows that
\(\square \)
Proof of Theorem 3
First note that due to the Gaussianity of \(\mathbf{X}\), the scores \(\eta _{k,i}\) and \(\eta _{k,j}\) are mutually independent for \(i \ne j\). Given independent standard Gaussian random variables X, Y, the function \(XY-1\) is a two-dimensional Hermite polynomial of second degree. If \(X = Y\), then \(X^2-1\) is a univariate Hermite polynomial of second degree. We may now invoke Theorem 4 in [3]. The proof is based on the method of moments for partial sums of Hermite polynomials. In particular, using that \(\sup _{j \in \mathbb {N}}\sum _{k = 0}^{\infty }{Cov} (\eta _{0,j},\eta _{k,j})^2 < \infty \) (which follows from \(\alpha > 3/4\)), it is shown via the diagram formula that for any fixed \(p \in \mathbb {N}\)
Moreover, since \(\alpha > 3/4\) one readily shows that \(\max _{j \in \mathbb {N}}\bigl \Vert n^{-3/4}\sum _{k = 1}^n \eta _{k,j}\bigr \Vert _{2q} = \mathcal {O}(1)\) for any fixed \(q \in \mathbb {N}\). Hence (C1) holds and using Proposition 2 the CLT for \({\widehat{\lambda }}_j\) follows. \(\square \)
8 Proofs of Sects. 3 and 4
For the proof of Proposition 3, we require some preliminary results.
Lemma 14
For \(p \ge 2,\) let \(\{X_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2\) satisfy
Lemma 14 comes as a byproduct of the results in [42]; see also Lemma 16 and [58] for the original argument for real-valued sequences, which we also use in the sequel. Next, we state a special type of Hoeffding decomposition.
Lemma 15
Let \(\{X_k\}_{k \in \mathbb {Z}}, \{Y_k\}_{k \in \mathbb {Z}} \in \mathbb {R}\) be stationary such that for \(p \ge 2\)
Denote with \(A_{k} = (X_k - {\mathbb {E}}[X_k]){\mathbb {E}}[Y_1] + (Y_k - {\mathbb {E}}[Y_k]){\mathbb {E}}[X_1]\). Then
- (i) \(\Vert \sum _{1 \le k,l \le n} X_k Y_l - n \sum _{k = 1}^n A_k - n^2 {\mathbb {E}}[X_1]{\mathbb {E}}[Y_1]\Vert _{p} \lesssim n\),
- (ii) \(\Vert \sum _{k = 1}^n A_k\Vert _{2p} \lesssim \sqrt{n}\).
Proof of Lemma 15
Using the Hoeffding decomposition
claim (i) follows from the triangle inequality, Cauchy–Schwarz and Lemma 16. Claim (ii) follows directly from Lemma 16. \(\square \)
Proof of Proposition 3
Let us first mention that the assumptions of Proposition 3 clearly imply those of Lemmas 14 and 15. As another preliminary remark, observe that \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^{4p}] < \infty \) implies that \({\varvec{\mathcal {C}}}_h\) exists and \(\overline{X}_k = \sum _{j = 1}^{\infty } \widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\) with \(\sum _{j = 1}^{\infty } \widetilde{\lambda }_j < \infty \). Next, denote with
Employing Lemma 14, lengthy routine calculations reveal that (here condition \({\mathfrak {b}}> 3/2\) is helpful)
We spare the details. Observe next that we have the representation
From the triangle inequality and Cauchy–Schwarz, we obtain
Hence by (8.3), Lemma 15(i) (using (8.5)) and \(\sum _{r = 1}^{\infty } \widetilde{\lambda }_r < \infty \), we obtain
Next, using Lemma 15(ii) (applicable by (8.5)) we get
Finally, we remark that the same calculations used to derive (3.6) also reveal that \({\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}] = 0\) for \(i \ne j\). Hence (2.2) holds, which completes the proof. \(\square \)
Proof of Theorem 5
Note first that an application of Lemma 14 together with routine calculations gives
Let us make the decomposition
and also
Using (8.6), elementary computations yield
Next, for \(j \in \mathbb {N}\) consider the set \({\mathcal {C}}_j\) defined as
where the bound for \(P({\mathcal {C}}_j^c)\) follows from Markov’s inequality and Lemma 10. Since \(\Vert \widehat{f}_j\Vert _{{\mathbb {L}}^2} = \Vert {f}_j\Vert _{{\mathbb {L}}^2} = 1\), we thus obtain
Similarly, since \({\varvec{\mathcal {C}}}_h\) is a bounded operator, the triangle inequality, Cauchy–Schwarz, Lemma 10, (8.6) and (8.9) yield for \(1 \le p' \le p\)
Multiplying with \(\lambda _j^{-1/2}\), we see that it suffices to establish the claim on the set \({\mathcal {C}}_j\). To this end, observe that
Then (8.8), (8.11), Cauchy–Schwarz, the triangle inequality and Lemma 10 yield
Using (8.6) and (8.7) together with Cauchy–Schwarz, (8.11) together with Lemma 10 and combining this with (8.12), the triangle inequality gives
\(\square \)
Proof of Theorem 6
Since \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h \Vert _{{\mathcal {L}}} < \infty \), \({\varvec{\mathcal {{\mathcal {G}}}}}^{b}\) exists, and by \({\varvec{\mathcal {C}}}_h^* = {\varvec{\mathcal {C}}}_{-h}\), \({\varvec{\mathcal {{\mathcal {G}}}}}^{b}\) is symmetric. Hence by the spectral theorem, (4.5) holds. Together with (4.8), this gives (2.1) and (2.2). It remains to derive a bound for \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}(n)\). To this end, put \(\bar{\eta }_j^b = \bar{\eta }_j^b(n) = \langle {\bar{X}}_n-\mu , e_j^b \rangle (\widetilde{\lambda }_j^b)^{-1/2}\). Since \(b = \mathcal {O}(n)\), routine calculations then reveal the upper bound
Using Cauchy–Schwarz and (G1)\(^{b}\), the claim then follows. \(\square \)
9 Proofs of Sect. 5
We need to introduce some further notation. To this end, we slightly reformulate our notion of weak dependence in an equivalent way. In the sequel, \(\{\epsilon _{k}\}_{k \in \mathbb {Z}} \in \mathbb {S}\) denotes an IID sequence in some measure space \(\mathbb {S}\) and \({\mathcal {F}}_k = \sigma (\epsilon _j,\, j \le k)\) the corresponding filtration. For \(d \in \mathbb {N}\), we then consider the variables
where \(H_h\) are measurable functions. Compared to Sect. 5, this setup is notationally more convenient. As a measure of dependence, we then consider
where \(U_{k,h}' = H_h({\mathcal {F}}_k')\), \({\mathcal {F}}_k' = \sigma (\ldots \epsilon _{-1},\epsilon _0', \epsilon _1, \ldots , \epsilon _k)\), and \(\{\epsilon _k'\}_{k \in \mathbb {Z}}\) is an independent copy of \(\{\epsilon _k\}_{k \in \mathbb {Z}}\).
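The dependence measure above can be computed in closed form for simple models. A minimal Monte Carlo sketch for a causal linear process \(U_k = \sum _{i \ge 0} a^i \epsilon _{k-i}\) with standard Gaussian innovations, where replacing \(\epsilon _0\) by an independent copy gives exactly \(\Vert U_k - U_k'\Vert _2 = \sqrt{2}\, a^k\) (the coefficient \(a\), lag \(k\), filter truncation and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
a, k, N = 0.5, 3, 50_000            # AR coefficient, lag, Monte Carlo size (illustrative)
m = 50                               # truncation of the linear filter a^0, ..., a^{m-1}
eps = rng.standard_normal((N, m))    # innovations eps_k, eps_{k-1}, ..., eps_{k-m+1}
eps_prime = eps.copy()
eps_prime[:, k] = rng.standard_normal(N)   # replace eps_0 by an independent copy
coeffs = a ** np.arange(m)
U = eps @ coeffs                     # U_k = sum_i a^i eps_{k-i}
U_prime = eps_prime @ coeffs         # coupled copy U_k'
theta_k2 = np.sqrt(np.mean((U - U_prime) ** 2))   # Monte Carlo estimate of ||U_k - U_k'||_2
exact = np.sqrt(2) * a ** k          # exact value: a^k * ||eps_0 - eps_0'||_2
assert abs(theta_k2 - exact) < 0.01
```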
9.1 Gaussian approximation for weak dependence
In this section, a high dimensional Gaussian approximation result is established, which is a key ingredient in the proof of Theorem 7. This result may be of independent interest. Let \(S_{n,h} = \sum _{k = 1}^n U_{k,h}\), and denote with
where \(\{Z_h\}_{1 \le h \le d}\) is a sequence of zero mean Gaussian random variables. We also formally introduce
existence is shown below in Lemma 19. We also put \(\sigma _h^2 = \gamma _{h,h}\). Throughout this section, we work under the following assumption.
Assumption 4
The sequence \(\{U_{k,h}\}_{k \in \mathbb {Z}}\) is stationary for each \(1 \le h \le d\), such that for \(p > 2\) and \(d \lesssim n^{{\mathfrak {d}}}\)
- (F1): \({\mathbb {E}}[U_{k,h}] = 0\) and \(\theta _{j,p} \lesssim j^{-{\mathfrak {c}}}\) with \({\mathfrak {c}}> 3/2\),
- (F2): \({\mathfrak {d}}< p/2 - 1\),
- (F3): \(\inf _h \sigma _h > 0\).
We then have the following Gaussian approximation result.
Theorem 8
Grant Assumption 4. Then
where \(\{Z_h\}_{1 \le h \le d}\) has the same covariance structure as \(n^{-1/2}\{S_{n,h}\}_{1 \le h \le d}\). Alternatively, we may also choose \((\gamma _{i,j})_{1 \le i,j \le d}\) as covariance structure.
We first establish some additional notation. Let \(K = n^{{\mathfrak {k}}}\), \(L = n^{{\mathfrak {l}}}\) such that \(n = K L\) and \(0 < {\mathfrak {k}},{\mathfrak {l}}< 1\). To simplify the discussion, we always assume that \(K,L \in \mathbb {N}\). For each \(1 \le l \le L\), let \(\{\epsilon _{k}^{l}\}_{k \in \mathbb {Z}} \in \mathbb {S}\) be mutually independent sequences of IID random variables. For \(K(l-1) < k \le Kl\), \(1 \le l \le L\), denote with
where \({\mathcal {F}}_k^l = \sigma (\epsilon _j^l,\, j \le k)\). For \(1 \le m < K\) put
and \(V_{l,h}^{\diamond } = V_{l,h}^{\diamond }(1)\). The random variables \(V_{l,h}^{\diamond }\) play a key role in the proof of Theorem 8. Note in particular that \(\{V_{l,h}^{\diamond }\}_{1 \le l \le L}\) is IID by construction for each h. Finally, put \(S_{L,h}(V) = \sum _{l = 1}^L V_{l,h}\) and \(S_{L,h}^{\diamond }(V) = \sum _{l = 1}^L V_{l,h}^{\diamond }\), and note that \(S_{n,h} = S_{L,h}(V)\). In the sequel, we make frequent use of the following lemma.
Lemma 16
Suppose that \(\sum _{j = 1}^{\infty } \theta _{j,p} < \infty \) for \(p \ge 2\). Then
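One representative bound of this type, stated for a stationary, zero-mean sequence \(\{U_k\}\) with dependence measures \(\theta _{j,p}\) (constants depend only on \(p\)), is:

```latex
\Bigl\| \sum_{k=1}^{n} U_k \Bigr\|_p
  \lesssim \sqrt{n}\, \sum_{j=0}^{\infty} \theta_{j,p},
\qquad p \ge 2.
```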
For the proof and variants of this result, see [58]. The next lemma controls the approximation error between \(S_{L,h}(V)\) and \(S_{L,h}^{\diamond }(V)\).
Lemma 17
Grant Assumption 4. For any \(K = n^{{\mathfrak {k}}}\) with \(0 < {\mathfrak {k}}< 1\) there exists a \(\delta > 0\) and a constant \(C > 0\) such that
Proof of Lemma 17
Let \(x_n = x \sqrt{n}\), \(x > 0\). For \(1 \le m < K\) we have that
Denote with \(\alpha _{j,p} = (j^{p/2 - 1} \theta _{j,p}^p)^{1/(p+1)}\) and \(A = \sum _{j = 1}^{\infty } \alpha _{j,p}\). Note that by (F1) we have
and thus \(A < \infty \). Due to Theorem 2 in [46], there exist constants \(C_{p,1},C_{p,2} > 0\) such that
Setting \(x = y\sqrt{L \, m} A^{1 + 1/p}/\sqrt{n}\), it follows that \(\alpha _{j,p}^2 x_n^2 /(A^2 L\,m \theta _{j,2}^2) \ge j^{1 - 2/p}y^2\) and hence
Choosing m such that \(\sqrt{n}/\sqrt{L m} = n^{2 \delta }\) and \(y = n^{\delta }\), \(\delta > 0\), it follows that
Next, put \({\varDelta }_{k,h}(U) = U_{k,h} - U_{k,h}^{(K,\diamond )}\). By the triangle inequality, we have
Let \((k)_K = k \mod K\). Then Theorem 1 in [57] yields that
Since clearly \({\varTheta }_{(k)_K,p}\) is monotone decreasing, we have \({\varTheta }_{(k)_K,p} \le {\varTheta }_{(m)_K,p}\) for \(m \le k \le K\). Combining this with the above, it follows that for \(m \le (k)_K\) (since \(m = (m)_K\))
Put \(\beta _{j,p}(m) = (j^{p/2 - 1} \vartheta _{j,p}^p(m))^{1/(p+1)}\) and \(B(m) = \sum _{j = 1}^{\infty } \beta _{j,p}(m)\). Then another application of Theorem 2 in [46] yields that
Let \(y_n = n^{\delta } \sqrt{L m}/\sqrt{n} = n^{-\delta }\). Arguing similarly as before, it follows (since \(m = (m)_K\))
Since \({\varTheta }_{m,p} \lesssim m^{-2 {\mathfrak {c}}+ 1}\), we conclude
Setting \(m \thicksim n^{\nu }\), \(\nu > 0\), balancing the above and choosing \(\delta \) sufficiently small, we obtain \(y_n^2 B(m)^{-2} \wedge y_n^2/{\varTheta }_{m,p} \gtrsim n^{\delta }\). This implies that
Note that by the above choice of \(m = n^{\nu }\) we require that \(L \thicksim n^{1 - 4 \delta - \nu }\). Choosing \(\nu \) sufficiently close to 1, we can select \({\mathfrak {k}}< 1\) arbitrarily close to 1, which completes the proof. \(\square \)
In the sequel, we also require the following result.
Lemma 18
Grant Assumption 4. Then
Proof of Lemma 18
Since \(V_{l,h}^{\diamond } \mathop {=}\limits ^{d} V_{l,h}\), arguing similarly as in Lemma 17, Theorem 2 in [46] yields
Setting \(y = \log n\), the claim follows. \(\square \)
Next, we establish some useful results concerning the covariances \(\phi _{k,i,j} = {\mathbb {E}}[U_{0,i}U_{k,j}]\).
Lemma 19
Grant Assumption 4. Then
- (i) \(\sup _{i,j}|\phi _{k,i,j}| \lesssim k^{-{\mathfrak {c}}+ 1/2}\),
- (ii) \(\sup _{i,j} \sum _{k = 0}^{\infty } |\phi _{k,i,j}| < \infty \),
- (iii) \(\gamma _{i,j} = \phi _{0,i,j} + 2 \sum _{k = 1}^{\infty } \phi _{k,i,j} < \infty \),
- (iv) \(\sum _{k,l = 1}^n {\mathbb {E}}[U_{k,i}U_{l,j}] = n \gamma _{i,j} - \sum _{k \in \mathbb {Z}} (n \wedge |k|)\, \phi _{k,i,j}\).
Proof of Lemma 19
Claims (iii) and (iv) are well known in the literature and follow from (ii) by elementary computations. Since (i) implies (ii) due to \({\mathfrak {c}}> 3/2\), it suffices to establish (i). To this end, let \(U_{k,h}^* = H_h\bigl ({\mathcal {F}}_k^*\bigr )\), where \({\mathcal {F}}_k^* = \sigma (\ldots ,\epsilon _{-1}',\epsilon _0',\epsilon _1,\ldots , \epsilon _k)\). Since then \({\mathbb {E}}[U_{k,h}^{*}|{\mathcal {F}}_0] = {\mathbb {E}}[U_{k,h}]= 0\), Cauchy–Schwarz, Jensen's inequality and Theorem 1 in [57] yield
where the last claim follows from (F1). \(\square \)
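Identity (iv) can be sanity-checked numerically. A minimal sketch with illustrative geometric "autocovariances" \(\phi _k = a^{|k|}\) (a scalar stand-in for \(\phi _{k,i,j}\); the \(\mathbb {Z}\)-sum is truncated, which is harmless since \(\phi _k\) decays geometrically):

```python
import numpy as np

a, n = 0.6, 10
K = 200                                   # truncation of the Z-sum (phi decays geometrically)
ks = np.arange(-K, K + 1)
phi = a ** np.abs(ks)                     # autocovariances phi_k = a^{|k|}
gamma = phi.sum()                         # long-run variance: gamma = sum_{k in Z} phi_k
# left side: double sum of E[U_k U_l] = phi_{l-k} over 1 <= k, l <= n
lhs = sum(a ** abs(l - k) for k in range(n) for l in range(n))
# right side: n * gamma - sum_{k in Z} (n ^ |k|) phi_k, as in (iv)
rhs = n * gamma - np.sum(np.minimum(n, np.abs(ks)) * phi)
assert abs(lhs - rhs) < 1e-8
```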
For \(1 \le i,j \le d\) denote with
Lemma 20
Grant Assumption 4. Then
Remark 3
From Lemma 19(iv), Lemma 20 and the triangle inequality, we have that
Proof of Lemma 20
We have that
By the Marcinkiewicz–Zygmund inequality, Lemma 16 and (F1) we have
Using the triangle inequality and Theorem 1 in [57], it follows that
Hence combining (9.6) and (9.7), the claim follows. \(\square \)
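For reference, the Marcinkiewicz–Zygmund inequality used in the proof above gives, for independent mean-zero random variables \(X_i\) with finite \(p\)-th moments and \(p \ge 2\),

```latex
\Bigl\| \sum_{i=1}^{n} X_i \Bigr\|_p
  \le C_p \Bigl( \sum_{i=1}^{n} \|X_i\|_p^2 \Bigr)^{1/2},
```

a consequence of the classical two-sided bound \(\Vert \sum _i X_i \Vert _p \asymp _p \Vert (\sum _i X_i^2)^{1/2}\Vert _p\) and the triangle inequality in \({\mathbb {L}}^{p/2}\).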
Next, we state some Gaussian approximation results. To this end, we require the following condition. For \(\varepsilon , u(\varepsilon ) > 0\) we have
Denote with
where \(\{Z_h^{\diamond }\}_{1 \le h \le d}\) is a zero mean Gaussian sequence with covariance structure \({\varSigma }_d^{(\diamond ,n)} = (\gamma _{i,j}^{(\diamond ,n)})_{1 \le i,j \le d}\). We have the following Gaussian approximation result, which is an adaptation of Theorem 2.2 in [16].
Lemma 21
Assume the validity of (9.8) and that
- (i) \(K^{-1/2}\min _{1 \le h \le d}\min _{1 \le l \le L} \Vert V_{l,h}^{\diamond }\Vert _2 > 0\),
- (ii) \(K^{-1/2}\max _{1 \le h \le d}\max _{1 \le l \le L} \Vert V_{l,h}^{\diamond }\Vert _4 < \infty \).
Then it holds that
We also require the following two results, which are Lemmas 2.1 and 3.1 in [16], slightly adapted for our purpose.
Lemma 22
Let \(\{X_h\}_{1 \le h \le d}\) and \(\{Y_h\}_{1 \le h \le d}\) be zero mean Gaussian sequences, and denote with \(\gamma _{i,j}^X, \gamma _{i,j}^Y\) the corresponding covariances for \(1 \le i,j \le d\). If \(0 < \inf _h \gamma _{h,h}^X \le \sup _h \gamma _{h,h}^X < \infty \), then with \(\delta = \max _{1 \le i,j \le d}|\gamma _{i,j}^X - \gamma _{i,j}^Y|,\)
Lemma 23
Let \(\{X_h\}_{1 \le h \le d}\) be a zero mean Gaussian sequence with covariances \(\{\gamma _{i,j}^X\}_{1\le i,j\le d}\). If \(0 < \inf _h \gamma _{h,h}^X \le \sup _h \gamma _{h,h}^X < \infty ,\) then for \(|\delta |<\infty \)
Proof of Theorem 8
By Lemma 17 and Boole’s inequality, we have
Since \(d \lesssim n^{{\mathfrak {d}}}\) we obtain from (F2) with \(\delta > 0\) sufficiently small
Employing this bound, we get that
In the same manner one obtains a lower bound, hence
Next, we apply Lemma 21 to \(T_{L,d}^{\diamond }\). To this end, we need to verify its conditions. Note that by the independence of \(V_{l,h}^{\diamond }\), we have that
Hence we deduce from Lemmas 19, 20, Remark 3 and (F3) that
uniformly in h, and thus (i) holds. Next we verify (ii). This, however, readily follows from Lemma 12 and (F1). Finally, we need to establish (9.8). Set \(u(\varepsilon ) = (\log n)^2\). Using Boole’s inequality and Lemma 18 gives
By (F2) and choosing \({\mathfrak {k}}\) sufficiently close to 1, we get that
and (9.8) holds with \(\varepsilon \thicksim n^{-C_3}\). Since \(L \thicksim n^{{\mathfrak {l}}}\) with \({\mathfrak {l}}> 0\) due to \({\mathfrak {k}}< 1\), Lemma 21 yields that
Combining this with (9.10), we deduce that
Next, since \(\log d \lesssim \log n\), Lemma 23 yields that
In addition, by Remark 3
Hence the claim follows from Lemma 22. \(\square \)
9.2 Proofs of Sect. 5
Proof of Theorem 7
Denote with
We first show that we may apply Theorem 8 to \(T_{J_n^+}^{\eta }\). To this end, we need to verify Assumption 4. Observe that (E2) implies \(\bigl \Vert \eta _{k,j}\bigr \Vert _{q} < \infty \) (cf. [57]). Moreover, using \(a^2 - b^2 = (a-b)(a+b)\), it follows from Cauchy–Schwarz
Since \({\mathfrak {b}}> 3/2\) by (E2), (F1) follows. Next, note that (E1) implies that \(J_n^+ \lesssim n^{p ({\mathfrak {a}}- \delta )}\). Since \(q/2 - 1 > p 2^{{\mathfrak {p}}+ 2} > p {\mathfrak {a}}\) (recall \(0 < {\mathfrak {a}}< 1\)), (F2) holds. Finally, (E3) gives (F3), hence Assumption 4 is verified. We proceed with the proof. For \(j \in \mathbb {N}\), denote with \(I_{j,j}^* = \lambda _j\sum _{k = 1}^n \bigl (\eta _{k,j}^2 -1 \bigr )/n\), and note that by the above and Lemma 16 we have
Introduce the set
Then Markov’s inequality together with Proposition 2 and (9.14) yields
Due to Theorem 8 and the above, we have the inequalities
where \(T_{J_n^+}^{Z}\) is as in (5.2). An application of Lemma 23 yields that this is further bounded by
In the same manner, we obtain a lower bound, hence
which completes the proof. \(\square \)
Proof of Corollary 3
Due to Theorem 7, it suffices to show that
This, however, follows from Theorem 14 and Theorem 1 in [30]. \(\square \)
10 Proofs of Sect. 6
Proof of Proposition 4
Due to (6.4), Theorem 3.6 in [10] yields the Bernoulli-shift representation \(X_k = \sum _{i = 0}^{\infty } {{\varvec{\Phi }}}^{i}(\epsilon _{k-i})\). Next, using the orthogonality of \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\), we get
On the other hand, since \(\epsilon _k\) and \(X_{k-1}\) are independent, we obtain
For \(k \ge 1\), using the triangle inequality, the linearity of \({{\varvec{\Phi }}}\), the fact that \({{\varvec{\Phi }}}(e_j^{\phi }) = \lambda _j^{\phi } e_j^{\phi }\) and (6.6) yields that
where we also used \((\sum _{j = 1}^{\infty } \lambda _j^{\epsilon } \langle e_j^{\epsilon }, e_i^{\phi } \rangle ^2)^{(q' - q)/2q} < \infty \) in the last step (recall \(q' \ge q\)). Note that we have the inequality
which can be readily derived by contradiction (assume the converse and sum over j on both sides). Hence by the triangle inequality and (6.4), the above is further bounded by
for \(0 < \rho < 1\). Combining this with (10.1), (10.2) we arrive at
If \(k = 0\), we get from (6.6) that
If \(k < 0\) we have \(\eta _{k,j}^{\theta } = (\eta _{k,j}^{\theta })'\), and hence the claim follows from (10.4) and (10.5). Observe that by telescoping and Kolmogorov's zero-one law, we also get that \(\max _{j \in \mathbb {N}}\Vert \eta _{k,j}\Vert _q < \infty \). \(\square \)
Proof of Corollary 4
This follows from Lemma 16. \(\square \)
References
Anderson, T.W.: Asymptotic theory for principal component analysis. Ann. Math. Stat. 34(1), 122–148 (1963)
Andrews, D.W.K.: Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3), 817–858 (1991)
Arcones, M.A.: Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors. Ann. Probab. 22(4), 2242–2274 (1994)
Bathia, N., Yao, Q., Ziegelmann, F.: Identifying the finite dimensionality of curve time series. Ann. Stat. 38(6), 3352–3386 (2010)
Benko, M., Härdle, W., Kneip, A.: Common functional principal components. Ann. Stat. 37(1), 1–34 (2009)
Berkes, I., Gabrys, R., Horváth, L., Kokoszka, P.: Detecting changes in the mean of functional observations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71(5), 927–946 (2009)
Bhatia, R., Davis, Ch., McIntosh, A.: Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra Appl. 52(53), 45–67 (1983)
Blanchard, G., Bousquet, O., Zwald, L.: Statistical properties of kernel principal component analysis. Mach. Learn. 66(2–3), 259–294 (2007)
Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econom. 31(3), 307–327 (1986)
Bosq, D.: Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, vol. 149. Springer, New York (2000)
Cai, T.T., Hall, P.: Prediction in functional linear regression. Ann. Stat. 34(5), 2159–2179 (2006)
Cai, T.T., Yuan, M.: Minimax and adaptive prediction for functional linear regression. J. Am. Stat. Assoc. 107(499), 1201–1216 (2012)
Cardot, H., Mas, A., Sarda, P.: CLT in functional linear regression models. Probab. Theory Relat. Fields 138(3–4), 325–361 (2007)
Cavalier, L., Tsybakov, A.: Sharp adaptation for inverse problems with random noise. Probab. Theory Relat. Fields 123(3), 323–354 (2002)
Cerovecki, C., Hörmann, S.: On the CLT for discrete Fourier transforms of functional time series (2015). ArXiv e-prints
Chernozhukov, V., Chetverikov, D., Kato, K.: Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Stat. 41(6), 2786–2819 (2013)
Comte, F., Johannes, J.: Adaptive functional linear regression. Ann. Stat. 40(6), 2765–2797 (2012)
Crambes, C., Mas, A.: Asymptotics of prediction in functional linear regression with functional outputs. Bernoulli 19(5B), 2627–2651 (2013)
Dauxois, J., Pousse, A., Romain, Y.: Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivar. Anal. 12(1), 136–154 (1982)
Dedecker, J., Prieur, C.: New dependence coefficients. Examples and applications to statistics. Probab. Theory Relat. Fields 132(2), 203–236 (2005)
Ding, Z., Granger, C.W.J., Engle, R.F.: A long memory property of stock market returns and a new model. J. Empir. Financ. 1(1), 83–106 (1993)
Ferraty, F., Quintela del Río, A., Vieu, P.: Specification test for conditional distribution with functional data. Econom. Theory 28(2), 363–386 (2012)
Fremdt, S., Horváth, L., Kokoszka, P., Steinebach, J.G.: Functional data analysis with increasing number of projections. J. Multivar. Anal. 124, 313–332 (2014)
Fremdt, S., Steinebach, J.G., Horváth, L., Kokoszka, P.: Testing the equality of covariance operators in functional samples. Scand. J. Stat. 40(1), 138–152 (2013)
Gabrys, R., Hörmann, S., Kokoszka, P.: Monitoring the intraday volatility pattern. J. Time Ser. Econom. 5(2), 87–116 (2013)
Hall, P., Horowitz, J.L.: Methodology and convergence rates for functional linear regression. Ann. Stat. 35(1), 70–91 (2007)
Hall, P., Hosseini-Nasab, M.: On properties of functional principal components analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 68(1), 109–126 (2006)
Hall, P., Hosseini-Nasab, M.: Theory for high-order bounds in functional principal components analysis. Math. Proc. Camb. Philos. Soc. 146(1), 225–256 (2009)
Hallin, M., Hörmann, S., Kidziński, L.: Dynamic functional principal components. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77(2), 319–348 (2015)
Han, X., Wu, W.B.: Portmanteau test and simultaneous inference for serial covariances. Stat. Sin. 24(2), 577–599 (2014)
Hilgert, N., Mas, A., Verzelen, N.: Minimax adaptive tests for the functional linear model. Ann. Stat. 41(2), 838–869 (2013)
Hörmann, S., Kidzinski, L., Kokoszka, P.: Estimation in functional lagged regression. J. Time Ser. Anal. 36(4), 541–561 (2015)
Hörmann, S., Kokoszka, P.: Weakly dependent functional data. Ann. Stat. 38(3), 1845–1884 (2010)
Horváth, L., Hušková, M., Kokoszka, P.: Testing the stability of the functional autoregressive process. J. Multivar. Anal. 101(2), 352–367 (2010)
Horváth, L., Hušková, M., Rice, G.: Test of independence for functional data. J. Multivar. Anal. 117, 100–119 (2013)
Horváth, L., Kokoszka, P.: Inference for Functional data with Applications. Springer Series in Statistics. Springer, New York (2012)
Horváth, L., Kokoszka, P., Reeder, R.: Estimation of the mean of functional time series and a two-sample problem. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75(1), 103–122 (2013)
Horváth, L., Kokoszka, P., Rice, G.: Testing stationarity of functional time series. J. Econom. 179(1), 66–82 (2014)
Ingster, Y.I., Suslina, I.A.: Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Lecture Notes in Statistics, vol. 169. Springer, New York (2003)
Jackson, D.A.: Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74(8), 2204–2214 (1993)
Jirak, M.: Optimal eigen expansions and uniform bounds. Extended version. arXiv:1501.01271
Jirak, M.: On weak invariance principles for sums of dependent random functionals. Stat. Probab. Lett. 83(10), 2291–2296 (2013)
Jolliffe, I.T.: Principal Component Analysis. Springer Series in Statistics, 2nd edn. Springer, New York (2002)
Kraus, D., Panaretos, V.M.: Dispersion operators and resistant second-order functional data analysis. Biometrika 99(4), 813–832 (2012)
Leadbetter, M.R.: Extremes and local dependence in stationary sequences. Probab. Theory Relat. Fields 65, 291–306 (1983). doi:10.1007/BF00532484
Liu, W., Xiao, H., Wu, W.B.: Probability and moment inequalities under dependence. Stat. Sin. 23(3), 1257–1272 (2013)
Mas, A.: Testing for the mean of random curves: a penalization approach. Stat. Inference Stoch. Process. 10(2), 147–163 (2007)
Mas, A., Ruymgaart, F.: High-dimensional principal projections. Complex Anal. Oper. Theory 9(1), 35–63 (2015)
Meister, A.: Asymptotic equivalence of functional linear regression and a white noise inverse problem. Ann. Stat. 39(3), 1471–1495 (2011)
Merlevède, F., Peligrad, M., Utev, S.: Sharp conditions for the clt of linear processes in a Hilbert space. J. Theor. Probab. 10(3), 681–693 (1997)
Müller, H.-G., Sen, R., Stadtmüller, U.: Functional data analysis for volatility. J. Econom. 165(2), 233–245 (2011)
Panaretos, V.M., Kraus, D., Maddocks, J.H.: Second-order comparison of Gaussian random functions and the geometry of DNA minicircles. J. Am. Stat. Assoc. 105(490), 670–682 (2010). Supplementary materials available online
Panaretos, V.M., Tavakoli, S.: Cramér–Karhunen–Loève representation and harmonic principal component analysis of functional time series. Stoch. Process. Appl. 123(7), 2779–2807 (2013)
Panaretos, V.M., Tavakoli, S.: Fourier analysis of stationary time series in function space. Ann. Stat. 41(2), 568–603 (2013)
Peligrad, M., Utev, S.: Central limit theorem for stationary linear processes. Ann. Probab. 34(4), 1608–1622 (2006)
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, New York (2005)
Wu, W.B.: Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA 102, 14150–14154 (2005)
Wu, W.B.: Strong invariance principles for dependent random variables. Ann. Probab. 35(6), 2294–2320 (2007)
Acknowledgments
I would like to thank the Associate Editor and in particular the anonymous Reviewer for the many constructive comments, suggestions and corrections. Their generous help has been of major benefit.
Jirak, M. Optimal eigen expansions and uniform bounds. Probab. Theory Relat. Fields 166, 753–799 (2016). https://doi.org/10.1007/s00440-015-0671-3
Keywords
- Eigen expansion
- Short and long memory
- Lag operator
- Long-run covariance operator
- Hilbert space
- Extreme value distribution
Mathematics Subject Classification
- 62H25
- 60B12
- 62M10
- 60G70