1 Introduction

Principal component analysis (PCA) has emerged as one of the most important tools in multivariate and high-dimensional data analysis. In the functional setting, functional principal component analysis (FPCA) plays an increasingly important role. A comprehensive overview and some leading examples can be found in [36, 43, 56]. Given a functional time series \(\mathbf{X}=\{X_k\}_{k \in \mathbb {Z}}\), it is typically assumed that \(\mathbf{X}\) lies in the Hilbert space \({\mathbb {L}}^2({\mathcal {T}})\), where \({\mathcal {T}} \subset \mathbb {R}^d\) is compact. The fundamental tool in the area of PCA and FPCA—both in theory and practice—is the usage of (functional) principal components (FPC). To fix ideas, let us introduce some notation. If \(\mathbf{X}\) is stationary with \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]<\infty \), then the mean \(\mu ={\mathbb {E}}[X_k]\) and the covariance operator

$$\begin{aligned} {\varvec{\mathcal {C}}}(\cdot )={\mathbb {E}}[\langle X_k - \mu , \cdot \rangle (X_{k}-\mu )], \end{aligned}$$
(1.1)

exist. Here \(\langle \cdot ,\cdot \rangle \) denotes the inner product in \({\mathbb {L}}^2\), and \(\Vert \cdot \Vert _{{\mathbb {L}}^2}\) the corresponding norm. The eigenfunctions of \({\varvec{\mathcal {C}}}\) are called the functional principal components and denoted by \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), i.e., we have \({\varvec{\mathcal {C}}}(e_j) = \lambda _j e_j\), where \({\varvec{\lambda }}= \{\lambda _j\}_{j \in \mathbb {N}}\) denotes the eigenvalues. The eigenfunctions \(\mathbf{e}\) are usually estimated by the empirical eigenfunctions \(\widehat{\mathbf{e}} = \{{\widehat{e}}_j\}_{j \in \mathbb {N}}\), defined as the eigenfunctions of the empirical covariance operator

$$\begin{aligned} \widehat{\varvec{\mathcal {C}}}(\cdot ) = \frac{1}{n}\sum _{k = 1}^n \langle X_k - {\bar{X}}_n, \cdot \rangle (X_{k}-{\bar{X}}_n), \end{aligned}$$
(1.2)

where \({\bar{X}}_n = \frac{1}{n} \sum _{k = 1}^n X_k\). Hence \(\widehat{\varvec{\mathcal {C}}}({\widehat{e}}_j) = {\widehat{\lambda }}_j {\widehat{e}}_j\), where \({\widehat{{\varvec{\lambda }}}} = \{{\widehat{\lambda }}_j\}_{j \in \mathbb {N}}\) denotes the empirical eigenvalues. Due to the fundamental importance of eigenfunctions and eigenvalues for FPCA and PCA, corresponding results on the asymptotic behavior of empirical eigenfunctions and values are of high interest. Anderson [1] was among the first to give such results (see also [19]), and established a CLT for \({\widehat{\lambda }}_j\) (resp. \({\widehat{e}}_j\)) if j is fixed. Fueled by high-dimensional applications, uniform bounds where j increases with the sample size n have become very important, leading to a significant rise in the complexity of the problem. Well-known pathwise bounds are provided in the lemma given below (cf. [7, 10]).
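To make the estimator (1.2) concrete, the following minimal sketch (our own illustration; all function and variable names are ours) computes empirical eigenvalues and eigenfunctions for curves observed on a regular grid, with \({\mathbb {L}}^2\) inner products approximated by Riemann sums.

```python
import numpy as np

def empirical_fpca(X, n_components):
    """Empirical FPCA for curves observed on a regular grid of [0, 1].

    X is an (n, T) array holding n curves at T grid points.  The L^2
    geometry is approximated by Riemann sums with spacing dt = 1/T, so
    the eigenpairs of the discretized kernel approximate those of the
    empirical covariance operator.
    """
    n, T = X.shape
    dt = 1.0 / T
    Xc = X - X.mean(axis=0)                 # centre at the sample mean curve
    K = (Xc.T @ Xc / n) * dt                # discretized covariance operator
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    # Rescale eigenvectors to unit L^2 norm: sum_t e(t)^2 * dt = 1.
    return evals[order], evecs[:, order] / np.sqrt(dt)

# Toy data from a Karhunen-Loeve expansion with eigenfunctions
# sqrt(2) sin(j*pi*t) and eigenvalues lambda_j = j^{-2}.
rng = np.random.default_rng(0)
n, T, J = 2000, 100, 5
t = (np.arange(T) + 0.5) / T
lam = 1.0 / np.arange(1, J + 1) ** 2
E = np.sqrt(2) * np.sin(np.pi * np.outer(t, np.arange(1, J + 1)))
X = (np.sqrt(lam) * rng.normal(size=(n, J))) @ E.T
lam_hat, e_hat = empirical_fpca(X, 3)
```

With \(n = 2000\) curves, \({\widehat{\lambda }}_1\) lands close to \(\lambda _1 = 1\), and \({\widehat{e}}_1\) is numerically aligned (up to sign) with the first population eigenfunction.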

Lemma 1

If \(\mathbf{X} \in {\mathbb {L}}^2({\mathcal {T}})\) and \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]<\infty ,\) then

$$\begin{aligned} |{\widehat{\lambda }}_j - \lambda _j|\le \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathcal {L}}}, \quad \Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2} \le \frac{2\sqrt{2}}{\psi _j} \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathcal {L}}}, \end{aligned}$$

where \(\psi _j = \min \{\lambda _{j-1} - \lambda _{j}, \lambda _j - \lambda _{j+1}\}\) (with \(\psi _1 = \lambda _1 - \lambda _2)\) and \(\Vert \cdot \Vert _{{\mathcal {L}}}\) denotes the operator norm.
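Both bounds have exact finite-dimensional analogues (Weyl's inequality for the eigenvalues, a Davis–Kahan-type bound for the eigenvectors), which the following sanity check (our own sketch, with a diagonal 'population' operator) illustrates:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
lam = np.arange(d, 0, -1.0)              # "true" eigenvalues d, ..., 1; all gaps psi_j = 1
C = np.diag(lam)
P = rng.normal(size=(d, d))
P = 0.05 * (P + P.T) / 2                 # small symmetric perturbation
op = np.linalg.norm(P, ord=2)            # operator (spectral) norm of the error

evals, evecs = np.linalg.eigh(C + P)
lam_hat = evals[::-1]                    # descending order, matching lam
E_hat = evecs[:, ::-1]

# Eigenvalue bound (Weyl): max_j |lam_hat_j - lam_j| <= ||P||_op.
eig_gap = np.max(np.abs(lam_hat - lam))

# Eigenvector bound: the population eigenvectors are the standard basis;
# empirical eigenvectors must be sign-aligned before comparing.
E_aligned = E_hat * np.sign(np.diag(E_hat))
vec_err = np.max(np.linalg.norm(E_aligned - np.eye(d), axis=0))
```

Here `eig_gap` stays below the operator norm of the perturbation, and `vec_err` stays below \(2\sqrt{2}/\psi _j\) times that norm (with \(\psi _j = 1\) for every j in this toy example).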

The attractiveness of the above bounds lies in their simplicity, but unfortunately they are far from optimal from a probabilistic perspective. Indeed, the results of [19] tell us that in case of \({\widehat{\lambda }}_j - \lambda _j\), the correct bound should include the additional factor \(\lambda _j\), i.e., \(\lambda _j\Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathcal {L}}}\). A similar claim can be made for \(\Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2}\). In this spirit, based on Lemma 1, asymptotic expansions for \({\widehat{\lambda }}_j - \lambda _j\) and \(\widehat{e}_j - e_j\) which allow for increasing j have been established in [26–28] (see also [10, 13, 48]). These results have proved to be an indispensable tool in the literature, see for instance [8, 13, 26, 36, 43, 49]. But the corresponding (asymptotic) analysis is often based on heavy structural assumptions regarding \(\mathbf{X}\) and the spacings (spectral gap) \({\varvec{\Psi }} = \{\psi _j\}_{j \in \mathbb {N}}\) of the eigenvalues, limiting its applicability. In particular, often only the covariance operator \({\varvec{\mathcal {C}}}\) is considered, and a common key assumption is that \(\mathbf{X}\) is an IID sequence, which is rather restrictive, see [33, 36, 54] and also Sects. 2.2 and 6.2. In the presence of serial correlation, the lag operators \({\varvec{\mathcal {C}}}_h\) and the long-run covariance operator \({\varvec{\mathcal {G}}}\), formally defined as

$$\begin{aligned} {\varvec{\mathcal {C}}}_h(\cdot ) = {\mathbb {E}}[\langle X_k - \mu , \cdot \rangle (X_{k-h} - \mu )], \quad {\varvec{\mathcal {G}}}(\cdot )= \sum _{h \in \mathbb {Z}}{\varvec{\mathcal {C}}}_h(\cdot ), \end{aligned}$$
(1.3)

serve as a generalization of \({\varvec{\mathcal {C}}} = {\varvec{\mathcal {C}}}_0\). They play a fundamental role for dependent functional time series, see for instance [29, 53, 54]. In this paper, we consider a general framework that contains both \({\varvec{\mathcal {C}}}_h\) and \({\varvec{\mathcal {G}}}\), avoiding the previously mentioned limitations. We derive exact asymptotic expansions of \({\widehat{\lambda }}_j\), \({\widehat{e}}_j\) under optimal dependence assumptions, allowing for short memory (weak dependence), but also for long memory (strong dependence) in case of \({\varvec{\mathcal {C}}}_h\), h finite. In addition, we only require a ‘natural condition’ concerning the spectral gap \({\varvec{\Psi }}\). It turns out that this condition is nearly optimal.

As a particular application, we study the relative maximum deviation of the empirical eigenvalues of \({\varvec{\mathcal {C}}}\), namely

$$\begin{aligned} T_{J_n^+}^{} = \sqrt{n}\max _{1 \le j < J_n^+}\frac{|{\widehat{\lambda }}_j - \lambda _j|}{ \sigma _{j}\lambda _j}, \end{aligned}$$

where \(J_n^+ \rightarrow \infty \), see Proposition 1 for a precise definition of \(J_n^+\). Under mild assumptions, we show that

$$\begin{aligned} a_n\left( T_{J_n^+}^{} - b_n\right) \xrightarrow {d} {\mathcal {V}}, \end{aligned}$$
(1.4)

where \({\mathcal {V}}\) is a distribution of Gumbel type. The latter is based on a high-dimensional Gaussian approximation, which is of independent interest, see Theorem 8. Result (1.4) is particularly important for the construction of simultaneous confidence sets and tests for the relevant number of FPCs to be used for statistical inference or modelling (cf. [4, 43, 56]). The range of further applications is surveyed in Sect. 6. Here we also touch on the possibility of long memory in functional time series.
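Limits of Gumbel type as in (1.4) are the hallmark of extreme value theory for maxima. As a toy illustration (ours; it concerns IID exponentials, not the statistic \(T_{J_n^+}\) itself), the centred maximum of n IID Exp(1) variables is already approximately Gumbel distributed, with mean equal to the Euler–Mascheroni constant \(\gamma \approx 0.5772\) and variance \(\pi ^2/6\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
# P(max_i E_i - log n <= x) = (1 - e^{-x}/n)^n -> exp(-e^{-x}),
# the standard Gumbel distribution function.
M = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)
mean_M, var_M = M.mean(), M.var()
```

The sample mean and variance of `M` then match the Gumbel moments up to Monte Carlo error. For the statistic \(T_{J_n^+}\) itself, the centering \(b_n\) and scaling \(a_n\) play the role of \(\log n\) and 1 here.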

An outline of the paper can be given as follows. In Sect. 2 the key expansions of \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) are established in a general framework, alongside some additional results. In particular, we discuss in detail the optimality of the underlying assumptions. Asymptotic expansions of \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) in the context of \({\varvec{\mathcal {C}}}_h\) and \({\varvec{\mathcal {G}}}\) are established in Sects. 3 and 4, whereas Sect. 5 is devoted to the study of (1.4). Additional fields of application are surveyed in Sect. 6, with an emphasis on functional linear regression, ARH(1) processes and long memory in a functional context. The proofs of the eigen expansions are given in Sects. 7 and 8. In Sect. 9.1, a general high-dimensional Gaussian approximation under dependence is established. Based on this result, we prove (1.4) in Sect. 9.2. Finally, Sect. 10 presents the proofs of Sect. 6.

2 Preliminary notation and main asymptotic expansions

For \(p \ge 1\), denote with \(\Vert \cdot \Vert _p\) the \(L^p\)-norm \({\mathbb {E}}[|\cdot |^p]^{1/p}\). We write \(\lesssim \) and \(\gtrsim \) for one-sided, and \(\thicksim \) for two-sided, inequalities that hold up to a multiplicative constant; moreover, \(a \wedge b = \min \{a,b\}\) and \(a \vee b = \max \{a,b\}\). Given a set \({\mathcal {A}}\), we denote with \({\mathcal {A}}^c\) its complement. Finally, we write \(\overline{X} = X - {\mathbb {E}}[X]\) for a random variable X.

In the sequel, it is convenient to first consider a more abstract framework. Assume that the operator \({\varvec{\mathcal {D}}}: {\mathbb {L}}^2({\mathcal {T}}) \mapsto {\mathbb {L}}^2({\mathcal {T}})\) has non-negative eigenvalues \({\varvec{\lambda }}= \{\lambda _j\}_{j \in \mathbb {N}}\) and eigenfunctions \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), and satisfies the spectral representation

$$\begin{aligned} {\varvec{\mathcal {D}}}(\cdot ) = \sum _{j = 1}^{\infty } \lambda _j \langle e_j,\cdot \rangle e_j, \quad \text {with } \sum _{j = 1}^{\infty } \lambda _j < \infty . \end{aligned}$$
(2.1)

For a sequence of non-negative numbers \(\{\widetilde{\lambda }_j\}_{j \in \mathbb {N}}\) with \(\sum _{j = 1}^{\infty } \widetilde{\lambda }_j < \infty \) and real-valued random variables \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\}_{i,j \in \mathbb {N}}\), \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\}_{i,j \in \mathbb {N}}\), consider the empirical version

$$\begin{aligned}&\widehat{\varvec{\mathcal {D}}}(\cdot ) = \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i \widetilde{\lambda }_j}(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} - \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}})\langle e_i, \cdot \rangle e_j, \quad \text {with } \widehat{\varvec{\mathcal {D}}}({\widehat{e}}_j) = {\widehat{\lambda }}_j {\widehat{e}}_j, j \in \mathbb {N},\nonumber \\&\quad \text {where we demand } {\varvec{\mathcal {D}}}(\cdot ) = \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i \widetilde{\lambda }_j}{\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}] \langle e_i, \cdot \rangle e_j. \end{aligned}$$
(2.2)

The random variables \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\) denote the contributing random components, whereas \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\) denote the negligible parts. In the sequel, both random variables depend on a sequence \({m}\rightarrow \infty \), i.e., \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}({m})\) and \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}({m})\). To simplify the notation, we often suppress this dependence if it is of no immediate relevance. This class of (empirical) operators is rich enough to include the lag operators \({\varvec{\mathcal {C}}}_h\) (in fact only \({\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\), see Sect. 3), but also the more general long-run covariance operator \({\varvec{\mathcal {G}}}\) (see Sect. 4). In order to provide an intuition for this setup, let us discuss how this translates in case of the covariance operator \({\varvec{\mathcal {C}}}\), hence \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) and \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}\). Then obviously \(\widetilde{\lambda }_j = \lambda _j\) and for \({m}= n\) we have

$$\begin{aligned} \varvec{\eta }_{i,j}^{\varvec{\mathcal {C}}}(n) = \sum _{k = 1}^n\frac{\eta _{k,i} \eta _{k,j}}{n}, \quad \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}(n) = \sum _{k,l = 1}^n \frac{\eta _{k,i} \eta _{l,j}}{n^2}, \quad \eta _{k,j} = \frac{\langle \overline{X}_k, e_j \rangle }{\lambda _j^{1/2}}. \end{aligned}$$
(2.3)

Clearly, if \(\mathbf{X}\) is stationary, then so is \(\{\eta _{k,j}\}_{k \in \mathbb {Z}, j \in \mathbb {N}}\) and hence \({\varvec{\mathcal {C}}}\) does not depend on n in this case. We also note that \({\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {C}}}] = 1\) and \({\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {C}}}] = 0\) for \(i \ne j\) since \({\mathbb {E}}[\eta _{k,i} \eta _{k,j}] = 0\) by the classical Karhunen–Loève expansion (cf. [36]). This is actually true in a more general fashion. Since \(\mathbf{e}\) are the eigenfunctions of \({\varvec{\mathcal {D}}}\), the representations in (2.1) and (2.2) yield that \((\widetilde{\lambda }_i \widetilde{\lambda }_j)^{1/2}{\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}] = 0\) for \(i \ne j\). For the sake of reference, we formulate this simple observation as a lemma.
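The normalization in (2.3) is easy to check by simulation. In the sketch below (our own toy setup, Gaussian Karhunen–Loève curves), the diagonal entries of the matrix \(\varvec{\eta }^{\varvec{\mathcal {C}}}(n)\) concentrate around 1 and the off-diagonal entries around 0, matching \({\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {C}}}] = 1\) and \({\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {C}}}] = 0\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, J = 4000, 200, 3
t = (np.arange(T) + 0.5) / T
dt = 1.0 / T
lam = 1.0 / np.arange(1, J + 1) ** 2
E = np.sqrt(2) * np.sin(np.pi * np.outer(t, np.arange(1, J + 1)))
X = (np.sqrt(lam) * rng.normal(size=(n, J))) @ E.T   # zero-mean KL curves

Xc = X - X.mean(axis=0)
# Standardized scores eta_{k,j} = <X_k - mean, e_j> / lam_j^{1/2}, cf. (2.3),
# with the inner product approximated by a Riemann sum on the grid.
eta = (Xc @ E) * dt / np.sqrt(lam)
etaC = eta.T @ eta / n                   # matrix of eta^C_{i,j}(n)
```

Both effects are of stochastic order \(n^{-1/2}\), consistent with (D1).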

Lemma 2

Assume \({\varvec{\mathcal {D}}}\) satisfies (2.1) and (2.2) with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Then \((\widetilde{\lambda }_i \widetilde{\lambda }_j)^{1/2}{\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} ]=0\) for \(i \ne j\) and \(\lambda _j = \widetilde{\lambda }_j {\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {D}}}]\).

Most of our results will depend on the centered version of \({\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\), i.e.,

$$\begin{aligned} \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} - {\mathbb {E}}[{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}],\quad i,j \in \mathbb {N}. \end{aligned}$$

We now demand the following conditions.

Assumption 1

The operators \({\varvec{\mathcal {D}}}\), \(\widehat{\varvec{\mathcal {D}}}\) satisfy (2.1) and (2.2). Moreover, for a universal constant \(C^{\varvec{\mathcal {D}}}\) and a universal sequence \(s_{{m}}^{\varvec{\mathcal {D}}}= \mathcal {O}(1)\) and \({\mathfrak {a}}> 0\), \({\mathfrak {h}}, p \ge 1\), \(J_{{m}}^+ \in \mathbb {N}\) and \({m}\rightarrow \infty \) it holds that

(D1):

\({m}^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}({m})\Vert _q \le C^{\varvec{\mathcal {D}}}\) and \({m}^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}({m})\Vert _q \le s_{{m}}^{\varvec{\mathcal {D}}}\) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \),

(D2):

\(\max _{1 \le j \le J_{{m}}^+}\left\{ {{m}}^{-\frac{1}{2} + {\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{}}{|\lambda _j^{} - \lambda _i^{}|}, {{m}}^{-1 + 2{\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{} \lambda _j^{} }{(\lambda _j^{} - \lambda _i^{})^2}\right\} \le C^{\varvec{\mathcal {D}}}\) and \(\lambda _{J_{{m}}^+} \ge {{m}}^{-{\mathfrak {h}}}/C^{\varvec{\mathcal {D}}}\),

(D3):

\(1/C^{\varvec{\mathcal {D}}} \le {\mathbb {E}}[{\varvec{\eta }}_{j,j}^{\varvec{\mathcal {D}}}({m})] \le {C}^{\varvec{\mathcal {D}}}\) for \(j \in \mathbb {N}\) and \(\sum _{j = 1}^{\infty } {\lambda }_j \le C^{\varvec{\mathcal {D}}}\).

Remark 1

Note that in the above assumptions, \({\varvec{\lambda }}\) may depend on \({m}\). We can deal with this case in the sequel due to the universal bounds provided by \(C^{\varvec{\mathcal {D}}}\).

Let us discuss these assumptions and compare them to the literature. As a general preliminary remark, we note that all of our results have analogues in a general Hilbert space setting \({\mathbb H}\). Working in \({\mathbb {L}}^2({\mathcal {T}})\) is notationally less burdensome though, and the proofs are simpler. In particular, the Fubini–Tonelli Theorem allows one to interchange the order of inner products and expectations. Since most relevant related results in the literature focus on the covariance operator \({\varvec{\mathcal {C}}}\), we also consider this setup for our discussion, i.e., \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) (and \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}\)). To this end, it is convenient to translate Assumption 1 to this special case to make the comparison transparent. Recall the notation introduced in (2.3). We then have the following result.

Proposition 1

Let \(\mathbf{X}\) be stationary with \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] \le C^{\varvec{\mathcal {C}}}\) for a universal constant \(C^{\varvec{\mathcal {C}}}\). Then \({\varvec{\mathcal {C}}}\) satisfies (2.1) and (2.2) with summable eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Assume in addition that for some \({\mathfrak {a}}> 0,{\mathfrak {h}}, p \ge 1\) and universal sequence \(s_{n}^{\varvec{\mathcal {C}}}= \mathcal {O}(1)\) we have that

(C1):

\(n^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{{\varvec{\mathcal {C}}}}(n)\Vert _q < C^{\varvec{\mathcal {C}}},\) \(n^{\frac{1}{4}}\max _{j \in \mathbb {N}}\Vert \sum _{k = 1}^n \eta _{k,j}\Vert _{2q} \le s_{n}^{\varvec{\mathcal {C}}},\) for \(q = p 2^{{\mathfrak {p}}+ 4},\) \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil ,\)

(C2):

(D2) holds with \(C^{\varvec{\mathcal {D}}} = C^{\varvec{\mathcal {C}}},\) \({m}= n,\) \(J_n^+ \in \mathbb {N}\) and \({\mathfrak {a}}\) as above.

Then Assumption 1 holds for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\) with \({\mathfrak {a}}> 0,{\mathfrak {h}}, p \ge 1,{m}= n,J_n^+ \in \mathbb {N},\) \(s_{{m}}^{\varvec{\mathcal {D}}} = s_{n}^{\varvec{\mathcal {C}}}\) and \(C^{\varvec{\mathcal {D}}} = C^{\varvec{\mathcal {C}}}\) as above.

Let us now compare the literature with Proposition 1.

Dependence assumptions: Assumption (C1) implicitly imposes a dependence condition on the scores \(\eta _{k,j}\). In contrast to the literature (cf. [18, 27, 28, 48]), we do not require the typical independence assumption; in fact, (C1) is much more general. In Sect. 2.2 we also discuss why looking at \({\varvec{\mathcal {C}}}\) under dependence can be relevant in practice. It can be shown that (C1) holds under general, sharp weak dependence conditions, sharp in the sense that if these conditions fail, the sequence is no longer weakly dependent. But much more is true. Suppose that \(\eta _{k,j} = \sum _{i = 0}^{\infty } \alpha _{i,j} \epsilon _{k-i,j}\) where \(\bigl \{\epsilon _{k,j}\bigr \}_{k \in \mathbb {Z},j \in \mathbb {N}}\) is standard Gaussian and IID and \(\alpha _{i,j} \thicksim i^{-\alpha }\), \(\alpha > 1/2\). Then we show in Sect. 2.2 that

$$\begin{aligned} \Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathbb {L}}^2}\Vert _2 \lesssim n^{-1/2} \quad \text {is equivalent to `(C1) holds for any fixed } p \ge 1\text {'},\nonumber \\ \end{aligned}$$
(2.4)

where \(\Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathbb {L}}^2}\) denotes the Hilbert–Schmidt norm. Hence the rate \(n^{-1/2}\) carries over and (C1) poses no restriction, as long as we consider the CLT-domain (normalization with \(n^{-1/2}\)). In this sense, condition (C1) is optimal (in the CLT-domain). Interestingly, this also allows for long memory sequences, and we even obtain a CLT for \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) under long memory conditions, i.e., where \(\sum _{i = 1}^{\infty } \alpha _{i,j} = \infty \), see Theorem 3. Note that it is shown in [50] that \(\sum _{i = 1}^{\infty } |\alpha _i| <\infty \) is necessary for the validity of a CLT for \(\sum _{k = 1}^n X_k\) in an infinite dimensional Hilbert space, which is different from the univariate case. Observe that the condition \(\max _{j \in \mathbb {N}}\Vert n^{-3/4}\sum _{k = 1}^n \eta _{k,j}\Vert _{2q} = \mathcal {O}(1)\) usually comes for ‘free’ due to the additional factor \(n^{-1/4}\), and is only necessary to control the empirical mean correction \({\bar{X}}_n\). Finally, we remark that our method of proof can also be used to derive corresponding results in the non-central domain, i.e., where \(\Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathbb {L}}^2}\Vert _2 \thicksim b_n \) with \(\sqrt{n} = \mathcal {O}(b_n)\). To keep this exposition at reasonable length, this is not pursued here.

Structural conditions for eigenvalues: (C2) is the key condition regarding the structure of the eigenvalues \(\lambda _j\). Note that the special form of the terms appearing in (C2) is no coincidence, and is connected to the variance of the asymptotic distribution of the empirical eigenfunctions \(\widehat{e}_j\) (cf. [19]). The literature (cf. [13, 18, 26–28]) usually requires polynomial, exponential or convex structures regarding the decay-rate of the eigenvalues and particularly the spacing \(\psi _j\). For instance, a common minimum assumption is that \(\psi _j \gtrsim \lambda _j j^{-1}\), which reflects a polynomial behavior of the eigenvalues \(\lambda _j\). As will be discussed below Theorem 2, (C2) turns out to be much weaker; in fact, we shall see that it is nearly optimal. To get a feeling for the implications of (C2), let us consider the case where \(\lambda _j\) satisfies a convexity condition, i.e.,

$$\begin{aligned} \text {the function }{\varvec{\lambda }}(x): \, x \mapsto \lambda _x\text { is convex.} \end{aligned}$$
(2.5)

If (2.5) holds, then one may verify (cf. Lemma 13) that

$$\begin{aligned} \sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i }{|\lambda _j - \lambda _i|} \lesssim j \log j \quad \text {and} \quad \sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i \lambda _j }{(\lambda _j - \lambda _i)^2} \lesssim j^2, \end{aligned}$$
(2.6)

hence (C2) is valid if \(J_n^+ \lesssim n^{1/2 - {\mathfrak {a}}} (\log n)^{-1}\). Note that these bounds are not directly influenced by the decay of \({\varvec{\lambda }}\) or \({\varvec{\Psi }}\). The convexity condition (2.5) itself is mild and includes many cases encountered in the literature (cf. [18]), in particular polynomial or exponential cases

$$\begin{aligned} \lambda _j \sim j^{{\mathfrak {r}}} \rho ^{-j},\quad 0 < \rho < 1,\ |{\mathfrak {r}}| < \infty \qquad \text {or} \qquad \lambda _j \sim j^{-{\mathfrak {r}}},\quad {\mathfrak {r}}> 1. \end{aligned}$$
(EP)

Also note that (C2) implies that the first \(J_n^+\) eigenvalues are distinct. See [19] for a flavour of results which allow for eigenspaces with rank greater than one.
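The bounds (2.6) are easy to verify numerically. The sketch below (our own, with coefficients and truncation level chosen for illustration) evaluates truncated versions of both sums in the polynomial case \(\lambda _j = j^{-2}\) and confirms the claimed orders \(j \log j\) and \(j^2\).

```python
import numpy as np

def gap_sums(j, r=2.0, N=20000):
    """Truncated versions of the two sums in (2.6) for lam_i = i^{-r}."""
    i = np.arange(1, N + 1, dtype=float)
    lam = i ** (-r)
    lj = float(j) ** (-r)
    lam_i = lam[i != j]                       # drop the i = j term
    diff = lj - lam_i
    S1 = np.sum(lam_i / np.abs(diff))         # should be O(j log j)
    S2 = np.sum(lam_i * lj / diff ** 2)       # should be O(j^2)
    return S1, S2

S1, S2 = gap_sums(40)
```

Both sums are dominated by indices i near j, where the spectral gap is smallest; distant indices contribute only lower-order terms.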

Moment assumptions: The existence of all moments (often with additional Gaussian like growth conditions) is usually required in the literature (cf. [18, 27, 28, 48]) in the context of expansions for \({\widehat{\lambda }}_j, {\widehat{e}}_j\). In contrast, we only require a finite number of moments, which, however, may be large. On the other hand, all of our results will be expressed in terms of the \(\Vert \cdot \Vert _p\)-norm, and moving over to the weaker \({\mathcal {O}}_P(\cdot )\) formulation, the moment assumptions can be lowered.

For stating our results, we introduce the quantity

$$\begin{aligned} I_{i,j} = \langle (\widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}})(e_i), e_j \rangle , \quad i,j \in \mathbb {N}, \end{aligned}$$
(2.7)

which is one of the main contributing parts in the expansions given below. We first give the main results, followed by a discussion and comparison to the literature. For the empirical eigenvalues \({\widehat{\lambda }}_j\), we have the following.

Theorem 1

Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+\)

$$\begin{aligned} \left\| \max _{1 \le j \le J} \left| \frac{1}{\lambda _j}\left( {\widehat{\lambda }}_j - \lambda _j - I_{j,j}\right) \right| \right\| _p \lesssim \frac{J^{1/p}{m}^{-{\mathfrak {a}}}}{\sqrt{{m}}}. \end{aligned}$$

The above result provides an exact uniform first-order expansion for \({\widehat{\lambda }}_j\). For a nonuniform version, the factor \(J^{1/p}\) in the bound on the RHS can be dropped. Next, we state the companion result for the empirical eigenfunctions \({\widehat{e}}_j\).
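In the covariance case, the leading term \(I_{j,j}\) of Theorem 1 is straightforward to inspect by simulation. In the sketch below (our own toy setup: Gaussian Karhunen–Loève curves with \(\lambda _j = j^{-2}\), all names ours), the remainder \({\widehat{\lambda }}_1 - \lambda _1 - I_{1,1}\) is of order \(1/n\), while the leading term itself fluctuates on the scale \(1/\sqrt{n}\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, T, J = 2000, 100, 5
t = (np.arange(T) + 0.5) / T
dt = 1.0 / T
lam = 1.0 / np.arange(1, J + 1) ** 2
E = np.sqrt(2) * np.sin(np.pi * np.outer(t, np.arange(1, J + 1)))

X = (np.sqrt(lam) * rng.normal(size=(n, J))) @ E.T
Xc = X - X.mean(axis=0)
C_hat = (Xc.T @ Xc / n) * dt          # discretized empirical covariance operator
C_pop = (E * lam) @ E.T * dt          # discretized population operator
# On the midpoint grid the sine eigenfunctions are exactly orthonormal,
# so the discretized population operator has eigenvalue lam[0] = 1 exactly.

lam_hat1 = np.linalg.eigvalsh(C_hat)[-1]       # largest empirical eigenvalue
e1 = E[:, 0]
I11 = e1 @ (C_hat - C_pop) @ e1 * dt           # I_{1,1} = <(C_hat - C)(e_1), e_1>
remainder = lam_hat1 - lam[0] - I11
```

With these parameters the remainder is two orders of magnitude smaller than the typical size \(\sqrt{2/n}\) of the leading term.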

Theorem 2

Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+\)

$$\begin{aligned} \left\| \max _{1 \le j \le J} \left\| \frac{1}{\sqrt{{\varLambda }_j} }\left( {\widehat{e}}_j - e_j + \frac{e_j}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 - \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }e_k \frac{I_{k,j}}{\lambda _j - \lambda _k}\right) \right\| _{{\mathbb {L}}^2}\right\| _p \lesssim \frac{J^{1/p}{m}^{-{\mathfrak {a}}}}{\sqrt{{m}}}, \end{aligned}$$

where \({\varLambda }_j = \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2},\) and we also have

$$\begin{aligned} \left\| \max _{1 \le j \le J} \left| \frac{1}{{\varLambda }_j}\left( \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 - \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{I_{k,j}^2}{(\lambda _j - \lambda _k)^2}\right) \right| \right\| _p \lesssim \frac{J^{1/p}{m}^{-{\mathfrak {a}}}}{{m}}. \end{aligned}$$

Theorem 2 provides both uniform expansions for \({\widehat{e}}_j\) and the corresponding norm. As before, the factor \(J^{1/p}\) in the bound on the RHS can be dropped for a nonuniform version. We also have a slight modification of Theorems 1 and 2.

Proposition 2

Assume that Assumption 1 holds. Then for \(1 \le J< J_{{m}}^+,\) one may replace \(\{I_{k,j}\}_{k \in \mathbb {N}}\) with \(\{(\widetilde{\lambda }_k \widetilde{\lambda }_j)^{1/2} \overline{\varvec{\eta }}_{k,j}^{\varvec{\mathcal {D}}}\}_{k \in \mathbb {N}}\) in Theorems 1 and 2. Recall also that \(\widetilde{\lambda }_j = \lambda _j/{\mathbb {E}}[{\varvec{\eta }}_{j,j}^{\varvec{\mathcal {D}}}]\) by Lemma 2.

As an immediate corollary, we obtain a probabilistic version of Lemma 1 of correct order.

Corollary 1

Assume that Assumption 1 holds. Then for \(1 \le j < J_{{m}}^+\)

$$\begin{aligned} \Vert {\widehat{\lambda }}_j - \lambda _j\Vert _p \lesssim \frac{\lambda _j}{\sqrt{{m}}} \quad \text {and} \quad \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _p \lesssim \frac{{\varLambda }_j}{{m}}. \end{aligned}$$

2.1 Previous results and comparison

Let us now compare Theorems 1 and 2 to the literature in case of \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\). It seems that the currently best known expansions in this context can be found in [28]. Among other things, it is required that \(\{X_k\}_{k \in \mathbb {Z}}\) is IID, all moments exist, and the error term \(ER_{J_n^+}\) in the expansions of \({\widehat{\lambda }}_j - \lambda _j\) (not weighted with \(\lambda _j^{-1}\)) is of magnitude

$$\begin{aligned} ER_{J_n^+} = \max _{1 \le j \le J_n^+}n^{-3/2} (1 - \xi _j)^{-1/2}\psi _{j}^{-3} \lambda _j^{-1/2} s_j,\quad s_j = \sup _{t \in {\mathcal {T}}}|e_j(t)|, \end{aligned}$$
(2.8)

and \(\xi _j \in (0,1)\) is defined as \(\xi _j = \inf _{k <j}(1 - \frac{\lambda _k}{\lambda _j})\). We emphasize that this is the overall error term, hence one requires for instance at least \(\sqrt{n}ER_{J_n^+} = \mathcal {O}(1)\) for the validity of a CLT, and \((n/\lambda _{J_n^+}^2)^{1/2}ER_{J_n^+} = \mathcal {O}(1)\) for a weighted version. If we assume the convexity condition (2.5), we see that (C2) is much weaker. In fact, taking for instance \(\lambda _j \thicksim j^{-{\mathfrak {c}}}\) we find that \(ER_{J_n^+} \gtrsim n^{-3/2} (J_n^+)^{3 + 7{\mathfrak {c}}/2}\). On the other hand, we see from (2.6) that if \(J_n^+ \thicksim n^{1/2 - {\mathfrak {a}}}\), \({\mathfrak {a}}> 0\), we still obtain valid asymptotic expansions, i.e., the expressions containing \(I_{k,j}\) are still the principal terms in our expansions, reflecting the exact asymptotic behavior. In stark contrast, \(ER_{J_n^+}\) already explodes for \({\mathfrak {a}}\) small (resp. \({\mathfrak {c}}\) large) enough, rendering the result vacuous. Similarly, (C2) is valid if we only require

$$\begin{aligned} \max _{1 \le j \le J_n^+}n^{-1/2}/\psi _j \lesssim n^{-{\mathfrak {a}}} \quad \text {for some arbitrary } {\mathfrak {a}}> 0, \end{aligned}$$
(2.9)

and again obtain valid asymptotic expansions. On the other hand, the actual approximation error \(ER_{J_n^+}\) in [28] may even be unbounded, since \(1/\lambda _j \rightarrow \infty \) as j increases. In this sense, Assumption 1 is substantially weaker.

2.2 Dependence assumptions: optimality

Throughout this section, we assume that \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\). We first present the following result.

Theorem 3

Assume that \(\mathbf{X}\) has zero mean such that for \(\alpha > 3/4\)

$$\begin{aligned} \eta _{k,j} = \sum _{i = 0}^{\infty } \alpha _{i,j} \epsilon _{k-i,j}, \quad 0 \le \alpha _{i,j} \thicksim i^{-\alpha }\text { and } \epsilon _{k,j} \text { are standard Gaussian IID.} \end{aligned}$$

Then (C1) holds. Moreover,  if we have in addition (C2) (for \(J_n^+\) possibly finite),  then for any fixed \(1 \le j < J_n^+\)

$$\begin{aligned} \sqrt{n}({\widehat{\lambda }}_j - \lambda _j) \xrightarrow {w} {\mathcal {N}}(0,\lambda _j^2 \sigma _{\lambda _j}^2), \end{aligned}$$

where \(\xrightarrow {w}\) denotes weak convergence, and \(\sigma _{\lambda _j}^2\) the corresponding variance. Note that an analogous result can be established for \(\widehat{e}_j,\) see [41] for details.

The above result indicates that \(\alpha = 3/4\) is the boundary value for a CLT with normalization \(\sqrt{n}\), see also the discussion in [3]. In fact, given the linear structure of \(\eta _{k,j}\) one readily computes that

$$\begin{aligned} \Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}}\Vert _{{\mathbb {L}}^2}\Vert _2 \lesssim n^{-1/2} \quad \text {iff } \alpha > 3/4. \end{aligned}$$

On the other hand, Lemma 6 below yields that (C1) implies \(\Vert \Vert \widehat{\varvec{\mathcal {C}}} - {\varvec{\mathcal {C}}} \Vert _{{\mathbb {L}}^2}\Vert _2 \lesssim n^{-1/2}\). Hence we obtain the equivalence in (2.4). Finally, note that the regime \(1/2 < \alpha \le 1\) is generally considered as long memory. Hence by Theorem 3 above, we obtain a CLT for \({\widehat{\lambda }}_j\) and \({\widehat{e}}_j\) even in the presence of long memory, where \(3/4 < \alpha \le 1\). If \(1/2 < \alpha \le 3/4\), non-central limit theorems arise. If \(\alpha \le 1/2\), then \({\mathbb {E}}[\Vert X_0\Vert _{{\mathbb {L}}^2}^2]=\infty \), which requires a completely different treatment.
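To see numerically where the boundary \(\alpha = 3/4\) comes from: \({\widehat{\lambda }}_j\) is driven by squares of the scores, and for jointly Gaussian scores \(\mathrm {Cov}(\eta _k^2, \eta _{k+h}^2) = 2\gamma (h)^2\) (Isserlis' theorem) with \(\gamma (h) \thicksim h^{1-2\alpha }\), which is square-summable precisely when \(\alpha > 3/4\). A small check (our own sketch; truncation level and the two test values of \(\alpha \) are ours):

```python
import numpy as np

def score_autocov(alpha, H, M=1 << 18):
    """gamma(h) = sum_i a_i a_{i+h}, h = 0..H-1, for a_i = (i+1)^{-alpha},
    via FFT-based autocorrelation (coefficients truncated at M terms)."""
    a = np.arange(1, M + 1, dtype=float) ** (-alpha)
    f = np.fft.rfft(a, 2 * M)
    return np.fft.irfft(f * np.conj(f), 2 * M)[:H]

# Partial sums of 2 * gamma(h)^2 stabilize for alpha = 0.9 (> 3/4) but keep
# growing for alpha = 0.7 (< 3/4), matching the CLT boundary of Theorem 3.
ratios = {}
for alpha in (0.7, 0.9):
    g = score_autocov(alpha, 10001)
    S = np.cumsum(2.0 * g[1:] ** 2)
    ratios[alpha] = S[9999] / S[999]   # growth of partial sums, H = 10^3 -> 10^4
```

For \(\alpha = 0.9\) the ratio is close to 1 (the series has essentially converged), whereas for \(\alpha = 0.7\) it keeps growing like a power of the truncation point.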

2.3 Spectral gap: almost optimality

Next, we discuss the issue of ‘almost optimality’ of condition (C2). To this end, we draw heavily from the noteworthy results of [48]. Suppose that \(\{\eta _{i,j}\}_{i,j \in \mathbb {N}}\) are IID and satisfy \({\mathbb {E}}[|\eta _{i,j}|^{2p}] \le p! C^{p-1}\) for some constant \(C > 0\). If a structural condition like (EP) holds, then it is shown in [48] that

$$\begin{aligned} {\mathbb {E}}[\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2]\lesssim \frac{j^2 (\log n)^2}{n}. \end{aligned}$$
(2.10)

As can be seen from Corollary 1, this bound deviates from the optimal one by the additional factor \((\log n)^2\). On the other hand, note that in the polynomial case in (EP), this bound is also valid for \(j > J_n^+\) (we require \({\mathfrak {a}}> 0\)), which is a slightly larger region. In [48], a lower bound is also provided, which is \(\frac{j^2}{n} \wedge 1\). Strictly speaking, it is proven for the projection \(\widehat{\pi }_j = {\widehat{e}}_j \otimes {\widehat{e}}_j\), where \(\otimes \) denotes the rank-one operator

$$\begin{aligned} u \otimes v (w) = \langle u, w \rangle v, \quad u,v,w \in {\mathbb {L}}^2({\mathcal {T}}). \end{aligned}$$

According to [48], it then holds that (recall that \({\mathcal {L}}\) denotes the operator norm)

$$\begin{aligned} \frac{j^2}{n} \wedge 1 \lesssim {\mathbb {E}}[\Vert \widehat{\pi }_j - \pi _j\Vert _{{\mathcal {L}}}^2]\lesssim \frac{j^2 (\log n)^2}{n} \wedge 1. \end{aligned}$$
(2.11)

On the other hand, Corollary 1 and elementary computations yield

$$\begin{aligned} {\mathbb {E}}[\Vert \widehat{\pi }_j - \pi _j\Vert _{{\mathcal {L}}}^2]\lesssim \frac{1}{n}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2} \lesssim \frac{j^2}{n}, \quad \text {if }j \le n^{1/2 - {\mathfrak {a}}} (\log n)^{-1},\quad \end{aligned}$$
(2.12)

(in the polynomial case), and thus the orders of the upper and lower bounds match for \(j \le n^{1/2 - {\mathfrak {a}}} (\log n)^{-1}\). If \(j \ge n^{1/2}\), Cauchy–Schwarz yields the trivial optimal upper bound. Since \({\mathfrak {a}}> 0\) may be chosen arbitrarily small given sufficiently many (all) moments, we find that our conditions on the eigenvalues \({\varvec{\lambda }}\) are essentially optimal. In other words, we obtain exact expansions and the optimal error bound for almost the complete region of indices j where (2.12) still converges to zero.
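The growth of the weighted eigenvalue sum in (2.12) can be checked numerically. The following sketch (the polynomial decay \(\lambda _k = k^{-2}\) and the truncation level are our own toy choices, not prescribed by the text) confirms that the sum grows at the rate \(j^2\):

```python
import numpy as np

def gap_sum(j, lam):
    """Compute sum_{k != j} lam[j] * lam[k] / (lam[j] - lam[k])^2 (0-based index j)."""
    mask = np.arange(len(lam)) != j
    return np.sum(lam[j] * lam[mask] / (lam[j] - lam[mask]) ** 2)

lam = np.arange(1, 200001, dtype=float) ** -2.0  # polynomial decay lambda_k = k^{-2}
for j in (5, 10, 20, 40):
    s = gap_sum(j - 1, lam)
    print(j, s, s / j**2)  # the ratio s / j^2 stabilises as j grows
```

The dominant contribution comes from the indices k close to j, where the spectral gap \(|\lambda _j - \lambda _k|\) is smallest; this is exactly the mechanism that condition (C2) controls.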

3 Lag operator

While the covariance operator \({\varvec{\mathcal {C}}}\) is a key object for serially uncorrelated data \(\mathbf{X}\), the lag operator \({\varvec{\mathcal {C}}}_h\) and the long-run covariance operator \({\varvec{\mathcal {G}}}\) become more relevant in the presence of serial correlation, see Sects. 4 and 6 for a discussion. Here, we focus on \({\varvec{\mathcal {C}}}_h\), and then carry out a similar program for \({\varvec{\mathcal {G}}}\) in Sect. 4. To facilitate the discussion, let us first introduce a popular notion of weak dependence. In the remainder of this section, we assume that for each \(j \in \mathbb {N}\), the score sequence \(\{\eta _{k,j}\}_{k \in \mathbb {Z}}\) is a causal Bernoulli shift, which can be written as

$$\begin{aligned} \eta _{k,j} = g_j(\ldots ,\epsilon _{k-1,j},\epsilon _{k,j}) \end{aligned}$$
(3.1)

for some measurable functions \(g_j\) and IID sequences \(\{{\varvec{\epsilon }}_k\}_{k \in \mathbb {Z}}\) with \({\varvec{\epsilon }}_k = \{\epsilon _{k,j}\}_{j \in \mathbb {N}}\). We do not specify any crosswise dependence between \(\epsilon _{k,i}\), \(\epsilon _{k,j}\) for \(i \ne j\), allowing for great flexibility. Let \({\mathcal {E}}_{k,j} = (\epsilon _{i,j}, \, i \le k)\). To quantify the dependence of \(\{\eta _{k,j}\}_{k \in \mathbb {Z}}\), we adopt the coupling idea. Let \(\{\epsilon _{k,j}'\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) be an IID copy of \(\{\epsilon _{k,j}\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) and \({\mathcal {E}}_{k,j}' = ({\mathcal {E}}_{-1,j},\epsilon _{0,j}', \epsilon _{1,j}, \ldots , \epsilon _{k,j})\) the coupled version of \({\mathcal {E}}_{k,j}\). Then we define

$$\begin{aligned} {\varOmega }_p(k) = \max _{j \in \mathbb {N}}\Vert \eta _{k,j} - \eta _{k,j}'\Vert _p \quad \text {for } p \ge 1, \text { where } \eta _{k,j}' = g_j({\mathcal {E}}_{k,j}'). \end{aligned}$$
(3.2)

Roughly speaking, \({\varOmega }_p(k)\) measures the overall degree of dependence of \(\eta _{k,j} = g_j({\mathcal {E}}_{k,j})\) on the innovation \(\epsilon _{0,j}\), and it is directly related to the data-generating mechanism of the underlying process ([57] refers to \({\varOmega }_p(k)\) as the physical dependence measure). This dependence concept is well established in the literature, and popular processes like ARMA, GARCH, iterated random functions etc. fit into this framework (cf. [57, 58]). Consider for example the linear process \(\eta _{k,j} = \sum _{l = 0}^{\infty } \alpha _l \epsilon _{k-l,j}\), where \(\{\epsilon _{k,j}\}_{k \in \mathbb {Z},j \in \mathbb {N}}\) is IID with \(\Vert \epsilon _{k,j}\Vert _p < \infty \). Then

$$\begin{aligned} \sum _{k = 1}^{\infty } {\varOmega }_p(k) < \infty \quad \text {holds iff } \sum _{k = 1}^{\infty } |\alpha _k| <\infty . \end{aligned}$$
(3.3)

In this sense, (3.3) is necessary for a CLT. In fact, if it is violated, one can construct examples such that

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n}\left\| \sum _{k = 1}^n \eta _{k,j} \right\| _2^2 = \infty , \quad j \in \mathbb {N}, \end{aligned}$$

and a different normalization than \(n^{-1/2}\) is required (cf. [55]). In the sequel, all dependence conditions will be expressed in terms of summability conditions of \({\varOmega }_p(k)\).
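For the linear process above, the coupling construction is completely explicit: replacing \(\epsilon _{0,j}\) by \(\epsilon _{0,j}'\) leaves every other innovation untouched, so \(\eta _{k,j} - \eta _{k,j}' = \alpha _k(\epsilon _{0,j} - \epsilon _{0,j}')\) and hence \({\varOmega }_p(k) = |\alpha _k| \Vert \epsilon _{0,j} - \epsilon _{0,j}'\Vert _p\). A small simulation sketch (the truncation level, the geometric coefficients \(\alpha _l = \rho ^l\) and the Gaussian innovations are our own toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, L, n_mc = 0.6, 60, 50000   # toy choices: alpha_l = rho^l, truncation at lag L
alpha = rho ** np.arange(L)

def omega2_mc(k):
    """Monte Carlo estimate of Omega_2(k) for eta_k = sum_l alpha_l eps_{k-l}:
    couple by replacing the single innovation eps_0 (position l = k)."""
    eps = rng.standard_normal((n_mc, L))      # eps[:, l] represents eps_{k-l}
    eta = eps @ alpha
    eps_c = eps.copy()
    eps_c[:, k] = rng.standard_normal(n_mc)   # eps_0 -> eps_0' (coupled copy)
    eta_c = eps_c @ alpha
    return np.sqrt(np.mean((eta - eta_c) ** 2))

for k in (1, 5, 10):
    # all other terms cancel, so Omega_2(k) = |alpha_k| * ||eps_0 - eps_0'||_2
    print(k, omega2_mc(k), abs(alpha[k]) * np.sqrt(2.0))
```

For standard normal innovations \(\Vert \epsilon _{0,j} - \epsilon _{0,j}'\Vert _2 = \sqrt{2}\), which is what the second printed column reports; the geometric decay of \(\alpha _k\) makes \(\sum _k {\varOmega }_2(k) < \infty \), in line with (3.3).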

A major difference when dealing with \({\varvec{\mathcal {C}}}_h\) compared to \({\varvec{\mathcal {C}}}\) (and \({\varvec{\mathcal {G}}}\)) is that in general it only admits a singular-value decomposition (SVD), i.e., there exist orthonormal bases \(\mathbf{e} = \{e_j\}_{j \in \mathbb {N}}\), \(\mathbf{f} = \{f_j\}_{j \in \mathbb {N}}\) and a sequence of real numbers \({\varvec{\lambda }}= (\lambda _j)_{j \in \mathbb {N}}\) tending to zero such that for fixed \(h \in \mathbb {Z}\)

$$\begin{aligned} {\varvec{\mathcal {C}}}_h (\cdot ) = {\mathbb {E}}[\langle \overline{X}_k, \cdot \rangle \overline{X}_{k-h}] = \sum _{j = 1}^{\infty } \sqrt{\lambda _j} \langle e_j, \cdot \rangle f_j, \quad \text {if } {\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty . \end{aligned}$$
(3.4)

Hence a priori, \({\varvec{\mathcal {C}}}_h\) does not fit into our framework. However, by considering the symmetrized version \({\varvec{\mathcal {D}}}(\cdot ) = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h (\cdot )\), we end up with an operator that meets our requirements. Here, \({\varvec{\mathcal {C}}}_h^*\) denotes the adjoint operator of \({\varvec{\mathcal {C}}}_h\), given by

$$\begin{aligned} {\varvec{\mathcal {C}}}_h^* (\cdot ) = {\mathbb {E}}[\langle \overline{X}_{k-h}, \cdot \rangle \overline{X}_{k}] = \sum _{j = 1}^{\infty } \sqrt{\lambda _j} \langle f_j, \cdot \rangle e_j. \end{aligned}$$
(3.5)

Routine computations (with \(\overline{X}_k = \sum _{j = 1}^{\infty }\widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\)) then indeed reveal that

$$\begin{aligned} {\varvec{\mathcal {D}}}(\cdot )=\sum _{j = 1}^{\infty } \lambda _j \langle e_j, \cdot \rangle e_j = \sum _{j = 1}^{\infty } \left( \widetilde{\lambda }_j \sum _{k = 1}^{\infty } \widetilde{\lambda }_k {\mathbb {E}}[\eta _{h,k}\eta _{0,j}]^2\right) \langle e_j, \cdot \rangle e_j. \end{aligned}$$
(3.6)

Hence \({\varvec{\mathcal {D}}}\) has a spectral decomposition with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\) and satisfies (2.1). Representations (3.4), (3.5) motivate a natural plug-in estimator for \({\varvec{\mathcal {D}}}\) (cf. [10]), given as (for \(h \in \mathbb {N}\))

$$\begin{aligned} \widehat{\varvec{\mathcal {D}}}(\cdot ) = \frac{1}{(n-h)^2}\sum _{1 \le k,l \le n-h} \langle X_{l+h} - {\bar{X}}_n, X_{k+h} - {\bar{X}}_n \rangle \langle X_k- {\bar{X}}_n, \cdot \rangle (X_l - {\bar{X}}_n). \end{aligned}$$
(3.7)
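On a discretisation grid, the whole construction reduces to a single SVD of the empirical lag operator (3.9) below, from which the components in (3.8) follow directly. The sketch here uses a toy data-generating model of our own (sinusoidal basis, polynomial eigenvalues, AR(1) scores); only the algebraic relations between \(\widehat{\varvec{\mathcal {C}}}_h\), \(\widehat{\varvec{\mathcal {D}}}\) and the SVD come from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, h, J = 500, 100, 1, 20          # toy sizes (our own choices)
t = np.linspace(0.0, 1.0, T)
w = np.sqrt(1.0 / T)                  # weight so Euclidean products on w*X match L2(0,1)

# Toy model: X_k = sum_j lambda_j^{1/2} eta_{k,j} e_j with AR(1) scores,
# e_j = sqrt(2) sin(j pi t), lambda_j = j^{-2}.
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(1, J + 1), t))
lam = np.arange(1, J + 1, dtype=float) ** -2.0
eta = np.zeros((n, J))
for k in range(1, n):
    eta[k] = 0.5 * eta[k - 1] + rng.standard_normal(J)
eta /= eta.std(axis=0)                # normalised scores
X = (np.sqrt(lam) * eta) @ basis      # (n, T) discretised curves

Xc = w * (X - X.mean(axis=0))
# Empirical lag operator (3.9) as a T x T matrix:
# maps u to (1/(n-h)) sum_k <X_k - Xbar, u> (X_{k-h} - Xbar)
Ch = Xc[: n - h].T @ Xc[h:] / (n - h)
# hat D = hat C_h^* hat C_h; its eigenpairs come from the SVD of hat C_h, cf. (3.8)
U, s, Vt = np.linalg.svd(Ch)
lam_hat = s ** 2                      # empirical eigenvalues of hat D
e_hat, f_hat = Vt, U.T                # empirical singular functions hat e_j, hat f_j
print(lam_hat[:3])
```

Note that \(\widehat{\varvec{\mathcal {D}}} = \widehat{\varvec{\mathcal {C}}}_h^{*}\widehat{\varvec{\mathcal {C}}}_h\) never needs to be formed explicitly: the right singular functions of \(\widehat{\varvec{\mathcal {C}}}_h\) are its eigenfunctions and the squared singular values its eigenvalues.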

The empirical SVD components \({\widehat{{\varvec{\lambda }}}}=\{{\widehat{\lambda }}_j\}_{j \in \mathbb {N}}\), \(\widehat{\mathbf{e}} = \{{\widehat{e}}_j\}_{j \in \mathbb {N}}\) and \(\widehat{\mathbf{f}} = \{\widehat{f}_j\}_{j \in \mathbb {N}}\) are then defined via

$$\begin{aligned} \widehat{\varvec{\mathcal {D}}}({\widehat{e}}_j) = {\widehat{\lambda }}_j {\widehat{e}}_j, \quad \widehat{\varvec{\mathcal {C}}}_h\bigl ({\widehat{e}}_j\bigr ) = {\widehat{\lambda }}_j^{1/2} \widehat{f}_j, \end{aligned}$$
(3.8)

where the empirical lag operator \(\widehat{\varvec{\mathcal {C}}}_h\) is given by

$$\begin{aligned} \widehat{\varvec{\mathcal {C}}}_h(\cdot ) = \frac{1}{n-h}\sum _{k = h + 1}^n \langle X_k - {\bar{X}}_n, \cdot \rangle (X_{k-h} - {\bar{X}}_n), \quad 0 \le h \le n-1, \end{aligned}$$
(3.9)

and analogously for \(-n+1 \le h < 0\). In order to apply Theorems 1 and 2 to \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\), the key objective is to validate (D1) for appropriate \(\overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\) and \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\). To this end, introduce

$$\begin{aligned} A_{l,h,r,i,j} = (\eta _{l+h,r}\eta _{l,i} - {\mathbb {E}}[\eta _{l+h,r}\eta _{l,i}]){\mathbb {E}}[\eta _{l+h,r}\eta _{l,j}], \quad l,h,r,i,j \in \mathbb {N}. \end{aligned}$$

Recalling \(\overline{X}_k = \sum _{j = 1}^{\infty }\widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\), we then define \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}\) for fixed \(h \in \mathbb {N}\) as

$$\begin{aligned} \varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}(n) = \frac{1}{n-h} \sum _{l = 1}^{n-h} \sum _{r = 1}^{\infty } \widetilde{\lambda }_r (A_{l,h,r,i,j} + A_{l,h,r,j,i}) + \sum _{r = 1}^{\infty } \widetilde{\lambda }_r {\mathbb {E}}[\eta _{h,r}\eta _{0,i}]{\mathbb {E}}[\eta _{h,r}\eta _{0,j}]. \end{aligned}$$
(3.10)

Note that this automatically defines \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\) via (2.2), see also (8.2) in the proof. We then have the following result.

Proposition 3

Let \(q \ge 2\) and assume \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \) and \({\varOmega }_{4q}(k) \lesssim k^{-{\mathfrak {b}}},\) \({\mathfrak {b}}>3/2\). Then \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\) and \(\widehat{\varvec{\mathcal {D}}}\) as in (3.7) satisfy (2.1) and (2.2) such that

$$\begin{aligned} n^{1/2}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\Vert _q < \infty , \quad n^{1/2}\max _{i,j \in \mathbb {N}}\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\Vert _q \lesssim n^{-1/2}. \end{aligned}$$

Related results can be established under different weak dependence conditions, see for instance [20]. Using Proposition 3, it is now easy to transfer the results, which we summarize in the following theorem.

Theorem 4

Suppose that \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2]\le C^{{\varvec{\mathcal {C}}}_h}\) for a universal constant \(C^{{\varvec{\mathcal {C}}}_h}\). Assume in addition that for some \({\mathfrak {a}}> 0,\) \({\mathfrak {h}}, p \ge 1\) we have that

\(\mathrm{(C}_\mathrm{h}\mathrm{1)}\) :

\({\varOmega }_{4q}(k) \lesssim k^{-{\mathfrak {b}}}\), \({\mathfrak {b}}>3/2\) for \(q = p 2^{{\mathfrak {p}}+ 4},\) \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil ,\)

\(\mathrm{(C}_\mathrm{h}\mathrm{2)}\) :

(D2) holds with \(C^{\varvec{\mathcal {D}}} = C^{{\varvec{\mathcal {C}}}_h},\) \({m}= n,\) \(J_n^+ \in \mathbb {N}\) and \({\mathfrak {a}}\) as above, 

\(\mathrm{(C}_\mathrm{h}\mathrm{3)}\) :

\(0 < \inf _{j \in \mathbb {N}} \sum _{r = 1}^{\infty } \widetilde{\lambda }_r {\mathbb {E}}[\eta _{h,r}\eta _{0,j}]^2\).

Then Assumption 1 holds for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^* {\varvec{\mathcal {C}}}_h\) and \(\widehat{\varvec{\mathcal {D}}}\) as in (3.7) with \({\mathfrak {a}}> 0,\) \({\mathfrak {h}}, p \ge 1,\) \({m}= n,\) \(J_n^+ \in \mathbb {N},\) \(s_{{m}}^{\varvec{\mathcal {D}}} = s_{n}^{\varvec{\mathcal {D}}} = n^{-1/2}\) and \(C^{\varvec{\mathcal {D}}} = C^{{\varvec{\mathcal {C}}}_h}\) as above. In particular,  Theorems 1 and 2 apply to \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\).

It remains to deal with \(\widehat{\mathbf{f}}\), which is the subject of Theorem 5 below.

Theorem 5

Grant the assumptions of Theorem 4, and let \(1 \le p' \le p\). Then

$$\begin{aligned}&\left\| \left\| \widehat{f}_j - f_j - \frac{({\widehat{\lambda }}_j - \lambda _j)f_j}{2\lambda _j} - \frac{{\varvec{\mathcal {C}}}_h({\widehat{e}}_j - e_j)+(\widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h)(e_j)}{\sqrt{\lambda _j}}\right\| _{{\mathbb {L}}^2} \right\| _{p'}\\&\quad \lesssim \frac{1}{\sqrt{\lambda _j n}}\left( \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\Vert _{4p'} + \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

As the proof shows, Theorem 5 is essentially a concatenation of the previous results. Note in particular that the above expansion can be developed further in a straightforward manner by employing Theorems 1 and 2.

4 Long-run covariance operator

The long-run covariance operator is a natural generalization of the covariance operator in the presence of serial correlation. From a statistical perspective, this is particularly relevant in the context of the CLT, where under appropriate conditions on \(\mathbf{X}\), we have that

$$\begin{aligned} \frac{1}{\sqrt{n}}S_n = \frac{1}{\sqrt{n}}\sum _{k = 1}^n \overline{X}_k \xrightarrow {w} {\mathcal {N}}(0, {\varvec{\mathcal {G}}}) \quad \text {and} \quad \sup _{n} n^{-1/2}\Vert \Vert S_n\Vert _{{\mathbb {L}}^2}\Vert _2 < \infty , \end{aligned}$$
(4.1)

where \({\varvec{\mathcal {G}}}(\cdot )\) is the long-run covariance operator, (formally) defined as

$$\begin{aligned} {\varvec{\mathcal {G}}}(\cdot ) = \sum _{h \in \mathbb {Z}}{\varvec{\mathcal {C}}}_h(\cdot ), \quad {\varvec{\mathcal {C}}}_h(\cdot ) = {\mathbb {E}}[\langle \overline{X}_k, \cdot \rangle \overline{X}_{k-h}]. \end{aligned}$$

Note that \({\varvec{\mathcal {G}}}\) in general only exists if \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}} < \infty \), which is usually referred to as a weak dependence condition. In view of (4.1), we see that \({\varvec{\mathcal {G}}}\) takes over the role of \({\varvec{\mathcal {C}}}\) if \(\mathbf{X}\) has serial correlation: in the ‘limit case’ where \(n^{-1/2}S_n\) is distributed as \({\mathcal {N}}\bigl (0, {\varvec{\mathcal {G}}}\bigr )\), the best (in \({\mathbb {L}}^2\)-sense) finite-dimensional approximations are provided by the classical Karhunen–Loève decomposition with respect to \({\varvec{\mathcal {G}}}\). Hence we can expect that for large enough n, finite-dimensional approximations of \(n^{-1/2}S_n\) based on appropriate estimates \(\widehat{\varvec{\mathcal {G}}}\) are close to optimality too. We refer to [29, 37, 53, 54], and more recently [15] for further discussions. A unifying, even more general object than \({\varvec{\mathcal {G}}}\) is the spectral density operator \(\varvec{\mathcal {F}}(\theta )\), first studied in [54], which has recently attracted a lot of attention (cf. [29, 53]). A (detailed) study is beyond the scope of the present note, and is left open for future research. It appears though that at least some of the results can be transferred.

Estimation of \({\varvec{\mathcal {G}}}\) is a delicate issue, and already in the univariate/multivariate case a substantial body of literature has evolved around this problem, see for instance [2, 30] and the many references therein. In the context of functional data, we refer for instance to [29, 37, 53, 54]. The basic principle is plug-in estimation, which leads to the estimates

$$\begin{aligned} \widehat{{\varvec{\mathcal {G}}}}^b(\cdot ) = \widehat{{\varvec{\mathcal {C}}}}_0(\cdot ) + \sum _{h = 1}^b \omega _h (\widehat{{\varvec{\mathcal {C}}}}_h(\cdot ) + \widehat{{\varvec{\mathcal {C}}}}_{-h}(\cdot )), \quad \text {where } \widehat{\varvec{\mathcal {C}}}_h(\cdot )\text { is as in (3.9)}, \end{aligned}$$
(4.2)

and \(|\omega _h| \le 1\) is a sequence of weights. In the sequel, the choice of \(\omega _h\) has little impact on the results, and we therefore set \(\omega _h = 1\) for the remainder of this section. For consistent estimates, it is necessary that \(b = b_n \rightarrow \infty \) as n increases. Even so, in contrast to \(\widehat{\varvec{\mathcal {C}}}_h\), the estimate \(\widehat{\varvec{\mathcal {G}}}^b\) is biased. Depending on the decay rate of \(\Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}}\), the optimal choice of \(b_n\) is \(b_n \thicksim \log n\) (geometric decay) or \(b_n \thicksim n^{1/(2s + 1)}\) (polynomial decay with exponent s), see [2]. Thus, the actual operator we are estimating is

$$\begin{aligned} {\varvec{\mathcal {G}}}^b(\cdot ) = \sum _{|h|\le b} {\varvec{\mathcal {C}}}_h(\cdot ). \end{aligned}$$
(4.3)
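As a quick sanity check of the plug-in construction (4.2)–(4.3), consider a toy functional AR(1), for which \({\varvec{\mathcal {C}}}_h = \rho ^{|h|}{\varvec{\mathcal {C}}}_0\) and hence \({\varvec{\mathcal {G}}} = \frac{1+\rho }{1-\rho }\,{\varvec{\mathcal {C}}}_0\). The sketch below (the model, sample size and bandwidth are our own choices; \(\omega _h = 1\) as above) recovers this ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, b, rho = 2000, 50, 20, 0.5      # toy sizes and bandwidth (our own choices)

# Functional AR(1): X_k = rho * X_{k-1} + Z_k, so C_h = rho^{|h|} C_0 and
# G = sum_h C_h = C_0 * (1 + rho) / (1 - rho).
X = np.zeros((n, T))
X[0] = rng.standard_normal(T)
for k in range(1, n):
    X[k] = rho * X[k - 1] + rng.standard_normal(T)
Xc = X - X.mean(axis=0)

def C_hat(h):
    """Empirical lag-h covariance (3.9) as a T x T matrix (the grid weight is
    dropped, since it cancels in the ratio below)."""
    return Xc[: n - h].T @ Xc[h:] / (n - h)

# Plug-in estimator (4.2) with omega_h = 1; note hat C_{-h} = hat C_h^*
G_hat = C_hat(0) + sum(C_hat(h) + C_hat(h).T for h in range(1, b + 1))
ratio = np.trace(G_hat) / np.trace(C_hat(0))
print(ratio, (1 + rho) / (1 - rho))   # the two values should be close
```

With \(\rho = 0.5\) the target ratio is 3; the truncation bias \(\sum _{|h| > b}\rho ^{|h|}\) is negligible here because the lag covariances decay geometrically.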

Note that in general \({\mathbb {E}}[\widehat{\varvec{\mathcal {G}}}^b] \ne {\varvec{\mathcal {G}}}^b\) and hence \(\widehat{\varvec{\mathcal {G}}}^b\) is still biased, but this bias is negligible. We point out that subject to some regularity conditions (cf. [54])

$$\begin{aligned} \Vert \Vert \widehat{\varvec{\mathcal {G}}}^b -{\varvec{\mathcal {G}}}^b\Vert _{{\mathbb {L}}^2}\Vert _2 \thicksim \sqrt{b/n}, \end{aligned}$$
(4.4)

which is the same rate as in the univariate case (cf. [2]). Moreover, under quite general assumptions (cf. [29, 54]), it follows that \({\varvec{\mathcal {G}}}^b\) satisfies the spectral decomposition

$$\begin{aligned} {\varvec{\mathcal {G}}}^b(\cdot ) = \sum _{j = 1}^{\infty } \lambda _j^{b} \langle e_j^b, \cdot \rangle e_j^b, \quad \sum _{j = 1}^{\infty } \lambda _j^b < \infty , \end{aligned}$$
(4.5)

with eigenvalues \({\varvec{\lambda }}^b = \{\lambda _j^b\}_{j \in \mathbb {N}}\) and eigenfunctions \(\mathbf{e}^b = \{e_j^b\}_{j \in \mathbb {N}}\). Since the actual underlying operator of interest is \({\varvec{\mathcal {G}}}^b\), it is natural to (first) express our conditions in terms of \({\varvec{\lambda }}^b\) and \(\mathbf{e}^b\). We can decompose \(\overline{X}_k\) as

$$\begin{aligned} \overline{X}_k = \sum _{j = 1}^{\infty } \sqrt{\widetilde{\lambda }_j^b} \eta _{k,j}^b e_j^b, \quad \widetilde{\lambda }_j^b = {\mathbb {E}}[\langle \overline{X}_k, e_j^b \rangle ^2],\quad \eta _{k,j}^b = \langle \overline{X}_k, e_j^b \rangle (\widetilde{\lambda }_j^b)^{-1/2}. \end{aligned}$$
(4.6)

Observe that in general \({\mathbb {E}}[\eta _{k,j}^b \eta _{k,i}^b] \ne 0\) for \(i \ne j\), which is different from the Karhunen–Loève expansion. In analogy to (2.3), we also introduce the quantity

$$\begin{aligned} \varvec{\eta }_{i,j}^b = \varvec{\eta }_{i,j}^{b}(n) = \sum _{k = 1}^n\frac{ {\eta _{k,i}^{b} \eta _{k,j}^{b}}}{n} + \sum _{h = 1}^{b}\sum _{k = h+1}^n \frac{\eta _{k,i}^{b} \eta _{k-h,j}^{b} + \eta _{k-h,i}^{b} \eta _{k,j}^{b}}{n-h}. \end{aligned}$$
(4.7)

It is then easy to see that

$$\begin{aligned} \widehat{\varvec{\mathcal {G}}}^b(\cdot ) = \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i^b \widetilde{\lambda }_j^b} (\varvec{\eta }_{i,j}^{b} + \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}})\langle e_i^b, \cdot \rangle e_j^b, \quad {\varvec{\mathcal {G}}}^b(\cdot ) = \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i^b \widetilde{\lambda }_j^b} {\mathbb {E}}[\varvec{\eta }_{i,j}^{b}]\langle e_i^b, \cdot \rangle e_j^b, \end{aligned}$$
(4.8)

for appropriate (degenerate) random variables \(\{\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\}_{i,j \in \mathbb {N}}\) (see (8.14)). Taking (4.5) into account, we see that both (4.5) and (4.8) match the setup in (2.1) and (2.2). We can thus appeal to the results of Sect. 2. To this end, we translate Assumption 1 to our present setup.

Assumption 2

The sequence \(\mathbf{X}\) is stationary such that \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}} < \infty \). Moreover, for \(b = \mathcal {O}(n)\), a universal constant \(C^{\varvec{\mathcal {G}}}<\infty \) and universal sequence \(s_n^{\varvec{\mathcal {G}}} = \mathcal {O}(1)\) and \({\mathfrak {a}}> 0\), \({\mathfrak {h}}, p \ge 1\) and \(J_n^+ \in \mathbb {N}\) it holds that

(G1)\(^{b}\) :

\((n/b)^{\frac{1}{2}}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{b}(n)\Vert _q \le C^{\varvec{\mathcal {{\mathcal {G}}}}}\), \(n^{-\frac{3}{4}} b^{\frac{1}{4}} \max _{j \in \mathbb {N}}\Vert \sum _{k = 1}^n \eta _{k,j}^b\Vert _{2q} \le s_n^{\varvec{\mathcal {G}}}\) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \),

(G2)\(^{b}\) :

\(\max _{1 \le j \le J_n^+}\left\{ {(n/b)}^{-\frac{1}{2} + {\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{b}}{|\lambda _j^{b} - \lambda _i^{b}|}, {(n/b)}^{-1 + 2{\mathfrak {a}}}\sum _{\begin{array}{c} i = 1\\ i \ne j \end{array}}^{\infty } \frac{\lambda _i^{b} \lambda _j^{b} }{(\lambda _j^{b} - \lambda _i^{b})^2}\right\} \le C^{\varvec{\mathcal {G}}}\) and \(\lambda _{J_n^+}^{b} \gtrsim {(n/b)}^{-{\mathfrak {h}}}\),

(G3)\(^{b}\) :

\(1/C^{\varvec{\mathcal {G}}} \le {\mathbb {E}}[\varvec{\eta }_{j,j}^{b}] \le C^{\varvec{\mathcal {G}}}\) for \(j \in \mathbb {N}\), \(\sum _{j = 1}^{\infty } \lambda _j^b \le C^{\varvec{\mathcal {G}}}\).

Let us discuss these conditions. In view of (4.4), the choice \(m = n/b\) is quite natural. Condition (G1)\(^{b}\) is a little more explicit than (D1), but of the same nature. (G2)\(^{b}\), (G3)\(^{b}\) are essentially translations of (D2), (D3). Note that in the present formulation, (G3)\(^{b}\) reflects the common non-degeneracy assumption encountered in the time series literature.

The setup in Assumption 2 is quite general. Before looking at the possible range of applications, let us formulate the transferred results. To this end, in analogy to \(I_{i,j}\) in (2.7), we introduce \(I_{i,j}^b\) as

$$\begin{aligned} I_{i,j}^b = \langle (\widehat{\varvec{\mathcal {G}}}^b - {\varvec{\mathcal {G}}}^b)(e_i^b), e_j^b\rangle , \quad i,j \in \mathbb {N}. \end{aligned}$$
(4.9)

We then have the following general transfer result.

Theorem 6

Assume that Assumption 2 holds. Then for \(1 \le J< J_n^+,\) Theorems 1 and 2 remain valid if we substitute \(n/b,\lambda _j^b,\) \(e_j^b,{\widehat{\lambda }}_j^b,\) \({\widehat{e}}_j^b\) and \(I_{i,j}^b\) in the corresponding places. Moreover, corresponding versions of Proposition 2 and Corollary 1 hold.

Due to the uniform bounds provided by \(C^{\varvec{\mathcal {G}}}\) in Assumption 2, Theorem 6 can either be used pointwise or uniformly in b and n, depending on whether Assumption 2 holds pointwise or uniformly. The strength and weakness of Theorem 6 is that everything is essentially expressed in terms of the operator \({\varvec{\mathcal {G}}}^b\). The positive aspect is that this makes the assumptions rather general (in fact, almost optimal in a certain sense, see below). On the other hand, the drawback is that these conditions can be difficult to verify, since they explicitly depend on b. If \(b = b_n\) is a function of n this is not so useful, and uniform bounds in terms of n would be more interesting. Let us mention here that the trouble mainly originates from (G2)\(^{b}\) and not (G1)\(^{b}\). It is therefore desirable to find simple conditions that depend in a more transparent way on b, and preferably mainly on \({\varvec{\mathcal {{\mathcal {G}}}}}\). We first discuss a case where this can be accomplished rather easily.

\({\mathfrak {m}}\)-Correlated processes: We call \(\mathbf{X}\) an \({\mathfrak {m}}\)-correlated process if \({\varvec{\mathcal {C}}}_h = 0\) for \(|h| > {\mathfrak {m}}\), where \({\mathfrak {m}}\) is finite. Locally dependent processes are quite common in the literature, and are often modeled as \({\mathfrak {m}}\)-dependent processes. Clearly, \({\mathfrak {m}}\)-dependence implies \({\mathfrak {m}}\)-correlation. Moreover, we get that

$$\begin{aligned} {\varvec{\mathcal {G}}}^b = \sum _{|h| \le b}{\varvec{\mathcal {C}}}_h = \sum _{|h| \le {\mathfrak {m}}}{\varvec{\mathcal {C}}}_h = {\varvec{\mathcal {G}}}^{{\mathfrak {m}}} = {\varvec{\mathcal {G}}}^{}, \quad \text {if } {\mathfrak {m}}\le b. \end{aligned}$$
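A short numerical sketch (an MA(1) toy model with parameter \(\theta \) chosen by us) illustrates this cut-off: the empirical lag operators vanish, up to sampling noise of order \(n^{-1/2}\), beyond the correlation range \({\mathfrak {m}}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, theta = 5000, 30, 0.8   # toy MA(1): X_k = Z_k + theta Z_{k-1}, so C_h = 0 for |h| > 1
Z = rng.standard_normal((n + 1, T))
X = Z[1:] + theta * Z[:-1]
Xc = X - X.mean(axis=0)

def C_hat(h):
    """Empirical lag operator (3.9) as a T x T matrix."""
    return Xc[: n - h].T @ Xc[h:] / (n - h)

norms = [np.linalg.norm(C_hat(h)) for h in range(5)]
print([round(v, 3) for v in norms])   # h = 0, 1 are O(1); h >= 2 is only sampling noise
```

Consequently \(\widehat{\varvec{\mathcal {G}}}^b\) stabilises once \(b \ge {\mathfrak {m}} = 1\): the additional summands only contribute estimation noise.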

Note that \({\mathfrak {m}}\)-correlation also implies that representations (4.5) and (4.8) are valid. Hence we conclude the following.

Corollary 2

If \(\mathbf{X}\) is \({\mathfrak {m}}\)-correlated and \({\mathfrak {m}}\le b,\) then we can replace \(e_j^b,\eta _{k,j}^b\) with \(e_j^{{\mathfrak {m}}},\eta _{k,j}^{{\mathfrak {m}}}\) everywhere in (4.6) and (4.7) (which alters (G1)\(^{b}\)),  and b with \({\mathfrak {m}}\) everywhere in (G2)\(^{b}\) and (G3)\(^{b}\).

Corollary 2 shows that Theorem 6 applies to a large class of processes under general and accessible conditions. Note in particular that the optimality criterion used in Sect. 2.3 also applies since \({\mathfrak {m}}\) is finite. In the presence of \({\mathfrak {m}}\)-dependence, the conditions can be further simplified. More precisely, routine calculations reveal that (G1)\(^{b}\) can be replaced with

(G1)\(^{{\mathfrak {m}}}\) :

\(\max _{j \in \mathbb {N}}\Vert \eta _{k,j}^{{\mathfrak {m}}}\Vert _{2q}<\infty \) for \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \).

If the dependence in question has infinite range, i.e., general weak dependence applies, then the situation is more complicated. This is discussed in more detail in an extended version in [41].

5 Maximum deviation of empirical eigenvalues

As already mentioned, Theorems 1 and 2 can be used to obtain various fluctuation results for eigenvalues or eigenfunctions. We exemplify this further in the case of \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}\), mentioning that a similar program can be carried out for \({\varvec{\mathcal {D}}} = {\varvec{\mathcal {C}}}_h^*{\varvec{\mathcal {C}}}_h\), \(h \in \mathbb {Z}\) fixed. To this end, we formally introduce the long-run covariance (recall that \(\overline{X} = X - {\mathbb {E}}[X]\)) as

$$\begin{aligned} \gamma _{i,j} = \lim _{n \rightarrow \infty } \frac{1}{n}{\mathbb {E}}\left[ \sum _{k,l = 1}^{n}(\eta _{k,i}^2 - 1)(\eta _{l,j}^2 - 1)\right] . \end{aligned}$$
(5.1)

In Sect. 9.1 we show that this is well-defined given Assumption 3 below. Moreover, for \(\sigma _{j}^2 = \gamma _{j,j}\) we have the usual representation \(\sigma _{j}^2 = \sum _{k \in \mathbb {Z}} \phi _{k,j}\), where \(\phi _{k,j} = {\mathbb {C}}\text {ov}[\eta _{0,j}^2,\eta _{k,j}^2]\). Consider \({\varvec{\mathcal {C}}}\) with eigenvalues \({\varvec{\lambda }}\) and denote with

$$\begin{aligned} T_{J}^{} = \sqrt{n}\max _{1 \le j < J}\frac{|{\widehat{\lambda }}_j - \lambda _j|}{\sigma _{j} \lambda _j}, \quad T_{J}^{Z_{}} = \max _{1 \le j < J}|Z_{j}|, \end{aligned}$$
(5.2)

where \(\{Z_{j}\}_{1 \le j < J}\) is a zero mean sequence of Gaussian random variables with correlation structure \({\varSigma }_{J}^{Z_{}} = (\rho _{i,j})_{1 \le i,j < J}\), where \(\rho _{i,j} = \gamma _{i,j}/\sigma _{i} \sigma _{j}\). In the sequel, we show that \(T_{J_n^+}^{}\) is close to \(T_{J_n^+}^{Z}\) in probability. To this end, we work under the following assumption.

Assumption 3

For \(p \ge 1\) let \(q = p 2^{{\mathfrak {p}}+ 4}\), \({\mathfrak {p}}= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil \), and assume that

(E1):

\({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \) and (C2) hold (with \({\mathfrak {a}},{\mathfrak {h}}\) as above) such that \((J_n^+)^{1/p} n^{-{\mathfrak {a}}} \lesssim n^{-\delta }\), \(\delta > 0\),

(E2):

\({\varOmega }_{2q}(k) \lesssim k^{-{\mathfrak {b}}}\), \({\mathfrak {b}}> 3/2\),

(E3):

\(\inf _j \sigma _{j}> 0\).

Note that these assumptions are mild. In particular, the decay rate \({\mathfrak {b}}\) in condition (E2) is completely independent of the underlying dimension \(J_n^+\).
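To make the normalisation \(\sigma _j\) in (5.2) and condition (E3) concrete, take Gaussian AR(1) scores with parameter \(\rho \) (a toy choice of ours): then \(\phi _{k,j} = 2\rho ^{2|k|}\) and \(\sigma _j^2 = 2(1+\rho ^2)/(1-\rho ^2)\), which a truncated plug-in estimate of \(\sum _{k}\phi _{k,j}\) reproduces:

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho, b = 20000, 0.5, 30            # toy AR(1) scores and truncation lag (our choices)
eta = np.zeros(n)
for k in range(1, n):
    eta[k] = rho * eta[k - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
# Stationary unit-variance Gaussian AR(1): Cov(eta_0^2, eta_k^2) = 2 rho^{2k}.
u = eta**2 - np.mean(eta**2)          # centred squared scores
phi = [np.mean(u[: n - k] * u[k:]) for k in range(b + 1)]
sigma2_hat = phi[0] + 2.0 * sum(phi[1:])
print(sigma2_hat, 2.0 * (1.0 + rho**2) / (1.0 - rho**2))
```

For \(\rho = 0.5\) the target value is \(10/3\); since \(\sigma _j^2 \ge \phi _{0,j} = 2 > 0\) here, (E3) holds trivially in this toy model.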

Theorem 7

Grant Assumption 3. Then

$$\begin{aligned} \sup _{x \in \mathbb {R}}\left| P\left( T_{J_n^+}^{} \le x\right) - P\left( T_{J_n^+}^{Z_{}} \le x\right) \right| \lesssim n^{-C}, \quad C > 0. \end{aligned}$$

The above result provides a Gaussian approximation with an algebraic rate. Note that no conditions on the underlying covariance structure are required. If we impose a very weak decay assumption on \(\gamma _{i,j}\), we obtain the limit distribution.

Corollary 3

Grant Assumption 3, and assume in addition

$$\begin{aligned} |\gamma _{i,j}|\log (|i-j|) = \mathcal {O}(1) \quad \text {for }|i-j| \rightarrow \infty . \end{aligned}$$
(5.3)

Then for \(x \in \mathbb {R}\)

$$\begin{aligned} \lim _{n \rightarrow \infty }P\left( T_{J_n^+}^{}\le u_{J_n^+}(x)\right) = \exp (-e^{-x}), \end{aligned}$$

where \(u_m(x) = x/a_m + b_m\) with \(a_m = (2 \log m)^{1/2}\) and \(b_m = (2 \log m)^{1/2} - (8 \log m)^{-1/2}(\log \log m + \log (4\pi ))\) for \(m \in \mathbb {N}\).
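The norming sequence can be sanity-checked in the independent Gaussian case, where (5.3) holds trivially. Assuming the classical Gumbel constants for the Gaussian maximum (with correction term \(\log (4\pi )\)), the sketch below evaluates \(P(\max _{i \le m} Z_i \le u_m(0)) = \Phi (u_m(0))^m\) exactly and watches it approach the Gumbel value \(e^{-1} \approx 0.368\); the convergence is known to be slow, of order \(1/\log m\):

```python
import math

def u(m, x):
    """Norming u_m(x) = x / a_m + b_m; note (8 log m)^{-1/2} = 1 / (2 a_m)."""
    a = math.sqrt(2.0 * math.log(m))
    b = a - (math.log(math.log(m)) + math.log(4.0 * math.pi)) / (2.0 * a)
    return x / a + b

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exact P(max of m iid N(0,1) <= u_m(0)) versus the Gumbel limit exp(-e^0) = 1/e
target = math.exp(-1.0)
probs = [Phi(u(m, 0.0)) ** m for m in (10**2, 10**4, 10**6)]
print(probs, target)
```

No Monte Carlo is needed here since the maximum of independent Gaussians has an explicit distribution function; the dependent case of Corollary 3 shares the same limit under (5.3).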

Remark 2

Note that condition (5.3) is essentially the weakest possible currently known, see [30, 45].

Uniform control results are an important statistical tool and have many applications. In the present context, Corollary 3 allows for the construction of simultaneous confidence bands for \({\widehat{\lambda }}_j\). This in turn is very useful to assess parametric hypotheses and decay rates of the structure of \({\varvec{\lambda }}\). A particular and important case is the determination of the relevant principal components. A huge number of stopping rules have been developed in the literature (cf. [40, 43]), all of which require a uniform control of \({\widehat{{\varvec{\lambda }}}}\). As pointed out by a reviewer, Corollary 3 can be particularly useful in the case of threshold rules like the scree plot, see also [4] for related problems.

6 Applications

A large portion of testing and estimation problems in FPCA is related to the normalized scores \(\{\eta _{k,j}\}_{k \in \mathbb {Z}, j \in \mathbb {N}}\) in one way or another, where the associated operator is either \({\varvec{\mathcal {C}}}_h\) or \({\varvec{\mathcal {G}}}\). Among others, we mention (two-)sample mean tests and related problems [36, 37, 47], tests about potential serial correlation, stationarity and related issues [4, 22, 24, 35, 38, 44, 52, 54], various change point problems [6, 34], and many more. Given a sample of size n, the canonical estimator of the scores is their empirical version

$$\begin{aligned} \widehat{\eta }_{k,j} = \langle X_k, {\widehat{e}}_j \rangle ({\widehat{\lambda }}_j)^{-1/2}, \quad 1 \le k \le n, \,1 \le j \le J_n^+. \end{aligned}$$

Intuitively, it is clear that the power of tests or the estimation accuracy improves if \(J_n^+\) increases with the sample size, since more and more information is taken into account. From a theoretical statistical point of view, this can be made rigorous by minimax theory for estimates and Ingster’s (minimax) theory for tests (cf. [31, 39]). In [23], a striking example is presented where a very large number of principal components is required to adequately describe the data, see also [12]. Let us also mention that the necessity of uniform control of \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\) also arises in the completely different field of machine learning in the context of techniques based on reproducing kernel Hilbert spaces, see for instance [8]. All this highlights the importance of a uniform, accurate control of \({\widehat{{\varvec{\lambda }}}}\) and \(\widehat{\mathbf{e}}\) as \(J_n^+\) increases, and the usefulness of results like Theorems 1 and 2.

Let us briefly discuss how this relates to our main Assumption 1. Due to its general formulation, (D1) is very flexible. In particular, all the problems mentioned above can be reformulated in a (general) framework (depending on the problem and corresponding operator) such that (D1) is valid. Regarding (D2), the convexity assumption (2.5) leading to (2.6) provides a general and simple condition that is recommended for all the applications. In particular, the resulting range \(J_n^+\) of potentially allowed principal components is quite large. (D3) typically reflects a non-degeneracy condition, which is usually necessary anyway in the problem at hand. We do not take this discussion any further, but rather investigate two other applications in a little more detail. The first one is the functional linear model, which contains in particular first-order autoregression in Hilbert spaces (coined ARH(1) or FAR(1)). As a second, very different application, we survey how and why long-memory situations can arise in a functional context and how this relates to our results.

6.1 Functional linear regression

A fundamental regression model in a high-dimensional context is the functional linear model. Given \(\mathbf{X} = \{X_k\}_{k \in \mathbb {Z}}\), \(\mathbf{Y} = \{Y_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\), the basic model is defined as

$$\begin{aligned} X_k = {\varvec{\Phi }}(Y_{k}) + \epsilon _k, \quad k \in \mathbb {Z}, \end{aligned}$$
(6.1)

where \({{\varvec{\Phi }}}\) is a (bounded) linear operator, mapping from \({\mathbb {L}}^2({\mathcal {T}})\) to \({\mathbb {L}}^2({\mathcal {T}})\), and \({\varvec{\varepsilon }}= \{\epsilon _k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) is a noise sequence. The goal is to recover \({{\varvec{\Phi }}}\), given \(\mathbf{X}\) and \(\mathbf{Y}\), while the noise \({\varvec{\varepsilon }}\) is unknown. Observe that estimating \({{\varvec{\Phi }}}\) is an ill-posed problem, see e.g. [14] for a more detailed discussion. Model (6.1) and its many variations have been extensively studied in the literature, with research still active (see e.g. [32]), and it would be impossible to survey all the results. From a theoretical perspective, a significant part of the current literature (cf. [11, 13, 17, 26, 27, 49] and the extensive references therein) focuses on the case where \(\mathbf{Y}\) and \({\varvec{\varepsilon }}\) are mutually independent (which excludes ARH(1)), and in addition \(X_k, {{\varvec{\Phi }}}(Y_{k}), \epsilon _k\) are all real-valued. Hence, by the Riesz representation theorem, \({{\varvec{\Phi }}}(\cdot ) = \langle x^{\phi }, \cdot \rangle \) for some \(x^{\phi } \in {\mathbb {L}}^2({\mathcal {T}})\), and it all boils down to the estimation of \(x^{\phi }\). Let us touch on the main idea for estimating \({{\varvec{\Phi }}}\). Denote with \({\varvec{\mathcal {C}}}^y\) the covariance operator of \(\mathbf{Y}\) with eigenvalues \({\varvec{\lambda }}^{y}\) and eigenfunctions \(\mathbf{e}^{y}\). For the remainder of this section, we assume that \({\varvec{\varepsilon }}= \{\epsilon _k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) is a centered IID sequence, and for each \(k \in \mathbb {Z}\), \(\epsilon _k\) and \(Y_k\) are independent, so that \({\mathbb {E}}[\langle Y_k, e_j^{y} \rangle \epsilon _k] = 0\). Applying Fubini–Tonelli, we get that for \(j \in \mathbb {N}\)

$$\begin{aligned} {\varvec{\Upsilon }}(e_j^{y})= & {} {\mathbb {E}}[\langle Y_k, e_j^{y} \rangle X_k] = {\mathbb {E}}[\langle Y_k, e_j^{y} \rangle {{\varvec{\Phi }}}(Y_k)] + {\mathbb {E}}[\langle Y_k, e_j^{y} \rangle \epsilon _k] \\= & {} {{\varvec{\Phi }}}({\mathbb {E}}[\langle Y_k, e_j^{y} \rangle Y_k]) = \lambda _j^{y} {{\varvec{\Phi }}}(e_j^{y}). \end{aligned}$$

Here the noise term vanishes, since \(\epsilon _k\) is centered and independent of \(Y_k\). Hence we obtain the alternative representation

$$\begin{aligned} {{\varvec{\Phi }}}(\cdot ) = \sum _{j = 1}^{\infty } {{\varvec{\Phi }}}(\langle e_j^{y}, \cdot \rangle e_j^{y}) = \sum _{j = 1}^{\infty } \frac{\lambda _j^{y} {{\varvec{\Phi }}}(e_j^{y})}{\lambda _j^{y}}\langle e_j^{y}, \cdot \rangle = \sum _{j = 1}^{\infty } \frac{{\varvec{\Upsilon }}(e_j^{y})}{\lambda _j^{y}}\langle e_j^{y}, \cdot \rangle . \end{aligned}$$
(6.2)

The advantage of this representation is that all involved quantities can be estimated. Given a truncation parameter \(b \in \mathbb {N}\), this motivates the estimate

$$\begin{aligned} \widehat{{{\varvec{\Phi }}}}^b(\cdot ) = \sum _{j = 1}^{b} \frac{1}{n} \sum _{k = 1}^n \frac{\langle Y_k, {\widehat{e}}_j^{y} \rangle X_k }{{\widehat{\lambda }}_j^{y}}\langle {\widehat{e}}_j^{y}, \cdot \rangle , \quad b = b_n \rightarrow \infty \text { as } n \text { increases.} \end{aligned}$$
(6.3)

In special cases, it is known that (a version of) \(\widehat{{{\varvec{\Phi }}}}^b\) is sharp minimax optimal (cf. [49]), and adaptive in slightly more general situations (cf. [17]). The construction of \(\widehat{{{\varvec{\Phi }}}}^b\) illustrates the necessity of accurate control of \({\widehat{{\varvec{\lambda }}}}^y\) and \(\widehat{\mathbf{e}}^{y}\). We remark that Proposition 1 is very useful in this context. Not only can it be used to obtain precise bounds for prediction errors or the estimation error \(\Vert \widehat{{{\varvec{\Phi }}}}^b - {{{\varvec{\Phi }}}}\Vert _{\mathcal {L}}\) itself, but also to derive various limit theorems for functions of \(\widehat{{{\varvec{\Phi }}}}^b\), which require exact expansions. Limit theorems in turn are needed for goodness-of-fit tests or the construction of confidence sets.
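As a concrete illustration, the following minimal Python sketch implements the truncated estimator (6.3) on a grid. Everything beyond the formula itself is an assumption for the example, not taken from the text: the predictors are simulated from five Fourier modes, the operator \({{\varvec{\Phi }}}\) is the hypothetical integral operator with kernel \(\min (r,s)\), and integrals are Riemann sums on \({\mathcal {T}} = [0,1]\).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, b = 50, 2000, 3            # grid size, sample size, truncation level
t = np.linspace(0, 1, T)
dt = 1.0 / T

# Hypothetical kernel of Phi: Phi(f)(r) = int min(r, s) f(s) ds
phi = np.minimum.outer(t, t)

# Simulate predictors Y_k from five Fourier modes with decaying eigenvalues
basis = np.stack([np.sqrt(2) * np.sin((j + 1) * np.pi * t) for j in range(5)])
lam = 1.0 / np.arange(1, 6) ** 2
scores = rng.normal(size=(n, 5)) * np.sqrt(lam)
Y = scores @ basis
X = Y @ (phi.T * dt) + 0.05 * rng.normal(size=(n, T))   # model (6.1)

# FPCA of Y: empirical covariance operator and its eigenpairs
Yc = Y - Y.mean(axis=0)
C = (Yc.T @ Yc) / n * dt
w, V = np.linalg.eigh(C)
w, V = w[::-1], V[:, ::-1] / np.sqrt(dt)   # descending order, L2-normalized

# Truncated estimator (6.3)
proj = (Y @ V[:, :b]) * dt                 # <Y_k, e_hat_j>, shape (n, b)
phi_hat = np.zeros((T, T))
for j in range(b):
    Ups_j = (proj[:, j][:, None] * X).mean(axis=0)   # (1/n) sum <Y_k, e_hat_j> X_k
    phi_hat += np.outer(Ups_j, V[:, j]) / w[j]

rel_err = np.linalg.norm(phi_hat - phi) / np.linalg.norm(phi)
```

Note that the sign indeterminacy of each \({\widehat{e}}_j^{y}\) is harmless here, since every summand in (6.3) uses \({\widehat{e}}_j^{y}\) twice. The remaining error is dominated by the truncation bias from \(b = 3\).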

Let us now consider the setup where \(Y_k = X_{k-1}\), which is exactly the case of an ARH(1) process. Note that for finite \(p\in \mathbb {N}\), any ARH(p) process can be reformulated as an ARH(1) process by changing the underlying Hilbert space; see [10] for details. Below in Corollary 4, we provide simple yet general conditions that imply the validity of Proposition 1 for ARH(1) processes. Since the discussion of the convexity condition in (2.5) leading to (2.6) already provides a general and simple condition, we only touch on the validity of (C1) here. Regarding the operator \({{\varvec{\Phi }}}\), we assume that it possesses the spectral decomposition

$$\begin{aligned} {{\varvec{\Phi }}}(\cdot ) = \sum _{j = 1}^{\infty } \lambda _j^{\phi } \langle e_j^{\phi }, \cdot \rangle e_j^{\phi }, \quad \sum _{j = 1}^{\infty } \lambda _j^{\phi } < 1, \end{aligned}$$
(6.4)

with eigenvalues \({\varvec{\lambda }}^{\phi }\) and eigenfunctions \(\mathbf{e}^{\phi }\). In the sequel, let \({\varvec{\Theta }}\) be any operator with eigenvalues \({\varvec{\lambda }}^{\theta }\) and eigenfunctions \(\mathbf{e}^{\theta }\) satisfying the spectral decomposition

$$\begin{aligned} {\varvec{\Theta }}(\cdot ) = \sum _{j = 1}^{\infty } \lambda _j^{\theta } \langle e_j^{\theta }, \cdot \rangle e_j^{\theta }, \quad \sum _{j = 1}^{\infty } \lambda _j^{\theta } < \infty . \end{aligned}$$
(6.5)

Natural candidates for \({\varvec{\Theta }}\) in our framework are of course the operators \({\varvec{\mathcal {C}}}_h^*{\varvec{\mathcal {C}}}_h\) or \({\varvec{\mathcal {G}}}^b\). We have the associated usual decomposition of \(X_k\), given as

$$\begin{aligned} X_k = \sum _{j = 1}^{\infty } \sqrt{\widetilde{\lambda }_j^{\theta }} \eta _{k,j}^{\theta } e_j^{\theta },\qquad k \in \mathbb {Z}, \quad \widetilde{\lambda }_j^{\theta } = {\mathbb {E}}[\langle X_k, e_j^{\theta } \rangle ^2],\quad \eta _{k,j}^{\theta } = \langle \overline{X}_k,e_j^{\theta } \rangle (\widetilde{\lambda }_j^{\theta })^{-1/2}. \end{aligned}$$

Similarly, denote with \({\varvec{\mathcal {C}}}^{\epsilon }\) the covariance operator of \(\epsilon _k\) with eigenvalues \({\varvec{\lambda }}^{\epsilon }\) and eigenfunctions \(\mathbf{e}^{\epsilon }\), and consider the decomposition \(\epsilon _k = \sum _{j = 1}^{\infty } \sqrt{{\lambda }_j^{\epsilon }} \epsilon _{k,j} e_j^{\epsilon }\), \(k \in \mathbb {Z}\). We make the following distributional assumption for \(\epsilon _k\). Given \(q \ge 1\), there exists a \(q' \ge q\) and a constant \(C_q > 0\) such that

$$\begin{aligned} \forall x \in {\mathbb {L}}^2({\mathcal {T}}) \text { with } \Vert x\Vert _{{\mathbb {L}}^2} = 1 \text { it holds that } \Vert \langle \epsilon _k, x \rangle \Vert _q^{2q} \le C_q (\Vert \langle \epsilon _k, x \rangle \Vert _2^{2})^{q'}.\quad \end{aligned}$$
(6.6)

Condition (6.6) is mild and allows for a certain invariance in our results; see below for more details. A general example satisfying (6.6) with \(q' = q\) is the following. Suppose that for each fixed \(k \in \mathbb {Z}\), \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\) forms a martingale difference sequence with respect to some filtration \({\mathcal {F}}_{k,j}^{\epsilon }\). Elementary calculations together with Burkholder's inequality then yield the validity of (6.6). Note that since the scores of a covariance operator are always uncorrelated, demanding an underlying martingale structure is a reasonable assumption. Observe that in the Gaussian case, we even have that \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\) is IID, which is a common assumption in the literature. Next, recall the notion of weak dependence introduced in Sect. 3. We then have the following result.

Proposition 4

Assume that \({{\varvec{\Phi }}},\) \({\varvec{\Theta }}\) satisfy representations (6.4),  (6.5). If \({\mathbb {E}}[\Vert \epsilon _k\Vert _{{\mathbb {L}}^2}]<\infty ,\) then \(\mathbf{X}\) is a stationary Bernoulli-shift process which can be written as \(X_k = \sum _{i = 0}^{\infty } {{\varvec{\Phi }}}^{i}(\epsilon _{k-i})\). If in addition \(\{\epsilon _k\}_{k \in \mathbb {Z}}\) satisfies (6.6) for some \(2 \le q \le q',\) then

$$\begin{aligned} \max _{j \in \mathbb {N}}\Vert \eta _{k,j}^{\theta } - (\eta _{k,j}^{\theta })'\Vert _q \lesssim \rho ^k, \quad 0 < \rho < 1,\ k \in \mathbb {N}. \end{aligned}$$
(6.7)

Note that the geometric contraction property in (6.7) is independent of the underlying orthonormal basis \(\mathbf{e}^{\theta }\), which is a desirable property. A check of the proof reveals that this essentially follows from condition (6.6). We also remark that Proposition 4 can be extended to more general ARH(p)-processes using the same method as in [10]. Denote with \({\varvec{\mathcal {C}}}^x\) the covariance operator of \(\mathbf{X}\), and let \({\varvec{\Theta }} = {\varvec{\mathcal {C}}}^x\). We then obtain the following result.

Corollary 4

Grant the assumptions of Proposition 4 and let \({\varvec{\Theta }} = {\varvec{\mathcal {C}}}^x\). Then there exists a universal constant \(C^{\varvec{\mathcal {C}}}\) and universal sequence \(s_n^{\varvec{\mathcal {C}}} \lesssim n^{-1/4}\) such that (C1) holds.

A related result can be established for \({\varvec{\Theta }} = {\varvec{\mathcal {G}}}^b\); we omit the details.
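The geometric contraction in (6.7) can be illustrated numerically. The sketch below uses an illustrative diagonal operator \({{\varvec{\Phi }}}\) whose eigenvalues sum to \(0.8 < 1\) (an assumption chosen for the example, not taken from the text), builds \(X_k\) through the Bernoulli-shift recursion \(X_k = {{\varvec{\Phi }}}(X_{k-1}) + \epsilon _k\), and checks that replacing the innovation \(K\) steps in the past changes \(X_k\) only by a geometrically small amount.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 40
t = np.linspace(0, 1, T)
dt = 1.0 / T

# Illustrative diagonal Phi as in (6.4): eigenvalues 0.4 * 0.5^j, sum = 0.8 < 1
e = np.stack([np.sqrt(2) * np.sin((j + 1) * np.pi * t) for j in range(6)])
lam_phi = 0.4 * 0.5 ** np.arange(6)

def Phi(f):
    # Phi(f) = sum_j lam_j <e_j, f> e_j
    return (lam_phi * (e @ f * dt)) @ e

def bernoulli_shift(innovations):
    # X_k = sum_i Phi^i(eps_{k-i}), built by iterating X <- Phi(X) + eps
    x = np.zeros(T)
    for eps in innovations:          # ordered from oldest to newest
        x = Phi(x) + eps
    return x

K = 30
eps = [rng.normal(size=T) for _ in range(K)]
eps_prime = list(eps)
eps_prime[0] = rng.normal(size=T)    # couple: swap the innovation K steps back

diff = np.sqrt(np.sum((bernoulli_shift(eps) - bernoulli_shift(eps_prime)) ** 2) * dt)
```

Since the swapped innovation is hit by \({{\varvec{\Phi }}}^{K-1}\), the difference is of order \(0.4^{K-1}\), i.e. numerically negligible, which is exactly the coupling behavior behind (6.7).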

6.2 Weak and long memory in econometric and financial time series

In the presence of serial dependence, the covariance operator \({\varvec{\mathcal {C}}}\) as a single object is not so relevant in the context of a CLT; the long-run operator \({\varvec{\mathcal {G}}}\) is the key object. The picture can be entirely different, however, if serial dependence is present but essentially no serial correlation, which is often the case in financial or econometric time series. More recently, there has been considerable activity (see for instance [5, 25] and particularly [51]) in modelling financial or econometric time series with the help of FPCA. In this context, it is well known (cf. [9]) that (differenced) stock returns often display a martingale-like behavior, which forms the basis for many discrete-time financial models (e.g. GARCH) and continuous-time models (e.g. semimartingales). On the other hand, it is equally well known that the absolute or squared returns display a completely different behavior, and sometimes even exhibit long memory (cf. [21]). As a general example, let us consider the case where \(\{\epsilon _k\}_{k \in \mathbb {Z}}\) is an IID sequence in \({\mathbb {L}}^2({\mathcal {T}})\), and \(\{X_k\}_{k \in \mathbb {Z}}\), \(\{Y_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2({\mathcal {T}})\) are stationary and satisfy the structural equation

$$\begin{aligned} X_k = \epsilon _k Y_{k-1}, \qquad k \in \mathbb {Z}, \quad Y_k \in {\mathcal {E}}_k \text { with } {\mathcal {E}}_k = \sigma (\epsilon _j, \, j \le k). \end{aligned}$$
(6.8)

Note that the GARCH model is a special case of (6.8); see also Example 2.4 in [33]. Observe that \(X_k\) is a martingale difference sequence with respect to \({\mathcal {E}}_k\). On the other hand, \(X_k^2\) (or \(|X_k|\)) can behave completely differently due to \(\{Y_k\}_{k \in \mathbb {Z}}\), as is desired from a modelling perspective. This becomes relevant for the estimator \(\widehat{\varvec{\mathcal {C}}}\). While we still have by the martingale CLT (up to mild regularity conditions)

$$\begin{aligned} n^{-1/2}\sum _{k = 1}^n X_k \xrightarrow {w} {\mathcal {N}}(0, {\varvec{\mathcal {C}}}), \end{aligned}$$

the standard estimator \(\widehat{\varvec{\mathcal {C}}}\) as in (1.2), in contrast, is based on \(X_k^2\). Depending on the behavior of \(\{Y_k\}_{k \in \mathbb {Z}}\), we may thus witness the full palette of dependence when employing \(\widehat{\varvec{\mathcal {C}}}\), ranging from independence to weak dependence or even long-memory behavior of \(X_k^2\). Due to the high degree of flexibility in (C1), our results thus provide the necessary tools for a more detailed analysis of the model in (6.8).
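A small simulation illustrates this dichotomy for model (6.8). The volatility recursion below is a hypothetical ARCH-type choice, not specified in the text: \(Y_k\) is the constant function with \(Y_k^2 = 0.05 + 0.9\Vert X_k\Vert _{{\mathbb {L}}^2}^2\), which is \({\mathcal {E}}_k\)-measurable as required. The scores \(\langle X_k, 1 \rangle \) are then serially uncorrelated, while \(\Vert X_k\Vert ^2\), on which \(\widehat{\varvec{\mathcal {C}}}\) is based, is strongly serially correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 20000, 40

# Model (6.8): X_k = eps_k * Y_{k-1} (pointwise), with the hypothetical
# ARCH-type recursion Y_k^2 = 0.05 + 0.9 * ||X_k||^2 (Y_k constant in t).
eps = rng.normal(size=(n, T))
X = np.zeros((n, T))
y2 = 0.5                                  # start Y^2 at its stationary mean
for k in range(n):
    X[k] = np.sqrt(y2) * eps[k]
    y2 = 0.05 + 0.9 * np.mean(X[k] ** 2)  # ||X_k||_{L2}^2 on [0, 1]

def acf1(z):
    # lag-1 sample autocorrelation
    z = z - z.mean()
    return np.mean(z[:-1] * z[1:]) / np.mean(z ** 2)

scores = X.mean(axis=1)          # <X_k, 1>: martingale differences, no correlation
sq = (X ** 2).mean(axis=1)       # ||X_k||^2: inherits the volatility persistence
```

With this choice, `acf1(scores)` is statistically indistinguishable from zero, while `acf1(sq)` is close to the persistence parameter 0.9: the same sample displays both faces of dependence, as described above.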

7 Proofs of asymptotic expansions

We introduce the following additional notation. Given functions \(f,g \in {\mathbb {L}}^2({\mathcal {T}})\) and a kernel \(\mathbf {K}(r,s)\), we write

$$\begin{aligned} \int _{{\mathcal {T}}} f g = \int _{{\mathcal {T}}} f(r) g(r)\,dr \quad \text {and} \quad \int _{{\mathcal {T}}^2} \mathbf {K} f g = \int _{{\mathcal {T}}^2} \mathbf {K}(r,s) f(r) g(s) \,dr \, ds.\quad \end{aligned}$$
(7.1)

If we have \(f = g\), then we write \(f^2 = f(r)^2\) and otherwise \(f f = f(r) f(s)\) in the above notation. We interchangeably use \(\langle \cdot , \cdot \rangle \) and \(\int _{{\mathcal {T}}} \cdot \), the latter being more convenient when dealing with kernels. We also frequently apply Fubini–Tonelli without further mention. Next, we introduce the empirical kernel \(\widehat{\mathbf {D}}\) and its deterministic analogue \({\mathbf {D}}\) as

$$\begin{aligned} \widehat{\mathbf {D}}= & {} \widehat{\mathbf {D}}(r,s) = \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i \widetilde{\lambda }_j} (\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}) e_i(r)e_j(s)\quad \left( \text {note: } \widehat{\varvec{\mathcal {D}}}(f) = \int _{{\mathcal {T}}} \widehat{\mathbf {D}}f\right) , \nonumber \\ \mathbf {D}= & {} {\mathbf {D}}(r,s)=\sum _{j = 1}^{\infty } \widetilde{\lambda }_j {\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {D}}}] e_j(r)e_j(s), \quad \left( \text {note: }\varvec{\mathcal {D}}(f) = \int _{{\mathcal {T}}} {\mathbf {D}} f\right) . \end{aligned}$$
(7.2)

We first establish the transfer result of Proposition 1.

Proof of Proposition 1

Due to \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^2] < \infty \), standard arguments (cf. [29]) reveal that \({\varvec{\mathcal {C}}}\) exists and satisfies (2.1) and (2.2) with eigenvalues \({\varvec{\lambda }}\) and eigenfunctions \(\mathbf{e}\). Moreover, \({\varvec{\mathcal {C}}}\) is of trace class. Since \({m}= n\), by virtue of (C2), and since \({\mathbb {E}}[\eta _{k,j}^2]= 1\) for \(j \in \mathbb {N}\), we only need to verify (D1). Due to (C1), it suffices to establish a bound for \(\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\Vert _q\). Using (2.3), the Cauchy–Schwarz inequality and (C1), the claim follows. \(\square \)

We now turn to the proofs of Theorems 1 and 2, which are developed in a series of lemmas. As a starting point, we recall the following elementary preliminary result (cf. [10]).

Lemma 3

We have the decomposition

$$\begin{aligned} {\widehat{\lambda }}_j \int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j)= & {} \lambda _k \int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j) \nonumber \\&+\int _{{\mathcal {T}}^2} (\widehat{\mathbf {D}} - \mathbf{D}) {e}_k {e}_j + \int _{{\mathcal {T}}^2} (\widehat{\mathbf {D}} - \mathbf{D}) {e}_k ({\widehat{e}}_j - e_j). \end{aligned}$$
(7.3)

Rearranging terms, we obtain from the above that (provided \(\lambda _k \ne \lambda _j\))

$$\begin{aligned} \int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j) = \frac{I_{k,j} + {\textit{II}}_{k,j} + {\textit{III}}_{k,j}}{\lambda _j - \lambda _k}, \end{aligned}$$
(7.4)

where \(I_{k,j} = \int _{{\mathcal {T}}^2} (\widehat{\mathbf {D}} - \mathbf{D}) {e}_k {e}_j\), \({\textit{II}}_{k,j} = \int _{{\mathcal {T}}^2} (\widehat{\mathbf {D}} - \mathbf{D}) {e}_k ({\widehat{e}}_j - e_j)\) and \({\textit{III}}_{k,j} = (\lambda _j - {\widehat{\lambda }}_j)\int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j)\),

and

$$\begin{aligned} \int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j)=\frac{\lambda _j - \lambda _k}{{\widehat{\lambda }}_j - \lambda _j + \lambda _j - \lambda _k} \frac{1}{\lambda _j - \lambda _k}(I_{k,j} + {\textit{II}}_{k,j}). \end{aligned}$$
(7.5)

Due to the frequent use of relations (7.4) and (7.5), it is convenient to use the abbreviation

$$\begin{aligned} E_{k,j} = \int _{{\mathcal {T}}}e_k ({\widehat{e}}_j - e_j) = \langle e_k, {\widehat{e}}_j - e_j \rangle \end{aligned}$$

in the sequel. We also recall the following lemma (cf. [10]).

Lemma 4

For any \(j \in \mathbb {N}\) we have

$$\begin{aligned} \int _{{\mathcal {T}}} ({\widehat{e}}_j - e_j) {\widehat{e}}_j = \frac{1}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \quad \text {and}\quad \int _{{\mathcal {T}}} ({\widehat{e}}_j - e_j) e_j = -\frac{1}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2. \end{aligned}$$
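Lemma 4 is a purely geometric fact about unit vectors: for unit \(u, v\) one has \(\langle v - u, v \rangle = 1 - \langle u, v \rangle = \tfrac{1}{2}\Vert v - u\Vert ^2\) and \(\langle v - u, u \rangle = -\tfrac{1}{2}\Vert v - u\Vert ^2\). A quick finite-dimensional check (the identity holds in any Hilbert space; the dimension 200 below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
# Random unit vectors standing in for e_j and its estimate e_hat_j
u = rng.normal(size=200)
u /= np.linalg.norm(u)
v = rng.normal(size=200)
v /= np.linalg.norm(v)

inner_hat = (v - u) @ v              # <e_hat_j - e_j, e_hat_j>
inner = (v - u) @ u                  # <e_hat_j - e_j, e_j>
half_sq = 0.5 * np.linalg.norm(v - u) ** 2
```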

We proceed by deriving bounds for \(I_{k,j}, \textit{II}_{k,j}\) and \(\textit{III}_{k,j}\).

Lemma 5

Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+4}\) we have

$$\begin{aligned} \Vert I_{k,j}\Vert _q \lesssim {m}^{-1/2}\sqrt{\lambda _k \lambda _j} \quad \text {uniformly for } k,j \in \mathbb {N}. \end{aligned}$$

Proof of Lemma 5

Using the orthogonality of \(e_j,e_k\) we have

$$\begin{aligned} I_{k,j} = \int _{{\mathcal {T}}^2} (\widehat{\mathbf {D}} - \mathbf{D}) e_k e_j = {m}^{-1/2} \sqrt{\widetilde{\lambda }_k \widetilde{\lambda }_j} {m}^{1/2}(\overline{\varvec{\eta }}_{k,j}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,j}^{\varvec{\mathcal {R}}}), \end{aligned}$$

hence the claim follows from (D1), Lemma 2 and (D3). \(\square \)

Lemma 6

Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+3}\) we have

$$\begin{aligned} \Vert \Vert \widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}}\Vert _{\mathcal {L}}\Vert _q \lesssim \Vert \Vert \widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}}\Vert _{{{\mathbb {L}}^2}}\Vert _q \lesssim {m}^{-1/2}. \end{aligned}$$

Proof of Lemma 6

Since the Hilbert–Schmidt norm dominates the operator norm, Parseval’s identity and Lemma 5 yield the claim, using that (D3) supplies \(\sum _{j = 1}^{\infty }\lambda _j < \infty \). \(\square \)

Lemma 7

Assume that Assumption 1 holds. Then for \(1 \le q \le p2^{{\mathfrak {p}}+4}\) and \(k \in \mathbb {N}\) we have

$$\begin{aligned} \left\| \max _{1 \le j \le J_{{m}}^+}\frac{|{\textit{II}}_{k,j}|}{\Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2}}\right\| _{q} \lesssim \sqrt{\lambda _k} {m}^{-1/2}. \end{aligned}$$

Proof of Lemma 7

It holds that

$$\begin{aligned} {\textit{II}}_{k,j} = \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D}) e_k({\widehat{e}}_j - e_j) = \sum _{i = 1}^{\infty } \sqrt{\widetilde{\lambda }_k \widetilde{\lambda }_i} (\overline{\varvec{\eta }}_{k,i}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,i}^{\varvec{\mathcal {R}}})E_{i,j}. \end{aligned}$$
(7.6)

Since \(\sum _{i = 1}^{\infty } E_{i,j}^2 = \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\) by Parseval’s identity, the Cauchy–Schwarz inequality gives

$$\begin{aligned} \left| \sum _{i = 1}^{\infty } \sqrt{\widetilde{\lambda }_i} E_{i,j} (\overline{\varvec{\eta }}_{k,i}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,i}^{\varvec{\mathcal {R}}})\right| \le \left( \sum _{i = 1}^{\infty } \widetilde{\lambda }_i (\overline{\varvec{\eta }}_{k,i}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,i}^{\varvec{\mathcal {R}}})^2\right) ^{1/2} \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}. \end{aligned}$$
(7.7)

Hence the triangle inequality, (D1) and Lemma 2 together with (D3) yield

$$\begin{aligned} \left\| \max _{1 \le j \le J_{{m}}^+}\frac{|{\textit{II}}_{k,j}|}{\Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2}}\right\| _{q}\le & {} {m}^{-1/2}\sqrt{\widetilde{\lambda }_k} \left( \sum _{i = 1}^{\infty } \widetilde{\lambda }_i {m}\Vert (\overline{\varvec{\eta }}_{k,i}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,i}^{\varvec{\mathcal {R}}})^2\Vert _{q/2}\right) ^{1/2} \\\lesssim & {} {m}^{-1/2}\sqrt{\widetilde{\lambda }_k} \lesssim {m}^{-1/2}\sqrt{\lambda _k}. \end{aligned}$$

\(\square \)

Lemma 8

Assume that Assumption 1 holds,  and let \({\mathcal {A}}_j = \{|{\widehat{\lambda }}_j - \lambda _j| \le \psi _j/2\}\). Then

$$\begin{aligned} \max _{1 \le j < J_{{m}}^+}P({\mathcal {A}}_j^c)\lesssim {m}^{-{\mathfrak {a}}p 2^{{\mathfrak {p}}+4}}. \end{aligned}$$

Proof of Lemma 8

Proceeding as in Lemmas E.2 and E.1 in the supplement of [31] (or likewise Lemmas 18 and 16 in [48]), it follows that for some absolute constant \(C > 0\)

$$\begin{aligned} P({\mathcal {A}}_j^c) \lesssim P\left( \sum _{\begin{array}{c} k,l = 1\\ k,l \ne j \end{array}}^{\infty } \frac{I_{k,l}^2}{|\lambda _k - \lambda _{j}||\lambda _l - \lambda _{j}|} + \frac{I_{j,j}^2}{\psi _j^2} + \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{I_{k,j}^2}{|\lambda _k - \lambda _j|\psi _j}\ge C\right) . \end{aligned}$$

Let \(p^* = p 2^{{\mathfrak {p}}+4}\). Then by the triangle inequality and Lemma 5

$$\begin{aligned} \max _{1 \le j < J_{{m}}^+}\left\| \sum _{\begin{array}{c} k,l = 1\\ k,l \ne j \end{array}}^{\infty } \frac{I_{k,l}^2}{|\lambda _k - \lambda _{j}||\lambda _l - \lambda _{j}|}\right\| _{p^*/2} \lesssim \max _{1 \le j \le J_{{m}}^+}\left( \frac{1}{\sqrt{{m}}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _k}{|\lambda _k - \lambda _{j}|}\right) ^2.\nonumber \\ \end{aligned}$$
(7.8)

Similarly, we get that

$$\begin{aligned}&\max _{1 \le j < J_{{m}}^+}\left\| \frac{I_{j,j}^2}{\psi _j^2}\right\| _{p^*/2}, \max _{1 \le j < J_{{m}}^+}\left\| \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }\frac{I_{k,j}^2}{|\lambda _k - \lambda _j|\psi _j}\right\| _{p^*/2}\nonumber \\&\quad \lesssim \max _{1 \le j \le J_{{m}}^+} \left( \frac{1}{\sqrt{{m}}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _k}{|\lambda _k - \lambda _{j}|}\right) ^2. \end{aligned}$$
(7.9)

Observe that due to (D2), (7.8) and (7.9) are bounded by \(\lesssim {m}^{-2{\mathfrak {a}}}\). Hence an application of Markov’s and the triangle inequality yields the claim. \(\square \)

The next result is our key technical lemma.

Lemma 9

Assume that Assumption 1 holds. Then uniformly for \(1 \le q \le p 2^{{\mathfrak {p}}/2 + 3}, k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)

$$\begin{aligned} \Vert {\textit{II}}_{k,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{q} \lesssim \frac{\sqrt{\lambda _k \lambda _j}}{\sqrt{{m}}}(\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{2q} + {m}^{-{\mathfrak {a}}}). \end{aligned}$$

Proof of Lemma 9

Note first that by construction of \({\mathcal {A}}_j\), we have that

$$\begin{aligned} \left| \frac{\lambda _j - \lambda _l}{{\widehat{\lambda }}_j - \lambda _j + \lambda _j - \lambda _l}\mathbf {1}({\mathcal {A}}_j)\right| \le 2, \quad \text {for }l \ne j. \end{aligned}$$
(7.10)

Using the decomposition in (7.5) and bound (7.10), we obtain that

$$\begin{aligned} |E_{l,j}\mathbf {1}({\mathcal {A}}_j)|\le \frac{2}{|\lambda _j - \lambda _l|} (|I_{l,j}| + |{\textit{II}}_{l,j}|) \mathbf {1}({\mathcal {A}}_j). \end{aligned}$$
(7.11)

We now use a backward inductive argument. Let \(p_{i} = p 2^{i}\), \(\tau \ge 0\), and suppose we have uniformly for \(k \in \mathbb {N}\)

$$\begin{aligned} \Vert {\textit{II}}_{k,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{p_{i}} \lesssim {m}^{-1/2}\sqrt{\lambda _k} (\sqrt{\lambda _j} + {m}^{- \tau })\quad \text {for some }{i} \le {\mathfrak {p}}+ 4. \end{aligned}$$
(7.12)

Then we obtain from (7.11), the triangle inequality and Lemma 5 that for \(l \ne j\)

$$\begin{aligned} \Vert E_{l,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{p_{i}} \lesssim {m}^{-1/2}\frac{\sqrt{\lambda _l}}{|\lambda _j - \lambda _l|}\left( \sqrt{\lambda _j} + {m}^{-\tau }\right) . \end{aligned}$$
(7.13)

Using decomposition (7.6), Cauchy–Schwarz and Lemma 2 together with (D3), we get

$$\begin{aligned} \Vert {\textit{II}}_{k,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{p_{{i}-1}} \lesssim \sqrt{\lambda _k}\sum _{l = 1}^{\infty } \sqrt{\lambda _l} \Vert E_{l,j}\mathbf {1}({\mathcal {A}}_j)\Vert _{p_{i}} \Vert \overline{\varvec{\eta }}_{k,l}^{\varvec{\mathcal {D}}} + \varvec{\eta }_{k,l}^{\varvec{\mathcal {R}}}\Vert _{p_{i}}, \end{aligned}$$

hence we obtain from Lemma 4, inequality (7.13) and (D1), (D2) that

$$\begin{aligned} \Vert {\textit{II}}_{k,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{p_{{i}-1}}\lesssim & {} \frac{\sqrt{\lambda _k}}{\sqrt{{m}}} \left( \sqrt{\lambda _j}\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{p_{i}} + \frac{1}{\sqrt{{m}}}\sum _{\begin{array}{c} l = 1\\ l \ne j \end{array}}^{\infty } \frac{\lambda _l \left( \sqrt{\lambda _j} + {m}^{-\tau }\right) }{|\lambda _l - \lambda _j|}\right) \nonumber \\\lesssim & {} \frac{\sqrt{\lambda _k}}{\sqrt{{m}}} \left( (\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{p_{i}} + {m}^{-{\mathfrak {a}}})\sqrt{\lambda _j} + {m}^{-{\mathfrak {a}}- \tau }\right) , \end{aligned}$$
(7.14)

and this bound holds uniformly for \(k \in \mathbb {N}\). Observe that we have now shown the validity of relation (7.12) with the updated value \(\tau = \tau + {\mathfrak {a}}\), but with respect to \(p_{{i}-1}\) instead of \(p_{i}\). Since \(\lambda _j \gtrsim {m}^{-{\mathfrak {h}}}\) with \({\mathfrak {h}}\ge 1\), it follows that after at most \({\mathfrak {p}}/2 + 1= \lceil {\mathfrak {h}}/{\mathfrak {a}}\rceil /2 + 1\) iterations we have

$$\begin{aligned} \Vert {\textit{II}}_{k,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{q^*} \lesssim \frac{\sqrt{\lambda _k \lambda _j}}{\sqrt{{m}}}(\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{2q^*} + {m}^{-{\mathfrak {a}}}), \end{aligned}$$

where \(q^* = p 2^{{\mathfrak {p}}/2 + 3}\). By Lemma 7, relation (7.12) is true for \(\tau = 0\) (hence \({m}^{-\tau } = 1\)) and \({i} = {\mathfrak {p}}+4\), constituting the base of the induction; hence the proof is complete. Note that we have also shown

$$\begin{aligned} \Vert E_{l,j} \mathbf {1}({\mathcal {A}}_j)\Vert _{q^*} \lesssim {m}^{-1/2}\frac{\sqrt{\lambda _l \lambda _j}}{|\lambda _j - \lambda _l|}, \end{aligned}$$
(7.15)

which is of further relevance in the sequel. \(\square \)

Proposition 5

Assume that Assumption 1 holds. Then for \(1 \le q \le p 2^{{\mathfrak {p}}/2+2}\) we have uniformly for \(1 \le j < J_{{m}}^+\)

$$\begin{aligned} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{q} \lesssim P({\mathcal {A}}_j^c)^{1/q} + {{m}}^{-1} \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _j \lambda _k}{(\lambda _k - \lambda _j)^2}\lesssim {{m}}^{-2{\mathfrak {a}}}. \end{aligned}$$

Proof of Proposition 5

The triangle inequality and Cauchy–Schwarz give

$$\begin{aligned} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{q} \le 2 P({\mathcal {A}}_j^c)^{1/q} + \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\mathbf {1}({\mathcal {A}}_j)\Vert _{q}. \end{aligned}$$
(7.16)

We now invoke the ‘traditional’ way of bounding \(\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\), (cf. [10, 36]), which uses the inequality

$$\begin{aligned} \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \le 2 \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } E_{k,j}^2. \end{aligned}$$
(7.17)

Hence using (7.15) and the triangle inequality, we obtain from (D2) that

$$\begin{aligned} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\mathbf {1}({\mathcal {A}}_j)\Vert _{q} \le 2\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \Vert E_{k,j}^2\mathbf {1}({\mathcal {A}}_j)\Vert _q \lesssim \frac{1}{{m}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _k \lambda _j}{(\lambda _j - \lambda _k)^2} \lesssim {{m}}^{-2 {\mathfrak {a}}}. \end{aligned}$$

Combining this with (7.16) gives the first inequality; Lemma 8 and Assumption 1 yield the second part. \(\square \)

Note that \({\mathfrak {a}}\le 1/2\) and hence \({\mathfrak {p}}/2 \ge {\mathfrak {h}}\ge 1\) and \(2^{{\mathfrak {p}}/2 + 2} \ge 8\). Since

$$\begin{aligned} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{2q} \le \sqrt{2} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{q}^{1/2} \quad \text {for } q \ge 1, \end{aligned}$$

we obtain the following corollary to Lemma 9.

Corollary 5

Assume that Assumption 1 holds. Then for \(1 \le q \le 8p\) we have uniformly for \(k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)

$$\begin{aligned} \Vert {\textit{II}}_{k,j}\Vert _{q} \lesssim \frac{\sqrt{\lambda _j \lambda _k}}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}}. \end{aligned}$$

Proof of Corollary 5

Lemmas 7–9 and the Cauchy–Schwarz inequality give

$$\begin{aligned} \Vert {\textit{II}}_{k,j}\Vert _{q}\le & {} \Vert {\textit{II}}_{k,j}\mathbf {1}({\mathcal {A}}_j)\Vert _{q} + \Vert {\textit{II}}_{k,j}\mathbf {1}({\mathcal {A}}_j^c)\Vert _{q} \lesssim \frac{\sqrt{\lambda _j \lambda _k}}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}} + \frac{\sqrt{\lambda _k}}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}} P({\mathcal {A}}_j^c)^{1/2q} \\\lesssim & {} \frac{\sqrt{\lambda _j \lambda _k}}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}} + \frac{\sqrt{\lambda _k}}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}} {{m}}^{-{\mathfrak {a}}p 2^{{\mathfrak {p}}+ 3}/q}. \end{aligned}$$

Since \({\mathfrak {a}}p 2^{{\mathfrak {p}}+ 3}/q \ge {\mathfrak {a}}2^{{\mathfrak {p}}} \ge {\mathfrak {h}}\), we have \({{m}}^{-{\mathfrak {a}}p 2^{{\mathfrak {p}}+ 3}/q} \lesssim \lambda _{J_{{m}}^+}\) by (D2) and the claim follows. \(\square \)

Lemma 10

Assume that Assumption 1 holds. Then for \(1 \le q \le 4p\)

$$\begin{aligned} \Vert {\widehat{\lambda }}_j - \lambda _j - I_{j,j}\Vert _q \lesssim \frac{\lambda _j {{m}}^{-{\mathfrak {a}}}}{\sqrt{{m}}}, \quad \Vert {\widehat{\lambda }}_j - \lambda _j\Vert _q \lesssim \frac{\lambda _j}{\sqrt{{m}}}, \qquad \text {uniformly for } 1 \le j < J_{{m}}^+. \end{aligned}$$

Proof of Lemma 10

We have that

$$\begin{aligned} {\widehat{\lambda }}_j= & {} \int _{{\mathcal {T}}^2} \widehat{{\mathbf {D}}} {\widehat{e}}_j {\widehat{e}}_j = \int _{{\mathcal {T}}^2} \widehat{{\mathbf {D}}} ({\widehat{e}}_j - e_j) {\widehat{e}}_j + \int _{{\mathcal {T}}^2} \widehat{{\mathbf {D}}} e_j {\widehat{e}}_j \\= & {} {\widehat{\lambda }}_j\int _{{\mathcal {T}}} ({\widehat{e}}_j - e_j) {\widehat{e}}_j + \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D}) {e}_j {\widehat{e}}_j + \int _{{\mathcal {T}}^2}{{\mathbf {D}}} e_j {\widehat{e}}_j\\= & {} \frac{{\widehat{\lambda }}_j}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 + \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D}) e_j({\widehat{e}}_j - e_j) + \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D}) e_j {e}_j + \int _{{\mathcal {T}}^2}{{\mathbf {D}}} e_j {\widehat{e}}_j. \end{aligned}$$

Since by Lemma 4

$$\begin{aligned} \int _{{\mathcal {T}}^2}{{\mathbf {D}}} e_j {\widehat{e}}_j = \int _{{\mathcal {T}}^2}{{\mathbf {D}}} e_j ({\widehat{e}}_j - e_j) + \int _{{\mathcal {T}}^2}{{\mathbf {D}}} e_j {e}_j = -\frac{\lambda _j}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 + \lambda _j, \end{aligned}$$

we obtain by rearranging terms (if \(\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 < 2\))

$$\begin{aligned} {\widehat{\lambda }}_j - \lambda _j= & {} \frac{2}{2 - \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2}\left( \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D}) e_j {e}_j + \int _{{\mathcal {T}}^2} (\widehat{{\mathbf {D}}} - \mathbf{D})e_j({\widehat{e}}_j - e_j)\right) \nonumber \\= & {} \frac{2}{2 - \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2}(I_{j,j} + {\textit{II}}_{j,j}). \end{aligned}$$
(7.18)

Let \({\mathcal {B}}_j = \{\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \le 1\}\). By Lemma 5, Proposition 5 and the Cauchy–Schwarz inequality we obtain

$$\begin{aligned} \left\| I_{j,j}\left( 1 - \frac{2}{2 - \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2}\right) \mathbf {1}({\mathcal {B}}_j)\right\| _q \lesssim \Vert I_{j,j}\Vert _{2q}\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\Vert _{2q} \lesssim \frac{\lambda _j}{\sqrt{{m}}}{{m}}^{-2{\mathfrak {a}}}.\nonumber \\ \end{aligned}$$
(7.19)

Similarly, Corollary 5 yields that

$$\begin{aligned} \left\| {\textit{II}}_{j,j}\left( 1 - \frac{2}{2 - \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2}\right) \mathbf {1}({\mathcal {B}}_j)\right\| _q \lesssim \frac{\lambda _j}{\sqrt{{m}}}{{m}}^{-{\mathfrak {a}}}. \end{aligned}$$
(7.20)

Let \({\mathcal {D}} = \{\Vert \widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}} \Vert _{{\mathcal {L}}} \le 1\}\). Lemma 6 and Markov’s inequality then yield that

$$\begin{aligned} P({\mathcal {D}}^c) \lesssim {{m}}^{-2{\mathfrak {a}}p 2^{{\mathfrak {p}}/2+3}}. \end{aligned}$$
(7.21)

On the other hand, Proposition 5 implies that \(P({\mathcal {B}}_j^c) \lesssim {{m}}^{-2{\mathfrak {a}}p 2^{{\mathfrak {p}}/2+2}}\). Since \({\mathfrak {h}}\ge 1\) and \({\mathfrak {a}}< 1/2\), we have \(2^{{\mathfrak {p}}/2} \ge 1/2 + 1/4{\mathfrak {a}}+ {\mathfrak {h}}/2{\mathfrak {a}}\) and hence \({{m}}^{-2{\mathfrak {a}}2^{{\mathfrak {p}}/2}} \lesssim {{m}}^{-1/2 - {\mathfrak {a}}} \lambda _{J_{{m}}^+}\) by (D2). Combining (7.18), (7.19), (7.20) and (7.21), we obtain from the Cauchy–Schwarz inequality, Lemma 1 (see [10] for a general version) and Lemma 6 that

$$\begin{aligned} \Vert {\widehat{\lambda }}_j - \lambda _j - I_{j,j}\Vert _q\lesssim & {} P({\mathcal {B}}_j^c)^{1/q} + \Vert \Vert \widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}}\Vert _{{\mathcal {L}}} \Vert _{2q} P ({\mathcal {D}}^c)^{1/2q} + \frac{\lambda _j}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}} \\\lesssim & {} \frac{\lambda _j}{\sqrt{{m}}} {{m}}^{-{\mathfrak {a}}}, \end{aligned}$$

which gives the first claim. The second claim follows from Lemma 5. \(\square \)

Lemma 11

Assume that Assumption 1 holds. Then for \(1 \le q \le 2p\) we have uniformly for \(k \in \mathbb {N}\) and \(1 \le j < J_{{m}}^+\)

$$\begin{aligned} \Vert {\textit{III}}_{k,j}\mathbf {1}({\mathcal {A}}_j)\Vert _q \lesssim \frac{\lambda _j}{{m}}\frac{\sqrt{\lambda _k \lambda _j}}{|\lambda _k - \lambda _j|}\lesssim \frac{\sqrt{\lambda _k \lambda _j}}{\sqrt{{m}}}{{m}}^{-{\mathfrak {a}}}. \end{aligned}$$

Proof of Lemma 11

Recall that \({\textit{III}}_{k,j} = (\lambda _j - {\widehat{\lambda }}_j)E_{k,j}\). By the Cauchy–Schwarz inequality and Lemma 10, we have that

$$\begin{aligned} \Vert {\textit{III}}_{k,j}\mathbf {1}({\mathcal {A}}_j)\Vert _q \lesssim \frac{\lambda _j}{\sqrt{{m}}} \Vert E_{k,j}\mathbf {1}({\mathcal {A}}_j)\Vert _{2q}. \end{aligned}$$

Hence the claims follow from inequality (7.15) and (D2). \(\square \)

For the sake of reference, we state Pisier’s inequality.

Lemma 12

Let \(p \ge 1\) and \(Y_{j},1 \le j \le J\) be a sequence of random variables. Then

$$\begin{aligned} \Vert \max _{1 \le j \le J}|Y_j|\Vert _p \le \left( \sum _{j = 1}^{J}\Vert Y_j\Vert _p^p\right) ^{1/p} \le J^{1/p} \max _{1 \le j \le J}\Vert Y_j\Vert _{p}. \end{aligned}$$
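As a quick numerical illustration (not part of the original argument), both bounds in Lemma 12 can be checked on simulated data: since \(\max _{1 \le j \le J}|Y_j|^p \le \sum _{j=1}^J |Y_j|^p\) holds pointwise, the empirical version of the inequality holds for any Monte Carlo sample.

```python
import numpy as np

rng = np.random.default_rng(0)
p, J, n_mc = 3, 10, 20_000

# n_mc Monte Carlo draws of the vector (Y_1, ..., Y_J)
Y = rng.standard_normal((n_mc, J))

# Empirical L_p norms: ||Z||_p is approximated by (mean |Z|^p)^(1/p)
lhs = np.mean(np.max(np.abs(Y), axis=1) ** p) ** (1 / p)  # || max_j |Y_j| ||_p
mom = np.mean(np.abs(Y) ** p, axis=0)                     # ||Y_j||_p^p for each j
mid = np.sum(mom) ** (1 / p)                              # (sum_j ||Y_j||_p^p)^(1/p)
rhs = J ** (1 / p) * np.max(mom) ** (1 / p)               # J^(1/p) max_j ||Y_j||_p

assert lhs <= mid <= rhs  # both inequalities of Lemma 12
```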

We are now ready to prove Theorems 1 and 2.

Proof of Theorem 1

This readily follows from Lemma 10 and Lemma 12. \(\square \)

Proof of Theorem 2

We treat the first claim. By Lemma 4 we have the decomposition

$$\begin{aligned} \widehat{e}_j - e_j = -\frac{e_j}{2}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 + \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }e_k \frac{I_{k,j} + {\textit{II}}_{k,j} + {\textit{III}}_{k,j}}{\lambda _j - \lambda _k} \mathop {=}\limits ^{def} -A_j + B_j. \end{aligned}$$
(7.22)

Note that by the triangle inequality

$$\begin{aligned} \Vert B_j\Vert _{{\mathbb {L}}^2} \le \Vert \widehat{e}_j - e_j\Vert _{{\mathbb {L}}^2} + \Vert A_j\Vert _{{\mathbb {L}}^2} \le 4. \end{aligned}$$

Let \(C_j = \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty }e_k \frac{I_{k,j}}{\lambda _j - \lambda _k}\). Then another application of the triangle inequality gives

$$\begin{aligned} \Vert \widehat{e}_j - e_j + A_j - C_j\Vert _{{\mathbb {L}}^2} \le \Vert B_j\Vert _{{\mathbb {L}}^2} + \Vert C_j\Vert _{{\mathbb {L}}^2} \le 4 + \Vert C_j\Vert _{{\mathbb {L}}^2}. \end{aligned}$$

Hence by the Cauchy–Schwarz inequality and Lemma 5

$$\begin{aligned}&\Vert \Vert \widehat{e}_j - e_j + A_j - C_j\Vert _{{\mathbb {L}}^2}\mathbf {1}({\mathcal {A}}_j^c)\Vert _p \\&\qquad \lesssim 4P({\mathcal {A}}_j^c)^{1/p} + P({\mathcal {A}}_j^c)^{1/2p} \left( \frac{1}{n}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2}\right) ^{1/2}, \end{aligned}$$

which by Lemma 8 and (D2) (arguing as in the proof of Lemma 10) is bounded by

$$\begin{aligned} \Vert \Vert \widehat{e}_j - e_j + A_j - C_j\Vert _{{\mathbb {L}}^2}\mathbf {1}({\mathcal {A}}_j^c)\Vert _p \lesssim {{m}}^{-1/2 - {\mathfrak {a}}}\left( \lambda _{J_n^+} + \sqrt{{\varLambda }_j}\right) . \end{aligned}$$

Lemma 12 and the inequality \({\varLambda }_j \ge \frac{\lambda _{j}}{\lambda _{j-1}} \gtrsim \lambda _{j} \wedge 1\) then show that it suffices to consider the event \({\mathcal {A}}_j\). Corollary 5 and Lemma 11 give

$$\begin{aligned} \left\| \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{({\textit{II}}_{k,j} + {\textit{III}}_{k,j})^2}{(\lambda _j - \lambda _k)^2}\mathbf {1}({\mathcal {A}}_j)\right\| _p \lesssim {{m}}^{-1-{\mathfrak {a}}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2}, \end{aligned}$$

hence the first claim follows from Lemma 12. Next, we treat the second claim. As before Lemma 4 yields

$$\begin{aligned} \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 = \frac{1}{4}\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^4 + \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{(I_{k,j} + {\textit{II}}_{k,j} + {\textit{III}}_{k,j})^2}{(\lambda _j - \lambda _k)^2}. \end{aligned}$$

Proceeding as in the first claim, one shows that it suffices to consider the event \({\mathcal {A}}_j\). Let \({\mathcal {D}}_j = \{\Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \le {{m}}^{-{\mathfrak {a}}} \}\). Then proceeding as in Lemma 10 we obtain

$$\begin{aligned} P({\mathcal {D}}_j^c) \lesssim {{m}}^{-{\mathfrak {a}}p 2^{{\mathfrak {p}}/2 + 2}} \lesssim {{m}}^{-p - 2{\mathfrak {a}}p} \lambda _{J_n^+}^p. \end{aligned}$$
(7.23)

We thus obtain from Lemma 5, Corollary 5, Lemma 11 and (7.23)

$$\begin{aligned}&\left\| \left( \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 - \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{I_{k,j}^2}{(\lambda _j - \lambda _k)^2}\right) \mathbf {1}({\mathcal {A}}_j)\right\| _p \lesssim {{m}}^{-{\mathfrak {a}}}\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 \mathbf {1}({\mathcal {A}}_j)\Vert _p \nonumber \\&\qquad + P({\mathcal {D}}_j^c)^{1/p} + \left\| \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{(I_{k,j} + {\textit{II}}_{k,j} + {\textit{III}}_{k,j})^2 - I_{k,j}^2}{(\lambda _j - \lambda _k)^2} \mathbf {1}({\mathcal {A}}_j)\right\| _p \nonumber \\&\quad \lesssim {{m}}^{-{\mathfrak {a}}}\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2\mathbf {1}({\mathcal {A}}_j)\Vert _p + {{m}}^{-1- 2 {\mathfrak {a}}}\lambda _{J_n^+} + {{m}}^{-1-{\mathfrak {a}}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2}. \end{aligned}$$
(7.24)

Iterating this inequality once and rearranging terms, Lemma 5 yields that

$$\begin{aligned} \left\| \left( \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}^2 - \sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{I_{k,j}^2}{(\lambda _j - \lambda _k)^2}\right) \mathbf {1}({\mathcal {A}}_j)\right\| _p \lesssim \frac{\lambda _{J_n^+}}{{{m}}^{1+2 {\mathfrak {a}}}} + \frac{1}{{{m}}^{1+{\mathfrak {a}}}}\sum _{\begin{array}{c} k = 1\\ k \ne j \end{array}}^{\infty } \frac{\lambda _j \lambda _k}{(\lambda _j - \lambda _k)^2}. \end{aligned}$$

Since \({\varLambda }_j \ge \frac{\lambda _{j}}{\lambda _{j-1}} \gtrsim \lambda _{j} \wedge 1\), an application of Lemma 12 yields the desired result. \(\square \)

Proof of Proposition 2

Observe that since \({\mathbb {E}}[\varvec{\eta }_{k,j}^{\varvec{\mathcal {D}}}]=0\) for \(k \ne j\), we get that

$$\begin{aligned} I_{k,j} = \langle (\widehat{\varvec{\mathcal {D}}} - {\varvec{\mathcal {D}}})(e_k), e_j\rangle = \sqrt{\widetilde{\lambda }_k \widetilde{\lambda }_j}(\overline{\varvec{\eta }}_{k,j}^{\varvec{\mathcal {D}}} + {\varvec{\eta }}_{k,j}^{\varvec{\mathcal {R}}}). \end{aligned}$$

Since \(\widetilde{\lambda }_j = \lambda _j/{\mathbb {E}}[\varvec{\eta }_{j,j}^{\varvec{\mathcal {D}}}]\), the claim follows from (D1) and routine calculations. \(\square \)

Proof of Corollary 1

The claim follows from Proposition 2 and (D1). \(\square \)

7.1 Proofs of Lemma 13 and Theorem 3

We first provide the following result about the convexity relations of the eigenvalue sequence \({\varvec{\lambda }}\).

Lemma 13

If (2.5) holds,  then (2.6) is valid.

Proof of Lemma 13

For the proof, the following relations are useful, which can be found in [13, 18].

$$\begin{aligned}&\text {If } j > k \text { and (2.5) holds, then } k \lambda _k \ge j \lambda _j \text { and } \lambda _k - \lambda _j \gtrsim (1 - k/j)\lambda _k. \nonumber \\&\text {Moreover, it holds that } \sum _{k > j} \lambda _k \le (j+1) \lambda _j. \end{aligned}$$
(7.25)

Now by (7.25) we have

$$\begin{aligned} \sum _{\begin{array}{c} k =1\\ k \ne j \end{array}}^{\infty }\frac{\lambda _k \lambda _j}{(\lambda _j - \lambda _k)^2} \lesssim j^2 \sum _{k < j} \frac{\lambda _j \lambda _k }{(k - j)^2 \lambda _k^2} + \sum _{j < k \le 2 j} \frac{k^2 \lambda _j \lambda _k }{(k - j)^2 \lambda _j^2} + \sum _{k > 2j} \frac{\lambda _j \lambda _k }{\lambda _j^2}\lesssim j^2. \end{aligned}$$

In the same manner, one shows that

$$\begin{aligned} \sum _{\begin{array}{c} k =1\\ k \ne j \end{array}}^{\infty }\frac{\lambda _k}{|\lambda _j - \lambda _k|} \lesssim j \log j. \end{aligned}$$

\(\square \)
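The two bounds in the proof above can be sanity-checked numerically for a concrete convex spectrum; the choice \(\lambda _k = k^{-2}\) below is our own illustration and is not taken from the paper.

```python
import numpy as np

def ratios(j, N=200_000):
    """Both sums of Lemma 13 for lam_k = k^{-2}, normalized by j^2 and j*log(j)."""
    k = np.arange(1, N + 1, dtype=float)
    lam = k ** -2.0
    lam_j = float(j) ** -2.0
    mask = k != j
    diff = lam_j - lam[mask]
    s1 = np.sum(lam[mask] * lam_j / diff ** 2)   # should be O(j^2)
    s2 = np.sum(lam[mask] / np.abs(diff))        # should be O(j log j)
    return s1 / j ** 2, s2 / (j * np.log(j))

for j in (10, 50, 200):
    r1, r2 = ratios(j)
    assert 0 < r1 < 10 and 0 < r2 < 10   # bounded ratios, matching Lemma 13
```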

Proof of Theorem 3

First note that due to the Gaussianity of \(\mathbf{X}\), the scores \(\eta _{k,i}\) and \(\eta _{k,j}\) are mutually independent for \(i \ne j\). Given independent standard Gaussian random variables \(X\) and \(Y\), the function \(XY-1\) is a two-dimensional Hermite polynomial of second degree; if \(X = Y\), then \(X^2-1\) is a univariate Hermite polynomial of second degree. We may now invoke Theorem 4 in [3], whose proof is based on the method of moments for partial sums of Hermite polynomials. In particular, using that \(\sup _{j \in \mathbb {N}}\sum _{k = 0}^{\infty }{\mathrm {Cov}} (\eta _{0,j},\eta _{k,j})^2 < \infty \) (which follows from \(\alpha > 3/4\)), it is shown via the diagram formula that for any fixed \(p \in \mathbb {N}\)

$$\begin{aligned} \sqrt{n}\max _{1 \le i,j \le \infty }\Vert \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {C}}}\Vert _p < \infty \quad \text {and} \quad \sqrt{n}\overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {C}}} \xrightarrow {w} {\mathcal {N}}(0,\sigma _{i,j}^2). \end{aligned}$$
(7.26)

Moreover, since \(\alpha > 3/4\) one readily shows that \(\max _{j \in \mathbb {N}}\bigl \Vert n^{-3/4}\sum _{k = 1}^n \eta _{k,j}\bigr \Vert _{2q} = \mathcal {O}(1)\) for any fixed \(q \in \mathbb {N}\). Hence (C1) holds and using Proposition 2 the CLT for \({\widehat{\lambda }}_j\) follows. \(\square \)

8 Proofs of Sects. 3 and 4

For the proof of Proposition 3, we require some preliminary results.

Lemma 14

For \(p \ge 2,\) let \(\{X_k\}_{k \in \mathbb {Z}} \in {\mathbb {L}}^2\) satisfy

$$\begin{aligned} \sum _{k = 1}^{\infty } \Vert \Vert X_k - X_k'\Vert _{{\mathbb {L}}^2}\Vert _p < \infty . \quad \text {Then} \quad \Vert \Vert X_1 + \cdots + X_n \Vert _{{\mathbb {L}}^2}\Vert _p \lesssim \sqrt{n}. \end{aligned}$$

Lemma 14 is a byproduct of the results in [42]; see also Lemma 16, as well as [58] for the original argument for real-valued sequences, which we also use in the sequel. As a next result, we state a special type of Hoeffding decomposition.

Lemma 15

Let \(\{X_k\}_{k \in \mathbb {Z}}, \{Y_k\}_{k \in \mathbb {Z}} \in \mathbb {R}\) be stationary such that for \(p \ge 2\)

$$\begin{aligned} \sum _{k = 1}^{\infty } \Vert X_k - X_k'\Vert _{2p} < \infty , \quad \sum _{k = 1}^{\infty } \Vert Y_k - Y_k'\Vert _{2p} < \infty . \end{aligned}$$
(8.1)

Denote with \(A_{k} = (X_k - {\mathbb {E}}[X_k]){\mathbb {E}}[Y_1] + (Y_k - {\mathbb {E}}[Y_k]){\mathbb {E}}[X_1]\). Then

  1. (i)

    \(\Vert \sum _{1 \le k,l \le n} X_k Y_l - n \sum _{k = 1}^n A_k - n^2 {\mathbb {E}}[X_1]{\mathbb {E}}[Y_1]\Vert _{p} \lesssim n\),

  2. (ii)

    \(\Vert \sum _{k = 1}^n A_k\Vert _{2p} \lesssim \sqrt{n}\).

Proof of Lemma 15

Using the Hoeffding decomposition

$$\begin{aligned} \sum _{1 \le k,l \le n} X_k Y_l=\sum _{1 \le k,l \le n} (X_k - {\mathbb {E}}[X_k])(Y_l - {\mathbb {E}}[Y_l]) + n^2 {\mathbb {E}}[X_1] {\mathbb {E}}[Y_1]+ n \sum _{k = 1}^n A_k, \end{aligned}$$

claim (i) follows from the triangle inequality, Cauchy–Schwarz and Lemma 16. Claim (ii) follows directly from Lemma 16. \(\square \)
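The Hoeffding decomposition displayed above is a purely algebraic identity once \({\mathbb {E}}[X_1]\) and \({\mathbb {E}}[Y_1]\) are treated as fixed constants; the following minimal check, with arbitrary stand-in values for the two means, is our own illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.normal(2.0, 1.0, size=n)
Y = rng.normal(-1.0, 3.0, size=n)
mx, my = 2.0, -1.0        # stand-ins for E[X_1] and E[Y_1]

A = (X - mx) * my + (Y - my) * mx     # the A_k of Lemma 15

lhs = np.sum(X) * np.sum(Y)           # sum_{1 <= k,l <= n} X_k Y_l
rhs = (np.sum(X - mx) * np.sum(Y - my)    # centered double sum
       + n ** 2 * mx * my
       + n * np.sum(A))

assert np.isclose(lhs, rhs)           # the decomposition is exact
```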

Proof of Proposition 3

Let us first mention that the assumptions of Proposition 3 clearly imply those of Lemmas 14 and 15. As another preliminary remark, observe that \({\mathbb {E}}[\Vert X_k\Vert _{{\mathbb {L}}^2}^{4p}] < \infty \) implies that \({\varvec{\mathcal {C}}}_h\) exists and \(\overline{X}_k = \sum _{j = 1}^{\infty } \widetilde{\lambda }_j^{1/2} \eta _{k,j} e_j\) with \(\sum _{j = 1}^{\infty } \widetilde{\lambda }_j < \infty \). Next, denote with

$$\begin{aligned} \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}} = -(\widetilde{\lambda }_i \widetilde{\lambda }_j)^{-1/2} \langle \widehat{\varvec{\mathcal {D}}}(e_i), e_j \rangle + \varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}, \quad i,j \in \mathbb {N}. \end{aligned}$$
(8.2)

Employing Lemma 14, lengthy routine calculations reveal that (here condition \({\mathfrak {b}}> 3/2\) is helpful)

$$\begin{aligned}&{\mathbb {E}}\left[ \left\| \sum _{1 \le k,l \le n-h} \langle X_{l+h} - {\bar{X}}_n, X_{k+h} - {\bar{X}}_n \rangle \langle X_k- {\bar{X}}_n, \cdot \rangle (X_l - {\bar{X}}_n)\right. \right. \nonumber \\&\quad \left. \left. - \sum _{1 \le k,l \le n-h} \langle X_{l+h} - \mu , X_{k+h} - \mu \rangle \langle X_k- \mu , \cdot \rangle (X_l - \mu )\right\| _{{\mathbb {L}}^2}^{p}\right] \lesssim n^p,\quad \end{aligned}$$
(8.3)

where we spare the details. Observe next that we have the representation

$$\begin{aligned}&\sum _{1 \le k,l \le n-h} \langle X_{l+h} - \mu , X_{k+h} - \mu \rangle \langle X_k- \mu , \cdot \rangle (X_l - \mu ) \nonumber \\&\quad = \sum _{1 \le k,l \le n-h} \sum _{i,j = 1}^{\infty } \sqrt{\widetilde{\lambda }_i \widetilde{\lambda }_j} \sum _{r = 1}^{\infty } \widetilde{\lambda }_r \eta _{l+h,r}\eta _{l,i} \eta _{k+h,r} \eta _{k,j} \langle e_i, \cdot \rangle e_j. \end{aligned}$$
(8.4)

From the triangle inequality and Cauchy–Schwarz, we obtain

$$\begin{aligned} \max _{i,r \in \mathbb {N}}\Vert \eta _{l+h,r}\eta _{l,i} - (\eta _{l+h,r}\eta _{l,i})'\Vert _{2p} \lesssim {\varOmega }_{4p}(l+h) + {\varOmega }_{4p}(l), \quad l,h \in \mathbb {N}.\quad \end{aligned}$$
(8.5)

Hence by (8.3), Lemma 15(i) (using (8.5)) and \(\sum _{r = 1}^{\infty } \widetilde{\lambda }_r < \infty \), we obtain

$$\begin{aligned} n^{1/2}\max _{i,j \in \mathbb {N}}\Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\Vert _p \lesssim n^{-1/2}. \end{aligned}$$

Next, using Lemma 15(ii) (applicable by (8.5)) we get

$$\begin{aligned} n^{1/2}\max _{i,j \in \mathbb {N}}\Vert \overline{\varvec{\eta }}_{i,j}^{\varvec{\mathcal {D}}}\Vert _p < \infty . \end{aligned}$$

Finally, we remark that the same calculations used to derive (3.6) also reveal that \({\mathbb {E}}[\varvec{\eta }_{i,j}^{\varvec{\mathcal {D}}}] = 0\) for \(i \ne j\). Hence (2.2) holds, which completes the proof. \(\square \)

Proof of Theorem 5

Note first that an application of Lemma 14 together with routine calculations gives

$$\begin{aligned} \Vert \Vert \widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h \Vert _{{\mathcal {L}}}\Vert _{p'} \lesssim n^{-1/2}, \quad 1 \le p' \le p 2^{{\mathfrak {p}}+ 2}. \end{aligned}$$
(8.6)

Let us make the decomposition

$$\begin{aligned} \widehat{f}_j - f_j = ({\widehat{\lambda }}_j^{1/2}\widehat{f}_j - {\lambda }_j^{1/2}{f}_j + (\lambda _j^{1/2}- {\widehat{\lambda }}_j^{1/2}){f}_j)\left( \frac{1}{\lambda _j^{1/2}} + \frac{\lambda _j^{1/2} - {\widehat{\lambda }}_j^{1/2}}{({\widehat{\lambda }}_j \lambda _j)^{1/2}}\right) , \end{aligned}$$

and also

$$\begin{aligned} {\widehat{\lambda }}_j^{1/2} \widehat{f}_j - {\lambda }_j^{1/2}{f}_j= & {} \widehat{\varvec{\mathcal {C}}}_h({\widehat{e}}_j) - {\varvec{\mathcal {C}}}_h({e}_j) \nonumber \\= & {} {\varvec{\mathcal {C}}}_h({\widehat{e}}_j - e_j) +(\widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h)(e_j) + (\widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h)({\widehat{e}}_j - e_j).\qquad \quad \end{aligned}$$
(8.7)

Using (8.6), elementary computations yield

$$\begin{aligned}&\Vert \Vert {\widehat{\lambda }}_j^{1/2} \widehat{f}_j - \lambda _j^{1/2}f_j\Vert _{{\mathbb {L}}^2}\Vert _{p'} \nonumber \\&\quad \le \Vert {\varvec{\mathcal {C}}}_h\Vert _{\mathcal {L}} \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\Vert _{p'} + \Vert \Vert \widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h\Vert _{\mathcal {L}}\Vert _{2p'}(1 + \Vert \Vert {\widehat{e}}_j-e_j\Vert _{{\mathbb {L}}^2}\Vert _{2p'}) \nonumber \\&\quad \lesssim \Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\Vert _{2p'} + n^{-1/2}, \quad 1 \le p' \le p 2^{{\mathfrak {p}}+ 2}. \end{aligned}$$
(8.8)

Next, for \(j \in \mathbb {N}\) consider the set \({\mathcal {C}}_j\) defined as

$$\begin{aligned} {\mathcal {C}}_j = \{{\widehat{\lambda }}_j > \lambda _j/2\}, \qquad P({\mathcal {C}}_j^c) \lesssim n^{-2p} \quad j \in \mathbb {N}, \end{aligned}$$
(8.9)

where the bound for \(P({\mathcal {C}}_j^c)\) follows from Markov’s inequality and Lemma 10. Since \(\Vert \widehat{f}_j\Vert _{{\mathbb {L}}^2} = \Vert {f}_j\Vert _{{\mathbb {L}}^2} = 1\), we thus obtain

$$\begin{aligned} \Vert \Vert \widehat{f}_j - f_j\Vert _{{\mathbb {L}}^2}\mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{p'} \le 2 \Vert \mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{p'} \lesssim n^{-2p/p'}, \quad p' \ge 1. \end{aligned}$$
(8.10)

Similarly, since \({\varvec{\mathcal {C}}}_h\) is a bounded operator, the triangle inequality, Cauchy–Schwarz, Lemma 10, (8.6) and (8.9) yield for \(1 \le p' \le p\)

$$\begin{aligned}&\Vert \Vert ({\widehat{\lambda }}_j - \lambda _j)f_j/(2\lambda _j^{1/2}) + {\varvec{\mathcal {C}}}_h({\widehat{e}}_j - e_j) + (\widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h)(e_j)\Vert _{{\mathbb {L}}^2} \mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{p'} \\&\quad \lesssim \Vert {\widehat{\lambda }}_j - \lambda _j\Vert _{2p'}\Vert \mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{2p'}/\lambda _j^{1/2} + \Vert \mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{p'} + \Vert \Vert \widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h\Vert _{{\mathcal {L}}}\Vert _{2p'} \Vert \mathbf {1}_{{\mathcal {C}}_j^c}\Vert _{2p'} \\&\quad \lesssim n^{-1/2 - p/p'}\lambda _j^{1/2} + n^{-2p/p'} + n^{-1/2 - p/p'} \lesssim n^{-3/2}. \end{aligned}$$

Multiplying with \(\lambda _j^{-1/2}\), we see that it suffices to establish the claim on the set \({\mathcal {C}}_j\). To this end, observe that

$$\begin{aligned} |{\widehat{\lambda }}_j^{1/2} - \lambda _j^{1/2} - \frac{{\widehat{\lambda }}_j - \lambda _j}{2\lambda _j^{1/2}}|\le \frac{({\widehat{\lambda }}_j - \lambda _j)^2}{2 \lambda _j^{3/2}}, \quad j \in \mathbb {N}. \end{aligned}$$
(8.11)

Then (8.8), (8.11), Cauchy–Schwarz, the triangle inequality and Lemma 10 yield

$$\begin{aligned}&\left\| \Vert {\widehat{\lambda }}_j^{1/2}\widehat{f}_j - {\lambda }_j^{1/2}{f}_j + (\lambda _j^{1/2}- {\widehat{\lambda }}_j^{1/2}){f}_j\Vert _{{\mathbb {L}}^2} \frac{\lambda _j^{1/2} - {\widehat{\lambda }}_j^{1/2}}{({\widehat{\lambda }}_j \lambda _j)^{1/2}}\mathbf {1}_{{\mathcal {C}}_j}\right\| _{p'} \nonumber \\&\quad \lesssim (\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\Vert _{4p'} + n^{-1/2})(\lambda _j n)^{-1/2}, \quad 1 \le p' \le p. \end{aligned}$$
(8.12)

Using (8.6) and (8.7) together with Cauchy–Schwarz, (8.11) together with Lemma 10 and combining this with (8.12), the triangle inequality gives

$$\begin{aligned}&\left\| \left( \left\| \widehat{f}_j - f_j - \frac{({\widehat{\lambda }}_j - \lambda _j)f_j}{2\lambda _j} - \frac{{\varvec{\mathcal {C}}}_h({\widehat{e}}_j - e_j) + (\widehat{\varvec{\mathcal {C}}}_h - {\varvec{\mathcal {C}}}_h)(e_j)}{\lambda _j^{1/2}}\right\| _{{\mathbb {L}}^2}\right) \mathbf {1}_{{\mathcal {C}}_j} \right\| _{p'} \nonumber \\&\quad \lesssim \frac{1}{\sqrt{\lambda _j n}}\Vert \Vert {\widehat{e}}_j - e_j\Vert _{{\mathbb {L}}^2}\Vert _{4p'} + \frac{1}{\sqrt{\lambda _j}n} +\frac{1}{n}. \end{aligned}$$
(8.13)

\(\square \)
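The elementary bound (8.11) used in the proof above follows from the identity \(\sqrt{a} - \sqrt{b} - \frac{a-b}{2\sqrt{b}} = -\frac{(a-b)^2}{2\sqrt{b}(\sqrt{a}+\sqrt{b})^2}\) together with \((\sqrt{a}+\sqrt{b})^2 \ge b\); a quick numerical confirmation over a grid (our own illustration, with an arbitrary value for \(\lambda _j\)):

```python
import numpy as np

lam = 0.7                                  # plays the role of lambda_j (our choice)
lam_hat = np.linspace(1e-6, 5.0, 10_001)   # grid of candidate values for the estimate

lhs = np.abs(np.sqrt(lam_hat) - np.sqrt(lam)
             - (lam_hat - lam) / (2 * np.sqrt(lam)))
rhs = (lam_hat - lam) ** 2 / (2 * lam ** 1.5)

assert np.all(lhs <= rhs + 1e-12)          # (8.11) holds on the whole grid
```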

Proof of Theorem 6

Since \(\sum _{h \in \mathbb {Z}} \Vert {\varvec{\mathcal {C}}}_h \Vert _{{\mathcal {L}}} < \infty \), \({\varvec{\mathcal {G}}}^{b}\) exists, and since \({\varvec{\mathcal {C}}}_h^* = {\varvec{\mathcal {C}}}_{-h}\), \({\varvec{\mathcal {G}}}^{b}\) is symmetric. Hence by the spectral theorem, (4.5) holds. Together with (4.8), this gives (2.1) and (2.2). It remains to derive a bound for \(\varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}} = \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}(n)\). To this end, put \(\bar{\eta }_j^b = \bar{\eta }_j^b(n) = \langle {\bar{X}}_n-\mu , e_j^b \rangle (\widetilde{\lambda }_j^b)^{-1/2}\). Since \(b = \mathcal {O}(n)\), routine calculations then reveal the upper bound

$$\begin{aligned} \Vert \varvec{\eta }_{i,j}^{\varvec{\mathcal {R}}}\Vert _q \lesssim \frac{1}{n} \sum _{h = 1}^b \left( \left\| \sum _{k = h+1}^n \eta _{k,i}^b \bar{\eta }_{j}^b\right\| _q + \left\| \sum _{k = h+1}^n \bar{\eta }_{i}^b \eta _{k-h,j}^b \right\| _q + n\Vert \bar{\eta }_{i}^b \bar{\eta }_{j}^b\Vert _q\right) .\qquad \quad \end{aligned}$$
(8.14)

Using Cauchy–Schwarz and (G1)\(^{b}\), the claim then follows. \(\square \)

9 Proofs of Sect. 5

We need to introduce some further notation. To this end, we slightly reformulate our notion of weak dependence in an equivalent way. In the sequel, \(\{\epsilon _{k}\}_{k \in \mathbb {Z}} \in \mathbb {S}\) denotes an IID sequence in some measure space \(\mathbb {S}\) and \({\mathcal {F}}_k = \sigma (\epsilon _j,\, j \le k)\) the corresponding filtration. For \(d \in \mathbb {N}\), we then consider the variables

$$\begin{aligned} U_{k,h} = H_h({\mathcal {F}}_k), \quad k \in \mathbb {Z},\, 1 \le h \le d, \end{aligned}$$

where \(H_h\) are measurable functions. Compared to Sect. 5, this setup is notationally more convenient. As a measure of dependence, we then consider

$$\begin{aligned} \theta _{j,p} = \max _{1 \le h \le d}\Vert U_{j,h} - U_{j,h}'\Vert _p, \quad p \ge 1, \end{aligned}$$

where \(U_{k,h}' = H_h({\mathcal {F}}_k')\), \({\mathcal {F}}_k' = \sigma (\ldots \epsilon _{-1},\epsilon _0', \epsilon _1, \ldots , \epsilon _k)\), and \(\{\epsilon _k'\}_{k \in \mathbb {Z}}\) is an independent copy of \(\{\epsilon _k\}_{k \in \mathbb {Z}}\).
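For instance, for a causal linear process \(U_{k,h} = \sum _{i \ge 0} a_i \epsilon _{k-i}\), only the innovation \(\epsilon _0\) is replaced in \(U_{j,h}'\), so \(U_{j,h} - U_{j,h}' = a_j(\epsilon _0 - \epsilon _0')\) and \(\theta _{j,p} = |a_j| \Vert \epsilon _0 - \epsilon _0'\Vert _p\). The following sketch of this coupling (with truncated coefficients; the model and parameters are our own illustration) verifies the identity:

```python
import numpy as np

rng = np.random.default_rng(2)
lag = 60                                 # truncation of the coefficient sequence
a = 1.0 / (1.0 + np.arange(lag)) ** 2    # a_i ~ (i+1)^{-2}, summable

eps = rng.standard_normal(2 * lag)       # innovations eps_t, t = -(lag-1), ..., lag
eps_prime = eps.copy()
eps_prime[lag - 1] = rng.standard_normal()   # replace eps_0 by an independent copy

def U(j, innov):
    """U_j = sum_{i=0}^{lag-1} a_i eps_{j-i}; array index lag-1 holds eps_0."""
    idx = (lag - 1) + j - np.arange(lag)     # positions of eps_j, ..., eps_{j-lag+1}
    return np.sum(a * innov[idx])

for j in range(lag):
    diff = U(j, eps) - U(j, eps_prime)
    assert np.isclose(diff, a[j] * (eps[lag - 1] - eps_prime[lag - 1]))
```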

9.1 Gaussian approximation for weak dependence

In this section, a high-dimensional Gaussian approximation result is established, which is a key ingredient in the proof of Theorem 7 and may be of independent interest. Let \(S_{n,h} = \sum _{k = 1}^n U_{k,h}\), and denote with

$$\begin{aligned} T_d = \frac{1}{\sqrt{n}} \max _{1 \le h \le d}|S_{n,h}|, \quad T_d^{Z} = \max _{1 \le h \le d}|Z_h|, \end{aligned}$$
(9.1)

where \(\{Z_h\}_{1 \le h \le d}\) is a sequence of zero mean Gaussian random variables. We also formally introduce

$$\begin{aligned} \gamma _{i,j} = \lim _{n \rightarrow \infty }\frac{1}{n}{\mathbb {E}}[S_{n,i} S_{n,j}], \end{aligned}$$

whose existence is established in Lemma 19 below. We also put \(\sigma _h^2 = \gamma _{h,h}\). Throughout this section, we work under the following assumption.

Assumption 4

The sequence \(\{U_{k,h}\}_{k \in \mathbb {Z}}\) is stationary for each \(1 \le h \le d\), such that for \(p > 2\) and \(d \lesssim n^{{\mathfrak {d}}}\)

(F1):

\({\mathbb {E}}[U_{k,h}] = 0\) and \(\theta _{j,p} \lesssim j^{-{\mathfrak {c}}}\) with \({\mathfrak {c}}> 3/2 \),

(F2):

\({\mathfrak {d}}< p/2 - 1\),

(F3):

\(\inf _h \sigma _h > 0\).

We then have the following Gaussian approximation result.

Theorem 8

Grant Assumption 4. Then

$$\begin{aligned} \sup _{x \in \mathbb {R}}|P(T_d \le x)- P(T_d^{Z} \le x )|\lesssim n^{-C}, \quad C > 0, \end{aligned}$$

where \(\{Z_h\}_{1 \le h \le d}\) has the same covariance structure as \(n^{-1/2}\{S_{n,h}\}_{1 \le h \le d}\). Alternatively,  we may also choose \((\gamma _{i,j})_{1 \le i,j \le d}\) as covariance structure.

We first establish some additional notation. Let \(K = n^{{\mathfrak {k}}}\), \(L = n^{{\mathfrak {l}}}\) such that \(n = K L\) and \(0 < {\mathfrak {k}},{\mathfrak {l}}< 1\). To simplify the discussion, we always assume that \(K,L \in \mathbb {N}\). For each \(1 \le l \le L\), let \(\{\epsilon _{k}^{l}\}_{k \in \mathbb {Z}} \in \mathbb {S}\) be mutually independent sequences of IID random variables. For \(K(l-1) < k \le Kl\), \(1 \le l \le L\), denote with

$$\begin{aligned} U_{k,h}^{(K,\diamond )} = H_h({\mathcal {F}}_{k,h}^{K,\diamond }), \quad \text {where }{\mathcal {F}}_{k,h}^{K,\diamond } = \sigma ({\mathcal {F}}_{K(l-1)}^l,\epsilon _{K(l-1)+1}, \epsilon _{K(l-1)+2},\ldots ,\epsilon _k), \end{aligned}$$

where \({\mathcal {F}}_k^l = \sigma (\epsilon _j^l,\, j \le k)\). For \(1 \le m < K\) put

$$\begin{aligned} V_{l,h}^{\diamond }(m) = \sum _{k = K(l-1) + 1}^{K(l-1) + m - 1} U_{k,h} + \sum _{k = K(l-1) + m}^{Kl} U_{k,h}^{(K,\diamond )}, \end{aligned}$$
(9.2)

and \(V_{l,h}^{\diamond } = V_{l,h}^{\diamond }(1)\). The random variables \(V_{l,h}^{\diamond }\) play a key role in the proof of Theorem 8. Note in particular that \(\{V_{l,h}^{\diamond }\}_{1 \le l \le L}\) is IID by construction for each h. Finally, put \(S_{L,h}(V) = \sum _{l = 1}^L V_{l,h}\) and \(S_{L,h}^{\diamond }(V) = \sum _{l = 1}^L V_{l,h}^{\diamond }\), and note that \(S_{n,h} = S_{L,h}(V)\). In the sequel, we make frequent use of the following lemma.

Lemma 16

Suppose that \(\sum _{j = 1}^{\infty } \theta _{j,p} < \infty \) for \(p \ge 2\). Then

$$\begin{aligned} \max _{1 \le h \le d}\Vert U_{1,h} + \cdots + U_{n,h} \Vert _p \lesssim \sqrt{n}. \end{aligned}$$

For the proof and variants of this result, see [58]. The next lemma controls the approximation error between \(S_{L,h}(V)\) and \(S_{L,h}^{\diamond }(V)\).
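Before turning to that, the construction (9.2) with \(m = 1\) can be pictured as follows: every summand of \(V_{l,h}^{\diamond }\) replaces the history before block \(l\) by the fresh, independent past \({\mathcal {F}}_{K(l-1)}^l\). The following sketch for a truncated moving average (our own illustration; the filter, its coefficients and the block length are placeholders) builds both versions of one block sum:

```python
import numpy as np

rng = np.random.default_rng(3)
K, lag = 20, 40                        # block length and MA truncation (placeholders)
a = 0.8 ** np.arange(lag)              # coefficients of the (truncated) filter H

inside = rng.standard_normal(K)        # eps_1, ..., eps_K, shared by both versions
past = rng.standard_normal(lag - 1)    # true past: eps_0, eps_{-1}, ...
fresh = rng.standard_normal(lag - 1)   # independent history playing F^l_{K(l-1)}

def V(history):
    """Block sum sum_{k=1}^K U_k with U_k = sum_i a_i eps_{k-i}."""
    eps = np.concatenate([history[::-1], inside])   # eps_{-(lag-2)}, ..., eps_K
    return sum(np.dot(a, eps[lag - 2 + k::-1][:lag]) for k in range(1, K + 1))

v_true = V(past)       # V_{l,h}, built from the actual past
v_diamond = V(fresh)   # V_{l,h}^diamond, built from the fresh history

assert np.isclose(V(past), v_true)   # deterministic in the innovations
assert v_true != v_diamond           # the two couplings genuinely differ
```

Only the pre-block history changes between the two versions; since the coefficients are summable, this difference is what Lemma 17 controls.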

Lemma 17

Grant Assumption 4. For any \(K = n^{{\mathfrak {k}}}\) with \(0 < {\mathfrak {k}}< 1\) there exists a \(\delta > 0\) and a constant \(C > 0\) such that

$$\begin{aligned} P(|S_{L,h}(V) - S_{L,h}^{\diamond }(V)| \ge C n^{1/2 - \delta }) \lesssim n^{-\frac{p-2}{2} + p \delta }. \end{aligned}$$

Proof of Lemma 17

Let \(x_n = x \sqrt{n}\), \(x > 0\). For \(1 \le m < K\) we have that

$$\begin{aligned} P(|S_{L,h}(V) - S_{L,h}^{\diamond }(V)|\ge 2 x_n )\le & {} P\left( \left| \sum _{l = 1}^L \sum _{k = K(l-1) + 1}^{K(l-1) + m - 1} U_{k,h} - U_{k,h}^{(K,\diamond )}\right| \ge x_n\right) \\&+ P\left( \left| \sum _{l = 1}^L V_{l,h}^{} - V_{l,h}^{\diamond }(m) \right| \ge x_n\right) . \end{aligned}$$

Denote with \(\alpha _{j,p} = (j^{p/2 - 1} \theta _{j,p}^p)^{1/(p+1)}\) and \(A = \sum _{j = 1}^{\infty } \alpha _{j,p}\). Note that by (F1) we have

$$\begin{aligned} \alpha _{j,p} \lesssim j^{-{\mathfrak {B}}(p,{\mathfrak {c}})}, \quad \text {where } {\mathfrak {B}}(p,{\mathfrak {c}}) = \frac{p({\mathfrak {c}}-1/2)+1}{p+1} > 1, \end{aligned}$$
(9.3)

and thus \(A < \infty \). Due to Theorem 2 in [46], there exist constants \(C_{p,1},C_{p,2} > 0\) such that

$$\begin{aligned} P\left( \left| \sum _{l = 1}^L \sum _{k = K(l-1) + 1}^{K(l-1) + m - 1} U_{k,h} - U_{k,h}^{(K,\diamond )}\right| \ge x_n \right)\le & {} \frac{C_{p,1} Lm}{x_n^p} + \sum _{j = 1}^{\infty } \exp \left( -\frac{C_{p,2} \alpha _{j,p}^2 x_n^2}{A^2 L m \theta _{j,2}^2}\right) \\&+ \exp \left( -\frac{C_{p,2} x_n^2}{L m\Vert U_{k,h}\Vert _2^2}\right) . \end{aligned}$$

Setting \(x = y\sqrt{L \, m} A^{1 + 1/p}/\sqrt{n}\), it follows that \(\alpha _{j,p}^2 x_n^2 /(A^2 L\,m \theta _{j,2}^2) \ge j^{1 - 2/p}y^2\) and hence

$$\begin{aligned} \exp \left( -\frac{C_{p,2} \alpha _{j,p}^2 x_n^2}{A^2 L\,m \theta _{j,2}^2}\right) \le \exp (-C_{p,2} j^{1 - 2/p}y^2). \end{aligned}$$

Choosing m such that \(\sqrt{n}/\sqrt{L m} = n^{2 \delta }\) and \(y = n^{\delta }\), \(\delta > 0\), it follows that

$$\begin{aligned} P\left( \left| \sum _{l = 1}^L \sum _{k = K(l-1) + 1}^{K(l-1) + m - 1} U_{k,h} - U_{k,h}^{(K,\diamond )}\right| \ge n^{1/2 - \delta } A^{1 + 1/p}\right) \lesssim n^{-\frac{p-2}{2} + p\delta }. \end{aligned}$$
(9.4)

Next, put \({\varDelta }_{k,h}(U) = U_{k,h} - U_{k,h}^{(K,\diamond )}\). By the triangle inequality, we have

$$\begin{aligned} \Vert {\varDelta }_{k,h}(U) - {\varDelta }_{k,h}(U)'\Vert _p \le 2 (\theta _{k,p} \wedge \Vert {\varDelta }_{k,h}(U)\Vert _p). \end{aligned}$$

Let \((k)_K = k \mod K\). Then Theorem 1 in [57] yields that

$$\begin{aligned} \max _{1 \le h \le d}\Vert {\varDelta }_{k,h}(U)\Vert _p^2 = \max _{1 \le h \le d}\Vert U_{k,h} - U_{k,h}^{(K,\diamond )}\Vert _p^2 \lesssim \sum _{j = (k)_K}^{\infty } \theta _{j,p}^2 \mathop {=}\limits ^{def} {\varTheta }_{(k)_K,p}. \end{aligned}$$

Since \({\varTheta }_{j,p}\) is clearly monotone decreasing in \(j\), we have \({\varTheta }_{(k)_K,p} \le {\varTheta }_{(m)_K,p}\) for \(m \le (k)_K\). Combining this with the above, it follows that for \(m \le (k)_K\) (since \(m = (m)_K\))

$$\begin{aligned} \max _{1 \le h \le d}\Vert {\varDelta }_{k,h}(U) - {\varDelta }_{k,h}(U)'\Vert _p \le 2 \left( \theta _{k,p} \wedge \sqrt{{\varTheta }_{m,p}}\right) \mathop {=}\limits ^{def}\vartheta _{k,p}(m). \end{aligned}$$
(9.5)

Put \(\beta _{j,p}(m) = (j^{p/2 - 1} \vartheta _{j,p}^p(m))^{1/(p+1)}\) and \(B(m) = \sum _{j = 1}^{\infty } \beta _{j,p}(m)\). Then another application of Theorem 2 in [46] yields that

$$\begin{aligned} P\left( \left| \sum _{l = 1}^L V_{l,h}^{} - V_{l,h}^{\diamond }(m) \right| \ge x_n\right)\le & {} C_{p,1}\frac{n}{x_n^p} + \sum _{j = 1}^{\infty } \exp \left( -\frac{C_{p,2} \beta _{j,p}^2(m) x_n^2}{B(m)^2 n \vartheta _{j,2}^2(m)}\right) \\&+\exp \left( -\frac{C_{p,2} x_n^2}{n \max _{k \ge m}\Vert {\varDelta }_{k,h}(U)\Vert _2^2}\right) . \end{aligned}$$

Let \(y_n = n^{\delta } \sqrt{L m}/\sqrt{n} = n^{-\delta }\). Arguing similarly as before, it follows (since \(m = (m)_K\))

$$\begin{aligned} P\left( \left| \sum _{l = 1}^L V_{l,h}^{} - V_{l,h}^{\diamond }(m) \right| \ge x_n\right)\lesssim & {} \frac{n}{x_n^p} + \sum _{j = 1}^{\infty } \exp \left( -\frac{C_{p,2} j^{1 - 2/p}y_n^2}{B(m)^2}\right) \\&+\exp \left( -\frac{C_{p,2} y_n^2}{{\varTheta }_{m,p}}\right) . \end{aligned}$$

Since \({\varTheta }_{m,p} \lesssim m^{-2 {\mathfrak {c}}+ 1}\), we conclude that for any \(M \ge 1\)

$$\begin{aligned} B(m) \lesssim \sum _{j > M} \alpha _{j,p} + \sum _{j = 1}^M (j^{p/2 - 1} m^{-p {\mathfrak {c}}+ p/2})^{1/(p+1)} \lesssim M^{-{\mathfrak {B}}(p,{\mathfrak {c}})+1} + M^{\frac{3p}{2p + 2}} m^{\frac{-2p {\mathfrak {c}}+ p}{2p + 2}}. \end{aligned}$$

Setting \(m \thicksim n^{\nu }\), \(\nu > 0\), balancing the above and choosing \(\delta \) sufficiently small, we obtain \(y_n^2 B(m)^{-2} \wedge y_n^2/{\varTheta }_{m,p} \gtrsim n^{\delta }\). This implies that

$$\begin{aligned} P\left( \left| \sum _{l = 1}^L V_{l,h}^{} - V_{l,h}^{\diamond }(m) \right| \ge n^{1/2 - \delta } A^{1+1/p}\right) \lesssim n^{-\frac{p-2}{2} + p\delta }. \end{aligned}$$

Note that by the above choice of \(m = n^{\nu }\) we require that \(L \thicksim n^{1 - 4 \delta - \nu }\). Choosing \(\nu \) sufficiently close to 1, we can select \({\mathfrak {k}}< 1\) arbitrarily close to 1, which completes the proof. \(\square \)

In the sequel, we also require the following result.

Lemma 18

Grant Assumption 4. Then

$$\begin{aligned} P\left( |V_{l,h}^{\diamond }| \ge \sqrt{K} \log n \right) \lesssim K^{1 - p/2} (\log n)^{-p}. \end{aligned}$$

Proof of Lemma 18

Since \(V_{l,h}^{\diamond } \mathop {=}\limits ^{d} V_{l,h}\), arguing similarly as in the proof of Lemma 17, Theorem 2 in [46] yields

$$\begin{aligned} P\left( |V_{l,h}^{\diamond }|\ge y \sqrt{K}\right)\lesssim & {} \frac{K^{1-p/2}}{y^p} + \sum _{j = 1}^{\infty } \exp \left( -\frac{C_{p,2} j^{1 - 2/p}y^2}{A^2}\right) \\&+\exp \left( -\frac{C_{p,2} y^2}{\Vert U_{k,h}\Vert _2^2}\right) . \end{aligned}$$

Setting \(y = \log n\), the claim follows. \(\square \)

Next, we establish some useful results concerning the covariances \(\phi _{k,i,j} = {\mathbb {E}}[U_{0,i}U_{k,j}]\).

Lemma 19

Grant Assumption 4. Then

  1. (i)

    \(\sup _{i,j}|\phi _{k,i,j}| \lesssim k^{-{\mathfrak {c}}+ 1/2},\)

  2. (ii)

    \(\sup _{i,j} \sum _{k = 0}^{\infty } |\phi _{k,i,j}| < \infty ,\)

  3. (iii)

    \(\gamma _{i,j} = \phi _{0,i,j} + 2 \sum _{k = 1}^{\infty } \phi _{k,i,j} < \infty ,\)

  4. (iv)

    \(\sum _{k,l = 1}^n {\mathbb {E}}[U_{k,i}U_{l,j}] = n \gamma _{i,j} - \sum _{k \in \mathbb {Z}} (n \wedge |k|) \phi _{k,i,j}\).

Proof of Lemma 19

Claims (iii) and (iv) are well known in the literature and follow from (ii) by elementary computations. Since (i) implies (ii) due to \({\mathfrak {c}}> 3/2\), it suffices to establish (i). To this end, let \(U_{k,h}^* = H_h\bigl ({\mathcal {F}}_k^*\bigr )\), where \({\mathcal {F}}_k^* = \sigma (\ldots ,\epsilon _{-1}',\epsilon _0',\epsilon _1,\ldots , \epsilon _k)\). Since then \({\mathbb {E}}[U_{k,h}^{*}\,|\,{\mathcal {F}}_0] = {\mathbb {E}}[U_{k,h}]= 0\), Cauchy–Schwarz, Jensen's inequality and Theorem 1 in [57] yield

$$\begin{aligned} |{\mathbb {E}}[U_{0,i}U_{k,j}]| \le \Vert U_{0,i}\Vert _2 \Vert U_{k,j} - U_{k,j}^{*}\Vert _2\lesssim \left( \sum _{l = k}^{\infty } \theta _{l,2}^2\right) ^{1/2} \lesssim k^{-{\mathfrak {c}}+ 1/2}, \end{aligned}$$

where the last claim follows from (F1). \(\square \)
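Parts (iii) and (iv) can be illustrated with a concrete AR(1) example \(U_k = a U_{k-1} + \epsilon _k\) with unit innovation variance, for which \(\phi _k = a^{|k|}/(1-a^2)\) and \(\gamma = \phi _0 + 2\sum _{k \ge 1}\phi _k = (1-a)^{-2}\); the model choice is ours, for illustration only:

```python
import numpy as np

a, n, trunc = 0.6, 50, 4000
phi = a ** np.arange(trunc) / (1 - a ** 2)   # phi_k for k = 0, 1, 2, ...

# (iii): gamma = phi_0 + 2 sum_{k>=1} phi_k equals 1/(1-a)^2 for AR(1)
gamma = phi[0] + 2 * np.sum(phi[1:])
assert np.isclose(gamma, 1 / (1 - a) ** 2)

# (iv): sum_{k,l=1}^n E[U_k U_l] = n*gamma - sum_{k in Z} min(n, |k|) * phi_{|k|}
lhs = sum(phi[abs(k - l)] for k in range(n) for l in range(n))
ks = np.arange(-trunc + 1, trunc)
rhs = n * gamma - np.sum(np.minimum(n, np.abs(ks)) * phi[np.abs(ks)])
assert np.isclose(lhs, rhs)
```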

For \(1 \le i,j \le d\) denote with

$$\begin{aligned} \gamma _{i,j}^{(n)} = \frac{1}{n}{\mathbb {E}}[S_{n,i} S_{n,j}], \quad \gamma _{i,j}^{(\diamond ,n)} = \frac{1}{n}{\mathbb {E}}[S_{L,i}^{\diamond }(V) S_{L,j}^{\diamond }(V)]. \end{aligned}$$

Lemma 20

Grant Assumption 4. Then

$$\begin{aligned} \max _{1 \le i,j \le d}|\gamma _{i,j}^{(n)} - \gamma _{i,j}^{(\diamond ,n)}| \lesssim n^{-1/2} L. \end{aligned}$$

Remark 3

From Lemma 19(iv), Lemma 20 and the triangle inequality, we have that

$$\begin{aligned} |\gamma _{i,j} - \gamma _{i,j}^{(\diamond ,n)}| \lesssim \frac{1}{n} \sum _{k = 1}^n k^{3/2 - {\mathfrak {c}}} + \sum _{k > n} k^{-{\mathfrak {c}}+ 1/2} + n^{-\frac{1}{2}} L \lesssim n^{-\frac{1}{2}} L + n^{\frac{3}{2} - {\mathfrak {c}}}. \end{aligned}$$

Proof of Lemma 20

We have that

$$\begin{aligned} |{\mathbb {E}}[S_{L,i}(V) S_{L,j}(V)] - {\mathbb {E}}[S_{L,i}^{\diamond }(V) S_{L,j}^{\diamond }(V)]|\le & {} \sum _{l = 1}^L \Vert V_{l,j}^{\diamond } - V_{l,j}\Vert _2 \Vert S_{L,i}(V)\Vert _2 \\&+ \sum _{l = 1}^L \Vert V_{l,i}^{\diamond } - V_{l,i}\Vert _2 \Vert S_{L,j}^{\diamond }(V)\Vert _2. \end{aligned}$$

By the Marcinkiewicz–Zygmund inequality, Lemma 16 and (F1) we have

$$\begin{aligned} \max _{1 \le h \le d}\Vert S_{L,h}^{\diamond }(V)\Vert _2 \lesssim \sqrt{n} \quad \text {and} \quad \max _{1 \le h \le d}\Vert S_{L,h}(V)\Vert _2 \lesssim \sqrt{n}. \end{aligned}$$
(9.6)

Using the triangle inequality and Theorem 1 in [57], it follows that

$$\begin{aligned} \max _{1 \le h \le d} \sum _{l = 1}^L \Vert V_{l,h}^{\diamond } - V_{l,h}\Vert _2\lesssim & {} \max _{1 \le h \le d} L \sum _{k = 1}^{\infty }\Vert U_{k,h} - U_{k,h}^*\Vert _2 \nonumber \\\lesssim & {} L \sum _{k = 1}^{\infty } \sqrt{\sum _{j \ge k} \theta _{j,2}^2} \lesssim L \sum _{k = 1}^{\infty } k^{-{\mathfrak {c}}+1/2} \lesssim L. \end{aligned}$$
(9.7)

Hence combining (9.6) and (9.7), the claim follows. \(\square \)

Next, we state some Gaussian approximation results. To this end, we require the following condition: for \(\varepsilon , u(\varepsilon ) > 0\),

$$\begin{aligned} P\left( \max _{1 \le h \le d}\max _{1 \le l \le L}|V_{l,h}^{\diamond }| \ge \sqrt{K u(\varepsilon )}\right) \le \varepsilon . \end{aligned}$$
(9.8)

Define

$$\begin{aligned} T_{L,d}^{\diamond }=\frac{1}{\sqrt{n}}\max _{1 \le h \le d}|S_{L,h}^{\diamond }(V)|, \quad T_d^{Z,\diamond } = \max _{1 \le h \le d}|Z_h^{\diamond }|, \end{aligned}$$

where \(\{Z_h^{\diamond }\}_{1 \le h \le d}\) is a zero mean Gaussian sequence with covariance structure \({\varSigma }_d^{(\diamond ,n)} = (\gamma _{i,j}^{(\diamond ,n)})_{1 \le i,j \le d}\). We have the following Gaussian approximation result, which is an adaptation of Theorem 2.2 in [16].

Lemma 21

Assume the validity of (9.8) and that

  1. (i)

    \(K^{-1/2}\min _{1 \le h \le d}\min _{1 \le l \le L} \Vert V_{l,h}^{\diamond }\Vert _2 > 0,\)

  2. (ii)

    \(K^{-1/2}\max _{1 \le h \le d}\max _{1 \le l \le L} \Vert V_{l,h}^{\diamond }\Vert _4 < \infty \).

Then it holds that

$$\begin{aligned}&\sup _{x \in \mathbb {R}}|P(T_{L,d}^{\diamond } \le x) - P(T_{d}^{Z,\diamond } \le x)| \\&\quad \lesssim L^{-1/8} (\log (d L/\varepsilon ))^{7/8} + L^{-1/2} (\log (d L/\varepsilon ))^{3/2}u(\varepsilon ) + \varepsilon . \end{aligned}$$
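The flavor of Lemma 21 can be seen numerically: the maximum of normalized sums of independent non-Gaussian blocks is close in distribution to the maximum of a Gaussian vector with matching covariance. The block distribution (centered exponentials mixed by a Cholesky factor), the covariance, and all dimensions below are illustrative choices, not constructions from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, reps = 10, 250, 2000

# An illustrative block covariance (not from the paper): Sigma = A A^T + I.
A = rng.normal(size=(d, d)) / np.sqrt(d)
Sigma = A @ A.T + np.eye(d)
C = np.linalg.cholesky(Sigma)

# Independent non-Gaussian blocks V_l with Cov(V_l) = Sigma:
# centered exponential coordinates mixed by the Cholesky factor.
Z = rng.exponential(size=(reps, L, d)) - 1.0
V = Z @ C.T
T = np.abs(V.sum(axis=1) / np.sqrt(L)).max(axis=1)

# The Gaussian counterpart: max_h |Z_h| with the same covariance.
G = rng.multivariate_normal(np.zeros(d), Sigma, size=reps)
TZ = np.abs(G).max(axis=1)

# Kolmogorov distance between the two empirical distributions.
grid = np.linspace(0.0, max(T.max(), TZ.max()), 200)
ks = np.abs((T[:, None] <= grid).mean(axis=0) - (TZ[:, None] <= grid).mean(axis=0)).max()
print(round(float(ks), 3))
```

With \(L = 250\) blocks the empirical Kolmogorov distance is already small; Lemma 21 quantifies this closeness with explicit rates in \(L\), \(d\) and \(u(\varepsilon )\).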

We also require the following two results, which are Lemmas 2.1 and 3.1 in [16], slightly adapted for our purpose.

Lemma 22

Let \(\{X_h\}_{1 \le h \le d}\) and \(\{Y_h\}_{1 \le h \le d}\) be zero mean Gaussian sequences, and denote by \(\gamma _{i,j}^X, \gamma _{i,j}^Y\) the corresponding covariances for \(1 \le i,j \le d\). If \(0 < \inf _h \gamma _{h,h}^X \le \sup _h \gamma _{h,h}^X < \infty \), then with \(\delta = \max _{1 \le i,j \le d}|\gamma _{i,j}^X - \gamma _{i,j}^Y|,\)

$$\begin{aligned} \sup _{x \in \mathbb {R}}|P\left( \max _{1 \le h \le d}|X_h| \le x\right) - P\left( \max _{1 \le h \le d}|Y_h| \le x\right) | \lesssim \delta ^{1/3}(1 \vee \log (d/\delta ))^{2/3}. \end{aligned}$$

Lemma 23

Let \(\{X_h\}_{1 \le h \le d}\) be a zero mean Gaussian sequence with covariances \(\{\gamma _{i,j}^X\}_{1\le i,j\le d}\). If \(0 < \inf _h \gamma _{h,h}^X \le \sup _h \gamma _{h,h}^X < \infty ,\) then for \(\delta > 0\)

$$\begin{aligned} \sup _{x \in \mathbb {R}}P\left( \Big |\max _{1 \le h \le d}|X_h| - x\Big | \le \delta \right) \lesssim \delta \sqrt{1 \vee \log (d / \delta )}. \end{aligned}$$
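A quick Monte Carlo illustration of the anti-concentration effect behind Lemma 23, in the special case of i.i.d. standard Gaussian coordinates (an illustrative choice with \(\gamma _{h,h}^X = 1\)): the maximum assigns probability of order \(\delta \sqrt{\log (d/\delta )}\) to any interval of length \(2\delta \).

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps, delta = 50, 100000, 0.05

# M = max_h |X_h| for i.i.d. standard Gaussian coordinates.
M = np.abs(rng.normal(size=(reps, d))).max(axis=1)

# Estimate sup_x P(|M - x| <= delta) over a grid of candidate x.
grid = np.linspace(M.min(), M.max(), 400)
conc = max((np.abs(M - x) <= delta).mean() for x in grid)
print(conc, delta * np.sqrt(np.log(d / delta)))
```

The estimated concentration probability stays below a constant multiple of \(\delta \sqrt{\log (d/\delta )}\), in line with the lemma.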

Proof of Theorem 8

By Lemma 17 and Boole’s inequality, we have

$$\begin{aligned} P\left( \max _{1 \le h \le d}|S_{L,h}(V) - S_{L,h}^{\diamond }(V)| \ge C_1 n^{1/2 - \delta }\right) \lesssim d n^{-\frac{p-2}{2} + \delta p}. \end{aligned}$$

Since \(d \lesssim n^{{\mathfrak {d}}}\), we obtain from (F2), for \(\delta > 0\) sufficiently small,

$$\begin{aligned} P\left( \max _{1 \le h \le d}|S_{L,h}(V) - S_{L,h}^{\diamond }(V)| \ge C_1 n^{1/2 - \delta }\right) \lesssim n^{-C_2}, \quad C_2 > 0. \end{aligned}$$
(9.9)

Employing this bound, we get that

$$\begin{aligned} P(T_{d} \le x) \le P(T_{L,d}^{\diamond } \le x + C_1 n^{- \delta })+ {\mathcal {O}}(n^{-C_2}). \end{aligned}$$

In the same manner one obtains a lower bound, hence

$$\begin{aligned}&P(T_{L,d}^{\diamond } \le x -C_1 n^{- \delta }) - {\mathcal {O}}(n^{-C_2})\le P(T_{d} \le x) \nonumber \\&\quad \le P(T_{L,d}^{\diamond } \le x + C_1 n^{- \delta })+ {\mathcal {O}}(n^{-C_2}). \end{aligned}$$
(9.10)

Next, we apply Lemma 21 to \(T_{L,d}^{\diamond }\). To this end, we need to verify its conditions. Note that by the independence of \(V_{l,h}^{\diamond }\), we have that

$$\begin{aligned} \gamma _{h,h}^{(\diamond ,n)} = \frac{1}{L K}\sum _{l = 1}^L \Vert V_{l,h}^{\diamond }\Vert _2^2 = \frac{1}{K}\Vert V_{1,h}^{\diamond }\Vert _2^2. \end{aligned}$$

Hence we deduce from Lemmas 19, 20, Remark 3 and (F3) that

$$\begin{aligned} K^{-1}\Vert V_{1,h}^{\diamond }\Vert _2^2 \ge \gamma _{h,h}^{(n)} - o(1)\ge \sigma _h^2 - o(1) > 0, \end{aligned}$$

uniformly in h, and thus (i) holds. Next we verify (ii). This, however, readily follows from Lemma 12 and (F1). Finally, we need to establish (9.8). Set \(u(\varepsilon ) = (\log n)^2\). Using Boole’s inequality and Lemma 18 gives

$$\begin{aligned} P\left( \max _{1 \le h \le d}\max _{1 \le l \le L}|V_{l,h}^{\diamond }| \ge \sqrt{K u(\varepsilon )}\right) \lesssim d L K^{-\frac{p-2}{2}} (\log n)^{-p}. \end{aligned}$$

By (F2) and choosing \({\mathfrak {k}}\) sufficiently close to 1, we get that

$$\begin{aligned} P\left( \max _{1 \le h \le d}\max _{1 \le l \le L}|V_{l,h}^{\diamond }| \ge \sqrt{K u(\varepsilon )}\right) \lesssim n^{-C_3}, \quad C_3 > 0, \end{aligned}$$

and (9.8) holds with \(\varepsilon \thicksim n^{-C_3}\). Since \(L \thicksim n^{{\mathfrak {l}}}\) with \({\mathfrak {l}}> 0\) due to \({\mathfrak {k}}< 1\), Lemma 21 yields that

$$\begin{aligned} \sup _{x \in \mathbb {R}}|P(T_{L,d}^{\diamond } \le x) - P(T_{d}^{Z,\diamond } \le x )|\lesssim n^{-C_4}, \quad C_4 > 0. \end{aligned}$$
(9.11)

Combining this with (9.10), we deduce that

$$\begin{aligned}&P(T_{d}^{Z,\diamond } \le x -C_1 n^{- \delta }) - {\mathcal {O}}(n^{-C_5})\le P(T_{d} \le x) \nonumber \\&\quad \le P(T_{d}^{Z,\diamond } \le x + C_1 n^{- \delta }) + {\mathcal {O}}(n^{-C_5}). \end{aligned}$$
(9.12)

Next, since \(\log d \lesssim \log n\), Lemma 23 yields that

$$\begin{aligned} \sup _{x \in \mathbb {R}}|P(T_{d}^{Z,\diamond } \le x -C_1 n^{-\delta }) - P(T_{d}^{Z,\diamond } \le x)| \lesssim n^{-\delta } \sqrt{\log n}. \end{aligned}$$
(9.13)

In addition, by Remark 3

$$\begin{aligned} \max _{1 \le i,j \le d}|\gamma _{i,j}^{(\diamond ,n)} - \gamma _{i,j}| \lesssim n^{-\frac{1}{2}}L + n^{\frac{3}{2} - {\mathfrak {c}}} \lesssim n^{-C_6}, \quad C_6 > 0. \end{aligned}$$

Hence the claim follows from Lemma 22. \(\square \)

9.2 Proofs of Sect. 5

Proof of Theorem 7

Define

$$\begin{aligned} T_{J_n^+}^{\eta } = \frac{1}{\sqrt{n}}\max _{1 \le j < J_n^+}\frac{\left| \sum _{k = 1}^n (\eta _{k,j}^2 - 1)\right| }{\sigma _{j}}. \end{aligned}$$

We first show that we may apply Theorem 8 to \(T_{J_n^+}^{\eta }\). To this end, we need to verify Assumption 4. Observe that (E2) implies \(\bigl \Vert \eta _{k,j}\bigr \Vert _{q} < \infty \) (cf. [57]). Moreover, using \(a^2 - b^2 = (a-b)(a+b)\), it follows from the Cauchy–Schwarz inequality that

$$\begin{aligned} \Vert \eta _{k,j}^2 - (\eta _{k,j}^2)'\Vert _{q} \le 2 \Vert \eta _{k,j} - \eta _{k,j}'\Vert _{2q} \Vert \eta _{k,j}\Vert _{2q} \lesssim {\varOmega }_{2q}(k) \lesssim k^{-{\mathfrak {b}}}. \end{aligned}$$

Since \({\mathfrak {b}}> 3/2\) by (E2), (F1) follows. Next, note that (E1) implies \(J_n^+ \lesssim n^{p ({\mathfrak {a}}- \delta )}\). Since \(q/2 - 1 > p 2^{{\mathfrak {p}}+ 2} > p {\mathfrak {a}}\) (recall \(0 < {\mathfrak {a}}< 1\)), (F2) holds. Finally, (E3) gives (F3); hence Assumption 4 is verified. We proceed with the proof. For \(j \in \mathbb {N}\), set \(I_{j,j}^* = \lambda _j\sum _{k = 1}^n \bigl (\eta _{k,j}^2 -1 \bigr )/n\), and note that by the above and Lemma 16 we have

$$\begin{aligned} \Vert I_{j,j}^*\Vert _p \lesssim \lambda _j n^{-1/2}, \quad j \in \mathbb {N}. \end{aligned}$$
(9.14)

Introduce the set

$$\begin{aligned} {\mathcal {M}} = \left\{ \max _{1 \le j < J_n^+}\lambda _j^{-1}|{\widehat{\lambda }}_j - \lambda _j - I_{j,j}^*| \le n^{-1/2 - \delta /2}\right\} . \end{aligned}$$

Then Markov’s inequality together with Proposition 2 and (9.14) yields

$$\begin{aligned} P({\mathcal {M}}^c)\lesssim n^{-p \delta /2} \lesssim n^{-C_1}, \quad C_1 > 0. \end{aligned}$$
(9.15)

Due to Theorem 8 and the above, we have the inequalities

$$\begin{aligned} P(T_{J_n^+}^{} \le x)\le & {} P(T_{J_n^+}^{\eta } \le x + n^{-\delta /2} ) + P({\mathcal {M}}^c) \\\le & {} P(T_{J_n^+}^{Z} \le x + n^{-\delta /2} ) + {\mathcal {O}}(n^{-C_2}), \quad C_2 > 0, \end{aligned}$$

where \(T_{J_n^+}^{Z}\) is as in (5.2). An application of Lemma 23 then yields

$$\begin{aligned} P(T_{J_n^+}^{} \le x)\le P(T_{J_n^+}^{Z_{}} \le x) + {\mathcal {O}}(n^{-C_2} + n^{-\delta /2} \log n). \end{aligned}$$

In the same manner, we obtain a lower bound, hence

$$\begin{aligned} \sup _{x\in \mathbb {R}}|P\left( T_{J_n^+}^{} \le x\right) - P\left( T_{J_n^+}^{Z_{}} \le x\right) |\lesssim n^{-C_3}, \quad C_3 > 0, \end{aligned}$$
(9.16)

which completes the proof. \(\square \)

Proof of Corollary 3

Due to Theorem 7, it suffices to show that

$$\begin{aligned} P\left( T_{J_n^+}^{Z_{\lambda }} \le u_{J_n^+}(z)\right) \rightarrow \exp (-e^{-z}). \end{aligned}$$

This, however, follows from Theorem 14 and Theorem 1 in [30]. \(\square \)
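The Gumbel limit in Corollary 3 can be made concrete in the special case of i.i.d. standard normal coordinates with equal weights, where \(P(\max _{j \le d}|Z_j| \le u) = {{\,\mathrm{erf}\,}}(u/\sqrt{2})^d\) is available in closed form. The normalizing sequence below is the classical one for Gaussian maxima, with \(d\) replaced by \(2d\) to account for the two-sided tail; it is an illustrative stand-in for the paper's \(u_{J_n^+}(z)\), not the quantity defined there:

```python
import math

def gumbel_limit_check(d, z):
    """Compare P(max_{j<=d} |Z_j| <= u_d(z)) with exp(-exp(-z)) for i.i.d.
    standard normals, using the classical normalizing constants with d
    replaced by 2d (illustrative choice for the two-sided tail)."""
    m = 2 * d
    a = math.sqrt(2 * math.log(m))
    b = a - (math.log(math.log(m)) + math.log(4 * math.pi)) / (2 * a)
    u = b + z / a
    # P(|Z| <= u) = erf(u / sqrt(2)); independence gives the d-th power.
    exact = math.erf(u / math.sqrt(2)) ** d
    return exact, math.exp(-math.exp(-z))

print(gumbel_limit_check(10**6, 0.0))
```

The convergence is logarithmically slow, which is typical for Gumbel limits: even at \(d = 10^6\) the exact probability and the limit differ by a few percent.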

10 Proofs of Sect. 6

Proof of Proposition 4

Due to (6.4), Theorem 3.6 in [10] yields the Bernoulli-shift representation \(X_k = \sum _{i = 0}^{\infty } {{\varvec{\Phi }}}^{i}(\epsilon _{k-i})\). Next, using the orthogonality of \(\{\epsilon _{k,j}\}_{j \in \mathbb {N}}\), we get

$$\begin{aligned} \Vert \langle \epsilon _{k}, e_l^{\theta } \rangle \Vert _2^2= \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2 . \end{aligned}$$
(10.1)

On the other hand, since \(\epsilon _k\) and \(X_{k-1}\) are independent, we obtain

$$\begin{aligned} \widetilde{\lambda }_l^{\theta } = \Vert \langle X_k, e_l^{\theta } \rangle \Vert _2^2 = \Vert \langle {{\varvec{\Phi }}}(X_{k-1}), e_l^{\theta } \rangle \Vert _2^2 + \Vert \langle \epsilon _k, e_l^{\theta } \rangle \Vert _2^2 \ge \Vert \langle \epsilon _k, e_l^{\theta } \rangle \Vert _2^2.\quad \end{aligned}$$
(10.2)

For \(k \ge 1\), using the triangle inequality, the linearity of \({{\varvec{\Phi }}}\), the fact that \({{\varvec{\Phi }}}(e_j^{\phi }) = \lambda _j^{\phi } e_j^{\phi }\) and (6.6), we obtain

$$\begin{aligned} \widetilde{\lambda }_l^{\theta }\Vert \eta _{k,l}^{\theta } - (\eta _{k,l}^{\theta })'\Vert _q^{2}\lesssim & {} \left( \sum _{i = 1}^{\infty } (\lambda _i^{\phi })^k |\langle e_i^{\phi }, e_l^{\theta } \rangle |\Vert \langle \epsilon _0 - \epsilon _0', e_i^{\phi } \rangle \Vert _2^{q'/q}\right) ^{2} \\\lesssim & {} \left( \sum _{i = 1}^{\infty }(\lambda _i^{\phi })^k \left( \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \Vert \epsilon _{0,j}\Vert _2^{2q'/q}\langle e_j^{\epsilon }, e_i^{\phi } \rangle ^2 \langle e_i^{\phi }, e_l^{\theta } \rangle ^2 \right) ^{1/2}\right) ^{2}, \end{aligned}$$

where we also used \((\sum _{j = 1}^{\infty } \lambda _j^{\epsilon } \langle e_j^{\epsilon }, e_i^{\phi } \rangle ^2)^{(q' - q)/2q} < \infty \) in the last step (recall \(q' \ge q\)). Note that we have the inequality

$$\begin{aligned} \langle e_j^{\epsilon }, e_i^{\phi } \rangle ^2 \langle e_i^{\phi }, e_l^{\theta } \rangle ^2 \le \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2, \end{aligned}$$
(10.3)

which can be readily derived by contradiction (assume the converse and sum over j on both sides). Hence by the triangle inequality and (6.4), the above is further bounded by

$$\begin{aligned}&\lesssim \left( \sum _{i = 1}^{\infty }(\lambda _i^{\phi })^k \left( \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \Vert \epsilon _{0,j}\Vert _2^{2q'/q} \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2\right) ^{1/2}\right) ^{2} \\&\quad \lesssim \left( \sum _{i = 1}^{\infty }(\lambda _i^{\phi })\right) ^{2k} \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \Vert \epsilon _{0,j}\Vert _2^{2q'/q} \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2 \lesssim \rho ^k \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2, \end{aligned}$$

for some \(0 < \rho < 1\). Combining this with (10.1) and (10.2), we arrive at

$$\begin{aligned} \Vert \eta _{k,l}^{\theta } - (\eta _{k,l}^{\theta })'\Vert _q^2 \lesssim \frac{\rho ^k}{\widetilde{\lambda }_l^{\theta }} \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2 \lesssim \rho ^k, \quad k \ge 1. \end{aligned}$$
(10.4)

If \(k = 0\), we get from (6.6) that

$$\begin{aligned} \widetilde{\lambda }_l^{\theta }\Vert \eta _{k,l}^{\theta } - (\eta _{k,l}^{\theta })'\Vert _q^2 = \Vert \langle \epsilon _k - \epsilon _k', e_l^{\theta } \rangle \Vert _q^2 \lesssim \sum _{j = 1}^{\infty } {\lambda }_j^{\epsilon } \Vert \epsilon _{k,j}\Vert _q^2 \langle e_j^{\epsilon }, e_l^{\theta } \rangle ^2. \end{aligned}$$
(10.5)

If \(k < 0\), we have \(\eta _{k,j}^{\theta } = (\eta _{k,j}^{\theta })'\), and hence the claim follows from (10.4) and (10.5). Observe that by telescoping and Kolmogorov’s zero-one law, we also get \(\max _{j \in \mathbb {N}}\Vert \eta _{k,j}\Vert _q < \infty \). \(\square \)

Proof of Corollary 4

This follows from Lemma 16. \(\square \)