Sieve bootstrapping the memory parameter in long-range dependent stationary functional time series

We consider a sieve bootstrap procedure to quantify the estimation uncertainty of long-memory parameters in stationary functional time series. We use a semiparametric local Whittle estimator to estimate the long-memory parameter. In the local Whittle estimator, the discrete Fourier transform and periodogram are constructed from the first set of principal component scores obtained via functional principal component analysis. The sieve bootstrap procedure uses a general vector autoregressive representation of the estimated principal component scores. It generates bootstrap replicates that adequately mimic the dependence structure of the underlying stationary process. For each bootstrap replicate, we first compute the estimated first set of principal component scores and then apply the semiparametric local Whittle estimator to estimate the memory parameter. By taking quantiles of the estimated memory parameters from these bootstrap replicates, we can nonparametrically construct confidence intervals of the long-memory parameter. As measured by the difference between empirical and nominal coverage probabilities at three levels of significance, we demonstrate the advantage of using the sieve bootstrap compared to the asymptotic confidence intervals based on normality.


Introduction
The past few decades have seen extensive studies and developments in analyzing long-range dependence (LRD) time series, which appear in many fields, such as agriculture, economics, finance, and geophysics (see, e.g., Beran 1994; Robinson 2003; Palma 2007; Giraitis et al. 2012; Beran et al. 2013). These references study stochastic processes with greater persistence than short-range dependent ones: the autocovariance of an LRD process decays to zero more slowly than that of a short-range dependent one. Indeed, the autocovariances are not summable, and the spectral density is unbounded at zero frequency.
One of the most important issues in analyzing LRD time series is estimating the memory parameter, which quantifies the strength of persistence. Li et al. (2020) considered a rescaled range (R/S) estimator of the memory parameter. While the R/S estimator is consistent, it has a slow convergence rate and performs poorly in finite samples, especially small ones. To improve the convergence rate, Li et al. (2021) considered a semiparametric local Whittle estimator, which is known to be more efficient. Further, Li et al. (2021) provided asymptotic results for parametrically constructing confidence intervals of the memory parameter. These asymptotic confidence intervals are the benchmark for our empirical comparison. Through a series of Monte-Carlo studies, Shang (2020) evaluated and compared time-domain and frequency-domain estimators and found that the local Whittle estimator is often one of the most accurate estimators in long-range dependent stationary functional time series. Thus, we consider the local Whittle estimator to estimate the memory parameter, although the sieve bootstrap can also be applied to other memory estimators.
Let {X_t : t ∈ ℤ} be a sequence of functional observations, where each X_t is a random function of a stochastic process (X_t(u) : u ∈ I), I ⊂ ℝ is a compact set, ℝ is the real line and ℤ = {0, ±1, …}. Generally speaking, two major stationary short-range dependent functional time series structures have been considered in the literature: one extends small-ball probability for mixing sequences (see, e.g., Ferraty and Vieu 2006; Bathia et al. 2010); the other extends linear and nonlinear sequences and martingales using m-dependent approximation techniques (see, e.g., Bosq 2000; Hörmann and Kokoszka 2010; Rice and Shang 2017). It is assumed that X_t = g(ε_t, ε_{t−1}, …), where g : S^∞ → H and {ε_t : t ∈ ℤ}, with ε_t = (ε_t(u) : u ∈ I), is a sequence of independent and identically distributed (i.i.d.) random elements in a measurable space S, and H is a separable Hilbert space. In Sect. 4, we follow the second structure to generate observations from a stochastic process that follows a functional autoregressive fractionally integrated moving average (ARFIMA) model.
With a time series of functions (X_1, X_2, …, X_n), a central issue is to model the temporal dependence accurately. A challenge associated with functional time series is the curse of dimensionality, so a common practice is to project the functions onto a dominant subspace, such as the one spanned by the first eigenfunction of the long-run covariance function (see, e.g., Li et al. 2020, 2021, 2022; Chen and Pun 2021). In so doing, we simplify the problem from a functional to a univariate time series analysis.
Long-run covariance and spectral density estimation enjoy a vast literature in functional time series, beginning with seminal work such as that of Panaretos and Tavakoli (2013). Still, the most commonly used technique is smoothing the periodogram at frequency zero by employing a weight function and a bandwidth parameter. While Li et al. (2020) considered a set of linearly decaying weights to estimate the long-run covariance, Rice and Shang (2017) considered a kernel sandwich estimator and presented a plug-in algorithm to estimate the optimal bandwidth parameter in the kernel sandwich estimator.
Our estimation procedure begins with estimating the long-run covariance function. We then obtain the first functional principal component and its associated scores via eigendecomposition of the estimated long-run covariance function. From the univariate time series of first principal component scores, we apply the semiparametric local Whittle estimator, described in Sect. 2, to estimate the memory parameter. Li et al. (2021) presented the theoretical development of the local Whittle estimator for stationary functional time series and provided asymptotic confidence intervals for the memory parameter. We aim to improve the confidence interval calibration through the sieve bootstrap method in Sect. 3. Via a series of simulation studies in Sect. 4, we evaluate and compare the finite-sample performance of the asymptotic and bootstrapped confidence intervals of the memory parameter. The conclusion is given in Sect. 5, along with some ideas on how the methodology presented here can be further extended.

Estimation of the long-run covariance function
We consider a stationary ergodic functional time series: the stochastic process does not change its statistical properties over time, and those properties can be recovered from a single, sufficiently long realization of the process. For such a random process, the long-run covariance function can be defined as

C(u, v) = ∑_{ℓ=−∞}^{∞} γ_ℓ(u, v), γ_ℓ(u, v) = Cov[X_0(u), X_ℓ(v)],

where u, v ∈ I and ℓ denotes a time-series lag variable. A feature of long-memory processes is that this bi-infinite sum does not converge, so the integral operator associated with C(u, v) is not finite. As a result, the norm of the autocovariance function of such a process decays much more slowly than that of the usual short-memory functional time series.
While the long-run covariance can be expressed as a bi-infinite summation, its estimation is not trivial. For a finite sample, a natural estimator of C(u, v) is given in (1), where d denotes the memory parameter.
In practice, the population mean μ(u) can be estimated by its empirical counterpart X̄(u) = (1/n) ∑_{t=1}^{n} X_t(u); the lag-ℓ autocovariance γ_ℓ(u, v) can also be estimated by

γ̂_ℓ(u, v) = (1/n) ∑_{t=1}^{n−ℓ} [X_t(u) − X̄(u)][X_{t+ℓ}(v) − X̄(v)], ℓ ≥ 0,

with γ̂_ℓ(u, v) = γ̂_{−ℓ}(v, u) for ℓ < 0. From (1), the estimation of the long-run covariance function relies on knowing the value of the memory parameter. However, the first term on the right-hand side of (1) is a constant, and it does not affect the estimation of the orthonormal functions spanning the dominant subspace of the functional time series. Following Li et al. (2020), we perform eigendecomposition on the estimator in (2). From (2), we observe that the long-run covariance estimator is a sum of autocovariance functions with decreasing weights. In practice, it is essential to determine the optimal truncation lag ℓ to balance the trade-off between squared bias and variance. In Li et al. (2020), it is chosen as the minimum of the sample size n and the number of discretized data points in a function. Alternatively, and this is the approach we take, one could consider a kernel sandwich estimator

Ĉ_h(u, v) = ∑_{ℓ=−∞}^{∞} W_q(ℓ/h) γ̂_ℓ(u, v),

where h is called the bandwidth parameter, and W_q(⋅) is a symmetric weight function with bounded support of order q. As with any kernel estimator, the choice of kernel function matters less than the choice of bandwidth. Therefore, Rice and Shang (2017) proposed a plug-in algorithm for determining the optimal bandwidth parameter that minimizes the asymptotic mean-squared normed error between the estimated and actual long-run covariance functions.
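On a discretised grid, the kernel sandwich estimator above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name is ours, and the Bartlett kernel is assumed as the default choice of W_q(⋅).

```python
import numpy as np

def long_run_covariance(X, h, kernel=None):
    """Lag-window (kernel sandwich) estimate of the long-run covariance.

    X : (n, p) array of n curves observed on a common grid of p points.
    h : bandwidth parameter.
    kernel : symmetric weight with support [-1, 1]; defaults to the
             Bartlett kernel W(x) = max(1 - |x|, 0).
    """
    if kernel is None:
        kernel = lambda x: np.maximum(1.0 - np.abs(x), 0.0)
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # centre with the empirical mean curve
    C = Xc.T @ Xc / n                    # lag-0 autocovariance gamma_0(u, v)
    for lag in range(1, n):
        w = kernel(lag / h)
        if w == 0.0:                     # bounded support: stop once weights vanish
            break
        g = Xc[:-lag].T @ Xc[lag:] / n   # gamma_lag(u, v) on the grid
        C = C + w * (g + g.T)            # add positive and negative lags symmetrically
    return C
```

Because the positive- and negative-lag terms are added as g + gᵀ, the returned matrix is symmetric by construction, mirroring the symmetry of C(u, v).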

Dynamic functional principal component analysis
Via Mercer's lemma, the estimated long-run covariance function Ĉ_n(u, v) can be approximated by

Ĉ_n(u, v) ≈ ∑_{k=1}^{∞} λ̂_k φ̂_k(u) φ̂_k(v),

where λ̂_1 ≥ λ̂_2 ≥ ⋯ ≥ 0 are the estimated eigenvalues and [φ̂_1(u), φ̂_2(u), …] are the orthonormal functional principal components. Because of the inner product in the Hilbert space, we can project a time series of functions onto this set of orthogonal functional principal components. This results in the Karhunen-Loève expansion of a realization of the stochastic process,

X_t(u) = X̄(u) + ∑_{k=1}^{∞} β_{t,k} φ̂_k(u),

where β_{t,k} = ⟨X_t(u) − X̄(u), φ̂_k(u)⟩ denotes the k-th set of principal component scores at time t.
Since the eigenvalues are ordered in descending order, the first eigenvalue and its associated eigenfunction form the dominant subspace of the functional time series. It is customary to use the first set of principal component scores β_{t,1} to determine the memory parameter (see also Li et al. 2020, 2021, 2022; Chen and Pun 2021). Alternatively, one could consider taking the L² norm of several leading principal component scores (see, e.g., Li et al. 2020). The memory parameters estimated from both approaches are similar, so we choose to work with the former.
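Extracting the first score series from an estimated covariance on a grid is then one eigendecomposition and one projection. The sketch below uses hypothetical names and a simple Riemann-sum inner product on an equispaced grid over [0, 1]; it is an illustration of the projection step, not the authors' code.

```python
import numpy as np

def first_pc_scores(X, C):
    """First principal component score series from curves on a common grid.

    X : (n, p) curves; C : (p, p) symmetric (long-run) covariance estimate.
    Returns (scores, phi1): the series beta_{t,1} and the first eigenvector.
    """
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    phi1 = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    Xc = X - X.mean(axis=0)                # centre with the empirical mean curve
    p = X.shape[1]
    scores = Xc @ phi1 / p                 # Riemann-sum inner product <X_t - mean, phi_1>
    return scores, phi1
```

The sign of an eigenvector is arbitrary, so the score series is identified only up to sign; this does not affect the memory parameter estimate, which depends on the scores through the periodogram.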

Local Whittle estimator
The local Whittle estimator is a Gaussian semiparametric method for estimating the memory parameter from the periodogram. Introduced by Künsch (1987) and developed by Robinson (1995) and Velasco (1999), this frequency-domain estimator does not require the specification of a parametric model for the data. Instead, it only specifies the local shape of the spectral density of the time series. The spectral density f(λ) of a stationary time series is assumed to satisfy

f(λ) ∼ G λ^{−2d}, as λ → 0⁺,

where 0 < G < ∞ and −1/2 < d < 1/2. To estimate d, we define the objective function

R(G, d) = (1/m) ∑_{i=1}^{m} [ ln(G λ_i^{−2d}) + I(λ_i)/(G λ_i^{−2d}) ], (3)

where ln(⋅) denotes the natural logarithm, I(λ_i) denotes the sample periodogram, which is the squared modulus of the discrete Fourier transform of the scores, λ_i = 2πi/n, i = 1, …, m, and m is a positive integer with m = o(n) (Robinson 1995). Customarily, m = ⌊n^{0.65} + 1⌋. By minimizing the objective function, we define the estimates

(Ĝ, d̂) = arg min_{0<G<∞, d∈Θ} R(G, d),

where Θ = [−1/2, 1/2] denotes the closed interval of admissible estimates of d for stationary time series. While G and d can be estimated jointly in (3), one could also use an iterative estimation procedure. Under regularity conditions, √m (d̂ − d) converges in distribution to N(0, 1/4). Based on this asymptotic result, we can construct our parametric confidence intervals as

d̂ ± (1/(2√m)) Q(1 − α/2), (4)

where Q(⋅) represents the quantile function of the standard Gaussian distribution, and α denotes a level of significance, such as α = 0.2, 0.05 and 0.01.
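The minimization of (3) can be illustrated with a grid search over the concentrated objective, in which G is profiled out as Ĝ(d) = (1/m) ∑ λ_i^{2d} I(λ_i) (Robinson 1995). This is a sketch with hypothetical names, not the authors' implementation; the grid search stands in for whichever numerical optimizer is actually used.

```python
import numpy as np

def local_whittle(x, m=None):
    """Local Whittle estimate of the memory parameter d from a scalar series.

    x : univariate score series; m : bandwidth, default floor(n**0.65 + 1).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.floor(n ** 0.65 + 1))
    # periodogram at the first m Fourier frequencies lambda_i = 2*pi*i/n
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    mean_log_lam = np.mean(np.log(lam))

    def R(d):
        # concentrated objective: log Ghat(d) - 2 d * mean(log lambda_i)
        return np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * mean_log_lam

    grid = np.linspace(-0.5, 0.5, 1001)   # admissible range Theta = [-1/2, 1/2]
    return grid[np.argmin([R(d) for d in grid])]
```

For white noise (d = 0) the estimate should fall near zero, with standard error roughly 1/(2√m) by the asymptotic result above.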

Sieve bootstrapping
Bootstrapping has been receiving increasing attention in the functional time series literature as a means of quantifying estimation or prediction uncertainty. Franke and Nyarige (2019) studied a residual-based bootstrap for functional autoregressions, while Zhu and Politis (2017) proposed a kernel estimation of the first-order nonparametric functional autoregression model and its bootstrap approximation. We revisit the sieve bootstrap of Paparoditis (2018). The basic idea of the sieve bootstrap is to generate a functional time series of pseudo-random elements (X*_1, X*_2, …, X*_n) that appropriately imitates the temporal dependence of the original functional time series. The sieve bootstrap begins by centering the observed functional time series, computing Y_t = X_t − X̄_n, where X̄_n = (1/n) ∑_{t=1}^{n} X_t. Via the Karhunen-Loève representation, the random element Y_t can be decomposed into two parts:

Y_t = ∑_{k=1}^{K} β_{t,k} φ_k + U_t, (5)

where K denotes the number of retained functional principal components and U_t = ∑_{k=K+1}^{∞} β_{t,k} φ_k. While the first term on the right-hand side of (5) is considered the main driving part of X_t, the second term is treated as white noise. The value of K is determined as the integer that minimizes the ratio of two adjacent empirical eigenvalues,

K̂ = arg min_{1≤k≤k_max} { (λ̂_{k+1}/λ̂_k) × 1(λ̂_k/λ̂_1 ≥ δ) + 1(λ̂_k/λ̂_1 < δ) },

where λ̂_k represents the k-th estimated eigenvalue from the functional principal component analysis, k_max is a pre-specified positive integer, δ is a pre-specified small positive number, and 1(⋅) is the binary indicator function. Without prior knowledge of a possible maximum value of k, it is unproblematic to choose a relatively large k_max, e.g., k_max = #{k | λ̂_k ≥ (1/n) ∑_{k=1}^{n} λ̂_k} (Ahn and Horenstein 2013). Since small empirical eigenvalues λ̂_k are close to zero, we adopt the threshold constant δ = 1/ln[max(λ̂_1, n)] to ensure consistency of K̂.
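The eigenvalue-ratio rule described above can be sketched as follows. The function name and defaults are ours; the guard term ensures that ratios of near-zero eigenvalues do not spuriously drive the minimum, which is the role of the threshold δ.

```python
import numpy as np

def select_K(eigvals, k_max=None, delta=None, n=None):
    """Eigenvalue-ratio selection of the number of retained components.

    eigvals : empirical eigenvalues in descending order.
    Returns the integer K minimizing the guarded adjacent-eigenvalue ratio.
    """
    lam = np.asarray(eigvals, dtype=float)
    if n is None:
        n = lam.size
    if k_max is None:
        # Ahn-Horenstein style default: count eigenvalues above their mean
        k_max = int(np.sum(lam >= lam.mean()))
    k_max = min(k_max, lam.size - 1)       # need lam[k+1] to exist
    if delta is None:
        delta = 1.0 / np.log(max(lam[0], n))
    ratios = []
    for k in range(k_max):
        if lam[k] / lam[0] >= delta:
            ratios.append(lam[k + 1] / lam[k])
        else:
            ratios.append(1.0)             # guard: ignore near-zero eigenvalues
    return int(np.argmin(ratios)) + 1
```

With a clear gap after the third eigenvalue, the rule picks K = 3, since the ratio λ̂_4/λ̂_3 is by far the smallest admissible one.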
We compute the sample variance operator (1/n) ∑_{t=1}^{n} Y_t ⊗ Y_t, and then obtain the first K estimated orthonormal eigenfunctions (φ̂_k, k = 1, 2, …, K) corresponding to the K largest estimated eigenvalues. By projecting the time series of functions onto these orthonormal eigenfunctions, we obtain a time series of estimated K-dimensional score vectors (β̂_1, β̂_2, …, β̂_n), to which we fit a vector autoregressive model of order p,

β̂_t = ∑_{j=1}^{p} Â_j β̂_{t−j} + ê_t, t = p + 1, p + 2, …, n,

with ê_t being the estimated residuals. The order of the fitted VAR model is chosen using a corrected Akaike information criterion (Hurvich and Tsai 1993), that is, by minimizing the criterion over a range of values of p, where the criterion involves Σ̂_p, the variance of the residuals ê_{t,p} after fitting the VAR(p) model to the K-dimensional time series of estimated scores (β̂_1, β̂_2, …, β̂_n). Computationally, the VARselect function in the vars package (Pfaff 2008) is implemented for selecting the optimal VAR order and estimating the parameters.
Since the VAR is an autoregression, it requires a burn-in period to remove the effect of starting values. With a burn-in sample of 100, we generate the vector time series of scores

β*_t = ∑_{j=1}^{p̂} Â_j β*_{t−j} + e*_t,

where we use the starting values β*_t = β̂_t for t = 1, 2, …, p̂. The bootstrap residuals e*_t are i.i.d. resampled from the set of centered residuals {ê_t − ē_n, t = p̂ + 1, p̂ + 2, …, n}, where ē_n = (1/(n − p̂)) ∑_{t=p̂+1}^{n} ê_t. Through the Karhunen-Loève expansion in (5), we obtain the bootstrap samples

X*_t = X̄_n + ∑_{k=1}^{K} β*_{t,k} φ̂_k + U*_t,

where U*_t denotes the resampled white-noise component in (5). We discard the first 100 generated X*_t observations and keep (X*_{101}, X*_{102}, …, X*_{n+100}) as the bootstrap-generated pseudo functional time series.
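The VAR fit and residual resampling can be sketched with ordinary least squares in place of the vars package. All names below are ours, and the order p is assumed to have been chosen beforehand (e.g., by the corrected AIC); this is an illustrative sketch of the score-generating step, not the authors' implementation.

```python
import numpy as np

def sieve_bootstrap_scores(scores, p, burn=100, rng=None):
    """One bootstrap replicate of a K-dimensional score series.

    Fits a VAR(p) by least squares, centres the residuals, resamples them
    i.i.d., and regenerates the series after a burn-in period.
    scores : (n, K) array of estimated principal component scores.
    """
    rng = np.random.default_rng(rng)
    n, K = scores.shape
    # stacked lagged regressors: row t holds (scores_{t-1}, ..., scores_{t-p})
    Z = np.hstack([scores[p - j - 1:n - j - 1] for j in range(p)])
    Y = scores[p:]
    A, *_ = np.linalg.lstsq(Z, Y, rcond=None)    # (p*K, K) stacked coefficients
    resid = Y - Z @ A
    resid = resid - resid.mean(axis=0)           # centre the residuals
    out = np.empty((burn + n, K))
    out[:p] = scores[:p]                         # starting values from the sample
    for t in range(p, burn + n):
        z = np.hstack([out[t - j - 1] for j in range(p)])
        e = resid[rng.integers(0, resid.shape[0])]   # i.i.d. residual draw
        out[t] = z @ A + e
    return out[burn:burn + n]                    # discard the burn-in segment
```

Feeding the output through the Karhunen-Loève expansion in (5), together with a resampled white-noise component, yields one bootstrap functional time series.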

Functional ARFIMA model
We simulate realizations of a stochastic process that follows a functional ARFIMA model. The functional ARFIMA(p, d, q) model can be defined as

(1 − B)^d X_t(u) = Y_t(u), (6)

Y_t(u) = ∑_{s=1}^{p} ∫_I φ_s(u, v) Y_{t−s}(v) dv + ε_t(u) + ∑_{s=1}^{q} ∫_I ψ_s(u, v) ε_{t−s}(v) dv, (7)

where B denotes the backshift operator, ε_t denotes a functional white noise, and φ_s(u, v) and ψ_s(u, v) are the autoregressive and moving-average kernel functions. Provided the usual conditions on the autoregressive and moving-average operators are satisfied, the model in (6) and (7) represents a stationary process if d < 1/2 and is invertible if d > −1/2.
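The fractional integration implied by the (1 − B)^d operator can be realized numerically through the binomial expansion of (1 − B)^{−d} = ∑_j π_j B^j. The sketch below (function name ours) computes the expansion coefficients via the standard recursion; convolving them with short-memory FARMA innovations gives an approximate ARFIMA draw.

```python
import numpy as np

def frac_int_coefs(d, n_terms):
    """Coefficients pi_j of (1 - B)^(-d) = sum_j pi_j B^j.

    Recursion: pi_0 = 1, pi_j = pi_{j-1} * (j - 1 + d) / j,
    equivalent to pi_j = Gamma(j + d) / (Gamma(d) * Gamma(j + 1)).
    """
    pi = np.empty(n_terms)
    pi[0] = 1.0
    for j in range(1, n_terms):
        pi[j] = pi[j - 1] * (j - 1 + d) / j
    return pi
```

For d = 0 every coefficient beyond π_0 vanishes, recovering the short-memory case; for 0 < d < 1/2 the coefficients decay hyperbolically, which produces the slow autocovariance decay characteristic of long memory.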
The functional ARFIMA model can be viewed as a generalization of some widely used parametric time-series models. When d = 0, the model in (6) and (7) becomes the functional autoregressive moving average model of Klepsch et al. (2017). When q = 0, it reduces further to the functional autoregressive model of Bosq (2000) and Liu et al. (2016); when p = 0, it reduces to the functional moving average model of Chen et al. (2016) and Aue and Klepsch (2017).
We consider generating a time series of functions through a functional ARFIMA(p, d, q) model, with function support I = [0, 1] and {ε_t : t ∈ ℤ} a sequence of i.i.d. standard Brownian motions over [0, 1], in the following two cases: an ARFI(1, d) model and an ARFIMA(1, d, 1) model. The constants c_FAR and c_FMA in the kernels φ_1(u, v) and ψ_1(u, v) are selected to ensure that both ‖φ_1‖ and ‖ψ_1‖ are smaller than one (see, e.g., Rice and Shang 2017; Kokoszka et al. 2017), so that the simulated curve time series are stationary and invertible. We consider the following two sets of constants:

Set 1: ‖φ_1‖ = ‖ψ_1‖ = 0.5, which corresponds to c_FAR = 0.34 and c_FMA = 1.5.
Set 2: ‖φ_1‖ = ‖ψ_1‖ = 0.9, which corresponds to c_FAR = 0.612 and c_FMA = 4.765.
The sample sizes are n = 250 and 500, each with R = 200 replications.
As an illustration, in Fig. 1, we present one replicate of n = 250 functional time series generated from an ARFI(1, d) model with d = 0.2. Following Mestre et al. (2021), we plot the functional autocorrelation and partial autocorrelation functions to check the linear temporal dependence in the simulated functional time series. For the ARFI(1, d) model, the functional partial autocorrelation function reveals a significant correlation at lag one. For the ARFIMA(1, d, 1) model, the functional autocorrelation and partial autocorrelation functions do not clearly indicate the autoregressive and moving-average orders.
In Fig. 2, we present one bootstrap sample using the sieve bootstrap method and compute its functional autocorrelation and partial autocorrelation functions. The bootstrap samples obtained from the sieve bootstrap method capture the linear temporal dependence exhibited in the original functional time series.
Via the local Whittle estimator, we obtain memory parameter estimates d̂ = 0.3101 and 0.3346 for simulated data generated from the ARFI(1, d) and ARFIMA(1, d, 1) models, respectively. Because the sample size n = 250 is relatively small, a considerable difference between the sample estimate and the population parameter is expected. To quantify the estimation uncertainty, we construct 80%, 95%, and 99% confidence intervals (CIs) using the asymptotic and sieve bootstrap procedures in Fig. 3. The CIs based on asymptotic normality are constructed from (4), while the CIs based on the sieve bootstrap are percentiles of 400 bootstrap memory estimates. Figure 3 displays the CIs constructed by the asymptotic and sieve bootstrap procedures for one simulated functional time series. To assess overall performance, we repeat the following evaluation criteria for R = 200 replicates and report the results in Sect. 4.3.

Evaluation criteria of the interval forecast accuracy
To measure the interval forecast accuracy, we consider the empirical coverage probability. For each value of d, we compute the empirical coverage probability (ECP) at a level of significance α as

ECP_α = 1 − (1/R) ∑_{r=1}^{R} 1(d < d̂^lb_{r,α} or d > d̂^ub_{r,α}), (8)

where 1(⋅) represents the binary indicator function, and (d̂^lb_{r,α}, d̂^ub_{r,α}) represents the lower and upper bounds of the asymptotic or bootstrap CIs for the r-th replication. From the ECP_α, we compute the coverage probability difference CPD_α = |ECP_α − (1 − α)|, the absolute difference between the empirical and nominal coverage probabilities at the given level of significance.
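Both criteria can be computed directly from the interval bounds over the R replications. A small sketch, with the function name ours:

```python
import numpy as np

def ecp_cpd(d_true, lower, upper, alpha):
    """Empirical coverage probability (ECP) and coverage probability
    difference (CPD) over R replications of a (1 - alpha) CI.

    lower, upper : length-R arrays of interval bounds per replication.
    """
    lower, upper = np.asarray(lower), np.asarray(upper)
    covered = (lower <= d_true) & (d_true <= upper)   # indicator per replication
    ecp = covered.mean()                              # fraction of intervals covering d
    cpd = abs(ecp - (1.0 - alpha))                    # distance to nominal coverage
    return ecp, cpd
```

For example, with d = 0.2, four replications of which two intervals cover the truth, and α = 0.2, the ECP is 0.5 and the CPD is |0.5 − 0.8| = 0.3.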

Simulation results
We consider the asymptotic CIs based on (4). Implemented by Li et al. (2021), this parametric approach produces CIs that are symmetric around the point estimate. In contrast, the sieve bootstrap produces a set of bootstrap functional time series. For each of the B = 400 bootstrap samples, we implement the local Whittle estimator of Sect. 2 to compute an estimate of the memory parameter. By taking quantiles of all bootstrap memory parameter estimates, we obtain a nonparametric approach to constructing CIs, whose lower and upper bounds can be asymmetric around the point estimate.
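The percentile construction amounts to a one-line computation over the bootstrap estimates; a minimal sketch (function name ours):

```python
import numpy as np

def percentile_ci(boot_estimates, alpha):
    """Nonparametric CI from bootstrap memory estimates.

    Takes the alpha/2 and 1 - alpha/2 empirical quantiles; unlike the
    asymptotic CI in (4), the bounds need not be symmetric about d-hat.
    """
    lo, hi = np.quantile(boot_estimates, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Because the bounds are read off the empirical distribution of the bootstrap estimates, any skewness in that distribution carries over to the interval, which is precisely what allows better calibration than a forced-symmetric interval.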
In Table 1, we present the ECP in (8) and the CPD for three levels of significance. Under an ARFI(1, d) model, these results are averaged over R = 200 replicates for n = 250 and 500. The CIs constructed by the sieve bootstrap generally have better calibration than those constructed from the asymptotic normality results; that is, their empirical coverage probabilities are closer to the nominal coverage probabilities. This finding is not surprising, since the sieve bootstrap CIs can be asymmetric around the sample estimate of the memory parameter. For the sieve bootstrap method, the empirical coverage probabilities gradually deviate from the nominal coverage probabilities as the memory parameter d increases from 0.05 to 0.45. This phenomenon holds for functional time series with moderate temporal dependence and is less pronounced under stronger dependence. Across significance levels, we observe that the mean CPDs obtained from the sieve bootstrap are generally smaller than those obtained from the asymptotic normality. As the sample size increases from n = 250 to 500, the CPD becomes smaller for both approaches. When the temporal dependence is stronger, we observe a further improvement in estimation accuracy from the sieve bootstrapping.

Table 1: For each replicate, we compute empirical coverage probabilities and coverage probability differences for the CIs constructed by the asymptotic and sieve bootstrap procedures at three levels of significance. Empirical coverage probability is abbreviated as ECP, while coverage probability difference is abbreviated as CPD. The smallest averaged CPD values are highlighted in bold for each level of significance.
In Table 2, we present the ECP in (8) and CPD for three levels of significance. Under an ARFIMA(1, d, 1) model, these results are averaged over R = 200 replicates for n = 250 and 500. The CIs constructed by the sieve bootstrapping also have a better calibration than those constructed by the asymptotic normality. As the sample size increases from n = 250 to 500, the CPD generally becomes smaller for both approaches. When the temporal dependence is stronger, we observe a further improvement in estimation accuracy from the sieve bootstrapping.

Simulation of functional time series via discrete Fourier transform
Following earlier work of Li et al. (2021), we also consider another data generating process, using the algorithm of Davies and Harte (1987) to simulate functional time series. Let X_t be a 'fractional noise' process with autocovariance

γ_ℓ = (1/2) ( |ℓ + 1|^{2H} − 2|ℓ|^{2H} + |ℓ − 1|^{2H} ), H = d + 1/2,

where d = −0.3, −0.15, 0, 0.15, 0.3. These parameter values are chosen to reflect negatively dependent, short-range dependent and long-range dependent properties. For each n, let g_k := g_{n,k}, k = 0, 1, …, 2n − 1, be the discrete Fourier transform of the real sequence c = {γ_0, γ_1, …, γ_{n−1}, γ_n, γ_{n−1}, …, γ_1}, i.e.,

g_k = ∑_{j=0}^{2n−1} c_j exp(−iπjk/n),

and g_k = g_{2n−k} for k = n, …, 2n − 1. Let ε_t be an i.i.d. sequence of standard Brownian motions over the domain [0, 1]; we then construct the functional time series given in (9).
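The circulant-embedding idea behind the Davies-Harte algorithm can be sketched as follows. This is a simplified illustration with names of our choosing: it assumes the fractional Gaussian noise autocovariance with H = d + 1/2 stated above, reproduces the target covariance structure, but skips the exactness bookkeeping of the original algorithm (the k = 0 and k = n terms are treated like the others, and tiny negative eigenvalues from rounding are clipped).

```python
import numpy as np

def fgn_autocov(d, n):
    """Autocovariances gamma_0, ..., gamma_n of fractional Gaussian noise
    with H = d + 1/2 (assumed form from the text above)."""
    H = d + 0.5
    l = np.arange(n + 1, dtype=float)
    return 0.5 * (np.abs(l + 1) ** (2 * H)
                  - 2 * np.abs(l) ** (2 * H)
                  + np.abs(l - 1) ** (2 * H))

def davies_harte(d, n, rng=None):
    """Simulate one length-n path of fractional noise by circulant embedding."""
    rng = np.random.default_rng(rng)
    gamma = fgn_autocov(d, n)
    # even extension {g_0, ..., g_{n-1}, g_n, g_{n-1}, ..., g_1} and its DFT
    c = np.concatenate([gamma[:n], gamma[n:n + 1], gamma[n - 1:0:-1]])
    g = np.fft.fft(c).real            # real and symmetric because c is
    g = np.maximum(g, 0.0)            # guard tiny negative values from rounding
    z = rng.standard_normal(2 * n) + 1j * rng.standard_normal(2 * n)
    x = np.fft.fft(np.sqrt(g / (2 * n)) * z)
    return x.real[:n]
```

For d = 0 the autocovariance reduces to that of white noise (γ_0 = 1, γ_ℓ = 0 for ℓ ≥ 1), so the simulated path behaves like an i.i.d. sequence.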

We take n = 250 and 500 with R = 200 replications. For each replication, we generate B = 400 bootstrap pseudo functional time series.
As an illustration, in Fig. 4, we present one replicate of n = 250 functional time series generated from (9) with d = −0.3, 0 and 0.3 to reflect negative dependence, short-range dependence and long-range dependence. We plot the functional autocorrelation and partial autocorrelation functions to examine the behavior of the linear temporal dependence.
In Table 3, we present the ECP and CPD for three levels of significance. These results are averaged over R = 200 replicates for n = 250 and 500. As measured by the averaged CPD, the CIs constructed by the sieve bootstrapping generally have a better calibration than those constructed by the asymptotic normality results. By better calibration, the ECPs are closer to the nominal coverage probabilities. The CIs obtained from the sieve bootstrapping have a better calibration when d ≤ 0 , whereas the CIs obtained from the asymptotic normality have a better calibration when d > 0 . Our finding further confirms the validity of the asymptotic CIs for long-memory functional time series. As the sample size increases from n = 250 to 500, the CPD becomes smaller for the parametric approach based on the asymptotic normality.

Conclusion
Not all long-memory estimators have an asymptotic distribution. For those other than the local Whittle estimator, it may not be possible to construct parametric confidence intervals for the memory parameter. In contrast, the sieve bootstrap generates pseudo stationary functional time series in which the temporal dependence of the original functional time series is adequately preserved. For each bootstrap replicate, any long-memory estimator can be applied. Therefore, the sieve bootstrap method presents a general framework for constructing confidence intervals of the memory parameter. Using the ARFI(1, d) and ARFIMA(1, d, 1) models, via a series of simulation studies, we evaluate and compare the empirical and nominal coverage probabilities between the asymptotic and bootstrap confidence intervals using the local Whittle estimator. Averaged over nine d values from 0.05 to 0.45, the sieve bootstrap confidence intervals produce better calibration than the asymptotic confidence intervals, especially when the functional time series exhibits stronger dependence.

Fig. 4: With the same random seed, we simulate functional time series with d = −0.3, 0, and 0.3. These d values are chosen to reflect negative dependence, short-range dependence, and long-range dependence in the series. For each simulated series, we also display its corresponding functional autocorrelation function.
In addition, we also consider a simulation study in which the functional time series can exhibit negative dependence, short-range dependence, and long-range dependence. Averaged over five d values from −0.30 to 0.30, the sieve bootstrap confidence intervals produce better calibration than the asymptotic confidence intervals, especially when the memory parameter d ≤ 0.
There are several ways in which the present paper could be further extended. We briefly mention four below: (1) In the local Whittle estimator, the bandwidth parameter m = ⌊n^{0.65} + 1⌋ is used in the simulation studies; it is possible to explore other values of the exponent centered around 0.65. (2) In the sieve bootstrap, we generate B = 400 bootstrap pseudo functional time series; other values of B could be considered.

Table 3: With n = 250 and 500, 200 replicates of the functional time series are generated from the ARFI(1, d) model. For each replicate, we compute empirical coverage probabilities and coverage probability differences for the CIs constructed by the asymptotic and sieve bootstrap procedures at three levels of significance. Empirical coverage probability is abbreviated as ECP, while coverage probability difference is abbreviated as CPD.