Estimation methods for stationary Gegenbauer processes

This paper reviews alternative methods for the estimation of stationary Gegenbauer processes specifically, as distinct from the more general long memory models. A short set of Monte Carlo simulations is used to compare the accuracy of these methods; the conclusion is that a Bayesian technique achieves the highest accuracy. The paper closes with an examination of the SILSO Sunspot Number series as collated by the Royal Observatory of Belgium.


Introduction
Long memory models have become useful in a large number of areas including hydrology, astronomy, econometrics and finance. One of the first practitioners to appreciate the significance of these models was Hurst (1951, 1956), who studied the flows of the river Nile and identified the distinction between short and long memory models, specifically noting that "It is commonly assumed that the frequency distribution for river flows approximates to the Normal or Gaussian curve, and this is usually true, but it is only part of the description of the phenomenon, since there is also a tendency for high or low years to be grouped together."

Thus there is a tendency for phenomena (e.g. floods, high inflation levels etc.) to occur in runs. If the Nile were to flood at very high levels year after year, a model based on traditional short-memory estimates (which assume flooding in one year is correlated only with the most recent few years of flood levels) would likely suggest a dam level that is too low, and the dam would overflow after only a few years of the higher flood levels. The work of Hurst was pivotal in helping hydrologists decide to build the high dam at Aswan much higher than had previously been planned. A summary of Hurst's life and work may be found in Sutcliffe et al. (2016).

This paper will deal primarily with a subclass of long memory models, known as Gegenbauer models, which are formally defined in Sect. 2. Gegenbauer models provide long memory in 'cycles', and this property has enabled them to be successfully applied to a wide range of phenomena. For example, Lustig et al. (2017) found evidence of a 2-factor Gegenbauer process driving the long-term memory in the El-Niño climate phenomenon, and Diongue and Guégan (2008) examined electricity spot prices using a Gegenbauer-GARCH process (introduced formally below).

Correspondence: Richard Hunt, richard@huntemail.id.au, School of Mathematics and Statistics, University of Sydney, Camperdown, Australia
A short introduction to the area of long memory models in general, including the 'standard' ARFIMA model and Gegenbauer models in particular, was given by Holan et al. (2010, Section 6), and a more detailed one is provided in Woodward et al. (2017, Ch 11). Other excellent and detailed references for long memory models (but with only a short mention of Gegenbauer models) are Beran et al. (2013) and Palma (2006), both of which also cover in detail the estimation of long memory ARFIMA processes.
Section 2 will provide a very brief introduction to Gegenbauer processes and also cover a few further variations on these models. Section 3 details a variety of estimation methods for Gegenbauer models. Section 4 illustrates the estimation methods using Monte-Carlo simulations to compare performance and Sect. 5 examines an application of these models to updated SILSO sunspot data. Section 6 concludes the paper.


Box-Jenkins seasonality and the general seasonal model

A Box-Jenkins (Box and Jenkins 1976) style of seasonal model has a general form of

    φ(B) Φ(B; s) (X_t − μ) = θ(B) Θ(B; s) ε_t,

where φ(z) = 1 − φ_1 z − · · · − φ_p z^p is a polynomial of order p in z, θ(z) = 1 − θ_1 z − · · · − θ_q z^q is a polynomial of order q in z, Φ(z; s) = 1 − Φ_1 z^s − · · · − Φ_P z^{Ps} is a polynomial of order P in z^s, and Θ(z; s) = 1 − Θ_1 z^s − · · · − Θ_Q z^{Qs} is a polynomial of order Q in z^s, with s ∈ N representing the seasonality or periodicity of the model (this paper will use both terms interchangeably), μ is the mean of the stationary process, B is the backward shift operator defined as B X_t = X_{t−1}, and ε_t is a set of uncorrelated zero mean innovations with constant variance σ², possibly non-Gaussian.

The above model may be re-written in a more general form as

    φ(B) Ψ(B; s, Φ, Θ) (X_t − μ) = θ(B) ε_t,    (1)

where

    Ψ(B; s, Φ, Θ) = Φ(B; s) / Θ(B; s).    (2)

The Box-Jenkins seasonality model defined by (1) and (2) provides an absolutely summable autocorrelation function (ACF)-equivalently a continuous and bounded spectral density-and is thus a short memory process. Many standard results hold for this model-for example Mann and Wald (1943) provides consistency and central limit theorem results for maximum likelihood-derived parameter estimates and Hannan (1973) provides similar results for the Whittle likelihood parameter estimates.

The form of (1) allows for alternative forms of seasonality. One of those, as detailed in Sect. 2.2, is Gegenbauer seasonality. For this reason (1) is referred to as a general seasonal model.

It should be noted that (1) can be extended to include k seasonal factors by repeating the factor (2) k times, with different periodicities s_1, . . . , s_k.
Finally note that if Φ(B; s) ≡ 1 then (1) is generally referred to as an ARMA (or autoregressive, moving average) model.


Gegenbauer seasonality

Adenstedt (1974) originally introduced the concept of processes based on Gegenbauer polynomials in a general form, with the specific details of applying (1) developed further by Gray et al. (1989). These authors introduced a type of seasonal model, defined with just 2 parameters and using the generating function for the Gegenbauer (or hyper-spherical) polynomials (Rainville 1960; Szegò 1959), as

    Ψ(B; u, d) = (1 − 2u B + B²)^d,    (3)

where |u| ≤ 1. For this model, define λ = arccos(u); the period is thus s = 2π/λ. It is noted firstly that the periodicity here may be fractional (unlike Box-Jenkins seasonality) and secondly that the periodicity can be estimated directly from the likelihood function, without needing to do this as part of an identification step as in traditional Box-Jenkins models (Box and Jenkins 1976). The Gegenbauer frequency is sometimes written as f_G = 1/s.

A full k-factor Gegenbauer model is defined as

    φ(B) ∏_{j=1}^{k} (1 − 2u_j B + B²)^{d_j} (X_t − μ) = θ(B) ε_t,    (4)

where u_j, j = 1, . . . , k are distinct real numbers with |u_j| ≤ 1, and the d_j are real numbers which must satisfy |d_j| < 1/2 for the process to be stationary and invertible, unless |u_j| = 1 in which case |d_j| < 1/4. As mentioned above, the factor (1 − 2u B + B²)^{−d} is the generating function for the Gegenbauer orthogonal polynomials C_j^{(d)}(u), which can be written as

    (1 − 2u B + B²)^{−d} = Σ_{j=0}^{∞} C_j^{(d)}(u) B^j.    (5)

Recursion formulae exist for calculating these polynomials-refer to Szegò (1959).
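The Gegenbauer coefficients C_j^{(d)}(u) are easy to compute from the standard three-term recursion mentioned above, and truncating the resulting MA expansion gives a simple way to simulate a Gegenbauer white-noise process. The sketch below is illustrative only (the function names are ours, and the truncation point m is a tuning choice, not something prescribed by the papers cited):

```python
import numpy as np

def gegenbauer_coefs(d, u, m):
    """First m+1 Gegenbauer coefficients C_j^{(d)}(u) via the standard
    three-term recursion (Szego 1959):
    j*C_j = 2u(j + d - 1)*C_{j-1} - (j + 2d - 2)*C_{j-2}."""
    C = np.empty(m + 1)
    C[0] = 1.0
    if m >= 1:
        C[1] = 2.0 * d * u
    for j in range(2, m + 1):
        C[j] = (2.0 * u * (j + d - 1.0) * C[j - 1]
                - (j + 2.0 * d - 2.0) * C[j - 2]) / j
    return C

def simulate_gegenbauer(n, d, u, m=2000, seed=None):
    """Simulate a Gegenbauer white-noise process from the truncated MA
    expansion X_t = sum_{j=0}^{m} C_j^{(d)}(u) eps_{t-j}."""
    rng = np.random.default_rng(seed)
    psi = gegenbauer_coefs(d, u, m)
    eps = rng.standard_normal(n + m)
    # convolve the innovations with the truncated MA weights; keep n values
    return np.convolve(eps, psi, mode="full")[m:m + n]
```

As a check, the recursion reproduces the closed form C_2^{(d)}(u) = 2d(d + 1)u² − d.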
The spectral density of model (4) is given by

    f(ω) = (σ²/2π) · |θ(e^{−iω})|² / |φ(e^{−iω})|² · ∏_{j=1}^{k} {4 (cos ω − cos λ_j)²}^{−d_j},  0 < ω < π,    (6)

where λ_j = arccos(u_j). When k = 1 there is an alternative formulation of (6), which reflects only the behavior of the function around the unbounded peak when d > 0:

    f(ω) ∼ C |ω − λ|^{−2d}  as ω → λ.    (7)

This formulation is often used to define the regularity conditions for semiparametric estimators of λ and d (refer Sect. 3.4).
Both (6) and (7) illustrate that the spectral density of a process with Gegenbauer seasonality will in general have k frequencies at which it is unbounded, with the function diverging in a neighborhood of each peak. The peaks correspond to the Gegenbauer frequencies.
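The spectral density (6) for the white-noise case can be evaluated directly; the values grow without bound as ω approaches a Gegenbauer frequency λ_j = arccos(u_j). A minimal sketch (illustrative only):

```python
import numpy as np

def gegenbauer_specdens(omega, ds, us, sigma2=1.0):
    """Spectral density (6) for a k-factor Gegenbauer white-noise process:
    f(w) = sigma^2/(2 pi) * prod_j [4 (cos w - u_j)^2]^(-d_j)."""
    omega = np.asarray(omega, dtype=float)
    f = np.full(omega.shape, sigma2 / (2.0 * np.pi))
    for d, u in zip(ds, us):
        f = f * (4.0 * (np.cos(omega) - u) ** 2) ** (-d)
    return f
```

For d > 0, evaluating closer and closer to λ = arccos(u) produces ever larger values, which is the unbounded peak described above.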
In the remainder of this paper, when k = 1, the subscript will be dropped from the parameters. Note that it is usual to refer to a k = 1 process of the form of (4) as a GARMA(p,d,q;u) process, and this format is used when referring to specific models in Sect. 4.
Finally we define an ARFIMA process from (4) with k = 1, u = 1, and δ = 2d, giving

    φ(B) (1 − B)^δ (X_t − μ) = θ(B) ε_t.    (8)


Comparing theoretical ACF of seasonal models

Figure 1 illustrates the ACF for the Box-Jenkins seasonal component defined by (2) (with P = 1, Φ_1 = 0.8) and a Gegenbauer seasonal component defined by (3) (with d = 0.4). The ACF for the Gegenbauer seasonal component was generated using the approximation formula of Gray et al. (1989), Theorem 3(c). For clarity of interpretation the seasonal period s for both has been set to 24.

Referring to Fig. 1 it is apparent that the Box-Jenkins ACF provides correlation only at the seasonal lags, whereas the Gegenbauer ACF provides some correlation at adjacent lags as well, since it represents a damped sine wave. This might be appropriate for instance when modeling daily data with a (calendar) monthly periodicity. However in different circumstances and with different datasets, either form might prove to be useful.
A second key feature is perhaps less obvious, but can be seen by observing the height of the ACF at lags 24 and 192-for the Box-Jenkins model the ACF decays quite rapidly, with ACF at lag 192 being only 21% of the value at lag 24. However the Gegenbauer ACF at lag 192 is still 66% of the value at lag 24.
This illustrates the 'long memory' component of the Gegenbauer seasonal model. Observations quite far apart in time are still quite highly correlated with each other.
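The two decay rates quoted above can be checked directly: for the Box-Jenkins seasonal AR(1) component ρ(24k) = Φ^k, while Gray et al. (1989), Theorem 3(c) gives an ACF behaving like h^{2d−1} cos(hλ) for large h, so at multiples of the period the ratio of the ACF at lags 192 and 24 is approximately (192/24)^{2d−1}:

```python
# Ratio of the ACF at lag 192 to the ACF at lag 24, for both components.

# Box-Jenkins seasonal AR(1), Phi = 0.8, s = 24: rho(24k) = Phi**k.
Phi = 0.8
bj_ratio = Phi ** (192 // 24) / Phi ** (24 // 24)   # 0.8**7

# Gegenbauer, d = 0.4: rho(h) ~ h**(2d - 1) * cos(h * lambda) for large h
# (Gray et al. 1989, Theorem 3(c)); the cosine is 1 at multiples of s.
d = 0.4
gg_ratio = (192 / 24) ** (2 * d - 1)                # 8**(-0.2)

print(round(bj_ratio, 2), round(gg_ratio, 2))       # 0.21 0.66
```

This reproduces the 21% and 66% figures quoted above.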

Variations on Gegenbauer seasonality
One variation on model (4), a fractional ARUMA process, was introduced by Giraitis and Leipus (1995). This has 2 unit roots with differing degrees of fractional differencing, and is defined as

    (1 − B)^{d_0} ∏_{j=1}^{k} (1 − 2u_j B + B²)^{d_j} (1 + B)^{d_{k+1}} X_t = ε_t.    (9)

Giraitis and Leipus (1995) showed consistency of the Whittle estimates of the long memory components and the Gegenbauer frequencies of this process. Setting d_0 = d_{k+1} = 0 shows this result includes (4).

Let f_Y(ω), −π < ω < π, represent the spectral density of the short memory component of the process (with Y_t representing the short memory process)-also known as the underlying ARMA process-where

    Y_t = ∏_{j=1}^{k} (1 − 2u_j B + B²)^{d_j} (X_t − μ),    (10)

and thus f(ω) = f_Y(ω) ∏_{j=1}^{k} {4 (cos ω − cos λ_j)²}^{−d_j}.

As an alternative to an underlying ARMA process, Bloomfield (1973) introduced the exponential model for short memory processes, observing that the log of a rational spectral density |θ(e^{iω})|²/|φ(e^{iω})|² is "often found to be a well behaved function" and might then be approximated by a truncated Fourier series log k(ω) = Σ_{r=1}^{m_e} ξ_r cos(r ω), with the intention of using a small value for m_e. This results in the spectral density for a new class of models defined by

    f(ω) = (σ²/2π) exp( Σ_{r=1}^{m_e} ξ_r cos(r ω) ),    (11)

where the ξ_r are Fourier coefficients. Beran (1993) extended this formulation to long memory models, labeling the result the Fractional EXPonential or FEXP model, and this was further extended to Gegenbauer models by Hsu and Tsai (2009) (for the k = 1 model) and McElroy and Holan (2012) (for the general k > 1 model). This model is also defined through its spectral density, as

    f(ω) = (σ²/2π) exp( Σ_{r=1}^{m_e} ξ_r cos(r ω) ) ∏_{j=1}^{k} {4 (cos ω − cos λ_j)²}^{−d_j},    (12)

where m_e indicates the truncation of the approximation to the short memory process, and ξ_r, r = 1, .., m_e are the short memory parameters to be estimated.

ARFIMA models defined by (8) have been extended to include conditional heteroscedasticity. The fractionally integrated (long memory) GARCH (FIGARCH) model was introduced by Baillie et al. (1996) and Bollerslev and Mikkelsen (1996).
Ling and Li (1997) extended these models to cover the GARCH(r, s) error structure, where the ε_t in (8) are modeled as

    ε_t = e_t √h_t,   h_t = α_0 + Σ_{i=1}^{r} α_i ε²_{t−i} + Σ_{j=1}^{s} β_j h_{t−j},    (13)

with e_t | F_{t−1} ∼ N(0, 1), where F_t denotes the sigma-field spanned by (X_1, . . . , X_{t−1}, X_t). Note that some authors assume a simpler model for the GARCH(r, 0) process, labeled as ARCH(r) when s = 0.

This is distinct from a stochastic volatility model as defined, for instance, by Phillip et al. (2018), following from Bos et al. (2014), who introduced the GARFIMA-SV model, where the error structure is given as

    ε_t = e_t exp(h_t/2),   h_t = μ_h + φ_h (h_{t−1} − μ_h) + σ_h η_t,   η_t ∼ N(0, 1).    (14)

This defines h_t | F_{t−1} (and thus the magnitude/scaling of ε_t) as stochastic, which is not the case with a GARCH model. Dissanayake and Peiris (2011), Dissanayake et al. (2014) and later Boubaker (2015) introduced the GARMA-GARCH variation, which applies the disturbance term defined by (13) to (4). Dissanayake et al. (2014) used a truncated maximum likelihood technique to estimate these models, and Boubaker (2015) used a wavelet based method to estimate the Whittle likelihood (refer Sect. 3.2).
Other variations on the basic process (4), mentioned here but otherwise not followed up, include:

1. Arteche (1998) introduced the asymmetric Gegenbauer process. The interested reader may consult Arteche (1998) and also Arteche (2007), the latter of which describes an application of this model to a monthly Spanish inflation process.

2. Bordignon et al. (2007) introduced a variation on (13) so that the process variance (rather than the underlying process) follows a GARMA definition. The idea is that since a Gegenbauer process can model seasonal long memory behavior, a minimal set of frequencies can be included in the model. Bordignon et al. (2007) give an example of the log returns for the Euro-Dollar exchange rate, with k = 4.

3. Lu and Guegan (2011) introduced a version of (4) which allows for a time varying fractional differencing parameter d_j = d_j(t/n), which can be considered as locally stationary. Lu and Guegan (2011) introduced a conditional method for estimating the parameters of these models.

4. Espejo et al. (2014) presented a 2-dimensional spatial version of the Gegenbauer process-a Gegenbauer random field-although without a moving average short-memory component. These can be used for instance for spatial modeling of climate phenomena-for example, see Rypdal et al. (2015) for a pure long memory example. Further, Espejo et al. (2015) presents a class of minimum contrast estimators for these Gegenbauer random fields and establishes consistency and asymptotic normality results for those estimators.

5. Li et al. (2020) introduced long memory functional time series, including an ARFIMA process as a special case. A number of key theorems were provided as well as some examples.

6. Wu and Peiris (2018) introduced a vector (dimension m, k = 1) cointegrated Gegenbauer process, which they labeled VEGARMA, which allows cointegration only for the short memory components of the model. They derived a quasi-maximum likelihood estimation procedure for this model, although they do not examine any theoretical properties of these estimators. The model was used to model a vector process consisting of 669 consumer price index (CPI) figures from France and the USA.

Parameter estimation
Whilst a natural approach for many practitioners might be to evaluate a maximum-likelihood function to obtain parameter estimates, in the case of long memory and Gegenbauer models this can require a considerable investment in time and computer resources, and the assumptions behind such a model may not be satisfied. As an indication, the maximum likelihood estimation of just 1 of the simulations from Sect. 4 (below), using a Kalman filter to evaluate the likelihood as documented in Palma and Chan (2005), took more than 11 hours on the same hardware on which the other methods typically took less than 1 second per simulation. Thus alternative methods to estimate the values of the parameters have become popular.

There are 2 main approaches taken to the estimation problem. One approach is first to establish the order of the model (the model identification step-to determine appropriate values of k, p, q), and then estimate the parameters based on a range of techniques like maximum-likelihood, approximate maximum-likelihood, quasi maximum-likelihood, conditional sum-of-squares or Whittle.
The second method is to estimate the long memory parameters first. This technique was originally used to allow the original series X t to be transformed to Y t via the relation (10)-which is an approximate process only-and then use techniques such as those described in Box and Jenkins (1976) to estimate the remainder of the parameters (refer for instance to Geweke and Porter-Hudak (1983)). Whilst this technique provides conditional parameter estimates, they are fast to compute and further can often be used as starting values in a numerical optimization process for other methods.
More practically, these latter techniques have been used to estimate the long memory components without specific knowledge of the structure of the short memory components.
For instance if a single unbounded peak can be identified in the spectrum of a given process, then these techniques can be used to identify the location (u) and sharpness (d) of that peak, allowing (10) to be used to generate an ARMA process which can then be estimated via traditional means-see for example Box and Jenkins (1976).
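A sketch of the transformation (10) for k = 1: since (1 − 2uB + B²)^d = Σ_j C_j^{(−d)}(u) B^j, the filter can be applied with a truncated set of AR-type weights, after which standard ARMA methods can be applied to the filtered series. This is our own illustrative implementation (the truncation length m and the handling of start-up values are our choices, not those of the papers cited):

```python
import numpy as np

def gegenbauer_coefs(d, u, m):
    # j*C_j = 2u(j + d - 1)*C_{j-1} - (j + 2d - 2)*C_{j-2}; C_0 = 1, C_1 = 2du.
    C = np.empty(m + 1)
    C[0] = 1.0
    if m >= 1:
        C[1] = 2.0 * d * u
    for j in range(2, m + 1):
        C[j] = (2.0 * u * (j + d - 1.0) * C[j - 1]
                - (j + 2.0 * d - 2.0) * C[j - 2]) / j
    return C

def gegenbauer_filter(x, d_hat, u_hat, m=200):
    """Approximate Y_t = (1 - 2uB + B^2)^d X_t with a truncated filter:
    the weights of (1 - 2uB + B^2)^d are C_j^{(-d)}(u)."""
    pi_w = gegenbauer_coefs(-d_hat, u_hat, m)
    y = np.convolve(np.asarray(x, float), pi_w, mode="full")[:len(x)]
    return y[m:]    # drop values most affected by the truncation
```

As a check, the truncated weight sequences for +d and −d invert each other: their convolution is 1, 0, 0, . . . up to the truncation order.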
Define ν to represent the vector of parameters to be estimated, which may vary from occurrence to occurrence based on the underlying model structure. For a given sample of n observations of the process X_t, t = 1, . . . , n, define

    ω_j = 2π j / n,   and   w_n(ω) = (2π n)^{−1/2} Σ_{t=1}^{n} X_t e^{i t ω},

where the mean is omitted since Σ_{t=1}^{n} e^{i t ω_j} = 0 for all j = 1, . . . , n − 1.

The periodogram of (4) is I(ω) = |w_n(ω)|². Further, define ñ = [n/2], where the square brackets indicate the largest integer which is less than or equal to the argument. ψ(x) denotes the di-gamma function.
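These quantities translate directly into code via the FFT; note that, as stated above, adding a constant (the mean) to the series leaves I(ω_j) unchanged for j ≥ 1. An illustrative sketch:

```python
import numpy as np

def periodogram(x):
    """I(w_j) = |w_n(w_j)|^2 at the Fourier frequencies w_j = 2 pi j / n,
    j = 1, ..., [n/2], where w_n(w) = (2 pi n)^(-1/2) sum_t X_t e^{itw}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.fft(x)          # dft[j] = sum_t x_t e^{-2 pi i j t / n}
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    j = np.arange(1, n // 2 + 1)
    return 2.0 * np.pi * j / n, I[j]
```

The sign of the exponent in the FFT convention does not matter here, since only the squared modulus is used.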

Likelihood-related methods
The Gaussian log-likelihood function for (4) is

    ℓ(ν) = −(n/2) log(2π) − (1/2) log |Σ(ν)| − (1/2) (X − μ 1_n)′ Σ(ν)^{−1} (X − μ 1_n),    (15)

where X represents the vector of realizations and 1_n represents an n × 1 vector of 1's, and the covariance matrix of the process X_t is Σ ≡ Σ(ν), which is assumed to be a strictly positive definite n × n matrix, with ν representing the vector of parameters. Further define X̂_t as the fitted value of the model and e_t = X_t − X̂_t as the residuals.
A key issue with strict likelihood based estimation techniques is the need to invert the covariance matrix Σ(ν). There have been many proposals to identify methods to avoid or approximate this.
The original papers by Gray et al. (1989, 1994) include suggested details on estimation-a grid search on an approximate log-likelihood, actually a conditional sum-of-squares (CSS) approximation-to identify the long memory parameters d and u, followed by a transformation of the original process to a short memory ARMA process via (10).
Chung (1996a) provided a central limit theorem for a Gegenbauer k = 1 white noise process estimated via a CSS method, and Chung (1996b) examined a k = 1 Gegenbauer process defined by (4), including short-memory ARMA parameters as well as the long memory parameters. The CSS method may be illustrated by considering Brockwell and Davis (1991), equations 8.6.6 and 8.6.7, which give the one-step predictors X̂_t and their scaled mean squared errors v_{t−1} = E(X_t − X̂_t)²/σ². If we write v̄ = (∏_{t=1}^{n} v_t)^{1/n}, then the resulting log-likelihood approximation is

    ℓ(ν) ≈ −(n/2) [ log( n^{−1} Σ_{t=1}^{n} (X_t − X̂_t)²/v_{t−1} ) + log v̄ ],

and since v_t ≈ 1 for large t, maximizing this amounts to minimizing the conditional sum of squares S(ν) = Σ_t (X_t − X̂_t)². Whilst Chung (1996a, b) derives distributional characteristics for the estimators obtained by minimizing this, these arguments were refuted by Giraitis et al. (2001), in part because the proof of consistency was not adequate. Robinson (2006) provided a consistency result for the k = 1 CSS estimates. Smallwood and Beaumont (2004) examined the full k-factor model defined by (4) and derived results very similar to those of Chung (1996b). Further work was presented in Beaumont and Smallwood (2019), establishing the asymptotic independence of the u_j, j = 1, .., k from the other parameters in the model, the asymptotic Gaussian distribution for the parameters excepting the u_j, and a complex asymptotic distribution involving independent Brownian motions for the u_j. However the literature does not appear to provide a result which establishes consistency of these estimators in the k > 1 case.
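For a GARMA(0,d,0;u) model the CSS objective can be sketched as follows: compute residuals by applying the truncated AR-type weights C_j^{(−d)}(u), treating pre-sample values as zero (the 'conditional' part), and sum their squares. This is a simplified illustration of the idea, not the exact scheme of any of the papers cited; the objective would then be minimized numerically (e.g. with scipy.optimize.minimize) over (d, u):

```python
import numpy as np

def gegenbauer_coefs(d, u, m):
    # j*C_j = 2u(j + d - 1)*C_{j-1} - (j + 2d - 2)*C_{j-2}; C_0 = 1, C_1 = 2du.
    C = np.empty(m + 1)
    C[0] = 1.0
    if m >= 1:
        C[1] = 2.0 * d * u
    for j in range(2, m + 1):
        C[j] = (2.0 * u * (j + d - 1.0) * C[j - 1]
                - (j + 2.0 * d - 2.0) * C[j - 2]) / j
    return C

def css(params, x, m=100):
    """CSS objective for a GARMA(0,d,0;u) model: residuals are
    e_t = sum_j C_j^{(-d)}(u) x_{t-j}, with pre-sample values set to zero."""
    d, u = params
    pi_w = gegenbauer_coefs(-d, u, m)
    e = np.convolve(np.asarray(x, float), pi_w, mode="full")[:len(x)]
    return float(np.sum(e ** 2))
```

With d = 0 the filter weights collapse to (1, 0, 0, . . .), so the objective is just the sum of squares of the data, as expected.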
In a related paper, Beaumont and Smallwood (2020) examined a variety of methods for testing a null hypothesis of no cyclic behavior (u = 1) against the alternative that there is cyclic behavior. They examined the methods provided by Chung (1996b) and Hidalgo (2005) (mentioned further in Sect. 3.4 below) and found that the CSS method of Chung (1996b) was rarely able to reject the null hypothesis, whereas the method of Hidalgo (2005) performed much better but was more difficult to use. Beaumont and Smallwood (2020) proceed to provide advice on the selection of bandwidth parameters, which may assist the practitioner.
A technique related to CSS estimation-but not directly based on a likelihood-was provided by Kouamé and Hili (2008), who looked at the process (4), identified the consistency and asymptotic normality of an estimate of the autocorrelation function of the disturbance term, and then considered an estimator based on minimizing the 'distance' between the empirical and theoretical autocorrelation functions of the disturbance term, technically defined as minimizing

    Σ_{j=1}^{m_f} ρ̂_e(j; ν)²,    (16)

where m_f is a fixed integer greater than the number of parameters to be estimated, ρ̂_e(j; ν) is the empirical autocorrelation at lag j of the residuals, and e_t(ν) = Σ_{j=0}^{n−1} α_j(ν) X_{t−j} are the residuals of the model. The α_j(ν) are defined by the relation ε_t = Σ_{j=0}^{∞} α_j(ν) X_{t−j}, which converges when the process is invertible and causal. The minimization process can thus be seen as ensuring the autocorrelation function of the residuals is as close to that of a white noise process as possible, up to a fixed lag. Kouamé and Hili (2008) find that if the parameter space is restricted to be a compact set then the estimates of all parameters are consistent, and, with the exception of the Gegenbauer frequencies (which are then assumed to be known a priori), they are asymptotically Gaussian.

Dissanayake et al. (2016) derive a quasi-maximum likelihood (QML) estimation procedure (both AR and MA truncated versions) using a Kalman filter to evaluate the likelihood function for a k = 1 Gegenbauer process, which includes estimation of the Gegenbauer frequency. Consistency and asymptotic normality are established only for the Gaussian white-noise case. Dissanayake et al. (2014) and Dissanayake (2015) (Ch 4) further examined this estimation method for GARMA-GARCH models defined by (13) (without specific ARMA components).
Whilst the exercise of putting a long memory process into a State Space representation results in a state vector with infinite length, it was shown by Chan and Palma (1998) Theorem 2.2 that for a long memory model, although the State Space representation is infinite-dimensional, the likelihood can be evaluated in a finite number of steps.
McElroy and Holan (2012) explored efficient methods for calculating the covariance matrix Σ(ν) and its determinant, which allowed a relatively efficient calculation of the exact likelihood function. This was extended by McElroy and Holan (2016), who showed specifically how the technique could be applied for very efficient likelihood estimation for the GEXP model (12), although they state the technique can be equally applied to other Gegenbauer based models like the GARMA model. McElroy and Holan (2016) illustrated this technique, as did Gray et al. (1989) before them, with the Mauna Loa data-382 monthly atmospheric CO₂ measurements collected at the summit of Mauna Loa in Hawaii beginning in March 1958 (Keeling et al. 1989). McElroy and Holan (2016) showed that, using a 2-factor GEXP model with 1 short memory parameter (excepting the disturbance term variance) and for a process of length 500, a single maximum likelihood evaluation on their hardware took over 180 s, whilst their new technique took just 1.18 s.

Whittle and other frequency domain methods
Whilst the Whittle methods may technically be viewed as a type of approximate likelihood method, the utility and popularity of the method deserves separate treatment.
The Whittle likelihood can be written as

    L_W(ν) = Σ_{j=1}^{ñ} [ log f(ω_j; ν) + I(ω_j)/f(ω_j; ν) ].    (17)

This is derived directly from the Gaussian log-likelihood by a process of approximating the Fourier transforms of the autocovariance function by the eigenvectors of Σ(ν)-see Whittle (1953), Eqn 5.3. Giraitis and Leipus (1995) obtained parameter estimates for (9) by minimizing (17) to obtain Whittle estimates for the process, and proved consistency. Giraitis and Taqqu (1999) showed for long memory models a √n consistency for the Whittle based estimators under a Gaussian assumption and showed the parameters were asymptotically Gaussian under a range of conditions on the spectral density around the unbounded peak. Giraitis and Taqqu (1999) found that when the long memory parameter is outside the stationary and invertible region, or for non-Gaussian processes, the √n consistency does not always hold and the limit distribution is not necessarily Gaussian.
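The Whittle objective (17) is straightforward to evaluate for a k = 1 Gegenbauer white-noise process. A sketch (illustrative only, with parameter vector ν = (d, u, σ²); minimization would be handed to a generic numerical optimizer such as scipy.optimize.minimize):

```python
import numpy as np

def whittle(params, x):
    """Whittle objective (17) for a k = 1 Gegenbauer white-noise process:
    sum_j [ log f(w_j) + I(w_j)/f(w_j) ] over the Fourier frequencies."""
    d, u, sigma2 = params
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, n // 2 + 1)
    w = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2.0 * np.pi * n)
    f = sigma2 / (2.0 * np.pi) * (4.0 * (np.cos(w) - u) ** 2) ** (-d)
    return float(np.sum(np.log(f) + I / f))
```

Note that only the periodogram ordinates, not the full covariance matrix, are needed, which is the computational attraction of the method.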
For Gaussian Gegenbauer processes, Yajima (1996) examined (4) for k = 1 and found that the maximizer of the normalized periodogram-defined as arg max_{ω∈(0,π)} I(ω)-is a strongly consistent estimator of the Gegenbauer frequency. Yajima (1996) conjectures the limiting distribution of this estimator is non-Gaussian but this is not proved. For the remainder of the parameters (excluding the process mean μ), Whittle's technique is shown to provide strong consistency of those estimates. Assuming the Gaussian process is stationary, the asymptotic distribution of the estimator is Gaussian when the Gegenbauer frequency is estimated as above, subject to some regularity conditions on the spectral density.

Giraitis et al. (2001) established a strong, general result for a Whittle estimator of k = 1 Gegenbauer processes using a general definition which includes standard Gegenbauer processes defined by (4) and also GEXP processes defined by (12), without the Gaussian assumption. The likelihood to be minimized is given by

    L(ν) = (1/ñ) Σ_{j=1}^{ñ} I(ω_j) / f(ω_j; ν).    (18)

Giraitis et al. (2001) assume ∫_{−π}^{π} log k(ω) dω = 0, and thus minimizing (18) corresponds to minimizing the Whittle likelihood (17). Specifically, when the conditions in the Appendix are satisfied, they identify n-consistency for the Gegenbauer frequency, and √n-consistency for the remainder of the parameters. Further, for all parameters except the Gegenbauer frequency, they established a central limit theorem. Nordman and Lahiri (2006) established a frequency domain empirical likelihood function for a long memory process, and showed that this could be used to establish a Gaussian central limit theorem for Whittle estimates. However there has been no extension of this to Gegenbauer processes.
For the long memory FEXP model defined by (12), Moulines and Soulier (1999) looked at a broadband least-squares estimator. Hsu and Tsai (2009) extended this to a k = 1 Gegenbauer model, noting that the log spectral density of (12) is linear in the parameters ψ = (d, ξ_0, ξ_1, . . . , ξ_{m_e})′, so that

    log I(ω_j) + γ ≈ z_j′ ψ,   z_j = ( −2 log |cos ω_j − cos λ|, 1, cos ω_j, cos(2ω_j), . . . , cos(m_e ω_j) )′,

where γ is the Euler-Mascheroni constant. This is in the form of a linear regression equation, and if λ is known a priori then ψ can be estimated by least squares. For a k = 1 Gegenbauer process with a sufficiently smooth short memory process (and some other technical conditions), Hsu and Tsai (2009) showed consistency and derived the asymptotic variance of ψ̂.

Wavelets have also been used to calculate L_W. Whitcher (2004) examines a k = 1 white noise process for (4) and derives a wavelet estimation method for a Whittle-like likelihood function. Boubaker (2015) extends the result of Whitcher to a full GARMA-GARCH model defined by (13). Neither paper examines the impact of estimating L_W in this way on the consistency or asymptotic distribution of the estimates; presumably the results above for a Whittle estimator are assumed to hold for these methods. The interested reader is also referred to Sect. 3.4, which documents some interesting work by Alomari et al. (2020) on semi-parametric estimation with wavelets in the context of functional time series.
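The regression form above can be sketched for the case of λ known a priori. The code below is our own illustration of the idea (the Euler-Mascheroni correction recentres the log-periodogram errors), not Hsu and Tsai's exact procedure:

```python
import numpy as np

def fexp_regression(x, lam, m_e=2):
    """Broadband regression sketch (after Hsu and Tsai 2009), lambda known:
    regress log I(w_j) + gamma on
    z_j = (-2 log|cos w_j - cos lam|, 1, cos w_j, ..., cos(m_e w_j));
    the first coefficient estimates d."""
    gamma = 0.5772156649015329            # Euler-Mascheroni constant
    x = np.asarray(x, dtype=float)
    n = len(x)
    j = np.arange(1, n // 2 + 1)
    w = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2.0 * np.pi * n)
    dist = np.abs(np.cos(w) - np.cos(lam))
    keep = dist > 1e-8                    # drop the (near-)singular frequency
    cols = [-2.0 * np.log(dist[keep]), np.ones(keep.sum())]
    cols += [np.cos(r * w[keep]) for r in range(1, m_e + 1)]
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, np.log(I[keep]) + gamma, rcond=None)
    return beta[0]                        # estimate of d
```

For a white noise input the true d is zero, and the regression estimate should be close to zero for moderate sample sizes.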
Finally, Hunt et al. (2021) described a Whittle-like estimator; whilst they provide consistency and a central limit theorem in a variety of restricted situations, they do not provide full consistency results or a CLT for all parameters when the underlying short memory process is an ARMA process. However they do indicate that the estimator is more resistant to highly non-Gaussian distributions, particularly where the distribution is highly skewed. Their estimator is defined by minimizing a measure of the distance between the periodogram and the candidate spectral density; it is technically not derived from a likelihood function but instead looks to find those values of the parameters ν which make the spectral density most closely match the periodogram.

Bayesian methods
This section starts with a brief overview of the Bayesian methods for parameter estimation, relating to long memory processes generally, rather than specifically Gegenbauer processes.
Bayesian techniques have been used to estimate the parameters of ARFIMA and GARMA processes using both frequency domain methods and time domain methods. A key contribution made by almost all authors has been to find efficient ways to evaluate or approximate the full likelihood function, or to apply Bayesian machinery within frequency domain approaches, particularly semi-parametric estimation approaches.
Specifically, and looking first at the frequency domain methods, a semi-parametric approach to estimating the long memory parameter was first explored by Petris (1997). Liseo et al. (2001) also developed a semi-parametric method to estimate the long memory parameter, using a frequency domain method to divide the parameter space into regions where prior information is available and others where vague priors are adopted. Specifically, they restrict evaluation of the short memory parameters to those frequencies of the spectral density which are greater than some cut-off, with the long memory parameter being estimated from frequencies less than the cut-off-the cut-off being treated as a hyper-parameter to be estimated as part of the Bayesian estimation, via a Bayes factor approach. The likelihood is specifically estimated using a Whittle approximation.
For an ARFIMA process defined by (8), define a cut-off frequency as one of the Fourier frequencies, ω_m, for some 1 ≤ m ≤ ñ. Then, starting from (18), the sum is split at the cut-off ω_m, with frequencies below the cut-off informing the long memory parameter and frequencies above the cut-off informing the short memory parameters, giving the approximate log-likelihood; as noted above, the cut-off m is then treated as a hyper-parameter. Holan et al. (2009) examined the FEXP model (defined as (12) with k = 1 and u = 1) with Gaussian disturbances, and thus likelihood (15), and looked specifically at estimating the long memory parameter d. They estimate the autocovariance function by first estimating the autocovariances of the short memory portion of the model using the techniques of Pourahmadi (1983) and Hurvich (2002), and then the long memory components using a 'splitting' method documented by Hurvich (2002) and Bertelli and Caporin (2002), and from there constructing an estimate of Σ. The inverse and the determinant of this are then calculated, and the likelihood calculation is then used in an MCMC estimation method. Makarava et al. (2011) looked at a Bayesian estimate of the long memory parameter and its variance in the context of linear-trend mixed models for continuous time processes.
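As a toy illustration of Bayesian estimation with a Whittle-approximated likelihood, the sketch below runs a random-walk Metropolis sampler for the memory parameter d of an ARFIMA(0,d,0) process under a flat prior on (0, 1/2). This is our own minimal example, far simpler than the schemes of Liseo et al. (2001) or Holan et al. (2009):

```python
import numpy as np

def whittle_loglik(d, x):
    """Whittle log-likelihood for ARFIMA(0,d,0):
    f(w) = s2/(2 pi) * (2 sin(w/2))^(-2d), with s2 profiled out."""
    n = len(x)
    j = np.arange(1, n // 2 + 1)
    w = 2.0 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[j]) ** 2 / (2.0 * np.pi * n)
    g = (2.0 * np.sin(w / 2.0)) ** (-2.0 * d)
    s2 = np.mean(I / g)                  # profiled innovation scale
    return -np.sum(np.log(s2 * g) + I / (s2 * g))

def metropolis_d(x, n_iter=2000, step=0.05, seed=0):
    """Random-walk Metropolis for d, flat prior on (0, 0.5)."""
    rng = np.random.default_rng(seed)
    d = 0.25
    ll = whittle_loglik(d, x)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = d + step * rng.standard_normal()
        if 0.0 < prop < 0.5:             # flat prior: reject outside support
            ll_prop = whittle_loglik(prop, x)
            if np.log(rng.random()) < ll_prop - ll:
                d, ll = prop, ll_prop
        draws[i] = d
    return draws
```

For a white noise input the posterior mass should concentrate near d = 0, the true value.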
Wavelet estimation via Bayesian methods has also been examined, notably by Ko and Vannucci (2006), who developed such a method for ARFIMA models.
Time-domain methods have also been used with Bayesian estimation. Pai and Ravishanker (1998) looked at univariate long memory ARFIMA models with a Gaussian distribution. In order to simplify and speed up the evaluation of the full likelihood, they propose a method (originally presented in Pai and Ravishanker (1995)) which involves treating the process values before those available as unobserved latent variables. This in turn allows them to calculate Σ −1 (ν) exactly, and use this to calculate the likelihood and then the posterior distribution of the ARFIMA parameters. Ravishanker and Ray (1997) and Doppelt and O'Hara (2019) extended this approach to multivariate long memory time series with Gaussian disturbances.
The above references all considered the ARFIMA/VARFIMA representation, where the spectral density is defined by (6) with k = 1 and λ = 0. However there has also been some work specifically looking at estimating the parameters of Gegenbauer processes (6) with λ ≠ 0. Graves et al. (2015) looked at an approximate likelihood technique for ARFIMA models, based on a technique similar to Haslett and Raftery (1989), which truncates the infinite AR representation at a finite but large order. In their Appendix B, Graves et al. (2015) state that the MCMC methodology they develop can be easily extended to incorporate Gegenbauer models, provided the Gegenbauer frequency/frequencies are known a priori.
Phillip et al. (2018) identified an efficient sampling scheme for estimating the parameters of (4) with disturbance term (14) when p = 1, q = 0. They used a truncated form of the Wold representation, similar to that used by Dissanayake et al. (2016) to estimate the likelihood.
Whilst this paper looks specifically at methods to estimate the parameters of Gegenbauer processes, it would be remiss of us not to mention two further papers. Firstly, McElroy and Holan (2012) provide a method to efficiently calculate the autocovariances of a Gegenbauer process; this can be used equally easily for Bayesian estimates as for frequentist maximum-likelihood estimates. Secondly, Holan and McElroy (2012) adopted a structural approach to seasonal adjustment. Durham et al. (2019) presented the SABL (sequentially adaptive Bayesian learning) algorithm applied to the estimation of ARFIMA models where the posterior density of the parameters is highly non-Gaussian, but where the number of parameters is relatively small. This algorithm differs from the more standard Metropolis-Hastings and Hamiltonian approaches (Betancourt et al. 2014) in that the Markov chain is updated via a series of individual Bayesian estimation steps.

Semi-parametric methods
Semi-parametric methods refer generally to the estimation of the long memory parameter, that is, the d parameter of either long memory ARFIMA or Gegenbauer processes (7) where k = 1. It is assumed that the unbounded peak can be located a priori, as indicated by the work of Yajima (1996), or from an understanding of the underlying process.
These methods became popular for two main reasons. First they allow a practitioner to test for the presence of long memory prior to model fitting; if no evidence is found then the practitioner can proceed to fit a more parsimonious model. Secondly these methods require fewer assumptions on the behavior of the underlying distribution in order to establish consistency and a central limit theorem.
The earliest of the semi-parametric methods for long memory ARFIMA models was that of Geweke and Porter-Hudak (1983), which, although not in common use today, is reproduced here since it forms the basis of much subsequent work. Taking logs of the spectral density of an ARFIMA process, and with a small amount of manipulation, an equation is obtained which, as Geweke and Porter-Hudak pointed out, resembles a simple linear regression model with log I(ω_j) the response, log(4 sin²(ω_j/2)) the explanatory variable, and log(I(ω_j)/f(ω_j)) representing the disturbance term. If the frequencies used are restricted to be close to 0, then the remainder term involving the short memory component will be close to 0. To formalize this, they introduce m as the number of frequencies in the neighborhood of 0 to include in the regression. The cut-off m must grow to infinity with n, but more slowly than n, satisfying lim_{n→∞} (log n)²/m = 0. With this and some regularity conditions, d is estimated by the negated least-squares slope d̂_GPH = −Σ_{j=1}^{m} (ṽ_j − v̄) log I(ω_j) / Σ_{j=1}^{m} (ṽ_j − v̄)², where ṽ_j = log|1 − e^(−iω_j)|² = log(4 sin²(ω_j/2)) and v̄ = (1/m) Σ_{j=1}^{m} ṽ_j, so that if d < 0 then d̂_GPH behaves asymptotically as a N(d, π²/(6m)) variate. Geweke and Porter-Hudak (1983) suggested setting m = √n. Robinson (1994) however showed that the optimal value for m was proportional to n^(4/5), although the exact optimal value depends also on the true value of d. Robinson (1995b) explored the distributional properties of this estimator, and also the related estimator with ṽ_j = 2 log|j|, which is asymptotically the same, and showed that consistency holds for d > 0 as well as d < 0, and further, under some assumptions, that the parameter estimate is asymptotically Gaussian.
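As a concrete, simplified sketch of this regression (not the authors' code; the function name, the series `x` and the bandwidth `m` are illustrative):

```python
import numpy as np

def gph_estimate(x, m):
    """Geweke-Porter-Hudak log-periodogram regression estimate of d.

    x : 1-d time series; m : number of Fourier frequencies near 0 to use.
    """
    n = len(x)
    j = np.arange(1, m + 1)
    w = 2 * np.pi * j / n                       # Fourier frequencies w_j
    dft = np.fft.fft(x - np.mean(x))
    I = (np.abs(dft[1:m + 1]) ** 2) / (2 * np.pi * n)   # periodogram
    v = np.log(4 * np.sin(w / 2) ** 2)          # regressor v_j
    vbar = v.mean()
    # OLS slope of log I(w_j) on v_j is -d, so negate
    return -np.sum((v - vbar) * np.log(I)) / np.sum((v - vbar) ** 2)
```

With white noise input the estimate should be close to 0, with standard error of roughly π/√(6m) as noted above.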
In the case of a Gegenbauer process defined by (4) where k = 1, a related estimate of d was suggested by Arteche (1998), applying the same form of regression to the Fourier frequencies around the spectral peak, with ṽ_j = log|j| and v̄ defined as above, and where 1/m + m/n → 0 as n → ∞. This estimator is shown to be asymptotically Gaussian under some conditions.
Modifications to these equations, originally suggested for long memory models by Robinson (1995b), are (i) trimming out the lower frequencies and (ii) pooling adjacent periodogram ordinates. If J ordinates are pooled, the disturbance term is Gaussian, the spectral density is smooth in a neighborhood of the Gegenbauer frequency, and other technical conditions hold, Arteche (1998) derived the asymptotic distribution of the resulting estimator when n is large. The interested reader is referred to Arteche (1998) for further details. Arteche (1998, 2000) and Arteche and Robinson (2000) also suggest a local-Whittle-like estimator, based on the earlier work of Robinson (1995a), but restricted to the Fourier frequencies around the non-zero unbounded peak. The estimate d̂_LW minimizes the local Whittle objective function over Θ, a closed subset of (−1/2, 1/2), and Arteche and Robinson (2000) showed that d̂_LW can be approximated by a N(d, 1/(4m)) variate when n is large and the disturbance terms are martingales which are 4th-order stationary, subject to some other technical conditions, including that m is an integer tending to infinity more slowly than n. Again the interested reader is referred to the original paper for specific details. Narukawa (2016) looked at a local Whittle estimator for the GEXP model (12), assuming the Gegenbauer frequency was already known; their notes suggest this frequency can be estimated using the method of Yajima (1996). Their estimate d̂_N is defined in terms of q_0 = arg min_{j∈1,...,ñ} |λ − ω_j|, so that ω_{q_0} is the closest Fourier frequency to λ (if there are two such values, the smaller of the two is chosen), together with a tuning parameter m_e. However no notes are provided on how to choose a suitable value for m_e, although the numerical studies of that paper illustrate the process by choosing values of m_e from 1 to 7. Narukawa (2016) discussed extensions of this to a general k-factor process.
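A minimal grid-search sketch of such a local Whittle estimator, using Robinson's (1995a) objective on the m Fourier frequencies just above an already-located peak; the function and argument names are ours, and peak_index and m are assumed to be supplied by the user:

```python
import numpy as np

def local_whittle_d(x, peak_index, m, grid=None):
    """Local Whittle estimate of d from the m Fourier frequencies
    immediately above the spectral peak located at index peak_index
    (peak_index = 0 recovers the standard long-memory case)."""
    n = len(x)
    dft = np.fft.fft(x - np.mean(x))
    I = (np.abs(dft) ** 2) / (2 * np.pi * n)
    j = np.arange(1, m + 1)
    Ij = I[peak_index + j]            # periodogram to the right of the peak
    mean_logj = np.log(j).mean()
    if grid is None:                  # Theta, a closed subset of (-1/2, 1/2)
        grid = np.linspace(-0.49, 0.49, 197)
    # Robinson's objective R(d) = log(mean(j^{2d} I_j)) - 2 d mean(log j)
    best_d, best_R = None, np.inf
    for d in grid:
        R = np.log(np.mean(j ** (2 * d) * Ij)) - 2 * d * mean_logj
        if R < best_R:
            best_d, best_R = d, R
    return best_d
```

In practice a proper one-dimensional optimizer would replace the grid search; the grid keeps the sketch self-contained.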
Narukawa (2016) was able to show that d̂_N is a consistent estimator of d under conditions including that the disturbance terms are iid, that m_e varies with n, and some other technical conditions. They also derived a central limit theorem for this estimator under further conditions including 4th-order stationarity. See Narukawa (2016) for further details.
In a similar vein to Yajima (1996), Hidalgo and Soulier (2004) found that the Fourier frequency maximizing the periodogram is also a strongly consistent estimator of the Gegenbauer frequency, with the primary condition being a finite eighth-order moment, together with other conditions. Having estimated the Gegenbauer frequency in this way, and if it is not found to be 0 or π, and if the underlying disturbance term is Gaussian, then a phase-shifted version of the Geweke and Porter-Hudak (1983) estimator d̂ is defined using the frequencies around the estimated peak, under conditions which include a rate restriction on m involving a parameter β, where f_Y(ω) satisfies a Hölder (uniform Lipschitz) condition of order β. The distribution of d̂ can then be approximated by N(d, π²/(12m)) when n is large. Hidalgo and Soulier (2004) also outline a method for estimating the unbounded poles for a general k-factor Gegenbauer process and show consistency for these estimators.
The method is a sequential one: λ̂_{s,HS} is taken as the Fourier frequency maximizing the periodogram over the index set Λ_s, where z_n is a non-decreasing sequence growing faster than any power of log n but more slowly than n (for instance z_n = n e^(−√(log n))), and Λ_1 = {1 ≤ j ≤ ñ}, Λ_s = {1 ≤ j ≤ ñ : |ω_j − λ̂_{s−1,HS}| ≥ z_n/n; ...; |ω_j − λ̂_{1,HS}| ≥ z_n/n} for s > 1. Artiach and Arteche (2006) improve on the estimation methods of Yajima (1996) and Hidalgo and Soulier (2004) by suggesting an iterative approach to refine the estimate of the Gegenbauer frequency so that it need not fall on a Fourier frequency. As they point out, for small sample sizes, Fourier frequencies may not provide sufficient resolution to determine the Gegenbauer frequency. Their method starts with the Hidalgo and Soulier estimate, considers the interval between the Fourier frequency before and the Fourier frequency after it, divides this interval into 20 sections, and looks for the section frequency which maximizes the periodogram. The winning interval is then recursively subdivided into a further 20 sections and the process repeated until the desired level of accuracy is obtained. Bardet and Bertrand (2010), in an interesting paper, looked at estimating the spectral density of a functional time series using wavelets, and established consistency and a central limit theorem for each positive frequency. Using this key result, Alomari et al. (2020) established consistency of estimators of the Gegenbauer frequency and exponent of a k = 1 Gegenbauer functional time series, under conditions involving sufficient smoothness of the short memory process and some general assumptions on the smoothness of the wavelet function.
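The Artiach and Arteche refinement can be sketched as a simple recursive subdivision. Here `periodogram_fn` is assumed to evaluate the periodogram (or any smoothed version of it) at an arbitrary frequency; the function and argument names are ours:

```python
import numpy as np

def refine_peak(periodogram_fn, f0, delta, tol=1e-6, n_sections=20):
    """Refine a peak-location estimate off the Fourier grid, in the
    spirit of Artiach and Arteche (2006): repeatedly subdivide the
    interval (f0 - delta, f0 + delta) into n_sections sections and
    keep the candidate frequency with the largest periodogram value."""
    lo, hi = f0 - delta, f0 + delta
    while (hi - lo) > tol:
        candidates = np.linspace(lo, hi, n_sections + 1)
        vals = [periodogram_fn(f) for f in candidates]
        best = candidates[int(np.argmax(vals))]
        width = (hi - lo) / n_sections
        lo, hi = best - width, best + width   # recurse on the winning section
    return 0.5 * (lo + hi)
```

For a unimodal peak the bracketing interval shrinks by a factor of 10 per iteration, so a handful of iterations already gives resolution far below the Fourier-grid spacing.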
In a further result, Ayache et al. (2020) established a central limit theorem result for the estimators defined by Alomari et al. (2020).

Summary
For GARMA models defined by (4), a summary of the theoretical results for the various methods is presented in Table 1.
The semi-parametric methods give the strongest set of theoretical results; Yajima (1996), Hidalgo and Soulier (2004) and Artiach and Arteche (2006) give estimates, consistency and CLT results for locating the Gegenbauer frequencies, whilst the methods for estimating d given by Arteche and Robinson (2000) assume the Gegenbauer frequency is known a priori; the process can then be transformed to a short memory process and the remaining parameters estimated from that. However, our simulations in Sect. 4 indicate that this procedure results in larger bias and root-mean-square error in the short memory parameters.

Simulations and results
To evaluate a range of the more popular methods for estimation of the parameters in (4), a short simulation study was undertaken, looking specifically at a k = 1 GARMA(1, 0.22, 0, −0.8090) process. For this process the parameters were f_G = 0.4 (so that u = cos(2π f_G) ≈ −0.8090), σ² = 1 and φ = 0.8. The disturbance term was Gaussian. Since not all methods provide an estimate of the variance σ², this parameter was not estimated. 1000 realizations were generated as detailed in Sect. 4.2. This process was repeated 3 times, for processes of length 100, 500 and 1000.
These last two methods (A-R and local Whittle) rely firstly on using the maximum value of the periodogram to determine the Gegenbauer frequency; both then use standard short-memory procedures to estimate the autoregressive parameter.
The results, bias and mean-squared error to 4 decimal places, are presented in Tables 2, 3 and 4 (Table 2 covers processes of length 100). In general the Gegenbauer frequency is determined very accurately, reflecting the faster convergence of this parameter for most methods (n-consistent, whereas the other parameters are √n-consistent). Whilst the results are mixed, overall the Bayesian method appears to produce the best mix of small bias and MSE over the parameters and over all three process lengths (where the parameter estimates for the Bayesian method are taken as the mean of the posterior distribution), although with by far the longest computation time. In particular, its MSE was substantially smaller than that of the other methods at the n = 100 process length.
The wavelet (MB8) method also performed quite well, however the source code to run this needed to be modified to assist with convergence, and to estimate the short memory component, and even so there were a number of cases where the algorithm failed to converge. Some further details on this are given in Sect. 4.2 and in the source code itself.
The CSS, Whittle and log-Whittle methods all performed quite well, with the log-Whittle better at the shorter process lengths.
The semi-parametric methods (A-R and local Whittle), when used in this way to estimate all the parameters of the process, performed worst of all, particularly at smaller process lengths.
It is also interesting to view the average time taken to produce each estimate. In general, for a length of n = 500, the time domain methods (CSS and QML) took 10-12 s and the Bayesian method took 2½ minutes per simulated process. The frequency domain methods ranged from 0.07 to 0.10 of a second. The wavelet MB8 method recorded some exceptionally long convergence times for some processes in the n = 100 case (although convergence was achieved in those cases), which explains why the average time for that method is much longer there than at the longer process lengths; the median time is much shorter.
As noted in Sect. 4.2, the wavelet method MB8 did have difficulty numerically integrating the spectral density to calculate the bandpass variances. The original source code was modified to assist with this quadrature. However, the algorithm still failed to converge for a small number of cases.
The Bayesian estimation also exhibited some issues. The Rhat statistic (Gelman and Rubin 1992) is used to identify these: an estimation was deemed to have failed when the Rhat statistic was greater than 1.1. This happened for only a single run (run 128 of the n = 1000 case). However, the n = 100 case also had 132 runs where the number of effective samples was less than 400 (Vehtari et al. 2019, Section 2). In practice this is usually relatively straightforward to rectify by increasing the number of iterations per chain. The n = 1000 sample which exhibited a high Rhat statistic (run 128) also had too few effective samples, although most likely this was a result of the four Markov chains not fully mixing.
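For reference, a minimal sketch of the original (unsplit) Gelman-Rubin Rhat computation that underlies this check; rstan's own implementation uses the more refined split-chain, rank-normalized version of Vehtari et al. (2019):

```python
import numpy as np

def rhat(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter.
    chains: array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)
```

Well-mixed chains give values very close to 1; a chain stuck at a different level (e.g. one chain shifted away from the others) pushes Rhat well above the 1.1 threshold used here.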

Algorithm and software details
To generate realizations of Gegenbauer processes, the algorithm originally documented by Ford (1991), and later also by Woodward et al. (1998), was used. In summary, the algorithm generates a theoretical autocorrelation function by numerically integrating the theoretical spectrum, and then uses Hosking's method to generate realizations from a process with that autocorrelation structure.
Although Whitcher (2001) provided an alternative, wavelet-based method for generating realizations, that paper found no overall improvement from using the wavelet algorithm, although its computation time was reduced. The techniques of McElroy and Holan (2012, 2016) also allow for very accurate and fast evaluation of the autocovariance function for a given finite sample. However, for the purposes of this paper the additional time spent in generating the realizations was not significant (the main criterion is the time taken to estimate the parameters), so it was decided to continue to use the "Hosking" method as modified by Ford. A 'run-in' period of 500 observations was used for processes of length 100 and 500, and a run-in period of 1000 observations was used for processes of length 1000. Once the autocovariance function of the process had been calculated, the function "hosking.sim" in the R package waveslim (Whitcher 2015) was used to generate the specific realisation from the autocovariance function.
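The simulation step can be sketched as follows. waveslim's hosking.sim uses the O(n²) Durbin-Levinson recursion; the Cholesky factorization of the Toeplitz covariance matrix shown here (an assumption made for brevity, at O(n³) cost) draws from exactly the same Gaussian distribution:

```python
import numpy as np

def simulate_from_acvf(acvf, rng):
    """Draw one Gaussian realization with a given autocovariance sequence.

    acvf : autocovariances gamma(0), ..., gamma(n-1).
    Equivalent in distribution to the Durbin-Levinson ('Hosking')
    construction, via a Cholesky factor of the Toeplitz covariance."""
    n = len(acvf)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    sigma = np.asarray(acvf)[lags]         # Toeplitz covariance matrix
    L = np.linalg.cholesky(sigma)          # sigma = L L^T
    return L @ rng.standard_normal(n)
```

For a Gegenbauer process the acvf argument would be the numerically integrated theoretical autocovariances described above; the AR(1) sequence in the test below is just a convenient closed-form check.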
Numerical integration was required for the algorithm above. This was done via adaptive Gauss-Kronrod quadrature, as supplied by the R pracma package (Borchers 2018), or in some cases by a fine-grained custom trapezoidal quadrature.
In order to ensure the realisation was sufficiently close to the theoretical definition, a check was made to ensure there was a peak evident at the Gegenbauer frequency (away from the autoregressive peak near the origin); if this was not evident, the realization was thrown away and the process re-generated. This was done since the frequency domain estimation techniques do require a sufficiently distinguishable peak to allow the identification and estimation of the long memory process.
For maximum likelihood and quasi-maximum likelihood estimation, a Kalman filter approach was used to calculate the likelihood, and the R package "FKF" was used for this purpose (Luethi et al. 2018). For wavelet estimation methods, the waveslim package (Whitcher 2015) modified as detailed below was used.
For semi-parametric algorithms, the Gegenbauer frequency was determined by the method of Yajima (1996) and Hidalgo and Soulier (2004), choosing the Fourier frequency which maximized the periodogram. After estimating the long memory parameter, the processes were then filtered back to a short memory process using (10); specifically, if X = (X_1, X_2, ..., X_n)^T and Y = (Y_1, Y_2, ..., Y_n)^T, and T is the lower-triangular Toeplitz matrix built from the Gegenbauer polynomial coefficients C_n(u) defined by (5), then X = T⁻¹Y, and since T is lower-triangular with determinant 1, the matrix inversion is comparatively straightforward and numerically stable.
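A sketch of this filtering step. The coefficient recursion is the standard three-term Gegenbauer recursion; the function names and the truncation of the filter at the sample length are our own choices:

```python
import numpy as np

def gegenbauer_coeffs(d, u, n):
    """First n coefficients c_j of (1 - 2uB + B^2)^(-d) = sum_j c_j B^j,
    via the standard three-term Gegenbauer recursion."""
    c = np.empty(n)
    c[0] = 1.0
    if n > 1:
        c[1] = 2.0 * d * u
    for j in range(2, n):
        c[j] = (2.0 * u * (d - 1.0 + j) * c[j - 1]
                - (2.0 * d - 2.0 + j) * c[j - 2]) / j
    return c

def to_short_memory(y, d, u):
    """Filter a Gegenbauer series back to its short-memory component by
    forward substitution, i.e. solving T x = y where T is the unit
    lower-triangular Toeplitz matrix built from the coefficients above."""
    n = len(y)
    c = gegenbauer_coeffs(d, u, n)
    x = np.empty(n)
    for t in range(n):
        x[t] = y[t] - c[1:t + 1] @ x[:t][::-1]
    return x
```

Because the triangular matrix has unit diagonal, the forward substitution is an exact inverse of the (truncated) long-memory filter, as the round-trip test below confirms.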
For semi-parametric algorithms, once the short-memory vector X is obtained, the standard R function "arima" was used to estimate the short-memory parameters.
The wavelet based methods were based on Whitcher (2004), using the code written by that author in the R package "waveslim", specifically the routine "spp.mle", which can be used to determine the Gegenbauer frequency and the long memory parameter d. To support the simulations, the source code for this was modified to include an AR(1) parameter; some stability and accuracy improvements were also made, particularly around the numerical integration of the spectral density to estimate the bandpass variances.
Other estimation algorithms were coded directly. The authors gratefully acknowledge that the R code for the quasi-maximum likelihood implementation was supplied by Hao Wu from the University of Sydney.
A key challenge to be overcome was associated with the autoregressive component of the spectral density. An example appears in Chung (1996b) (Fig. 1). In this figure it can clearly be seen that, although the peak at the Gegenbauer frequency is apparent, in a finite sample the peak near the origin (or, in the case of a negative autoregressive parameter, near π) can, and in shorter series often does, have a higher value than the peak at the Gegenbauer frequency. Note that the height of the peak near the origin or near π is related to how close the autoregressive parameter is to ±1 (in other words, a unit root): the closer it is, the higher the peak, and the greater the difficulty the algorithms have in determining the correct location of the Gegenbauer frequency.
Even when this is not a problem, for frequency domain estimation methods there is a tendency for the algorithm to converge to the incorrect peak for the Gegenbauer frequency. To avoid this, the algorithms were specifically coded to look for a peak well away from the origin. (It is assumed a practitioner will already have determined that a Gegenbauer process, rather than a standard long memory process, is an appropriate model; perhaps by an examination of the autocorrelation function to identify the characteristic damped sine curve, as illustrated by Hosking (1982, Fig. 2).) The specific approach taken is to examine frequencies which exclude the bottom and top 10% of the Fourier frequencies.
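The trimmed peak search can be sketched as follows; the 10% trimming fraction follows the text, while the function name and implementation details are ours:

```python
import numpy as np

def find_gegenbauer_peak(x, trim=0.10):
    """Locate the Gegenbauer frequency as the Fourier frequency that
    maximizes the periodogram, excluding the lowest and highest `trim`
    fraction of frequencies (to avoid an AR peak near 0 or pi)."""
    n = len(x)
    ntilde = n // 2
    dft = np.fft.fft(x - np.mean(x))
    I = (np.abs(dft[1:ntilde + 1]) ** 2) / (2 * np.pi * n)  # I at w_1..w_ntilde
    lo = int(np.floor(trim * ntilde))
    hi = int(np.ceil((1 - trim) * ntilde))
    j = lo + int(np.argmax(I[lo:hi]))     # index into I; I[k] is frequency (k+1)/n
    return (j + 1) / n                    # frequency in cycles per unit time
```

A pure cosine placed at an interior Fourier frequency is recovered exactly, while a peak in the trimmed-out 10% bands would (deliberately) be missed.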
However, as shown in Sect. 4.1, the wavelet based functions were still unable to converge to the correct Gegenbauer frequency in a few cases.
For Bayesian estimation, the "stan" package (Carpenter et al. 2017) was used via the "rstan" interface, and this estimated the parameters via a time-domain technique. Relatively diffuse priors were used: φ ∼ N(0, (1/2)²), f ∼ logN(0, 0.2²), and d ∼ N(0, (1/4)²). Parameters were restricted to d, f ∈ (0, 1/2) and φ ∈ (−1, 1). These priors and parameter limits assume that the model has been correctly identified as a stationary and invertible GARMA model. The posterior distribution was estimated using the NUTS MCMC method (Hoffman and Gelman 2014), with 4 Markov chains, 500 warmup samples and 1500 post-warmup samples; in contrast to the other methods, the chains used multi-threading due to the additional time required for single-threaded operation. Even so, as noted in Tables 2, 3 and 4, the time taken for the Bayesian estimation was clearly the longest.
The "Rhat" statistic and the number of effective samples were monitored to ensure the convergence of the algorithm (Vehtari et al. 2019); an Rhat statistic above the 1.1 threshold was reported as an 'algorithm failure'. As noted above, this happened only once.

Box and Jenkins (1976) noted that the analysis of sunspot data has had a long history in the time series literature, including Schuster (1906) and Yule (1927).

Sunspot data
Recently the Royal Observatory of Belgium has undertaken a considerable project to improve and standardise the sunspot data, and these data have been chosen to illustrate the above techniques (Clette et al. 2014).
It is emphasized here that the intention is not to provide an exhaustive analysis of modeling of this process but more to illustrate the techniques and how they may be used.

Background
The original sunspot data was collated, and in part directly observed, by Rudolf Wolf (and subsequently by his successor Alfred Wolfer), and labeled the (international) sunspot number (SN). Much of the information reproduced in this section is due to Clette et al. (2014).
Many authors have illustrated their techniques using the Wolfer sunspot time series, although this is available in a number of forms and lengths. Most authors use the annual series from 1749 to 1924, and credit the data either to Anderson (1971) (A.3) or directly to Waldmeier (1961). Anderson in turn credits the data to Waldmeier, who as director of the Zurich Observatory continued to collate the Wolfer sunspot numbers after Wolfer. When Waldmeier retired in 1980, the process was taken over by the Royal Observatory of Belgium, Brussels, and this led to the foundation of the SIDC, the "Sunspot Index Data Center".
Over time alternative series with different ways of counting sunspots, and applying "corrections" were introduced. This included the group number (GN) from Hoyt and Schatten (1998a, b). A key problem arose since these alternative series metrics did not agree in some respects.
Starting in 2011, a major project was undertaken by the "Sunspot Index and Long-term Solar Observations" (SILSO) group, run from the Royal Observatory of Belgium, to collate, validate and integrate the many sources of information. A key requirement was to retain as much of the history of the process as possible, since changes in solar activity can happen over considerable periods of time.
An example of this was the Maunder minimum, which occurred very early in the series, between 1645 and 1715, when sunspot activity was greatly reduced. Measurements from this time allowed analyses showing how natural solar forcing affected past climate change on scales of 50-1000 years (Shapiro et al. 2011).

Model fitting
In this section the annual data is analysed. The official series is now kept with a specific version number-this analysis uses version 2.0.
The first step is model identification, where an attempt is made to determine the best form of the model before proceeding to estimate its parameters. The plot and periodogram of the process are presented in Fig. 2. There appears to be a single, very large spike in the periodogram at frequency 0.0906, corresponding to a period of 11 years and 13 days.
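The stated frequency-to-period correspondence is easy to verify directly:

```python
# Check the stated frequency-to-period conversion for the sunspot series.
f = 0.0906250                       # peak Fourier frequency (cycles per year)
period_years = 1.0 / f              # about 11.034 years
whole_years = int(period_years)
extra_days = (period_years - whole_years) * 365.25
print(whole_years, round(extra_days))   # 11 years and about 13 days
```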
As found by Gray et al. (1989) and Chung (1996b), a k = 1 GARMA process appears to be the most appropriate fit. Using the method of Arteche (1998, 2000) and Arteche and Robinson (2000) with l = 0, the value of d is estimated as 0.3689. Using these values, as detailed in Sect. 4.2, the original process was transformed to a short memory process.
The autocorrelation function and partial autocorrelation function of this short memory process are shown in Fig. 3. They appear to indicate that an autoregressive process with parameters at lags (years) 1 and 9 might be an appropriate fit. For a point of comparison, as in Gray et al. (1989), a process with a single autoregressive parameter was also fitted.
Having identified an initial model, the next step is to estimate the parameters. As identified in Sect. 4 above, a Bayesian method appears most appropriate, particularly since there is earlier work identifying the likely values of the parameters using the data up to 1924. The priors specified were μ ∼ N(44.78, 20²), d ∼ N(0.42, 0.1²), f ∼ N(0.0883, 0.04²), and σ² ∼ logN(20.0, 20.0²). For the AR(1) model the prior φ₁ ∼ N(0.49, 0.1²) was used, and for the AR(1,9) model, relatively diffuse priors were selected: φ₁, φ₉ ∼ N(0.0, 1.0²). The 'Stan' program was used to perform this analysis (Stan Development Team 2019). The results are shown in Table 5; for comparative purposes the values estimated on the original series by both Gray et al. (1989) and Chung (1996b) are reproduced and labeled as the "1924 data", with the analysis done here denoted as "SILSO data". It should be noted that the mean and variance of the 1924 data differ from those of the SILSO data; however the Gegenbauer and autoregressive parameters are very similar.
Comparing the Bayesian estimates for the AR(1) and AR(1,9) models, the smaller residual variance is given by the AR(1,9) model. For this model, only 0.025% of the samples for the parameter d were over 0.5, indicating that the Gegenbauer component is very likely stationary.
The autocorrelation function (ACF) of the residuals for the AR(1,9) model is presented in Fig. 4 and shows some small evidence of autocorrelation at lags 3 and 11. A Box-Ljung test at these lags, however, indicated these were not significant.
The Gegenbauer frequency calculated for this model actually corresponds to a period of 10 years, 9 months and 11 days, a bit shorter than previously calculated by other authors. However, it is interesting to note that the maximum of the periodogram is at f = 0.0906250, which corresponds to 11 years and 13 days.

Conclusion
This paper has documented the variety of Gegenbauer model formulations which have been developed, and further documented the various estimation techniques used for a GARMA process.
Section 4 documented some Monte Carlo simulations using the main estimation methods. This established that a Bayesian methodology appeared the most accurate at estimating the parameters of a GARMA model, although other methods such as the Whittle, CSS and log-Whittle methods also performed very well. An example of the use of Bayesian estimation in practice was provided in Sect. 5.
It is noted, however, that there is still work to be done on the theoretical results behind these estimators (refer to Sect. 3.5). In particular, the theoretical asymptotic distribution and standard errors have not been derived for a number of estimators. There is no way to determine the exact model unless some knowledge from the series itself can be used (for instance, monthly observations might be expected to have a frequency close to 1/12).
Recent work by Beaumont and Smallwood (2020) shows that some methods of inference on parameter values can have remarkably less power than others, and the choice of method appears crucial.
With long memory models generally, techniques like bootstrap that may be used in other areas of statistics are typically not feasible.
In general the semi-parametric methods which estimate only a single parameter at a time showed the highest bias and MSE. Whilst these techniques are very useful for model identification, the authors would caution against using them for final estimation or forecasting.
The final observation to be made is that the time taken to evaluate the time-domain based methods is in general considerably longer than for frequency domain methods.