1 Introduction

Risk management in the stock markets, commonly called market risk management, requires the use of statistical tools and models which aim at reducing the potential size of losses, occurring by sudden drops or growth in the value of stock. Losses can be amplified during periods of large volatility. Risk managers routinely use strategies to handle, model and predict the volatility of daily log-returns, defined as \(X_t = \log {P_t} - \log {P_{t-1}}\), where \(P_t\), \(t = 1,2,\ldots ,\) is the price of a generic asset. The behaviour of the extreme values of daily log-returns is critically important for market risk management. Marginally these processes have power-law heavy tail decay and determining the associated (power) tail index is therefore important to quantify this. Additionally, isolated extreme values of daily log-return can often be managed, but there is major risk when there is a clustering of these extreme values, and so the study of this dependence structure during extreme events is essential.

It is standard to assume that the series \(\{X_t\}\) is a stationary series. The most widely adopted models for \(\{X_t\}\) are the generalised autoregressive conditionally heteroskedastic (GARCH) models (Bollerslev 1986) and stochastic volatility models (Taylor 1986). These models are capable of capturing many of the empirical features of daily log-returns. Both processes have heavy tailed marginal distributions but they differ in terms of their extremal dependence structure. One of the most common ways to measure this is through the lag \(\tau \) tail dependence

$$\begin{aligned} \chi _X(\tau )=\lim _{x\rightarrow \infty }\Pr (X_{t+\tau }>x~\mid ~X_t>x), \end{aligned}$$

proposed by Ledford and Tawn (2003), with Davis and Mikosch (2009b) terming \(\{\chi _X(\tau )\}_{\tau \ge 0}\) the extremogram. For stochastic volatility models Breidt and Davis (1998) show that there is no clustering of extreme values, so that \(\chi _X(\tau )=0\) for all \(\tau >0\). Thus extreme values from this stochastic volatility process occur in temporal isolation, although there are examples of other such processes for which this is not the case (Mikosch and Rezapour 2013). In contrast, for any GARCH(\(p,q\)) process \(\chi _X(\tau )>0\) for at least one value of \(\tau >0\). The values of the extremogram, and other extremal dependence features, are only known for a very restricted subclass of GARCH(\(p,q\)) processes.

The aim of our paper is to derive all the extremal features for all GARCH(\(p,q\)) models used in a wide range of financial applications and to present computationally efficient algorithms for their evaluation. We consider GARCH(\(p,q\)) models, for \(p \in {\mathbb {N}}\) and \(q\in {\mathbb {N}}_+\), of the form

$$\begin{aligned} X_t= \sigma _t Z_t, \text { where } \sigma _t^2 = \alpha _0 + \sum _{i=1}^{q} \alpha _i X_{t-i}^2 + \sum _{j=1}^{p} \beta _j \sigma _{t-j}^2,\nonumber \\ \end{aligned}$$
(1)

for \(t \in \mathbb {Z}\), where the parameters are \(\alpha _0 > 0\), \(\alpha _i \ge 0, i = 1,\ldots , q-1\), \(\alpha _q>0\) and \(\beta _j \ge 0, j=1,\ldots ,p-1\), \(\beta _p>0\) for \(p\ge 1\), and \(\{Z_t\}\) are independent and identically distributed (i.i.d.) random variables with \(E(Z_t) = 0\) and \(\text{ Var }(Z_t) = 1\) for all \(t \in {\mathbb {Z}}\). The processes \(\{\sigma _t\}\) and \(\{Z_t\}\) are commonly referred to as the conditional volatility and innovations of \(\{X_t\}\). We additionally assume that \(\{Z_t\}\) are continuous random variables, which avoids technical issues that occur when \(\Pr (Z_t=0)>0\) (Buraczewski et al. 2016). For the process to be strictly stationary the parameters need to satisfy the constraints in Sect. 2.2. Properties of GARCH(pq) processes are often determined by

$$\begin{aligned} \phi := \sum _{i = 1}^q \alpha _i + \sum _{j = 1}^p \beta _j. \end{aligned}$$
(2)

Two important special cases of GARCH(pq) processes arise when \(p=0\) or \(\phi =1\), corresponding to ARCH(q) and IGARCH(pq) processes respectively. The IGARCH(pq) process is strictly stationary but not weakly stationary, due to violating the condition \(E(X_t^2)< \infty \) for all t. We develop new numerically robust and efficient methods for assessing whether any GARCH(\(p,q\)) process is strictly stationary. Our results cover all of the GARCH(pq) processes including these special cases.

Existing theoretical and computational methods for deriving the extremal properties are well established for special cases of the GARCH(pq) process, namely: for symmetric \(Z_t\) with \(p=0, \, q=1\), corresponding to the ARCH(1) process (de Haan et al. 1989) and for \(p=q=1\), corresponding to a GARCH(1,1) (Laurini and Tawn 2012); and for asymmetric \(Z_t\) with \(p=q=1\) (Ehlert et al. 2015). For a class of processes which contain both the ARCH(1) and GARCH(1,1) processes, but not any GARCH(pq) process with \(\max (p,q) \ge 2\), Collamore et al. (2014) provide an algorithm to derive \(\Pr (X_t>v)\) for any v. For general GARCH(\(p,q\)) models, with arbitrary (pq) many theoretical extremal properties have been derived by Basrak et al. (2002), Davis and Mikosch (2009a), Basrak and Segers (2009) and Collamore and Mentemeier (2018), including marginal distributions and some results for the extremal clustering properties. At first sight it seems that these results give everything that is needed for their numerical evaluation. But, as we will show, this is far from the case. In fact, until now none of these features can be evaluated for general GARCH(pq) processes or for the processes considered by Collamore and Mentemeier (2018).

Consider the marginal tail behaviour of \(|X_t|\) and \(X_t^2\) for GARCH(pq) processes \(X_t\). Basrak et al. (2002) give a limit expression, as \(u\rightarrow \infty \), that there exists \(\kappa > 0\) such that for fixed \(x>1\)

$$\begin{aligned} \begin{aligned}&\Pr (|X_t|> ux \mid |X_t|>u)\rightarrow x^{-2\kappa }, \text { and }\\ {}&\Pr (X_t^2>ux \mid X_t^2>u)\rightarrow x^{-\kappa }.\end{aligned} \end{aligned}$$

These papers only give an asymptotic limiting expression for \(\kappa \), but do not evaluate it. We find that direct computation using their expression gives very poor numerical performance. Janssen (2010) presents an alternative approach to evaluate \(\kappa \), however that approach requires \(Z_t\) to have bounded support, ruling out distributions used by practitioners, e.g., \(Z_t\) being Gaussian or t-distributed. Furthermore, the associated numerical methods are very slow and unreliable. Collamore and Mentemeier (2018) characterise the limit \(\lim _{u\rightarrow \infty }u^{\kappa }\Pr (X^2_t>u)=C>0\), with \(\kappa \) formulated as in Basrak et al. (2002). So, without techniques to obtain \(\kappa \) their methods cannot evaluate C or any of the other properties they characterise. We propose the first computationally efficient numerical algorithm for the evaluation of \(\kappa \), which is valid irrespective of whether \(Z_t\) are unbounded or bounded. We also provide strong empirical evidence for how \(\phi \) influences \(\kappa \), including \(\kappa =1\) for all IGARCH(pq) and how our evaluation of \(\kappa \) can be used for assessing stationarity.

Now consider the extremal dependence and the clustering features of GARCH(pq) processes. Basrak and Segers (2009) propose algorithms for their evaluation with bounded innovations. However, Basrak and Segers (2011) identify these methods have major limitations and that they do not hold for any GARCH(pq) process. So currently no extremal clustering features for GARCH(pq) processes, when \(\max (p,q)\ge 2\), can be evaluated. We propose an entirely new algorithm to evaluate a range of cluster features regardless of p and q and for \(Z_t\) having any continuous distribution.

There are a range of features of GARCH processes that are of interest in managing financial stress periods. We focus primarily on \(\{\chi _{X}(\tau ); \tau \ge 0\}\), discussed above, and the extremal index for the process \(X_t\) denoted \(0<\theta _X\le 1\), where \(1/\theta _{X}\) is the limiting mean number of extreme values per independent extreme event (Hsing et al. 1988). We also evaluate these properties for the lower tail and for the square of the GARCH process. Other features can also be derived using our methods, such as the duration of a stress period and the total losses in such a period.

All of these cluster functionals can be obtained from the tail process (Smith et al. 1997; Segers 2003; Planinić and Soulier 2018). For a heavy tailed process \(\{X_t\}\) the tail process \(\{\hat{X}_t\}_{t\ge 0}\) is defined in the following way. If for any \(t\in {\mathbb {N}}_{+}\) the variables \((X_0/u, X_1/u, \ldots ,X_t/u )\mid X_0>u\), converge weakly to \((\hat{X}_0, \hat{X}_1, \ldots , \hat{X}_t)\), with \(\hat{X}_0\) non-degenerate as \(u\rightarrow \infty \), then the limit process \(\{\hat{X}_t\}_{t\ge 0}\) is termed the forward tail process. For Breidt and Davis (1998) models of stochastic volatility \(\hat{X}_t=0\) almost surely for all \(t\ge 1\), so large values are not followed by large values for these processes. In contrast, for any GARCH(pq) process at least one \(\hat{X}_t\), for \(t\ge 1\), is non-degenerate, and furthermore all elements of the tail process are non-degenerate if all the coefficients of \(\sigma ^2_t\) in expression (1) are positive.

In this paper we derive the theory for obtaining the forward tail process, the extremogram, the extremal index and cluster size distribution for any GARCH(\(p,q\)) process with bounded or unbounded support for the innovations. We provide a new fast, yet accurate, Monte Carlo algorithm for the numerical evaluation of these extremal features. Our approach is quite different from that of Collamore and Mentemeier (2018), who start their Markov chains in non-extreme states and use a form of importance sampling to move into the tail region. Relative to our approach, using the tail process, this is guaranteed to be inefficient for evaluation of limiting properties, as we start in the limit. However, for evaluating non-limit tail features the Collamore and Mentemeier (2018) approach has advantages over ours, but to be useful they need our evaluation of \(\kappa \).

The paper is structured in the following way. In Sect. 2 we give the required background details for the properties of stationary GARCH(\(p,q\)) processes and the theory of multivariate regular variation that is required for our methodology. In Sect. 3 we give new results for robust computational testing for stationarity and in Sect. 4 a novel particle filtering algorithm for fixed point distributions that enables us to obtain the tail index \(\kappa \) and to sample from a \((p+q)\)-dimensional extreme state of the process. In Sect. 5 we derive the tail processes for the squared and original GARCH(\(p,q\)) processes. Section 6 discusses the numerical evaluation of the method, including checking for stationarity, evaluating \(\kappa \), and illustrating the rapid convergence of the particle filter algorithm. Section 7 has a study of a range of extremal dependence features of the GARCH(\(p,q\)) process for a variety of parameter values. In particular, the empirical evidence from the systematic study of two sub-classes of GARCH(\(p,q\)) process suggests a number of other features that remain to be proved. In Sect. 8 we discuss the likely wider impact of our methods and in the Appendix give proofs. In Supplementary Material, Laurini et al. (2022), we provide some theory on the convergence properties of our methods and extensive illustrations of their reliability.

2 Known properties of GARCH(\(p,q\)) processes

2.1 Matrix formulation

Focusing on the squared GARCH process, \(X_t^2\), we write the process as a stochastic recurrence equation (SRE) to enable the exploitation of a range of established results (Kesten 1973). Let the \((p+q)\) vector \({\textbf{Y}}_t\), the \((p+q) \times (p+q)\) matrix \({\textbf{A}}_t\) and the \((p+q)\) vector \({\textbf{B}}_t\) be

$$\begin{aligned}{} & {} {\textbf{Y}}_{t} = \left( X_t^2, \ldots , X_{t-q+1}^2, \sigma _t^2, \ldots , \sigma _{t-p+1}^2\right) ^\textrm{T} ,\nonumber \\{} & {} \quad {\textbf{A}}_t = \begin{pmatrix} \alpha ^{(q-1)}Z^2_t &{}\alpha _q Z_t^2&{}\beta ^{(p-1)}Z^2_t &{} \beta _p Z_t^2 \\ {\varvec{I}}_{q-1} &{}0_{q-1}&{} {\varvec{0}}_{(q-1)\times (p-1)} &{} 0_{q-1} \\ \alpha ^{(q-1)} &{}\alpha _q &{}\beta ^{(p-1)} &{} \beta _p \\ {\varvec{0}}_{(p-1)\times (q-1)} &{}0_{p-1} &{} {\varvec{I}}_{p-1} &{} 0_{p-1}\end{pmatrix},\nonumber \\{} & {} \quad {\textbf{B}}_t = \left( \alpha _0 Z^2_t , 0_{q-1}, \alpha _0, 0_{p-1}\right) ^\textrm{T} \end{aligned}$$
(3)

where the superscript \(\textbf{T}\) denotes vector’s transpose, \(\alpha ^{(s)} = (\alpha _1, \ldots , \alpha _{s}) \in {\mathbb {R}}^{s}\), \(\beta ^{(s)}= (\beta _1, \ldots , \beta _{s}) \in {\mathbb {R}}^{s}\), \({\varvec{I}}_s\) is the identity matrix of size s, \({\varvec{0}}_{(r\times s)}\) is a matrix of zeros with r rows and s columns and \(0_s\) is a length s column vector of zeros. If \(s \le 0\) these terms are to be interpreted as being dimensionless, hence ARCH(q) processes arise using the top left \(q\times q\) elements of the matrix. Squared GARCH(\(p,q\)) processes satisfy the SRE

$$\begin{aligned} {\textbf{Y}}_t = {\textbf{A}}_t {\textbf{Y}}_{t-1} + {\textbf{B}}_t, t \in {\mathbb {Z}}, \end{aligned}$$
(4)

where \(\{{\textbf{A}}_t\}\) and \(\{{\textbf{B}}_t\}\) are each i.i.d. sequences. As \(Z_t\) is a continuous variable, \({\textbf{A}}_t\) has no rows of zeros, this avoids problems discussed by Buraczewski et al. (2016).

The expression of the GARCH(\(p,q\)) in the SRE formulation (3) is due to Francq and Zakoïan (2010, Section 2.2.2, p. 29). This SRE formulation is less parsimonious than that of Bougerol and Picard (1992), but has the benefit of covering all GARCH(\(p,q\)) processes, unlike that of Bougerol and Picard (1992) which excludes the \(p=q=1\) case that expression (3) gives, simplifying to

$$\begin{aligned} {\textbf{Y}}_{t} = \begin{pmatrix} X_t^2 \\ \sigma _t^2 \end{pmatrix},\quad \quad {\textbf{A}}_t = \begin{pmatrix} \alpha _1 Z_t^2&{} \beta _1 Z_t^2 \\ \alpha _1 &{}\beta _1 \end{pmatrix}, \quad \quad {\textbf{B}}_t = \begin{pmatrix} \alpha _0 Z^2_t \\ \alpha _0 \end{pmatrix}, \end{aligned}$$
(5)

where \(\alpha ^{(s)}\) and \(\beta ^{(s)}\) are scalar and none of the \({\varvec{I}}_s, {\varvec{0}}_{r\times s}\) or \(0_s\) terms are included.

2.2 Strict stationarity

Kesten (1973) presents necessary and sufficient conditions for SREs of the form (4), with general matrix \({\textbf{A}}_t\) and vector \({\textbf{B}}_t\), to have a unique, strictly stationary solution. The conditions that are non-trivially true for GARCH(pq) processes are that \(E( \ln ^{+} \Vert {\textbf{A}}_t\Vert ) < \infty \) and \(E (\ln ^{+} \Vert {\textbf{B}}_t\Vert ) < \infty \) (here \(\ln ^{+} x = \ln x,\) if \(x\ge 1\) and 0 otherwise), and that the top Lyapunov exponent \(\gamma \), given by

$$\begin{aligned} \gamma = \lim _{t \rightarrow \infty } \frac{1}{t} E \left( \ln \Vert {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1\Vert \right) , \end{aligned}$$
(6)

has the property that \(\gamma <0\). This result holds for all norms. Here and subsequently we use the \(L_1\) norm, so for a matrix \({\textbf{A}}\) with \(a_{i,j}\) being the (ij)th element, \(\Vert {\textbf{A}}\Vert = \sum _{i,j} |a_{i,j}|\) and for a vector \({\textbf{B}}\) with \(b_{i}\) being the (i)th element, \(\Vert {\textbf{B}}\Vert = \sum _{i} |b_{i}|\).

For the squared GARCH(pq) the moment conditions on \({\textbf{A}}_t\) and \({\textbf{B}}_t\) always hold. To illustrate this for \({\textbf{A}}_t\), first consider the random variable \(K_t:=\phi Z^2_t+\phi +p+q\), where \(\phi >0\) is defined by (2). Then \(K_t>1\), so \(0<\ln (K_t)<K_t\) and \(E(K_t)<\infty \) as \(E(Z_t^2)=1\). As \(0\le \Vert {\textbf{A}}_t\Vert =\phi (Z^2_t+1)+p+q-2< K_t\), we have \(0\le E (\ln ^{+} \Vert {\textbf{A}}_t\Vert ) \le E (\ln ^{+} K_t)=E(\ln K_t)<E(K_t)<\infty \). So for GARCH(pq) processes \(X_t^2\) and \(X_t\) to be strictly stationary it is necessary and sufficient that \(\gamma <0\). Expression (6) is not an ideal starting point for evaluating \(\gamma \). Instead we also have that

$$\begin{aligned} \gamma _t = \frac{1}{t}\ln \Vert {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1\Vert \quad \text{ and }\quad \gamma _t \xrightarrow {\text{ a.s. }}\gamma , \quad \text{ as } t \rightarrow \infty ,\nonumber \\ \end{aligned}$$
(7)

where a.s. denotes almost sure convergence, (Francq and Zakoïan 2010, Theorem 2.3—p. 30). It would appear that a relatively simple simulation can be performed to obtain a reliable estimate of \(\gamma \) via expression (7). However, in Sects. 3.3 and 6.2 we show this is far from the case and a mix of careful asymptotic approximation analysis and numerical evaluation is required to evaluate \(\gamma \).

In some cases we do not need to evaluate \(\gamma \) to find if the process is strictly stationary, e.g., GARCH(pq) processes are always strictly stationary when \(\phi \le 1\); this includes all IGARCH(pq) processes. Also \(\sum _{j = 1}^p \beta _j < 1\) is necessary but not sufficient for strict stationarity. So, numerical evaluation of \(\gamma \) is required whenever \(\phi >1\) and \(\sum _{j = 1}^p \beta _j < 1\).

2.3 Multivariate regular variation results

Basrak et al. (2002) and Collamore and Mentemeier (2018) show that there exists a unique stationary solution to the SRE (4) and that this solution exhibits a multivariate regular variation property, i.e., for any t, any vector norm \(\Vert \cdot \Vert \) and all \(r>0\),

$$\begin{aligned} \begin{aligned}&\frac{\Pr (\Vert {\textbf{Y}}_t\Vert> rx, {\textbf{Y}}_t/\Vert {{\textbf{Y}}_t}\Vert \in \cdot )}{\Pr (\Vert {\textbf{Y}}_t\Vert > x)} {\mathop {\rightarrow }\limits ^{v}} r^{-\kappa } \Pr (\hat{\varvec{\Theta }}_t \in \cdot ), \end{aligned} \end{aligned}$$
(8)

as \(x \rightarrow \infty \), where \({\mathop {\rightarrow }\limits ^{v}}\) denotes vague convergence, \(\kappa > 0\), and \(\hat{\varvec{\Theta }}_t\) is a \((p+q)\)-dimensional random vector on the unit sphere (with respect to a norm \(\Vert \cdot \Vert \)) defined by \({\mathbb {S}}^{p+q}\subset {\mathbb {R}}^{p+q}\).

If condition (8) holds then \({\textbf{Y}}_t\) is said to exhibit multivariate regular variation with index \(\kappa \) and the distribution of \(\hat{\varvec{\Theta }}_t\) is termed the spectral measure of the vector \({\textbf{Y}}_t\) (Resnick 1987). A consequence of property (8) for GARCH(pq) processes is that all the marginal variables of \({\textbf{Y}}_t\) have regularly varying tails with index \(\kappa >0\), i.e., for \(r\ge 1\) and all t

$$\begin{aligned}{} & {} \Pr (X_t^2> rx \mid X_t^2> x) \rightarrow r^{-\kappa },\\{} & {} \qquad \text{ and } \Pr (\sigma _t^2> rx \mid \sigma _t^2> x) \rightarrow r^{-\kappa }, \text{ as } x \rightarrow \infty . \end{aligned}$$

It is insightful to consider a slightly rearranged version of limit (8), and with the \(L_1\) vector norm. We define radial, \(R_t\), and angular (two variants \({\varvec{\Theta }}_t\) and \({\varvec{\Theta }}^-_t\)) random variables by

$$\begin{aligned} R_t= & {} \Vert {\textbf{Y}}_t\Vert = X_t^2+ \ldots +X_{t-q+1}^2+ \sigma ^2_t + \ldots +\sigma ^2_{t-p+1}, \nonumber \\ {\varvec{\Theta }}_t= & {} {\textbf{Y}}_t/\Vert {{\textbf{Y}}_t}\Vert \nonumber \\= & {} (X_t^2, \ldots ,X_{t-q+1}^2, \sigma _t^2, \ldots ,\sigma _{t-p}^2,\sigma _{t-p+1}^2)/R_t \nonumber \\ {\varvec{\Theta }}^-_t= & {} (X_t^2, \ldots ,X_{t-q+1}^2, \sigma _t^2, \ldots ,\sigma _{t-p}^2)/R_t, \end{aligned}$$
(9)

with \({\mathbb {S}}^{p+q}\) now being the \((p+q)\) dimensional unit simplex.

For GARCH(pq) processes, with \(p\ge 1\), we have two angular variables as the \(p+q\) dimensional variable \({\varvec{\Theta }}_t\) has redundancy in its final dimension as \(\Vert {\varvec{\Theta }}_t\Vert =1\), and so for studying the distribution of angular variables it is simpler to work with the \(p+q-1\) dimensional variable \({\varvec{\Theta }}^-_t\), which is related to \({\varvec{\Theta }}_t\) by \({\varvec{\Theta }}_t=({\varvec{\Theta }}^-_t, 1-\Vert {\varvec{\Theta }}^-_t\Vert )\) where \({\varvec{\Theta }}^-_t\) is \({\varvec{\Theta }}_t\) without its last component. We use this \({\textbf{W}}^-\) notation to create a \((p+q-1)\) dimensional vector from any \((p+q)\) dimensional vector \({\textbf{W}}\) on the simplex \({\mathbb {S}}^{p+q}\) throughout. Furthermore, for \({\textbf{w}}\in {\mathbb {R}}^{p+q-1}\), we define

$$\begin{aligned} H_{{\varvec{\Theta }}^-_t }({\textbf{w}})=\Pr ({\varvec{\Theta }}^-_t\le {\textbf{w}}), \end{aligned}$$
(10)

with vector algebra, here and elsewhere, interpreted as being componentwise. We will denote the limit random variables, that arise in limit (8), for \((R_t, {\varvec{\Theta }}_t, {\varvec{\Theta }}^-_t)\) by \((\hat{R}_t,\hat{\varvec{\Theta }}_t,\hat{\varvec{\Theta }}^-_t)\), with the distribution function of \(\hat{\varvec{\Theta }}^-_t\), denoted by \(H_{\hat{\varvec{\Theta }}^-_t}\), defined similarly to distribution (10). Then, for \(r \ge 1\), as \(x \rightarrow \infty \), the limit (8) becomes,

$$\begin{aligned}{} & {} \Pr (R_t> rx, {\varvec{\Theta }}^-_t \le {\textbf{w}} \mid R_t> x) {\mathop {\rightarrow }\limits ^{v}} \Pr (\hat{R}_t>r)\Pr (\hat{\varvec{\Theta }}^-_t \le {\textbf{w}})\nonumber \\{} & {} \quad = r^{-\kappa } H_{\hat{\varvec{\Theta }}^-_t}({\textbf{w}}). \end{aligned}$$
(11)

From the first expression for the asymptotic form in limit (11) we see that the radial variable \(R_t\) and the angular variables \({\varvec{\Theta }}^-_t\) become asymptotically independent, as the radial variable \(R_t\) grows due to \(x \rightarrow \infty \), i.e., the variables \( \hat{R}_t\) and \(\hat{\varvec{\Theta }}^-_t\) are independent. The second term in this limit shows that \(\hat{R}_t\) is a Pareto random variable with tail index \(\kappa \), i.e.,

$$\begin{aligned} \Pr (\hat{R}_t>r)=r^{-\kappa } \text{ for } r\ge 1. \end{aligned}$$
(12)

Basrak and Segers (2009) and Janssen (2010) identify that there is additional structure imposed on \(H_{\hat{\varvec{\Theta }}^-_t}\) and \(\kappa \) respectively by the GARCH(\(p,q\)) process. We discuss these restrictions below.

For the GARCH(1, 1) process, Laurini and Tawn (2012) provided an expression for the spectral measure, for a different description of the angular variable to that used here. For the choice of the angular variable (9), for all t, their result translates to a univariate \(\hat{\varvec{\Theta }}^-_t\), with distribution

$$\begin{aligned} \begin{aligned}&H_{\hat{\varvec{\Theta }}^-_t}(w)= \frac{2}{E(|Z|^{2\kappa })} \int _{0}^{\left( \frac{w}{1-w}\right) ^{1/2}}(1+s^2)^{\kappa } F_Z(ds), \end{aligned} \end{aligned}$$
(13)

where \(F_Z\) is the distribution function of \(Z_t\). When \(\max (p,q)\ge 2\), through use of the multivariate regular variation structure, Basrak and Segers (2009, Propositions 3.3, 5.1) show that

$$\begin{aligned} E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert ^{\kappa }) = 1, \end{aligned}$$
(14)

where \(0 \le w \le 1\) and \({\textbf{A}}\) is i.i.d. to \({\textbf{A}}_t\), and uniquely \(\Pr (\hat{\varvec{\Theta }}_t \in \cdot )=E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert ^{\kappa }; {\textbf{A}}\hat{\varvec{\Theta }}_t/\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert \in \cdot )\), where the notation \(E(X;Y):=E(X {\textbf{1}}(Y))\) and \({\textbf{1}}(Y)\) is the indicator of the event Y. Thus

$$\begin{aligned} H_{\hat{\varvec{\Theta }}^-_t}({\textbf{w}})=E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert ^{\kappa }; ({\textbf{A}}\hat{\varvec{\Theta }}_t/\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert )^- \le {\textbf{w}}). \end{aligned}$$
(15)

Basrak and Segers (2009, p. 1075) propose an approach to simulate from \(H_{\hat{\varvec{\Theta }}^-_t}({\textbf{w}})\) for a SRE of the form (4), with the required distribution \(H_{\hat{\varvec{\Theta }}^-_t}\) being the invariant distribution of a Markov chain and MCMC methods used for its evaluation. However, this method cannot be used for any GARCH(pq) process for the following reasons. Firstly, they make an assumption that \({\textbf{A}}_t\) is bounded, which excludes the possibility of \(Z_t\) being, for example, Gaussian or \(t_{\nu }\) distributed. Much more critically though, Basrak and Segers (2011) note that their proof was flawed and their results only hold under very specific conditions on the matrix \({\textbf{A}}_t\) in the SRE framework (4), which excludes all GARCH(pq) processes. Our approach in Sect. 4 overcomes these restrictions.

2.4 Janssen’s approach to determine \(\kappa \)

Next focus on how \(\kappa \) has been determined. Following methods of Kesten (1973), Basrak et al. (2002) showed that there exists a \(\kappa >0\) which is the unique positive solution of the equation

$$\begin{aligned} \lim _{t \rightarrow \infty } \frac{1}{t} \ln E \left( \Vert {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1\Vert ^{\kappa }\right) = 0. \end{aligned}$$
(16)

Collamore and Mentemeier (2018) found the same expression for the marginal tail index for more general stochastic recurrence relations. For the GARCH(1, 1) process Mikosch and Stărică (2000) show that \(\kappa \) is simple to evaluate using expression (16). Specifically, taking \({\textbf{A}}_t\) as in expression (5) then \( {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1 = {\textbf{A}}_t \prod _{i=1}^{t-1}(\alpha _1 Z^2_i + \beta _1)\),

from which it simply follows that expression (16) holds and \(\kappa \) satisfies \(E\left[ \left( \alpha _1 Z^2_t + \beta _1\right) ^\kappa \right] = 1\). Setting \(\beta _1 = 0\) for the GARCH(\(1,1\))  process gives the same result for \(\kappa \) derived by de Haan et al. (1989) for the ARCH(1) process. For general GARCH(\(p,q\)) processes no such simplification of equation (16) exists. So it is natural to try to evaluate \(\kappa \) by solution of the equation (16), but such an approach is impossible due to numerical instabilities.

The only existing feasible method to evaluate \(\kappa \) has been proposed by Janssen (2010, Proposition 4.3.1), which exploits Kesten (1973, Proof of Theorem 3). With \({\textbf{A}}\) specified in (4) the conditions required for the results of Kesten (1973) apply and the equality

$$\begin{aligned}{} & {} \int _{{\mathbb {S}}^{p+q}} E \left[ \Vert {\textbf{A}} {\textbf{w}}\Vert ^{k} g\left( \frac{{\textbf{A}} {\textbf{w}}}{\Vert {\textbf{A}} {\textbf{w}}\Vert }\right) \right] H_k(d{\textbf{w}}) \nonumber \\{} & {} \quad = \rho _k \int _{{\mathbb {S}}^{p+q}} g\left( {\textbf{w}}\right) H_k(d{\textbf{w}}), \end{aligned}$$
(17)

holds for all continuous functions g, where \(H_k\) is a probability measure on \({\mathbb {S}}^{p+q}\) (see step 1 in the proof of Theorem 3 in Kesten 1973), and where \(\rho _k>0\). For all pairs \((\rho _k, H_k)\) that satisfy equality (17), for any given \(k\in (0, \infty )\) then \(\rho _k\) is determined solely by k. The special case of \(g\equiv 1\) in (17) gives the simplification

$$\begin{aligned} \int _{{\mathbb {S}}^{p+q}} E \left[ \Vert {\textbf{A}} {\textbf{w}}\Vert ^{k}\right] H_k(d{\textbf{w}}) = \rho _k \int _{{\mathbb {S}}^{p+q}} H_k(d{\textbf{w}}) = \rho _k. \end{aligned}$$
(18)

From condition (14) we know that the \(\kappa \)-th moment of \(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_t\Vert \) is equal to 1, where \(\hat{\varvec{\Theta }}_t \sim H_{\hat{\varvec{\Theta }}_t}\) on the space \({\mathbb {S}}^{p+q}\). It follows from property (18) that \(\rho _k = 1\) when \(k = \kappa \). Kesten (1973) and Janssen (2010) show that there is only one solution to equation (14), so \(\kappa \) is the unique solution of

$$\begin{aligned} \int _{{\mathbb {S}}^{p+q}} E \left[ \Vert {\textbf{A}} {\textbf{w}}\Vert ^{\kappa }\right] H_{\kappa }(d{\textbf{w}}) = 1, \end{aligned}$$
(19)

and that the unit measure \(H_{\kappa }\) must be \(H_{\hat{\varvec{\Theta }}_t}\). Thus if we can find, or simulate from, \(H_{\hat{\varvec{\Theta }}_t}\) we can find \(\kappa \). To evaluate \(\kappa \) all that is required is to define a class of unit measures \(H_k\), over \(k\in (0, \infty )\), which contains within it as an interior point \(H_{\hat{\varvec{\Theta }}_t}\), and then vary k until property (19) is found.

By adapting the algorithm of Basrak and Segers (2009), Janssen (2010) proposes a valid algorithm to simulate from a class of functions \(H_k\), which enables the evaluation of \(\kappa \) and \(H_{\hat{\varvec{\Theta }}_t}\). Critically this only applies when the innovations \(Z_t\) have bounded support. In Sect. 4 we describe an algorithm that applies even if \(Z_t\) is unbounded and which has substantial computational efficiency gains for calculating \(\kappa \), compared to the algorithm of Janssen (2010), even when \(Z_t\) is bounded.

3 Alternative formulations for assessing strict stationarity of GARCH processes

3.1 Preliminaries

Let \(\varvec{\phi }=(\alpha _1, \ldots ,\alpha _q, \beta _1, \ldots ,\beta _p)=(\phi _1, \ldots ,\phi _{p+q})\), with \(\phi _i \ge 0\) for all \(i = 1, \ldots , p+q\). From (2) it follows that \(\phi =\sum _{i=1}^{p+q}\phi _i\). Also let \(\phi ^{(M)} = \max \{\phi _i; i=1, \ldots ,p+q\}\) so \(0< \phi ^{(M)}\le \phi \) for all GARCH(pq) processes as they require that \(\alpha _q>0\), even for ARCH(q) processes. As we use the \(L_1\) norm of the matrix, \(\Vert {\textbf{A}}_t\Vert =\phi (Z_t^2+1)+p+q-2\), so it is linear in \(Z_t^2\).

Results (7) and (16), for \(\gamma \) and \(\kappa \) respectively, involve the behaviour of the \(\Vert {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1\Vert \) as \(t\rightarrow \infty \). As this term tends to zero we need to capture its dominant behaviour to stabilise it. The approach we will take uses products of simple scalar summaries of the individual matrices. One possibility is the trace of \({\textbf{A}}_t\), with \(\text{ trace }({\textbf{A}}_t) = \alpha _1 Z_t^2 + \beta _1\) for all tp and q, as the trace of a square matrix is the sum of all its eigenvalues. As we can have \(\alpha _1=\beta _1=0\) when \(\min (p,q) \ge 2\), we instead focus on the related quantities

$$\begin{aligned} \lambda _t^*= \alpha _q Z_t^2 + \beta _p \text{ and } \lambda ^*= \alpha _q Z^2 + \beta _p, \end{aligned}$$
(20)

where Z is i.i.d. from \(\{Z_t\}\). Here \(\alpha _q>0\) for all GARCH(pq) processes so \(\lambda ^*_t>\beta _p>0\), but for ARCH(q) processes \(\beta _p=0\) so \(\lambda ^*_t\) can be arbitrarily close to 0. Another possible summary of \({\textbf{A}}_t\) is \(\lambda _t\), the largest magnitude eigenvalue of all of its eigenvalues, and similarly we use the term \(\lambda \) for this when \(Z^2_t\) is replaced by \(Z^2\). For \({\textbf{A}}_t\), given by representation (3), it is guaranteed that \(\lambda _t\) is simple and exceeds all other eigenvalues in absolute value; see Seneta (1981, Theorem 1.1). In general \(\lambda _t\) does not have a simple closed form expression.

At first sight the two proposals for stabilising the matrix product seems quite different. However, their properties are highly related. For example in the GARCH(1,1) case \(\lambda _t=\lambda ^*_t = \alpha _1 Z_t^2 + \beta _1\), which is an important term as the property \(E[(\alpha _1 Z_t^2 + \beta _1)^\kappa ]=1\) determines \(\kappa >0\) for this process. So both proposals form natural extensions to the case with \(p\not =1\) and/or \(q\not =1\). In the only other case where we have found a closed form solution for \(\lambda _t\) similar relationships hold. Specifically, for a GARCH(pp) processes, i.e., \(q=p\), when \(\alpha _i=\beta _i=0\) for \(i=1,\ldots ,p-1\) we have that \(\lambda _t=(\alpha _p Z_t^2 + \beta _p)^{1/p}=(\lambda ^*_t)^{1/p}\). Lemma 1 shows that \(\lambda _t\) and \(\lambda ^*_t\) have similar moment properties and related growth rates when multiplied over different t.

Lemma 1

Suppose that \(\{{\textbf{A}}_t\}\) are given by (3) where \(Z_t\) are as defined by (1), \(\lambda ^*_t\) is defined by (20), and let \(\lambda _t\) be the largest magnitude eigenvalue of the matrix \({\textbf{A}}_t\). Then for all GARCH(pq) processes with \(p>0\)

$$\begin{aligned} E(|\ln \lambda _t^*|)<\infty \text{ and } E(|\ln \lambda _t |)<\infty , \end{aligned}$$
(21)

and

$$\begin{aligned} \mu _t=: \frac{1}{t}\ln \left( \prod _{i=1}^t \frac{\lambda ^*_i}{\lambda _i} \right) \xrightarrow {\text {a.s.}}\mu \text{ as } t\rightarrow \infty \end{aligned}$$

to a finite constant \(\mu \) where \(\mu =E(\ln \lambda ^*)-E(\ln \lambda )\), with \(-\infty< \mu <\infty \). When \(p=0\), i.e., an ARCH(q) process, all convergences only hold if additionally we have that \(E(\ln |Z_t|)<\infty \).

Note that \(E(\ln |Z|)<\infty \) holds for all Gaussian and Student-t distributions as the additional restriction that the condition implies is that the density of Z is bounded at zero. In all examples we considered we found that \(\mu \le 0\).

There are pros and cons of the two approaches. As we have seen, \(\lambda ^*_t\) is analytically simple and so it is easy to derive closed form results and interpret, but we find in Sect. 6 it has limitations in terms of the numerical stability of its products and in Sect. 3.2 show that different derivations for GARCH and ARCH cases are required. In contrast, although \(\lambda _t\) in general lacks a simple closed form, which makes analytical derivations more difficult as seen by the proof of Lemma 1, our numerical studies find that obtaining \(\lambda _t\) is simple, fast and that there are no issues of numerical instability with its products. These properties influence our strategy. We first work with \(\lambda ^*_t\) to develop bounds (for GARCH in Sect. 3.2 and ARCH in Appendix A) and limits, then derive the equivalent results for \(\lambda _t\).

3.2 Bounds on matrix product norms for GARCH(pq) with \(p>0\)

For a GARCH(pq) process (\(p>0\)), consider the \((p+q)\times (p+q)\) matrix \({\textbf{A}}_{t}\cdots {\textbf{A}}_{1}\) with (ij)th entry \(a_{i,j}^{(t)}\). Given the form of \({\textbf{A}}_{t}\) in expression (3), we have that \(a_{i,j}^{(t)}\ge 0\) for all ti and j. Then

$$\begin{aligned} \Vert {\textbf{A}}_{t+1}\cdots {\textbf{A}}_{1}\Vert= & {} \left( Z_{t+1}^2 + 1\right) \left[ \sum _{i=1}^q\alpha _i \sum _{j=1}^{p+q}a_{i,j}^{(t)}\right. \nonumber \\{} & {} \left. +\sum _{i=1}^p\beta _i \sum _{j=1}^{p+q}a_{q+i,j}^{(t)}\right] +\sum _{i=1, i\not = q}^{p+q-1}\sum _{j=1}^{p+q}a_{i,j}^{(t)}.\nonumber \\ \end{aligned}$$
(22)

We use expression (22) to develop bounds on \(\Vert {\textbf{A}}_{t}\cdots {\textbf{A}}_{1}\Vert \) that hold for all GARCH(pq) processes. As \(\phi _i\le \phi ^{(M)}\) for all \(i=1,\ldots ,p+q\) the term in the square brackets is bounded above by \(\phi ^{(M)}\Vert {\textbf{A}}_t\cdots {\textbf{A}}_1\Vert \) and as \(a_{i,j}^{(t)}\ge 0\) the last term is bounded above by \(\Vert {\textbf{A}}_t\cdots {\textbf{A}}_1\Vert \). By applying this combined bound iteratively, starting from t, we have

$$\begin{aligned} \Vert {\textbf{A}}_{t}\cdots {\textbf{A}}_{1}\Vert \le \left\{ \prod _{i=2}^t\left[ \phi ^{(M)} \left( Z_{i}^2 + 1\right) + 1 \right] \right\} \Vert {\textbf{A}}_{1}\Vert . \end{aligned}$$
(23)

The lower bound follows from expression (22) by dropping all terms involving \(\alpha _i, i=1,\ldots ,q-1\) and \(\beta _j, j=1,\ldots ,p-1\), taking a common lower bound on the coefficients of \(a_{i,j}^{(t)}\) terms for all i and j, and setting \(c_{p,q}:=\min (\alpha _q,\beta _p)>0\), to give

$$\begin{aligned}{} & {} \Vert {\textbf{A}}_{t+1}\cdots {\textbf{A}}_{1}\Vert \ge \min [(Z^2_t+1)c_{p,q},1] \Vert {\textbf{A}}_{t}\cdots {\textbf{A}}_{1}\Vert \\{} & {} \quad \ge \left( \prod _{i=2}^t \min [(Z^2_i+1)\min (\alpha _q,\beta _p),1]\right) \Vert {\textbf{A}}_{1}\Vert . \end{aligned}$$

The value of these bounds is not obvious at first sight given that they are stochastic and involve t products (where we are interested in \(t\rightarrow \infty \)) over unbounded variables. As the norm of the product \(\Vert {\textbf{A}}_{t}\cdots {\textbf{A}}_{1}\Vert \) is tending to zero it seems sensible to first normalise the size of the individual terms in the product to get more helpful bounds. We do this by scaling each matrix \({\textbf{A}}_i\) by \(\lambda _i^*\). Let

$$\begin{aligned} \Delta ^*_t = \prod _{i=1}^{t} \left( {\textbf{A}}_{t+1-i}/\lambda ^*_{t+1-i}\right) \text{ and } \eta ^*_t= \frac{1}{t}\ln \Vert \Delta ^*_t\Vert , \end{aligned}$$
(24)

then, using upper bound (23), gives that

$$\begin{aligned} \eta ^*_t \le \frac{1}{t}\sum _{i=2}^t \ln \left\{ \left[ \frac{\phi ^{(M)} \left( Z_{i}^2 + 1\right) + 1}{\lambda _{i}^*} \right] \right\} + \frac{1}{t}\ln \left( \frac{\Vert {\textbf{A}}_{1}\Vert }{\lambda _1^*}\right) .\nonumber \\ \end{aligned}$$
(25)

Using the property that \(\lambda ^*_i\ge \beta _p>0\) for any GARCH(pq) process, Lemma 2 in Appendix B for each term in expression (25), and that \(\log \) is an increasing function, we have

$$\begin{aligned} \eta ^*_t{} & {} \le \frac{t-1}{t}\ln \left[ \max \left( \frac{\phi ^{(M)}+1}{\beta _p}, \frac{\phi ^{(M)}}{\alpha _q}\right) \right] \\{} & {} \quad +\frac{1}{t}\ln \left[ \max \left( \frac{\phi }{\alpha _q},\frac{\phi +p+q-2}{\beta _p}\right) \right] . \end{aligned}$$

Similarly, the lower bound for the norm of the product together with Lemma 2 gives

$$\begin{aligned} \eta ^*_t{} & {} \ge \frac{1}{t}\sum _{i=2}^t\ln \left\{ \min [c_{p,q}(Z^2_i+1),1]/\lambda _i^*\right\} \\{} & {} \quad +\frac{1}{t}\ln \left[ \min \left( \frac{\phi }{\alpha _q},\frac{\phi +p+q-2}{\beta _p}\right) \right] . \end{aligned}$$

The final term in each inequality bound converges almost surely to 0 as \(t\rightarrow \infty \). The second term of \(\min [c_{p,q}(Z^2_i+1),1]\) occurs with probability \(\Pr (Z_i^2>(1-c_{p,q})/c_{p,q})\) and, as \(t\rightarrow \infty \), the sample mean of these terms converge to \(E[-\ln (\alpha _qZ^2+\beta _p)~|~Z^2>(1-c_{p,q})/c_{p,q}]\), which is a finite constant as \(E(\ln |Z|)<\infty \). Similarly the sample mean of the first terms in the minimum converges to \(\ln (c_{p,q})+E\{\ln [(\alpha _qZ^2+\beta _p)/(\alpha _qZ^2+\beta _p)]~|~Z^2<(1-c_{p,q})/c_{p,q}\}\), where the random variable concerned has finite bounds. These convergences follow by the strong law of large numbers and as \(E(|\ln \lambda ^*_t|)<\infty \). If \(\lim _{t\rightarrow \infty }\eta ^*_t\) exists, then this limit is in the finite interval \([c_L,c_U]\), where

$$\begin{aligned} c_L= & {} E\left( \ln \left\{ \min [c_{p,q}(Z^2+1),1]/(\alpha _q Z^2+\beta _p)\right\} \right) , \!\text { and } \nonumber \\ c_U= & {} \ln [(\phi ^{(M)}+1)/\min (\alpha _q,\beta _p)]. \end{aligned}$$
(26)

3.3 New formulations for \(\gamma \) in terms of \(\lambda ^*\) and \(\lambda \)

Theorem 1

Suppose that \(\{{\textbf{A}}_t\}\) are given by (3) with \(Z_t\) defined by (1), \(\lambda ^*_t\) and \(\lambda ^*\) are defined by (20), the limit (6) exists, and \(\eta ^*_t\) is defined by (24). It follows that as \(t\rightarrow \infty \), \(\eta ^*_t \xrightarrow {\text {a.s.}}\gamma -E(\ln \lambda ^*):=\eta ^*\), for \(p>0\), and for \(p=0\) this holds when \(E(\ln |Z|)<\infty \), with the constant limit \(\eta ^*\in [c_L,c_U]\), with these bounds defined by (26) and (35) for GARCH and ARCH processes respectively. Hence, for both processes the Lyapunov exponent satisfies \(\gamma =E(\ln \lambda ^*)+\eta ^*\), where \(\eta ^*\) is the almost sure limit of \(\eta ^*_t\).

At first sight the value of Theorem 1 seems limited, as it simply expresses a limit \(\gamma \) as function of two terms. Whilst one is a simple expectation, the other, \(\eta ^*\), is also a limit. The benefits though are for numerical evaluation and interpretation. Under definition (7), \(\gamma \) is the limit of \(\gamma _t\) as \(t\rightarrow \infty \) where both the numerator and denominator diverge to infinity, with the numerator a product of t unbounded stochastic terms and is numerically unstable to evaluate as a limit. In contrast there are finite bounds for \(\eta _t^*\) for all t.

An alternative version of Theorem 1 applies when the largest magnitude eigenvalue \(\lambda _t\) of the set of eigenvalues of \({\textbf{A}}_t\) replaces \(\lambda ^*_t\). Similarly to the definition of \(\eta ^*_t\), define

$$\begin{aligned} \Delta _t = \prod _{i=1}^{t} \left( \frac{{\textbf{A}}_{t+1-i}}{\lambda _{t+1-i}}\right) , \eta _t= \frac{1}{t}\ln \Vert \Delta _t\Vert \text{ and } \eta =\lim _{t\rightarrow \infty }\eta _t. \end{aligned}$$
(27)

Sect. 6.2 shows that \(\Delta _t\) and \(\eta _t\) are numerically stable to evaluate for large t, unlike \(\Delta ^*_t\) and \(\eta ^*_t\).

Theorem 2

Suppose that \(\{{\textbf{A}}_t\}\) are given by (3) with \(Z_t\) defined by (1) and the limit \(\gamma \) of expression (6) exists. Let \(\lambda ^*\) be defined by (20), \(\lambda \) be the largest magnitude eigenvalue of a matrix \({\textbf{A}}\), which is identically distributed to \({\textbf{A}}_t\), and let both \(\eta _t\) and \(\eta \) be defined by (27). We then have that \(\eta _t\xrightarrow {\text {a.s.}}\eta \) as \(t\rightarrow \infty \) where \(\eta \) is a finite constant, \(\eta {\mathop {=}\limits ^{\text {a.s.}}}\eta ^*+\mu \), where \(\mu =E(\ln \lambda ^*)-E(\ln \lambda )\), with \(-\infty< \mu <\infty \) and \(\gamma {\mathop {=}\limits ^{\text {a.s.}}}E(\ln \lambda )+\eta \).

To assess Theorems 1 and 2 in terms of what is already known we first compare with the GARCH(1, 1) process which is known to satisfy \({\textbf{A}}_{t}\cdots {\textbf{A}}_{1} = {\textbf{A}}_t \prod _{i=1}^{t-1} (\alpha _1Z^2_i+\beta _1) = {\textbf{A}}_t \prod _{i=1}^{t-1} \lambda ^*_i\), and \(\lambda ^*_i=\lambda _i\). So

$$\begin{aligned} \gamma _t= & {} \frac{1}{t}\ln \Vert \prod _{i=1}^{t}{\textbf{A}}_{t+1-i}\Vert \\= & {} \frac{1}{t} \ln \Vert {\textbf{A}}_t\Vert +\frac{1}{t}\sum _{i=1}^{t-1} \ln \lambda ^*_i \xrightarrow {\text {a.s.}}E(\ln \lambda ^*). \end{aligned}$$

Hence Mikosch and Stărică (2000) conclude that \(\gamma =E[\ln (\alpha _1Z_t^2+\beta _1)]=E(\ln \lambda ^*)\). Theorems 1 and 2 agree with this result, which implies that \(\eta ^*=\eta =0\) for all GARCH(1,1) processes.

4 Evaluating the spectral measure and the tail index

This section gives the details of our algorithm for sampling from the limit distribution \(H_{\hat{\varvec{\Theta }}_t}\) and then uses this algorithm repeatedly to find \(\kappa \). The algorithm requires no assumptions on the support for \(Z_t\). Throughout we take \(t=0\) as we start the tail process at that time in Sect. 5.1. We will first assume that \(\kappa \) is known and present Algorithm 1 for generating from \(H_{\hat{\varvec{\Theta }}^{-}_0}\) and then discuss the case when \(\kappa \) is unknown.

To simulate from the spectral measure \(H_{\hat{\varvec{\Theta }}^{-}_0}\), defined via (15), our approach is to introduce a series of distributions related via a recursion, and whose invariant distribution is \(H_{\hat{\varvec{\Theta }}^{-}_0}\). We will then use sequential importance sampling (Doucet et al. 2000, ) to generate approximate samples from these distributions at consecutive iterations. We simulate this process until convergence, and then take the samples after this time as draws from \(H_{\hat{\varvec{\Theta }}^{-}_0}\). The recursion can be viewed as that for distributions associated with Feynman-Kac models, and our algorithm is similar to the use of sequential Monte Carlo for sampling from Feynman-Kac distribution (e.g. Del Moral and Miclo 2000) and fixed point distributions (e.g. Del Moral and Miclo 2003).

Denote the series of random variables whose distribution we are simulating from by \(\widetilde{{\varvec{\Theta }}}_s\), for iteration \(s\ge 0\). The recursion relating the distributions of these random variables for \(s\ge 1\) is

$$\begin{aligned} \Pr (\widetilde{{\varvec{\Theta }}}_s \in \cdot )= \frac{ E(\Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}\Vert ^{\kappa }; {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}/\Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}\Vert \in \cdot )}{ E(\Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}\Vert ^{\kappa })}, \end{aligned}$$
(28)

where as above \(E(X;Y):=E(X {\textbf{1}}_Y)\). By construction, the invariant distribution of this process is \(H_{\hat{\varvec{\Theta }}^{-}_0}\), since if \(\widetilde{{\varvec{\Theta }}}_{s-1}\) is drawn from \(H_{\hat{\varvec{\Theta }}^{-}_0}\), then the right-hand side of (28) is equal to

$$\begin{aligned} \frac{ E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_0\Vert ^{\kappa }; {\textbf{A}}\hat{\varvec{\Theta }}_0/ \Vert {\textbf{A}}\hat{\varvec{\Theta }}_0\Vert \in \cdot )}{ E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_0\Vert ^{\kappa })}. \end{aligned}$$

As \(E(\Vert {\textbf{A}}\hat{\varvec{\Theta }}_0\Vert ^{\kappa }) = 1\), this distribution is equal to the definition of \(H_{\hat{\varvec{\Theta }}^{-}_0}\) given by expression (15).

Algorithm 1 uses importance sampling to generate a sample from \(\Pr (\widetilde{{\varvec{\Theta }}}_{s} \in \cdot )\) based on a sample from \(\Pr (\widetilde{{\varvec{\Theta }}}_{s-1} \in \cdot )\). This involves first simulating a value for \(\widetilde{{\varvec{\Theta }}}_{s}\) via \(\widetilde{{\varvec{\Theta }}}_{s}= {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}/\Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}_{s-1}\Vert \), and assigning this value a weight proportional to \(\Vert {\textbf{A}} \widetilde{ {\varvec{\Theta }}}_{s-1}\Vert ^\kappa \). Thus we can use sequential importance sampling to generate samples of \(\widetilde{{\varvec{\Theta }}}_{s}\) for \(s\ge 1\) from an initial sample of \(\widetilde{{\varvec{\Theta }}}_{0}\). The assessment of the algorithm’s convergence is discussed in Sect. 6.3.

Our approach is similar to the MCMC algorithm Basrak and Segers (2009) propose for simulating from \(H_{\hat{\varvec{\Theta }}^{-}_0}\). Both algorithms simulate from the same Markov process. The difference is that they simulate a single realisation of a path of the process, and simulate the dynamics at each time point using rejection sampling. By comparison we use sequential importance sampling to simulate a sample of values from expression (28) at each time-point. The main advantage of our approach is that we do not need the distribution of \(Z_t\) to be bounded. Algorithm 1 is also computationally more efficient than that of Basrak and Segers (2009) in situations where a bound exists, but is large, as their probability of rejection is close to 1.

As it stands neither Algorithm 1, nor the algorithm Basrak and Segers (2009), have guaranteed unique convergence to the true limit. For Feynman-Kac models, which include \(H_{\hat{\varvec{\Theta }}^{-}_0}\), Del Moral and Guionnet (2001) provide conditions under which the convergence of Algorithm 1 is unique. Laurini et al. (2022) show that these conditions amount to a subtle, but relatively weak, condition on the mixing of the process, but we have not been able to prove that condition is satisfied by all GARCH processes. Instead, Sect. 6.3 and Laurini et al. (2022) presents extensive empirical evidence to show Algorithm 1 converges to the correct limit each time it has been used, with the convergence achieved after very few steps whatever the initial distribution of \(\widetilde{{\varvec{\Theta }}}_{0}\). The results obtained there are in very strong agreement with empirical extremal properties of the GARCH(pq) process when estimated using very long simulations of the true process.

Now consider the situation when \(\kappa \) is unknown. For a trial value of k (for \(\kappa \)), apply Algorithm 1 until convergence, giving a sample of weighted particles \(\{ \widetilde{{\varvec{\Theta }}}^{(j)}(k), m^{(j)}(k) \}_{j=1}^J\) after the chain is deemed to have converged. Using these particles we approximate the expectation \(E(\Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}_0\Vert ^{k})\) to give \({\tilde{\rho }}_k\), by the Monte Carlo approximation of \(\rho _k\), as

$$\begin{aligned} {\tilde{\rho }}_k=\int _{{\mathbb {R}}}\sum _{j=1}^J \Vert {\textbf{A}}\widetilde{{\varvec{\Theta }}}^{(j)}(k)\Vert ^k\frac{m^{(j)}(k)}{\sum _{n=1}^J m^{(n)}(k)} F_Z(dz), \end{aligned}$$
(29)

where \(F_Z\) is the distribution function of \(Z_t\). We repeat this evaluation over \(k>0\) until we find the unique value of k which gives \({\tilde{\rho }}_k=1\). This value is \(k=\kappa \). See Sect. 6.4 for how we solve this equation whilst accounting for the Monte Carlo noise in \({\tilde{\rho }}_k\).

figure a

To use Algorithm 1 we need to be able to sample from a suitable random variable \(\widetilde{{\varvec{\Theta }}}_0\) on \({\mathbb {S}}^{p+q}\). Ideally the distribution of this random variable should be as close as possible to the target limit distribution function \(H_{\hat{\varvec{\Theta }}^-_0}({\textbf{w}})\), so that the rate of convergence of \(\widetilde{{\varvec{\Theta }}}_s\rightarrow ^d \hat{\varvec{\Theta }}_0\) as \(s\rightarrow \infty \) is maximised. From limit (11) we have that \(\Pr ({\varvec{\Theta }}^-_0 \le {\textbf{w}} \mid R_0> x) {\rightarrow } H_{\hat{\varvec{\Theta }}^-_0}({\textbf{w}})\), as \(x \rightarrow \infty \). So for large enough x, i.e., \(x\ge u\) for some high threshold u, if we treat this limiting representation as an equality, this gives us an initial estimate \(H^{(0)}_{\widetilde{{\varvec{\Theta }}}^-}\) of \(H_{\hat{\varvec{\Theta }}^-_0}\). We select u as a high quantile of the sample of \(R_t\) values, such that limit property (11) appears to be well represented, i.e., radial values appear to have a Pareto tail and radial and angular values appear to be independently distributed.

To obtain \(H^{(0)}_{\widetilde{{\varvec{\Theta }}}^-}\) we generate a sample of length n from the GARCH(pq) process and take the empirical distribution of simulated values of \({\varvec{\Theta }}_t\) given that \(R_t>u\) after a burn in period of \(n_b\) i.e.,

$$\begin{aligned} H^{(0)}_{\widetilde{{\varvec{\Theta }}}^-}({\textbf{w}})=\sum _{j=n_b+1}^n {\textbf{1}}(R_j>u, {\varvec{\Theta }}^-_j\le {\textbf{w}})/\sum _{j=n_b+1}^n {\textbf{1}}(R_j>u). \end{aligned}$$

As initial particles for Algorithm 1 we use all the realisations of \({\varvec{\Theta }}_t\) given that \(R_t>u\), for \(t=1,\ldots ,n\). We take \(n=1.1\times 10^7\), with u being the \(99.99\%\) quantile of \(R_t\) giving \(J=10^3\) particles, each with equal weight \(J^{-1}\).

5 Tail processes and associated properties for GARCH(\(p,q\))

5.1 Generation of different tail processes for GARCH(\(p,q\))

The tail process \(\{\hat{X}^2_t\}_{t\ge 0}\) is evaluated using Algorithm 2. There are two stages to the algorithm, initialisation and propagation. However, in most cases we are interested into the extremal properties of both the upper and lower tails for \(X_t\), and denote their respective tail processes by \(\{\hat{X}^U_t\}_{t\ge 0}\) and \(\{\hat{X}^L_t\}_{t\ge 0}\), defined in expression (32). These additional tail processes are evaluated using Algorithm 3, which is driven by outputs from Algorithm 2 and exploits the GARCH(pq) formulation (1). In the special case where \(p=0\), i.e., the ARCH(q) process, an adaption to Algorithm 3 is required.

For the initialisation stage for Algorithm 2, we consider the behaviour of the squared GARCH process conditional on it being in an extreme state at time \(t=0\), so we require that \(\hat{X}^2_0>1\). Focusing on limit (11) when \(t=0\), we have that the radial and angular limit variables are independent with distributions (12) and (15) respectively. We denote these limit variables by \(\hat{R}_0\) and \(\hat{\varvec{\Theta }}_0=({\hat{\vartheta }}_0^{(1)}, \ldots ,{\hat{\vartheta }}_0^{(p+q)}) \sim H_{\hat{\varvec{\Theta }}_0}\), with \(\hat{R}_0>1\) and \(\hat{\varvec{\Theta }}_0\in {\mathbb {S}}^{p+q}\). Direct simulation of \((\hat{R}_0,\hat{\varvec{\Theta }}_0)\) produces \(\hat{X}^2_0= \hat{R}_0{\hat{\vartheta }}_0^{(1)}>0\), not automatically \(\hat{X}^2_0>1\). So we use the rejection simulation method where replicates of \((\hat{R}_0, \hat{\varvec{\Theta }}_0)\) are generated until a realisation with \(\hat{X}^2_0>1\) is achieved, which is guaranteed to happen as \(\Pr (\hat{X}^2_0>1)>0\). For a generated pair \((\hat{R}_0, \hat{\varvec{\Theta }}_0)\), with \(\hat{X}^2_0>1\), we have most interest in \((\hat{X}^2_0, {\hat{\sigma }}^2_0)= (\hat{R}_0{\hat{\vartheta }}_0^{(1)}, \hat{R}_0{\hat{\vartheta }}_0^{(q+1)})\), but all elements of \((\hat{R}_0, \hat{\varvec{\Theta }}_0)\) are needed for Algorithms 2 and 3.

The propagation stage uses results of Basrak et al. (2002), with the tail chain for \(\hat{\varvec{\Theta }}\), denoted \(\{\hat{\varvec{\Theta }}_t^{TC}\}_{t\ge 0}\), given by

$$\begin{aligned} \hat{\varvec{\Theta }}_t ^{TC}= {\textbf{A}}_t \hat{\varvec{\Theta }}_{t-1}^{TC} \text{ for } t\ge 1 \text{ with } \hat{\varvec{\Theta }}_0 ^{TC}=\hat{\varvec{\Theta }}_0 \end{aligned}$$
(31)

with \(\hat{\varvec{\Theta }}_0\) being the vector generated in the initialisation step. From property (31), for \(t\ge 0\) we have \(\hat{\varvec{\Theta }}_t ^{TC}= {\textbf{A}}_t {\textbf{A}}_{t-1} \cdots {\textbf{A}}_1 \hat{\varvec{\Theta }}_0\). We denote the components of the vector variable in the tail chain by \(\hat{\varvec{\Theta }}_t^{TC}=({\hat{\vartheta }}_t^{(1)}, \ldots ,{\hat{\vartheta }}_t^{(p+q)})\) for \(t\ge 1\). From this tail chain for \(t=1, 2, \ldots \) and from the initialisation variable \(\hat{R}_0\), we determine the tail process for \(X_t^2\) and associated conditional variances, by setting \(\hat{X}^2_t=\hat{R}_0{\hat{\vartheta }}_t^{(1)}\) and \({\hat{\sigma }}^2_t= \hat{R}_0{\hat{\vartheta }}_t^{(q+1)}\) for all \(t\ge 1\).

Ideally tail chain (31) is run until \(t=T\), where T is such that \(\hat{X}^2_t<1\) for all \(t>T\) with probability 1. A good approximation of T is achievable as all components of \(\hat{\varvec{\Theta }}_t^{TC}\) have negative drift and converge almost surely to 0 as \(t\rightarrow \infty \). So T is taken as large as possible subject to limits of storage and computational time. Following sensitivity checks we took \(T=1000\), but for processes with weak extremal dependence \(T=50\) is more than sufficient.

figure b

For the class of GARCH(pq) processes (1) we are interested in both the upper and lower tails of \(X_t\), and denote their respective tail processes \(\{\hat{X}^U_t\}_{t\ge 0}\) and \(\{\hat{X}^L_t\}_{t\ge 0}\), with the latter the negated lower tail process of \(X_t\). Here the constituent variables of these processes are defined by

$$\begin{aligned} \begin{aligned} \hat{X}^U_t=&{} \lim _{u\rightarrow \infty } \{X_t/u | X_0>u\}, \text { and } \\ \hat{X}^L_t =&{} \lim _{u\rightarrow \infty } \{-X_t/u | X_0<-u\} \text { for } t\ge 0, \end{aligned} \end{aligned}$$
(32)

so \(\hat{X}^U_0>1\) and \(\hat{X}^L_0>1\), with no constraints on the subsequent values/signs of the subsequent terms in the tail processes. When the distribution of the innovations \(Z_t\) is symmetric about 0 these two processes are identical stochastically. When deriving the upper and lower tail processes for \(X_t\) it may appear best to evolve the strategy of de Haan et al. (1989) by starting directly from the \(\{\hat{X}^2_t\}_{t\ge 0}\) tail process and to square root this process and filter the series retaining positive values by using independent Bernoulli\((\delta )\) variables, with \(0\le \delta \le 1\) termed the tail balance, given by Breiman’s lemma to be

$$\begin{aligned} \delta= & {} \Pr (X_t> x \mid |X_t|>x)\\= & {} \lim _{x\rightarrow \infty } \Pr (\sigma _t(Z_t)_+>x)/\Pr (\sigma _t|Z_t|>x)\\= & {} E((Z_t)_+^{2\kappa })/E(|Z_t|^{2\kappa }), \end{aligned}$$

where \((Z_t)_+=\max (Z_t,0)\). Then \(\delta =1/2\) when the tails are symmetric (this is the case considered by de Haan et al. (1989)) and if \(\delta =0\) or 1 the lower and upper tail are dominant respectively. Unfortunately \(\{\hat{X}^2_0\}\) and the Bernoulli\((\delta )\) variable associated with this are not independent, which we are grateful for a referee for identifying. Although \(\delta \) is not required for our approach we do report its value as it gives a good indication of the general asymmetry in the tails of GARCH processes.

A more subtle approach is required. Critically, in addition to the \(\{\hat{X}^2_t\}_{t\ge 0}\) tail process, Algorithm 2 also provides the associated values of \(\{{\hat{\sigma }}^2_t\}_{t\ge 0}\) when \(p>0\). For GARCH(pq) processes we have the relationship \(Z_t^2=X_t^2/\sigma _t^2\) for all t for the innovations. Hence for realisations of \((\hat{X}^2_t, {\hat{\sigma }}^2_t; t=0,\ldots ,T)\) we let \(s_t:=\hat{X}^2_t/{\hat{\sigma }}^2_t\) denote the corresponding value of \(\hat{Z}^2_t\). These \(s_t\) values are observations from the i.i.d. variables \(Z_t^2\), for \(t=0,\ldots ,T\), with \(Z_t\) having probability density \(f_Z\). Given \(\{s_t\}_{0\le t\le T}\), the signs of the associated \(Z_0, \ldots ,Z_T\) variables are determined by independent Bernoulli\((\delta _t)\) variables with

$$\begin{aligned} \delta _t= & {} \Pr (Z_t>0 \mid Z_t^2=s_t)\nonumber \\= & {} f_Z(s_t^{1/2})/[f_Z(s_t^{1/2})+f_Z(-s_t^{1/2})]. \end{aligned}$$
(33)

Taking account of whether \(Z_0\) is positive or negative determines the tail processes \(\{\hat{X}^U_t\}_{t\ge 0}\) and \(\{\hat{X}^L_t\}_{t\ge 0}\), see Algorithm 3.

When \(p=0\), i.e., with ARCH(q) processes, then \({\textbf{Y}}_t\) in the relationship (3) does not involve \({\hat{\sigma }}^2_t\) in \(\hat{\varvec{\Theta }}_0\in {\mathbb {S}}^{q}\). As we need \({\hat{\sigma }}^2_t\) for \(t=0, \ldots ,T\) in Algorithm 3, the methods used for the GARCH(pq) processes do not immediately apply. However, with a slight of hand, we can derive the results for this case using the GARCH(1, q) process formulation (3) with \(\beta _1=0\), and then applying Algorithms 1-3. This works as \({\hat{\sigma }}^2_t\) is then part of the initial simulation and propagation of \(\{\hat{\varvec{\Theta }}^{TC}_t\}_{t\ge 0}\).

figure c

5.2 The evaluation of cluster functionals for squared and original GARCH processes

From repeated realisations of the tail processes \(\{\hat{X}_t^2\}_{t\ge 0}\), \(\{\hat{X}_t^U\}_{t\ge 0}\) and \(\{\hat{X}_t^L\}_{t\ge 0}\) we can derive the properties of key cluster functionals for the \(X_t^2\), \(X_t\) and \(-X_t\) processes respectively. Here we illustrate this for the extremogram, the extremal index and the cluster size distribution.

The extremogram for the squared GARCH(pq) process is

$$\begin{aligned} \chi _{X^2}(\tau )=\Pr (\hat{X}^2_{\tau }>1 | \hat{X}^2_{0}>1)= \Pr (\hat{X}^2_{\tau }>1). \end{aligned}$$

Similarly the extremogram for the upper and lower tails of the GARCH(pq) process are given by \(\chi _{X^U}(\tau )=\Pr (\hat{X}^U_{\tau }>1 | \hat{X}^U_{0}>1)= \Pr (\hat{X}^U_{\tau }>1)\) and \( \chi _{X^L}(\tau )=\Pr (\hat{X}^L_{\tau }>1 | \hat{X}^L_{0}>1)= \Pr (\hat{X}^L_{\tau }>1)\). Thus, in each case, we can estimate the extremogram as the proportion of simulated tail processes, over independent replicates, with an exceedance of 1 at \(t=\tau \).

Rootzén (1988, Theorem 4.1) and de Haan et al. (1989) derive both the extremal index and the cluster size distribution for a process by first deriving the probability that the forward tail process has i exceedances, so that there are at least i values being in a cluster. Expressed in terms of the tail process \(\{\hat{X}^2_t\}_{t\ge 0}\) for the squared GARCH this gives \(\theta _{X^2}^{(i)} = \Pr \bigl (\#\{t=1,2 \ldots :\hat{X}^2_t>1\} = i-1 \mid \hat{X}_0^2 > 1 \bigr )\). Then the extremal index \(0<\theta _{X^2}\le 1\) of the squared GARCH process, is given by \(\theta _{X^2}=\theta _{X^2}^{(1)}\) and the probability that a cluster is of size i is given by \(\pi _{X^2}(i) = [\theta _{X^2}^{(i)} -\theta _{X^2}^{(i+1)}]/\theta _{X_2}^{(1)} \text{ for } i=1,2, \ldots \), with the reciprocal of the mean of this cluster size distribution being \(\theta _{X^2}\). Identical results hold for extremal indices \(\theta _{X^U}\) and \(\theta _{X^L}\) for \(X_t\) and \(-X_t\) processes respectively.

6 Investigation of the performance of the algorithms

6.1 Introduction

Throughout this section a range of GARCH(pq) models will be illustrated. The details of these models are given here and will be referenced subsequently as GARCH models A-E, where

A: \(p=q=2\) with \((\alpha _1,\alpha _2, \beta _1,\beta _2)=(0.3,0.15,0.2,0.1)\) so \(\phi =0.75 \).

B: \(p=q=2\) with \((\alpha _1,\alpha _2,\beta _1,\beta _2)=(0.07,0.04, 0.8,0.08)\) so \(\phi =0.99 \).

C: \(p=q=1\) with \((\alpha _1,\beta _1)=(0.1, 0.9)\) so \(\phi =1 \).

D: \(p=q=2\) with \((\alpha _1,\alpha _2,\beta _1,\beta _2)=(0.07, 0.03, 0.8, 0.1)\) so \(\phi =1 \).

E: \(p=0\), \(q=2\) with \((\alpha _1,\alpha _2)=(1.2, 0.5 )\) so \(\phi =1.7\).

We selected these volatility models to give a variety of extremal behaviours, with all of these models known to be strictly stationary. Models A and B are also weak stationary. Models C and D, which are IGARCH models, and model E (Francq and Zakoïan 2010, p. 35) are not weak stationary as they have \(E(X_t^2)=\infty \). With the exception of model C all models have \(\max (p,q)\ge 2\), so extremal properties cannot be derived using existing methods of Laurini and Tawn (2012). Model C, with \(\max (p,q)=1\), is an IGARCH(1,1) process, is not covered by previous results but it helps to illustrate the new methods in a case where some analytical solutions are possible. Model E is a ARCH(2) process. The parameter \(\phi \) increases from model A to E, with \(\phi =1\) for models C and D.

To illustrate the effect of different choices for the distribution of the innovation \(Z_t\), throughout the remainder of the paper we focus on the special cases [1] a scaled Student-\(t_{\nu }\) distribution, [2] the skew Student-\(t_{\nu }\) distribution introduced in Azzalini and Capitanio (2003) and [3] the standard normal distribution. In each case the innovation distribution has zero mean and unit variance. First consider the univariate skew-t distribution, denoted by \(\text {St}(\mu , \omega , \xi ,\nu )\), where \((\mu , \omega , \xi ,\nu ) \in {\mathbb {R}}\times {\mathbb {R}}_+\times {\mathbb {R}}\times (2,\infty )\) are location, scale, skewness and degrees of freedom parameters respectively. For the existence of the variance of Z we require that \(\nu >2\). The distribution of Z has density

$$\begin{aligned}{} & {} f_Z(z; \mu , \omega , \xi ,\nu ) \\{} & {} \quad = \frac{2}{\omega }f_T(z_S; \nu ) F_T\left( z_S\xi \sqrt{\frac{\nu +1}{\nu + z_S^2}}; \nu +1\right) , ~~z\in {\mathbb {R}} \end{aligned}$$

where \(z_S=(z-\mu )/\omega \), and \(f_T(\cdot ; \nu )\) and \(F_T(\cdot ; \nu )\) denote, respectively, the density and distribution function of a Student-t random variable with location and scale parameters 0 and 1 respectively and with \(\nu \) degrees of freedom. The moment conditions on Z of model (1) require that

$$\begin{aligned}{} & {} \omega =\left[ \frac{\nu }{\nu -2}- b^2_{\nu , \xi } \right] ^{-1/2} \hbox { and }\mu =-\omega b_{\nu , \xi },\\{} & {} \hbox {where\,} b_{\nu , \xi } = \frac{\xi }{(1+\xi ^2)^{1/2}}\left( \frac{\nu }{\pi }\right) ^{1/2} \frac{\Gamma (\nu /2 - 1/2)}{\Gamma (\nu /2)}, \end{aligned}$$

with \(\Gamma \) being the gamma function and where \(\nu \) and \(\xi \) are constrained further to ensure that \(\omega \) is a positive real number. The parameter \(\xi \in {\mathbb {R}}\) controls the skewness: \(\xi >0\) and \(\xi <0\) correspond to right and left skew respectively, and \(\xi =0\) to a symmetric distribution. Important special cases arise when: \(\xi = 0\), Z is the (scaled) Student-\(t_{\nu }\) distribution; \(\nu \rightarrow \infty \), Z is a skew-Normal distribution; \(\xi = 0\) and \(\nu \rightarrow \infty \), Z is the standard normal distribution.

Table 1 presents values of \(\gamma , \eta , \kappa , \theta _{X^2}, \theta _{X^U}, \theta _{X^L}\) and \(\delta \) for each of models A-E and for the three innovation distributions, where here, and subsequently, use of volatility model A with innovation distribution [1] is denoted as model A1 and similarly for all possible 15 combinations. These values are derived using our numerical methods and evidence for their validity is provided in Sects. 6.2-6.6 for some of the models, and for all the rest in Laurini et al. (2022). Although the Monte Carlo evaluation of these results are subject to noise, it is possible to obtain any desired level of accuracy by running sufficient replicates and assessing the variability using central limit results. Throughout, we have ensured that all numerical values reported are accurate to the stated number of decimal places with a very high probability.

Table 1 Values of key stationarity and extremal properties for models A–E for three innovation distributions: 1 \(t_3\); 2 skew \(t_3\) with \(\xi =1\); and 3 Gaussian, with Model A3 denoting GARCH model formulation A with innovation distribution 3. Empty entries for \(\theta _{X^L}\) are identical to the associated \(\theta _{X^U}\)

6.2 Evaluation of \(\gamma \) and \(\eta \)

Using expression (7) for the evaluation of \(\gamma \), suggests using Monte Carlo methods taking t to be very large. Through Theorems 1 and 2, we have two new ways to evaluate \(\gamma \) using \((\lambda ^*, \eta ^*)\) and \((\lambda , \eta )\) respectively. Of our methods, we prefer Theorem 2 as we found it had superior computational stability and reliability. We illustrate the performance of these methods for model A3, with similar results found for the other models.

Fig. 1
figure 1

Monte Carlo properties for evaluation of \(\gamma \) for GARCH model A3 against iteration t: left shows \(\gamma _t\); right \(\eta _t\). Both panels have the same 10 replicates of \({\textbf{A}}_1, \ldots , {\textbf{A}}_{t}\) displayed by greyscale lines. In the left panel each trace terminates at a random time \(t_i\) for the ith replicate, with \(\gamma _t=-\infty \) for all \(t \ge t_i\)

Fig. 2
figure 2

Illustrations of Algorithm 1 convergence for model C3 at iterations \(s=\{0,1, 2, 100\}\). Left, spectral distribution function shown by thick grey solid line being \(H^{(0)}_{\widetilde{{\varvec{\Theta }}}^-}(w)\) and the true limit distribution \(H_{\hat{\varvec{\Theta }}^-}(w)\) is shown by thick black line. For \(s=0\) the 95% confidence intervals are given by light grey lines. Right, kernel density estimate for the particle mass. Left panel has \(s=0,1, 2\) and \(s=100\) (only \(s=0\) is visible as the other values are equal to the true limit); Right panel has \(s=\{1,2,100\}\): \(s=1\)—dashed grey line, \(s=2\)—dotted dark grey line and \(s = 100\) is the black thick solid line

Figure 1 (left) shows that \(\gamma _t\), evaluated using expression (7), has serious numerical instabilities for large t. The plotted traces of ten independent realisations of \(\gamma _t\) appear to have little variability, each with roughly the same negative value for \(500<t<2000\). Critically though, the plotting of each trace terminates at a random time point \(t_i>2000\) for each of the \(i=1, \ldots ,10\), with the numerically derived values of \(\gamma _t=-\infty \) for \(t\ge t_i\) not plotted in Fig. 1 (left). For large t all entries of the matrix \(\textbf{A}_{t}\cdots \textbf{A}_1\) are non-negative, but at \(t_i\) they are all calculated as being equal to 0 to computer machine precision, and hence \(\gamma _{t_i}=-\infty \). For \(t>t_i\) it follows that \(\gamma _t=-\infty \). The value of \(t_i\) varies over replications as it depends on the set of realisations of \(\{Z_t; t\le t_i\}\) in replicate i. These numerical findings suggest we cannot draw conclusions about the convergence of \(\gamma _t\), as \(t\rightarrow \infty \), from direct Monte Carlo evaluation of \(\gamma \) using expression (7). Similar numerical degenerate features were found using \(\eta ^*_t\), although the associated times of degeneracy, \(t_i\), were found to be larger than in Fig. 1.

This numerical instability for evaluating \(\gamma \) does not appear to have been reported. For example, estimates of \(\gamma \) using this approach are presented for ARCH(2) processes in Francq and Zakoïan (2010, p. 34/35), however they stop evaluating \(\gamma _t\), when \(t=1000\), which is before we see the critical failure of numerical evaluation in Fig. 1. By increasing t, for the cases they study, we find identical numerical problems to those experienced for model A3 in Fig. 1.

From Theorem 2, \(\gamma =E(\ln \lambda )+\lim _{t\rightarrow \infty } \eta _t\), with \(\eta _t\) defined by (27). Critically, here each of the \(\textbf{A}_{t}\) terms in the product are first divided by the associated \(\lambda _t\) before the matrix multiplication, thus avoiding any computer precision problems. Figure 1 (right) shows that \(\eta _t\) is converging for each of the same 10 replicates shown (left) and does not experience any numerical instability; a finding that holds whatever the value of t. The value we report for \(\eta =\lim _{t\rightarrow \infty } \eta _t\) is evaluated as the mean over the ten different realisations using \(t=3000\), giving \(\eta =0.019\). We evaluated \(E(\ln \lambda )\) by both numerical integration and using the Monte Carlo approximation \(\sum _{i=1}^{t}\ln (\lambda _i)/t\) for large t and obtained identical results of \(-0.359\) once Monte Carlo noise is accounted for. Combining results we find that \(\gamma =-0.359+0.019=-0.34<0\), showing that model A3 satisfies the strict stationarity condition.

Fig. 3
figure 3

Convergence of Algorithm 1 for model D3 at iterations \(s=\{0,1, 2, 100\}\) with marginal distribution of the spectral measure convergence for \(H_{{\widetilde{\vartheta }}^{(i)}_s}(w_i)\), for \(i=1,2,3\) and the kernel density for the particle mass. Line types are as for Fig. 2. The rightmost panel displays \(s = \{1, 2, 100\}\). Although we do not have an expression for the true values of marginal distributions, our algorithm gives identical values for these distributions for all iterations with \(s > 100\). So it is reasonable to interpret the algorithm to have converged by iteration \(s=100\) and so for the distributions with \(s=100\) to be the truth

6.3 Convergence of Algorithm 1

We illustrate the convergence of Algorithm 1 for models C3 and D3. First consider model C3 where the true distribution of \(\hat{\varvec{\Theta }}_0\), here a scalar, is given by expression (13). Hence we can compare the s iteration estimate \(H^{(s)}_{\widetilde{{\varvec{\Theta }}}^-}(w)\) against the truth \(H_{\hat{\varvec{\Theta }}^-_0}(w)\). Figure 2 illustrates this distributional convergence as well as that of the distribution of the particle weights, \(m^{(j)}_s\) (\(j=1, \ldots , J\)) of expression (30) on iteration s, where we take \(J=10^6\). First note that the 95% pointwise confidence intervals of the initial estimate \(H^{(0)}_{\widetilde{{\varvec{\Theta }}}^-}\) given in Sect. 4 do not contain the true target distribution \(H_{\hat{\varvec{\Theta }}^-_0}(w)\). Despite having a good initial guess for Algorithm 1, there is a statistically significant difference here due to the slow convergence of the distribution of \(\Pr ({\varvec{\Theta }}^-_0<w|R_0>u)\) to \(H_{\hat{\varvec{\Theta }}^-_0}(w)\) as \(u\rightarrow \infty \).

After one step of Algorithm 1 we have that \(H^{(s)}_{\widetilde{{\varvec{\Theta }}}^-}(w)\) for \(s\ge 1\) is equal to \(H_{\hat{\varvec{\Theta }}^-_0}(w)\) to within visible detection, so convergence is achieved in one step. Initially, i.e., for \(s=0\), all particle weights are equal to \(J^{-1}\), but, as Fig. 2 (right) shows, within an iteration they have quite a different distribution of weights and that this distribution essentially has converged at \(s=2\) to its limit form. Thus Algorithm 1 works exceptionally well in this case where we know the answer. Similar tests over other GARCH(1,1) processes gave identical convergence performances. We found almost perfect convergence after one or two iterations whatever the initial distribution estimate. This feature is consistent with there being a unique solution for \(H_{\hat{\varvec{\Theta }}^-_0}(w)\) for this model and that our algorithm is robust and highly efficient in converging to this limit.

Next we assess the convergence of \(H^{(s)}_{\widetilde{{\varvec{\Theta }}}^-}({\textbf{w}})\), for model D3 where \(\hat{\varvec{\Theta }}^-_0\) has three dimensional distribution we do not know. We cannot easily show graphically the full joint distribution convergence and even for lower dimensional summaries we can only show the algorithm converges to some limit. Figure 3 illustrates convergence for each of the marginal distributions of \(H^{(s)}_{\widetilde{{\varvec{\Theta }}}^-}({\textbf{w}})\) and the distribution of the particle weights over iterations, for \(s=0, 1, 2\) and 100. We also assessed (not shown) the convergence of the dependence structure of \(H^{(s)}_{\widetilde{{\varvec{\Theta }}}^-}({\textbf{w}})\) through monitoring how \(\text{ corr }({\widetilde{\vartheta }}^{(i)}_s,{\widetilde{\vartheta }}^{(j)}_s)\) converges as \(s\rightarrow \infty \), for all \(1\le i<j\le p+q-1\), where \(\widetilde{{\varvec{\Theta }}}_s^-=({\widetilde{\vartheta }}^{(1)}_s, \ldots ,{\widetilde{\vartheta }}^{(p+q-1)}_s)\). In all the cases we explored, visual convergence was obtained before \(s=10\) and identical values found for \(s>100\).

The same limit values were achieved for a wide range of starting values which reassure us that there is a unique limit. Although in our examples the selection of a good initialisation to the algorithm (Sect. 4) was not found to be important on the convergence speed, we anticipate it could be when the complexity of the GARCH(pq) model increases. Our extensive empirical findings in Sect. 7 and Laurini et al. (2022) show that the limit we find using Algorithm 1 leads to values of \(\kappa \) and the extremogram and extremal index all fully in accord with long run simulations of the GARCH process, for each of models A-E, which gives us extra confidence in the convergence.

6.4 Evaluation of \(\kappa \)

Basrak and Segers (2009), and subsequent authors, imply that the way to evaluate \(\kappa \) is by numerical solution of the limiting equation (16), although they do not illustrate this. In Fig. 1 (left panel) we showed that there are major numerical instabilities in evaluating \(\Vert {\textbf{A}}_t \cdots {\textbf{A}}_1\Vert \) for large t; so in practice it is impossible to solve equation (16) directly. We have developed a numerically robust approach for evaluating \(\kappa \) for any p and q, using Algorithm 1. Here we illustrate this method and discuss the values of \(\kappa \) that are obtained for models A-E.

To calculate \(\kappa \) we apply Algorithm 1, iterating over different values of k. Key to the solution is the evaluation of the Monte Carlo estimate \({\tilde{\rho }}_k\) of \(\rho _k\) in expression (29). Figure 4 shows \({\tilde{\rho }}_k\) against k for each of the models A3-E3. There is clearly a unique solution for \(k>0\) to the equation \({\tilde{\rho }}_k=1\), with the values of \(k=\kappa \) that solve this equation given in Table 1. We restricted the Monte Carlo noise in the estimates \({\tilde{\rho }}_k\) of \(\rho _k\), for each value of k shown in Fig. 4, by taking \(J = 10^6\) and evaluated the Monte Carlo integral (29) with \(10^4\) replicates on Z to get \(\kappa \) to the required precision. To find \(\kappa \), from the curve of \({\tilde{\rho }}_k\), we used an initial grid search coupled with a bisection method.

Fig. 4
figure 4

Plots of \((k, {\tilde{\rho }}_k)\): left, for models A3 (—) and B3 (\(\cdots \)) which are weak stationary; right for models C3 (black dashed), D3 (grey solid) and E3 (black dotted), which are not weak stationary. In all panels grey dotted lines represent horizontal and vertical lines set at 1

Fig. 5
figure 5

Diagnostic QQ plot for the marginal tail of the squared GARCH models A3 and B3, left and right respectively, comparing empirical and limit distributions. Results are based on 1000 simulations of \(5\times 10^7\) GARCH processes with threshold x the 0.99998 marginal quantile. The solid line has a gradient \(\kappa \) and the conditional quantiles of empirical estimators are shown for \(2.5\%-97.5\%\) as the shaded region and for \(25\%, 50\%\) and \(75\%\) quantiles as grey lines

Fig. 6
figure 6

Extremogram \((\tau , \chi _{X^2}(\tau ))\) for squared GARCH processes of models A3-A1 (top-bottom left panels) and D3-D1 (top-bottom right panels). Black lines are our algorithm’s approximation of the true limit values and the three grey lines are empirical extremogram estimates \({\tilde{\chi }}_{X^2}(\tau , u)\), based on a sample of size \(n=5\times 10^7\), at u corresponding to 0.99 (continuous solid light grey), 0.999 (dashed grey) and 0.9999 (dotted dark grey) quantiles of \(X_t^2\)

Table 1 illustrates that \(\phi \) has an impact on the value of \(\kappa \). When \(\phi \not = 1\) no explicit relationship appears to hold between \(\phi \) and \(\kappa \), as \(\kappa \) changes markedly with the innovation distribution. When \(\phi <1\) the shorter the tail of the innovation distribution gives the larger \(\kappa \) and hence shorter tails of the GARCH(pq) marginal distribution; whereas the reverse holds when \(\phi >1\); and when \(\phi =1\) then \(\kappa \) is invariant to the innovation distribution, with \(\kappa = 1\). The case when \(\phi >1\) is somewhat surprising, as it might be expected that having a heavier tail innovation would result in a heavier tailed process, whereas in fact the opposite occurs. This feature is found to hold much more broadly, and can easily be illustrated in the GARCH(1,1) case. The explanation is that the larger \(\phi \) is the greater \(\sigma ^2_t\) can grow over a period of time, and ultimately the large \(\sigma ^2_t\) values produce the extremes and not isolated large \(Z^2_t\) values.

Next, we illustrate that the derived value of \(\kappa \) is consistent with the GARCH(pq) process’ observed marginal tail. The observable tail can be derived from long run simulations. We compare the limiting probabilities \(\Pr (\hat{X}_t^2> r \mid \hat{X}_t^2 > 1)=r^{-\kappa }\) with the empirical estimate of the probabilities \(\Pr (X_t^2> rx \mid X_t^2 > x)\) for very large x over a range of \(r>1\). Figure 5 shows this comparison for x being the 0.99998 marginal quantile on a log-scale. If \(\kappa \) is correct and x is large enough, the log-probabilities should be proportional with gradient \(\kappa \). The results show that the limit tail is consistent with the empirical distribution subject to Monte Carlo noise, and hence the \(\kappa \) value seems appropriate.

6.5 Extremogram

Figure 6 gives the extremogram \(\chi _{X^2}(\tau )\) for the squared of GARCH process for models A3, A1, D3 and D1, with equivalent plots for the other models given in Laurini et al. (2022). In all cases the heavier tailed innovation distribution leads to weaker extremal dependence at all lags. Models D3 and D1, both IGARCH processes, exhibit much slower decay rates in extremal dependence as lag \(\tau \) increases than for models A3 and A1 with \(\phi <1\). The level of extremal dependence appears to be strongly related to \(\phi \). Furthermore, we see for models D3 and D1 that \(\chi _{X^2}(2)>\chi _{X^2}(1)\), with \(\chi _{X^2}(\tau )\) decaying monotonically for \(\tau \ge 2\). We believe that the reason for this feature is that \(\beta _2>\max (\alpha _1, \alpha _2)\), which makes the variance \(\sigma ^2_t\) more likely to be larger at observations with lag 2 than lag 1, and hence this induces stronger clustering of extreme values two time points apart.

An empirical estimate \({\tilde{\chi }}_{X^2}(\tau , u)\), where u is a threshold, of the extremogram of \(\chi _{X^2}(\tau )\) based on a sample of length n from a GARCH(pq) process is given by

$$\begin{aligned} {\tilde{\chi }}_{X^2}(\tau , u)=\sum _{j=1}^{n-\tau } {\textbf{1}}(X^2_j>u, X^2_{j+\tau }>u)/\sum _{j=1}^{n-\tau } {\textbf{1}}(X^2_j>u). \end{aligned}$$

Figure 6 shows \({\tilde{\chi }}_{X^2}(\tau , u)\) for large n and for three threshold choices u corresponding quantiles of \(X_t^2\). This gives strong evidence that our evaluation of \(\chi _{X^2}(\tau )\) is accurate. Specifically, the agreement with limit values \(\chi _{X^2}(\tau )\) is very good generally, with the empirical estimates suffering from bias and variance trade-off, as with all threshold based estimates. Model A has the slowest convergence of the empirical estimators, but even here at the highest threshold there is almost perfect overlap between empirical estimates and the true values for all lags. In contrast, for model D the highest threshold produces the least good estimate, presumably due to its high variance. Additional comparisons are given in Laurini et al. (2022) for the other models.

6.6 Evaluation of the extremal index

Algorithms 1-3 give output values than can be used to evaluate the extremal index, \(\theta _{X^2}, \theta _{X^U}\) and \(\theta _{X^L}\), for processes \(X_t^2\), \(X_t\) and \(-X_t\) respectively by using the methods discussed in Sect. 5.2. Here we show that these values are consistent with empirical estimates obtained from long-run simulations from the GARCH(pq) process.

Fig. 7
figure 7

The extremal indices \(\theta _{X^2}, \theta _{X^U}\) and \(\theta _{X^L}\) for the model D2 with asymmetric Z (left to right), as calculated using our algorithms, shown by the horizontal dotted line. Runs estimator (\(m=100\) (\(+\)) and \(m=1000\) (\(\times \))) and the intervals estimator (\(\circ \)) are illustrated with their estimated 95% confidence intervals (shaded regions) based on 100 independent replicates of \(n=10^8\) sample data. The number of exceedances for each u are reported on the top axis

We can estimate the extremal index for a stationary process \(\{V_t\}\) using the runs estimator, \({\tilde{\theta }}_{V}(u,m)\), proposed by Smith and Weissman (1994), based on a sample of size n, a threshold u and a run length m, where

$$\begin{aligned} {\tilde{\theta }}_{V}(u,m)= & {} {\hat{\Pr }}\bigl (V_t<u; t=1,2,\ldots ,m \mid V_0> u \bigr )\\= & {} \frac{\sum _{j=1}^{n-m} {\textbf{1}}(\max (V_{j+1}, \ldots , V_{j+m})<u, V_j>u)}{\sum _{j=1}^{n-m} {\textbf{1}}(V_j>u)}. \end{aligned}$$

We take \(V_t=X^2_t, X_t\) and \(-X_t\) respectively, where \(X_t\) is a GARCH(pq) process. We compare the runs estimator (for a range of u and m) with our values of the extremal indices. As the selection of the run length m in the runs estimator is subjective (we take \(m=100\) and 1000 to assess sensitivity to this choice), we also consider the intervals estimator of Ferro and Segers (2003) where m is objectively chosen for each u based on the distribution of inter-arrival times of consecutive exceedances of u by \(\{V_t\}\).

Fig. 8
figure 8

Plot of \(\kappa \) (black) and \(\kappa _A\) (grey) against \(\phi \), for classes 1 (left) and 2 (right): for \(\omega = (0.25, 0.5, 0.75)\) corresponding to continuous, dashed and dotted line types respectively. The vertical and horizontal black dotted-and-dashed lines cross at \(\phi = \kappa = 1\). The values of \(\phi \) which our algorithms produce a value of \(\kappa = 0\) correspond to GARCH models that are not strictly stationary

For model D2, Fig. 7 shows \(\theta _{X^2}, \theta _{X^U}\) and \(\theta _{X^L}\) together with the runs estimator (\(m=100\) and 1000) and the intervals estimator, based on \(n=10^8\), for a range of values of u and then averaged over 100 independent replicates. Also shown are the estimated 95% confidence intervals (of these averaged estimators), with uncertainty increasing with larger u. Of the two estimators, the intervals estimator is more stable and typically closest to the limit value. In each case we see that the runs estimator with \(m=100\) is overestimating all three extremal indices for all thresholds, indicating that this choice of m is too small for the level of extremal dependence. Although the runs estimator with \(m=1000\) and the intervals estimator perform similarly, they both suffer underestimation for lower thresholds. It is reassuring that at the highest thresholds our values of the extremal indices typically fall inside the confidence intervals despite them being narrow, e.g., of width \(0.01-0.02\). Extensive comparisons, with similar conclusions, are given in Laurini et al. (2022) for all 15 of the models covered in Table 1.

Now that we are confident that we have reliable estimates of the extremal index, we examine the implications for the three extremal indices for models A-E, and each innovation distributions, with these values given in Table 1. Recall that the extremal index, \(\theta _{V}\), of the process \(\{V_t\}\) determines the average size of clusters of extremes of \(\{V_t\}\) via \(1/\theta _{V}\). We find that shorter tailed innovations give larger clusters on average. For increasing \(\phi \), for \(0<\phi \le 1\), we have increasing average cluster sizes, but that pattern does not follow when \(\phi >1\). In all cases, \(\min (\theta _{X^L},\theta _{X^U})\ge \theta _{X^2}\), indicating the extremes of the processes \(\{X_t\}\) and \(\{-X_t\}\) exhibit less clustering on average than the \(\{X^2_t\}\) process. When \(\xi =0\) then \(\theta _{X^L}=\theta _{X^U}\), whereas for \(\xi >0\) more clustering occurs in the upper than in the lower tail of the GARCH(pq) process, with the reverse when \(\xi <0\).

7 Study of extremal properties for subclasses of the GARCH(2,2) process

7.1 Subclass specification

Although Sect. 6 illustrates that our algorithms provide robust methods for the computational evaluation of extremal properties of a general GARCH(\(p,q\)) process, a systematic study of the extremal properties of a general GARCH(\(p,q\)) process is beyond the scope of this paper. However, we consider some investigation of these properties is justified for simple subclasses, to show how different features of the construction affect the various extremal characteristics.

We consider the following classes of GARCH(2, 2) with Gaussian innovations, Class 1: \(\alpha _1=\alpha _2=\omega \phi /2, \beta _1=\beta _2=(1-\omega )\phi /2\) and Class 2: \(\alpha _1=\beta _1=\omega \phi /2, \alpha _2=\beta _2=(1-\omega )\phi /2\), with \(\phi >0\) having its usual meaning (2). Here \(0<\omega <1\) controls the relative importance of the past process values relative to the past volatility values (Class 1) and values of process and volatility at \(t-1\) to \(t-2\) (Class 2), with increasing \(\omega \) giving more weight the former in each case. For \(0<\phi \le 1\) the process is stationary, whereas the necessary condition for strict stationarity when \(\phi >1\), given in Sect. 2.2, implies \(\phi <1/(1-\omega )\) (Class 1) and \(\phi <2\) (Class 2). So for these cases we need to check for strict stationarity as well.

7.2 Tail index

Without restricting \(\phi \) to give strict stationarity, i.e., not checking the value of \(\gamma \) to assess if strict stationarity applies, we explored what \(\kappa \) values were given by applying Algorithm 1 for Classes 1 and 2. Figure 8 plots the values of evaluated \(\kappa \) against \(\phi \), for \(0.5\le \phi \le 1.5\). For both classes and all studied \(\omega \) values, the derived values for \(\kappa \) decrease monotonically with increasing \(\phi \). There are three conclusions we can draw from these results which have broader implications.

Fig. 9
figure 9

Contour plots for the extremal index \(\theta _{X^U}\) for the GARCH(2, 2) process: left, as function of \((\alpha _1,\beta _1)\) with \(\alpha _2 = \beta _2 = 0.05\); right, as function of \((\alpha _2,\beta _2)\) with \(\alpha _1 = \beta _1 = 0.05\). In both panels the innovation \(Z_t\) is standard normal and the grey dashed line is the IGARCH(2, 2) model

Firstly, for each of the examples in Fig. 8 the evaluated value is \(\kappa =0\) for large \(\phi \). In this plot we report a value \(\kappa =0\) when the only solution of k to the equation (19) is \(k=0\) (we realise that \(\kappa \) cannot really be 0), with these cases arising when \(\rho _k\) is found to be an increasing function for \(k\ge 0\). The value of \(\phi \) for which \(\kappa =0\) is achieved appears to be different for each example. We denote this by \(\phi _S\), with \(\phi _S=\text{ min }\{\phi >0: \kappa =0\}\), and note that \(\phi _S>1\) in all cases.

As \(\kappa =0\) for \(\phi >\phi _S\) is obtained using Algorithm 1 this suggests that for these \(\phi \) values the Class 1 and 2 examples are not like the processes studied in Fig. 4, which were strictly stationary processes. A possible reason for this could be that \(\kappa \) is found to be 0 by Algorithm 1 when the process is not strictly stationary. We investigated this supposition by testing for strict stationarity of the example classes for values \((\phi _L, \phi _U)\) of \(\phi \), which are within 0.01 of \(\phi _S\), with \(1<\phi _L<\phi _S<\phi _U\). In all the examples we evaluated \(\gamma \) for \((\phi _L, \phi _U)\) and found that \(\phi _L\) gave \(\gamma <0\) and all \(\phi _U\) gave \(\gamma >0\), with the odd case requiring longer runs to reduce Monte Carlo noise issues. For this example it appears that \(0<\phi <\phi _S\) is required for strict stationarity. As there is nothing special about these classes of examples it appears that strict stationarity holds if and only if the only solution to equation (19) is \(\kappa > 0\). If this were found to hold for all GARCH(pq) processes then this would offer a novel and computationally efficient way to test for strict stationarity, with some benefits over the approach we developed in Theorem 2.

Secondly, Fig. 8 shows that \(\phi \) strongly influences the value of \(\kappa \), though \(\kappa \) also varies over \(\omega \). Critically we have that, whatever the value of \(\omega \), for Classes 1 and 2 that \(0<\phi <1,~\phi =1, ~\phi >1\) if and only if \(\kappa >1, ~\kappa =1, ~0\le \kappa <1\), respectively. These findings about \(\kappa \) and \(\phi \) are consistent with our results in Table 1 for models A-E, and with extensive empirical evidence we have obtained for a range of GARCH(pq) process with different p and q and different innovation tail decays, so we believe they are indicative of a general result. It is also noteworthy that \(\kappa \) is much more sensitive to \(\omega \) in Class 1 than Class 2, suggesting that the process term values are much more important than the volatility terms in determining \(\kappa \).

Thirdly, if our supposition that \(\phi =1\) if and only if \(\kappa =1\) for any IGARCH(pq) process with \(\max (p,q)\ge 2\) held generally, it would be a novel result. This result was proved for \(p=q=1\) by Mikosch and Stărică (2000) and \(\kappa =1\) implies \(\phi =1\) can be shown general using Breiman’s Lemma (as noted by a referee). However, the result should not be a surprise as the finite mean and infinite variance of the IGARCH(pq) process implies that \(0.5< \kappa \le 1\). So if our claim is true it gives that all IGARCH(pq) processes have \(E(|X_t| ^{2-\epsilon }) < \infty \) for any \(\epsilon \in (0, 2]\). This claim for \(\kappa \) is not too surprising though, as the variance of \(X_t\) is infinite when \(\phi =1\) but is finite when \(\phi <1\), suggesting that \(\kappa \), for the IGARCH(pq), is the boundary point for having a finite variance.

A weakness of Algorithm 1 is that it is rather black-box and requires considerable numerical exploration to gain insight into how key stochastic features like \((\lambda ^*, \eta ^*)\) or \((\lambda ,\eta )\) influence the value of \(\kappa \). To provide some insight into the role of these features we have developed a relatively good approximation \(\kappa _A\) for \(\kappa \) which can be expressed transparently and trivially as a function of these features. We used a range of heuristic arguments and empirical investigations, to propose \(\kappa _A>0\) as the unique positive solution the equation

$$\begin{aligned} E \left( [\lambda \exp (\eta )]^{\kappa _A}\right) = 1, \end{aligned}$$
(34)

where \(\lambda \) and \(\eta \) are defined in Sect. 3.3. To explain a little about where \(\kappa _A\) comes from consider the GARCH(1,1) processes, where in Sect. 3.3 we found that \(\lambda ^*=\lambda \) and \(\eta ^*=\eta =0\), yet from Mikosch and Stărică (2000) we know that \(\kappa >0\) must satisfy \(E[(\alpha _1Z_t^2+\beta _1)^\kappa ]=1\), which is simply equation (34) in this case, so here \(\kappa _A=\kappa \). Figure 8 shows that \(\kappa _A\) generally provides an excellent approximation to \(\kappa \) in terms how it varies with \(\phi \) and \(\omega \) for examples Classes 1 and 2, typically being within 7% of the value of \(\kappa \). Furthermore, for models A3-E3 we find \(\kappa _A\) to be, approximately +1%, +8%, 0%, +4%, and -25% different than \(\kappa \), respectively.

7.3 Extremal index

To see how the GARCH(2,2) parameters in affect the extremal index, Fig. 9 presents a contour plot of \(\theta _{X^U}\), with normal \(Z_t\), over parameters \((\alpha _1, \alpha _2,\beta _1,\beta _2)\). In each panel we hold fixed two parameters and contour over the others. Figure 9, shows that \(\theta _{X^U}\) decreases, i.e., average cluster sizes increase, with increasing \(\phi \), up to an IGARCH(2,2) model. Consequently, contours are near linear in the parameters. Unless \(\max (\beta _1, \beta _2)\) is large then small values of \(\alpha _1\) and \(\alpha _2\) tend to lead to very limited clustering. This seems logical as small \(\alpha _1\) and \(\alpha _2\) reduce the effect of the large \(X_t^2\) value on subsequent volatilities, so without \(\beta _1\) and \(\beta _2\) being large, to pick up the momentum of the evolution of the event, the large event is very likely to die out rapidly. We obtain stronger extremal dependence with larger values of the pair \((\alpha _2, \beta _2)\) than for \((\alpha _1,\beta _1)\) for equal values of \(\phi \), as seen by values of \(\theta _{X^U}\) being smaller in Fig. 9 right panel by comparison to the left panel.

8 Discussion

Naïvely, it might seem that for a GARCH(pq) process the extremal features can be obtained sufficiently accurately using empirical estimators based on simulated data from very long runs. However, Figs. 5, 6 and 7 illustrate that this is essentially impossible due to the needs for a very large threshold u, to be selected for convergence to be assured, and the need for large numbers of exceedances of u, for numerical stability; collectively these lead to n needing to be too large for computation. This is seen through the confidence intervals and threshold sensitivity in such estimates given by Laurini et al. (2022).

It appears likely that the new theory and methods we present extend to the broader class of stochastic recurrence equations discussed by Collamore and Mentemeier (2018). Specifically, for any process \({\textbf{Y}}_t \in {{\mathbb {R}}}^d\), with \({\textbf{Y}}_t = {\textbf{A}}_t {\textbf{Y}}_{t-1} + {\textbf{B}}_t\), where \({\textbf{A}}_t\) and \({\textbf{B}}_t\) are stochastic, i.i.d. sequences of matrices and vectors respectively, satisfying the conditions of Kesten (1973), then our paper gives methods which help assess stationarity of the process; determine marginal distribution tail index of \({\textbf{Y}}_t\); determine ways to simulate from the spectral measure; and are able to generate a wide range of extremal properties.

None of these properties can currently be obtained due to a lack of appropriate numerically stable algorithms. The theory and methods presented here overcome these limitations and provide a broad toolbox of numerically robust approaches to derive the extremal properties of a wide class of stochastic recurrence equations. Extensions of our methods to allow \(\Pr (Z_t=0)>0\), to cover liquidity issues, missing values and market closures, and to account for asymmetric volatility behaviour, both areas of active research (see Francq and Zakoïan, 2013 and Francq and Sucarrat, 2021), will be particularly useful.

We focused on the limiting extremal properties, but there is also interest in sub-asymptotic properties, e.g., sub-asymptotic versions of the extremogram and the extremal index given by

$$\begin{aligned}{} & {} \chi _X(\tau ;x)=\Pr (X_{t+\tau }>x~\mid ~X_t>x),\\{} & {} \quad \text{ and } \theta _{X}(x,m)= \Pr (X_t<x; t=1,2,\ldots ,m\mid X_0 > x) \end{aligned}$$

respectively, for large but finite x and a suitably large m. These sub-asymptotic quantities might converge quite slowly to their respective limits \(\chi _X(\tau )\) and \(\theta _{X}\). So it is worth evaluating how this convergence behaves and obtaining values of these extremal properties for values of x which are of relevance to applications. We believe that it should be possible to get such results using our methods in combination with the techniques of Collamore and Mentemeier (2018).