Limit theory and robust evaluation methods for the extremal properties of GARCH(p, q) processes

Generalized autoregressive conditionally heteroskedastic (GARCH) processes are widely used for modelling financial returns, with their extremal properties being of interest for market risk management. For GARCH(p,q\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p,q$$\end{document}) processes with max(p,q)=1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\max (p,q) = 1$$\end{document} all extremal features have been fully characterised, but when max(p,q)≥2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\max (p,q)\ge 2$$\end{document} much remains to be found. Previous research has identified that both marginal and dependence extremal features of strictly stationary GARCH(p,q\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p,q$$\end{document}) processes are determined by a multivariate regular variation property and tail processes. Currently there are no reliable methods for evaluating these characterisations, or even assessing the stationarity, for the classes of GARCH(p,q\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p,q$$\end{document}) processes that are used in practice, i.e., with unbounded and asymmetric innovations. By developing a mixture of new limit theory and particle filtering algorithms for fixed point distributions we produce novel and robust evaluation methods for all extremal features for all GARCH(p,q\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p,q$$\end{document}) processes, including ARCH and IGARCH processes. We investigate our methods’ performance when evaluating the marginal tail index, the extremogram and the extremal index, the latter two being measures of temporal dependence.

price of a generic asset. The behaviour of the extreme values of daily log-returns is critically important for market risk management. Marginally these processes have powerlaw heavy tail decay and determining the associated (power) tail index is therefore important to quantify this. Additionally, isolated extreme values of daily log-return can often be managed, but there is major risk when there is a clustering of these extreme values, and so the study of this dependence structure during extreme events is essential.
It is standard to assume that the series {X t } is a stationary series. The most widely adopted models for {X t } are the generalised autoregressive conditionally heteroskedastic (GARCH) models (Bollerslev 1986) and stochastic volatility models (Taylor 1986). These models are capable of capturing many of the empirical features of daily log-returns. Both processes have heavy tailed marginal distributions but they differ in terms of their extremal dependence structure. One of the most common ways to measure this is through the lag τ tail dependence proposed by Ledford and Tawn (2003), with Davis and Mikosch (2009b) terming {χ X (τ )} τ ≥0 the extremogram. For stochastic volatility models Breidt and Davis (1998) show that there is no clustering of extreme values, so that χ X (τ ) = 0 for all τ > 0. Thus extreme values from this stochastic volatility process occur in temporal isolation, although there are examples of other such processes for which this is not the case (Mikosch and Rezapour 2013). In contrast, for any GARCH( p, q) process χ X (τ ) > 0 for at least one value of τ > 0. The values of the extremogram, and other extremal dependence features, are only known for a very restricted subclass of GARCH( p, q) processes.
The aim of our paper is to derive all the extremal features for all GARCH( p, q) models used in a wide range of financial applications and to present computationally efficient algorithms for their evaluation. We consider GARCH( p, q) models, for p ∈ N and q ∈ N + , of the form for t ∈ Z, where the parameters are α 0 > 0, α i ≥ 0, i = 1, . . . , q − 1, α q > 0 and β j ≥ 0, j = 1, . . . , p − 1, β p > 0 for p ≥ 1, and {Z t } are independent and identically distributed (i.i.d.) random variables with E(Z t ) = 0 and Var(Z t ) = 1 for all t ∈ Z. The processes {σ t } and {Z t } are commonly referred to as the conditional volatility and innovations of {X t }. We additionally assume that {Z t } are continuous random variables, which avoids technical issues that occur when Pr(Z t = 0) > 0 (Buraczewski et al. 2016). For the process to be strictly stationary the parameters need to satisfy the constraints in Sect. 2.2. Properties of GARCH( p, q) processes are often determined by Two important special cases of GARCH( p, q) processes arise when p = 0 or φ = 1, corresponding to ARCH(q) and IGARCH( p, q) processes respectively. The IGARCH( p, q) process is strictly stationary but not weakly stationary, due to violating the condition E(X 2 t ) < ∞ for all t. We develop new numerically robust and efficient methods for assessing whether any GARCH( p, q) process is strictly stationary. Our results cover all of the GARCH( p, q) processes including these special cases.
Existing theoretical and computational methods for deriving the extremal properties are well established for special cases of the GARCH( p, q) process, namely: for symmetric Z t with p = 0, q = 1, corresponding to the ARCH(1) process (de Haan et al. 1989) and for p = q = 1, corre-sponding to a GARCH(1,1) (Laurini and Tawn 2012); and for asymmetric Z t with p = q = 1 (Ehlert et al. 2015). For a class of processes which contain both the ARCH(1) and GARCH(1,1) processes, but not any GARCH( p, q) process with max( p, q) ≥ 2, Collamore et al. (2014) provide an algorithm to derive Pr(X t > v) for any v. For general GARCH( p, q) models, with arbitrary ( p, q) many theoretical extremal properties have been derived by Basrak et al. (2002), Davis and Mikosch (2009a),  and Collamore and Mentemeier (2018), including marginal distributions and some results for the extremal clustering properties. At first sight it seems that these results give everything that is needed for their numerical evaluation. But, as we will show, this is far from the case. In fact, until now none of these features can be evaluated for general GARCH( p, q) processes or for the processes considered by Collamore and Mentemeier (2018).
Consider the marginal tail behaviour of |X t | and X 2 t for GARCH( p, q) processes X t . Basrak et al. (2002) give a limit expression, as u → ∞, that there exists κ > 0 such that for fixed x > 1 These papers only give an asymptotic limiting expression for κ, but do not evaluate it. We find that direct computation using their expression gives very poor numerical performance. Janssen (2010) presents an alternative approach to evaluate κ, however that approach requires Z t to have bounded support, ruling out distributions used by practitioners, e.g., Z t being Gaussian or t-distributed. Furthermore, the associated numerical methods are very slow and unreliable. Collamore and Mentemeier (2018) characterise the limit lim u→∞ u κ Pr(X 2 t > u) = C > 0, with κ formulated as in Basrak et al. (2002). So, without techniques to obtain κ their methods cannot evaluate C or any of the other properties they characterise. We propose the first computationally efficient numerical algorithm for the evaluation of κ, which is valid irrespective of whether Z t are unbounded or bounded. We also provide strong empirical evidence for how φ influences κ, including κ = 1 for all IGARCH( p, q) and how our evaluation of κ can be used for assessing stationarity. Now consider the extremal dependence and the clustering features of GARCH( p, q) processes.  propose algorithms for their evaluation with bounded innovations. However, Basrak and Segers (2011) identify these methods have major limitations and that they do not hold for any GARCH( p, q) process. So currently no extremal clustering features for GARCH( p, q) processes, when max( p, q) ≥ 2, can be evaluated. We propose an entirely new algorithm to evaluate a range of cluster fea-tures regardless of p and q and for Z t having any continuous distribution.
There are a range of features of GARCH processes that are of interest in managing financial stress periods. We focus primarily on {χ X (τ ); τ ≥ 0}, discussed above, and the extremal index for the process X t denoted 0 < θ X ≤ 1, where 1/θ X is the limiting mean number of extreme values per independent extreme event (Hsing et al. 1988). We also evaluate these properties for the lower tail and for the square of the GARCH process. Other features can also be derived using our methods, such as the duration of a stress period and the total losses in such a period.
All of these cluster functionals can be obtained from the tail process (Smith et al. 1997;Segers 2003;Planinić and Soulier 2018). For a heavy tailed process {X t } the tail process {X t } t≥0 is defined in the following way. If for any t ∈ N + the variables (X 0 /u, X 1 /u, . . . , X t /u) | X 0 > u, converge weakly to (X 0 ,X 1 , . . . ,X t ), withX 0 non-degenerate as u → ∞, then the limit process {X t } t≥0 is termed the forward tail process. For Breidt and Davis (1998) models of stochastic volatilityX t = 0 almost surely for all t ≥ 1, so large values are not followed by large values for these processes. In contrast, for any GARCH( p, q) process at least onê X t , for t ≥ 1, is non-degenerate, and furthermore all elements of the tail process are non-degenerate if all the coefficients of σ 2 t in expression (1) are positive. In this paper we derive the theory for obtaining the forward tail process, the extremogram, the extremal index and cluster size distribution for any GARCH( p, q) process with bounded or unbounded support for the innovations. We provide a new fast, yet accurate, Monte Carlo algorithm for the numerical evaluation of these extremal features. Our approach is quite different from that of Collamore and Mentemeier (2018), who start their Markov chains in non-extreme states and use a form of importance sampling to move into the tail region. Relative to our approach, using the tail process, this is guaranteed to be inefficient for evaluation of limiting properties, as we start in the limit. However, for evaluating non-limit tail features the Collamore and Mentemeier (2018) approach has advantages over ours, but to be useful they need our evaluation of κ.
The paper is structured in the following way. In Sect. 2 we give the required background details for the properties of stationary GARCH( p, q) processes and the theory of multivariate regular variation that is required for our methodology. In Sect. 3 we give new results for robust computational testing for stationarity and in Sect. 4 a novel particle filtering algorithm for fixed point distributions that enables us to obtain the tail index κ and to sample from a ( p + q)-dimensional extreme state of the process. In Sect. 5 we derive the tail processes for the squared and original GARCH( p, q) processes. Section 6 discusses the numerical evaluation of the method, including checking for stationarity, evaluating κ, and illus-trating the rapid convergence of the particle filter algorithm. Section 7 has a study of a range of extremal dependence features of the GARCH(p, q) process for a variety of parameter values. In particular, the empirical evidence from the systematic study of two sub-classes of GARCH( p, q) process suggests a number of other features that remain to be proved. In Sect. 8 we discuss the likely wider impact of our methods and in the Appendix give proofs. In Supplementary Material, Laurini et al. (2022), we provide some theory on the convergence properties of our methods and extensive illustrations of their reliability.

Matrix formulation
Focusing on the squared GARCH process, X 2 t , we write the process as a stochastic recurrence equation (SRE) to enable the exploitation of a range of established results (Kesten 1973).
where the superscript T denotes vector's transpose, α (s) = (α 1 , . . . , α s ) ∈ R s , β (s) = (β 1 , . . . , β s ) ∈ R s , I s is the identity matrix of size s, 0 (r ×s) is a matrix of zeros with r rows and s columns and 0 s is a length s column vector of zeros. If s ≤ 0 these terms are to be interpreted as being dimensionless, hence ARCH(q) processes arise using the top left q × q elements of the matrix. Squared GARCH( p, q) processes satisfy the SRE where {A t } and {B t } are each i.i.d. sequences. As Z t is a continuous variable, A t has no rows of zeros, this avoids problems discussed by Buraczewski et al. (2016). The expression of the GARCH( p, q) in the SRE formulation (3) is due to Francq and Zakoïan (2010, Section 2.2.2, p. 29). This SRE formulation is less parsimonious than that of Bougerol and Picard (1992), but has the benefit of covering all GARCH( p, q) processes, unlike that of Bougerol and Picard (1992) which excludes the p = q = 1 case that expression (3) gives, simplifying to where α (s) and β (s) are scalar and none of the I s , 0 r ×s or 0 s terms are included. Kesten (1973) presents necessary and sufficient conditions for SREs of the form (4), with general matrix A t and vector B t , to have a unique, strictly stationary solution. The conditions that are non-trivially true for GARCH( p, q) processes are that E(ln + A t ) < ∞ and E(ln + B t ) < ∞ (here ln + x = ln x, if x ≥ 1 and 0 otherwise), and that the top Lyapunov exponent γ , given by

Strict stationarity
has the property that γ < 0. This result holds for all norms.
Here and subsequently we use the L 1 norm, so for a matrix A with a i, j being the (i, j)th element, For the squared GARCH( p, q) the moment conditions on A t and B t always hold. To illustrate this for A t , first consider the random variable So for GARCH( p, q) processes X 2 t and X t to be strictly stationary it is necessary and sufficient that γ < 0. Expression (6) is not an ideal starting point for evaluating γ . Instead we also have that where a.s. denotes almost sure convergence, (Francq and Zakoïan 2010, Theorem 2.3-p. 30). It would appear that a relatively simple simulation can be performed to obtain a reliable estimate of γ via expression (7). However, in Sects. 3.3 and 6.2 we show this is far from the case and a mix of careful asymptotic approximation analysis and numerical evaluation is required to evaluate γ . In some cases we do not need to evaluate γ to find if the process is strictly stationary, e.g., GARCH( p, q) processes are always strictly stationary when φ ≤ 1; this includes all IGARCH( p, q) processes. Also p j=1 β j < 1 is necessary but not sufficient for strict stationarity. So, numerical evaluation of γ is required whenever φ > 1 and p j=1 β j < 1. Basrak et al. (2002) and Collamore and Mentemeier (2018) show that there exists a unique stationary solution to the SRE (4) and that this solution exhibits a multivariate regular variation property, i.e., for any t, any vector norm · and all r > 0,

Multivariate regular variation results
as x → ∞, where v → denotes vague convergence, κ > 0, andˆ t is a ( p + q)-dimensional random vector on the unit sphere (with respect to a norm · ) defined by S p+q ⊂ R p+q .
If condition (8) holds then Y t is said to exhibit multivariate regular variation with index κ and the distribution ofˆ t is termed the spectral measure of the vector Y t (Resnick 1987). A consequence of property (8) for GARCH( p, q) processes is that all the marginal variables of Y t have regularly varying tails with index κ > 0, i.e., for r ≥ 1 and all t It is insightful to consider a slightly rearranged version of limit (8), and with the L 1 vector norm. We define radial, R t , and angular (two variants t and − t ) random variables by with S p+q now being the ( p + q) dimensional unit simplex. For GARCH( p, q) processes, with p ≥ 1, we have two angular variables as the p + q dimensional variable t has redundancy in its final dimension as t = 1, and so for studying the distribution of angular variables it is simpler to work with the p where − t is t without its last component. We use this W − notation to create a ( p+q−1) dimensional vector from any ( p+q) dimensional vector W on the simplex S p+q throughout. Furthermore, for w ∈ R p+q−1 , we define with vector algebra, here and elsewhere, interpreted as being componentwise. We will denote the limit random variables, defined similarly to distribution (10). Then, for r ≥ 1, as x → ∞, the limit (8) becomes, From the first expression for the asymptotic form in limit (11) we see that the radial variable R t and the angular variables − t become asymptotically independent, as the radial variable R t grows due to x → ∞, i.e., the variablesR t andˆ − t are independent. The second term in this limit shows thatR t is a Pareto random variable with tail index κ, i.e., Basrak and Segers (2009) and Janssen (2010) identify that there is additional structure imposed on Hˆ − t and κ respectively by the GARCH( p, q) process. We discuss these restrictions below.
For the GARCH(1, 1) process, Laurini and Tawn (2012) provided an expression for the spectral measure, for a different description of the angular variable to that used here. For the choice of the angular variable (9), for all t, their result translates to a univariateˆ − t , with distribution where F Z is the distribution function of Z t . When max( p, q) ≥ 2, through use of the multivariate regular variation structure, Basrak and Segers (2009, Propositions 3.3, 5.1) show that where 0 ≤ w ≤ 1 and A is i.i.d. to A t , and uniquely Pr(ˆ t ∈ ·) = E( Aˆ t κ ; Aˆ t / Aˆ t ∈ ·), where the notation Segers (2009, p. 1075) propose an approach to simulate from Hˆ − t (w) for a SRE of the form (4), with the required distribution Hˆ − t being the invariant distribution of a Markov chain and MCMC methods used for its evaluation. However, this method cannot be used for any GARCH( p, q) process for the following reasons. Firstly, they make an assumption that A t is bounded, which excludes the possibility of Z t being, for example, Gaussian or t ν distributed. Much more critically though, Basrak and Segers (2011) note that their proof was flawed and their results only hold under very specific conditions on the matrix A t in the SRE framework (4), which excludes all GARCH( p, q) processes. Our approach in Sect. 4 overcomes these restrictions.

Janssen's approach to determine Ä
Next focus on how κ has been determined. Following methods of Kesten (1973), Basrak et al. (2002) showed that there exists a κ > 0 which is the unique positive solution of the equation Collamore and Mentemeier (2018) found the same expression for the marginal tail index for more general stochastic recurrence relations. For the GARCH(1, 1) process Mikosch and Stȃricȃ (2000) show that κ is simple to evaluate using expression (16). Specifically, taking A t as in expression (5) then , from which it simply follows that expression (16) holds and κ satisfies E α 1 Z 2 t + β 1 κ = 1. Setting β 1 = 0 for the GARCH(1, 1) process gives the same result for κ derived by de Haan et al. (1989) for the ARCH(1) process. For general GARCH( p, q) processes no such simplification of equation (16) exists. So it is natural to try to evaluate κ by solution of the equation (16), but such an approach is impossible due to numerical instabilities.
The only existing feasible method to evaluate κ has been proposed by Janssen (2010, Proposition 4.3.1), which exploits Kesten (1973, Proof of Theorem 3). With A specified in (4) the conditions required for the results of Kesten (1973) apply and the equality holds for all continuous functions g, where H k is a probability measure on S p+q (see step 1 in the proof of Theorem 3 in Kesten 1973), and where ρ k > 0. For all pairs (ρ k , H k ) that satisfy equality (17), for any given k ∈ (0, ∞) then ρ k is determined solely by k. The special case of g ≡ 1 in (17) gives the simplification From condition (14) we know that the κ-th moment of Aˆ t is equal to 1, whereˆ t ∼ Hˆ t on the space S p+q . It follows from property (18) that ρ k = 1 when k = κ. Kesten (1973) and Janssen (2010) show that there is only one solution to equation (14), so κ is the unique solution of and that the unit measure H κ must be Hˆ t . Thus if we can find, or simulate from, Hˆ t we can find κ. To evaluate κ all that is required is to define a class of unit measures H k , over k ∈ (0, ∞), which contains within it as an interior point Hˆ t , and then vary k until property (19) is found.
By adapting the algorithm of Basrak and Segers (2009), Janssen (2010) proposes a valid algorithm to simulate from a class of functions H k , which enables the evaluation of κ and Hˆ t . Critically this only applies when the innovations Z t have bounded support. In Sect. 4 we describe an algorithm that applies even if Z t is unbounded and which has substantial computational efficiency gains for calculating κ, compared to the algorithm of Janssen (2010), even when Z t is bounded. GARCH( p, q) processes as they require that α q > 0, even for ARCH(q) processes. As we use the L 1 norm of the matrix, (7) and (16), for γ and κ respectively, involve the behaviour of the A t A t−1 · · · A 1 as t → ∞. As this term tends to zero we need to capture its dominant behaviour to stabilise it. The approach we will take uses products of simple scalar summaries of the individual matrices. One possibility is the trace of A t , with trace(A t ) = α 1 Z 2 t + β 1 for all t, p and q, as the trace of a square matrix is the sum of all its eigenvalues. As we can have α 1 = β 1 = 0 when min( p, q) ≥ 2, we instead focus on the related quantities

Preliminaries
where Here α q > 0 for all GARCH( p, q) processes so λ * t > β p > 0, but for ARCH(q) processes β p = 0 so λ * t can be arbitrarily close to 0. Another possible summary of A t is λ t , the largest magnitude eigenvalue of all of its eigenvalues, and similarly we use the term λ for this when Z 2 t is replaced by Z 2 . For A t , given by representation (3), it is guaranteed that λ t is simple and exceeds all other eigenvalues in absolute value; see Seneta (1981, Theorem 1.1). In general λ t does not have a simple closed form expression.
At first sight the two proposals for stabilising the matrix product seems quite different. However, their properties are highly related. For example in the GARCH(1,1) case λ t = λ * t = α 1 Z 2 t + β 1 , which is an important term as the property E[(α 1 Z 2 t + β 1 ) κ ] = 1 determines κ > 0 for this process. So both proposals form natural extensions to the case with p = 1 and/or q = 1. In the only other case where we have found a closed form solution for λ t similar relationships hold. Specifically, for a GARCH( p, p) processes, i.e., q = p, Lemma 1 shows that λ t and λ * t have similar moment properties and related growth rates when multiplied over different t.
Lemma 1 Suppose that {A t } are given by (3) where Z t are as defined by (1), λ * t is defined by (20), and let λ t be the largest magnitude eigenvalue of the matrix A t . Then for all GARCH( p, q) and , an ARCH(q) process, all convergences only hold if additionally we have that E(ln |Z t |) < ∞.

Bounds on matrix product norms for GARCH(p, q) with p > 0
For a GARCH( p, q) process ( p > 0), consider the ( p +q)× i, j . Given the form of A t in expression (3), we have that a (t) i, j ≥ 0 for all t, i and j. Then We use expression (22) to develop bounds on A t · · · A 1 that hold for all GARCH( p, q) processes. As φ i ≤ φ (M) for all i = 1, . . . , p + q the term in the square brackets is bounded above by φ (M) A t · · · A 1 and as a (t) i, j ≥ 0 the last term is bounded above by A t · · · A 1 . By applying this combined bound iteratively, starting from t, we have The lower bound follows from expression (22) by dropping all terms involving α i , i = 1, . . . , q − 1 and β j , j = 1, . . . , p − 1, taking a common lower bound on the coefficients of a (t) i, j terms for all i and j, and setting c p,q := min(α q , β p ) > 0, to give The value of these bounds is not obvious at first sight given that they are stochastic and involve t products (where we are interested in t → ∞) over unbounded variables. As the norm of the product A t · · · A 1 is tending to zero it seems sensible to first normalise the size of the individual terms in the product to get more helpful bounds. We do this by scaling each matrix then, using upper bound (23), gives that Using the property that λ * i ≥ β p > 0 for any GARCH( p, q) process, Lemma 2 in Appendix B for each term in expression (25), and that log is an increasing function, we have Similarly, the lower bound for the norm of the product together with Lemma 2 gives The final term in each inequality bound converges almost surely to 0 as t → ∞. The second term of min[c p,q (Z 2 i + 1), 1] occurs with probability Pr(Z 2 i > (1 − c p,q )/c p,q ) and, as t → ∞, the sample mean of these terms con- Similarly the sample mean of the first terms in the minimum converges to ln(c where the random variable concerned has finite bounds. These convergences follow by the strong law of large numbers and as E(| ln λ * t |) < ∞. If lim t→∞ η * t exists, then this limit is in the finite interval [c L , c U ], where

New formulations for in terms of * and
Theorem 1 Suppose that {A t } are given by (3) with Z t defined by (1), λ * t and λ * are defined by (20), the limit (6) exists, and η * t is defined by (24). It follows that as t → ∞, η * t a.s. −→ γ − E(ln λ * ) := η * , for p > 0, and for p = 0 this holds when E(ln |Z |) < ∞, with the constant limit η * ∈ [c L , c U ], with these bounds defined by (26) and (35) for GARCH and ARCH processes respectively. Hence, for both processes the Lyapunov exponent satisfies γ = E(ln λ * ) + η * , where η * is the almost sure limit of η * t . At first sight the value of Theorem 1 seems limited, as it simply expresses a limit γ as function of two terms. Whilst one is a simple expectation, the other, η * , is also a limit. The benefits though are for numerical evaluation and interpretation. Under definition (7), γ is the limit of γ t as t → ∞ where both the numerator and denominator diverge to infinity, with the numerator a product of t unbounded stochastic terms and is numerically unstable to evaluate as a limit. In contrast there are finite bounds for η * t for all t. An alternative version of Theorem 1 applies when the largest magnitude eigenvalue λ t of the set of eigenvalues of Sect. 6.2 shows that t and η t are numerically stable to evaluate for large t, unlike * t and η * t .
Theorem 2 Suppose that {A t } are given by (3) with Z t defined by (1) and the limit γ of expression (6) exists. Let λ * be defined by (20), λ be the largest magnitude eigenvalue of a matrix A, which is identically distributed to A t , and let both η t and η be defined by (27). We then have that η t a.s.

Evaluating the spectral measure and the tail index
This section gives the details of our algorithm for sampling from the limit distribution Hˆ t and then uses this algorithm repeatedly to find κ. The algorithm requires no assumptions on the support for Z t . Throughout we take t = 0 as we start the tail process at that time in Sect. 5.1. We will first assume that κ is known and present Algorithm 1 for generating from Hˆ − 0 and then discuss the case when κ is unknown. To simulate from the spectral measure Hˆ − 0 , defined via (15), our approach is to introduce a series of distributions related via a recursion, and whose invariant distribution is Hˆ − 0 . We will then use sequential importance sampling (Doucet et al. 2000, ) to generate approximate samples from these distributions at consecutive iterations. We simulate this process until convergence, and then take the samples after this time as draws from Hˆ − 0 . The recursion can be viewed as that for distributions associated with Feynman-Kac models, and our algorithm is similar to the use of sequential Monte Carlo for sampling from Feynman-Kac distribution (e.g. Del Moral and Miclo 2000) and fixed point distributions (e.g. Del Moral and Miclo 2003).
Denote the series of random variables whose distribution we are simulating from by s , for iteration s ≥ 0. The recursion relating the distributions of these random variables for s ≥ 1 is where as above , then the right-hand side of (28) is equal to As E( Aˆ 0 κ ) = 1, this distribution is equal to the definition of Hˆ − 0 given by expression (15). Algorithm 1 uses importance sampling to generate a sample from Pr( s ∈ ·) based on a sample from Pr( s−1 ∈ ·). This involves first simulating a value for s via s = A s−1 / A s−1 , and assigning this value a weight proportional to A s−1 κ . Thus we can use sequential importance sampling to generate samples of s for s ≥ 1 from an initial sample of 0 . The assessment of the algorithm's convergence is discussed in Sect. 6.3. Our approach is similar to the MCMC algorithm  propose for simulating from Hˆ − 0 . Both algorithms simulate from the same Markov process. The difference is that they simulate a single realisation of a path of the process, and simulate the dynamics at each time point using rejection sampling. By comparison we use sequential importance sampling to simulate a sample of values from expression (28) at each time-point. The main advantage of our approach is that we do not need the distribution of Z t to be bounded. Algorithm 1 is also computationally more efficient than that of  in situations where a bound exists, but is large, as their probability of rejection is close to 1.
As it stands neither Algorithm 1, nor the algorithm Basrak and , have guaranteed unique convergence to the true limit. For Feynman-Kac models, which include Hˆ − 0 , Del Moral and Guionnet (2001) provide conditions under which the convergence of Algorithm 1 is unique. Laurini et al. (2022) show that these conditions amount to a subtle, but relatively weak, condition on the mixing of the process, but we have not been able to prove that condition is satisfied by all GARCH processes. Instead, Sect. 6.3 and Laurini et al. (2022) presents extensive empirical evidence to show Algorithm 1 converges to the correct limit each time it has been used, with the convergence achieved after very few steps whatever the initial distribution of 0 . The results obtained there are in very strong agreement with empirical extremal properties of the GARCH( p, q) process when estimated using very long simulations of the true process. Now consider the situation when κ is unknown. For a trial value of k (for κ), apply Algorithm 1 until convergence, giving a sample of weighted particles after the chain is deemed to have converged. Using these particles we approximate the expectation E( A 0 k ) to givẽ ρ k , by the Monte Carlo approximation of ρ k , as where F Z is the distribution function of Z t . We repeat this evaluation over k > 0 until we find the unique value of k which givesρ k = 1. This value is k = κ. See Sect. 6.4 for how we solve this equation whilst accounting for the Monte Carlo noise inρ k .
Algorithm 1: Sampling from Hˆ − 0 (w) 1 Generate a sample of 0 from a distribution of S p+q . Set s = 1.
2 Generate J independent copies of A, denote these as A The weighted particles, { s } J j=1 gives our approximation to the distribution of˜ s . 6 If converged to stationarity, output the set of weighted particles.
Otherwise set s = s + 1 and go to step 2.
To use Algorithm 1 we need to be able to sample from a suitable random variable 0 on S p+q . Ideally the distribution of this random variable should be as close as possible to the target limit distribution function Hˆ − 0 (w), so that the rate of convergence of s → dˆ 0 as s → ∞ is maximised. From limit (11) we have that Pr as x → ∞. So for large enough x, i.e., x ≥ u for some high threshold u, if we treat this limiting representation as an equality, this gives us an initial estimate H We select u as a high quantile of the sample of R t values, such that limit property (11) appears to be well represented, i.e., radial values appear to have a Pareto tail and radial and angular values appear to be independently distributed.
To obtain H − we generate a sample of length n from the GARCH( p, q) process and take the empirical distribution of simulated values of t given that R t > u after a burn in period of n b i.e., As initial particles for Algorithm 1 we use all the realisations of t given that R t > u, for t = 1, . . . , n. We take n = 1.1 × 10 7 , with u being the 99.99% quantile of R t giving J = 10 3 particles, each with equal weight J −1 .

Generation of different tail processes for GARCH(p, q)
The tail process {X 2 t } t≥0 is evaluated using Algorithm 2. There are two stages to the algorithm, initialisation and propagation. However, in most cases we are interested into the extremal properties of both the upper and lower tails for X t , and denote their respective tail processes by {X U t } t≥0 and {X L t } t≥0 , defined in expression (32). These additional tail processes are evaluated using Algorithm 3, which is driven by outputs from Algorithm 2 and exploits the GARCH( p, q) formulation (1). In the special case where p = 0, i.e., the ARCH(q) process, an adaption to Algorithm 3 is required.
The propagation stage uses results of Basrak et al. (2002), withˆ 0 being the vector generated in the initialisation step. From property (31), for t ≥ 0 we haveˆ T C t = A t A t−1 · · · A 1ˆ 0 . We denote the components of the vector variable in the tail chain byˆ for t ≥ 1. From this tail chain for t = 1, 2, . . . and from the initialisation variableR 0 , we determine the tail process for X 2 t and associated conditional variances, by settinĝ for all t ≥ 1. Ideally tail chain (31) is run until t = T , where T is such thatX 2 t < 1 for all t > T with probability 1. A good approximation of T is achievable as all components ofˆ T C t have negative drift and converge almost surely to 0 as t → ∞. So T is taken as large as possible subject to limits of storage and computational time. Following sensitivity checks we took T = 1000, but for processes with weak extremal dependence T = 50 is more than sufficient.
For the class of GARCH( p, q) processes (1) we are interested in both the upper and lower tails of X t , and denote their respective tail processes {X U t } t≥0 and {X L t } t≥0 , with the latter the negated lower tail process of X t . Here the constituent variables of these processes are defined bŷ soX U 0 > 1 andX L 0 > 1, with no constraints on the subsequent values/signs of the subsequent terms in the tail processes. When the distribution of the innovations Z t is symmetric about 0 these two processes are identical stochastically. When deriving the upper and lower tail processes for X t it may appear best to evolve the strategy of de Haan et al. (1989) by starting directly from the {X 2 t } t≥0 tail process and to square root this process and filter the series retaining positive values by using independent Bernoulli(δ) variables, with 0 ≤ δ ≤ 1 termed the tail balance, given by Breiman's lemma to be where (Z t ) + = max(Z t , 0). Then δ = 1/2 when the tails are symmetric (this is the case considered by de Haan et al. (1989)) and if δ = 0 or 1 the lower and upper tail are dominant respectively. Unfortunately {X 2 0 } and the Bernoulli(δ) variable associated with this are not independent, which we are grateful for a referee for identifying. Although δ is not required for our approach we do report its value as it gives a good indication of the general asymmetry in the tails of GARCH processes.
A more subtle approach is required. Critically, in addition to the {X 2 t } t≥0 tail process, Algorithm 2 also provides the associated values of {σ 2 t } t≥0 when p > 0. For GARCH( p, q) processes we have the relationship Z 2 t = X 2 t /σ 2 t for all t for the innovations. Hence for realisations of (X 2 t ,σ 2 t ; t = 0, . . . , T ) we let s t :=X 2 t /σ 2 t denote the corresponding value ofẐ 2 t . These s t values are observations from the i.i.d. variables Z 2 t , for t = 0, . . . , T , with Z t having probability density f Z . Given {s t } 0≤t≤T , the signs of the associated Z 0 , . . . , Z T variables are determined by independent Bernoulli(δ t ) variables with Taking account of whether Z 0 is positive or negative determines the tail processes {X U t } t≥0 and {X L t } t≥0 , see Algorithm 3.
When p = 0, i.e., with ARCH(q) processes, then Y t in the relationship (3) does not involveσ 2 t inˆ 0 ∈ S q . As we needσ 2 t for t = 0, . . . , T in Algorithm 3, the methods used for the GARCH( p, q) processes do not immediately apply. However, with a slight of hand, we can derive the results for this case using the GARCH(1, q) process formulation (3) with β 1 = 0, and then applying Algorithms 1-3. This works asσ 2 t is then part of the initial simulation and propagation of {ˆ

Algorithm 3: Obtaining the upper and lower tail processes of GARCH( p, q)
1 InputX 2 t andσ 2 t for t = 0, . . . , T from the output of Algorithm 2.

The evaluation of cluster functionals for squared and original GARCH processes
From repeated realisations of the tail processes {X 2 t } t≥0 , {X U t } t≥0 and {X L t } t≥0 we can derive the properties of key cluster functionals for the X 2 t , X t and −X t processes respectively. Here we illustrate this for the extremogram, the extremal index and the cluster size distribution.
The extremogram for the squared GARCH( p, q) process is Similarly the extremogram for the upper and lower tails of the GARCH( p, q) process are given by χ X U (τ ) = Pr(X U τ > 1|X U 0 > 1) = Pr(X U τ > 1) and χ X L (τ ) = Pr(X L τ > 1|X L 0 > 1) = Pr(X L τ > 1). Thus, in each case, we can estimate the extremogram as the proportion of simulated tail processes, over independent replicates, with an exceedance of 1 at t = τ . Rootzén (1988, Theorem 4.1) and de Haan et al. (1989) derive both the extremal index and the cluster size distribution for a process by first deriving the probability that the forward tail process has i exceedances, so that there are at least i values being in a cluster. Expressed in terms of the tail process {X 2 t } t≥0 for the squared GARCH this gives θ (i) Then the extremal index 0 < θ X 2 ≤ 1 of the squared GARCH process, is given by θ X 2 = θ (1) X 2 and the probability that a cluster is of size i is given by π . ., with the reciprocal of the mean of this cluster size distribution being θ X 2 . Identical results hold for extremal indices θ X U and θ X L for X t and −X t processes respectively.
To illustrate the effect of different choices for the distribution of the innovation Z t , throughout the remainder of the paper we focus on the special cases [1] a scaled Student-t ν distribution, [2] the skew Student-t ν distribution introduced in Azzalini and Capitanio (2003) and [3] the standard normal distribution. In each case the innovation distribution has zero mean and unit variance. First consider the univariate skew-t distribution, denoted by St(μ, ω, ξ, ν), where (μ, ω, ξ, ν) ∈ R × R + × R × (2, ∞) are location, scale, skewness and degrees of freedom parameters respectively. For the existence of the variance of Z we require that ν > 2. The distribution of Z has density f Z (z; μ, ω, ξ, ν) /ω, and f T (·; ν) and F T (·; ν) denote, respectively, the density and distribution function of a Student-t random variable with location and scale parameters 0 and 1 respectively and with ν degrees of freedom. The moment conditions on Z of model (1) require that with being the gamma function and where ν and ξ are constrained further to ensure that ω is a positive real number. The parameter ξ ∈ R controls the skewness: ξ > 0 and ξ < 0 correspond to right and left skew respectively, and ξ = 0 to a symmetric distribution. Important special cases arise when: ξ = 0, Z is the (scaled) Student-t ν distribution; ν → ∞, Z is a skew-Normal distribution; ξ = 0 and ν → ∞, Z is the standard normal distribution. Table 1 presents values of γ, η, κ, θ X 2 , θ X U , θ X L and δ for each of models A-E and for the three innovation distributions, where here, and subsequently, use of volatility model A with innovation distribution [1] is denoted as model A1 and similarly for all possible 15 combinations. These values are derived using our numerical methods and evidence for their validity is provided in Sects. 6.2-6.6 for some of the models, and for all the rest in Laurini et al. (2022). Although the Monte Carlo evaluation of these results are subject to noise, it is possible to obtain any desired level of accuracy by running sufficient replicates and assessing the variability using central limit results. Throughout, we have ensured that all numerical values reported are accurate to the stated number of decimal places with a very high probability.

Evaluation of and Á
Using expression (7) for the evaluation of γ , suggests using Monte Carlo methods taking t to be very large. Through Table 1 Values of key stationarity and extremal properties for models A-E for three innovation distributions: 1 t 3 ; 2 skew t 3 with ξ = 1; and 3 Gaussian, with Model A3 denoting GARCH model formulation A with innovation distribution 3. Empty entries for θ X L are identical to the associated θ X U Theorems 1 and 2, we have two new ways to evaluate γ using (λ * , η * ) and (λ, η) respectively. Of our methods, we prefer Theorem 2 as we found it had superior computational stability and reliability. We illustrate the performance of these methods for model A3, with similar results found for the other models. Figure 1 (left) shows that γ t , evaluated using expression (7), has serious numerical instabilities for large t. The plotted traces of ten independent realisations of γ t appear to have little variability, each with roughly the same negative value for 500 < t < 2000. Critically though, the plotting of each trace terminates at a random time point t i > 2000 for each of the i = 1, . . . , 10, with the numerically derived values of γ t = −∞ for t ≥ t i not plotted in Fig. 1 (left). For large t all entries of the matrix A t · · · A 1 are non-negative, but at t i they are all calculated as being equal to 0 to computer machine precision, and hence γ t i = −∞. For t > t i it follows that γ t = −∞. The value of t i varies over replications as it depends on the set of realisations of {Z t ; t ≤ t i } in replicate i. These numerical findings suggest we cannot draw conclusions about the convergence of γ t , as t → ∞, from direct Monte Carlo evaluation of γ using expression (7). Similar numerical degenerate features were found using η * t , although the associated times of degeneracy, t i , were found to be larger than in Fig. 1.
This numerical instability for evaluating γ does not appear to have been reported. For example, estimates of γ using this approach are presented for ARCH(2) processes in Francq and Zakoïan (2010, p. 34/35), however they stop evaluating γ t , when t = 1000, which is before we see the critical failure of numerical evaluation in Fig. 1. By increasing t, for the cases they study, we find identical numerical problems to those experienced for model A3 in Fig. 1.
From Theorem 2, γ = E(ln λ) + lim t→∞ η t , with η t defined by (27). Critically, here each of the A t terms in the product are first divided by the associated λ t before the matrix multiplication, thus avoiding any computer precision problems. Figure 1 (right) shows that η t is converging for each of the same 10 replicates shown (left) and does not experience any numerical instability; a finding that holds whatever the value of t. The value we report for η = lim t→∞ η t is evaluated as the mean over the ten different realisations using t = 3000, giving η = 0.019. We evaluated E(ln λ) by both numerical integration and using the Monte Carlo approximation t i=1 ln(λ i )/t for large t and obtained identical results of −0.359 once Monte Carlo noise is accounted for. Combining results we find that γ = −0.359 + 0.019 = −0.34 < 0, showing that model A3 satisfies the strict stationarity condition.

Convergence of Algorithm 1
We illustrate the convergence of Algorithm 1 for models C3 and D3. First consider model C3 where the true distribution of 0 , here a scalar, is given by expression (13). Hence we can compare the s iteration estimate H (s) − (w) against the truth Hˆ − 0 (w). Figure 2 illustrates this distributional convergence as well as that of the distribution of the particle weights, m ( j) s ( j = 1, . . . , J ) of expression (30) on iteration s, where we take J = 10 6 . First note that the 95% pointwise confidence intervals of the initial estimate H  Fig. 3 Convergence of Algorithm 1 for model D3 at iterations s = {0, 1, 2, 100} with marginal distribution of the spectral measure convergence for H ϑ (i) s (w i ), for i = 1, 2, 3 and the kernel density for the particle mass. Line types are as for Fig. 2. The rightmost panel displays s = {1, 2, 100}. Although we do not have an expression for the true values of marginal distributions, our algorithm gives identical values for these distributions for all iterations with s > 100. So it is reasonable to interpret the algorithm to have converged by iteration s = 100 and so for the distributions with s = 100 to be the truth After one step of Algorithm 1 we have that H (s) − (w) for s ≥ 1 is equal to Hˆ − 0 (w) to within visible detection, so convergence is achieved in one step. Initially, i.e., for s = 0, all particle weights are equal to J −1 , but, as Fig.2 (right) shows, within an iteration they have quite a different distribution of weights and that this distribution essentially has converged at s = 2 to its limit form. Thus Algorithm 1 works exceptionally well in this case where we know the answer. Similar tests over other GARCH(1,1) processes gave identical convergence performances. We found almost perfect convergence after one or two iterations whatever the initial distribution estimate. This feature is consistent with there being a unique solution for Hˆ − 0 (w) for this model and that our algorithm is robust and highly efficient in converging to this limit.
Next we assess the convergence of H (s) − (w), for model D3 whereˆ − 0 has three dimensional distribution we do not know. We cannot easily show graphically the full joint distribution convergence and even for lower dimensional summaries we can only show the algorithm converges to some limit. Figure 3 illustrates convergence for each of the marginal distributions of H (s) − (w) and the distribution of the particle weights over iterations, for s = 0, 1, 2 and 100. We also assessed (not shown) the convergence of the dependence structure of H ). In all the cases we explored, visual convergence was obtained before s = 10 and identical values found for s > 100.
The same limit values were achieved for a wide range of starting values which reassure us that there is a unique limit. Although in our examples the selection of a good initialisation to the algorithm (Sect. 4) was not found to be important on the convergence speed, we anticipate it could be when the complexity of the GARCH( p, q) model increases. Our extensive empirical findings in Sect. 7 and Laurini et al. (2022) show that the limit we find using Algorithm 1 leads to values of κ and the extremogram and extremal index all fully in accord with long run simulations of the GARCH process, for each of models A-E, which gives us extra confidence in the convergence. , and subsequent authors, imply that the way to evaluate κ is by numerical solution of the limiting equation (16), although they do not illustrate this. In Fig. 1 (left panel) we showed that there are major numerical instabilities in evaluating A t · · · A 1 for large t; so in practice it is impossible to solve equation (16) directly. We have developed a numerically robust approach for evaluating κ for any p and q, using Algorithm 1. Here we illustrate this method and discuss the values of κ that are obtained for models A-E.

Evaluation of Ä
To calculate κ we apply Algorithm 1, iterating over different values of k. Key to the solution is the evaluation of the Monte Carlo estimateρ k of ρ k in expression (29). Figure 4 showsρ k against k for each of the models A3-E3. There is clearly a unique solution for k > 0 to the equationρ k = 1, with the values of k = κ that solve this equation given in Table 1. We restricted the Monte Carlo noise in the estimatesρ k of ρ k , for each value of k shown in Fig. 4, by taking J = 10 6 and evaluated the Monte Carlo integral (29) with 10 4 replicates on Z to get κ to the required precision. To find κ, from the curve ofρ k , we used an initial grid search coupled with a bisection method. Table 1 illustrates that φ has an impact on the value of κ. When φ = 1 no explicit relationship appears to hold between φ and κ, as κ changes markedly with the innovation distribution. When φ < 1 the shorter the tail of the innovation distribution gives the larger κ and hence shorter tails of the GARCH( p, q) marginal distribution; whereas the reverse holds when φ > 1; and when φ = 1 then κ is invariant to the innovation distribution, with κ = 1. The case when φ > 1 is somewhat surprising, as it might be expected that hav- ing a heavier tail innovation would result in a heavier tailed process, whereas in fact the opposite occurs. This feature is found to hold much more broadly, and can easily be illustrated in the GARCH(1,1) case. The explanation is that the larger φ is the greater σ 2 t can grow over a period of time, and ultimately the large σ 2 t values produce the extremes and not isolated large Z 2 t values. Next, we illustrate that the derived value of κ is consistent with the GARCH( p, q) process' observed marginal tail. The observable tail can be derived from long run simulations. We compare the limiting probabilities Pr(X 2 t > r | X 2 t > 1) = r −κ with the empirical estimate of the probabilities Pr(X 2 t > r x | X 2 t > x) for very large x over a range of r > 1. Figure 5 shows this comparison for x being the 0.99998 marginal quantile on a log-scale. If κ is correct and x is large enough, the log-probabilities should be proportional with gradient κ. The results show that the limit tail is consistent with the empirical distribution subject to Monte Carlo noise, and hence the κ value seems appropriate. Figure 6 gives the extremogram χ X 2 (τ ) for the squared of GARCH process for models A3, A1, D3 and D1, with equivalent plots for the other models given in Laurini et al. (2022). In all cases the heavier tailed innovation distribution leads to weaker extremal dependence at all lags. Models D3 and D1, both IGARCH processes, exhibit much slower decay rates in extremal dependence as lag τ increases than for models A3 and A1 with φ < 1. The level of extremal dependence appears to be strongly related to φ. Furthermore, we see for models D3 and D1 that χ X 2 (2) > χ X 2 (1), with χ X 2 (τ ) decaying monotonically for τ ≥ 2. We believe that the reason for this feature is that β 2 > max(α 1 , α 2 ), which makes the variance σ 2 t more likely to be larger at observations with lag 2 than lag 1, and hence this induces stronger clustering of extreme values two time points apart. Extremogram (τ, χ X 2 (τ )) for squared GARCH processes of models A3-A1 (top-bottom left panels) and D3-D1 (top-bottom right panels). Black lines are our algorithm's approximation of the true limit values and the three grey lines are empirical extremogram estimatesχ X 2 (τ, u), based on a sample of size n = 5 × 10 7 , at u corresponding to 0.99 (continuous solid light grey), 0.999 (dashed grey) and 0.9999 (dotted dark grey) quantiles of X 2 t An empirical estimateχ X 2 (τ, u), where u is a threshold, of the extremogram of χ X 2 (τ ) based on a sample of length n from a GARCH( p, q) process is given bỹ

Extremogram
1(X 2 j > u). Figure 6 showsχ X 2 (τ, u) for large n and for three threshold choices u corresponding quantiles of X 2 t . This gives strong evidence that our evaluation of χ X 2 (τ ) is accurate. Specifically, the agreement with limit values χ X 2 (τ ) is very good generally, with the empirical estimates suffering from bias and variance trade-off, as with all threshold based estimates. Model A has the slowest convergence of the empirical estimators, but even here at the highest threshold there is almost perfect overlap between empirical estimates and the true values for all lags. In contrast, for model D the highest threshold produces the least good estimate, presumably due to its high variance. Additional comparisons are given in Laurini et al. (2022) for the other models.

Evaluation of the extremal index
Algorithms 1-3 give output values than can be used to evaluate the extremal index, θ X 2 , θ X U and θ X L , for processes X 2 t , X t and −X t respectively by using the methods discussed in Sect. 5.2. Here we show that these values are consistent with empirical estimates obtained from long-run simulations from the GARCH( p, q) process.
We can estimate the extremal index for a stationary process {V t } using the runs estimator,θ V (u, m), proposed by Smith and Weissman (1994), based on a sample of size n, a threshold u and a run length m, wherẽ We take V t = X 2 t , X t and −X t respectively, where X t is a GARCH( p, q) process. We compare the runs estimator (for a range of u and m) with our values of the extremal indices. As the selection of the run length m in the runs estimator is subjective (we take m = 100 and 1000 to assess sensitivity to  Fig. 7 The extremal indices θ X 2 , θ X U and θ X L for the model D2 with asymmetric Z (left to right), as calculated using our algorithms, shown by the horizontal dotted line. Runs estimator (m = 100 (+) and m = 1000 (×)) and the intervals estimator (•) are illustrated with their estimated 95% confidence intervals (shaded regions) based on 100 independent replicates of n = 10 8 sample data. The number of exceedances for each u are reported on the top axis this choice), we also consider the intervals estimator of Ferro and Segers (2003) where m is objectively chosen for each u based on the distribution of inter-arrival times of consecutive exceedances of u by {V t }.
For model D2, Fig. 7 shows θ X 2 , θ X U and θ X L together with the runs estimator (m = 100 and 1000) and the intervals estimator, based on n = 10 8 , for a range of values of u and then averaged over 100 independent replicates. Also shown are the estimated 95% confidence intervals (of these averaged estimators), with uncertainty increasing with larger u. Of the two estimators, the intervals estimator is more stable and typically closest to the limit value. In each case we see that the runs estimator with m = 100 is overestimating all three extremal indices for all thresholds, indicating that this choice of m is too small for the level of extremal dependence. Although the runs estimator with m = 1000 and the intervals estimator perform similarly, they both suffer underestimation for lower thresholds. It is reassuring that at the highest thresholds our values of the extremal indices typically fall inside the confidence intervals despite them being narrow, e.g., of width 0.01 − 0.02. Extensive comparisons, with similar conclusions, are given in Laurini et al. (2022) for all 15 of the models covered in Table 1. Now that we are confident that we have reliable estimates of the extremal index, we examine the implications for the three extremal indices for models A-E, and each innovation distributions, with these values given in Table 1. Recall that the extremal index, θ V , of the process {V t } determines the average size of clusters of extremes of {V t } via 1/θ V . We find that shorter tailed innovations give larger clusters on average. For increasing φ, for 0 < φ ≤ 1, we have increasing average cluster sizes, but that pattern does not follow when φ > 1. In all cases, min(θ X L , θ X U ) ≥ θ X 2 , indicating the extremes of the processes {X t } and {−X t } exhibit less clustering on average than the {X 2 t } process. When ξ = 0 then θ X L = θ X U , whereas for ξ > 0 more clustering occurs in the upper than in the lower tail of the GARCH( p, q) process, with the reverse when ξ < 0.

Subclass specification
Although Sect. 6 illustrates that our algorithms provide robust methods for the computational evaluation of extremal properties of a general GARCH( p, q) process, a systematic study of the extremal properties of a general GARCH( p, q) process is beyond the scope of this paper. However, we consider some investigation of these properties is justified for simple subclasses, to show how different features of the construction affect the various extremal characteristics. We consider the following classes of GARCH(2, 2) with Gaussian innovations, Class 1: α 1 = α 2 = ωφ/2, β 1 = β 2 = (1 − ω)φ/2 and Class 2: α 1 = β 1 = ωφ/2, α 2 = β 2 = (1 − ω)φ/2, with φ > 0 having its usual meaning (2). Here 0 < ω < 1 controls the relative importance of the past process values relative to the past volatility values (Class 1) and values of process and volatility at t − 1 to t − 2 (Class 2), with increasing ω giving more weight the former in each case. For 0 < φ ≤ 1 the process is stationary, whereas the necessary condition for strict stationarity when φ > 1, given in Sect. 2.2, implies φ < 1/(1−ω) (Class 1) and φ < 2 (Class 2). So for these cases we need to check for strict stationarity as well.

Tail index
Without restricting φ to give strict stationarity, i.e., not checking the value of γ to assess if strict stationarity applies, we explored what κ values were given by applying Algorithm 1 for Classes 1 and 2. Figure Fig. 8 the evaluated value is κ = 0 for large φ. In this plot we report a value κ = 0 when the only solution of k to the equation (19) is k = 0 (we realise that κ cannot really be 0), with these cases arising when ρ k is found to be an increasing function for k ≥ 0. The value of φ for which κ = 0 is achieved appears to be different for each example. We denote this by φ S , with φ S = min{φ > 0 : κ = 0}, and note that φ S > 1 in all cases.
As κ = 0 for φ > φ S is obtained using Algorithm 1 this suggests that for these φ values the Class 1 and 2 examples are not like the processes studied in Fig. 4, which were strictly stationary processes. A possible reason for this could be that κ is found to be 0 by Algorithm 1 when the process is not strictly stationary. We investigated this supposition by testing for strict stationarity of the example classes for values (φ L , φ U ) of φ, which are within 0.01 of φ S , with 1 < φ L < φ S < φ U . In all the examples we evaluated γ for (φ L , φ U ) and found that φ L gave γ < 0 and all φ U gave γ > 0, with the odd case requiring longer runs to reduce Monte Carlo noise issues. For this example it appears that 0 < φ < φ S is required for strict stationarity. As there is nothing special about these classes of examples it appears that strict stationarity holds if and only if the only solution to equation (19) is κ > 0. If this were found to hold for all GARCH( p, q) processes then this would offer a novel and computationally efficient way to test for strict stationarity, with some benefits over the approach we developed in Theorem 2.
Secondly, Fig. 8 shows that φ strongly influences the value of κ, though κ also varies over ω. Critically we have that, whatever the value of ω, for Classes 1 and 2 that 0 < φ < 1, φ = 1, φ > 1 if and only if κ > 1, κ = 1, 0 ≤ κ < 1, respectively. These findings about κ and φ are consistent with our results in Table 1 for models A-E, and with extensive empirical evidence we have obtained for a range of GARCH( p, q) process with different p and q and different innovation tail decays, so we believe they are indicative of a general result. It is also noteworthy that κ is much more sensitive to ω in Class 1 than Class 2, suggesting that the process term values are much more important than the volatility terms in determining κ.
Thirdly, if our supposition that φ = 1 if and only if κ = 1 for any IGARCH( p, q) process with max( p, q) ≥ 2 held generally, it would be a novel result. This result was proved for p = q = 1 by Mikosch and Stȃricȃ (2000) and κ = 1 implies φ = 1 can be shown general using Breiman's Lemma (as noted by a referee). However, the result should not be a surprise as the finite mean and infinite variance of the IGARCH( p, q) process implies that 0.5 < κ ≤ 1. So if our claim is true it gives that all IGARCH( p, q) processes have E(|X t | 2− ) < ∞ for any ∈ (0, 2]. This claim for κ is not too surprising though, as the variance of X t is infinite when φ = 1 but is finite when φ < 1, suggesting that κ, for the IGARCH( p, q), is the boundary point for having a finite variance.
A weakness of Algorithm 1 is that it is rather black-box and requires considerable numerical exploration to gain insight into how key stochastic features like (λ * , η * ) or (λ, η) influence the value of κ. To provide some insight into the role of these features we have developed a relatively good approximation κ A for κ which can be expressed transparently and trivially as a function of these features. We used a range of heuristic arguments and empirical investigations, to propose κ A > 0 as the unique positive solution the equation  Fig. 9 Contour plots for the extremal index θ X U for the GARCH(2, 2) process: left, as function of (α 1 , β 1 ) with α 2 = β 2 = 0.05; right, as function of (α 2 , β 2 ) with α 1 = β 1 = 0.05. In both panels the innovation Z t is standard normal and the grey dashed line is the IGARCH(2, 2) model where λ and η are defined in Sect. 3.3. To explain a little about where κ A comes from consider the GARCH(1,1) processes, where in Sect. 3.3 we found that λ * = λ and η * = η = 0, yet from Mikosch and Stȃricȃ (2000) we know that κ > 0 must satisfy E[(α 1 Z 2 t + β 1 ) κ ] = 1, which is simply equation (34) in this case, so here κ A = κ. Figure 8 shows that κ A generally provides an excellent approximation to κ in terms how it varies with φ and ω for examples Classes 1 and 2, typically being within 7% of the value of κ. Furthermore, for models A3-E3 we find κ A to be, approximately +1%, +8%, 0%, +4%, and -25% different than κ, respectively.

Extremal index
To see how the GARCH(2,2) parameters in affect the extremal index, Fig. 9 presents a contour plot of θ X U , with normal Z t , over parameters (α 1 , α 2 , β 1 , β 2 ). In each panel we hold fixed two parameters and contour over the others. Figure 9, shows that θ X U decreases, i.e., average cluster sizes increase, with increasing φ, up to an IGARCH(2,2) model. Consequently, contours are near linear in the parameters. Unless max(β 1 , β 2 ) is large then small values of α 1 and α 2 tend to lead to very limited clustering. This seems logical as small α 1 and α 2 reduce the effect of the large X 2 t value on subsequent volatilities, so without β 1 and β 2 being large, to pick up the momentum of the evolution of the event, the large event is very likely to die out rapidly. We obtain stronger extremal dependence with larger values of the pair (α 2 , β 2 ) than for (α 1 , β 1 ) for equal values of φ, as seen by values of θ X U being smaller in Fig. 9 right panel by comparison to the left panel.

Discussion
Naïvely, it might seem that for a GARCH( p, q) process the extremal features can be obtained sufficiently accurately using empirical estimators based on simulated data from very long runs. However, Figs. 5, 6 and 7 illustrate that this is essentially impossible due to the needs for a very large threshold u, to be selected for convergence to be assured, and the need for large numbers of exceedances of u, for numerical stability; collectively these lead to n needing to be too large for computation. This is seen through the confidence intervals and threshold sensitivity in such estimates given by Laurini et al. (2022).
It appears likely that the new theory and methods we present extend to the broader class of stochastic recurrence equations discussed by Collamore and Mentemeier (2018). Specifically, for any process Y t ∈ R d , with Y t = A t Y t−1 +B t , where A t and B t are stochastic, i.i.d. sequences of matrices and vectors respectively, satisfying the conditions of Kesten (1973), then our paper gives methods which help assess stationarity of the process; determine marginal distribution tail index of Y t ; determine ways to simulate from the spectral measure; and are able to generate a wide range of extremal properties.
None of these properties can currently be obtained due to a lack of appropriate numerically stable algorithms. The theory and methods presented here overcome these limitations and provide a broad toolbox of numerically robust approaches to derive the extremal properties of a wide class of stochastic recurrence equations. Extensions of our methods to allow Pr(Z t = 0) > 0, to cover liquidity issues, missing values and market closures, and to account for asymmetric volatility behaviour, both areas of active research (see Zakoïan, 2013 andSucarrat, 2021), will be particularly useful.
We focused on the limiting extremal properties, but there is also interest in sub-asymptotic properties, e.g., subasymptotic versions of the extremogram and the extremal index given by χ X (τ ; x) = Pr(X t+τ > x | X t > x), and θ X (x, m) = Pr(X t < x; t = 1, 2, . . . , m | X 0 > x) respectively, for large but finite x and a suitably large m. These sub-asymptotic quantities might converge quite slowly to their respective limits χ X (τ ) and θ X . So it is worth evaluating how this convergence behaves and obtaining values of these extremal properties for values of x which are of relevance to applications. We believe that it should be possible to get such results using our methods in combination with the techniques of Collamore and Mentemeier (2018).