Ordinal Patterns in Clusters of Subsequent Extremes of Regularly Varying Time Series

In this paper, we investigate temporal clusters of extremes defined as subsequent exceedances of high thresholds in a stationary time series. Two meaningful features of these clusters are the probability distribution of the cluster size and the ordinal patterns within a cluster. Since these patterns take only the ordinal structure of consecutive data points into account, the method is robust under monotone transformations and measurement errors. We verify the existence of the corresponding limit distributions in the framework of regularly varying time series, develop non-parametric estimators and show their asymptotic normality under appropriate mixing conditions. The performance of the estimators is demonstrated in a simulated example and a real data application to discharge data of the river Rhine.


Introduction
In time series data sets, extremes often do not occur at scattered instants of time, but tend to form clusters. If each cluster of extremes is assigned to a single extreme event, such as a flood in the context of a hydrological time series or a stock market crash in the context of financial data, the distribution of these clusters is crucial for risk assessment. In order to analyze the occurrence times of extremes defined as exceedances over some high threshold $u$, a substantial body of theory has been built up since the 1970s. Within this framework, data $X_1, \ldots, X_n$ from a stationary time series $(X_t)_{t \in \mathbb{Z}}$ are typically divided into different blocks. Then, repeated extremes are said to form a cluster if they occur within the same temporal block. Due to the convergence of the process of exceedances to a Poisson point process under appropriate conditions as $u \to \infty$, the distribution of these clusters converges weakly provided that the block size increases at the right speed. The limit distribution is closely linked to the well-known concept of the extremal index of the time series, which can be interpreted as the reciprocal of the mean limiting cluster size (cf. Leadbetter et al., 1983; Embrechts et al., 1997; Chavez-Demoulin and Davison, 2012, for an overview). Besides the extremal index, several other cluster characteristics are of interest and can be estimated, such as the distribution of the cluster size (Robert, 2009) or more general cluster functionals (Drees and Rootzén, 2010). Convergence of clusters in appropriate sequence spaces preserving the order of observations can be shown within the framework of regular variation (Basrak et al., 2018). Even though positive theoretical results exist, estimation of characteristics of clusters as defined above is difficult for finite samples: besides the threshold $u$, the block size or, equivalently, some cluster identification parameter giving the minimum distance between two separate clusters also needs to be chosen.
Instead, in this paper, we will use a different definition of a cluster of extremes by restricting our attention to subsequent threshold exceedances, i.e. a realization of the $l$-dimensional vector $(X_i)_{i=t}^{t+l-1}$ will be called a $u$-exceedance cluster of size $l \in \mathbb{N}$ if and only if
$$X_{t-1} \le u, \quad X_t > u, \ \ldots, \ X_{t+l-1} > u \quad \text{and} \quad X_{t+l} \le u. \qquad (1)$$
As any non-exceedance will separate two clusters, this definition is much stricter than the classical definition described above. An advantage of the definition of $u$-exceedance clusters is that it depends on a single parameter only, namely the threshold $u$. Such a cluster definition has already been employed in a series of papers by Markovich (2014, 2016, 2017), who analyzes the limit distribution of two cluster characteristics. First, she considers the inter-cluster time $T_1(u)$, i.e. the number of observations between two consecutive clusters, which is a random variable with the same distribution as $\min\{j \ge 1 : X_{j+1} > u\}$ conditional on $X_1 > u$.
Note that this inter-cluster time also plays an important role in the estimation of the extremal index (Ferro and Segers, 2003). Secondly, she studies the random variable $T_2(u)$ with the same distribution as $\min\{j \ge 1 : X_{j+1} \le u\}$ conditional on $X_1 \le u$, i.e. $T_2(u) - 1$ is the length of a $u$-exceedance cluster starting at some fixed time. Since we have $\lim_{u \to \infty} P(X_2 \le u) = 1$, the distribution of $T_2(u)$ is typically expected to converge weakly to a degenerate distribution, i.e. $\lim_{u \to \infty} P(T_2(u) = 1) = 1$. In Markovich (2014, 2016), under appropriate mixing conditions, the rate of convergence is determined as a function of the extremal index $\theta$. More precisely, for all $\varepsilon > 0$, there exist a threshold $u_0 = u_0(\varepsilon)$ and a number $j_0 = j_0(\varepsilon)$ such that the approximation holds with accuracy $\varepsilon$ for all $u > u_0$ and $j > j_0$, i.e. for a sufficiently large threshold $u$, the tail of the distribution of $T_2(u)$ becomes proportional to a geometric distribution with parameter $1 - P(X_0 > u)^{\theta}$. Furthermore, Markovich (2014, 2016) provides results for the duration of clusters if the time between subsequent observations is random. In our paper, we will focus on the case of an equally spaced time series, i.e. the case in which the duration and the size of a cluster coincide. Here, instead of considering the probability that there is a cluster of a specific size at a certain time, we analyze the size of a randomly chosen $u$-exceedance cluster or, equivalently, we examine the size of a cluster conditional on being a cluster of positive length. Thus, we first address the question: How long does an extreme event in a time series last provided that it occurs?

Secondly, we analyze so-called ordinal patterns which we find in the above-mentioned clusters of extremes. Ordinal patterns keep the ordinal information of the data only and, thus, describe their 'up-and-down behaviour'. Here, the relative position of the data points $x_0, \ldots, x_{l-1}$ is encoded by a permutation $\pi$ on $\{0, \ldots, l-1\}$ such that
$$x_{\pi(0)} \ge x_{\pi(1)} \ge \ldots \ge x_{\pi(l-1)}.$$
Note that this permutation is unique if the data points $x_0, \ldots, x_{l-1}$ are pairwise distinct. The following precise definition also accounts for ties by keeping the order of the indices in this case.

Definition 1.1. For $l \in \mathbb{N}$, let $S_{l-1}$ be the set of permutations of $\{0, \ldots, l-1\}$. The $l$-ordinal pattern is defined as the mapping $\Pi : \mathbb{R}^l \to S_{l-1}$ that maps a vector $(x_i)_{i=0}^{l-1}$ to the unique permutation $\pi$ satisfying $x_{\pi(0)} \ge x_{\pi(1)} \ge \ldots \ge x_{\pi(l-1)}$ and $\pi(\min\{i, j\}) < \pi(\max\{i, j\})$ if $x_i = x_j$ for $i \ne j$.
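To make the definition concrete, the following minimal Python sketch computes an ordinal pattern as the tuple $(\pi(0), \ldots, \pi(l-1))$ of indices sorted by decreasing value. The function name `ordinal_pattern` is hypothetical, and the tie rule used here (among equal values, the earlier index is listed first) is our reading of "keeping the order of the indices", not necessarily the authors' exact convention.

```python
from math import exp
from typing import Sequence, Tuple


def ordinal_pattern(x: Sequence[float]) -> Tuple[int, ...]:
    """Return (pi(0), ..., pi(l-1)) with x[pi(0)] >= ... >= x[pi(l-1)].

    Assumption: ties are broken by listing the earlier index first.
    """
    # Sort indices by decreasing value; the index itself breaks ties.
    return tuple(sorted(range(len(x)), key=lambda i: (-x[i], i)))


# The second observation is the largest, the third the smallest.
print(ordinal_pattern([2.0, 3.0, 1.0]))

# Strictly increasing transformations leave the pattern unchanged.
data = [0.3, -1.2, 5.0, 0.3]
print(ordinal_pattern(data) == ordinal_pattern([exp(v) for v in data]))
```

The last line illustrates the invariance under strictly monotone transformations that is exploited throughout the paper.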
Ordinal patterns have been introduced in order to analyze noisy data sets which appear in medicine, neuroscience and finance (cf. Bandt and Pompe, 2002; Keller et al., 2007; Sinn et al., 2013). They have already been used successfully in the estimation of the Hurst parameter (Sinn and Keller, 2011). Further applications include tests for structural breaks (Sinn et al., 2012) and the analysis of the Kolmogorov-Sinai entropy of dynamical systems (Keller et al., 2015). Ordinal patterns are well suited to capture stylized facts such as trends or inversions of the direction, which might be used to characterize and classify certain events. To our knowledge, the present paper is the first approach to analyze the ordinal behavior which can be observed in clusters of extremes of time series. One advantage of the proposed method is that the whole analysis is stable under monotone transformations of the state space, a property that will be useful in our analysis. Furthermore, the ordinal structure is not destroyed by small perturbations of the data or by measurement errors. Finally, there are fast algorithms to analyze the relative frequencies of ordinal patterns in given data sets (cf. Keller et al., 2007, Section 1.4).
In the future, ordinal patterns in clusters of extremes might be used to detect structural breaks in the extremes of the given time series (cf. Unakafov and Keller, 2018). Dealing with 'correlated' time series, one could analyze the dependence between extreme events in a non-linear fashion as it has been developed in Schnurr (2014) and Schnurr and Dehling (2017). This might be advantageous in particular if the time series are on totally different scales. Finally, as we will point out in Section 6, ordinal patterns at the beginning of a cluster of extremes might be used in order to forecast the length of this cluster in an on-line analysis of data.

Our analysis is embedded in a different theoretical framework than the works of Markovich (2014, 2016, 2017), namely, we will assume that the stationary time series of interest, $(X_t)_{t \in \mathbb{Z}}$, is regularly varying. Note that this is a common assumption in extreme value theory allowing for convenient extrapolation to the tails of the distribution. More background on the theory of regularly varying time series will be provided in Section 2. In Section 3, we show that both the distribution of the size of $u$-exceedance clusters, as defined in (1), and the distribution of the ordinal pattern within a cluster converge to (typically non-degenerate) limit distributions in case of a regularly varying time series. Based on a sliding window approach, non-parametric empirical estimators for the limit distributions are introduced in Section 4. Under conditions similar to those considered in Davis and Mikosch (2009) for the estimation of the extremogram, consistency (Proposition 4.1) and asymptotic normality (Corollary 4.6) of the estimators are established. In Section 5, we consider the example of max-stable time series and provide sufficient conditions in terms of extremal coefficients for Corollary 4.6 to hold. The conditions are verified for a Brown-Resnick time series which is then simulated to demonstrate the finite-sample behaviour of the estimators. Finally, we apply the estimator to daily discharge data of the river Rhine at Cologne in Section 6. The proofs of our results can be found in the appendix.

Background: Regularly Varying Time Series
Throughout this paper, we will assume that $X = (X_t)_{t \in \mathbb{Z}}$ is a stationary time series whose marginal distribution $F_0$, defined by $F_0(x) = P(X_0 \le x)$, is in the max-domain of attraction of an extreme value distribution, i.e., there exist constants $a_n > 0$, $b_n \in \mathbb{R}$, such that
$$\lim_{n \to \infty} F_0^n(a_n x + b_n) = G_0(x)$$
for some non-degenerate distribution $G_0$. Without loss of generality, we may assume that $G_0$ is an $\alpha$-Fréchet distribution for some $\alpha > 0$, and that $F_0$ has a finite lower endpoint, $\inf\{x \in \mathbb{R} : F_0(x) > 0\} > -\infty$. Both properties can be achieved by applying strictly monotone marginal transformations to $(X_t)_{t \in \mathbb{Z}}$ provided that $F_0$ is continuous. As these transformations are the same for each $t \in \mathbb{Z}$ (recall that $X$ is stationary), they do not have any effect on the ordinal structure of the data. In particular, ordinal patterns in extremes are invariant under these transformations.

A convenient framework for our further analysis will be provided by regular variation. Among several equivalent definitions of multivariate regular variation (cf. Resnick, 2007, 2008, for instance), we will make use of the following convenient one (cf. Basrak et al., 2002, for instance): We say that the $d$-variate random vector $X = (X_{t_1}, \ldots, X_{t_d})$, $t_1, \ldots, t_d \in \mathbb{Z}$, is multivariate regularly varying with index $\alpha > 0$ if, for some norm $\|\cdot\|$ on $\mathbb{R}^d$, there exists a probability measure $\sigma$ on the sphere $\{x \in \mathbb{R}^d : \|x\| = 1\}$ such that, for all $r > 0$,
$$\frac{P(\|X\| > rx, \ X/\|X\| \in \cdot)}{P(\|X\| > x)} \to_w r^{-\alpha} \sigma(\cdot)$$
as $x \to \infty$, where $\to_w$ denotes weak convergence. The limit measure $\sigma$ is called the spectral measure.
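By Corollary 5.18 in Resnick (2008), multivariate regular variation of $X$ with spectral measure $\sigma$ is equivalent to the fact that the distribution function $F$ of $X$ is in the max-domain of attraction of a multivariate extreme value distribution, i.e.
$$\lim_{n \to \infty} F^n(a_n x_1 + b_n, \ldots, a_n x_d + b_n) = G(x_1, \ldots, x_d).$$
The limit distribution $G$ necessarily has $\Phi_\alpha$ marginal distributions and is of the form
$$G(x) = \exp\left(-\mu\left([0, x]^c\right)\right)$$
for some Radon measure $\mu$ on $E = [0, \infty)^d \setminus \{0\}$, the so-called exponent measure $\mu$ of $G$. The exponent measure $\mu$ and the spectral measure $\sigma$ are related via the polar decomposition of $\mu$: in polar coordinates, $\mu$ factorizes, up to a normalizing constant, into a radial part with density proportional to $r^{-\alpha-1}$ and the angular part $\sigma$.

The time series $X$ is called regularly varying if all the finite-dimensional margins $(X_{t_1}, \ldots, X_{t_d})$ are multivariate regularly varying. By Basrak and Segers (2009), regular variation of $X$ is equivalent to the existence of a process $Y = (Y_t)_{t \in \mathbb{Z}}$ with $P(Y_0 > y) = y^{-\alpha}$ for $y \ge 1$ such that, for every $s < t \in \mathbb{Z}$,
$$\mathcal{L}\left(x^{-1} X_s, \ldots, x^{-1} X_t \mid X_0 > x\right) \to_w \mathcal{L}(Y_s, \ldots, Y_t), \quad x \to \infty. \qquad (2)$$
The process $Y$ is called the tail process of $X$; see Basrak and Segers (2009) for more details and further properties.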
In the following, we will always assume that the time series X is regularly varying with tail process Y . Furthermore, the probability measure induced by the random vector (Y i ) i∈I will be called µ I for any index set I ⊂ Z.

Distribution of Clusters of Extremes and Ordinal Patterns
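In this section, we will analyze the limiting behaviour of the size and ordinal pattern of $u$-exceedance clusters in the time series $X$, with $u$-exceedance clusters being defined as in (1). Intuitively, the following expression gives a plausible definition of the distribution of the size $C_u$ of a randomly selected $u$-exceedance cluster:
$$P(C_u = l) = \lim_{n \to \infty} \frac{\sum_{k=1}^{n} 1\{X_{k-1} \le u, \ X_k > u, \ldots, X_{k+l-1} > u, \ X_{k+l} \le u\}}{\sum_{k=1}^{n} 1\{X_{k-1} \le u, \ X_k > u\}}, \quad l \in \mathbb{N}.$$
If $X$ is ergodic, we can apply the pointwise Birkhoff-Khinchin theorem and obtain that the above limit almost surely exists and equals
$$\frac{P(X_{-1} \le u, \ X_0 > u, \ldots, X_{l-1} > u, \ X_l \le u)}{P(X_{-1} \le u, \ X_0 > u)}. \qquad (3)$$
Dividing both the numerator and the denominator of (3) by $P(X_0 > u)$, we can see from relation (2) that the distribution of $C_u$ eventually becomes independent of the threshold $u$ as $u \to \infty$, that is,
$$\lim_{u \to \infty} P(C_u = l) = \frac{P(Y_{-1} \le 1, \ Y_1 > 1, \ldots, Y_{l-1} > 1, \ Y_l \le 1)}{P(Y_{-1} \le 1)}, \qquad (4)$$
provided that the limiting denominator is positive and the sets involved are continuity sets of the distribution of the tail process.

Example 3.1. We consider two examples to compare the limiting distribution of $C_u$ as $u \to \infty$, i.e. the distribution of the cluster size according to our definition, with the limiting cluster size distribution in the classical setting which has been studied extensively in the literature, cf. Robert (2009) and references therein. While both distributions coincide in the first example, they differ significantly in the second one.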
1. We consider a first order max-autoregressive model (cf. Davis and Resnick, 1989)
$$X_t = \max\{a X_{t-1}, (1 - a) Z_t\}, \quad t \in \mathbb{Z}, \qquad (5)$$
where $a \in [0, 1]$ and $\{Z_t\}_{t \in \mathbb{Z}}$ is a unit Fréchet noise process. Equation (5) possesses a stationary solution with unit Fréchet margins that is regularly varying. For $t > 0$, the tail process $\{Y_t\}_{t \in \mathbb{Z}}$ is given by $Y_t = a^t Y$, where $Y$ is a standard Pareto random variable. Furthermore, we have $Y_{-1} \le 1$ with probability $1 - a$.
Thus, for $l \in \mathbb{N}$, we obtain
$$\lim_{u \to \infty} P(C_u = l) = (1 - a) a^{l-1}, \qquad (6)$$
i.e. the limiting distribution is a geometric distribution with parameter $1 - a$. Alternatively, this distribution could also be derived from Proposition 1 in Markovich (2017) by plugging the formulae for $P(T_2(u) = l + 1)$ into the expression $\lim_{u \to \infty} P(T_2(u) = l + 1 \mid T_2(u) > 1)$ and using that the extremal index is given by $\theta = 1 - a$. However, obtaining a closed-form expression seems to be much simpler using the tail process.
Note that the limiting geometric distribution in (6) coincides with the limiting distribution for the size of clusters defined in the classical sense, cf. Perfekt (1994). This is due to the fact that, in the limit, exceedances over high thresholds always occur subsequently.
2. As a second example, we consider a stationary moving maximum process (cf. Deheuvels, 1983, for instance) defined by
$$X_t = \tfrac{1}{2} \max\{Z_t, Z_{t-2}\}, \quad t \in \mathbb{Z},$$
with $\{Z_t\}_{t \in \mathbb{Z}}$ being a unit Fréchet noise process. By definition, we have that $P(X_t \le u \mid X_0 > u) \to 1$ for all $t \in 2\mathbb{Z} + 1$. Consequently, $\lim_{u \to \infty} P(C_u = 1) = 1$, i.e. the distribution of $C_u$ converges weakly to a Dirac measure at 1.
In contrast, the limiting cluster size distribution according to the classical definition is a Dirac measure at 2, that is, exceedances over high thresholds always occur in pairs. As the two exceedances of a pair are separated by a non-exceedance, each pair is considered as two single clusters in our definition, while both exceedances belong to the same cluster according to the classical definition.
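The first example can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes the recursion $X_t = \max\{a X_{t-1}, (1-a) Z_t\}$ for the max-autoregressive model, uses hypothetical helper names (`simulate_armax`, `cluster_sizes`) and an arbitrary choice of $a = 0.5$ and the 0.98-quantile as threshold, and compares the empirical sizes of $u$-exceedance clusters with the geometric limit $(1-a) a^{l-1}$.

```python
import random
from collections import Counter
from math import log


def simulate_armax(n, a, seed=0):
    """Simulate X_t = max(a * X_{t-1}, (1 - a) * Z_t) with unit Frechet noise.

    Assumption: this recursion is the max-autoregressive model meant above.
    """
    rng = random.Random(seed)

    def frechet():
        # 1/E is unit Frechet if E is standard exponential (guard against 0).
        return 1.0 / max(rng.expovariate(1.0), 1e-300)

    x = [frechet()]  # a unit Frechet start makes the recursion stationary
    for _ in range(n - 1):
        x.append(max(a * x[-1], (1.0 - a) * frechet()))
    return x


def cluster_sizes(x, u):
    """Sizes of all runs of exceedances of u that are preceded and
    followed by a non-exceedance inside the sample."""
    sizes, run, valid = [], 0, False
    for t in range(1, len(x)):
        if x[t] > u:
            if run == 0:
                valid = x[t - 1] <= u  # run must start after a non-exceedance
            run += 1
        else:
            if run > 0 and valid:
                sizes.append(run)
            run = 0
    return sizes


a = 0.5
x = simulate_armax(200_000, a, seed=1)
u = -1.0 / log(0.98)  # 0.98-quantile of the unit Frechet distribution
sizes = cluster_sizes(x, u)
freq = Counter(sizes)
for l in range(1, 5):
    print(l, round(freq[l] / len(sizes), 3), (1 - a) * a ** (l - 1))
```

For a finite threshold, the empirical frequencies should only be expected to be close to, not equal to, the geometric probabilities.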
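Similarly to their size, we can investigate ordinal patterns in $u$-exceedance clusters. Here, for fixed $l \in \mathbb{N}$, we are interested in the distribution of the $l$-ordinal pattern of a (randomly selected) $u$-exceedance cluster of size $l$:
$$P_{u,l}(\pi) = P\left(\Pi(X_0, \ldots, X_{l-1}) = \pi \mid X_{-1} \le u, \ X_0 > u, \ldots, X_{l-1} > u, \ X_l \le u\right)$$
for each $\pi \in S_{l-1}$, i.e. $P_{u,l}$ defines a probability distribution on $S_{l-1}$. Again, this distribution converges as $u \to \infty$, with the limit in (7) expressed in terms of the tail process.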

Asymptotic Results for Empirical Estimators
According to Equation (4) and Equation (7), both the limit distribution of clusters and the limit distribution of ordinal patterns within a cluster are given by a ratio of the type
$$\frac{\mu_{\{-1,\ldots,t\}}(A)}{\mu_{\{-1,\ldots,t_0\}}(A_0)},$$
i.e. a ratio of measures of two sets that are bounded from below by 1 in their second component. More precisely, in case of the cluster size distribution in (4), we have $A = [0, 1] \times (1, \infty)^l \times [0, 1]$ and $A_0 = [0, 1] \times (1, \infty)$. In the following, we will consider the general class of ratios of the above type, including both the limits in (4) and (7) as special cases, and discuss their estimation from observations $X_{-1}, \ldots, X_n$. It is worth noting that, making use of the relation $\mu_{\{-1,\ldots,t\}}(A) = \mu_{\{-1,\ldots,t+1\}}(A \times [0, \infty))$, we may replace $t$ and $t_0$ by the maximum of the two, i.e. without loss of generality, we may assume that $t = t_0$.
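Analogously to the calculations in Section 3, one can show that
$$\lim_{u \to \infty} \frac{P((X_i)_{i=-1}^{t} \in uA)}{P((X_i)_{i=-1}^{t} \in uA_0)} = \frac{\mu_{\{-1,\ldots,t\}}(A)}{\mu_{\{-1,\ldots,t\}}(A_0)}$$
provided that $\mu_{\{-1,\ldots,t\}}(A_0) > 0$ and $\mu_{\{-1,\ldots,t\}}(\partial A) = \mu_{\{-1,\ldots,t\}}(\partial A_0) = 0$. This limit relation gives reason to set a high threshold $u$ and use a ratio estimator of the type
$$\hat R_{n,u}(A, A_0) = \frac{\hat P_{n,u}(A)}{\hat P_{n,u}(A_0)}, \qquad (8)$$
where $\hat P_{n,u}(A)$, the relative frequency of the sliding windows $(X_{k+i})_{i=-1}^{t}$ in the sample that fall into $uA$, is the empirical counterpart of the probability $P((X_i)_{i=-1}^{t} \in uA)$. We will show asymptotic properties of the ratio estimator $\hat R_{n,u_n}(A, A_0)$ for some appropriate sequence of thresholds $(u_n)_{n \in \mathbb{N}}$ such that $u_n \to \infty$ and $n P(X_0 > u_n) \to \infty$ as $n \to \infty$. To obtain consistency, we also need a mixing condition for the time series $(X_t)_{t \in \mathbb{Z}}$ expressed in terms of the $\alpha$-mixing coefficients
$$\alpha_h = \sup\left\{ |P(B_1 \cap B_2) - P(B_1) P(B_2)| : B_1 \in \sigma(X_s, s \le 0), \ B_2 \in \sigma(X_s, s \ge h) \right\}, \quad h \ge 0.$$
Here, we will assume that $(X_t)_{t \in \mathbb{Z}}$ is $\alpha$-mixing, i.e. $\alpha_h \to 0$ as $h \to \infty$, and that the coefficients $(\alpha_h)_{h=0}^{\infty}$ are summable. The proof of the following proposition is postponed to the appendix.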
Proposition 4.1. Let (X t ) t∈Z be a regularly varying, strictly stationary time series with tail process (Y t ) t∈Z whose finite-dimensional distributions are given by (µ I ) I⊂Z . Assume that the corresponding α-mixing coefficients satisfy α n ∈ O(n −δ ) for all n ∈ N and some δ > 1.
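Condition (M). There exist a sequence $\{u_n\}_{n \in \mathbb{N}} \subset \mathbb{R}$ of thresholds and an intermediate sequence $\{r_n\}_{n \in \mathbb{N}} \subset \mathbb{N}$ with $\lim_{n \to \infty} u_n = \lim_{n \to \infty} r_n = \infty$, $\lim_{n \to \infty} n P(X_0 > u_n) = \infty$ and $\lim_{n \to \infty} r_n P(X_0 > u_n) = 0$ such that the mixing condition (12) holds and
$$\lim_{k \to \infty} \limsup_{n \to \infty} \sum_{h=k}^{r_n} P(X_h > u_n \mid X_0 > u_n) = 0. \qquad (13)$$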
Remark 4.2. Condition (M) is an adapted version of the condition in Davis and Mikosch (2009) who consider a ratio estimator of a similar type for the so-called extremogram. Similarly to the condition for consistency given in Equation (11), also the mixing condition in Equation (12) implies conditions on the decay of the sequence {α h } h∈N which will be discussed below in more detail.
The anti-clustering condition in Equation (13) is very similar to Condition (2.8) in Davis and Hsing (1995) and Condition 4.1 in Basrak and Segers (2009). By Proposition 4.2 in Basrak and Segers (2009), it implies that the tail process {Y t } t∈Z converges to 0 almost surely as |t| → ∞ and thus ensures finite cluster size.
To prove the asymptotic normality of the ratio estimators, we first make use of the following auxiliary result in Davis and Mikosch (2009), adapted to our setting.
Lemma 4.3 (Davis and Mikosch, 2009, Thm. 3.1). Let $(X_t)_{t \in \mathbb{Z}}$ be a regularly varying, strictly stationary time series with tail process $(Y_t)_{t \in \mathbb{Z}}$ whose finite-dimensional distributions are given by $(\mu_I)_{I \subset \mathbb{Z}}$.

We further proceed by noting that Equation (12) constrains the decay of the mixing coefficients and, consequently, $\liminf_{n \to \infty} n^2 \alpha_n = 0$. Imposing the existence of a finite limit superior, we may conclude that
$$\alpha_n \in O(n^{-\delta}) \quad \text{for some } \delta \ge 2. \qquad (14)$$
Using this slight strengthening of Condition (M), we can verify asymptotic normality of the estimators $\hat P_{n,u_n} / P(X_0 > u_n)$. The proof is postponed to the appendix.
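Theorem 4.4. Let $(X_t)_{t \in \mathbb{Z}}$ be a regularly varying, strictly stationary time series with tail process $(Y_t)_{t \in \mathbb{Z}}$ whose finite-dimensional distributions are denoted by $(\mu_I)_{I \subset \mathbb{Z}}$. Moreover, let the threshold condition (15) hold if $\delta > 2$ in (14), or condition (16) if $\delta = 2$. Then, the suitably centered and normalized vector of estimators $(\hat P_{n,u_n}(A_j) / P(X_0 > u_n))_{j=0}^{N}$ is asymptotically normal with covariance matrix $(\sigma_{jl})$.

Remark 4.5. By regular variation, the centering term converges to $\mu_{\{-1,\ldots,t\}}(A_j)$, i.e. the conditional distribution is approximately normal around the desired value even though the bias might not be negligible asymptotically. If the limit expression vanishes, i.e. if we have $\mu_{\{-1,\ldots,t\}}(A_j) = 0$, we obtain the asymptotic variance $\sigma_{jj} = 0$, i.e. the limit distribution is degenerate.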
Using the same arguments as in the proof of Corollary 3.3 in Davis and Mikosch (2009), we obtain the following corollary.
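Corollary 4.6. Under the assumptions of Theorem 4.4, the ratio estimators $(\hat R_{n,u_n}(A_j))_{j=1}^{N}$, centered at the corresponding pre-asymptotic ratios of probabilities and suitably normalized, are asymptotically normal. If, in addition, the bias is asymptotically negligible relative to the normalization, the centering may be replaced by the limiting ratios.

Remark 4.7. Alternatively to our proofs, we could also show asymptotic normality of the vectors $(\hat P_{n,u_n}(A_j))_{j=0}^{N}$ and $(\hat R_{n,u_n}(A_j))_{j=1}^{N}$, respectively, using slightly adapted versions of Theorem 3.2 and Corollary 3.3 in Davis and Mikosch (2009). Therein, besides Condition (M), they also assume the conditions
$$\lim_{n \to \infty} n P(X_0 > u_n) \cdot \alpha_{r_n} = 0 \qquad (17)$$
and
$$\lim_{n \to \infty} n^{1/3} P(X_0 > u_n) = \infty. \qquad (18)$$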
By using different techniques in the proof and extending Condition (M) by the slightly stronger assumption (14), we are able to drop condition (17). Furthermore, we replace condition (18) by conditions (15) and (16), respectively. For $\delta > 2$, condition (15) is weaker than condition (18), which is the limiting case of condition (15) as $\delta \downarrow 2$. If $\alpha_h$ even decays exponentially, i.e. if condition (15) holds for $\delta$ being arbitrarily large, the condition simplifies to $\lim_{n \to \infty} n^{1-\varepsilon} P(X_0 > u_n) = \infty$ for some $\varepsilon > 0$, which is close to the minimal assumption $\lim_{n \to \infty} n P(X_0 > u_n) = \infty$ stated in Condition (M). For $\delta = 2$, condition (16) is slightly stronger than condition (18). However, it is still weaker than $\lim_{n \to \infty} n^{1/3-\varepsilon} P(X_0 > u_n) = \infty$ for $\varepsilon > 0$. Thus, even though our assumptions are not necessarily weaker than the assumptions in Davis and Mikosch (2009) due to the fact that we further assume (14), our results allow for a simplification of Equations (17) and (18).
In practical applications, the usability of central limit theorems in the flavor of Corollary 4.6 for uncertainty assessment of the resulting estimates is often limited by two obstacles:
• The rate of convergence includes the unknown threshold exceedance probability $P(X_0 > u_n)$.
• The asymptotic (co-)variances are complex expressions including series expressions as given in Theorem 4.4.
In the case of Corollary 4.6, however, one can cope with both difficulties in the following way:
• Applying Lemma 4.3 to the set $A = [0, \infty) \times (1, \infty)$, we obtain that, under Condition (M), the ratio of the empirical exceedance frequency $n^{-1} \sum_{k=1}^{n} 1\{X_k > u_n\}$ to $P(X_0 > u_n)$ converges to 1 in probability. Therefore, Corollary 4.6 still holds true if we replace the exceedance probability $P(X_0 > u_n)$ by its empirical counterpart $n^{-1} \sum_{k=1}^{n} 1\{X_k > u_n\}$.
• Similarly to the asymptotic (co-)variances for the extremogram estimators, the asymptotic (co-)variances arising in Corollary 4.6 can be estimated via various bootstrap techniques such as a stationary bootstrap (Davis et al., 2012) or a multiplier block bootstrap (Drees, 2015). In Section 5, we will make use of the multiplier block bootstrap which has been demonstrated to provide more accurate and robust results than the stationary bootstrap in a simulation study in Davis et al. (2018).
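As an illustration of the sliding window idea, the following Python sketch computes ratio-type estimates for clusters of a fixed size $l$: the share of cluster starts that turn out to be clusters of size exactly $l$, the relative frequencies of the ordinal patterns among those clusters, and the empirical exceedance frequency. All function names are hypothetical, the tie rule of `ordinal_pattern` (earlier index first) is an assumption, and the handling of windows at the sample boundary is our own choice, so this is a sketch of the estimator type rather than the authors' exact definition.

```python
from collections import Counter


def ordinal_pattern(x):
    """Indices sorted by decreasing value; ties: earlier index first (assumed)."""
    return tuple(sorted(range(len(x)), key=lambda i: (-x[i], i)))


def window_estimates(x, u, l):
    """Sliding-window ratio estimates for u-exceedance clusters of size l.

    Returns (p_size, pattern_dist, exceedance_freq), where
    p_size is (#clusters of size l) / (#cluster starts) and
    pattern_dist gives the pattern frequencies among clusters of size l.
    """
    n_start, n_size_l = 0, 0
    patterns = Counter()
    # Only windows (x[k-1], ..., x[k+l]) fully contained in the sample are used.
    for k in range(1, len(x) - l):
        if x[k - 1] <= u < x[k]:           # a cluster starts at time k
            n_start += 1
            block = x[k:k + l]
            if min(block) > u and x[k + l] <= u:
                n_size_l += 1              # the cluster has size exactly l
                patterns[ordinal_pattern(block)] += 1
    p_size = n_size_l / n_start if n_start else None
    pattern_dist = {p: c / n_size_l for p, c in patterns.items()}
    exceedance_freq = sum(v > u for v in x) / len(x)
    return p_size, pattern_dist, exceedance_freq


toy = [0, 3, 2, 0, 5, 0, 1, 4, 6, 0, 0]
print(window_estimates(toy, u=1, l=2))
```

In the toy series there are three cluster starts, two of which are clusters of size 2, one with a decreasing and one with an increasing pattern.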

Example: Max-Stable Time Series
An important example of a stationary regularly varying time series $(X_t)_{t \in \mathbb{Z}}$ is a stationary max-stable time series with $\alpha$-Fréchet margins. According to de Haan (1984), such a time series admits a spectral representation (19) in which $\{\Gamma_j\}_{j \in \mathbb{N}}$ denote the arrival times of a unit rate Poisson process and $(W_t^{(j)})_{t \in \mathbb{Z}}$, $j \in \mathbb{N}$, are independent copies of a nonnegative time series $(W_t)_{t \in \mathbb{Z}}$ such that $E W_t = 1$ for all $t \in \mathbb{Z}$. Then, $(X_t)_{t \in \mathbb{Z}}$ is regularly varying with index $\alpha$ and its tail process $(Y_t)_{t \in \mathbb{Z}}$ is of the form (20), where $P$ is an $\alpha$-Pareto random variable, i.e. $P(P > x) = x^{-\alpha}$, $x > 1$, and $(\tilde W_t)_{t \in \mathbb{Z}}$ is an independent time series whose distribution is specified in terms of $(W_t)_{t \in \mathbb{Z}}$; see Dombry and Ribatet (2015) for more details.
The dependence structure of a max-stable time series is often summarized by its extremal coefficient function, that is, a sequence $\{\theta(h)\}_{h \in \mathbb{Z}} \subset [1, 2]$ given by the relation
$$P(X_0 \le x, \ X_h \le x) = \exp\left(-\theta(h) x^{-\alpha}\right), \quad x > 0.$$
In particular, $X_h = X_0$ a.s. iff $\theta(h) = 1$, and $X_h$ and $X_0$ are (asymptotically) independent iff $\theta(h) = 2$. The extremal coefficient function can be used to provide sufficient conditions for Condition (M). The proof of the following result is postponed to the appendix.
then Condition (M) holds.
The additional assumption in the second part of the corollary that ensures the asymptotic unbiasedness of the estimator cannot be verified in this general setting. However, for the closely related extremogram (Davis and Mikosch, 2009), we have, from Equation (26) that (see also Buhl and Klüppelberg, 2018, Lemma A.1), i.e.
holds if and only if $n u_n^{-3\alpha} \to 0$. A similar behaviour might be expected for the conditional probabilities in Corollary 4.6 which are of the same type.
In the following simulation study, we will focus on one of the most popular models for max-stable processes, namely the Brown-Resnick process, that is, a stationary max-stable time series with unit Fréchet margins and processes $(W_t)_{t \in \mathbb{Z}}$ and $(\tilde W_t)_{t \in \mathbb{Z}}$ in (19) and (20), respectively, of log-Gaussian form, built from a centered Gaussian time series $(G_t)_{t \in \mathbb{Z}}$ with $G_0 = 0$ a.s. and stationary increments. Thus, $(X_t)_{t \in \mathbb{Z}}$ is a stationary max-stable process and its law is uniquely determined by the semi-variogram $\gamma(h) = \frac{1}{2} \mathrm{Var}(G_h - G_0)$, $h \in \mathbb{Z}$ (Kabluchko et al., 2009). In applications, Brown-Resnick processes associated to a semi-variogram of power type $\gamma(h) = C|h|^{\beta}$, $h \in \mathbb{Z}$, for some $C > 0$ and $\beta \in (0, 2]$ are often considered.
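To get a feeling for how quickly extremal dependence decays under a power-type semi-variogram, one can evaluate the extremal coefficient function numerically. The sketch below relies on two assumptions that are not spelled out in the text above: that $\gamma$ denotes the semi-variogram $\gamma(h) = \frac{1}{2}\mathrm{Var}(G_h - G_0)$, and that the standard bivariate formula for Brown-Resnick processes applies, which under this convention reads $\theta(h) = 2\Phi(\sqrt{\gamma(h)/2})$. The function names are hypothetical; the default parameters $C = 0.1$ and $\beta = 1.75$ are the values used in the simulation study of this section.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal distribution function, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def extremal_coefficient(h, C=0.1, beta=1.75):
    """theta(h) = 2 * Phi(sqrt(gamma(h) / 2)) with gamma(h) = C * |h|**beta.

    Assumption: gamma is the semi-variogram of the underlying Gaussian process.
    """
    gamma = C * abs(h) ** beta
    return 2.0 * std_normal_cdf(sqrt(gamma / 2.0))


for h in (0, 1, 5, 10, 40):
    print(h, round(extremal_coefficient(h), 4))
```

Values close to 1 indicate strong extremal dependence at lag $h$, while values close to 2 indicate near independence.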

Figure 1: Histogram of the cluster size distribution for estimates from a simulated Brown-Resnick process (black) and the corresponding limit distribution (red). 95 % confidence intervals are added, obtained via a multiplier block bootstrap (black) and from the theoretical asymptotic distribution (red), respectively. [Panel title: Histogram of Cluster Sizes; horizontal axis: Cluster Size; legend: Brown-Resnick Process with 0.95-Quantile as Threshold, Limit Distribution (Tail Process).]
Similarly to Cho et al. (2016) and Buhl and Klüppelberg (2018) for spatial and spatio-temporal Brown-Resnick processes, we can verify that such a Brown-Resnick process satisfies the assumptions of our central limit theorems (Theorem 4.4 and Corollary 4.6, respectively). As our assumptions are different from the ones in the papers mentioned above, we will verify them independently making use of our results for general max-stable processes in Proposition 5.1. The proof is postponed to the appendix.
We now simulate a Brown-Resnick process to demonstrate the performance of the estimators of the type $\hat R_{n,u}(\cdot, \cdot)$ for the distribution of the cluster size and the ordinal patterns within a cluster. More precisely, we use the extremal functions approach (Dombry et al., 2016) to simulate a Brown-Resnick time series of length 1 000 000 with unit Fréchet margins and associated to the semi-variogram $\gamma(h) = 0.1 \cdot |h|^{1.75}$. We then estimate
• the distribution of the cluster size,
• the distribution of ordinal patterns within clusters of size 2, and
• the distribution of ordinal patterns within clusters of size 3
based on exceedances over the 95 %-quantile of the unit Fréchet distribution using the ratio estimators according to Equation (8). The results are displayed and compared to the exact limit distributions, calculated via simulations from the tail process, in Figures 1 and 2, respectively. The uncertainty of the estimators is assessed via the multiplier block bootstrap (Drees, 2015; Davis et al., 2018) based on fixed blocks of size 1 000. The 95 % confidence intervals obtained from the bootstrap are compared to the theoretical confidence intervals according to the asymptotic distribution given in Corollary 4.6. It can be seen that all the probabilities are estimated rather accurately and that the estimated uncertainty is close to the theoretical one, i.e. both types of confidence intervals have similar sizes.

Application: River Discharge at Cologne
As an application, we consider a time series of daily discharge data of the river Rhine measured at Cologne. In many cases, river discharge data exhibit temporal clustering of extremes, which entails the use of declustering techniques for the statistical analysis of their tail behaviour (cf. Kallache et al., 2011; Asadi et al., 2015, for instance). Here, we study the structure of these clusters making use of the estimators introduced above.
We restrict ourselves to the analysis of floods in the extended winter season (DJFM), assuming stationarity of the time series within each winter period consisting of 121 days (and 122 days, respectively, in leap years). The given data set, provided by The Global Runoff Data Centre, 56068 Koblenz, Germany, consists of data from 197 winter seasons from December 1816 to March 2013. In an exploratory analysis, we calculate the empirical version of the extremogram based on exceedances over the empirical 95 %-quantile according to Davis and Mikosch (2009). The result, displayed in Figure 3, shows a decrease of extremal dependence as the temporal lag increases, being close to asymptotic independence for lags larger than 40 days. Further analyses indicate that runoff data from different seasons may be assumed to be independent. These observations support the applicability of the ratio estimator and the results on its asymptotic behaviour from Section 4.

We choose two different thresholds for the empirical verification of the stability of different exceedance cluster characteristics. More precisely, we consider the empirical 95 %- and 97.5 %-quantiles as thresholds, leading to 200 and 114 clusters, respectively. As the empirical distributions of cluster sizes are rather difficult to compare due to the large number of potential outcomes relative to the small number of clusters, we focus on the distribution of the 2- and 3-ordinal patterns. The results are displayed in Figure 4, supplemented by 95 % confidence intervals obtained via a multiplier block bootstrap using each season as a fixed block.
Even though the number of clusters is quite small, some interesting observations can be made: While both potential patterns of length two occur with almost the same frequency (in particular in case of the 95 %-quantile), for clusters of size 3, the patterns for which the second observation is the largest, i.e. (1, 0, 2) and (1, 2, 0), are clearly predominant. This means that extreme events that last three time instants tend to show an "up-down" pattern. In contrast, patterns with a "down-up" behaviour, i.e. (0, 2, 1) and (2, 0, 1), do not occur at all. In order to obtain more stable results based on a larger number of clusters, one could focus on patterns at the beginning of potentially longer clusters, i.e. one could consider the pattern for the first two instants within all clusters that are at least of size 2, for instance. Such ordinal patterns at the beginning of clusters might even be used to predict the length of the clusters. Such an analysis, however, is beyond the scope of the present article.

[...] $\le \frac{2(r_n + 1)}{n P(X_0 > u_n)}$ [...] Setting $r_n = P(X_0 > u_n)^{-1/\delta}$, the right-hand side tends to zero asymptotically. Thus, by Chebyshev's inequality, this implies that
$$\frac{\hat P_{n,u_n}(A) - E(\hat P_{n,u_n}(A))}{P(X_0 > u_n)} \to_p 0.$$
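Since, by regular variation, the normalized expectation $E(\hat P_{n,u_n}(A)) / P(X_0 > u_n)$ converges to $\mu_{\{-1,\ldots,t\}}(A)$, we obtain that $\hat P_{n,u_n}(A) / P(X_0 > u_n) \to_p \mu_{\{-1,\ldots,t\}}(A)$ both for $A = A_0$ and $A = A_1$. An application of the continuous mapping theorem for convergence in probability completes the proof.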
Proof of Theorem 4.4. We prove the equivalent statement that all linear combinations of the components of the random vector converge in distribution to a centered normal distribution with the corresponding variance. To this end, let $a_0, \ldots, a_N \in \mathbb{R}$ and let $Z_{n,k}$, $k = 1, \ldots, n$, denote the centered and normalized summands of the corresponding linear combination. We note that, for each $n \in \mathbb{N}$, the random variable $\sum_{k=1}^{n} Z_{n,k}$ is centered and that its variance converges to the desired quantity $\sum_{j} \sum_{l} a_j a_l \sigma_{jl}$, which can be shown analogously to the proof of Lemma 4.3 (i.e. the proof of Thm. 3.1 in Davis and Mikosch (2009)). It remains to show that the asymptotic distribution is normal. To this end, we verify that the triangular scheme $\{Z_{n,k}\}_{k=1,\ldots,n}$, $n \in \mathbb{N}$, satisfies the conditions of Thm. 4.4 in Rio (2017):
• At first, all the variables $Z_{n,k}$ are required to be centered and have finite variance, which holds true as they are bounded.
• Secondly, we need to verify that $\limsup_{n \to \infty} \max_{l=1,\ldots,n} \mathrm{Var}\left(\sum_{k=1}^{l} Z_{n,k}\right) < \infty$, which again can be shown analogously to the proof of Lemma 4.3.
For the assessment of the integral I 1 , we distinguish between the two cases δ = 2 and δ > 2.
Making use of the monotonicity of the function h → θ(h) on N 0 , the series considered in Equation (12) can thus be bounded by (h + 1)(h + 2) (2 − θ(h + r n )) .