1 Introduction

Peaks-over-threshold (POT) analysis (Davison and Smith 1990) is an approach to extreme value theory that, as its name indicates, uses the observations exceeding a high threshold to elicit information regarding the behavior of extremes. Since it takes into account more than one value per year, it lends itself well to the analysis of time series covering only a short period. Unfortunately, the selection of the threshold requires expert judgment, and stands in the way of an automatic analysis. In the present paper, we propose a “blind” selection procedure that allows the analysis to be performed automatically at a large number of stations. We apply it to hourly precipitation in Switzerland.

Due to its potentially dramatic consequences, intense hourly precipitation is of great importance for engineering and design. In the Alpine region, it is associated with flash floods, mudslides and landslides, and debris flow (Guzzetti et al. 2007; Borga et al. 2010; Toreti et al. 2013). In the form of snow, it can help trigger avalanches. Engineers have long exploited empirical relationships between precipitation amplitudes of different duration to derive the information they need for design (e.g., Geiger et al. 1991; Koutsoyiannis et al. 1998). Yet systematic analyses of extreme hourly precipitation have been hindered by the lack of long records of data at high temporal resolution. Since the 1980s, MeteoSwiss has at its disposal a network of automatic stations that record precipitation at 10-min intervals, thus providing a collection of relatively long time series of precipitation at sub-daily resolution.

The idea of using as extremes the values of a distribution exceeding a high threshold for estimating the amplitude of rare events first saw the light in the context of hydrology (e.g., Todorovic and Zelenhasic 1970). Later on, it was integrated into the context of extreme value theory (Balkema and de Haan 1974; Pickands 1975), according to which the distribution of excesses over a high threshold can be approximated by the generalized Pareto distribution (GPD), provided the largest observations over fixed blocks of time converge towards a non-degenerate distribution. In practice, estimating the GPD parameters implies choosing a threshold large enough for this approximation to be justified.

An important assumption for the approximation with a GPD is the independence of excesses. Environmental observations are manifestations of physical processes with their own time scales and cannot, a priori, be assumed to be independent. As it turns out, threshold exceedances do not necessarily occur in isolation, but are almost always “clustered” together. Clustering is an indication of dependence in the time series, although dependence does not necessarily imply clustering. For dependent series, the distribution of the tail remains of the same family, but applies to the cluster maxima rather than individual exceedances (Davison and Smith 1990).

In practice, the “true” clusters are not known and one must resort to defining them according to some reasonable but artificial criteria. The most common method, used in this paper, is runs declustering, which assumes independent exceedances to be separated by a minimum number of non-exceedances. This minimum “distance” is called the run parameter (Coles 2001). Should two exceedances be separated by a number of non-exceedances smaller than the run parameter, they are considered to form a cluster. Ultimately, all but the largest exceedance within a cluster are discarded.

Generally, expert knowledge of the variable under consideration is used to select the run parameter. This subjective choice can be avoided, however, with the procedure proposed by Ferro and Segers (2003): the run parameter is determined by the mean cluster size, which is estimated from the data. Fawcett and Walshaw (2007) advise against declustering altogether, because their simulations indicate that it leads to biased estimates of parameters and return levels.

Thus, the peaks-over-threshold method presents a particular difficulty: an appropriate choice must be made not only for the run parameter, but also for the threshold. Choosing too high a threshold results in small samples and high uncertainties, while with too low a threshold, the variance is small, but the bias is potentially large. A widely used graphical selection for the threshold, based on stability properties, was proposed by Davison and Smith (1990). The method developed later by Dupuis (1998) to guide threshold selection, while not purely graphical, cannot be used blindly because it requires careful judgment by the practitioner.

Clearly, the value selected for the threshold can be expected to modify the dependence structure of the resulting time series, and thereby affect the minimum distance necessary to separate dependent exceedances. Thus, as Walshaw (1994) points out, the threshold and run parameter should be chosen in combination. A high threshold allows for a smaller run parameter and vice versa (Palutikof et al. 1999). This issue is addressed by Süveges and Davison (2010), who devise a test that ultimately allows joint graphical selection of threshold and run parameter. In the context of climate research, however, it is not uncommon to be confronted with a huge number of time series. Thus, graphical selection is not practicable and there is a need for automatic selection. For heavy-tailed distributions, Frigessi et al. (2002) devise a method that can be fully automated. A further approach that can be automated was proposed by Wadsworth and Tawn (2012), but implies choosing the run parameter separately.

In this paper, we propose a simple automation of the method developed by Süveges and Davison (2010), which tests all pairs of thresholds and run parameters for misspecification of the model for inter-exceedance times. As will be explained below, the automatic selection simply consists in choosing the pair yielding the largest number of observations within a subset with particularly low misspecification. This procedure allows us both to automate our analysis, and to jointly select threshold and run parameter.

We apply this method to time series of hourly precipitation extending from 1981 to 2010 at 59 stations in Switzerland, and present a climatology of extreme hourly precipitation. Dependence at extreme levels is examined independently with the help of dependence measures derived from extreme value theory (Coles et al. 1999), in order to shed light on its seasonal and regional characteristics.

Our paper is organized as follows: The statistical methods applied in this study and the data used for the analysis are presented in Section 2. The results in terms of method and climatic characteristics of hourly extreme precipitation are given in Section 3, and discussed in Section 4.

2 Methods and data

Extreme value statistics offers three approaches to the analysis of rare events. The first, Block Maxima, approximates the parent distribution’s tail with a distribution for the maxima over time blocks of equal size (Fisher and Tippett 1928; Gnedenko 1943). The second, peaks-over-threshold (POT), approximates the behavior of extremes with a distribution for the values over a high threshold. The point process model unifies the first two approaches. It describes occurrences in time or space, and can be used to model threshold excesses as occurrences in time with a given amplitude. In the present paper, we use the POT approach. The use of POT requires the selection of a threshold to satisfy the asymptotic conditions. In addition, in the approach used here, a minimum separation interval between exceedances, called the run parameter, is selected to guarantee independence. In a cluster consisting of exceedances separated by less than the run parameter, only the maximum exceedance is retained. Thus, each combination of values for the threshold and run parameter leads to a distinct series of observations and intervals. The time intervals between consecutive exceedances are called inter-exceedance times; those that separate clusters of exceedances are named inter-cluster times; finally, the intervals between exceedances within a cluster will be referred to as intra-cluster times.

The automatic joint selection of threshold and run parameter proposed in this paper hinges on the time intervals between exceedances. In theory, inter-cluster times must follow an exponential distribution, while intra-cluster times tend to zero. This assumption is verified separately for a set of pairs of threshold and run parameter, among which one pair is selected.

2.1 Theory

The peaks-over-threshold approach to extreme value theory formulates a limiting distribution for the excesses over a high threshold. Let X 1, …, X n be a strictly stationary sequence with marginal distribution F, such that the dependence at extreme levels decays asymptotically. The excesses Y = X − u over a threshold u, conditional on X > u, converge towards a limiting distribution H called the generalized Pareto distribution (GPD):

$$ H(y)=\left\{\begin{array}{c}\hfill 1-{\left(1+\frac{\xi y}{\sigma}\right)}^{-\frac{1}{\xi }},\xi \ne 0,\hfill \\ {}\hfill 1- \exp \left(-\frac{y}{\sigma}\right),\xi =0,\hfill \end{array}\right. $$

where y > 0, and H is defined on 1 + ξy/σ > 0 (Coles 2001; Beirlant et al. 2004).

The parameters of the GPD are the scale σ, which is a measure for the spread of the distribution, and the shape parameter ξ describing the behavior in the tail. If ξ = 0 (ξ > 0), the distribution is called light-tailed (heavy-tailed). If ξ < 0, the distribution is bounded, i.e., the excesses have an upper bound.

One important application of extreme value statistics is to estimate the amplitude of rare events expected to be exceeded on average once every T years. These amplitudes are referred to as return levels for the return period T. For a declustered sequence, the quantity of interest is the rate at which clusters occur (Coles 2001). In this study, we use the formulation by Palutikof et al. (1999), in which the number of exceedances per year is modeled with a Poisson distribution with expected value λ. Let 1/θ denote the mean cluster size. Then, the T year return level is given by

$$ {x}_T=u+\frac{\widehat{\sigma}}{\widehat{\xi}}\left[{\left(T\widehat{\lambda}\widehat{\theta}\right)}^{\widehat{\xi}}-1\right]. $$

For stationary sequences and in the limit of large n, Hsing (1987) shows that the extremal process can be interpreted as a 2-dimensional process with dimensions time and threshold excess. The time intervals are normalized by the number of observations, so that the entire process takes place between 0 and 1. As n becomes large, intra-cluster times collapse to zero, and the clusters, rather than the individual exceedances, are independent. Projected on the time axis, the cluster occurrence follows a Poisson process, while in the other dimension, the largest excess in each cluster, follows a GPD (Davison and Smith 1990). This approach can be exploited to derive the asymptotic distribution of inter-exceedance times, which is at the heart of the automatic selection of threshold and run parameter presented in this paper.

2.2 Modeling of inter-exceedance times and misspecification test

Ferro and Segers (2003) show that for very high thresholds and in the limit of large n, the inter-exceedance times converge to a mixture distribution with parameter θ, named the extremal index: intra-cluster times tend to zero and occur with a probability (1 − θ), while inter-cluster times converge to an exponential distribution and occur with probability θ. The mean of the exponential distribution is 1/θ, which turns out to be the mean cluster size mentioned above. Thus,  θ plays a double role: it is both the proportion of inter-cluster times, and the reciprocal of the mean inter-cluster time. Süveges and Davison (2010) apply the information matrix test (IMT) developed by White (1982) to the likelihood of the limit law of the inter-exceedance times (details of the likelihood function can be found in the Appendix).

In order to use the likelihood function for the distribution of inter-exceedance times in practical applications, Süveges and Davison (2010) truncate intervals that exceed the run parameter in length. The resulting limit law for inter-exceedance times takes the same form as in Ferro and Segers (2003), but the intra-cluster inter-exceedance times, which have length 0, can be accounted for in the likelihood function, allowing for an estimation of the mean cluster size that is not biased towards 1.

For a formal treatment of the IMT in this particular context, the reader is referred to Süveges and Davison (2010). Essentially, the IMT rests on the fact that for a well-specified model, Fisher’s information matrix \( I=E\left\{\frac{\partial^2\ell }{\partial {\theta}^2}\right\} \) equals the variance of the score vector \( J= Var\left\{\frac{\partial \ell }{\partial \theta}\right\} \), where denotes the log-likelihood and E the expected value (see also Davison 2008, chapter 4). The null hypothesis H 0  is that the model is well specified, in which case the difference D = I − J should vanish. The IMT statistic can then be constructed as D divided by its asymptotic variance, and is χ 21 – distributed for large samples. Thus, H 0 can be rejected at the 5 % level for IMT > 3.84. The formula for the IMT can be found in the Appendix.

2.3 Automatic selection of threshold and run parameter

The IMT provides a quantitative assessment of the compatibility of the threshold–run parameter pair with the two-dimensional extremal process. It is used to try, one by one, all combinations of the two parameters in a plausible range, and results in a list of pairs that are not rejected at the 5 % confidence level, i.e., with IMT < 3.84. The automated procedure proposed here is pragmatic: it takes a subset thereof for which the IMT is close to zero—and hence, misspecification is liable to be small—and selects the pair leading to the largest number of observations after declustering. Here, we take IMT < 0.05 (corresponding to a p value of 0.82) as a convenient upper limit for this subset of “non-rejected” IMT values. When the threshold–run parameter pairs lead to a number of exceedances smaller than 80, they are discarded because simulations revealed that there is not enough data to determine whether the pair should be rejected (Süveges and Davison 2010).

Suppose N observations from the stationary sequence X 1, …, X n exceed the threshold u. The probability of exceedance of the threshold is then N/n. Let the indices \( \left\{{j}_i:{X}_{j_i}>u\right\} \) denote the locations of the exceedances, and T i  = j i + 1 − j i (i = 1, …, N − 1) the inter-exceedance times. Let K denote the run parameter, and c (u,K) i  = (N/n)max{T i  − K, 0} be the inter-exceedance times truncated by K and normalized by the probability of exceedance. In effect, K splits the sequence of inter-exceedance times T i   into clusters, separated by inter-cluster intervals. Consecutive exceedances separated by an interval equal to or shorter than K are within the same cluster, and c (u,K) i  = 0. Between clusters, the intervals are simply shortened by  K. Let N c denote the number of clusters, and θ the extremal index, i.e., the parameter of the asymptotic distribution for inter-exceedances times.

The automated procedure is done as follows:

  1. 1.

    For each (u, K) pair, compute c (u,K) i .

  2. 2.

    For each (u, K) pair, determine N C . Compute the IMT (see Appendix), and estimate θ, the extremal index.

  3. 3.

    Determine the (u, K) pairs for which IMT < 0.05.

  4. 4.

    Select the (u, K) pair for which N C is largest.

The range chosen for u is between the 90th and the 99.5th percentile of non-zero values. Note, however, that the zero values were retained in the computation of inter-exceedance times. The values for K extend from 1 to 120 h. In a previous analysis, a threshold equal to the 90th percentile of non-zero values (corresponding to u = 0.90) was selected by applying the graphical approach of Davison and Smith (1990) at a subset of stations representing different climatic regimes in Switzerland (Begert 2008). This threshold was combined with a run parameter of 5 days, thus safely guaranteeing independence of the observations. We shall regard this threshold–run parameter pair (u = 0.90, K = 120) as a reference with which to compare our results. For simplicity, we will refer to the automated procedure as the “IMT selection”, and the selected pair as the “IMT pair.” The pair (u = 0.90, K = 120) will be called “reference pair,” and the use of the reference pair regardless of the season “reference method” or “reference selection.”

2.4 Inference

In this study, the GPD parameters are estimated by maximum likelihood, using the log-likelihood in Davison and Smith (1990) with an added term for the Poisson distribution of the number of exceedances per year.

As we have seen above, the return level x T depends not only on the GPD parameters, but also on the reciprocal mean cluster size θ. Thus, estimation of its confidence intervals requires knowledge of the dependence between θ and the GPD parameters, which is unknown, since they are estimated separately. Confidence intervals for the return levels can nevertheless be determined by a bootstrapping procedure inspired from Ferro and Segers (2003), in which the inter-exceedance times are first categorized into inter-cluster times (between clusters) on the one hand, and intra-cluster times (within clusters) on the other.

The clusters, each consisting of a sequence of exceedances separated by intra-cluster times, are resampled with replacement a sufficiently large number of times. Separately, the inter-cluster times are also resampled with replacement. A new, artificial, time series is then reconstructed by alternating a cluster with an inter-cluster time. The time series is truncated when the original number of exceedances is reached. This procedure preserves the structure of the inter-exceedance times, including the probability of exceedance and the sequence of exceedances within a cluster. The GPD parameters and return levels are then estimated from the resulting artificial time series for the threshold and run parameter selected on the basis of the original time series. The procedure is repeated 5,000 times and the 95 % confidence intervals for the return levels evaluated.

As each (u, K) pair selection leads to a different time series, neither common criteria for model selection, nor goodness-of-fit tests are appropriate for a quantitative comparison of the quality of the fits based on declustered series resulting from IMT or reference selection. Here, we opt for a quantitative summary of the QQ plot, which can be seen as a visual guide to goodness of fit. Let N c denote the number of clusters, and \( {z}_{(1)}\le \cdots \le {z}_{(k)}\le \dots \le {z}_{\left({N}_c\right)} \) the ordered cluster maxima. We use the quantile normalized root mean square error

$$ qnrmse=\sqrt{\frac{1}{N_c}{\displaystyle \sum_{k=1}^{N_c}}{\left(\frac{q_t\left({p}_k\right)-{q}_e\left({p}_k\right)}{q_t\left({p}_k\right)}\right)}^2}, $$

where p k  = (k − 1/2)/N c , q t is the quantile of the estimated GPD, and q e is the empirical quantile. The computation of the empirical quantile is distribution free. It is based on the modal position (see definition 7 of Hyndman and Fan (1996)).

2.5 Extremal dependence measure

Hourly precipitation can be expected to exhibit dependence, even between its less frequently occurring extreme values. Independent information on the duration of this dependence at a particular location can be elicited by means of a pair of dependence measures denoted by \( \left(\chi, \overline{\chi}\right) \), provided by bivariate extreme value theory. The pair is designed specifically to express dependence at extreme levels (Coles et al. 1999; Coles 2001). Let (X, Y) be a two-dimensional random vector, such that the marginal distributions of X and Y are identical. The dependence measure χ provides information on asymptotic dependence, and it is defined as the limit, as the threshold u rises, of the probability that Y also exceeds u if X exceeds u. It can take values between 0 and 1, and vanishes if X and Y are asymptotically independent.

The second dependence measure \( \overline{\chi} \) describes the dependence between asymptotically independent random variables. It takes values between −1 and 1. For asymptotically dependent variables, \( \overline{\chi}=1 \), and for independent variables, \( \overline{\chi}=0 \); the sign of \( \overline{\chi} \) is positive (negative) when an increase (decrease) in X tends to correspond to an increase in Y. For asymptotically independent variables, \( \overline{\chi} \) increases with the strength of dependence at moderately extreme levels. These dependence measures must be used in combination. If χ = 0, X and Y are asymptotically independent, and \( \overline{\chi} \) must be used to evaluate dependence at moderately extreme levels.

In order to evaluate the dependence between extreme occurrences of hourly precipitation, χ and \( \overline{\chi} \) are estimated empirically—the data is transformed to the uniform distribution with the empirical distribution function—for lagged exceedances over a range of thresholds. We estimate \( \overline{\chi} \) at different lags of hourly precipitation for the 99th percentile of the full data set. This corresponds to quantiles of non-zero values between 92 and 93 %, depending on the season, and offers an indication of extremal dependence at the thresholds used for the IMT statistic. As a crude estimate of the lag at which dependence between exceedances can be expected to die out, we will use the first lag at which the lower confidence bound of \( \overline{\chi} \) intersects the \( \overline{\chi}=0 \) line, which we will call the maximum dependent lag. The confidence intervals are estimated with the delta method, and rely on assumptions such as the independence of the observations. They are therefore likely to be much too narrow, and the results should be interpreted with caution.

2.6 Data

The data used in the present study consist of hourly precipitation observations at 59 stations of the meteorological network of the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss). Their geographical and cross-Alpine altitudinal distributions are shown in Fig. 1.

Fig. 1
figure 1

Location of the stations considered in the present study (black dots; red triangle: station Altdorf) on a map of Switzerland (a) and in a vertical cross-section across the Alpine ridge (b). The latter is defined as the distance to the inner-Alpine valleys (thick red line in a). The profile of the Alps is computed with the USGS-GTOPO30 (http://eros.usgs.gov) digital elevation model, and is the minimum air–line distance of each grid-point center to the inner-Alpine valleys. The thick (thin) gray line(s) represent(s) the smoothed median (10 and 90 % quantiles) of the distances in 100-m bins

Stationarity of the parent time series is an essential prerequisite for valid application of the GPD. Thus, only stations that suffered neither a change in instrument type nor a displacement in the period of interest were considered. Note that the data underwent a standard quality check, but were not homogenized. At a small number of stations, trends in the 90 % quantile of non-zero values were detected with a seasonally applied Mann-Kendall test (Mann 1945; Kendall 1948). In such cases, the linear trend was estimated with the Theil-Sen estimator, and subsequently eliminated (Theil 1950; Sen 1968).

The length of the data sets is variable, between 20 and 30 consecutive years, over the period 1981 and 2010. The analysis presented here concentrates on subsets of the original data corresponding to the four seasons. Winter is defined as December–January–February (DJF), spring as March–April–May (MAM), summer as June–July–August (JJA), and autumn as September–October–November (SON).

3 Results

3.1 IMT selection and extremal dependence

IMT selection is applied to hourly precipitation at the 59 selected stations, but is first presented here at one station in detail. The plausibility of the selection in terms of mean cluster size—which can be seen as a summary quantity for threshold and run parameter—is then examined at all stations. As it represents the average length of events, we compare it to the maximum dependent lag (see Section 2.5). Finally, the ensuing GPD estimates are compared with those obtained with the reference method (see Section 2.3).

The IMT selection is illustrated in Fig. 2 for winter and summer hourly precipitation at the station Altdorf, located in Central Switzerland in a valley of the northern Alpine rim. The IMT results for all pairs are represented as a two-dimensional surface for DJF and JJA in Fig. 2a, b. The red dots are the (u, K) pairs selected by the algorithm. The “top left” corner of the surface corresponds to the reference pair.

Fig. 2
figure 2

Analysis of hourly precipitation at station Altdorf (altitude 438 m; distance from inner-Alpine valleys = 26 km) for winter (left) and summer (right). Top IMT statistic. x-axis threshold u as the proportion of non-zero values that are below it. y-axis run parameter K in hours. Blue line 5 % critical value χ 21 (0.95) = 3.84. Black dots (u, K) pairs for which IMT < 0.05 (p value = 0.82). Red dots IMT pair, i.e., (u, K) pair with the largest number of clusters. Middle 50-year return levels in millimeter per hour vs. u for all (u, K) pairs yielding more than 80 clusters (gray segments). The darker shades indicate higher K. Red segments pairs for which the IMT value is below 0.05. Red star selected (u, K) pair. Black cross reference (u, K) pair. Blue point up (down) triangles upper (lower) confidence bounds of pairs with IMT < 0.05. Bottom Return level in millimeter/hour vs. return period. Blue best estimate. Green 95 % confidence intervals. Black dots observations (ordinate) vs. empirical return period (abscissa)

In winter, pairs with low u and low K lead to rejection at the 5 % confidence level. The IMT appears to favor either low u and high K, or high u and low K (Fig. 2a). This particularity is typical for the winter season (not shown). In summer, a pair with a low value for K is selected, while the combination of low u and high K yield inter-cluster times incompatible with the assumptions of a point process (Fig. 2b). All stations but a few have a similar IMT surface in summer (not shown).

Figure 2c, d show the 50-year return levels computed for all (u, K) pairs at station Altdorf in winter and summer. Clearly, the return levels of all pairs are within a narrow range, especially in winter. In the subset of pairs with IMT < 0.05 with a final number of clusters greater than 80 (red segments in Fig. 2c, d; black dots in Fig. 2a, b), all return levels are within the interval between the highest lower confidence bound and the lowest upper confidence bound (blue triangles). In fact, there are only a few pairs, and only in winter, for which the return levels are outside of this interval, and therefore differ significantly from those in the subset. Thus, the estimates are stable, and the selection has little influence on the return levels. In both seasons, the reference pair (u = 0.90, K = 120) yields return levels that are among the lowest of all pairs, but still within the confidence bounds.

The return level plots for the selected (u, K) pair are displayed in Fig. 2e, f. The estimated distribution is bounded in winter and heavy-tailed in summer. This illustrates the difference between winter and summer at stations in the northern Alps, where all stations but one (75 % of stations) have a significantly negative (positive) shape parameter in winter (summer) (not shown).

For all stations, the mean cluster size is represented for the four seasons in Fig. 3a. It is smallest in summer and largest in winter, and consistently larger to the south than to the north of the inner-Alpine valleys. Likewise, the station to station variability increases from summer to winter with intermediate values in spring and autumn; it is also generally greater in the southern Alps. Both the seasonal variation, and the north–south differences are mirrored in the maximum dependent lag—a measure for the longest lag at which dependence might still be expected—derived from the extremal dependence measure (Fig. 3b).

Fig. 3
figure 3

Boxplots of the mean cluster size (left) and maximum dependent lag (right), both in hours, in the four seasons, for the southern (blue) and northern (brown) Alps (see Fig. 1 for definition)

At each station, the IMT selection yields a different threshold–run parameter pair (u, K). These vary from season to season, and differ between northern and southern Alps. In winter, thresholds vary between 2–4 mm, with nearly all southern stations below 2 mm, while in summer values go from over 2 to nearly 10 mm/h (not shown). Run parameters are approximately 60–80 (20–40) h in the northern (southern) Alps in winter, and about 10–15 (10–20) h in the northern (southern) Alps in summer (not shown).

The performance in terms of qnrmse of the IMT selection vs. the reference selection is shown in Fig. 4a. In all seasons, the IMT pair generally leads to a better fit of the GPD estimates than the reference pair. Since the IMT selection allows for lower run parameters than the reference selection, this suggests that unnecessarily large run parameters may be harmful for the subsequent estimation of the GPD parameters. Even in winter, when the difference seems moderate, the qnrmse is smaller for the IMT pair than for the reference pair at 70 % of the stations. As can be seen in Fig. 4b, the reference method leads to IMT values that would strongly suggest rejection at the 5 % level at a majority of stations in all seasons except in winter. In other words, the reference method leads to a series of exceedances with a configuration of clusters and inter-cluster times that can only poorly be represented by the distribution required by the two-dimensional process limit to the extremal process.

Fig. 4
figure 4

Boxplots in winter, spring, summer, and autumn for all stations of a: the difference in quantile normalized root mean square error (qnrmse) between IMT and reference method (negative values indicate better performance for the IMT method); b: the IMT values for the reference pair; c: the 100-year return levels of the IMT (blue) and reference (brown) methods

The 100-year return levels are generally higher for the IMT pair than for the reference pair, especially in summer (Fig. 4c). The negligible differences in winter can be attributed to the fact that the selected run parameters are rather large, and thus close to the reference run parameter. On average, however, the difference is rather small, although it may be substantial at individual stations. Particularly in July, it can reach 10 to 30 % at half of the stations, but the differences rarely exceed the range of the confidence intervals (not shown). Note that—as a quantitative summary of the QQ plot—the qnrmse is a goodness-of-fit measure that is entirely unrelated to the IMT selection of threshold and run parameter, and therefore constitutes an independent assessment of the automation method proposed.

3.2 Climatology of extreme hourly precipitation

In this section, we consider the results from a climatological point of view, and examine the seasonal cycle and geographical patterns of extreme hourly precipitation in terms of its severity, seasonal frequency, and duration. The severity is best described by the return levels for a given return period, while the frequency can be gleaned from the mean number of clusters per season. For information on event duration, we turn to the dependence measure.

The 50-year return levels represented for the four seasons in Fig. 5 show that intense hourly precipitation in Switzerland experiences its minimum in winter (a) and its maximum in summer (c). Winter is characterized by a rather narrow range of low return levels without distinctive spatial pattern. The inner-Alpine valleys, including the Inn valley, remain at low levels all year round. In the northern Alps, the Plateau witnesses an increase in the number of stations with higher return levels in spring (b). In summer, high return levels cover the entire Plateau, as well as the northern Alpine rim with slightly weaker values. Thus, there is a slight increase in the return values towards the Plateau with the distance to the Alpine ridge. In autumn (d), the severity of events wanes to a level hardly higher than in winter. In the southern Alps, the Ticino stands out with comparatively high return levels from March to November, especially in autumn, when the contrast with the north is most pronounced.

Fig. 5
figure 5

Fifty-year return levels for winter (a), spring (b), summer (c), and autumn (d). The color scale codes the return levels in millimeter/hour, and is the same for all seasons. The largest (smallest) dots correspond to the largest (smallest) return levels in the respective season

The vertical cross-section of the 50-year return levels across the Alpine ridge is shown in Fig. 6 for winter and summer. In the northern Alps, return levels increase with height in winter. In summer, they increase with the distance to the inner-Alpine valleys. No altitudinal effect can be detected in the southern Alps in winter. In summer, although the return levels at the five stations in the southern part of the Ticino are nearly twice the return levels at the other southern stations, there are not enough stations to draw any conclusions about the altitudinal effect.

Fig. 6
figure 6

Vertical cross-section of the 50-year return levels across the Alpine ridge for winter (a) and summer (b). The color scale codes the return levels in millimeter/hour, and is the same for both seasons. The largest (smallest) dots correspond to the largest (smallest) return levels in the respective season

The seasonal frequency of events (see Fig. 7), i.e., the number of clusters per season, is low from September to May, generally not exceeding eight events in the season. In summer, it increases all over Switzerland, and the highest frequencies (exceeding 14 events per season) are found along the northern Alpine rim. It is noteworthy that the southern Alps, but especially the Ticino, displays lower frequencies than in the north in all seasons.

Fig. 7
figure 7

Map of the mean number of clusters per season in winter (a), spring (b), summer (c), and autumn (d). The color scale represents the number of clusters per season, and is the same for all seasons. The largest (smallest) dots correspond to the largest (smallest) number of clusters in the respective season

The average duration of individual events and the long-term dependence of intense or heavy hourly precipitation are shown in Fig. 3a, b. Extreme hourly precipitation turns out to be asymptotically independent, even at lags of 1 h, i.e., \( \widehat{\chi}=0 \) (not shown). Thus, the quantity displayed here (b) is derived from \( \widehat{\overline{\chi}} \), the dependence at subasymptotic levels. Events last approximately 4 h in winter and less than 2 h in summer, and about 2–3 h in spring and autumn (a). Dependence between exceedances at different lags subsists for more than 40 h in winter, less than 30 h in spring and autumn, and only about 15 h in summer (b). To the south of the inner-Alpine valleys, the dependence lasts consistently longer than in the northern Alps, a fact reflected in the mean cluster size (a). Particularly in summer and autumn, the difference is considerable, with dependence lasting 10–15 h (20 to 30 h) in the north (south), and about 25 h (50 h) in the north (south), respectively.

4 Discussion

4.1 IMT selection and GPD estimates

The comparison of the IMT selection with the reference selection (Fig. 4) highlights the fact that it may be harmful to select an unnecessarily large run parameter, as shown by Fawcett and Walshaw (2007). Hourly precipitation would be most strongly affected in winter, when pairs with low run parameters tend to be rejected. The IMT selection yields better GPD estimates than the reference method because it looks for pairs leading to the largest possible number of clusters. As this number increases roughly from the “upper right” corner (u = 0.995, K = 120) to the “lower left” corner (u = 0.90, K = 1), the IMT selection will automatically select a pair with a smaller run parameter than the reference pair. The disadvantage of this method is the tendency to choose rather low thresholds, thus introducing the possibility of a bias in the estimates. This might not be of great consequence in winter, when the physical processes involved are probably the same, regardless of the amplitude of the excesses. In the other seasons, on the contrary, the most extreme events and the moderate ones are likely to originate from different processes.

Extreme value theory assumes that the parent distribution is stationary. Like most climatic variables, however, precipitation undergoes an annual cycle. In the present work, seasonality is taken into account by dividing the year in 3-month bins, and considering them separately, rather than modeling it explicitly, as done in several recent studies (Katz et al. 2002; Maraun et al. 2009; Rust et al. 2009; Umbricht et al. 2013). No attempt was made to account for the daily cycle of hourly precipitation. While its amplitude is negligible in winter, the diurnal cycle experiences a maximum in the late afternoon, followed by a gentle decrease over the next 15 h (Wüest et al. 2010). Analysis of the empirical quantiles at the stations used here confirmed this behavior.

For precipitation, the form of the tail appears to vary with duration (Buishand 1991; Pearson and Henderson 1998), geographical location (Revfeim 1982; Buishand and Demaré 1990; Pearson and Henderson 1998; Friederichs 2010; Toreti et al. 2010; Maraun et al. 2011), and altitude (Pearson and Henderson 1998; Cooley et al. 2007; Gardes and Girard 2010). For daily precipitation, the shape parameter appears to be mostly light or heavy-tailed.

It turns out, however, that the shape parameter of extreme hourly precipitation is significantly positive (negative) in the northern Alps in summer (winter) (not shown). A variation of the shape parameter with altitude could not be detected. In summer, it increases from the northern Alpine rim towards the Plateau, as do the return levels (not shown). This is consistent with the higher seasonal frequencies along the northern Alpine rim than in the plain.

The tendency towards more negative shape parameters in winter may, to some extent, be explained by the microphysical processes taking place. Winter precipitation is stratiform, and the precipitation particles form essentially through vapor deposition. The upward motion must remain weak to allow the droplets to fall (Houze 1997). This sets an intrinsic upper limit to the hourly precipitation rate in winter. As a result, observations of winter hourly precipitation in Switzerland are confined to a narrow interval, and the highest values rarely stray far from the body of the distribution.

4.2 Climatology

The signature of the Alps can be seen in the annual mean precipitation (Frei and Schar 1998; Isotta et al. 2013). A marked wet anomaly covers the northern Alpine rim, and most of the southern rim, while the inner-Alpine valleys, in particular the Rhone valley in the southwest and the Inn Valley in the Grisons in the southeast of Switzerland, are dry. The Ticino, on the steep southern rim, is host to the heaviest precipitation (Frei and Schmidli 2006; Isotta et al. 2013).

Some of these features are reflected in the findings of the present study. Winter return levels increase somewhat with height, as might be expected, given the role of orographic precipitation in the Alpine region. The inner-Alpine valleys witness few events of intense hourly precipitation, even in summer. In addition, the return levels there generally remain very low. The low frequency of summer thunderstorms in these deep valleys is attributed to inadequate moisture flux convergence, and lack of a lifting source (van Delden 2001).

The increased number of events in summer along the northern Alpine rim mirrors the thunderstorm path as represented by the lightning climatology (MeteoSwiss, personal communication). It is also reminiscent of the frequency of observations exceeding 10 mm/h in the analysis of reconstructed hourly precipitation by Wüest et al. (2010). Note, however, that these quantities are not directly comparable, since a cluster contains several observations, and the thresholds differ from station to station.

Finally, the Ticino displays comparatively high return levels from March to November, but stands out particularly in autumn with return levels twice to three times as large as in the rest of Switzerland. This can be attributed to southerly flow impinging on the Alps as the midlatitude cyclones reach further south with the approach of winter. These can lead to violent precipitation as the warm humid Mediterranean air is forced upwards over a short distance, rapidly reaching the level of free convection (Gheusi and Davies 2004, and references therein).

In contrast to the annual mean, the pattern of return levels of heavy hourly precipitation in summer does not disclose the structure of the Alps. In fact, the return levels experience a slight increase from the northern Alpine rim towards the Swiss Plateau. This may be explained by the thunderstorm formation and propagation. Thunderstorms form in squall lines ahead of cold fronts (Haase-Straub et al. 1994), or in response to thermally driven topographic flow (Langhans et al. 2013). Linder et al. (1999) identify the Jura and the northern Alpine rim as regions of genesis for convective cells. However, several studies show that these drift towards adjacent flat areas (Finke and Hauf 1996; Bertram and Mayr 2004), such as, in this case, the Swiss Plateau.

Given the fact that hourly precipitation results from processes of varying time scales, we can expect the observations to be dependent over a certain time interval. While synoptic systems take a few days to sweep over Switzerland, convective cells have a lifetime extending from 1 h or less for single cells up to 12 h for supercells (Bertram and Mayr 2004). Of course, they are generally not stationary, and a single station may be affected only over a much shorter time. In the present study, dependence was examined in all seasons north and south of the inner-Alpine valleys, and these climatic characteristics were found to be reflected in the seasonal variation both of the maximum dependent lag and the mean cluster size. The values for dependence in the northern Alps are in accordance with the study by Huser and Davison (2013), who detected dependence of hourly precipitation at extreme levels of the order of 10–15 h in summer at a selection of stations in the Jura and the Swiss Plateau. Both dependence and average duration of events exhibit strong spatial variability in winter (see Fig. 3), despite the fact that winter events are dictated by large-scale midlatitude cyclones. It is noteworthy that dependence itself is consistently larger in the southern Alps. The associated variability from station to station is larger from May to November on the southern side of the Alps, especially in autumn, pointing perhaps to different dominant processes at different stations.

5 Summary and conclusions

In the present paper, we propose an automated procedure for the selection of threshold and run parameter in the peaks-over-threshold (POT) approach to extreme value analysis based on the graphical method developed by Süveges and Davison (2010). The automated procedure sets aside a subset of non-rejectable threshold–run parameter pairs and, in this subset, selects the pair that generates the largest number of clusters. We apply it to hourly precipitation in Switzerland in the period 1981–2010.

The tendency of extreme events to cluster indicates underlying dependence in the data. In particular, dependence between high exceedances should be reflected in the mean cluster size. In order to put our findings into context, lag dependence of hourly precipitation at extreme levels was computed with the help of the dependence measures by Coles et al. (1999).

Applied to hourly precipitation in Switzerland, the misspecification test brings to light typical seasonal structures in the inter-exceedance times. In winter, combinations of low thresholds and low run parameters, leading to relatively small mean cluster sizes, are rejected. In summer, strong misspecification arises for combinations of low thresholds and high run parameters, corresponding to the largest mean cluster sizes. In this context, the automatic selection picks threshold–run parameter pairs that yield mean cluster sizes in accordance with the seasonal characteristics of the separately estimated dependence at extreme levels.

The GPD estimates based on peaks-over-threshold analysis of the cluster maxima resulting from the automated selection highlight many known features regarding precipitation in the Alpine region. It is noteworthy that in summer, the signal due to thunderstorm activity is visible in the seasonal frequency of events, rather than in their severity, as represented by return levels for high return periods.

The present study exemplifies the argument by Fawcett and Walshaw (2007) that unnecessarily large run parameters have adverse consequences on subsequent estimation of the GPD parameters. Compared to a reference selection with fixed threshold and run parameter, the IMT selection allows for lower run parameters. This leads to higher return levels, and in practice to more stringent design measures, a positive outcome considering the uncertainty associated with planning long-term structures based on only 30 years of data. Finally, the patterns of extreme hourly precipitation suggest that requirements for the observational network may be different for hourly than for daily precipitation.