Statistical post-processing of forecasts for extremes using bivariate brown-resnick processes with an application to wind gusts

To improve the forecasts of weather extremes, we propose a joint spatial model for the observations and the forecasts, based on a bivariate Brown-Resnick process. As the class of stationary bivariate Brown-Resnick processes is fully characterized by the class of pseudo cross-variograms, we contribute to the theorical understanding of pseudo cross-variograms refining the knowledge of the asymptotic behaviour of all their components and introducing a parsimonious, but flexible parametric model. Both findings are of interest in classical geostatistics on their own. The proposed model is applied to real observation and forecast data for extreme wind gusts at 119 stations in Northern Germany.

forecast of these events is of great importance. However, the rareness of extreme events impedes any such task and, consequently, existing forecasts often lack accuracy. In meteorology, for example, forecasting extreme wind gusts, which are defined as peak wind speeds over a few seconds, is exacerbated by the short temporal and spatial ranges. Furthermore, numerical weather prediction (NWP) models provide estimates or diagnoses of wind gusts based on empirical knowledge only (cf. Brasseur 2001). Although wind is a prognostic variable in NWP models, its value represents an average wind speed over a few minutes or longer depending on the grid spacing of the NWP model. Hence, post-processing procedures are needed that allow for an enhanced probabilistic forecast.
Occurring as limits of normalized pointwise maxima of stochastic processes, maxstable processes provide a suitable framework for the description of spatial extreme events, commonly used in environmental sciences (Coles 1993;Coles and Tawn 1996;Huser and Davison 2014). Of particular interest is the subclass formed by Brown-Resnick processes which arise as limits of rescaled maxima of Gaussian processes (Brown and Resnick 1977;Kabluchko et al. 2009;Kabluchko 2011).
During the last years, max-stable processes have been frequently applied as models for spatial extremes in environmental sciences. For instance, Engelke et al. (2015) and Genton et al. (2015) recently used max-stable processes to model extreme wind speed observations. The model we propose will go one step further, also taking into account the forecasts in two different aspects: First and in contrast to Engelke et al. (2015) and Genton et al. (2015), we consider the mean forecast to get a normalized version of the extreme observations. Second, besides the observable variable of interest itself, the corresponding forecast is included as second variable yielding a bivariate max-stable process. Here, we will focus on the class of bivariate Brown-Resnick processes (cf. Molchanov and Stucki 2013;Genton et al 2015) to exploit the statistical relation between observable data and the corresponding forecast. Modeling the behavior of observational data, a sample from the distribution of the observations conditional on the forecast is supposed to provide more realistic results than the original forecast and thus will appear as an appropriate probabilistic post-processed forecast.
The paper is structured as follows: In Section 2, we present a univariate model for extreme observations, which may, in general, provide a first alternative to the original forecast. We introduce a model for the marginal distribution, i.e. the distribution of the observable variable of interest at a single location, motivating the normalization of its extremes by the mean forecast. The spatial dependence structure is incorporated into the model by the use of univariate Brown-Resnick processes. Section 3 is dedicated to the bivariate Brown-Resnick process which serves as a joint model for both the maximally observed and forecasted quantities. We deduce a necessary condition on the asymptotic behavior of the pseudo cross-variogram and provide a flexible cross-variogram model which leads to a stationary bivariate Brown-Resnick process. In Section 4, we describe how the model can be fitted to data. Based on this model, we propose a post-processing procedure which is presented in Section 5. Further, we provide tools to verify the procedure and the underlying models. Finally, the methods presented in Sections 4 and 5 are applied to real observation and forecast data for extreme wind gusts provided by Germany's National Meteorological Service, Deutscher Wetterdienst (DWD) (Section 6).

Modeling by a univariate random field
In this section, we present a spatial model for the observed pointwise maximum V obs max within a specific time period. To this end, we assume that, for each location and time period, the maximum V obs max is based on observations at N equidistant instants of times per period, that is, we have V obs max = max t=1,...,N V obs t for V obs 1 , . . . , V obs N ∼ F ϑ for some parameter ϑ. Here, the probability distributions F ϑ are supposed to form a location-scale family with finite second moments, i.e. ϑ = (m, s) ∈ R × (0, ∞) with F (m,s) (x) = F (0,1) x−m s , x ∈ R, and F (0,1) is standardized to mean zero and unit variance. We assume ϑ = (m, s) to be temporally constant at each location within the same time period, but allow the values to vary among different locations and different time periods. The values of m and s will essentially be estimated from the bulk of the distribution, not the tail, and thus, they can often be extracted accurately from forecasts. Within the same time period and at the same location, the observable variables V obs 1 , . . . , V obs N are assumed to be subsequent N elements of a stationary time series (V obs t ) t∈Z . Furthermore, we assume that the standardized distribution F (0,1) belongs to the max-domain of attraction of some univariate extreme value distribution G ξ , ξ ∈ R, i.e. there are sequences (a n ) n∈N , a n > 0, and (b n ) n∈N , b n ∈ R, such that for 1 + ξx > 0. As the second moment of F (0,1) is assumed to be finite, we have ξ < 0.5. Under some conditions on the regularity and the dependence of the stationary sequence V obs 1 , V obs 2 , . . ., we obtain that where a n = a n θ −ξ and b n = b n − ξ −1 (1 − θ −ξ ) for some θ ∈ (0, 1] called extremal index (cf. Coles 2001;Leadbetter et al 1983). Let m = m(l, p) and s = s(l, p) be the mean of the variable at location l and period p and its standard deviation, respectively. Let be the generalized extreme value distribution (GEV). Then, considering the maximum V obs max = V obs max (l, p) for large N, we have approximately that V obs max (l, p) − m(l, p) s(l, p) ∼ G ξ obs ,μ obs (l),σ obs (l) . (2) Here, the GEV parameters are assumed to be the same for every time period, which, in general, enables us to estimate the parameters for current and future time periods from past data. As common in many applications, the extreme value index ξ is also assumed to be constant in space. Under the ideal assumption that V obs i ∼ F (m,s) and that m and s can be determined exactly, the GEV parameters ξ obs , μ obs and σ obs are constant in space, as well. In practice, however, the observed variable of interest is subject to measurement errors whose distribution is spatially varying. Further, m and s often need to be extracted from forecasts with limited spatial resolution. To account for these difficulties, we allow μ obs (l) and σ obs (l) to depend on the location l, while the extreme value index ξ obs is assumed to be constant in space, as common in many applications. In contrast to μ obs and σ obs , m(l, p) and s(l, p) vary in space and time and may be interpreted as normalizing constants that will be the same for observation and forecasts. As m(l, p) and s(l, p) are defined as mean and standard deviation of the variable of interest, the parameters μ obs (l) and σ obs (l) are uniquely determined. Marginal transformation yields that is standard Gumbel distributed for every location l and time period p.
Perceiving the set of locations as a subset of R 2 and the set of periods as a subset of Z, the transformed observations can be regarded as realizations of a spatio-temporal random field {X obs (l, p), l ∈ R 2 , p ∈ Z}. While we assume that the spatial random fields {X obs (l, p), l ∈ R 2 }, p ∈ Z, are independent and identically distributed, we allow for a non-trivial spatial dependence structure. Here, we use the class of Brown-Resnick processes that can be defined for arbitrary dimensions D (Brown and Resnick 1977;Kabluchko et al. 2009): Let = i∈N δ U i be a Poisson point process on R with intensity e −u du and, independently of , let W i , i ∈ N, be independent copies of a zero-mean Gaussian random field {W (s), s ∈ R D } with stationary increments and semi-variogram γ (·) defined by Then, the random field Z defined by and called Brown-Resnick process associated to the semi-variogram γ , is stationary and max-stable with standard Gumbel margins and its law only depends on the semi-variogram γ (Kabluchko et al. 2009). For the application of the Brown-Resnick model to observed data with locations in R 2 , we propose to restrict to semi-variograms from a flexible parametric subclass, such as semi-variograms of the type with ϑ = (s, b, ζ, α) for s, b > 0, ζ ∈ (−π/4, π/4] and α ∈ (0, 2]. Here, the matrix A(b, ζ ) ∈ R 2×2 allows for geometric (elliptical) anisotropy, i.e.

Modeling by a bivariate random field
In this section, we also take into account the dependence between the observed maximum V obs max and its forecast V pred max . As V pred max is a forecast for V obs max , it seems reasonable to use a GEV model similar to the one described in Section 2 with possibly different parameters ξ pred , μ pred (·) and σ pred (·), i.e.
(cf. Eq. 2). Marginally transforming V pred max analogously to Eq. 3 yields a random field {X pred (l, p), l ∈ R 2 , p ∈ Z} with standard Gumbel margins. Thus, we end up with bivariate spatial random fields {(X obs (l, p), X pred (l, p)), l ∈ R 2 } which are assumed to be independent and identically distributed for p ∈ Z.
A bivariate Brown-Resnick process can be constructed in the following way (cf. Molchanov and Stucki 2013;Genton et al 2015): Let i∈N δ U i be a Poisson point process on R with intensity measure e −u du. Further, let W i , i ∈ N, be independent copies of a bivariate centered Gaussian process W = (W (1) , W (2) ) = {(W (1) (s), W (2) (s)) : s ∈ R D } such that the pseudo cross-variogram (Clark et al. 1989;Papritz et al. 1993) does not depend on s ∈ R D . Analogously to the univariate Brown-Resnick process, it can be shown that the bivariate Brown-Resnick process Z = (Z (1) , Z (2) ) defined by is max-stable and stationary. Its law only depends on the pseudo cross-variogram γ .

Remark 1
The fact that (γ ij (h)) i,j =1,2 can be defined independently of s ∈ R D implies that W is intrinsically stationary, i.e. the process {W (s+h)−W (s) : s ∈ R D } is stationary for every h ∈ R D . Both conditions, however, are not equivalent as the definition of γ 12 (h) might depend on s ∈ R d even if W is intrinsically stationary. For instance, if both components are independent, we have γ 12 (h) = 2γ 11 (s+h)+2γ 22 (s) for the off-diagonal element of pseudo cross-variogram. By way of contrast, intrinsic stationarity is equivalent to the cross variogram h → (E(W i (s + h) − W i (s))(W j (s + h) − W j (s))) i,j =1,2 being independent of s.
Indeed, Molchanov and Stucki (2013) already gave necessary and sufficient conditions for a multivariate process of Brown-Resnick type to be stationary. For a fixed intensity e −u du of the Poisson point process, the conditions on Gaussian processes given in Theorem 5.3 in Molchanov and Stucki (2013) can be shown to be equivalent to the conditions on the process W stated above (if we additionally require Z to have standard Gumbel margins) by a straightforward computation. Thus, the Gaussian processes in the above definition of bivariate Brown-Resnick processes are essentially the only ones that yield a stationary max-stable process.
In the following, we investigate the structure and the asymptotic behavior of bivariate variograms that are translation invariant, refining the result by Papritz et al. (1993) that lim h→∞ γ 12 (h)/γ 11 (h) = 1 if γ 11 is unbounded. This allows us to find valid models for bivariate Brown-Resnick processes. The following theorem, as well as the statements above, immediately extend to the general multivariate case.
Proof For i, j ∈ {1, 2}, and h ∈ R D , we obtain where we used the Cauchy-Schwarz inequality for both inequalities. Analogously, we get the assessment Thus, the assertion of the theorem follows with γ 0 = γ 11 .
As the components of a translation invariant bivariate pseudo cross-variogram only differ by a function that may increase only with a rate of order O( √ γ 0 (h)) (Theorem 1), a reasonable and not too restrictive model for the corresponding bivariate Gaussian random field W = (W (1) , W (2) ) is given by where V 1 is a univariate Gaussian random field with stationary increments and semivariogram γ 0 and V 2 is a bivariate stationary Gaussian random field with bivariate cross-covariance function C(h) = (C ij (h)) i,j ∈{1,2} , independent from V 1 . Then, the pseudo cross-variogram γ of W has the form Analogously to the univariate case, we propose to restrict to a parametric subclass of semi-variograms for γ 0 such as where σ, κ > 0 and β ∈ (0, 1). Here, γ 0 is a valid univariate variogram as h → h 2 is a variogram and λ → λ/(λ + 1) β is a Bernstein function (cf. Berg et al 1984;Schilling et al 2010). Note that γ 0 is a variogram of power law type modified to be smooth at the origin.
For the bivariate cross-covariance C, we propose to use a parsimonious version of the bivariate Matérn model (cf. Gneiting et al 2010), which is a bivariate generalization of one of the most widely used models in geostatistics, the Matérn model (cf. Guttorp and Gneiting 2006;Stein 1999, for example). In the bivariate Matérn model, each component of C is a Matérn covariance function which we parametrize in the way suggested by Handcock and Wallis (1994) for a 1 , a 2 , a 12 , σ 1 , σ 2 , ν 1 , ν 2 , ν 12 > 0 and suitable ρ ∈ [−1, 1].
Here, analogously to the parsimonious version of the bivariate Matérn model which is based on a different parametrization (Gneiting et al. 2010), we set a 1 = a 12 = a 2 = a > 0 and ν 12 = 1 2 (ν 1 + ν 2 ). Then, by Thm. 3 in Gneiting et al. (2010), C is a valid bivariate cross-covariance model if and only if To increase the flexibility of the model, we further add a spatially constant effect with variance c 2 in the second component. Thus, C has the form Note that as the common summand γ 0 is smooth at the origin, the behavior of γ ii near the origin, i.e. the differentiability of W (i) , depends only on the behavior of C which can be modeled flexibly by the smoothness parameters ν 1 and ν 2 of the bivariate Matérn model. In particular, as h → 0 and for some k(a, ν) > 0, we have Stein 1999). Furthermore, the sample paths are m times differentiable if and only if ν > m (Gelfand et al. 2010). The behavior of the γ ii as h → ∞, which has to be the same for all components by Theorem 1, is parameterized by β as we have To increase the applicability of our model to real data located in R 2 , we further allow for geometric anisotropy, replacing h by is the anisotropy matrix defined in Eq. 5. Thus, we obtain the variogram model γ (ϑ; ·) given by for i = 1, 2 and h ∈ R 2 where ϑ = (σ, κ, b, ζ, β, c, σ 1 , ν 1 , σ 2 , ν 2 , a, ρ).

Model fitting
In the following, we will assume that data v obs max (l i , p) and v pred max (l i , p) for the maximal observed and forecasted variable of interest at stations l i , i = 1, . . . , n l , and time period p = 1, . . . , n p are available.

Fitting of the univariate model
Let henceforth be k ∈ {"obs", "pred"}. We concentrate here on the estimation of the GEV and max-stable parameters assuming that the unknown mean m(l i , d) and standard deviation s(l i , p) of the underlying distribution F have already been estimated bym(l i , p) andŝ(l i , p), respectively. An example for the later estimates can be found in Section 6. Given the estimatesm(l i , p) andŝ(l i , p), we obtain the standardized data which are assumed to be GEV distributed with parameters ξ k , μ k (l i ) and σ k (l i ).
We assume that the parameters are independent between the stations. This can be justified by measurement errors and the fact, that the forecasts used to estimate m(l i , d) andŝ(l i , d) are not directly available for the station l i but only for the closest grid point. Furthermore, we face model errors, e.g. misrepresentation of orographic effects. Effects stemming from the environment of the measurement stations might even be the major cause for the variations. As genuine variation and measurement errors cannot be separated in our set up, we estimate the parameters separately for each station, via maximum likelihood. As the standardized data y k are assumed to be temporally independent, by Smith (1985), the maximum likelihood estimators (ξ k (l i ), 1 ≤ i ≤ n l , are asymptotically normally distributed if ξ k > −0.5. Thus, under the hypothesis thatξ k = 1 n l n l i=1ξ k (l i ) is the true shape parameter of the GEV at each station, the standardized residualŝ are approximately standard normally distributed, where Var(ξ k (l i )) is the variance of ξ k (l i ) estimated via the Hesse matrix of the log-likelihood function. Thus, the three hypotheses that the shape parameter, the location and the scale parameter are spatially constant can be checked indirectly via one-sample Kolmogorov-Smirnov tests of the corresponding residuals for the standard normal distribution. Here, although the data for different locations may be dependent, we assume that the normalized estimated parameters are independent. By transformation (3), the estimatesξ k ,μ k (l i ) andσ k (l i ) yield normalized data These can be compared to a standard Gumbel distribution via Kolmogorov-Smirnov tests separately for each station as a goodness-of-fit test for the marginal model In order to capture the spatial dependence structure, a univariate Brown-Resnick process associated to a variogram γ k as defined in Eq. 4 is fitted to the transformed data (x k (l i , p)) 1≤i≤n l ,1≤p≤n p . Note that there exist numerous methods of inference for Brown-Resnick processes, see, for example, Engelke et al. (2015) for a comparison of different estimators. The method we will use is based on the extremal coefficient function (Schlather and Tawn 2003). For a stationary Brown-Resnick process associated to the semi-variogram γ k , the pairwise extremal coefficients are given by where denotes the standard normal distribution function (cf. Kabluchko et al 2009). This relation can be used for fitting Brown-Resnick processes to real data as the extremal coefficients θ k (s 1 , s 2 ) can be estimated well via the relation where the F -madogram ν F,k (s 1 , s 2 ) is defined by and F is the marginal distribution function of X k (s) (Cooley et al. 2006). Thus, we obtain a plug-in estimatorθ k (l i , l j ) for the extremal coefficients θ k (l i , l j ), by replacing ν F,k in Eq. 15 by an estimatorν F,k (l i , l j ), 1 ≤ i, j ≤ n l . In order to avoid propagation of errors in marginal modeling, we choose the non-parametric estimator where R p (x) denotes the rank of the p-th component of some vector x (cf. Ribatet 2013). Then, the corresponding variogram parameter vectorθ k can be estimated by a weighted least squares fit ofθ k (l i , l j ) to θ k (l i , l j ) as given in Eq. 14. As proposed by Smith (1990), we choose weights that depend on the (estimated) variance Var(θ k (l i , l j )) of the estimator θ k (l i , l j ). Thus, we obtain the estimator We will further discuss the estimation of the variance of θ k (l i , l j ) in Section 6.

The post-processing procedure
As the bivariate Brown-Resnick process model developed in this paper describes the joint distribution of the observed and forecasted maxima of the variable of interest, it allows for some spatial post-processing of the original forecast. In this section, we will describe the resulting post-processing procedure in more detail and provide some tools to verify the procedure and the underlying model.

Post-processing via conditional simulation
Letξ obs ,μ obs (·),σ obs (·),ξ pred ,μ pred (·),σ pred (·) andθ be estimates for the GEV and variogram parameters derived from past training data. Further, assume that we have v pred max (l i , p),m(l i , p) andŝ(l i , p), i = 1, . . . , n l , based on forecasts for n l locations l 1 , . . . , l n l and a time period p in near future. Then, we obtain an arbitrary number K of realizations (v j (l i )) 1≤i≤n l , j = 1, . . . , K, of the modeled distribution of the maximal observation conditional on the forecast by the following three-step procedure: 1. Transform v pred max (·, p) to standard Gumbel margins: , whereμ pred v andσ pred v are given by Eq. 13 for k = pred. 2. Conditional simulation of a bivariate Brown-Resnick process given its second component: Simulate K independent realizations (x obs j (·), x pred j (·)), j = 1, . . . , K, of a bivariate Brown-Resnick process associated to the pseudo crossvariogram γ (θ obs ; ·) with standard Gumbel margins conditional on x pred j (·) = x pred (·). 3. Transform x obs j (·) to GEV margins: For j = 1, . . . , K, set v j (·, p) =σ obs v (·, p) exp(ξ obs x obs j (·)) − 1 ξ obs +μ obs v (·, p), whereμ obs v andσ obs v are given by Eq. 13 for k = pred. The random fields obtained by this three-step procedure can be interpreted as postprocessed probabilistic forecasts for the maxima of the variable of interest at time period p. While the first and the third steps only consist of marginal transformations, the conditional simulation in the second step is the challenging part of the procedure. For this step, the algorithm by Dombry et al. (2013) can be used. Note that the algorithm, which has originally been designed for conditional simulation of univariate Brown-Resnick processes, can directly be transferred to the multivariate case by perceiving the multivariate processes as univariate processes on a larger index set. However, the computations will be computationally expensive, in particular if the number of conditioning locations gets large.

Verification
In practical applications, the proposed post-processing procedure and the underlying model need to be verified. Here, we do not only consider the full bivariate Brown-Resnick model which forms the base of the post-processing procedure, but also intermediate models such as the marginal GEV model and the univariate model. This allows us to evaluate the effect of incorporating the spatial dependence structure and the forecasted maxima, respectively.
For the evaluation and verification of the different models, we choose a standard verification score in probabilistic prediction, the (negatively oriented) continuous ranked probability score (CRPS) (cf. Gneiting and Raftery 2007): where F is a real-valued distribution and x ∈ R m is an observation. The continuous ranked probability score is a strictly proper scoring rule, i.e. CRP S(F, x)F (dx) ≤ CRP S(G, x)F (dx) for all distribution functions F and G with finite first moments and equality if and only if F = G. This indicates that the mean CRPS for different observations is the smaller, the better the predicted distribution F fits to the true distribution of the observation data. The usefulness of the CRPS for evaluating extremes was shown by Friederichs and Thorarinsdottir (2012).
First, we evaluate the improvement in predictive quality by fitting the parameters of the GEV model given in Eqs. 12 and 13 to the observations instead of the forecast, i.e. we calculate CRPS obs (l i ) and CRPS pred (l i ) where for every station l i , 1 ≤ i ≤ n l , and k ∈ {"obs", "pred"}. For the calculation, we employ the closed formula for the CRPS of a GEV provided by Friederichs and Thorarinsdottir (2012). For ξ = 0, they obtain where l is the lower incomplete gamma function. Furthermore, the CRPS for the GEV fitted to the observations can be compared to the CRPS of the original forecast where F orig l i ,p denotes the distribution of the original (probabilistic) forecast for the maximum of the variable of interest at location l i within time period p. If this forecast is given by an ensemble of values, such as the output of a numerical weather prediction model, for example, F orig l i ,p corresponds to the empirical distribution function of this sample. If the forecast corresponds to a single value, CRPS orig (l i ) reduces to the mean absolute error.
Finally, the full bivariate model and, thus, the proposed post-processing procedure can be evaluated and verified by considering the CRPS where F l i ,p|v pred max denotes the distribution of the observed maximum at location l i , 1 ≤ i ≤ n l , within time period p conditional on v pred max , i.e. the distribution of the post-processed forecast, with the CRPS of the original forecast, CRPS orig (l i ).

Application to real data
In this section, we will apply the fitting and verification procedure described in Section 4 to real wind gust data consisting both of observation and forecast data. We will see that, even though the marginal distributions are fitted quite well, a forecast based on the single GEV for the observations is not able to outperform the forecast by the numerical weather prediction model. However, the results for the bivariate model indicate that the post-processing procedure proposed in Section 5.1 improves the predictive quality. We also discuss the uncertainty of the obtained estimates.

The data
We consider observed as well as forecasted wind speed data provided by Germany's National Meteorological Service, the Deutscher Wetterdienst (DWD). We use observations from 218 DWD weather stations over Germany at 360 days from March 2011 to February 2012. The weather stations register mean and maximum wind speed on an hourly basis. Due to the inertia of the measuring instruments, the maximum wind speed approximately corresponds to the highest 3-second average wind speed. Here, we use the maximum wind speed v obs max (l, d) between 08 UTC and 18 UTC for each station l and each day d.
Furthermore, for each day, forecasts for the wind speed maxima and for the hourly mean wind speed both in 10m height above ground and for the 10-hourperiod from 08 UTC to 18 UTC are available. The forecasts are provided by the COSMO-DE ensemble prediction system (EPS) operated by DWD. COSMO-DE (Baldauf et al. 2011) is a non-hydrostatic limited-area numerical weather prediction model that gives forecasts for the next 21 hours on a horizontal grid with a width of 2.8km covering Germany and neighboring countries. The COSMO-DE EPS is initialized every 3 hours. Here, we take the forecasts that are initialized at 00 UTC. Using the forecasts for the nearest grid point of a station, we obtain forecasts v (1) mean (l, d, τ ), . . ., v (20) mean (l, d, τ ), τ ∈ {9, 10, . . . , 18}, and v (1) max (l, d), . . ., v (20) max (l, d) for every weather station l and every day d. Here, v (j ) mean (l, d, τ ) and v (j ) max (l, d) denote the forecast for the mean wind speed between (τ − 1) UTC and τ UTC and the maximal wind speed, respectively, at station l and day d, forecasted by the j th COSMO-DE ensemble member.
For the application of our model with a stationary spatial dependence structure, in the following, we will restrict ourselves to forecasted and observed data for 119 DWD stations north of 51 • N, denoted by l 1 , . . . , l 119 , as the northern part of Germany has a much more homogeneous topography than the southern part.

Applying the univariate model
As the wind speed observations correspond to 3-second averages, the daily maximal wind gusts v obs max can be perceived as the maximum of a long time series. Further, the distribution of a single wind speed is frequently modeled by a Weibull or a Gamma distribution (e.g., Conradsen et al 1984;Pavia and O'Brien 1986;Sloughter et al 2007), that is, given a fixed shape parameter of the Weibull or Gamma distribution, respectively, which is spatially and temporally constant, the single observations may be assumed to come from a location-scale family of distributions. These considerations give support to the usage of the GEV model presented in Section 2 as a model for the maximal wind speed V k . Fitting a GEV distribution to the standardized wind speeds y k (l i , d) as defined in Eq. 10 needs the estimatesm(l i , d) andŝ(l i , d) for the mean and the standard deviation of the underlying wind speed distribution. We aim to extract these characteristics from the forecast. Here, instead of direct estimates for the mean and the standard deviation, we usê Even though not providing consistent estimates for mean and standard deviation of the underlying distribution,m(l i , d) andŝ(l i , d) lead to a consistent normalization in the following sense: If the data v obs max and the forecasts v (j ) mean are affinely transformed (i.e. the parameters m and s are modified) in the same way, the normalized data y obs (l i , d) remain unchanged. This choice ofm(l i , d) andŝ(l i , d) also ensures the identifiability of the GEV parameters μ k (l i ) and σ k (l i ).
As described in Section 4, the GEV parameters for the standardized observations can be estimated via maximum likelihood and the hypotheses that these are spatially constant can be checked via Kolmogorov-Smirnov tests. For ξ obs , we obtain a pvalue of 0.194. The analogous tests for μ obs and σ obs both yield p-values smaller than 2.2 · 10 −16 . Thus, the hypotheses that the residuals of the estimates of μ obs and σ obs follow a normal distribution both can be rejected and, consequently, we stick to the assumption that the GEV parameters, location and scale, differ among the stations.
In contrast to the location and the scale parameter, the shape parameter of the GEV will be assumed to be spatially constant in northern Germany with the value ξ obs =ξ obs = 1 119 119 i=1ξ obs (l i ) = 0.043 (empirical standard deviation of the sample {ξ obs (l i ) : i = 1, . . . , 119}: 0.049). Note, however, that the estimated shape parameter differs significantly (to a 5%-level) from the mean value in case of 20 stations. For six of these stations, it even differs highly significantly (to a 1%-level), and four of them even to a 0.1%-level. The parameter estimatesμ(l i ) andσ (l i ), 1 ≤ i ≤ 119 for the location and scale parameters, respectively, obtained by maximum likelihood estimation with fixed shape parameter ξ obs =ξ obs are depicted in Fig. 1a. Note that the estimated vectors of location and scale parameters show a strong empirical correlation of 0.97. By Eq. 3, the data can be transformed to standard Gumbel margins. Kolmogorov-Smirnov tests performed separately for each station yield p-values of at least 0.098 with a mean value of 0.718 which indicates that the GEV model fits quite well for all the stations.
As a fit of the GEV distribution to the forecast is needed for both verification of the marginal model and the bivariate Brown-Resnick model, we repeat our analysis which suggests that the distribution of v pred max should be close to a GEV distribution. Note that this choice of v pred max is in complete accordance to the choice ofm(l i , d) as maximal mean of all the ensemble members in Eq. 20.
As the Kolmogorov-Smirnov test of the normalized estimates for ξ pred yields a p-value of 0.53 and the estimates differ significantly from the mean for seven stations (for three of them very significantly), we may assume a shape parameter of ξ pred =ξ pred = 1 119 119 i=1 ξ pred (l i ) = 0.028 (empirical standard deviation of the sample {ξ pred (l i ) : i = 1, . . . , 119}: 0.044) at every station in northern Germany. However, the hypotheses that the estimates for the location and the scale parameter follow a normal distribution have been both rejected. The maximum likelihood estimatesμ pred (l i ) andσ pred (l i ), 1 ≤ i ≤ 119, with fixed shape parameter are shown in Fig. 1b. Here, the empirical correlation of the vectors of estimated location and scale parameters is just as strong as in case of the observations. Kolmogorov-Smirnov tests of the transformed data x pred (l i , d) for every station yield p-values of at least 0.142 with and equal 0.748 in average which also indicates an appropriate fit.
The spatial dependence is modeled by a univariate Brown-Resnick process which is obtained by a weighted least squares fit of the extremal coefficient function. Here, the weights depend on the variance of the estimatorsθ obs (l i , l j ) (see Section 4) estimated by a jackknife procedure where the extremal coefficients are reestimated leaving out one month of data. The estimated extremal coefficientsθ obs and the fitted extremal coefficient functioñ are displayed in Fig. 2. Here, the estimated coefficients seem to be fitted quite well. For verification, we first calculate the mean CRPS for each of the two models given by Eq. 12, CRPS obs (l i ) and CRPS pred (l i ), for every station l i , 1 ≤ i ≤ 119. Then, the improvement or deterioration by using the GEV distributions of the observations instead of the forecasts is expressed in terms of the skill score (e.g., Gneiting and Raftery 2007) which has the value 1 in case of an "optimal" model which equals v obs max a.s. and the value 0 if both models yield the same result. Here, S(l i ) > 0 for 115 of 119 stations. For the skill score corresponding to the mean CRPS averaged over all the stations, we obtain Note that, for simplicity, the reference model (12) for the predictions is based on the maximal ensemble members v pred max (l i , d) only and further information given by the maximal wind speed forecasted by the other ensemble members are neglected. Thus, we further compare the CRPS of the GEV model for the observations, CRPS obs (l i ), with the CRPS of the original COSMO-DE ensemble, CRPS orig (l i ), taking the ensemble forecast as a probabilistic forecast with equal probability for each ensemble member. Here, the skillS(l i ) = 1 − CRPS obs (l i )/CRPS orig (l i ) is positive for 37 of 119 only, with the skill of the averaged CRPS being approximately −0.032. As the skill score is slightly negative, the COSMO-DE ensemble forecast seems to contain more information than our marginal model.
Note that, for a fair comparison, we should avoid the validation of our model on the same data that have been used for the model fit. Hence, we perform cross validation: Separately for every month, the GEV parameters are reestimated leaving out the data for this month and using only the data for the other eleven months for the model fit. The GEV parameters estimated for different months in this way show very little variation corroborating the assumption that they are constant in time. Further, the verification results above are confirmed: We obtain skill scores of 0.285 for the CRPS compared with the GEV model for the forecast and −0.048 compared to the COSMO-DE ensemble.

Applying the bivariate model
A bivariate Brown-Resnick process is fitted to the transformed data according to Section 4. Here, as a preliminary analysis suggests, the parameter ρ is set to the maximal value yielding a valid variogram, i.e.
The estimateθ of the remaining eleven pseudo cross-variogram parameters leads to the fitted extremal coefficient functioñ θ(l i , l j ) = θ k 1 ,k 2 (l i , l j ) k 1 ,k 2 ∈{"obs","pred"} = 2 γ k 1 k 2 (θ; l i − l j )/2 k 1 ,k 2 ∈{"obs","pred"} . Figure 3 presents the estimated extremal coefficientsθ k 1 ,k 2 (l i , l j ), and the fitted extremal coefficient functionsθ k 1 ,k 2 (·, ·) for k 1 , k 2 ∈ {"obs", "pred"}. As illustrated, the fitted model seems to be appropriate with respect to the behavior of the extremal coefficient function. Figure 4 depicts a simulated realization of the corresponding Brown-Resnick process associated to the variogram γ (θ; ·) with standard Gumbel margins. The realization indicates a remarkable amount of positive correlation between x obs and x pred which emphasizes the gain of information by taking x pred into account.
In order to verify the bivariate model, we apply the post-processing procedure proposed in Section 5.1. Due to the computational complexity of the conditional simulation, we do not simulate the observations at all stations simultaneously conditional on the forecast at all locations, but perform post-processing with sample size K = 20, i.e. the size of the original COSMO-DE ensemble, at each location separately conditioning on the forecast at the same location and two neighboring grid cells only. We calculate the CRPS of the post-processed distribution, CRPS biv (l i ), and compare it with CRPS orig (l i ), i.e. the CRPS belonging to the empirical distribution of the COSMO-DE ensemble, yielding a positive skill score for 82 of 119 stations where the skill score related to the mean CRPS equals 0.128 (0.111 cross-validated). If we increase the sample size K = 100, we obtain an improved skill score of 0.164 (0.147 cross-validated), being positive for 104 of 119 stations. Thus, we conclude that the post-processing procedure based on the bivariate Brown-Resnick model is able to improve the forecast given by COSMO-DE ensemble.

Uncertainty assessment
We assess the uncertainty in the estimation of the model parameters via parametric bootstrap, i.e. we simulate data sets from the fitted model and repeat the estimation procedure. As a detailed analysis of the numerical model producing the forecasts is beyond the scope of our paper, we do not account for the uncertainty in the estimatesm(l i , d) andŝ (  To this end, we draw 360 independent realizations from the bivariate Brown-Resnick process fitted in Sections 6.2 and 6.3 using the simulation algorithm by Dombry et al. (2016). The procedure is repeated 100 times yielding 100 independent data sets of the same size as the original one. Following the steps described in Sections 6.2 and 6.3, we thus obtain 100 independent estimates.
The sample of estimates ξ obs has mean 0.043 and a standard deviation of 0.02. In order to validate the p-value of the Kolmogorov-Smirnov tests for ξ obs , we repeat these tests on the simulated data sets and obtain a p-value smaller than the original one (0.194) in 22 of 100 cases which supports the non-rejection of the hypothesis that ξ obs is spatially constant. The p-values of the tests for μ obs and σ obs are confirmed, as well.
Analogously, the results for the marginal parameters for the forecast are verified: The sample of estimates ξ pred has mean 0.03 and a standard deviation of 0.02, while the original p-value (0.53) of the test for ξ obs is undercut in 62 of 100 cases.
Further, we assess the estimation of the Brown-Resnick model parameters. The original parameter values and the sample means and standard deviations obtained from the parametric bootstrap are presented in Table 1. It can be seen that most parameters are recovered well by the estimation procedure. For some parameters of  Besides the uncertainty of the parameter estimates, we also assess the deviation of the non-parametrically estimated extremal coefficients from the parametrically estimated extremal coefficient function. To this end, for k 1 , k 2 ∈ {"obs", "pred"}, we calculate the root-mean-square error RMSE k 1 ,k 2 = ⎡ ⎣ 1 119 2 1≤i,j ≤119 θ k 1 ,k 2 (l i , l j ) − 2 ( γ k 1 ,k 2 (θ; l i , l j )/2) For the original data, we obtain the root-mean-square errors RMSE obs,obs = 0.074, RMSE obs,pred = RMSE pred,obs = 0.078 and RMSE pred,pred = 0.061. Repeating the calculation for the simulated data sets, we obtain sample mean root-meansquare errors 0.039 (for RMSE obs,obs ), 0.041 (for RMSE obs,pred and RMSE pred,obs ) and 0.037 (for RMSE pred,pred ) with standard deviations 0.0018, 0.0019 and 0.0018, respectively. Thus, the deviations for the real data are roughly twice as large as expected in case of the model being correct. One may conclude, that the extremal coefficient for real data is not only a function of the distance, but may be modulated by the topography or the climate at the stations. Finally, note that the same methodology could be used to assess the uncertainty of the post-processed forecast. To this end, different sets of estimated parameters could be used as input for the post-processing procedure described in Section 5.1 to determine the uncertainty of its output. However, a full analysis would also require an assessment of the uncertainty in the estimatesm(l i , d) andŝ(l i , d) which, as already mentioned above, is beyond the scope of this work.