A spatio-temporal model for Red Sea surface temperature anomalies

This paper details the approach of team Lancaster to the 2019 EVA data challenge, dealing with spatio-temporal modelling of Red Sea surface temperature anomalies. We model the marginal distributions and dependence features separately; for the former, we use a combination of Gaussian and generalised Pareto distributions, while the dependence is captured using a localised Gaussian process approach. We also propose a space-time moving estimate of the cumulative distribution function that takes into account spatial variation and temporal trend in the anomalies, to be used in those regions with limited available data. The team’s predictions are compared to results obtained via an empirical benchmark. Our approach performs well in terms of the threshold-weighted continuous ranked probability score criterion, chosen by the challenge organiser.


Introduction
Understanding the behaviour of environmental processes, such as precipitation, wind speed or temperature, is critical for a number of applications, including weather forecasting and predicting the effects of climate change (Cooley et al. 2007;Towe et al. 2017;Rohrbeck et al. 2018). Interest in these environmental processes can be at a single location or over large spatial areas, as well as over a range of time periods. For example, questions of interest may be what percentage of the United Kingdom will observe a mean average temperature of above 15 • C for the month of May, or what is the expected maximum Red Sea surface temperature in 2050.
In some instances, interest lies in the largest values of a spatial process, such as determining the likely spatial extent of a flood event or a heatwave; example methods and applications can be found in Davison et al. (2012), Winter et al. (2016) and Tawn et al. (2018). Approaches include max-stable processes (Smith 1990;Schlather 2002) and Pareto processes (Ferreira and De Haan 2014); these are applicable when locations experience concomitant extremes, which may not always be the case. More recent literature that addresses this issue includes work based on Laplace random fields (Opitz 2016), the Gaussian scale mixture model of Huser et al. (2017), and the conditional spatial model of Wadsworth and Tawn (2019). Review papers on the topic of spatial extremes include Davison and Huser (2015) and Davison et al. (2019).
There has been work to extend some of these approaches for modelling spatiotemporal dependence. While a range of statistical techniques, such as Gaussian processes, exist for analysing spatio-temporal data (Cressie and Wikle 2011;Wikle et al. 2019), current spatio-temporal extremes models are limited to problems of moderate dimensions. For example, spatio-temporal max-stable processes are considered by Davis et al. (2013) and Huser and Davison (2014), but these models suffer from being computationally expensive to fit. Alternative approaches include the skew-t model of Morris et al. (2017), and a space-time extension of the conditional extremes approach (Simpson and Wadsworth 2020), neither of which are currently feasible for the very high number of dimensions we will consider here. A key aspect that unites these models and applications is the importance of understanding the spatio-temporal dependence.
Our analysis focuses on gaining a better understanding of Red Sea surface temperatures. The Red Sea supports a rich and diverse ecosystem and these high temperatures may result in coral bleaching. This degradation of the biodiversity of the Red Sea could have an impact on the local economy, with the north-west coast being a particularly popular tourist destination (Fine et al. 2019). This will also have impact on the fishing industries in the area that support a number of communities situated around the Red Sea. Previous studies such as Allison et al. (2009) andBrander (2010) have investigated the complex impacts of climate change on fisheries. As a result, it is important to understand the changing behaviour of the Red Sea and determine which regions are more susceptible to global warming. Examples of existing extreme value analyses of Red Sea surface temperatures can be found in Hazra and Huser (2020) and Simpson and Wadsworth (2020).
The Red Sea surface temperature data for this challenge are not direct measurements, but are instead daily anomalies available from the period 1985-2015, with the Red Sea being discretised into 16703 grid cells (locations). Data for some of the locations and times have been artificially removed. The aim of the challenge was to issue a probabilistic forecast of the temperature anomalies within specific space-time regions; predictive performance of our proposed model against the benchmark was assessed by using a specific metric, in this case the threshold-weighted continuous ranked probability score (twCRPS) of Gneiting and Ranjan (2011). For more details on the data set and the challenge, see Huser (2020).
In order to model the sea surface temperature anomalies, we adopt a two-step approach which accounts for spatio-temporal variations in the marginal behaviour and dependence structure. The marginal behaviour at a spatial location is modelled via an extreme mixture model, which allows us to capture differences in the body and tails of the data. We find interesting spatial variations in the model parameters, and these can be linked to geographical features of the Red Sea. Conditional on the fitted marginal models, spatio-temporal dependence is then modelled by using a localised Gaussian process. While Gaussian processes imply a restrictive spatiotemporal dependence structure on the extremes, existing spatial extreme methods are too computationally expensive for large data sets. For those locations with insufficient data to fit a Gaussian process, the predictive distribution is estimated through pooling the observed data over a spatio-temporal window.
The paper is structured as follows: Section 2 details the data analysis and the methods used to model both the marginal and dependence behaviour of the sea surface temperature anomalies; Section 3 assesses performance with respect to the provided benchmark estimate and the twCRPS; Section 4 concludes with a discussion of the ways in which the analysis could be improved.

Challenge description and benchmark prediction
Let Y (s, t) be the sea surface temperature recorded at grid cell s ∈ S ⊂ R 2 and time t ∈ T = {1, . . . , T }, where |S| = 16703 and T = 11315. Our analysis focuses on the anomalies whereμ(s, t), (s, t) ∈ X = S × T , is an estimated mean effect. Parts of the anomaly data {A(s, t) : (s, t) ∈ X } were masked artificially by the EVA 2019 data competition organiser; see Huser (2020) for more details. We denote the subset of X for which anomalies are available by X T and we refer to {A(s, t) : (s, t) ∈ X T } as the training data set.
High temperature events are likely to affect large scale areas for sustained periods of time. Therefore interest lies in the distribution of the spatio-temporal minimum rather than the minimum in a particular point in time and space. For each particular space-time point, a local neighbourhood N (s, t) ⊂ X is constructed. Herein, we consider where B(s, r) is a ball centred at location s with radius r = 50 km, and we are interested in the smallest anomaly within the local neighbourhood, The aim is to predict X(s, t) for certain space-time points within a validation set X V , (s, t) ∈ X V ⊂ X \ X T , by providing a corresponding distribution function F s,t (·); these validation points all lie in the period 2007-2015. Huser (2020) derives an estimate, termed the benchmark prediction, for the distribution function F s,t (·), (s, t) ∈ X V , in three steps. Firstly, the set N C of neighbourhoods with no missing values is identified, i.e. N C = {(s, t) : N (s, t) ⊂ X T }. Next, the spatio-temporal minima as defined in Eq. 1 are determined for the neighbourhoods in N C . Finally, the benchmark prediction for F s,t (·) is defined as the empirical cumulative distribution functionF ben of the values {X(s, t) : (s, t) ∈ N C }.

Exploring spatial and temporal features
The benchmark prediction,F ben , assumes that the data are stationary in space and time. Evidence suggests that this assumption may not be valid. As an illustrative example, we focus on the temporal component (the same idea holds for the spatial component) and separate the data into two time horizons H and H * , where H = {1, . . . , T * } and H * = {T * + 1, . . . , 11315} with T * ∈ (1, 11315).
We can test whether there is a difference between the empirical cumulative distribution functions evaluated in these two time horizons, denotedF ben H andF ben H * . We adopt the two-sample Kolmogorov-Smirnov test to detect whether there is any change in the distribution function. The Kolmogorov-Smirnov test statistic is where sup is the supremum function, and D H,H * is tested against a 5% significance level. We choose T * = 8030, which is equivalent to separating the data into two time periods : 1985-2006 and 2007-2015. The estimated test statistic is D H,H * = 0.125, corresponding to a p-value of 0.004. Thus, we reject the hypothesis that the cumulative distribution function remains stationary over the time period.
We are also interested in how the spatial dependence changes as a function of distance. The variogram can be used to determine the spatial dependence of the spatial process Y and is given by for two particular sites (w, w ), where δ is the semi-variogram function (Matheron 1963). If the process Y is stationary, then δ(w, w ) = δ(w − w ). The variogram explains the spatial dependence of average values of the process, however our interest also lies in how the spatial dependence changes for the largest values of the process. The F-madogram is an equivalent function to the variogram that instead describes the spatial dependence of the extremes (Cooley et al. 2006). The F-madogram is given by where F is the distribution function of the spatial process. These two functions help us to determine the dependence structure of the anomaly data and plots are provided in the Appendix. Across subsets of S, the empirical variograms indicate a variation in the spatial dependence structure, while the F-madograms exhibit a similar spatial dependence for the highest anomalies. We further examine a potential trend in the dependence by estimating variograms and Fmadograms for the horizons H = {1, . . . , 5000} and H * = {5001, . . . , T = 11315}. We find little, or no, change in the dependence of the highest anomalies, while the overall spatial dependence increases slightly for some parts of Red Sea.

Space-time moving cumulative distribution function
The main method we propose in Section 2.2 relies on there being sufficient data in a space-time neighbourhood of each validation point (s, t) ∈ X V . For those validation points with insufficient data, we could resort to using the benchmark predictionF ben , given in Section 2.1.1. However, since there is some evidence of non-stationarity in X(s, t), we propose to alter this slightly by deriving cumulative distribution functions across local space-time moving cylinders. Instead of pooling all of those locations that have complete observations, we use a neighbourhood N * (s, t) with where B(s, r) is a ball centred at location s with radius r = 75 km. The estimatê F * s,t (·) for F s,t (·), (s, t) ∈ X V , is then derived as the empirical cumulative distribution function of observations in X(s , t ) : For some of the validation points (s, t) ∈ X V , the neighbourhood N (s, t) contains space-time points for which the anomalies are available, i.e. N (s, t) ∩ X T = ∅. Then, X(s, t) cannot be larger than any of the observed anomalies A(s , t ) : (s , t ) ∈ N (s, t) ∩ X T . Therefore, the minimum observed anomaly within the neighbourhood N (s, t) provides an upper bound for X(s, t). We take this property into account by settingF * The same approach is also applied to our estimates derived in Section 2.2.

Introduction
The empirical distribution derived in Section 2.1.3 captures large-scale spatiotemporal trends in X(s, t) (s ∈ S; t ∈ T ), but does not account for short-term variations in the anomalies which the challenge asks us to predict. This section introduces our model for these short-term variations. Instead of {X(s, t) : (s, t) ∈ X }, we first model the process {A(s, t) : (s, t) ∈ X }, and then derive our estimates for F s,t (·) : (s, t) ∈ X V . Section 2.1.2 shows that the data exhibit spatio-temporal dependence. This motivates pooling data across nearby space-time points in the training data set X T to model the anomaly A(s , t ), where (s , t ) ∈ X \ X T refers to a space-time point for which the recorded anomaly is not available. We may consider modelling {A(s, t) : (s, t) ∈ X } via a Gaussian process. However, such an approach would have to account for, amongst others, the spatial variation in both the dependence structure and the marginal distributions of the anomalies.
We address these aspects in two modelling steps. In the first step, we fit a marginal model to the observed values of A(s, 1), . . . , A(s, T ) separately for each grid cell s ∈ S in Section 2.2.2, yielding an estimated distribution function G s (·). We then consider the process of transformed random variables {U(s, t) = G s [A(s, t)] : (s, t) ∈ X }, which provides approximate uniform marginal distributions for each grid cell. In the second step, in Section 2.2.3, we model the non-stationary process {U(s, t) : (s, t) ∈ X } and derive estimates for F s,t (·) : (s, t) ∈ X V .

Marginal modelling
Since the anomalies represent deviations from an estimated mean, we first test whether A(s, 1), . . . , A(s, T ) (s ∈ S) are normally distributed; for notational brevity, we drop the location index s in the rest of this subsection. The Lilliefors test (Lilliefors 1967) rejects the null hypothesis of A(1), . . . , A(T ) being normally distributed at the 1% significance level for 13,200 of the 16,703 locations. A closer examination of the model fit indicates that the normal distribution is poor at capturing the lower and upper tail behaviour of the anomalies, while providing a very good fit for the bulk of the distribution.
Extreme value theory provides us with an asymptotically justified modelling framework for the highest (lowest) values of a continuous random variable Z. We adopt a peaks-over threshold approach and consider Z | Z > u (u ∈ R). For some suitably high threshold u, exceedances by Z of u are modelled using the generalised Pareto distribution GPD(ψ, ξ ) with where {z} + = max {0, z}, and (ψ, ξ ) ∈ R + × R are the scale and shape parameters respectively. Pickands (1975) shows that this family of distributions arises as the only possible non-degenerate limit for scaled excesses of Z as u tends to the upper end point of the distribution. The lower tail of Z is modelled in the same way by first picking a sufficiently small threshold and then fitting a generalised Pareto distribution using that P ( To improve upon the marginal fit of the normal model, we define an extreme mixture model (Frigessi et al. 2002;Behrens et al. 2004;MacDonald et al. 2011) which provides more flexibility regarding the tail behaviour of the distribution function G(·). Given thresholds and u ( < u), observations are modelled as being normally distributed within the interval [ , u], and generalised Pareto distributed otherwise, with separate parameters (ψ , ξ ) and (ψ u , ξ u ) for the lower and upper tail, respectively. The distribution function is thus (2) where (·) denotes the standard normal cumulative distribution function.
Prior to estimating the spatially varying model parameters (μ, σ, ψ , ξ , ψ u , ξ u ) in Eq. 2, we consider threshold stability plots (Coles 2001) for a subset of 50 grid cells. As a result, we choose the empirical 6% and 94% quantiles of the observed anomalies A(1), . . . , A(T ) as thresholds and u, separately for each grid cell. Parameter estimates are derived using likelihood inference; note that for fixed and u, (μ, σ ) can be estimated independently of (ψ , ξ ) and (ψ u , ξ u ). Figure 1 demonstrates the estimated model parameters over space, indicating a north-south trend in the marginal behaviour of the anomalies. Interestingly, the strong north-south trend for σ correlates with topographical features of the Red Sea; the water in the northern Red Sea is generally deeper than in the southern Red Sea. When studying the tail behaviour, most estimates for the shape parameters ξ and ξ u are negative, corresponding to the distribution of A 1 , . . . , A T being short-tailed with finite lower and upper end points; this agrees with previous studies on extreme low and high temperatures (Thibaud et al. 2016;Winter et al. 2016). We also calculate the site-wise 80% and 90% quantiles of the estimated GPD distributions. Similarly to σ , we find a north-south decrease in these quantiles, with the trend being stronger in the upper tail. Consequently, anomalies in the northern Red Sea appear to be more variable than in the southern Red Sea. To assess the fit of the extreme mixture model Eq. 2, we examine the quantile-quantile plots for two locations in Fig. 2. The plots show a very good overall fit and illustrate that the lower and upper tail behaviour are well captured.

Dependence modelling
We now require a model to capture the spatio-temporal dependence between the anomalies. For spatial extremes, a variety of such models exist, but the spatiotemporal case is less well-studied in the literature. The most well-known models for spatio-temporal extremes are max-stable processes (Davis et al. 2013;Huser and Davison 2014), however, currently available inferential methods are computationally expensive, and cannot handle the large number of space-time locations we wish to study. We instead propose the use of Gaussian processes to model the spatiotemporal dependence. As well as bringing computational efficiency to our approach, Consider a Gaussian process {Z(w) : w ∈ R m } with mean function μ : R m → R and covariance function ρ(w 1 , w 2 ), for w 1 , w 2 ∈ R m . In our application, we are interested in the Gaussian process at a finite set of space-time locations (s, t) 1 , . . . , (s, t) n ∈ S × T . The process at these locations follows a multivariate Gaussian distribution with mean vector μ ∈ R n and covariance matrix ∈ R n×n , obtained via the covariance function ρ. Suppose we want to generate observations for the first n 1 space-time locations, conditioning on observations at a further n 2 locations. The mean vector μ and covariance matrix can be partitioned to represent these two sets of locations, i.e., μ = (μ 1 , μ 2 ) for μ 1 ∈ R n 1 and μ 2 ∈ R n 2 , and = 11 12 21 22 for 11 ∈ R n 1 ×n 1 , 12 ∈ R n 1 ×n 2 , 21 ∈ R n 2 ×n 1 and 22 ∈ R n 2 ×n 2 .
Conditioning on observations z ∈ R n 2 at space-time locations (s, t) n 1 +1 , . . . , (s, t) n , we can generate observations for locations (s, t) 1 , . . . , (s, t) n 1 by sampling from a multivariate Gaussian distribution with mean vector μ * and covariance matrix * given by 22 21 . In order to focus on modelling the dependence of our temperature anomalies, we consider fitted quantiles at each spatial location, defined byû(s, t) =Ĝ s {A(s, t)} ∈ [0, 1], whereĜ s denotes the fitted extreme mixture model Eq. 2 at location s. To obtain values on a scale conducive to Gaussian process modelling, we apply a probit transformation to these empirical quantiles, obtaining z(s, t) = −1 û(s, t) ∈ (−∞, ∞), for each (s, t) ∈ X T ; −1 denotes the inverse cumulative distribution function of the standard normal distribution. Then, the marginal observations for grid cell s ∈ S, z(s, 1), . . . , z(s, T ), are Normal(0, 1) distributed. However, the size of the data makes the estimation of a single Gaussian process across S × T infeasible in R.
Instead, we fit a local Gaussian process separately for each (s, t) ∈ X V , using non-missing values of z(s, t) at locations within a space-time cylinder Here, B(s, r ) denotes a ball centred at s with some radius r > r, i.e., N (s, t) ⊃ N (s, t). We take an exponential separable covariance function, with (s, t) 1 , (s, t) 2 ∈ S × T and | · | denoting the absolute difference. In our analysis, we generally find that φ time is larger than φ lon and φ lat , corresponding to consecutive daily anomalies for the same grid cell being more dependent than the anomalies for two grid cells on the same day which are 1 • in longitude (latitude) apart. The radius r is initially set to 60km, and increased in 10km increments until the cylinder N (s, t) contains at least 100 points, up to a maximum radius of 150km. This approach ensures sufficient data to estimate the parameters of the Gaussian process, while still accounting for spatial trends in the data. We fit these Gaussian process models using the package DiceOptim in R (Picheny et al. ), and generate 500 samples for each missing observation within the original space-time cylinder N (s, t), using the conditional simulation approach detailed above. For those validation locations with less than 100 available data points in N (s, t) at radius r = 150km, we use the distribution function obtained via the space-time moving cylinders discussed in Section 2.1.3.
Having generated these observations for the space-time locations with missing data, we transform back to the interval [0, 1] using the cumulative normal distribution function, and transform to the original scale using the marginal fits discussed in Section 2.2. Taking the minimum observation in N (s, t) for each sample results in 500 simulations of X(s, t) for each (s, t) ∈ X V . We finally use these observations to empirically estimate the required distribution function.

Comparison to the benchmark
We first compare our method to the empirical approach used to calculate the benchmark. In Fig. 3, we present the benchmark CDF, as well as a recalculated benchmark using data only from the validation period (2007 to 2015). We also present results for our proposed method, by averaging our predicted distributions across all spacetime locations in X V , i.e., we take the average CDF for each anomaly value where the distribution functions are evaluated.
Here, we observe a clear difference in the results obtained via the benchmark approach computed for the training data with complete neighbourhoods and observations within the validation period, particularly for anomalies in the range [−1, 1]. This disparity highlights the existence of a temporal trend in the anomaly data (see Section 2), and supports our use of a localised approach. On average, our method closely agrees with the empirical results within the validation period, demonstrating that, overall, we are able to capture this temporal behaviour.

Evaluation of the score
We further examine the performance of our approach using the threshold-weighted continuous ranked probability score (twCRPS) from the data challenge. LetF s,t be the predicted distribution function for the minimum anomaly X(s, t) over N (s, t), and denote the true value by x(s, t). The twCRPS is then approximated by . . . , 400). We select a test data set X * ⊂ X containing 4,200 space-time locations, such that no data are missing over N (s, t) for (s, t) ∈ X * . Data are then randomly removed from N (s, t) for (s, t) ∈ X * ; the amount of missing data is sampled from the empirical distribution function of missing values within N (s , t ) for (s , t ) ∈ X V . We generate five such data sets for each (s, t) ∈ X * , with the aim to recover the true value x(s, t) using our approach in Section 2. The approximated score across the test points in X * is 0.17 × 10 −4 . The discrepancy between the scores for the set X * and the validation set X V used by Huser (2020) is subsequently explained. Firstly, our generated test data is only similar to the validation data with respect to the number of missing data points within the 50km radius. However, the test data exhibit less missing observations within the 150km radius, and our Gaussian process approach can thus be applied more often. In particular, the space-time moving cumulative distribution function had to be derived only once for the points in X * , while we had to use it for 15% of the space-time points in X V . The second reason for the discrepancy in the scores is that X * covers the years 1985-2015 while the challenge considers the period 2007-2015. Given the positive trend in the anomalies, and the fact that the twCRPS gives more weight to predicting high values of X(s, t), a higher average score is to be expected for space-time points in the later years.

Discussion
We thank Raphaël Huser for organising this data challenge. Our approach utilised tools from spatial statistics and extreme value analysis, and could be extended. For instance, other teams showed that machine learning techniques can be applied successfully to recover the missing values in the data. It would be interesting to explore whether a combination of these approaches could yield better results. Specifically, our localised Gaussian process estimates may potentially be replaced by their approaches. The implementation of a cross-validation scheme would help to distinguish the advantages and disadvantages of both approaches.
One drawback of using Gaussian processes for extreme value modelling, is their ability to capture only asymptotic independence, i.e., situations where the largest values occur separately across different variables. The development of computationally efficient models that are able to capture both asymptotic dependence and asymptotic independence is a topic that still requires attention, and it is likely that our approach would be improved by the added flexibility such models would bring. Conditional models for extremes (Heffernan and Tawn 2004;Wadsworth and Tawn 2019), which are able to capture these different tail dependence classes as well as allowing for straightforward simulation of missing data, have recently been extended to the spatiotemporal setting by Simpson and Wadsworth (2020), and provide one possible avenue for the improvement of our approach.
Another extension may be to address the temporal non-stationarity in the marginal behaviour of the anomalies. We again used the Lilliefors test to decide whether the anomalies may be modelled via a normal distribution. The data were split into blocks of length 500 days, and the null hypothesis was rejected for 30%-80% of locations on the 5% significance level across blocks. Next, we considered the two time series A(1), . . . , A(5000) and A(5001), . . . , A(T ) and fit separate extreme mixture models. The thresholds and u were set to the empirical 6% and 94% quantiles of the respective time series. The mean parameter μ was higher for the second time period for all locations, and the highest difference was found for the central Red Sea. Further, σ was higher in the first period for most locations, except the most northern and southern locations. Finally, the 80% quantiles of the GPD for the lower tail were similar across the two periods, while the 80% quantiles of the GPD(ψ u , ξ u ) were on average slightly shorter in the second period. In addition to a temporal trend, we considered seasonality in the parameters of the extreme mixture model; the estimates were very similar for all seasons. In summary, we found a positive trend in the mean of the anomalies and a negative trend in their variation. Furthermore, the results indicated that the temporal trend in marginal behaviour varies across locations. To capture these features, one may define a spatio-temporal model for each of the parameters; however, we did not investigate this aspect further due to time constraints.