Rising extreme sea levels in the German Bight under enhanced CO 2 levels : a regionalized large ensemble approach for the North Sea

We quantify the change in extreme high sea level (ESL) statistics in the German Bight under rising CO 2 concentrations by downscaling a large ensemble of global climate model simulations using the regionally coupled climate system model REMO-MPIOM. While the model setup combines a regionally high resolution with the benefits of a global ocean model, the large ensemble size of 32 members allows the estimation of high return levels with much lower uncertainty. We find that ESLs increase with atmospheric CO 2 levels, even without considering a rise in the background sea level (BSL). Local increases of up to 0.5 m are found along the western shorelines of Germany and Denmark for ESLs of 20–50 years return periods, while higher return levels remain subject to sampling uncertainty. This ESL response is related to a cascade of an enhanced large-scale activity along the North Atlantic storm belt to a subsequent local increase in predominantly westerly wind speed extremes, while storms of the major West-Northwest track type gain importance. The response is seasonally opposite: summer ESLs and the strength of its drivers decrease in magnitude, contrasting the response of the higher winter ESLs, which governs the annual response. These results have important implications for coastal protection. ESLs do not only scale with the expected BSL rise, but become even more frequent, as preindustrial 50-year return levels could be expected to occur almost every year by the end of the century. The magnitude of the relative change in ESL statistics is hereby up to half of the expected rise in BSL, depending on the location. Changes in the highest extremes are subject to large multidecadal variations and remain uncertain, thus potentially demanding even further safety measures.


Introduction
Extreme high sea levels caused by storm floods constitute a major geophysical hazard for low-lying coastal regions such as the southeastern North Sea, known as the German Bight. Strong tidal oscillations due to the shallow shelf sea and the basin's geometry (Fig. 1), combined with its situation along the major northern hemispheric storm-track path can lead to particularly high storm floods. While these conditions have lead to several severe disasters such as the flood of 1962, they have also triggered various counteractions and regulations in coastal protection (MELUR 2012).
Anthropogenic climate change, however, may further alter the risks for coastal areas (Hinkel et al. 2015;IPCC 2019). While the expected rise in the background sea level (BSL) is-at least until the end of the century-relatively well-constrained (IPCC 2019; Slangen et al. (2014) and directly affects extreme sea level heights by shifting the entire distribution, there is less certainty about the relative changes in the sea level distribution, e.g., through changes in the storm climate (Feser et al. 2015). In particular, there is only little confidence on the change of upper-end extreme values (Weisse et al. 2012;Wahl et al. 2017), which by definition occur only rarely. Yet, it is those 'high-impact-lowprobability' extremes that are typically required for flood defense standards. While such height changes are often handled by simply scaling the sea level distribution with the regional or even global mean sea level rise (Buchanan et al. 2017;IPCC 2019), relative changes in the upper tail of Electronic supplementary material The online version of this article (https ://doi.org/10.1007/s0038 2-020-05357 -5) contains supplementary material, which is available to authorized users. the sea level distribution may substantially alter the risk for flooding. We here investigate the regional change in those extreme sea level statistics in the German Bight following an increase in atmospheric CO 2 by downscaling a large ensemble of global climate model simulations. A rise in the background state due to sea level rise is not taken into account. Several studies have investigated how storm induced water levels in the North Sea will change under global warming (von Storch and Reichardt 1997;Flather and Smith 1998;Langenberg et al. 1999;Lowe et al. 2001;Woth et al. 2006;Lowe and Gregory 2005;Sterl et al. 2009;Howard et al. 2010;Gaslikova et al. 2013;Weisse et al. 2014). They traditionally employ a regional barotropic storm surge model forced by atmospheric conditions from climate models, to compare time-slices of a present-day climate with one at the end of the century based on certain future climate change scenarios. While most studies point towards no or only small increases in the frequency or magnitude of storm floods, the rather short data samples, naturally limited by the length of the time slices (typically 30 years), limit the analysis to relatively frequent extremes and impede statistically robust conclusions on upper-end extreme events that happen only once per couple of years. Further, North Sea extreme sea levels have been shown to exhibit a large interannual to multidecadal variability , especially when dealing with more extreme and thus rare events (Lang and Mikolajewicz 2019), additionally stressing the need for large samples of extremes to separate a climate change signal from internal variability. Only a few studies have addressed this signal-to-noise issue for more high-impact events: using a 17-member initial-conditions ensembles and the SRES A1B scenario, (Sterl et al. 2009) found no changes in upper-end North Sea storm flood return levels above natural variability, although the specific focus was on Dutch flood defense standards and selected locations.
Yet, due to the lateral boundaries of the underlying storm surge models, none of the above studies employ a globally consistent climate scale simulation with full atmosphereocean coupling. Hence, they do not account for climaterelated sea level variations external to the North Sea, which have been shown to impact regional North Sea sea level on different scales (Chen et al. 2014;Calafat et al. 2012), nor for external surges (Gönnert and Sossidi 2011) that travel from the North Atlantic into the basin and can build an important component of the total storm surge induced water level, especially under climate change (Woth et al. 2006). Further, studies with only a regional domain can only indirectly relate ESL variations to large-scale drivers.
This study attempts to close these gaps as we (1) adequately sample the internal variability of ESLs by using a large ensemble of general circulation model (GCM) experiments, and (2) downscale this ensemble with a coupled climate model with a global ocean, but a regional zoom on the North Sea (see Mathis et al. 2018;Lang and Mikolajewicz 2019). This allows us to robustly investigate potential changes in high-impact-low-probability events following an increase in atmospheric CO 2 . By employing a total of 32 members, this marks to our knowledge the largest ensemble study using a coupled climate system model to investigate changes in the statistics of regional sea level extremes. The comparably large number of ensemble members leads to much smaller confidence intervals and allows the detection of changes in sea levels of even larger return periods.
This article is structured as follows: First, the model and downscaling setup as well as forcing and scenario are introduced (2); Second, results are presented for (i) changes between high and low CO 2 worlds (3.1), and for (ii) the transient response of extremes (3.2), followed by a seasonal analysis (3.3) and an investigation of the associated drivers in the climate system (3.4). We close with a discussion of the relevance of these results in the light of current emission pathways (4).

Model system and experimental design
We employ the regionally coupled climate system model REMO-MPIOM (Mikolajewicz et al. 2005;Sein et al. 2015;Mathis et al. 2018;Lang and Mikolajewicz 2019), which consists of the global ocean model MPIOM (Marsland et al. 2003;Jungclaus et al. 2013) and the regional atmospheric model REMO (Jacob and Podzun 1997), centered over Europe (Fig. 2). Inside the REMO domain, MPIOM and REMO are interactively coupled with the coupler OASIS-3 Fig. 1 Bathymetry of the German Bight with main study location Cuxhaven (Valcke 2013); outside, the ocean model is forced by output from the parent atmospheric model. This model setup is identical to the one used in (Lang and Mikolajewicz 2019), with the exception that he bottom friction has been modified to produce more realistic tidal amplitudes in the southeastern North Sea. The regional atmospheric model covers the wider European region, parts of northern Africa and the northeast Atlantic, and is run with a 0.44 • setup, corresponding to approx. 50 km grid spacing, and with 27 vertical levels. The ocean model is run on a stretched grid configuration with a nominal horizontal grid resolution of 1.5 • and 30 vertical layers, and includes the full luni-solar ephimeridic tidal potential according to Thomas et al. (2001). In order to maximize the grid resolution in the North Sea, the model's poles are shifted to Central Europe and North America, resulting in a horizontal grid resolution of up to 9 km in the German Bight, thus enabling a more realistic simulation of smallscale shelf processes. At the same time, the ocean model's global domain allows for a consistent simulation of signals across the lateral boundaries defined by REMO. This could be important for coastal sea levels through teleconnections, e.g., modulations in the Gulf stream or the Atlantic Multidecadal Oscillation (Ezer et al. 2016;Turki et al. 2019). To avoid the problem of dry-falling ocean grid-points due to strong tidal fluctuations, MPIOM's uppermost layer thickness is set to 16 m. For a discussion of the suitability in analysing storm floods including a validation using the Cuxhaven tide gauge record, see Lang and Mikolajewicz (2019).
As forcing, we use output from the 1pctCO2 simulations of the Max Planck Institute Grand Ensemble (MPI-GE, Maher et al. 2019), model version MPI-ESM1.1. The model is run in its low resolution configuration (resolution T63) with 47 vertical levels in the atmosphere model ECHAM (Giorgetta et al. 2012) and 1.5 • resolution and 40 vertical layers in the ocean model MPIOM (Jungclaus et al. 2013). The 1pctCO2 scenario covers 150 years, with the atmospheric CO 2 concentration increasing by 1% every year, from preindustrual levels to approximately a quadrupling at the end of the simulation period. The transient character of the simulation provides additional insights into the evolution of long-term variability. To tackle the large internal variability in sea level extremes (Lang and Mikolajewicz 2019), we downscale a set of 32 ensemble members (the maximum number of ensemble members with 6-hourly output that is required for downscaling), resulting in 32 × 150 = 4800 years of simulation. Initial conditions of the individual runs stem from different years of the corresponding downscaled PI-Control run. As we employ a boussinesq model and mainly target changes in ESL statistics, a transient rise in BSL is not taken into account. Since atmosphere-related sea level variations have been shown to act to a first approximation independently of the water depth (Lowe and Gregory 2005;Sterl et al. 2009;Howard et al. 2010), this should however not affect the relative variations in sea level extremes.

Extreme value sampling
Sea level can be described as the sum of a longer-term mean sea level, the tide and the atmospheric surge, as well as their non-linear interactions (Pugh 1987). Instead of treating the terms separately, we here investigate extreme sea levels in terms of total sea level above a long-term reference state. Since we are solely interested in high-water extremes, we use the term extreme sea level (ESL) to refer to the upper end of the distribution only.
We sample ESLs in terms of annual (or seasonal, respectively, for Sect. 3.3) maxima, based on hourly values. In order to not split up the winter storm flood season, we define a year as starting in July. To remove trends in the background sea level, but to still account for low-frequency variability, the centered 30-year running mean of the annual median sea level has been subtracted from each annual maximum value.
To characterize ESLs based on their probabilities, we rely on the commonly used concept of return levels (Coles et al. 2001), representing the water level expected to be exceeded on average once every x years. Such probability-based exceedance levels of assigned return periods are often required for designing coastal defense structures. As simulated data from the instrumental record or from the above mentioned future time slice experiments rarely cover more than a couple of decades, ESLs are typically estimated based on parametric extreme value analysis, resulting in a considerable extrapolation of the data. The advantage of a large ensemble approach is that it allows to infer upper-end extreme value statistics without the use of parametric methods. These non-parametric estimates have been inferred by first ranking the data points of each time slice of the full sea level ensemble and associating a cumulative probability to each value. The respective return periods are inferred from the sampled ESLs using the reciprocal of the probability of exceedance P E , defined as P E = m N+1 , where m is the rank of N events ordered in decreasing order. To increase the robustness of lower return periods, the underlying sample of extreme values has been increased to the highest 10 per year for this type of analysis.

Change in ESL statistics
For the analysis of climate-related ESL changes, we analyse pooled ESL statistics of all 32 ensemble members. We analyse the climate-related change in ESL statistics in two ways: first, we follow the common time-slice approach and compare two 30-year periods at the beginning and end of the 150-year period (climate change response, Sect. 3.1). The two time slices comprise the years 5-34 ("low CO 2 ", 305-403 ppm) and 120-149 ("high CO 2 ", 957-1278 ppm), corresponding to a global mean surface temperature (GMST) change of a bit more than four degrees. To avoid potential artefacts after the branching-off from the control run, we exclude the first 5 years of each run for the analysis. We define 'climate change response' as the difference of high and low CO 2 states. Secondly, we investigate the 150-year transient response (Sect. 3.2) by analysing pooled ensemble extreme value statistics of running 30-year periods. This analysis is followed by a seasonal analysis (Sect. 3.3) and an investigation of associated drivers in the climate system (Sect. 3.4).

Results
The simulated evolution of ensemble statistics of ESL together with a display of the two time slices used for the analysis of the climate change response is shown in Fig. 3 exemplary for Cuxhaven. With values between 2.5 and 6.5 m above the 30-year mean, the individual members exhibit a high spread. For comparison, we include the range of observation-based ESL using the same definition from the last 100 years of tide gauge record (green bar, data from the AMSeL project, see Jensen et al. 2011). For a more detailed validation of simulated extreme value statistics, see Lang and Mikolajewicz (2019).

Climate change response
The traditional approach to assess changes in sea level extremes under climate change is to compare the statistics of typically 30-year periods at the beginning and end of a certain scenario run (usually RCP8.5 or SRES A1B). However, with a single or only a few ensemble members one cannot robustly identify changes in the tails of the extreme value distribution. With the 32 members a 30 year length, a total of 32 × 30 = 960 years is available for each time slice. Hence, a comparison of 30-year segments of low and high CO 2 states can yield much higher levels of accuracy. Figure 4 shows ESL statistics at Cuxhaven in the two periods in terms of their return levels. The individual ensemble members of 30 year length result in a large spread of more than 2 m for 30-year return periods. With 32 ensemble members and effectively 960 years for each period, however, we can reduce the associated uncertainties, so that much higher return periods can be calculated without the need of fitting an extreme value distribution to the data. While individual ensemble members (thin lines) show a large spread As a reference, the overall mean ESL (i.e., the 1-year return level) is indicated by the dashed line. The range of ESL from the tide gauge record is shown by the green bar on the side. Time slices used for the climate change signals in low and high CO 2 worlds are marked on the x-axis and non-uniform climate change signals, the full ensemble (thick lines) shows that return levels generally increase with rising CO 2 . The change in ESLs is largest for return periods of around 20-30 years, where the higher CO 2 concentrations lead to a shift of around 30 cm relative to the mean state. The change is significant for extreme values up to return periods of around 33 years, based on both a two-sided student t-test ( = 0.05 ) and the 95% confidence intervals from a parametric extreme value analysis using a Generalized Extreme Value (GEV) distribution. For higher return periods, the confidence range of the extreme value estimates surpasses the climate change signal.
Is this climate change response sight-specific or is there a more general regional pattern visible? The spatial climate change response pattern of ESLs is shown in Fig. 5 for return periods of 20 and 50 years. While there is a slight increase in both ESL estimates in the entire German Bight, changes are most pronounced along the western Danish and German coasts, with maximum values of 0.4-0.5 m at around 55 • N close to the German/Danish border. As a consequence, ESLs that correspond to a certain return period in preindustrial times will thus also become more frequent over the wider area, with preindustrial 50 (20) year return levels occurring up to every 10-15 (6-8) years in a high-CO 2 world. This marks an increase by the factor three to five. Note that this analysis only considers the relative changes in extreme values with the long-term mean sea level subtracted. If a BSL rise due to thermal expansion or melting of ice sheets were included as well, these numbers would change even more (see Sect. 4).

Transient response
Additionally to comparing the statistics of two time-slices of low and high CO 2 states, the transient character of the simulation allows a more detailed analysis of temporal changes in extreme value statistics, including the detection of possible nonlinearities in the system or other forms of long-term variability, as a function of both time and atmospheric CO 2 concentrations.
The internal ESL variability, represented by the 30-year running mean of the ensemble standard deviation (see Supplementary Fig. S1) increases with rising CO 2 concentrations, but also exhibits strong multidecadal fluctuations by about 10 cm for the pooled ensemble. This long-term ESL variability is also manifest in the more extreme ESL measures, as the temporal evolution of different return level estimates based on pooled ESL statistics show (Fig. 6). As already seen in the time-slice approach above, ESLs increase accordingly with atmospheric CO 2 concentrations. Although the 20-year return level shows a statistically significant trend over the 150 years, the transient approach reveals a more detailed behavior: a period of low ESL variability between years 30 and 50 leads to a marked 'dint' in the gradual response, especially in the upper-end return levels, where it is of similar magnitude as the overall 150 year response. Pooled ESLs in this period rarely surpass the 5 m threshold (compare Fig. 3), thus leading to the described opposed signal in ESLs of higher return periods. Here, the signal emerges from variability already at around 800 ppm or around 3 • degrees of warming, when the ESL change exceeds the limits of the 95% confidence bounds. This impact of ESL variability on the robustness of the ESL signal for different ESL heights is in line with the climate change signal in Fig. 4. The presence of such marked multidecadal variability of individual members as well as the pooled ensemble suggests that the temporal variations of ESLs, especially of return periods above a couple of decades can be of similar magnitude as the climate change response.

Seasonality
Since annual maximum sea levels in the German Bight would typically occur during the main storm flood season in winter, responses in other seasons are thus masked in the above analysis. Yet, the individual seasonal responses might  (Camphuysen and Leopold 1994), even if we might not expect strong storm floods in those seasons. Figure 7 shows histograms of the frequency of the extreme sea levels occurring in a certain month for low and high CO 2 levels, sampled as annual maxima (i.e., 1-year return levels, top) and 30-year return levels (bottom). As expected, the bulk of ESLs occurs in the winter months, although the strongest annual floods can also occur in the summer months in individual years. This strong seasonality is even more evident for more extreme return levels (lower panel). Over time, however, a change in the seasonal distribution from low to high CO 2 state is evident, with the timing of ESLs becoming more confined to the winter months December and January. In fact, the most extreme storm floods in a high CO 2 world occur in our simulations only in the months from November to March.
This confinement to the winter season has implications for the seasonal climate change signal in return levels.  Figure 8 shows an equivalent analysis of return levels to Fig. 4, but split up into the four different seasons. The change in seasonality results in a stronger climate change signal in winter-only return levels than if based on annual maximum sea levels. If treating the four seasons individually, maximum winter ESL changes reach up to 60 cm for return periods of around 40 years. This is most likely due to the fact that a 'washing-out' of the return level changes due to the annual maximum appearing in different seasons is no longer possible with a seasonal analysis. Interestingly, the summer response shows opposing trends, with a reduction of storm flood heights in summer, with maximum change signals for ESLs of 60-80 year return periods. Autumn and Spring responses are only minor. That is, the seasonally opposing signals in winter and summer together with a shift of the annual maximum to the winter-months lead to a dampening in the year-long signal and accordingly to smaller increases in return levels.

Atmospheric drivers
What drives the change in ESL statistics? With the longterm background state removed, it is the meteorological conditions such as changes in wind speed or direction, the frequency of severe storms, both locally and large-scale, and pressure patterns that govern the atmospheric surge component of sea level.
Several studies have investigated the changes in European wind climate (for a review see Feser et al. 2015). Although an overall tendency for a poleward shift in storm activity is emerging, regional and model-dependent deviations from this large scale picture stress that the confidence in future changes in wind climate in Europe remains "relatively low" . As a main driver of surge levels, we here investigate changes in the regional 10 m wind climate as well as the large-scale storminess and the prevailing storm track categories.

Regional wind climate
The change in regional wind climate, expressed as the ensemble mean winter maximum (i.e., RV 1y ) wind speed change between high and low CO 2 worlds based on annual maximum 10 m wind speeds is shown in Fig. 9 (top left). Predominant westerly winds in the North Sea are stronger in the high-CO 2 world, while the opposite is true for parts of Southern Europe and the Mediterranean. The Northern Atlantic and parts of central Europe do not show significant signals. The annual maximum corresponds to approx. 22 m/s, i.e., around nine Beaufort, and should thus be a good measure of storminess. The 30-year return levels ( RV 30y ) of winter maximum wind speeds show a similar picture with an even stronger signal, although the statistical significance is more regionally confined. Yet, in the North Sea, statistically significant positive climate change signals in the regional wind climate are evident. With an increase in magnitude, also the predominant winter maximum wind direction changes slightly with rising atmospheric CO 2 concentrations, from WNW towards more more zonal W winds (Fig. 10). That is, the westerly winds which are responsible mainly for ESLs along the eastern part of the German Bight, i.e., the western German and Danish coasts, are becoming more frequent. This feature is consistent with the spatial pattern of the ESL changes, which show strongest climate change signals along these coasts. In summer, the simulation shows a regionally consistent and significant reduction in extreme wind speeds, both for RV 1y . and RV 30y (see Supplementary Fig. S.2). The predominant wind direction during summer does also change slightly, yet other than in winter rather towards more northerly directions (see Supplementary  Fig. S.3). The winter shift to stronger winds in the southern North Sea is consistent with results from studies by, e.g., (Beniston et al. 2007;Pinto et al. 2007) or Weisse et al. (2014) who, using a set of regional climate models (RCM) driven by ECHAM4 and HadAM3H, and ECHAM5 output respectively, found an increase in storm activity and in the wind climate over North and Baltic Seas. Similar increases of higher percentiles of westerly wind directions in the Southern North Sea have also been reported by Gaslikova et al. (2013) and Ganske et al. (2016) for different RCMs and the A1B scenario, although most changes in the latter were statistically insignificant. Concerning the more extreme wind speeds, however, (De Winter et al. 2013) found no significant change in the annual maximum wind speed in the North Sea from output of 12 CMIP5 models, with the here-used MPI-ESM showing no significant to slight positive changes in the southern North Sea. With respect to the 12 employed models which exhibit a range from significantly negative to significantly positive responses, the MPI-ESM ranks with its moderately positive response among the upper third of the employed models, with 3 GCMs showing a stronger response. However, with only three realizations, the sampling of internal variability may become a problem for the rather low signals in wind speed extremes.

Large-scale storminess
The large-scale storminess has been investigated using 2-5day bandpass-filtered (Blackmon 1976) sea-level-pressure variance. The resulting change between high and low CO 2 worlds shows an intensification of the North Atlantic storm belt in winter, esp. between 45 • and 60 • N in the northeastern Atlantic, as well as a shift towards a more zonal storm belt (Fig. 11). Again, the summer response is of opposite sign, with a broad reduction in storminess, although the change is smaller in absolute numbers (see Supplementary Fig. S.4). This seasonally opposing pattern is consistent with the seasonality of the ESL change and the increase in storm flood magnitude in (late) winter.

Fig. 9
Regional change (high-low CO 2 world) in winter maximum 10-m wind speeds with wind vectors for 1-year return periods (left) and 30-year return periods (right). Areas not significant at the 95% confidence level are striped out Fig. 10 Wind directions of winter maximum 10 m winds in a low and high CO 2 world (top) and their difference (bottom) at the location indicated in Fig. 9 Generally, there is a consistent cascade from changes in the large-scale storm climate towards changes in regional winds, which eventually produces a response in the statistics of ESLs. The response patterns of winter and summer seasons oppose each other; however, since North Sea wind speeds and ESLs are highest in the winter months, the yearly response of maximum wind speeds broadly follows the winter response.

Storm tracks
High ESLs in the German Bight are a result of storm floods induced by cyclones that travel from the North Atlantic into the North Sea and thereby generate large onshore winds that pile up water against the coast. Several authors have classified such storm-flood inducing strong cyclones according to their tracks. Analysing strong cyclones that led to severe storm floods registered at the Cuxhaven gauge since 1900, (Kruhl 1978) categorized them into two main types, the "North-West Type" and "West and South-West Type".
To investigate potential changes in the dominant storm track types, we perform a lagged Empirical Orthogonal Function (EOF) analysis (Weare and Nasstrom 1982) on SLP anomalies for each ESL event over the entire simulation period at Cuxhaven, with lead times of zero, 24 and 48 h. SLP anomalies have been calculated with respect to the winter-climatology. The advantage of a lagged EOF (or often 'extended EOF') analysis is that it incorporates both spatial and temporal patterns and can thus unravel dynamical structures such as propagating structures or oscillations.
The first EOF (Fig. 12) explains 60% of the variance and shows the predominant storm track with a path from Iceland over the southern Norwegian Sea towards southern Scandinavia. This track fits the "North-West Type" category and has been found to be responsible for most of the storm floods in the German Bight (Gerber et al. 2016;Ganske et al. 2018). The second and third EOFs show deviations from this track, e.g., a more northern or southern track (see Supplementary Figures S.6 and S.6).
By analysing the distribution of the corresponding principal components during both low and high CO 2 states, one can identify temporal changes in their relative frequency, and hence potentially shifts in the importance of each track type. Figure 12 shows histograms of the corresponding first principal component (PC1) for both low and high CO 2 states. It can be seen that the distribution shifts towards more positive values (significant on the 95% confidence level using a two-sided student-t test). The quantile-quantile plot suggests that this shift is partly due to an increase in the frequency of highest absolute values increases. This suggests that storms of this North-West-Type become more severe with rising atmospheric CO 2 levels. The statistics of the next PCs remains similar, indicating that there is-other than an intensification in the dominating track type-no change in the relative predominance of storm track types.

Discussion
The increase in ESL heights with rising atmospheric CO 2 is qualitatively consistent with the results from, e.g., (Lowe et al. 2001;Woth et al. 2006;Debernard and Røed 2008;Gaslikova et al. 2013) or Vousdoukas et al. (2017). Yet, these studies differ in the experimental design as they used barotropic storm surge models driven by atmospheric GCM data, in contrast to the coupled GCM-RCM setup described here. Further, given their smaller ensemble size, the increase in ESL heights is (1) statistically not distinguishable from internal variability (e.g., Gaslikova et al. 2013;Debernard and Røed 2008;Langenberg et al. 1999;Flather and Smith 1998;Sterl et al. 2009;von Storch and Reichardt 1997, or (2) only evident for more moderate ESLs with return periods of the order of magnitude of 1 year (Woth et al. 2006;Lowe et al. 2001). Here, we find statistically significant climate change signals of up to 30-40 cm in ESLs of return periods of up to approx. 30 years, with largest differences confined to the eastern German Bight (North Frisian Islands) and the Danish coast. These increases are in agreement with Vousdoukas et al. (2017) who used a hydrodynamic model forced by an 8-member climate model ensemble for RCP8.5. The spatial climate change response pattern of ESLs found here is consistent with the prevailing westerly wind directions during storms and their increase in magnitude in the high-CO 2 world.
The simulated increase in ESLs, independent of any climate-related background sea level rise, stresses the importance of foresighted planning of coastal defense structures and other adaptation measures. At the same time, the large ESL variability of individual members suggests that interannual to multidecadal deviations from the pooled ensemble signal are possible and may in fact mask an expected increase in ESL heights in the short term, or worse, overshoot an expected ESL pathway. This is especially evident for the upper tail of the ESL distribution, i.e., for floods of return periods of 50 years and more. Such multidecadal variability in ESLs might be related to variations in the large-scale atmospheric modes such as the NAO (Hurrell 1995) or other NAO-like patterns that have been linked to ESLs in the German Bight Lang and Mikolajewicz 2019).

ESL changes in the context of sea level rise
The results presented above only consider the relative change of ESL statistics and do not account for a sea level rise due to changes in volume, through, e.g., thermal expansion, the melting of ice caps or vertical land movement. Inclusion of these processes will, additionally to the presented changes in ESL statistics, lead to a shift of the entire sea level distribution, and thus amplify the rise in ESL heights.
To illustrate what such a superposition of background sea level rise (SLR) could effectively mean for probabilistic flood events in the German Bight region, we extend the analysis of return period changes in Sect. 3.1 by including estimates of SLR until the end of the century. With atmospheric CO 2 concentrations of about 1200 ppm and a change in GMST of more than four degrees at the end of the 150 years of simulation, the here underlying 1pctCO2 scenario is an idealized, yet-considering current emission pathways-rather drastic emission scenario. To set the results of the above analysis into the context of a more commonly used scenario, we translate the estimates of our 1pctCO2 simulation into the IPCC's RCP8.5 pathway by scaling them with GMST. That is, we calculate the change in ESLs not only until the end of the 150-year simulation period, but until the year when the GMST of 1pctCO2 ensemble mean equals the one from the corresponding RCP8.5 ensemble simulations. As sea level is expected to change differently over the globe, we use gridded estimates of end-of-the-century sea level rise for each component from (Carson et al. 2016), which is partly based on Slangen et al. (2014). These different components of the SLR comprise thermosteric and dynamic sea level (both taken from our simulation in year 125), as well as contributions from ice sheets of Antarctica and Greenland, melting of glaciers, glacial isostatic adjustment (GIA) and groundwater depletion (see Table 1). These estimates are similar to the IPCC AR5 (Church et al. 2013), but with a considerable upward correction concerning the dynamic ice contribution from Antarctica (see IPCC 2019). Together they result in around 0.9 m of SLR in the southern North Sea at the end of the century. Note though that all contributions, especially the GIA estimates, show considerable regional differences in the exact numbers. Comparing this number with the climate change signals from Sect. 3.1, shows that the magnitude of ESL changes alone without considering a SLR lies, at least at the coast, at around half of the region's end-of-the-century SLR estimate.
Without considering a change in ESL statistics, i.e., by simply shifting the distribution by the SLR estimate as often done for extreme sea level projections, for example in the IPCC SROCC (2019), a 50-year event would be reached every 10-5 years along the German Bight coast (Fig. 13a). Note that the SLR estimate corresponds to the IPCC's rather conservative "best estimate". Using the upper estimate with the highly uncertain contribution from the Antarctic ice sheet would yield an even stronger reduction in return periods.
However, by considering SLR together with the above described changes in ESL statistics, the climate change impact on return periods becomes even more drastic. Since it has been shown that a rise in BSL can, to a first approximation, be added linearly onto the atmosphere-induced water levels (Lowe and Gregory 2005;Sterl et al. 2009), we add the gridded SLR estimates onto the change in return period of a preindustrial 50-year return level (compare Fig. 5) until the RCP8.5 end-of-century equivalent year 125. The resulting change in the return period of a preindustrial 50-year return level is shown in Fig. 13b. Along the German Bight coast, a preindustrial 50-year event could thus be reached every five to three years at the end of the century. Note that the change factors are strongly determined by the local ESL variability: Locations along the coast where absolute ESLs are highest will experience rather moderate relative changes in return periods. Further offshore, SLR will exceed the here relatively low annual maximum values, hence leading to comparably high change factors. Yet, even along the coast, the return periods decrease by a factor of 10-20, thus leading to twice as high change factors as for SLR only. Accordingly, the 50-year return level would rise by more than 1 m if both BSL and ESL changes are considered. This underlines the importance of considering both ESL and BSL changes. A simple shift of the past sea level distribution to a higher mean could thus result in large and potentially costly errors.

Limitations
Two main caveats that add uncertainty to the presented results are introduced by (1) the model system and (2) the extreme value sampling.
Concerning (1), the ocean model used in this study has with a horizontal resolution of up to 9 km in the German Bight a relatively high resolution compared to traditional global ocean models. Yet, it is still too coarse to resolve local details of coastline and bottom topography. Similarly, its uppermost layer thickness of 16 m cannot resolve the finer vertical structures of the predominantly shallow waters in most coastal parts of the German Bight, especially along the flatter Wadden Sea coast. Here, the shallow waters and the strong tidal oscillations lead to coastline changes that are not accounted for in this model study. A direct comparison between simulated and observed sea levels should therefore be treated with caution. However, using data from the tide gauge record at Cuxhaven, (Lang and Mikolajewicz 2019) have shown that the regionally coupled climate system model REMO-MPIOM adequately simulates extreme sea level variations. Furthermore, although the large ensemble size enables an adequate sampling of the large internal variability, the choice of a single GCM-RCM model chain implies that the here presented results may be model dependent and thus that potential biases in the parent GCM could feed into Concerning the processes between the different sea level components, the non-linear interactions between tide, surge and mean sea level can be important, especially in shallow waters (e.g., Idier et al. 2019). While the tide-surge interaction (Horsburgh and Wilson 2007) is accounted for in this study, the effect of a rising mean sea level onto tide and surge, and their interaction is not taken into account here. For instance, a deeper sea can lead to shifts of the amphidromic points and further changes in the tidal range (Kauker and von Storch 2000;Plüß 2004). Also, surges diminish with a rise in sea level due to the decreased effect of surface wind stress on the water column (Arns et al. 2015). Yet, model experiments support the approximation of a linear superposition of surge and mean sea level (Lowe and Gregory 2005;Sterl et al. 2009;Howard et al. 2010), suggesting that mean sea level rise and surge can simply be added together and the changes in the two components studied separately, as done here. Finally, we have not considered the contribution to extreme sea levels from coastal waves, which add an additional hazard to coastal ecosystems.
Concerning (2), it has been shown that different measures of extreme events are sensitive to the choice of extreme value sampling index (e.g., Wahl et al. 2017). Several techniques have been applied to characterize ESL, from the use of block maxima (Méndez et al. 2007;Marcos et al. 2009) over percentiles (Woodworth and Blackman 2004;Dangendorf et al. 2013), to the selection of events over a given threshold (Méndez et al. 2006). The choice of the respective selection definition essentially represents a trade-off between bias (too many sampled extremes) and variance (too few sampled extremes) for the estimates. Here, we have tested several approaches, leading to no qualitative difference to the presented results. The choice of the block maxima method to represent ESLs has the advantage that it is robust to temporal variations in their magnitude. Instead of relative definitions such as 'surge residual' or 'skew surge' we here use a 'direct' method for sampling ESLs, because it eliminates the need to differentiate between tidal and surge parts and their nonlinear interaction. Further, as most studies with small ensembles rely on much fewer sampled events, an extrapolation by fitting an assumed extreme value distribution is common practice. Yet again, the choice of different extreme value distribution fits onto the same data can lead to substantial differences in the estimated extreme value indices (Wahl et al. 2017). In contrast, the large sample of extreme values due to the ensemble approach here allows the direct computation of extreme indices without the need to rely on such parametric methods.
Finally, it should be noted that irreducible uncertainties in future sea level extremes will remain regarding emission scenario (rather important in the long-term) and initial conditions representing the internal variability of both BSL and atmosphere-induced ESL variations as well as their regional variations (rather important in the short term).

Summary and conclusion
In this study we quantified the changes in North Sea extreme sea level statistics using a large ensemble of transient climate change simulations, downscaled with a coupled climate system model focusing on the southern North Sea. The 32 ensemble members have produced a large enough sample for a more robust analysis of higher extreme values, which can give new insights into future changes of the often desired high-impact-low-probability events. At the same time, the coupled climate model approach with a global ocean allows the free propagation of signals from the North Atlantic to the North Sea as well as a consistent analysis of related drivers in the climate system. Specifically, we have found that ESL heights increase in terms of return levels, with significant changes for return periods of up to approx. 30 years. The change is most pronounced in the winter season, while the summer shows an opposite response as ESL heights reduce with increasing atmospheric CO 2 concentrations. Analysing changes in the underlying atmospheric drivers, we found that these changes in ESL statistics are mainly driven by a cascade of a large-scale, hemispheric response (increased activity along the North Atlantic storm belt) to a regional increase in wind speed maxima, eventually leading to changes in surge heights and ESL statistics. The seasonality of changes in ESL statistics is also evident in the response of atmospheric drivers.
Extreme high sea levels will thus not only scale with the expected change in BSL, but may become even more extreme due to a widening of the sea level distribution. The resulting change in ESL alone accounts hereby for around half of what we expect the background sea level to rise in the region. A high temporal variability on interannual to multidecadal scales, particularly of the highest extreme values, though may complicate the picture in reality, as the variations in a single realization-or the real world for that matter-may temporally exceed or counteract the climate change signal. Opposing seasonal patterns suggest that the time of largest ESLs will become more confined to the winter months in a future warmer world.
Even though the 1pctCO2 scenario can be regarded as an idealized scenario, the qualitative conclusions can still hold for present-day emission pathways and thus be of use for decision-making in coastal protection. The combined effect of mean and extreme sea level rise could surpass traditional estimates based on a simple shift of the sea level distribution. This implies that ESL conditions as during preindustrial times may occur almost every year at the end of the century and thus stresses the necessity of timely adaptation and coastal protection measures.