Advancing global storm surge modelling using the new ERA5 climate reanalysis

This study examines the implications of recent advances in global climate modelling for simulating storm surges. Following the ERA-Interim (0.75° × 0.75°) global climate reanalysis, in 2018 the European Centre for Medium-range Weather Forecasts released its successor, the ERA5 (0.25° × 0.25°) reanalysis. Using the Global Tide and Surge Model, we analyse eight historical storm surge events driven by tropical and extra-tropical cyclones. For these events we extract wind fields from the two reanalysis datasets and compare these against satellite-based wind field observations from the Advanced SCATterometer. The root mean squared errors in tropical cyclone wind speed reduce by 58% in ERA5, compared to ERA-Interim, indicating that the mean sea-level pressure and corresponding strong 10-m winds in tropical cyclones greatly improved from ERA-Interim to ERA5. For four of the eight historical events we validate the modelled storm surge heights with tide gauge observations. For Hurricane Irma, the modelled surge height increases from 0.88 m with ERA-Interim to 2.68 m with ERA5, compared to an observed surge height of 2.64 m. We also examine how future advances in climate modelling can potentially further improve global storm surge modelling by comparing the results for ERA-Interim and ERA5 against the operational Integrated Forecasting System (0.125° × 0.125°). We find that a further increase in model resolution results in a better representation of the wind fields and associated storm surges, especially for small tropical cyclones. Overall, our results show that recent advances in global climate modelling have the potential to increase the accuracy of early-warning systems and coastal flood hazard assessments at the global scale.


Introduction
Flooding of densely populated and low-lying coastal areas has large socio-economic impacts all around the world (Jongman et al. 2012). Coastal flooding is generally driven by storm surges which are caused by low mean-sea level pressure (MSLP) and strong 10-m winds (U10), such as those in tropical cyclones (TCs) and extratropical cyclones (ETCs). The surge component and the tidal component together make up the total water level (TWL) (Pugh 1996). Numerical models that accurately simulate TWLs along the world's coastline are essential for operational forecasting (Verlaan et al. 2015) and hindcasting of extreme TWL events (Powell et al. 2010). Moreover, at longer timescales, such models can help to prioritize adaptation efforts by assessing which regions may see an increase in flooding frequency due to climate change (Vousdoukas et al. 2018), and identify the drivers of decadal to interannual sea level variability (Muis et al. 2018).
Factors that determine the height of an extreme TWL and its impact are storm characteristics, tidal phase, bathymetry and coastline geometry. TCs can have lower MSLPs and stronger U10s than ETCs (Keller and DeVecchio 2016), resulting in a higher storm surge. On the other hand, ETCs are generally larger in size than TCs, thereby affecting a larger coastal area (Irish et al. 2008). The TWL is also affected by the tidal phase, where the tidal range is generally larger in the mid-latitudes compared to the tropics (Ngodock et al. 2016). High TWLs occur at both high and low tide in areas with a small tidal range, while high TWLs generally coincide with high tide in areas with a large tidal range. The bathymetry also strongly influences the TWL: a strong TC or ETC moving over deep waters before making landfall will generate only a small storm surge, while a much higher storm surge will develop over a broad shallow continental shelf (Resio and Westerink 2008). Lastly, whether flooding will occur depends on the coastal geometry, as high TWLs are more likely to inundate e.g. a low-lying estuary compared to a coastal area with a steep sloping coastline (Bloemendaal et al. 2019).
To simulate TWLs for forecasting and hindcasting purposes and climate change impact assessments, hydrodynamic models such as the Global Tide and Surge Model (GTSM) (Verlaan et al. 2015) and the HIROMB-BOOS Model (Berg and Poulsen 2012) are used. Most global tidal models solve hydrodynamic equations to simulate the tide and assimilate satellite altimetry data to improve the accuracy (Stammer et al. 2014). Tides within GTSM, however, are solely based on the tidal potential (Irazoqui Apecechea et al. 2017). The storm surge is simulated by forcing a hydrodynamic model with MSLP and U10 fields, the latter consisting of a meridional (v10, northward) and zonal (u10, eastward) wind component (Belmonte Rivas and Stoffelen 2019). This meteorological forcing can be taken from climate reanalyses, operational forecasts, or climate simulation datasets. Despite such meteorological forcing being available on a global scale nowadays (e.g. Saha et al. 2014; Hersbach et al. 2019), studies on extreme TWLs have mostly been conducted on regional to continental scales (Woth et al. 2006; Westerink et al. 2008; Haigh et al. 2014b) since they require computationally demanding high resolution simulations in coastal areas. The implementation of unstructured grids in a hydrodynamic model (Kernkamp et al. 2011) allows for a spatially varying grid resolution and makes it possible to refine the spatial grid in coastal areas and shallow seas, while having coarse grid sizes in open water. This greatly reduces the computational costs while maintaining high accuracy, and has opened the way for the global modelling of TWL in coastal areas with sufficient resolution. By forcing GTSM with ERA-Interim reanalysis data (Dee et al. 2011), Muis et al. (2016) developed the Global Tide and Surge Reanalysis dataset (GTSR). This dataset consists of time series of TWLs along the world's coastline for the time period 1979-2014, as well as estimates of the exceedance probabilities of extreme TWLs.
GTSR has been applied in various flood risk assessments, such as a case study on the Ganges Delta (Ikeuchi et al. 2017), and a global cost-benefit analysis of large-scale coastal flood protection (Lincke and Hinkel 2018). A major limitation of the GTSR dataset, however, is that extreme TWLs related to TCs are underestimated (Muis et al. 2016). This is because ERA-Interim's horizontal resolution of 0.75° × 0.75° (± 79 km at the equator) and temporal resolution of 6 h are insufficient to fully resolve small scale features, size and tracks of TCs (Schenkel and Hart 2012; Murakami 2014; Hodges et al. 2017; Ridder et al. 2018).
In 2018 the European Centre for Medium-range Weather Forecasts (ECMWF) launched its fifth generation climate reanalysis dataset called ERA5, which has a horizontal resolution of 0.25° × 0.25° (± 31 km at the equator) and a temporal resolution of 1 h (Hersbach et al. 2019). Based on previous research, the increase in resolution with respect to ERA-Interim is expected to substantially improve the modelling of TC related storm surges (Bloemendaal et al. 2019). In the future the ECMWF will continue to improve the quality of the reanalysis products, following the improvements that are implemented in ECMWF's operational forecasting system, the Integrated Forecasting System (IFS). IFS serves as the climate model for the production of the reanalysis datasets. It is expected that the production of the next generation global climate reanalysis (ERA6) starts at the ECMWF from 2023 onwards (Hersbach et al. 2019). While the current operational IFS version has a horizontal resolution of 0.125° × 0.125° (~ 16 km at the equator) (ECMWF 2019a), it is uncertain to what extent the resolution of the future ERA6 reanalysis can be increased, compared to ERA5 (Hersbach et al. 2018). Overall, the increases in horizontal resolution between ERA-Interim, ERA5, and the future ERA6 reanalysis are expected to be beneficial for storm surge modelling. However, the exact increase in performance of global storm surge models following from the reanalysis upgrades is unknown.
This study aims to evaluate the improvement in storm surge modelling gained by using the new ERA5 reanalysis over ERA-Interim. In addition, to assess the value of a future higher resolution reanalysis dataset, we compare ERA5 to IFS, which serves as the underlying climate model for such reanalysis datasets. For this, we select eight historical storm surge events including five TCs and three ETCs. To assess how well their wind fields are represented in the meteorological datasets, we compare modelled U10 against satellite-based observations of U10. Subsequently, we simulate storm surges by forcing GTSM with the different meteorological forcings, and compare simulated and observed surge heights. Finally, we discuss the reanalysis upgrades from different user perspectives, which can function as a guideline for users of ERA-Interim who are planning to switch to ERA5.

Methodology
The methodology is illustrated in Fig. 1. First, we describe the selection criteria of the 8 historical events and the ERA-Interim, ERA5 and IFS datasets, and give a brief overview of the IFS versions on which they are based (Sect. 2.1). From here onwards, 'IFS' refers to the meteorological fields produced with ECMWF's operational forecasting system. Second, we describe how modelled U10 fields are compared against satellite-based observations of U10 to examine the representation of TCs and ETCs (Sect. 2.2). Subsequently, GTSM is forced with the retrieved U10 and MSLP fields to simulate the associated surge heights (Sect. 2.3). To evaluate the performance of the different meteorological forcing datasets, we compare modelled and observed storm surges for the eight historical events (Sect. 2.4). Table 1 lists the key characteristics of ECMWF's reanalyses and forecasting systems. We observe major improvements in the model and product specifications over time, especially in terms of spatial and temporal resolution (ECMWF 2019a). To assess which type of storm benefits the most from the improvement in climate model resolution we examine five TC events and three ETC events. We only consider storms that occurred after July 2017 to allow for comparison of ERA-Interim and ERA5 with more recent versions of IFS which have a higher resolution. The five TCs are: TC Irma (2017), TC Florence (2018), TC Michael (2018), TC Mangkhut (2018), and TC Jebi (2018). In addition we investigate one ETC and two TCs that underwent extratropical transition, indicated as TC-ETC, before making landfall: ETC Grayson (2018), TC-ETC Ophelia (2017) and TC-ETC Leslie (2018). Track data is retrieved from the International Best-Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010) at 3 hourly intervals for TCs Irma, Florence, and Michael, and from the National Oceanic and Atmospheric Administration (NOAA) at 6 hourly intervals for TCs Mangkhut and Jebi (NOAA 2018). 
The tracks and intensities of the 8 historical storm events are shown in Fig. 2. The radius of maximum winds (Rmax), which is the distance between the centre of a cyclone and its band of strongest winds (Irish et al. 2008), is used as an indicator of the size of a TC's eye. We calculate Rmax at landfall by taking the average distance between the TC's centre and maximum U10 in eight directions, where the centre is defined as the location with the lowest MSLP. For this, we use the IFS dataset which has the highest spatial resolution and is therefore expected to better represent the MSLP and U10 gradients in TCs (Bloemendaal et al. 2019). Since ETCs often consist of a frontal system with the strongest winds being found near the warm front, we do not calculate Rmax for the ETCs. Table 2 summarizes the main characteristics of each storm in terms of landfall location, observed MSLP and U10, and observed surge height.
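The estimate of the radius of maximum winds described above can be illustrated with a minimal sketch. It assumes MSLP and U10 fields on a regular Cartesian grid with uniform spacing in km, whereas the actual datasets are on latitude-longitude grids. The function name `estimate_rmax`, the ray sampling along grid rows, columns and diagonals, and the synthetic storm are all hypothetical and are not taken from the study's processing chain.

```python
import math

# Unit steps (row, column) for the eight compass directions.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def estimate_rmax(mslp, u10, spacing_km):
    """Mean distance from the pressure minimum to the wind maximum along eight rays."""
    n_rows, n_cols = len(mslp), len(mslp[0])
    # Storm centre: the grid cell with the lowest MSLP.
    centre = min(
        ((r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda rc: mslp[rc[0]][rc[1]],
    )
    distances = []
    for dr, dc in DIRECTIONS:
        best_speed, best_dist = -1.0, 0.0
        r, c, k = centre[0] + dr, centre[1] + dc, 1
        # Walk outward along the ray until the edge of the grid is reached.
        while 0 <= r < n_rows and 0 <= c < n_cols:
            if u10[r][c] > best_speed:
                best_speed = u10[r][c]
                best_dist = k * spacing_km * math.hypot(dr, dc)
            r, c, k = r + dr, c + dc, k + 1
        if best_speed >= 0.0:
            distances.append(best_dist)
    return sum(distances) / len(distances)


# Synthetic, axisymmetric storm (hypothetical values): wind peaks 50 km from the centre.
n, spacing_km = 41, 10.0


def _dist_km(r, c):
    return spacing_km * math.hypot(r - 20, c - 20)


mslp = [[1010.0 - 60.0 * math.exp(-_dist_km(r, c) / 100.0) for c in range(n)] for r in range(n)]
u10 = [
    [40.0 * (_dist_km(r, c) / 50.0) * math.exp(1.0 - _dist_km(r, c) / 50.0) for c in range(n)]
    for r in range(n)
]
rmax_km = estimate_rmax(mslp, u10, spacing_km)
```

On a coarse grid the diagonal rays sample the wind profile at different distances than the row and column rays, so the eight-direction average can differ somewhat from the true radius of the wind maximum.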

Wind speed validation
For the eight historical events we compare U10 from ERA-Interim, ERA5, and IFS against satellite observations from the Advanced SCATterometer (ASCAT). ASCAT is an active radio instrument mounted on a satellite which illuminates the sea surface with two 550 km-wide beams of microwave radiation (Figa-Saldaña et al. 2002;KNMI 2016). The long wavelengths of microwaves, ranging from approximately 1 cm to 1 m, are not susceptible to atmospheric scattering which allows them to penetrate through clouds, haze and dust. By measuring the backscattered radiation from three angles, with antennas oriented at 45°, 90°, and 135° with respect to the satellite track, the sea surface roughness and corresponding wind speed can be determined.
The resulting dataset has a grid size of 12.5 km × 12.5 km and a revisiting time of 12 h, and covers 65% of the global oceans. To assess the performance of the climate models with respect to ASCAT, we calculate the Root Mean Square Error (RMSE), mean absolute error (m/s), relative bias (%), and Pearson's correlation coefficient.
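The four performance indicators can be sketched as follows. This is a minimal illustration rather than the code used in the study. The function name `validation_metrics` and the sample values are hypothetical, and the relative bias is assumed here to be the mean error divided by the mean observation, a definition the text does not state explicitly.

```python
import math


def validation_metrics(modelled, observed):
    """RMSE, mean absolute error, relative bias (%) and Pearson's r for paired samples."""
    n = len(observed)
    errors = [m - o for m, o in zip(modelled, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_m = sum(modelled) / n
    mean_o = sum(observed) / n
    # Assumed definition: mean error relative to the mean observed value.
    relative_bias = 100.0 * (mean_m - mean_o) / mean_o
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modelled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    pearson_r = cov / math.sqrt(var_m * var_o)
    return {"rmse": rmse, "mae": mae, "relative_bias": relative_bias, "pearson_r": pearson_r}


# Hypothetical paired wind speeds in m/s (modelled U10 vs. scatterometer U10).
modelled_u10 = [18.0, 22.0, 27.0, 30.0]
observed_u10 = [20.0, 25.0, 30.0, 35.0]
metrics = validation_metrics(modelled_u10, observed_u10)
```

A negative relative bias, as in this example, corresponds to a model that underestimates the observed wind speed on average.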
To evaluate the performance of the meteorological datasets, we define the locations where we compute the performance indicators. We generate a set of coordinates on a grid (Fig. 3), for which we compare U10 from the meteorological datasets with observed U10. For TCs, this grid extends up to 500 km from the coastline at 0.2° resolution. For ETCs, we use a grid extending up to 1000 km away from the coastline at 0.5° resolution. This is because ETCs generally consist of large frontal systems, extending over hundreds of kilometres (Houze et al. 1976; Evans and Hart 2008). Moreover, ETCs have a higher translation speed compared to TCs (Keller and DeVecchio 2016), which advocates for using a larger grid to ensure more ASCAT visits within the passing of the ETC. For all coordinates on the generated grid we compare the nearest available observed U10 from ASCAT with modelled U10 from ERA-Interim, ERA5 and IFS. Since ETCs generally have lower wind speeds than TCs (Keller and DeVecchio 2016), we set a threshold of 20 m/s for observed U10 from ASCAT. For TCs we consider the winds within a 100 and 200 km radius from the TC's eye, hereby including the TC eyewall, where maximum U10 values are generally found (Chavas and Emanuel 2010; Carrasco et al. 2014; Takagi and Wu 2016).
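The selection of grid points within a fixed radius of the TC eye can be sketched with a great-circle distance. This is a minimal illustration that assumes a spherical Earth with a radius of 6371 km. The function names, the eye position and the grid extent are hypothetical and only mimic the 0.2° grid spacing described above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_within_radius(points, eye, radius_km):
    """Keep the (lat, lon) points that lie within radius_km of the eye position."""
    return [p for p in points if haversine_km(p[0], p[1], eye[0], eye[1]) <= radius_km]


# Hypothetical 0.2 degree grid around a hypothetical eye position.
eye = (34.0, -77.0)
grid = [(32.0 + 0.2 * i, -79.0 + 0.2 * j) for i in range(21) for j in range(21)]
within_100 = points_within_radius(grid, eye, 100.0)
within_200 = points_within_radius(grid, eye, 200.0)
```

In an actual comparison the eye position would follow the track data in time, so the selection would be repeated for every ASCAT overpass.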

Storm surge modelling
Storm surges are simulated with GTSM version 3 (Verlaan et al. 2015). GTSM is a global depth-averaged hydrodynamic model based on the Delft3D Flexible Mesh software (Kernkamp et al. 2011). This software allows the computational grid to be refined locally through the use of unstructured grids. The cell size is mainly dependent on the bathymetry and the resolution increases from 25 km in deeper parts of the ocean to 2.5 km (~ 1.25 km in Europe) in shallow coastal areas. Wind and pressure fields used to force GTSM are first linearly interpolated to the computational grid underlying GTSM. A constant Charnock parameter is applied with a value of 0.041 to translate wind speed into wind stress. This value is selected because it is most consistent with the value used by the ECMWF to translate wind stress into wind speed. The bathymetry in GTSM is obtained from the General Bathymetric Chart of Oceans 2014 dataset (GEBCO; 30 arc sec resolution) (Weatherall et al. 2015), and for Europe, the higher resolution 15 arc sec EMODnet Bathymetry is used (Calewaert et al. 2016).
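How a constant Charnock parameter translates wind speed into wind stress can be illustrated with the textbook Charnock relation combined with a logarithmic wind profile. This is a sketch under stated assumptions and not the GTSM implementation: the von Kármán constant (0.4), air density (1.225 kg/m³), the fixed-point iteration and the function names are all assumptions, and only the Charnock value of 0.041 comes from the text.

```python
import math

GRAVITY = 9.81        # m/s^2
VON_KARMAN = 0.4      # assumed von Karman constant
RHO_AIR = 1.225       # kg/m^3, assumed air density
CHARNOCK = 0.041      # constant Charnock parameter quoted in the text


def drag_coefficient(u10, charnock=CHARNOCK, iterations=50):
    """Solve z0 = charnock * ustar^2 / g and Cd = (kappa / ln(10 / z0))^2 by iteration."""
    cd = 1.2e-3  # first guess
    for _ in range(iterations):
        ustar_sq = cd * u10 * u10              # friction velocity squared
        z0 = charnock * ustar_sq / GRAVITY     # Charnock roughness length
        cd = (VON_KARMAN / math.log(10.0 / z0)) ** 2
    return cd


def wind_stress(u10):
    """Wind stress magnitude in N/m^2 for a 10-m wind speed in m/s."""
    return RHO_AIR * drag_coefficient(u10) * u10 * u10


cd_10, cd_30, cd_50 = (drag_coefficient(u) for u in (10.0, 30.0, 50.0))
```

With a constant Charnock parameter the drag coefficient keeps growing with wind speed in this formulation, so the stress increases faster than the square of the wind speed.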

Storm surge validation
For validation we compare time series of simulated surge height against available observations from NOAA for TCs Irma, Florence and Michael (NOAA 2019), and the British Oceanographic Data Centre (BODC 2019) for TC-ETC Ophelia. For the remaining historical events, we compare the maximum surge heights between the different forcing datasets because observed time series of TWL and tides are not available for these storm events. We consider all tide gauge stations that are (1) less than 500 km away from the cyclone's track; and (2) where a storm surge of at least 0.5 m occurred. To evaluate the model performance, we calculate the absolute (m) and relative bias (%), Pearson's correlation coefficient, mean absolute error (m), and the RMSE.
All time series are referenced to mean sea-level (MSL). At some tide gauge stations we observe a constant offset between the observed sea level and the predicted tide. To correct for this offset, we adjust the surge height by adding the difference between the mean of the predicted tide and the mean of the observed sea level. For the three TCs making landfall in the United States we use high water marks available from the United States Geological Survey (USGS 2019) to create a better spatial coverage. We subtract the modelled tide from the high water marks and transform the reference datum from NAVD88 to MSL to allow for comparison with modelled maximum surge heights. The overall pattern of the modelled surge over time can differ strongly between adjacent grid points, especially at locations near barrier islands and estuaries where inlets are smaller than the resolution of GTSM and connections consist of a single cell only. Therefore, we select all coastal grid points in GTSM within 10 km from the tide gauge station, and select the grid point that simulates the surge height with the smallest bias for ERA5.
Figure 4a shows modelled and observed U10, and it can be seen that ERA-Interim systematically underestimates the wind speed of TCs compared to observations. Spatial plots of TC wind fields from ERA-Interim (Supplementary Materials Figures A1, A2) show a poorly developed eyewall structure for all TCs, implying that the intensification of TCs is insufficient. The least-squares line fitted to ERA-Interim (Fig. 4a) is close to horizontal, showing the very poor ability of ERA-Interim to capture strong TC winds. Averaged across all observations, ERA-Interim has a negative bias of − 24.9%. When only considering the more extreme winds, the negative bias of ERA-Interim increases to − 41.3% (Supplementary Materials Table A1). For ETCs there is a much smaller underestimation of wind speed by ERA-Interim (Fig. 4d). The least-squares line is closer to the 1:1 line, although there is a considerable negative bias of − 10.4% for ETC winds exceeding 20 m/s (Table 3). Stopa and Cheung (2014) also found a consistent low variability of ERA-Interim in comparison to observations which is indicative of a model that is not able to capture extreme winds, and identified an average wind speed bias of − 10% for ERA-Interim's 99th percentile.

ERA5
When comparing the performance of ERA5 with ERA-Interim, there is a large decrease of the RMSE and relative bias of 58% and 98% in ERA5, respectively, for all TC winds within a 200 km radius (Table 3). The improved representation of TC intensity by ERA5 (Fig. 4b) compared to ERA-Interim can be attributed to ERA5's higher spatial and temporal resolution. Another reason for this large improvement is that the data assimilation scheme of ERA5 includes ASCAT wind speed, while ASCAT data is not being assimilated in the production process of ERA-Interim (Table 1). For ETCs we observe a smaller improvement (Fig. 4d, e) with reductions of the RMSE and relative bias of 37% and 39%, respectively. This can be explained by the large-scale structure of ETCs and the smaller gradients in MSLP and U10 compared to TCs, for which ERA-Interim's coarse resolution is more adequate. Belmonte Rivas and Stoffelen (2019) compared ERA-Interim and ERA5 wind speed against ASCAT wind speed and reported a 20% decrease in RMS wind speed agreement with ERA5 compared to ERA-Interim. This suggests that the improvement for extreme winds is stronger than for the more average conditions, which is consistent with our findings.
Fig. 3 Ocean grid to validate model U10 with observed U10 from ASCAT for TC Florence. The recorded TC track is shown by the black dashed line. Radius of 100 (red) and 200 (yellow) km used for selection of data is shown

IFS
In general, the improvements from ERA5 to IFS do not lead to a further improvement of the representation of TC wind fields. While for example the RMSE decreases, the bias increases with IFS compared to ERA5 (Table 3). We note a tendency of IFS to overestimate U10 compared to observations for the highest TC wind speeds (Fig. 4c). This implies that wind speeds exceeding 25 m/s are either overestimated by IFS or underestimated by ASCAT. The latter hypothesis is supported by Chou et al. (2013), who compared ASCAT wind characteristics with dropwindsonde observations of TCs and found an underestimation of the highest wind speeds by ASCAT. Furthermore, ASCAT U10 is derived using a vertical-vertical (VV) polarization   (Mori et al. 2014), for which IFS also underestimated the intensity (ECMWF 2019b). This was likely due to a combination of several factors. In the assimilation process observation thinning is applied on ASCAT U10 such that from every four observations only one is actively assimilated with a horizontal resolution of approximately 100 km as a result (Laloyaux et al. 2016). Also, ASCAT data are discarded when exceeding 35 m/s, resulting in a negative wind speed bias (De Chiara et al. 2016). In addition, the deep and tight core of Haiyan requires a very high spatial resolution (1-4 km) to correctly represent the strong U10 and MSLP field gradients (Magnusson 2014;Mori et al. 2014). For ETC wind fields we observe a small decrease of the relative bias from − 6.4% to − 4.8% in IFS compared to ERA5, while the other parameters do not show a clear difference in performance (Table 3). This seems to suggest that the resolution of ERA5 is sufficient to accurately represent ETCs, and the performance will not improve much further by only increasing resolution. Figure 5 shows maximum simulated and observed surge heights for TCs Irma, Florence and Michael, and TC-ETC Ophelia. Forcing GTSM with ERA-Interim results in large underestimations of the surge height for all TCs. 
For TC Irma (Fig. 5a), the underestimation of the surge height is especially large at the Florida Keys, where GTSM simulates a surge height of 0.88 m whereas 2.64 m was observed (− 66%). Further north along the coast of Georgia modelled surge heights are in much better agreement with observations. This can be explained by the evolution of TC Irma's wind field after landfall: after Irma moved over the Florida peninsula, the compact wind field started to expand and wind and pressure gradients dropped (Cangialosi et al. 2018). At this point, ERA-Interim's horizontal resolution of 0.75° × 0.75° (± 79 km at the equator) is sufficient to resolve these gradients, whilst at the moment of landfall, when the Rmax of Irma was 52.7 km, this model resolution was too coarse. Across all observation stations the average bias is − 0.32 m. For TC Michael, the surge heights are strongly underestimated between Mexico Beach and Panama City, where the strongest onshore winds occurred due to the onshore orientation of the wind field. TC Michael's size was too small to be accurately represented by ERA-Interim. As a result, small-scale wind speed gradients are averaged out over larger grid cells in ERA-Interim. This results in an average bias of − 0.28 m across all stations and a modelled maximum surge height of 0.55 m at Mexico Beach where the observed maximum surge height is 4.66 m (− 88%). In contrast to TC Michael, TC Florence had a very large wind field and a low forward speed of 9 km/h (Stewart and Berg 2019). As such, ERA-Interim was expected to be able to capture the TC's intensity relatively well. However, although ERA-Interim correctly represents the structure of the wind field (Supplementary Materials Figure A1j), the wind speeds are underestimated, resulting in a weak surge modelling performance. Compared to the TCs, the underestimation of surge heights for ETCs is relatively small.
This is for instance illustrated by TC-ETC Ophelia which made landfall with hurricane-force wind speeds in Ireland (Stewart 2018). The spatial distribution of the more extreme surge heights is simulated well by GTSM when forced with ERA-Interim (Fig. 5j). Furthermore the relative and absolute bias of the highest observed surge at Heysham is − 30% and − 0.70 m, respectively, which is much smaller compared to the TCs.

ERA5
Modelled surge heights are in much better agreement with observations when using ERA5 instead of ERA-Interim (Fig. 5), especially for TCs Irma and Florence. For these TCs, the simulated surge height increases approximately from 0.8 m with ERA-Interim to 2.5 m with ERA5 near the area of landfall. The performance also improves (  (Fig. 6). The most likely explanation for this finding is that, similar as was found for ERA-Interim, Michael's eye is too small to be represented correctly by the resolution of ERA5. The underestimation could also be related to the local bathymetry, such as barrier islands and semi-enclosed bays, which are not well resolved in GTSM. GTSM's maximum storm surge height representation improves outside the eyewall of TC Michael. South of Tallahassee, outside the eyewall structure, where gradients in pressure and wind are less steep, a maximum surge height of 2.54 m was registered while GTSM simulates a surge height of 2.35 m when forced with ERA5. For

IFS
Forcing GTSM with IFS instead of ERA5 has varying effects on the performance of GTSM (Fig. 5). For TC Florence and TC-ETC Ophelia there are no noteworthy differences in model performance, implying that the resolution of ERA5 is sufficient to represent the relatively large wind fields of these storms correctly. For TC Irma, GTSM overestimates the surge heights when forced with IFS and the validation indicates an overall decrease in performance compared to ERA5 (Table 4). This could be caused by the overestimation of Irma's wind speed by IFS, as shown by the relative wind speed bias of + 5.34% (Supplementary Materials Table A2). Another possible explanation for the overestimation is the use of a constant Charnock parameter (Charnock 1955) with a value of 0.041 to calculate the wind drag coefficient in GTSM, which is used to transform wind speed into wind stress. At higher wind speeds (U10 > 30 m/s) a smooth surface is formed by a layer of droplets and foam which shields the waves from the wind, stops the drag coefficient from growing, and might cause it to even decrease at U10 > 40 m/s (Powell et al. 2003; Sterl 2017; Ridder et al. 2018). The maximum wind speeds of Irma in IFS and ERA5 are 51 m/s and 35 m/s, respectively, when the hurricane was located north of the Dominican Republic two days prior to landfall in Florida. Maximum wind speeds decreased to 43 m/s and 33 m/s in IFS and ERA5, respectively, a few hours prior to landfall in Florida (Supplementary Materials Figures A1 and A3). Hence, the Charnock parameterization probably leads to an overestimation of the wind stress with IFS forcing, with overestimated surge heights as a result. For TC Michael the average absolute bias decreases from − 0.22 m (ERA5) to − 0.06 m (IFS). This large decrease in bias with IFS compared to ERA5 can be explained by the fact that TC Michael is one of the smaller TCs.
The use of IFS forcing increases the modelled surge height at Mexico Beach from 0.84 m with ERA5 to 2.31 m with IFS (Fig. 6), but overall the underestimation remains large at this location. For the four historical events for which no observations are available, we analysed the differences between modelled surge heights for the different meteorological forcings (Supplementary Materials Figures A4, A5). This shows that the increase in surge height between ERA5 and IFS is smaller than the difference between ERA-Interim and ERA5 for the TCs. In addition, the relative difference in surge height for the ETCs between the different forcing datasets is much smaller compared to the TCs. These conclusions are consistent with our findings for TCs Irma, Michael and Florence and TC-ETC Ophelia.
Overall, there is a large improvement in the storm surge modelling performance of GTSM when forced with ERA5 instead of ERA-Interim. For a TC with a small eye, such as TC Michael, the performance of GTSM improves further when using IFS forcing. For ETCs, such as Ophelia, the increase in model performance of a higher resolution meteorological forcing dataset is much lower.

Opportunities for improvement
Our results show that the recent advances in climate modelling, in particular the increase in spatial and temporal resolution, will greatly improve the accuracy of the global  modelling of storm surges. The model accuracy will particularly improve in areas prone to TCs. Our results have identified opportunities to further improve storm surge modelling. First, aside from further increases in resolution in global climate models, developing more accurate earth observation systems might also be valuable for storm surge modelling. ASCAT for example, may underestimate high TC winds, as suggested by our comparison of TC U10 from IFS with ASCAT, and the study by Chou et al. (2013). As a result, the assimilation of ASCAT data in IFS could lead to an incorrect underestimation of TC wind speeds. Currently, a new scatterometer is being developed (van Zadelhoff et al. 2014). The operational use of cross-polarization signals that are more sensitive to extreme winds (25-45 m/s) (Mouche et al. 2017) will be implemented in the second generation MetOp satellite program which will then replace the current MetOp system including the ASCAT-A and ASCAT-B scatterometer instruments from approximately 2021 onwards.
Until then ASCAT will provide the best available observed U10 dataset with global coverage, but it is important to consider the possible underestimation of wind speeds exceeding 25 m/s.  (Powell et al. 2003). As a result, the constant Charnock parameter can potentially lead to an overestimation of the wind stress and corresponding surge height at high wind speeds. This effect is shown for TC Irma.
In future research it would be valuable to assess how the wind parameterization in GTSM could be improved. Third, storm surges are known to be highly sensitive to the bathymetry near the location of landfall of a cyclone (Resio and Westerink 2008). As such, the implementation of the new higher resolution GEBCO 2019 global bathymetry dataset (GEBCO Compilation Group 2019) in GTSM would be a very promising improvement. This would allow the resolution of the computational grid underlying GTSM to be increased in shallow coastal areas, resulting in more accurately simulated surges at the coast.

Potential use of ERA5
We have shown that ERA5 can be used to accurately simulate surge heights of historical storm events. An important application of ERA5 is the development of a reanalysis of historical extreme sea levels to estimate exceedance probabilities. The improved representation of TCs and ETCs in ERA5 shows the potential for an update of the Global Tide and Surge Reanalysis (GTSR) dataset (Muis et al. 2016) using ERA5. This is expected to greatly reduce the underestimation of the more extreme surge heights that is reported for GTSR (Muis et al. 2017). Moreover, ERA5 will cover the period 1950-present compared to 1979-2019 for ERA-Interim (Hersbach et al. 2019). The increase in the length of the climate record from 41 to 70 years makes it possible to further enhance our understanding of decadal to interannual sea level variability (Woodworth et al. 2019). Extending the length of the extreme sea level reanalysis would contribute to more accurate assessments of the influence of natural climate variability such as the El Niño-Southern Oscillation on extreme sea levels (Marcos et al. 2017; Muis et al. 2018). In addition, trends related to climate change could be investigated with more confidence (Cid et al. 2016).
Using the longer ERA5 record would also reduce the uncertainty of fitting an extreme value distribution and as such provide more robust estimates of high Return Periods (RPs) (Haigh et al. 2014b;Wahl et al. 2017). However, calculating RPs of extreme sea levels based on approximately 70 years of data will remain problematic for regions prone to TCs. This is due to the relatively small area affected by a single TC and the low probability of occurrence of the more extreme TCs and corresponding extreme TWLs (Lin and Emanuel 2016). To accurately calculate RPs of extreme sea levels for such regions it will remain necessary to, for example, apply statistical methods to extend the TC track record to thousands of years (e.g. Emanuel et al. 2006;Haigh et al. 2014a).
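The step from a series of annual maxima to return levels can be illustrated with a Gumbel distribution fitted by the method of moments. This is a deliberately simple stand-in: the text only refers to fitting an extreme value distribution, so the choice of the Gumbel family, the moment estimator, the function names and the annual maxima below are all assumptions made for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates of the Gumbel location and scale parameters."""
    scale = statistics.stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    location = statistics.mean(annual_maxima) - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period_years):
    """Level exceeded on average once every return_period_years (must be > 1)."""
    non_exceedance = 1.0 - 1.0 / return_period_years
    return location - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum sea levels in metres above MSL.
annual_maxima_m = [1.10, 0.95, 1.42, 1.05, 1.88, 1.20, 0.99, 1.31, 2.35, 1.15]
location, scale = fit_gumbel_moments(annual_maxima_m)
levels = {rp: return_level(location, scale, rp) for rp in (2, 10, 100, 1000)}
```

Return periods far beyond the record length are pure extrapolation of the fitted curve, which is one reason why a record of about 70 years remains limiting for TC-prone regions.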

Potential use of IFS
We have shown that the higher resolution of IFS compared to ERA5 especially increases the accuracy of simulated surge heights of small and fast forward moving TCs, such as TC Michael. This improvement in TC storm surge modelling shows the possibility to apply IFS in operational storm surge models. So far, meteorological forecasts, such as IFS, have not been able to adequately resolve TC intensities and tracks. Therefore, most operational storm surge models (e.g. Glahn et al. 2009; Greenslade et al. 2018) use meteorological forcing derived from parametric TC models. With these parametric models, the TC track can be slightly shifted to create probabilistic storm surge forecasts. Although the use of parametric TC models to simulate storm surges gives reasonable storm surge values, their modelling accuracy is limited when a TC structure varies from the standardized forms used in parametric models (Kohno et al. 2018). For example, the parametric model of Holland et al. (2010) is not able to fully resolve the asymmetric structure of TCs (Lin and Chavas 2012; Kepert 2013; Chavas et al. 2015), resulting in over- and underestimations of the wind speed at the left and right side of the TC eye, respectively. In contrast to parametric models, ECMWF's forecasting system is able to predict the TC structure, likely resulting in better storm surge predictions. An interesting analysis for future research is to assess how the performance of global storm surge models based on either TC track data or high-resolution forecasts from ECMWF's forecasting system relate to each other, similar to Muis et al. (2019) who compared the performance of ERA-Interim against a parametric model. An additional benefit of using IFS as forcing in operational storm surge models is that the same source of meteorological data can be used to forecast TC and ETC induced storm surges, thereby greatly simplifying the forecasting approach.
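The axisymmetric nature of a parametric TC wind field can be illustrated with a minimal sketch. It uses the earlier Holland (1980) radial profile in its cyclostrophic form (Coriolis term neglected), not the revised Holland et al. (2010) model cited above. The shape parameter, pressure deficit, air density and radius of maximum winds are hypothetical values chosen for illustration.

```python
import math


def holland_wind_speed(r_km, rmax_km, delta_p_pa, b=1.5, rho_air=1.15):
    """Holland (1980) cyclostrophic wind speed (m/s) at radius r_km from the centre.

    delta_p_pa is the pressure deficit (ambient minus central pressure) in Pa,
    b is the shape parameter and rho_air the air density in kg/m^3 (assumed values).
    """
    ratio = (rmax_km / r_km) ** b
    return math.sqrt(b * delta_p_pa / rho_air * ratio * math.exp(-ratio))


# Hypothetical storm: 60 hPa pressure deficit, 30 km radius of maximum winds.
radii_km = [5 * k for k in range(1, 61)]
profile = [holland_wind_speed(r, 30.0, 6000.0) for r in radii_km]
peak_radius_km = radii_km[profile.index(max(profile))]
```

Because the wind speed in this profile depends on the distance to the centre only, every direction around the eye receives the same value, which is why asymmetries have to be added separately in such models.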
Lastly, a potential future enhancement to operational storm surge models could be the development of inundation forecasts. Such inundation forecasts can provide much more detailed information about the impacts of a TC or ETC to emergency managers.

Conclusions
In this study we have examined how the performance of global surge modelling is affected by the improvements in climate reanalysis datasets. For this, we evaluated the performance of ERA-Interim, ERA5 and IFS for eight historical events across the world (5 TCs and 3 ETCs). Validation of U10 shows that for TCs the negative bias of wind speed is reduced from 28.0% for ERA-Interim to 2.5% for ERA5. For ETCs, the negative bias reduces from 10.4 to 6.4%, respectively. The larger improvement for TCs compared to ETCs can be explained by the higher spatial and temporal resolution of ERA5 which especially better captures the large pressure gradients and strong winds that characterize TCs. The accuracy of the surge modelling also improves when using ERA5 instead of ERA-Interim. For TC Irma, for example, the modelled surge height increases from 0.88 m (ERA-Interim) to 2.68 m (ERA5) compared to 2.64 m observed. The use of IFS as forcing data is especially useful for simulating storm surges of fast forward moving TCs with a relatively small wind field, such as TC Michael. For ETCs the improvements with IFS compared to ERA5 are less apparent. To conclude, ERA5 constitutes a major improvement over ERA-Interim, thereby opening up new opportunities for advancing global storm surge modelling.