1 Introduction

Flooding of densely populated and low-lying coastal areas has large socio-economic impacts around the world (Jongman et al. 2012). Coastal flooding is generally driven by storm surges, which are caused by low mean sea-level pressure (MSLP) and strong 10-m winds (U10), such as those in tropical cyclones (TCs) and extratropical cyclones (ETCs). The surge and tidal components together make up the total water level (TWL) (Pugh 1996). Numerical models that accurately simulate TWLs along the world’s coastline are essential for operational forecasting (Verlaan et al. 2015) and for hindcasting extreme TWL events (Powell et al. 2010). Moreover, at longer timescales, such models can help to prioritize adaptation efforts by assessing which regions may see an increase in flooding frequency due to climate change (Vousdoukas et al. 2018), and to identify the drivers of decadal to interannual sea level variability (Muis et al. 2018).

The height and impact of an extreme TWL are determined by storm characteristics, tidal phase, bathymetry and coastline geometry. TCs can have lower MSLPs and stronger U10s than ETCs (Keller and DeVecchio 2016), resulting in higher storm surges. On the other hand, ETCs are generally larger than TCs and therefore affect a larger coastal area (Irish et al. 2008). The TWL also depends on the tidal phase, and the tidal range is generally larger in the mid-latitudes than in the tropics (Ngodock et al. 2016). In areas with a small tidal range, high TWLs can occur at both high and low tide, whereas in areas with a large tidal range high TWLs generally coincide with high tide. The bathymetry also strongly influences the TWL: a strong TC or ETC moving over deep water before making landfall will generate only a small storm surge, whereas a much higher storm surge will develop over a broad, shallow continental shelf (Resio and Westerink 2008). Lastly, whether flooding occurs depends on the coastline geometry, as high TWLs are more likely to inundate, for example, a low-lying estuary than a steeply sloping coast (Bloemendaal et al. 2019).

To simulate TWLs for forecasting, hindcasting and climate change impact assessments, hydrodynamic models such as the Global Tide and Surge Model (GTSM) (Verlaan et al. 2015) and the HIROMB-BOOS Model (Berg and Poulsen 2012) are used. Most global tidal models solve the hydrodynamic equations to simulate the tide and assimilate satellite altimetry data to improve accuracy (Stammer et al. 2014), whereas tides in GTSM are based solely on the tidal potential (Irazoqui Apecechea et al. 2017). The storm surge is simulated by forcing a hydrodynamic model with MSLP and U10 fields, where U10 consists of a zonal (u10, eastward) and a meridional (v10, northward) wind component (Belmonte Rivas and Stoffelen 2019). This meteorological forcing can be taken from climate reanalyses, operational forecasts, or climate simulation datasets. Although such forcing is nowadays available on a global scale (e.g. Saha et al. 2014; Hersbach et al. 2019), studies on extreme TWLs have mostly been conducted on regional to continental scales (Woth et al. 2006; Westerink et al. 2008; Haigh et al. 2014b), since they require computationally demanding high-resolution simulations in coastal areas. The implementation of unstructured grids in a hydrodynamic model (Kernkamp et al. 2011) allows for a spatially varying resolution, making it possible to refine the grid in coastal areas and shallow seas while keeping coarse grid cells in the open ocean. This greatly reduces the computational cost while maintaining high accuracy, and has opened the way for global modelling of coastal TWLs at sufficient resolution. By forcing GTSM with ERA-Interim reanalysis data (Dee et al. 2011), Muis et al. (2016) developed the Global Tide and Surge Reanalysis dataset (GTSR). This dataset consists of time series of TWLs along the world’s coastline for the period 1979–2014, as well as estimates of the exceedance probabilities of extreme TWLs.
GTSR has been applied in various flood risk assessments, such as a case study on the Ganges Delta (Ikeuchi et al. 2017) and a global cost–benefit analysis of large-scale coastal flood protection (Lincke and Hinkel 2018). A major limitation of the GTSR dataset, however, is that extreme TWLs related to TCs are underestimated (Muis et al. 2016, 2019). This is because ERA-Interim’s horizontal resolution of 0.75° × 0.75° (~ 79 km at the equator) and temporal resolution of 6 h are insufficient to fully resolve the small-scale features, size and tracks of TCs (Schenkel and Hart 2012; Murakami 2014; Hodges et al. 2017; Ridder et al. 2018).

In 2018, the European Centre for Medium-Range Weather Forecasts (ECMWF) launched its fifth-generation climate reanalysis, ERA5, which has a horizontal resolution of 0.25° × 0.25° (~ 31 km at the equator) and a temporal resolution of 1 h (Hersbach et al. 2019). Based on previous research, the increase in resolution with respect to ERA-Interim is expected to substantially improve the modelling of TC-related storm surges (Bloemendaal et al. 2019). ECMWF will continue to improve the quality of its reanalysis products, following the improvements implemented in its operational forecasting system, the Integrated Forecasting System (IFS), which also serves as the climate model for producing the reanalysis datasets. Production of the next-generation global climate reanalysis (ERA6) is expected to start at ECMWF from 2023 onwards (Hersbach et al. 2019). While the current operational IFS version has a horizontal resolution of 0.125° × 0.125° (~ 16 km at the equator) (ECMWF 2019a), it is uncertain to what extent the resolution of ERA6 can be increased relative to ERA5 (Hersbach et al. 2018). Overall, the increases in horizontal resolution from ERA-Interim to ERA5 and the future ERA6 reanalysis are expected to benefit storm surge modelling. However, the extent to which these reanalysis upgrades improve the performance of global storm surge models is unknown.

This study aims to evaluate the improvement in storm surge modelling gained by using the new ERA5 reanalysis instead of ERA-Interim. In addition, to assess the value of a future higher-resolution reanalysis, we compare ERA5 with IFS, which serves as the underlying climate model for such reanalysis datasets. For this, we select eight historical storm surge events: five TCs and three ETCs. To assess how well their wind fields are represented in the meteorological datasets, we compare modelled U10 against satellite-based observations of U10. Subsequently, we simulate storm surges by forcing GTSM with the different meteorological datasets and compare simulated and observed surge heights. Finally, we discuss the reanalysis upgrades from different user perspectives, which can serve as a guideline for users of ERA-Interim who are planning to switch to ERA5.

2 Methodology

The methodology is illustrated in Fig. 1. First, we describe the selection of the eight historical events and the ERA-Interim, ERA5 and IFS datasets, and give a brief overview of the IFS versions on which they are based (Sect. 2.1). Hereafter, ‘IFS’ refers to the meteorological fields produced with ECMWF’s operational forecasting system. Second, we describe how modelled U10 fields are compared against satellite-based observations of U10 to examine the representation of TCs and ETCs (Sect. 2.2). Third, GTSM is forced with the retrieved U10 and MSLP fields to simulate the associated surge heights (Sect. 2.3). Finally, to evaluate the performance of the different meteorological forcing datasets, we compare modelled and observed storm surges for the eight historical events (Sect. 2.4).

Fig. 1

Flowchart of the research framework showing the forcing datasets (red), wind speed validation (blue), hydrodynamic modelling (green) and validation of the storm surge (yellow)

2.1 Meteorological datasets and historical events

Table 1 lists the key characteristics of ECMWF’s reanalyses and forecasting systems, showing major improvements in model and product specifications over time, especially in spatial and temporal resolution (ECMWF 2019a). To assess which type of storm benefits most from the increase in climate model resolution, we examine five TC events and three ETC events. We only consider storms that occurred after July 2017, to allow comparison of ERA-Interim and ERA5 with recent, higher-resolution versions of IFS. The five TCs are: TC Irma (2017), TC Florence (2018), TC Michael (2018), TC Mangkhut (2018), and TC Jebi (2018). In addition, we investigate one ETC and two TCs that underwent extratropical transition before making landfall (denoted TC-ETC): ETC Grayson (2018), TC-ETC Ophelia (2017) and TC-ETC Leslie (2018). Track data are retrieved from the International Best-Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010) at 3-hourly intervals for TCs Irma, Florence and Michael, and from the National Oceanic and Atmospheric Administration (NOAA) at 6-hourly intervals for TCs Mangkhut and Jebi (NOAA 2018). The tracks and intensities of the eight historical storm events are shown in Fig. 2. The radius of maximum winds (Rmax), which is the distance between the centre of a cyclone and its band of strongest winds (Irish et al. 2008), is used as an indicator of the size of a TC’s eye. We calculate Rmax at landfall as the average distance between the TC’s centre and the location of maximum U10 along eight directions, where the centre is defined as the location with the lowest MSLP. For this, we use the IFS dataset, which has the highest spatial resolution and is therefore expected to best represent the MSLP and U10 gradients in TCs (Bloemendaal et al. 2019). Since ETCs often consist of a frontal system with the strongest winds found near the warm front, we do not calculate Rmax for the ETCs.
Table 2 summarizes the main characteristics of each storm in terms of landfall location, observed MSLP and U10, and observed surge height.
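The Rmax estimate described above can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes MSLP and U10 are NumPy arrays on a regular latitude/longitude grid, samples each of the eight compass directions with a local flat-earth approximation and a nearest-grid-point lookup, and uses a 300 km search distance and 2 km step that are illustrative assumptions. The function name `estimate_rmax` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def estimate_rmax(lat, lon, mslp, u10, max_dist_km=300.0, step_km=2.0):
    """Estimate Rmax (km) as the mean distance from the MSLP minimum to the
    U10 maximum along eight compass directions.

    lat, lon  : 1-D arrays of grid coordinates in degrees (regular grid)
    mslp, u10 : 2-D arrays with shape (len(lat), len(lon))
    max_dist_km and step_km are illustrative search settings (assumptions).
    """
    # Cyclone centre: grid point with the lowest MSLP
    i0, j0 = np.unravel_index(np.argmin(mslp), mslp.shape)
    lat0, lon0 = lat[i0], lon[j0]

    distances = np.arange(step_km, max_dist_km + step_km, step_km)
    radii = []
    for bearing in np.deg2rad(np.arange(0.0, 360.0, 45.0)):  # N, NE, E, ..., NW
        # Sample points along the ray (local flat-earth approximation)
        dlat = np.rad2deg(distances * np.cos(bearing) / EARTH_RADIUS_KM)
        dlon = np.rad2deg(distances * np.sin(bearing)
                          / (EARTH_RADIUS_KM * np.cos(np.deg2rad(lat0))))
        # Nearest grid indices for every sample point
        ii = np.abs(lat[:, None] - (lat0 + dlat)[None, :]).argmin(axis=0)
        jj = np.abs(lon[:, None] - (lon0 + dlon)[None, :]).argmin(axis=0)
        # Distance along the ray at which U10 peaks
        radii.append(distances[np.argmax(u10[ii, jj])])
    return float(np.mean(radii))
```

Averaging over eight directions smooths part of the asymmetry of real TC wind fields, but the estimate can never be sharper than the grid spacing of the forcing dataset it is computed from.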

Table 1 Key characteristics of ECMWF’s reanalyses and forecasting systems (ECMWF 2019a)
Fig. 2

Observed trajectories and intensities of the eight historical events. Colours indicate the mean sea-level pressure (hPa)

Table 2 Observed storm characteristics at landfall of the eight historical events. MSLP, U10 and Rmax are provided for the time and date of landfall

2.2 Wind speed validation

For the eight historical events, we compare U10 from ERA-Interim, ERA5 and IFS against satellite observations from the Advanced SCATterometer (ASCAT). ASCAT is an active microwave radar instrument on board the MetOp satellites that illuminates the sea surface in two 550 km-wide swaths (Figa-Saldaña et al. 2002; KNMI 2016). Microwaves have relatively long wavelengths, ranging from approximately 1 cm to 1 m, and are therefore largely insensitive to atmospheric scattering, which allows them to penetrate clouds, haze and dust. By measuring the backscattered radiation from three angles, with antennas oriented at 45°, 90° and 135° with respect to the satellite track, the sea surface roughness and the corresponding wind speed can be determined. The resulting dataset has a grid size of 12.5 km × 12.5 km and a revisit time of 12 h, and covers 65% of the global oceans. To assess the performance of the climate models with respect to ASCAT, we calculate the root mean square error (RMSE, m/s), mean absolute error (MAE, m/s), relative bias (%), and Pearson’s correlation coefficient.
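The four wind speed metrics can be computed for paired modelled and observed U10 values as in the plain-Python sketch below. The definition of the relative bias as the mean error divided by the mean observation is an assumption, since the exact formula is not given here, and `validation_metrics` is a hypothetical helper name.

```python
import math


def validation_metrics(modelled, observed):
    """Return RMSE, MAE, relative bias (%) and Pearson's r for paired values.

    The relative bias is computed as the mean error divided by the mean
    observation (an assumed definition).
    """
    n = len(observed)
    if n != len(modelled) or n < 2:
        raise ValueError("need at least two paired values")
    errors = [m - o for m, o in zip(modelled, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_mod = sum(modelled) / n
    mean_obs = sum(observed) / n
    rel_bias = 100.0 * (sum(errors) / n) / mean_obs
    # Pearson's correlation coefficient
    cov = sum((m - mean_mod) * (o - mean_obs) for m, o in zip(modelled, observed))
    var_mod = sum((m - mean_mod) ** 2 for m in modelled)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    r = cov / math.sqrt(var_mod * var_obs)
    return {"rmse": rmse, "mae": mae, "rel_bias_pct": rel_bias, "pearson_r": r}
```

A model that is uniformly 2 m/s too low has an RMSE and MAE of 2 m/s and a negative relative bias, but still a correlation of 1. This is why the bias and error metrics are reported alongside the correlation.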

To evaluate the meteorological datasets, we first define the locations at which the performance indicators are computed. We generate a grid of coordinates (Fig. 3) at which U10 from the meteorological datasets is compared with observed U10. For TCs, this grid extends up to 500 km from the coastline at 0.2° resolution. For ETCs, we use a grid extending up to 1000 km from the coastline at 0.5° resolution. This is because ETCs generally consist of large frontal systems extending over hundreds of kilometres (Houze et al. 1976; Evans and Hart 2008). Moreover, ETCs have a higher translation speed than TCs (Keller and DeVecchio 2016), which calls for a larger grid to capture more ASCAT overpasses during the passage of the ETC. For all coordinates on the generated grid, we compare the nearest available observed U10 from ASCAT with modelled U10 from ERA-Interim, ERA5 and IFS. Since ETCs generally have lower wind speeds than TCs (Keller and DeVecchio 2016), we apply a threshold of 20 m/s to the observed ASCAT U10 for ETCs. For TCs, we consider winds within radii of 100 km and 200 km from the TC’s eye, thereby including the eyewall, where maximum U10 values are generally found (Chavas and Emanuel 2010; Carrasco et al. 2014; Takagi and Wu 2016).
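The point selection described above can be sketched as follows. This is a simplified illustration under stated assumptions. Grid points are (lat, lon) tuples and ASCAT observations are (lat, lon, u10) tuples. The nearest observation is found by brute force using the great-circle (haversine) distance. No time-matching window is applied, although in practice the ASCAT overpass time must be matched to the model time step. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_obs_u10(point, observations):
    """U10 of the observation (lat, lon, u10) closest to point (lat, lon)."""
    lat, lon = point
    return min(observations, key=lambda ob: haversine_km(lat, lon, ob[0], ob[1]))[2]


def tc_pairs(grid, model_u10, observations, centre, radius_km):
    """(modelled, observed) U10 pairs for grid points within radius_km of the TC centre."""
    return [
        (mod, nearest_obs_u10(pt, observations))
        for pt, mod in zip(grid, model_u10)
        if haversine_km(centre[0], centre[1], pt[0], pt[1]) <= radius_km
    ]


def etc_pairs(grid, model_u10, observations, threshold=20.0):
    """(modelled, observed) U10 pairs where the nearest observed U10 exceeds threshold."""
    pairs = []
    for pt, mod in zip(grid, model_u10):
        obs = nearest_obs_u10(pt, observations)
        if obs > threshold:
            pairs.append((mod, obs))
    return pairs
```

For a full ASCAT swath, a spatial index such as a k-d tree would typically replace the brute-force nearest-neighbour search.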

Fig. 3

Ocean grid used to validate modelled U10 against observed U10 from ASCAT for TC Florence. The recorded TC track is shown by the black dashed line. The 100 km (red) and 200 km (yellow) radii used for data selection are also shown

2.3 Storm surge modelling

Storm surges are simulated with GTSM version 3 (Verlaan et al. 2015). GTSM is a global depth-averaged hydrodynamic model based on the Delft3D Flexible Mesh software (Kernkamp et al. 2011), whose unstructured grids allow the computational grid to be refined locally. The cell size depends mainly on the bathymetry: the resolution increases from 25 km in the deeper parts of the ocean to 2.5 km (~ 1.25 km in Europe) in shallow coastal areas. The wind and pressure fields used to force GTSM are first linearly interpolated onto GTSM’s computational grid. To translate wind speed into wind stress, a constant Charnock parameter of 0.041 is applied. This value was chosen because it is most consistent with the value used by ECMWF to translate wind stress into wind speed. The bathymetry in GTSM is obtained from the General Bathymetric Chart of the Oceans 2014 dataset (GEBCO; 30 arc sec resolution) (Weatherall et al. 2015); for Europe, the higher-resolution 15 arc sec EMODnet Bathymetry is used (Calewaert et al. 2016).
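The conversion from wind speed to wind stress with a constant Charnock parameter can be sketched under common textbook assumptions:

- a neutral logarithmic wind profile, U10 = (u*/κ) ln(z/z0);
- Charnock roughness, z0 = α u*²/g, with α = 0.041;
- κ = 0.41 and a constant air density of 1.2 kg m⁻³.

GTSM's actual implementation may differ in detail, for example in its constants or additional terms. `charnock_drag` and `wind_stress` are therefore hypothetical illustrations, not GTSM code.

```python
import math

KAPPA = 0.41       # von Karman constant (assumed value)
GRAVITY = 9.81     # gravitational acceleration, m s-2
RHO_AIR = 1.2      # air density, kg m-3 (assumed constant)
CHARNOCK = 0.041   # Charnock parameter quoted in the text


def charnock_drag(u10, alpha=CHARNOCK, z=10.0, tol=1e-10, max_iter=200):
    """Drag coefficient for a log wind profile with Charnock roughness z0 = alpha * u*^2 / g."""
    if u10 <= 0.0:
        raise ValueError("u10 must be positive")
    ustar = 0.04 * u10  # first guess, corresponds to Cd = 1.6e-3
    for _ in range(max_iter):
        z0 = alpha * ustar ** 2 / GRAVITY
        new_ustar = KAPPA * u10 / math.log(z / z0)  # log law evaluated at height z
        converged = abs(new_ustar - ustar) < tol
        ustar = new_ustar
        if converged:
            break
    return (ustar / u10) ** 2


def wind_stress(u10, v10, alpha=CHARNOCK):
    """Zonal and meridional wind stress (N m-2) from the 10 m wind components."""
    speed = math.hypot(u10, v10)
    if speed == 0.0:
        return 0.0, 0.0
    cd = charnock_drag(speed, alpha)
    return RHO_AIR * cd * speed * u10, RHO_AIR * cd * speed * v10
```

In this formulation the drag coefficient keeps growing with wind speed, because stronger winds increase u* and hence the Charnock roughness length. The fixed-point iteration on u* converges for typical storm wind speeds.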

2.4 Storm surge validation

For validation, we compare time series of simulated surge height against available observations from NOAA for TCs Irma, Florence and Michael (NOAA 2019), and from the British Oceanographic Data Centre (BODC 2019) for TC-ETC Ophelia. For the remaining historical events, we compare the maximum surge heights between the different forcing datasets, because observed time series of TWL and tides are not available for these events. We consider all tide gauge stations that (1) are less than 500 km away from the cyclone’s track and (2) recorded a storm surge of at least 0.5 m. To evaluate the model performance, we calculate the absolute (m) and relative (%) bias, Pearson’s correlation coefficient, mean absolute error (m), and the RMSE (m).
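The two station selection criteria can be expressed as a simple filter. This sketch makes two assumptions. Stations are given as a dictionary of (lat, lon, maximum observed surge in m). The distance to the track is approximated by the great-circle distance to the nearest best-track fix, rather than to the interpolated track line. `select_stations` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_stations(stations, track, max_dist_km=500.0, min_surge_m=0.5):
    """Return names of tide gauge stations that meet both selection criteria.

    stations : dict name -> (lat, lon, observed maximum surge in m)
    track    : list of (lat, lon) best-track fixes
    """
    selected = []
    for name, (lat, lon, max_surge) in stations.items():
        # Distance to the nearest track fix approximates distance to the track
        dist = min(haversine_km(lat, lon, tlat, tlon) for tlat, tlon in track)
        if dist < max_dist_km and max_surge >= min_surge_m:
            selected.append(name)
    return selected
```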

All time series are referenced to mean sea level (MSL). At some tide gauge stations, we observe a constant offset between the observed sea level and the predicted tide. To correct for this offset, we adjust the surge height by adding the difference between the mean of the predicted tide and the mean of the observed sea level. For the three TCs making landfall in the United States, we use high water marks from the United States Geological Survey (USGS 2019) to improve the spatial coverage. We subtract the modelled tide from the high water marks and convert their reference datum from NAVD88 to MSL to allow comparison with the modelled maximum surge heights.
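The offset correction and the high-water-mark conversion amount to a few lines of arithmetic. In this sketch, the surge is the observed sea level minus the predicted tide, shifted by mean(predicted tide) minus mean(observed sea level) as described above. `msl_above_navd88` is a hypothetical, site-specific datum offset; in practice it has to come from a datum conversion for each location.

```python
def surge_from_gauge(observed, predicted_tide):
    """Surge time series (m) from observed sea level and predicted tide, both relative to MSL.

    The raw residual is corrected for a constant offset by adding
    mean(predicted tide) - mean(observed sea level).
    """
    if len(observed) != len(predicted_tide) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    offset = sum(predicted_tide) / len(predicted_tide) - sum(observed) / len(observed)
    return [obs - tide + offset for obs, tide in zip(observed, predicted_tide)]


def surge_from_high_water_mark(hwm_navd88, msl_above_navd88, modelled_tide):
    """Maximum surge (m) at a high water mark.

    msl_above_navd88 is the local height of MSL above NAVD88 (hypothetical input).
    """
    hwm_msl = hwm_navd88 - msl_above_navd88  # re-reference from NAVD88 to MSL
    return hwm_msl - modelled_tide
```

After this correction the surge series has zero mean by construction. This removes a constant datum mismatch but also any genuine mean residual over the analysed window.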

The modelled surge time series can differ strongly between adjacent grid points, especially near barrier islands and estuaries, where inlets are narrower than the GTSM resolution and connections consist of a single cell only. Therefore, we consider all coastal GTSM grid points within 10 km of each tide gauge station and select the grid point that simulates the surge height with the smallest bias for ERA5.
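The grid point selection can be sketched as follows. It assumes that each candidate grid point carries its ERA5-forced surge series on the same time steps as the observations, and that the bias is the mean of modelled minus observed surge. The helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_grid_point(station, grid_points, observed, max_dist_km=10.0):
    """Coastal grid point within max_dist_km whose surge has the smallest absolute mean bias.

    station     : (lat, lon) of the tide gauge
    grid_points : list of (lat, lon, modelled surge series)
    observed    : observed surge series on the same time steps
    Returns (lat, lon) of the selected point, or None if no point is close enough.
    """
    best, best_bias = None, math.inf
    for lat, lon, modelled in grid_points:
        if haversine_km(station[0], station[1], lat, lon) > max_dist_km:
            continue
        bias = sum(m - o for m, o in zip(modelled, observed)) / len(observed)
        if abs(bias) < best_bias:
            best, best_bias = (lat, lon), abs(bias)
    return best
```

Because the selection is based on the ERA5 run, the same grid point is then used for all three forcing datasets at a given station.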

3 Results and discussion

3.1 Wind speed validation with ASCAT

3.1.1 ERA-Interim

Figure 4a compares modelled and observed U10 and shows that ERA-Interim systematically underestimates TC wind speeds. Spatial plots of TC wind fields from ERA-Interim (Supplementary Materials Figures A1, A2) show a poorly developed eyewall structure for all TCs, implying that the TCs are insufficiently intensified. The least-squares line fitted to ERA-Interim (Fig. 4a) is close to horizontal, showing that ERA-Interim is very poorly able to capture strong TC winds. Averaged across all observations, ERA-Interim has a relative bias of − 24.9%. When only the more extreme winds are considered, the negative bias of ERA-Interim increases to − 41.3% (Supplementary Materials Table A1). For ETCs, the underestimation of wind speed by ERA-Interim is much smaller (Fig. 4d). The least-squares line is closer to the 1:1 line, although there is a considerable negative bias of − 10.4% for ETC winds exceeding 20 m/s (Table 3). Stopa and Cheung (2014) also found consistently low variability in ERA-Interim compared with observations, indicative of a model that cannot capture extreme winds, and identified an average wind speed bias of − 10% for ERA-Interim’s 99th percentile.
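The least-squares lines in Fig. 4 summarize how modelled U10 scales with observed U10. A slope near 1 means the model follows the observations. A slope near 0 (a near-horizontal line) means the modelled wind hardly increases as the observed wind gets stronger. Below is a minimal ordinary least-squares sketch, assuming modelled U10 is regressed on observed U10; the regression direction and the helper name are our assumptions.

```python
def least_squares_line(observed, modelled):
    """Ordinary least-squares fit: modelled = slope * observed + intercept."""
    n = len(observed)
    if n != len(modelled) or n < 2:
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0.0:
        raise ValueError("observed values must not all be equal")
    sxy = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    slope = sxy / sxx
    return slope, mean_m - slope * mean_o
```

A model that returns the same wind speed regardless of the observed value gives a slope of exactly zero. This is the limiting case of the near-horizontal ERA-Interim line for TCs.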

Fig. 4

Scatter plots of modelled and observed U10 for the five TC and three ETC historical events for a, d ERA-Interim; b, e ERA5; and c, f IFS. Colours indicate the data density for bins of 1 m/s × 1 m/s. For the TCs, all available observations within a 200 km radius of the TC centre are shown. For the ETCs, all winds exceeding 15 m/s are shown. The black line is the 1:1 line; the red line is the least-squares fit

Table 3 Statistical summary of the wind speed validation, where the TC and ETC data are combined. Statistics based on other thresholds are shown in Supplementary Materials Table A1

3.1.2 ERA5

Compared with ERA-Interim, ERA5 reduces the RMSE and relative bias for all TC winds within a 200 km radius by 58% and 98%, respectively (Table 3). The improved representation of TC intensity by ERA5 (Fig. 4b) can be attributed to its higher spatial and temporal resolution. Another reason for this large improvement is that the data assimilation scheme of ERA5 includes ASCAT wind speeds, whereas ASCAT data were not assimilated in the production of ERA-Interim (Table 1). For ETCs, we observe a smaller improvement (Fig. 4d, e), with reductions of the RMSE and relative bias of 37% and 39%, respectively. This can be explained by the large-scale structure of ETCs and their smaller gradients in MSLP and U10 compared to TCs, which ERA-Interim’s coarse resolution can already represent reasonably well. Belmonte Rivas and Stoffelen (2019) compared ERA-Interim and ERA5 wind speeds against ASCAT and reported a 20% decrease in the RMS wind speed difference for ERA5 compared to ERA-Interim. This suggests that the improvement is stronger for extreme winds than for average conditions, which is consistent with our findings.

3.1.3 IFS

In general, moving from ERA5 to IFS does not further improve the representation of TC wind fields. While the RMSE decreases, for example, the bias increases with IFS compared to ERA5 (Table 3). We note a tendency of IFS to overestimate the highest TC wind speeds compared to observations (Fig. 4c). This implies that wind speeds exceeding 25 m/s are either overestimated by IFS or underestimated by ASCAT. The latter hypothesis is supported by Chou et al. (2013), who compared ASCAT wind characteristics with dropwindsonde observations of TCs and found that ASCAT underestimates the highest wind speeds. Furthermore, ASCAT U10 is derived using a vertical–vertical (VV) polarization technique that tends to underestimate U10 exceeding 25 m/s and has an uncertainty of 4 m/s at a U10 of 40 m/s (Stoffelen et al. 2018). The tendency of IFS to overestimate TC wind speeds is not observed for TC Michael (Supplementary Materials Table A2 and Figure A3). This can be explained by Michael’s short lifespan in combination with its high translational speed of 20 km/h (Beven et al. 2019) and relatively small Rmax of 33.7 km. Like TC Michael, typhoon Haiyan (2013) had a high translational speed and a small Rmax (Mori et al. 2014), and IFS also underestimated its intensity (ECMWF 2019b). This was likely due to a combination of factors. First, in the assimilation process, ASCAT U10 observations are thinned such that only one in every four observations is actively assimilated, resulting in an effective horizontal resolution of approximately 100 km (Laloyaux et al. 2016). Second, ASCAT winds exceeding 35 m/s are discarded, resulting in a negative wind speed bias (De Chiara et al. 2016). Third, the deep and tight core of Haiyan requires a very high spatial resolution (1–4 km) to correctly represent the strong U10 and MSLP gradients (Magnusson 2014; Mori et al. 2014).
For ETC wind fields, the magnitude of the relative bias decreases slightly from − 6.4% with ERA5 to − 4.8% with IFS, while the other metrics do not show a clear difference in performance (Table 3). This suggests that the resolution of ERA5 is sufficient to represent ETCs accurately, and that increasing resolution alone will not improve the performance much further.

3.2 Storm surge modelling and validation

3.2.1 ERA-Interim

Figure 5 shows maximum simulated and observed surge heights for TCs Irma, Florence and Michael, and TC-ETC Ophelia. Forcing GTSM with ERA-Interim results in large underestimations of the surge height for all TCs. For TC Irma (Fig. 5a), the underestimation is especially large at the Florida Keys, where GTSM simulates a surge height of 0.88 m whereas 2.64 m was observed (− 66%). Further north, along the coast of Georgia, modelled surge heights agree much better with observations. This can be explained by the evolution of Irma’s wind field after landfall: after Irma moved over the Florida peninsula, the compact wind field started to expand and the wind and pressure gradients weakened (Cangialosi et al. 2018). At this stage, ERA-Interim’s horizontal resolution of 0.75° × 0.75° (~ 79 km at the equator) is sufficient to resolve these gradients, whereas at landfall, when Irma’s Rmax was 52.7 km, this resolution was too coarse. Across all observation stations, the average bias is − 0.32 m. For TC Michael, the surge heights are strongly underestimated between Mexico Beach and Panama City, where the orientation of the wind field produced the strongest onshore winds. TC Michael was too small to be accurately represented by ERA-Interim, so its small-scale wind speed gradients are averaged out over ERA-Interim’s large grid cells. This results in an average bias of − 0.28 m across all stations and a modelled maximum surge height of 0.55 m at Mexico Beach, where the observed maximum surge height is 4.66 m (− 88%). In contrast to TC Michael, TC Florence had a very large wind field and a low forward speed of 9 km/h (Stewart and Berg 2019). ERA-Interim was therefore expected to capture its intensity relatively well.
However, although ERA-Interim correctly represents the structure of Florence’s wind field (Supplementary Materials Figure A1j), the wind speeds are underestimated, resulting in poor surge modelling performance. Compared to the TCs, the underestimation of surge heights for ETCs is relatively small. This is illustrated, for instance, by TC-ETC Ophelia, which made landfall in Ireland with hurricane-force winds (Stewart 2018). GTSM forced with ERA-Interim simulates the spatial distribution of the more extreme surge heights well (Fig. 5j). Furthermore, the relative and absolute bias of the highest observed surge, at Heysham, are − 30% and − 0.70 m, respectively, which is much smaller than for the TCs.

Fig. 5

Maximum modelled surge height for TCs Irma (a–c), Michael (d–f), Florence (g–i) and Ophelia (j–l). Storm surges were generated by forcing GTSM with ERA-Interim (left column), ERA5 (centre column) and IFS (right column)

3.2.2 ERA5

Modelled surge heights agree much better with observations when using ERA5 instead of ERA-Interim (Fig. 5), especially for TCs Irma and Florence. For these TCs, the simulated surge height near the area of landfall increases from approximately 0.8 m with ERA-Interim to 2.5 m with ERA5. The validation statistics also improve (Table 4): the absolute bias is reduced from − 0.32 to − 0.13 m for TC Irma and from − 0.67 to − 0.07 m for TC Florence. However, TC Michael’s storm surge is still substantially underestimated, as indicated by the negative absolute bias across all stations of − 0.22 m with ERA5, compared to − 0.28 m with ERA-Interim. At Mexico Beach, where the highest surge of 4.66 m was observed, the simulated surge height increases only marginally, from 0.55 m with ERA-Interim to 0.84 m with ERA5 (Fig. 6). The most likely explanation is that, as for ERA-Interim, Michael’s eye is too small to be represented correctly at the resolution of ERA5. The underestimation could also be related to local bathymetry and coastline features, such as barrier islands and semi-enclosed bays, which are not well resolved in GTSM. Outside the eyewall of TC Michael, GTSM represents the maximum surge height better: south of Tallahassee, where the pressure and wind gradients are less steep, a maximum surge height of 2.54 m was registered, while GTSM forced with ERA5 simulates 2.35 m. For TC-ETC Ophelia, the RMSE decreases from 0.21 to 0.19 m, while the average absolute bias of the surge height is reduced from − 0.20 m with ERA-Interim to − 0.02 m with ERA5.

Table 4 Surge modelling performance of GTSM for TCs Irma, Michael and Florence and TC-ETC Ophelia
Fig. 6

Maximum observed surge height for storms Irma, Michael, Florence and Ophelia (black dashed line) compared to simulated surge heights with ERA-Interim (grey), ERA5 (green) and IFS (blue) forcing

3.2.3 IFS

Forcing GTSM with IFS instead of ERA5 has varying effects on its performance (Fig. 5). For TC Florence and TC-ETC Ophelia, there are no noteworthy differences in model performance, implying that the resolution of ERA5 is sufficient to represent the relatively large wind fields of these storms correctly. For TC Irma, GTSM overestimates the surge heights when forced with IFS, and the validation indicates an overall decrease in performance compared to ERA5 (Table 4). This could be caused by the overestimation of Irma’s wind speed by IFS, as shown by the relative wind speed bias of + 5.34% (Supplementary Materials Table A2). Another possible explanation for the overestimation is the use of a constant Charnock parameter (Charnock 1955) of 0.041 to calculate the wind drag coefficient in GTSM, which is used to transform wind speed into wind stress. At high wind speeds (U10 > 30 m/s), a layer of droplets and foam forms a smooth surface that shields the waves from the wind, stops the drag coefficient from growing, and may even cause it to decrease at U10 > 40 m/s (Powell et al. 2003; Sterl 2017; Ridder et al. 2018). The maximum wind speeds of Irma in IFS and ERA5 were 51 m/s and 35 m/s, respectively, when the hurricane was located north of the Dominican Republic two days prior to landfall in Florida. A few hours prior to landfall in Florida, the maximum wind speeds had decreased to 43 m/s and 33 m/s in IFS and ERA5, respectively (Supplementary Materials Figures A1 and A3). Because a constant Charnock parameter lets the drag coefficient keep increasing at these wind speeds, it probably leads to an overestimation of the wind stress with IFS forcing, and consequently to overestimated surge heights. For TC Michael, the average absolute bias decreases from − 0.22 m (ERA5) to − 0.06 m (IFS). This large decrease in bias can be explained by Michael’s small size, which the higher resolution of IFS resolves better than ERA5.
The use of IFS forcing increases the modelled surge height at Mexico Beach from 0.84 m with ERA5 to 2.31 m with IFS (Fig. 6), but the underestimation at this location remains large.
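A rough worked example helps to quantify the Charnock argument for Irma. It assumes a neutral log wind profile with Charnock roughness (α = 0.041, κ = 0.41); GTSM's exact formulation may differ. The sketch compares the wind stress ratio at the IFS and ERA5 maximum wind speeds (51 and 35 m/s) with the ratio (51/35)² ≈ 2.1 that a fixed drag coefficient would give. Under a constant Charnock parameter, Cd grows with wind speed, so the computed ratio is larger. This shows how the stress difference between IFS and ERA5 forcing is amplified at high wind speeds. The function names are hypothetical.

```python
import math

KAPPA, GRAVITY, CHARNOCK = 0.41, 9.81, 0.041  # assumed constants; 0.041 as in GTSM


def charnock_drag(u10, alpha=CHARNOCK, z=10.0, n_iter=100):
    """Neutral drag coefficient from the log law with Charnock roughness z0 = alpha * u*^2 / g."""
    ustar = 0.04 * u10
    for _ in range(n_iter):  # fixed-point iteration on the friction velocity
        ustar = KAPPA * u10 / math.log(z * GRAVITY / (alpha * ustar ** 2))
    return (ustar / u10) ** 2


def stress_ratio(u_high, u_low, alpha=CHARNOCK):
    """Ratio of wind stress (rho * Cd * U^2) at two wind speeds; rho cancels out."""
    return (charnock_drag(u_high, alpha) * u_high ** 2) / (charnock_drag(u_low, alpha) * u_low ** 2)


constant_cd_ratio = (51.0 / 35.0) ** 2
charnock_ratio = stress_ratio(51.0, 35.0)
print(f"constant Cd: {constant_cd_ratio:.2f}, constant Charnock: {charnock_ratio:.2f}")
```

A drag parameterization that levels off above about 30 m/s would bring the ratio closer to the constant-Cd value.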

For the four historical events without observations, we compared the modelled surge heights obtained with the different meteorological forcings (Supplementary Materials Figures A4, A5). For the TCs, the increase in surge height from ERA5 to IFS is smaller than that from ERA-Interim to ERA5. In addition, the relative differences in surge height between the forcing datasets are much smaller for the ETCs than for the TCs. These conclusions are consistent with our findings for TCs Irma, Michael and Florence and TC-ETC Ophelia.

Overall, the storm surge modelling performance of GTSM improves substantially when forced with ERA5 instead of ERA-Interim. For a TC with a small eye, such as TC Michael, the performance improves further with IFS forcing. For ETCs, such as Ophelia, the gain in performance from a higher-resolution meteorological forcing dataset is much smaller.

4 Outlook: future use of ERA5 and IFS in global surge modelling

4.1 Opportunities for improvement

Our results show that recent advances in climate modelling, in particular the increase in spatial and temporal resolution, greatly improve the accuracy of global storm surge modelling, particularly in areas prone to TCs. Our results also identify opportunities to further improve storm surge modelling. First, aside from further increases in the resolution of global climate models, more accurate Earth observation systems might also benefit storm surge modelling. ASCAT, for example, may underestimate high TC winds, as suggested by our comparison of TC U10 from IFS with ASCAT and by the study of Chou et al. (2013). As a result, the assimilation of ASCAT data in IFS could lead to an underestimation of TC wind speeds. A new scatterometer is currently being developed (van Zadelhoff et al. 2014). Cross-polarization signals, which are more sensitive to extreme winds (25–45 m/s) (Mouche et al. 2017), will be used operationally in the second-generation MetOp satellite program, which will replace the current MetOp system, including the ASCAT-A and ASCAT-B scatterometers, from approximately 2021 onwards. Until then, ASCAT will provide the best available observed U10 dataset with global coverage, but its possible underestimation of wind speeds exceeding 25 m/s should be taken into account. Second, further advances in global surge modelling could be achieved by improving the wind parameterization in GTSM. This study, like previous studies using GTSM, used a constant Charnock parameter to calculate the drag coefficient, which is not fully consistent with the parameterization in ECMWF’s forecasting system. So far, this inconsistency has been assumed to have limited effects on GTSM’s performance, since U10 values rarely exceed 30 m/s in ERA-Interim and 40 m/s in ERA5.
IFS, however, contains U10 values of approximately 50 m/s, at which observations have shown that the drag coefficient levels off (Powell et al. 2003). As a result, a constant Charnock parameter can lead to an overestimation of the wind stress and the corresponding surge height at high wind speeds, as suggested by our results for TC Irma. In future research, it would be valuable to assess how the wind parameterization in GTSM could be improved. Third, storm surges are known to be highly sensitive to the bathymetry near the landfall location of a cyclone (Resio and Westerink 2008). Implementing the new higher-resolution GEBCO 2019 global bathymetry dataset (GEBCO Compilation Group 2019) in GTSM would therefore be a very promising improvement. This would allow the resolution of GTSM’s computational grid to be increased in shallow coastal areas, resulting in more accurately simulated surges at the coast.

4.2 Potential use of ERA5

We have shown that ERA5 can be used to accurately simulate the surge heights of historical storm events. An important application of ERA5 is the development of a reanalysis of historical extreme sea levels to estimate exceedance probabilities. The improved representation of TCs and ETCs in ERA5 shows the potential for an ERA5-based update of the Global Tide and Surge Reanalysis (GTSR) dataset (Muis et al. 2016). This is expected to greatly reduce the underestimation of the more extreme surge heights reported for GTSR (Muis et al. 2017). Moreover, ERA5 will cover the period 1950–present, compared to 1979–2019 for ERA-Interim (Hersbach et al. 2019). This increase in the length of the climate record from 41 to 70 years would help to further enhance our understanding of decadal to interannual sea level variability (Woodworth et al. 2019). Extending the length of the extreme sea level reanalysis would contribute to more accurate assessments of the influence of natural climate variability, such as the El Niño-Southern Oscillation, on extreme sea levels (Marcos et al. 2017; Muis et al. 2018). In addition, trends related to climate change could be investigated with more confidence (Cid et al. 2016).

Using the longer ERA5 record would also reduce the uncertainty in fitting an extreme value distribution and thus provide more robust estimates of extreme sea levels at high return periods (RPs) (Haigh et al. 2014b; Wahl et al. 2017). However, calculating RPs of extreme sea levels from approximately 70 years of data will remain problematic for regions prone to TCs. This is because a single TC affects a relatively small area, and because the more extreme TCs, and the corresponding extreme TWLs, have a low probability of occurrence (Lin and Emanuel 2016). To accurately calculate RPs of extreme sea levels for such regions, it will remain necessary to apply, for example, statistical methods that extend the TC track record to thousands of years (e.g. Emanuel et al. 2006; Haigh et al. 2014a).
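The benefit of a longer record, and its limits for rare events, can be illustrated with a small Monte Carlo experiment. The sketch below assumes synthetic annual maxima drawn from a Gumbel distribution with hypothetical parameters (location 1.0 m, scale 0.3 m) and a simple method-of-moments fit, and compares the spread of the estimated 100-year return level for 41-year and 70-year records. It also computes the probability that the level with return period T is exceeded at least once in an n-year record, 1 - (1 - 1/T)^n. Real analyses typically use more robust fitting methods (e.g. maximum likelihood for a generalized extreme value distribution); the moments fit is used here only for brevity.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit; returns (location mu, scale beta)."""
    beta = statistics.stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, rp_years):
    """Level exceeded on average once every rp_years (annual-maxima Gumbel)."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / rp_years))


def spread_of_return_level(n_years, rp_years=100.0, n_rep=2000,
                           mu=1.0, beta=0.3, seed=1):
    """Std. dev. of the fitted rp-year level across synthetic n-year records."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        # -ln(E) with E ~ Exp(1) is a standard Gumbel variate
        record = [mu - beta * math.log(rng.expovariate(1.0)) for _ in range(n_years)]
        estimates.append(return_level(*fit_gumbel_moments(record), rp_years))
    return statistics.stdev(estimates)


def encounter_probability(rp_years, n_years):
    """Chance of at least one exceedance of the rp-year level in n_years."""
    return 1.0 - (1.0 - 1.0 / rp_years) ** n_years


if __name__ == "__main__":
    for n in (41, 70):  # ERA-Interim-length vs ERA5-length records
        print(f"{n} years: std of 100-yr level = {spread_of_return_level(n):.3f} m")
    print(f"P(1000-yr level exceeded in 70 yr) = {encounter_probability(1000, 70):.3f}")
```

The spread of the fitted return level shrinks roughly with the square root of the record length, yet a 1000-year level has less than a 7% chance of being exceeded even once in 70 years, which is why synthetic TC records remain necessary in TC-prone regions.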

4.3 Potential use of IFS

We have shown that the higher resolution of IFS compared to ERA5 particularly increases the accuracy of simulated surge heights for small, fast-moving TCs such as TC Michael. This improvement in TC storm surge modelling suggests that IFS could be applied in operational storm surge models. So far, meteorological forecasts such as IFS have not been able to adequately resolve TC intensities and tracks. Therefore, most operational storm surge models (e.g. Glahn et al. 2009; Greenslade et al. 2018) use meteorological forcing derived from parametric TC models. With these parametric models, the TC track can be slightly shifted to create probabilistic storm surge forecasts. Although parametric TC models give reasonable storm surge values, their accuracy is limited when the structure of a TC deviates from the standardized forms used in parametric models (Kohno et al. 2018). For example, the parametric model of Holland et al. (2010) is not able to fully resolve the asymmetric structure of TCs (Lin and Chavas 2012; Kepert 2013; Chavas et al. 2015), resulting in an overestimation of the wind speed on the left side of the TC eye and an underestimation on the right side. In contrast to parametric models, ECMWF’s forecasting system is able to predict the TC structure, which is likely to result in better storm surge predictions. An interesting question for future research is how the performance of global storm surge models forced by TC track data compares with that of models forced by high-resolution forecasts from ECMWF’s forecasting system, similar to Muis et al. (2019), who compared the performance of ERA-Interim against a parametric model. An additional benefit of using IFS as forcing in operational storm surge models is that the same source of meteorological data can be used to forecast both TC- and ETC-induced storm surges, which greatly simplifies the forecasting approach.
Lastly, a potential future enhancement of operational storm surge models could be the development of inundation forecasts, which could provide emergency managers with much more detailed information about the impacts of a TC or ETC.
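The asymmetry argument can be illustrated with the classic symmetric Holland gradient wind profile (the original 1980 form, not the revised Holland et al. 2010 model cited above), combined with a hypothetical forward-motion correction. In this sketch a fraction of the translation speed is added on the right of the track and subtracted on the left (Northern Hemisphere); the fraction (0.5), the air density and the storm parameters are illustrative assumptions, not values from this study, and the result is a gradient-level wind rather than U10. A purely symmetric profile returns the same speed on both sides, so relative to the moving storm it overestimates the wind on the left and underestimates it on the right.

```python
import math

RHO_AIR = 1.15      # assumed air density (kg/m^3)
OMEGA = 7.2921e-5   # Earth's rotation rate (rad/s)


def holland_gradient_wind(r_m, rmax_m, dp_pa, b, lat_deg):
    """Symmetric gradient wind (m/s) from the Holland (1980) pressure profile.

    r_m: distance from the eye, rmax_m: radius of maximum wind,
    dp_pa: central pressure deficit, b: Holland shape parameter.
    """
    f = 2.0 * OMEGA * math.sin(math.radians(abs(lat_deg)))  # Coriolis magnitude
    x = (rmax_m / r_m) ** b
    cyclostrophic = b / RHO_AIR * x * dp_pa * math.exp(-x)
    return math.sqrt(cyclostrophic + (r_m * f / 2.0) ** 2) - r_m * f / 2.0


def side_winds(r_m, rmax_m, dp_pa, b, lat_deg, vt, frac=0.5):
    """(left, right) wind speed for a Northern Hemisphere storm moving at vt (m/s).

    Hypothetical asymmetry: a fraction `frac` of the forward speed is added on
    the right of the track and subtracted on the left.
    """
    v = holland_gradient_wind(r_m, rmax_m, dp_pa, b, lat_deg)
    return v - frac * vt, v + frac * vt


if __name__ == "__main__":
    # Hypothetical storm: Rmax 30 km, 50 hPa deficit, b = 1.5, 25 N, moving at 8 m/s
    v_sym = holland_gradient_wind(30e3, 30e3, 5000.0, 1.5, 25.0)
    left, right = side_winds(30e3, 30e3, 5000.0, 1.5, 25.0, vt=8.0)
    print(f"symmetric: {v_sym:.1f} m/s  left: {left:.1f} m/s  right: {right:.1f} m/s")
```

A forecast model that resolves the storm structure has no need for such an ad hoc correction, which is the advantage of IFS forcing argued above.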

5 Conclusions

In this study we have examined how the performance of global storm surge modelling is affected by improvements in climate reanalysis datasets. For this, we evaluated the performance of ERA-Interim, ERA5 and IFS for eight historical events across the world (five TCs and three ETCs). Validation of U10 shows that for TCs the negative bias in wind speed is reduced from 28.0% for ERA-Interim to 2.5% for ERA5. For ETCs, the negative bias is reduced from 10.4% to 6.4%. The larger improvement for TCs than for ETCs can be explained by the higher spatial and temporal resolution of ERA5, which better captures the large pressure gradients and strong winds that characterize TCs. The accuracy of the surge modelling also improves when ERA5 is used instead of ERA-Interim. For TC Irma, for example, the modelled surge height increases from 0.88 m (ERA-Interim) to 2.68 m (ERA5), compared to 2.64 m observed. Using IFS as forcing is especially useful for simulating the storm surges of fast-moving TCs with a relatively small wind field, such as TC Michael. For ETCs, the improvements from IFS compared to ERA5 are less apparent. To conclude, ERA5 constitutes a major improvement over ERA-Interim, thereby opening up new opportunities for advancing global storm surge modelling.
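The relative errors behind the TC Irma comparison follow directly from the surge heights reported above. A minimal sketch, assuming relative error is defined as 100 (modelled - observed) / observed; this definition is our assumption, and the wind speed bias statistics above may be computed differently:

```python
def relative_error_pct(modelled, observed):
    """Relative error in percent: 100 * (modelled - observed) / observed."""
    return 100.0 * (modelled - observed) / observed


OBSERVED_IRMA_M = 2.64  # observed surge height for TC Irma (m)
for name, modelled in (("ERA-Interim", 0.88), ("ERA5", 2.68)):
    print(f"{name}: {relative_error_pct(modelled, OBSERVED_IRMA_M):+.1f}%")
# ERA-Interim: -66.7%
# ERA5: +1.5%
```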