1 Introduction

An ongoing challenge in water-level modeling is the trade-off between scale and skill. Generally, larger-scale products (regional to global) are forced to trade accuracy for increased spatial and temporal coverage (Vousdoukas et al. 2016). Skill is often lost for larger-scale products because of coarser resolution, parameterization of local-scale processes, computational costs, and reduced physics (Fewtrell et al. 2008; Maraun et al. 2010). This trade-off is problematic because, especially in the field of coastal hazards, there remains a strong need for large-scale data that are relevant at local scale (Ward et al. 2015). In terms of spatial coverage, a complete understanding of risk and how to allocate resources at governmental levels requires a regional understanding of hazards that extends beyond isolated local study sites. Furthermore, most communities outside of large urban centers lack the resources and/or technical expertise to develop local hazard assessments. Therefore, products that cover large geographic regions with planning-level granularity provide both a more unified and equitable resource for understanding coastal hazards by making data available for every coastal community. Beyond large spatial scales, long temporal records are required for constraining extremes (Wahl et al. 2017), understanding changes to hazards over time, and for characterizing decadal to interannual-scale variability (Weisse et al. 2014). Overall, there remains a strong motivation for developing and improving large-scale models that are both efficient and accurate for coastal hazards research.

This study explores the recently developed Global Tide and Surge Model ERA5 dataset (GTSM-ERA5, Muis et al. 2016, 2020) for predicting coastal water levels along the Southeast Atlantic coastline of the USA. Previous water-level hindcast analyses in the region have been limited spatially to tide gauge locations (Ezer and Atkinson 2014; Davis and Vinogradova 2017), specific study sites (e.g., New York, Georgas et al. 2016), or focused only on specific types of events such as Hurricanes (Marsooli et al. 2018) or extratropical events such as Nor’easters (Lin et al. 2019). This data gap is primarily due to computational constraints and the scale/skill trade-off discussed above. GTSM, created using Delft3D Flexible Mesh (Kernkamp et al. 2011) by Deltares, narrows this trade-off for nearshore water-level modeling by leveraging a variety of recent modeling developments. These include unstructured meshes and parallel processing, in combination with increased computational processing power. This study builds on recent work by Muis et al. (2016), who found that GTSM successfully reproduced observed daily water-level maxima across a global sample of tide gauges as well as in the US Atlantic region (Muis et al. 2019). In addition, Dullaart et al. (2020) and Bloemendaal et al. (2019) showed acceptable skill of GTSM, forced with ERA5 resolution winds (31 km), in simulating surge levels produced by tropical cyclones. Combined, these recent works suggest that GTSM-ERA5 is a viable source for developing a detailed water-level hindcast in the study region.

In this study, GTSM-ERA5 data were combined with estimates of wave setup to produce a hindcast-scale (~ 38 years) dataset of water levels with a regional coverage area (1000 + kilometers) along the US Southeast Atlantic Coast. A water-level hindcast of this sort does not exist in the region and is expected to facilitate a variety of coastal research goals, such as coastal hazard assessments. A series of statistical corrections were developed to improve model skill for individual signals (astronomic tides, monthly mean sea levels, non-tidal residual, seasonality, and wave setup; all signals defined below in Sect. 3) that contribute to the GTSM-ERA5 derived water levels in the study region. Following application of these corrections, the spatial variability of individual component contributions to the total water level, including wave setup, was quantified, and compared to gain further insight into processes contributing to flooding across the region. Results focus on extremes, as a primary interest for coastal hazard assessments, and their contributing signals. Overall, this modeling framework both accurately and efficiently reproduces coastal water levels in a complex setting influenced by both tropical and extratropical storm systems and thus is a viable approach for examining water levels at both large spatial and temporal scales.

This paper is organized with the initial section (Sect. 2) describing the regional scope of the produced dataset and the utilized observational and reanalysis data. Section 3 presents methods used to bias correct and decompose the time-series into astronomic tides, seasonality, monthly mean sea levels (MMSL), and non-tidal residuals (NTR). Section 3 also details the method for calculating wave setup, an additional critical component for experienced total water levels along the coast. The next section (Sect. 4) discusses model-observation comparisons using regional tide gauges, with model performance evaluated both for signals contributing to water levels and for combined water levels. Finally, the paper moves into results (Sect. 5) and discussion (Sect. 6).

2 Study sites and observations

2.1 Study region

This study focuses on the Southeast US Atlantic coastline, extending 1,500 km from the southeastern tip of Florida north to the entrance to Chesapeake Bay (Fig. 1). This region covers a diverse range of coastal systems including sandy beaches, bays, marshes, and barrier islands. This range of coastal morphologies is closely coupled with the variable ocean climate along the coast, as explored by this study. For example, barrier island characteristics are at the most basic level controlled by tidal range and wave climate (Hayes 1979). The coastal zone in the southern portion of this region (Florida) is highly developed, whereas the north section represents a more mixed land usage (Titus et al. 2009). Low lying vulnerable infrastructure is ubiquitous along the coast (e.g., Armstrong and Lazarus 2019; Bin et al. 2011; Wdowinski et al. 2018; Ezer and Atkinson 2014), and approximately 14 million residents live in the coastal counties spanning the study region (US Census 2021).

Fig. 1
figure 1

Study region with GTSM-ERA5 output locations marked as black circles and tide gauge locations marked as blue squares. Gray labels denote NOAA designated tide gauge station IDs. Information on tide gauges, including ID number, latitude, longitude, and the date when observations start, are listed in the table to the right of the map

2.2 Observational/reanalysis data

2.2.1 ERA5

ERA5 (Hersbach et al. 2018, 2020) atmospheric and ocean wave fields were used as forcing for the GTSM-ERA5 dataset (Muis et al. 2020) and wave setup parameterization, respectively. ERA5 represents a significant improvement over previous global reanalysis products through increased temporal (hourly) and spatial (31-km grid) resolution, improvements to model physics, and a drastic increase in assimilated data. Validations of ERA5 meteorological products have shown significant improvements in model predictive skill over the previous ERA-Interim product (Rivas and Stoffelen 2019; Haiden et al. 2019).

A variety of wave hindcast and reanalysis products are available with data within the study region. The ERA5 wave product (Hersbach et al. 2018, 2020) was selected primarily for consistency with the hydrodynamics, which were forced with ERA5 atmospheric fields. Additionally, ERA5 improves upon other available wave hindcasts by including hourly assimilation of altimeter derived significant wave height data. Several studies have shown as good or better skill by ERA5 compared to other wave products (Erikson et al. in review; Timmermans et al. 2020).

2.2.2 Tide gauges

NOAA’s tide gauge database (https://tidesandcurrents.noaa.gov/) was used to validate the GTSM-ERA5 water levels. This study benefits from the region’s relatively high density (seventeen) of long-term tide gauge records. The full available 6-min observed water-level intervals were used (Fig. 1).

3 Methods

This study features a significantly simplified modeling framework than previous studies of this scale. Specifically, the unstructured nature of Delft3D Flexible Mesh (utilized in GTSM-ERA5) eliminates the need for various levels of model nesting to downscale global-level oceanographic data to the local scale. Furthermore, the increased resolution and quality of the ERA5 reanalysis product reduce the need for dynamic downscaling of atmospheric data. Therefore, the full methods of this study reduce to a series of statistical corrections to the GTSM data and a wave setup model. These components are detailed in the following sections.

3.1 Hydrodynamics

Modeling of nearshore water levels is performed using the Global Tide and Surge Model version 3.0 (GTSMv3.0) (Muis et al. 2020, 2022). GTSMv3.0 is a global-scale implementation of Delft3D Flexible Mesh (Kernkamp et al. 2011) developed by Deltares. The unstructured nature of the GTSM grid allows for variable mesh resolution ranging from approximately 2.5 km along the US coastlines to 25 km in the deep ocean. While resolution is improved from previous global models, nearshore resolution is still often insufficient to resolve small-scale coastal features. The overall mesh is over 5 million grid cells with bathymetry set using the General Bathymetric Chart of the Ocean (GEBCO 2014) bathymetry dataset, which provides global 30 arc-second bathymetry. GTSM is a global model so has no open ocean boundaries, only atmospheric. Therefore, tides are generated internally through tidal generating forces for over 60 harmonics (Apecechea et al. 2017). Atmospheric forcing is provided through wind- and sea-level pressure fields using the ERA5 climate reanalysis (Hersbach et al. 2018, 2020). The 38-year hindcast dataset (1979–2017) produced using GTSM forced with ERA5 is called GTSM-ERA5 (which is used for the rest of this paper). GTSM-ERA5 was configured with output stations at approximately 20 km resolution along the coastline (see Fig. 1). These stations were supplemented with outputs co-located at tide gauge locations as well as increased station resolution in bathymetrically complex regions. Co-located output/tide gauge locations allow for multi-decadal-scale model-observation comparisons at various locations within the study region.

GTSM-ERA5 and NOAA tide gauge water levels were decomposed into a subset of contributing signals: tides, seasonality, non-tidal residual (NTR), and monthly mean sea level (MMSL). Water levels were separated into 4 components to gain insight into processes contributing to high water levels, as well as to better develop methods for process specific corrections to each signal. The decomposition of water levels was accomplished by first separating water levels into tides* and NTRs*, labeled here with a star as these signals are intermediate and further separated. For GTSM-ERA5, two different model runs were performed, one with full forcing and one with tidal only forcing. Modeled NTRs* were calculated by subtracting modeled tides from simulations with full forcing. For tide gauge stations, observed NTRs* were calculated by subtracting NOAA-predicted tide signals from observed water levels. After separating tidal* and NTR* water levels, each of these signals was split into two additional signals. NTR* was separated into a MMSL component and a NTR component (as defined as NTR* with MMSL removed). Tidal* water levels were separated into a seasonal signal and a tidal signal (as defined as tidal* with seasonality removed). The definition and calculation of the MMSL and seasonal signal are described below in Sects. 3.1.3 and 3.1.4.

Here we define the still water level (SWL) as the linear superposition of four components (Eq. 1a): astronomic tides, astronomic seasonal cycles, non-tidal residuals, and monthly mean sea levels. Note that the SWL varies in time but includes the term “still” to distinguish the elevations from the addition of wave setup or run-up. The separation of SWL signals can be visually seen for both the observed and raw GTSM-ERA5 data in Fig. 2. The addition of wave setup (see Sect. 3.2) to the SWL is termed “mean water levels” or MWLs (Eq. 1b) (FEMA 2015).

Fig. 2
figure 2

Example comparison of observed and modeled SWL signals at tide gauge 8665530 (Charleston, SC). Displayed time series are before statistical correction (raw GTSM-ERA5 data). Note that signals are decomposed as per Eq. (1a), so tides have seasonality removed and NTR has MMSL removed

$$\text{Still}\,\text{Water}\,\text{Level}\,(\text{SWL})=\text{Tides}+\text{Seasonality}+\text{NTR}+\text{MMSL}$$
(1a)
$$\text{Mean}\, \text{Water}\, \text{Level}\,(\text{MWL})=\text{SWL}+\text{wave}\,\text{setup}$$
(2b)

3.1.1 SLR

Observed SWLs across the study region were found to have a statistically significant trend (ranging from 2 to 4 mm/year). At the scale of the considered 38-year period, this trend was considered contributable to sea-level rise (SLR) rather than short term variability. Global SLR was included in GTSM-ERA5 (both in the full forcing and tide-only runs) to capture nonlinear effects of SLR on hydrodynamics (Idier et al. 2019; Wahl 2017). Modeled local trends in water levels were found to be consistent with observed trends at the tide gauges and satellite altimetry (Sweet et al. 2022). This study takes the approach of removing the trend in SWL, both from the modeled and observed time series. A linear regression was fit to the entire modeled and observed time series at each output location and then removed from the water-level time series. Higher-order polynomial fits for SLR were explored but found statistically insignificant. This is likely because SLR nonlinearities have become more significant in recent eras. This study is focused on a hindcast period when SLR was better modeled as a linear trend. SLR was removed for a variety of reasons, the first being that it allows for a stationary analysis where extremes (as well as other SWL statistics) are not a function of time. Secondly, the approach allows for validating the effectiveness of the modeling framework separate from SLR. In the future, SLR becomes a source of significant uncertainty (Sweet et al. 2022), so it is valuable to assess the skill of the modeling framework separate from the SLR component.

3.1.2 Tides

The tidal component of SWL is the largest error fraction, with respect to root-mean-square error (RMSE), in GTSM-ERA5 data at most tide gauge comparison points (see Fig. 7). Therefore, an alternative tidal correction model was developed. The periodic nature of tides represents a unique problem in that corrections in the time domain are unlikely to be effective. As an example, a small error in tidal harmonic phase (shifting the tidal time series) will create a cyclic over and under bias that is difficult to correct with time domain methods targeted at systematic bias. Corrections to tidal harmonics in the frequency domain are therefore more effective. This has the additional benefit of reducing the complexity of the correction since a tidal harmonic can be fully represented by only two parameters (phase and amplitude).

A variety of openly available and well validated global tide models were selected to improve modeled tidal water levels. A subset of best performing global tide models (Stammer et al. 2014; Seifi et al. 2019) were downloaded and compared to NOAA-predicted tides at all tide gauges within the study region (see Table 1). Older versions of both the TPXO and FES model series were included as some studies have found that these older versions may perform better than the newest version in certain regions (Lyard et al. 2021). For our study site, the FES2014 (Lyard et al. 2006) and TPXO v9.2 (Egbert et al. 2010) were found to be the best performing models, with approximately equal skill on average across the region. This said, model performance was found to depend on location.

Table 1 Comparison of model RMSE (in centimeters) for all tide gauges (rows) and all tidal model products (columns) tested as part of this study.

A blended product was created to take advantage of the best parts of each model in the ensemble. Conceptually, this is like global climate model ensembles where the mean of multiple models can be considered more robust than any individual model. The developed strategy predicts tides using a weighted mean of all tidal harmonics (both phase and amplitude). The weight for each model was determined by the average performance of that model across all tide gauges. More specifically, each tidal constituent amplitude/phase was assigned its own weighting of the contributing models as determined by the performance of that model. Equation (2b) details the utilized weighting scheme:

$${{{W}}}_{\text{i},\text{j}}=\left(\frac{1}{{MAE}}_{\text{i,j}}\right)/\sum\limits_{j=1}^{n}{MAE}_{\text{i,j}}$$
(3)

where

W: Individual model weighting for the specific harmonic phase or amplitude.

MAE: Mean Absolute Error across all tide gauges for the specific harmonic phase/amplitude.i: index for each amplitude and phase of tidal harmonics j: index for models contributing to the tidal ensemble n: number of models available for the specific amplitude/phase of the tidal harmonic

The numerator of Eq. (3) penalizes the model’s weight in the weighted average for a larger MAE. The denominator normalizes the weighting so that the sum of weights is 1.

Tides were computed by extracting each tide model’s phase and amplitude for all available tidal harmonics at each GTSM-ERA5 output location. Each models’ contributions to the weighted ensemble were controlled by the harmonics available for that specific model. Since more dominant harmonics were included in all tidal models, these harmonics had a larger ensemble sample. As GTSM-ERA5 data are only available as time series, the phase and amplitudes were determined by performing a harmonic analysis using Utide software (Codiga 2011, 2021) at each output location. While generally found to perform less well than using results from the FES and TPX models, GTSM-ERA5 was included within the ensemble to integrate nonlinear interactions between tide and surge as well as higher-order harmonics. Utide considers 68 harmonics, all of which were included in the ensemble. A large percentage of these were constrained only by the GTSM-ERA5 dataset since the other utilized tide models only include the most important harmonics (approximately 13). A new set of phase and amplitudes were generated by combining the ensemble phase and amplitudes through the weighting scheme detailed above. Tides were then recomputed using this new set of phase and amplitudes using the t_tide software package (Pawlowicz et al. 2002).

3.1.3 Astronomic seasonality

Seasonality was defined for this project as the astronomically derived SA (Solar Annual; period of 365 days) and SSA (Solar Semi-Annual; period of 183 days) harmonics within the tidal component. This differs from other study definitions of seasonality, for example by NOAA (NOAA 2009), who analyses the yearly average of MMSL. NOAA’s definition would therefore include seasonal variations in NTR in addition to tides. Seasonality was included as an individual component in the modeling framework for a variety of reasons. The first was that seasonality was determined to be a major source of error in GTSM-ERA5 data. Secondly, seasonality was found to be insufficiently reproduced using the tidal model ensemble method (see Sect. 3.1.2). This was primarily because most of the selected tidal models do not include seasonal harmonics. Only FES2014 and GTSM-ERA5 contained seasonality as a tidal component, and both were found to greatly underpredict variability in the study region, necessitating the need for corrections to this parameter.

A simple linear regression-based shift of the seasonal harmonic phase and amplitude was selected as the statistical correction methodology. The regression was developed using a single explanatory variable, station latitude, which was implemented to account for along coast variations in the seasonal error (see Eq. 3).

$$\text{Corr}={B}_{1}\,\text{Lat}+B_{2}$$
(4)

where Lat: latitude in degrees of the tide gauge location. B: least squares fitted regression coefficients. Corr: Correction applied to the phase/amplitude of the Seasonal Harmonic.

Each harmonic amplitude and phase error (the response variable in the regression) was fitted with an individual linear regression based on the error at tide gauges within the region. This can be seen visually in Fig. 3. Notably, two gauges were omitted from the linear regression fitting (station 8652587 and 8654467; Oregon Inlet, NC, and Hatteras, NC) due to poor comparison between GTSM-ERA5 data and observations. A sensitivity analysis showed that inclusion of these stations did not significantly affect regression coefficient estimates. Rather, these stations were omitted more for consistency with the handling of these outlier stations in other parts of the modeling framework.

Fig. 3
figure 3

Visual representation of linear corrections to the SA (Solar Annual; period of 365 days) and SSA (Solar Semi-Annual; period of 183 days) harmonics amplitude (subpanel 1 and 2) and phase (subpanel 3 and 4) as a function of latitude. Dashed lines are the linear relationship fitted to the circles which plot the error at individual tide gauge stations. Triangles are stations that were not used for fitting the correction

3.1.4 Monthly mean sea level

Monthly mean sea level (MMSL) was derived in this study using a low-pass filter on SWLs after the removal of tides and seasonality. A filtering approach was chosen instead of the more conventional “monthly means” method as a more complete method of removing low-frequency energy from the signal. The monthly means definition is somewhat complicated when removed from the overall signal, requiring interpolation between months to a finer temporal resolution to remove offsets. Furthermore, the resulting effects in the frequency domain are complicated with removal and retention of periodic signals being based on the relationship of the mean and signal’s periods (Wang et al. 2009). This study instead uses a 4th-order Butterworth filter with cutoff frequency of 0.3858 µHz (1/30 days) applied using a backward forward methodology to remove phase shift. Various filter designs and orders were tried, and the resulting MMSL was found to be fairly insensitive to filter choice, the only changes being small shifts in the partitioning of signal energy between MMSL and NTR. Overall, the resulting MMSL time series was found to be a cleaner representation of low-frequency energy than the classic monthly means definition, although it is acknowledged that a multitude of methods for looking at longer-scale signals exists (for example, that used in Li et al., (2022)).

It was decided to separate MMSL from the NTR signal for additional insight into GTSM-ERA5’s performance. MMSL is often controlled by different processes than local NTR. The US East Atlantic has seen a recent surge of research on MMSL after it was noticed that tide gauge records in the region showed a significant acceleration of SLR, approximately 3–4 times greater than the global average (Sallenger et al., 2012). The exact cause of this increase in MMSL remains debated but has been tied to a variety of processes, including regional atmospheric patterns (Andres et al. 2013; Domingues et al. 2018; Zhao and Johns 2014), Gulf Stream dynamics (Andres et al. 2013; Domingues et al. 2018; Ezer et al. 2013; Kopp 2013; Yin and Goddard 2013), ice sheet decline (Davis and Vinogradova 2017), and climate patterns such as the North Atlantic Oscillation (NAO) (Ezer and Atkinson 2014; Kenigson et al. 2018; Kopp 2013). GTSM-ERA5 is not capable of capturing all of these potential contributors to MMSL, so the signal was separated from higher frequency NTR for further investigation and potentially for correction.

Despite limits to the physics included in GTSM-ERA5, further investigation found that MMSL variability was reasonably well represented by GTSM-ERA5. GTSM-ERA5’s MMSL signal was found to have a small bias (~ 3 cm) in the mean MMSL value in comparison to observations. Therefore, MMSL was post-processed by adding 3 cm. With GTSM-ERA5 skill found to be sufficient, no additional statistical corrections were applied to the MMSL component.

3.1.5 Non-tidal residual

Non-tidal residual (NTR) in this study was defined as conventional NTR* (SWL minus tides), but additionally with MMSL removed (see Sect. 3.1.4). Therefore, NTR is herein defined as all NTR energy with a frequency higher than 0.3858 µHz. GTSM-ERA5 was found to have significant skill at resolving NTR across the study region with only a few problematic tide gauges. Tide gauges 8652587 (Oregon Inlet, NC) and 8654467 (Hatteras, NC) were flagged as outliers in comparisons to GTSM-ERA5. Further investigation showed that these gauges are located behind thin barrier islands with small inlet channels. While GTSM-ERA5 leverages unstructured meshing to have very high coastal resolution for a global model, the resolution is still on the order of 2.5 km. This resolution remains insufficient to resolve small-scale coastal features (such as thin barrier islands or small channels). While sub-resolution features make model-observation comparisons at the flagged tide gauges problematic, it is less of a problem for the overall study and produced hindcast dataset. In the resulting analysis, GTSM-ERA5 data are only examined in the nearshore open-coast region. The use of only open-coast GTSM-ERA5 data restricts the study’s conclusions to the open coast region and means that GTSM-ERA5 does not need to be, nor is expected to be, skillful within back bays.

Overall, GTSM-ERA5 NTR data were found to be reasonably skillful in comparison to tide gauges (average RMSE of 6 cm). A statistical correction was performed due to the importance of NTR during flooding events. Additionally, extreme SWLs were found to be significantly underpredicted, a result in agreement with results by Muis et al. (2017). Underprediction of extremes is particularly undesirable due to the importance of these events from a coastal hazard perspective.

Bias correction was performed using a modified version of quantile matching. While quantile matching (Déqué, 2007) has seen successful application in coastal modeling settings (Parker et al. 2017), this study has a much larger region of interest where exact matching of model cumulative distribution functions (CDFs) to observations becomes problematic. Specifically, applying corrections derived at tide gauges to locations far from the tide gauge may introduce errors. While this approach is possible, most likely through interpolating corrections between tide gauges (Arns et al. 2015), it is likely that an exact CDF matching constraint is too strong and may result in overfitting. For example, there is no reason to assume that the GTSM-ERA5 bias measured at a tide gauge located deep within a hydraulically complex river mouth is the same as the bias at a nearby ocean coast location. This study compares each tide gauge to GTSM-ERA5 data as a single sample of the overall error in the model. We therefore correct the GTSM-ERA5 data at each quantile by the average error across all tide gauges. The same correction (based on the average error) is applied across the entire region, as shown in Fig. 4.

Fig. 4
figure 4

Visual representation of the utilized quantile-based statistical correction for non-tidal residuals. Each line represents the cumulative distribution function (CDF) corrections to fit modeled CDFs to observations at individual tide gauges. The dotted line is the applied correction representing the average quantile correction across all gauges

Figure 4 plots the difference (observed NTR–modeled NTR) between CDFs at each quantile. Therefore, the value on the x-axis (correction) is the amount that must be added to the model NTR to exactly match the observed CDF. The dotted line is the average of all tide gauges and therefore the correction applied to the GTSM-ERA5 NTR data across the entire region. Figure 4 shows that GTSM-ERA5, in general, overpredicts low NTR and underpredicts high NTR (a bias toward too little variability). The median bias (quantile 0.5) is low (model underprediction), and this low bias increases moving toward larger NTR with the largest underpredictions found for the upper extreme NTR. Across tide gauges, the general trend, although with significant variability, is increasing bias in NTR moving from the south (darker colors) to the north (lighter colors).

Notably, three outlier gauges were removed from this analysis and not included in the average correction calculation. Tide gauges 8652587 (Oregon Inlet, NC), 8654467 (Hatteras, NC), and 863901 (Chesapeake Channel, VA) correction CDFs were found to deviate significantly from the general pattern shown in Fig. 4. Gauges 8652587 (Oregon Inlet, NC) and 8654467 (Hatteras, NC) were highlighted earlier in section as likely effected by a mesh resolution/bathymetry-related issue. Gauge 8638901’s (Chesapeake Channel, VA) outlier behavior is probably due to an insufficient overlap between the GTSM-ERA5 and tide gauge record (less than a year). Additionally, the gauge is located at the center of the Chesapeake Bay inlet and likely experiences complex hydrodynamics not resolved by GTSM-ERA5. Based on these arguments, it was judged valid to remove these outlier stations from the NTR statistical correction procedure.

3.2 Wave setup

This study additionally considers waves as an important driver of coastal MWLs in the study region (Dietrich et al. 2011a, 2011b). The primary pathway through which waves influence flooding is wave setup, via an increase in mean sea level near the shoreline caused by the transfer of momentum during wave breaking. While either wave run-up or setup could have been considered in this study, wave setup was selected because sustained flooding occurs at timescales of hours to days. Wave run-up occurs at scales of seconds to minutes and so wave setup is more influential and appropriate for consideration in a flood hazard case. Waves can additionally affect MWLs through coupling with surge, tides, sea level, and other processes, but these effects are generally smaller than wave setup along the open coast (Idier et al. 2019). There are two approaches for calculating wave setup: physical modeling or empirical formulations. At the scale of this study, 1000 s of kilometers, a physical model capable of accurately resolving wave setup would be computationally prohibitive. Instead, this study utilizes the empirical Stockdon formulation (Stockdon et al. 2006), following other large spatial scale approaches (e.g., Melet et al. 2018; Kirezci et al. 2020; Wahl et al. 2016; Serafin et al. 2017).

The Stockdon formulation relates setup at the shoreline to the deep-water wave height, period, and beach foreshore slope. Foreshore beach slope data were obtained from the USGS National Assessment of Coastal Change Hazards (Doran et al., 2022). The dataset consists of foreshore slopes along pre-defined transects spaced approximately 50 m in the alongshore direction (Farris et al. 2018). Transects were resampled from 50-m spacing to 1-km spacing alongshore. At each transect, a temporally average foreshore beach slope was calculated from all available shoreline data and spatially smoothed using a Hanning filter to minimize noise (Doran et al. 2015). Wave characteristics were determined by extracting data from the full ERA5 (Hersbach et al. 2018, 2020) wave output grids along the 10 m depth contour. ERA5 wave data were then matched to the nearest transect offshore endpoint, while also preventing duplication with multiple wave data points matched to a single transect location. The 10 m depth ERA5 wave data were then reverse-shoaled to respective deep-water conditions using linear wave theory and assuming a shore-normal approach. This approach, rather than directly extracting the wave parameters from deep-water, was taken for two reasons. The first was to account for local changes to wave conditions between deep-water and the nearshore. The second was for consistency with the methodology used in developing the wave setup parameterization, e.g., Stockdon et al. (2006). The spatial resolution of the transect-based wave setup data is significantly finer than the GTSM-ERA5 data. Therefore, a characteristic setup was calculated for each GTSM-ERA5 data location by averaging wave setup across all transects within 5 km of the location of interest. It is also worth noting that the utilized wave setup parameterizations predict wave setup at the beach while GTSM-ERA5 output stations are in the nearshore. There is therefore a cross-shore offset between these two MWL components. This was explored further by examining GTSM-ERA5 output stations located at multiple locations in the cross-shore to see how much predicted SWL varied. No significant difference was found, so the cross-shore offset was deemed to be similarly non-relevant to results. This conclusion is likely since the processes responsible for changes to MWL in the nearshore (e.g., wave setup/setdown, three-dimensional circulation, etc.) are not resolved in the GTSM-ERA5 model.

4 Model-observation comparisons

Assessment of model improvements following individual SWL component corrections was evaluated by comparing observed and modeled time series data (Barnard et al. 2023a, b) at each of the tide stations. Model performance was validated both from a SWL perspective, but also for individual contributing signals. This section starts with an evaluation of SWL and then evaluates model-observations differences of each contributing component described in the preceding sections.

4.1 Water levels

Figure 5 shows both a visual and table-based comparison of observed and modeled SWLs (corrected) at all tide gauges within the region. As for comparisons above, Station 8652587 and 8654467 (Oregon Inlet, NC, and Hatteras, NC) were removed as unresolved by the model. Calculated skill values are for the full overlapping period between the tide gauge record and the SWL hindcast. RMSE and R2 are calculated using standard definitions. Bias is calculated using the mean bias error (MBE) definition and denotes the average bias in the model.

Fig. 5
figure 5

Model skill comparison between corrected GTSM-ERA5 SWLs and tide gauge data. The left panel shows the data spatially by plotting RMSE at the location of the utilized tide gauge. The right panel shows the model skill data in table format

Overall, it is found that SWLs are well predicted by the corrected GTSM-ERA5 data. The calculated RMSE from the overall modeling framework is comparable with other studies/models using much more complex and finely resolved models. In addition to improvements of RMSE, overall bias in the model is reduced to below 2 cm for all gauges.

4.2 SWL component signals

Signal decomposition both provides the opportunity for process specific corrections, but also to examine model performance for different SWL components. Figure 6 shows a comparison of GTSM-ERA5 model outputs versus observations at a single tide gauge: 8665530 (Charleston, SC). This location was chosen for its average performance across the various SWL components as well as being located near the center of the study region. Figure 6 shows the effects of statistical correction by including model performance both for the raw (top row) and bias-corrected (bottom row) GTSM model output.

Fig. 6
figure 6

Observed versus model scatter plot for non-tidal residual (column 1), MMSL (column 2), seasonal cycle (column 3), and tide (column 4). Row 1 corresponds to the raw data from GTSM-ERA5, while the 2nd row corresponds to the model results after statistical correction. Data are plotted for NOAA tide gauge 8665530 (Charleston, SC). The plotted scatter color is based on the density of points in the region with dark colors being less dense and light colors being denser

In Fig. 6, the most noticeable changes are for the seasonal and tide corrections with large improvements to RMSE and model skill. For this specific tide gauge, NTR corrections did not improve overall/average NTR. Extreme NTRs were found to be improved which can be seen by the shifting of higher NTR data points. MMSL scatter shape is not changed, due to the nature of the correction, but the bias is removed via a vertical shifting.

To further understand inter-station variability in model performance, an analysis was performed to determine each MWL component’s contribution to MWL error. For this analysis, the RMSE (which is the same as the standard deviation of the error) is converted to variance. This is because variances are additive (assuming independence), allowing us to sum the errors of individual components. Therefore, theoretically the variance of each MWL component should add up to the variance of MWL. This is mostly true with a small amount of “unexplained” variance (marked in Fig. 7 in gray) that is due to the components not being truly independent.

Fig. 7
figure 7

Error variance between the observed and model SWL at each tide gauge location and separated by process signal. Unexplained variance is the difference between the total error variance and the sum of the individual components. Top brackets show the state where each tide gauge is located. The final “Average Error” section shows the average error variance across all tide gauges as well as an estimate of the error variance for wave setup (not calculated at each tide gauge station) from Stockdon et al. (2006)

From Fig. 7, poorly performing locations are traced to the source SWL component. Tides are the main source of poor SWL predictive skill, with error an order of magnitude greater than the other components at stations 8651370 (Duck, NC), 8662245 (Oyster Landing, SC), 8670870 (Fort Pulaski, GA), and 8720030 (Fernandina Beach, FL). Therefore, improvements to the tide model at poorly performing locations are the most viable way to improve overall model performance. Seasonality is a non-relevant contributor to error, while NTR and MMSL are approximately uniform across station locations.

A general estimate of wave setup error is plotted as the far-right bar in Fig. 7 for comparison to other SWL signals. This estimate was based on the regression RMSE from Stockdon et al. (2006). Station explicit setup error estimates are not included because no data exist for wave setup at tide gauges. Furthermore, tide gauges are generally considered to not contain wave setup signals. This is not because wave setup is absent in locations with tide gauges, but rather that the mechanics and signal processing of tide gauges is designed to filter out wave setup signals (Sweet et al. 2015). Therefore, wave setup is considered absent in both the observations and modeled datasets, so not included in the statistical correction. With no observed data for comparison, as well as reduced physics (parameterization vs. a physical model), wave setup can be considered to have a relatively high level of uncertainty. This is reflected in the large, plotted error variance in Fig. 7, which is more than double the SWL error at most tide gauge stations.

4.2.1 Tides

Tides were found to be the largest source of error in uncorrected GTSM-ERA5. This is in part because tides are the dominant source of SWL variability in the region. Therefore, small errors in phase or amplitude of tidal constituents can control the overall error in a SWL comparison. Also, GTSM-ERA5 is a purely hydrodynamic model with tides generated internally. While global-scale hydrodynamic models have made significant recent progress (Apecechea et al. 2017), they remain less skillful than data assimilation-based products (Stammer et al. 2014). Table 1 confirms this via comparison to tide gauges and strengthens the decision to use the ensemble tide method.

For the ensemble, the resulting tidal data are better than any of the contributing tidal models, although only by a small margin (see Table 1). This said, the tidal ensemble has other intrinsic benefits such as uncertainty estimation and an efficient pathway to include additional tidal information as it becomes available. Additionally, it is likely that improvements to the tidal ensemble could be made by including more tide models, better tuning of the weighting function (Eq. 2b) or incorporating corrections.

As noted above (Sect. 4.2 and Fig. 7), tides dominate the model error at some of the tide gauges (gauges 8651370, 8662245, 8720030, 8670870: Duck, NC, Oyster Landing, SC, Fernandina Beach, FL, Fort Pulaski, GA). In general, at these locations none of tide models successfully reproduce the tidal signal. Tides are handled fundamentally different in this study than the other components of SWL in that the methodology uses ensemble methods to produce a better tide estimate, rather than a statistical correction. Because observations are not used for corrections (only in the weighting), if all models in the ensemble agree about an incorrect tidal signal, then the ensemble calculated tides will similarly be incorrect. Tide gauges 8662245 (Oyster Landing, SC) and 8720030 (Fernandina Beach, FL) are both located within complicated estuary systems where local complexities of tidal wave propagation are likely not being fully resolved by the utilized coarse-scale global tide models. This resolution issue is likely less of an issue at the open coast locations used for analysis in this study.

4.2.2 Seasonality

The seasonality correction was found to greatly improve the seasonal signal. Overall RMSE (across all gauges) was reduced from approximately 5 to 1.5 cm through the correction. From the average perspective, the correction is found to work well, but with an overcorrection at the farthest north gauges (8632200, 8638610, and 8651370; Kiptopeke, VA, Sewells Point, VA, Duck, NC). Even for these “overcorrection” sites, corrected seasonality was found to make up the smallest fraction of the error budget (see Fig. 7). Based on this analysis, a more complicated correction was decided to be unwarranted.

4.2.3 Monthly mean sea level

Corrected MMSL values reasonably match observed values. Figure 6 shows that MMSL skill (as measured by R2) is less than for other SWL components. This is because MMSL is subject to less stringent corrections than other components. GTSM-ERA5 MMSL was found to be sufficiently accurate that more complex statistical correction was not warranted. This said, Fig. 7 shows that MMSL can become a proportionally larger percentage of the overall error if overall error is small. For example, at station 8723214 (Virginia Key, FL) MMSL represents approximately half of the overall error. As this study is most interested in overall error in predicted SWL, this is not problematic as the overall error remains low. GTSM-ERA5’s ability to reproduce MMSL was unexpected based off previous research showing the importance of the Florida Current (Ezer et al. 2013), which should be unresolved in a two-dimensional (2D) hydrodynamic model implementation.

4.2.4 Non-tidal residual

For the study region, NTR is the second largest contributor to SWL, and often the largest contributor during extreme events. Therefore, a proper handling of NTR in the modeling framework is requisite for properly resolving flooding and coastal hazards. After correction, GTSM-ERA5 NTR data match observations (average RMSE of 7 cm). NTR performance varies location to location (around 5 cm), likely due to the individual bathymetry characteristics of the tide gauge locations. For example, the worst performing comparison (station 8662245, Oyster, Landing, SC) is located in a complex shallow estuary, far landward up a small braided tidal channel. The utilized resolution for producing the GTSM-ERA5 dataset is unlikely to properly resolve NTR propagation up an estuary of this type.

5 Extreme values and relevant drivers

This study represents a large temporal and spatial hindcast of MWLs that are relevant for a variety of users and applications. Significant efforts have been made to apply robust corrections and improve overall model skill through iterative validation. Results presented in this section will focus on MWL extremes as the primary driver of coastal flooding. Extreme events are defined in this section using the point-over-threshold (POT) method with de-clustering of events to prevent the double counting of individual storm events (Wahl et al. 2017). A unique POT threshold value is selected for each GTSM-ERA5 station. The utilized threshold is calculated using an iterative procedure that sets the threshold using the constraint that a specified number of extreme events are selected out of the full record. Extreme values are calculated from the MWL time series consisting of both the GTSM-ERA5 derived SWL and the wave setup components. The following sections also discuss empirical recurrence interval events, which are calculated by ranking the POT events and inverting the empirical CDF. Using this definition, the annual event is the 38th largest POT event out of the 38-year record and the maximum observed event has a 38-year recurrence interval.

5.1 Contributors to extreme MWLs

Figure 8 shows GTSM-ERA5 output for all stations and all water-level components along the study region coastline. The data are presented in two ways to demonstrate how contributions vary spatially (panel b) and how contributions are a function of event rank (panel c).

Fig. 8
figure 8

Panel (a) shows GTSM-ERA5 stations (in black) with highlighted “multi-POT” node locations marked in red and numbered to correspond with panel (c). Panel (b) shows the average contribution to MWLs for the top 38 events (equal to the number of simulated years) at each station. Panel (c) shows the contribution to extreme MWLs for the max event (top bar) and binned events ranging from 5th largest to 60th largest. Binned values are calculated by averaging the 10 events bracketing the event rank of interest (e.g., the plotted bar for rank “15” is the average of all events from rank 10–20)

Extreme events are both compound (composed of multiple signals) and variable along the coastline (Fig. 8). Tides, NTR, and wave setup are the dominant contributors to extremes with the balance between these signals varying as a function of latitude. The average contributions for the top 38 events (approximately the 1-year event to the 38-year event: panel b) show that the tidal component displays the most spatial variability. Tides are most important at the center of the study site, around Georgia and South Carolina, with the relative magnitude tapering to the north and south. Tidal range in general is larger in this region with the coastline sweeping landward away from the nearest amphidromic point. Wave setup has the opposite spatial pattern as tides, reaching a minimum at the center of the study region and increasing moving north and south. NTR is more spatially uniform, although a general trend is present with NTR decreasing moving south. The sum of the components follows a spatial pattern similar to tides, reaching a max near 31 degrees latitude and decreasing moving north and south. A recent study of tide gauges by Li et al. (2022) provides additional agreement to these results, showing a decrease in NTR and increase in seasonal cycle contributions moving south.

Panel c of Fig. 8 shows how contributions to extreme MWLs change as a function of event rank. Rather than looking at individual events of a certain rank, plotted is a binned value calculated by averaging the 10 events bracketing the event rank of interest. Therefore, the plotted bar for rank “15” is the average of all events from rank 10 to 20. Binning is done to provide a more robust interpretation of MWL contributions that smooths over event-to-event variability. The plotted “max” event is an exception, which is not binned but still included to show the most extreme hindcasted event. In general, the NTR contribution increases moving from the 60-bin rank (bottom bar in each subplot) to the max event (top bar in each subplot). This is especially visible in subplot 2, a location near the border of North and South Carolina. At this location, NTR magnitude more than triples moving from the rank 60 bin to the max event. Tides are found to be more uniform in magnitude, which is expected due to the random nature of the co-occurrence of storms and tides. Tidal contributions may vary from event to event but would be expected to have a somewhat consistent contribution when averaged across multiple events. The wave setup contribution increases slightly moving from more frequent events to the most extreme.

Results also show that the location of maximum event magnitude along the coast is dependent on the recurrence interval of the event considered. The average of the top 38 events, Subpanel b, has a spatial maximum at around 31 degrees latitude. The maximum event, on the other hand (shown in subplot c as the top bar), is much larger at 34 degrees latitude. This suggests that different locations are more susceptible to nuisance-level flooding, while others may be more susceptible to higher recurrence interval (rare) events. That said, the large difference between the Rank 1 and Rank 2 events (not shown) in the region around 34 degrees latitude suggests that the hindcasted extreme event may be uncharacteristically extreme. A larger sample of storms would be necessary to properly constrain the magnitude of the 38-year recurrence interval storm.

5.2 Dominant forcing mechanism for extreme MWLs

Of interest to coastal hazard analysis is the specific process that is the “dominant” contributor to extremes. This allows for a conceptual understanding of what processes are important and the types of events that need to be considered. For example, coastal risk mitigation strategies will likely be different if flooding is controlled by tides or hurricanes. This concept is further explored for the study region in Fig. 9.

Fig. 9
figure 9

Dominant MWL component for the top 38 ranked events at each GTSM-ERA5 station. The first panel plots the GTSM-ERA5 output stations which have their dominant components plotted in the second panel. For the second panel, each row represents a GTSM-ERA5 output station, and each column represents an extreme event, with event rank 1 being the highest over the 38-year record. The color indicates what MWL component was dominant for the specific ranked event

Figure 9 plots the dominant MWL component for each individual event ranging from the 38th largest event (annual event: left side of the plot) to the largest event (38-year event: right side of the plot). Event rank is determined at each station, so columns do not necessarily represent a temporally consistent event. This plot differs from Fig. 8 in that MWL contributions are not averaged/binned, so the individual variability between events can be seen. This shows that many stations have a mixture of dominant event contributors, often sharing between tides, NTR, and wave setup. Therefore, no location has a “standard” MWL component mixture that could simplify the modeling process. This said, in general, setup is found to be more dominant in the south, tides in the central region, and NTR in the north.

Both Figs. 8 and 9 bring to focus the importance of NTR for the most extreme events. Tides, MMSL, seasonality, and wave setup (to a lesser degree) are bounded, while NTR has the potential to become very large. Large NTR signals are shown by the top bars in Fig. 8, panel c (max observed event), as well as NTR being more often dominant moving toward more extreme events in Fig. 9. Large NTR signals are primarily the result of hurricanes which can drive surge on the scale of meters (Irish et al. 2008; Woodworth et al. 2019). For this reason, many studies only consider hurricanes in hazard assessments as these events are considered the primary cause of extremes (Marsooli et al. 2018). Figure 9 shows this assumption is not strictly valid since some locations are not NTR dominated (e.g., southern Florida, where a narrow shelf limits storm surge), even for the max observed event. Moving toward less extreme events, this assumption become increasingly weaker, with the annual event (rank 38) being almost entirely dominated by signals other than NTR.

While hurricanes are in general characterized by large NTR signals, it is possible that hurricanes are still driving extremes even if NTR is not the dominant signal. Figure 10 explores this concept further by plotting the percentage of the top 10 extreme events produced by hurricanes along the study region. This was determined by selecting the top ten ranked events at each location and then cross referencing their occurrence date with a database of hurricane arrivals along the coast. The extreme MWL event was considered to be caused by a hurricane if the MWL event was within 2 days and 500 km of hurricane land arrival.

Fig. 10
figure 10

Percentage of top 10 extreme events that can be traced to local hurricane forcing

Figure 10 shows that for much of the region, hurricanes are major drivers for extreme events but not the only driver. For many locations, only half or less of the top 10 events are related to a hurricane. Therefore, only modeling hurricanes would miss a significant number of extreme events and bias the resulting extreme value analysis. The Mid-Northern part of the study region is more effected by hurricanes with up to 100% of the top 10 events being hurricane driven. The large number of hurricanes is contrasted by parts of the middle of the region (near Georgia) where closer to 3 of the top 10 extreme events are related to hurricanes. Notably, this does not signify that these areas were not impacted by hurricanes, just that the result was not necessarily a top 10 extreme event. The presence of non-hurricane top 10 events emphasizes the importance of the co-occurrence of various signals (e.g., high tides, large waves, etc.) to produce an extreme event. Furthermore, this shows that using only hurricane events to constrain the recurrence interval of extremes is likely missing many of the most extreme events, even up to the 38-year event. The record length of this analysis (38 years) is a limitation to the robustness of this conclusion, which is explored further in the discussion (Sect. 6.3).

6 Discussion

6.1 Model performance for SWL components

Overall, the modeling framework detailed in this study is found to perform quite well for predicting SWLs, exhibiting a similar skill to significantly more finely resolved or physically complex models. For example, Olabarrieta et al. (2011) produced RMSEs of the same order using the fully wave coupled three-dimensional (3D) COAWST model implemented with 75 m grid resolution. Also, this study’s model-observation comparisons are done against tide gauge data, which are most often located within ports or estuaries. Therefore, the observation points used for comparison are not a proper sample of the full study region and likely provide a non-representative interpretation of GTSM-ERA5’s skill as developed in this study. It is expected that GTSM-ERA5, which serves as the basis for MWL calculations, would perform better at locations with less complex and fine-scale bathymetry. GTSM-ERA5 output stations used for analysis and the hindcast product are all at open coast locations, away from the complex geometries and hydrodynamics of inlets. Therefore, the presented model skill analysis is likely a conservative estimate of the overall model skill.

As a potential source of error, Delft3D FM does not provide output at specified locations exactly (either through interpolation or direct calculation) but rather snaps output to the nearest mesh node location. Therefore, GTSM-ERA5’s produced output can be from a location spatially offset from the actual tide gauge. Based on mesh resolution, this shift is a maximum of around 1 km. As, in general, surge and tide signals are slowly varying, it is expected that the effect of this potential mismatch in locations is minimal, and that accurate representation of shoreline boundaries and depth variations contributes more to the differing results. However, this spatial shifting could have more of an effect in complex bathymetric regions.

An additional consideration in model performance is that NTR was found to have a large amount of high-frequency variability. Both the observed and modeled NTR time series are found to be “noisy” with a large amount of energy with a frequency > 0.007 Hz (1/6 h). Analysis showed that there is no statistically significant correlation between GTSM-ERA5 and observed NTR for this high-frequency NTR. It is possible that this high-frequency energy represents locally controlled NTR (either by fine-scale bathymetry or local forcing) which is below GTSM-ERA5’s resolution. This noise could also be an artifact of how NTR is being calculated. Interaction between surge and tide can create lags between the tidal and full SWL time series. The subsequent subtracting of the two time series can then produce spurious NTR signals (Horsburgh et al. 2007). As a sensitivity study, the NTR statistical correction framework was run with the high-frequency NTR signal removed (taking the assumption that high-frequency energy is white noise). This procedure was found to produce a lower NTR RMSE, but it was decided to not take this approach because real NTR signals can exist within this high-frequency band. For example, it was found that the removal of high-frequency energy was smoothing out/removing NTR pulses from hurricanes, which can happen at small timescales. Secondly, it was decided to not remove the high-frequency noise as a potentially misleading assessment of true RMSE. With a fair bit of the model error residing in high-frequency noise, assessment of model skill with this signal removed may represent a false metric of actual model performance.

6.2 Model performance for extreme SWLs

With this study’s focus on extreme events, it is worth considering model performance specifically for extreme events. An analysis based on 50 extreme events for each location shows that overall (across all tide gauges) RMSE for predicted SWL increases from 13 to 18 cm. The bulk of this increased error is due to NTR. NTR RMSE approximately doubles from around 7–12 cm when looking at only extremes. In addition to NTR, MMSL RMSE increases by 2 cm. Both tides and seasonality show no change in model error. This is because storm arrival is approximately random in time and so no more likely to occur at a specific tide or seasonal phase. In other words, tide and seasonality error is approximately uniform in time so not affected by when extremes occur.

The decrease in model performance for extreme SWL is expected as, in general, extremes are more difficult to model. Current hydrodynamic models tend to underpredict extreme surge events for a variety of reasons ranging from wind fields (Akbar et al. 2017) (which are often the leading cause) to the utilized physics implementation (Weisberg and Zheng 2008) to model resolution. Additionally, this study’s choice of modeled processes is missing some physics that may be contributing to error, such as surge/wave interactions (Idier et al. 2019) and 3D effects. GTSM-ERA5 was produced using the 2D barotropic version of the Delft3D Flexible Mesh model. While baroclinic processes have been shown to be important at specific locations within the study region, implementing a global baroclinic model would be computationally prohibitive while only improving model skill on the order of 1–2 cm (Ye et al. 2020). Overall, the calculated model skill is found to be in alignment with the current state of the practice for surge modeling. As a comparison point, Marsooli et al. (2018) used coupled ADCIRC/SWAN to predict maximum hurricane surges within the same study region. Model validation for this alternative methodology showed a RMSE of ~ 20 cm on average. Veeramony et al. (2017) showed an even larger average RMSE of ~ 40 cm using Delft3D to simulate Hurricane Ike. Therefore, this study’s utilized modeling strategy is consistent, in terms of model skill, with the current state of the practice.

6.3 Contributing signals to extreme MWLs

This study emphasizes the importance of compound flooding, with all three of NTR, tides, and setup being important for properly modeling extreme MWL along the coastline. Seasonal and MMSL signals were found to be smaller contributors to extreme water levels. However, their magnitude and coincidence with the other components still make them important for hazard analyses, especially when considering less-extreme events. The compound composition of water levels in this region means that timing and co-occurrence is important for an extreme event. Even an extremely large NTR event is unlikely to produce an extreme event if it arrives on a low tide. This agrees with previous research that has emphasized a proper handling of the co-occurrence of forcing parameters for modeling extreme events (Wahl et al. 2011; Li et al 2022) rather than more simplified approaches such as considering the max surge added to a maximum tide signal.

This study further emphasizes other recent research highlighting the importance of wave setup as a contributor to extreme MWL (Melet et al. 2018; Ruggiero et al. 2013). At the national scale, setup has been found to contribute 10–82% of water levels during extreme events (Vitousek et al. 2017, Sweet et al. 2022). Setup was found to be more often the dominant component in the southern region of the study site. Setup variability is likely because the US Atlantic coastline is characterized by a broad continental shelf which limits the amount of wave energy that can penetrate to the coastline (Defne et al. 2009). The southern region of Florida has a much smaller continental shelf (see depth contours in Figs. 8 or 9), allowing more wave propagation and therefore a larger wave setup along the coast. Wave setup has significantly more uncertainty than other modeled MWL signals in this study. Setup data for specific events should be treated as an order of magnitude estimate rather than an exact calculation. Since there is no reason to suspect any systematic bias in wave setup parameterizations, it is expected that aggregated average values for setup contributions should be more robust.

This study takes the pragmatic approach of only discussing events within the hindcast period and does not extrapolate to larger return period events. This study’s hindcast period of 38 years only contains the observed number of hurricanes and storms, so is vulnerable to uncertainty from non-observed variability in storms and MWL components (Serafin and Ruggiero 2014; Lin and Emanuel, 2012). This study avoids extreme water-level value analysis beyond the modeled 38 years, which would amplify uncertainty. Additionally, readers should consider analysis based on the most extreme event of the record (e.g., Rank 1 in Fig. 8) as less certain than the averaged results containing multiple samples. An example of how this effect is the lack of hurricanes within the extreme record for the center of the region (Fig. 10). Hurricanes are less common in this area (Bossak et al. 2014), but the rare direct hurricane hit is still possible. This direct hit would produce a very different MWL composition (likely more NTR dominated). It is impossible to determine exactly how rare this type of event is or how it would affect extreme event composition without simulating a very large number of hurricane events (Lin and Emanuel 2012). This type of study is outside the scope of this paper; readers are cautioned that presented results are only for the hindcast period and not a general conclusion regarding extremes.

While this study is unable to consider larger recurrence interval events due to an under sampling of storms and an insufficient record length, its continuous time series approach is well suited for examining higher frequency extreme events (annual to 15-year recurrence period). Studies which only model hurricane events or specific storm events implicitly assume that those types of events are always responsible for flooding. This is likely true for high return period events but becomes increasingly suspect moving toward the annual event. This study shows that modeling just hurricanes would be insufficient for constraining even the 30-year recurrence period event. While not shown, the maximum event at stations across the region was found to be hurricane induced for only around half of the GTSM-ERA5 output stations during the 38-year time-period from 1979 through 2017. Therefore, an ideal modeling strategy to assess coastal flooding hazards within the region would require blending multiple hurricane/storm simulations (for higher return period events) with a non-hurricane-based compound flooding component (for lower return period events).

7 Conclusions

This study presents a new approach for modeling large-scale (both temporal and spatial) mean water levels (MWL)s along the coastline. This method pairs the recent Global Tide and Surge Reanalysis (GTSM-ERA5, Muis et al. 2020) with a process specific statistical correction to model non-tidal residual (NTR) and monthly mean sea levels (MMSLs). Tides are modeled using a novel ensemble technique which combines multiple global tide models to produce an optimal tidal hindcast for the study region. Open coast wave setup data are estimated using a setup parameterization. Based on comparisons to observations at all tide stations within the study region and outside of highly flow constricted water bodies, the resulting hindcast was found to perform well with an average root mean square error (RMSE) of approximately 13 cm. Tides were found to be the largest contributor to error, likely due to insufficient bathymetric resolution in back-barriers and harbors. It is expected that error is smaller at the open coast locations where output stations are placed for the developed MWL hindcast.

The modeling framework is utilized to produce a hindcast of MWLs along the Southeast US Atlantic coastline from 1979 to 2017. An analysis of extremes from this hindcast period found that the contributors to extremes are variable along the coastline. The average tidal contribution to extremes is largest at approximately 31 degrees latitude and then decreases from north to south. Average NTR contributions to extremes increase south to north, while average wave setup contributions are the opposite, increasing from north to south. The relative importance of each signal is a function of the magnitude of the event, with larger events in general being more NTR dominated. This said, most locations have a balanced mix of tides, NTR, and setup contributing to extreme events, emphasizing previous research that the co-occurrence of various processes is requisite for extremes. This result is further confirmed in the hindcast, such that not all extremes (even up the 38-year event) are produced by hurricanes. Instead, results show that non-hurricane-driven combinations of processes are important to consider for high- to mid-frequency events (annual to 38-year recurrence interval events).