1 Introduction

Storm surges induced by tropical cyclones (TCs) and extratropical cyclones (ETCs) are often the costliest threat to coastal communities along the northeast coast of the United States. During the cool season, ETCs, including nor’easters (cyclonic storms that generate strong northeasterly winds along the East Coast of North America), generate most of the large storm surges (Colle et al. 2010; Catalano and Broccoli 2018). For instance, 88 of the 100 largest storm surge events at the Battery NY (located at the southern tip of Manhattan in New York City) were caused by ETCs (Catalano and Broccoli 2018). Notable cool-season storm surges include the Great Appalachian Storm of November 1950, which produced a peak surge of about 2.4 m at the Battery NY, and the December 1992 event, whose 2.5-m storm tide (including astronomical tide) flooded the New York City subway and shut down the transportation systems for several days (Colle et al. 2010). In addition, the blizzard of February 25–27, 2010, produced a peak surge of about 1.9 m in Boston, MA.

Hazards associated with storm surges are highly correlated with storm frequency and characteristics. Prior studies have shown that a warming climate may reduce the overall number of Northern Hemisphere ETCs (e.g., König et al. 1993; Carnell et al. 1996; Geng and Sugi 2003; Ulbrich et al. 2009; Chang et al. 2012; Eichler et al. 2013). Studies focused on the western North Atlantic Ocean have also shown that ETC track density will decrease in the twenty-first century (e.g., Teng et al. 2008; Long et al. 2009; Colle et al. 2013; Chang 2013).

However, projected ETC changes vary regionally and are uncertain across global climate models. Colle et al. (2013) and Michaelis et al. (2017) investigated future ETC changes at regional scale for a subset of global models from the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor et al. 2012) under the representative concentration pathway (RCP) 8.5 scenario and found an increase in ETC track density immediately off the U.S. East Coast in the early to late twenty-first century. However, in contrast to Colle et al. (2013), Michaelis et al. (2017) showed that this increase is not statistically significant. This is consistent with Zhang and Colle (2018), who showed that the ETC increase along the U.S. East Coast is sensitive to the model(s) used in the analysis, since some models produce weaker or stronger storms depending on differences in the baroclinicity and the amount of latent heating predicted or resolved by the model. For example, for GFDL-ESM2M and CCSM4, two of the CMIP5 models used in this study, Zhang and Colle (2018) showed that GFDL-ESM2M has a relatively large future decrease in the low-level temperature gradient and thus little future change in ETCs, even when downscaled to ~20-km grid spacing. In contrast, CCSM4 produced stronger and more frequent future storms, primarily because of more latent heating, especially when downscaled to high resolution.

Changes in the number of relatively deep cyclones have also been analyzed using the CMIP5 models. Using 15 CMIP5 models, Colle et al. (2013) found that the numbers of relatively deep (minimum sea-level pressure < 980 hPa) and relatively weak (1000–1010 hPa) storms over the western Atlantic region decrease by 10% and 12%, respectively. Michaelis et al. (2017) showed that strong ETCs (minimum sea-level pressure perturbation of at least −51 hPa) within the North Atlantic storm-track region could occur less often under future climate conditions. However, they also found a slight increase in maximum and average 10-m wind speeds off the coasts of the Northeastern United States. Seiler et al. (2018) found a decrease in rapidly developing ETCs along the East Coast of North America in two climate models. In contrast, Marciano et al. (2015) used the pseudo-global warming (PGW) approach to simulate 10 relatively strong (< 995 hPa) extratropical cyclones over the U.S. East Coast for the current and future climates at grid spacings down to 4 km. This approach adds the average temperature perturbation from the CMIP5 models to the analysis boundary conditions for these storms to create future-climate simulations. They found that increased latent heat release in the future climate resulted in an increase in cyclone intensity.
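As a rough illustration of the pseudo-global warming idea described above, the sketch below adds a climate-model warming signal (future-period mean minus historical-period mean) to a present-day analysis temperature field. It is a minimal sketch under stated assumptions: the function name `pgw_perturb` and the (time, lat, lon) array layout are hypothetical, the climate-model fields are assumed to be already regridded to the analysis grid, and Marciano et al. (2015) may have computed and applied their perturbations differently (for example by month, vertical level, or across a multimodel ensemble).

```python
import numpy as np


def pgw_perturb(analysis_temp, gcm_hist_temp, gcm_future_temp):
    """Add a climate-model warming signal to a present-day analysis field.

    analysis_temp: (lat, lon) or (time, lat, lon) temperature from an analysis.
    gcm_hist_temp, gcm_future_temp: (time, lat, lon) climate-model temperature
    for the historical and future periods, assumed already on the analysis grid.
    """
    # Mean change between the two periods at each grid point, shape (lat, lon).
    delta = gcm_future_temp.mean(axis=0) - gcm_hist_temp.mean(axis=0)
    # Broadcasting applies the same (lat, lon) perturbation at every analysis time.
    return analysis_temp + delta
```

Because only the mean change is added, the storm's present-day synoptic evolution in the boundary conditions is retained while the background thermodynamic state is shifted.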

These effects of climate change on ETC track, frequency, or intensity could influence future storm surge hazards along the northeast coast of the United States. Changes in future storm surges can be projected by simulating the surge heights induced by projected ETCs under future climate conditions. For example, Roberts et al. (2017) used a statistical model to study the impact of simulated twenty-first-century changes in ETCs (under the RCP8.5 emission scenario) on coastal flooding at the Battery NY. Their multilinear regression model was trained on reanalysis data for 1979–2012 to relate storm surge height to surface wind stress and mean sea level pressure. Their results showed minor changes (on the order of 0.01 m) in the median surge height between a historical period (1979–2004) and a twenty-first-century period (2054–2079). However, statistical models of storm surge do not capture the physical processes that govern surge dynamics. Using a hydrodynamic model, Orton et al. (2016) quantified present-day ETC storm surge hazards at the Battery NY; however, their study did not address the impact of climate change. Hydrodynamic modeling has also been applied to assess future ETC storm surge hazards at larger scales (e.g., for Europe by Vousdoukas et al. 2016). However, to our knowledge, no study has yet focused on the impact of climate change on ETC storm surges along the northeast coast of the United States.
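To make the statistical alternative concrete, the following sketch fits a multiple linear regression of surge height on a few meteorological predictors by ordinary least squares. It is not the model of Roberts et al. (2017): the predictor set (here, hypothetically, two wind-stress components and a sea-level-pressure anomaly at a single point) and the function names `fit_surge_regression` and `predict_surge` are illustrative assumptions.

```python
import numpy as np


def fit_surge_regression(predictors, surge):
    """Least-squares fit of surge = b0 + sum_k b_k * predictor_k.

    predictors: array (n_samples, n_predictors), e.g. hypothetical columns for
    eastward wind stress, northward wind stress, and a pressure anomaly.
    surge: array (n_samples,) of surge heights.
    Returns coefficients [b0, b1, ..., bk].
    """
    X = np.column_stack([np.ones(len(surge)), np.asarray(predictors, float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(surge, float), rcond=None)
    return coef


def predict_surge(coef, predictors):
    """Apply fitted coefficients to new predictor values."""
    X = np.column_stack([np.ones(len(predictors)), np.asarray(predictors, float)])
    return X @ coef
```

Once trained on reanalysis predictors, such a model can be driven with climate-model predictors, but it only relates forcing to surge statistically and does not represent the hydrodynamics.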

Using a hydrodynamic model, climate projections from seven climate models, and observations, the present study investigates changes in ETC storm surges between a historical period (1979–2004) and a future period (2054–2079, RCP8.5 emission scenario) at a number of sites along the northeast coast of the United States. The selected sites are located at tide gauge stations of the National Oceanic and Atmospheric Administration (NOAA), where relatively long records of observed water levels are available. Some of the selected sites are in major coastal cities in the New England and Mid-Atlantic regions, including Boston, New York City, Baltimore, and Washington D.C. For each site, observed water levels are analyzed to estimate ETC storm surges during the historical period, and hydrodynamic modeling is used to simulate ETC storm surges in the future climate. Statistical methods are then applied to estimate storm surge return levels, and the changes in these return levels from the historical to the future period characterize the impact of climate change on ETC storm surges.

The outline of the paper is as follows. Section 2 describes the data and methods, which are evaluated against historical observations in section 3. Section 4 presents the projected changes in future ETC storm surge return levels, and section 5 discusses the results and presents conclusions.

2 Data and methods

Storm surges are simulated using the advanced circulation model (ADCIRC), a finite-element computational tool originally developed by Luettich et al. (1992) and Westerink et al. (1994) to compute flow and transport in riverine, estuarine, coastal, and oceanic systems. We run the depth-averaged hydrodynamic module of ADCIRC on the basin-scale computational mesh generated by Marsooli and Lin (2018). This computational domain covers the western North Atlantic Ocean between latitudes 8° N and 46° N and longitudes 98° W and 60° W. The mesh resolution ranges from 1 km nearshore to 100 km in the deep ocean. Marsooli and Lin (2018) evaluated the model for historical TCs and found satisfactory agreement between modeled and observed storm surges along the U.S. East and Gulf Coasts, with an overall root-mean-square error (RMSE), bias, and Willmott skill (Willmott 1981) of 0.33 m, 0.01 m, and 0.88, respectively. In the present study, the model is applied to ETC storm surges along the U.S. East Coast.

To study the impact of climate change on ETC storm surges, we drive the hydrodynamic model with meteorological forcing (wind and pressure fields) projected by the seven global climate models listed in Table 1: CCSM4, CNRM, GFDL-ESM2M, IPSL-CM5A-MR, MRI-CGCM3, NorESM1-M, and MIROC5. These models were chosen to provide a range of future cyclone solutions, as outlined in Colle et al. (2013). We also sought to use as many of the models used by Roberts et al. (2017) as possible. Given the availability of CMIP5 6-hourly data, we were able to use four of their seven models; the other three were replaced with MIROC5, MRI-CGCM3, and IPSL-CM5A-MR, for which these data were available at the time of this study. We use the climate-model projections for the mid-to-late twenty-first century (2054–2079) under the RCP8.5 emission scenario. For comparison, we also use the climate-model simulations for the historical period of 1979–2004 (hereinafter the “control” scenario). The wind and pressure forcing from the climate models is applied on a 0.5° × 0.5° grid at 6-h intervals (interpolated from the coarser climate-model resolutions); this resolution is relatively low and may induce low biases in the hydrodynamic modeling of storm surges. Thus, we first evaluate and quantify the accuracy of the hydrodynamic modeling by simulating historical ETC surges using historical wind and pressure fields at the same resolution (i.e., 0.5° × 0.5° and 6 h) from the Climate Forecast System Reanalysis (CFSR).
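The interpolation of coarse climate-model forcing onto the 0.5° × 0.5° grid could be sketched as follows. The interpolation method is not stated in the text, so bilinear interpolation with SciPy's `RegularGridInterpolator` is an assumption, as are the function name `regrid_to_half_degree` and the 2° example grid; each 6-hourly snapshot (pressure, and each wind component separately) would be regridded in turn.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def regrid_to_half_degree(lat, lon, field, lat_out, lon_out):
    """Bilinearly interpolate one 2-D field (lat, lon) onto a new lat/lon grid.

    lat, lon: 1-D ascending coordinates of the coarse climate-model grid.
    field: 2-D array of shape (len(lat), len(lon)), e.g. one 6-hourly
    sea-level-pressure or wind-component snapshot.
    """
    interp = RegularGridInterpolator(
        (lat, lon), field, method="linear", bounds_error=False, fill_value=None
    )  # fill_value=None extrapolates linearly outside the coarse grid
    lat2d, lon2d = np.meshgrid(lat_out, lon_out, indexing="ij")
    points = np.column_stack([lat2d.ravel(), lon2d.ravel()])
    return interp(points).reshape(lat2d.shape)


# Hypothetical grids over the ADCIRC domain (8 to 46 deg N, 98 to 60 deg W).
lat_c = np.arange(8.0, 47.0, 2.0)           # coarse 2-degree latitudes
lon_c = np.arange(-98.0, -59.0, 2.0)        # coarse 2-degree longitudes
lat_f = np.arange(8.0, 46.0 + 1e-9, 0.5)    # target 0.5-degree latitudes
lon_f = np.arange(-98.0, -60.0 + 1e-9, 0.5) # target 0.5-degree longitudes
```

Interpolation of this kind adds grid points but not resolved structure, which is consistent with the low bias expected from coarse forcing.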

Table 1 CMIP5 models evaluated and some of their attributes

We simulate storm surges continuously during the ETC season (November 1–March 31). Forced by continuous time series of surface wind and mean-sea-level pressure from CFSR and the climate models, ADCIRC produces continuous time series of surge heights for each ETC season without distinguishing individual ETC events. Observations from NOAA tide gauges likewise provide continuous time series of storm surge heights. Probabilistic storm surge hazard assessment, however, requires event-based information, namely the peak surge height generated by each individual storm. Here, we apply a minimum surge height of 0.5 m and a maximum time interval of 3 days to identify ETC events in the continuous (hourly) storm surge time series and then determine the peak surge height of each event. For the CFSR dataset, depending on location, the minimum surge height corresponds to between the 89th (Kings Point NY and Delaware City DE) and the 99th (Woods Hole MA) percentile of hourly storm surge observations. These thresholds allow us to identify the major ETC events while excluding smaller events (extreme events being the main focus of risk assessment studies). In selecting events, we assume that none of the significant surges is generated by TCs, because the storm surge time series cover only the ETC season. We note that some historical extreme surge events recorded at the tide gauge stations were caused mainly by river flooding due to heavy rainfall or rapid snowmelt rather than by wind forcing (e.g., flooding on the Potomac River, Washington D.C., in January 1996). These events are excluded from the model evaluation and return level analysis.
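The event-identification rule can be sketched as below under one plausible reading of it: hourly values at or above the 0.5-m threshold belong to the same event unless consecutive exceedances are more than 72 h (3 days) apart, and the maximum within each group is taken as that event's peak surge. The function name `identify_events` and this grouping interpretation are assumptions, not the authors' code.

```python
import numpy as np


def identify_events(surge, threshold=0.5, max_gap_hours=72):
    """Split an hourly surge series into events and return (hour index, peak) pairs.

    Hours with surge >= threshold are exceedances; an exceedance more than
    max_gap_hours after the previous one starts a new event.
    """
    surge = np.asarray(surge, dtype=float)
    above = np.flatnonzero(surge >= threshold)  # NaNs compare False and are skipped
    if above.size == 0:
        return []
    # Positions in `above` where the gap to the previous exceedance is too long.
    new_event = np.flatnonzero(np.diff(above) > max_gap_hours) + 1
    events = []
    for hours in np.split(above, new_event):
        peak_hour = hours[np.argmax(surge[hours])]
        events.append((int(peak_hour), float(surge[peak_hour])))
    return events


# Hypothetical series: two exceedances 10 h apart (one event), then a third 180 h later.
example = np.zeros(300)
example[10], example[20], example[200] = 0.6, 0.9, 0.7
events = identify_events(example)  # [(20, 0.9), (200, 0.7)]
```

The same function would be applied to observed, CFSR-driven, and climate-model-driven series so that all peak samples are defined consistently.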

The storm surge observations (estimated by subtracting predicted tides from observed water levels) for the historical period (1979–2013) are obtained from NOAA’s Center for Operational Oceanographic Products and Services. Figure 1 shows the locations of the selected NOAA tide gauge stations. To quantify the accuracy of the CFSR-based storm surge modeling during the historical period, we compare the model results with observations at each site using the root-mean-square error (RMSE), bias, and Willmott skill (Willmott 1981). Perfect agreement between modeled and observed surge heights yields an RMSE and bias of zero. The Willmott skill scores model accuracy between zero (lowest accuracy) and one (highest accuracy). The bias of the CFSR simulations relative to the observations is mainly attributable to the limited resolution of the meteorological input and the limited accuracy of the hydrodynamic model. Zhang and Colle (2018) showed that the CMIP5 models tend to underpredict ETC intensity because of their limited resolution. We therefore assume that similar biases exist in the CMIP5-driven simulations, which apply the same hydrodynamic model and the same meteorological-forcing resolution, and we use the bias calculated from the CFSR simulations (hereafter “hydrodynamic-modeling bias”) to correct all climate-model-simulated surge heights at each site.
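The three skill metrics and the site-specific bias correction described above can be written compactly. This is a sketch assuming that bias is the mean of modeled minus observed peak surges (so a negative value means underestimation) and using the Willmott (1981) index of agreement d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where P and O are modeled and observed values and Ō is the observed mean; the function names are hypothetical.

```python
import numpy as np


def bias(model, obs):
    """Mean error (model minus observation); negative means underestimation."""
    return float(np.mean(np.asarray(model, float) - np.asarray(obs, float)))


def rmse(model, obs):
    """Root-mean-square error."""
    m, o = np.asarray(model, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((m - o) ** 2)))


def willmott_skill(model, obs):
    """Willmott (1981) index of agreement: 1 is perfect, 0 is no agreement."""
    m, o = np.asarray(model, float), np.asarray(obs, float)
    o_bar = o.mean()
    potential_error = np.sum((np.abs(m - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - np.sum((m - o) ** 2) / potential_error)


def remove_bias(model, site_bias):
    """Subtract a site's CFSR-derived bias from any modeled surges at that site."""
    return np.asarray(model, float) - site_bias
```

Because the bias is negative, subtracting it raises the modeled surges; for example, with a bias of −0.27 m, a modeled peak of 1.00 m becomes 1.27 m.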

Fig. 1

Selected sites in the Northeastern United States. The name assigned to each site represents the name of NOAA’s tide gauge station located at that site

The impact of climate change on ETC storm surge hazards is investigated by comparing storm surge return level estimates for a historical period (1979–2004) and a future period (2054–2079). Return levels for the historical period are estimated from the observed surge heights during 1979–2004, and those for the future period from the CMIP5-modeled surge heights. The two components of return-level estimation are the storm frequency (defined based on the surge threshold) and the storm surge cumulative distribution function (CDF). Because extreme surges lie in the sparsely sampled, long upper tail of the distribution, we follow Lin et al. (2010) and model the tail of the storm surge CDF with the peaks-over-threshold (POT) method, using a generalized Pareto distribution (GPD) fitted by maximum likelihood estimation (Coles 2001). The rest of the distribution is modeled using nonparametric density estimation. The threshold separating the tail from the rest of the distribution is selected by trial and error to give the best fit of the CDF to the empirical points.
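A minimal sketch of this mixed distribution and of the resulting return levels is given below. The body uses SciPy's Gaussian kernel density estimate and the tail a generalized Pareto distribution fitted by maximum likelihood to exceedances over the threshold u, joined so that the CDF is continuous at u. Combining the CDF F with the annual storm frequency λ through T = 1/[λ(1 − F(x))] is an assumption about how the two components are combined; the function names and the default root-search bracket are also illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq


def fit_mixed_cdf(peaks, threshold):
    """Return a CDF: Gaussian KDE below `threshold`, GPD tail above it."""
    peaks = np.asarray(peaks, dtype=float)
    kde = stats.gaussian_kde(peaks)                          # nonparametric body
    f_u = kde.integrate_box_1d(-np.inf, threshold)           # F(u) from the body
    excess = peaks[peaks > threshold] - threshold            # POT exceedances
    shape, _, scale = stats.genpareto.fit(excess, floc=0.0)  # MLE with location fixed at 0

    def cdf(x):
        if x <= threshold:
            return kde.integrate_box_1d(-np.inf, x)
        tail = stats.genpareto.cdf(x - threshold, shape, loc=0.0, scale=scale)
        return f_u + (1.0 - f_u) * tail                      # continuous at u

    return cdf


def return_level(cdf, annual_rate, period_years, lo=0.0, hi=20.0):
    """Surge height x with T = 1 / (annual_rate * (1 - F(x))) equal to period_years."""
    target = 1.0 - 1.0 / (annual_rate * period_years)
    return brentq(lambda x: cdf(x) - target, lo, hi)
```

Here "annual" refers to the November–March ETC season. A Poisson-based variant, T = 1/[1 − exp(−λ(1 − F(x)))], gives nearly identical levels at long return periods.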

The estimated storm surge return levels may also be biased because of biases in the wind and pressure fields projected by the climate models themselves (hereafter “climate-modeling bias”). Here, we use the bias-correction approach of Lin et al. (2016) to estimate and remove these biases from the storm surge return levels. In this approach, the storm frequency and the storm surge CDF are bias-corrected separately by comparing modeled and observed results. The future storm frequency is bias-corrected by multiplying it by a correction factor equal to the ratio of the observed frequency for the historical period (1979–2004) to the frequency estimated by the climate model for the same period (the control projection). The storm surge CDF is bias-corrected with quantile-quantile mapping so that the modeled CDF for the historical period best matches the observed CDF. The biases calculated for the historical period are assumed to remain unchanged in the future and are therefore used to bias-correct the storm surge CDFs for the future period (2054–2079).
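The two climate-modeling bias corrections can be sketched as follows. The frequency factor follows directly from the text; the CDF correction is shown as a simple empirical quantile mapping (each future modeled surge is assigned its non-exceedance probability under the control-period model distribution and replaced by the observed surge at that probability). This is one common form of quantile-quantile mapping and not necessarily the exact implementation of Lin et al. (2016); the function names are hypothetical.

```python
import numpy as np


def corrected_future_rate(future_model_rate, control_model_rate, observed_rate):
    """Scale the model's future storm frequency by the historical obs/model ratio."""
    return future_model_rate * (observed_rate / control_model_rate)


def quantile_map(future_model, control_model, observed):
    """Empirical quantile-quantile mapping of future modeled surges onto the observed CDF.

    Each future value gets its non-exceedance probability under the control-period
    model sample and is replaced by the observed quantile at that probability.
    """
    control_sorted = np.sort(np.asarray(control_model, dtype=float))
    future = np.asarray(future_model, dtype=float)
    p = np.searchsorted(control_sorted, future, side="right") / control_sorted.size
    return np.quantile(np.asarray(observed, dtype=float), p)
```

A purely empirical mapping saturates at the largest observed value for future surges beyond the control-period range, which is one reason to apply the mapping to fitted (e.g., GPD-tailed) CDFs instead.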

In addition to the projections of future storm surge return levels from each of the seven climate models, we present a “composite” projection as a weighted average of all seven projections, similar to Lin and Shullman (2017). The weight of each climate model is determined by the accuracy of its estimated storm surge return levels for the control scenario (i.e., the historical period of 1979–2004) relative to the observed return levels for the same period. Specifically, the weighting factor for climate model $i$ is $W_i = S_i / \sum_{j=1}^{7} S_j$, where $S_i$ is the Willmott skill score (Willmott 1981) of the surge return level curve for model $i$. That is, $S$ is calculated from the quantitative agreement between the climate-model and CFSR-based surge return level curves, with a score of one for perfect agreement and zero for complete disagreement.
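The skill-weighted composite can be sketched as below, assuming the composite is formed by averaging the models' bias-corrected return levels at each return period. The function names are hypothetical, and the skill scores would come from comparing each model's control-period return level curve with the reference curve.

```python
import numpy as np


def skill_weights(skills):
    """Normalize per-model Willmott skill scores S_i into weights W_i = S_i / sum_j S_j."""
    s = np.asarray(skills, dtype=float)
    return s / s.sum()


def composite_return_levels(return_levels, skills):
    """Skill-weighted average of return levels.

    return_levels: array (n_models, n_return_periods); each row holds one model's
    bias-corrected return levels at a common set of return periods.
    """
    w = skill_weights(skills)
    return w @ np.asarray(return_levels, dtype=float)
```

Models whose control-period curves agree poorly with the reference receive proportionally less weight, but no model is excluded unless its skill score is zero.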

3 Model evaluation

We evaluate the performance of the CFSR-driven hydrodynamic modeling by comparing modeled and observed storm surge heights generated by historical ETCs between 1979 and 2013. Figure 2 compares storm surge time series at three sites (Boston MA, the Battery NY, and Ocean City MD) for nine different events. The comparisons show that the model reproduces the temporal patterns of surge height variations. However, surge magnitudes are somewhat underestimated (i.e., a negative hydrodynamic-modeling bias exists), likely because of the insufficient resolution of the CFSR forcing data and the surge computational mesh and because of physics missing from the simulations, such as wave setup and river flow.

Fig. 2

Time series of modeled (red lines; original without correction) and observed (black lines) storm surge heights for nine ETC events at three sites: Boston MA (left panels), the Battery NY (middle panels), and Ocean City Inlet (right panels). The surge heights are shown relative to the mean sea level. Monthly variations in mean sea level and climatological seasonal mean sea level fluctuations are subtracted from observations

Figure 3 compares modeled and observed peak surge heights induced by historical ETCs between 1979 and 2013 at the three selected sites. The comparisons confirm that a negative hydrodynamic-modeling bias exists and that removing it substantially reduces the RMSE and increases the Willmott skill score. Across all study sites, the hydrodynamic-modeling bias in modeled surge heights ranges from −0.19 m to −0.36 m, with an average of −0.27 m. Removing the bias from the model results reduces the site-averaged RMSE from 0.31 m to 0.15 m and increases the site-averaged Willmott skill score from 0.51 to 0.76 (the bias calculated at each site is removed from the model results at that site). In the remainder of this paper, the hydrodynamic-modeling bias is subtracted from the modeled peak surge heights before the statistical analysis and the estimation of storm surge return levels.

Fig. 3

Observed and modeled peak surge heights induced by ETC events between 1979 and 2013. The upper panels compare the original model results with observations. The bottom panels compare the hydrodynamic-modeling bias corrected model results with observations

Next, we use the observed and modeled surge heights to estimate storm surge return levels for the control scenario (historical period of 1979–2004). Table 2 summarizes the 10- and 50-year return levels estimated from observations. Across the study sites, the estimated return levels range from 0.9 m to 1.5 m for the 10-year return period and from 1.1 m to 2.1 m for the 50-year return period. The largest and smallest return levels are estimated for Sandy Hook, NJ, and Woods Hole, MA, respectively. The left panels of Fig. 4 compare the modeled and observed storm surge return levels for 1979–2004. The return levels based on most of the climate models are substantially underestimated, whereas those based on the MRI model are overestimated.

Table 2 Storm surge return levels calculated based on observed surge heights between 1979 and 2004 at the NOAA tide gauge stations
Fig. 4

Left panels show the ETC storm surge return level curves based on observed surge heights during 1979–2004 (black line) and on simulated storm surges for the control projections (1979–2004) of the climate models (colored lines) at three study sites. Dashed lines show the 95% confidence intervals (same color as the solid lines). Middle panels compare the observed and modeled annual maximum surge heights (data points are sorted in ascending order before plotting). Right panels compare the modeled annual maximum surge heights for the control (1979–2004) and future (2054–2079) periods (data points are sorted in ascending order before plotting). (Results corrected for hydrodynamic-modeling biases)

These biases in the modeled storm surge return levels could originate from over- or underestimated storm frequency and/or storm surge magnitude. Table 3 compares the observed and model-estimated annual storm frequency (here, “annual” refers to the period when ETCs occur, i.e., November–March). All climate models overestimate the storm frequency for the historical period (i.e., a positive bias exists), but the bias of the MRI model is substantially larger than that of the other models. The middle panels of Fig. 4 compare the modeled and observed annual maximum surge heights. The MRI model overestimates the annual maximum surge heights, whereas the other models underestimate them. The strongly overestimated storm frequency and storm surge magnitude (as indicated by the overestimated annual maxima) thus explain why the return levels based on the MRI model exceed the observation-based return levels. The other models substantially underestimate the return levels because they substantially underestimate the surge magnitudes.

Table 3 Observed and model-estimated annual frequency (defined based on the surge threshold) of ETCs at three selected sites (original without correction)

4 Future predictions

The right panels of Fig. 4 compare the modeled annual maximum surge heights for the historical period of 1979–2004 and the future period of 2054–2079. The comparison indicates, for example, that the surge magnitudes projected by CNRM increase significantly in the future climate. Table 3 also shows that CNRM projects a substantial increase in storm frequency (e.g., by 82% at Boston MA). Next, we discuss the projected changes in storm surge return levels. Because the comparisons between simulated and observed return levels for the historical period (Fig. 4, left panels) showed that the climate-model-based estimates are biased, we first remove these climate-modeling biases from the modeled return levels for the future period of 2054–2079, assuming that the biases do not change over the projection period.

Figure 5 presents the storm surge return levels projected for the future climate (2054–2079) together with those estimated from observations for the historical period (1979–2004) at Boston, the Battery, and Ocean City. The CNRM model projects higher future storm surge levels at all three sites. At the Battery and Boston, NorESM1 projects subtle changes in surge levels at shorter return periods but substantial increases at longer return periods. Projections from the other models indicate minor effects of climate change on storm surge levels. As a result, the weighted-average projections show slight increases in storm surge levels under future climate conditions. For example, the estimated 10- and 50-year storm surge levels at Boston change from 1.09 m and 1.34 m, respectively, to 1.15 m and 1.36 m (increases of about 5% and 1%). The 10- and 50-year storm surge levels increase, respectively, by 3% and 2% at the Battery and by 7% and 9% at Ocean City.
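The percentage changes reported here and in Figs. 6 and 7 are relative differences between future and historical return levels. The minimal sketch below (the function name is hypothetical) applies this to the Boston weighted-average levels quoted above and gives about 5.5% and 1.5%, consistent with the approximate 5% and 1% once rounding of the return levels to 0.01 m is taken into account.

```python
def percent_change(historical, future):
    """Relative change (%) from the historical to the future return level."""
    return 100.0 * (future - historical) / historical


boston_10yr = percent_change(1.09, 1.15)  # about 5.5
boston_50yr = percent_change(1.34, 1.36)  # about 1.5
```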

Fig. 5

Storm surge return level curves under current (based on observations 1979–2004) and future conditions (based on climate models’ projection 2054–2079) for three study sites. The left panels show the future return level curves for each of the seven climate models. The right panels show the weighted-average future return level curves. Dashed lines show the 95% confidence intervals (same color as the solid lines). (Results corrected for both hydrodynamic-modeling and climate-modeling biases)

The increase in storm surge levels projected by CNRM arises because this model projects significant increases in both the annual storm frequency (Table 3) and the surge intensity (right panels of Fig. 4), as discussed above. The NorESM1 model projects a slight decrease in storm frequency but an increase in the magnitude of storm surges generated by intense events at the Battery and Boston (and at other sites in the New England region; not shown). The other models show minor changes in both storm frequency and storm surge magnitude.

Figure 6 displays the changes in the 10-year storm surge return levels due to climate change at all study sites. The CCSM4 model projects reductions of up to 15% in surge levels at sites in Delaware and Chesapeake Bays and increases of up to 13% at the other sites. Projections from GFDL and MIROC show subtle effects of climate change on 10-year storm surge levels (between −7% and 8%). NorESM1 projects increases of 1–11% in the 10-year storm surge level at higher-latitude sites and decreases of 2–7% at lower-latitude sites. Projections from CNRM differ markedly from those of the other models: this model projects that climate change could cause substantial increases (up to 27%) in storm surge levels across the entire domain. Similar to NorESM1, the IPSL model projects an increase in surge levels at higher latitudes and a decrease at lower latitudes. In contrast to NorESM1 and IPSL, the MRI model projects larger increases in surge levels at lower-latitude sites than at higher-latitude sites. Overall, the weighted-average projections show that climate change would increase the 10-year storm surge level by less than 7% at most sites.

Fig. 6

Impact of climate change on 10-year storm surge levels for all study sites. The percentage change is calculated by comparing the storm surge levels estimated for the modeled climate of 2054–2079 and the observed climate of 1979–2004. (Results corrected for both hydrodynamic-modeling and climate-modeling biases)

The changes in the 50-year storm surge return levels are shown in Fig. 7. The spatial patterns of climate change impacts on the 50-year surge level are similar to those on the 10-year surge level, but the magnitudes and variations are larger. Projections from CNRM indicate substantial increases (up to 36%) in the 50-year surge levels at all sites. NorESM1 projects increases of up to 34% at sites from Maine to New Jersey and decreases of up to 17% at lower-latitude sites. In contrast, MIROC5 shows substantial decreases (up to 20%) in the 50-year surge levels at higher-latitude sites (from Maine to New Jersey). CCSM4 projects large decreases at sites in Delaware and Chesapeake Bays; for example, the 50-year surge level decreases by 26% at Baltimore. Overall, the weighted-average projections show a slight decrease (less than 5%) in the 50-year surge return levels at sites in Chesapeake Bay and a slight increase at other sites (less than 6% at most sites).

Fig. 7

Impact of climate change on 50-year storm surge levels for all study sites. The percentage change is calculated by comparing the storm surge levels estimated for the modeled climate of 2054–2079 and the observed climate of 1979–2004. (Results corrected for both hydrodynamic-modeling and climate-modeling biases)

5 Discussion and conclusions

Using a hydrodynamic model, we projected changes in ETC storm surge levels between the historical period of 1979–2004 and the mid-to-late-twenty-first-century period of 2054–2079 for a number of urban sites in the Northeastern United States, including Boston, New York City, Baltimore, and Washington D.C. Overall, weighted-average projections over seven CMIP5 climate models indicate only small changes in storm surge return levels. The 10-year ETC storm surge level would increase by less than 7% at most sites. The 50-year ETC storm surge level would decrease by less than 5% at sites in Chesapeake Bay and increase by less than 6% at most sites in other regions.

We found some discrepancies among projections from different climate models. For example, projections from NorESM1 showed an increase in the 50-year storm surge level at coastal sites from New Jersey to Maine, whereas projections from MIROC5 showed a decrease at these sites. While projections from the other models indicated only slight increases or decreases in 10- and 50-year storm surge levels, projections from CNRM showed substantial increases at all sites (up to 27% and 36% in the 10- and 50-year surge levels, respectively). These model uncertainties should not be neglected. We showed that the discrepancies arise from differences in the ETC frequency (defined based on a surge threshold) and surge magnitude projected by the climate models. Future work could investigate these differences further by comparing the underlying synoptic-scale features projected by these models.

The present study has several limitations. Directly applying the atmospheric wind and pressure fields projected by climate models, which have limited resolution and possible biases, introduces uncertainties into the surge estimates. The hydrodynamic model also has limited resolution (~1 km along the coast) and neglects influences such as surface waves and rainfall-induced runoff. Although we used statistical methods to remove the biases resulting from these limitations, future research may reduce these biases by improving the physical modeling.

The present study focuses on ETC storm surge, which is one of several components that contribute to coastal flooding. The total flood level is a combination of storm surge, astronomical tide, and sea level rise. Sea level rise is expected to make a substantial contribution to future flooding (Nicholls and Cazenave 2010). Tides can also be an important contributor, especially in the high-latitude northeastern regions, where tidal amplitudes are relatively large. TCs can produce larger storm surges than ETCs in the northeastern regions, and potential changes in TC climatology may significantly increase future flooding (Lin et al. 2012, 2016). Our future studies will consider ETC and TC storm surges, astronomical tide, and sea level rise together in projecting coastal flood hazards and risk.