The effect of density stratification on the prediction of global storm surges

With the long-term goal of developing an operational forecast system for total water level, we conduct a hindcast study of global storm surges for Fall 2014 using a baroclinic ocean model based on the NEMO framework. The model has 19 vertical levels, a horizontal resolution of 1/12°, and is forced by hourly forecasts of atmospheric wind and air pressure. Our first objective is to evaluate the model’s ability to predict hourly sea levels recorded by a global array of 257 tide gauges. It is shown that the model can provide reasonable predictions of surges for the whole test period at tide gauges with relatively large tidal residuals (i.e., gauges where the standard deviation of observed sea level, after removal of the tide, exceeds 5 cm). Our second objective is to quantify the effect of density stratification on the prediction of global surges. It is found that the inclusion of density stratification increases the overall predictive skill at almost all tide gauges. The increase in skill for the instantaneous peak surge is smaller. The location for which the increase in overall skill is largest (east coast of South Africa) is discussed in detail and physical reasons for the improvement are given.


Introduction
Storm surges are variations of sea level about the tide due to storms. Positive surges can cause severe coastal flooding and negative surges can be navigation hazards for ships operating in shallow water. Needham et al. (2015) recently provided an overview of the global data sources, observations and impacts of historically extraordinary storm surges. To reduce possible casualties and economic losses caused by storm surges, many countries have developed regional forecast systems to provide warnings to coastal communities and marine related industries.
The early development of numerical storm surge prediction systems was reviewed by Heaps (1983) and Bode and Hardy (1997). Two-dimensional shallow water equations have been used extensively in operational regional forecast systems, e.g. SLOSH for the US (Jelesnianski et al. 1992) and CS3 for the UK (Flather 2000). Statistical models are sometimes used in conjunction with a dynamical model (e.g. SLOSH) to compensate for errors in the model or its atmospheric forcing. To take account of uncertainty in the atmospheric forcing, there is now a move towards the development of ensemble-based systems. For example, Flowerdew et al. (2010) have developed an ensemble surge forecast system for the UK and, more recently, Bernier and Thompson (2015) have evaluated an ensemble prediction system for Atlantic Canada.
In contrast to the regional systems mentioned above, the present study focuses on the development of a global, baroclinic storm surge prediction system. The global system is not designed to compete with regional models (it only has a resolution of 1/12 • ). Its purpose is to improve the predictions of the regional forecast systems. For example, the global model can be used to identify regions with relatively low predictability (due, for example to low predictability in the atmospheric forcing) and also to provide lateral boundary conditions for regional systems.
The first objective of the present study is to quantify the skill of the global model in the prediction of water levels caused by variations in wind and air pressure. In this initial development of the global system, we focus on Fall of 2014 and compare the model's predictions to observations made by a global array of 257 tide gauges. The existing model most relevant to the present study is MOG2D-G (Carrère and Lyard 2003, see also Appendix). This is an excellent global surge prediction model that is used widely to provide a Dynamic Atmospheric Correction (DAC) for dealiasing altimeter data. The DAC data for Fall 2014 are also evaluated against the tide gauge dataset to compare its predictive skill with our model.
The second objective is to quantify how density stratification may influence the predictions of water level and storm surges. One way in which density stratification can influence the surges is by changing the vertical structure of the currents and thus bottom stress (e.g. Gordon 1982). Although some studies (e.g. Davies 1988;Hearn and Holloway 1990;Weisberg and Zheng 2008;Zheng et al. 2013) have examined the effect of vertical structure in barotropic models, the effect of baroclinicity has received less attention. One exception is the modelling study of Minato (1998) of the storm surge in Tosa Bay produced by typhoon Anita. It was shown that the predicted storm surge increased slightly by including density stratification because bottom stress was reduced. Another way in which density stratification can influence surges is by changing the dispersion relation of coastal trapped waves (e.g. Clarke 1977;Gill 1982) and also reducing the amount of backscattering by freely propagating coastal trapped waves as they encounter changes in the width of the shelf (e.g. Wilkin and Chapman 1990). In the present study, we compare model runs with and without realistic density stratification.
The numerical model and its configuration are described in Section 2. The observations used to evaluate the model predictions are described in Section 3. The model predictions are evaluated in Section 4 based on comparison with the observed sea levels, and the effect of density stratification is explored. The results of the study are summarized and discussed in Section 5.

Numerical model
We use a baroclinic ocean model based on the NEMO framework to predict storm surges. The model has 19 z-levels in the vertical and a horizontal grid spacing of approximately 1/12 • . To include the effect of mean density stratification, temperature and salinity fields are initialized by, and weakly restored to, an observed climatology. Almost the same numerical model was used recently by Kodaira et al. (2016) in a study of global tidal surface currents associated with M 2 internal tides.
The baroclinic model was further modified to exclude the mean general ocean circulation (e.g. western boundary currents such as the Gulf Stream and Kuroshio) and allow us to focus on current and sea level variations about the mean state. This type of perturbation approach has been used successfully in previous studies of global M 2 internal tides using realistic ocean models (e.g. Niwa and Hibiya 2001). Given surges have similar frequencies to the M 2 tides, the same perturbation approach is used in this study to predict surges in the presence of density stratification. Details are given below.

Governing equations
The governing equations of our model are as follows: Variables with primes denote perturbations about the background mean state. Subscript h refers to horizontal. u h is the horizontal velocity vector (u , v ) whereas u is the full three-dimensional velocity vector (u , v , w ). f is the Coriolis parameter multiplied to the upward unit vector. η is the sea surface fluctuation and p a is atmospheric pressure at the surface. ρ 0 is a reference density (1025 kg m −3 ). We define = ρ /ρ 0 where ρ is the perturbation of density about the background state, determined by an equation of state (not shown). The subgrid-scale physics are parameterized by mixing terms as described below. Following Carrère and Lyard (2003), the linear damping term c iwū h is included to parameterize subgrid-scale internal wave drag acting on the barotropic motion (ū h is the depth-averaged horizontal velocity vector). The temperature (T ) and salinity (S) are maintained by restoring terms in (3) and (4) where r is the restoring coefficient and T b and S b are the background mean states to which T and S are weakly restored. The rest of the notation is standard.

Model setup
The governing equations are solved numerically using the Nucleus for European Modeling of the Ocean (NEMO) framework (Madec 2008). A mode-splitting technique is used to calculate the fast barotropic mode efficiently. The model domain is global apart from a closed boundary at 78 • S. A tri-polar ORCA grid with a nominal horizontal resolution of 1/12 • is used along with the ORCA12 bathymetry V3 (available, with documentation, from http://servdap.legi.grenoble-inp.fr/meom). The minimum and maximum water depths are taken to be 10 and 6000 m, respectively, in part to avoid numerical instability. The vertical grid has 19 z-levels with the level spacing increasing from approximately 15 m near the surface to 900 m in the deep ocean. Partial cells are used for the bottom cells to represent the bathymetry more accurately.
A modified UNESCO formula (Jackett and Mcdougall 1995) is used to calculate the density ρ and its perturbation ρ . To include the effect of density stratification on vertical mixing, the vertical eddy viscosity and diffusivity coefficients are computed using the TKE-type turbulent closure model introduced by Gaspar et al. (1990) and subsequently improved by Madec et al. (1998).
The coefficient c iw is calculated following the formulation of Jayne and St. Laurent (2001), and its global distribution is given by Kodaira et al. (2016). Internal wave drag is only applied in areas deeper than 1000 m and so its effect on storm surge prediction is limited. The drag is however important in suppressing basin-scale barotropic motions caused by wind stress. The bottom boundary condition includes quadratic bottom friction with a constant drag coefficient of 2.5 × 10 −3 .
The time steps for the internal and external mode calculations are 180 and 6 s, respectively. The horizontal viscosity and diffusion coefficients are A h = 100 m 2 s −1 and K h = 10 m 2 s −1 . The restoration coefficient for temperature and salinity is r −1 = 5 days.
The model was run for 3 months, from August 1, 2014 to October 30, 2014. The model results for the first 2 weeks are not used in the present study. Due to the large amount of model output, only sea surface height, steric sea level (e.g. Marshall and Plumb 1965) defined here as η s = − 0 −h dz, and bottom stress were stored for each forecast hour.

Initial conditions
The ocean is assumed to be initially at rest. Initial conditions for temperature and salinity were interpolated from the 1/4 • annual climatology of the World Ocean Atlas 2013 Zweng et al. 2013). Although the original climatology is statically stable, this is not necessarily true of the interpolated fields. For this reason, the interpolated temperature and salinity fields were initially diffused for 2 days without advection. If unstable density profiles were detected, the vertical diffusivity was increased to 5 m 2 s −1 . During the diffusion process, the horizontal diffusion coefficient was set to 2 × 10 3 m 2 s −1 , which mixes the ocean by approximately 25 km over 2 days. The diffused seasonal climatologies are then used as the background temperature and salinity fields (T b and S b ) to which the model is restored.
The model run that is initialized and restored to the diffused climatology is baroclinic and henceforth referred to NEMO-Bc. The model with constant temperature and salinity is referred to as NEMO-Bt because it is barotropic.

Atmospheric forcing
Following Bernier and Thompson (2015), the magnitude of the wind stress is assumed to be of the form C d (W )W 2 where W is the wind speed and the drag coefficient is The surface air pressures and winds used to drive the model came from the control run of the global ensemble atmospheric prediction systems (GEPS) developed by Environment Canada (Houtekamer et al. 2014). The initial conditions for GEPS are the result of a recently updated ensemble Kalman filter. GEPS operates with a grid spacing of approximately 66 km. Forecast runs are renewed every 12 h, and the prediction continues for 10 days for each model run.
To concatenate the time sequence of GEPS forecasts of wind and air pressure, temporal blending is applied to the first five forecast hours of the atmospheric forcing fields as follows: a o (t i ) and a n (t i ) are the old and new forecasts, and c(t i ) is the blending coefficient for time t i (defined by c = [0.50, 0.35, 0.25, 0.15, 0.05]). The forecasts, and the blending step, are carried out every 12 h as indicated by the schematic shown in Fig. 1. The same blending was applied by Bernier and Thompson (2015). It was used to avoid  shocks to the surge model when adjacent atmospheric forecasts differ significantly. The blending ensures a smooth transition between the surge forecasts which can be sensitive to the perturbations applied to the atmospheric model.

Tropical storms
The International Best Track Archive for Climate Stewardship (IBTrACS) provides an inventory of reported tropical storms worldwide (Knapp et al. 2010). The position, maximum sustained winds, central pressure, storm name, radius of maximum winds and additional properties are provided by multiple agencies. Figure 2 shows the tracks of tropical storms, and their extratropical extensions, observed during the study period. The tracks were used to choose tide gauges to evaluate the storm surge predictions for specific case studies as detailed below.

Coastal sea levels
Hourly sea levels of research quality were obtained from the Sea Level Centre, University of Hawaii for a globally distributed array of tide gauges. Tide gauges were excluded if (1) the tide gauge is more than 25 km from the nearest model grid point, or (2) observations are missing for more than 10 % of the study period. The result of applying the two selection criteria is a global distribution of 257 tide gauges for model evaluation (Fig. 2).
Tides were removed from each hourly sea level record using the t tide package (Pawlowicz et al. 2002). Hourly observations for all of 2014 were used to determine the amplitude and phase of the tidal constituents for each station. In addition to removing tides, a high pass Butterworth filter was used to suppress variability with period exceeding 20 days. This was done to suppress lower frequency motions not resolved by the NEMO ocean model.

Global sea level prediction
Sea levels predicted at the grid point nearest to each tide gauge station are used for model evaluation. Before comparison, the predicted hourly sea levels were filtered using the same filtering process applied to the tide gauge records (see Section 3.2). Although the astronomical tidal potential is not included in the model, motions at tidal frequencies can be excited by atmospheric pressure as previously reported by Arbic (2005).
The filtered predictions from August 15 to October 30 are quantitatively assessed using the following γ 2 statistic:   prediction is no better than constant. Note γ 2 can exceed 1. The global distributions of γ 2 from the baroclinic run (NEMO-Bc) and the global surge model of Carrère and Lyard (2003) are shown in Fig. 3. In general, γ 2 is less than 0.5 poleward of 30 • . Within 30 • of the equator, relatively large γ 2 values are evident but we note the standard deviation ofη o is less than 5 cm at most of these tide gauge stations. We also note that the model will not resolve well the bathymetry in the vicinity of small islands and so we expect to find large discrepancies in the model predictions (and thus large γ 2 ) for such locations. Large γ 2 values are also evident along the west coast of South America, between the Indian Ocean and Pacific Ocean and the north coast of Australia.
The γ 2 for NEMO-Bc and the DAC are compared in Fig. 4a. The marginal histograms for stations with large (≥ 5 cm) and small (< 5 cm) standard deviations ofη o are fundamentally different for both model runs. The histograms for stations with small standard deviations indicate relatively poor predictions: the median γ 2 values are approximately 0.83 for both runs. For large standard deviations of η o , the fits are much better: the median (interquartile) of γ 2 for NEMO-Bc and the DAC are 0.36 (0.26) and 0.35 (0.28), respectively.
The DAC is based on six-hourly atmospheric forcing whereas NEMO-Bc is forced hourly. This means that the atmospheric forcing for NEMO contains more highfrequency variability. To reduce this discrepancy, we filtered the observed and predicted time series to eliminate variability with period less than 6 h and examined the effect on predictive skill.
The high pass filtering led to a decrease in γ 2 (i.e. increase in skill) at 88 % of the tide gauge stations. The effect is clearly evident in the global distribution of the γ 2 ratio (Fig. 5). Removing variability shorter than 6 h has significantly improved the fit along the west coast of North America, the south coasts of Africa and Australia and New Zealand. One possible explanation is that the blending of the atmospheric forecasts (see (6)) was not completely effective leading to spurious high frequency sea level variability. Another possibility is the limited horizontal resolution of GEPS. This results in the failure of NEMO-Bc to accurately predict higher frequency signals associated with relatively small spatial scales. (Note that the filtering used to remove variability with period less than 6 h is not used in the rest of the paper.)

Selected case studies
We now focus on the ability of the model to predict peak surges and their arrival times for a selection of tropical storms listed in IBTrACS. Although more than 20 tropical storms were observed during the study period, an associated storm surge was not always observed by the global array of tide gauges. More specifically, five storms were detected in The number in parentheses shows the delay (in hours) in the arrival time of the predicted peak surge compared to the time of the observed peak. There are two records for Gonzalo because two peaks are observed at St. John's (see Fig. 8  the North Atlantic but only Gonzalo moved close enough to shore to generate a surge that was observed by the tide gauge station at St. John's (Canada). In the eastern Pacific, 10 storms were detected but storm surges were not observed because of the lack of tide gauges along the southwest coast of North America. The tide gauge at San Diego is closest to the tracks of these storms but its maximumη o was only 12 cm. In the western Pacific, eight storms were detected and surges associated with Kalmaegi, Phanfone and Vongfong were observed. In the Indian Ocean, only one storm was detected but there were no tide gauge stations in the vicinity of its path.
In total, seven storm surges were detected in the sea level records and they are listed in Table 1. The listed peak surge and its arrival time were determined using the local maximum ofη o . The observed storm surges produced by Kalmaegi, Phanfone, Vongfong and Gonzalo were all smaller than one meter (Table 1). The peak surges predicted by NEMO-Bc are consistently smaller than the observed peak surges. The relatively coarse resolution of the ocean model, and its minimum water depth of 10 m, will contribute to the underestimation of the peak surges.
The underestimation is also due in part to the coarse resolution of GEPS (66 km) leading to underestimation of storm intensity.
The arrival time of the peak surge predicted by NEMO-Bc is in good agreement with the observations except for the surge generated by Kalmaegi (Table 1). The predicted surge arrived at Hong Kong 4 h after the observed peak (Fig. 6). The later arrival of the predicted surge is not only because of high frequency fluctuations near the peak surge but also because of an inaccurate prediction of the shape of the surge.

The effect of density stratification
We first compare γ 2 values from NEMO-Bc and NEMO-Bt. When the density stratification is removed from the model, the fit is deteriorated at almost all stations (Figs. 4b  and 7). An alternative metric to γ 2 is the squared correlation (r 2 ) between observed and predicted sea levels for the same location. This statistic is more forgiving because it allows an arbitrary scaling of the observations and predictions. Using r 2 led to the same conclusion: the inclusion of density stratification increases the predictive skill of the model. The improvement in model fit is also evident in the predictions of peak surges by the baroclinic, compared to the barotropic, model (Table 1, see columns labelled Bc and Bt respectively). The arrival time of the peak surges however remains almost unchanged.
The three stations labelled in Fig. 7 were selected for further examination of the effect of density stratification. One station is East London (South Africa) because the change in γ 2 is the largest. The other two stations are Naha (Japan) and St. John's (east coast of Canada) because storms passed over these stations (Fig. 7). We first examine time series for each location, and then the spatial distributions of predicted sea levels to clarify the difference in the predictions of NEMO-Bc and NEMO-Bt. (The filtering described in Fig. 7 Global distribution of ratio of γ 2 from NEMO-Bt and NEMO-Bc (γ 2 Bt /γ 2 Bc ). Red shading implies the addition of density stratification has improved model fit.   Fig. 11 and Fig. 12 Section 3.2 was not applied to the model output before mapping their spatial distributions.) EAST LONDON: Time series of filtered sea level show that NEMO-Bc consistently provides better predictions than NEMO-Bt (Fig. 8a). The exclusion of density stratification increases γ 2 from 0.26 to 0.55.
In order to investigate the physical reasons for the different predictions by NEMO-Bc and NEMO-Bt, predicted sea level changes along the coast of South Africa are examined (Fig. 9). The Hovmöller diagram (Fig. 10a) shows the frequent generation of signals that propagate with the coast on their left at a speed of about 4 m s −1 , consistent with coastal trapped wave theory. The Hovmöller diagram of the difference between NEMO-Bc and NEMO-Bt (Fig. 10b) shows that NEMO-Bt attenuates the propagation of the signal as it encounters the rapid narrowing of the shelf at s = 1500 km (Fig. 9). The phase speed slows for s > 1500 km, consistent with the expected effect of density stratification on coastal trapped wave propagation (e.g. Gill 1982).
Snapshots of sea level are shown in Fig. 11 Fig. 9 The along-shore coordinate (s) defined along the coast of South Africa (blue). The green dots and labels show position along the coast.
The circles indicate location of tide gauge stations with the color showing the ratio of γ 2 for NEMO-Bt to NEMO-Bc (γ 2 Bt /γ 2 Bc ). The thin and thick black lines show the 200 and 2000 m isobaths, respectively [km] [km] s = 1500 km. This is consistent with the expected effect of density stratification on coastal trapped wave propagation (e.g. Wilkin and Chapman 1990).
NAHA: This tide gauge detected a storm surge associated with tropical storm Vongfong as it passed almost directly over the station (Fig. 7). The rapid increase ofη on October 11 indicates the passage of Vongfong (Fig. 8b). The inclusion of realistic density stratification results in better prediction with a slightly larger storm surge. The difference is more significant after the peak surge because NEMO-Bc predicts the subsequent depression of sea level.
Snapshots of sea level for 12:00 on October 11, 2014, when the storm was directly overhead, are shown in Fig. 12. As expected, the large (order 0.1 m) difference in deep water between the barotropic and baroclinic runs is due to steric changes (not shown). The larger peak surge predicted by the baroclinic model in the shallow water in the vicinity of the tide gauge is likely due to a reduction in bottom stress due to baroclinity.
ST JOHN'S: Differences between the two model runs are not always evident. For example, the observed time series for St. John's indicates a large surge caused by Gonzalo (Fig. 8c). However, no significant difference between the two model predictions can be seen. This indicates both models predict a barotropic response, consistent with the rapid translational speed of this particular storm over the wide and shallow Grand Banks offshore of this tide gauge station.

Summary and Discussion
A hindcast study of global storm surges has been carried out for Fall 2014 using a baroclinic ocean model based on the NEMO framework. The model has 19 vertical levels and a horizontal resolution of 1/12 • . It was forced by hourly atmospheric wind and pressure forecast fields provided by Environment and Climate Change Canada.
Based on the γ 2 statistic, the present model provides predictions of global water level that are almost as accurate as the DAC. The model has some difficulty in predicting sea level fluctuations shorter than 6 h; filtering out this high frequency variability improves significantly the fit of the model. The possible explanations include inaccurate prediction of high frequency/small-scale atmospheric and oceanic features by GEPS and NEMO-Bc, and the blending process used to generate the atmospheric forcing. During the study period, more than 20 tropical storms appeared in the IBTrACS archive. However, the global array of 257 tide gauge stations detected storm surges generated by only four of these storms. The arrival time of the peak surge predicted by the model was typically within 2 h of the observed peak arrival time. The model consistently underestimates the peak surges by about 20 cm.
The effect of density stratification, the main focus of the study, was examined by comparing barotropic and baroclinic runs of the model. It was shown that the inclusion of density stratification generally improved the surge predictions as measured by γ 2 . The inclusion of density stratification however only slightly improved (by several cm) the prediction of peak surges. Case studies suggest that the physical reasons for the improvement include the effect of density stratification on the propagation and transmission of coastal trapped waves, and changes in bottom stress associated with changes in the vertical structure of the currents.
An interesting possibility raised by one reviewer is that a depth-averaged barotropic model may fit the observations as well as the 3D baroclinic model used in this study (NEMO-Bc). The reason is that the dynamical response of the shelf and near shore is represented differently in these two types of model. For this reason, we modified NEMO-Bt to have only one layer (henceforth NEMO-2D). Note NEMO-2D uses partial cells that best fit the bathymetry.
The γ 2 for NEMO-2D and NEMO-Bt were similar (not shown), especially for tide gauges with relatively large tidal residuals. In general, the γ 2 for NEMO-Bc remained the lowest. We note however that the implication that 2D models will not perform as well as baroclinic models needs to be checked using 2D models with higher horizontal resolution and smaller minimum depth. (NEMO-Bc assumes a minimum depth of 10 m.) With regard to future work, we plan to optimize the choice of wind drag coefficients and also extend the study period. A longer test period will increase the number of storm surges for analysis and make the model evaluation more comprehensive and definitive. Another important next step is to better understand the physical reasons for the increased skill resulting from the inclusion of density stratification. This will require examination of more case studies and the possible use of higher resolution, nested baroclinic models of regions of particular interest (e.g. South Africa, west of of North America). Our longer term goal is to include tides in the surge model and fully utilize GEPS to develop a global ensemble prediction system for total water level.
Acknowledgments The authors thank two anonymous reviewers and the editor for their exceptionally thorough reviews and constructive comments on an earlier draft of the manuscript. We acknowledge Environment and Climate Change Canada for providing access to computing resources. The authors acknowledge financial support from the Marine Environmental Observation Prediction And Response (MEOPAR) network of centres of excellence.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix MOG2D-G and the DAC
The existing numerical model most relevant to the present study is MOG2D-G, a global barotropic model using finite element space discretization (Carrère and Lyard 2003). The grid size ranges from 20 km in the shallow ocean to 400 km in the deep ocean. The model is forced by 6 hourly atmospheric pressure and wind fields predicted by European Centre for Medium-range Weather Forecasts (ECMWF) analysis.
The predicted signals longer than 20 days are not used and are replaced by estimates based on the inverse barometer effect. After the replacement, the predicted fields are made available online as the so-called Dynamic Atmospheric Correction (DAC). The available period of DAC and the data acquisition method are described by Pascual et al. (2008). We obtained DAC data on a regular 1/4 • grid every 6 h. After the 6 hourly data were interpolated to hourly values they were filtered in the same way as the observations (Section 3.2).