Popular extreme sea level metrics can better communicate impacts

Estimates of changes in the frequency or height of contemporary extreme sea levels (ESLs) under various climate change scenarios are often used by climate and sea level scientists to help communicate the physical basis for societal concern regarding sea level rise. Changes in ESLs (i.e., the hazard) are often represented using various metrics and indicators that, when anchored to salient impacts on human systems and the natural environment, provide useful information to policy makers, stakeholders, and the general public. While changes in hazards are often anchored to impacts at local scales, aggregate global summary metrics generally lack the context of local exposure and vulnerability that facilitates translating hazards into impacts. Contextualizing changes in hazards is also needed when communicating the timing of when projected ESL frequencies cross critical thresholds, such as the year in which ESLs higher than the design height benchmark of protective infrastructure (e.g., the 100-year water level) are expected to occur within the lifetime of that infrastructure. We present specific examples demonstrating the need for such contextualization using a simple flood exposure model, local sea level rise projections, and population exposure estimates for 414 global cities. We suggest regional and global climate assessment reports integrate global, regional, and local perspectives on coastal risk to address hazard, vulnerability and exposure simultaneously. Supplementary Information The online version contains supplementary material available at 10.1007/s10584-021-03288-6.

S-1.1 Estimating extreme sea levels S-1.1.1 Tide gauge data We use long-term records of hourly or sub-hourly sea level observations from quality controlled tide gauges from the University of Hawaii Sea Level Center 4 and also supplement with other tide gauges from the GESLA2 data set (Woodworth et al, 2016). We limit our use of tide gauge records to only those that have record lengths >30 consecutive years in which each year has >80 percent of observations available. In total, we use 360 unique tide gauges, with median and average record lengths of 48 and ⇠54 years, respectively (a list of the tide gauges used is given in the supporting information). We estimate the daily maximum sea level for each day in the tide gauge record with > 80 percent of available observations. We note that this temporal resolution only facilitates the estimation of still-water heights. Tide gauges are often placed inside protected harbors. This generally prevents capturing wave setup effects and swash. Wave setup is more likely to be picked up at tide gauges where there are shallow sloping beaches. Moreover, the time averaging of the observations also filters out contributions from waves, which can be significant contributors to extremes (Melet et al, 2018;Arns et al, 2017). To isolate the variation in extreme sea levels (ESLs), we remove the effect of mean sea level (MSL) change by subtracting the annual MSL from each daily maximum value (i.e., values are de-trended). The de-trended daily maximum tide values are then referenced to local mean higher high water (MHHW; relative to the de-trended mean sea level), defined as the average highest high tide at the tide gauge over a given period (here, either 1993-2012 or the last available 19-year period in the record). The local tidal component is not removed.

S-2
Fig. S-1: Flow of information used in this study to produce projected local population exposure estimates (white hexagon). Green rectangles are for extreme sea level estimation. Red rectangles are for sea-level rise projections. Yellow rectangles are for local population exposure estimates. B19 is Bamber et al (2019); SLR is sea-level rise; GIC are glaciers and ice caps; GRD are gravitational, rotational, and deformational effects; "Other sea level components" includes land water storage, oceanographic processes, and non-climatic background changes, such as glacial-isostatic adjustment.
S-1.1.2 Modeling the probability of rare events with extreme value theory Extreme value theory is commonly used to produce models that describe unusual or rare behavior (Coles, 2001b;Embrechts et al, 1997). If a modeled fit is considered "good", it can be used to extrapolate the frequency of events of interest, sometimes well beyond the length of the observation record (e.g., determining the height of 100-yr ESL event from a 30-yr record of tide gauge data). Various extreme value distributions and approaches to implement them have been proposed (Coles, 2001b;Embrechts et al, 1997), but in the case of ESL estimation, there currently is not an agreed upon "best approach". Depending on the specific project goals, a particular extreme value modeling strategy may be preferred over another. Following previous studies (Tebaldi et al, 2012;Buchanan et al, 2017;Rasmussen et al, 2018;Frederikse et al, 2020;Wahl et al, 2017), we the extreme values at each tide gauge using a generalized Pareto distribution (GPD;Pickens, 1975;Coles, 2001c,a).
The GPD is a peaks over threshold extreme value modeling approach that describes the probability of a given ESL height conditional on the exceedance of a threshold. The GPD has the advantage over other generalized extreme value (GEV) models for two reasons. First, if the tide gauge record is short, a peaks over threshold approach that uses sub-annual observations/extremes may be preferred over other generalized extreme value (GEV) models that only use annual maxima (i.e., one value per year; Lang et al, 1999;Cunnane, 1973). In other words, more information is retained in the peaks over threshold approach. Using annual maxima can waste data since a year of observations is being represented with a single value. For some locations, more than half of the highest observed ESLs have occurred within the same year (e.g., Boston, U.S.A.; Baranes et al, 2020). Furthermore, if a single annual maxima coincides with a low tidal level, the tidal influence is ignored despite its relevance to estimating ESLs. Second, unlike other GEV distributions (such as the Gumbel; Buchanan et al, 2017), the GPD includes a parameter that allows for the flexibility for the distribution to take on different shapes (shape parameter, ⇠, and its value depends on the characteristics of the underlying data), and 3) it can be combined with a Poisson rate parameter to translate ESL exceedance probabilities into expected numbers of annual ESL exceedances (the assumption being that GPD exceedances S-3 are Poisson distributed). The latter may be more intuitive and thus better for communicating the frequency of ESL events.
Following (Coles, 2001a), the cumulative distribution function of the GPD is given by: with u 2 R, ⇠ 2 R, and valid for ↵ > 0, z u when ⇠ 0 and u z u ↵/⇠ when ⇠ < 0. The parameters are as follows, ⇠ is the shape parameter, ↵ is the scale parameter, and u is the threshold water-level above which return levels are estimated with the GPD. Selecting the GPD threshold u (sometimes called the "location" parameter-not to be confused with geographic location) is critical element of the peaks over threshold approach as it can have substantial influence on the tail extrapolation. If the threshold is too low, it could bias the estimates because the included values may not be extreme enough. On the other hand, if the threshold is too high, the variance might be too large because too few points are being included in the analysis (Lang et al, 1999). The selection of the threshold can influence the parameter estimation (Coles and Tawn, 1996;Coles and Powell, 1996). GPD thresholds are often chosen either graphically, by looking at a mean excess plot (Embrechts et al, 1997), or by simply setting it to some high percentile of the data (DuMouchel, 1983). The graphical approach is infeasible in this study's case because we are working with several hundred tide gauges. Instead, we set the GPD threshold at the 99th percentile for all sites. The 99th percentile is generally is above the highest seasonal high tide, balances the bias-variance trade-off in the GPD parameter estimation (Tebaldi et al, 2012) and has been found to perform well at global scales (Wahl et al, 2017). Because of the subjective nature of choosing the GPD threshold (Davison and Smith, 1990;Coles and Tawn, 1994), an automated approach remains elusive (Dupuis, 1998;MacDonald et al, 2011).
After the GPD threshold is selected, the daily maximum sea levels above u are de-clustered to satisfy the statistical independence assumption of the GPD. More specifically, the largest daily maximum sea level value within 3 days of each GPD exceedance is used. The remaining GPD parameters are then estimated using maximum likelihood, and their uncertainty is calculated from their estimated covariance matrix. To account for uncertainty in the GPD fit, the estimated covariance matrix is later sampled using a Latin hypercube sampling to produce 1000 normally distributed GPD parameter pairs. Note that the fit of the GPD and the uncertainty bounds may not always well capture the observed exceedances (e.g., San Juan; Fig 1A). While we extrapolate estimates for ESL events up to the frequency of the 1000-yr event, we caution in using any estimate of ESL that exceeds four times the length of the observation record (Pugh and Woodworth, 2014).
The scale, shape, and threshold/location GPD parameters reflect the historical experience with meteorological and oceanographic phenomena that drive ESL events (Pugh and Woodworth, 2014;Coles, 2001c,a;Tebaldi et al, 2012). The shape parameter governs the concavity and upward statistical limit of G(z). Positive shape parameters are indicative of sites exposed to strong tropical cyclones and/or a wide continental shelf that increases storm surge potential (e.g., sites along eastern North America), while negative shape parameters are associated with weaker, infrequent storms, and potentially a narrower continental shelf (e.g., sites in Western North and South America, Fig. S-2A). ESL frequency distributions with ⇠ > 0 are "heavy tailed" and concave up, due to a larger variation in extremes (e.g., existence of tropical and extra-tropical cyclones). Distributions with ⇠ < 0 are "thin tailed", concave down, and have a statistical upper bound on ESLs. The scale parameter ↵ reflects the variability in the distribution of threshold exceedances (i.e., the width of the distribution). The largest scale parameters are found in regions with frequent wintertime extratropical cyclone activity (e.g., the North Sea and Gulf of Alaska; Fig S-2B). The GPD threshold exceedances are shown in Fig. S-2C. The GPD parameters are given in the supporting data files.

S-1.1.3 Mixture normal-extreme value theory probability model
Because the GPD only is defined above the threshold u, an additional distribution is needed to characterize events below u. Following MacDonald et al (2011) and Ghanbari et al (2019), we implement a non-stationary probability mixture model (Eq. S-1.1.3) that simultaneously characterizes the bulk and upper tail of the de-trended daily maximum tide gauge data. The bulk of the data follows a Normal distribution and is fitted as such, however, the upper tail of is not (Fig. S-3. As such, it is considered separately, and a GPD is fit to the showing the generalized Pareto distribution (GPD) shape parameter (⇠; dimensionless) at cities around the world. B. As for A, but for the GPD scale parameter (↵; meters above mean higher high water). C. As for A, but for the GPD threshold or "location" parameter (u; meters above mean higher high water).
observed extreme values. Studies often ignore the bulk portion of the ESLs distribution, or assume it is well fitted by an arbitrary distribution without empirically testing such fits (e.g., a Gumbel, see Normal distribution fit (red line). Right: As for Left, but a quantile-quantile plot. Note that the bulk of the maximum daily water levels fits a Normal distribution, but the tails do not.
The cumulative distribution function of the mixture model is given by: where N (z|µ, ) and G(z|u, ↵, ⇠) are the Normal and conditional GPD cumulative distribution functions, respectively, and ' is the ratio of de-clustered daily maximum sea level observations that meet or exceed u over the total number of observations. The Normal distribution parameters µ and are estimated from the de-trended daily maximum sea level observations, and u is the GPD threshold. An additional benefit of the Normal-GPD probability mixture model (Eq. S-1.1.3) is that it somewhat bypasses the uncertainty in the GPD threshold choice of u through the use of a smooth transition function between the bulk and tail components (MacDonald et al, 2011;Behrens et al, 2004).
To account for local change in mean sea level (described below), the GPD threshold u and the Normal distribution parameter µ linearly shifts. We do not consider future changes in storm frequency (Walsh et al, For a given tide gauge, the annual expected number of exceedances of height z are given by the Normal-GPD model as follows: where n y is the number of observations per year and erf(·) is the error function. The exceedances are assumed to follow a Poisson process, with a mean of n y ⇥ '. . However, incomplete understanding of the physical processes that govern ice sheet melt inhibits realistic representations in process-based models. In such cases of incomplete scientific understanding, SEJ using calibrated expert responses is one approach for estimating such uncertain quantities (as employed here). Each RSLC probability distribution is conditional on a scenario in which global mean surface air temperature (GSAT) stabilizes in 2100 at either +2 C (consistent with the Paris Agreement; UNFCCC, 2015) or +5 C (consistent with unchecked emissions growth; GSAT relative to 1850-1900Hausfather and Peters, 2020). Samples from each RSLC probability distribution are used to shift the ESL return curves in the direction of the RSLC. Fig. 1A shows the future (2070) ESL return curve for a tide gauge located in San Juan (Puerto Rico). The "kinks" in the return curves appear as a result of the highest samples in the RSLC probability distribution causing the expected ESL frequency calculation to saturate and then subsequently increase the expected number of ESL events. Both the positioning and the presence of the kinks in the return curves are sensitive to the choice of where the upper-tail of the RSLC distribution is truncated (Rasmussen et al, 2020). The other RSLC component contributions are from ocean thermal expansion, glaciers and ice caps, and land-water storage. General circulation model (GCM) output is used to generate the steric and glacial ice melt sea level components for each global mean surface air temperature (GSAT) stabilization scenario. The +2 C scenario used the GCM outputs specified for the same GSAT scenario in Rasmussen et al (2018) and the +5 C scenario used GCM outputs from the representative concentration pathway (RCP) 8.5 from Kopp et al (2014). Also accounted for are regional effects such as ocean dynamics (from GCM output), gravitational and rotational effects of ice mass redistribution, glacial isostatic adjustment, and other local vertical land motion factors (e.g., tectonic uplift, sediment compaction and ground water withdrawal). Probability distributions of local RSLC are produced using 10,000 The vertical population profile of a city can determine the population exposure to a given ESL event. We produce 1-dimensional (vertical) population profiles for 414 global cities using CoastalDEM and population density data from the WorldPop 2010 high resolution (3 arc second) gridded global population data set (Tatem, 2017). In order to simplify our analysis and also isolate the impact of RSLC on population exposure, we assume that population remains fixed in time. Thus, our results are not literal projections of future population exposure-which will depend upon population growth (Hauer et al, 2016;Jongman et al, 2012) and the dynamic response of the population to RSLC (Merkens et al, 2016;Hauer, 2017)-but are instead intended to highlight the impact of ESL events relative to changes in their frequency. A population exposure profile for San Juan (Puerto Rico) is given in Fig. 1B. Exposure profiles for all cities are included in the supporting data.
We map flood extents for each city using the "bathtub" model, a static flood model approach that only considers the vertical elevation of two surfaces, 1) the terrain and 2) a given ESL return level (e.g., 100-yr event) measured at the nearest tide gauge within a 100-km radius of each city center. The return level of interest is spatially extrapolated over the terrain. Static flood approaches have many limitations, including overestimating observed flood extents relative to hydrodynamic models ( , our exposure estimates do not account for these defenses because they can be overtopped and breached; only the population in the floodplain is considered. However, protection is assumed to only change the flood hazard and enters our integrated calculations of population exposure that consider all ESL return levels (Sec. S-1.1). To our knowledge, verified, location-specific levels of protection are not available at the global scale 5 . To account for flood defenses that provide a margin of safety, we make multiple arbitrary assumptions regarding the current level of protection for all cities. Specifically, we produce results assuming spatially uniform "no protection" and protection up to the height of the 1-and 10-yr ESL event. These assumptions may greatly differ from reality and could lead to gross over-estimates for cities with existing flood protection that afford a high margin of safety from rare storms, such as London, New Orleans, Tokyo, Shanghai, and most major cities in the Netherlands (Nicholls et al, 2008;Hallegatte et al, 2013;Xian et al, 2018;Scussolini et al, 2016). Despite this, we still include protection assumptions for all cities to 1) limit the inclusion of the lowest elevations which are most prone to vertical errors in the DEM and 2) to highlight the importance of the flood protection assumption and the need for accurate estimates thereof (e.g., Scussolini et al, 2016). The protection assumptions do not impact AFs above the height of the protection level, but can significantly impact integrated metrics that consider all ESLs and impacts, such as the expected annual exposure (EAE; Table S-4)(Kulp and Strauss, 2017).
A vertical adjustment was made to the ESLs to reference the same local mean higher high water (MHHW) datum as CoastalDEM. ESLs use tide gauge data for estimating MHHW, while CoastalDEM uses the glob-S-7 ally extensive MSL model MSS CNES-CLS15 6 , based on a 1993-2012 record of satellite sea surface height measurements from TOPEX/Poseidon, and MHHW deviations from MSL provided by Mark Merrifield, University of Hawaii, developed using the model TPX08 (Egbert and Erofeeva, 2002). Using the National Oceanic and Atmospheric Administration's (NOAA) VDatum tool (version 3.7;Parker et al, 2003), we convert these measurements to vertically reference EGM96, and we subtract the resulting surface from CoastalDEM to convert its vertical datum to MHHW (adjustments are provided in the supporting data files).
We take a static flood modeling approach by spatially extrapolating return periods of ESLs, measured at tide gauges, onto the surrounding topography. We map one tide gauge to each city. For cities where there is more than one tide gauge within a 100-km radius (e.g., Willet's Point and the Battery in NYC), the tide gauge with the longest record is used (the tide gauges assigned to each city are listed in the supporting data files). While the height of a given ESL return period may vary within a city, in part due to complicated bathymetry and coastlines, Kulp and Strauss (2017) has shown that ESL population exposure results for the U.S. are generally insensitive to tide gauge assignment within a 100-km radius. In any case, static flood approaches do not account for complex local geomorphology, local ocean dynamics, and frictional losses as water moves over different land surfaces (e.g., wetlands, beaches, urban areas). More complex hydrodynamic inundation approaches that model the flow of the ocean onto a variety of land surfaces could be used to more accurately estimate local floods for cities across the globe. However, such an approach is out of the scope of this study as our intention is to merely highlight the limitations of physical ESL metrics. Furthermore, we note that hydrodynamic inundation modeling may not necessarily outperform simpler approaches, especially in active tropical cyclone regions (Hunter et al, 2017;Muis et al, 2016Muis et al, , 2017Wahl et al, 2017), in part due to poor representation of tide-surge interactions (Arns et al, 2020) and short simulation periods that are less likely to produce rare, extreme events found in multi-decadal tide gauge records. For some locations, the height of the 100-yr ESL event can be under-predicted by up to 3 meters compared to tide gauge-derived estimates (supporting information of Muis et al, 2017).
The WorldPop population data are resampled to align with CoastalDEM raster (for more details, see Kulp and Strauss, 2019), integrated over select elevations (-2 to 13 m above MHHW, either 0.25 or 0.5 m increments), and then tabulated according to the satellite-derived urban footprint for each city from Natural Earth (which may differ from the actual administrative boundary; Kelso and Patterson, 2012). We note that Kulp and Strauss (2019) instead uses LandScan population density at 30 arc seconds. Coastal cities are only included in the analysis if there are populations with in the defined boundary at any elevation < 13 m above local MHHW. Connected components analysis excludes low-elevation inland areas that are not linked to the ocean. Linear interpolation is used between each select elevation to produce a continuous 1-dimensional population exposure profile D(z), where z is the ESL height. We note that z could be a vector to account for spatial differences in ESL height for the same return period within a city and also spatial variation in flood protection.
Exposure analyses are most sensitive to spatially-autocorrelated vertical errors in the DEM at local scales and when assessing population vulnerability at low elevations (e.g., < 0.5 m; Kulp and Strauss, 2019). The higher the elevation that population exposure is being assessed at (e.g., longer return periods, such as the 100-yr event), the less of an impact these errors will have on exposure (Kulp and Strauss, 2019). To assess the impact of elevation errors on population exposure AFs, we use the example of the 100-year ESL event in New York City. Considering lidar topography as ground truth, we find that the EAE AFs are generally insensitive to errors in CoastalDEM; small differences only appear by 2100 for the +5 C scenario (2.3 vs 2.6; Table S-2). This is despite CoastalDEM underestimating population exposure relative to lidar (connected components analysis not performed for this test; Tables S-1,S-2). We note that CoastalDEM was trained on lidar elevation data in the U.S., so locations outside the U.S. may be more sensitive. Tokyo, Japan  (2012) and may differ from actual political boundaries. Populations are assumed to remain constant in time. Denoted is the current level of protection (A min ), assumed to be the 10-yr ESL event, the height of the historical 100-yr ESL event (grey), and the expected heights of the 100-yr ESL event under a +2 C (orange) and +5 C (red) climate scenario. Also shown for New York City is a population profile generated using a 0.3-m horizontal resolution light detection and ranging (LiDAR)-derived digital elevation model for the City of New York (https://data.cityofnewyork.us/City-Government/1-foot-Digital-Elevation-Model-DEM-/dpc8-z3jc).

S-9
S-1.4 Estimating physical and population exposure amplification factors Following Buchanan et al (2017), the ESL frequency AF for an event of height z ⇤ under uncertain RSLC is is the expected number of exceedances of height z ⇤ after considering RSLC (Sec. S-1.1). The ESL return level AF is given by 1 + E[ ]/z ⇤ . Here, we extend the ESL return level AF to changes in population exposure. We define the population exposure AF for an event of height z ⇤ under uncertain RSLC as E[D(z ⇤ + )/D(z ⇤ )], where D(·) is a 1-dimensional vertical population profile of a city (Sec. S-1.3). Note that since D(·) is solely a function of z, the frequency amplification of population exposure events is equivalent to the ESL frequency AFs for the same city. Probability distributions of ESL AFs (frequency and return level) and population exposure AFs are produced for each tide gauge using the RSLC samples for each climate scenario (Sec. S-1.2). Results are then taken from these distributions. Table S-1: Estimated total population exposure and amplification factors for the 100-yr extreme sea level (ESL) event for New York City using 1) a 0.3-m horizontal resolution light detection and ranging (LiDAR)derived digital elevation model for the City of New York (https://data.cityofnewyork.us/City-Government/1foot-Digital-Elevation-Model-DEM-/dpc8-z3jc) and 2) CoastalDEM. Future projections assume a climate scenarios in which global mean surface air temperature is stabilized in 2100 at +2 C (relative to 1850-1900Bamber et al, 2019