Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States

Sun, Ning; Yan, Hongxiang; Wigmosta, Mark S.; Coleman, Andre M.; Leung, L. Ruby; Hou, Zhangshuan

doi:10.1038/s41597-022-01221-9

Data Descriptor
Open access
Published: 05 April 2022

Volume 9, article number 154, (2022)
Scientific Data

Abstract

Despite the close linkage between extreme floods and snowmelt, particularly through rain-on-snow (ROS), hydrologic infrastructure is mostly designed based on standard precipitation Intensity-Duration-Frequency curves (PREC-IDF) that neglect snow processes in runoff generation. For snow-dominated regions, such simplification could result in substantial errors in estimating extreme events and infrastructure design risk. To address this long-standing problem, we applied the Next Generation IDF (NG-IDF) technique to estimate design basis extreme events for different durations and return periods in the conterminous United States (CONUS) to distinctly represent the contribution of rain, snowmelt, and ROS events to the amount of water reaching the land surface. A suite of datasets was developed to characterize the magnitude, trend, seasonality, and dominant mechanism of extreme events for over 200,000 locations. Infrastructure design risk associated with the use of PREC-IDF was estimated. Accuracy of the model simulations used in the analyses was confirmed by long-term snow data at over 200 Snowpack Telemetry stations. The presented spatially continuous datasets are readily usable and instrumental for supporting site-specific infrastructure design.

Measurement(s)	gridded precipitation
Technology Type(s)	weather station
Sample Characteristic - Environment	flood • snowmelt • hydrological process
Sample Characteristic - Location	contiguous United States of America

Background & Summary

Although it is well understood in the scientific community that extreme hydrometeorological events in cold climates are often related to snow processes, i.e., snowmelt and rain-on-snow (ROS)^1,2,3,4,5,6, these processes have largely been ignored or under-represented in hydrologic design, which relies mainly on traditional precipitation-based intensity-duration-frequency curves (PREC-IDF) to estimate design basis extreme events (e.g., 100-year 24-hour event). PREC-IDF such as the National Oceanic and Atmospheric Administration (NOAA) Atlas 14⁷ assumes that precipitation is in the form of rainfall that is immediately available for rainfall-runoff processes. This assumption has obvious shortcomings, especially in snow-dominated regions where winter precipitation is primarily snowfall. At locations where runoff is released slowly from accumulated snowpack, PREC-IDF can lead to infrastructure overdesign and incur unnecessary costs. Conversely, underdesign will occur where the snow-driven runoff rate is higher than the rate of precipitation. This was confirmed by previous research^8,9, which demonstrated that PREC-IDF underestimates the 100-year, 24-hour extreme events at 45% of the Snowpack Telemetry (SNOTEL) stations examined, and the resulting peak design flood could be underestimated by up to 324%.

The Oroville Dam failure in February 2017 that required $1.5 billion to repair is a notable example of costly infrastructure damages resulting from floods driven by ROS events¹⁰. The exclusion of snow processes in the PREC-IDF technique is likely to cause greater errors in estimating extremes in a warming climate and present higher infrastructure design risk. For example, increased frequency and intensity of atmospheric rivers in the future are anticipated to cause more extreme orographic precipitation and subsequently increase flood risk along the U.S. West Coast^11,12,13,14.

Given the limitations in PREC-IDF and implications for design risk, the Next-Generation IDF (NG-IDF) technique⁸ was developed to estimate extreme events based on the amount of water reaching the land surface (W) during rain, snowmelt, and ROS events. By including snow processes, NG-IDF provides a systematic and consistent technique for all environments from rain-dominated, transitional, to snow-dominated locations. Despite the marked advantages of NG-IDF over PREC-IDF, its wide adoption by engineers and planners is hindered by rather limited snow observations for estimating W, especially relative to widely available precipitation products^15,16,17. Using physics-based models to produce reasonable simulations of W, while feasible, is rather challenging, given the significant requirements of expert knowledge in model calibration and considerable cost of computational resources.

To support the broad adoption of the NG-IDF approach, we developed a suite of datasets that characterize extreme events relevant to hydrologic design over 1951‒2013 at a 1/16th-degree (~6 km) resolution across the conterminous United States (CONUS). For over 200,000 locations in the CONUS, the datasets provide: the magnitude and dominant driving mechanism (i.e., rain, melt, and ROS) of extreme events for different durations (24‒72 hours) and return periods (2‒500 years) derived from the NG-IDF curves; the magnitude of design extreme events associated with different hydrometeorological drivers; trend and seasonality of annual maximum W events (AMW) over 1951‒2013; and infrastructure design risk associated with the PREC-IDF technique. These datasets were developed based on sub-daily simulations of W by a well-validated physics-based hydrological model (Distributed Hydrology Soil Vegetation Model, DHSVM¹⁸). To examine the accuracy of the simulations, the simulated snow water equivalent (SWE) was evaluated against long-term observations of SWE from 246 SNOTEL stations distributed across the Western U.S. These datasets are intended to offer spatially distributed, quantitative measures of extreme events that are readily usable by the science and engineering communities to understand the climatology and driving mechanism(s) of extreme events, identify potential infrastructure design risk associated with the standard PREC-IDF method, and improve estimation of extremes.

Methods

Overview of NG-IDF datasets development

The method to develop the NG-IDF datasets is illustrated in Fig. 1. In contrast to PREC-IDF that estimates extreme events based on the total amount of precipitation (implicitly assumed to be in the form of rain), NG-IDF curves are developed based on the amount of water available for runoff (W) for the bare ground condition with no canopy cover, which can be represented mathematically by:

$$W=P-\Delta SWE+S$$

(1)

where P is precipitation, ΔSWE is the change in ground snowpack water content, and S indicates condensation (positive) or evaporation/sublimation (negative) of snowpack. In this development, as described in more detail in the following sections, P was taken from the gridded, gauge-based meteorological dataset, whereas ΔSWE and S were simulated by the DHSVM snow model. The resulting W data is a 3-hourly time series for 1950–2013 at 207,173 grid cells covering the land surface of the CONUS at a 1/16th-degree resolution.
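As a minimal sketch of Eq. 1, the water available for runoff can be computed step by step from time series of P, SWE, and S. The function name, the argument names, and the assumption that SWE is reported as the state at the end of each time step are illustrative choices, not part of the original workflow.

```python
def water_available_for_runoff(precip, swe, vapor_flux, swe_initial=0.0):
    """Apply Eq. 1, W = P - dSWE + S, to every time step.

    precip      : precipitation per step (mm)
    swe         : snow water equivalent at the END of each step (mm); assumed convention
    vapor_flux  : S, condensation (+) or evaporation/sublimation (-) per step (mm)
    swe_initial : SWE before the first step (mm); hypothetical argument
    """
    if not (len(precip) == len(swe) == len(vapor_flux)):
        raise ValueError("all input series must have the same length")
    w = []
    previous_swe = swe_initial
    for p, swe_now, s in zip(precip, swe, vapor_flux):
        delta_swe = swe_now - previous_swe  # positive when the snowpack grows
        w.append(p - delta_swe + s)
        previous_swe = swe_now
    return w
```

In this sketch, a step in which all precipitation accumulates as snow gives W = 0, whereas a step with rain falling on a melting snowpack gives W larger than P.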

As observations suggested that snowfall events largely last up to 72 hours¹⁹, NG-IDF curves were developed for W events with 24-, 48- and 72-hour durations, respectively. We first developed series of annual maximum W (AMW) for years 1951–2013 based on subdaily estimations of W aggregated over different durations. For example, AMW with a 72-hour duration is the maximum of the 72-hour moving sum of W over a given year. Given variations in precipitation seasonality across the CONUS, we defined AMW events for both water year (October 1st through September 30th) and calendar year (January 1st through December 31st). AMW based on calendar year is more appropriate for locations with extreme precipitation occurring during the summer season such as the Southwest U.S., and AMW based on water year is better for locations with extreme winter precipitation such as the Western U.S. Given the focus on snow-driven extreme events, we used AMW defined by water years for IDF curves.
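The construction of the AMW series can be sketched as a moving sum followed by a maximum per water year. The sketch below assumes a regularly spaced series (e.g., 3-hourly, so a 72-hour window spans 24 steps) and assigns each window to the water year of its last time stamp; both the function names and that assignment rule are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def water_year(timestamp):
    """Water year label: October 1st through September 30th of the labeled year."""
    return timestamp.year + 1 if timestamp.month >= 10 else timestamp.year


def annual_max_moving_sum(times, w, window_steps):
    """Return {water_year: maximum moving sum of W over window_steps steps}."""
    if len(times) != len(w):
        raise ValueError("times and w must have the same length")
    maxima = {}
    running = 0.0
    for i, value in enumerate(w):
        running += value
        if i >= window_steps:
            running -= w[i - window_steps]  # drop the step that left the window
        if i >= window_steps - 1:           # window is fully populated
            wy = water_year(times[i])       # assumed: window belongs to its end time
            if wy not in maxima or running > maxima[wy]:
                maxima[wy] = running
    return maxima
```

For 3-hourly data, `window_steps` would be 8, 16, and 24 for the 24-, 48-, and 72-hour durations, respectively.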

We applied the nonparametric Mann-Kendall test^20,21 to the AMW series. Where trends were significant at the 5% significance level, we detrended the time series using Sen’s slope²² while maintaining the long-term average of the time series. As suggested by NOAA Atlas 14, the generalized extreme value (GEV) distribution was fitted to AMW across all locations using the L-moments statistics^7,8. Here we used the same GEV distribution for analysis of extreme events with different durations across the CONUS so that direct comparisons can be made in frequency estimates across durations or between locations. To quantify the implications of using PREC-IDF for infrastructure design risk, we also developed PREC-IDF curves following the same approach based on annual maximum precipitation (P). The design risk was quantified by comparing the NG-IDF and PREC-IDF values of extreme events (e.g., 100-year 24-hour event) over the CONUS.
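The trend screening and detrending step can be illustrated with a short sketch. It implements the two-sided Mann-Kendall test with the normal approximation and without a correction for ties (a simplifying assumption), and removes the Sen's slope trend about the midpoint of the record so that the series mean is unchanged. The function names and the pivot about the record midpoint are illustrative choices; the published analysis used the R package trend.

```python
import math
from statistics import median


def mann_kendall_p_value(x):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2.0))


def sen_slope(x):
    """Median of all pairwise slopes (units of x per time step)."""
    n = len(x)
    return median((x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n))


def detrend_keep_mean(x, alpha=0.05):
    """Remove the Sen's slope trend if it is significant; the mean is preserved."""
    if mann_kendall_p_value(x) >= alpha:
        return list(x)
    slope = sen_slope(x)
    t_mid = (len(x) - 1) / 2.0  # pivot about the record midpoint (assumption)
    return [value - slope * (t - t_mid) for t, value in enumerate(x)]
```

Because the time offsets about the midpoint sum to zero, subtracting the trend in this way leaves the long-term average of the series unchanged.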

A Monte Carlo (MC) simulation procedure, following Hosking & Wallis²³ and NOAA Atlas 14⁷, was used to account for sample data uncertainty in the frequency analysis. For each location, we generated 1,000 MC synthetic annual maximum series. We then fitted the GEV distribution to each MC series using the L-moments statistics and estimated the associated NG-IDF values. The uncertainties in NG-IDF curves for each location were quantified using the 5% and 95% quantiles (i.e., the 90% confidence interval) of ensemble members.
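The frequency analysis and its uncertainty bounds can be sketched as follows. The GEV parameters are estimated from sample L-moments using Hosking's widely used approximation for the shape parameter (Hosking's sign convention, in which a positive shape implies a bounded upper tail). How the synthetic series were generated is not detailed above, so the sketch assumes a parametric bootstrap that resamples from the fitted GEV; the function names, that resampling choice, and the simple order-statistic quantiles are all assumptions. The published analysis used the R package lmom.

```python
import math
import random


def sample_l_moments(values):
    """Return (l1, l2, t3): the first two sample L-moments and the L-skewness."""
    xs = sorted(values)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * x for i, x in enumerate(xs)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x for i, x in enumerate(xs)) / (n * (n - 1) * (n - 2))
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return b0, l2, l3 / l2


def fit_gev(values):
    """Estimate GEV (location, scale, shape k) from L-moments (Hosking's approximation)."""
    l1, l2, t3 = sample_l_moments(values)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        k = 1e-6  # stay clear of the Gumbel limit to avoid division by zero
    gamma_term = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * gamma_term)
    location = l1 - scale * (1.0 - gamma_term) / k
    return location, scale, k


def gev_quantile(params, prob):
    """Quantile of the GEV at non-exceedance probability prob (0 < prob < 1)."""
    location, scale, k = params
    return location + scale * (1.0 - (-math.log(prob)) ** k) / k


def monte_carlo_interval(values, return_period, n_sim=1000, seed=0):
    """Return (estimate, 5% bound, 95% bound) for the given return period (years)."""
    rng = random.Random(seed)
    fitted = fit_gev(values)
    prob = 1.0 - 1.0 / return_period
    estimates = []
    for _ in range(n_sim):
        # Assumed parametric bootstrap: synthetic series drawn from the fitted GEV.
        synthetic = [gev_quantile(fitted, max(rng.random(), 1e-12)) for _ in values]
        estimates.append(gev_quantile(fit_gev(synthetic), prob))
    estimates.sort()
    lower = estimates[int(0.05 * (n_sim - 1))]
    upper = estimates[int(0.95 * (n_sim - 1))]
    return gev_quantile(fitted, prob), lower, upper
```

Calling `monte_carlo_interval(amw, 100)` on a 24-hour AMW series would, under these assumptions, return the 100-year 24-hour estimate together with its 90% confidence bounds.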

Gauge-based gridded meteorological data

For simulations of snow processes, the model requires subdaily input of P, air temperature, relative humidity, downward shortwave and longwave radiation, and wind speed. The meteorological input used here was derived from disaggregating the gridded (~6 km), daily land surface meteorological dataset²⁴ over the CONUS using the Mountain Microclimate Simulation Model (MTCLIM) algorithm²⁵. Distinct from reanalysis products, this dataset is one of the few gauge-based gridded datasets with long-term continuous records. It includes daily records of P, maximum and minimum air temperature, and wind speed spanning the period 1950–2013. The disaggregation algorithms²⁵ estimate subdaily air temperatures with a 3rd-order polynomial fit based on the daily temperature range. Radiation and relative humidity are estimated based on daily temperature range, P, and solar geometry²⁵. P is assumed to occur at a uniform rate throughout the day.

Observational snow data

Daily SWE measurements were retrieved from SNOTEL stations for snow model parameterization and evaluation. Among the 785 SNOTEL stations, we selected 246 stations that shared the longest common period (2007‒2013) of bias-corrected and quality-controlled (BCQC) daily SWE records. Briefly, standard quality control procedures^8,26 were applied to remove stations with missing data, outliers, and problematic SWE values (e.g., peak SWE > accumulated winter precipitation). The BCQC procedures and the resulting BCQC datasets²⁷ are available to the public at https://www.pnnl.gov/data-products.

CONUS-scale snow modeling and parameter development

The physics-based snow submodel of DHSVM¹⁸ was implemented at the point scale to simulate snowpack dynamics under a bare ground and flat terrain condition. The model was run at a 3-hourly time step from 1950 to 2013 for 207,173 point locations at a 1/16th-degree grid spacing that coincides with the centers of the meteorological grid cells. Model output includes 3-hourly time series of SWE and S, which are used together with observed P for calculating W in Eq. 1.

DHSVM simulates ground snowpack accumulation and melt using a two-layer mass and energy balance ground snowpack module. The mass balance components consist of P, S, changes in SWE, and melt from the snowpack. The partition of P into rain and snow is based on air temperature thresholds:

$$\left\{\begin{array}{cc}R=P & {T}_{a}\ge {T}_{R}\\ R=P({T}_{a}-{T}_{S})/({T}_{R}-{T}_{S}) & {T}_{S} < {T}_{a} < {T}_{R}\\ R=0 & {T}_{a}\le {T}_{S}\end{array}\right.$$

(2)

where T_a is air temperature, and T_S and T_R are the temperature thresholds at or below which P is entirely snowfall and at or above which P is entirely rain, respectively. If T_a ≥ T_R, 100% of the P is rain (R); if T_a ≤ T_S, 100% of the P is snow; if T_a falls between the two thresholds, rain and snow are proportionally allocated to represent mixed rain and snow events. Energy balance at the snow surface is driven by net radiation, sensible and latent heat, and advected heat by rain. Energy and mass exchange between the thin surface layer and deep snowpack layer occurs via the exchange of meltwater. When liquid water in the deep snowpack exceeds its holding capacity, excess water is released to the underlying soil column. Detailed descriptions of the DHSVM snow model physics and governing algorithms can be found in a large body of literature^18,28,29,30.
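The temperature-threshold partition can be sketched as a small function in which the rain fraction rises linearly from 0 at T_S to 1 at T_R, which keeps the mixed-phase case continuous with the all-rain and all-snow limits. The function name and the threshold values used in the usage note are hypothetical; the calibrated thresholds are those reported in Table 1.

```python
def partition_precipitation(precip, air_temp, t_snow, t_rain):
    """Split precipitation into (rain, snow) using two temperature thresholds.

    precip   : precipitation amount for the time step (mm)
    air_temp : air temperature T_a (deg C)
    t_snow   : T_S, at or below which all precipitation is snow (deg C)
    t_rain   : T_R, at or above which all precipitation is rain (deg C)
    """
    if t_rain <= t_snow:
        raise ValueError("t_rain must be greater than t_snow")
    if air_temp >= t_rain:
        rain_fraction = 1.0
    elif air_temp <= t_snow:
        rain_fraction = 0.0
    else:
        # Linear allocation between the two thresholds (mixed rain and snow).
        rain_fraction = (air_temp - t_snow) / (t_rain - t_snow)
    rain = precip * rain_fraction
    return rain, precip - rain
```

For instance, with illustrative thresholds of T_S = -1 °C and T_R = 3 °C, 10 mm of precipitation at 1 °C would be split evenly between rain and snow.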

Prior calibration of snow models is performed typically at relatively local scales; thus, there are no calibrated, spatial snow parameter sets that can be readily applied for the CONUS-domain snow modeling. In support of this work as well as future large-domain snow modeling, here we developed spatially distributed snow parameters for the CONUS domain. Based on previous research²⁹ that documented the robustness of regionally coherent snow parameters in modeling snowpack dynamics, we developed snow parameters for five spatial clusters covering the CONUS (Fig. 2). Given strong correlations between winter climate and key aspects of snowpack dynamics^31,32,33, we determined the clusters using the k-means clustering machine learning technique³⁴ based on the grid-level climatological mean of P, maximum and minimum air temperature, and wind speed from November through March during 1950–2013. Different numbers of clusters were tested and we selected the optimal five clusters based on the inertia elbow method and our previous work²⁹.

Parameter development focused on four snow parameters that previous work²⁹ identified as crucial for capturing daily SWE dynamics: (1) T_S (defined in Eq. 2), (2) fresh snow albedo (α_max), (3) the albedo decay coefficient during snow accumulation (λ_A), and (4) the albedo decay coefficient during snowmelt (λ_M). The last three parameters are applied in the snow albedo decay curve of Laramie & Schaake³⁵ for estimating snow surface albedo evolution, given by:

$$\begin{array}{lll}{\alpha }_{A} & = & {\alpha }_{max}\cdot {\lambda }_{A}^{{d}^{{\rm{0}}{\rm{.58}}}}\\ {\alpha }_{M} & = & {\alpha }_{max}\cdot {\lambda }_{M}^{{d}^{{\rm{0}}{\rm{.46}}}}\end{array}$$

(3)

where α_A and α_M are the snow surface albedo during the accumulation and melting seasons, respectively, and d is the number of days since the last snowfall. For other model parameters, we used the default values. The cluster-based snow parameters (Table 1) were developed as follows: (1) we produced prior ensemble parameters, consisting of 10,000 sets of the four snow parameters drawn uniformly from their physically plausible ranges, using the Latin Hypercube Sampling algorithm; (2) snow simulations were conducted at each SNOTEL location for every prior parameter set; (3) the posterior ensemble parameters were resampled from the prior ensemble if they met the threshold values of objective functions with observations. Here, we used three metrics (Eqs. 4‒6): Nash-Sutcliffe Efficiency (NSE) of daily SWE ≥ 0.6, the bias in the mean annual peak SWE (PEAK.ERR) within ± 25%, and the bias in the timing of peak SWE (PDATE.ERR) ≤ 14 days.

$$NSE=1-\frac{{\sum }_{i=1}^{t}{({Y}_{i}-{O}_{i})}^{2}}{{\sum }_{i=1}^{t}{({O}_{i}-\bar{O})}^{2}}$$

(4)

where O_i and Y_i are the observed and predicted daily SWE at day i, respectively; t is the total number of days for which model simulations were performed; $\bar{O}$ is the observed mean daily SWE over the simulation period.

$${PEAK.ERR}={\sum }_{k=1}^{ny}\frac{({Y}_{k}^{P}-{O}_{k}^{P})}{{Y}_{k}^{P}}/ny$$

(5)

where ${O}_{k}^{P}$ and ${Y}_{k}^{P}$ are the observed and predicted peak SWE for the k^th water year, respectively; ny is the total number of years of simulation.

$${PDATE.ERR}={\sum }_{k=1}^{ny}\left({Y}_{k}^{D}-{O}_{k}^{D}\right)/ny$$

(6)

where ${Y}_{k}^{D}$ and ${O}_{k}^{D}$ are the Julian dates of predicted and observed peak SWE for the k^th water year, respectively. After step (3), 58 stations with no qualified posterior parameter values were removed from subsequent analyses. Most of these stations are located in high-latitude areas of Montana and Wyoming, where model skill is challenged by the lack of representation of wind effects on snow redistribution, or in the maritime Pacific Northwest, where snowmelt tends to be sensitive to errors in modeled energy balance and precipitation partitioning when temperature is near freezing²⁹; (4) for each cluster, final parameter values were calculated as the ensemble mean over the posterior parameter space of all stations within the cluster. For the Southern cluster (C3), no SNOTEL observations are available for parameter development. Because snowfall is very limited for most of the Southern cluster, where snow parameters have negligible effects on extreme events, we used the parameter values of the maritime cluster for the Southern cluster given their shared warm winters. The cluster parameter values are presented in Table 1.
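The three evaluation metrics and the acceptance test applied to each prior parameter set can be sketched as below. The functions follow Eqs. 4‒6 as written in the text (PEAK.ERR is expressed as a fraction, so ±25% corresponds to ±0.25, and is normalized exactly as in Eq. 5); the function names and the use of absolute values in the acceptance test are illustrative assumptions.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """Eq. 4: NSE of daily SWE."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((y - o) ** 2 for o, y in zip(observed, predicted))
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - numerator / denominator


def peak_swe_error(observed_peaks, predicted_peaks):
    """Eq. 5 as written: mean relative bias in annual peak SWE (fraction)."""
    ny = len(observed_peaks)
    return sum((y - o) / y for o, y in zip(observed_peaks, predicted_peaks)) / ny


def peak_date_error(observed_dates, predicted_dates):
    """Eq. 6: mean bias in the date of peak SWE (days)."""
    ny = len(observed_dates)
    return sum(y - o for o, y in zip(observed_dates, predicted_dates)) / ny


def is_accepted(nse, peak_err, pdate_err):
    """Thresholds from the text: NSE >= 0.6, |PEAK.ERR| <= 25%, |PDATE.ERR| <= 14 days."""
    return nse >= 0.6 and abs(peak_err) <= 0.25 and abs(pdate_err) <= 14
```

A prior parameter set would enter the posterior ensemble for a station only if `is_accepted` returns True for that station's simulation.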

Table 1 Cluster snow parameters developed for the CONUS.

Driving mechanism of extreme events

For each location, the driving mechanism was determined for extreme W events with different durations and return periods. Table 2 presents the classification approach used for determining the mechanism based on subdaily information of P and SWE, which include:

1. Rainfall only (R): precipitation on snow-free ground;
2. Snowmelt only (M): decreasing SWE with no concurrent precipitation;
3. Rain-on-snow (ROS): decreasing SWE with concurrent precipitation. Given the interest in flood potential, a ROS event is further refined as one with at least 10 mm rainfall per day falling on a snowpack with at least 10 mm SWE over the selected duration, and the sum of rain and snowmelt contains at least 20% of snowmelt^36,37,38.

Table 2 Classification of the driving mechanism of W extremes.

Annual maximum series resulting distinctively from each driving mechanism were determined, based on which IDF curves were developed following the same approach as described for NG-IDF. The mechanism producing the largest IDF value was identified as the dominant mechanism of extreme events.
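The classification criteria listed above can be sketched as a simple decision function. This is only an approximation of Table 2: the inputs are event totals over the selected duration, the SWE value is assumed to be the snowpack present at the start of the event, and events that satisfy none of the three definitions are returned as "unclassified", a fallback label introduced here for illustration.

```python
def classify_event(rain_mm, melt_mm, swe_mm, duration_days):
    """Classify a W event as 'R', 'M', 'ROS', or 'unclassified' (hypothetical label).

    rain_mm       : total rainfall over the event (mm)
    melt_mm       : total snowmelt (decrease in SWE) over the event (mm)
    swe_mm        : SWE on the ground at the start of the event (mm); assumed input
    duration_days : event duration (days), e.g. 1, 2, or 3
    """
    has_rain = rain_mm > 0
    has_melt = melt_mm > 0
    if has_rain and has_melt:
        # Flood-relevant ROS: >= 10 mm/day rain on >= 10 mm SWE, melt >= 20% of the total.
        enough_rain = rain_mm / duration_days >= 10.0
        enough_snow = swe_mm >= 10.0
        enough_melt = melt_mm / (rain_mm + melt_mm) >= 0.2
        if enough_rain and enough_snow and enough_melt:
            return "ROS"
        return "unclassified"
    if has_melt:
        return "M"
    if has_rain and swe_mm == 0:
        return "R"
    return "unclassified"
```

Annual maximum series per mechanism could then be assembled by keeping, for each water year, the largest W event carrying each label.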

Seasonality of extreme events

The seasonality of AMW at the 24-, 48- and 72-hour durations was represented by the seasonality index (SI) and the mean date (MD) relative to October 1st due to the use of water year. They were calculated using the circular statistics^1,39,40, given by:

$$SI=\sqrt{{\bar{x}}^{2}+{\bar{y}}^{2}}$$

(7)

$$MD={\tan }^{-1}(\bar{y}/\bar{x})\cdot 365/(2\pi )$$

(8)

where

$${\theta }_{i}=D\cdot 2{\rm{\pi }}/365$$

$$\bar{x}={\sum }_{i=1}^{n}\cos ({\theta }_{i})/n$$

$$\bar{y}=\mathop{\sum }\limits_{i=1}^{n}{\rm{\sin }}({\theta }_{i})/n$$

For a given water year denoted by i, D is the day of the AMW occurrence relative to October 1st (i.e., D = 1 if the event occurred on October 1st); n is the total number of water years used in the analysis. SI, ranging from 0 to 1, measures the temporal variability of the occurrence of events. A smaller SI suggests weaker seasonality, and the associated MD is therefore less reflective of the actual timing of the extreme events.
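Eqs. 7 and 8 can be sketched with the standard library alone. The sketch uses the two-argument arctangent so that the mean angle falls in the correct quadrant and then wraps it into one full year; that quadrant handling and the function name are implementation choices made here rather than details given in the text. The published analysis used the R package circular.

```python
import math


def seasonality(days_since_oct1, year_length=365):
    """Return (SI, MD) for AMW occurrence days counted from October 1st (D = 1).

    SI is the mean resultant length (0 to 1); MD is the mean date in days
    relative to October 1st.
    """
    n = len(days_since_oct1)
    thetas = [d * 2.0 * math.pi / year_length for d in days_since_oct1]
    x_bar = sum(math.cos(t) for t in thetas) / n
    y_bar = sum(math.sin(t) for t in thetas) / n
    si = math.hypot(x_bar, y_bar)
    # atan2 resolves the quadrant; the modulo maps the angle to [0, 2*pi).
    mean_angle = math.atan2(y_bar, x_bar) % (2.0 * math.pi)
    md = mean_angle * year_length / (2.0 * math.pi)
    return si, md
```

When the events occur on the same day every water year, SI equals 1 and MD equals that day; when they are spread evenly through the year, SI approaches 0 and MD carries little information.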

Data Records

The NG-IDF datasets⁴¹ are available to the public through an unrestricted repository at https://doi.org/10.5281/zenodo.5827028 in comma-separated value (.csv) format. Table 3 provides a summary of the folder structures, description of data files, output variables in each file and the format.

Table 3 Description of the NG-IDF datasets.

Technical Validation

The accuracy of estimated W and all related datasets depends primarily on the accuracy of daily SWE simulations given that P was observational (see Eq. 1). As there exists no data for direct evaluation of NG-IDF curves, we validated SWE simulations against daily SWE observations from 246 SNOTEL stations. Three model performance metrics that compare the simulated and observed SWE were applied: (1) NSE, (2) PEAK.ERR, and (3) PDATE.ERR, which measured the overall goodness-of-fit, peak SWE, and the timing of peak SWE, respectively. Model evaluations (Fig. 2) showed that NSE of daily SWE was greater than 0.6 at 75% of all stations, PEAK.ERR was within ± 25% at 67% of the stations, and PDATE.ERR was within two weeks at 67% of the stations. Overall, the simulations were able to reproduce the observed SWE dynamics at most stations using the cluster-based snow parameters.

Usage Notes

The NG-IDF datasets listed in Table 3, with no additional data analysis, can be used for a wide variety of applications over spatial scales ranging from local, regional to the CONUS scales. Overall, the presented estimates of extreme events and their characteristics based on long-term observational and simulation records are crucial for understanding flood potential, particularly for cold regions where infrastructure design risk exists from using PREC-IDF curves or NOAA Atlas 14.

For hydrologic engineering designs and analyses, one can obtain for any location(s) of interest the magnitude of extreme W events (Fig. 3a) and their dominant mechanism (Fig. 3c). Through comparing the NG-IDF and PREC-IDF values of extreme events for the same locations, one can determine the magnitude of bias or design risk related to the use of PREC-IDF (Fig. 3b). Information related to the flood seasonality is also key for understanding the generating mechanisms of floods and supporting future flood management. For instance, one can obtain the seasonality index (SI) and the mean timing of AMW events for any location of interest. For locations with a stronger seasonality (i.e., a higher SI), there is less inter-annual variability in the timing of AMW occurrence, and thus the mean date represents better the occurrence dates of AMW from year to year. At broader spatial scales, the datasets can be used for prioritizing locations for flood management and adaptation. As shown in Fig. 3, there is substantial spatial variability in the magnitude of 100-year 24-hour W events, which are typically higher in the ROS-dominated Pacific Northwest mountain ranges, and rain-dominated Gulf coastal plains. The dominant mechanism exhibits greater heterogeneity in topographically complex mountainous regions, and there is a shift in the dominant mechanism from ROS to rain or melt for events with a longer duration (72-hour versus 24-hour) (Fig. 3c,d). The presented datasets can also be used for estimating catchment-scale flood responses by coupling the NG-IDF curves with a rainfall-runoff model as demonstrated in prior work¹⁶.

Lastly, it should be noted that the datasets are subject to a few limitations:

(1) Diurnal variability is not represented in the precipitation data used here to force the snow model and construct IDF curves. As a result, our estimates of extreme events are limited to daily or longer durations. While the daily temporal resolution is mostly sufficient to capture snow-related extreme events, short-duration IDF curves based on high-resolution rainfall data are more appropriate for capturing short-duration extremes (e.g., flash floods) or floods in small catchments with fast response times (<24 hours).
(2) Although generally smaller compared to sample data uncertainty⁸, the choice of probability distribution can contribute to uncertainties in estimates of extremes⁴². Depending on the usage, one can apply the distribution of choice or ensemble distributions for analysis of extreme W events using provided AMW datasets.
(3) Greater uncertainties in NG-IDF estimates are expected for locations with lower snow model skill, such as the maritime Pacific Northwest and locations exposed to high winds during winter.
(4) Stationarity assumption. Based on the Mann-Kendall test, about 10% of the CONUS shows a statistically significant trend in AMW with any duration of 24, 48, or 72 hours. Hence, the IDF curve estimates based on the stationarity assumption are valid for about 90% of the CONUS. For the remaining locations showing a significant trend in AMW, we recommend applying a nonstationary approach for constructing NG-IDF curves. Among a variety of nonstationary approaches^17,43,44, here we demonstrate the application of the Non-stationary Extreme Value Analysis (NEVA) Software⁴⁵ to construct the NG-IDF curve based on the 24-hour AMW at a location in Oregon, where AMW has the highest positive trend of 1.12 mm/decade over the CONUS (Supplementary Fig. 1). In this particular case, the analysis suggests that the stationarity assumption may lead to the underestimation of extreme events into the future. With the provided AMW series, one can develop nonstationary IDF curves using the approach of choice.
(5) Consideration of land cover. The NG-IDF datasets provided in this study are developed for open conditions (as opposed to forest conditions). Given strong forest-snow interactions³³ and their implications for streamflows³⁰, incorporating different land cover types into the development of NG-IDF datasets is an ongoing research effort. The datasets presented here can serve as the baseline for understanding the impacts of vegetation or land use change (e.g., urbanization) on NG-IDF curves.

Code availability

The source code of the DHSVM model used for snow simulations can be freely downloaded at https://github.com/pnnl/DHSVM-PNNL. The R programming language was used for developing IDF curves, detecting trend and determining seasonality of annual maximum series, using the following packages: trend⁴⁶, lmom⁴⁷, circular⁴⁸. Source codes that were used to develop and analyze the data are publicly available at https://github.com/Lizzy0Sun/NG-IDF-analysis-code/.

References

Berghuijs, W. R., Woods, R. A., Hutton, C. J. & Sivapalan, M. Dominant flood generating mechanisms across the United States. Geophys. Res. Lett. 43, 4382–4390 (2016).
Leung, L. R. & Qian, Y. Atmospheric rivers induced heavy precipitation and flooding in the western U.S. simulated by the WRF regional climate model. Geophys. Res. Lett. 36, L03820 (2009).
Li, D., Lettenmaier, D. P., Margulis, S. A. & Andreadis, K. The role of rain‐on‐snow in flooding over the conterminous United States. Water Resour. Res. 55, 8492–8513, https://doi.org/10.1029/2019wr024950 (2019).
McCabe, G. J., Clark, M. P. & Hay, L. E. Rain-on-Snow Events in the Western United States. Bull. Am. Meteorol. Soc. 88, 319–328 (2007).
Musselman, K. N., Clark, M. P., Liu, C., Ikeda, K. & Rasmussen, R. Slower snowmelt in a warmer world. Nat. Clim. Chang. 7, 214–220 (2017).
Ralph, F. M. et al. The Impact of a Prominent Rain Shadow on Flooding in California’s Santa Cruz Mountains: A CALJET Case Study and Sensitivity to the ENSO Cycle. J. Hydrometeorol. 4, 1243–1264 (2003).
Perica, S. et al. Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, vol. 8, version 2.0 (U.S. Dep. of Commer., National Oceanic and Atmospheric Administration, National Weather Service, Silver Spring, Md., 2013).
Yan, H. et al. Next-Generation Intensity-Duration-Frequency Curves for Hydrologic Design in Snow-Dominated Environments. Water Resour. Res. 54, 1093–1108 (2018).
Yan, H. et al. Next-Generation Intensity–Duration–Frequency Curves to Reduce Errors in Peak Flood Design. J. Hydrol. Eng. 24, 04019020 (2019).
NCEI. Billion-dollar weather and climate disasters. Available at https://www.ncdc.noaa.gov/billions/events/US/1980-2018 (2018).
Hagos, S. M., Leung, L. R., Yoon, J., Lu, J. & Gao, Y. A projection of changes in landfalling atmospheric river frequency and extreme precipitation over western North America from the Large Ensemble CESM simulations. Geophys. Res. Lett. 43, 1357–1363 (2016).
Warner, M. D., Mass, C. F. & Salathé, E. P. Changes in Winter Atmospheric Rivers along the North American West Coast in CMIP5 Climate Models. J. Hydrometeorol. 16, 118–128 (2015).
Cao, Q. et al. Floods due to Atmospheric Rivers along the U.S. West Coast: The Role of Antecedent Soil Moisture in a Warming Climate. J. Hydrometeorol. 21, 1827–1845 (2020).
Gershunov, A. et al. Precipitation regime change in Western North America: The role of Atmospheric Rivers. Sci. Rep. 9, 9944 (2019).
Hamlet, A. F. New Observed Data Sets for the Validation of Hydrology and Land Surface Models in Cold Climates. Water Resour. Res. 54, 5190–5197, https://doi.org/10.1029/2018WR023123 (2018).
Yan, H. et al. Evaluating next‐generation intensity–duration–frequency curves for design flood estimates in the snow‐dominated western United States. Hydrol. Process. 34, 1255–1268 (2020).
Yan, H., Sun, N., Chen, X. & Wigmosta, M. S. Next-Generation Intensity-Duration-Frequency Curves for Climate-Resilient Infrastructure Design: Advances and Opportunities. Front. Water 2, 545051 (2020).
Wigmosta, M. S., Vail, L. W. & Lettenmaier, D. P. A distributed hydrology-vegetation model for complex terrain. Water Resour. Res. 30, 1665–1679 (1994).
Serreze, M. C., Clark, M. P. & Frei, A. Characteristics of large snowfall events in the montane western United States as examined using snowpack telemetry (SNOTEL) data. Water Resour. Res. 37, 675–688 (2001).
Kendall, M. G. Rank Correlation Methods. (1975).
Mann, H. B. Nonparametric Tests Against Trend. Econometrica 13, 245 (1945).
Sen, P. K. Estimates of the Regression Coefficient Based on Kendall’s Tau. J. Am. Stat. Assoc. 63, 1379–1389 (1968).
Hosking, J. R. M. & Wallis, J. R. Regional Frequency Analysis: An Approach Based on L-Moments. (Cambridge University Press, Cambridge, U. K., 1997).
Livneh, B. et al. A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States: Update and extensions. J. Clim. 26, 9384–9392 (2013).
Bohn, T. J. et al. Global evaluation of MTCLIM and related algorithms for forcing of ecological and hydrological models. Agric. For. Meteorol. 176, 38–49 (2013).
Serreze, M. C., Clark, M. P., Armstrong, R. L., McGinnis, D. & Pulwarty, R. S. Characteristics of the western United States snowpack from snowpack telemetry (SNOTEL) data. Water Resour. Res. 35, 2145–2160 (1999).
BCQC SNOTEL data https://www.pnnl.gov/data-products (2019).
Storck, P., Bowling, L., Wetherbee, P. & Lettenmaier, D. Application of a GIS-based distributed hydrology model for prediction of forest harvest effects on peak stream flow in the Pacific Northwest. Hydrol. Process. 12, 889–904 (1998).
Sun, N. et al. Regional Snow Parameters Estimation for Large‐Domain Hydrological Applications in the Western United States. J. Geophys. Res. Atmos. 124, 5296–5313 (2019).
Sun, N. et al. Evaluating the functionality and streamflow impacts of explicitly modelling forest-snow interactions and canopy gaps in a distributed hydrologic model. Hydrol. Process. 32, 2128–2140 (2018).
Luce, C. H., Lopez-Burgos, V. & Holden, Z. Sensitivity of snowpack storage to precipitation and temperature using spatial and temporal analog models. Water Resour. Res. 50, 9447–9462 (2014).
Lute, A. C. & Luce, C. H. Are model transferability and complexity antithetical? Insights from validation of a variable-complexity empirical snow model in space and time. Water Resour. Res. 53, 8825–8850 (2017).
Sun, N. et al. Forest Canopy Density Effects on Snowpack across the Climate Gradients of the Western United States Mountain Ranges. Water Resour. Res. 58, e2020WR029194 (2022).
Likas, A., Vlassis, N. & Verbeek, J. J. The global k-means clustering algorithm. Pattern Recognit. 36, 451–461 (2003).
Laramie, R. L. & Schaake, J. C. J. Simulation of the continuous snowmelt process. (1972).
Freudiger, D., Kohn, I., Stahl, K. & Weiler, M. Large-scale analysis of changing frequencies of rain-on-snow events with flood-generation potential. Hydrol. Earth Syst. Sci. 18, 2695–2709 (2014).
Li, D., Lettenmaier, D. P., Margulis, S. A. & Andreadis, K. The Role of Rain-on-Snow in Flooding Over the Conterminous United States. Water Resour. Res. 55, 8492–8513 (2019).
Musselman, K. N. et al. Projected increases and shifts in rain-on-snow flood risk over western North America. Nat. Clim. Chang. 8, 808–812, https://doi.org/10.1038/s41558-018-0236-4 (2018).
Burn, D. H. Catchment similarity for regional flood frequency analysis using seasonality measures. J. Hydrol. 202, 212–230 (1997).
Villarini, G. On the seasonality of flooding across the continental United States. Adv. Water Resour. 87, 80–91 (2016).
Sun, N. et al. CONUS NG-IDF Data Sets. Zenodo https://doi.org/10.5281/zenodo.5827028 (2022).
Yan, H. & Moradkhani, H. Toward more robust extreme flood prediction by Bayesian hierarchical and multimodeling. Nat. Hazards 81, 203–225 (2016).
Ragno, E. et al. Quantifying Changes in Future Intensity‐Duration‐Frequency Curves Using Multimodel Ensemble Simulations. Water Resour. Res. 54, 1751–1764 (2018).
Hou, Z. et al. Incorporating Climate Non-stationarity and Snowmelt Processes in Intensity-Duration-Frequency Analyses with Case Studies in Mountainous Areas. J. Hydrometeorol. 20, 2331–2346 (2019).
Cheng, L. & AghaKouchak, A. Nonstationary Precipitation Intensity-Duration-Frequency Curves for Infrastructure Design in a Changing Climate. Sci. Rep. 4, 7093 (2015).
Pohlert, T. Package ‘trend’. (2016).
Hosking, J. R. M. Package ‘lmom’. (2015).
Agostinelli, C. Package ‘circular’. (2017).

Acknowledgements

This research is supported by the Strategic Environmental Research and Development Program (SERDP) under contract RC‐2546, and the Environmental Security Technology Certification Program (ESTCP) under contract EW21-5140.

Author information

Authors and Affiliations

Energy and Environment Directorate, Pacific Northwest National Laboratory, Richland, Washington, United States
Ning Sun, Hongxiang Yan, Mark S. Wigmosta, Andre M. Coleman & Zhangshuan Hou
Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, United States
Mark S. Wigmosta
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, United States
L. Ruby Leung

Contributions

N.S., H.Y. and M.W. designed the general approach of developing the datasets. N.S. performed model simulations, validation, and data analysis. H.Y. prepared model validation datasets and supported data analysis. M.W. advised and managed the projects that fund this research, and revised the manuscript. N.S. wrote the paper. All authors participated in discussions and reviews during the development of this manuscript.

Corresponding authors

Correspondence to Ning Sun or Mark S. Wigmosta.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sun, N., Yan, H., Wigmosta, M.S. et al. Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States. Sci Data 9, 154 (2022). https://doi.org/10.1038/s41597-022-01221-9

Received: 26 August 2021
Accepted: 02 March 2022
Published: 05 April 2022
DOI: https://doi.org/10.1038/s41597-022-01221-9
Springer Nature Limited
