1 Introduction

The conterminous US (CONUS) is uniquely impacted by a myriad of natural hazards. In response, state and federal organizations focused on disaster preparedness, response, and mitigation [e.g., Federal Emergency Management Agency (FEMA)] have budgets of tens of billions of dollars. Entities like the National Flood Insurance Program (NFIP) provide over $1 trillion in government-subsidized coverage to at risk properties (Kousky 2018). Due to the potential for state and federal monetary liability, FEMA, NFIP, and other agencies and programs have a mandate to encourage loss reduction strategies. Similarly, private entities in industries such as agriculture, transportation, insurance, defense, and tourism invest heavily in loss mitigation strategies to protect assets from weather and climate hazards. Despite this, billion-dollar weather and climate disasters regularly occur (Smith and Katz 2013), and 22 such events—seven hurricanes, 13 severe weather events, and two drought/fire events—were recorded in 2020 (Smith 2021). Day-to-day operational weather forecasts and emerging techniques in sub-seasonal-to-seasonal (S2S) forecasting help stakeholders anticipate such events on a sub-hourly to monthly timescale (White et al. 2017). Beyond that, however, interested parties must rely on forced-boundary climate models to assess how risk—the combined potential for a peril to occur and produce societal consequences—may change in the future. This is particularly true when examining so-called end-of-century climate scenarios, due to the possibility of violating stationarity assumptions and the resulting decay of skill for current climatology (Rogelj et al. 2015).

Assessment of potential changes in CONUS high-impact sensible weather conditions in the twenty-first century is a rapidly expanding subfield of climate science (Karl et al. 2009, CCS Program 2014). General circulation models (GCMs) are often used to examine planetary and synoptic scale responses to projected changes in greenhouse gas concentrations. In contrast, regional climate models (RCMs), forced by GCM output, are often used to examine how meso-γ scale processes respond to climate change through a process known as dynamical downscaling (Giorgi and Gutowski 2015; Prein et al. 2015). Output from RCMs can be used to implicitly examine atmospheric environments supportive of high-impact weather (Trapp et al. 2007; Diffenbaugh et al. 2013; Gensini et al. 2014; Tippett et al. 2015) or explicitly through relatively fine horizontal grid point spacing and vertical/temporal resolution (e.g., Trapp et al. 2011; Mahoney et al. 2013; Robinson et al. 2013; Gensini and Mote 2014; Gensini and Mote 2015; Done et al. 2015; Trapp and Hoogewind 2016; Hoogewind et al. 2017; Liu et al. 2017; Trapp et al. 2019). A long-standing question in climate science has been, “How will high-impact meteorological hazards associated with meso-γ scale processes change in the twenty-first century?” Although answering this question can aid in reducing the uncertainty associated with impacts from regional climate change (Diffenbaugh et al. 2008; Kendon et al. 2014; Tippett et al. 2015), the computational resources needed to explicitly resolve meso-γ scale processes over a sufficiently long simulation period for climate analysis have only become possible over the last decade. The few RCM projections that currently exist over the CONUS span from 10–30 years in length and have provided evidence of potential changes in mesoscale, high-impact weather such as severe convective storms and extreme precipitation (Gensini and Mote 2015; Liu et al. 2017; Prein et al. 2017; Hoogewind et al. 2017). 
Due to time and computational limitations, these works each represented a single pair (i.e., historical and potential future) of simulations, limiting their individual ability to discern statistically robust conclusions. Together, considering their methodological differences, they paint a more holistic picture about the uncertainty associated with regional climate change and mesoscale high-impact weather. All previous studies recommend pursuit of additional RCM simulations that together can provide an ensemble approach to further quantify uncertainty.

Recent studies utilizing long-term convection-permitting RCM simulations for a CONUS domain have produced promising results. Despite some regional biases (such as anomalously warm and dry conditions during boreal summer in the central CONUS), the simulations described in Liu et al. (2017) reasonably reproduce broad climatological patterns of surface temperature and precipitation. Initial research utilizing these simulations suggests that the frequency and nature of mesoscale convective systems (MCSs), an important boreal warm-season rainfall source for many locations in the CONUS (Haberlie and Ashley 2019b), may change in a warming world (Prein et al. 2017; Haberlie and Ashley 2019a). These results broadly agree with observational (e.g., Kunkel et al. 2013; Mallakpour and Villarini 2015) and simulated (Patricola and Cook 2013; Harding and Snyder 2014; Kooperman et al. 2014; Wang et al. 2015) findings that suggest heavy rain events associated with deep convection are becoming, or will become, more frequent (Fowler et al. 2021). Additional studies derived from simulation output generated in Liu et al. (2017) suggest the following changes to mesoscale high-impact weather during the twenty-first century: (1) thunderstorm occurrence and supportive environments may increase (Rasmussen et al. 2020); (2) hurricanes may be stronger and slower moving (Gutmann et al. 2018); and (3) snowstorms may reduce in frequency and have smaller spatial footprints (Ashley et al. 2020). Independent, but similar, convection-permitting simulations described by Gensini and Mote (2014) and Hoogewind et al. (2017) both demonstrate an ability to recreate historical climatologies of severe convective storms and suggest that these events may become more frequent and variable during the twenty-first century (Gensini and Mote 2015; Hoogewind et al. 2017). 
However, these studies acknowledge the challenges associated with simulating a surrogate event due to horizontal grid-spacing, resolution considerations, and issues associated with the observed severe convective storm report database (e.g., Gensini et al. 2020a, b; Gensini 2021).

This research project builds off these previous works and focuses on quantifying changes in commonly used weather variables and climatological metrics that assist in understanding the probability of occurrence of high-impact, meso-γ-scale weather events. Observational data are used to assess the performance of a convection-permitting 15-year historical climate simulation (1990–2005) and then compared to a 15-year potential future simulation (2085–2100). Both simulations use the advanced research core of the Weather Research and Forecasting model (WRF-ARW v.4.1.2) as the RCM with initial and lateral boundary conditions forced by a bias-corrected GCM simulation. Model configuration differs from previous research with respect to the choice of physics, output variables, output variable time steps, vertical resolution, horizontal grid spacing, and use of spectral nudging. The remainder of this manuscript will detail the simulation specifics (Sect. 2), summarize the historical period performance relative to observations (Sect. 3), interpret potential future changes in 2-m temperature and precipitation (Sect. 3), discuss future directions for these data (Sect. 4), and provide lessons learned for future convection-permitting regional climate simulations (Sect. 4).

2 Experiment configuration

2.1 GCM data

Initial GCM data for this study originates from the Community Earth System Model (CESM; Hurrell et al. 2013) provided by the National Center for Atmospheric Research (NCAR) that participated in phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012). Herein, we use a version of these data that are regridded and bias-corrected using 1981–2005 ERA-Interim reanalysis (Dee et al. 2011) following the methods in Bruyère et al. (2014). Bias correction has been shown to be critical for regional climate simulations of temperature and precipitation (Christensen et al. 2008) and represented a critical element in the authors’ choice of GCM data. Essentially, correcting GCM biases reduces errors that are, in turn, passed to the RCM via initial and lateral boundary conditions (e.g., Warner et al. 1997; Rojas and Seth 2003; Wu et al. 2005; Caldwell et al. 2009) and have been shown to improve overall simulation performance (Ines and Hansen 2006; Christensen et al. 2008). Specifically, the bias-correction herein corrects the mean error in the GCM, but retains the six-hourly weather, longer-period climate-variability, and climate change from the GCM (Bruyère et al. 2014). Further details about these GCM data may be found by visiting the online repository provided by Monaghan et al. (2014).

To compare climate regimes, two 15-year epochs were examined: a historical (HIST; 1990–2005) and a potential future (FUTR; 2085–2100) time slice. Six-hourly output of the atmospheric and oceanic state variables was obtained from the GCM for the HIST and FUTR epochs to be used as initial and lateral boundary conditions. Representative Concentration Pathway (RCP; Moss et al. 2010) 8.5 was chosen for the FUTR epoch to examine the influence of a pessimistic/extreme warming scenario on high-impact weather events, with the motivation that if deltas exist between the climate epochs, they should be most pronounced under this future emissions scenario. The authors note that caution should be used when interpreting these results for policy decisions (Burgess et al. 2020), since this scenario currently represents the upper bound of possible end-of-century climate states.

2.2 RCM configuration

WRF-ARW version 4.1.2 (Skamarock et al. 2019) was used in a convection-permitting configuration as the RCM for a domain covering the CONUS (Fig. 1). The label convection-permitting is used here, as the computational domain has a horizontal grid point spacing of 3.75 km, which is sufficient to permit the development of deep convective systems (Weisman et al. 1997). Based on the reported warm-season rainfall biases in a similar simulation presented by Liu et al. (2017) and Prein et al. (2017), the authors performed sensitivity testing to find a WRF-ARW configuration that could adequately simulate warm-season rainfall occurrence, particularly in regions with high MCS frequency. Sensitivity testing utilized different boundary layer parameterization schemes, microphysics schemes, and nudging techniques during an active MCS period of June 2008 (Gensini et al. 2020a, b). Examining potential changes in MCS frequency is an overarching goal of the project that supported these simulations, as previous work has shown that in the Midwest CONUS—a highly productive agricultural region—MCS rainfall is an important part of the hydroclimate (Haberlie and Ashley 2019b). Given these considerations, and others from MCS forecasting sensitivity studies (e.g., Squitieri and Gallus 2020), the selected WRF-ARW configuration was based on the namelist.input parameters identified in Table 1.

Fig. 1

Regional climate model a domain (magenta outline) consisting of 3.75-km horizontal grid spacing using a Lambert Conformal Conic projection (1400x; 900y) and b vertical resolution of 51 η levels

Table 1 RCM (WRF-ARW v.4.1.2) configuration

GCM output fields from the HIST and FUTR periods were passed to the RCM to provide initial and lateral boundary conditions for integration. For both epochs, 15 simulations (representing each year in the respective period, for a total of 30 simulations) were continuously integrated across an entire hydrologic year (1 Oct–30 Sep), with a new initialization occurring every 1 Oct. This contrasts with a more frequent (i.e., daily) reinitialization technique (e.g., Trapp et al. 2011; Hoogewind et al. 2017), as less frequent initialization is typically preferred in RCM applications to recreate conditions that require hydrologic memory (Giorgi and Mearns 1999; Chen and Kumar 2002) due to their development over multiple days and months during any given year (Christian et al. 2015).
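The run layout described above can be sketched as a list of yearly integration windows. This is only an illustration, not the authors' workflow: the function name `hydrologic_years` is hypothetical, and it assumes the first HIST run initializes on 1 Oct 1990 and the last ends on 30 Sep 2005 (and likewise 2085 to 2100 for FUTR), which is inferred from the stated epochs and run counts.

```python
from datetime import date

def hydrologic_years(start_year, end_year):
    """Return (init_date, end_date) pairs, one per hydrologic year (1 Oct to 30 Sep).

    Assumption: the epoch label "start_year-end_year" spans end_year - start_year runs.
    """
    return [(date(y, 10, 1), date(y + 1, 9, 30)) for y in range(start_year, end_year)]

hist_runs = hydrologic_years(1990, 2005)  # HIST epoch
futr_runs = hydrologic_years(2085, 2100)  # FUTR epoch

print(len(hist_runs), len(futr_runs))     # number of runs per epoch
print(hist_runs[0], hist_runs[-1])        # first and last HIST windows
```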

Spectral nudging (Miguez-Macho et al. 2004) was applied at 6-h intervals to large-scale features for select variables (Table 1) above the planetary boundary layer. This process is used to prevent synoptic-scale features within the RCM from diverging, over successive time steps, from the GCM lateral boundary conditions (e.g., von Storch et al. 2000; Feser et al. 2011). Spectral nudging wavenumbers, coefficients, and variables were chosen following the previously mentioned sensitivity tests (Gensini et al. 2020a, b) and results from previous work (Spero et al. 2014; Liu et al. 2017). Other approaches (e.g., grid nudging, no nudging) did not perform as well as spectral nudging during the sensitivity testing (Gensini et al. 2020a, b). Given the zonal and meridional nudging wavenumbers of 3 and 2, respectively, only scales around 2000 km and above are constrained. Thus, the model is not constrained in the simulation of mesoscale phenomena, which is important for the examination of sensible extreme meso-γ weather conditions.
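The scale constrained by the nudging can be estimated from the domain dimensions given in Fig. 1 (1400 by 900 grid points at 3.75-km spacing). The sketch below is a back-of-the-envelope check under the assumption that the domain length is approximated as the number of grid points times the grid spacing; the function name is hypothetical. It yields cutoff wavelengths of the same order as the roughly 2000 km quoted above.

```python
def cutoff_wavelength_km(n_points, grid_spacing_km, wavenumber):
    """Shortest wavelength nudged when waves up to `wavenumber` span the domain."""
    domain_length_km = n_points * grid_spacing_km  # approximate domain extent
    return domain_length_km / wavenumber

zonal = cutoff_wavelength_km(1400, 3.75, 3)       # 5250 km / 3
meridional = cutoff_wavelength_km(900, 3.75, 2)   # 3375 km / 2
print(zonal, meridional)
```

Features shorter than these wavelengths, including mesoscale convective systems, are left free to evolve in the RCM.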

For both epochs, fundamental surface state variables (2-m T and q, 10-m u and v wind, mean sea-level pressure) were archived at 15-min output intervals. 3-D variables T, Td, [u, v] wind, and Z were saved every hour on 20 vertically interpolated isobaric levels (1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 725, 700, 650, 600, 500, 400, 300, 250, and 200 hPa). In addition to these basic meteorological variables, numerous Air Force Weather Agency (AFWA; Creighton et al. 2014) diagnostic variables were saved every 15-min and include other derived parameters and indices relevant to high-impact weather events (Table S1). Hereafter, we refer to these RCM simulations as WRF-BCC (WRF-Bias Corrected CESM).

3 Comparison to assimilated observations

WRF-BCC HIST daily 2-m AGL temperature (derived from hourly values) and daily precipitation were compared to the Parameter-elevation Regressions on Independent Slopes Model (PRISM; Daly 1994) dataset. This observationally driven dataset is provided on a regular ~4-km grid and represents interpolated and adjusted values derived from a variety of high-quality data sources. Monthly aggregates of PRISM daily precipitation totals and temperatures from 1991–2005 (i.e., the HIST climate simulation period) are used to produce normals for months, seasons, and annual periods. HIST mean, minimum, and maximum temperatures are calculated for daily periods starting at 1500 UTC, and these values are also aggregated to monthly, seasonal, and annual normals. Following the approach used by Liu et al. (2017), this work also uses a gridded ensemble dataset (Newman et al. 2015) originating from over 12,000 unique surface stations that accounts for observational uncertainty resulting from issues like gauge undercatch, complex terrain, and low station density. These data provide 100 perturbed realizations of 2-m temperature and precipitation observations on a 1/8° grid across the CONUS, and each realization is also aggregated into monthly, seasonal, and annual normals. PRISM and WRF-BCC HIST monthly aggregates are bilinearly interpolated to the ~12-km ensemble dataset grid and direct comparisons between the datasets are made. For each grid cell, a test is conducted to see if the bias exhibited by WRF-BCC HIST is less than the greatest bias produced by any of the 100 observational realizations. If so, it is expected that the WRF-BCC HIST bias can be explained by observational uncertainty. If not, the bias is determined to be too large to be explained by observational uncertainty alone.
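The per-grid-cell test described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes biases are measured as absolute differences from PRISM, the function name is hypothetical, and the numbers are made up (three realizations instead of 100).

```python
def within_observational_uncertainty(model, reference, realizations):
    """True if |model - reference| is smaller than the largest |realization - reference|."""
    model_bias = abs(model - reference)
    max_obs_bias = max(abs(r - reference) for r in realizations)
    return model_bias < max_obs_bias

# Toy values for two grid cells (illustrative only).
model_vals = [10.0, 14.0]             # WRF-BCC HIST normals
prism_vals = [11.0, 11.0]             # PRISM normals on the same grid
ensemble = [[10.5, 12.5, 11.2],       # realizations at cell 0
            [10.8, 11.9, 11.1]]       # realizations at cell 1

hatched = [
    within_observational_uncertainty(m, p, e)
    for m, p, e in zip(model_vals, prism_vals, ensemble)
]
print(hatched)
```

Cells flagged `True` correspond to the hatched regions in the bias figures, where the model bias is no larger than what observational uncertainty alone could produce.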

4 Results

4.1 HIST temperature and precipitation

WRF-BCC HIST and PRISM annual daily mean 2-m temperatures were highly correlated (Pearson’s r correlation of 0.99; p value ≈ 0) and exhibited an overall RMSE bias of 0.9 °C (Fig. 2). Overlap with the ensemble dataset occurred over much of the western CONUS and regional pockets in the High Plains and Southeast. Although negative biases of less than 2 °C were noted throughout the Midwest, these biases were larger than any observed within the ensemble dataset. Overall, a muted diurnal 2-m temperature cycle was evident, with nighttime lows too warm, and daytime highs too cold. Interestingly, the mean annual daily low temperature biases in the Midwest were within the observational ensemble range, but the high temperature biases were outside of the ensemble spread—suggesting that WRF-BCC HIST’s handling of daily 2-m high temperatures was likely the main cause of bias in this region. WRF-BCC HIST struggled to recreate the diurnal temperature range over much of the western CONUS despite reliably recreating the daily mean temperature with relatively small cold biases. Such cold biases have been noted over this region in similar studies (Liu et al. 2017) and may be attributable to handling of surface albedo in WRF’s Noah-MP land surface model. Cold biases also dominate much of the CONUS during the Dec–Feb period, whereas warm biases in the High and Northern Plains were evident during Jun–Aug (Fig. 3). Mean daily 2-m temperature biases over much of the Midwest were within the range of ensemble members during the warm season (Mar–Sep); however, Northern and High Plains biases began to fall outside of the ensemble spread during the summer months. Overall, RMSE biases were minimized during fall (0.86 °C) and maximized during winter (1.4 °C); correlation between WRF-BCC HIST and PRISM was significant for all seasons (Pearson r correlation of > 0.97; p value ≈ 0).

Fig. 2

Annual daily (1500–1500 UTC) minimum (a, d, g), mean (b, e, h), and maximum (c, f, i) 2-m temperature (°C) for HIST (a–c), PRISM (d–f), and the relative difference WRF-BCC HIST minus PRISM (g–i). Root mean square error values are 1., 0.8, and 2.4 °C for minimum, mean and maximum temperatures, respectively. Hatches represent grids where HIST had a bias smaller than at least one ensemble member

Fig. 3

Seasonal mean daily 2-m temperatures (°C) from WRF-BCC HIST (column left), PRISM (column middle), and differences (HIST–PRISM; column right) for Dec–Feb (first row), Mar–May (second row), Jun–Aug (third row), and Sep–Nov (fourth row). Root mean square error values are 1.4, 0.9, 0.97, and 0.86 °C, respectively. Hatches represent grids where HIST had a bias smaller than at least one ensemble member for that season

While the spatial patterns are replicated well, WRF-BCC HIST performance representing temperature magnitudes was reduced, as many locations in the CONUS exhibited monthly and seasonal biases larger than what could be explained by observational uncertainty (Figs. S1–S3). WRF-BCC HIST also had difficulty recreating the diurnal cycle of 2-m temperature, particularly daily high temperature, and produced overall cooler near-surface temperatures relative to observations. However, the approach to matching modeled and observed temperature is not as straightforward as is the case with precipitation. To mimic the collection of daily values for maximum, minimum, and mean temperature, the dataset was resampled to days starting at 1500 UTC (instead of 0000 UTC). Generally, this time should be late enough in the morning (early enough in the day) not to “double count” lows (highs). Despite this approach, daily high temperatures were too low and daily low temperatures too high—the fingerprint of a failure to capture daily extrema. Although poor model performance cannot be ruled out, it may be that daily thermometer readings are much more sensitive to these extrema due to their virtually infinite temporal resolution. When comparing top-of-hour (presented in this work) and 15-min (not shown) calculations of mean, maximum, and minimum daily temperatures—as well as 1200 UTC resampling—not much is gained in the way of reduced temperature biases. These issues could have a variety of downstream effects on the performance of the model. For example, the cooler daytime highs may prevent locations from reaching convective temperature—effectively suppressing thunderstorm formation. Indeed, this may explain the negative rainfall bias noted in the portions of the southeast CONUS during the Jun–Aug period, when and where surface sensible heating is the main driver of convection initiation. 
Despite the lack of agreement with the ensemble spread in some regions during the boreal summer, the warm biases noted in Liu et al. (2017) are reduced over most of the eastern CONUS, including the Great Plains, by 1–2 °C. This reduction in warm bias likely disrupted the temperature / precipitation feedback loop noted in previous work (Liu et al. 2017) that resulted in dry biases over the Plains and Midwest during the boreal warm season.
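The 1500 UTC day definition used for the temperature comparison can be illustrated with a short sketch. It assumes top-of-hour temperatures with UTC timestamps, shifts each timestamp back by 15 h so that a "day" runs from 1500 UTC to 1500 UTC, and then computes the daily minimum, mean, and maximum. The function name and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_stats_1500utc(times, temps):
    """Group hourly temperatures into 1500-1500 UTC days; return (min, mean, max) per day."""
    groups = defaultdict(list)
    for t, temp in zip(times, temps):
        day = (t - timedelta(hours=15)).date()  # each "day" begins at 1500 UTC
        groups[day].append(temp)
    return {d: (min(v), sum(v) / len(v), max(v)) for d, v in sorted(groups.items())}

# Synthetic example: 48 hourly values starting at 1500 UTC on 1 Jul 2000.
start = datetime(2000, 7, 1, 15)
times = [start + timedelta(hours=i) for i in range(48)]
temps = [20 + (i % 24) for i in range(48)]

stats = daily_stats_1500utc(times, temps)
for day, (tmin, tmean, tmax) in stats.items():
    print(day, tmin, tmean, tmax)
```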

WRF-BCC HIST and PRISM showed good agreement (Pearson’s r correlation of 0.91; p value ≈ 0; RMSE = 210 mm) for annual average precipitation, as nearly all gridpoints (except for a few in the Intermountain West) had a bias smaller than at least one ensemble member (Fig. 4). A broad dry bias was noted in the southeast, east, and northeast CONUS, whereas WRF-BCC HIST tended to produce too much precipitation in the Intermountain West (e.g., Sierra Nevada and Cascade mountain ranges) when compared to PRISM. Like temperature, spatial patterns (e.g., gradients, placement of maxima and minima) of WRF-BCC HIST precipitation were admirably simulated, but precipitation magnitudes also displayed regional and seasonal biases (Fig. 5). For example, WRF-BCC HIST simulations were found to have a wet bias in most of the Northern Plains and Intermountain West during DJF (Fig. 5c), and a general dry bias in the Southeast during much of the boreal warm season (Fig. 5g, h, o). This dry bias was most prevalent along the eastern half of the Gulf Coast, Florida, and along the Atlantic coast during Jun–Aug (Fig. 5k) and continued during SON (Fig. 5o). Given the placement and timing of these biases, WRF-BCC HIST could be underestimating the magnitude of (or not properly simulating) warm-season precipitation associated with coastal land/sea-breeze induced convection, sub-gridscale airmass thunderstorms, and/or tropical cyclones. Precipitation biases in the Midwest during JJA (correlating with the peak MCS frequency climatology; Fig. 5i) were significantly improved from the biases noted in a similar previous simulation (Liu et al. 2017). It is important to note, however, that many of these seasonal biases in WRF-BCC HIST are within the range of the observational spread (Fig. 5d, h, l, p).

Fig. 4

1990–2005 annual average precipitation (mm) for a WRF-BCC HIST and b PRISM (RMSE of 210 mm). Panels c and d represent the raw and percent differences, respectively. Hatches on panel d represent grids where WRF-BCC HIST had a bias smaller than at least one ensemble member

Fig. 5

1990–2005 average Dec–Feb precipitation (mm) for a WRF-BCC HIST, b PRISM, c WRF-BCC HIST minus PRISM, and d WRF-BCC HIST minus PRISM expressed as a percentage. Panels e–h, i–l, and m–p illustrate Mar–May, Jun–Aug, and Sep–Nov, respectively. Hatched areas depict regions where the model bias is within the range of the observational spread (at least one observational data set has larger differences to PRISM than WRF-BCC HIST)

Examining monthly precipitation data, RMSE values ranged between 20.3 mm in Apr and 39.3 mm in Sep (Figs. S4–S6). In the warm season, the absolute differences in precipitation in the Intermountain West are small, but the percent differences are large (WRF-BCC HIST is too dry) as this is a climatologically arid region and small differences account for large percentages of the seasonal precipitation. The authors hypothesize that WRF-BCC HIST is not properly simulating Intermountain West convection and subsequent precipitation associated with the onset and duration of the North American Monsoon (Adams and Comrie 1997); however, most of these months and locations are within the observed ensemble spread. Summarizing, three specific precipitation biases were noted for WRF-BCC HIST as compared to PRISM/ensemble data: (1) simulations produced too much precipitation in the northern Plains, High Plains, and Intermountain West during Dec–Feb; (2) a general dry bias is present in the simulations across the Southeast, Mid-Atlantic, and Northeast CONUS during the boreal warm season; and (3) simulations did not produce enough precipitation in the Intermountain West during the climatological peak of the North American Monsoon. Outside of these seasonal/regional biases, WRF-BCC HIST nearly mirrored the observed monthly, seasonal, and annual precipitation, which is notable considering that WRF-BCC HIST is forced with GCM output and not reanalysis (i.e., they should not be expected to be exactly the same weather/climate conditions). It is also worth mentioning that caution should be used in interpreting PRISM to be ground “Truth”, especially in areas with limited surface weather cooperative observations.

4.2 HIST temperature and precipitation extremes

2-m temperature and precipitation extreme values from WRF-BCC HIST and PRISM were analyzed to examine potential biases in the tails of these variable distributions. Eight climatologically unique cities were chosen for examination, including Nashville, Tennessee; Phoenix, Arizona; Amarillo, Texas; Seattle, Washington; Grand Junction, Colorado; Albany, New York; Minneapolis, Minnesota; and Tallahassee, Florida. WRF-BCC HIST climatological 2-m temperature range by calendar day (i.e., WRF-BCC record high and low 2-m temperature for a day) followed the same general annual cycle as PRISM extreme values for all cities (Fig. 6). Biases in 2-m temperature extremes were similar to biases noted in the seasonal spatial climatologies. For instance, WRF-BCC HIST simulated calendar day records for minimum 2-m temperature values during the cool season were too cold as compared to PRISM in Minneapolis (Fig. 6g). Except for Tallahassee, all cities examined had lower extreme high 2-m temperatures on average as compared to PRISM. Extreme low temperatures tended to follow a similar muted signal, except in the cool-season where WRF-BCC HIST values tended to be too cold. Pearson’s r and root mean square error were calculated to identify correlation and biases between minimum and maximum temperature extremes in HIST relative to PRISM for each of the selected cities (Figs. S7 and S8). The correlations were all significant (p < 0.05), and Pearson’s r values ranged from 0.85 to 0.98. Root mean square error ranged from 1.5 to 6.9 °C. In general, the extreme maximum temperatures had lower biases in the selected cities (average root mean square error of 3.1 °C) compared to extreme minimum temperatures (average root mean square error of 5.0 °C).
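For reference, the two skill measures used here can be computed as in the sketch below. It uses only the Python standard library, and the four paired values are made up for illustration; they are not taken from the simulations or from PRISM.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Illustrative calendar-day record highs (deg C): model vs. observations.
model_highs = [30.0, 32.0, 35.0, 31.0]
obs_highs = [31.0, 33.0, 36.0, 32.0]

print(pearson_r(model_highs, obs_highs), rmse(model_highs, obs_highs))
```

In this toy case the model is uniformly 1 °C too cool, so the correlation is essentially perfect while the RMSE is 1 °C, which shows why both measures are reported together.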

Fig. 6

1990–2005 WRF-BCC HIST and PRISM 2-m temperature extremes by calendar day for eight climatologically unique US cities

An extreme value analysis (EVA; Cooley 2009) was conducted using the Fisher–Tippett–Gnedenko theorem on the time series of daily total precipitation for both WRF-BCC HIST and PRISM to compare the extreme value distributions. Extremes were detected using a block maxima technique over the period of one year, with the solution converging toward a right-skewed Gumbel distribution. Return intervals (R) (Makkonen 2006) were calculated from the extreme events using the formula:

$$R=\frac{1}{\lambda \, p}$$

where p is the probability of exceedance, and λ is the rate of extreme events per block. A 1000-iteration bootstrap sample was applied to estimate the 95% confidence intervals for return period distributions of the extremes. Of the eight cities examined, only one (Seattle, WA) had a statistically significant (95% confidence) different distribution of extreme daily precipitation values using a Mann–Whitney U test (Fig. 7). Significant overlap between WRF-BCC HIST and PRISM in the 95% confidence interval for the other locations gives good confidence that WRF-BCC HIST can capture the extreme values of daily precipitation recorded in PRISM. Additional comparisons of extreme values across these data could be the subject of future work.
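The block maxima workflow can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the Gumbel parameters are estimated here with the method of moments (the fitting method used in the paper is not stated), the bootstrap confidence intervals are omitted, all function names are hypothetical, and the input data are made up.

```python
import math
from collections import defaultdict

EULER_GAMMA = 0.5772156649015329

def block_maxima(years, values):
    """Largest daily value in each yearly block (values assumed non-negative)."""
    maxima = defaultdict(float)
    for y, v in zip(years, values):
        maxima[y] = max(maxima[y], v)
    return [maxima[y] for y in sorted(maxima)]

def fit_gumbel_moments(maxima):
    """Method-of-moments estimate of Gumbel location and scale (an assumed fitting choice)."""
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((m - mean) ** 2 for m in maxima) / (n - 1)
    scale = math.sqrt(6.0 * var) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale

def return_interval(p, lam=1.0):
    """R = 1 / (lam * p): p is the exceedance probability, lam the events per block."""
    return (1.0 / p) / lam

def return_level(R, loc, scale, lam=1.0):
    """Daily precipitation amount expected to be exceeded once every R blocks."""
    p = 1.0 / (R * lam)
    return loc - scale * math.log(-math.log(1.0 - p))

# Made-up daily precipitation (mm) over three years.
years = [2000, 2000, 2001, 2001, 2002, 2002]
precip = [10.0, 40.0, 25.0, 5.0, 60.0, 30.0]

maxima = block_maxima(years, precip)
loc, scale = fit_gumbel_moments(maxima)
print(maxima, return_level(2, loc, scale), return_level(10, loc, scale))
```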

Fig. 7

1990–2005 WRF-BCC HIST (magenta) and PRISM (green) daily precipitation extremes as calculated by extreme value analysis for eight climatologically unique US cities. Return intervals (x-axis) of daily precipitation (y-axis; mm) are fit to the extreme values using a right-skewed Gumbel distribution. 95% confidence intervals (shading) are calculated using a 1000 iteration bootstrap

4.3 Projected temperature and precipitation based on RCP 8.5 scenario

WRF-BCC HIST (1990–2005) data were then compared to a projected future climate (FUTR; 2085–2100) based on the CMIP5 RCP 8.5 scenario. Robust and significant changes in mean temperatures across the CONUS through the twenty-first century were noted in this extreme scenario (Fig. 8). These projections suggest that, if this scenario is realized, annual and seasonal mean temperatures will exceed even the warmest outlier years and seasons observed in our current climate state. For many areas in the central CONUS, mean annual temperatures are projected to increase by 5–6 °C, and projections for boreal summer and fall suggest even larger changes (6–7 °C). More muted, but significant, seasonal changes take place across the southeast and western CONUS, with projected changes of 2–3 °C for boreal winter and spring, respectively. These values exceed the observed ensemble spread and the biases noted between PRISM and HIST and are in line with GCM projections for RCP 8.5 (Pachauri et al. 2014). In general, the patterns of larger and smaller changes in temperature follow those produced by CMIP5 GCMs (RCP 8.5), namely larger deltas for interior and northern portions of the continent and smaller deltas for more southerly coastal locations. These results mimic those produced by CMIP6 members using the SSP5-8.5 scenario (Almazroui et al. 2021).

Fig. 8

Differences (FUTR-HIST) in mean daily temperatures (°C) a annually and for b Dec–Feb, c Mar–May, d Jun–Aug, and e Sep–Nov. All changes were significant at the 95% confidence level using a Mann–Whitney U test for the medians and implementation of a field significance false discovery rate of α = 0.1

The greatest potential future precipitation increases of 200–400 mm year⁻¹ were noted in the vicinity of the Cascade Mountain range and across portions of the Mid-South, whereas robust 200–500 mm year⁻¹ decreases in future precipitation were noted across broad regions of the Southwest, Great Basin, and Southern Plains (Fig. 9). Seasonally, the largest statistically significant increases in future projected precipitation arise from boreal winter and spring (Fig. 10c, d, g, h) and the largest decreases arise from a reduction in future precipitation during Dec–Aug. The peak of the warm-season (Jun–Aug) is perhaps most notable for its widespread projection of drier conditions across the western and central CONUS (Fig. 10k, l). No widespread statistically significant changes in precipitation were noted across the CONUS during Sep–Nov (Fig. 10m–p). These projected annual/seasonal changes in precipitation highlight the regional nature of the potential changes in the hydroclimate due to anthropogenic forcing and underscore the need for regional-to-local scale assessment of climate variables. Overall, the spatiotemporal changes in precipitation are consistent with previous works discussing current/future expansions of the southern Great Plains arid climate regime and a coincident eastward progression of precipitation and deep-convection maxima (Gensini and Brooks 2018; Seager et al. 2018).
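The field significance step referenced in the captions of Figs. 8–10 controls the false discovery rate across all grid cells. The sketch below assumes this is done with the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by the text, and it takes the per-cell Mann–Whitney U p-values as already computed. The function name and the p-values are illustrative.

```python
def fdr_mask(p_values, alpha_fdr=0.1):
    """Benjamini-Hochberg: flag cells whose p-value is at or below the FDR threshold.

    The threshold is the largest sorted p-value p_(i) satisfying p_(i) <= (i / N) * alpha_fdr.
    """
    n = len(p_values)
    threshold = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= (i / n) * alpha_fdr:
            threshold = p
    return [p <= threshold for p in p_values]

# Made-up p-values for five grid cells.
p_cells = [0.001, 0.04, 0.03, 0.5, 0.8]
significant = fdr_mask(p_cells, alpha_fdr=0.1)
print(significant)
```

Compared with testing every cell at a fixed level, this approach limits the expected share of stippled cells that are false positives.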

Fig. 9

Annual average precipitation (mm) for a 1990–2005 WRF-BCC HIST and b 2085–2100 WRF-BCC FUTR (based on RCP 8.5 scenario). Raw and percent differences (FUTR minus HIST) are shown in panels c and d, respectively. Stippling on panel d indicates statistical significance at the 95% confidence level using a Mann–Whitney U test for the medians and implementation of a field significance false discovery rate of α = 0.1

Fig. 10

Seasonal average precipitation (mm) for 1990–2005 WRF-BCC HIST (panels a, e, i, m) and 2085–2100 WRF-BCC FUTR (based on RCP 8.5 scenario; panels b, f, j, n). Raw and percent differences (FUTR minus HIST) for each season are shown in the two rightmost columns, respectively. Stippling on % difference panels indicates statistical significance at the 95% confidence level using a Mann–Whitney U test for the medians and implementation of a field significance false discovery rate of α = 0.1

As an example of potential local changes, HIST and FUTR cumulative precipitation plots were created for the eight climatologically unique CONUS cities used in Sect. 4.2 (Fig. 11). Representative of the CONUS Mid-South, Nashville’s future precipitation projection shows a statistically significant (using 1000 random bootstrapped annual values with replacement at 95% confidence level) mean increase of 205 mm (18%) and a broadening of the annual variability (mean standard deviation increase from 134 to 246 mm), suggesting that a future climate may be wetter and have more variability, favoring larger precipitation amounts (Fig. 11a). Phoenix (Fig. 11b) and Amarillo (Fig. 11c), characteristic of a more arid CONUS climate, show a statistically significant decrease in future precipitation accumulation of 44 and 48%, respectively. In both locations, most of the decrease in precipitation was noted in the warm-season months of Mar–Aug. Statistically significant increases in average annual future precipitation were also noted for Seattle (Fig. 11d) and Albany (Fig. 11f), whereas Grand Junction, Minneapolis, and Tallahassee exhibited no statistically significant changes (Fig. 11e, g, h). These results highlight that, even under an aggressive future emissions scenario, some local geographies may not have significantly altered mean annual precipitation accumulations.
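The bootstrap test mentioned above for the city-level changes can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact procedure: it assumes the test resamples the annual totals of each epoch with replacement 1000 times, forms the FUTR minus HIST difference in means for each resample, and calls the change significant when the 95% percentile interval excludes zero. The function name and the annual totals are hypothetical.

```python
import random


def bootstrap_mean_difference(hist, futr, n_boot=1000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for mean(futr) - mean(hist).

    hist, futr : lists of annual precipitation totals (mm), one per year.
    Returns (lower, upper, significant), where `significant` is True
    when the interval does not contain zero.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each epoch with replacement, keeping its sample size.
        h = [rng.choice(hist) for _ in hist]
        f = [rng.choice(futr) for _ in futr]
        diffs.append(sum(f) / len(f) - sum(h) / len(h))
    diffs.sort()
    tail = (1.0 - confidence) / 2.0
    lower = diffs[int(tail * n_boot)]
    upper = diffs[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper, not (lower <= 0.0 <= upper)


# Hypothetical annual totals (mm); not values from the simulations.
hist_totals = [1000.0, 1100.0, 1050.0, 1150.0, 1080.0]
futr_totals = [1300.0, 1350.0, 1280.0, 1400.0, 1320.0]
lower, upper, significant = bootstrap_mean_difference(hist_totals, futr_totals)
```

With only 15 to 16 annual values per epoch, a resampling approach of this kind avoids assuming a particular distribution for annual precipitation.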

Fig. 11

Hydrologic year (1 October–30 September) cumulative daily precipitation (mm) for eight climatologically unique CONUS cities. Solid (dashed) black line indicates the mean of all years in HIST (FUTR). The ± 1σ spread for HIST and FUTR are denoted by the blue and red shading, respectively
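The curves described in the caption can be approximated with a short sketch. It assumes daily precipitation is available as (date, mm) pairs for a single city, and it labels each hydrologic year by the calendar year in which it ends, which is a common convention but is not specified here. All function names and sample values are hypothetical.

```python
from datetime import date
from statistics import mean, stdev


def hydrologic_year(d):
    """Label a date by the calendar year in which its hydrologic year ends."""
    return d.year + 1 if d.month >= 10 else d.year


def cumulative_by_hydrologic_year(daily):
    """Build running precipitation totals for each hydrologic year.

    daily : iterable of (date, precip_mm) pairs for one location.
    Returns {hydrologic_year: [running totals in date order]}.
    """
    curves = {}
    for d, precip in sorted(daily):
        curve = curves.setdefault(hydrologic_year(d), [])
        curve.append((curve[-1] if curve else 0.0) + precip)
    return curves


def mean_and_spread(curves):
    """Day-by-day mean and sample standard deviation across years.

    Curves are truncated to the shortest year so that every day index
    has a value from each year (e.g., leap years drop their last day).
    """
    years = list(curves.values())
    n_days = min(len(c) for c in years)
    means, sigmas = [], []
    for i in range(n_days):
        values = [c[i] for c in years]
        means.append(mean(values))
        sigmas.append(stdev(values) if len(values) > 1 else 0.0)
    return means, sigmas


# Hypothetical two-day records for two hydrologic years (illustrative only).
sample = [
    (date(2000, 10, 1), 1.0), (date(2000, 10, 2), 2.0),
    (date(2001, 10, 1), 3.0), (date(2001, 10, 2), 4.0),
]
curves = cumulative_by_hydrologic_year(sample)
means, sigmas = mean_and_spread(curves)
```

Applying the same functions to HIST and FUTR separately would give the two mean lines and the ± 1σ envelopes.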

5 Discussion/conclusions

In this work, two 15-year epochs of dynamically downscaled climate simulations are assessed across the CONUS: (1) a retrospective simulation representing the climate state from 1990–2005 (WRF-BCC HIST); and (2) an end-of-century simulation representing one extreme climate state (informed by RCP 8.5) during the period 2085–2100 (WRF-BCC FUTR). Both simulations are forced by a bias-corrected version of NCAR’s CESM CMIP5 member. Liu et al. (2017) is used as a baseline comparative analysis since it is the most recent long-term (10+ years) climate simulation that employs similar methods. Some differences between this work and Liu et al. (2017) are to be expected for a few reasons: namely, the use of different WRF configurations, different retrospective years examined, and differing representations of climate forcing (initial and lateral boundary conditions herein are informed by a GCM, not reanalysis or reanalysis perturbed by a climate delta).

2-m temperature and precipitation are the focus of this initial study due to their interconnectedness and importance to regional climate. WRF-BCC HIST commendably recreated spatial patterns in both temperature and precipitation, albeit with some notable regional/seasonal biases as suggested by comparisons to the PRISM dataset. Generally speaking, 2-m maximum temperatures were too cold across all seasons/locations (average annual RMSE bias of 2.6 °C) and conditions too dry across the southeast and eastern CONUS during the warm season (average annual RMSE bias of 210 mm). Precipitation simulations were markedly improved over Liu et al. (2017) for all seasons and locations except for the Southeast during Jun–Aug, and Dec–Feb in the Northern/High Plains. Monthly climatologies revealed very few locations outside of the expected ensemble spread of uncertainty, which greatly increases confidence in the model’s ability to replicate the historical climate.

Examining extremes, 2-m temperature and daily precipitation for eight cities indicated that, on average, WRF-BCC HIST was able to mimic extreme values found in PRISM in a statistically significant manner. The most significant biases for extreme 2-m temperature were noted in the cool season (WRF-BCC HIST cold bias) for a majority of locations. In addition, WRF-BCC HIST struggled to capture the full range of extreme 2-m temperature values for most locations, leading to a muted range of daily record high and low 2-m temperatures. WRF-BCC HIST daily precipitation extremes were also replicated well, with only one city (Seattle, WA) out of the eight having a return period distribution that was significantly different from PRISM (WRF-BCC HIST return interval values for daily precipitation were too low).

WRF-BCC HIST simulations were also compared to an aggressive/extreme projected future end-of-century (2085–2100) climate using the CMIP5 RCP 8.5 scenario. Future projections of temperature under this scenario are consistent with previous works using various methods and exhibit robust increases in temperature, especially in the interior CONUS. Future precipitation projections are also consistent with previous research suggesting decreases in precipitation across large segments of the southern Great Plains and Intermountain West, whereas significant increases were projected in the Tennessee/Ohio Valleys and across portions of the Pacific Northwest. In many locations, precipitation was also projected to become more variable in the WRF-BCC FUTR.

The authors learned many important methodological considerations from conducting these simulations. We would like to first underscore that these simulations take a significant amount of time (years) to complete given the integration procedures used, and it is not currently computationally feasible to run a full suite, or ensemble, of simulations at these horizontal grid spacings and vertical resolutions. The continuous integration over the annual cycle generally creates challenges for High-Performance Computing (HPC) systems that have limited wall-clock settings. We benefited from the creation of automated scripts to “restart” the WRF-BCC runs as the simulations progressed to increase efficiency and reduce manual input. This included the creation of a programmatic chain of scripts to post-process variables and move them out of temporary, and rather limited, scratch space for archival. In addition, simulations at these horizontal grid spacings and vertical resolutions created I/O times that dramatically slowed down the simulations, especially when outputting files every 15 min. We benefited from compiling WRF with parallel netCDF (pnetCDF) and used quilting to significantly reduce I/O time. Perhaps one of the greatest challenges with such simulations is data storage, analysis, and curation. Nearly a petabyte of data has been generated for this project, which has challenged the authors to implement new and emerging techniques for data storage, visualization, and analysis. We hope to continue to share our growing pains with other interested members of the community through conferences and workshops. Future work will create additional simulations for other RCP scenarios (e.g., RCP 4.5), as well as examine mid-century epochs. Overall, these simulations serve as an additional study (using a novel methodology) to aid in the detection of potential future changes in extreme meso-γ weather phenomena.