A framework for detection and attribution of regional precipitation change: Application to the United States historical record

Despite the emerging influence of anthropogenic climate change on the global water cycle, at regional scales the combination of observational uncertainty, large internal variability, and modeling uncertainty undermine robust statements regarding the human influence on precipitation. Here, we use output from global climate models in a perfect-data sense to develop a framework for conducting regional detection and attribution (D&A) for precipitation, starting with the contiguous United States (CONUS) where observational uncertainty is lower than in other regions. Our unified approach can simultaneously detect systematic trends in mean and extreme precipitation, attribute trends to anthropogenic forcings, compute the effects of forcings as a function of time, and map the effects of individual forcings. Model output is used to conduct a set of tests that yield a parsimonious representation for characterizing seasonal precipitation over the CONUS for the historical record (1900 to present day), which ensures our D&A is insensitive to structural uncertainty. Our framework is developed using synthetic data in a Pearl-causal perspective wherein causality can be identified using intervention-based simulations. While the hypothesis-based framework and accompanying generalized D&A formula we develop should be widely applicable, we include a strong caution that the hypothesis-guided simplification of the formula for the historical climatic record of CONUS as described in this paper will likely fail to hold in other geographic regions and under future warming.


Introduction
Even though anthropogenic influence has been identified for many variables, including surface air temperature (Hegerl et al. 1997;Tett et al. 1999), sea level pressure (Gillett et al. 2003), tropopause height (Santer et al. 2003), free atmospheric temperature (Jones et al. 2003), and ocean heat content (Barnett et al. 2005), robust and conclusive statements regarding human influence on regional precipitation remain elusive. Changes at the global scale are difficult to detect due to offsetting increases and decreases in different parts of the globe, but Hartmann et al. (2013) find it likely that human influence has affected the global water cycle since the middle of the 20th century. There is a detectable and attributable human influence on zonal land-averages of mean precipitation (Zhang et al. 2007;Sarojini et al. 2012;Noake et al. 2012), with increases in precipitation for the Northern Hemisphere (NH) midlatitudes, Southern Hemisphere (SH) subtropics, and SH deep tropics but decreases for the NH 1 3 subtropics. Detection of changes in mean precipitation for individual grid boxes has recently been attempted (Knutson and Zeng 2018), although the anthropogenic signal is relatively weak at these scales. For extreme precipitation, attribution statements have generally only been attempted for continental-scale averages (Min et al. 2011;Zhang et al. 2013;Paik et al. 2020;Dong et al. 2021), with humaninduced increases in greenhouse gases leading to an intensification of heavy precipitation events for much of the NH (with a few exceptions, e.g., Groisman et al. 2005). Two recent studies (Kirchmeier-Young and Zhang 2020; Huang et al. 2021) detect changes in extreme precipitation at subcontinental scales, but the changes are only attributable for a subset of the models considered. All the aforementioned studies rely on global climate models to simulate the anthropogenically-forced signal and natural variability in precipitation and then project observations onto these patterns to assess their agreement with the climate models.
Unfortunately, continental-scale statements about the human influence on precipitation do not provide the spatially resolved conclusions needed to inform assessments of how much climate change is happening locally. Such information, including both the magnitude of the change and its nature (i.e., is precipitation increasing or decreasing), are critical for resource managers who are trying to plan for the impacts of climate change. Furthermore, since precipitation is an episodic variable, in some places magnitudes and frequencies can have conflicting signs resulting in confusing trends in total precipitation (see, e.g., Polade et al. 2014Polade et al. , 2017Gershunov et al. 2019). In these cases, it is important to understand regional (sub-continental) changes as well as the spatial patterns of change. As an example, when considering the contiguous United States (CONUS), even though there are well-documented 50-to 100-year trends in seasonal mean and extreme precipitation (Kunkel 2003;Easterling et al. 2017;Risser et al. 2019a), existing modelbased detection and attribution (D&A) studies that explore local changes in precipitation find essentially no detectable signal. For example, over 1901-2010, Knutson and Zeng (2018) find that the observations are consistent with naturalforcings runs for 58% of the global grid boxes with sufficient observational coverage (this percentage increases to 71% for 1951-2010 and 78% for 1981-2010). Knutson and Zeng (2018) further show that approximately 85% of the CONUS grid cells have non-detectable trends and the other 15% cannot be attributed to human influence. Guo et al. (2019) use a novel "dynamical adjustment" technique to remove the influence of unforced atmospheric circulation variability on wintertime Northern Hemisphere observations, but again only find significant trends for a small fraction of the CONUS. No such grid-cell level statements are available for extreme precipitation.
The lack of confidence in regional attribution statements regarding precipitation is the result of three factors: observational uncertainty, large internal variability, and modeling uncertainty (Sarojini et al. 2016). Globally, the limited existence of high-quality, century-length records is the primary driver of observational uncertainty: Menne et al. (2012) show that dense, long-term measurements of daily precipitation are limited to CONUS, southern Canada, Western Europe, and parts of Australia. Measurement uncertainty translates directly into attribution uncertainty, e.g., attribution statements for zonal averages are sensitive to the observational data set used (Noake et al. 2012) and model-simulated changes are significantly diminished when considering only grid cells with sufficient observational coverage (Sarojini et al. 2012). Otherwise, the magnitude of internal variability is confounded with modeling uncertainty, particularly for regional scales. At these scales, human-induced change is the result of complicated interactions between natural variability and anthropogenic forcing, such that the structural uncertainty of models remains a significant obstacle to attributing regional changes (Sarojini et al. 2016). Furthermore, the aforementioned D&A studies rely on relatively small ensembles, which likely yield insufficient sampling of the internal variability (DelSole et al. 2019). Model uncertainty is exacerbated for precipitation attribution since a treatment of anthropogenic aerosols is critical (Hegerl et al. 2015). In this case, models include two sources of uncertainty: the representation of aerosols internally (e.g., prescribed versus parameterized) as well as the precipitation response to aerosols. For all of these reasons, the comparison of uncertain observations (at least for much of the globe) with uncertain models makes robust attribution statements for regional precipitation extremely difficult.
Fortunately, Sarojini et al. (2016) suggest a path forward for regional detection and attribution of precipitation. First, they argue that new methods can "facilitate the identification of significant changes in precipitation even in the presence of substantial modeling and observational uncertainty" such that we can make "faster progress ...than would be possible by simply waiting for models or observations to improve or by simply waiting for the signal of climate change to strengthen sufficiently to emerge from the noise of internal variability." Importantly, new methods should analyze changes in the processes that control natural variability (i.e., the noise) by explicitly modeling the drivers of precipitation variability. Like Hegerl et al. (2015), they emphasize the importance of accounting for aerosols, highlighting modelbased studies that have identified the counteracting influence of anthropogenic aerosols on expected increases from greenhouse gas emissions (Wu et al. 2013;Wilcox et al. 2013). Finally, they illuminate the importance of disentangling the complex causes of regional changes in precipitation, including the evaluation of a diverse set of external drivers that may influence the precipitation response as well as possible interactions between external forcing and natural variability.
In light of these suggestions, in this paper we propose a novel framework for conducting regional D&A for seasonal mean and extreme precipitation wherein we detect systematic trends, attribute trends to specific forcings, compute the effects of forcings as a function of time, and map the spatial patterns of each forcings' effect. Our initial focus is on the CONUS, since the influence of observational uncertainty is minimized relative to other global land regions. Our approach to detection and attribution provides two novel aspects relative to existing methods. First, we develop a parsimonious formula that clearly specifies how both natural drivers of climate variability and anthropogenic forcing influence precipitation in a single framework. The implication is that we ensure any uncertainty is in the response of precipitation and not in climate models' representation of the drivers and forcings. Second, while traditional D&A studies are subject to climate models' uncertainties regarding short-lived climate forcers (SLCFs) and their influence on precipitation, our philosophy uses model output in a perfect data sense. Importantly, this means that we can turn an apparent limitation into an opportunity: by evaluating the performance of our methodology across the broad diversity of responses to SLCFs in the CMIP6 multi-model ensemble, we ensure our D&A is insensitive to structural uncertainty. In other words, we can evaluate our methodology regardless of the quality of an individual model's characterization of precipitation and its response to external forcing.
Additionally, our proposed framework can be rigorously tested to ensure that it is applied appropriately to an observational data set. First, we derive a physically defensible characterization of the influence of anthropogenic warming that captures theoretical global scaling relationships for mean and extreme precipitation, such that the estimation of scaling relationships is effective in both single-forcing experiments and historical runs with all forcings. We furthermore identify individual external forcings that are important for seasonal precipitation over the CONUS in the historical record, as well as those which can be safely ignored regarding their influence on secular trends. Given the importance of anthropogenic aerosols, we use the broad diversity of responses across the CMIP6 multi-model ensemble to arrive at a method for characterizing aerosols that is insensitive to structural uncertainty. Our framework also assesses the influence of external forcings on the relationships between known large-scale modes of climate variability and precipitation, and can separate the counteracting influence of well-mixed greenhouse gases and anthropogenic aerosols. Finally, we find that the signal-to-noise ratio for secular trends in precipitation is unaffected by warming, i.e., the challenge of detecting trends is not complicated by increased warming to the global climate system. While the analysis in this paper utilizes a Pearl-causal framework (Hannart et al. 2015) for D&A (i.e., using climate models that involve an intervention), we plan to apply the same formula to in situ measurements from the historical record in a Granger-causal framework (Granger 1969).
The paper proceeds as follows: in Sect. 2, we outline a general formula for modeling a time series of annual mean or extreme precipitation as well as a set of testable hypotheses for how the general formula can be simplified for regional D&A over the historical period. In Sect. 3, we describe the climate model data sets, forcing data sets, and modes of natural climate variability used to test the hypotheses proposed in Sect. 2. Section 4 presents the main results of the paper, wherein we systematically examine each hypothesis and draw conclusions for developing an appropriate formula for detection and attribution for precipitation over the CONUS in the historical record. Section 5 concludes the paper. A summary figure (Fig. 1) and table (Table 3) provide high-level overviews of the analysis and subsequent conclusions made.

The D&A formula
We first outline a general framework for modeling a time series of precipitation, denoted {P(t) ∶ t = 1, … , T} , where the temporal units t refer to a year and P(t) is either the seasonal mean precipitation rate (mm day −1 ) or seasonal maximum daily precipitation total (mm, often referred to as Rx1Day). To account for seasonal differences in the characteristics of precipitation, the following framework will be applied separately to each three-month season (DJF, MAM, JJA, and SON). Suppressed in the notation throughout is geographic location; this is (for now) generic but will generally refer to geospatial locations (e.g., model grid cells or weather station locations). The formula is as follows: for time t = 1, … , T , we define where the time series is decomposed into a pre-industrial term P 0 that represents an overall average uninfluenced by external forcing, a forced component P F (⋅) , and an internal variability component P I (⋅) . The forced component of this time series is further decomposed as (1) including a sum of forced responses, anthropogenic or otherwise, and a set of cross-correlation terms between pairs of forcings. (Note: the cross-correlation term is another way of writing a statistical interaction.) The forced term P F (⋅) could also involve higher-order nonlinear terms (e.g., quadratic) or interactions (e.g., terms involving three or more forcings). The terms F i (t, i ) are defined by Eq. 14 (Appendix A) and represent forcings modified by the lagged response of the climate system on characteristic timescales i determined by the thermal inertia of the ocean. The internal variability component P I (⋅) is further decomposed as where P D (⋅) is a "driven" term that describes year-to-year variability due to known modes of large-scale ocean (e.g., the El Niño Southern Oscillation or ENSO) or atmospheric (e.g., the Pacific North American pattern) drivers, and P W (⋅) is a term associated with short-term weather variability. We assume that the driven component can be represented as General Formula: 3.3 for more information). Note that, as described in Risser et al. (2021), the AO is excluded from the DJF analysis due to its strong coupling (and high correlation) with the NAO in this season. The driven component also includes cross-correlation terms (which again can be thought of as statistical interactions) between the drivers and the external forcing agents, wherein (for example) the relationship between seasonal precipitation and ENSO might be influenced by the well-mixed greenhouse gas (WMGHG) forcing. Equation 4 could also include higher-order terms such as interactions between modes of variability or stochastic modulation of teleconnections (see, e.g., Gershunov et al. 2001). Finally, the internal variability component includes an error term P W (⋅) that represents residual variability in the time series due to weather. We suppose that this term is, on average, zero, but has a variance described by i.e., the magnitude of the weather variability depends on the time t and can be approximately modeled as the product of a "baseline" or pre-industrial variance V 0 and a time-varying quantity exp{V 1 t} where 1∕|V 1 | is a characteristic timescale for the response of the variance. This framework allows the weather variability to change (either increase or decrease) over the historical record. Comparison of the all-hist (all historical forcing) and nat-hist (natural-only forcing) ensembles from C20C+ indicates that precipitation variability increases in the presence of anthropogenic forcing (Pendergrass et al. 2017;O'Brien 2019), so our working assumption is that V 1 > 0 . Since the time units t refer to annual quantities, we assume that all the temporal autocorrelation in the time series is fully captured by P F (⋅) and P D (⋅) , i.e., the residual weather variability terms for each year are temporally independent. For the seasonal mean analysis, we assume that the weather variability follows a normal distribution (with mean zero and variance 2 (t) ), while for the seasonal Rx1Day analysis, we assume that the weather variability follows a Generalized Extreme Value distribution (centered on zero such that the variability is described by 2 (t) ). Note that these assumptions on the distribution of weather variability refer to seasonal aggregates (mean and maximum) of daily precipitation and not the total distribution of daily precipitation.
The general formula defined by Eqs. 2-5 contains many degrees of freedom, such that applying the full equation to an observed time series will likely result in over-fitting. We therefore propose a series of tests or falsifiable hypotheses that will allow us to simplify Eqs. 2-5 for use in the special case of seasonal precipitation over the United States. Two notes: first, we set out to investigate a variety of complicated factors using the analyses described in Sect. 4 and found that we could simplify Eq. 1 (the flow chart diagram in Fig. 1 summarizes the various analyses considered); the hypotheses are organized in terms of these discoveries. Second, it is important to emphasize that our analyses focus on the contiguous United States over the historical record (1900 to present), hence all conclusions regarding simplifications to the general formula apply only to a specific geographic region and time period. While we do not claim that any of these hypotheses are satisfied in other parts of the globe or under future warming to the global climate system, we assert that our hypothesis-driven framework and the accompanying general D&A formula are broadly applicable to other regions and timeframes. To summarize the wide range of science questions considered for each component of the general formula in Eq. 1, we have developed the flow chart shown in Fig. 1.
Our first test addresses whether we can correctly identify the influence of WMGHG forcing on precipitation using a linear regression framework: Hypothesis 1 (H1). Globally, the coefficient for WMGHG forcing ( WMGHG ) in Eq. 2 is consistent with 2%∕K for mean precipitation and 6%∕K for extreme precipitation.
While the focus of this paper is CONUS, we do not have any a priori expectations about the magnitude of the WMGHG signature regionally (e.g., over CONUS) and seasonally. We do know what to expect in terms of globalaverage annual temperature changes, and hence our first test is conducted at the global scale. Next, again looking at precipitation globally, we test whether there are meaningful cross-correlation terms between external forcings: Hypothesis 2 (H2).Globally, over the historical period, all cross-correlation terms in the forced component in Eq. 2 are negligible, i.e., ∑ As mentioned previously, there could also be nonlinear components in the forced term given in Eq. 2; however, for the current analysis we assume that these terms are negligible. The climate state over the historical period has not changed enough to warrant strongly nonlinear behavior (e.g., step changes due to the loss of sea ice), and numerous climate model studies have demonstrated the additivity of responses to forcing agents (e.g, Kirkevag et al. (2008); Shiogama et al. (2013); Marvel et al. (2015)). Shifting our focus to CONUS, we next test for the presence of meaningful trends in seasonal precipitation due to individual forcing agents: Hypothesis 3 (H3). For CONUS and over the historical record, the effects of individual forcings are non-negligible for secular trends in precipitation.
In other words, this hypothesis seeks to determine the set of forcings that need to be included in the forced component P F (⋅) of seasonal precipitation over CONUS and over the historical period. Given that one of these non-negligible forcings is tropospheric aerosols (see Sect. 4.2), we next explore a set of tests to assess structural uncertainties in our representation of anthropogenic aerosols: Hypothesis 4a (H4a). For CONUS and over the historical record, the effect of SO 2 on seasonal precipitation dominates the influence of other anthropogenic aerosol species (black and organic carbonaceous aerosols and ammonia).
Hypothesis 4b (H4b). For CONUS and over the historical record, SO 2 emissions correlate with changes in precipitation as well as deposition of SO 2 by rainfall and column integrated SO 2 mass.

Hypothesis 4c (H4c).
For CONUS and over the historical record, a regionalized time series of SO 2 emissions yield the same result as using either local or CONUS-wide estimates of SO 2 emissions.
We next test whether individual forcing agents influence the relationships between the modes of natural variability and seasonal precipitation: Hypothesis 5 (H5). For CONUS and over the historical record, all higher-order and interaction terms between the climate drivers and external forcing agents in Eq. 4 are negligible, i.e. , ∑ Because externally forced trends in seasonal precipitation over the historical record are small, we next test whether we can distinguish the influence of WMGHGs from the influence of anthropogenic aerosols: Hypothesis 6 (H6). For CONUS and over the historical record, the compensating errors in the effects of WMGHG and anthropogenic aerosol forcing are negligible, i.e., the estimated effects of WMGHGs and aerosols are (1) unbiased and (2) do not involve any compensating trade-offs.
Finally, we conduct a test of the influence of warming on the signal-to-noise ratio in seasonal precipitation: Hypothesis 7 (H7). For CONUS and over the historical record, the signal-to-noise ratio of seasonal precipitation is a constant over time, i.e., Each of these hypotheses will be evaluated using a large set of climate model output (most from the CMIP6 multimodel ensemble) in a perfect data sense to test and guide fits that will eventually be applied to observations and also ensure our D&A is insensitive to structural uncertainty. The result of our explorations (described in the remainder of this paper) is that we can simplify the general formula defined by Eqs. 2-5 as follows: The time lag WMGHG is estimated in Appendix A. For reasons discussed in Sect. 4.3, we assume that the response of precipitation to SO 2 is dominated by fast processes and therefore set SO 2 ≈ 0.

Climate model output
The hypotheses proposed in Sect. 1 require a significant amount of climate model output to test each component in a perfect data sense. The various experiments used are described in the following sections. Across the different models, the primary focus is on mean and extreme precipitation, defined as grid-cell average precipitation rate and grid-cell maximum daily precipitation (denoted Rx1Day). Mean and maximum daily quantities are calculated from the GCM-output precipitation rate (where we convert the raw output in kg m −2 s −1 to mm day −1 ). Hypotheses H1 and H2 involve annual precipitation (in light of theoretical expectations) while all other tests involve seasonal quantities.
CMIP6 historical, DAMIP, LUMIP experiments. First, we utilize output from the Detection and Attribution Model Intercomparison Project (DAMIP; Gillett et al. 2016), which seeks to enable estimation of the influence of various anthropogenic and natural forcings on the global climate system by providing sets of single-forcing or single-group forcing experiments; we use the hist-GHG (greenhouse gases only), hist-aer (aerosol only), hist-nat (solar and volcanoes only), and hist-stratO3 (stratospheric ozone only) simulations. Next, we use output from the Land Use Model Intercomparison Project (LUMIP; Lawrence et al. 2016) which seeks to quantify the influence of anthropogenic land-use and land-cover change (LULCC) on climate by way of the "hist-noLu" simulation, which takes the historical runs and turns off the LULCC. For comparison, we also examine the CMIP6 historical simulations ( Eyring et al. 2016) that span 1850 to 2014 and use observationally based, evolving, and externally imposed forcings, specifically solar variability, volcanic aerosols, and changes in atmospheric composition caused by human activities. The Global Circulation Models (GCMs) from DAMIP, LUMIP, and CMIP6-historical used here are shown in Table F.1 in Supplementary material, along with the number of ensemble members and number of years available.
CMIP6 DECK experiments: piControl, 1pctCO2, and abrupt4xCO2. The various historical experiments only consider "moderate" warming to the global climate system, in that they simulate pre-industrial (1850) to present-day conditions. To further assess the influence of WMGHGs specifically under greater levels of warming, we utilize three of the CMIP6 Diagnostic, Evaluation, and Characterization of Klima (DECK) experiments, namely the pre-industrial control simulation (piControl); a simulation forced by an abrupt quadrupling of CO 2 (abrupt4xCO2); and a simulation forced by a 1% yr −1 CO 2 increase (1pctCO2); see Eyring et al. (2016). The experimental design of the 1pctCO2 runs yields simulations of the climate under a very large range of WMGHG forcing, from approximately ≈ 0.32 W m −2 (the 1850 level of WMGHG forcing) to more than 10 W m −2 (present day WMGHG forcing is approximately 3.6 W m −2 ). The abrupt4xCO2 experiment is branched from the piControl and involves an immediate quadrupling of the global annual mean 1850 CO 2 value that is used in piControl.  (Collins et al. 2017, AerChemMIP;) is a CMIP6-endorsed initiative that seeks to quantify the various impacts of aerosols and chemically reactive gases on climate and air quality, with a specific focus on near-term climate forcers (SLCFs: methane, tropospheric ozone and aerosols, and their precursors), nitrous oxide and halocarbons. We use the targeted AerChemMIP experiments with CMIP6 climate models to assess a variety of questions relating to structural uncertainty in our representation of anthropogenic aerosols. First, we utilize a set of AerChemMIP time-slice experiments (see Table 6 of Collins et al. 2017) to assess the influence of individual SLCFs by comparing simulations with pre-industrial versus present-day values of each SLCF. The time-slice control experiment (piClim-control) uses 1850 concentrations of WMGHGs and all SLCFs and is run in an atmosphere-only configuration with prescribed sea-surface temperatures (SSTs) and sea ice. Otherwise, we use a set of experiments that mirror the control but change a single aerosol precursor from 1850 to 2014 values: piClim-SO2 (anthropogenic emissions of SO 2 ), piClim-BC (black carbonaceous aerosols), piClim-OC (organic carbonaceous aerosols), piClim-NH3 (anthropogenic emissions of ammonia), and also piClim-aer (all aerosol precursors). See Table F.3 in Supplementary material for the models and ensemble members used. Next, we utilize the AerChemMIP "histSST" simulations of the historical record (1850 to 2014), which are atmosphereonly runs with prescribed SSTs and sea ice based on the historical record (see Table 2 of Collins et al. 2017). The models that participate in this experiment must include an interactive aerosols component, and (depending on the climate model) may or may not additionally include interactive tropospheric or stratospheric chemistry; see Table F.4 in Supplementary material for the models and ensemble members used.
C20C+ and HAPPI Experiments. First, we use the "all historical" (All-Hist) ensembles from the Climate of the 20th Century Plus Detection and Attribution (C20C+ D&A) project which are uncoupled, atmosphere-and land-only simulations conducted using observed historical climate forcings, sea surface temperatures, and sea ice coverage. The All-Hist simulations correspond to approximately +0.85 • C of warming relative to the pre-industrial period. We furthermore utilize the Plus15-Future and Plus20-Future simulations from the Half a Degree Additional Warming, Prognosis and Projected Impacts (HAPPI; Mitchell et al. 2017), which explore stabilized warming experiments of +1.5 • C and +2.0 • C, respectively. Like the All-Hist, these are atmosphere-only simulations with 2006-2015 boundary conditions modified to reflect stabilized increases in near-surface air temperatures. For the purposes of our analyses (see Sect. 4.6), we require a large ensemble (at least 50 members) that spans at least 10 years for each of All-Hist, Plus15-Future, and Plus20-Future as well as daily precipitation rate (see Table F.5 in Supplementary material).

Characterizing external forcings
At the global scale, we assume that the effects of solar forcings and stratospheric ozone on mean and extreme precipitation are small and negligible; we will later test this claim for the CONUS analysis. Otherwise, we need to account for well-mixed greenhouse gases (WMGHG), volcanic forcing, and anthropogenic aerosols. Time series of the following sources are shown in Fig. 2.  Table 1; the forcing formulae for CFC-11 and CFC-12 are given in Table 3 of Hodnebrog et al. (2013). To characterize the WMGHG forcing overall, we will simply compute a total WMGHG forcing (CO 2 +CH 4 +N 2 O+CFC-11+CFC-12), all in one value as a function of time; see Fig. 2a.
Since the response of precipitation to WMGHG forcings is dominated by the lagged response of SSTs to these forcings (Samset et al. 2016;Douville and John 2021), and since these time lags range between 1 and 3 decades (Ricke and Caldeira 2014) due to the thermal inertia of the upper ocean, the function for total WMGHG forcing includes time lags as discussed in Appendix A.
Volcanic aerosols. For this forcing, we will simply use the observational time series of volcanic stratospheric aerosol optical depth at 550 nm (vSAOD). As with our previous observational analysis (Risser et al. 2021), we use a hybrid time series derived from Sato et al. (1993)  Anthropogenic aerosols. At the CONUS scale, we require a time series of anthropogenic aerosols that can be applied to the observational record. A significant amount of the analysis in this paper specifically addresses this question (see Sect. 4.3), for which we utilize the CMIP6 forcing datasets Gidden et al. 2018), which provide monthly measurements of anthropogenic SO 2 emissions (in kg m −2 s −1 ) across various sectors (e.g., agriculture, industry, etc.) on a 0.5 • × 0.5 • longitude/latitude grid. The monthly data are summed across sectors, averaged seasonally, and then averaged across all 0.5 • × 0.5 • grid cells that lie within the CONUS boundary. These seasonal time series aggregated to the CONUS scale, shown in Fig

Drivers of natural climate variability
A recent paper by Risser et al. (2021) forms the basis for our forthcoming observational analysis and identifies a set of drivers of natural climate variability for describing year-toyear variability in seasonal precipitation over CONUS in the observational record. Given that the motivation of this paper is to set the stage for an observationally based attribution statement regarding seasonal precipitation in CONUS, we must similarly account for modes of natural variability in an analysis of climate model output. Section 2.2 of Risser et al. (2021) contains a thorough literature review of the different sources of natural climate variability considered (as well as those that are not considered); Sect. 4.1 of Risser et al. (2021) demonstrates that this set of "drivers" (as we will henceforth refer to them in this paper) is complete insofar as they sufficiently account for year-to-year variability in seasonal precipitation that can be explained by known modes of large-scale oceanic and atmospheric variability. Here, we explore the ENSO Longitude Index (ELI, Williams and Patricola 2018); the Pacific/North American (PNA) pattern (Wallace and Gutzler 1981;Barnston and Livezey 1987); the Arctic Oscillation (AO) (Thompson and Wallace 2000); the North Atlantic Oscillation (NAO) (Hurrell et al. 2003); and the Atlantic Multidecadal Oscillation (AMO) index (Schlesinger and Ramankutty 1994;Kerr 2000) to quantify multidecadal variability in Atlantic sea surface temperatures (Mann et al. 2020(Mann et al. , 2021. Unlike Risser et al. (2021), we do not include volcanic stratospheric aerosol optical depth as a result of the analysis of the DAMIP hist-nat ensemble in  Sect. 4.2. We use the Climate Variability Diagnostics Package (CVDP; Phillips et al. 2014) to calculate all of these indices (except ELI) from the various coupled climate model runs examined throughout this paper, and implement the ELI calculation as a Python-based analysis pipeline in the Toolkit for Extreme Climate Analysis ( Loring et al. 2016). All indices are calculated based on monthly data and then averaged seasonally (following Risser et al. 2021).

Results
We now describe a set of analyses that test each of the hypotheses outlined in Sect. 1. Figure 1 provides an overview the motivation and testing of each hypothesis and Table 3 re-states the hypotheses and our conclusions. Table 3 also provides a confidence assessment for each conclusion (derived in Appendix E). These confidence statements use language as precisely defined by Mastrandrea et al. (2010); note, however, that these statements do not reflect a measure of expert assessment but instead summarize uncertainty across the large CMIP6 multi-model ensemble.

Global analysis (H1 and H2)
Our first investigation involves hypotheses H1 and H2, which ask whether we can correctly identify the magnitude of the WMGHG effect on precipitation via a linear regression framework (H1), and if so, can we further isolate this WMGHG dependence in a noisy climate system with all forcings (H2)? The hist-GHG ensembles provide a clean test of hypothesis H1, since the only external influence on precipitation in these simulations is the WMGHG forcing. For H2, we subsequently use the historical runs to assess whether the WMGHG dependence is the same in both scenarios or the WMGHG dependence is meaningfully influenced by other external forcings. For both of these tests, we focus on the global average of annual mean and extreme precipitation; here, the global average for extreme precipitation involves the grid cell average of annual maximum daily precipitation (Rx1Day).
To test H1, we conduct three regression analyses: first, we regress is the global average, grid-cell area weighted, annual time series for each hist-GHG ensemble member e = 1, … , N m and year t = t m 0 , … , t m T for model m = 1, … , 11 . In Eq. 7, F WMGHG (t, WMGHG,m ) is the total lagged WMGHG forcing in year t using the modelspecific lag WMGHG,m , the error term m,e (t) is distributed as N(0, 2 (t)) , and the error variance is modeled as in Eq. 5. Second, we model the global average of extreme daily precipitation, i.e., Rx1Day (mm), denoted Z hist-GHG m,e (t) as arising from a GEV distribution (see, e.g., Coles 2001) with timevarying location time-varying scale hist-GHG m (t) (modeled as in Eq. 5), and time-invariant shape hist-GHG m . Finally, we regress the global average of mean surface temperature ts (K), denoted S hist-GHG m,e (t) , on WMGHG as in Eq. 7. In each case, we aggregate across the ensemble and estimating a single set of coefficients for each model. We use these fitted regression equations to estimate the precipitation rate, 20-year return value, and global average surface temperature for 1850 and 2020 and then calculate the model-specific scaling rates (percent change in precipitation per K), denoted R hist-GHG mean,m and R hist-GHG GEV,m , by comparing 2020 estimates with 1850 estimates. To propagate the uncertainty through from the various regressions to the final scaling factors R hist-GHG mean,m and R hist-GHG GEV,m , we utilize the bootstrap to calculate basic bootstrap confidence intervals (see, e.g., Paciorek et al. 2018), here and throughout at the 95% confidence level.
The estimated hist-GHG percent changes in precipitation (mean and return values) and scaling rates as a function of the warming are shown in Fig. 3 for each model along with 95% confidence intervals along both axes. The expected 2%/K and 6%/K for means and extremes, respectively, are shown with black lines. Interestingly, the mean scaling rates seem to be systematically less than 2%/K (although in some cases the confidence interval covers 2%/K); the extreme scaling rate also is generally less than 6%/K, except for the IPSL runs. The scaling factors do not appear to be influenced by the magnitude of the warming, although all but two of the models considered have less than ≈ 1.6 K warming over 1850-2020. While the scaling rates for both mean and extreme precipitation generally fall short of their theoretical values, these results are consistent with related explorations of precipitation scaling. For example, Collins et al. (2013) find that the rates of change in means were 1-3% K −1 , and changes in daily extreme precipitation ranged from 4% K −1 (CMIP3) to 5.3% K −1 (CMIP5 models). Furthermore, Kharin et al. (2013) find rates of change for mean and extreme precipitation in CMIP5 models to be, on average, 1.8% K −1 and 5.8% K −1 , respectively, although these results explore changes for future projections (RCP 2.6, 4.5, and 8.5), which have up to 5K of warming. Furthermore, Kharin et al. (2013) find that these rates are indeed dependent on the level of warming, such that the rate was lower for lower warming amounts. Both of these results are consistent with what we see in Fig. 3, particularly given the moderate levels of warming over the historical period in the hist-GHG runs (note that Fig. 3b also shows the range of scaling rates from Collins et al. (2013) and Kharin et al. (2013) for reference). Similar results are obtained when analyzing the CMIP6 DECK experiments piControl and 1pctCO2 introduced in Sect. 3.1, which involve significantly more warming to the global climate system; see Appendix B. These results support our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the WMGHG forcing as quantified via our regression framework is consistent with theoretical expectations for mean and extreme precipitation.
Next, we outline a formal framework for testing hypothesis H2, where we assess whether we can isolate the WMGHG influence in the historical runs at the global scale and how this compares with the hist-GHG runs. We focus on the global scale so that, for the purposes of this test, we can ignore the substantial and demonstrable regional effects of long-term unforced variability and weather noise (McKinnon and Deser 2018; Deser 2020) on the forced response of the climate system (Kay et al. 2015, Fig. 2). For this test, we assume that the effects of solar forcings and stratospheric O 3 are small, and so we just have to account for the volcanic SAOD, WMGHGs, and aerosols in the historical runs. As  Estimated hist-GHG changes in globally averaged annual mean and extreme precipitation versus warming (a), the corresponding hist-GHG scaling factors R hist-GHG mean,m and R hist-GHG GEV,m versus warming (b), and statistical WMGHG coefficient estimates calculated from hist-GHG versus historical for mean and extreme precipitation (c). Each point contains coordinate-wise 95% basic bootstrap confidence intervals. In a and b, the black lines denote the theoretically expected responses; the range of scaling rates from Collins et al. (2013) and Kharin et al. (2013) are shown in panel (b) with black and blue bars, respectively. The black lines in c are the 1-1 line, and the blue lines in c represent the linear fit for the historical relative to hist-GHG (95% uncertainty band in light gray) previously mentioned, accounting for the aerosol forcing in the historical runs is nontrivial. To account for the different ways each modeling center includes aerosols and to account for the spatially heterogeneous influence of aerosols globally, as a pre-processing step we first calculate a smoothed aerosol trend in mean and extreme precipitation in the histaer runs (specifically, fitting a natural spline curve with 7 degrees of freedom over 1850-2020). Then, we subtract this smoothed trend from the historical annual time series at each grid cell. Once we have removed the aerosol trends, we calculate two area-weighted global average time series for each historical model/ensemble member: the global average of mean precipitation (mm/day) minus the aerosol trend, denoted Y hist, no-aer m,e (t) ; and the global average of extreme daily precipitation, i.e., Rx1Day (mm) minus the aerosol trend, denoted Z hist, no-aer m,e (t) . Similar to the hist-GHG analysis, we then conduct the following regressions: for the mean (again assuming time-varying Normal error), and modeling the Rx1Day measurements Z hist, no-aer (where the hist-GHG quantities are as in Eqs. 7 and 8). Uncertainty is once again quantified using the bootstrap.
For the 11 models in Table F.1 in Supplementary material that have both hist-GHG and historical runs, we compare the statistical WMGHG coefficient estimates from these two experiments along with their 95% basic bootstrap confidence intervals; see , b.). The plots also show the 1-1 line (black) and the linear fit for the historical relative to hist-GHG (blue line; 95% uncertainty band in light gray); the caption of each figure shows the slope of the blue linear fit (± the standard error) and the R 2 value for the linear fit. While the individual models' confidence intervals often do not intersect the 1-1 line, across the multi-model ensemble the regression coefficients for hist-GHG and historical generally agree quite well. In other words, we can indeed isolate the WMGHG dependence in the historical runs consistently with what we get from the hist-GHG results. This result supports our confidence assessment in Appendix E that it is very likely (as precisely defined by Mastrandrea et al. (2010)) that there are no meaningful cross-correlation terms in the WMGHG influence in precipitation over the historical record.

CONUS trends from individual forcing agents (H3)
Next, we set out to investigate if there are meaningful trends in seasonal precipitation at the scale of CONUS due to individual forcing agents. Using the DAMIP and LUMIP experiments, we can explicitly quantify trends due to solar forcing, volcanoes, stratospheric ozone, land use/land cover change, WMGHGs, and anthropogenic aerosols. For this hypothesis, to be as general as possible, we simply assess changes in seasonal precipitation (mean and extreme) due to changes in global average surface temperature for each model and each experiment. For each model grid cell over CONUS, we first extract the seasonal (DJF, MAM, JJA, and SON) mean precipitation rate and maximum daily precipitation (Rx1Day) in each year and ensemble member, and then take area-weighted averages of the seasonal quantities. Next, we regress the time series of seasonal precipitation versus the global average surface temperature (using OLS for precipitation rate and GEV regression for Rx1Day and in both cases using a time-varying weather variance as in Sect. 4.1), aggregating over the ensemble members to obtain a single trend estimate. Next, we convert the trends in global surface temperature to a change in precipitation per century and then use the block bootstrap (as in, e.g., Risser et al. 2019b) to quantify uncertainty and provide a 95% confidence interval. Note that to assess the influence of land use/land cover change (LULCC), we generate a "hist-Lu" scenario by subtracting the hist-noLu ensemble average time series from the historical ensemble average time series, which implicitly assumes linearity in the relationship between LULCC and seasonal precipitation. For comparison, we also calculate and plot corresponding trends from the CMIP6 historical and hist-noLu experiments.
The CONUS-average individual forcing trends are shown in Fig. 4 for each model, experiment, and season for both mean precipitation and Rx1Day. Across essentially all models and seasons, it is apparent that WMGHG forcing yields strong positive trends in mean and extreme precipitation at the CONUS scale as evidenced by the fact that the hist-GHG confidence intervals are generally positive and do not include zero in most cases. Similarly, the anthropogenic aerosol forcing yields strong negative trends in both mean and extreme precipitation since the hist-aer confidence intervals are negative and again do not include zero. Both of these results are very much expected, particularly for precipitation globally, but it is interesting that we verify this result at the CONUS scale. Conversely, the hist-nat and hist-stratO3 trends are in most cases non-distinguishable 1 3 from zero, meaning that these forcing agents do not significantly influence seasonal precipitation at the CONUS scale. In other words, this implies that neither stratospheric ozone nor the natural forcings (solar and volcanoes) have a meaningful impact on seasonal precipitation over CONUS. Finally, while there are only a few models that provide the hist-noLu (and hence the derived hist-Lu) experiment, in most cases the trends due to LULCC as represented by our derived hist-Lu time series overlap zero. It is also generally the case that the hist-noLu trends are non-distinguishable from the historical trends. As such, we also conclude that LULCC does not significantly influence seasonal precipitation over CONUS.
In summary, we have found that WMGHG forcing tends to increase seasonal precipitation, anthropogenic aerosol forcing tends to decrease seasonal precipitation, and the influence of natural forcings, stratospheric ozone, and LULCC is minimal and can be, for our analysis, ignored. These results support our confidence assessments in Appendix E regarding the likelihood that individual forcings are non-negligible for secular trends in precipitation over CONUS: likely for WMGHGs and anthropogenic aerosols, about as likely as not for stratospheric ozone and LULCC, and unlikely for natural forcings (italicized terms are precisely defined by Mastrandrea et al. (2010)).

Structural uncertainty in representing anthropogenic aerosols (H4)
Given the clear influence of tropospheric aerosols on seasonal precipitation identified in Fig. 4, we must account for this external forcing in an analysis of the historical record. The obvious question then becomes: how exactly should this be done appropriately? The natural choice would be to use atmospheric SO 2 concentrations because it is atmospheric SO 2 that alters clouds via oxidation to sulfate cloud-condensation nuclei (Lohmann and Feichter 2005;Chen et al. 2008). However, this is problematic because there are no detailed long-term, spatially resolved records of SO 2 concentrations, and we are hence reliant on models. To make matters worse, atmospheric concentrations and surface deposition rates of SO 2 differ considerably among models due to their chemistry parameterizations and underlying atmospheric codes (Lamarque et al. 2013). This diversity is evident in the CONUS-mean atmospheric burdens and deposition rates of SO 2 from AerChemMIP histSST runs, such that the only field with minimal spread across the multimodel ensemble are the prescribed anthropogenic emissions (Fig. G.3 in Supplementary material).  To this end, we now conduct a set of tests to address structural uncertainty in anthropogenic aerosols and justify our conclusions using climate model output from the AerChemMIP experiments described in Sect. 3.1. Our exploration of how to characterize anthropogenic aerosols is simplified somewhat due to results from the Precipitation Driver and Response Model Intercomparison Project (PDRMIP; Myhre et al. 2017). Using PDRMIP simulations,  show that Asian and European sulfate aerosols have negligible effects on precipitation over CONUS, and therefore we can use aerosol measures local to CONUS. This conclusion is further supported by source apportionments of the SO 2 over CONUS, which show that until 2000 that ≤ 10 % of the SO 2 burden over any of the four quadrants (northwestern, southwestern, northeastern, and southeastern) of US national territory had a non-domestic origin (Yang et al. 2018). Over the global land surface, PDRMIP finds that the fast precipitation response to increased sulfate aerosol burdens is approximately three times that of the slow (SST-mediated) response, although at continental scales (e.g., CONUS) the contributing models do not consistently agree on the partitioning between fast and slow responses (Samset et al. 2016). We have used this finding to support omission of lagged (slow) responses of CONUS precipitation to SO 2 . Supported by the PDRMIP findings and the source apportionment by Yang et al. (2018), we hypothesize that the response of CONUS to SO 2 is dominated by fast responses to domestic emissions of this species.

Dominant aerosol species for changes in precipitation (H4a)
First, we show that SO 2 is, in fact, the dominant aerosol species for changes in precipitation. To support this conclusion, we utilize the AerChemMIP time-slice experiments to assess the influence of each individual near-term climate forcer (SO 2 , black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia) on seasonal precipitation. This test involves looking at changes in seasonal precipitation (both precipitation rate and Rx1Day) for each of piClim-SO2, piClim-BC, piClim-OC, and piClim-NH3 relative to piClimcontrol averaged over the climatology of each experiment. In addition to comparing each "single forcing" experiment versus the control, we further quantify the combined effects of black carbonaceous (BC) aerosols, organic carbonaceous aerosols, and ammonia relative to the control simulation, since the sum of the global-mean effective radiative forcings (ERFs) by these species is much less than that of SO 2 for 1850-2014 (Thornhill et al. 2021)  The results of this analysis are summarized in Table 1, which shows a tally of the model-season combinations (out of 16 total) for which the sum of black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia (BC + OC + NH3) cannot be reliably distinguished from control run for each region. Focusing on the NH land region for seasonal Rx1Day, the changes for BC + OC + NH3 cannot be reliably distinguished from 0 for all but one model-season combination; for seasonal precipitation rate, this is also true for 9/16 of the model-season combinations. Furthermore, for all seasons and models (and regions), it is possible that the decreases in both precipitation rate and Rx1Day for SO 2 are larger in magnitude (more negative) than those for BC + OC + NH3 to within the full 95% CI intervals, i.e., this possibility is not excluded. This result is shown for NH land in Supplemental Figure G.2. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that SO 2 is the aerosol species that dominates historical changes in precipitation over CONUS.

Characterization of SO 2 influence on precipitation (H4b)
Because SO 2 is the dominant aerosol species for driving changes in seasonal precipitation, the next question involves how to quantify the influence of SO 2 . There are (at least) three candidates: SO 2 emissions (denoted "emiso2"), deposition of SO 2 by rainfall (denoted "wetso2"), and column integrated SO 2 mass (denoted "iso2"). Each of these quantities are resolved within the model to the grid cell, either as a boundary condition (emiso2) or as calculated internally within the model via parameterizations (wetso2 and iso2). In principle, the ultimate causal variable relating aerosols to precipitation is emissions, but the relationship between emissions and precipitation is quite complicated because of transport of SO 2 and transformations in the atmosphere. Therefore, we seek empirical support for a simple functional form for the effect of emissions on precipitation. Ideally, emiso2 does as well as either of the other two measures since it is the only field we know reliably in the observational record. To test this claim, we again use the piClim-aer and piClim-control AerChemMIP time-slice experiments to assess the correlation over grid cells between changes in seasonal precipitation (piClim-aer minus piControl) and changes in the SO 2 sources (again piClim-aer minus piControl). Figure 5 shows the correlations over grid cells for NH land for all available data across precipitation type (mean or extreme), season, and SO 2 source along with a 95% basic bootstrap confidence interval. For the NH land regions, the correlations of both precipitation rate and Rx1Day with emiso2 across all combinations of seasons and models are not meaningfully different from the corresponding correlations with iso2 and wetso2 when considering the full 95% CI intervals. The same is true for CONUS and China (not shown); in India the same is true except for MAM precipitation rate in three of the models (again not shown). In other words, we show that the relationship between emissions and precipitation is very similar to that between precipitation and variables more directly connected to cloud microphysics (integrated SO2 or SO2 rainout).
This result supports our confidence assessment in Appendix E that it is very likely (as precisely defined by Mastrandrea et al. (2010)) emiso2 correlates with changes in precipitation as well as wetso2 and iso2; hence we can safely use emiso2 to characterize the relationship between seasonal precipitation and anthropogenic SO 2 .

Regionalized vs. local estimates of SO 2 emissions (H4c)
Finally, given the results of Hypotheses 4a and 4b, we evaluate whether the influence of anthropogenic aerosols should be determined using local (model grid-cell) estimates of emiso2, or whether regionally-averaged emissions could be used instead. In principle, a localized measure would be appropriate given the dominance of fast (microphysicsmediated) effects of SO 2 on rainfall (Samset et al. 2016;Yang et al. 2018), the large geographic gradients in anthropogenic emissions and column-integrated atmospheric burdens of SO 2 across CONUS (Yang et al. 2018) as well as in the frequency of different precipitating weather systems (Kunkel et al. 2012), and the short characteristic lifetimes of Unfortunately, for the time period over which we have sufficient in situ precipitation data (1900 to present), the seasonally-averaged time series of grid cell SO 2 emissions is highly correlated with the WMGHG forcing time series (exceeding 0.8 in absolute value for ≈ 50% of CONUS; in most cases emiso2 and WMGHGs are anti-correlated). This means that it will be essentially impossible for a regression-based analysis to distinguish the effect of WMGHG on precipitation from the effect of SO 2 . Note, however, that the grid cell emiso2 versus WMGHG correlations are much weaker for 1850 to present and become sufficiently small for a regression analysis. As such, we can confidently separate the influence of WMGHG from the influence of grid cell SO 2 , but only when the time series extends back to 1850 (and not when we are limited to a starting year of 1900). See Appendix C for further details on the relationship between WMGHG forcing and grid-cell SO 2 emissions. At the CONUS scale, we see that emissions have generally trended up and then down similarly across the US due to broad societal reasons (i.e., industrialization and then regulations; see Fig. 2) such that the correlation between the CONUS-average emissions and WMGHG forcing is negligible.
A secondary problem arises due to the fact that the 0.5 • × 0.5 • gridded observational data set of SO 2 emissions Gidden et al. 2018) is extremely heterogeneous across CONUS, such that the average annual emissions in nearby grid cells can differ by a factor of nearly 1000 (see Fig. 13a). This extreme spatial variability occurs at very small scales: in some cases, for neighboring grid cells (e.g., portions of the Great Plains; again see Fig. 13a). In this case one must be concerned with transport of SO 2 emissions, since it is unlikely that the grid cells in the Great Plains with essentially zero emissions adjacent to grid cells with very large emissions have negligible atmospheric SO 2 concentrations. Unfortunately, there are neither long-term nor continental-scale observations of SO 2 transport, and simulations of this phenomenon are subject to large structural and parametric uncertainties in convective transport and subsequent scavenging (Feng et al. 2011;Tost et al. 2010) that act to reduce confidence in model estimates of aerosol-cloud interactions (Storelvmo 2012).
To address both of these concerns, we propose an approach that yields a compromise between using grid cell emissions (which are strongly correlated with WMGHG forcing over 1900-present and do not realistically represent SO 2 transport) and a CONUS-wide average emissions time series (which has the correct trajectory but ignores the important local effects of SO 2 ). Specifically, we use "stochastically regionalized" time series of SO 2 emissions, wherein the grid cell emissions are averaged within ensembles of a set of regions across the CONUS (Appendix D).
Since we want the local emissions to follow the same trajectory as the CONUS-wide time series (i.e., generally trending up through 1970 and then decaying), we determine these regions as follows: 1. In each season, identify pairs of years from {1900, … , 1966} and {1967, … , 2020} such that the absolute difference of the SO 2 emissions in the CONUSwide time series in the year-pair is less than a small fraction of peak emissions. The midpoint year of 1966 is chosen since this is the year in which SO 2 emissions reached their maximum; see Fig. 2. 2. The stochastic regionalizer generates a partitioning of CONUS into connected regions that minimize a cost function chosen to decorrelate regional SO 2 emissions from WMGHGs (Appendix D). 3. In order to account for uncertainty in the regionalization, we repeat Step 2 100 times to obtain an "ensemble" of regions. 4. For each ensemble member and using the averaged emissions for the regions identified in Step 2, we reconstruct the grid cell map of seasonal SO 2 emissions in each year.
By construction, the stochastically regionalized (SR) SO 2 emissions time series are only moderately correlated with WMGHGs (on average, roughly -0.2) such that they can be used to simultaneously assess the influence of both emissions and WMGHGs on rainfall.
In order to explicitly test whether using stochastically regionalized emissions introduces significant biases relative to using either grid cell specific emissions or CONUS-wide emissions, we use the AerChemMIP histSST experiments which cover 1850 to 2014. For the histSST simulations, we conduct a regression analysis at each model grid cell where, similar to the station-specific analysis in Risser et al. (2021), we simultaneously assess the influence of WMGHGs, SO 2 , and the set of five climate drivers discussed in Sect. 3.3 as follows: for the mean (extreme) precipitation analysis, we allow the mean (location parameter) to change linearly according to the two forcing time series and five climate drivers with a time-varying error term as in Eq. 5. For each grid cell, we conduct the analysis three times: first, using grid cell-specific SO 2 emissions time series; second, using the stochastically-regionalized emissions (here, the analysis is done for each ensemble member, after which we average coefficient estimates across the ensemble); and third, using a single CONUS-wide average emissions time series (all of these time series are extracted from the model-specific emiso2 output). Conditional on the estimated SO 2 coefficients, we calculate the effect of emiso2 on precipitation by multiplying coefficient estimates by the range of emissions experienced at that grid cell and the calculate an areaweighted CONUS average. All other drivers (as well as the time-varying error) are fixed at their average values. These estimates are shown for all three characterizations of SO 2 in Fig. 6 for the six models that have all required input data and across seasons and mean/extreme precipitation along with 95% basic bootstrap confidence intervals. Clearly, across all seasons, the estimated effect on seasonal precipitation (for both mean and extreme) is essentially identical for all three methods of characterizing SO 2 emissions. The spatial patterns of this effect (not shown) also agree extremely well for each approach. This result supports our confidence assessment in Appendix E that it is virtually certain (as precisely defined by Mastrandrea et al. (2010)) that regionalized time series of emiso2 yield the same results as using either local or CONUS-wide estimates of emiso2. Therefore, we can safely proceed with using regionalized time series of SO 2 emissions to characterize the influence of anthropogenic aerosols in an analysis of the historical record.

Influence of forcing agents on driver vs. precipitation relationships (H5)
As is documented in Risser et al. (2021), the five climate drivers ELI, AO, NAO, PNA, and AMO are sufficient for describing year-to-year variability in seasonal mean and extreme precipitation that is driven by large-scale modes of climate variability, i.e., not due to background weather variability. However, a follow-up question that was not directly addressed in Risser et al. (2021) has to do with whether the relationship of these climate drivers and seasonal precipitation is altered by external forcing. We can directly assess the influence of external forcings on the driver versus precipitation relationships using the DAMIP, LUMIP, and CMIP6 historical experiments. Similar to the station-specific analysis in Risser et al. (2021) and the analysis of AerChemMIP histSST described in Sect. 4.3.3, we conduct a grid-cell specific time series analysis of seasonal mean (using OLS) and extreme (using GEV) precipitation for each model/experiment, again aggregating over any available ensemble members to obtain a single estimate for each grid cell across the ensemble. Both mean and extreme analyses use the time-varying error variance in Eq. 5, and specify that the mean/location parameter vary linearly with a set of experiment-specific terms: -hist-nat and hist-stratO3: climate drivers only -hist-GHG: climate drivers and WMGHGs -hist-aer: climate drivers and SO 2 emissions -historical and hist-noLu: climate drivers, WMGHGs, and SO 2 emissions For the historical, hist-aer, and hist-noLu simulations, we use stochastically regionalized SO 2 emissions as described in Sect. 4.3.3 and Appendix D. In light of the negligible secular trends identified in the hist-nat simulations (Fig. 4), we can use the hist-nat estimates as reference values to compare versus the estimated driver-precipitation relationships in the other experiments. For example, if the hist-GHG analysis (which includes only "main effects" of the drivers/forcing in statistical terminology) recovers the same driver effects as hist-nat, this indicates that there is not a driver-forcing interaction that modifies the driver effect. Similarly, in this case there would be no modification to the forcing effect, either.
To compare the various regression coefficients for each climate driver across the different experiments versus what we obtain from the hist-nat experiment, we use the statistics of a Taylor diagram (Taylor 2001), a traditional diagnostic plot for comparing climate model output with a reference data set, usually an observationally based product. In this case, we use the spatial fields of hist-nat coefficients from each model as the "reference" data set, and then calculate the spatial pattern correlation (Pearson correlation across pairs of grid cells) and relative standard deviation (standard deviation of the comparison data divided by the standard deviation of the reference data) for all other experiments from each model. We also calculate the overall CONUS-average bias of the driver coefficients for the various experiments relative to hist-nat. After accounting for uncertainty in the bias, relative standard deviation, and pattern correlation via 95% bootstrap confidence intervals, we tally the number of models/experiments for which the 95% confidence intervals do not differ significantly from the hist-nat (all comparisons are done within-model). For the bias and relative standard deviation, there are straightforward null values for these statistics: a bias of zero and a relative standard deviation of one. For the pattern correlation, we instead calculate the null value as the intra-experiment pattern correlation calculated across the hist-nat ensemble. For the bias and relative standard deviation, the total number of model/experiments is 40 (there are 51 total, 11 of which are the reference hist-nat runs); for the pattern correlation, the total number of models/experiments is 37 since the GFDL-ESM4 hist-nat runs only have a single ensemble member and hence we cannot calculate a within-experiment pattern correlation.
The tallied results are shown in Table 2. For the bias and relative standard deviation, in most cases 80% (or more) of the models/experiments have estimated driver coefficients that are indistinguishable from the hist-nat experiment when aggregating across CONUS. Given that these comparisons are based on 95% confidence intervals, we would expect approximately 95% agreement if the experiments were truly indistinguishable: this is almost always the case for the bias metric, while the relative standard deviation tends to show slightly less agreement with hist-nat. The agreement for pattern correlation is somewhat reduced, relative to the bias and relative standard deviation, although in almost all cases the agreement is better than 50% (and in many cases much higher). This could be due in part to the relatively small histnat ensembles, i.e., the intra-experiment pattern correlation is calculated based on an ensemble of as few as three (in most cases), with only CanESM5 and IPSL-CM6A-LR having more than nine ensemble members. Supplemental Figures G.4-G.8 show the information summarized in Table 2 Table 2 Fraction of model/ experiment pairs that do not differ significantly from the model's hist-nat experiment (40 total for a. and b.; 37 for c.) Bias not different from zero; relative standard deviation not different from 1; inter-experiment pattern correlation does not differ from the intra-experiment pattern correlation. "Does not differ" is determined when the 95% confidence interval includes the null value. Darker colors indicate a higher proportion of pairs that do not differ. Mean/GEV refers to the OLS regression on precipitation rate and the GEV regression on Rx1Day, respectively in a more verbose fashion, from which we can see that biases are generally centered on zero, the relative standard deviations are mostly centered on one, and the intra-experiment pattern correlations are generally as large or larger than the inter-experiment pattern correlations. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the individual forcing agents do not significantly alter the driver versus seasonal precipitation relationships. Hence, we can conclude that the driven component P D (⋅) of seasonal precipitation in Eq. 6 can be well-approximated by a linear combination of the five modes of climate variability considered here.

Isolating WMGHG and SO 2 for single-forcing vs. historical (H6)
In Sect. 4.2, we found that the externally forced component of seasonal precipitation over CONUS in the historical record can be well-approximated by two forcings, namely well-mixed greenhouse gas emissions (which generally causes an increase in precipitation) and anthropogenic aerosols (which generally cause a decrease in precipitation). The conclusion of much of the literature summarized in Sect. 1 (e.g., Min et al. 2011;Knutson and Zeng 2018) is that trends over CONUS are indistinguishable from zero. If the roughly zero signal we want to characterize, say z(t), is the sum of two components x(t) (WMGHGs) and y(t) (anthropogenic aerosols), then this further implies x(t) ∼ −y(t) such that any influence of WMGHGs on seasonal precipitation over CONUS is offset by an equal and opposite influence from anthropogenic aerosols. While a regression analysis should be able to estimate the effect of each forcing so long as WMGHGs and aerosols are sufficiently uncorrelated, it is nonetheless important to ensure that we can accurately distinguish the influence of WMGHGs from the influence of anthropogenic aerosols. Once again, the single-forcing DAMIP experiments (hist-GHG and hist-aer) together with the CMIP6 historical runs provide a test bed for answering this question. Again using the gridcell-specific analyses for these three experiments outlined in Sect. 4.4, note that across experiments we are modeling both x(t) and y(t) as a constant multiplied by a time series of forcing (in the terminology of Eq. 6, this is x(t) = WMGHG F WMGHG (t, WMGHG,m ) a n d y(t) = SO 2 F SO 2 (t, SO 2 ) ). Since the forcing time series are the same for the single forcing and historical runs, then if the experiment-specific estimates of the regression coeff i c i e n t s y i e l d hist-GHG WMGHG − historical WMGHG ∼ 0 a n d hist-aer SO 2 − historical SO 2 ∼ 0 (on average) we can conclude that our estimates of the influence of WMGHGs and anthropogenic aerosols are unbiased. Figure 7 shows the CONUSaverage bias for the WMGHG and SO 2 coefficients for the single-forcing experiments relative to the historical (i.e., single forcing minus historical) along with 95% basic bootstrap confidence intervals. For SO 2 , the bias is indistinguishable from zero for almost every model-season; for WMGHGs, the bias is somewhat more distinguishable from  zero (in that two to five of the models have confidence intervals that do not overlap zero), but in most cases the confidence intervals include zero. For WMGHGs, when the bias is nonzero, it is generally positive, meaning that we (on average) underestimate the influence of WMGHGs in the historical record. Nonetheless, given this general agreement between the single-forcing and historical experiments, we can indeed distinguish the influence of WMGHGs from the influence of anthropogenic aerosols. The same coefficient estimates can be further interrogated to assess whether the single-forcing vs. historical biases for WMGHGs are correlated with the SO 2 biases. Anti-correlation in the grid-cell WMGHG and SO 2 bias would imply that there is a compensating trade off between our estimates of the effect of these two forcings. We seek to minimize equal and opposite errors (t) i n x(t) = WMGHG F WMGHG (t, WMGHG ) a n d y(t) = SO 2 F SO 2 (t, SO 2 ) ) such that the fit to the observed precipitation time series is unperturbed, yet yields biased estimates of WMGHG and SO 2 , i.e.,

x(t) + y(t) → x(t) + y(t) + [ (t) − (t)] = [x(t) + (t)] + [y(t) − (t)]
WMGHG → and To answer this question, we calculate the Pearson correlation of the grid-cell SO 2 and WMGHG biases as well as a 95% bootstrap confidence interval; see Fig. 8. Across a large majority of the model/season/precipitation type, the confidence intervals for the correlation in biases includes zero. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the compensating errors in the effects of WMGHG and anthropogenic aerosol forcing are negligible.

State-dependence of signal-to-noise ratio (H7)
Our last test concerns the signal-to-noise ratio of seasonal precipitation, i.e., 1 − Var P W (t) Var P(t) , which we have assumed does not depend on the background state of the climate system. Note that we make no specific claims about the variability of the weather term P W (t) , nor do we make any claims about the total year-to-year variability of the entire time series P(t). Rather, this concerns the ratio of these variances, specifically over CONUS. This question is of particular importance because the uncertainty due to weather and fast internal variability, i.e., Var P W (t) , is the largest term (by far) in the time series: Risser et al. (2021) find that the noise of this term accounts for 90-95% (or more) of the variance in seasonal mean and extreme precipitation over CONUS. So, if the proportion of variance due to fast internal variability increases even slightly with warming, this reduces the magnitude of a signal that is minimal to begin with.   Fig. 9 Percent of background to total variability for each model and season in the All-Hist experiment (a) along with trends in the percent background variability per • C (b) calculated from the All-Hist and HAPPI simulations, with 95% bootstrap confidence intervals Table 3 A summary of the various hypotheses examined in this paper, along with the model data sets used, our conclusion ("Conclusion"), and a confidence statement ("Confidence") All statements and conclusions are strictly limited to the continental United States for the pre-industrial period to present day; the confidence labels reflect the categories defined by (Mastrandrea et al. 2010). See Appendix E for more information on how each confidence statement is determined † Note: "emiso2" refers to SO 2 emissions; "wetso2" refers to deposition of SO 2 by rainfall; "iso2" refers to column integrated SO 2 mass * Indicates cases where we fail to reject the null hypothesis versus cases where results are conclusive Hypothesis Model data sets used Conclusion Confidence H1 The WMGHG forcing is consistent with ≈ 2%∕K (for means) and ≈ 6%∕K (for extremes).
DAMIP (hist-GHG); CMIP6 piControl and 1pctCO2 Yes Likely H2 All cross-correlation terms in the forced component of seasonal precipitation are negligible.

AerChemMIP (piClim experiments) Yes Very likely
H4c Regionalized time series of emiso2 yield the same results as using either local or CONUS-wide estimates. To address this hypothesis, we can utilize the C20C+ and HAPPI experiments described in Sect. 3.1. The aforementioned analysis in Risser et al. (2021) utilized a large ensemble from the C20C+ All-Hist experiment; we repeat the analysis here with both the C20C+ All-Hist and HAPPI Plus15-and Plus20-Future. Specifically, for the five models (Table F.5 in Supplementary material) that provide at least 50 members over at least 10 years for each experiment, we calculate three quantities: the background variability, calculated as the across-year average of the within-year ensemble variance (denoted V B in Risser et al. 2021); the total variability, calculated as the ensemble average of the across-year variance (denoted V T in Risser et al. 2021); and the proportion of background vs. total variance. All quantities are first calculated for each CONUS grid cell and then averaged; uncertainty in the background (total) variability is quantified by bootstrapping years (ensemble members). The primary justification for comparing the intra-ensemble variability to weather noise is that the ensembles in each of the C20C+ and HAPPI experiments have identical upper (incoming solar) and lower (SST) boundary conditions as well as identical radiative forcings. The identical lower boundary conditions imply that long-term modes of internal variability (e.g., ENSO) that manifest in SST are also identical. Since the forcings, upper and lower boundary conditions, and long-term SST-driven modes of variability are the same across each ensemble, we can confidently postulate that the intra-ensemble variability is driven by the O( ) ⋘ 1 (i.e., limit of machine precision) perturbations added to the SST at the time t = 0 for each ensemble member. These perturbations result in rapid solution separation (Zhang et al. 2019) and variance across the ensemble in instantaneous meteorological states, which we treat as weather noise (Risser et al. 2021). Figure 9 shows the percent of background variability across models and in each season for the All-Hist experiment as well as trends in the percent variability with warming, calculated from the All-Hist ( +0.85 • C), Plus15-Future ( +1.55 • C), and Plus20-Future ( +2 • C) experiments; Supplemental Figure G.9 shows corresponding plots for the total and background variability. For the models considered here, the percent background variability is roughly 80-95% for mean and 90-98% for extreme precipitation, which is consistent with what Risser et al. (2021) found for a corresponding CAM5.1-1degree ensemble. Across mean and extreme precipitation, there are indeed increases in both the background and total variability with warming (see Supplemental Figure  G.9) which are in almost all cases significant. The general increases in both background and total variability, however, translate to roughly consistent proportion of background variability: in Fig. 9b the trends in the percent of background variability are indistinguishable from zero in all models, seasons, and for both mean/extreme precipitation. This result supports our confidence assessment in Appendix E that it is virtually certain (as precisely defined by Mastrandrea et al. (2010)) that the signal-to-noise ratio 1 − Var P W (t)∕Var P(t) is unaffected by warming.

Discussion
In this paper, we have described novel methodology to address the difficult problem of conducting regional detection and attribution for mean and extreme precipitation. Our framework can simultaneously account for anthropogenic forcing (the "signal") while explicitly modeling the natural variability (the "noise"). Furthermore, we use output from global climate models in a perfect data sense to rigorously test the application of a general formula for modeling a time series of precipitation. Modeling uncertainty is wellknown to be one of the main obstacles to conclusive attribution statements for regional precipitation, but here we turn this limitation into an opportunity by making sure the D&A methodology developed in this paper is insensitive to model uncertainty. Our focus in this paper has been on the CONUS due to the relatively low observational uncertainty, with a forthcoming paper applying our framework to century-length records of in situ measurements of seasonal precipitation. However, in principle, the series of hypotheses enumerated in this paper could be similarly applied to other global land regions to develop an appropriate formula for regional D&A of precipitation.
The analyses described in this paper are extensive and involve a large set of global climate model output. We have already attempted to summarize the scientific process that led to the development and testing of each hypothesis with the flow chart shown in Fig. 1. To further summarize our conclusions, we developed Table 3, which restates the hypotheses, the model data sets used for each hypothesis, our conclusions, and a measure of our confidence in each conclusion. Table 3 also denotes the hypotheses for which we failed to reject the null versus cases where we have definitively verified a conclusion. The confidence statement attempts to summarize the evidence for each hypothesis, albeit in a highly aggregated manner; nonetheless, we intend that a high-level statement for each of the lengthy analyses in the paper will provide a helpful and concise summary of our conclusions. Specific information on how these labels are determined is provided in Appendix E. The specific confidence labels (e.g., likely, very likely, etc.) reflect the categories defined by the Intergovernmental Panel on Climate Change (Mastrandrea et al. 2010). We reiterate that in this work the categories do not summarize a multi-expert assessment but instead provide a qualification of evidence obtained from the model analysis.
Traditional D&A techniques are based on the concept of Pearl causality (Hannart et al. 2015), which is applicable when using model-based counterfactual methods where an "intervention" can be applied and assessed. For example, the C20C+ D&A experiment explores pairs of experiments: one set includes anthropogenic and natural forcings, and the other includes natural forcings only. In this case, the "intervention" is anthropogenic forcing. When observations can be successfully projected onto the anthropogenicallyforced experiments, the reliance on global climate models is a strength because a clear causal statement can be made regarding the presence of human-induced climate change. However, when the goal is regional D&A for precipitation, the large internal variability combined with large model uncertainty (to say nothing of observational uncertainty) essentially prevents conclusive attribution statements regarding any anthropogenic influence. Of course, this may change in the future as models improve or the anthropogenic signal intensifies. Furthermore, there could be value in being selective with respect to which GCMs are used: first, one might validate a set of models for their ability to simulate salient features of the precipitation regime governing a regional hydroclimate (e.g., Gleckler et al. 2008) and rely on the most realistic models for D&A. Meanwhile, the methods identified in this paper incorporate the guidance of Sarojini et al. (2016) and are designed to maximize our usage of existing models and data sets. Our philosophy in this work uses climate models to derive and test a formula for regional D&A, but we plan to then set models aside and construct maps of attributable changes in precipitation using observations only. Such attribution statements use the concept of Granger causality (Granger 1969), which is complementary to more traditional techniques that use Pearl causality. Papers such as Risser and Wehner (2017) successfully made such Grangercausal statements for localized extreme precipitation over the Houston, Texas region in the aftermath of Hurricane Harvey. Granger-causal attribution statements are not as controlled as Pearl-causal statements since they do not involve a set of interventions, but such analyses can both motivate dynamical studies (e.g., the Hurricane Harvey study provided motivation for Patricola and Wehner 2018;Wang et al. 2018) and enhance confidence in attribution statements by using multiple analysis techniques to explore the causes of climate change. And, when climate models prove to be an insufficient tool for Pearl-causal attribution (as in the case of regional precipitation), Granger-causal statements are an important tool for both quantifying the local impacts of climate change on the water cycle and informing methodological developments that utilize Pearl causality. We note that Granger-causal statements are subject to the usual caveats regarding the effect of hidden covariates (i.e., "correlation does not imply causation"). However, using both Pearl and Granger causality together as we have done here removes some of the usual Granger limitations: the correlation identified in an observational analysis using the framework developed in this paper may indeed be causation. As Samset et al. (2016) showed, the response of global precipitation to forcing by CO 2 is dominated by a slow response associated with the effects of warming SSTs. SSTs have a lagged response to instantaneous WMGHG forcing due to thermal inertia of the oceans, which have absorbed over 90% of the excess energy gained by the climate system (Resplandy et al. 2019) and have a O(10 3 ) higher heat capacity than the atmosphere. To include the lagged response of precipitation to instantaneous WMGHG forcing in our general D&A formula, let us hypothesize that the regional precipitation P(t) tracks changes in global mean SST T(t) as a function of time t and that lags in the precipitation response are due to time lags in the response of the global temperature:

A1. Lagged effects of forcing on precipitation
The time evolution of T(t) is dictated by the first law of thermodynamics applied to the global ocean: Define a timescale = c p h∕ . Then the first law can be written as: The solution is: R = Unitless ratio of regional-to-global precipitation = Global precipitation sensitivity: 2%∕ K for mean and 6%∕ K for extremes In the limit = 0 , one can show that, as expected from Eq. 12 when the system is in equilibrium, This follows from Eqs. 13 and 14 if we define an integration variable x = (t − t � )∕ so that Substituting Eq. 13 into Eq. 11 gives an expression for the coefficients that appear, for example, in Eq. 2 for the forced component: where we have introduced the possibility that might exhibit dependence on . The lagged time series F(t, ) calculated from Eq. 14 using the WMGHG forcing time series F(t) from Fig. 2 is shown in Fig. 10 for = 1, … , 20.

A2. Decreasing lagged forcing with increasing lag timescales
The variation of the lagged forcing with lag timescale can be established by computing  The lagged WMGHG forcing F(t, ) calculated from Eq. 14 for ∈ {1, … , 20} (a), as well as the estimated WMGHG regression coefficient ( ) for global average annual precipitation rate as a function of the time lag (b). Estimates in b include 95% basic bootstrap confidence intervals. Note that a shows the decreasing lagged forcing with increasing lag as derived in Eq. 17 and b shows the increasing trend in ( ) with increasing lag derived in Eq. 20.

Note that term A is negative indefinite assuming
If P(t) is prescribed and the forcing obeys the assumptions used to obtain Eq. 17, then Eqs. 15 and 17 imply that The positive trend in ( ) with respect to is observed the DAMIP WMGHG-only runs (Fig. 10).

A3. Estimation of lag timescale using Gregory regression plots
Suppose F(t) = F 0 , i.e., a constant. This is the case in the abrupt4xCO2 runs, which we analyze to estimate . Then the solution for T(t) is This implies that the net energy input to the climate system, evaluated at the top of the atmosphere (TOA), which is just the RHS of the 1st law of thermodynamics, is This is indeed the behavior that's observed in climate models for t ⪅ 20 years in Gregory regression plots of N(t) versus T(t) (Gregory et al. 2004;Meehl et al. 2020). Then the models typically shift to a much slower rate of change.
The time-series of ln N i (t) from the abrupt4xCO2 MME indexed by i for the various models and ensemble members is computed via where the radiative fluxes on the RHS of Eq. 21 have been globally and annually averaged and are denoted by the names of CMIP6 output variables (Centre for Environmental Data Analysis (CEDA) 2016). We have determined three timescales f ,i , s,i , and b,i characterizing the "fast" and "slow" evolution of N i (t) and the "break" between these two regimes as follows: where Θ is the Heaviside function and F 0 = 8.053 is determined from the formulae in Etminan et al. (2016).
We use a Bayesian least squares fit to estimate the timescales f ,i , s,i , and b,i in Eq. 25 using the N i (t) time series from each GCM with abrupt4xCO2 ensembles in Table  F.2 in Supplementary material, aggregating over multiple ensemble members when available. A Bayesian framework is useful for ensuring that the timescales are consistent, i.e., b,i < f ,i and s,i < f ,i , as well as quantifying uncertainty in the break point b,i . Also note that the intercept ln F 0 is fixed at its theoretical value of 8.053. To illustrate the analysis, we show results from the two-member CanESM5 ensemble and six-member MRI-ESM2-0 ensemble; see Fig. 11. The natural log of the net radiative input in each year is shown along with the estimated timescales f ,i , s,i , and b,i . In general, the timescale i used to calculate the lagged WMGHG forcing time series F(t, i ) is set equal to f ,i if f ,i < 33 (the 90th percentile of the multi-model ensemble) and b,i otherwise. For these two models, we have i = 14 for CanESM5 and i = 11 for MRI-ESM2-0. The estimated timescales for all other CMIP6 models with abrupt4xCO2 data are shown in Table 4. For models without abrupt4xCO2 data (e.g., FGOALS-g3), we use the CMIP6 multi-model ensemble average of = 14.

B. Test of H1 under significantly increased warming
To assess our test of H1 under much greater levels of warming than is imposed on the CMIP6 historical experiment, we can utilize climate model runs from the CMIP6 DECK experiments piControl and 1pctCO2 introduced in Sect. 3.1. For each CMIP6 model considered, we first extract the global average of mean daily precipitation and Rx1Day as well as monthly surface temperature for each year in both the piControl and 1pctCO2 runs. Furthermore, we derive the time series of WMGHG forcing for the 1pctCO2 runs using the specified boundary conditions for the 1pctCO2 experiment and various formulae described in Sect. 3.2. We then estimate the changes in precipitation (mean and extreme) using both the raw values of surface temperature and a regression versus WMGHG forcing, for each of two formulations. In formulation 1, the baseline temperature and precipitation come from the last 50 years of the piControl run; then, for year y = 50, 51, … , N yr of each 1pctCO2 run (where N yr is the number of years in the simulation), we compare the precipitation and temperature measurement in year y with the baseline (here, piControl) precipitation and temperature values to estimate the change in precipitation ΔP (%), change in temperature T (K), and scaling factor ΔP∕ T . In formulation 2, we instead use a moving window approach, wherein year y − 49 is the baseline year; in other words, we look at successive 50-year subsets of the 1pctCO2 run. For both of these formulations, we further calculate the precipitation totals in two ways: first, using the "raw" values, i.e., the annual precipitation rate and Rx1Day in the baseline and comparison years; second, we regress the temperature, precipitation rate, and Rx1Day versus the WMGHG forcing (temperature and precipitation rate via ordinary least square; Rx1Day using a GEV distribution as in Eqs. 7 and 8). In the second approach, the starting and ending precipitation/ temperature value come from the fitted regression analysis. The results are shown in Fig. 12

C. Correlations between WMGHGs and SO 2 emissions for 1900-present
As was mentioned in Sect. 4.3.3, there are very strong correlations between the SO 2 emissions time series and the annual WMGHG time series across CONUS for 1900-present when considering the individual 0.5 • × 0.5 • grid cell measurements of SO 2 emissions Gidden et al. 2018). On the other hand, the CONUS-wide sum-total emissions time series over the same time period has a much weaker correlation with the WMGHG time series. We now provide additional details to explain this somewhat counterintuitive result. In the next few paragraphs, we focus on the annual SO 2 emissions time series, although similar results hold for the seasonal time series used for the analysis in Sect. 4.3.3.
The average annual SO 2 emissions at each grid cell are shown in Fig. 13a, d for 1850-2015 and 1900-2015, respectively. Even with the color bar shown on the logarithmic scale, it is clear that the typical SO 2 emissions show significant heterogeneity across CONUS: in most of the eastern U.S. (east of 95 • W), there are moderate emissions, with particularly strong emissions at locations that presumably include an industrial center. Alternatively, in the rural areas of the central U.S., the emissions are much smaller (by a factor of > 100 ), particularly across large swaths of the Northern Great Plains. Indeed, this heterogeneity is what led us to initially consider using spatially resolved local estimates of SO 2 emissions in the analyses throughout this paper.

D. The algorithm for the Stochastic Regionalizer
The Stochastic Regionalizer (SR hereafter) is designed to divide a connected region of the Earth's surface, in this case the continental United States, into non-overlapping domains with specific properties to facilitate separating the effects of WMGHG forcing F WMGHG (t) and SO 2 emissions F aer (t) on precipitation. The specific properties are set by a userselectable cost function C( , t) that depends on location and time t. In general, C( , t) is a function of the time series of these forcings: The SR operates on data on rectilinear grids from the family of cylindrical projections. The locations correspond to points in the center of each grid cell. Let R n = [ 1 , 2 , … , n ] denote a region of n adjacent grid points in contact with each F aer ( , t), t other along either meridians or circles of latitude, i.e., not at just at the corners of the grid boxes. Let denote the area-weighted regional average of SO 2 emissions where a i is the area of the grid cell centered on i . The cost function can also be computed on this region as: The SR constructs regions R n such that the absolute value of the cost function falls below a user-specified threshold C max : The algorithm is stochastic to ensure that our results are insensitive to the precise geographical details of any single regional partitioning. In our application of the SR to test hypotheses H4, H5, and H6, for each season we generated 100 realizations of regional partitioning differentiated by random numbers utilized throughout the construction of each region.
The SR algorithm > 0 m is as follows: We have utilized two cost functions in the SR. Originally we chose to minimize the Pearson correlation (x, y) between two vectors x and y implemented as We set C max using the student t-test. Let be the significance level for the null student t-test hypothesis H 0 that the two time series x and y are uncorrelated. We use = 0.5 . We solve for the maximum Pearson correlation coefficient consistent with this significance level by solving for the t value from the cumulative student t-test distribution function, then inverting the relationship between t and to obtain max = C max .
Let n be the length of the time series x and y, and define = n − 2 . Denoting the cumulative distribution for the student t-test by T(t, ) , we solve for the test statistic t from and I is an incomplete Bessel function. Using we obtain (32) C(R n , t) = � F WMGHG (t), ⟨F aer (t)⟩ � �R n � (33) = 2 1 − T(t, ) For the tests of hypotheses H4, H5, and H6, we used a different cost function designed to minimize the baseline emissions and change in emissions of SO 2 in each region. After fitting ⟨F aer (t)⟩ � �R n with a + b t using linear least squares, we set the cost function to We set C max to a user-settable fraction f ≪ 1 of the 1970 (near maximum) CONUS-mean anthropogenic flux of SO 2 expressed in units of kg/m 2 /s, the same as units those used in input4MIPs (Feng et al. 2020). Adopting an area A = 7663941.7 km 2 for the 48 contiguous states and 1970 SO 2 emissions of E = 31218000 metric tons (EPA 2020), we obtain a CONUS-mean annually-averaged flux of SO 2 given by In our tests of hypotheses H4, H5, and H6, we set f = 0.03 and set To better separate the effects of WMGHGs and SO 2 emissions on precipitation (Fig. 14), we exploit the fact that SO 2 emissions for CONUS rose until the early 1970s, after which the emissions have recently fallen to levels typical of the 19th century (Figs. 2 and 15) due to imposition of mitigation measures for acid rain (NAS 1981). Let t max be the year when CONUS-mean SO 2 emissions F SO 2 (t max ) = max t F SO 2 (t) , and denote years t < t max by t < and times t > t max by t > . For each season, we have constructed series of pairs of years (t <,i , t >,i ) such that the seasonally-averaged CONUS-mean Each colored line extends between a pair of times (t <,i , t >,i ) F SO 2 (t <,i ) ≃ F SO 2 (t >,i ) (Fig. 15). We denote differences in a quantity x(t) between times t <,i and t >,i by We then define a fractional difference in emissions by The series of year pairs and values of e i are listed for each season in Table 5. The average absolute values of e i are less than 0.5% throughout the annual cycle. This implies that at the CONUS scale, time series of i F SO 2 (t) exhibit minimal trends with time. However since the lagged WMGHG forcing is monotonically increasing with time, time series of i F WMGHG (t, WMGHG ) increase rapidly with increasing i t = t >,i − t <,i . We use this difference in the temporal trends of increments in forcing by SO 2 and WMGHGs to facilitate separating the effects of these agents on precipitation. The cost function we employ in the SR algorithm is a form of Eq. 28 modified as follows:

E. Confidence statements for each hypothesis
As described in Sect 5, we provided a summary of the extensive analysis in this paper in Table 3, which includes a measure of our confidence in the conclusion of each hypothesis. Any attempt to distill the results of each analysis might feel oversimplified since the various hypothesis analyze numerous models, four seasons, and two types of precipitation (mean and extreme). Nonetheless, such statements provide an important summary that can quickly communicate the results of our explorations and the strength of evidence for (44) each test. We now describe in greater detail how these confidence statements are derived. The general principle used to derive the confidence statements is to consider the proportion of models, seasons, precipitation types, etc. for which the hypothesis is verified. While this approach implicitly assumes model democracy (i.e., the proportion is not weighted based on any assessment of model quality; see e.g., Knutti et al. (2017)) and furthermore does not account for model dependence (i.e., that the behavior of multiple models may be related due to similar model components; see e.g., Knutti et al. (2013)), more refined statements are beyond the scope of the current work. The conclusions for an individual model/season/etc. is made based on the error bars of a 95% confidence interval and hence robustly accounts for uncertainty throughout. Using a fixed confidence level (95%), the proportion of individual comparisons that satisfy each hypothesis is then compared with the categories in Table 1 of Mastrandrea et al. (2010): ≥ 0.99 implies "virtually certain," ≥ 0.9 and < 0.99 implies "very likely," ≥ 0.66 and < 0.9 implies "likely," ≥ 0.33 and < 0.66 implies "about as likely as not," ≥ 0.1 and < 0.33 implies "unlikely," ≥ 0.01 and < 0.1 implies "very unlikely," and < 0.01 implies "exceptionally unlikely." We now describe the specific quantities used to determine the confidence label for each hypothesis.
Hypothesis 1. The quantities of interest here are the DAMIP hist-GHG scaling estimates shown in Fig. 3b. Our standard of comparison for this hypothesis is the interquartile range of scaling rates estimated for CMIP5 models in Kharin et al. (2013) and shown with blue bars on Fig. 3b: 1.7%K −1 to 2.2%K −1 for mean precipitation and 4.3%K −1 to 8.0%K −1 for extreme precipitation. Across the 11 models and precipitation type (mean/extreme), we count the number of scaling estimates for which the 95% confidence interval is indistinguishable from the Kharin et al. (2013) Fig. 3c. Our standard of comparison for this hypothesis is the gray 95% uncertainty band for each precipitation type from a linear fit to the model estimates: since this uncertainty band covers the 1-1 line for the range of estimates in the historical and hist-GHG experiments, we tally the number of models that are consistent with this uncertainty band (i.e., the individual model CIs overlap with the uncertainty band in at least one axis): Hypothesis 3. The quantities of interest here are the singleforcing trend estimates summarized in Fig. 4, given for each experiment, model, season, and precipitation type. For each single-forcing experiment (hist-Lu, hist-GHG, hist-aer, histnat, and hist-stratO3), we tally the proportion of models/ seasons/precipitation type for which the trend error bars do not include zero. Here, "Number of comp." indicates the number of models/seasons/precipitation type for which we have trend estimates, "# CI ≠ 0" indicates the number of trend confidence intervals that do not include zero, "Confidence" is the third column divided by the second column, and "Label" converts the proportion to the IPCC categories: The individual forcings that we deem non-negligible are those that are at least likely, i.e., WMGHGs and anthropogenic aerosols, while the others (LULCC, natural, and stratospheric ozone) are deemed negligible.
Hypothesis 4a. For this hypothesis, we consider the number of model-season pairs for which the sum of black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia cannot be reliably distinguished from the control run for Northern Hemisphere land regions. Here, "does not differ" is determined when the 95% confidence interval includes 0. As shown in Table 1 Hypothesis 4b. The quantities of interest here are the correlations between changes in the various SO 2 types and precipitation summarized in Fig. 5 across models, seasons, and precipitation type. For the models that have output data for at least two of the three SO 2 types (emiso2, iso2, wetso2), we tally the pairs of error bars that overlap. Here, "Number of comp." indicates the number of pairs of SO 2 type across models/seasons/precipitation types, "# overlap CI" indicates