Explaining the unexplainable: leveraging extremal dependence to characterize the 2021 Pacific Northwest heatwave

In late June, 2021, a devastating heatwave affected the US Pacific Northwest and western Canada, breaking numerous all-time temperature records by large margins and directly causing hundreds of fatalities. The observed 2021 daily maximum temperature across much of the U.S. Pacific Northwest exceeded upper bound estimates obtained from single-station temperature records even after accounting for anthropogenic climate change, meaning that the event could not have been predicted under standard univariate extreme value analysis assumptions. In this work, we utilize a flexible spatial extremes model that considers all stations across the Pacific Northwest domain and accounts for the fact that many stations simultaneously experience extreme temperatures. Our analysis incorporates the effects of anthropogenic forcing and natural climate variability in order to better characterize time-varying changes in the distribution of daily temperature extremes. We show that greenhouse gas forcing, drought conditions and large-scale atmospheric modes of variability all have significant impact on summertime maximum temperatures in this region. Our model represents a significant improvement over corresponding single-station analysis, and our posterior medians of the upper bounds are able to anticipate more than 96% of the observed 2021 high station temperatures after properly accounting for extremal dependence.


Introduction
The immediate synoptic causes of the deadly and unprecedented heatwave across the US Pacific Northwest (PNW) in 2021 are quite clear. An eastward moving wave train propagated from the western subtropical Pacific and developed into an omega atmospheric blocking pattern centered over Western Canada (McKinnon and Simpson, 2022). Here, the term omega alludes to the shape of the mid-latitude jet stream, a narrow, persistent band of high winds aloft, during such meteorological conditions, with relatively warm air and high pressure occurring at the surface under the northward bend of the jet stream. Popularly known as a 'heat dome', the resulting local high pressure and associated anticyclonic circulation caused downward air motion and heating by adiabatic compression (Wang et al., 2022), while clear skies enhanced solar heating at the surface (Mo et al., 2022). Such meteorological conditions are common during heatwaves, however this one differed from known cases in that the air above the Pacific Northwest was pre-heated by condensation (clouds and precipitation) within an atmospheric river that occurred several days prior over the Pacific Ocean. This pre-heated air then rose further in temperature as it descended toward the surface and compressed due to increasing pressure (adiabatic compression). In short, an unusual summer trans-Pacific atmospheric river injected latent heat energy and moisture raising temperatures further (Mo et al., 2022). Furthermore, the intensity of the blocking pattern may been influenced by an Arctic polar vortex split preceding the heatwave (Wang et al., 2022). Despite this complicated meteorology, the unprecedented temperatures and other heat indices experienced during the 2021 PNW heatwave were well predicted several days in advance by forecast centers (Emerton et al., 2022). Undoubtedly, anthropogenic climate change also played a role in the 2021 PNW heatwave but its far outlier temperatures challenge extreme value statistical analyses to confidently predict the event's rarity from previous observations (Bercos-Hickey et al., 2022;Philip et al., 2021). In these two previous studies, the upper bound of a Generalized Extreme Value (GEV) distribution fitted to observed maximum summer daily maximum temperatures, from station data prior to 2021, was lower than the hottest observed 2021 temperatures. As a result, it is impossible to explain the true abnormality of this event or quantify the probability of a similar event in the future using their nonstationary extreme value statistical methods.
Alternatively, Bercos-Hickey et al. (2022) performed a storyline hindcast attribution study using two regional climate models (the Weather Research and Forecasting model, WRF, and RegCM) to attempt to quantify the magnitude of the human influence without addressing the question of the changes in rarity. Their study found that human-induced climate change increased the 2021 PNW heatwave temperature by approximately 1 • C. Philip et al. (2021) included the 2021 temperatures in a fitted GEV distribution and used that information along with summer annual maximum temperatures from long climate model simulations to find a human increase in the 2021 PNW heatwave temperature of about 2 • C. Based on this in-sample GEV analysis, they further concluded that the event was "virtually impossible" without anthropogenic climate change. However, Bercos-Hickey et al. (2022) found that including the 2021 temperatures decreases the p-values of the χ 2 goodness-of-fit tests from greater than 0.2 in general to less than 0.05 at many stations, implying that the 2021 seasonal maxima are generated from a different underlying distribution than the previous years. McKinnon and Simpson (2022) used a large climate model ensemble simulation to examine gridboxes across the global land where simulated maximum temperatures exhibited similar skewness and kurtosis to the observed station temperatures in the PNW region. They found that while the model could produce temperatures that exceed 4.5 standard deviations above average temperatures (similar to the 2021 PNW observations), such occurrences are extremely rare with return periods of about 100,000 years. Finally, Heeter et al. (2023) examined tree ring records finding that the 2021 summer temperatures in the PNW regions were unprecedented in the last millennium.
In this paper, our objective is to examine the statistical difficulties of estimating the probability of far outlier temperatures along with human-and naturally-driven changes to such quantities. Such estimates are required to make the attribution statements that advance our understanding of how anthropogenic forcing and natural climate variability affected the rarity of the PNW heatwave, similarly to those in Philip et al. (2021) and Bercos-Hickey et al. (2022). The primary challenge is that the out-of-sample GEV analyses in these two studies both yielded estimated upper bounds for temperature extremes lower than the observed 2021 daily maxima, meaning the PNW heatwave was impossible to anticipate in advance of its occurrence under station data-based models. Consequently, the human influence on these "impossible" events cannot be attributed, either in likelihood or severity because they are not within the bounds (i.e., supports) of the out-of-sample model fits. This unexplainability is largely due to the fact that analyses in the two studies represent simplified approaches to the problem of estimating the spatial extent of an event like the PNW heatwave: Philip et al. (2021) analyzed the spatial average of annual maxima from all weather stations in a fixed latitude-longitude box (which both aggregates over important topographical variability in the PNW domain and ignores the fact that historical events may have been offset geographically from this box), while Bercos-Hickey et al. (2022) utilize single-station analyses (which ignore both autocorrelation in the climatology of temperature extremes and the fact that individual heatwave events will impact multiple weather stations). As outlined in Zhang et al. (2022), failing to account for the spatial coherence of individual events results in underestimation of the risks associated with weather extremes (Davison et al., 2012;Saunders et al., 2017). Our hypothesis is that extreme value analysis methods that more appropriately account for both climatological-and event-based-autocorrelation (e.g., Huser et al., 2017;Huser and Wadsworth, 2019;Zhang et al., 2021) can better characterize temperature extremes in the PNW such that the 2021 temperatures are less "unexplainable" in an out-of-sample statistical sense. Importantly, these methods should also improve the predictive power of the out-of-sample analysis on the 2021 PNW heatwave so that we can construct well-defined attribution statements.
To test this hypothesis, we analyze the summertime maxima of homogenized Pacific Northwest U.S. temperature station data from 1950 to present using a statistical model that flexibly accounts for spatial variability in the climatology of extreme temperatures and yields data-driven measures of the spatial extent of heatwave events in the PNW. Our statistical framework can account for both secular trends due to anthropogenic and naturally forced year-to-year variability while robustly quantifying uncertainty via Bayesian hierarchical modeling. We explicitly quantify the individual effect of including physical covariates versus accounting for the spatial coherence of heatwave events on the corresponding GEV upper bounds and whether or not the 2021 temperature extremes could have been anticipated in an out-of-sample sense. We then revisit the climate change attribution question, using a statistical counterfactual (following Risser and Wehner, 2017) invoking Granger causal inference (Granger, 1969) to isolate the effect of anthropogenic forcing on both the predictability of the 2021 temperatures and their probabilities.
The paper proceeds as follows. Section 2 describes all input data sources for our analysis, while in Section 3 we outline our methodological innovations, including the construction of the Bayesian hierarchical model (Section 3.1), an attribution framework for isolating the human influence on the heatwave (Section 3.2), and descriptions of alternative statistical models (Section 3.3). Our main results are provided in Section 4, and Section 5 concludes  2 Data 2.1 Long-term temperature records for trend analysis In order to assess secular trends over time and year-to-year variability that are unaffected by inhomogeneities from nonclimatic influences (such as changes to measurement devices and the local effects of the built environment), we analyze a homogenized version of the Global Historical Climatology Network-Daily (GHCN-D) over the contiguous United States (Rennie et al., 2019). While the homogenized records extend back to the early 20th century, we focus our analysis on the last 71 years  due to the high density of nonmissing measurements from weather stations during this period. Focusing our attention on the 487 gauged locations from this database within a longitude-latitude box defined by [125 • W, 116.5 Supplemental Figure A.6(b)), we extract the summertime (June/July/August or JJA) maximum daily maximum temperature measurement, denoted "TXx," from each station in each year so long as there are no more than 40% missing daily measurements in each JJA season. Stations with less than ten non-missing JJA TXx from 1950-2020 are excluded; this results in n = 438 gauged locations with sufficient temperature records. The seasonal maxima from these locations are used in the statistical analysis described in Section 3; note that we do not include any measurements from 2021 in our analysis of the historical record (i.e., the 2021 event is considered out-of-sample).

Estimation of the 2021 PNW temperature extremes
Homogenized temperature records are used for trend analysis because they explicitly modify the raw station measurements to account for known, non-natural shifts in the distribution of daily temperatures. However, to quantify the extreme daily temperatures that actually occurred during the heatwave period (defined as June 25 to July 4, 2021), we prefer to use the available non-homogenized GHCN-D (Menne et al., 2012) records from the longitudelatitude box shown in Figure 1. From the GHCN-D records with available daily maximum temperatures over these ten days, we see that a large majority of the maximum daily measurements (the 2021 TXx) occurred between June 26 and July 1 (see Supplemental Figure A.2); there are n = 470 GHCN-D gauged locations with six non-missing daily maximum temperatures over these six days (see Supplemental Figure A.3). Unfortunately, there is a mismatch in the n = 438 homogenized sites used to estimate historical trends and the n = 470 GHCN-D sites used to define the 2021 event (see Supplemental Figure A.5a.).
To ensure that we can report results for all gauged locations in the homogenized records, we apply the spatial statistical model proposed in Eq. (5), wherein we use thin plate splines and topographical covariates to conduct interpolation to a regular 800m grid over the PNW domain. Supplemental Figure A.4 shows the GHCN-D TXx used as input data as well as the results of this interpolation for the n = 470 GHCN-D sites (Supp. Fig A.4(b)), the n = 438 homogenized sites (Supp. Fig A.4(c); these values are identical to the ones shown in Figure 1), and the 800m grid (Supp. Fig A.4(d)). As shown in Supplemental Figure A.5(b) and (c), the interpolated TXx measurements compare favorably with the GHCN-D input data (R 2 = 0.94), and furthermore provide a better reconstruction than corresponding measurements from the nearest-neighbor ERA5 grid box (ERA5 is a stateof-the-art atmospheric reanalysis product, Hersbach et al., 2020;Bell et al., 2021, that is widely used in climate analysis).

Anthropogenic and natural drivers of temperature change
One benefit of our methodology is that we can include physical covariates to account for spatio-temporal nonstationarity in GEV distributions to describe both long-term trends in the distribution of temperature extremes as well as year-to-year variability due to modes of natural climate variability (see Section 3). Secular trends. Dating back to seminal work by Arrhenius (1897), it is well-established that anthropogenic greenhouse gas (GHG) emissions drive increases to global mean temperature via radiative forcing of the climate system; this was most recently underscored in the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report, which stated "Human activities, principally through emissions of greenhouse gases, have unequivocally caused global warming" (Lee et al., 2023). Following Risser et al. (2022), we use the sum-total forcing time series of the five well-mixed greenhouse gases (CO 2 , CH 4 , N 2 O, and the CFC-11 and CFC-12 halocarbons) to describe long-term, human-induced secular trends in the distribution of extreme temperature measurements. The radiative forcing time series (see Figure A.1(a)) is calculated from reconstructed atmospheric concentrations of each greenhouse gas (Meinshausen and Vogel, 2016;Meinshausen and Nicholls, 2018) via the forcing formulae given in Etminan et al. (2016) and Hodnebrog et al. (2013); a time lag is also included to account for the lagged response of sea surface temperatures to these forcings (see Risser et al., 2022, for further details). Thus, the greenhouse gas forcing time series, denoted by GHG t , t = 1950, . . . , 2021, characterizes a nonlinear trend over 1950-2021, and is applied uniformly across space.
Large-scale modes of oceanic variability. The El Niño-Southern Oscillation (ENSO) is a coupled ocean-atmosphere mode of natural climate variability that cycles between positive (El Niño) and negative (La Niña) phases every two to seven years (Philander, 1985). While the effect of ENSO on winter climate in the Pacific Northwest is robust (Whan and Zwiers, 2017), ENSO also influences the large scale circulation in the region during the summer (Stuecker et al., 2015). Given the complicated dynamics responsible for the 2021 PNW event, we use the ENSO Longitude Index (ELI, Williams and Patricola, 2018) as a covariate to represent this influence. ELI is a sea surface temperature-based index that summarizes the average longitude of deep convection in the Walker Circulation and accounts for the different spatial patterns of ENSO. The JJA seasonally-averaged ENSO time series is applied uniformly across space and is denoted by ELI t , t = 1950, . . . , 2021; see Palmer Drought Severity Index. Heatwave temperatures are strongly influenced by evapotranspirative cooling from the surface soil moisture content and the local vegetation (Domeisen et al., 2023). Furthermore, high temperatures can reduce this available surface moisture. To capture this feedback in our statistical model we use the Palmer Drought Severity Index (PDSI) as another physically motivated covariate (Palmer, 1965). PDSI quantifies the relative surface dryness through moisture supply and demand by calculating the cumulative departure in water balance using precipitation and temperature data. In our analysis, we use the PRISM monthly PDSI data from the West Wide Drought Tracker (Abatzoglou et al., 2017), and flip the sign of their indices so that PDSI ranges from about -10 (wet) to +10 (dry) with values above 3 representing severe to extreme drought. We then calculate the JJA average for each year t = 1950, . . . , 2021 at each spatial location s (denoted by PDSI t (s)). Just prior to the 2021 PNW heatwave, much of Washington State and Oregon were in preexisting severe drought conditions (see Figure A.1(c)).
Urbanization Binary Index. The urban heat island effect is another well-known anthropogenic factor influencing temperature distributions within urban and suburban environments (e.g., Chen and Dirmeyer, 2019) due to differences in heat fluxes (both sensible and latent) and momentum fluxes (e.g., surface roughness) that impact the evolution of daily temperatures (Oleson et al., 2011). To account for the effects of urbanization in a parsimonious manner, we treat urbanization as a binary covariate. Markley et al. (2022) developed a reliable "urbanization year" indicator, based on United State census data, that denotes if and when census tracts became "urbanized" during 1940-2021. They define a region as urbanized if its housing density exceeds 200 housing units per square mile. To model the potential urban heat island effect, we use the dataset of Markley et al. (2022) to create a binary variable Urban t (s) for each year t = 1950, . . . , 2021 to indicate whether the census tract surrounding the station s has been urbanized in year t by finding the corresponding census tract and its urbanization year.

Characterizing spatial nonstationarity in GEV climatology
The anthropogenic and natural drivers described in Section 2.3 are used to account for nonstationarity over space and time in the distribution of extreme temperature measurements (see Eq. (4)). However, we use an additional set of covariates to statistically model spatial nonstationarity in the parameters of the extreme temperature distributions (described in Eq. (5)). In addition to thin plate splines, the following spatial-only covariates are used to statistically model spatial variability in distributional parameters: • The natural logarithm of the average monthly precipitation over January, 1950, to December, 2020, as estimated from the PRISM data set (Daly et al., 1994). This covariate is used to discriminate between wet and dry regimes of the PNW.
• Topographical aspect, slope, and elevation. Elevation (meters above sea level) accounts for the fact that extreme temperatures tend to decrease with increasing altitude (via lapse rate theory). Topographical aspect describes the direction in which each location faces (ranging from zero to 2π), while topographical slope is an index that describes the steepness; collectively, slope and aspect account for the effect of solar radiation and exposure on extreme temperatures. Elevation is obtained from the 800m digital elevation map used to generate the PRISM data product; slope and aspect are calculated using the raster package for R (Hijmans, 2022).
• Distance-to-coast (km) to account for the fact that stations closer to the ocean tend to have cooler temperatures. This variable is calculated by NASA's Ocean Biology Processing Group (2009) at a global grid of 1/16 • using the Generic Mapping Tools package (Wessel and Smith, 1998). They did not consider the landlocked bodies of water (e.g., Lake Tahoe) to be oceans with a coast.
Maps of each of these quantities are shown in Supplemental Figure A.6. Since a 1/16 • degree grid is very high resolution, we simply interpolate them to the GHCN locations to obtain the values for model fitting.

Spatial extreme value analysis
In order to assess the effect of statistical assumptions on the failure of previous studies to characterize a nonzero probability in an out-of-sample context (as described in Section 1), we first describe a more refined nonstationary statistical model that flexibly accounts for spatial coherence in both the climatology of temperature extremes and individual heatwave events. The methodology is grounded in extreme value theory and is comprised of three building blocks, each of which represents important innovations relative to existing observational analyses of the 2021 PNW heatwave, and each of which is specifically tailored to the statistical modeling of daily temperature extremes in PNW: (1) nonstationary marginal GEV analysis including multiple covariates, (2) accounting for climatological spatial dependence, and (3) accounting for spatial coherence among individual events. While presented separately in the following paragraphs, these components are incorporated within a single Bayesian hierarchical model. All unknown statistical parameters are assigned noninformative prior distributions, and Markov chain Monte Carlo methods are used to generate samples from the resulting posterior distribution, upon which all subsequent inference is based. We now describe each of these components in turn.
Nonstationary marginal Generalized Extreme Value analysis. The first innovation of our methodology involves incorporating of a broad range of external information (also known as "covariates") to characterize spatially-and temporally-varying changes in the estimated distribution of extreme temperature. Based on classical extreme value theory (see Theorem 3.1.1 of Coles, 2001), the JJA maxima in year t at a generic weather station s from PNW, denoted by Y t (s), can be considered as arising from a generalized extreme value (GEV) distribution with the cumulative distribution function (CDF) The GEV distribution is characterized by the location parameter µ t (s) ∈ R, the scale parameter σ t (s) > 0 and the shape parameter ξ t (s) ∈ R. While the three GEV parameters provide useful information about the marginal behavior of temperature extremes over time, we are often more interested in summaries of the fitted GEV distribution. For the purposes of this analysis, we are interested in the so-called risk probability for the 2021 heatwave, which summarizes the probability of exceeding the location-specific TXx temperature threshold that occurred during the heatwave (see Figure 1(a); denoted by u(s)) in a given year t. Based on the form of the GEV distribution, we can calculate the risk probability as a direct function of the GEV parameters (Coles, 2001): Additionally, when the shape parameter ξ t (s) is negative, the GEV distribution has a finite, time-varying upper bound, which can also be written in terms of the GEV parameters as The fact that the GEV distribution has a finite bound when ξ t (s) < 0 has important implications for analyzing temperature extremes; see Section 4 for further discussion. Specific to analyzing temperature extremes in the Pacific Northwest, the spatio-temporal variability in GEV parameters are modeled as: in which GHG t , PDSI t (s), ELI t and Urban t (s) are anthropogenic and natural drivers defined in Section 2.3. We tested different combinations in a series of pointwise analyses and determined that this combination of covariates gives the maximum likelihood assuming independence among stations. Thus, {µ 0 (s), . . . , µ 4 (s), σ 0 (s), . . . , σ 2 (s), ξ(s)} are called GEV coefficients, which describe the climatological behavior of temperature extremes. Statistically modeling spatio-temporal variability in the GEV parameters in this manner implies a nonstationary model for extremes, and furthermore means that the risk probabilities in Eq.
(2) can be calculated as a function of the GEV coefficients and the drivers.
Accounting for climatological dependence. The second important innovation of our statistical modeling framework accounts for the fact that the climatology of temperature extremes exhibits spatial dependence: two locations that are nearby are more likely to have similar climatological extremes than two locations that are far apart. This spatial coherence is accounted for by forcing the climatological GEV coefficients to vary smoothly over the geospatial domain of interest. To this end, we suppose each marginal GEV coefficient is a function of thin plate splines (Wood, 2003) and topographical features. In other words, for γ ∈ {µ 0 , µ 1 , µ 2 , µ 3 , µ 4 , σ 0 , σ 1 , σ 2 , ξ}, we statistically model where the thin plate splines b i (·) are smooth functions of longitude and latitude and the g j (s), j = 1, . . . , 5 are the spatial-only covariates described in Section 2.4 (log average precipitation, elevation, slope, aspect, and distance-to-coast; see Supplemental Figure A.6).
We included 99 thin plate spline functions in order to flexibly model fine-scale spatial variability; however, to guard against over-fitting we furthermore included a regularization prior for the {β γ i : i = 1, . . . , 99} (see, e.g., Hans, 2009).
Accounting for the spatial coherence of individual events. While enforcing spatial coherence in the GEV coefficients following Eq. (5) is useful for describing climatological dependence, we must further account for the fact that individual heatwave events (aggregated over the course of a JJA season) will simulatenously affect multiple gauged locations. The third innovation of our methodology is that we account for this so-called "data-level" dependence in the maxima using a flexible method from the spatial extremes literature called a Gaussian scale mixture; see Zhang et al. (2021) for a full treatment of the theory and Zhang et al. (2022) for an application of the methodology to an analysis of extreme precipitation. At each time t, we assume the spatial field of JJA maxima {Y t (s)} has a copula defined by where R t ∼ Pareto{(1 − δ)/δ}, W t (s) is a standard isotropic and stationary Gaussian process with standard Pareto margins and (s) is a white Gaussian noise process with variance τ 2 . The covariance function of W t (s), C θ W (h), describes the covariance of W t (s) at pairs of locations separated by a distance h and is indexed by θ W = (ρ, ν), in which ρ is the range parameter, and ν is the smoothness parameter that controls how rough or blotchy the copula looks. In addition, R t at time t acts as a scaling factor that amplifies the simultaneous large but not yet extreme values in {W t (s)}. In the meantime, larger δ induces heavier-tailed R t and thus the amplifying effect from R t will tend to be more evident. Consequently, the Gaussian scale mixture {X t (s)} exhibits asymptotic independence when δ ∈ (0, 1/2] and asymptotic dependence when δ ∈ (1/2, 1). Larger δ values also induce higher sub-asymptotic dependence strength, and in the case of δ ∈ (0, 1/2] there is still weakening yet nonzero dependence at the sub-asymptotic levels. Using a welldefined dependence measure, we verified empirically in Appendix C that this copula model accurately quantifies the spatial dependence between stations during individual extreme heatwave events. As described in Section 2, we assume that marginally (i.e., at each station) the JJA maxima arise from the GEV distribution described by Eq. (1) and (4). Given that we have one measurement per year at each station, we assume that any long-term autocorrelation or year-to-year variability in the maxima at each station is fully described by the time-varying location and scale parameters and any residual variability is temporally independent. For simplicity, we henceforth write X tj = X t (s j ), X t = (X t1 , . . . , X t,438 ) T and so forth, in which s j denotes the location of jth station, j = 1, . . . , 438. In order to connect the copulas of the observed process {Y t (s)} and the Gaussian scale mixture {X t (s)}, we define the following marginal transformation: where F Y |γ(s),t is the GEV distribution function with parameters {µ t (s), σ t (s), ξ t (s)} defined in Eq. (4) and F X|δ,τ 2 is the marginal CDF for X t (s) in Eq. (6). This marginal transformation is tied together with the copula (6) in a single Bayesian hierarchical model to make inference on the model parameters; see Supplemental Section B for more details.

Statistical counterfactual for quantifying human influence
The statistical framework for extreme value analysis outlined in Eq. (1) and (4) uses covariates to describe long-term trends and year-to-year variability in temperature extremes in the Pacific Northwest, and as such we can use the fitted statistical model to isolate the human influence on various aspects of the heatwave following the approach used in Risser and Wehner (2017). Specifically, the statistical model specified by Eq. (4) implies that risk probabilities and upper bounds (in the case of ξ(s) < 0) are all functions of the input covariates, i.e., GHG forcing, the ELI, the PDSI, and the urbanization index. As such, we can use the fitted statistical model to estimate risk probabilities and upper bounds for arbitrary combinations of GHG forcing, ELI, PDSI, and the urbanization index, including combinations that did not actually occur over 1950-2021. Our results in Section 4 involve comparing risk probabilities and upper bounds for two different climate scenarios (summarized in Table 1): the "factual" quantities represent our best estimates for the conditions present when the heatwave occurred, and the "counterfactual" summarizes the isolated effect of GHG forcing while holding all other quantities fixed.
Exploring the counterfactual scenario in this way allows us to identify the individual effect of GHG forcing under consistent "background" conditions as specified by ELI, PDSI and urbanization. We refer to the counterfactual scenario as "statistical" since it leverages the underlying statistical model to predict the distribution of temperature extremes for a setting of climate variables that did not actually occur (i.e., 1950 levels of GHG forcing and present-day ELI, PDSI, and urbanization). Finally, in a Bayesian framework we obtain posterior samples from all unknown quantities in Eq. (1) and (4) which allows us to propagate all statistical uncertainty to yield posterior distributions of the GEV upper bounds and risk probabilities. Table 1: Summary of the two climate scenarios considered for quantifying human influence on aspects of the heatwave event and conducting attribution statements. The counterfactual is "statistical" in the sense that it is constructed from the fitted models using covariates (and correspond to conditions that never occurred in the real world). The years listed correspond to those used to determine the input values for the anthropogenic and natural drivers. In addition to summarizing the inferred GEV upper bounds and risk probabilities for each scenario, we furthermore use the risk probabilities to construct an attribution statement for the anthropogenic drivers. Specifically, we compare the factual probability of the 2021 heatwave event (as specified by the 2021 TXx shown in Figure 1(a)) with corresponding probabilities from the counterfactual scenario, commonly referred to as probabilistic event attribution (National Academies of Sciences and Engineering and Medicine, 2016). One way to summarize the anthropogenic influence on an event of interest is via the socalled "risk ratio" (Paciorek et al., 2018), i.e., the ratio of risk probabilities. The risk ratio is thus a unitless quantity that summarizes the multiplicative change in probability of a fixed threshold event due to anthropogenic influence; we calculate risk ratio via Here, p F represents the risk probability for the factual scenario and p C represents the risk probability for counterfactual; see Eq.
(2) for the definition of risk probability. RR(s) attributes the isolated effect of anthropogenic GHG forcing on the probability of the heatwave occuring at location s; a risk ratio of greater than one indicates that anthropogenic influence caused the 2021 PNW heatwave to become more likely, while a risk ratio of less than one indicates that anthropogenic influence caused the event to become less likely. We summarize best estimates and uncertainties in the risk ratios via their posterior distributions, which are obtained from the posterior distributions of the risk probabilities.

Alternative statistical models
Recall that the main objective of this paper is to determine whether a more appropriate statistical model (i.e., one that accounts for all known relationships in the data) can characterize the rarity of the 2021 PNW heatwave in an out-of-sample sense and hence enable meaningful conclusions regarding the anthropogenic influence on the probability of the event. However, the statistical model proposed in Section 3.1 represents three improvements upon the methodology used in, e.g., Philip et al. (2021); Bercos-Hickey et al. (2022): (1) inclusion of additional spatio-temporal covariates (characterizing the influence of ELI, PSDI, and urbanization in addition to GHG forcing); (2) accounting for climatological dependence by including wet/dry regimes and orographic properties; and (3) accounting for spatial dependence in the tail of copula. In order to explicitly assess which aspect(s), if any, of our proposed model address the challenges of estimating out-of-sample probabilities related to an extremely rare heatwave, we consider three additional statistical models that can be viewed as special cases of the more general approach outlined in Section 3.1: M1 Pointwise, GHG only: this represents the approach used to analyze in situ records in Bercos-Hickey et al. (2022). Marginally, we simplify Eq. (4) such that the location parameter is a linear function of GHG forcing and the scale and shape parameters are time-invariant. Then, GEV estimates are obtained independently for each weather station record (hence the term "pointwise") such that both climatological and datalevel dependence are ignored.
M2 Pointwise, all covariates: this approach generalizes M1 by using the full suite of covariates (in both location and scale) described by Eq. (4); however, like M1, both climatological and data-level dependence are ignored.
M3 Climatological dependence only, all covariates: this approach generalizes M2 by again using the full suite of covariates (in both location and scale) described by Eq. (4); however, we now account for climatological dependence only and ignore data-level dependence. This is equivalent to the GEV-GP model in Cooley et al. (2007).
M4 All spatial, all covariates: this approach denotes the full model described in Section 3.1 that includes all covariates and accounts for both climatological and datalevel dependence.
These models are furthermore summarized in Table 2. Fitting each of these models in a Bayesian framework allows us to calculate best estimates and uncertainties for 2021 GEV upper bounds, risk probabilities, and risk ratios.

GEV coefficients
From each MCMC update of basis coeffcients {β γ 0 , β γ 1 , . . . , β γ B }, γ ∈ {µ 0 , . . . , µ 4 , σ 0 , . . . , σ 2 , ξ}, we plug each quantity into Eq. (5) and generate the GEV coefficient values {γ(s)} on an unobserved ∼ 4km × ∼ 4km (1/16 th degree) grid over the PNW region. Here we use the thin plate splines and topographical bases values calculated on the grid a priori, and we update {γ(s)} for each MCMC iteration. Figure 2 displays the pointwise posterior medians of GEV coefficients (denoted by {γ(s)}). In order to directly compare the effect of each covariate, we multiplyμ j (s) andσ j (s) by the range of the time series over 1950-2021 (i.e., converting units from • C per unit increase in covariate to just • C). First, forμ 1 (s) we show Figure 2(b) shows that an increase in GHG forcing led to a slight decrease of temperature near the Elliot Bay of Seattle and along the Pacific coast. However, an increase of at least 1 • C is observed in the majority of the PNW region, while the increase is more evident in the Yakima Valley of Washington (∼ 4 • C) and the Willamette Valley of Oregon(∼ 8 • C). This in and of itself could be an attribution statement even if a few observed temperatures are still out of bounds (see later discussion in Section 4.2), since the human influence on extreme temperatures is insensitive to rarity (see Figure 6 in . To show the maximum effect of PDSI, we examine the maximum effect on a pointwise basis and multiplyμ 2 (s) by the range of the time series {PDSI t (s) : t = 1950, . . . , 2021} at each location s, i.e.,μ Recall that PDSI > 0 indicates dry conditions, and PDSI < 0 indicates wet conditions. From Figure 2(b), PDSI has a consistently positive effect on the GEV location, driving a ≈ 1 • C increase uniformly across the domain except for the lower right corner where there are no GHCN stations included in the analysis.
Similarly to Eq. (8), we multiply µ 3 (s) and σ 2 (s) by the range of {ELI t : t = 1950, . . . , 2021}. The effect of ELI on the GEV location is generally weaker than GHG and PSDI, and (like GHG forcing) its effect on typical extreme temperatures is varied over the PNW. The effect of ELI on the GEV scale is also heterogeneous, with an evident decrease in the Shasta Valley of northern California and west of the Cascade Mountains in Washington.
For the urban heat island effect, we multiply µ 2 (s) by 1 to assess the potential temperature increase brought by urbanization. If Urban t (s) ≡ 0 for all t at location s, the urban heat island effect is non-existent even though µ 4 (s) might be positive in Figure 2; if Urban t (s) ≡ 1 for all t, there is some urbanization effect but it stays constant over 1950-2021. It is more interesting to look into locations where Urban t (s) starts out as 0 but changes to 1 before 2020.   Table 2 for the definition of M4 and see Eq. (8) to learn how to calculate the maximum effects.

2021 upper bounds
In Figure 3, we calculate the GEV distributional upper bounds using Eq. (3) and the outof-sample MCMC updates of the marginal parameters under the four modeling settings outlined in Table 2. The top row presents the pointwise medians of the posterior upper bounds in year 2021 using the factual conditions (see Table 1), in which the pointwise analyses (M1 and M2) produced relatively low upper bounds such that over 38% and 28%, respectively, of the stations had 2021 TXx values that exceed their corresponding theoretical upper bounds. In comparison, accounting for spatial dependence, even only at the climatological level, greatly improved the predictability of the 2021 heatwave event while increasing the GEV upper bounds consistently across PNW region: under M3, only 10.3% of stations have 2021 TXx that exceed the posterior median upper bound. Interestingly, the GEV shape parameters at many coastal stations changed from being positive in M1 to being negative but close to zero in M3 and M4 such that the upper bounds are finite but very large. In particular, the cluster of inexplicable 2021 TXx values near the Portland and Seattle-Tacoma metropolitan area in M1-M3 disappeared in M4 due to the incorporation of the urbanization index and extremal copula modeling. Further accounting for extremal dependence in M4 reduced the number of unexplainable events to just 3.4% of stations, demonstrating that accounting for both climatological and extremal dependence are critical for making the PNW heatwave event more predictable in an out of sample sense. Additionally, in one of our previous analyses in which we basically followed the M4 settings but excluded the urban binary index from the model, the proportion of stations with a 2021 TXx value exceeding the posterior median of upper bound was 9.4%, compared to 3.4% under M4 with the urban binary index in upper right panel of Figure 3. Nonetheless, there   Table 2). The first and second rows show upper bound estimates under the factual and counterfactual scenarios, respectively (see Table 1 for details). For these rows, the ' ' signifies an infinite upper bound (corresponding to ξ > 0), and '+' signifies stations for which the observed 2021 TXx exceeded the posterior median of the upper bound. The inset text in each panel displays the fraction of stations for which the upper bound is exceeded. The third row displays the differences in the upper bounds for each scenario.
are still some observations that exceed the posterior median GEV upper bound in spite of using a more sophisticated statistical model with the urbanization index, albeit the upper limits of our 95% credible intervals of the upper bounds from M4 contain the 2021 TXx at all stations. The middle and bottom rows of Figure 3 examine the isolated effect of GHG forcing under the counterfactual scenario, which show that many more stations in Washington and Oregon (between 8-12% more) would have failed to contain the 2021 TXx values if the GHG stayed at the 1950 level. Especially under M3 and M4, differences in the posterior medians of the upper bounds exceeds 3 • C in majority of the stations in Oregon; i.e., increases in GHG forcing drive increases in GEV upper bounds by more than 3 • C. Although M2 under the factual conditions failed to contain a large fraction of the 2021 TXx measurements, the differences in upper bounds are much higher than M1 and even comparable to spatial analyses. This indicates that including more covariates will augment the anthropogenic effects and increase the factual upper bounds but not enough to contain significantly more 2021 TXx values compared to M1. Interestingly, under all modeling settings and factual/counterfactual conditions, there are practically no stations in northern California and eastern Oregon whose 2021 TXx values exceeded the upper bounds, indicating a not-so-extreme heatwave in that region.

Attribution: counterfactual risk probabilities and risk ratios
In order to assess whether or not anthropogenic factors had a meaningful influence on the probability of the 2021 heatwave in a Granger sense (Granger, 1969;Risser and Wehner, 2017), we calculate both location-specific estimates of the risk probabilities and risk ratios for the PNW region (defined by the 40 • N-49 • N, 125 • W-117 • W bounding box); see Figure 4. For attribution results we now focus in on the estimates obtained from the model described in Section 3.1 (labeled M4 in Table 2) since this approach best accounts for known relationships in the data and minimizes the fraction of stations for which the GEV upper bounds are exceeded by the 2021 extreme temperatures.
First, consider the posterior median estimates of the risk probabilities for the factual p F (·) and counterfactual p C (·) scenarios, shown in Figure 4(a). Recall that p F (·) and p C (·) quantify the probability of experiencing an extreme daily maximum temperature measurement at least as large as what was observed during the heatwave under each climate scenario; furthermore, these probabilities represent the best seasonal forecast for extreme daily temperatures in advance of the 2021 summer using a statistical model given the values of the covariates in Eq. (4). The risk probabilities also summarize how unusual the heatwave event was at each station. The counterfactual risk probabilities are generally quite small: for a large majority of stations, the event had a probability of less than 0.1 and in many cases less than 0.01. For the factual scenario, risk probability estimates are also small, but not as small as for the counterfactual scenario: factual risk probabilities are generally between 0.05 and 0.3. As would be expected from the discussion around upper bounds in Section 4.2, there are cases where the posterior medians of the risk probability are zero (denoted by black triangles in Figure 4), meaning that the 2021 heatwave temperatures would have been considered impossible in the counterfactual or factual scenario. However, there are many stations for which the 2021 temperatures were quite "normal," even in the counterfactual scenario: across Nevada and much of northwest California, the observed  and Factual (right) climate scenarios, and subfigure (b) shows their ratio. For the risk probabilities, solid black triangles indicate gauged locations for which the risk probability best estimate is zero; for the risk ratios, solid black circles denote RR(s) = ∞ (wherein the counterfactual risk probability is zero but the factual risk probability is nonzero) and yellow '+' shows where the risk ratios are undefined (i.e., both counterfactual and factual risk probabilities are zero). In the rightmost panel, points that are plotted with additional '•' sign indicates that risk ratio estimates are statistically significantly different from 1.
heatwave temperatures had more than a 0.5 probability. In the meantime, there are still a small number of stations for which the factual risk probability estimates are exactly zeroin other words, for several stations we would have concluded that the 2021 temperatures were impossible even with increases to GHG forcing. However, it is the case that p F (s) = 0 if and only if p C (s) = 0, i.e., if the event was considered impossible in the factual scenario than it would also have been considered impossible in the counterfactual scenario. Next, consider the posterior median estimates of the risk ratios (i.e., the factual risk probability divided by the counterfactual risk probability), shown in Figure 4(b). Across the PNW, the dominant color in Figure 4(b) is purple/magenta (i.e., RR(s) > 1), indicating that the 2021 heatwave event was made more likely by increases to GHG forcing. The risk ratios are often larger than 3, indicating that increases to GHG forcing resulted in a 3fold increase in the probability of the extreme temperatures associated with the heatwave (and often much more than a 3-fold increase). In many cases, these increases in risk probability are statistically significant, in the sense that the lower bound of the 95% credible interval for the risk ratio is also greater than 1 (indicated by black circles around points in Figure 4(b)). This is a powerful result: using a more refined statistical model, we can both quantify nonzero probabilities for the PNW heatwave event as well as determine statistically significant changes to this probability in an out-of-sample sense -this was previously impossible (again, see Philip et al., 2021;Bercos-Hickey et al., 2022). The presence of zeros in the best estimates of the counterfactual risk probability means that in many cases the risk ratio is infinite; however, there are a still ∼ 7% of stations for which the risk ratio is undefined (i.e., both risk probabilities are zero, such that RR = 0/0), even using our best statistical model. (Note that the 7.5% of stations with undefined risk ratios differs from the 3.4% of stations for which the factual GEV upper bounds were exceeded due to the nonlinearity of posterior calculations.) Notably, there are some cases where the risk ratio is less than one (very few stations significantly so). The locations of these stations coincides with the regions where the GHG coefficient is negative; see Figure 2(b). They also correspond to the stations that have very low risk probabilities under both factual and counterfactual scenarios.
In summary, we now have a solution to the original problem posed by the severity of the heatwave event for existing attribution methods: while naïve statistical methods fail to quantify nonzero probabilites of the event (with or without climate change) and hence cannot make confident attribution statements, by using a sophisticated statistical model that more appropriately accounts for known features of the data (i.e., covariates, climatological dependence, and extremal dependence in the copula) we are able to robustly quantify the extent to which anthropogenic GHG emissions increased the likelihood of the observed extreme temperatures in 2021.

Conclusion
In this paper, we built a novel statistical model that accounts for the spatial coherence of both the climatological coefficients while also considering the natural and anthropogenic drivers for extreme temperatures. Using a Bayesian hierarchical framework, our method improves on the results from both Philip et al. (2021) and Bercos-Hickey et al. (2022) as more than 38% of the in situ 2021 TXx values could not be anticipated in their out-ofsample analyses, compared to 3.4% from our best model fit. As a result, we were able to estimate risk probabilities and make attribution statements for most GHCN stations while also quantifying the uncertainties for the estimates of the risk ratios. We showed that the increases in GHG forcing resulted in at least 3-fold increase in the probability of the extreme temperatures associated with the heatwave at the majority of the PNW locations in the US. Furthermore, we performed analyses under a set of simpler models to isolate the contributions from each component of the model to the improvements over predicting the 2021 PNW heatwave. We found that borrowing strength from neighboring stations, even only at the climatological level, maximizes the information contained in the historical data which sheds more light on the extreme events.
It should be noted here that while our analyses focus on the PNW region in the US, the most intense part of the heatwave occurred in British Columbia because the associated heat dome was centered there. However, we confined our analyses to the PNW region within the United States due to a lack of the homogenized temperature data in Canada from GHCN. McKinnon and Simpson (2022) increased the spatial coverage of British Columbia by including station data archived by Government of Canada (2022) and the sub-daily measurements in the Integrated Surface Database (Smith et al., 2011). In future work we plan to include in situ observations from both Canada and the US and re-run the analysis to get a more complete picture of the heatwave and to evaluate the anthropogenic and natural contributions to the event for the larger spatial domain. Bercos-Hickey et al. (2022) argued that the failure to contain the 2021 TXx values using the GEV upper bounds resulted from a violation of the "IID" assumption in the univariate extreme value theory, which requires the samples from each block (season in our case) to be statistically independent and identically distributed. As explained in the introduction, the underlying meteorological conditions responsible for the extreme temperatures are truly unique and different in 2021, indicating that the observed TXx from 2021 do not arise from the same distribution as the TXx values from the preceding 71 years. Including covariates to model year-to-year changes in the marginal GEV distribution relaxes this assumption such that we only require the "residuals" to be IID. Nonetheless, using covariates in the GEV parameters alone within a single station analysis framework is not enough to characterize the extreme 2021 temperatures (as shown in both Bercos-Hickey et al., 2022 and models M1 and M2; see Figure 3). Instead, the real difference-maker is accounting for spatial dependence in extreme temperatures: even accounting for just climatological dependence in the GEV covariates drastically reduces the fraction of "unexplainable" 2021 temperatures (28.1% to 10.3% for the factual scenario). However, incorporating physical covariates and accounting for climatological dependence in these quantities is also not enough: it is only when additionally accounting for data-level dependence via the copula that we can provide the best characterization of the 2021 extreme temperatures.
In Section 4, we focused on posterior median summaries for determining exceedance of upper bounds and assessing the risk ratios between the factual and counterfactual scenarios. A principal benefit of using a Bayesian framework is that one can summarize uncertainty robustly via either 95% credible intervals or non-binary probabilities of upper bound exceedance. As noted previously, it is the case that the upper limits of 95% credible intervals of the upper bounds for statistical model M4 contain the 2021 TXx at all stations! As such, it could be more informative to instead summarize the rarity of the 2021 TXx values with probabilities related to the upper bound failing to contain the event at each station using the posterior samples of our model parameters. However, for simplicity of presentation, in this manuscript we opted to show results for the upper bound posterior medians.
Finally, it is noteworthy that the flexible scale mixture model described in Section 3.1 concluded that TXx measurments in the PNW are asymptotically independent (AI; see Figure C.1 in Appendix C). A similar property emerged for measurements of extreme precipitation (Zhang et al., 2022); however, extreme precipitation events are generally more localized than extreme temperature events, such that we might expect the analysis in this paper to yield asymptotic dependence (AD). As discussed in (Zhang et al., 2022), one reason for extreme temperatures exhibiting AI is the size of the spatial domain considered in this analysis: analyzing smaller spatial domains may cause extreme temperatures to become asymptotically dependent. This sheds light on the importance of building a more flexible model that exhibits nonstationary tail dependence; we are currently pursuing such methodology in related work.     (2) Accept proposal β γ * k with probability p = min 1, ϕ(β γ * k | · · · )/ϕ(β γ (m−1) k | · · · ) , in which ϕ(β γ * k | · · · ) is the full conditional distribution for the corresponding basis coefficients given the data Y and the current values of all the remaining parameters:

C Empirical dependence strength
In spatial extremes, the bivariate measure of upper tail dependence χ h (u) is commonly used to describe the extremal dependence structure (Davison et al., 2012;Huser and Wadsworth, 2019). Specifically, for two locations s i and s j with a distance h > 0, the measure is defined by in which X i = X(s i ), X j = X(s j ) and F i and F j are their marginal distribution functions, respectively. Note that the statistical model in Section 3.1 assumes that {X(s) : s ∈ D} has a stationary and isotropic dependence structure; see Section 5 for discussion on relaxing this assumption.
To show how the measure χ h (u) varies for the TXx over the PNW and WWA regions, Figure C.1 shows the empirical and model-based (or parametric) estimates. Similarly to Figure 8 of Zhang et al. (2021), we generated data-driven estimates of χ h (u) empirically at distances of h = 0.3 (approximately 30 km), h = 1 (approximately 100 km) and h = 2 (approximately 200 km) by collecting all pairs of GHCN stations whose locations are h apart (within a small tolerance = 0.002), and then computing the empirical estimate of χ h (u) along with a 95% nonparametric confidence envelope. This way of obtaining the nonparametric estimator of χ h (u) resembles an empirical variogram estimator. For parametric estimates, we only overlay them on empirical estimates for the PNW region because our model is fit over this region. We take samples from the converged MCMC chain and use parameters from each iteration to generate the smooth χ h (u) based on the statistical model described by Eq. (6). The black lines signify averages of these smooth χ h (u) curves. The results in Figure C.1 demonstrate that our copula model (6) captures the dependence structure of the transformed block maxima well in that the parametric estimates of χ h (u) fall within the confidence bands of the empirical estimates.
On a side note, we notice that the empirical estimates of χ h (u) within the WWA region have more uncertainty than those of the PNW region. This is mainly due to the less number of stations in a smaller region. However, χ h (u) curve in WWA is on average lower than PNW at all three distances, indicating weaker dependence in the joint tail of the copula within WWA. This reveals a major limitation of our methodology -one set of dependence parameter δ and covariance parameters θ W governs the dependence structure of the entire domain, while the different subregions of a larger spatial domain may exhibit nonstationary dependence properties.  ; the 95% confidence band is also shown for empirical estimates. The model-based estimates from the spatial analysis is only shown for the PNW region because the dependence parameter δ and the covariance parameters θ W are imposed on the PNW region.