1 Introduction

Climate mean and extreme changes on local, regional and continental scales directly influence human and natural systems (IPCC 2013). Three sources of uncertainty in climate change predictions involve external forcing, model response, and internal variability (e.g., Hawkins and Sutton 2011; Deser et al. 2012a, b, 2014, and references therein). As detailed in previous studies, external forcing uncertainty arises from incomplete knowledge of anthropogenic forcings employed in emission scenarios. Model uncertainty comes from different climate changes in response to the same external forcing simulated by various climate models, which are constructed with different dynamical cores, physics, and resolutions. Internal variability is natural climate variability and results from processes intrinsic to the climate system, in particular coupled interactions between the atmosphere, oceans, land, and cryosphere.

The role of internal variability in climate change projections on regional scales has been found to be significant and comparable to externally forced responses (e.g., Deser et al. 2012a, 2014). Different approaches have been proposed to isolate internal climate variability generated responses from externally forced climate responses. In particular, the influence of internal climate variability has been considered as the residual from different order polynomial fits to time series of forced climate responses (e.g., Hawkins and Sutton 2009, 2011; Boer 2009). It has also been considered by analyzing a suite of climate simulations forced by identical external forcings but with slightly different atmospheric initial conditions in a given climate model (e.g., Collins and Allen 2002; Deser et al. 2012a; Wallace et al. 2014; Kay et al. 2015). In addition, Thompson et al. (2015) suggested that the internal climate variability in projected climate trends could be estimated from the statistics of observed climates and an unforced climate simulation of sufficient length.

Recently, the large ensemble approach has been widely used in exploring the role of internal climate variability in various aspects of climate changes, such as climate projections of surface air temperature and precipitation (e.g., Deser et al. 2012a, b, 2014; Kay et al. 2015; Chen et al. 2019), atmospheric circulation (Kang et al. 2013), sea level rise (Hu and Deser 2013; Deser et al. 2012b), Arctic sea ice (Wettstein and Deser 2014; Kirchmeier-Young et al. 2017), and the extratropical atmospheric forcing on the tropical El Niño-Southern Oscillation variability (Chen and Yu 2019). Results from large ensemble simulations enable a robust quantification of the responses to external forcings and internal climate variability. Uncertainties in the forced response are found to be generally larger for sea level pressure compared to precipitation, and smallest for surface air temperature. Large-scale atmospheric circulation variability is mainly responsible for the spread in future climate changes.

The purpose of this study is to document the role of internal variability in climate change projections of North American surface air temperature and temperature extremes in a 50-member ensemble of climate simulations conducted with the second-generation Canadian Earth System Model (CanESM2). CanESM2 is a global climate model participating in Phase 5 of the Coupled Model Intercomparison Project (CMIP5) of the World Climate Research Programme (WCRP). As demonstrated in Sheffield et al. (2013), most CMIP5 models, including CanESM2, reproduce the observed variability over North America reasonably well from intraseasonal to decadal time scales. Here, we analyze the projected surface air temperature trends in CanESM2 over the period 2010–2055 and compare them to those obtained from large ensemble simulations with similar external forcings in other climate models. In particular, we ask whether CanESM2 projects cooling trends over North America, as some ensemble members of other models do (Deser et al. 2014). We then examine the projected trends of extreme temperatures in CanESM2. Unlike many previous studies that focused on the trends of regional mean extreme temperatures (e.g., Sillmann et al. 2013b; Kay et al. 2015), we analyze the spatial pattern of extreme temperature trends over North America and its uncertainty. We examine the diversity of the projected trends, compare forced and internal components of the projected trends, and analyze the influence of large-scale atmospheric circulation variability on these trends. This analysis will also support our further studies of the physical processes behind the interannual variability and projected changes of extreme temperatures, using output from CanESM2 and its next-generation model CanESM5, which participates in CMIP6. In addition, previous studies indicated that global warming would be most pronounced during the cold season over high latitudes (Manabe and Stouffer 1980; IPCC 2013). CMIP5 models simulate extreme temperatures reasonably well, and the simulations are generally better in boreal winter than in summer (Krueger et al. 2015). Hence, we analyze only the wintertime temperature trend in this study.

The rest of the paper is organized as follows: Sect. 2 describes the observational and reanalysis datasets, CanESM2 climate model simulations, and analysis methods we used. Section 3 evaluates the model performance in simulating the climatological means of surface air temperature and temperature extremes over North America. Section 4 documents the projected trends of North American surface air temperature, the inter-member trend variance, the forced and internal components of the trends, and the contribution of large-scale atmospheric circulation variability to the trends. Section 5 presents the corresponding analyses for warm and cold extremes. A summary and discussion are given in Sect. 6.

2 Data and Methodology

(a) Observational and reanalysis data

The monthly surface air temperature (SAT) data employed in this study are extracted from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis (NCEP hereafter, Kistler et al. 2001) on standard 2.5° × 2.5° grids. The observed monthly temperature extremes analyzed are extreme indices from the HadEX2 dataset (Donat et al. 2013) on 3.75° × 2.5° (longitude-latitude) grids. We use the warm day index TX90 (cold day index TX10), defined as the percentage of time when daily maximum temperature is greater than its 90th (less than its 10th) percentile. The percentile-based indices are derived using 1961–1990 as the base period and a 5-day running window. A bootstrap resampling procedure is applied in the index calculation to avoid inhomogeneity at the boundaries between the base and out-of-base periods (Zhang et al. 2005). The extreme temperatures are examined on a seasonal basis following previous studies (e.g., Klein Tank et al. 2009; Sillmann et al. 2013a). We analyze December-February (DJF) mean SAT and the TX90 and TX10 extreme indices for the period from 1951 to 2000. Years refer to the January dates throughout this study.
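To make the index definition concrete, the following is a minimal sketch of a percentile-based exceedance index, assuming daily maximum temperature arranged as a (years, days) array. The function names are hypothetical, the 5-day window simply wraps within each year, and the bootstrap correction of Zhang et al. (2005) is deliberately omitted, so this is not the procedure used to build HadEX2.

```python
import numpy as np

def percentile_thresholds(tmax, base_mask, q, half_window=2):
    """Calendar-day percentile thresholds from the base years.

    tmax: (n_years, n_days) daily maximum temperature.
    base_mask: boolean (n_years,) selecting base-period years.
    A window of 2 * half_window + 1 days is pooled around each calendar day.
    Simplification: the window wraps around within the same year.
    """
    base = tmax[base_mask]
    n_days = base.shape[1]
    thresholds = np.empty(n_days)
    for day in range(n_days):
        idx = np.arange(day - half_window, day + half_window + 1) % n_days
        thresholds[day] = np.percentile(base[:, idx], q)
    return thresholds

def exceedance_percentage(tmax, thresholds, above=True):
    """Percentage of days per year beyond the thresholds (TX90-like if above)."""
    hits = tmax > thresholds if above else tmax < thresholds
    return 100.0 * hits.mean(axis=1)

# Synthetic illustration only: 50 years of 90 winter days.
rng = np.random.default_rng(0)
years = np.arange(1951, 2001)
tmax = rng.normal(0.0, 5.0, size=(years.size, 90))
base_mask = (years >= 1961) & (years <= 1990)
tx90 = exceedance_percentage(tmax, percentile_thresholds(tmax, base_mask, 90))
tx10 = exceedance_percentage(tmax, percentile_thresholds(tmax, base_mask, 10), above=False)
```

Without the bootstrap step, in-base years are evaluated against thresholds that they helped define, which is the inhomogeneity the resampling procedure is meant to avoid.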

(b) CanESM2 climate model simulations

Outputs from a large ensemble of climate simulations conducted with the second-generation Canadian Earth System Model are employed to explore projected climate trends. CanESM2 is a fully coupled ocean–atmosphere-land-sea ice climate model (http://climate-modelling.canada.ca/climatemodeldata/data.shtml; Arora et al. 2011; von Salzen et al. 2013). Its atmospheric component is a spectral model with T63 triangular resolution of approximately 2.81° × 2.81° grids and 35 vertical layers extending from the surface to the stratopause (Scinocca et al. 2008). The oceanic component was developed from the NCAR Community Ocean Model, with a horizontal resolution of approximately 1.41° × 0.94° (longitude-latitude) and 40 vertical levels. Detailed descriptions of CanESM2 can be found on the above website. The climate simulations we analyzed consist of 50 ensemble members of 150-year simulations for the period from 1950 to 2100, with slightly different initial conditions for each run in 1950 (Kirchmeier-Young et al. 2017; Chen and Yu 2019). Each of the simulations is forced by identical historical greenhouse gas concentration, sulfate aerosols, and other observation based radiative forcings from 1950 to 2005 and the representative concentration pathway 8.5 scenario (RCP8.5, van Vuuren et al. 2011; Collins et al. 2013) from 2006 to 2100. Hence, differences seen in these simulations are due solely to internally generated climate variability (Deser et al. 2014; Wallace et al. 2014).

The monthly mean SAT, sea level pressure (SLP), and temperature extremes (TX90 and TX10) from historical and climate change simulations are employed. As in the observations, the modelled extreme indices are calculated using 1961–1990 as the base period. The climate model simulated variables are compared to the corresponding reanalysis/observation based results to evaluate the model performance. To compare with the observed temperature extremes (Fig. 1), TX90 and TX10 indices are analysed on 3.75° × 2.5° (longitude-latitude) grids. All other modelled variables examined are interpolated to 2.5° × 2.5° grids.

Fig. 1
DJF climatological means of SAT (left column), TX90 (middle column) and TX10 (right column) for the period 1951–2000. Results from the 50-member EnM, NCEP/observation, and their difference (EnM-NCEP/Observation) are shown from the top to the bottom rows. Contour intervals are 6.0 °C (3.0 °C) for SAT mean (difference), and 0.5% with different color bars for the mean and difference of extreme temperatures

(c) Analysis methods

All analyses are based on DJF seasonal means of the variables considered. The multi-member ensemble mean (EnM) of a quantity is the average of that quantity over the 50 members. The linear trend of each time series is calculated using a least squares method. We partition the temperature trend into an externally forced (anthropogenic radiative) component and a component generated by internal climate variability, Trend_Total = Trend_Forced + Trend_Internal, following Deser et al. (2014). The externally forced component is obtained by averaging the projected trends over the 50 ensemble members (i.e., the EnM), whereas the internally generated component is estimated by subtracting the externally forced component from the total trend.
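The trend partition described above can be sketched as follows. This is an illustrative example on synthetic data, not the code used in this study; the function names and the synthetic warming rate are assumptions.

```python
import numpy as np

def linear_trend(series, years):
    """Least-squares slope of `series` against `years` (units per year)."""
    slope, _intercept = np.polyfit(years, series, deg=1)
    return slope

def decompose_trends(ensemble, years):
    """Split per-member trends into forced and internal parts.

    ensemble: (n_members, n_years) array, e.g. DJF mean SAT at one grid cell.
    Returns (total, forced, internal): total and internal have shape
    (n_members,), forced is the scalar ensemble-mean trend.
    """
    total = np.array([linear_trend(member, years) for member in ensemble])
    forced = total.mean()        # ensemble mean of the member trends
    internal = total - forced    # residual attributed to internal variability
    return total, forced, internal

# Synthetic illustration only: 50 members, winters 2010-2055.
rng = np.random.default_rng(0)
years = np.arange(2010, 2056)
signal = 0.08 * (years - years[0])    # hypothetical forced warming, degC per year
ensemble = signal + rng.normal(0.0, 1.0, size=(50, years.size))
total, forced, internal = decompose_trends(ensemble, years)
```

Slopes here are per year; multiplying by 45 expresses them per 45 years, the unit used in the figures. By construction the internal components average to zero across the ensemble.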

Deser et al. (2014) investigated North American SAT trends over the period 2010–2060 using the NCAR Community Climate System Model, version 3 (CCSM3) climate simulations. Their simulations are under the Special Report on Emission Scenarios (SRES) A1B, with carbon dioxide concentrations increasing from approximately 380 ppm in 2000 to about 570 ppm in 2060. The CanESM2 climate change simulations are forced by the RCP8.5 scenario, where carbon dioxide concentrations increase to approximately 570 ppm in 2055. Hence, we calculate SAT trends over the period 2010–2055 using CanESM2 simulations to make our results reasonably comparable to those in Deser et al. (2014), although both models are also driven by other external forcings. Nevertheless, the SAT trend pattern reported here is not sensitive to slight changes of the period analyzed.

The relative agreement of individual member-based spatial patterns with the reanalysis/observation or EnM result is mainly evaluated by calculating second-order space–time climate difference statistics and is illustrated using a modified Taylor diagram termed a BLT diagram (Boer and Lambert 2001). A BLT diagram displays the pattern correlation, ratio of modeled to reanalysis/observation or EnM variances, and relative mean square difference between each member and reanalysis/observation or EnM quantities. The ratio of variances compares the smoothness of the spatial pattern in an individual member to the reference pattern. The relative mean square difference is a scaled mean square difference, in terms of the reference variance, between the pattern in an individual member and the reference pattern. In addition, an empirical orthogonal function (EOF) analysis is performed to characterize dominant modes of inter-member variability of SLP trends.
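The three quantities shown in a BLT diagram can be illustrated with a short sketch. It assumes cosine-latitude area weighting and the centered (spatial-mean removed) form of the statistics; whether the mean offset enters the version used here is not stated above, so both choices are assumptions. The function name is hypothetical and missing values are not handled.

```python
import numpy as np

def blt_statistics(member, reference, lat):
    """Area-weighted pattern statistics for two (n_lat, n_lon) fields.

    Returns (pattern correlation, variance ratio, relative mean square
    difference), with the last two scaled by the reference variance.
    """
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(reference, dtype=float)
    weights = weights / weights.sum()

    # Remove the area-weighted spatial mean from each field.
    m_anom = member - np.sum(weights * member)
    r_anom = reference - np.sum(weights * reference)

    var_m = np.sum(weights * m_anom**2)
    var_r = np.sum(weights * r_anom**2)
    cov = np.sum(weights * m_anom * r_anom)

    correlation = cov / np.sqrt(var_m * var_r)
    variance_ratio = var_m / var_r
    rel_msd = np.sum(weights * (m_anom - r_anom)**2) / var_r
    return correlation, variance_ratio, rel_msd

# Synthetic illustration only.
rng = np.random.default_rng(1)
lat = np.linspace(20.0, 70.0, 21)
reference = rng.normal(size=(21, 49))
member = reference + 0.3 * rng.normal(size=(21, 49))
corr, ratio, msd = blt_statistics(member, reference, lat)
```

In this centered form the three statistics are linked by msd = 1 + ratio − 2 · corr · sqrt(ratio), which is why they can be displayed together on a single diagram.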

3 Climatological means of surface temperature and temperature extremes

Figure 1 displays the DJF climatological means of North American surface air temperatures and extreme indices of warm and cold days over 1951–2000 for the 50-member EnM, observation based result, and their differences. For SAT, the similarity of the spatial patterns with a poleward temperature decrease is quite remarkable in the EnM and NCEP reanalysis (Fig. 1, left column). The pattern is also robust for all ensemble members. Figure 2a presents the relative agreement of the DJF mean SATs for individual members compared to the NCEP reanalysis result using a BLT diagram. The pattern correlations are 0.97–0.98 for the 50 members, with a mean value of 0.98, over North America. Meanwhile, all members have nearly identical spatial variances compared to NCEP as is seen from the ratio of variances of the simulated to reanalysis (approximately 100%, the green numbers and dashed circles in Fig. 2a). Accordingly, the distinction between individual members is barely discernible in the diagram. In addition, the mean square difference between model and reanalysis based patterns (the blue numbers and solid circles in Fig. 2a) indicates that the values for all members, as well as the EnM, are low (about 7.0%). Overall, the reanalysis based SAT pattern over North America is well simulated by CanESM2. However, all the historical simulations, and their EnM, show a warm bias (about 1–2 °C) over most of the continent, especially around the Great Lakes (~ 3.2 °C; Fig. 1, lower left panel). The model bias will be partially removed when considering climate change trends below.

Fig. 2
BLT diagrams showing the pattern correlation, the ratio of model to NCEP/observation variances, and the relative mean square difference between model and NCEP/observation values of DJF climatological means of SAT (left), TX90 (middle), and TX10 (right) for the period 1951–2000 over North America

The EnM patterns of warm day index TX90 (Fig. 1, middle column) and cold day index TX10 (Fig. 1, right column) also resemble the observed results. The patterns show relatively high TX90 values over the southwestern US and central-eastern Canada, as well as low TX10 values over the southeastern US. The pattern correlation of TX90 (TX10) between the EnM and observation is 0.79 (0.88) over North America. Note that the climatological means of the warm and cold extreme indices differ slightly from 10% in both the observations and the EnM. This is mainly due to climate differences between the base period 1961–1990 used to define the extreme indices and the period 1951–2000 analyzed in this study, and to daily temperature variability (Yu et al. 2019). In addition, the EnMs of TX90 and TX10 exhibit slightly high intensity biases (less than 1% compared to the observations) over most of the continent, with exceptions over eastern Canada and eastern Mexico for TX90.

The relative agreement of climatological mean TX90 and TX10 patterns for individual members with the corresponding observed results is further compared in Fig. 2b, c. For warm extreme TX90 (Fig. 2b), the pattern correlations span a considerable range, from 0.08 to 0.84, indicating large uncertainties in the DJF mean TX90 simulations. Meanwhile, 49 members show lower spatial variances compared to the observation, implying that the simulated TX90 patterns are generally flatter than the observed pattern over North America. In addition, the mean square difference between each member and the observation shows that the EnM value is lower than the values of most members. For cold extreme TX10 (Fig. 2c), considerable differences are also apparent in pattern correlation and spatial variance across the 50 members. The EnM tends to have the best result in terms of pattern correlation and the mean square difference between model and observation patterns. Overall, the ensemble mean patterns of warm and cold extreme indices are qualitatively similar to the corresponding observed patterns. However, both TX90 and TX10 show much larger uncertainties in spatial pattern and magnitude of the DJF mean indices across the 50 CanESM2 members than those seen in SAT, indicating that internal climate variability influences temperature extremes more than SAT.

4 Surface temperature trend

(a) Projected trend, trend variance, and signal-to-noise ratio

Figure 3 displays the DJF mean SAT trends over the period 2010–2055 for each of the 50 ensemble members and the EnM. The overall signature of poleward amplification of temperature trends can be seen in all ensemble members, as well as the EnM. However, the SAT trends also reveal diversities, in terms of magnitude and spatial structure, across the ensemble members. The regional means of area-weighted SAT trends over land within the North American domain (20°–70°N, 170°–50°W), as shown in Fig. 3, range from 2.80 °C/45 year (member 26, M26) to 5.17 °C/45 year (member 40, M40), with an average of 3.85 °C/45 year for the EnM. Figure 4 further compares the projected SAT trends for individual members to the EnM. The pattern correlations over the North American domain range from 0.79 to 0.97, with a mean of 0.92, confirming the broad similarity of the spatial patterns across the 50 members. Additionally, the ensemble members exhibit a wide range of spatial variances compared to their EnM, with the ratio of variances ranging roughly from 50 to 150%. The uncertainty seen in the CanESM2 SAT trend is generally consistent with those obtained from other climate models (Deser et al. 2014; Kay et al. 2015). However, unlike those reported from the CCSM3 and ECHAM5 (Max Planck Institute climate model, version 5) large ensemble simulations (Deser et al. 2014), no cooling is observed over North America in the CanESM2 trends projected for 2010–2055. This suggests that CanESM2 tends to be less uncertain in projecting North American warming trends in the next half century than CCSM3 and ECHAM5.
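The regional means quoted above are area-weighted averages over land within the North American domain. A minimal sketch of such an average is given below, assuming a regular latitude-longitude grid, longitudes in degrees east (so 170°–50°W becomes 190°–310°), and a boolean land mask; the function name and the mask are hypothetical.

```python
import numpy as np

def regional_mean(field, lat, lon, land_mask,
                  lat_bounds=(20.0, 70.0), lon_bounds=(190.0, 310.0)):
    """Cosine-latitude weighted mean of a (n_lat, n_lon) field over land.

    lon is in degrees east (0-360); land_mask is True over land.
    """
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    select = in_lat[:, None] & in_lon[None, :] & land_mask
    weights = np.cos(np.deg2rad(lat))[:, None] * select
    # np.where keeps missing values outside the selection from contaminating the sum.
    weighted_sum = np.sum(np.where(select, weights * field, 0.0))
    return weighted_sum / np.sum(weights)

# Synthetic illustration only: a 2.5-degree grid treated as all land.
lat = np.arange(-90.0, 90.1, 2.5)
lon = np.arange(0.0, 360.0, 2.5)
land_mask = np.ones((lat.size, lon.size), dtype=bool)
uniform = np.full((lat.size, lon.size), 3.85)
mean_uniform = regional_mean(uniform, lat, lon, land_mask)
mean_lat = regional_mean(np.broadcast_to(lat[:, None], uniform.shape), lat, lon, land_mask)
```

Because the weights decrease poleward, the weighted mean of latitude itself falls below the midpoint of the 20°–70°N band, which illustrates why high-latitude warming counts for less in the regional mean than its map area might suggest.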

Fig. 3
DJF mean SAT trends over 2010–2055 (interval 1.0 °C/45 years) for each of the 50 ensemble members and their EnM

Fig. 4
BLT diagrams displaying the pattern correlation, the ratio of model to EnM variances, and the relative mean square difference between each model and EnM values of SAT trends for the period 2010–2055

The internal variability of the SAT responses to external forcing can be further quantified by the ensemble standard deviation (ESTD) of SAT trends across the 50 members. The relative contributions of external forcing and internal variability can be measured by the signal-to-noise ratio (SNR) of the ensemble mean SAT trend to the ESTD of the trends. SNR is a measure commonly used to compare the level of desired signal to the level of background noise. Figure 5 shows the ESTD and SNR of the SAT trends over 2010–2055. The inter-member variability is high over Canada and Alaska. The variability pattern resembles those obtained from CCSM3 and ECHAM5, while the variability magnitude is more comparable to that from ECHAM5 and slightly lower than that from CCSM3 (Fig. 5 in Deser et al. 2014). The SNR exhibits high values (greater than 3.5) over northern Canada, Alaska, and the southwestern US, and relatively low values mostly confined to the midsection of North America. This suggests that the SAT response to external forcing tends to be less detectable over the central parts than elsewhere over the continent. The SNR pattern reflects both the ensemble mean trend (Fig. 3, last panel) and the inter-member variability of the trends. The SNR magnitude is also more comparable to that in ECHAM5 than in CCSM3. The differences among the three ensembles likely result from differences in model configuration and physics and/or the different ensemble sizes used in these calculations.
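The ESTD and SNR can be computed at each grid cell from the stack of member trends, as in the sketch below. The use of the sample standard deviation (ddof=1) and of the magnitude of the ensemble mean trend in the ratio are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def ensemble_spread_and_snr(trends):
    """ESTD and SNR from per-member trends.

    trends: (n_members, n_lat, n_lon) array of trends, one map per member.
    Returns (enm, estd, snr), each of shape (n_lat, n_lon).
    """
    enm = trends.mean(axis=0)            # forced signal (ensemble mean trend)
    estd = trends.std(axis=0, ddof=1)    # inter-member spread (noise)
    snr = np.abs(enm) / estd             # magnitude of signal relative to noise
    return enm, estd, snr

# Tiny worked case: three members with uniform trends of 1, 2 and 3.
trends = np.stack([np.full((2, 2), value) for value in (1.0, 2.0, 3.0)])
enm, estd, snr = ensemble_spread_and_snr(trends)
```

In the worked case the ensemble mean is 2, the sample standard deviation is 1, and the SNR is therefore 2 at every grid cell.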

Fig. 5
Standard deviation (left, interval 0.2 °C/45 years) and signal-to-noise ratio (right, interval 0.5) of SAT trends among the 50 ensemble members

(b) Forced and internal components of projected trend

As described above, the externally forced SAT trend (the EnM trend) reveals an expected feature of poleward-intensified warming, with warming trends below 2 °C/45 year over the southeast US, approximately 2–4 °C/45 year over the western-central US and southern Canada, and about 4–7 °C/45 year over northern Canada (shading in Fig. 6, middle row). The forced SAT trend from CanESM2, including magnitude and spatial distribution, is comparable to those from other climate models (Deser et al. 2014; Kay et al. 2015).

Fig. 6
Total (top) and their forced (middle) and internal (bottom) components of DJF mean SAT trends over land (color shading, unit °C/45 years) and SLP trends (contours, interval 1.0 hPa/45 years) over the period 2010–2055 for member 26 (left column) and member 40 (right column)

The internally generated SAT trends from the 50 ensemble members exhibit large diversities, like those seen in the total trends (Fig. 3). The regional means of internal trends over North America range from −1.05 °C/45 year (M26) to 1.32 °C/45 year (M40) among the 50 members. The color shading in Fig. 6 displays the total, externally forced and internally generated SAT trends for the least and most warming members. For the total trend (Fig. 6, top row), the two cases exhibit broadly similar patterns compared to the EnM, with a spatial correlation of 0.94 (0.92) between M26 (M40) and EnM. However, notable differences between them are apparent in trend magnitude, with differences of 3–6 °C/45 year over Canada in particular. This is also clearly evident in the internally generated trend (Fig. 6, bottom row). In addition, the magnitude of the internal SAT trend is comparable to that of the forced trend (Fig. 6, middle row), especially over western-central Canada. Hence, both the externally forced and internally generated components contribute to the total trend. Moreover, the internal temperature trend exhibits large-scale spatial coherence rather than small-scale noise structure, consistent with Deser et al. (2014). Nevertheless, pattern correlations between most ensemble members and the least (most) warming member M26 (M40) are not high over North America, owing primarily to large spatial variations of action centers of the SAT trends (not shown).

(c) Dynamically adjusted trend

The large-scale atmospheric circulation anomaly and its induced temperature advection influence atmospheric temperature anomalies, particularly in boreal winter. The circulation-induced internal variability is found to play a crucial role in the climate change projection (e.g., Deser et al. 2012a, 2014; Holmes et al. 2016). Figure 6 also compares the total, externally forced, and internally generated components of the SLP (contours) and SAT trends over 2010–2055 for M26 and M40. For the total trend (Fig. 6, top panels), the circulation influence is evident over the northern parts of North America in M40, but not clear in M26. Anomalous warm maritime air flows from the North Pacific into the northern parts of the continent in M40, which contributes to the warming trends over Canada and the northern US. The circulation influence is also evident in the forced SAT trend over the northern portions of the continent, but not clear in the south (Fig. 6, middle panels). The EnM reveals decreasing SLP trends over the North Pacific, indicating that the Aleutian low is projected to deepen in CanESM2. The deepening of the Aleutian low is consistent with previous studies, which demonstrated an intensification and northward expansion of the Aleutian low in response to greenhouse warming (e.g., Meehl and Washington 1996; Gan et al. 2017). The deepening of the Aleutian low with global warming can be driven by an El Niño-like warming in the tropical Pacific (Gan et al. 2017). The Aleutian low change has also been found to be associated with changes in storm tracks at midlatitudes (Salathe 2006) as well as remote influences of the Atlantic Ocean (Zhang and Delworth 2007) and Arctic sea ice loss (Sun et al. 2015; Deser et al. 2016).

For the internally generated trend (Fig. 6, bottom panels), which shows the dominant SAT changes of reverse signs in M26 and M40 cases, the circulation influence is also obvious. In particular, the anomalous cold air flows from the north in M26, which follows the dominant anticyclonic anomaly with the center of action over the North Pacific, leading to cold trends over North America. By contrast, the anomalous warm maritime air flows from the North Pacific into Canada and the northern US in M40, which follows the cyclonic anomaly over the North Pacific, resulting in warm trends over the northern parts of North America.

To further demonstrate the impact of circulation-induced variability on SAT trends, we perform an EOF analysis on the SLP trends within the Pacific-North American domain (20°–70°N, 150°E–50°W), as shown in Fig. 6, across the 50 ensemble members. The first three EOF modes account for 84.3% of the inter-member SLP trend variance and are well separated from subsequent EOFs as per the criterion of North et al. (1982). The SLP anomalies in association with the principal components (PCs) related to these three modes are analyzed (not shown). The leading PC associated SLP anomalies reveal an Arctic Oscillation (AO, Thompson and Wallace 1998) like pattern, with opposite SLP anomalies over the Arctic region and northern mid-latitudes. The second PC associated SLP anomalies somewhat resemble the Pacific-North American pattern (PNA, Wallace and Gutzler 1981), with a dominant center of action over western Canada and Alaska. The third PC related SLP anomalies feature a Western Pacific (WP, Wallace and Gutzler 1981) like pattern, with a dominant action center over the Kamchatka Peninsula. The three orthogonal SLP trend predictor patterns are then determined for SAT trends at each grid cell using the method of partial least squares. We remove the influence of these three SLP trend predictor patterns to get the dynamically adjusted version of SAT trends for each ensemble member. This is generally similar to the dynamically adjusted method used in Deser et al. (2014) and Wallace et al. (2014).
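The two steps described above, an EOF analysis of inter-member SLP trends followed by removal of the circulation-related part of the SAT trends, can be sketched as follows. Note the substitution: this sketch uses ordinary least-squares regression of SAT trends on the leading principal components instead of the partial least squares method used in this study, so it is a simplified stand-in rather than the actual procedure. Function names and the square-root cosine-latitude weighting are assumptions.

```python
import numpy as np

def leading_pcs(slp_trends, lat, n_modes=3):
    """EOF analysis of SLP trends across ensemble members.

    slp_trends: (n_members, n_lat, n_lon). Returns (pcs, explained) with
    pcs of shape (n_members, n_modes) and explained variance fractions.
    """
    n_members = slp_trends.shape[0]
    anomalies = slp_trends - slp_trends.mean(axis=0)            # remove the EnM
    weights = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]   # area weighting
    matrix = (anomalies * weights).reshape(n_members, -1)
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]
    return pcs, explained[:n_modes]

def dynamically_adjust(sat_trends, pcs):
    """Remove the part of each member's SAT trend that co-varies with the PCs.

    Simplification: per-grid-cell least-squares regression across members,
    not partial least squares.
    """
    n_members = sat_trends.shape[0]
    anomalies = (sat_trends - sat_trends.mean(axis=0)).reshape(n_members, -1)
    coeffs, *_ = np.linalg.lstsq(pcs, anomalies, rcond=None)    # (n_modes, n_points)
    circulation_part = (pcs @ coeffs).reshape(sat_trends.shape)
    return sat_trends - circulation_part

# Synthetic illustration only: SAT trends driven entirely by the first PC.
rng = np.random.default_rng(1)
lat = np.linspace(20.0, 70.0, 5)
slp_trends = rng.normal(size=(20, 5, 6))
pcs, explained = leading_pcs(slp_trends, lat)
sat_trends = 3.0 + pcs[:, 0][:, None, None] * np.ones((5, 6))
adjusted = dynamically_adjust(sat_trends, pcs)
```

In this synthetic case all inter-member spread in the SAT trends comes from the first SLP mode, so the adjusted trends collapse onto the ensemble mean. With real model output only part of the spread would be removed.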

Figure 7 compares the total and dynamically adjusted SAT trends for the two members discussed above. After the circulation-induced component of internal trend variability is partially removed, M26 and M40 resemble each other more closely than their total trend counterparts do. This is apparent both visually in spatial pattern and magnitude (Fig. 7) and from a spatial correlation calculation. The correlation between M26 and M40 is 0.77 over North America for the total SAT trend, and increases to 0.91 for the dynamically adjusted trend. In addition, the adjusted M26 and M40 trend patterns are more comparable to the forced trend, i.e. the EnM (shading in Fig. 6, middle row), with the spatial correlation between M26 (M40) and EnM increasing slightly from 0.94 (0.92) for the total SAT trend to 0.98 (0.94) for the adjusted trend. The result demonstrates the influence of circulation-induced internal variability on the SAT trends. Figure 8 further compares the dynamically adjusted SAT trends for 2010–2055 over the North American domain for the 50 individual members to the EnM in a BLT diagram. The pattern correlations between each member and EnM range from 0.83 to 0.98, with a mean of 0.94 that is slightly higher than that of the total trend (0.92). The ratios of individual member variances to the EnM variance as well as the mean square differences between each member and EnM for the adjusted trend are also slightly lower compared to the counterparts for the total trend (cf. Fig. 8 with Fig. 4). These results suggest that the spread in climate change projections in the ensemble simulations is partially due to the dynamically induced internal variability.

Fig. 7
SAT trends (top) and their dynamically adjusted components (bottom) over the period 2010–2055 for member 26 (left column) and member 40 (right column). Contour interval is 1.0 °C/45 years

Fig. 8
As in Fig. 4, but for the corresponding dynamically adjusted results

5 Warm and cold extreme trends

(a) Projected trends, trend variances, and signal-to-noise ratios

Figure 9 displays the DJF mean warm extreme TX90 trends over the period 2010–2055 for each ensemble member and the EnM. Figure 10 shows the corresponding cold extreme TX10 results. In general, TX90 increases and TX10 decreases over North America. This suggests that more severe warm days and fewer extreme cold days are projected over North America in the next half century, generally consistent with previous studies (e.g., Meehl and Washington 1996; Sillmann et al. 2013b). The patterns of the warm and cold extreme trends differ from that of the SAT trends described above. In addition, the projected TX10 trends show larger uncertainties across the 50 members compared to the TX90 trends.

Fig. 9
As in Fig. 3, but for TX90 trends (interval 5.0%/45 years)

Fig. 10
As in Fig. 3, but for TX10 trends (interval 1.0%/45 years)

The projected TX90 trends exhibit consistent increases of warm days over North America across the ensemble members, with strong warm extreme increases over the western coast and northern Canada, accompanied by relatively weak extreme increases over the midsection of the continent (Fig. 9). The pattern correlations between each member and the EnM range from 0.70 to 0.96, with a mean of 0.87. Meanwhile, most members have relatively higher spatial variances than the EnM variance, with the variance ratio below 150% (Fig. 11, upper-left). These results further indicate good correspondence among the individual members, with differences mainly in trend magnitude. Hence, an increase in extreme warm temperature days will be seen in the future, with high risks of extreme warm days over the western coast and northern Canada, although large uncertainties are seen in the climatological means of individual ensemble members (Fig. 2). By contrast, the projected TX10 trends reveal considerable differences across the 50 ensemble members (Fig. 10). The TX10 trends are dominated by decreases in extreme cold days for most members, but with large variations in spatial pattern and magnitude. Additionally, 9 out of the 50 members show patches of cold extreme increases over North America, most notably TX10 increases over the central US in member 5. The EnM of TX10 trends reveals a broadly northwest-to-southeast oriented belt of strong cold extreme decreases, extending from Alaska to the northeast US. The pattern correlations between each member and this EnM are low, with a wide range from 0.14 to 0.68 and a mean of 0.43. In addition, all members have much higher spatial variances than the EnM variance, with the variance ratio ranging approximately from 150 to 350% (Fig. 11, upper-right). The spread in individual values of spatial correlations and variances indicates that the individual members depart considerably from the EnM. Overall, the agreement of the TX90 trends among individual members is evident, whereas large uncertainties are apparent in the TX10 trends.

Fig. 11
BLT diagrams displaying the pattern correlation, the ratio of the modeled to EnM variances, and the relative mean square difference between each model and EnM values of DJF mean trends of TX90 (left) and TX10 (right) for the period 2010–2055. The total trend and its dynamically adjusted component are shown at the top and bottom rows, respectively

Figure 12 further shows the ESTD and SNR patterns of the TX90 and TX10 trends. For warm extreme trends, the inter-member variability features a broadly uniform structure, with relatively high variances over the western-central US. The SNR pattern tends to be dominated by the EnM trend pattern, with strong signals over the western coast and northern Canada, accompanied by relatively weak signals over the midsection of North America. This SNR pattern also bears some resemblance to that of the SAT trends (Fig. 5). In contrast, large variances of the cold extreme trends across the ensemble members appear over the western-central parts of southern Canada and the US, especially the Great Plains of North America. The SNR pattern is contributed by both the EnM and ESTD patterns of TX10 trends, with strong signals over the southwestern US, northern Canada, and the northeastern US.

Fig. 12 Standard deviation (top, interval 0.5%/45 years) and signal-to-noise ratio (bottom, interval 0.5) of TX90 (left column) and TX10 (right column) trends among the 50 ensemble members
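As a rough illustration of the quantities mapped in Fig. 12, the sketch below computes the ensemble standard deviation (ESTD) and the signal-to-noise ratio (SNR) at each grid point. It assumes that the SNR is the absolute EnM trend divided by the ESTD and that the sample standard deviation (`ddof=1`) is used; the function name is hypothetical.

```python
import numpy as np

def ensemble_spread_and_snr(trends):
    """Ensemble mean, inter-member standard deviation, and SNR of trend maps.

    trends: array of shape (n_members, nlat, nlon), one trend map per member.
    Assumption: SNR = |EnM| / ESTD, with the sample standard deviation.
    """
    trends = np.asarray(trends, dtype=float)
    enm = trends.mean(axis=0)             # forced signal estimate
    estd = trends.std(axis=0, ddof=1)     # spread due to internal variability
    snr = np.abs(enm) / estd              # undefined where estd is zero
    return enm, estd, snr
```

Under this definition, a high SNR can arise either from a strong EnM trend or from a small inter-member spread, which is consistent with the TX10 pattern reflecting both the EnM and ESTD patterns.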

The large spread of projected TX10 changes has also been found in previous studies, especially for weak climate change scenarios (Sillmann et al. 2013b). It remains unclear what is responsible for the spread difference between the projected TX10 and TX90 trends. The difference may be attributed to different changes in the surface temperature advection and local radiative and turbulent fluxes that are directly related to the temperature variation through the surface energy balance (e.g., Campbell and Vonder Haar 1997; Durre and Wallace 2001; Loikith and Broccoli 2012; Horton et al. 2015; Krueger et al. 2015; Tamarin-Brodsky et al. 2019), and/or to differences in remote driving mechanisms for warm and cold extremes (e.g., Johnson et al. 2018).

(b) Forced and internal components of projected trends

The temperature extreme trends are also decomposed into externally (anthropogenically) forced and internally generated components. For the warm extreme TX90, the regional means of the total trends over North America range from 13.01%/45 year (member 27, M27) to 21.18%/45 year (member 40, M40) across the 50 members, with a mean of 16.83%/45 year that is also the regional mean of the forced trend. Hence, the regional means of the internally generated TX90 trends range from −3.82%/45 year to 4.35%/45 year. Figure 13 shows the total, externally forced, and internally generated TX90 trends for the two members with the lowest and highest regional mean trends. The total trends of these two members are broadly similar in structure to the EnM (Fig. 13, top and middle rows), with a spatial correlation of 0.89 (0.83) between M27 (M40) and the EnM over North America. The most notable discrepancy between them is in trend magnitude, with differences of 10–25%/45 year over the central parts of North America. The difference is also clearly evident in the internally generated TX90 trend (Fig. 13, bottom row), which shows large-scale spatial coherence over North America. In addition, the internally generated trend is comparable to the forced trend in the central parts of the continent, and hence contributes noticeably to the total trend.
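The decomposition can be written compactly as follows. This is a minimal sketch that takes the forced trend to be the ensemble mean and the internally generated trend of each member to be its departure from that mean, as implied by the regional means quoted above; the function and variable names are illustrative.

```python
import numpy as np

def decompose_trends(total_trends):
    """Split member trends into a forced part and internal residuals.

    total_trends: array of shape (n_members, ...), one trend map per member.
    The forced trend is estimated as the ensemble mean (EnM).
    """
    total = np.asarray(total_trends, dtype=float)
    forced = total.mean(axis=0)      # same for every member
    internal = total - forced        # one residual map per member
    return forced, internal

# Regional means quoted in the text (%/45 year): M27, M40, 50-member mean.
total_m27, total_m40, forced_mean = 13.01, 21.18, 16.83
internal_m27 = round(total_m27 - forced_mean, 2)
internal_m40 = round(total_m40 - forced_mean, 2)
```

By construction the internal components average to zero across the ensemble, so the two members with the extreme regional means bound the range of internally generated regional trends.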

Fig. 13 Total (top) and its forced (middle) and internal (bottom) components of TX90 trends for member 27 (left column) and member 40 (right column). Contour interval is 5.0%/45 years

For the cold extreme TX10, the regional means of the total trends over North America range from −7.25%/45 year (member 17, M17) to −2.89%/45 year (member 5, M5) across the 50 members, with a mean of −4.40%/45 year. Figure 14 presents the total, externally forced, and internally generated TX10 trends for M17 and M5. Unlike the TX90 trends, the total TX10 trends in these two members differ markedly from the EnM, i.e., the forced trend (Fig. 14, top and middle rows), in both spatial structure and magnitude. M17 exhibits decreases of extreme cold days over the whole continent, with striking changes over western-central Canada and the northern US. In contrast, M5 reveals decreases of extreme cold days mainly over Canada and the western and northeastern US, accompanied by cold extreme increases over the central US. The two patterns are hence essentially uncorrelated, with a correlation of 0.04 over North America. The pattern correlation between M17 (M5) and the EnM is 0.58 (0.39). In addition, the magnitude of the internally generated TX10 trends (Fig. 14, bottom row) is comparable to that of the forced trend, especially over the central portions of the continent.

Fig. 14 Total (top) and its forced (middle) and internal (bottom) components of TX10 trends for member 17 (left column) and member 5 (right column). Contour interval is 1.0%/45 years

The internally generated TX90 and TX10 trends show large-scale spatial coherence over North America, a feature also seen above in the SAT trends. The uncertainties in the climatological means and projected changes of extreme warm and cold days suggest that simulated North American temperature extremes are highly uncertain and should be interpreted with caution.

(c) Dynamically adjusted trends

Given the association between large-scale circulation anomalies and the synoptic-scale weather variability (e.g., Wallace and Gutzler 1981; Yu et al. 2019), as analyzed above, we remove the influence of the first three SLP trend predictors to get dynamically adjusted versions of the warm and cold extreme trends. The lower panels of Fig. 11 compare the spatial pattern correlation and variance for individual members to their EnMs for the adjusted TX90 and TX10 trends. For warm extreme TX90, the pattern correlations between each member and the EnM range from 0.71 to 0.96, with a mean of 0.89 that is slightly higher than that of the total trends (0.87) described above. For cold extreme TX10, the pattern correlations still show a wide range from 0.21 to 0.72, with a mean of 0.47 that is also slightly higher than the mean of the total trend (0.43). In addition, the ratios of individual member variances to their EnM variance for the adjusted TX90 and TX10 trends are slightly lower compared to those for the corresponding total trends.
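The adjustment step can be sketched as a regression across ensemble members. The sketch below takes the predictor values (one per member for each of the three SLP trend predictors) as given and does not reproduce how they are constructed. It assumes that the circulation-related part of each member's trend is estimated by ordinary least squares on the inter-member anomalies and then subtracted; the function name and array layout are hypothetical.

```python
import numpy as np

def dynamically_adjust(trends, predictors):
    """Remove the part of each member's trend linearly related to predictors.

    trends:     (n_members, n_grid) flattened trend maps, one per member
    predictors: (n_members, k) predictor values per member (k = 3 in the text)
    Assumption: ordinary least squares on inter-member anomalies.
    """
    y = np.asarray(trends, dtype=float)
    x = np.asarray(predictors, dtype=float)

    # Work with departures from the ensemble mean so the EnM is unchanged.
    x_anom = x - x.mean(axis=0)
    y_anom = y - y.mean(axis=0)

    # Regression coefficients of the trend field on each predictor.
    coef, *_ = np.linalg.lstsq(x_anom, y_anom, rcond=None)

    circulation_part = x_anom @ coef
    return y - circulation_part
```

Because the predictor anomalies have zero ensemble mean, this form of adjustment leaves the EnM trend untouched and only reduces the inter-member spread, which is why the adjusted members are expected to correlate more strongly with their EnM.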

The total and dynamically adjusted TX90 and TX10 trends have also been compared for the two pairs of members discussed above in Figs. 13 and 14. For TX90, the dynamically adjusted trends for M27 and M40 are similar in structure to the corresponding total trends, with slightly weaker trend values over the centers of action (not shown). The pattern correlation with the EnM increases slightly from 0.89 for the total trend to 0.94 for the adjusted trend for M27, and from 0.83 to 0.86 for M40. For TX10, the dynamically adjusted trends for M17 and M5 also show slightly weaker action centers than their total trends (cf. Fig. 15 with the top row of Fig. 14). In addition, a belt of cold extreme decreases extending from northwest Canada to the northeast US prevails in the dynamically adjusted patterns for both members (Fig. 15), broadly following the belt of decreases apparent in the EnM (Fig. 14, middle panels). Accordingly, the two adjusted patterns resemble each other more closely than their total trend counterparts do. The pattern correlation of the dynamically adjusted trends between M17 and M5 is 0.60, much higher than that of the total trends (0.04) discussed above. In addition, the pattern correlation with the EnM increases from 0.58 for the total trend to 0.67 for the adjusted trend for M17, and from 0.39 to 0.56 for M5.

Fig. 15 Dynamically adjusted TX10 trends for member 17 (left) and member 5 (right). Contour interval is 1.0%/45 years

Overall, by partially reducing the contribution of the circulation-induced components of the TX90 and TX10 trends, the individual ensemble members resemble their ensemble mean more than the total trends do. However, the circulation influence on the projected temperature extreme trends is generally modest, especially for the cold extreme TX10 (Fig. 11, right column).

6 Summary and discussion

Based on a 50-member large ensemble of climate simulations conducted with CanESM2, together with observational and NCEP reanalysis data, we study the role of internal climate variability in climate change projections of wintertime surface air temperature and temperature extremes over North America. The CanESM2 performance is evaluated by comparing the DJF climatological patterns of surface air temperature and of the extreme indices of warm and cold days in its historical simulations with the corresponding observation-based results. We then explore the projected trends of mean and extreme temperatures over the period 2010–2055 by analyzing the simulations driven by anthropogenic radiative forcing under the RCP8.5 scenario. The projected mean surface temperature trends obtained from CanESM2 are also compared to those from other climate models. We examine the externally forced and internally generated components of the projected temperature and temperature extreme trends, and analyze the influence of large-scale circulation-induced variability on these trends. The main findings from this analysis can be summarized as follows.

  1. CanESM2 large ensemble simulations confirm the important role of internal climate variability in the projected SAT trends, in both magnitude and spatial structure, consistent with results obtained from other climate models. However, CanESM2 shows less uncertainty in projected North American warming trends over the next half century than CCSM3 and ECHAM5. The SAT response to external forcing is more detectable over northern Canada, Alaska, and the southwestern US than over the midsection of the continent.

  2. The role of internal climate variability in temperature extreme simulations is apparent in both the DJF climatological mean and the projected trend. The ensemble mean climatological patterns of the TX90 and TX10 indices are qualitatively similar to the corresponding observed patterns. However, both indices exhibit large uncertainties in the spatial structure and magnitude of the DJF means across the 50 ensemble members. More extreme warm days and fewer extreme cold days are projected over North America in the next half century, yet the projected TX10 trends show larger uncertainties than the TX90 trends. The ensemble mean of the TX90 trends reveals high risks of extreme warm days over the western coast and northern Canada, and the differences across the ensemble members appear mainly in magnitude. Additionally, the signal-to-noise ratio (SNR) pattern of the TX90 trends is similar to that of the SAT trends. By contrast, the ensemble mean of the TX10 trends exhibits a belt of extreme cold day decreases extending from Alaska to the northeast US. The individual members depart considerably from this ensemble mean in both the spatial pattern and magnitude of the trends. The SNR pattern of the TX10 trends reveals strong signals over the southwestern US, northern Canada, and the northeastern US.

  3. The internally generated components of the mean and extreme temperature trends exhibit large-scale spatial coherence over North America, as well as variations across ensemble members. The internally generated trend is comparable to the externally forced trend, especially in the central parts of North America, and hence contributes noticeably to the total trend. Large-scale atmospheric circulation-induced temperature variability influences these projected trends. After the influences of the three leading SLP trend predictor patterns on the mean and extreme temperature trends are removed, the dynamically adjusted trends for individual members resemble the corresponding ensemble mean more closely than the total trends do.

The dynamical adjustment method applied here to estimate the large-scale circulation influence on projected trends may not be the best approach for capturing the circulation-induced variability of temperature extremes. For example, the linkage between the projected sea level pressure and temperature extreme trends is not obvious in the southern parts of North America (cf. contours in the middle row of Fig. 6 with the middle rows of Figs. 13 and 14). The circulation anomalies associated with the temperature extreme trends may be more closely related to synoptic-scale circulation variability (e.g., Favre and Gershunov 2006; Yu et al. 2019). Regional-scale dynamical and thermodynamical anomalies, as discussed in Sect. 5, would also lead to surface temperature and temperature extreme anomalies through variations of the surface energy budget. These issues remain to be investigated. In addition, McKinnon et al. (2017) found that the internal variability in forced climate trends over North America tends to be overestimated in large ensemble climate simulations. Internal variability simulated by climate models may be inconsistent with observations due to model biases. It remains to be clarified whether CanESM2’s consistent SAT warming trends are related to its warm model bias.