1 Introduction

Like other northern regions, Canada has experienced some of the most rapid warming on Earth. Canadian surface air temperature has increased by 1.7 °C during 1948–2012 (Vincent et al. 2015, see also, Vincent et al. 2012), a warming rate that is about twice that of the global mean temperature. The warming in mean temperature is associated with changes in daily temperature extremes including a reduction in cold nights, cold days and number of days with frost, as well as an increase in the number of warm nights and warm days (Vincent and Mekis 2006; Mekis et al. 2015). The warming is also associated with widespread physical impacts including a reduction in snow cover duration and extent (Brown and Braaten 1998; Mudryk et al. 2015; Najafi et al. 2016), and changes in streamflow regime (Zhang et al. 2001; Vincent et al. 2015).

There is a pressing need to adapt to the warming climate in Canada: the environmental and social impacts of warming are becoming increasingly evident (Warren and Lemmen 2014) and further warming is projected under all future emissions scenarios, including those such as RCP2.6 (IPCC 2013) that require stringent mitigation measures. Moreover, warming over Canada will continue to occur at a rate higher than in many other regions of the world. This is due to well established physical processes that lead to the polar amplification of warming (Kirtman et al. 2013 and reference therein). It is therefore important to produce robust projections of future climate under a range of future emissions scenarios to support adaptation planning. More robust projections can be achieved by understanding the role of human influence in the observed changes because such an understanding can be used to constrain model based projections, thereby increasing the confidence with which the projections can be used (e.g., Allen et al. 2000; Stott and Kettleborough 2002).

There is ample evidence of the influence of low-frequency internal climate variability on the Canadian climate. The positive phases of the Pacific Decadal oscillation (PDO) and El Nino-Southern oscillation (ENSO) are associated with warmer winters in western and central Canada (Bonsal et al. 2001; Shabbar and Yu 2012; Vincent et al. 2015), while the positive phase of the North Atlantic oscillation (NAO) is associated with colder winters in northeastern Canada (Bonsal et al. 2001; Vincent et al. 2015). The spatial patterns of temperature trends for two overlapping periods 1950–1998 and 1948–2012 are different. Trends computed for 1950–1998 showed very strong warming in the southwestern Canada and strong cooling in the northeastern Canada (Zhang et al. 2000). The cooling trend in the northeast disappeared completely when the trend analyses were conducted for 1948–2012 (Vincent et al. 2015). These finding indicate that low frequency variability influences the observed long-term temperature trends. This is consistent with modelling studies showing that natural internal variability can strongly influence high-latitude temperature changes over multi-decadal timescales (Deser et al. 2012; Fyfe et al. 2017).

There is also considerable evidence that anthropogenic forcing may have influenced Canadian temperatures. Decadal climate variability does not fully account for the observed temperature trends indicating that other factors may come into play (Vincent et al. 2015). Changes in temperature in the southern regions of Canada (Zhang et al. 2006), in Arctic temperature (Najafi et al. 2015, 2016), and in Arctic sea ice extent (Min et al. 2008; Kirchmeier-Young et al. 2017) have been attributed, at least in part, to the increase in atmospheric greenhouse gases (GHG) content due to human activities. While anthropogenic induced warming is expected to be stronger in northern high latitude areas than in the lower latitudes due to polar amplification, natural internal variability in the region is also considerably higher than in the lower latitudes. It is therefore necessary to determine whether anthropogenically induced warming is actually detectable in the presence of high internal variability at regional scales that are relevant to impacts. Yet, there is a paucity of such studies.

Anthropogenic influence has been detected in regional temperatures in some parts of the world, (e.g. Zwiers and Zhang 2003, Karoly and Stott 2006; Zhang et al. 2006; Yin et al. 2016). There is significant value in performing careful detection and attribution analysis in different regions due to their different climatic characteristics, possible regional differences in model quality and forcing response characteristics, and considerable value to policy makers and stakeholders in supporting their applications of climate science regionally. The objective of this paper is therefore to examine the influence of anthropogenic forcing on mean and extreme temperatures in Canada, taking decadal climate variability into account. The remainder of the paper is structured as follows: Sect. 2 describes the observational and model simulated data used in this study. Details about methodologies are provided in Sect. 3. Results of analyses are summarized in Sect. 4, including a comparison with the results when PDO and NAO influence is removed from temperature. A summary is provided in Sect. 5.

2 Data

2.1 Observations

We used the second generation homogenized Canadian temperature dataset (Vincent et al. 2012). This dataset has temporal coverage from 1900 to present, and has been adjusted to remove non-climatic discontinuities in both daily maximum and daily minimum temperature time series. It includes data collected at 338 locations across Canada from the surface measurement network, with denser coverage in more populated southern areas. Not all stations have observations for the entire period, and thus the number of stations with data differs from year to year (Fig. 1a). In particular, the climate observing network in northern Canada was established in the late 1940s; observations were rare in the north prior to 1948. For this reason, the time period we studied here starts from 1948.

Fig. 1
figure 1

a Time series of number of stations available in the second generation homogenized Canadian temperature datasets. b Sub-regions and spatial distribution of grid boxes with sufficient data for the analyses. The number of stations within each 5° × 5° longitude by latitude grid box is marked by a numerical value

We applied the following procedure to station data before computing regional average as described in Sect. 3.1. We calculated daily mean temperature as the average of daily maximum and minimum temperatures. Monthly mean temperatures are calculated from the daily values in months with fewer than 5 consecutive days missing, and fewer than 8 days missing in total, and are set to missing otherwise. As indices of daily ‘warm extremes’, we used the annual maxima of daily maximum (TXx, hottest summer day temperature) and daily minimum (TNx, hottest summer night temperature) temperatures. As indices of daily ‘cold extremes’, we used annual minima of daily maximum (TXn, coldest winter day temperature) and daily minimum (TNn, coldest winter night temperature) temperatures. These were extracted as annual largest or smallest values from the respective daily series, taking missing daily values into consideration. We first extracted monthly maxima or minima for months with fewer than three missing daily maximum or minimum observations respectively. The warm extremes, TXx and TNx, typically occur in the warm half year; therefore the annual warm extremes were extracted as the largest values of the respective monthly extreme values from April to September. Cold extremes, TNn and TXn, typically occur in the cold half year; the annual cold extremes were therefore extracted as the lowest values of the respective monthly extreme values from October to March. Annual extremes were set to missing if data availability was not sufficient to calculate the relevant monthly extremes for all months in the warm or cold half-years, as appropriate.

2.2 Model simulations

We used near-surface air temperatures simulated by climate models (GCMs) participating in the Coupled Model Intercomparison Project phase 5 (CMIP5) that are available from the CMIP5 archive to estimate temperature responses to various external forcings. At the time of analysis, the following simulations provided monthly mean temperature values to year 2012 and are used in this study for 1948–2012: 76 simulations conducted with 15 GCMs under the combined historical natural and anthropogenic forcings (ALL), 46 simulations conducted with 10 GCMs under the historical natural forcing (NAT), including solar and volcanic forcings and 35 simulations conducted with 6 GCMs under the historical anthropogenic forcing (ANT) including both greenhouse gases and anthropogenic aerosol forcings. Table 1 lists the models and their experiments whose monthly mean temperature data are used in this study. The four extreme temperature indices were calculated from daily values available from the CMIP5 archive following Sillmann et al. (2013). Daily values of maximum and minimum near-surface air temperatures simulated by the GCMs that ended in 2012 include 22 simulations conducted with 6 GCMs under ALL forcing, 26 simulations conducted with 6 GCMs under NAT forcing, 15 simulations conducted with 2 GCMs under ANT forcing (see Table 2 for details). These various ensembles were used to compute temperature response patterns under the relevant forcings.

Table 1 List of models, experiments, and ensemble sizes available for monthly mean temperature analysis. Bold font indicates ensemble sizes of extended model simulations that end in year 2012 and used for calculating signals. The experiments include model simulations under historical all forcing (ALL), historical anthropogenic forcing (ANT), historical natural forcing including solar and volcanic forcing (NAT), historical greenhouse gases forcing (GHG), historical anthropogenic aerosol forcing (AA), and historical land use and land change forcing (LU)
Table 2 The same as Table 1 but for extreme temperature indices study

Detection and attribution analyses (Sect. 3.2) require a large volume of model simulated data to estimate natural internal (unforced) variability. We used the following three types of data for this purpose. The first is the inter-ensemble differences of the above simulations that were used for signal estimation (bolded numbers in Tables 1, 2). The second is the inter-ensemble differences of simulations that were not used in signal estimation because they end in 2005. These include simulations under various external forcings including ALL, ANT, NAT, greenhouse gases (GHG), anthropogenic aerosols (AA), land use and land changes (LU) (summarized by non-bolded numbers in Tables 1, 2). The third is pre-industrial control (CTL) simulations; we use 21,710 years of monthly temperatures from pre-industrial control (CTL) conducted with 48 CMIP5 models and 13,780 years of daily temperatures from CTL simulations conducted with 29 models. The latter were used to derive extreme temperature indices. The large data samples from these various sources help to reduce estimation error in detection and attribution analysis.

3 Methods and data processing

3.1 Data processing

Canada is a large country with vastly different climates in different regions. Observed changes in the climate are also different across the country (Vincent et al. 2015). Here we considered national and sub-national (regional) average temperatures. We considered four large regions in Canada, including Eastern Canada (EC), the Prairies (PR), Northern Canada (NC) and the British Columbia and Yukon region (BCY; see Fig. 1b for regional boundaries). These regions, which are similar to the four climate regions described in Koeppe (1931), are defined according to their climates as well as societal and geographical considerations.

We analysed annual, winter (December to February, DJF) and summer (June–August, JJA) mean temperatures. To compute observed regional mean temperatures, we first computed temperature anomalies at individual stations and then gridded the station anomalies to 5° × 5° latitude-longitude resolution grid boxes by averaging available station anomaly values within each grid box. Station anomalies were calculated by subtracting the monthly mean values over the 1961–1990 time period for each calendar month as there are the least missing values during 1961–1990. We required at least 25 monthly values available during 1961–1990 when computing the monthly climatologies. Annual and seasonal mean temperature anomalies were computed as the average of the available monthly anomaly values if there were at least 9 or 2 months of data within the year or the season, respectively. To maintain large-scale features and reduce high frequency temporal variability, grid box values were averaged into non-overlapping 5-year mean series starting from 1948–1952 to 2008–2012. A 5-year mean grid box value was computed only if there were no more than two missing values over the five-year period. Grid boxes that have at least ten non-missing five-year means during 1948–2012 were retained for the subsequent analysis. There are 64 such grid boxes over the country (Fig. 1b).

When evaluating observations for the presence or absence of anthropogenic influence, the observations and model simulated signals should ideally be both centered to their pre-industrial climates. However, it is difficult to accurately estimate the real-world climatology in the Canadian domain for pre-industrial times due to lack of data. Here, we re-centered the data to the climatology of the first 30-years of the record we analyse, i.e., to the 1948–1977 mean prior to detection and attribution analysis. As the statistical regression model used in detection and attribution analysis does not involve an intercept term, using a different base period could result in slightly different results, but this does not alter overall conclusions (not shown). The extreme temperature indices were processed similarly to produce corresponding regional averages. That is, we extracted extreme temperatures from daily observations at individual stations, and then gridded the station anomalies to 5° × 5° latitude-longitude resolution grid boxes by averaging available station anomaly values within each grid box. These gridded station data were used to compute regional averages.

Different GCMs may use different horizontal resolutions; therefore, we interpolated monthly temperatures and the annual extreme temperature indices from each individual model’s native resolution to the 5° × 5° observational grid using the first order conservative remapping method implemented in the Climate Data Operators software package (https://code.zmaw.de/projects/cdo). These values are then masked (i.e., set to be missing) to mimic the availability of the observational data. In order to compare model simulated responses with observations, we used the observed data from 1948 to 2012 to mask the simulations for the same period such that the observational and model simulated data have the same spatial and temporal coverage. For the data that were used for the estimation of internal climate variability (or noise), we divided the available simulations into contiguous 65-year “chunks”, and then used the observed data from 1948 to 2012 to mask each chunk such that the first year of each chunk corresponds to 1948 in the observation. The same procedure that was used to compute regional averages from the observational data was applied to the masked simulation data to obtain regional averages for model simulations.

3.2 Detection method

We used the total least squares generalized linear regression based optimal fingerprinting method (Allen and Stott 2003; Ribes et al. 2013) to quantify scaling factors for model-simulated responses to forcings (referred to as signals hereafter), such that they best match the observations. The model simulated responses are computed as multi-model ensemble means of available simulations under specific combinations of external forcings. This method establishes a linear relationship between the observations (y) and the model simulated responses to the external forcings (X) such that \({\varvec{y}}={{\varvec{y}}^*}+{\varvec{\varepsilon}_0}\), \({\varvec{X}}={{\varvec{X}}^*}+{\varvec{E}}\) and \({{\varvec{y}}^*}={{\varvec{X}}^*}\varvec{\beta}\) where \({{\varvec{y}}^*}\) is the unknown climate response to external forcings in the observations, \({\varvec{\epsilon }}_{0}\) is the internal variability in the observations, X is the matrix of model simulated signals with each row corresponding to response to a particular forcing that is obtained by calculating a multi-model average, \({\varvec{X}}^{*}\) is the matrix of unknown true responses simulated by the models that could only be known if it were possible to average across an infinitely large population of simulations, thereby eliminating all traces of internal variability, E is a matrix of noise in the model simulated responses due to model internal variability that is not averaged out with the finite number of model simulations that are used to obtain X and β is a vector of regression coefficients or scaling factors that adjust the responses so that their sum best matches the observations. Evidence that the scaling factor βi for a given signal i is significantly greater than zero demonstrates the possible contribution from that signal to the observed changes, and thus helps support a detection claim. A scaling factor that is less than unity indicates overestimation of the response to forcing by the models; conversely, one that is larger than unity suggests underestimation. Similarly, a finding that the confidence interval for βi lies completely above zero and also includes one indicates that the main features of the forced response that is simulated by models are present in the observations with the expected amplitude, and thus this would support an attribution claim. Note however that a full detection and attribution assessment also requires that other competing explanations, including other possible forcings, have been eliminated, that model simulated internal variability is comparable to the internal variability component of the observations, and that the mechanisms leading to the changes are understood (Mitchell et al. 2001).

A signal corresponding to a particular external forcing was estimated by first calculating the ensemble average from multiple runs of individual GCM under the forcing, and then averaging the ensemble means from the available multiple GCMs. The estimates of covariance matrix \(C\)of internal variability \({\varvec{\epsilon }}_{0}\) required in the detection analysis were obtained from within-ensemble differences as well as pre-industrial control simulations. In addition to the forced simulations for 1948–2012, forced simulations for earlier periods were also used for noise estimation. The within-ensemble difference series have smaller variance due to the removal of the ensemble mean. This bias in the variance is corrected by multiplying the series by \(\sqrt {\frac{n}{{n - 1}}} ~\) (Ribes et al. 2013), where n is the number of individual runs from each GCM. In total, the within-ensemble differences and the control simulations yielded 1060 and 622 65-year chunks of noise data for the mean and extreme temperatures, respectively. These noise data were divided into two equal halves to provide two independent estimates \({\hat {C}_1}\) and \(\hat{C}_{2}\) of the internal variability covariance matrix C, which are required for optimization and uncertainty analysis respectively (Hegerl et al. 1996).

Given the need to estimate a large number of parameters in the covariance matrices and the limited availability of independent noise data, detection and attribution analysis has to be conducted using observation vectors y of relatively small dimension. We reduced data dimensionality by averaging the data both in time and space. We used non-overlapping 5-year mean series to represent the temporal evolution of temperature. The use of 5-year average removes a large portion of year-to-year variability and at the same time allows some decadal scale variability in the response to volcanic activity. To reduce spatial dimension, we considered regional means of 5-year averages only. Detection is conducted on individual regional and country average time series of 5-year means, implying a 13-dimensional problem (i.e. for the 1948–2012 time series). In addition, for Canada as a whole, we also considered the spatial evolution of temperature changes by combining the four sub-region averages as four spatial dimensions to conduct space–time analysis. A detection claim made using a space–time analysis should be more robust than a claim made from an analysis that is based only on time evolution as it requires matching more details between observations and model simulated signals. We used the implementation of Ribes et al. (2013) to solve the regression problem. This implementation uses a regularized covariance matrix estimator in place of the usual sample covariance matrix that has been used traditionally. The analysis assumes that GCM simulated variability is comparable to the variability in the observations. The validity of this assumption is examined using a residual consistency test (Allen and Tett 1999; Ribes et al. 2013).

We conducted a single-signal analysis in which only one signal was used as the predictor in the regression to determine if a particular signal was detectable, considering the ALL, ANT and NAT signals individually. These signals were computed from all available simulations under the respective forcings. As the signals are not necessarily orthogonal to each other, the detection of a particular signal in a one-signal analysis may not accurately reflect the overall contribution of the corresponding forcing to the observed changes. We therefore also conducted a two-signal analysis in which two signals were used jointly in the regression with an aim to determine if the influence of anthropogenic forcing (ANT) could be separated from that of natural external forcing. Ideally, we would have conducted the two-signal analysis using ANT and NAT signals, however, the number of simulations that are available under ANT forcing is rather limited. As an alternative, we conducted the two-signal analysis using ALL and NAT signals, assuming linear additivity of the signals (Tett et al. 2002; Jones et al. 2013): \({T_{{\text{ALL}}}}={T_{{\text{ANT}}}}+{T_{{\text{NAT}}}}\). In the two-signal analysis, we used all available simulations forced with ALL or NAT forcing to improve the signal estimates. As the number of models forced with ALL forcing was larger than that with NAT forcing, the difference between ALL and NAT may also reflect the difference in the composition of the samples of models. For this reason, we also conducted the two-signal analysis with ALL and NAT simulations produced by the same models to examine the sensitivity of the results to this choice. Results from the two sets of analysis are qualitatively similar, with smaller uncertainty ranges in the scaling factors from the analysis using all available simulations.

The linear regression used here assumes that the residual term e0 follows a Gaussian distribution. Extreme values such as extreme temperature indices considered here have skewed non-Gaussian distributions at individual locations. Our analyses of the extremes, however, were conducted on 5-year mean regional averages. It is therefore reasonable to approximate their distributional behaviour with Gaussian distributions since the average of a large number of extreme values is expected to converge asymptotically to a Gaussian distribution. Other methods have been used to explicitly consider the statistical properties of extremes (Zwiers et al. 2011) or to convert extreme values into quantities that are not skewed distributed (Christidis et al. 2011; Zhang et al. 2013; Kim et al. 2016), but are not judged to be necessary in our case.

3.3 Estimation of attributable warming

We used ordinary least squares method to estimate a linear trend and its 90% confidence interval in the observed regional averages of mean or extreme temperatures. The warming attributable to external forcings is calculated by first estimating the linear trends in the noise-reduced model simulated signals (Allen and Stott 2003), and then multiplying the trends by the scaling factors. We only consider uncertainty in the scaling factors when estimating the 90% confidence intervals of the attributable warming. That is, the confidence intervals are obtained by multiplying the trend estimates by the 90% confidence intervals of the scaling factors. We use the scaling factors obtained from the 4-spatial dimension analyses when computing attributable warming because the 4-dimension analyses impose the most exacting requirement on the ability of the estimated signals to represent the observed changes. Specifically, these analyses require a match with observations in terms of both the temporal and spatial patterns of change. Attributable warming in each of the sub-regions is also calculated using the results from the 4-dimensional analysis as opposed to relying on the individual sub-region analysis. The sub-region analyses are conducted on smaller spatial scales and as such involves larger uncertainty. In contrast, using the scaling factors from the 4-dimensional analysis allows us to base the constraint on model simulated warming on observations from all of the sub-regions. Warming attributable to ALL forcing is based on 1-signal detection and attribution analyses and warming separately attributable to ANT and NAT forcing are obtained from the two-signal analysis with the observational data after the effect of NAO and PDO has been regressed out (OBS*, see Sect. 4.4).

4 Results

4.1 Trends

Figure 2 shows the linear trends in observed annual, summer, winter mean temperature as well as annual extreme temperature indices TXx, TNx, TXn, and TNn for the period 1948–2012, based on the gridded values. The trends were computed with a non-parametric method (Zhang et al. 2000) because the linear least squares fit would not be ideal for the extreme temperature indices that are unlikely to follow a Gaussian distribution. Significant warming is observed in annual mean temperature almost everywhere in the country, with the largest magnitude in Northern Canada. Warming is also statistically significant in almost every grid box for summer temperature. Winter temperature shows very strong and significant warming in Canada except the eastern most portion of the country where trend is weaker and is not statistically significant. There is a general warming trend in warm extremes (TXx and TNx) but a weak cooling trend appears in the Canadian Prairies. Cold extremes (TXn and TNn) warm more than the increase in winter mean temperature, particularly apparent in eastern Canada. Overall, while warm seasons and hot extremes are getting warmer, cold seasons and cold extremes are becoming much less cold.

Fig. 2
figure 2

Observed trends (°C/65 years) for the period 1948–2012 for a annual, b summer and c winter mean temperature and for annual extreme temperature indices d TXx, e TNx, f TXn and g TNn over Canada. Grids with trends that are statistically significant at the 5% level are marked with black dots. Areas with insufficient data are marked white

Figure 3 shows annual mean temperature anomalies relative to the 1948–1977 climatology averaged across Canada from observations and model simulations under ALL and NAT forcings. The long-term changes in the observations and the multi-model ensemble mean under ALL forcing match well, and the observed values generally fall within the 90% range of the individual simulations under ALL forcing. On the other hand, the long-term changes in the observations do not match the model simulated response to NAT. In particular, the observations begin to depart from the NAT ensemble mean from the early 1980s and become consistently higher than the 95th percentile of the model simulated Canada average annual mean temperature anomalies under NAT forcing during the last decade of the observational record. We conclude that the observed variability and change is consistent with model simulated variability and change under ALL forcing but inconsistent with that under NAT forcing alone. Similar conclusions can be made when comparing observed and simulated extreme temperature indices (not shown).

Fig. 3
figure 3

Annual mean temperature anomalies (in °C, relative to 1948–1977) for Canada, 1948–2012 from observations (black line) and multi-model ensembles under “ALL” forcing (green line), and “NAT” forcing (blue line). Shading shows the 5–95% ranges of the individual model simulations for ALL and NAT

4.2 Detection results for annual and seasonal mean temperatures

The detection analysis results for mean temperatures are summarized in Fig. 4. The influence of ALL forcings, that is, the effect of the combination of anthropogenic and natural external forcings, is detected in annual and seasonal mean temperatures for Canada and for each of the 4 sub-regions except in summer mean temperatures in the Prairies. The residual consistency test failed for annual and winter mean temperatures in the Prairies but it is unclear if this is due to a problem in the model simulation of internal variability or due to excessive noise in the observed dataset, such as might occur if there is insufficient observational coverage in the region. The best estimates of the scaling factors are often larger than one, but their 90th percent confidence intervals generally include one. These indicate that the model simulated changes under ALL forcing tend to be somewhat smaller than observed, but are consistent with the observed changes. In the two-signal analysis, ANT is detected when accounting for the estimated effect of NAT in annual and seasonal mean temperatures in Canada. The two signals can also be separately detected when accounted for the estimated effect of the other signal in regional detection in NC in winter and in all regions except PR in summer. The fact that the influences of different external forcings can be separated from each other improves our confidence in attributing the observed changes to the causes. NAT is also detected when accounting for the estimated effect of ANT in annual mean temperature in Canada, as well as in winter mean temperature in one-dimension temporal analysis (Fig. 4a) and in summer mean temperature in 4-dimension space–time analysis (Fig. 4b). Overall, the detection of anthropogenic influence in Canadian temperature is very robust as it is generally detected both in the national average and also in regional averages of annual, winter and summer temperatures, even when the estimated effect of NAT is accounted for. Additionally, it is also detected when the spatial distribution of temporal changes in the observation and model simulated response are compared. Note that model simulations appear to have underestimated the observed changes. In almost all cases, the residual consistency tests pass, with the exception of winter temperature for Canada in the 4-dimension setting, winter temperature for BCY and PR regions and annual mean temperature for PR region in which model simulated variability appears to be too small. Again, as the observational data are sparse, it is unclear if this is due to underestimation of the observed variability or the lack of sufficient observational data to accurately describe regional temperature variability.

Fig. 4
figure 4

Best estimates of scaling factors (marks) and their 90% confidence intervals (bars) for ALL (green) from single-signal detection analyses, and for ANT (red) and NAT (blue) from two-signal optimal detection analyses of annual (Ann), winter (DJF) and summer (JJA) mean temperatures. An asterisk indicates the failure of the residual consistency test. In these cases, the ratio between model simulated internal variability and the estimate of internal variability obtained from observations using regression model residuals is found to be significantly less than one

4.3 Detection results for annual temperature extremes

Results for annual extreme temperature indices are summarized in Fig. 5. In one-signal analyses, the influence of ALL forcings is detected for the whole Canadian domain using either the one- or four-region analysis configurations for the four extreme indices (Fig. 5a, b). Considering the best estimates of scaling factors of each index for each forcing, the models overestimate observed changes in TXx as indicated by scaling factors that are smaller than unity, but estimate the observed changes in the other extreme temperature indices well. These results are consistent with many findings of detection and attribution on these four indices at global and broader regional scales (e.g., Zwiers et al. 2011; Min et al. 2013; Wen et al. 2013; Kim et al. 2016; Yin et al. 2016). In the two-signal analyses for Canada as a whole, the ANT signal is clearly detected for all four indices with the best estimates of scaling factors again being larger than unity for cold extremes. In contrast, NAT is not consistently detected in the two signal Canada-wide analyses for the warm extremes. The two-signal detection results at the sub-regional scale (Fig. 5c–f) are similar to the results when considering Canada as one domain, with ANT signals being detectable in most cases while the influence of NAT is generally not detectable for warm extremes. In all cases, the residual consistency test was passed.

Fig. 5
figure 5

Results from single- (ALL, green) and two-signal optimal detection analyses (ANT, red and NAT, blue) for extreme temperature indices for Canada and its regions. Best estimates of scaling factors (marks) and their 90% confidence intervals (bars) are shown for each index (TXx, TNx, TXn and TNn)

4.4 PDO and NAO influence on detection results

As low frequency climate variability internal to the climate systems, such as that represented by the PDO and the NAO, affect temperature trends over the period we analysed, we used linear regression to remove the effect of low frequency variability associated with the PDO and NAO on trends in the observed seasonal and annual mean temperatures and the four extreme temperature indices. This allows us to explore the extent to which observed temperature trends may be explained by natural internal variability. Monthly indices of the PDO [the leading principal component of North Pacific monthly sea surface temperature variability (poleward of 20N for the 1900–1993 period)] and the NAO (the normalized pressure difference between a station on the Azores and one on Iceland, available online at http://research.jisao.washington.edu/pdo/PDO.latest and https://crudata.uea.ac.uk/cru/data/nao/ respectively) were first averaged to seasonal means. A multiple regression model was then used to regress regional mean seasonal temperature series with the seasonal PDO and NAO indices as predictors. The seasonal residuals that remain in the observations after removing PDO and NAO influences are then averaged to derive annual means. For annual extreme temperature indices, regressions of the regional mean temperature indices on the corresponding seasonal NAO and PDO indices were also carried out (warm extremes TXx and TNx, which are based on the warm half year, were regressed with summer circulation indices; conversely, cold extremes TNn and TXn, were regressed with winter NAO and PDO indices). The detection analysis process was repeated on temperature after the influence of these natural modes of variability was removed from the observations. The influence of model simulated NAO and PDO was not removed from model simulated data as different GCMs would have slightly different NAO and PDO patterns. This should not cause problems in interpreting the results as it means that the estimate of model simulated variability could be slightly larger, potentially resulting in larger confidence intervals for the scaling factors and thus making the detection results potentially more conservative. There is an apparent negative correlation between model simulated winter temperature response to ALL forcing and the observed NAO index for the winter season because the observed winter NAO index exhibits a negative trend. It is not possible to determine whether the latter is forced or due to internal variability, but the presence of this correlation raises the possibility that a portion of temperature response to external forcing may have been removed when removing the influence of low frequency variability. Such confounding of a forced response with low-frequency internal variability would inadvertently reduce estimates of warming attributed to forcing and overestimate the amount of the amount of warming over the 65-year might due to low frequency variability.

Figure 6 shows the time series of Canada-wide average 5-year mean anomalies of annual and seasonal mean temperatures and extreme temperature indices before and after the removal of PDO and NAO influence. The influence of PDO and NAO increased annual mean temperature by 0.2–0.4 °C in the period of 1948–1975, and then cooled the mean temperature in the period of 1980–2005 in comparison with the original observations. The patterns are similar for winter mean temperature, but the PDO and NAO impacts on Canadian summer mean temperature are more limited. The one- and two-signal detection results obtained using PDO and NAO removed observations are very similar to those for the original series, with a very small improvement in some scaling factor estimates, which are now closer to unity (not shown), this is consistent with findings from Kim et al. (2016) for extreme temperature at global and regional scales. The results of residual consistency tests are also improved, all residual consistency tests passed, suggesting that the models may not have simulated enough PDO and NAO induced temperature variability.

Fig. 6
figure 6

All-Canada 5-year average anomaly series from observations (OBS, black), PDO and NAO removed observations (OBS*, dashed lines) for annual, winter and summer mean temperatures and indices of extreme temperatures

4.5 Warming attributable to different factors

Trends in mean and extreme temperatures over Canada in both the unadjusted time series of 5-year means and in the adjusted time series with PDO and NAO influences removed are summarized in Fig. 7. Over the period 1948–2012 that we analysed in this study, annual, winter, and summer mean temperature averaged over Canada increased 1.7, 2.8 and 1.1 °C, respectively. The warming trend is reduced by 0.5 and 0.9 °C in the annual and winter mean temperatures respectively after adjustment for PDO and NAO influences, but these adjustments have little impact on summer temperature trends. The four extreme temperature indices, TXx, TNx, TXn, TNn, increased 0.8, 1.2, 3.1 and 3.2 °C respectively. Adjusting for the PDO and NAO influence does not affect TXx trends, but decreases warming trends for TNx by 0.1 °C, TXn by 0.3 °C and TNn by 0.5 °C, respectively.

Fig. 7
figure 7

Observed warming in unadjusted observations (OBS, black bars) and in PDO and NAO removed observations (OBS*, grey bars) for annual, winter and summer mean temperature and the four temperature indices. Warming attributable to ALL forcing (green bars) is based on 1-signal detection and attribution analyses and warming separately attributable to ANT (red bars) and NAT (blue bars) forcing are obtained from the two-signal analysis with OBS*. Thin black bars indicate 5–95% uncertainty ranges

Warming in the Canada-wide averages that is attributable to external forcing is also summarized in Fig. 7. At the scale of the entire country, most of the observed warming can be attributed to anthropogenic influence, with natural forcing playing a minor role. These estimates suggest that anthropogenic influences caused a warming of approximately 1.0, 1.9, and 1.0 °C in annual, winter and summer mean temperatures, respectively, while natural external forcing may have contributed approximately 0.2, 0.2, 0 °C to the observed warming in annual, winter and summer mean temperatures. Anthropogenic influence contributed increases of approximately 0.7, 1.0, 2.8 and 2.8 °C to the indices of annual temperature extremes TXx, TNx, TXn, TNn, respectively, with substantially larger impacts on cold extremes (TXn, TNn), moderating those extremes by about 3 °C, than on warm extremes (TXx, TNx), which were intensified about 1 °C. Both of these changes are large from an impacts perspective. Contributions from natural external forcing to observed warming in these extreme temperatures are negligible.

Warming that is observed and that can be attributed to different forcings in different parts of Canada is summarized in Table 3. Warming is uneven across the country especially in annual and winter mean temperatures, with the strongest warming observed in Northern Canada and the weakest in Eastern Canada. The difference in annual mean temperature trends between regions is as large as 1.1 °C for 1948–2010. Removal of the influence from the NAO and PDO reduces but does not eliminate this difference. Warming attributable to ALL external forcing is largest in Northern Canada, where external forcing is estimated to have contributed a warming of 3.4 °C to the winter temperature trend, of which 3.0 °C is from ANT forcing.

Table 3 Observed warming in unadjusted observations (OBS) and in PDO and NAO removed observations (OBS*) for annual, winter and summer mean temperature and the four temperature indices (TXx, TNx, TXn, TNn) for All-Canada (CA) and 4 Canadian regions (BCY: NC: PR: EC). Warming attributable to ALL forcing is based on 1-signal detection and attribution analyses and warming separately attributable to ANT and NAT forcing are obtained from the two-signal analysis with OBS*. The 5–95% uncertainty ranges are in brackets. Unit is °C over 1948–2012

5 Summary and conclusion

This study compared observed changes during 1948–2012 in mean and extreme temperatures over Canada with those simulated by the CMIP5 multi-model ensemble using a modern optimal detection method. We also considered the possible influence of low frequency variability associated with the influence of the PDO and NAO on long-term changes in temperatures. The impact of these oscillations is only apparent in the winter season, reducing the observed 2.8 °C winter mean temperature increase to 1.9 °C in the PDO and NAO removed data. This effect is also reflected in annual mean temperature trend as winter has the strongest warming among four seasons, reducing the observed 1.7 °C annual mean temperature increase to 1.2 °C after PDO and NAO removal. These oscillations have a lesser impact on extreme temperatures, having a discernable impact only on the winter extremes TXn (3.1 °C warming reduced to 2.8 °C) and TNn (3.2 °C warming reduced to 2.7 °C).

Warming caused by external forcing is now detected and attributed for Canada as a whole and for sub-regions within Canada. Detection of the model simulated response to the combination of anthropogenic and external natural forcing (ALL) is very robust, as ALL is detected in mean and extreme temperatures not only in Canada as a whole but also in its four sub-regions. That is, the magnitude of the warming signal is such that it is now detectable in regions of relatively small size and of highest levels of natural variability. Moreover, attributed warming to all external forcings is substantially greater in the north (as much as 3.4 °C in northern Canada winter temperatures over the 65-year period studied, equivalent to a winter warming rate of 5.2 °C per century) than in the more southerly regions. The models may have underestimated observed warming in Canada.

While the overall sensitivity of our results to removing PDO and NAO influences from Canadian temperature data is minor, model simulated changes are nevertheless somewhat more consistent with observed changes after the data are adjusted to account for those influences. This is expected since PDO and NAO removal reduces the possibility of confounding forced change with temporary trends associated with low frequency internal variability in the observations. The results of Vincent et al. (2015) and those of this study provide a complementary picture of the causes of observed changes in Canadian temperatures. Multi-decadal oscillations such as PDO and NAO have clearly influenced long-term changes in temperature, affecting different regions differently as well as inducing some warming over the 65-yr period of record in Canada’s mean temperature, particularly in winter. We would expect this influence to wane as the period of record continues to lengthen, sampling more complete oscillations of these large-scale phenomena. However, while the influence of these large-scale modes of variation is apparent in Canada’s temperature record, most of the observed warming cannot be explained by natural internal variability. Most of the observed warming can be explained by external forcing on the climate system, with anthropogenic influencing being the dominate factor.