The role of aerosols and greenhouse gases in Sahel drought and recovery

We exploit the multi-model ensemble produced by phase 5 of the Coupled Model Intercomparison Project (CMIP5) to synthesize current understanding of external forcing of Sahel rainfall change, past and future, through the lens of oceanic influence. The CMIP5 multi-model mean simulates the twentieth century evolution of Sahel rainfall, including the mid-century decline toward the driest years in the early 1980s and the partial recovery since. We exploit a physical argument linking anthropogenic emissions to the change in the temperature of the sub-tropical North Atlantic Ocean relative to the global tropical oceans to demonstrate indirect attribution of late twentieth century Sahel drought to the unique combination of aerosols and greenhouse gases that characterized the post-World War II period. The subsequent reduction in aerosol emissions around the North Atlantic that resulted from environmental legislation to curb acid rain, occurring as global tropical warming continued unabated, is consistent with the current partial recovery and with projections of future wetting. Singular Value Decomposition (SVD) applied to the above-mentioned sea surface temperature (SST) indices provides a succinct description of oceanic influence on Sahel rainfall and reveals the near-orthogonality in the influence of emissions between twentieth and twenty-first centuries: the independent effects of aerosols and greenhouse gases project on the difference of SST indices and explain past variation, while the dominance of greenhouse gases projects on their sum and explains future projection. This result challenges the assumption that because anthropogenic warming had a hand in past Sahel drought, continued warming will result in further drying. In fact, the twenty-first century dominance of greenhouse gases, unchallenged by aerosols, results in projections consistent with warming-induced strengthening of the monsoon, a response that has gained in coherence in CMIP5 compared to prior multi-model exercises. Electronic supplementary material The online version of this article (10.1007/s10584-018-2341-9) contains supplementary material, which is available to authorized users.


Introduction
observations, NARI explains~50% of the variance in twentieth century Sahel rainfall variability, all time scales included .
In the BCMIP5 simulation of Sahel rainfall^section, we show that the CMIP5 multi-model mean reproduces the twentieth century evolution of Sahel rainfall, rendering discussion of its attribution to external forcing possible. In the BClimate change in the Sahel through the lens of oceanic influence^section, we propose an explanation that reconciles studies of variability, which typically seek to relate oceanic forcing and regional rainfall response, with studies of change, which typically seek to attribute regional response to external forcing, natural or anthropogenic, by developing an argument for indirect attribution of Sahel rainfall, through the influence of anthropogenic emissions on sea surface temperatures. In the BTransformation of predictors^sub-section, we exploit Singular Value Decomposition (SVD) to describe the linearly independent linear combinations of the two SST predictors defined above, i.e., the averages of sub-tropical North Atlantic and global tropical SSTs. In the BLinear regressions of Sahel rainfall^sub-section, we use the resulting singular vectors in bivariate linear regressions that describe their influence on Sahel rainfall. We present explicit formulas for these SVDbased predictors in the bivariate case; although, as noted by Preisendorfer (1988), such formulas had been available since a publication by Pearson (1901), their presentation in terms of non-dimensional parameters, introduced here, helps to visualize the underlying relationships and facilitates the interpretation of results. In particular, we show that when the ratio of standard deviations of the original predictors is close to one, the bivariate SVD rotation results in the sum and difference of these original predictor time series. In our case, this means that NARI, the difference of NA and GT, which we defined above based on a physical argument, is nearly proportional to one of the SVD predictors and is exhaustively complemented by the other, which is nearly proportional to the sum of the same SST indices, and represents global warming. The derivation of all necessary formulas is relegated to the Appendix. In the BConclusions: Past is not prologue^section, we conclude with a synthesis that finds coherence in attributing past Sahel drought to anthropogenic emissions, while lending credence to projections of a wetter future.

CMIP5 simulation of Sahel rainfall
The multi-model ensemble produced by the Coupled Model Intercomparison Project in support of the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC), referred to as CMIP5 (Taylor et al. 2012), reproduces the observed twentieth century evolution of Sahel rainfall and projects wetter end of twenty-first century conditions more coherently than its CMIP3 predecessor (e.g., compare Biasutti and Giannini (2006) with Biasutti (2013)).
We restrict our analysis to 29 models participating in CMIP5, because only these have all the thermodynamic and dynamical variables necessary to evaluate their moisture budget (the subject of a parallel study). The models used are named in the tables in Supplementary Material. We focus on ensemble means, because we are interested in attribution of observed twentieth century Sahel rainfall variability, i.e., in describing its externally forced component, whether the external forcing is natural (i.e., due to variations in incoming top of the atmosphere insolation and spikes in aerosol concentration from volcanic eruptions) or anthropogenic (most notably, emissions of aerosols and GHGs from fossil fuel burning). Single-model ensemble sizes range between 1 and 10 in the historical simulations of the twentieth century and in the RCP8.5 scenario simulations of the twenty-first century. (In the case of the pre-Industrial control, typically, one simulation is run per model, albeit of varying length.) For each model and type of simulation, when more than one is available, we average realizations in the model's ensemble mean. We then compute the multi-model mean as the average of the single models' ensemble mean, a procedure which further suppresses the manifestation of internal variability.
The top panels in Fig. 1 display standardized Sahel rainfall (the precipitation average for land points in 10°N−20°N, 20°W−40°E, in each model's original resolution, over July-September (Nicholson 1983;Giannini et al. 2003)): twentieth century  time series are depicted on the left-hand side and twenty-first century  time series on the right-hand side. To facilitate comparison, we standardize all the time series: each model's Fig. 1 Time series of standardized Sahel rainfall (top) and time series of the dominant and trailing SVD modes of the predictors' space p 1 (middle) and p 2 (bottom) for the single-model ensemble means of 29 CMIP5 models (thin green, turquoise, and orange lines) and for the multi-model ensemble mean (thick yellow, brown, and blue lines), from the historical simulations  in the left panels, and from the RCP8.5 simulations  in the right panels. The red lines represent observations ensemble mean, in the thin green lines, is standardized by its own mean and standard deviation. The multi-model mean, in the thick yellow line, is first computed as the average of the original (non-standardized) single-model ensemble mean precipitation time series and then standardized. The reason for proceeding in this order is that we wish to characterize the externally forced component of Sahel rainfall. Since the models' ability to capture the relevant physical processes varies, if we computed the multi-model mean on the standardized singlemodel ensemble means, we would give all models, Bgood^and Bbad,^equal weight. Conversely, if the multi-model mean is taken on rainfall anomalies, the internal variability that dominates models that do not respond to external forcing is weighted out, because by definition it does not have a preferred temporal sequencing. This allows the temporally coherent response to external forcing to emerge. For a comparison to non-standardized time series, i.e., Sahel rainfall anomalies in units of mm/day, the reader is referred to Fig. 2    colored by standardized values of Sahel rainfall, with depictions of the angle of rotation ϕ, standard deviations of the original (σ 1 , σ 2 ) and SVD-based predictors (λ 1 1/2 , λ 2 1/2 ) and the concentration ellipse (dotted): see text for details Biasutti (2013): there it can be seen that (i) the amplitude of variations in the multi-model mean is smaller than that in observations, as expected, given that the single real-world realization expresses the combination of internal variability and externally forced change. However, (ii) the observed evolution falls within the envelope defined by individual ensemble member simulations. The thick red line in the left panel of our Fig. 1 represents observed Sahel precipitation (Becker et al. 2013). Its correlation with the multi-model mean is 0.37 and is corroborated by a visible qualitative match: the first half of the century is characterized by variations around the mean, and is followed by a precipitous decline that lasts until the mid-1980s, and a partial recovery since. Standardization, because it normalizes the amplitude of variation, makes it possible to clearly discern the minimum in the multi-model mean, which occurs in 1982 and coincides with the first of three driest consecutive years in observations, 1982-1984, and with the eruption of El Chichón (Robock and Liu 1994). In contrast, the twenty-first century multimodel mean (top, right panel of Fig. 1) shows an upward trend, albeit with periods characterized by interannual-to-decadal variations that mask the trend, most notably at the end of the century.

Climate change in the Sahel through the lens of oceanic influence
We build an argument for indirect attribution based on the influence of external forcingexternal to the coupled ocean-atmosphere system-on the SST patterns that affect Sahel rainfall. In the BTransformation of predictors^sub-section, we concisely describe oceanic influence using SVD applied to our set of two original predictors for Sahel rainfall, subtropical North Atlantic and global tropical SSTs. In the BLinear regressions of Sahel rainfallŝ ub-section, we use the resulting singular vectors as predictors in bivariate regression. Results of the analyses of the multi-model means and observations are reported in Tables 1 and 2.
Results of the analyses of the single-model ensemble means are collected in tables in Supplementary Material. The Appendices provide a brief review of the nomenclature and definitions for SVD in the general m-dimensional case (Singular Value Decomposition of an m-dimensional predictors' space) and for linear regression (Use of time-dimensional singular vectors as predictors), as well as the derivation of the explicit form for the parameters of SVD in the bivariate case (Singular Value Decomposition of a two-dimensional predictors' space; derivation of Eqs. (1)-(5)).  Columns from the left to right show the simulated century and the type of CMIP simulation (or observational reference period), the standard deviations of the original predictors, σ 1 and σ 2 , in°C (for GT and NA, respectively), their correlation coefficient, ρ, the ratio of their standard deviations, k = σ 1 /σ 2 , angle ϕ from Eq.
(3), in arc degree, variances λ 1 and λ 2 , in (°C) 2 , respectively, explained by the leading and trailing modes, corresponding to time series p 1 and p 2 , and the same in percent of the total variance λ 1 + λ 2

Transformation of predictors
We relate differences between our predictors' covariance structures to differences in the external forcing applied to the pre-Industrial control and twentieth and twenty-first century simulations. The latter two types of simulations are run with time-varying external forcing, natural and anthropogenic: variations in each realization of a single model's ensemble are a combination of internal variability and externally forced change, and their ensemble mean is taken to filter out internal variability. The former are run with constant external forcing, including CO 2 concentrations held fixed at pre-Industrial levels (280 ppm), the intent being to provide a truthful estimation of each model's internal variability. We repeat analyses on each model's first 100 years of the pre-Industrial control simulation (typically, there is only one such simulation per model, of variable length) and on each model's ensemble mean (with ensemble sizes varying between one and ten) for the twentieth and twenty-first century simulations. When there are only two predictors-in our case, x 1 = GT and x 2 = NA-standardized and mutually uncorrelated SVD-based predictors p 1 , p 2 are obtained from the original predictors x 1 , x 2 by orthogonal rotation in the x 1 x 2 plane followed by rescaling: and where the rotation angle ϕ and squared rescaling coefficient λ 1 , λ 2 are given by: and Columns from the left to right show the simulated century and the type of CMIP simulation (or observational reference period), regression coefficients a and b of Sahel rainfall on the time series p 1 and p 2 , respectively, and the correlation coefficient between y, Sahel rainfall simulated in CMIP5, or observed, and its predicted values from regression model (6): ŷ = ap 1 + bp 2 In Eqs.
(3)-(4), k = σ 1 /σ 2 is the ratio of standard deviations σ 1 and σ 2 of x 1 and x 2 , respectively, and ρ is their correlation coefficient. Incidentally, k and ρ are the two non-dimensional parameters, characterizing the variance-covariance matrix of x 1 and x 2 up to a constant factor; to represent the same in terms of p 1 and p 2 , one such parameter is ϕ, while the relative value of either of λ 1 or λ 2 where λ 1 > λ 2 , can be another: Derivation of Eqs.
(1)-(5)^. Figure 2 illustrates these relationships in twentieth century observations (see the last row of Table 1 for the values of relevant parameters). The observed values of the original predictors are the coordinates of the color triangles on the (x 1 , x 2 ) plane, with the corresponding standardized values of Sahel rainfall shown by color. The thick segments shown on the x 1 , x 2 axes indicate sample standard deviations, σ 1 and σ 2 , of these original predictors. Their values are approximately 0.20 and 0.23°C, respectively, and their ratio is k = σ 1 /σ 2 = 0.86. The correlation coefficient of the original predictors is ρ = 0.62. SVD-based predictors p 1 , p 2 , whose corresponding axes are shown by dash lines, are obtained by orthogonal rotation of the original predictors x 1 , x 2 with rotation angle about ϕ = 52°, indicated by the arc connecting the x 1 and p 1 axes. The thick segments on the p 1 and p 2 axes show standard deviations (in our notation, λ 1 1/2 = 0.27°C and λ 2 1/2 = 0.13°C) of the data projections on these axes. The variance of projections is maximized for p 1 , among all possible directions on the x 1 , x 2 plane, therefore λ 1 > λ 2 , necessarily. Also by construction, the in-sample correlation coefficient between p 1 and p 2 is zero; since there are only two predictors, p 2 , being the Blast^of them, minimizes the variance among all directions on the x 1 , x 2 plane. Therefore, p 2 only explains λ 2 / (λ 1 + λ 2 ) = 18.5% of variance in the sample (while p 1 explains the other 81.5%). These features of the data and predictors' pairs are illustrated by the shape of Bconcentration ellipses^(von Storch and Zwiers 1999; see their example 2.8.12, p. 43). The 95% concentration ellipse is shown by the dotted line in Fig. 2; its major semiaxes are 2.45λ 1 1/2 and 2.45λ 2 1/2 . It should be noted that λ 1 and λ 2 do not necessarily determine the relative importance of the predictors p 1 and p 2 in modeling other variables: as color changes among triangles in Fig. 2 demonstrate, p 2 is much more important than p 1 in describing Sahel rainfall changes over 1901-1999. This is confirmed by the calculation of the regression coefficients in the last row of Table 2.
In models, since the ratio, k, of standard deviations of global tropical and sub-tropical North Atlantic SSTs is also close to 1 in the twentieth and twenty-first centuries, and the correlation coefficient of the SST indices, ρ, is positive (Table 1), the angle ϕ in Eq. (3) is close to 45°. That the diagonal is the direction along which is expressed the greatest covariance between x 1 = GT and x 2 = NA can also be clearly seen in Fig. 3. The remaining variance, orthogonal to it by construction, is larger in the twentieth (blue dots in Fig. 3) than in the twenty-first century (red squares). Since sinϕ = cosϕ for ϕ = 45 o , p 1 and p 2 , respectively, in Eqs. (1) and (2), become nearly proportional to the sum and difference of the original predictors, and specifically, p 2 approximates NARI. The time series of p 1 are depicted in the middle panels of Fig. 1: indeed, p 1 represents a Bglobal warmingm ode, evident in both sub-tropical North Atlantic and global tropical ocean temperatures. p 2 (~NARI) explains more variance in the twentieth than in the twenty-first century. Its standardized form is depicted in the bottom panels of Fig. 1. In the middle and bottom left-hand side panels, thick red lines represent the principal components in observations (Kaplan et al. 1998).
Our original predictors, GT and NA, are highly correlated in the twentieth and twenty-first century simulations, less so in the pre-Industrial control simulations ( Fig. 3 and Table 1). As a consequence, the percent of variance explained by the leading mode, λ 1 in Eq. (4), increases in the simulations with time-dependent external forcing, as compared to those with constant forcing. The correlation further increases from twentieth to twenty-first century and consequently so does the relative variance of the leading mode, λ 1 /(λ 1 + λ 2 ) = 1 − λ 2 /(λ 1 + λ 2 ), see Eq. (5). This behavior represents one prominent way in which external forcing expresses itself: it forces a positive Fig. 3 Scatterplots of sub-tropical North Atlantic and global tropical ocean temperatures in 100 years of pre-Industrial control (green circles), in the twentieth century/historical (blue dots), and in the twenty-first century/ RCP8.5 (red squares) simulations. The first 29 panels correspond to single-model ensemble means, and the last panel (in the lower right corner) corresponds to the multi-model mean correlation between the two predictors, through their common, GHG-induced warming. In addition, the relative cooling of the North Atlantic associated with aerosols contributes to the co-variability of indices with a second, preferred direction along which variance is expressed, which by construction is orthogonal to the first. This direction expresses more variance, visible in the scatter plots in Fig. 3, in the twentieth (blue dots) than in the twenty-first century (red squares).

Linear regressions of Sahel rainfall
Our SVD analysis of the original predictors confines any trend entirely to the leading mode time series, p 1 , i.e., the Bsum^or Bwarming^mode, while the residual mode, p 2 , i.e., the Bdifference^or NARI, is completely devoid of any significant trend. We model standardized Sahel rainfall y, our predictand, by linear regression on p 1 and p 2 (see Section BUse of time-dimensional singular vectors as predictorsî n the Appendix):ŷ Regression coefficients a and b and the correlation coefficients between simulated and predicted Sahel rainfall for the multi-model mean and for observations are reported in Table 2. Since the SVD-based predictor time series are standardized and mutually uncorrelated, and predictand time series are standardized, the regression coefficients, a and b, are precisely the correlation coefficients of the predictand and corresponding predictors. In the twentieth century, correlation of the multimodel mean simulation of Sahel rainfall with its SST-based regression is 0.574 and correlation in observations is 0.633. Both terms in the regression contribute, but p 2 contributes more (since b > a): late twentieth century Sahel drought is explained by the absence of North Atlantic warming, relative to warming of the global tropical oceans. The (positive) sign of b, the regression coefficient multiplying p 2 (NARI), is consistent across centuries. However, its magnitude and explanatory power vis à vis Sahel rainfall are drastically diminished in the twenty-first century. Over the twenty-first century, the correlation between the simulated Sahel rainfall and its in-sample prediction is 0.801. The multi-model ensemble achieves this high a correlation essentially through a positive correlation between Sahel rainfall and warming, captured by p 1 . This happens because the coherence among models in projections of twenty-first century precipitation change has increased considerably from CMIP3 to CMIP5 (Biasutti 2013): a majority of models in CMIP5 project a future increase in Sahel rainfal to go along with warming (Schewe and Levermann 2017), in analogy to paleoclimate reconstructions of the BGreen Sahara^, when summer insolation was stronger than now (de Menocal et al. 2000). Among the models with a twenty-first century correlation coefficient between p 1 and Sahel rainfall larger than 0.45 (thus explaining~20% of the variance with the single predictor p 1 ), seven of nine project future wetting (Table S3 in Supplementary Materials). In sum, regression models between centuries are nearly orthogonal: p 2 , analogous to the difference of predictors or NARI, explains a larger fraction of twentieth century variation, while p 1 , analogous to the sum of predictors or warming, alone explains the externally forced twenty-first century change.

Conclusions: Past is not prologue
Our analysis of the multi-model mean by design emphasizes the externally forced response and leads us to propose the following indirect attribution argument for Sahelian climate change, past and future. In the second half of the twentieth century, as global dimming (Stanhill and Cohen 2001;Liepert 2002) opposed global warming in the northern hemisphere, a unique combination of anthropogenic emissions contributed to the late twentieth century drying of the Sahel through their effect on sea surface temperatures: aerosols by cooling the North Atlantic and greenhouse gases by warming the tropical oceans, especially the Indian Ocean. Warming of the global tropical oceans Bupped the ante^for deep convection (Chou and Neelin 2004;Held et al. 2005), while the absence of warming in the North Atlantic reduced the moisture supply to the monsoon and thus its potential to meet the Bupped ante^and to trigger vertical instability ). Our argument is distinct from others previously proposed, which attributed late twentieth century Sahel drought solely to aerosols, whether through cooling of the North Atlantic or of the entire northern hemisphere (Rotstayn and Lohmann 2002;Kawase et al. 2010;Ackerley et al. 2011;Booth et al. 2012;Hwang et al. 2013;Park et al. 2015;Wang et al. 2016), in three ways. First of all, we argue for an indirect effect of anthropogenic emissions, i.e., mediated by sea surface temperatures. Linking emissions to Sahel rainfall through SSTs has the added advantage that it allows the synthesis in a single physical explanation of natural variability and anthropogenic change. Secondly, our argument is not about the role of interhemispheric gradients in SST shifting the latitudinal location of the Intertropical Convergence Zone and by extension rainfall in the Sahel. Rather, it is about quasi-equilibrium in convection as the world warms. The observed late twentieth century drying of the Sahel was more profound-longer lasting and of greater amplitudethan, e.g., the early twentieth century drought around 1910, despite the North Atlantic being cooler during the early period. This observation justifies the search for additional factors in drought, which we identify in greenhouse gas-induced tropical warming, which in the twentieth century occurred simultaneously with North Atlantic cooling. Therefore, thirdly, we argue for the combined drying effect of the two contrasting anthropogenic influences in twentieth century: drought was not caused by aerosols alone or by the cooling of the North Atlantic alone. Neither could it have been caused by greenhouse gases alone, as is made evident in simulations which include only the latter forcing (Biasutti 2013;Dong and Sutton 2015;Gaetani et al. 2017;Hill et al. 2017). Rather, drought resulted from the combination of aerosols and greenhouse gases. One influence cooled and reduced the moisture supply, while the other, warming, raised the threshold for convection-a double jeopardy. In the twenty-first century, external influence is dominated by GHG-induced warming, of both the North Atlantic and global tropical oceans. NARI, their difference, tends to zero, leaving all response to external forcing to be explained by their sum. SVD rotation of the original predictors to their weighted sum and difference and the relative sizes of their regression coefficients (Table 2) expose the near-orthogonality of the twentieth and twenty-first centuries in the response of SSTs and rainfall to external forcing. The oceans' translation of external forcing of predominantly anthropogenic nature leads to contrasting but not inconsistent outcomes in the past and the future. Late twentieth century drying and twentyfirst century projections of wetting are consistent with different balances in the influences of aerosol and greenhouse gas emissions on regional energy and moisture budgets, underlined by one coherent physical explanation rooted in the dynamics of quasi-equilibrium in convection. The fact that emissions need to be considered holistically, not additively, explains how it is possible that while anthropogenic warming had a role in past drought, continued warming in the twenty-first century gives rise to a strengthening of the monsoon. As aerosol emissions have abated, especially around the North Atlantic, it is the combination of warmings of both North Atlantic and global tropical oceans that explains the strengthening of the monsoon: the upped ante can now be met. This situation is consistent with understanding of monsoon response to warming on paleoclimate time scales: when the ocean warms enough to contribute to an Bupped ante^with increased moisture supply but not enough to shift the favored locus of convection from land to ocean (Chou et al. 2001), it reinforces the monsoon, as is the case for West Africa (Braconnot et al. 2012). With the usual caveats related to uncertainty in multi-model mean projections, that further warming may result in wetting of the Sahel is a conclusion worth reinforcing, in equal parts because it has gained in coherence between CMIP3 and CMIP5 (Biasutti 2013), and because the recovery of the rains is being experienced on the ground, in a fashion that is remarkably consistent with expectation from anthropogenic warming (West et al. 2008;Lodoun et al. 2013).
Acknowledgements This article is dedicated to the memory of professor A. Ben Mohamed. The authors acknowledge the technical support of Naomi Henderson and Haibo Liu, in the Ocean and Climate Physics division of LDEO. Through partial processing and local storage, they facilitated the access to the CMIP simulations for the IRI & LDEO community. The authors also acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and thank the climate modeling groups listed in the tables in Supplementary Material for producing and making available their model output. They acknowledge the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison, in partnership with the Global Organization for Earth System Science Portals, for their coordination to support and develop the software infrastructure to distribute CMIP simulations. AG was supported by National Science Foundation grant AGS-0955372 (CAREER), by National Aeronautics and Space Administration grant NNX16AN29G (SERVIR), and by Columbia University's World Project ACToday. AK was partially supported by Office of Naval Research Multidisciplinary University Research Initiatives grant N00014-12-1-0911 and by National Oceanic and Atmospheric Administration award NA17OAR4310156.

Singular Value Decomposition of an m-dimensional predictors' space
Suppose an n × m matrix X contains, by columns, centralized samples of m variables x 1 , x 2 , ..., x m (our original predictors), each observed at the same n > m times. The well-known thin Singular Value Decomposition (SVD) presents X as: where U is an n × m matrix of orthonormal columns, E is an m × m orthogonal matrix, and Σ is a diagonal matrix with non-negative diagonal elements in non-ascending order, called singular values (Golub and Van Loan 1996). Superscript T denotes matrix transposition. The columns of U and E are called left and right singular vectors. Rescaling the left singular vectors by P ¼ ffiffiffiffiffiffiffiffi n−1 p U and introducing Λ = Σ 2 /(n − 1), we rewrite X as: By construction, the columns of P are standardized, mutually uncorrelated time series. Each column is associated with a pattern of loadings (from the corresponding column of matrix E) on the original m variables and with the amount of variance (given by the corresponding diagonal element in Λ) that it explains within the total variance of the m original variables. The latter is easy to see from observing that the variance-covariance matrix of the original variables: has its eigenvalues and eigenvectors equal to diagonal elements of Λ and to columns of E, respectively. The columns of E are also known as Empirical Orthogonal Functions (EOFs), while the columns of V = PΛ 1/2 are known as Principal Components (PC) of the data set contained in matrix X. Decomposition of the data set into EOFs and PCs: X ¼ VE T is often used in climate science to represent the overall co-variability of the original variables as the superposition of linearly independent modes, with several leading modes usually explaining most of the total variance in the system (von Storch and Zwiers 1999).
Singular Value Decomposition of a bivariate predictors' space; derivation of Eqs.
(1)- (5) We centralize our predictors, denote them x 1 and x 2 , respectively, arrange them column-wise in matrix X = [x 1 x 2 ], and perform the SVD of X in the form of Eq. (7). The columns of matrix P = [p 1 p 2 ], called singular vectors, are obtained from the columns of X by an orthogonal rotation with 2x2 matrix E, followed by their normalization by the square roots of diagonal elements of 2x2 matrix Λ: By construction, p 1 and p 2 are standardized, mutually uncorrelated time series. When there are only two predictors, x 1 and x 2 , it is possible to present the elements of matrices E and Λ of the canonical decomposition of the predictors' variance-covariance matrix C = X T X/(n − 1) = EΛE T as functions of its elements.
Let standard deviations of x 1 and x 2 be σ 1 and σ 2 , respectively, and their correlation coefficient be ρ, then their variance-covariance matrix is: Note that: where I is a 2x2 identity matrix, and ϕ from the interval [0, π) is determined by: Introducing a non-dimensional parameter k = σ 1 /σ 2 , and expressing ϕ from Eq. (11) as a function of k and ρ, we obtain Eq. (3). Using the identity: we rewrite: with λ 1 and λ 2 defined by Eqs. (4). Therefore, the canonical decomposition of C is performed with eigenvector and eigenvalue matrices, E and Λ, as follows: where ϕ is given by Eq.
(3) and λ 1 and λ 2 by Eq. (4). Inserting expressions for E and Λ from Eq. (12) into Eq. (9), we write them out in the component form for an arbitrary row of matrix P to obtain Eqs. (1)-(2). Preisendorfer (1988) noted that Eqs. (11) and (4) are known since Pearson (1901). Equation (3) derived above gives an explicit form for the angle ϕ, expressing it through the nondimensional parameters k and ρ of the original predictors' variance-covariance matrix, making it more convenient for interpretation. A traditional non-dimensional parameter of such matrix for SVD-based predictors p 1 , p 2 , the portion of variance explained by one of the predictors, follows from Eq. (4). For p 1 , it is expressed through k and ρ as: For the predictor p 2 with the smaller variance, λ 2 , since λ 2 /(λ 1 + λ 2 ) = 1 − λ 1 /(λ 1 + λ 2 ), using (13) results in Eq. (5). The general structure of the dependence of the angle ϕ, characterizing the rotation of the original predictors toward uncorrelated singular vectors, and of ratio λ 2 / (λ 1 + λ 2 ), characterizing the relative variance of the second (trailing) predictors' mode, on nondimensional parameters of the original predictors' variance-covariance matrix, ρ, the correlation coefficient of the original predictors x 1 and x 2 , and k, their ratio of standard deviations, are illustrated in Fig. 4, with points corresponding to the data samples of multi-model means from CMIP5 simulations and from observations, as listed in Table 1. Finally, the relationships between parameters of variance-covariance matrices in the (x 1 , x 2 ) and (p 1 , p 2 ) predictor planes are illustrated by the shape of Bconcentration ellipses,^which are described by equation: in (x 1 , x 2 ) coordinates and by equation: in (p 1 , p 2 ) coordinates. Here, q(α) is the α% quantile of the χ 2 (2) distribution with two degrees of freedom, so that if (x 1 , x 2 ) observations obey a bivariate normal distribution, α% of the data points is contained within this concentration ellipse (von Storch and Zwiers 1999, see their  Fig. 4 Dependence of the angle ϕ (white contours) and of ratio λ 2 /(λ 1 + λ 2 ), in percent (color shading) on the correlation coefficient ρ and ratio of standard deviations k of GT and NA. White dot, circle, square, and triangle show points on the (ρ, k) plane corresponding to the multi-model means of CMIP5 pre-Industrial control, historical, and RCP8.5 scenario simulations and of observations, respectively. Note that ϕ(ρ, k) is discontinuous at all points of the ray {(ρ, k) : ρ = 0, k ≥ 1}. In particular, while the definition in Eq.

Use of time-dimensional singular vectors as predictors
In regression studies, it often proves advantageous to use as predictors a set of standardized and mutually uncorrelated variables, like p 1 , p 2 , ..., p m (their time series p i , i = 1, 2, ..., m, are given by columns of matrix P and represent rescaled time-dimensional (left) singular vectors of X), instead of the original variables that lack these properties, leading to more tractable and easier to interpret results. Indeed, consider the regression model: where ε is a random error and, for simplicity, the predictand y is assumed standardized. The vector of unknown regression coefficients a = (a 1 , a 2 , …, a m ) T has to be found by the leastsquares best fit to the system y ¼ Pa where y is an n-dimensional vector-a column that contains the standardized data sample for variable y. Dividing normal equations P T P À Á a ¼ P T y by n-1, we find, since P T P/(n −1) = I (where I is an n × n identity matrix), a ¼ P T y n−1 ð Þ ¼ ρ 1 ; ρ 2 ; …; ρ m ð Þ T ; .
where ρ i , for each i = 1, 2, ..., m, is the sample correlation coefficient between p i and y (ρ i ¼ p T i y= n−1 ð Þ, since y and columns of P are all standardized). Therefore, regression model Eq. (14) results in the prediction: for which the sample estimateρ of its correlation coefficient with the actual value of y satisfies the relationship:ρ 2 ¼ ρ 2 1 þ ρ 2 2 þ …ρ 2 m ; making it easy to see how the inclusion of time series p i of each individual mode i into the regression model contributes to the overall skill of prediction. Another measure of in-sample skill estimate for prediction Eq. (15), the standard deviationê, of the differenceŷ−y, can be expressed throughρ asê ¼ ffiffiffiffiffiffiffiffiffi ffi 1−ρ 2 p .
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.