Appendices
Appendix A1: model-based derivations
Model-based derivations of CA, CCA and thus of dc-CA start from the Gaussian response model (ter Braak 1985, 1986) or from a log-linear model with interaction terms (Goodman 1981). They all lead to the transition formulae in Sect. 2.2.
The Gaussian response model is
$$\begin{aligned} E(y_{ij} )=s_{i} h_{j} \exp \left[ {-\frac{\left( {x_i -u_j } \right) ^{2}}{2\sigma _j^2 }} \right] , \end{aligned}$$
(26)
where \(s_i \) is a site-specific parameter (Ihm and Groenewoud 1975; ter Braak 1988) or \(s_i =1\), so that there are no site-specific parameters (ter Braak 1985, 1986); \(h_j \) is a species-specific parameter denoting the maximum expected abundance; \(x_i \) is a site score, often interpreted as a latent environmental variable; \(u_j \) is a species score, often interpreted as a latent trait and, in this model, the optimum of the jth species; and \(\sigma _j \) is the niche width of the jth species. The link with the correspondence analysis methods is strongest when \(\sigma _j =\sigma \) and when this constant \(\sigma \) is small compared to the range of the site scores (ter Braak 1985), i.e. when there is strong and uniform niche differentiation among the species.
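As a minimal sketch of Eq. (26), the following function evaluates the expected abundance for one site and one species. The function name, argument names and numerical values are illustrative assumptions and not part of the original text.

```python
import math

def gaussian_response(x_i, u_j, sigma_j, h_j=1.0, s_i=1.0):
    """Expected abundance E(y_ij) under the Gaussian response model, Eq. (26)."""
    return s_i * h_j * math.exp(-((x_i - u_j) ** 2) / (2.0 * sigma_j ** 2))

# Hypothetical species: optimum u_j = 2.0, niche width sigma_j = 0.5, maximum h_j = 10.
at_optimum = gaussian_response(2.0, 2.0, 0.5, h_j=10.0)   # site score equals the optimum
one_sd_away = gaussian_response(2.5, 2.0, 0.5, h_j=10.0)  # one niche width from the optimum
```

At the optimum the expected abundance equals the maximum \(h_j \) (with \(s_i =1\)); one niche width away it drops by the factor \(\exp (-1/2)\).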
The alternative model-based start is from the log-linear model with saturated main effects for rows (sites) and columns (species) and one or more linear-by-linear interaction terms (Goodman 1981; Ihm and Groenewoud 1984; ter Braak 2017):
$$\begin{aligned} \log \left( {E\left( {y_{ij} } \right) } \right) =r_i^*+c_j^*+b_{te} u_j x_i , \end{aligned}$$
(27)
where \(r_i^*\) and \(c_j^*\) are row and column saturated main effects, \(x_i \) and \(u_j \) latent row and column scores and \(b_{te} \) a scalar coefficient which is set to 1 unless \(\left\{ {x_i } \right\} \) and \(\left\{ {u_j } \right\} \) are both standardized. When \(\sigma _j =\sigma \) in model (26), the models (26) and (27) are re-parametrizations of one another.
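The re-parametrization can be made explicit by taking logarithms in Eq. (26) with \(\sigma _j =\sigma \) and expanding the square. This is a sketch of the algebra only, assuming free site parameters \(s_i \) so that the row effects are unrestricted:

$$\begin{aligned} \log E\left( y_{ij} \right)&=\log s_i +\log h_j -\frac{\left( x_i -u_j \right) ^{2}}{2\sigma ^{2}} \\&=\left( \log s_i -\frac{x_i^2 }{2\sigma ^{2}} \right) +\left( \log h_j -\frac{u_j^2 }{2\sigma ^{2}} \right) +\frac{1}{\sigma ^{2}}u_j x_i \end{aligned}$$

The first bracket plays the role of \(r_i^*\), the second that of \(c_j^*\), and the interaction coefficient is \(b_{te} =1/\sigma ^{2}\) for scores on the scale used in model (26).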
The link with correspondence analysis (CA) is easiest to see in model (27). Indeed, rewriting model (27) in an exponential form and using a first order Taylor expansion in terms of \(b_{te} u_j x_i \) yields the reconstitution formula of correspondence analysis (Greenacre 1984; Ihm and Groenewoud 1975):
$$\begin{aligned} \mu _{ij} =R_i^*C_j^*\exp \left( {b_{te} u_j x_i } \right) \approx R_i^{*} C_{\mathrm{j}}^{*} \left( {1+b_{te} u_j x_i } \right) , \end{aligned}$$
(28)
where \(R_i^*=\hbox {exp}( {r_i^*})\) and \(C_j^*=\hbox {exp}({c_j^*})\). So, for small \(b_{te} \) and standardized \(\left\{ {x_i } \right\} \) and \(\left\{ {u_j } \right\} \), both models can be expected to be very similar. Goodman (1981) showed that their estimation equations are then also very similar. For standardized u and x, \(b_{te} \) is the square-root of the first eigenvalue of correspondence analysis on Y.
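The quality of the first-order approximation in Eq. (28) can be checked numerically. The sketch below compares the exact mean with the reconstitution formula for a small and a large value of \(b_{te} \); the function names and the chosen values are illustrative assumptions.

```python
import math

def loglinear_mean(R_i, C_j, b_te, u_j, x_i):
    """Exact mean mu_ij = R_i* C_j* exp(b_te u_j x_i) of model (27)."""
    return R_i * C_j * math.exp(b_te * u_j * x_i)

def ca_reconstitution(R_i, C_j, b_te, u_j, x_i):
    """First-order approximation R_i* C_j* (1 + b_te u_j x_i) of Eq. (28)."""
    return R_i * C_j * (1.0 + b_te * u_j * x_i)

def relative_error(b_te, u_j=1.0, x_i=1.0):
    """Relative difference between the exact mean and its approximation."""
    exact = loglinear_mean(1.0, 1.0, b_te, u_j, x_i)
    approx = ca_reconstitution(1.0, 1.0, b_te, u_j, x_i)
    return abs(exact - approx) / exact

err_small = relative_error(0.1)  # weak interaction
err_large = relative_error(1.0)  # strong interaction
```

For scores equal to one, the relative error is below half a percent at \(b_{te} =0.1\) but exceeds a quarter at \(b_{te} =1\), which illustrates why the two models are expected to agree only for small \(b_{te} \).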
With a linear constraint on the site scores, \(\mathbf{x}=\mathbf{Eb}\), where \(\mathbf{b}\) is a vector of unknown coefficients, one for each environmental variable, and with the same approximation as in obtaining CA from models (26) and (27), CCA is obtained (ter Braak 1986, 1988). With an additional linear constraint on the species scores, \(\mathbf{u}=\mathbf{Tc}\), where \(\mathbf{c}\) is a vector of unknown coefficients, one for each trait, dc-CA is obtained in a similar way. The resulting transition formulae are presented in Sect. 2.2. Böckenholt and Böckenholt (1990) showed an example where the estimates by dc-CA and maximum likelihood of the log-linear model are indeed very close.
Appendix A2: WA scores of dc-CA from CCA/RDA algorithm
This appendix shows that the unconstrained scores of weighted \(\hbox {RDA}_{\mathbf{R}}(\mathbf{M}^{*}\sim \mathbf{E})\) are the WA site scores of dc-CA and how to obtain the other set of canonical coefficients of dc-CA from the CCA and RDA pair.
The unconstrained scores of \(\hbox {RDA}_{\mathbf{R}}(\mathbf{M}^{*}\sim \mathbf{E})\) are a linear combination of the response data, i.e.
$$\begin{aligned} \mathbf{X}_{\mathrm{rda}}^*=\mathbf{M}^{*}{} \mathbf{B}^{*} \end{aligned}$$
(29)
where \(\mathbf{B}^{*}\) are the response variable scores of the RDA and
$$\begin{aligned} \mathbf{M}^{*}=\mathbf{R}^{-1}{} \mathbf{YTB}_1 \end{aligned}$$
(30)
the community weighted means of the constrained axes of the first CCA, CCA(\(\mathbf{Y}^{T}\sim \mathbf{T})\), with \(\mathbf{B}_1 \) the matrix of canonical coefficients of this first CCA. Insertion of Eq. (30) into Eq. (29) gives
$$\begin{aligned} \mathbf{X}_{\mathrm{rda}}^*=\mathbf{R}^{-1}{} \mathbf{YTB}_1 \mathbf{B}^{{*}}=\mathbf{R}^{-1}{} \mathbf{YU} \end{aligned}$$
(31)
so that \(\mathbf{X}_{\mathrm{rda}}^*\) are WA site scores derived from constrained species scores as in Eq. (12). These are the desired WA site scores because their projection onto \(\mathbf{E}\) using weights \(\mathbf{R}\) gives the constrained site scores of the RDA and thus also of the dc-CA. The regression coefficients of this projection are the canonical weights for the environmental variables and satisfy Eq. (13). The dc-CA canonical coefficients for the traits, \(\mathbf{C}\), follow from Eq. (31) with Eq. (11): \(\mathbf{C}=\mathbf{B}_1 \mathbf{B}^{*}\).
Appendix A3: biplot of fourth-corner correlations
This appendix shows that, for \(\alpha =1\) in Eq. (24), the intra-set correlations of the traits, plotted together with the fourth-corner correlations of the environmental variables with the axes, form a weighted least-squares biplot of the fourth-corner correlations between traits and environmental variables. Vice versa, for \(\alpha =0\) in Eq. (24), the intra-set correlations of the environmental variables can be plotted with the fourth-corner correlations of the traits with the axes. This appendix also explores how intra- and inter-set correlations relate to the fourth-corner correlations of the traits and environmental variables with the axes.
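For orientation, a single fourth-corner correlation can be computed directly from the abundance table. The sketch below assumes the usual definition, namely the Pearson correlation between an environmental variable and a trait with each site-species pair weighted by its abundance, and assumes that this is what \(cor_Y \) denotes. The function name and the toy data are hypothetical, and the variables must not be constant under the weights.

```python
import math

def fourth_corner_correlation(Y, e, t):
    """Abundance-weighted correlation between a site variable e and a species trait t.

    Y is an n x m list of lists of non-negative abundances,
    e has length n (sites) and t has length m (species).
    """
    n, m = len(Y), len(Y[0])
    total = sum(sum(row) for row in Y)
    row_w = [sum(Y[i]) / total for i in range(n)]                       # site weights
    col_w = [sum(Y[i][j] for i in range(n)) / total for j in range(m)]  # species weights

    def standardize(v, w):
        # Weighted centring and scaling to unit weighted variance.
        mean = sum(wi * vi for wi, vi in zip(w, v))
        var = sum(wi * (vi - mean) ** 2 for wi, vi in zip(w, v))
        return [(vi - mean) / math.sqrt(var) for vi in v]

    e_std = standardize(e, row_w)
    t_std = standardize(t, col_w)
    return sum(Y[i][j] / total * e_std[i] * t_std[j]
               for i in range(n) for j in range(m))

# Each species occurs in one site only, and trait and environment agree.
r_perfect = fourth_corner_correlation([[5, 0], [0, 5]], [0.0, 1.0], [0.0, 1.0])
# Abundances proportional to the product of the margins: no association.
r_indep = fourth_corner_correlation([[2, 4], [1, 2]], [0.0, 1.0], [0.0, 1.0])
```

The first toy table gives a correlation of one, the second a correlation of zero, because under independence the weighted sum factorizes into two weighted means of standardized variables.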
The first equation in (24) can be rewritten as
$$\begin{aligned} \mathbf{B}_f= & {} \left[ \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{\frac{1}{2}}{} \mathbf{P}{\varvec{\Delta }} ^{\alpha }\right] _r =\left[ \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{\frac{1}{2}}{} \mathbf{DQ}{\varvec{\Delta }}^{\alpha -1}\right] _r \nonumber \\= & {} \left[ {\mathbf{E}^{T}{} \mathbf{YT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-\frac{1}{2}}{} \mathbf{Q}{\varvec{\Delta }} ^{\alpha -1}} \right] _r \end{aligned}$$
(32)
by inserting, from Eq. (16), \(\mathbf{P}{\varvec{\Delta }}^{\alpha }=\mathbf{DQ}{\varvec{\Delta }}^{\alpha -1}\) and inserting \(\mathbf{D}\) from Eq. (15). With the second equation in (17), this gives, with \({{\varvec{\Lambda }} }={{\varvec{\Delta }}}^{2}\),
$$\begin{aligned} \mathbf{B}_f =\left[ {\mathbf{E}^{T}\mathbf{YTC}{\varvec{\Delta }}^{2\left( {\alpha -1} \right) }} \right] _r =\left[ {\mathbf{E}^{T}{} \mathbf{YU}{\varvec{\Lambda }}^{\alpha -1}} \right] _r =cor_Y \left( {\mathbf{E},\mathbf{U}} \right) {{\varvec{\Delta }} }^{\alpha -1} \end{aligned}$$
(33)
so that \(\mathbf{B}_f \) consists of fourth-corner correlations between \(\mathbf{E}\) and \(\left[ \mathbf{U} \right] _r \) for K-normalized \(\mathbf{U}\) (\(\alpha =1)\). For \(\alpha =1\) in Eq. (24) and K-normalized \(\mathbf{U}\),
$$\begin{aligned} \mathbf{C}_f =\left[ {\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{1/2}\mathbf{Q}} \right] _r =\left[ {\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) \mathbf{C}} \right] _r =\left[ {\mathbf{T}^{T}{} \mathbf{KU}} \right] _r , \end{aligned}$$
(34)
so that \(\mathbf{C}_f \) consists of intra-set correlations between \(\mathbf{T}\) and \(\left[ \mathbf{U} \right] _r \).
We now explore how intra- and inter-set correlations of the environmental variables relate to their fourth-corner correlations with the axes.
By using Eq. (17), \(\mathbf{B}_f \) in Eq. (24) can be rewritten as
$$\begin{aligned} \mathbf{B}_f =\left[ \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{\frac{1}{2}}{} \mathbf{P}{\varvec{\Delta }}^{\alpha }\right] _r =\left[ {\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) \mathbf{B}} \right] _r =\left[ {\mathbf{E}^{T}{} \mathbf{RX}} \right] _r =cor_R \left( {\mathbf{E},\mathbf{X}} \right) {{\varvec{\Delta }} }^{\alpha } \end{aligned}$$
(35)
so that, for R-normalized \(\mathbf{X} \left( {\alpha =0} \right) \), \(\mathbf{B}_f \) consists of intra-set correlations between \(\mathbf{E}\) and \(\left[ \mathbf{X} \right] _r \). From Eqs. (33) and (35), and analogously for \(\mathbf{T}\) and \(\mathbf{X}\),
$$\begin{aligned} cor_\mathbf{Y} \left( {\mathbf{E},\mathbf{U}} \right) =cor_\mathbf{R} \left( {\mathbf{E},\mathbf{X}} \right) {\varvec{\Delta }} \hbox { and } cor_\mathbf{Y} \left( {\mathbf{T},\mathbf{X}} \right) =cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}} \right) {\varvec{\Delta }} \end{aligned}$$
(36)
so that fourth-corner correlations with the axes are a factor \(\sqrt{\lambda }\) smaller than the intra-set correlations.
An expression for \(\mathbf{B}_f \) in terms of inter-set correlations can be obtained from Eq. (35), using the matrix version of Eq. (13),
$$\begin{aligned} \mathbf{B}_f =\left[ {\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) \mathbf{B}} \right] _r =\left[ {\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{RX}^{*}} \right] _r =\left[ {\mathbf{E}^{T}{} \mathbf{RX}^{*}} \right] _r \end{aligned}$$
(37)
so that, for any \(\alpha \), \(\mathbf{B}_f \) consists of the inter-set correlation of the environmental variables times the standard deviation of \(\mathbf{X}^{*}\), i.e.
$$\begin{aligned} \mathbf{B}_f =cor_\mathbf{R} \left( {\mathbf{E},\mathbf{X}^{*}} \right) diag\left( {sd\left( {\mathbf{X}^{*}} \right) } \right) \end{aligned}$$
(38)
In CCA and RDA implementations in the Canoco software (ter Braak and Šmilauer 2012), this equation is used for so-called biplot scores of environmental variables, so that \(\mathbf{B}_f \) is such a score, which depends on \(\alpha \). In dc-CA, effectively, Eq. (35) is used for \(\mathbf{B}_f \) which together with \(\mathbf{C}_f =cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}} \right) {{\varvec{\Delta }} }^{1-\alpha }\) forms a biplot as follows from Eq. (24) using Eq. (35) for \(\mathbf{B}_f \) and, for \(\mathbf{C}_f \), the analogous version of this equation and Eq. (36).
We end this appendix with a perhaps simpler derivation of Eq. (33). Inserting the matrix version of Eq. (12) into Eq. (37) gives Eq. (33):
$$\begin{aligned} \mathbf{B}_f =\left[ {\mathbf{E}^{T}{} \mathbf{RX}^{*}} \right] _r =\left[ \mathbf{E}^{T}{} \mathbf{R}\left( \mathbf{R}^{-1}\mathbf{YU}{\varvec{\Lambda }}^{\alpha -1} \right) \right] _r =\left[ {\mathbf{E}^{T}{} \mathbf{YU}{\varvec{\Lambda }} ^{\alpha -1}} \right] _r \end{aligned}$$
(39)
Appendix A4: biplot of species niche centroids (SNCs) and CWMs
This appendix shows that an ordination diagram with the species scores \(\left[ \mathbf{U} \right] _r \) supplemented with environmental arrows based on \(\mathbf{B}_f \) forms a least-squares biplot of the species niche centroids (SNCs),
$$\begin{aligned} \mathbf{N}=\mathbf{K}^{-1}{} \mathbf{Y}^{T}{} \mathbf{E} \end{aligned}$$
(40)
which is an \(m \times p\) matrix. Analogously, an ordination diagram with the site scores \(\left[ \mathbf{X} \right] _r \) supplemented with trait arrows based on \(\mathbf{C}_f \) forms a least-squares biplot of the community weighted means (CWMs) of Eq. (18). Moreover, if the species scores \(\mathbf{U}\) and site scores \(\mathbf{X}\) satisfy the transition formulae and thus form a biplot pair for the (fitted) \(\mathbf{Y}\) via the reconstitution formulae (as in CA or CCA), then the environmental arrows \(\mathbf{B}_f \) not only form a least-squares biplot of the fourth-corner correlations with the trait arrows \(\mathbf{C}_f \), but also form a biplot of the SNCs with \(\left[ \mathbf{U} \right] _r \); simultaneously, \(\mathbf{C}_f \) forms a biplot of the CWMs with \(\left[ \mathbf{X} \right] _r \).
For notational convenience we drop the \(\left[ . \right] _r \) notation and write \(\mathbf{U}\) where it is clear that only r columns are being used. When \(\mathbf{U}\) is given and scaled such that \(\mathbf{U}^{T}{} \mathbf{KU}={\varvec{\Lambda }}^{1-{\upalpha }}\) and \(\mathbf{N}\) is to be approximated by a biplot, the optimal scores for the environmental variables are obtained by fitting the model \(\mathbf{N}=\mathbf{UB}_0 +\) error by a weighted regression of \(\mathbf{N}\) onto \(\mathbf{U}\) with, as standard in dc-CA, species weights \(\mathbf{K}\). The estimated regression coefficients are
$$\begin{aligned} {{\hat{\mathbf{B}}}}_0 =\left( {\mathbf{U}^{T}{} \mathbf{KU}} \right) ^{-1}\mathbf{U}^{T}{} \mathbf{KN}={\varvec{\Lambda }}^{{\upalpha }-1}{} \mathbf{U}^{T}\mathbf{Y}^{T}{} \mathbf{E}=\mathbf{B}_f^T \end{aligned}$$
(41)
where \({{\varvec{\Lambda }} }={\varvec{\Delta }}^{2}\), and the last equality follows from Eq. (33). \({{\hat{\mathbf{B}}}} _0 \) is thus the transpose of \(\mathbf{B}_f \). For \(r =\min (p, q)\), \(\mathbf{UB}_f^T \) is equal to the full rank fitted SNC values (see also Appendix A5).
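The middle equality in Eq. (41) rests on two substitutions, spelled out here as a short worked step using only the definition of \(\mathbf{N}\) in Eq. (40) and the scaling \(\mathbf{U}^{T}{} \mathbf{KU}={\varvec{\Lambda }}^{1-\alpha }\):

$$\begin{aligned} \mathbf{U}^{T}{} \mathbf{KN}=\mathbf{U}^{T}{} \mathbf{K}{} \mathbf{K}^{-1}{} \mathbf{Y}^{T}{} \mathbf{E}=\mathbf{U}^{T}{} \mathbf{Y}^{T}{} \mathbf{E},\qquad \left( {\mathbf{U}^{T}{} \mathbf{KU}} \right) ^{-1}={\varvec{\Lambda }}^{\alpha -1} \end{aligned}$$

Because \({\varvec{\Lambda }}\) is diagonal, \({\varvec{\Lambda }}^{\alpha -1}{} \mathbf{U}^{T}{} \mathbf{Y}^{T}{} \mathbf{E}\) is the transpose of \(\mathbf{E}^{T}{} \mathbf{YU}{\varvec{\Lambda }}^{\alpha -1}\), the expression for \(\mathbf{B}_f \) in Eq. (33).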
Analogously, suppose \(\mathbf{X}\) is given and scaled such that \(\mathbf{X}^{T}{} \mathbf{R}{} \mathbf{X}={\varvec{\Lambda }}^{\alpha }\) and the community weighted mean matrix \(\mathbf{M}\) [Eq. (18)] is to be approximated by a biplot. The optimal scores for the traits are then obtained by fitting the model \(\mathbf{M}=\mathbf{XC}_0 +\) error by a weighted regression of \(\mathbf{M}\) onto \(\mathbf{X}\) with, as standard in dc-CA, site weights \(\mathbf{R}\). The estimated regression coefficients are
$$\begin{aligned} {\hat{\mathbf{C}}}_0 =\left( {\mathbf{X}^{T}{} \mathbf{RX}} \right) ^{-1}{} \mathbf{X}^{T}{} \mathbf{RM}={{\varvec{\Lambda }} }^{-{\upalpha }}{} \mathbf{X}^{T}{} \mathbf{YT} \end{aligned}$$
(42)
That \({\hat{\mathbf{C}}}_0 \) equals \(\mathbf{C}_f^T \) can be shown by the route followed in Eqs. (32) and (33):
$$\begin{aligned} \mathbf{C}_f= & {} \left[ {\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{\frac{1}{2}}{} \mathbf{Q}{\varvec{\Delta }}^{1-\alpha }} \right] _r =\left[ {\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{\frac{1}{2}}{} \mathbf{D}^{T}{} \mathbf{P}{\varvec{\Delta }}^{-\alpha }} \right] _r \nonumber \\= & {} \left[ {\left( {\mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{E}} \right) \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-\frac{1}{2}}{} \mathbf{P}{\varvec{\Delta }}^{-\alpha }} \right] _r \end{aligned}$$
(43)
$$\begin{aligned} \mathbf{C}_f =\left[ {\left( {\mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{E}} \right) \mathbf{B}{\varvec{\Delta }}^{-2\alpha }} \right] _r =\left( {\mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{X}} \right) {\varvec{\Lambda }}^{-\alpha } \end{aligned}$$
(44)
\({\hat{\mathbf{C}}}_0 \) is thus the transpose of \(\mathbf{C}_f \). For \(r =\min (p, q)\), \(\mathbf{XC}_f^T \) is equal to the full rank fitted CWM values (see also Appendix A5).
Appendix A5: biplots involving canonical weights
This appendix describes biplots involving canonical weights: \(\mathbf{B}\)–\(\mathbf{C}\), \(\mathbf{B}\)–\(\mathbf{C}_f \), \(\mathbf{B}\)–\(\mathbf{X}\) and the biplots obtained by symmetry: \(\mathbf{B}_f \)–\(\mathbf{C}\) and \(\mathbf{C}\)–\(\mathbf{U}\). For completeness, the reconstitution formula for the fitted community matrix \(\mathbf{Y}\) is given with its biplot based on \(\mathbf{X}\) and \(\mathbf{U}\) as a corollary of the \(\mathbf{B}\)–\(\mathbf{C}\) biplot.
The biplot of \(\mathbf{B}\) and \(\mathbf{C}\)
A weighted regression of the contingency ratios \(\hbox {y}_{++} \mathbf{R}^{-1}{} \mathbf{YK}^{-1}\) on the traits and environmental variables, with weights \(\mathbf{R}\) and \(\mathbf{K}\), results (ignoring \(\hbox {y}_{++} )\) in the regression coefficients (Gabriel 1998)
$$\begin{aligned} \mathbf{F}_{reg}= & {} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}\mathbf{E}^{T}{} \mathbf{R}\left( {\mathbf{R}^{-1}{} \mathbf{YK}^{-1}} \right) \mathbf{KT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}\nonumber \\= & {} \left( {\mathbf{E}^{T}\mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{YT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1} \end{aligned}$$
(45)
Following ter Braak (1990) and Sect. 3, a biplot of \(\mathbf{F}_{reg} \) can be based on a “rank r weighted least-squares approximation” of the form \(\mathbf{F}_{reg} \approx \mathbf{B}{} \mathbf{C}^{T}\) with \(\mathbf{B}\) and \(\mathbf{C}\) matrices of order \(p \times r\) and \(q \times r\), respectively. It is shown below that the optimal \(\mathbf{B}\) and \(\mathbf{C}\) are the first r columns of the canonical weights of dc-CA. For simplicity of notation these matrices are already indicated by the symbols for canonical coefficients, with the \(\left[ . \right] _r \) notation dropped as well, as in Appendix A4. When dc-CA is a good approximation of the models in Appendix A1 [e.g. Eq. (28)], \(\mathbf{F}_{reg} \) is likely close to the regression coefficients associated with the interactions between traits and environment in a log-linear model (Brown et al. 2014; ter Braak 2017).
As the regression coefficients have a variance that is proportional to the tensor product of the inverses of the matrices \(\mathbf{E}^{T}{} \mathbf{RE}\) and \(\mathbf{T}^{T}{} \mathbf{KT}\), it is natural to use \(\mathbf{E}^{T}{} \mathbf{RE}\) and \(\mathbf{T}^{T}{} \mathbf{KT}\) as weights. The weighted approximation can be obtained from dc-CA as follows. We seek the minimum over \(\mathbf{B}\) and \(\mathbf{C}\) (free matrices, not yet equal to the canonical coefficients) of
$$\begin{aligned}&\big \vert \big \vert \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\left( {\mathbf{F}_{reg} -\mathbf{B}\,\mathbf{C}^{T}} \right) \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{1/2}\big \vert \big \vert ^{2} \nonumber \\&\quad =\big \vert \big \vert \mathbf{D}-\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\mathbf{B}\,\mathbf{C}^{T}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{1/2}\big \vert \big \vert ^{2} \end{aligned}$$
(46)
As follows from the Eckart–Young theorem (Greenacre 1984), the minimum is obtained from the singular value decomposition of \(\mathbf{D}\). Consequently, the minimum of (46) is \(\lambda _{r+1} +\ldots +\lambda _{\hbox {min}\left( {p,q} \right) } \) and is obtained by setting \(\mathbf{B}\) and \(\mathbf{C}\) equal to the first r columns of the canonical weights of Eq. (17). The scores of \(\mathbf{X}\) and \(\mathbf{U}\) thus form a biplot of the fitted contingency ratios. The biplot is weighted least-squares with weights \(\mathbf{R}\) and \(\mathbf{K}\).
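The step from the singular value decomposition to the stated minimum can be written out as follows. The sketch assumes the forms of the canonical weights implied by Eqs. (32), (35) and (43), namely \(\mathbf{B}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{P}{\varvec{\Delta }}^{\alpha }\) and \(\mathbf{C}=\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2}{} \mathbf{Q}{\varvec{\Delta }}^{1-\alpha }\), and writes \({\varvec{\Delta }}_r \) (a notation introduced here) for the leading \(r \times r\) block of \({\varvec{\Delta }}\). The weighted form of \(\mathbf{B}{} \mathbf{C}^{T}\) that appears on the left-hand side of (46) is then

$$\begin{aligned} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\left[ \mathbf{B} \right] _r \left[ \mathbf{C} \right] _r^T \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{1/2}=\left[ \mathbf{P} \right] _r {\varvec{\Delta }}_r^{\alpha }{\varvec{\Delta }}_r^{1-\alpha }\left[ \mathbf{Q} \right] _r^T =\left[ \mathbf{P} \right] _r {\varvec{\Delta }}_r \left[ \mathbf{Q} \right] _r^T \end{aligned}$$

which is the rank r truncation of the singular value decomposition of \(\mathbf{D}\), so that

$$\begin{aligned} \big \vert \big \vert \mathbf{D}-\left[ \mathbf{P} \right] _r {\varvec{\Delta }}_r \left[ \mathbf{Q} \right] _r^T \big \vert \big \vert ^{2}=\sum _{k=r+1}^{\min \left( {p,q} \right) } \delta _k^2 =\lambda _{r+1} +\ldots +\lambda _{\min \left( {p,q} \right) } \end{aligned}$$

with \(\delta _k \) the kth singular value and \(\lambda _k =\delta _k^2 \). The result does not depend on \(\alpha \), because the powers of \({\varvec{\Delta }}\) in \(\mathbf{B}\) and \(\mathbf{C}\) always multiply to \({\varvec{\Delta }}\).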
The regression of the contingency ratios on the traits and environmental variables leads to fitted values and thus also to fitted values of \(\mathbf{Y}\) itself. The fitted values,
$$\begin{aligned} {\hat{\mathbf{Y}}}=\hbox {y}_{++}^{-1} \mathbf{R}\left( {\mathbf{1}_n \mathbf{1}_m^T +\mathbf{X}{} \mathbf{U}^{T}} \right) \mathbf{K}=\hbox {y}_{++}^{-1} \mathbf{R}\left( {\mathbf{1}_n \mathbf{1}_m^T +\mathbf{EB}{} \mathbf{C}^{T}{} \mathbf{T}^{T}} \right) \mathbf{K} \end{aligned}$$
(47)
have the form of the usual reconstitution formula for \(\mathbf{Y}\) but with constrained instead of unconstrained scores as in CA.
The biplot of \(\mathbf{B}\) and \(\mathbf{C}_f \)
The other biplots essentially follow from considering dc-CA as a canonical correlation analysis on inflated trait and environment data and noting that canonical correlation analysis can be seen as reduced-rank regression fitted by maximum likelihood (ter Braak 1990; Tso 1981).
In the super-inflated data of integer-valued \(\mathbf{Y}\), each row represents an individual. When predicting traits from environmental variables, the predicted values of the individuals of the same site are all identical and are thus equal to the community weighted mean of the predicted trait values. This suggests considering the regression of the community weighted means \(\mathbf{M}\) onto the environment \(\mathbf{E}\). With weights \(\mathbf{R}\), the estimated regression coefficients are
$$\begin{aligned} {\hat{\mathbf{B}}}_{\mathbf{T}\sim \mathbf{E}} =\left( {\mathbf{E}^{T}\mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{RM}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{YT} \end{aligned}$$
(48)
A biplot of \({\hat{\mathbf{B}}}_{\mathbf{T}\sim \mathbf{E}} \) can be based on a “rank r weighted least-squares approximation” of the form \({\hat{\mathbf{B}}}_{\mathbf{T}\sim \mathbf{E}} \approx \mathbf{BC}_f^T \) with \(\mathbf{B}\) and \(\mathbf{C}_f \) matrices of order \(p\times r\) and \(q \times r\), respectively. It is shown below that the optimal \(\mathbf{B}\) and \(\mathbf{C}_f \) are the first r columns of the canonical weights for the environmental variables and of the biplot scores of the traits of the dc-CA, respectively. For simplicity of notation these matrices are already indicated by their symbols in the main text, with the \(\left[ . \right] _r \) notation dropped as well, as in Appendix A4.
The weighted approximation is motivated as follows. Because the regression coefficients for each column of \(\mathbf{M}\) have a variance that is proportional to the inverse of the matrix \(\mathbf{E}^{T}\mathbf{RE}\), it is natural to use \(\mathbf{E}^{T}{} \mathbf{RE}\) as weights. To make the approximation invariant to linear transformation of \(\mathbf{T}\), the inverse of \(\mathbf{T}^{T}{} \mathbf{KT}\) forms the other set of weights, as in Eq. (23). We thus seek the minimum over \(\mathbf{B}\) and \(\mathbf{C}_f \) (free matrices for now) of
$$\begin{aligned}&\big \vert \big \vert \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\left( {\hat{\mathbf{B}}_{\mathbf{T}\sim \mathbf{E}} -\mathbf{BC}_f^T } \right) \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2}\big \vert \big \vert ^{2} \nonumber \\&=\big \vert \big \vert \mathbf{D}-\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\mathbf{B}{} \mathbf{C}_f^T \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2}\big \vert \big \vert ^{2} \end{aligned}$$
(49)
As follows from the Eckart–Young theorem (Greenacre 1984), the minimum is obtained from the singular value decomposition of \(\mathbf{D}\). Consequently, the minimum of (49) is \(\lambda _{r+1} +\ldots +\lambda _{\mathrm{min}\left( {p,q} \right) } \) and is obtained by setting \(\mathbf{B}\) equal to the first r columns of the canonical weights of the environmental variables in Eq. (17) and \(\mathbf{C}_f \) equal to the first r columns of the biplot scores of the traits in Eq. (24). This result (and the version with traits and environmental variables interchanged) shows that dc-CA is both a reduced rank regression of CWMs on the environment \(\mathbf{E}\) and a reduced rank regression of SNCs on the traits \(\mathbf{T}\).
The result can now be linked to a canonical correlation analysis of super-inflated data. Such a canonical correlation analysis is simultaneously a multivariate regression of the traits on the environment and a multivariate regression of the environment on the traits, with all data in super-inflated form. The resulting optimal biplots are precisely the same as those obtained above for the regression of CWMs and, by symmetry, SNCs, and the regression coefficients of these two regressions satisfy the transition formulae in Eqs. (6) and (7). In fact, these equations can be written explicitly in terms of CWMs and SNCs:
$$\begin{aligned} \lambda _b \mathbf{b}= & {} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}\mathbf{E}^{T}{} \mathbf{R}(\mathbf{R}^{-1}{} \mathbf{YT})\mathbf{c}=\left( {\mathbf{E}^{T}\mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{RMc} \end{aligned}$$
(50)
$$\begin{aligned} \lambda _c \mathbf{c}= & {} \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}\mathbf{T}^{T}{} \mathbf{K}(\mathbf{K}^{-1}{} \mathbf{Y}^{T}{} \mathbf{E})\mathbf{b}=\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}{} \mathbf{T}^{T}{} \mathbf{KNb} \end{aligned}$$
(51)
In conclusion, the biplot of canonical weights \(\mathbf{B}\) and biplot scores \(\mathbf{C}_f \) gives a weighted least-squares approximation of the regression coefficients of the regression of CWMs on the environment, or equivalently of the regression of the traits on the environmental variables (in super-inflated form). Conversely, the biplot of canonical weights \(\mathbf{C}\) and biplot scores \(\mathbf{B}_f \) gives a weighted least-squares approximation of the regression coefficients of the regression of SNCs on the traits, or equivalently of the regression of the environment on the traits. The fitted trait and environmental values in these equivalent regressions are equal to the fitted CWMs and fitted SNCs, and their biplot representation is covered in Appendix A4, where CWMs and SNCs were regressed on \(\mathbf{X}\) and \(\mathbf{U}\), respectively, and thus implicitly on \(\mathbf{E}\) and \(\mathbf{T}\).
The interpolative biplot of \(\mathbf{B}\) and \(\mathbf{X}\)
Gower and Hand (1996) distinguish predictive and interpolative biplots. All biplots so far are predictive biplots in the sense that the two sets of items approximate a matrix by inner products (Gower and Hand 1996). Such biplots use loadings or biplot scores. Interpolative biplots use regression coefficient-like quantities. As their name suggests, they are useful for interpolation or adding a new site or species to the plot (Rui Alves and Beatriz Oliveira 2004). It is thus clear that \(\mathbf{B}\) and \(\mathbf{X}\) form an interpolative biplot.
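Interpolation with \(\mathbf{B}\) can be sketched in a few lines. Since the constrained site scores are linear combinations of the environmental variables (\(\mathbf{x}=\mathbf{Eb}\)), a new site is placed at the same linear combination of its own environmental values. The function name and the numbers are hypothetical, and the new values are assumed to be centred and scaled in the same way as the columns of \(\mathbf{E}\) used in the analysis.

```python
def interpolate_site(e_new, B):
    """Scores of a new site on each axis, computed as the product e_new B.

    e_new: list of p environmental values, preprocessed in the same way as
           the columns of E used to fit the ordination (an assumption here).
    B:     p x r list of lists of canonical weights.
    """
    p, r = len(B), len(B[0])
    if len(e_new) != p:
        raise ValueError("e_new must have one value per environmental variable")
    return [sum(e_new[k] * B[k][a] for k in range(p)) for a in range(r)]

# Hypothetical canonical weights for p = 2 variables and r = 2 axes.
B = [[0.5, -0.2],
     [0.1, 0.8]]
new_site = interpolate_site([1.0, 2.0], B)
```

Graphically this corresponds to adding the arrows of the environmental variables, each scaled by the value of the new site on that variable. A new species can be added analogously from its trait values and \(\mathbf{C}\).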
Appendix A6: Why CWMs and SNCs are key in analyzing trait-environment relationships
Community weighted means (CWMs) and species niche centroids (SNCs) appear several times in this paper. This appendix shows their importance in analyzing trait-environment relationships. Three models are considered: a log-linear model for \(\mathbf{Y}\) and two related linear models, one for the trait data \(\mathbf{T}\) and one for the environment data \(\mathbf{E}\).
When \(\mathbf{Y}\) consists of count data that are Poisson distributed and is modelled by a log-linear model with saturated main effects and interactions between all traits and all environmental variables, then the minimal sufficient statistics are \(\mathbf{E}^{T}{} \mathbf{YT}\) together with \(\mathbf{R}\) and \(\mathbf{K}\) (ter Braak 2017). The CWMs \(\mathbf{M}=\mathbf{R}^{-1}{} \mathbf{YT}\) and SNCs \(\mathbf{N}=\mathbf{K}^{-1}\mathbf{Y}^{T}{} \mathbf{E}\) with \(\mathbf{R}\) and \(\mathbf{K}\) are thus sufficient statistics.
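The two statistics are simple weighted averages, as the following sketch shows. It takes \(\mathbf{R}\) and \(\mathbf{K}\) to be the diagonal matrices of site totals \(y_{i+} \) and species totals \(y_{+j} \), and assumes that no site or species is empty. Function names and toy data are illustrative.

```python
def community_weighted_means(Y, T):
    """M = R^{-1} Y T: for each site, the abundance-weighted mean of each trait."""
    n, m, q = len(Y), len(T), len(T[0])
    M = []
    for i in range(n):
        site_total = sum(Y[i])  # y_{i+}, the i-th diagonal element of R
        M.append([sum(Y[i][j] * T[j][k] for j in range(m)) / site_total
                  for k in range(q)])
    return M

def species_niche_centroids(Y, E):
    """N = K^{-1} Y^T E: for each species, the abundance-weighted mean of each variable."""
    n, m, p = len(Y), len(Y[0]), len(E[0])
    N = []
    for j in range(m):
        species_total = sum(Y[i][j] for i in range(n))  # y_{+j}, diagonal of K
        N.append([sum(Y[i][j] * E[i][k] for i in range(n)) / species_total
                  for k in range(p)])
    return N

# Hypothetical toy data: 2 sites, 3 species, 1 trait, 1 environmental variable.
Y = [[2, 1, 1],
     [0, 3, 1]]
T = [[1.0], [2.0], [4.0]]
E = [[10.0], [20.0]]
M = community_weighted_means(Y, T)
N = species_niche_centroids(Y, E)
```

In the toy data the first site has CWM \((2\cdot 1+1\cdot 2+1\cdot 4)/4=2\), and the second species, which occurs once in the first site and three times in the second, has SNC \((1\cdot 10+3\cdot 20)/4=17.5\).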
When \(\mathbf{Y}\) consists of counts of individuals, it is natural to consider the super-inflated data \(\mathbf{T}_{infl} \) and \(\mathbf{E}_{infl} \) in which each individual is represented by a row: a row of \(\mathbf{T}_{infl} \) consists of the trait values of the species the individual belongs to, and a row of \(\mathbf{E}_{infl} \) consists of the environmental values that the individual may experience because it occurs in a particular site. Associated with \(\mathbf{T}_{infl} \) and \(\mathbf{E}_{infl} \) are also the factors species and site, coding which species and site each row belongs to. We now consider the linear model for \(\mathbf{T}_{infl} \) as a function g of \(\mathbf{E}_{infl} \) and the L\(_{2}\) norm for the residuals,
$$\begin{aligned} \big \vert \big \vert \mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) \big \vert \big \vert ^{2}=\big \vert \big \vert {\Pi }_s \left( {\mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) } \right) +\left( {1-{\Pi }_s } \right) \left( {\mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) } \right) \big \vert \big \vert ^{2}\nonumber \\ \end{aligned}$$
(52)
where \({\Pi }_s \) is the projector onto the factor site. Because the two added terms are orthogonal, the square of their sum is the sum of their squares. Also \({\Pi }_s g\left( {\mathbf{E}_{infl} } \right) =g\left( {\mathbf{E}_{infl} } \right) \) so that \(\left( {1-{\Pi }_s } \right) \left( {\mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) } \right) =\left( {1-{\Pi }_s } \right) \mathbf{T}_{infl} \) and Eq. (52) becomes
$$\begin{aligned} \big \vert \big \vert \mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) \big \vert \big \vert ^{2}=\big \vert \big \vert {\Pi }_s \mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) \big \vert \big \vert ^{2}+ \big \vert \big \vert \left( {1-{\Pi }_s } \right) \mathbf{T}_{infl} \big \vert \big \vert ^{2} \end{aligned}$$
(53)
Such an ANOVA-like decomposition was also given in Peres-Neto et al. (2017). The regression of \(\mathbf{T}_{infl} \) as a function g of \(\mathbf{E}_{infl} \) thus depends only on the first part, which can be further simplified to a weighted regression of the CWMs \(\mathbf{M}\) with weights \(\mathbf{R}\),
$$\begin{aligned} \big \vert \big \vert {\Pi }_s \mathbf{T}_{infl} -g\left( {\mathbf{E}_{infl} } \right) \big \vert \big \vert ^{2} =\big \vert \big \vert \mathbf{R}^{1/2}\left( {\mathbf{M}-g\left( \mathbf{E} \right) } \right) \big \vert \big \vert ^{2} \end{aligned}$$
(54)
because, being a projection on sites, \({\Pi }_s \mathbf{T}_{infl} \) consists of trait means per site, i.e. CWMs, and each site is replicated \(y_{i+} \) times, leading to the weights \(\mathbf{R}\). The least-squares regression of \(\mathbf{T}_{infl} \) as a function g of \(\mathbf{E}_{infl} \) can thus be carried out as a weighted regression of the CWMs on the environmental data with weights \(\mathbf{R}\). Similarly it can be shown that by projection of \(\mathbf{E}_{infl} \) on the factor species the least-squares regression of \(\mathbf{E}_{infl} \) as a function h of \(\mathbf{T}_{infl} \) is a weighted regression of the SNCs on the trait data with weights \(\mathbf{K}\).
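The decomposition in Eqs. (53) and (54) can be verified numerically on a toy example with a single trait and a single environmental variable. The data, the helper `inflate` and the predictor `g` below are hypothetical; any function of the environment would do for `g`, because the identity does not require a fitted model.

```python
def inflate(Y, T, E):
    """One (trait, environment, site) triple per individual counted in Y."""
    t_infl, e_infl, site = [], [], []
    for i, row in enumerate(Y):
        for j, count in enumerate(row):
            for _ in range(count):
                t_infl.append(T[j])
                e_infl.append(E[i])
                site.append(i)
    return t_infl, e_infl, site

def g(e):
    """An arbitrary linear predictor of the trait from the environment."""
    return 1.0 + 0.05 * e

Y = [[2, 1, 1],
     [0, 3, 1]]          # integer counts: 2 sites, 3 species
T = [1.0, 2.0, 4.0]      # one trait value per species
E = [10.0, 20.0]         # one environmental value per site

t_infl, e_infl, site = inflate(Y, T, E)

# Left-hand side of Eq. (53): residual sum of squares over individuals.
lhs = sum((t - g(e)) ** 2 for t, e in zip(t_infl, e_infl))

# Site totals (the diagonal of R) and community weighted means.
r = [sum(row) for row in Y]
cwm = [sum(y * t for y, t in zip(row, T)) / ri for row, ri in zip(Y, r)]

# First term in the weighted form of Eq. (54), and the within-site term.
between = sum(ri * (mi - g(ei)) ** 2 for ri, mi, ei in zip(r, cwm, E))
within = sum((t - cwm[i]) ** 2 for t, i in zip(t_infl, site))
```

Here `lhs` equals `between + within` (11 = 2 + 9). Only `between` depends on `g`, so minimizing the residual sum of squares over individuals is the same as minimizing the weighted residual sum of squares of the CWMs.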
The variance of the residuals \(\left( {1-{\Pi }_s } \right) \mathbf{T}_{infl} \) per site represents the within-site trait variance that may deserve separate study in relation to the environment. Similarly, the variance of the residuals \(\left( {1-{\Pi }_{species} } \right) \mathbf{E}_{infl} \) per species represents the within-species environmental variance (niche breadth) that may deserve separate study in relation to the traits.
In the above derivations, the Poisson assumption is either explicit or implicit, but can also be overcome by choosing a transformation that makes the result Poisson-like, with variance proportional to the mean. For example, if the data follow a Poisson log-normal distribution it may make sense to analyse log(\(\mathbf{Y}+1)\) instead of \(\mathbf{Y}\).