Environmental and Ecological Statistics

, Volume 25, Issue 2, pp 171–197

# Algorithms and biplots for double constrained correspondence analysis

Open Access
Article

## Abstract

Correspondence analysis with linear external constraints on both the rows and the columns has been mentioned in the ecological literature, but lacks full mathematical treatment and easily available algorithms and software. This paper fills this gap by defining the method as maximizing the fourth-corner correlation between linear combinations, by providing novel algorithms, which demonstrate relationships with related methods, and by making a detailed study of possible biplots and associated approximations. The method is illustrated using ecological data on the abundances of species in sites and where the species are characterized by traits and sites by environmental variables. The trait data and environment data form the external constraints and the question is which traits and environmental variables are associated, how these associations drive species abundances and how they can be displayed in biplots. With microbiome data becoming widely available, these and related multivariate methods deserve more study as they might be routinely used in the future.

## Keywords

Biplot Canonical correlation analysis Canonical correspondence analysis Community ecology Fourth-corner correlation Multivariate analysis Trait-environment relations

## 1 Introduction

Double constrained correspondence analysis (dc-CA) was developed by Jean-Dominique Lebreton, Robert Sabatier and co-workers as a natural extension of canonical correspondence analysis (Lebreton et al. 1988a, b; ter Braak 1986, 1987) and applied in studies relating species attributes (traits) and environmental variables via the central sites-by-species abundance table (Lavorel et al. 1999, 1998). Canonical correspondence analysis (CCA) constrains the row scores of a correspondence analysis (CA) of the central table by linear combinations of environmental variables. In addition to a single constraint on the rows, dc-CA uses also a constraint on the columns. In applications, the column (species) scores are constrained by linear combinations of species traits. Lavorel et al. (1999) nicknamed the method therefore double CCA and Kleyer et al. (2012) also use this name. Here the method is abbreviated to dc-CA. Its inputs are three data tables: the central site-by-species table Y containing the abundances ($$\ge$$0) of the species in the sites (for short the community table), the site-by-environment table E, and the species-by-traits table T. To give another application domain, the central table may contain preference scores of persons on different products and the tables E and T person and product characteristics, respectively.

Curiously, dc-CA has never been explicitly defined mathematically in a scientific paper, presumably because its mathematics was crystal clear to its inventors. An exception is perhaps Böckenholt and Böckenholt (1990) but they defined the constraints in a different way, namely by the null space method (Takane 2013). As some applications of this method did appear, its novelty was gone and no thorough statistical description of the method was published. For Takane (2013), dc-CA is only a special case of constrained principal component analysis, but there is more to say than that, as Hill (1974) said for CA or Tenenhaus and Young (1985) for multiple CA. Meanwhile, Kleyer et al. (2012) evaluated several methods for analysing trait-environment relationships, including dc-CA, and did not detect any advantage of dc-CA over rival methods such as RLQ analysis (Dolédec et al. 1996), another three-table ordination method, and redundancy analysis on community weighted means (CWM-RDA) that consists in combining tables T and Y to build a new table of community weighted trait means (CWMs) that is related to E, in a second step, by a two-table method. All these methods are closely linked but differ notably in whether they take account of correlations among traits and/or among environmental variables by built-in multiple regressions. RLQ is based on PLS-like regressions and thus does not take account of any of these correlations, CWM-RDA takes account of the correlations among environmental variables, and dc-CA takes account of both. As a consequence, RLQ is more robust and can analyse any number of traits and environmental variables, whereas in CWM-RDA the number of environmental variables must not be large compared to the number of sites, and in dc-CA also the number of trait variables must not be large compared to the number of species. But provided care has been taken of these limitations, for example by prior dimension reduction (Lavorel et al. 1999) or variable selection (ter Braak and Verdonschot 1995), dc-CA can reveal trait and environment associations that cannot be identified in RLQ as we demonstrate in Sect. 4.

Reasons for the renewed interest in dc-CA are its relation with the fourth-corner approach (ter Braak 2017), the fact that its inertia is a Rao score test statistic for testing trait-environment association in a log-linear model (ter Braak 2017), and also the desirability of regression-based methods for trait-environment relations (Peres-Neto et al. 2017; Warton et al. 2015) where abundances or community weighted means are considered as the response variables. In this paper the focus is on algorithms for dc-CA. The first one is based on a singular value decomposition (SVD), the second one is an iteration algorithm based on the transition formulae, and the third is based on a combination of CCA and redundancy analysis (RDA) which are widely available together with associated methods for statistical testing and selection of variables (Dray and Dufour 2007; Oksanen et al. 2013; ter Braak and Šmilauer 2012).

These novel algorithms come in addition to three existing algorithms. The first is based on the observation of Lavorel et al. (1999) that dc-CA can be seen as a canonical correlation analysis of traits and environment weighted by the central table. The input data would be inflated trait and environment data tables as exemplified in Dray and Legendre (2008); see also ter Braak (2017). Although canonical correlation analysis is widely available in statistical packages, a weighted version does not exist to our knowledge; the unweighted version could be used only for integer abundance data after ‘super’ inflation of the data so that every individual has a separate row.

In practice, Lavorel et al. (1999) used another algorithm developed by Bacou et al. (1989). This second algorithm consisted of two CCA’s: a traditional CCA of the abundance table with respect to the environment data from which a table of fitted abundances is computed by the reconstitution formula known from CA, and then another CCA of the transposed fitted table with respect to the trait data. A problem with this approach is that the fitted table may contain negative values and most programs for CCA do not allow negative values in the abundance table. Lavorel et al. (1999) do not mention this problem or how they dealt with it. Presumably their software allowed negative abundance values in a CCA.

The third algorithm is in the Appendices S2 and S4 of Kleyer et al. (2012). It starts from the results of a correspondence analysis of the central table, in line with how a canonical correspondence analysis is computed in the R package ade4 (Dray and Dufour 2007; R Core Team 2015). In this algorithm, the CA is foremost used to create row and column weights and a transformed abundance table that is projected on the environmental variables. The result is projected onto the traits, the result of which is finally rotated to principal axes. The projections are obtained by weighted regression. This approach, implemented in R function doublerda in Kleyer et al. (2012), is generic in that it can also be used for a double constrained principal component analysis (dc-PCA) when starting from a principal components analysis of the central table. See Takane (2013) for a related generic framework. The second and third algorithms share the feature that they work on fitted site-by-species tables.

Our new algorithms do not create such tables. The iteration algorithm works directly on the original data tables Y, E and T, the SVD on the matrix product of Y and orthogonalized E and T (say $$\mathbf{E}^{*}$$ and T$$^{*})$$ and the combination of CCA and RDA can be seen as a variant of CWM-RDA in which the community-weighted trait means (CWMs), usually obtained by combining Y and T, are computed with respect to $$\mathbf{T}^{*}$$ instead of to T. Reversely, we also described an alternative SNC-RDA in which the species-niche centroids (SNCs), usually computed by combining Y and E (see Peres-Neto et al. 2017 for more details), are computed with respect to $$\mathbf{E}^{*}$$ instead of to E. The CWMs and SNCs are of smaller dimensions than the original sites-by-species table Y, when the number of environmental variables is smaller than the number of sites and the number of traits is smaller than the number of species, respectively.

The paper is structured as follows. Section 2 defines dc-CA, presents the new algorithms, and defines some additional quantities that are useful for summarizing and plotting the result in biplots. Section 3 discusses the available biplots in detail and Sect. 4 presents a data example and simulated example comparing dc-CA with RLQ. Most derivations are given in the Appendices; all equations are illustrated on real data in an R-script in Supplementary “Appendix S1”.

## 2 Theory and method

### 2.1 Notation

The three data tables (Environment, Community table and Trait data), denoted here by E, Y and T, respectively Dolédec et al. (1996) used R, L and Q; ter Braak (1986) used Z for E), can be arranged as
\begin{aligned} \left[ {{\begin{array}{ll} {\mathbf{Y}^{T}}&{} \mathbf{T} \\ {\mathbf{E}^{T}}&{} \mathbf{F} \\ \end{array} }} \right] \end{aligned}
where F is the missing matrix (i.e., fourth-corner table), representing relations between the environment and species traits, that are of interest and should be estimated. With n sites, m species, p environmental variables and q traits, the dimensions of tables Y, E and T are $$n\times m$$, $$n\times p$$ and $$m\times q$$, respectively, so that F is $$p\times q$$. In the original method proposed by Legendre et al. (1997) the link table Y contained presence-absence of the m species in the n sites but this was later generalized to abundance or count data by Dray and Legendre (2008). The community table is denoted here by Y as it will be treated as a response matrix in CCA and dc-CA.

In the paper, bold lower case is used for column vectors, e.g. $$\mathbf{x}$$ is a vector with elements $$\left\{ {x_{i} } \right\}$$, i = $$1,\ldots , n$$. Bold upper case is used for matrices. Elements of Y will be denoted by $$y_{ij}$$; subscript i denotes a site (row of Y or E) and subscript j denotes a species (column of Y or row of T). A symbol “+” replacing an index means the sum over the index, e.g. $$y_{i+} =\mathop \sum \nolimits _j y_{ij}$$. Further, $$\mathbf{R}$$ and $$\mathbf{K}$$ are diagonal matrices with the weights $$\left\{ {y_{i+} } \right\}$$ and $$\{ {y_{+j} } \}$$, respectively, on their diagonal.

### 2.2 Definition and corresponding equations

There are several distinct derivations of correspondence analysis and therefore also of CCA and dc-CA, all leading to equivalent eigen-equations. Two model-based derivations are presented in “Appendix A1”. But, perhaps it is most natural to define dc-CA in the same spirit as in canonical correlation analysis, namely as maximizing a correlation. In dc-CA, this is the fourth-corner correlation f between trait t and environmental variable e (Legendre and Legendre 2012; Legendre et al. 1997; Peres-Neto et al. 2017), defined as
\begin{aligned} f=cor_\mathbf{Y}^{2} \left( {\mathbf{t},\mathbf{e}} \right) =\frac{\mathop \sum \nolimits _{i,j} y_{ij} {\tilde{t}}_{j} {\tilde{e}} _{i} }{\left\{ {\mathop \sum \nolimits _j y_{+j} {\tilde{t}}_j^{2} \mathop \sum \nolimits _i y_{i+} {\tilde{e}}_i^{2} } \right\} ^{1/2}} \end{aligned}
(1)
with $$\mathop {\tilde{t}}_j$$ and $$\mathop {\tilde{e}}_i$$ being centred versions of the trait and environmental variable, with weighted-means computed using weights derived from the abundance table Y:
\begin{aligned} \mathop {\tilde{t}}\nolimits _j =t_{j} -\mathop \sum \nolimits _{j} y_{+j} t_{j} /y_{++} \hbox { and }{\tilde{e}}_i = e_i -\mathop \sum \nolimits _{i} y_{i+} e_i /y_{++} \end{aligned}
(2)
The first axis of dc-CA can now be defined as follows.

### Definition

dc-CA is a method that finds linear combinations of the traits and of the environmental variables such that the fourth corner correlation between these linear combinations is maximized

The definition leads to an eigen-equation of which the first (non-trivial) eigen-vector is the solution. Later axes are then subsequent eigen-vectors which have the interpretation that they also maximize the fourth corner correlation but subject to the additional constraint of their orthogonality (in a particular metric) to the previous axes. The definition is in line with CA as a method of optimal scaling of the row and column categories of a contingency table (Gifi 1990; Hill 1974).

For simplicity and without loss of generality, it is assumed from now onwards that the traits and environmental variables are centred as in Eq. (2) so that $$\mathbf{1}_{n}^{T} \mathbf{RE}=\mathbf{0}_p$$ and $$\mathbf{1}_{m}^{T} \mathbf{KT}=\mathbf{0}_q$$. This centring eliminates the need for the intercept terms and avoids the trivial axis of (dc-)CA. The definition of dc-CA above leads to the following maximization problem:
\begin{aligned} max_{\mathbf{b},\mathbf{c}} \mathbf{x}^{T}{} \mathbf{Yu} \hbox { with } \mathbf{x}=\mathbf{Eb},\mathbf{u}=\mathbf{Tc}, \mathbf{x}^{T}{} \mathbf{Rx}=1 \hbox { and } \mathbf{u}^{T}{} \mathbf{Ku}=1. \end{aligned}
(3)
This definition lacks explicit centring of $$\mathbf{x}$$ (site scores) and $$\mathbf{u}$$ (species scores), but the linear combinations $$\mathbf{x}=\mathbf{Eb}$$ and $$\mathbf{u}=\mathbf{Tc}$$ are centred, because $$\mathbf{E}$$ and $$\mathbf{T}$$ are centred. Insertion of $$\mathbf{x}=\mathbf{Eb}$$ and $$\mathbf{u}=\mathbf{Tc}$$ in Eq. (3) leads to
\begin{aligned} max_{\mathbf{b},\mathbf{c}} \mathbf{b}^{T}{} \mathbf{E}^{T}{} \mathbf{YTc} \hbox { subject to } \mathbf{b}^{T}{} \mathbf{E}^{T}{} \mathbf{REb}=1 \hbox { and } \mathbf{c}^{T}\mathbf{T}^{T}{} \mathbf{KTc}=1. \end{aligned}
(4)
Applying the Lagrange multiplier method (Magnus and Neudecker 1988) leads to the necessary conditions for the maximizer of Eq. (4):
\begin{aligned} \mathbf{E}^{T}{} \mathbf{YTc}-\lambda _{b} \mathbf{E}^{T}{} \mathbf{REb}=\mathbf{0}_p \hbox { and } \mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{Eb}-\lambda _c \mathbf{T}^{T}\mathbf{KTc}=\mathbf{0}_{q} , \end{aligned}
(5)
where $$\lambda _{b}$$ and $$\lambda _{c}$$ are Lagrange multipliers, one for each constraint. Re-expression gives the transition formulae of dc-CA between $$\mathbf{b}$$ and $$\mathbf{c}$$:
\begin{aligned} \lambda _b \mathbf{b}= & {} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}\mathbf{E}^{T}{} \mathbf{YTc} \end{aligned}
(6)
\begin{aligned} \lambda _c \mathbf{c}= & {} \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}{} \mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{Eb} \end{aligned}
(7)
Insertion of Eq. (7) in Eq. (6) gives the eigen-equation of dc-CA for $$\mathbf{b}$$
\begin{aligned} \lambda \mathbf{b}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}\mathbf{E}^{T}{} \mathbf{YT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}\mathbf{T}^{T}{} \mathbf{Y}^{T}{} \mathbf{Eb}, \end{aligned}
(8)
where $$\lambda =\lambda _b \lambda _c$$ is the first eigenvalue of dc-CA and is equal to the square of the maximized fourth-corner correlation between the linear combinations of traits and environmental variables. Equation (8) shows a great similarity with the eigen-equation of canonical correlation analysis (Gittins 1985; Mardia et al. 1980), therewith confirming the observation of Lavorel et al. (1999) that dc-CA is a weighted canonical correlation analysis of traits and environment with the central table acting as weights.
For viewing the relationship of dc-CA with CA and CCA, it is instructive to rewrite the transition formulae (6) and (7) so that these include site and species scores as in ter Braak (1986) for CCA. There are two sets of each: scores that are linear combinations (LC scores) and scores that are weighted averages (WA scores). The latter are indicated with an asterisk in the following transition formulae of dc-CA. A redundant constant $$\alpha$$ (0$$\le \alpha \le 1)$$ is inserted so as to allow various scalings of ordination diagrams based on scores from these transition formulae for different dimensions, as in CA (Greenacre 1984). Example scalings are row-metric preserving $$\left( {\alpha =1} \right)$$, symmetric $$\left( {\alpha =1/2} \right)$$ or column-metric preserving $$\left( {\alpha =0} \right)$$. In Eqs. (3) and (4) the LC site and species scores were R- and K-normalized so as to define a correlation, but for a correlation the scaling does not matter and from now onwards, we set $$\mathbf{x}^{T}\mathbf{Rx}=\lambda ^{\alpha }$$ and the scaling of all other scores then follows from the transition formulae (for example, $$\mathbf{u}^{T}\mathbf{Ku}=\lambda ^{1-\alpha }$$, see next section):
\begin{aligned} \lambda ^{\alpha }u_{k}^*= & {} \mathop {\sum }\nolimits _{i} y_{ik} x_{i} /y_{+k} \hbox { or in matrix notation}, \quad \lambda ^{\alpha }{} \mathbf{u}^{*}=\mathbf{K}^{-1}{} \mathbf{Y}^{T}{} \mathbf{x} \end{aligned}
(9)
\begin{aligned} \mathbf{c}= & {} \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1}{} \mathbf{T}^{T}\mathbf{Ku}^{*} \end{aligned}
(10)
\begin{aligned} \mathbf{u}= & {} \mathbf{Tc} \end{aligned}
(11)
\begin{aligned} \lambda ^{1-\alpha }x_{i}^{*}= & {} \mathop {\sum }\nolimits _{k} y_{ik} u_{k} /y_{i+} \hbox { or in matrix notation},\quad \lambda ^{1-\alpha }{} \mathbf{x}^{*}=\mathbf{R}^{-1}{} \mathbf{Yu} \end{aligned}
(12)
\begin{aligned} \mathbf{b}= & {} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1}{} \mathbf{E}^{T}{} \mathbf{Rx}^{*} \end{aligned}
(13)
\begin{aligned} \mathbf{x}= & {} \mathbf{Eb}. \end{aligned}
(14)
In words, the transition formulae read as follows. Starting with Eq. (9):
• The WA species scores ($$u_k^{*} )$$ are (proportional to) weighted averages of LC site scores (i.e. the constrained site scores, $$\mathbf{x})$$.

• Canonical weights for the traits ($$\mathbf{c})$$ are coefficients of the regression of the WA species scores ($$u_k^{*} )$$ on the traits ($$\mathbf{T})$$ with the species total abundances ($$y_{+k} )$$ as weights.

• LC species scores (i.e. the constrained species scores, $$\mathbf{u})$$ are a linear combination of the traits.

• WA site scores ($$x_i^{*} )$$ are (proportional to) weighted averages of LC species scores ($$\mathbf{u})$$.

• Canonical weights for the environmental variables ($$\mathbf{b})$$ are coefficients of the regression of the WA site scores ($$x_i^{*} )$$ on the environmental variables ($$\mathbf{E})$$ with the site total abundances ($$y_{i+} )$$ as weights.

• LC site scores (constrained site scores, $$\mathbf{x})$$ are a linear combination of the environmental variables ($$\mathbf{E})$$.

There are two extra equations for dc-CA compared to CCA [Eqs. (10) and (11)] which define the constraints on the species scores based on trait values. In CCA the only species scores are WA scores, and therefore these are used in Eq. (12) for the WA site scores. In dc-CA the LC species scores are used to define the WA site scores [Eq. (12)] and reversely the LC site scores are used to define the WA species scores [Eq. (9)]. In CA, there are only WA site scores and the transition formulae are reduced to Eqs. (9) and (12). Note that dc-CA has two kinds of canonical weights, one for environmental variables and one for traits. The transition formulae show that dc-CA is constrained reciprocal averaging (Hill 1973).

Special cases of dc-CA are of course CCA if $$\mathbf{T}=\mathbf{I}_m$$ or $$q \ge m$$ so that there are effectively no constraints and CA if also $$\mathbf{E}=\mathbf{I}_n$$ or $$p \ge n$$.

### 2.3 Algorithm based on a SVD

The eigenproblem posed in Eq. (8) could be solved by an asymmetric eigenproblem solver or it could be first transformed to a two-sided eigenvalue problem. However, it is more convenient to obtain the solution of dc-CA via a singular value decomposition (SVD) (Golub and Reinch 1970; Golub and Loan 1989). Let
\begin{aligned} \mathbf{D}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{E}^{T}\mathbf{YT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2} \end{aligned}
(15)
and the SVD of $$\mathbf{D}$$ be
\begin{aligned} \mathbf{D}=\mathbf{P}{\varvec{\Delta }} \mathbf{Q}^{T} \end{aligned}
(16)
with $$\mathbf{P}$$ and $$\mathbf{Q}$$ orthonormal matrices and $${\varvec{\Delta }}$$ a diagonal matrix with singular values in decreasing order. Then the singular values are the maximized fourth corner correlations of the dc-CA axes and the columns of
\begin{aligned} \mathbf{B}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}\mathbf{P}{\varvec{\Delta }} ^{\alpha } \hbox { and } \mathbf{C}=\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2}{} \mathbf{Q}{\varvec{\Delta }}^{\alpha -1} \end{aligned}
(17)
satisfy the transition formulae (6) and (7). This can be seen as follows: from Eq. (16), $$\mathbf{P}{\varvec{\Delta }}^{\alpha }=\mathbf{DQ}{\varvec{\Delta }}^{\alpha -1}$$; insert $$\mathbf{P}$$ and $$\mathbf{Q}$$ from Eq. (17) and $$\mathbf{D}$$ from Eq. (15); simplify and rearrange to obtain Eq. (6). By consequence, dc-CA eigenvalues are the squared singular values.

This approach immediately solves for all dc-CA eigen-vectors. The LC site scores and LC species scores of all dimensions, $$\mathbf{X}=\mathbf{EB}$$ and $$\mathbf{U}=\mathbf{TC}$$, are R- and K-orthogonal and the scaling factor $${\varvec{\Delta }}^{\alpha }$$ in Eq. (17) ensures that $$\mathbf{X}^{T}{} \mathbf{R}\mathbf{X}={\varvec{\Lambda }}^{\alpha }$$ and $$\mathbf{U}^{T}{} \mathbf{K}\mathbf{U}={\varvec{\Lambda }}^{1-\alpha }$$, where $${{\varvec{\Lambda }} }= {\varvec{\Delta }}^{2}$$, respectively, as in the transition formulae.

Note that $$tr\left( {\mathbf{D}^{T}{} \mathbf{D}} \right)$$ is equal to the sum of all eigenvalues satisfying Eq. (8), also known as the total inertia of the dc-CA. ter Braak (2017) showed that $$y_{++} tr\left( {\mathbf{D}^{T}{} \mathbf{D}} \right)$$ is the Rao score test statistic for testing the trait-environment interaction in a Poisson log-linear model with saturated main effects. The first eigenvalue is the one-dimensional replacement thereof that could be also useful as test statistic in permutation tests if the alternative hypothesis is likely to be one-dimensional.

### 2.4 Algorithm based on the transition formulae

If only the first or only a few eigen-vectors need to be calculated, iterative methods can be used that repeatedly cycle through the transition formulae starting from arbitrary, non-constant starting values for either $$\mathbf{b}$$ or $$\mathbf{c}$$, when using Eqs. (6) and (7), and for either $$\mathbf{b}$$, $$\mathbf{c}$$, $$\mathbf{x}^{*}$$ or $$\mathbf{u}^{*}$$, when using Eqs. (9)–(14). The latter is a constrained reciprocal averaging algorithm (Hill 1973). Such an algorithm for CCA is described in the “Appendix” of ter Braak and Prentice (1988) and the extension to dc–CA is trivial. This iterative algorithm is related to the well-known power algorithm for solving eigenproblems (Good 1969; Gourlay and Watson 1973). Power algorithms tend to be slow, but acceleration methods make them practical.

### 2.5 Algorithm based on combining CCA and a weighted RDA

This subsection presents an algorithm based on a combination of CCA and RDA. The algorithm also gives insight into the relation between dc-CA and CWM-RDA.

CWM-RDA starts with computing a table $$\mathbf{M}$$ of community weighted mean trait values (CWM)
\begin{aligned} \mathbf{M}=\mathbf{R}^{-1}{} \mathbf{YT} \end{aligned}
(18)
and then applies an RDA of $$\mathbf{M}$$ with respect to the environmental data $$\mathbf{E}$$. This analysis is essentially an SVD of
\begin{aligned} \mathbf{D}_{\mathrm{cwm-rda}} =\left( {\mathbf{E}^{T}{} \mathbf{E}} \right) ^{-1/2}{} \mathbf{E}^{T}{} \mathbf{M}= \left( {\mathbf{E}^{T}{} \mathbf{E}} \right) ^{-1/2}{} \mathbf{E}^{T}(\mathbf{R}^{-1}{} \mathbf{YT}). \end{aligned}
(19)
In comparison with (15), this equation lacks the weighing of the sites with weights $$\mathbf{R}$$ and lacks a term involving the covariances among the traits $$(\mathbf{T}^{T}{} \mathbf{KT})$$. The latter can be included implicitly by transforming the traits, prior to analysis, to orthonormal ones by:
\begin{aligned} \mathbf{T}^{*}=\mathbf{T}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2} \end{aligned}
(20)
so that $$\mathbf{T}^{*T}{} \mathbf{K}1_m =\mathbf{0}_q$$ and $$\mathbf{T}^{*T}\mathbf{KT}^{*}=\mathbf{I}_q$$. By replacing the unweighted RDA by a weighted RDA with site weights $$\mathbf{R}$$, the equivalent SVD is
\begin{aligned} \mathbf{D}_{\mathrm{cwm - wrda}} =\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2} \mathbf{E}^{T}{} \mathbf{R}(\mathbf{R}^{-1}{} \mathbf{YT}^{*})= \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{E}^{T}\mathbf{YT}^{*} =\mathbf{D}. \end{aligned}
(21)
As a consequence, dc-CA can be obtained by the following steps:
1. 1.

Transforming the traits to K-orthonormal ones using species weights $$\mathbf{K}$$,

2. 2.

Computing community weighted means of the orthonormal traits $$\mathbf{T}^{*}$$ and then

3. 3.

Applying a weighted RDA of these community weighted means with respect to the environmental data $$\mathbf{E}$$ using site weights $$\mathbf{R}$$.

The difference between dc-CA and CWM-RDA is thus that dc-CA uses differential weights for sites and accounts for the covariances among the traits, whereas CWM-RDA uses equal weights for the sites and accounts for differences in variance of the traits only. As a side note, in Kleyer et al. (2012) and the Traits example in Canoco 5.0 (ter Braak and Šmilauer 2012), the RDA implicitly standardized the response data, which are the community weighted means. However, it makes much more sense to standardize the traits from which they are computed (McCune 2015) and to leave the obtained community weighted means as they are (i.e. not to standardize them in RDA). The reason is that the variances of community weighted means of standardized traits carry important information on the relative importance of the traits (Peres-Neto et al. 2017). For an important trait, the variance will be relatively high and for a trait that is unrelated to the abundance table $$\mathbf{Y}$$, the variance will be very low. This also applies to dc-CA where traits are made orthogonal and also normalized (orthonormalized) and the subsequent community weighted means are analyzed untransformed (non-standardized) by an RDA so as to obtain a dc-CA.

The above Steps 1 and 2 can be combined in a single CCA of the transposed abundance table with respect to the traits, i.e. CCA($$\mathbf{Y}^{T}\sim \mathbf{T})$$. The row scores of this CCA are linear combinations of the traits which are K-orthogonal and, depending on the scaling of the axes, also K-normalized (the scaling is column-metric preserving) (ter Braak 1986, 2014). The column scores of this analysis (response variable scores, in this case representing rows of $$\mathbf{Y})$$ are weighted averages of the row scores and thus community weighted means of K-orthonormal traits [cf. Eq. (9)]. This way of making the traits orthonormal has an advantage for trait data that are (near) singular: the CCA ranks the dimensions in order of their importance for $$\mathbf{Y}^{T}$$ so that it is unlikely that an important dimension is dropped due to an unlucky cut-off in the decision for rank-deficiency.

The resulting algorithm is thus
1. 1.

Perform CCA($$\mathbf{Y}^{T}\sim \mathbf{T})$$: a CCA of the transposed community table onto the traits

2. 2.

Obtain the column scores ($$\mathbf{M}^{*}$$, say) from this analysis in site-metric preserving scaling, an $$n \times q^{*}$$ table of scores with $$q^{*}$$ the rank of the trait data, which are community weighted means of orthonormalized traits.

3. 3.

Perform a weighted $$\hbox {RDA}_{\mathbf{R}}(\mathbf{M}^{*}\sim \mathbf{E})$$: an RDA of $$\mathbf{M}^{*}$$ on the environmental variables using row weights $$\mathbf{R}$$.

The canonical coefficients of this RDA are the dc-CA coefficients for the environmental variables [Eqs. (13) or (17)] and the resulting linear combinations of this RDA are the dc-CA linear combinations of environmental variables [Eq. (14)], both up to sign changes of axes. In “Appendix A2” it is shown that the unconstrained row scores of the RDA are identical to the WA site scores.
Because of the symmetry in dc-CA between rows and columns (sites and species) it is also possible to start with a CCA of $$\mathbf{Y}$$ with respect to $$\mathbf{E}$$:
1. 1.

Perform CCA($$\mathbf{Y}\sim \mathbf{E})$$: a CCA of the community table on to the environmental variables

2. 2.

Obtain the column scores ($$\mathbf{S}^{*}$$, say) from this analysis in species-metric preserving scaling, an $$m \times p^{*}$$ table of scores with $$p^{*}$$ the rank of the environmental data, which are species niche centroids (Peres-Neto et al. 2017) of orthonormalized environmental variables.

3. 3.

Perform weighted $$\hbox {RDA}_{\mathbf{K}}(\mathbf{S}^{*}\sim \mathbf{T})$$: an RDA of $$\mathbf{S}^{*}$$ on the traits using row weights $$\mathbf{K}$$.

The canonical coefficients of this RDA are the dc-CA coefficients for the traits [Eqs. (10) or (17)], the resulting linear combinations of this RDA are the dc-CA linear combinations of traits [Eq. (11)] and the unconstrained row scores of the RDA are identical to the scores in Eq. (9), which can be shown analogously to the proof in “Appendix A2”.

The computer program Canoco 5.10 (ter Braak and Šmilauer 2012) implements dc-CA via these combinations of CCA and weighted RDA where the first combination is used for selection and significance testing of environmental variables and the second for selection and significance testing of traits.

### 2.6 Derived scores

Most scores that are useful for interpretation or plotting in ordination diagrams are already defined and available from the transition formulae of Sect. 2.2. As in CCA and canonical correlation analysis (ter Braak 1990), there are a few more scores that are useful for interpretation and plotting:
• Intra-set correlations of the traits with the constrained dc-CA axes, $$cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}} \right)$$, and similarly for the environmental variables, $$cor_\mathbf{K} \left( {\mathbf{E},\mathbf{X}} \right)$$.

• Inter-set correlations of the traits with the WA dc-CA axes, $$cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}^{*}} \right)$$, and similarly for the environmental variables, $$cor_\mathbf{K} \left( {\mathbf{E},\mathbf{X}^{*}} \right)$$. These are inter-set correlations in the setting of CCA as, for example, $$\mathbf{X}^{*}$$ is a linear combination of $$\mathbf{Y}$$ and not of $$\mathbf{E}$$.

• Fourth-corner correlations of the traits with the constrained dc-CA axes, $$cor_\mathbf{Y} \left( {\mathbf{T},\mathbf{X}} \right)$$, and similarly for the environmental variables, $$cor_\mathbf{Y} \left( {\mathbf{E},\mathbf{U}} \right)$$. When dc-CA is interpreted as a canonical correlation analysis on super inflated data, $$\mathbf{Y}$$ seemingly disappears and these fourth-corner correlations are then in fact the inter-set correlations of the canonical correlation analysis. Unless noted explicitly otherwise, inter-set correlations of dc-CA refer to their definition in the setting of CCA and RDA, that is, to $$cor_\mathbf{K} \left( {\mathbf{E},\mathbf{X}^{*}} \right)$$ and $$cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}^{*}} \right)$$.

• Centroids of scores for categories of nominal traits and environmental variables. Such centroids can be viewed as scores for ‘super species’ and ‘super sites’ as they average scores of species or of sites belonging to the same category. The centroids are all weighted averages using the weights $$\left\{ {y_{i+} } \right\}$$ for sites and $$\{y_{+k} \}$$ for species.

## 3 Biplots

Biplots serve to visualize the main pattern in the analyzed data, by plotting the scores on, typically, the first two axes of the analysis, so that their inner product approximates a matrix, typically a data table or table with statistics such as correlation or regression coefficients (Gabriel 1982).With three kinds of items in the plot (sites, species, environmental variables) such plots are often called triplots, in which pairs of items have an inner product (biplot) interpretation. In dc-CA there is a fourth kind of items: traits. This section proposes quadriplots with all four kinds of items in which almost all pairs of items have a biplot interpretation.

As in CCA, RDA and canonical correlation analysis (ter Braak 1990) there is a choice to visualize in the biplots regression coefficients or correlations. Biplots visualizing regression coefficients are treated in “Appendix A5”. For example, a biplot of both sets of canonical weights ($$\mathbf{B}$$ and $$\mathbf{C})$$ approximates the regression coefficients associated with the bilinear interaction between (i.e. products of) environmental variables and traits. The other biplots in “Appendix A5” essentially follow from considering dc-CA as a canonical correlation analysis on inflated trait and environment data. In this section, the focus is on biplots of fourth-corner correlations between traits and environmental variables.

For K- and R-normalized trait and environmental variables, the $$p \times q$$ matrix of fourth-corner correlations between environmental variables and traits is simply
\begin{aligned} \mathbf{F}=\mathbf{E}^{T}{} \mathbf{YT} \end{aligned}
(22)
Following ter Braak (1990), a biplot of $$\mathbf{F}$$ can be based on a “rank r weighted least-squares approximation” of the form $$\mathbf{F}\approx \mathbf{B}_f \mathbf{C}_f^T$$ with $$\mathbf{B}_f$$ and $$\mathbf{C}_f$$ matrices of order $$p \times r$$ and $$q \times r$$, respectively. For producing biplots, a convenient choice is r is 2 or 3. Whether such an r is adequate can be judged by permutation tests (see example section). With as weight matrices the inverses of $$\mathbf{E}^{T}{} \mathbf{RE}$$ and $$\mathbf{T}^{T}{} \mathbf{KT}$$, the approximation is invariant to linear transformations of $$\mathbf{E}$$ and $$\mathbf{T}$$ and can be obtained from dc-CA as follows. We seek the minimum over $$\mathbf{B}_f$$ and $$\mathbf{C}_f$$ of
\begin{aligned}&\left\| {\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}\left( {\mathbf{F}-\mathbf{B}_f \mathbf{C}_f^T } \right) \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{{-1/2}}}\right\| ^{2}\nonumber \\&\quad =\left\| {\mathbf{D}-\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{B}_f \mathbf{C}_f^T \left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{{-1/2}}}\right\| ^{{2}} \end{aligned}
(23)
As follows from the Eckhart–Young theorem (Greenacre 1984) the minimum is obtained from the singular value decomposition of $$\mathbf{D}$$. By consequence, the minimum of (23) is $$\lambda _{r+1} +\ldots +\lambda _{\hbox {min}\left( {p,q} \right) }$$ and is obtained by
\begin{aligned} \mathbf{B}_f =\left[ \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{1/2}\mathbf{P}{\varvec{\Delta }} ^{\alpha }\right] _r \hbox { and } \mathbf{C}_f =\left[ {\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{1/2}\mathbf{Q}{\varvec{\Delta }} ^{1-\alpha }} \right] _r . \end{aligned}
(24)
For $$\alpha =1$$, $$\mathbf{B}_f$$ contains the fourth-corner correlations of the environmental variables with the axes $$\mathbf{U}$$, $$cor_\mathbf{Y} \left( {\mathbf{E},\mathbf{U}} \right)$$, and $$\mathbf{C}_f$$ the intra-set correlations for the traits, $$cor_\mathbf{K} \left( {\mathbf{T},\mathbf{U}} \right)$$ (“Appendix A3”). For $$\alpha =0$$, the role of traits and environmental variables is reversed (“Appendix A3”) so that $$\mathbf{C}_f$$ contains the fourth-corner correlations of the traits with the axes $$\mathbf{X}$$, $$cor_\mathbf{Y} \left( {\mathbf{T},\mathbf{X}} \right)$$, and $$\mathbf{B}_f$$ the intra-set correlations for the environmental variables, $$cor_R \left( {\mathbf{E},\mathbf{X}} \right)$$. The coefficients $$\mathbf{B}_f$$ and $$\mathbf{C}_f$$ are termed biplot scores as in CCA (Oksanen et al. 2013; ter Braak and Šmilauer 2012); they vary with $$\alpha$$ unlike the intra, inter and fourth corner correlations. In the symmetric scaling $$(\alpha =0.5)$$, each biplot score is the geometric mean of its intra-set correlation and its fourth-corner correlation with the axis.

“Appendix A4” shows that when plotting $$\mathbf{B}_f$$, $$\mathbf{C}_f$$, $$\mathbf{X}$$ and $$\mathbf{U}$$ together the pairs $$\mathbf{B}_f$$$$\mathbf{U}$$ and $$\mathbf{C}_f$$$$\mathbf{X}$$ form a weighted least-squares biplot of the species niche centroids $$\left( {\mathbf{K}^{-1}\mathbf{Y}^{T}{} \mathbf{E}} \right)$$ and CWMs $$\left( {\mathbf{R}^{-1}{} \mathbf{YT}} \right)$$, respectively. Also, $$\mathbf{X}$$ and $$\mathbf{U}$$ form a weighted least-squares biplot of the fitted contingency ratios, which are the contingency ratios projected on both $$\mathbf{E}$$ and $$\mathbf{T}$$, analogously to the situation in CCA (ter Braak 2014).

The above biplot options are least-squares for all values of $$\alpha$$. For $$\alpha =1$$, the plot is row-metric preserving, and thus approximates the chi-square distance between sites based on fitted values, and $$\mathbf{C}_f$$ contains the intra-set correlations for the traits and the biplot thus weakly approximates the correlations among the traits. For $$\alpha =0$$, the plot is column-metric preserving, and thus approximates the chi-square distance between species based on fitted values, and $$\mathbf{B}_f$$ contains the intra-set correlations for the environmental variables and the biplot thus weakly approximates the correlations among the environmental variables. In conclusion, when plotting $$\mathbf{B}_f$$, $$\mathbf{C}_f$$, $$\mathbf{X}$$ and $$\mathbf{U}$$ together with $$\alpha =0$$ or 1, five of the six pairs of items have a biplot interpretation (the pairs $$\mathbf{B}_f$$$$\mathbf{X}$$ and $$\mathbf{C}_f$$$$\mathbf{U}$$ have no known useful biplot interpretation for $$\alpha =1$$ and 0, respectively).

## 4 Real data and simulation example

We illustrate dc-CA by analysing the example dataset of Jamil et al. (2013). It represents 20 sites from Dutch dune meadows, where the plant community composition was recorded together with five environmental variables from which two important ones are used here for illustration (abbreviations between parentheses): moisture content of the soil (Moisture) and manure quantity applied (Manure). For each of the 28 plant species, five functional traits were available from which two important ones are used here for illustration: specific leaf area (SLA) and seed mass. With these variables in the model, further traits and environmental variables do not contribute much to explain the community table as judged by species-based and site-based permutation tests, respectively, using the total dc-CA inertia as the test statistic. Fig. 1 Quadriplot of dc-CA relating two selected traits (blue) and two selected environmental variables (red) in the Dutch dune meadow data with 20 sites (circular points) and 29 plant species (triangles) ($$\alpha =1$$, $$\lambda _1 =0.43$$ and $$\lambda _2 =0.15)$$. Abbreviations follow ter Braak (1987). For interpretation see text

Figure 1 shows the row-metric preserving dc-CA biplot $$\left( {\alpha =1} \right)$$ with arrows for traits and environmental variables ($$\mathbf{C}_f$$ and $$\mathbf{B}_f )$$, which display by their inner product their fourth-corner correlation, together with points for species and sites. The strongest correlations are those between Moisture and Seed mass ($$-\,0.32$$) and Manure and SLA (0.23) as indicated by projecting the arrows for Seed mass and SLA on the arrows for Moisture and Manure; alternatively consider their obtuse and sharp angles, respectively, and the lengths of the arrows. The maximized fourth-corner correlations along the first (horizontal) and second (vertical) axes are 0.43 and 0.15, respectively. Because $$\alpha =1$$, the configuration of site points and environmental arrows shows the importance of the first axis compared to the second; the environmental arrows are fourth-corner correlations and the trait arrows intra-set correlations.

Beyond the pair $$\mathbf{B}_f$$$$\mathbf{C}_f$$, four other pairs of items have a biplot interpretation. Example interpretations are as follows:

Biplot of $$\mathbf{B}_f$$$$\mathbf{U}$$: the species points on the left in Fig. 1 have low SNC of Moisture (i.e. occur more at drier conditions) and the species on the right have high SNC with respect to Moisture (i.e. occur more at wetter conditions).

Biplot of $$\mathbf{C}_f$$$$\mathbf{X}$$: the site points when projected onto the arrow for SLA represent the CWM of SLA (their mean SLA), so that, for example, site 1 is inferred to have the highest mean SLA and sites 14 and 15 the lowest mean SLA.

Biplot of $$\mathbf{U}$$$$\mathbf{X}$$: the species and site points form a biplot of the contingency ratios, for example, the share of species Alo gen is high in site 1 compared to sites 14 and 15.

Biplot of $$\mathbf{C}_f$$$$\mathbf{U}$$: the species points when projected on to Seed mass represent their Seed mass, so that the species Vic lat and Lol per are inferred to have high Seed mass and Jun art and Agr Sto low Seed mass.

In this simple example, there is little difference between dc-CA and RLQ: although dc-CA maximized the fourth-corner correlation and RLQ maximized the fourth-corner covariance (subject to standardized axes), the resulting fourth-correlations are almost identical. That is not always the case as will be shown in a simulation example. The R code for the simulation is in Supplementary “Appendix S2”.

In the simulation example, six traits and nine environmental variables are generated from a multivariate normal distribution with variance matrix with entries $$\{\rho ^{i-j}\}$$, so that the pairwise correlation is $$\rho$$ for neighbouring columns and decreases with the column number distance between variables. The difference between the first two traits was taken as a hidden trait dimension ($$\mathbf{t})$$ and, similarly, the difference between the first two environmental variables was taken as a hidden environment dimension ($$\mathbf{e})$$ and the product $$\mathbf{te}$$ was used as one of the predictors in a log-linear model from which negatively binomial counts ($$y_{ij} )$$ were generated. The other traits and environmental variables had no effect on the counts, but to make the data a little more realistic two more latent effects were added to the log-linear model: one product between $$\mathbf{e}$$ and an independent latent standard normal trait $$\mathbf{z}$$ and one product between a latent standard normal environmental variable and the latent trait $$\mathbf{z}$$. The count data are thus in essence two dimensional but only one dimension can usefully be correlated with the measured traits and environmental variables. The intercept in the log-linear model was log(10), all other coefficients were 0.2 and $$\rho =0.7$$, so that neighbouring variables share about 50% of their variance.
Table 1

Percentiles of the fourth-correlations of the first two axes and their squared ratio of dc-CA and RLQ in 10,000 simulated data sets with $$n = m =$$ 100

$$\rho _1$$

$$\rho _2$$

$$\lambda _1 /\lambda _2$$

dc-CA

RLQ

dc-CA

RLQ

dc-CA

RLQ

2.50%

0.15

0.03

0.04

0.02

8.0

0.2

50%

0.21

0.08

0.05

0.03

18.2

7.4

97.50%

0.28

0.15

0.06

0.08

39.2

45.6

Table 1 shows results of the simulation. The maximized fourth-corner correlation of dc-CA of the first axis was much higher than the RLQ fourth-corner correlation of the first axis, with the 2.5% percentile of dc-CA being even bigger than the 97.5% percentile of RLQ. The fourth-corner correlations of the second axes (which were zero in the model) were comparable. The ratio of the first over the second eigenvalue (squared correlation), which is a measure of the dominance of the first axis over the second in each simulated data set, is on average much higher in dc-CA than in RLQ (last two columns in Table 1). Compared to RLQ, dc-CA thus much better indicates that only one dimension is important for describing the association between the observed trait and environmental variables.

Figure 2 shows the distribution of the canonical coefficients of traits and environmental variables on the first axis in the 10,000 simulations. The contrast between the first and second variable is clear in almost all data sets in dc-CA whereas it is blurred or even absent in RLQ. The remaining noise variables have larger coefficients in RLQ than in dc-CA and also show negative bias. Fig. 2 Violin plots of the canonical coefficients of traits and environmental variables on the first axis in the 10,000 simulated data set in the simulation example

## 5 Discussion

This paper fills a gap by giving a mathematical description of double constrained correspondence analysis (dc-CA) starting from the idea that it maximizes a correlation, in particular the fourth-corner correlation between linear combinations of traits and of environmental variables. It was known from the start that dc-CA is identical with canonical correlation analysis of super-inflated trait and environment data. But dc-CA deserves special treatment as the units of sampling are not the individuals that are counted but the sites with individuals belonging to different species. Our mathematical development shows the precise role of community (site-) weighted trait means (CWM) and, its reverse, species niche centroids (species-weighted mean environment, SNC) in dc-CA. In “Appendix A6” it is reiterated why CWMs and SNCs are key statistics in trait-environment studies and that the within-site trait variance and the within-species environmental variance (niche breadth) may deserve separate study in relation to the environment and traits, respectively. The novel algorithm that combines a singly constrained correspondence (i.e. CCA) with a weighted singly constrained principal component analysis (i.e. redundancy analysis) shows the relation with CWM-RDA, an ad-hoc method that is commonly used to relate traits to environment. CWM-RDA uses regression onto the environmental variables, whereas dc-CA also uses regression onto the traits. By contrast, RLQ, one of the oldest multivariate methods for trait-environment analysis, is based on covariance without using regression at all. Our small simulation example demonstrated that, by combining correlated observed variables, dc-CA can detect trait and environment relationships that remain hidden in RLQ.

RLQ is based on coinertia analysis (Dray et al. 2003) while dc-CA is based on CCA. Therefore the comparison between coinertia analysis and CCA by Dray et al. (2003) is of interest. They showed that CCA deteriorates in detecting the hidden gradients when many highly correlated environmental variables that have no real effect are included in the analysis. In such an extreme situation coinertia performs better than CCA and the same is expected for RLQ, and dc-CA. However, with moderate correlations or when multicollinearity problems are taken care of, for example by variable selection or by changing regression to ridge regression, we expect dc-CA to outperform RLQ.

Many researchers see CA as a special form of principal component analysis and thus dc-CA as a special form of double constrained principal component analysis (dc-PCA) (Takane 2013), also known as two-way CANDELINC (Douglas Carroll et al. 1980). The methods are perhaps even more similar in the double constrained case than in the unconstrained case. The similarity (and difference) is best seen from the matrix that must be subjected to SVD to obtain a weighted dc-PCA with weight matrices $$\mathbf{R}$$ and $$\mathbf{K}$$. Whereas dc-CA uses a SVD of $$\mathbf{D}$$ defined in Eq. (15), dc-PCA can be obtained from an SVD of
\begin{aligned} \mathbf{D}_{\mathrm{dc}-{\mathrm{pca}}} =\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{E}^{T}{} \mathbf{RYKT}\left( {\mathbf{T}^{T}{} \mathbf{KT}} \right) ^{-1/2} \end{aligned}
(25)
This equation shows immediately that dc-CA is a weighted dc-PCA of the contingency ratios $$\hbox {y}_{++} \mathbf{R}^{-1}{} \mathbf{YK}^{-1}$$ where the weight matrices $$\mathbf{R}$$ and $$\mathbf{K}$$ are diagonal with the row- and column sums of $$\mathbf{Y}$$ on the diagonal. The post-processing to obtain the canonical weights and the row and column scores is identical to that in dc-CA [e.g. Eq. (17)]. Viewed in this framework, there is no interpretation of dc-CA in terms of fourth-corner correlations. The link of the total inertia of dc-CA with the Rao score test statistic for testing linear-by-linear interaction in a contingency table (ter Braak 2017) shows explicitly that dc-CA is a natural method for count-like data.

Aitchison’s log-ratio analysis is essentially the analysis of double-centred log-transformed $$\mathbf{Y}$$ (see discussion by Dawid of Aitchison (1982) which leads to the centred log-ratio transformation). Microbiome data are sometimes analyzed by Aitchison’s log-ratio PCA (Gloor et al. 2016), despite the fact that they contain many zeros. Using weighted (double) (constrained) log-ratio analysis with row- and column sums of $$\mathbf{Y}$$ as weights (Greenacre and Lewi 2009) will decrease the adverse effect of rows and columns with many zeroes, at least if the number of zeroes is reflected in the weights (alternatively the row-wise and column-wise numbers of non-zeroes could be used as weights). A natural alternative is in our view (double) (constrained) CA which does not need tricks for handling zeroes in the data.

The computer program Canoco 5.10 (ter Braak and Šmilauer 2012) implements dc-CA as a combination of CCA and a weighted RDA. Weighted dc-PCA is implemented by changing the initial CCA($$\mathbf{Y}^{T}\sim \mathbf{T})$$ by a weighted $$\hbox {RDA}_{\mathbf{K}}(\mathbf{Y}^{T}\sim \mathbf{T})$$. For this purpose, both RDAs must be centred by rows and by columns in the metrics $$\mathbf{R}$$ and $$\mathbf{K}$$, respectively, which is in agreement with the idea that the trait-environment association is an interaction and should not involve main effects. By prior log-transformation, a (weighted or unweighted) double constrained log-ratio analysis is obtained. For statistical inference about the trait-environment relation in dc-CA see ter Braak (2017) and ter Braak et al. (2017) and, in the log-ratio context, Cormont et al. (2011). These methods can quickly provide an overview of which variables appear important. We believe that they deserve more consideration, evaluation and use.

## Supplementary material

10651_2017_395_MOESM1_ESM.pdf (166 kb)
Supplementary material 1 (pdf 165 KB)
10651_2017_395_MOESM2_ESM.pdf (187 kb)
Supplementary material 2 (pdf 186 KB)
10651_2017_395_MOESM3_ESM.zip (22 kb)
Supplementary material 3 (zip 21 KB)

## References

1. Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc B 44:139–177. http://www.jstor.org/stable/2345821
2. Bacou AM, Sabatier R, Lespinasse P (1989) Analyses des correspondence avec une ou deux contraintes avec le logiciel Biomeco, manuel de l’utilisateur. CEFE, CNRS, MontpellierGoogle Scholar
3. Böckenholt U, Böckenholt I (1990) Canonical analysis of contingency tables with linear constraints. Psychometrika 55:633–639.
4. Brown AM, Warton DI, Andrew NR, Binns M, Cassis G, Gibb H (2014) The fourth-corner solution—using predictive models to understand how species traits interact with the environment. Methods Ecol Evol 5:344–352.
5. Cormont A, Vos CC, van Turnhout CAM, Foppen RPB, ter Braak CJF (2011) Using life-history traits to explain bird population responses to changing weather variability. Clim Res 49:59–71.
6. Dolédec S, Chessel D, ter Braak CJF, Champely S (1996) Matching species traits to environmental variables: a new three-table ordination method. Environ Ecol Stat 3:143–166.
7. Douglas Carroll J, Pruzansky S, Kruskal JB (1980) Candelinc: a general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika 45:3–24.
8. Dray S, Chessel D, Thioulouse J (2003) Co-inertia analysis and the linking of ecological tables. Ecology 84:3078–3089. http://www.jstor.org/stable/3449976
9. Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22:1–20.
10. Dray S, Legendre P (2008) Testing the species traits environment relationships: the fourth-corner problem revisited. Ecology 89:3400–3412.
11. Gabriel KR (1982) Biplot. In: Kotz S, Johnson NL (eds) Encyclopedia of statistical sciences, vol 1. Wiley, New York, pp 263–271Google Scholar
12. Gabriel KR (1998) Generalised bilinear regression. Biometrika 85:689–700.
13. Gifi A (1990) Nonlinear multivariate analysis. Wiley, New York. ISBN 9780471926207Google Scholar
14. Gittins R (1985) Canonical analysis. A review with applications in ecology. Springer, Berlin. ISBN 978-3-642-69878-1
15. Gloor GB, Wu JR, Pawlowsky-Glahn V, Egozcue JJ (2016) It’s all relative: analyzing microbiome data as compositions. Ann Epidemiol 26:322–329.
16. Golub GH, Reinch C (1970) Singular value decomposition and least squares solutions. Numerische Mathematik 14:403–420.
17. Golub GH, van Loan CF (1989) Matrix computations. The John Hopkins University Press, Baltimore. ISBN 9780801854149Google Scholar
18. Good IJ (1969) Some applications of singular decomposition of a matrix. Technometrics 11:823–831.
19. Goodman LA (1981) Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J Am Stat Assoc 76:320–334.
20. Gourlay AR, Watson GA (1973) Computational methods for matrix eigenproblems. Wiley, New York. ISBN 0471319155Google Scholar
21. Gower JC, Hand DJ (1996) Biplots. Chapman, London. ISBN 9780412716300Google Scholar
22. Greenacre M, Lewi P (2009) Distributional equivalence and subcompositional coherence in the analysis of compositional data, contingency tables and ratio-scale measurements. J Classif 26:29–54.
23. Greenacre MJ (1984) Theory and applications of correspondence analysis. Academic Press, London. ISBN 978-0122990502Google Scholar
24. Hill MO (1973) Reciprocal averaging: an eigenvector method of ordination. J Ecol 61:237–249.
25. Hill MO (1974) Correspondence analysis: a neglected multivariate method. Appl Stat 23:340–354.
26. Ihm P, van Groenewoud H (1975) A multivariate ordering of vegetation data based on Gaussian type gradient response curves. J Ecol 63:767–777.
27. Ihm P, van Groenewoud H (1984) Correspondence analysis and Gaussian ordination. Compstat Lect 3:5–60Google Scholar
28. Jamil T, Ozinga WA, Kleyer M, ter Braak CJF (2013) Selecting traits that explain species-environment relationships: a generalized linear mixed model approach. J Veg Sci 24:988–1000.
29. Kleyer M et al (2012) Assessing species and community functional responses to environmental gradients: which multivariate methods? J Veg Sci 23:805–821.
30. Lavorel S, Rochette C, Lebreton J-D (1999) Functional groups for response to disturbance in mediterranean old fields. Oikos 84:480–498.
31. Lavorel S, Touzard B, Lebreton J-D, Clément B (1998) Identifying functional groups for response to disturbance in an abandoned pasture. Acta Oecologica 19:227–240.
32. Lebreton JD, Chessel D, Prodon R, Yoccoz N (1988a) L’analyse des relations espèces-milieu par l’analyse canonique des correspondances. I. Variables de milieu quantitatives. Acta Oecologia Generalis 9:53–67Google Scholar
33. Lebreton JD, Chessel D, Richardot-Coulet M, Yoccoz N (1988b) L’analyse des relations espèces-milieu par l’analyse canonique des correspondances. II. Variables de milieu qualitatives. Acta Oecologia Generalis 9:137–151Google Scholar
34. Legendre L, Legendre P (2012) Numerical ecology. Elsevier, Amsterdam. ISBN 9780444538680Google Scholar
35. Legendre P, Galzin RG, Harmelin-Vivien ML (1997) Relating behavior to habitat: solutions to the fourth-corner problem. Ecology 78:547–562.
36. Magnus JR, Neudecker H (1988) Matrix differential calculus with applications in statistics and econometrics. Wiley, New York. ISBN 978-0471986331Google Scholar
37. Mardia KV, Kent JT, Bibby JM (1980) Multivariate analysis. Academic Press, London. ISBN 9780124712522Google Scholar
38. McCune B (2015) The front door to the fourth corner: variations on the sample unit $$\times$$ trait matrix in community ecology. Commun Ecol 16:267–271.
39. Oksanen J et al. (2013) vegan: Community Ecology Package. R Package. version 20-9. https://CRAN.R-project.org/package=vegan
40. Peres-Neto PR, Dray S, ter Braak CJF (2017) Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography 40:806–816.
41. R Core Team (2015) R: a language and environment for statistical computing, version 3.0. R Foundation for Statistical Computing, Vienna, Austria. www.R-project.org
42. Rui Alves M, Beatriz Oliveira M (2004) Predictive and interpolative biplots applied to canonical variate analysis in the discrimination of vegetable oils by their fatty acid composition. J Chemom 18:393–401.
43. Takane Y (2013) Constrained principal component analysis and related techniques. Chapman and Hall/CRC, Londen. ISBN 9781466556669Google Scholar
44. Tenenhaus M, Young FW (1985) An analysis and synthesis of multiple correspondence analyis, optimal scaling, dual scaling, homogeneity analysis and other methods for quantifying categorical multivariate data. Psychometrika 50:91–119.
45. ter Braak CJF (1985) Correspondence analysis of incidence and abundance data: properties in terms of a unimodal response model. Biometrics 41:859–873.
46. ter Braak CJF (1986) Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67:1167–1179.
47. ter Braak CJF (1987) The analysis of vegetation-environment relationships by canonical correspondence analysis. Vegetatio 69:69–77.
48. ter Braak CJF (1988) Partial canonical correspondence analysis. In: Bock HH (ed) Classification and related methods of data analysis. Elsevier Science Publishers B.V. (North-Holland), Amsterdam, pp 551–558. http://edepot.wur.nl/241165
49. ter Braak CJF (1990) Interpreting canonical correlation analysis through biplots of structural correlations and weights. Psychometrika 55:519–531.
50. ter Braak CJF (2014) History of canonical correspondence analysis. In: Blasius J, Greenacre M (eds) Visualization and verbalization of Data. Chapman and Hall, London, pp 61–75. http://edepot.wur.nl/302963
51. ter Braak CJF (2017) Fourth-corner correlation is a score test statistic in a log-linear trait-environment model that is useful in permutation testing. Environ Ecol Stat 24:219–242.
52. ter Braak CJF, Peres-Neto P, Dray S (2017) A critical issue in model-based inference for studying trait-based community assembly and a solution. PeerJ 5:e2885.
53. ter Braak CJF, Prentice IC (1988) A theory of gradient analysis. Adv Ecol Res 18:271–317.
54. ter Braak CJF, Šmilauer P (2012) Canoco reference manual and user’s guide: software for ordination, version 5.0. Microcomputer Power, Ithaca, USAGoogle Scholar
55. ter Braak CJF, Verdonschot PFM (1995) Canonical correspondence analysis and related multivariate methods in aquatic ecology. Aquat Sci 57:255–289.
56. Tso MK-S (1981) Reduced-rank regression and canonical analysis. J R Statist Soc B 43:183–189. http://www.jstor.org/stable/2984847
57. Warton DI, Shipley B, Hastie T (2015) CATS regression: a model-based approach to studying trait-based community assembly. Methods Ecol Evol 6:389–398.

© The Author(s) 2018 