1 Introduction

Ecological and evolutionary theory predicts that species have adapted to the environments they occupy (Southwood 1977; Townsend and Hildrew 1994). In recent years, understanding how the units of evolution (species) and their associated traits relate to the environment they inhabit has become a central focus in community ecology (McGill et al. 2006). A central question in this quest has been to establish the functionality of species traits, i.e. determine which traits allow species to survive and prosper where they do. Ultimately, by examining variation among attributes of species (traits) and among attributes of sites (environment), we can describe some of the important rules by which species assemblages emerge. As Legendre et al. (1997) stated: “Testing such hypotheses would require (1) a way to detecting associations between species and habitat characteristics, and (2) a way of testing the significance of these associations.” Traits and environmental (or habitat) variables cannot be correlated directly as they are measured on different units, namely species and sites, respectively, but can be connected via a non-negative link table with rows for sites and columns for species (e.g. presence–absence matrices, abundance or biomass information on species).

Legendre et al. (1997) developed a heuristic method referred to as the fourth-corner approach to trait–environment association. In that approach, three matrices containing information on the distributions of multiple species, species traits and the environmental attributes of species assemblages are combined to estimate a fourth matrix (the fourth-corner) containing correlations between traits and environment. Using ideas from multivariate analysis, Dolédec et al. (1996) developed a three-table ordination method, called RLQ, to establish the links between species traits and environmental variables. RLQ can consider either principal component analysis (PCA) or correspondence analysis (CA) as the central ordination method. For a single trait and a single environment variable, the version based on CA reduces to the fourth-corner method. The univariate version is by far the most used approach to link environmental and trait variation.

The three data tables at hand (Environment, Link and Trait data), denoted here by E, Y and T, respectively (Dolédec et al. 1996 used R, L and Q), can be arranged as

$$\begin{aligned} \left[ {{\begin{array}{c@{\quad }c} {\mathbf{Y}^{T}}&{} \mathbf{T} \\ {\mathbf{E}^{T}}&{} \mathbf{X} \\ \end{array}} } \right] \end{aligned}$$

where X is the missing matrix (i.e. fourth-corner). With n sites, m species, p environmental variables and q traits, the dimensions of tables Y, E and T are \(n \times m, n\times p\) and \(m\times q\), respectively, so that X is \(p \times q\). In the original method proposed by Legendre et al. (1997) the link table Y contained presence–absence of the m species in the n sites but this was later generalized to abundance or count data by Dray and Legendre (2008). The link table is denoted here by Y as it will be treated as response in a regression model later on.

The fourth-corner solution (Dray and Legendre 2008; Legendre et al. 1997) is to determine X in the simplest way, namely by the matrix product \(\mathbf{X}=\mathbf{E}^{T}{} \mathbf{YT}\). For a nominal trait, a nominal environmental variable (expanded to indicator matrices E and T) and a presence–absence data table Y, the fourth data table X is simply a contingency table containing frequencies. A natural test statistic is thus the usual chi-square statistic, which will not necessarily follow a chi-square distribution because of the obvious dependencies between the entries. Legendre et al. (1997) investigated this issue and proposed permutation testing strategies as potential solutions. However, none of their strategies worked satisfactorily under all models they considered (Dray and Legendre 2008). Eventually, ter Braak et al. (2012) derived a strategy based on the sequential rejection principle (Goeman and Solari 2010) that controlled the type I error in data generated from any of the models considered by Dray et al. (2014). This sequential strategy involves both row and column permutation (see Sect. 3).

For quantitative E and T, the same equation \(\mathbf{X}=\mathbf{E}^{T}{} \mathbf{YT}\) can be used, except that the expansion to indicator matrices must now be replaced by normalization of each column of E and T to a weighted mean of zero and a weighted variance of 1, with site and species weights for E and T being the row and column sums of Y, respectively. This then yields a matrix X consisting of fourth-corner correlations.

The motivation for the weighting came from considering an “inflated data table” (Legendre et al. 1997), in which Y is vectorized, the zeroes removed, and each non-zero species-site combination is associated with the corresponding rows of E and T. The fourth-corner correlation between a trait and an environmental variable is then the Pearson correlation between the corresponding column of the trait in the inflated T and the corresponding column of the environmental variable in the inflated E. For more general non-negative data tables, this generalizes to a weighted Pearson correlation with abundances as weights and absences carrying zero weight (Dray and Legendre 2008).
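The equivalence between the weighted and the inflated-table formulation can be checked numerically. The following is a minimal sketch for integer counts, a single trait and a single environmental variable; the function names and the toy data are illustrative assumptions and do not come from the cited papers.

```python
from math import sqrt

def fourth_corner_correlation(Y, e, t):
    """Weighted Pearson correlation of e (sites) and t (species), with Y as weights."""
    n, m = len(Y), len(Y[0])
    row_tot = [sum(Y[i]) for i in range(n)]                       # site weights y_i+
    col_tot = [sum(Y[i][j] for i in range(n)) for j in range(m)]  # species weights y_+j
    total = sum(row_tot)                                          # y_++
    e_mean = sum(w * x for w, x in zip(row_tot, e)) / total
    t_mean = sum(w * x for w, x in zip(col_tot, t)) / total
    ec = [x - e_mean for x in e]
    tc = [x - t_mean for x in t]
    cross = sum(Y[i][j] * ec[i] * tc[j] for i in range(n) for j in range(m))
    ss_e = sum(w * x * x for w, x in zip(row_tot, ec))
    ss_t = sum(w * x * x for w, x in zip(col_tot, tc))
    return cross / sqrt(ss_e * ss_t)

def inflated_correlation(Y, e, t):
    """Unweighted Pearson correlation over the inflated table (one row per individual)."""
    env, trait = [], []
    for i, row in enumerate(Y):
        for j, count in enumerate(row):
            env.extend([e[i]] * count)    # zero counts contribute no rows
            trait.extend([t[j]] * count)
    k = len(env)
    me, mt = sum(env) / k, sum(trait) / k
    cov = sum((a - me) * (b - mt) for a, b in zip(env, trait))
    var_e = sum((a - me) ** 2 for a in env)
    var_t = sum((b - mt) ** 2 for b in trait)
    return cov / sqrt(var_e * var_t)

# Toy data (hypothetical): 4 sites, 3 species.
Y = [[3, 0, 1], [1, 2, 0], [0, 1, 4], [2, 2, 2]]
e = [0.5, 1.0, 2.5, 1.5]
t = [10.0, 12.0, 15.0]
f_weighted = fourth_corner_correlation(Y, e, t)
f_inflated = inflated_correlation(Y, e, t)
print(f_weighted, f_inflated)
```

Both routes should agree up to rounding error, because each individual in the inflated table carries the trait value of its species and the environmental value of its site.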

The inflation process has an intuitive rationale when the abundance data are counts of individuals. The inflated data table simply lists all individuals (rows) and has, for \(p =q = 1\), two variables (columns), namely the single trait and the single environmental variable. Each row has the trait value of the species it belongs to and the environmental value of the site which it inhabits. The fourth-corner correlation is then simply the unweighted Pearson correlation between the two variables of this table. This is a natural method to use when individuals are sampled with measurements of their traits and the environmental variables where they live. In such sampling, there will be intra-specific (and intra-site) variation, which is ignored in the original formulation of Legendre et al. (1997), but could certainly be accounted for. Significance testing of the correlation by permutation procedures proceeds similarly to the nominal variables case (Dray et al. 2014).

The rationale followed by Dolédec et al. (1996) to arrive at an equivalent solution is completely different; it is based on ways to constrain row and column scores in statistical triplets (Cailliez and Pagès 1976; Tenenhaus and Young 1985) defining a correspondence analysis. The link with (doubly) constrained correspondence analysis (Lavorel et al. 1998) recurs at several places in this paper.

More recently, Pollock et al. (2012), Jamil et al. (2013) and Brown et al. (2014) independently proposed model-based approaches that generalize the fourth-corner problem to multiple traits and environmental variables using generalized linear (mixed) models (GL(M)M) for vectorized Y, as in the bilinear regression approach of Gabriel (1998). Model-based approaches have great appeal (e.g. for nature conservation purposes) as they can improve the prediction of species abundances not only based on their environmental characteristics, but also on their traits and the interactions between traits and environment. These GL(M)M approaches allow for simultaneous modeling of the abundances of m species in terms of one or more traits and environmental variables. Mainstream methods of variable selection are used to build parsimonious models. In these approaches, X becomes a matrix of (partial) regression coefficients estimating the direction and strength of the interaction between standardized traits and standardized environmental variables as well as their main effects on species distributions (Brown et al. 2014).

This paper seeks to establish connections between the earlier heuristic fourth-corner correlation and the more recent model-based approaches. One such, almost trivial, connection has been presented in the Appendix of Brown et al. (2014). There, for a nominal trait and a nominal environmental variable, the fourth-corner X is a contingency table obtained by merging columns and rows that belong to the same category of the trait and of the environmental variable, respectively, and the likelihood ratio test on interaction in a contingency table using a Poisson log-linear model is asymptotically equivalent to the usual chi-square test. No such relationships have been established for quantitative variables. The importance of such links is that they allow the generalization and unification of a simple and widely used heuristic method based on correlations (fourth-corner) to the GLM (fixed or mixed) regression machinery to link trait and environmental variation. This paper establishes that the squared fourth-corner correlation times the sum of the elements of the link table Y (i.e. \(y_{++}\)) is precisely the score test statistic for testing the linear-by-linear interaction in a Poisson log-linear model with row and column main effects. Moreover, for multiple traits and environmental variables, the score test statistic is precisely \(y_{++} \) times the total inertia of a doubly constrained correspondence analysis (Kleyer 2012; Lavorel et al. 1999, 1998), which is the natural generalization of a singly constrained correspondence analysis, known as canonical correspondence analysis (Takane 2013; ter Braak 1986, 2014). It is also the natural generalization of RLQ (Dolédec et al. 1996; Dray et al. 2014) for correlated traits and environmental variables.

In ecological applications, however, the assumptions of the Poisson log-linear model are unlikely to hold true for a number of reasons. First, counts are typically over-dispersed compared to the Poisson and therefore modeled by, for example, a negative binomial distribution (Warton 2005). Second, observations from the same site are likely to be dependent and residual correlation among species is to be expected. This dependence has typically been addressed by resampling methods that resample entire sites instead of single individual observations (Oksanen et al. 2013; Wang et al. 2012). Third, observations on the same species are dependent when the observed environment interacts with unobserved (latent) traits, giving residual correlation among sites. This dependence is accounted for in generalized linear mixed models for trait–environment interaction (Jamil et al. 2013; Pollock et al. 2012) by using a random slopes model. Warton et al. (2015a) extended such a random slopes logistic mixed model to a model with factor-analytic terms so as to account for both dependencies and analyzed it in the Bayesian framework using Gibbs sampling.

As this brief literature review shows, there are resampling-based (permutation and bootstrap) and model-based approaches to apply when model assumptions are unlikely to hold true. In the former the attempt is to overcome the shortcomings of the too simple model by resampling; in the latter, the simple model is extended until a ‘correct’ model has been found, defined as passing a number of diagnostics, so that one can then likely trust parametric (asymptotic) statistical inference. As an example, with model-based methods it is possible to build multi-trait multi-environment models in which the assumption of (conditional) independence is perhaps defendable. It is outside the scope of this paper to discuss the pros and cons of model-based versus resampling-based strategies, and how they might be combined. This paper takes the resampling approach using the simple Poisson model with interaction and shows by simulation that different deviations from the assumptions require different resampling methods to rescue the validity of the statistical test on trait–environment interaction. The different deviations also serve to explain why community-based and species-based inference (Shipley et al. 2007) may statistically yield different results (Ackerly et al. 2002; Peres-Neto et al. 2016) and why statistical tests based on community-based resampling as in Warton et al. (2015b) may have inflated type I error when the GLM does not hold true.

The paper is structured as follows. In Sect. 2 the score test statistic on interaction is derived, extended to the multi-trait multi-environmental variable case and also specialized to a number of common simple cases. In all cases, the score test statistic can be expressed in terms of the total inertia of a (doubly or singly) constrained correspondence analysis. In Sect. 3 the distribution of the test statistic is examined in the Poisson model from which it was derived and for five extended models and under four permutation schemes. Depending on the model, the permutation distribution obtained in a particular scheme does or does not correspond with the simulated distribution (i.e. the true distribution with sampling error), and only one scheme controls the type I error in all models. This ‘max’ scheme, developed by ter Braak et al. (2012) from the sequential rejection principle, takes the maximum p value of the community-based permutation test and the species-based permutation test. Section 4 gives a real data example where, as in the simulations, community-based and species-based inference lead to different results, which can then be combined in the max scheme. Section 5 discusses the advantages, limitations and extensions of the approach taken in this paper and formulates the paradox that abundance is a weight in the fourth-corner correlation and a response in the log-linear model and that, nevertheless, these methods are closely related. The paradox is reconciled via a formula, well known in the literature on correspondence analysis, which expresses correspondence analysis as an approximation to a particular log-linear model and by noting that the fourth-corner correlation is the square root of the only non-trivial eigenvalue of a doubly constrained correspondence analysis.

2 Theory

Unless otherwise noted, the response is assumed to be count data.

2.1 Likelihood and sufficient statistics

Arguably, the simplest statistical model used for detecting the trait–environment interaction is the log-linear model in which the count \(y_{ij} \) is assumed to follow a Poisson distribution with mean specified by

$$\begin{aligned} \log \left( {\mu _{ij}} \right) =r_i +c_j +b\,t_j e_i , \end{aligned}$$
(1)

with \(r_i \) and \(c_j \) row (site) and column (species) main effects and b the coefficient measuring the direction and strength of the interaction between the trait t and the environmental variable e (i.e. the link between trait and environment). I derive the score test of the null hypothesis \(b=0\) with the alternative hypothesis \(b\ne 0\). The row and column main effects are unknown nuisance parameters and as such they saturate the main effects for species and sites.

Under the assumption of Poisson distributed abundances \(\{y_{ij} \}\), the relevant part of the log-likelihood is

$$\begin{aligned} l\left( \theta \right) =\mathop {\sum }\nolimits _{i,j} \left\{ {y_{ij} \log \left( {\mu _{ij}} \right) -\mu _{ij}} \right\} \end{aligned}$$
(2)

with \(\hbox {E}\left( {y_{ij}} \right) \equiv \mu _{ij} =\hbox {exp}\left( {r_i +c_j +b\,t_j e_i} \right) \), so that

$$\begin{aligned} l\left( \theta \right) =\mathop {\sum }\nolimits _i y_{i+} r_i +\mathop {\sum }\nolimits _j y_{+j} c_j +b\mathop {\sum }\nolimits _{i,j} y_{ij} t_j e_i -\mu _{++} , \end{aligned}$$
(3)

where a “+” replacing an index means the sum over the index, e.g. \(y_{i+} =\mathop {\sum }\nolimits _j y_{ij} \). The minimal sufficient statistics are thus \(\mathop {\sum }\nolimits _{i,j} y_{ij} t_j e_i \) and the row and column totals \(\{y_{i+} \}\) and \(\{y_{+j} \}\).

2.2 Score test statistic

This subsection gives a recap of the score test (Bera and Bilias 2001; Cox and Hinkley 1974; Rao 1973; Yee 2015), which is simpler to compute than the likelihood ratio tests in most cases. The score function \(U\left( \theta \right) \) is the derivative of the log-likelihood \(l\left( \theta \right) \) with respect to the parameter vector \(\theta \):

$$\begin{aligned} U\left( \theta \right) =\frac{\partial }{\partial \theta } l\left( \theta \right) , \end{aligned}$$
(4)

which, under regularity conditions, is asymptotically normal with mean zero and variance equal to the Fisher information:

$$\begin{aligned} \textit{var}_y \left[ {U\left( \theta \right) } \right] =E_y \left[ {\frac{\partial }{\partial \theta } l\left( \theta \right) \frac{\partial }{\partial \theta ^{T}}l\left( \theta \right) } \right] =-E_y \left[ {\frac{\partial }{\partial \theta ^{T}}U\left( \theta \right) } \right] =I\left( \theta \right) . \end{aligned}$$
(5)

The score test statistic to test the null \(H_0 :\theta =\theta _0 \) versus \(H_A :\theta \ne \theta _0 \) is then

$$\begin{aligned} S\left( {\theta _0} \right) =U^{T}\left( {\hat{\theta } _0} \right) I^{-1}\left( {\hat{\theta } _0} \right) U\left( {\hat{\theta } _0} \right) , \end{aligned}$$
(6)

where \(\hat{\theta } _0 \) is \(\theta _0 \) combined with the maximum likelihood estimate of the parameters that are not restricted in the null hypothesis \(\hbox {H}_{0}: \theta =\theta _0 \).

2.3 Score test statistic for the interaction parameter

In the “Appendix” I derive the score test statistic for testing the trait–environment interaction, i.e. to test \(H_0 :b=0\) versus \(H_A :b\ne 0\) in Eq. (1) for the case of the interaction between a single trait and single environmental variable. The score test statistic is

$$\begin{aligned} S\left( b \right) |_{b=0} =\frac{y_{++} \left[ {\mathop {\sum }\nolimits _{i,j} y_{ij} \tilde{t} _j \tilde{e} _i} \right] ^{2}}{\mathop \sum \nolimits _j y_{+j} \tilde{t} _j^2 \mathop \sum \nolimits _i y_{i+} \tilde{e} _i^2} = y_{++} cor_Y^2 \left( {t,e} \right) =y_{++} \,f^{2} \end{aligned}$$
(7)

with \(\tilde{t} _j \) and \(\tilde{e} _i \) centred versions of the trait and environmental variable:

$$\begin{aligned} \tilde{t} _j =t_j -\mathop {\sum }\nolimits _j y_{+j} t_j /y_{++}\hbox { and } \tilde{e} _i =e_i -\mathop {\sum }\nolimits _i y_{i+} e_i /y_{++} \end{aligned}$$
(8)

and f the fourth-corner correlation. This result is perhaps not unexpected because the squared fourth-corner correlation is the first eigenvalue of a doubly constrained correspondence analysis and it is known that (constrained) correspondence analysis decomposes the usual chi-square of a contingency table, \(\chi ^{2}\) say, along factorial axes such that (Greenacre 2007, 1984; Takane 2013):

$$\begin{aligned} \chi ^{2}=y_{++} \mathop {\sum }\nolimits _a \lambda _a =y_{++} \left( {\textit{total\,inertia}} \right) \end{aligned}$$
(9)

with \(\lambda _a \) the a th-eigenvalue and the sum is over all eigenvalues (constrained and unconstrained).
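Equation (7) can be verified numerically by computing the score and the information directly. The sketch below is illustrative only: it uses the fitted means of the main-effects (independence) model, \(\hat{\mu } _{ij} =y_{i+} y_{+j} /y_{++} \), and relies on the weighted centring of Eq. (8), which makes the interaction column orthogonal to the row and column main effects so that the information for b is simply \(\mathop {\sum }\nolimits _{i,j} \hat{\mu } _{ij} \tilde{t} _j^2 \tilde{e} _i^2 \). Function names and toy data are assumptions.

```python
def margins(Y):
    """Row totals y_i+, column totals y_+j and grand total y_++ of the link table."""
    n, m = len(Y), len(Y[0])
    r = [sum(Y[i]) for i in range(n)]
    c = [sum(Y[i][j] for i in range(n)) for j in range(m)]
    return r, c, sum(r)

def centre(values, weights, total):
    """Weighted centring as in Eq. (8)."""
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return [v - mean for v in values]

def score_from_definition(Y, e, t):
    """Score statistic U^2 / I evaluated at b = 0 (Eq. 6 with a single parameter)."""
    r, c, tot = margins(Y)
    ec, tc = centre(e, r, tot), centre(t, c, tot)
    n, m = len(Y), len(Y[0])
    U = 0.0
    info = 0.0
    for i in range(n):
        for j in range(m):
            mu = r[i] * c[j] / tot          # fitted mean under the null model
            x = ec[i] * tc[j]               # centred interaction covariate
            U += (Y[i][j] - mu) * x         # score for b
            info += mu * x * x              # Fisher information for b
    return U * U / info

def score_closed_form(Y, e, t):
    """Closed form of Eq. (7): y_++ times the squared fourth-corner correlation."""
    r, c, tot = margins(Y)
    ec, tc = centre(e, r, tot), centre(t, c, tot)
    n, m = len(Y), len(Y[0])
    cross = sum(Y[i][j] * ec[i] * tc[j] for i in range(n) for j in range(m))
    ss_e = sum(r[i] * ec[i] ** 2 for i in range(n))
    ss_t = sum(c[j] * tc[j] ** 2 for j in range(m))
    return tot * cross ** 2 / (ss_e * ss_t)

# Toy data (hypothetical): 4 sites, 3 species.
Y = [[3, 0, 1], [1, 2, 0], [0, 1, 4], [2, 2, 2]]
e = [0.5, 1.0, 2.5, 1.5]
t = [10.0, 12.0, 15.0]
s_def = score_from_definition(Y, e, t)
s_closed = score_closed_form(Y, e, t)
print(s_def, s_closed)
```

The two values should coincide up to rounding, which illustrates that no iterative model fitting is needed to obtain the score test statistic.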

2.4 Score test for multiple traits and environmental variables

With data on p environmental variables and q traits in the \(n \times p\) and \(m \times q \) matrices \(\mathbf{E}=\left\{ {e_{ik}} \right\} \) and \(\mathbf{T}=\left\{ {t_{jl}} \right\} \), the interaction term in the log-linear model of Eq. (1) is replaced by a sum of \(p \times q\) interaction terms \(\{b_{kl} e_{ik} t_{jl} \}\). The score test statistic for the null hypothesis that all interaction coefficients are zero: \(\mathbf{B}=\left\{ {b_{kl}} \right\} =0\), as derived in the “Appendix”, becomes

$$\begin{aligned} S\left( \mathbf{B} \right) |_{\mathbf{B}=0} =y_{++} tr\left( {\mathbf{D}^{T}{} \mathbf{D}} \right) \end{aligned}$$
(10)

with, after centring the columns of E and T as in Eq. (8),

$$\begin{aligned} \mathbf{D}=\left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}{} \mathbf{E}^{T}\mathbf{YT}\left( {\mathbf{T}^{T}{} \mathbf{CT}} \right) ^{-1/2}, \end{aligned}$$
(11)

where \(\mathbf{R}\) and \(\mathbf{C}\) are diagonal matrices with the weights \(\left\{ {y_{i+}} \right\} \) and \(\left\{ {y_{+j}} \right\} \), respectively, on their diagonal.

This score test statistic, when divided by \(y_{++} \), is equal to the total inertia of a doubly constrained correspondence analysis (Kleyer 2012; Lavorel et al. 1999, 1998), which is the natural generalization of a singly constrained correspondence analysis, known as canonical correspondence analysis (Takane 2013; ter Braak 1986, 2014). It is also the natural generalization of RLQ (Dolédec et al. 1996; Dray et al. 2014) for trait and environmental data that are not R- and C-orthogonal.

The matrix \(\mathbf{D}\) can be re-expressed as well in terms of the residuals of Y under the null model. The residual matrix is \(\mathbf{Y}^{*}=y_{++} \mathbf{R}^{-1}{} \mathbf{YC}^{-1}-\mathbf{1}_n \mathbf{1}_m^T \) and, then,

$$\begin{aligned} \mathbf{D}=y_{++}^{-1} \left( {\mathbf{E}^{T}{} \mathbf{RE}} \right) ^{-1/2}\mathbf{E}^{T}{} \mathbf{RY}^{*}{} \mathbf{CT}\left( {\mathbf{T}^{T}{} \mathbf{CT}} \right) ^{-1/2} \end{aligned}$$
(12)

which shows clearly the relation with a weighted regression of the residual matrix \(\mathbf{Y}^{*}\) on the traits and environmental variables where the weights are in \(\mathbf{R}\) and \(\mathbf{C}\).
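As a brief check, sketched here from the definitions above only, Eq. (12) reduces to Eq. (11). Multiplying the residual matrix by the weights gives

$$\begin{aligned} \mathbf{RY}^{*}{} \mathbf{C}=y_{++} \mathbf{Y}-\mathbf{R1}_n \mathbf{1}_m^T \mathbf{C}. \end{aligned}$$

Because the columns of E and T are centred as in Eq. (8), \(\mathbf{E}^{T}{} \mathbf{R1}_n =\mathbf{0}\) and \(\mathbf{1}_m^T \mathbf{CT}=\mathbf{0}\), so that the second term vanishes after pre- and post-multiplication:

$$\begin{aligned} \mathbf{E}^{T}{} \mathbf{RY}^{*}{} \mathbf{CT}=y_{++} \mathbf{E}^{T}{} \mathbf{YT}. \end{aligned}$$

The factor \(y_{++}^{-1} \) in Eq. (12) then cancels the factor \(y_{++} \), which recovers Eq. (11).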

2.5 Special cases

There are important special cases of these results.

Trait and environment variables are factors or the identity matrix

If trait and environment variables are factors so that \(\mathbf{T}\) and \(\mathbf{E}\) are indicator matrices, the score test statistic is simply the usual chi-square statistic calculated from the contingency table \(\mathbf{Y}^{te}\), say, containing the total abundance in each class of the cross-classification of the factor classes (see “Appendix”). This result was derived with saturated main effects (having free row and column parameters \(r_i \) and \(c_j\)). Brown et al. (2014) obtained a similar result from the log-linear model with T and E as main effects.

With \(\mathbf{T}\) and \(\mathbf{E}\) as diagonal matrices of size \(m \times m\) and \(n \times n\), respectively, the score test statistic becomes the usual chi-square statistic for contingency table \(\mathbf{Y}\). This particular case is an analysis of \(\mathbf{Y}\) without external constraining information and has no value for trait–environment analysis.

With \(\mathbf{T}\) a diagonal matrix of size \(m \times m\) and \(\mathbf{E}\) an \(n \times p\) matrix, the score test statistic becomes \(y_{++} \) times the total inertia of a canonical correspondence analysis (ter Braak 1986). The (community-based) permutation test on the effects of environmental variables on species abundance using canonical correspondence analysis can thus be viewed as a test on the species-by-environment interaction in a log-linear model using a score test statistic.
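The factor case can be illustrated in its smallest form: a trait with two classes and an environmental variable with two classes. This is a hedged sketch in which each factor is coded by a single 0/1 indicator (with two classes the second indicator column is redundant after centring); the class assignments, function names and data are hypothetical. The score statistic of Eq. (7) should then equal the Pearson chi-square of the collapsed \(2\times 2\) table \(\mathbf{Y}^{te}\).

```python
def score_stat(Y, e, t):
    """Score statistic of Eq. (7) for one trait and one environmental variable."""
    n, m = len(Y), len(Y[0])
    r = [sum(Y[i]) for i in range(n)]
    c = [sum(Y[i][j] for i in range(n)) for j in range(m)]
    tot = sum(r)
    e_bar = sum(r[i] * e[i] for i in range(n)) / tot
    t_bar = sum(c[j] * t[j] for j in range(m)) / tot
    ec = [x - e_bar for x in e]
    tc = [x - t_bar for x in t]
    cross = sum(Y[i][j] * ec[i] * tc[j] for i in range(n) for j in range(m))
    ss_e = sum(r[i] * ec[i] ** 2 for i in range(n))
    ss_t = sum(c[j] * tc[j] ** 2 for j in range(m))
    return tot * cross ** 2 / (ss_e * ss_t)

def collapse(Y, site_class, species_class):
    """Sum abundances over sites and species that share a class (the table Y^te)."""
    table = [[0, 0], [0, 0]]
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            table[site_class[i]][species_class[j]] += y
    return table

def pearson_chi2(table):
    """Usual chi-square statistic of a two-way contingency table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    tot = sum(rows)
    chi2 = 0.0
    for a, row in enumerate(table):
        for b, obs in enumerate(row):
            expected = rows[a] * cols[b] / tot
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Toy data (hypothetical): 4 sites in 2 habitat classes, 5 species in 2 trait classes.
Y = [[5, 1, 3, 0, 2],
     [4, 0, 6, 1, 1],
     [1, 3, 0, 4, 2],
     [0, 5, 2, 3, 6]]
site_class = [0, 0, 1, 1]
species_class = [0, 1, 0, 1, 1]

s = score_stat(Y, [float(k) for k in site_class], [float(k) for k in species_class])
chi2 = pearson_chi2(collapse(Y, site_class, species_class))
print(s, chi2)
```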

Single trait and multiple environment variables

One of the first statistical methods to uncover and describe trait–environment association starts by calculating Community Weighted trait Means (CWM) (Kleyer 2012; Lavorel et al. 2008; Peres-Neto et al. 2016). With a single trait, the CWM is the single n-vector \(\mathbf{t}^{*}\), with

$$\begin{aligned} \mathbf{t}^{*}=\mathbf{R}^{-{1}}{} \mathbf{Yt}, \end{aligned}$$
(13)

where, for simplicity of the following equations, t is already C-centred and C-standardized as in equations (8) and (39) of the “Appendix”. The next step is then to calculate a regression of \(\mathbf{t}^{*}\) on the environmental variables. If a weighted regression is used, using weight matrix \(\mathbf{R}\), the fitted values become

$$\begin{aligned} \hat{\mathbf{t}} =\mathbf{E}\left( {\mathbf{E}^{{T}}{} \mathbf{RE}} \right) ^{-{1}}{} \mathbf{E}^{{T}}{} \mathbf{Rt}^{*} \end{aligned}$$
(14)

and the regression sum of squares, more precisely, the weighted sum of squares of fitted values, is

$$\begin{aligned} \hat{\mathbf{t}} ^{{T}}{} \mathbf{R}\hat{\mathbf{t}} =\mathbf{t}^{*{T}}{} \mathbf{RE}\left( {\mathbf{E}^{{T}}{} \mathbf{RE}} \right) ^{-{1}}{} \mathbf{E}^{{T}}{} \mathbf{Rt}^{*}=\mathbf{t}^{T}{} \mathbf{Y}^{T}{} \mathbf{E}\left( {\mathbf{E}^{{T}}{} \mathbf{RE}} \right) ^{-{1}}\mathbf{E}^{{T}}{} \mathbf{Yt}=\mathbf{D}^{{T}}{} \mathbf{D} \end{aligned}$$
(15)

with \(\mathbf{D}\) as in Eq. (11). This shows that the score test statistic is \(y_{++} \) times the ratio of the regression sum of squares to the total sum of squares of t. For a nominal environmental variable, this ratio is often called the squared correlation ratio, which is the statistic calculated in this situation by the function fourthcorner2 in the R package ade4 (Dray and Dufour 2007).

Note that the coefficient of determination of the weighted regression of \(\mathbf{t}^{*}\) on the environmental variables divides the regression sum of squares by the sum of squares based on \(\mathbf{t}^{*}\) (instead of on \(\mathbf{t})\) and is thus a factor \(\textit{var}_R \left( \mathbf{t} \right) /\textit{var}_R \left( {\mathbf{t}^{*}} \right) \) higher, as \(\textit{var}_R \left( {\mathbf{t}^{*}} \right) \le \textit{var}_R \left( \mathbf{t} \right) \). This can give spuriously high coefficients of determination when there is in fact no relation at all. The reason is that \(\textit{var}_R \left( {\mathbf{t}^{*}} \right) \) is close to zero when there is no association between \(\mathbf{t}\) and \(\mathbf{Y}\). Similarly, the simple and multiple correlation coefficients between \(\mathbf{t}^{*}\) and the environmental variable(s) are bad test statistics (Peres-Neto et al. 2016). The score test statistics derived in this paper do not have this shortcoming.
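The relation between the CWM regression and the fourth-corner correlation can be made concrete for a single environmental variable. In this minimal sketch, "C-standardized" is taken to mean \(\mathop {\sum }\nolimits _j y_{+j} t_j^2 =1\) after C-centring; that scaling is an assumption, chosen because it makes the last equality of Eq. (15) hold. Variable names and data are illustrative.

```python
from math import sqrt

# Toy data (hypothetical): 4 sites, 3 species, one trait, one environmental variable.
Y = [[3, 0, 1], [1, 2, 0], [0, 1, 4], [2, 2, 2]]
e = [0.5, 1.0, 2.5, 1.5]
t = [10.0, 12.0, 15.0]

n, m = len(Y), len(Y[0])
r = [sum(Y[i]) for i in range(n)]                        # site weights (diagonal of R)
c = [sum(Y[i][j] for i in range(n)) for j in range(m)]   # species weights (diagonal of C)
tot = sum(r)

# R-centre e; C-centre t and scale it so that sum_j y_+j t_j^2 = 1 (assumed convention).
e_bar = sum(r[i] * e[i] for i in range(n)) / tot
ec = [x - e_bar for x in e]
t_bar = sum(c[j] * t[j] for j in range(m)) / tot
tc = [x - t_bar for x in t]
scale = sqrt(sum(c[j] * tc[j] ** 2 for j in range(m)))
ts = [x / scale for x in tc]

# Eq. (13): community weighted means of the standardized trait.
cwm = [sum(Y[i][j] * ts[j] for j in range(m)) / r[i] for i in range(n)]

# Eq. (14): weighted regression of the CWM on e (no intercept is needed because
# both e and the CWM have a weighted mean of zero).
ss_e = sum(r[i] * ec[i] ** 2 for i in range(n))
slope = sum(r[i] * ec[i] * cwm[i] for i in range(n)) / ss_e
fitted = [slope * x for x in ec]

# Eq. (15): weighted sum of squares of the fitted values.
ss_reg = sum(r[i] * fitted[i] ** 2 for i in range(n))

# Squared fourth-corner correlation computed directly from Y, e and t.
cross = sum(Y[i][j] * ec[i] * ts[j] for i in range(n) for j in range(m))
f2 = cross ** 2 / ss_e

# Coefficient of determination of the CWM regression: divides by the weighted
# sum of squares of the CWM, which cannot exceed the weighted sum of squares of t (here 1).
ss_cwm = sum(r[i] * cwm[i] ** 2 for i in range(n))
r2_cwm = ss_reg / ss_cwm

print(ss_reg, f2, r2_cwm)
```

The regression sum of squares reproduces the squared fourth-corner correlation, whereas the coefficient of determination of the CWM regression is at least as large, in line with the remark above.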

Multiple traits and single environment variable

The case of a single quantitative environment variable with multiple traits works analogously to the previous subsection with \(\mathbf{Y}\) transposed. In this case, weighted averages of the environmental variable are calculated for each species, resulting in an m-vector of so-called species niche centroids. The vector can then be regressed on traits (Kleyer 2012; Šmilauer and Lepš 2014), analogously to the approach based on CWM.

3 Distribution of the score test statistic in permutation tests and extended models

This section examines by simulation the distribution of the score test statistic developed in the previous section when the assumptions of the Poisson log-linear model hold true and shows that resampling methods that preserve the row and column totals of Y yield a distribution of the score test statistic that is within sampling variation of both the asymptotic chi-square distribution and the simulated distribution. This section also shows that particular deviations from the assumptions of the Poisson model require different resampling methods to rescue the validity of the statistical test on trait–environment interaction. The deviations lead to models that serve to explain why community-based and species-based inference may statistically yield different results. In this paper the focus is on permutational methods of resampling.

The (asymptotic) distribution of the score test statistic is known to be chi-square with pq degrees of freedom (Cox and Hinkley 1974) when the statistical null model holds true, which is in our case the Poisson log-linear model (1) with \(b=0\). Appendix S1 provides code in the R language (R Core Team 2015) and results of simulations illustrating this. In these simulations, the analytical equations for the score test statistic using (7), (10), (15) and (43) give numerically the same value as the score test statistic calculated using the R package mdscore (da Silva-Junior et al. 2015); the difference between the likelihood ratio and the modified score statistic is small.

Figure 1 compares the true exceedance probability of the score test statistic as estimated on the basis of 10,000 simulated data sets (vertical axis) with the exceedance probability estimated by the chi-square distribution (parametric) and as obtained from four different permutation schemes using 999 permutations each (horizontal axis) across six different data-generating models (the \(2\times 3\) panels). Note that only the sixth model contains a true non-zero interaction between the observed trait t and the observed environmental variable e. Appendix S2 provides R-code for the simulations.

The first five models are different null models. The first model is the Poisson log-linear model (1) with \(n=m=30, b=0\) and \(r_i =\log \left( {\mu _0} \right) +\alpha e_i \) and \(c_j =\beta t_j \), with \(\mu _0 =30, \alpha =0.2\) and \(\beta =0.2\) (the main effects only model). In the second model, the distribution of the counts is set to negative binomial with variance function \(\mu _{ij} +\mu _{ij}^2\). The next models deviate in one aspect from this second model (base model) by adding terms that induce correlations between species across sites and/or between sites across species. In the third model, the base model is extended with a random interaction term of the factor analytic form, \(b_{zx} z_j x_i \), where \(b_{zx} =0.2\) and \(z_j \) and \(x_i \) are independent standard normal deviates. This model gives correlations both among species and among sites in any given data set. The next two models (in the second row of Fig. 1) also use independent standard normal deviates z and x (of length m and n) representing a latent trait and a latent environment variable, but now with interactions with the observed environment e and the observed trait t, respectively. In the fourth model, the base model is extended with the term \(b_{ze} z_j e_i \) (\(b_{ze} =0.2\)), i.e. an interaction of a latent trait z with the observed environment e, which induces correlation among sites. This model is relevant when the observed environmental variable is known to influence the abundance of different species in different ways, but where these differences cannot be explained by the observed trait(s). The fourth model can be re-expressed as:

$$\begin{aligned} \log \left( {\mu _{ij}} \right) =\,\log \left( {\mu _0} \right) +\beta t_j +b_j e_i \end{aligned}$$
(16)

with \(b_j =\alpha +b_{ze} \,z_j \), that is, \(b_j \sim N\left( {\alpha ,\sigma _b^2} \right) \) with \(\sigma _b^2 =b_{ze}^2 \). As such, this model is a generalized linear mixed model with random species-specific slopes (random slopes model) as in Jamil et al. (2013). In the fifth model, the base model is extended with the term \(b_{tx} t_j x_i \) (\(b_{tx} =0.2\)), i.e. an interaction of a latent environmental variable x with the observed trait t, which induces correlation among species. This model is of interest when the observed trait is known to influence the abundance of different species in different ways, but where the differences cannot be explained by the observed environmental variables.

The sixth model is the only non-null (alternative) model, namely the base model extended with the terms \(b_{te} t_j e_i +b_{ze} z_j e_i \) with \(b_{te} =0.2\).
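The six data-generating models can be sketched in a few lines. This is not the code of Appendix S2 but a rough illustration under stated assumptions: the observed trait t and environmental variable e are drawn here as standard normal deviates (the text above does not specify how they were generated), the negative binomial with variance \(\mu _{ij} +\mu _{ij}^2 \) is obtained as a gamma–Poisson mixture with shape 1, and the function names `rpois` and `simulate` are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method, applied in chunks of at
    most 50 so that exp(-lambda) does not underflow for large means."""
    count = 0
    while lam > 0:
        chunk = min(lam, 50.0)
        lam -= chunk
        limit = math.exp(-chunk)
        prod = rng.random()
        while prod > limit:
            count += 1
            prod *= rng.random()
    return count

def simulate(model, n=30, m=30, mu0=30.0, alpha=0.2, beta=0.2, b=0.2, seed=1):
    """Simulate an n x m count table Y under model 1..6 of this section."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n)]   # observed environment (assumed N(0,1))
    t = [rng.gauss(0, 1) for _ in range(m)]   # observed trait (assumed N(0,1))
    x = [rng.gauss(0, 1) for _ in range(n)]   # latent environmental variable
    z = [rng.gauss(0, 1) for _ in range(m)]   # latent trait
    Y = []
    for i in range(n):
        row = []
        for j in range(m):
            eta = math.log(mu0) + alpha * e[i] + beta * t[j]   # main effects only
            if model == 3:
                eta += b * z[j] * x[i]        # latent trait x latent environment
            if model in (4, 6):
                eta += b * z[j] * e[i]        # latent trait x observed environment
            if model == 5:
                eta += b * t[j] * x[i]        # observed trait x latent environment
            if model == 6:
                eta += b * t[j] * e[i]        # genuine trait-environment interaction
            mu = math.exp(eta)
            if model == 1:
                lam = mu                              # Poisson
            else:
                lam = rng.gammavariate(1.0, mu)       # gamma mixing: variance mu + mu^2
            row.append(rpois(lam, rng))
        Y.append(row)
    return Y, e, t

Y, e, t = simulate(model=4, seed=3)
print(len(Y), len(Y[0]), sum(sum(row) for row in Y))
```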

The permutation tests (Manly 2006) are carried out using four different schemes (abbreviation between brackets) which all preserve the row and column totals of Y.

  1. (rc). Randomly permute both the rows and the columns of Y with respect to each other (Dolédec et al. 1996). This scheme was first proposed by Welch (1990) for permutation testing of interaction in balanced fixed-effects two-way analysis of variance. It destroys any relationship between Y and E and between Y and T.

  2. (row). Randomly permute only the rows of Y. This is model 2 of Dray and Legendre (2008), destroying any relationship between Y and E only.

  3. (col). Randomly permute only the columns of Y. This is model 4 of Dray and Legendre (2008), destroying any relationship between Y and T only.

  4. (max). Perform a sequential test (Goeman and Solari 2010) with first the row-permutation test using scheme 2, and, if this test is significant, then the column-permutation test using scheme 3, or vice versa (ter Braak et al. 2012). In our case, both tests use the same score test statistic, Eq. (7), so that the sequential test (when both tests are carried out) is then equivalent to the test in which the final p value is the maximum of the two p values. Scheme 4 improves model 5 of Dray and Legendre (2008) and Peres-Neto et al. (2012) in the way the final p value is calculated.
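The four schemes can be sketched as follows for a single trait and a single environmental variable. This is an illustrative sketch, not the code of the appendices: permuting the values of e (or t) relative to Y is used as an equivalent of permuting the rows (or columns) of Y, and the p value is computed with the common convention (number of permuted statistics at least as large as the observed one, plus one) divided by (number of permutations plus one). Function names and data are hypothetical.

```python
import random

def score_stat(Y, e, t):
    """Score statistic of Eq. (7): y_++ times the squared fourth-corner correlation."""
    n, m = len(Y), len(Y[0])
    r = [sum(Y[i]) for i in range(n)]
    c = [sum(Y[i][j] for i in range(n)) for j in range(m)]
    tot = sum(r)
    e_bar = sum(r[i] * e[i] for i in range(n)) / tot
    t_bar = sum(c[j] * t[j] for j in range(m)) / tot
    ec = [x - e_bar for x in e]
    tc = [x - t_bar for x in t]
    cross = sum(Y[i][j] * ec[i] * tc[j] for i in range(n) for j in range(m))
    ss_e = sum(r[i] * ec[i] ** 2 for i in range(n))
    ss_t = sum(c[j] * tc[j] ** 2 for j in range(m))
    return tot * cross ** 2 / (ss_e * ss_t)

def permutation_pvalues(Y, e, t, n_perm=999, seed=0):
    """Monte Carlo p values for the rc, row, col and max schemes."""
    rng = random.Random(seed)
    obs = score_stat(Y, e, t)
    tol = 1e-12 * max(1.0, obs)               # guards against rounding in ties
    hits = {"rc": 0, "row": 0, "col": 0}
    n, m = len(e), len(t)
    for _ in range(n_perm):
        # row scheme: sites reassigned at random, species kept in place
        if score_stat(Y, rng.sample(e, n), t) >= obs - tol:
            hits["row"] += 1
        # col scheme: species reassigned at random, sites kept in place
        if score_stat(Y, e, rng.sample(t, m)) >= obs - tol:
            hits["col"] += 1
        # rc scheme: both sites and species reassigned at random
        if score_stat(Y, rng.sample(e, n), rng.sample(t, m)) >= obs - tol:
            hits["rc"] += 1
    p = {name: (count + 1) / (n_perm + 1) for name, count in hits.items()}
    p["max"] = max(p["row"], p["col"])        # final p value of the max scheme
    return p

# Toy data (hypothetical): 5 sites, 4 species.
Y = [[6, 1, 0, 2],
     [4, 2, 1, 0],
     [2, 3, 3, 1],
     [1, 2, 5, 4],
     [0, 1, 4, 7]]
e = [0.1, 0.4, 1.0, 1.6, 2.2]
t = [3.0, 4.5, 6.0, 8.0]
p = permutation_pvalues(Y, e, t, n_perm=199, seed=1)
print(p)
```

Taking the maximum of the row-based and column-based p values means that the null hypothesis is rejected only when both the community-based and the species-based test reject it.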

Fig. 1

Exceedance probability of the score test statistic of interaction in a Poisson log-linear model that also contains row and column main effects, as estimated from 10,000 simulated data sets, against the exceedance probability, as obtained from the chi-square distribution (chi2) and four different Monte Carlo permutation tests (999 permutations): permutation of rows and columns simultaneously (rc), of rows (row), of columns (col), and the sequential combination of the row and col schemes in the max scheme, which takes the maximum of the p values of the row and the col scheme. The panels show results for six data generating models, five of which represent typical deviations from the Poisson model (top left). All other models had a negative binomially distributed response, without (top middle) and with further deviations (see text for details). The bottom right panel is the only model with a genuine non-zero interaction between the observed trait t and environment e. The variables z and x represent an unobserved (i.e. latent) trait and environmental variable, respectively

On the basis of a suggestion of a reviewer, two more permutation methods, which permute the trait values (or the values of the environmental variable) in inflated tables, have been evaluated in Appendix S3.

Except in the bottom-right panel in Fig. 1 (i.e. the non-null (alternative) model where \(b_{te} \ne 0\)), the ideal test in terms of type I error rates follows the 1:1 line. Lines above this line indicate liberal tests that have an elevated type I error rate (too many rejections at a specified nominal level, e.g. the horizontal dashed line at 0.05) and lines below the 1:1 line indicate conservative tests that have too few rejections at a specified nominal level. A test is said to control the type I error if its type I error rate is at most the nominal level (Goeman and Solari 2010), that is, if its lines in Fig. 1 are all at or below the 1:1 line.
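The reading of Fig. 1 described above can be made concrete with a small sketch. It assumes a list of p values, one per data set simulated under a null model; the function names and the artificial p values are hypothetical, and Monte Carlo error in the estimated rejection rates is ignored.

```python
def rejection_rate(p_values, alpha):
    """Share of simulated data sets with a p value at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def controls_type_one_error(p_values, alphas=(0.01, 0.05, 0.10)):
    """True if the rejection rate is at most the nominal level for every alpha."""
    return all(rejection_rate(p_values, a) <= a + 1e-12 for a in alphas)


# Artificial p values: an evenly spread (ideal) set, a liberal set whose
# p values are too small, and a conservative set whose p values are too large.
ideal = [k / 1000 for k in range(1, 1001)]
liberal = [p ** 2 for p in ideal]
conservative = [p ** 0.5 for p in ideal]

for name, ps in (("ideal", ideal), ("liberal", liberal),
                 ("conservative", conservative)):
    print(name, rejection_rate(ps, 0.05), controls_type_one_error(ps))
```

In the terms of Fig. 1, the ideal set lies on the 1:1 line, the liberal set above it and the conservative set below it; only the liberal set fails to control the type I error.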

The exceedance probability based on the chi-square distribution with 1 degree of freedom is at the 1:1 line only for the Poisson model and is far above this line for the other models. In the top-row panels of Fig. 1, the rc, row and col schemes closely follow the 1:1 line, but the max scheme is slightly below this line, and thus conservative with an observed rejection rate in the 10,000 simulations of about 3% at the nominal 5% level of the test. Note, however, that the test is still reliable in the sense that it does not reject the null hypothesis more often than the nominal level.

In the first two panels of the bottom row in Fig. 1, the rc scheme nearly coincides with the row scheme and the col scheme, respectively. These schemes lie above the 1:1 line and are thus liberal, with a rejection rate of about 17% at the nominal 5% level of the test. In these panels, the max scheme nearly coincides with the col scheme and the row scheme, respectively. These schemes lie close to the 1:1 line and thus have a rejection rate of about 5% at the nominal 5% level of the test.

The data generating model in the bottom-right panel is the only one containing a true non-zero interaction between the observed trait t and observed environmental variable e. In this case, the ideal line is \({\Gamma } \)-shaped, indicating a high rejection rate at each nominal level of the test. At the nominal level of 5%, the rejection rate is \(\sim \)0.90 for the col and max schemes and \(\sim \)0.97 for the row and rc schemes. If \(b_{ze} \) is decreased from 0.2 to 0, these rejection rates are all >0.98 in this case, and the plot is \({\Gamma } \)-shaped, also for the chi-square based probability.

The conclusion from these simulations is that the use of the chi-square based probability gives highly inflated type I error rates if the Poisson model does not strictly hold true. Among the investigated permutation schemes (including the two methods of Appendix S3), the max scheme is the only scheme that controls the type I error in the five investigated models with latent variables, while providing strong statistical power when \(b_{te} \ne 0\).

4 Real data example

Different permutation schemes can also lead to different results in real data. This is illustrated here with the Dune Meadow data set (Jongman et al. 1995) consisting of abundances of 28 plants in 20 sites with five environmental variables and, from Jamil et al. (2013), five plant traits. The abundance is on a semi-quantitative rank scale with integer numbers from 0 (absent) to 9 (present everywhere). For illustration purposes only, abundances are treated as counts in this example and, alternatively, converted to presence/absence. Suppose for a moment that the only available environmental variable is moisture, which is the major axis of variation of these data (Jongman et al. 1995), and one wishes to know whether it interacts with the plant trait SLA (specific leaf area). Using the fourth-corner score test statistic, the p values for the abundance data (with the p values obtained for presence/absence given between parentheses throughout this section) for the permutation schemes rc, row and col are 0.008 (0.006), 0.028 (0.024) and 0.218 (0.185), respectively (using 999 permutations). The first two schemes thus provide evidence for an interaction, whereas the col scheme does not. The simulations in Fig. 1 indicate that one possible reason for such a difference between the row and col schemes is that the environmental variable (moisture) interacts with a latent trait, even if that variable is independent of the trait of interest (SLA). There is indeed another trait in the Dune Meadow data set, namely seed mass, that has almost zero correlation with SLA (r = −0.047) and that interacts with moisture [p values of 0.0001 (0.0001) and 0.012 (0.0185) for the row and col schemes, respectively]. The p value of 0.028 (0.024) in the row scheme for testing the interaction between SLA and moisture is thus likely caused by the interaction between seed mass and moisture. There is thus no evidence in these data that SLA and moisture have a real interaction.
This example illustrates that the evidence for a trait–environment interaction is weak unless both the row and col schemes result in low p values. This line of reasoning leads naturally to the max scheme; the formal argument hinges on the theory of sequential testing (Goeman and Solari 2010) as given in ter Braak et al. (2012).
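As a worked illustration of this reasoning, the max rule can be applied to the row and col p values reported above for SLA and moisture. The snippet is only a sketch: the variable names and the 5% threshold are choices made here for illustration.

```python
ALPHA = 0.05  # nominal level chosen for this illustration

# Row and col permutation p values for the SLA-by-moisture test, as reported above
sla_moisture = {
    "abundance": {"row": 0.028, "col": 0.218},
    "presence/absence": {"row": 0.024, "col": 0.185},
}


def max_p(p_row, p_col):
    """Final p value of the max scheme: the larger of the two p values."""
    return max(p_row, p_col)


for data_type, p in sla_moisture.items():
    final = max_p(p["row"], p["col"])
    print(data_type, final, "significant" if final <= ALPHA else "not significant")
```

For both data types the final p value equals the col p value, which exceeds 0.05, in line with the conclusion that these data give no evidence of an SLA-by-moisture interaction.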

5 Discussion

This paper shows that the fourth-corner correlation, heuristically developed by Legendre et al. (1997) for examining trait–environment associations, has a close relationship with the Poisson log-linear model with interactions, which has recently been proposed as a model for trait–environment relationships (Brown et al. 2014; Warton et al. 2015b). The squared fourth-corner correlation is proportional to the score test statistic for testing the linear-by-linear interaction in the Poisson log-linear model with row and column main effects. This result gives a mathematical underpinning of a conjecture that Peres-Neto et al. (2016) examined by simulation, namely that the fourth-corner correlation focuses on the interaction of a Poisson log-linear model and is not sensitive to main effects. Moreover, a score test is asymptotically equivalent to the likelihood ratio test, but much quicker to compute as it does not require fitting of the alternative model. This applies particularly to the test based on the fourth-corner correlation in comparison with the test based on the Poisson deviance difference between the main effects only model and the main effects with interaction model. In our R implementation, the test using the fourth-corner correlation is 140 times quicker to compute than the GLM-based test. Note that computing time easily becomes an issue with resampling for statistical inference, particularly in large data sets.

Ecological data are likely over-dispersed. Two models are then popular: the quasi-Poisson model and the negative binomial model. The quasi-Poisson model, with its variance proportional to the mean, allows a quasi-likelihood approach that leads to the Poisson deviance to be minimized and thus to the same estimates as the Poisson model. In this case, the squared fourth-corner correlation is safe to use in resampling-based (permutation or bootstrap) significance tests. That is not the case for the negative binomial model (with variance function \(\mu _{ij} +\phi \mu _{ij}^2 \) and scale parameter \(\phi \)). Then, the minimal sufficient statistics are the full data, instead of the three statistics below Eq. (3), and the score test statistic differs from the one in the Poisson model. Resampling based on the squared fourth-corner correlation or the Poisson likelihood ratio (LR) is therefore no longer optimal and power may be lost. In a small simulation study as in the sixth panel of Fig. 1 (100 data sets per scenario and 99 permutations per data set), the power of the row, col and max schemes based on the negative binomial LR was 0.96, 0.94 and 0.93, respectively. By comparison, the power of the fourth-corner test on the same data sets was estimated as 0.97, 0.88 and 0.88, respectively, confirming some loss of power compared to using the negative binomial LR. The negative binomial LR is costly computationally and potentially numerically unstable; for example, in our implementation using the R package mvabund (Wang et al. 2012), I tried to obtain results for 1000 simulations with 999 permutations, but failed due to crashes of R. Note that the negative binomial GLM requires resampling for statistical inference as parametric inference is not very trustworthy, even in simple balanced design experiments for small to moderate data set sizes (Szöcs and Schäfer 2015). It would be of interest to develop a score test in the context of the negative binomial distribution.

Statistical tests in this paper have used resampling, based on restricted permutation of the counts. The restrictions ensured that the row and column totals were preserved. Without restrictions, permuting residuals would have been required to preserve these totals. Moreover, unrestricted resampling would treat the data or residuals as if they were exchangeable, whereas this is unlikely due to unobserved variation between species and/or sites.

Brown et al. (2014) advocate community-based resampling as being design-based. However, ecologists typically search for trait–environment association in observational studies. Therefore, no real design-based inference exists; the values at the sites or for species are in no way randomized by design. But it may still be hypothesized that values of traits, values of environmental variables or residuals from models are exchangeable. This viewpoint supports both community-based and species-based resampling, although not necessarily completely random resampling when there is spatial or temporal autocorrelation or phylogenetic correlation.

Three types of restricted permutations were used here. The rc scheme permuted both rows and columns in the same resample, whereas the row and col schemes permuted either rows or columns. The simulation results showed that

  • the rc scheme is not able to control the type I error rate when there is additional unobserved random variation among sites or among species that interacts with either the observed environment or the observed trait (as in the terms \(b_{ze} z_j e_i \) and \(b_{tx} t_j x_i \) in the simulation models, respectively).

  • the row scheme is not able to control the type I error rate when there is additional species-based random variation that interacts with the observed environment (as in the term \(b_{ze} z_j e_i\)). In this scenario, the species respond differentially to the environment, but the differential response cannot be explained by the measured trait [see Eq. (16)]. By contrast, with additional site-based random variation, the row scheme controls the type I error rate, even if it interacts with the observed trait (as in the term \(b_{tx} t_j x_i\)).

  • vice versa, the col scheme is not able to control the type I error rate when there is additional site-based random variation that interacts with the observed trait (as in the term \(b_{tx} t_j x_i\)). In this scenario, the species respond differentially to the trait, but the differential response cannot be explained by the measured environment. By contrast, with additional species-based random variation, the col scheme controls the type I error rate, even if it interacts with the observed environment (as in the term \(b_{ze} z_j e_i\)).

  • the max scheme, in which the row- and column-based tests are combined, controlled the type I error rate in scenarios with either type of random variation.

Whereas the max scheme is perhaps currently the best simple method to test species-environment association, it is not yet perfect. In particular: (1) its type I error rate is below the nominal level when neither of these random effects is present (see top row of Fig. 1), resulting in some loss of power, and (2) its type I error rate can still be above the nominal level when both types of variation occur simultaneously. For example, if the base model in Fig. 1 is extended with huge latent interactions, namely \(b_{tx} t_j x_i +b_{ze} z_j e_i \), with \(b_{tx} =b_{ze} =1\) (instead of 0.2 as in Fig. 1), the estimated type I error rate for the max scheme is 8.2% at the nominal 5% level (and the row and col schemes both give a type I error rate of 15%). So far, no resampling method has been found to control fully the type I error rate in this scenario. In this case, both the trait and the environment structure the species-by-site interaction, but do not interact with one another, at least not on the log-linear scale. To detect (or guard against) this scenario, the only way to go is presumably model-based (and Bayesian) as in Warton et al. (2015a).

Our simulation confirmed the remark of Brown et al. (2014) that community-based resampling “enables valid inferences that are robust to correlation between species, even when such correlation has not been incorporated into the fitted model”: the simulation in Fig. 1 (middle panel in second row) had correlations among species due to a latent environmental variable x that was uncorrelated with the observed variable e. The row scheme gave a correct type I error rate, but the col scheme did not, as species were correlated. Conversely, when there are dependencies among sites due to a latent trait z, row-based resampling gave an inflated type I error rate, but column-based resampling did not (left panel in second row of Fig. 1). When either one or the other situation could be present, the max scheme is a solution to valid inference. When both situations are likely present, the max scheme also shows moderate type I error rate inflation. Some form of p value adjustment estimated via simulation might then be a way out (to undo the possible type I error rate inflation noted in the previous paragraph); the other, more elaborate, option is explicit modeling of the correlations in a GLMM. Both options are outside the scope of this paper. Of course, for observational data, any estimated correlation or association does not imply causation.

A reviewer raised serious objections against any permutation method that is based on permuting species by arguing that: “species (columns) are not the sampling units, they are out of the control of the experimenter and are generally assumed to be correlated due to species interactions and missing predictors”. I add phylogenetic relationships to this (see below). Therefore “Resampling species makes no sense from a design perspective, irrespective of the presence or absence of species-by-environment interaction effects”. The danger of all of this is that a statistical test using species-based resampling may have inflated type I error rate (is too liberal). Let me put this into the context of the max test. If the species-based resampling test is not performed, the final p value is the one from site-based resampling. The p value of the species part of the max test is then effectively nil (under the true null hypothesis, the null hypothesis is always rejected), which corresponds to the maximum type I error rate inflation possible. One is thus better off by applying the species-based test than by not applying it, even in the case that the above mentioned danger of some type I error rate inflation is real.

Note that, as yet, no valid GLM-based statistical test of species-by-environment interaction has been proposed. For example, the site-based residual bootstrapping approach of Warton et al. (2015b) suffers from the same type I error rate inflation as the simple site-based permutation scheme in the scenario of Fig. 1 that includes the \(b_{ze} z_j e_i \) term (ter Braak et al. 2016). This inflation can also be counteracted by adding species-based resampling as in the max approach (ter Braak et al. 2016). Note also that missing predictors (either as main effects or interactions) are no problem as long as they do not interact with the observed trait and the observed environment. An example of this is the random interaction scenario in Fig. 1.

Completely random permutations of species and/or of sites were used in this paper. This needs further adaptation as sites may be structured in space (spatial autocorrelation) and time (temporal autocorrelation) and species form a phylogeny (phylogenetic autocorrelation) so that neither sites nor species are really completely independent or exchangeable units. The net effect will be that the effective number of units is actually smaller than the number observed in the data (i.e. loss of degrees of freedom through autocorrelation), likely generating a liberal test when random permutations are used. Possible alternatives for random permutations are restricted permutations (Lapointe and Garland 2014) or data simulation that keeps the original spatial or phylogenetic structure in the data (Wagner and Dray 2015). In this kind of hypothesis testing, phylogeny is treated as a nuisance: a trait–environment association is only judged valid when the association contributes beyond contributions due to phylogenetic relatedness. For prediction, such a strong requirement is not needed. Prediction of abundance of a new species is expected to be better (with and without taking its trait value into account) the closer it is in the phylogeny to the species present in the data set.

The score test statistic for testing the slope parameter in a simple regression is the sample size multiplied by the squared Pearson correlation (Bera and Bilias 2001). This result aligns nicely with the fourth-corner correlation defined as the Pearson correlation on inflated trait and environment data, but does not help to understand the link with the Poisson log-linear model. For this, the link between the fourth-corner correlation and correspondence analysis is more helpful as indicated in Sect. 2.3 and in more detail in the next paragraph.
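The simple-regression result can be sketched as follows. The notation is local to this sketch and is an assumption for illustration: a normal linear model \(v_i =a+\beta u_i +\varepsilon _i \) with n observations, response v, predictor u and error variance \(\sigma ^2\). Under the null hypothesis \(\beta =0\), the maximum likelihood estimates are \(\hat{a}=\bar{v}\) and \(\hat{\sigma }_0^2 =n^{-1}\sum _i \left( {v_i -\bar{v}} \right) ^2\), and the score U for \(\beta \) and its information I (adjusted for the intercept) are

$$\begin{aligned} U=\frac{\sum _i \left( {u_i -\bar{u}} \right) \left( {v_i -\bar{v}} \right) }{\hat{\sigma }_0^2 },\quad I=\frac{\sum _i \left( {u_i -\bar{u}} \right) ^2}{\hat{\sigma }_0^2 }, \end{aligned}$$

so that the score test statistic is

$$\begin{aligned} \frac{U^2}{I}=\frac{\left[ {\sum _i \left( {u_i -\bar{u}} \right) \left( {v_i -\bar{v}} \right) } \right] ^2}{\hat{\sigma }_0^2 \sum _i \left( {u_i -\bar{u}} \right) ^2}=n\,\frac{\left[ {\sum _i \left( {u_i -\bar{u}} \right) \left( {v_i -\bar{v}} \right) } \right] ^2}{\sum _i \left( {u_i -\bar{u}} \right) ^2\sum _i \left( {v_i -\bar{v}} \right) ^2}=n\,r^2, \end{aligned}$$

with r the Pearson correlation between u and v.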

The fourth-corner correlation and the log-linear model appear to treat species abundance in completely different ways: in the former as a weight (as the fourth-corner correlation is a weighted Pearson correlation) and in the latter as a response variable. In the following, this paradox is resolved by using the relationship between the fourth-corner correlation and correspondence analysis (see Sect. 2). Recall that, for \(p=q=1\), the fourth-corner correlation arises as a doubly-constrained correspondence analysis in the RLQ-approach in Dolédec et al. (1996). It is well known that correspondence analysis is related to Goodman’s (1979) RC-model, which is the model of Eq. (1) with t and e latent. Indeed, a first order Taylor expansion of Eq. (1) in terms of \(bt_j e_i \) yields the reconstitution formula of correspondence analysis (Greenacre 1984):

$$\begin{aligned} \mu _{ij} =R_i^*C_{{j}}^{*} \exp \left( {bt_j e_i} \right) \approx R_i^*C_{{j}}^{*} \left( {1+bt_j e_i} \right) , \end{aligned}$$
(17)

where \(R_i^*=e^{r_i} \) and \(C_{{j}}^{*} =e^{c_j} \). So, for small b, both models can be expected to be very similar. Goodman (1981) showed that their estimating equations are then also very similar. For standardized t and e, b is the square-root of the first eigenvalue of correspondence analysis. Equation (17) applies, of course, also to our case of observed t and e. The log-linear model with row and column main effects and a linear-by-linear interaction is thus similar to the doubly constrained correspondence analysis. ter Braak (1985, 1988) showed such similarity also for data that follow the ecological niche model. Such unimodal data are very far from row-column independence and have in correspondence analysis a first non-trivial eigenvalue close to 1. That theory gives motivation to develop ordination methods for multiple traits and environmental variables based on a decomposition of the total inertia given in equations (10) and (11), as in the software package Canoco 5.1 (ter Braak and Šmilauer 2012), with row- and column-based permutation tests (of residuals) for statistical inference. Such methods can be used as a quick-scan of trait–environment associations. The alternative, or rather, the complementary approach is to take the full Bayesian model-based approach with latent variables and factor analytic structure, of which Warton et al. (2015a) provide a nice first implementation. Even such models may need resampling methods, as even the simplest models using the negative binomial already need resampling for valid statistical inference for small to moderate data set sizes (Szöcs and Schäfer 2015). Between these extremes, there is room for GLM- and GLMM-based approaches that use row- and column-based resampling schemes for valid statistical inference.
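The quality of the first order approximation in Eq. (17) for small b can be checked numerically. The following sketch uses a hypothetical helper, `taylor_gap`, and illustrative values of b with \(t_j =e_i =1\); none of these values come from the analyses in the paper except b = 0.2, the interaction size used in Fig. 1.

```python
import math


def taylor_gap(b, t, e):
    """Relative gap between exp(b*t*e) and its first order expansion 1 + b*t*e."""
    u = b * t * e
    exact = math.exp(u)
    approx = 1.0 + u
    return (exact - approx) / exact


# The gap shrinks quickly as b decreases (trait and environment values set to 1)
for b in (1.0, 0.2, 0.05):
    print(b, round(taylor_gap(b, 1.0, 1.0), 4))
```

The gap is never negative, because \(\exp (u)\ge 1+u\) for all u, and it is below 2% for b = 0.2 at these trait and environment values.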