Multiple regression on distance matrices: a multivariate spatial analysis tool

Plant Ecology

Abstract

I explore the use of multiple regression on distance matrices (MRM), an extension of partial Mantel analysis, in spatial analysis of ecological data. MRM involves a multiple regression of a response matrix on any number of explanatory matrices, where each matrix contains distances or similarities (in terms of ecological, spatial, or other attributes) between all pair-wise combinations of n objects (sample units); tests of statistical significance are performed by permutation. The method is flexible in terms of the types of data that may be analyzed (counts, presence–absence, continuous, categorical) and the shapes of response curves. MRM offers several advantages over traditional partial Mantel analysis: (1) separating environmental distances into distinct distance matrices allows inferences to be made at the level of individual variables; (2) nonparametric or nonlinear multiple regression methods may be employed; and (3) spatial autocorrelation may be quantified and tested at different spatial scales using a series of lag matrices, each representing a geographic distance class. The MRM lag matrices model may be parameterized to yield very similar inferences regarding spatial autocorrelation as the Mantel correlogram. Unlike the correlogram, however, the lag matrices model may also include environmental distance matrices, so that spatial patterns in species abundance distances (community similarity) may be quantified while controlling for the environmental similarity between sites. Examples of spatial analyses with MRM are presented.
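The core computation in MRM can be sketched in a few lines. This is a minimal illustration, not the software used in the paper. It makes several assumptions:

- it uses NumPy and ordinary least squares;
- it runs two-sided permutation tests, based on t statistics for individual coefficients and on R² for the whole model;
- the function names `unfold` and `mrm`, and the data, are hypothetical.

The key design choice is that the permutation shuffles the n objects (the rows and columns of the response matrix together) rather than the n(n−1)/2 unfolded distances, because distances that share an object are not independent.

```python
import numpy as np


def unfold(D):
    """Upper triangle of a symmetric n x n matrix as a vector of length
    n(n-1)/2, in the order d_{1,2}, d_{1,3}, ..., d_{n-1,n}."""
    return D[np.triu_indices(D.shape[0], k=1)]


def mrm(Y, Xs, n_perm=999, seed=0):
    """Illustrative MRM: OLS of unfolded Y on unfolded explanatory matrices Xs.

    P-values come from permuting the n objects, i.e. the rows and columns of Y
    together, while the explanatory matrices stay fixed.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    X = np.column_stack([np.ones(n * (n - 1) // 2)] + [unfold(M) for M in Xs])
    XtX_inv = np.linalg.inv(X.T @ X)  # X is fixed, so invert it once
    df = X.shape[0] - X.shape[1]

    def fit(y):
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        t = beta / np.sqrt(np.diag(XtX_inv) * (resid @ resid) / df)
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        return beta, t, r2

    beta, t_obs, r2_obs = fit(unfold(Y))
    ge_r2 = 1                      # the observed data count as one permutation
    ge_t = np.ones(len(beta))
    for _ in range(n_perm):
        p = rng.permutation(n)
        _, t_p, r2_p = fit(unfold(Y[np.ix_(p, p)]))
        ge_r2 += int(r2_p >= r2_obs)
        ge_t += np.abs(t_p) >= np.abs(t_obs)  # two-sided, per coefficient
    return beta, r2_obs, ge_r2 / (n_perm + 1), ge_t / (n_perm + 1)


# Usage with hypothetical data: Y depends on an environmental distance matrix E
rng = np.random.default_rng(42)
n = 12
e = rng.uniform(0, 10, n)
E = np.abs(e[:, None] - e[None, :])          # environmental distances
noise = rng.normal(0, 0.1, (n, n))
Y = 0.5 * E + (noise + noise.T) / 2          # symmetric response matrix
beta, r2, p_r2, p_t = mrm(Y, [E], n_perm=199)
```

With a strong planted relationship like this one, the P-value for R² should be at or near its minimum of 1/(n_perm + 1). The intercept's P-value carries no ecological meaning.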

References

  • Augustin N.H., Mugglestone M.A. and Buckland S.T. (1996). An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. 33:339–347

  • Bergeron Y. (1991). The influence of island and mainland lakeshore landscapes on boreal forest fire regimes. Ecology 72:1980–1992

  • Borcard D. and Legendre P. (1994). Environmental control and spatial structure in ecological communities: an example using oribatid mites (Acari, Oribatei). Environ. Ecol. Stat. 1:37–61

  • Borcard D., Legendre P., Avois-Jacquet C., and Tuomisto H. (2004). Dissecting the spatial structure of ecological data at multiple scales. Ecology 85:1826–1832

  • Borcard D., Legendre P. and Drapeau P. (1992). Partialling out the spatial component of ecological variation. Ecology 73:1045–1055

  • Condit R., Pitman N., Leigh E.G., Chave J., Terborgh J., Foster R.B., Nuñez P., Aguilar S., Valencia R., Villa G., Muller-Landau H.C., Losos E., and Hubbell S.P. (2002). Beta-diversity in tropical forest trees. Science 295:666–669

  • Dutilleul P., Stockwell J.D., Frigon D. and Legendre P. (2000). The Mantel test versus Pearson’s correlation analysis: assessment of the differences for biological and environmental studies. J. Agricult. Biol. Environ. Stat. 5:131–150

  • Everham E.M. and Brokaw N.V.L. (1996). Forest damage and recovery from catastrophic wind. Bot. Rev. 62:113–185

  • Fortin M.-J. and Payette S. (2002). How to test the significance of the relation between spatially autocorrelated data at the landscape scale: a case study using fire and forest maps. Ecoscience 9:213–218

  • Insightful Corporation (2002). S-PLUS version 6.1. Insightful Corporation, Seattle

  • Johnson E.A. (1992). Fire and Vegetation Dynamics: Studies from the North American Boreal Forest. Cambridge University Press, Cambridge

  • Legendre P. (1993). Spatial autocorrelation: trouble or new paradigm. Ecology 74:1659–1673

  • Legendre P. (2000). Comparison of permutation methods for the partial correlation and partial Mantel tests. J. Stat. Comput. Simul. 67:37–73

  • Legendre P., Borcard D., and Peres-Neto P.R. (2005). Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecol. Monogr. 75:435–450

  • Legendre P. and Gallagher E.D. (2001). Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280

  • Legendre P., Lapointe F.-J. and Casgrain P. (1994). Modeling brain evolution from behavior: a permutational regression approach. Evolution 48:1487–1499

  • Legendre P. and Legendre L. (1998). Numerical Ecology, 2nd English edition. Elsevier Science, Amsterdam

  • Lichstein J.W., Grau H.R., and Aragón R. (2004). Recruitment limitation in a subtropical landscape mosaic: impact of an exotic tree invasion. J. Veget. Sci. 15:721–728

  • Lichstein J.W., Simons T.R., Shriner S.A., and Franzreb K.E. (2002). Spatial autocorrelation and autoregressive models in ecology. Ecol. Monogr. 72:445–463

  • Manly B.F. (1986). Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Res. Popul. Ecol. 28:201–218

  • Mantel N.A. (1967). The detection of disease clustering and a generalized regression approach. Cancer Res. 27:209–220

  • Nekola J.C. and White P.S. (1999). The distance decay of similarity in biogeography and ecology. J. Biogeogr. 26:867–878

  • Oden N.L. and Sokal R.R. (1986). Directional autocorrelation: an extension of spatial correlograms to two dimensions. Syst. Zool. 35:608–617

  • Oden N.L. and Sokal R.R. (1992). An investigation of three-matrix permutation tests. J. Classif. 9:275–290

  • Raufaste N. and Rousset F. (2001). Are partial Mantel tests adequate? Evolution 55:1703–1705

  • Rawlings J.O., Pantula S.G. and Dickey D.A. (1998). Applied Regression Analysis: A Research Tool. 2nd ed. Springer-Verlag, New York

  • Rossi R.E., Mulla D.J., Journel A.G. and Franz E.H. (1992). Geostatistical tools for modeling and interpreting ecological spatial dependence. Ecol. Monogr. 62:277–314

  • Selmi S. and Boulinier T. (2001). Ecological biogeography of Southern Ocean islands: the importance of considering spatial issues. Am. Nat. 158:426–437

  • Smouse P.E., Long J.C. and Sokal R.R. (1986). Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35:627–632

  • Sokal R.R. (1986). Spatial data analysis and historical processes. In: Diday E., Escoufier Y., Lebart L., Pages J., Schektman Y. and Tomassone R. (eds), Data Analysis and Informatics, IV. Elsevier Science, Amsterdam, pp. 29–43

  • Sokal R.R. and Oden N.L. (1978). Spatial autocorrelation in biology 2. Some biological implications and four applications of evolutionary and ecological interest. Biol. J. Linnean Soc. 10:229–249

  • Tuomisto H., Ruokolainen K. and Yli-Halla M. (2003). Dispersal, environment, and floristic variation of western Amazonian forests. Science 299:241–244

  • Urban D., Goslee S., Pierce K. and Lookingbill T. (2002). Extending community ecology to landscapes. Ecoscience 9:200–212

  • Wenny D. G. and Levey D.J. (1998). Directed seed dispersal by bellbirds in a tropical cloud forest. Proc. Natl. Acad. Sci. USA 95:6204–6207

  • Yee T.W. and Mitchell N.D. (1991). Generalized additive models in plant ecology. J. Veget. Sci. 2:587–602

Acknowledgements

I am grateful to Pierre Legendre for thoughtful comments and for clarifying the limitations of the distance matrix approach. Susan Shriner also provided helpful comments on an earlier draft. Maria Elisa Fanjul and Valeria Aschero provided superb assistance in the field. Funding was provided by a Fulbright Scholarship, a National Science Foundation Pre-doctoral Fellowship, and a Princeton University Centennial Fellowship.

Author information

Correspondence to Jeremy W. Lichstein.

Appendices

Appendix A

Reparameterizing the fully specified lag matrices model

If the lag matrices model is fully specified (contains all lags), interpreting and reparameterizing the model are most straightforward when the values in the lag matrices are coded as one for plot pairs that fall within a given lag class and zero otherwise. Consider an MRM model with (1) a dependent ecological distance matrix Y unfolded into a vector of length n(n−1)/2; (2) an intercept (mean) vector of ones of length n(n−1)/2; and (3) a series of lag distance matrices unfolded into their corresponding distance vectors, each of length n(n−1)/2. (For simplicity, I have excluded environmental variables, but the reparameterization derived below is unchanged if other explanatory distance matrices are also included.) Consider n=5 sites (as in Figure 1) and five lag classes, each containing two of the n(n−1)/2=10 inter-site pairs (in reality, each lag class should contain at least 30 pairs). In matrix notation, the model is \({\bf Y}={\bf X}{\varvec{\beta}}+{\varvec{\varepsilon}}\):

$${\left({\begin{matrix}d_{1,2}\cr d_{1,3}\cr d_{1,4}\cr d_{1,5}\cr d_{2,3}\cr d_{2,4}\cr d_{2,5}\cr d_{3,4}\cr d_{3,5}\cr d_{4,5}\cr \end{matrix}}\right)}= {\left({\begin{matrix}y_{11}\cr y_{12}\cr y_{21}\cr y_{22}\cr y_{31}\cr y_{32}\cr y_{41}\cr y_{42}\cr y_{51}\cr y_{52}\cr \end{matrix}}\right)}= {\left[{\begin{matrix}1 & 1 & 0 & 0 & 0 & 0 \cr 1 & 1 & 0 & 0 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 & 0 \cr 1 & 0 & 0 & 1 & 0 & 0 \cr 1 & 0 & 0 & 1 & 0 & 0 \cr 1 & 0 & 0 & 0 & 1 & 0 \cr 1 & 0 & 0 & 0 & 1& 0 \cr 1 & 0 & 0 & 0 & 0 & 1 \cr 1 & 0 & 0 & 0 & 0 & 1 \cr \end{matrix}}\right]}{\left({\begin{matrix}\mu \cr \tau_1 \cr \tau_2 \cr \tau_3 \cr \tau_4\cr \tau_5 \end{matrix}}\right)}+\left({\begin{matrix}\varepsilon_{11}\cr \varepsilon_{12}\cr\varepsilon_{21}\cr \varepsilon_{22}\cr \varepsilon_{31}\cr\varepsilon_{32}\cr \varepsilon_{41}\cr \varepsilon_{42}\cr\varepsilon_{51}\cr \varepsilon_{52}\cr\end{matrix}}\right),$$
(A.1)

where \(\mu\) is the overall mean response, \(\tau_{i}\) is the mean departure of y in the ith lag from \(\mu\), and \({\varvec{\varepsilon}}\) is the vector of errors. Y is expressed in two different notations above because, with regard to the parameterization of the multiple regression model, the n(n−1)/2 inter-site distances (\(d_{1,2}, d_{1,3}, \ldots, d_{4,5}\)) should be thought of as \(y_{ij}\): the jth replicate within the ith treatment group (lag class). (In this example, two of the 10 distances, \(d_{ab}\), were arbitrarily assigned to each of the five lags; in a real analysis the geographic distances would determine these assignments.) The fully specified lag matrices model, then, is equivalent to a one-way ANOVA design, and differs from ANOVA only in that significance tests must be performed by permutation (Legendre et al. 1994). Like ANOVA models, the fully specified lag matrices model is not of full rank; i.e., there is a linear dependency between the lag vectors (which sum to a vector of ones) and the intercept. Thus, the model has no unique solution and must be reparameterized to a full rank model (Rawlings et al. 1998, Chapter 9).
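In a real analysis the lag vectors come from the geographic distances between sites. The sketch below makes these assumptions:

- it uses NumPy and Euclidean distance;
- the helper `lag_vectors`, the five coordinates, and the break points are all hypothetical.

It builds the 0/1 lag vectors from site coordinates and distance-class breaks. It then confirms the rank deficiency: with an intercept and every lag, the design matrix has one more column than its rank.

```python
import numpy as np


def lag_vectors(coords, breaks):
    """Unfolded 0/1 lag vectors: column k is 1 for site pairs whose geographic
    distance d satisfies breaks[k] <= d < breaks[k + 1], and 0 otherwise."""
    i, j = np.triu_indices(len(coords), k=1)   # pair order d_{1,2}, d_{1,3}, ...
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    cls = np.digitize(d, breaks) - 1           # distance class of each pair
    K = len(breaks) - 1
    L = np.zeros((len(d), K))
    inside = (cls >= 0) & (cls < K)            # pairs outside all classes stay 0
    L[np.nonzero(inside)[0], cls[inside]] = 1.0
    return L


# Five hypothetical sites and four distance classes (real analyses need
# many more pairs per class)
coords = np.array([[0, 0], [10, 0], [0, 40], [70, 0], [0, 100]], dtype=float)
L = lag_vectors(coords, breaks=[0, 30, 60, 90, np.inf])
X = np.column_stack([np.ones(len(L)), L])      # intercept + every lag

print(L.sum(axis=0))                           # pairs per lag class: 1, 2, 4, 3
print(np.allclose(L.sum(axis=1), 1.0))         # True: lag vectors sum to the intercept
print(X.shape[1], np.linalg.matrix_rank(X))    # 5 columns, but rank 4
```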

There are several common reparameterizations, each leading to the same \(R^2\) but allowing different hypotheses to be tested. In particular, we wish to test the hypotheses that the \(\tau_{i}\) differ from their mean (i.e., \(\tau_{i}-{\bar \tau} \ne 0\)), where \(\tau_{i}-{\bar \tau}\) is an autocorrelation index for the ith lag. Eliminating the intercept from the model leads to a reparameterization in which, rather than estimating \(\mu\) and each \(\tau_{i}\), we estimate each \(\mu_{i}=\mu +\tau_{i}\) (Rawlings et al. 1998:274). This reparameterization does not yield useful hypothesis tests: the \(\mu_{i}\) (e.g., the predicted values in Figure 4G) may all be significantly different from zero but not from each other, in which case there would be no spatial pattern in the data.
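To see why the no-intercept reparameterization is unhelpful, the following example fits that model, whose coefficients are simply the lag means \(\mu_i\). It assumes NumPy and uses invented distances that contain no spatial pattern.

```python
import numpy as np

lag = np.repeat(np.arange(5), 2)   # lag class of each distance, as in Eq. (A.1)
L = np.eye(5)[lag]                 # 0/1 lag vectors, no intercept column
# Hypothetical distances with no spatial pattern: all lag means are close to 0.5
y = np.array([0.50, 0.52, 0.49, 0.51, 0.50, 0.50, 0.52, 0.48, 0.51, 0.49])

mu_hat, *_ = np.linalg.lstsq(L, y, rcond=None)   # estimates of mu_i = mu + tau_i
# mu_hat holds the lag means 0.51, 0.50, 0.50, 0.50, 0.50: all far from zero,
# yet nearly equal, so testing mu_i = 0 says nothing about spatial structure
```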

To test the hypotheses of interest (\(\tau_{i}-{\bar \tau} \ne 0\)), we impose the constraint \(\sum_{i}\tau_{i} = 0\). Thus, for the model above with five lags, we have

$$ \tau_{5}=-(\tau_{1}+\tau_{2}+\tau_{3}+\tau_{4}). $$
(A.2)

The model for the distances (y) in the fifth lag class (\(y_{51}=d_{3,5}\) and \(y_{52}=d_{4,5}\) in our example),

$$ y_{5j}=\mu+\tau_{5}+\varepsilon_{5j}, $$
(A.3)

can now be rewritten as

$$ y_{5j}=\mu + (-\tau_{1}-\tau_{2}-\tau_{3}-\tau_{4}) +\varepsilon_{5j}. $$
(A.4)

The parameter \(\tau_{5}\) may now be eliminated and the model redefined as \({\bf Y}={\bf X}^{\ast}{\varvec{\beta}}^{\ast} +{\varvec{\varepsilon}}\) (Rawlings et al. 1998:278):

$$\left({\begin{matrix} y_{11}\cr y_{12}\cr y_{21}\cr y_{22}\cr y_{31}\cr y_{32}\cr y_{41}\cr y_{42}\cr y_{51}\cr y_{52}\cr\end{matrix}} \right)= \left[{\begin{array}{rrrrr} 1 & 1 & 0 & 0 & 0 \cr 1 & 1 & 0 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 \cr 1 & 0 & 0 & 1 & 0 \cr 1 & 0 & 0 & 1 & 0 \cr 1 & 0 & 0 & 0 & 1 \cr 1 & 0 & 0 & 0 & 1 \cr 1 & {-1} & {-1} & {-1} & {-1} \cr 1 & {-1} & {-1} & {-1} & {-1} \cr\end{array}}\right]\left({\begin{matrix}\mu^{\ast}\cr \tau_{1}^{\ast}\cr \tau_{2}^{\ast}\cr \tau_{3}^{\ast}\cr \tau_{4}^{\ast}\cr \end{matrix}}\right)+ \left({\begin{matrix}\varepsilon_{11}\cr \varepsilon_{12}\cr \varepsilon_{21}\cr \varepsilon_{22}\cr \varepsilon_{31}\cr \varepsilon_{32}\cr \varepsilon_{41}\cr \varepsilon_{42}\cr \varepsilon_{51}\cr \varepsilon_{52}\cr \end{matrix}}\right). $$
(A.5)

Note that the linear dependency has been removed, so the reparameterized model has a unique solution. In terms of the original model parameters, the expectation of the ordinary least squares estimate of \({\varvec{\beta}}^{\ast}\) is (Rawlings et al. 1998:278):

$$ \hbox{E}({\hat {\varvec \beta}}^{\ast})= \left({\begin{matrix} \mu+{\bar \tau} \cr \tau_1-{\bar \tau}\cr \tau_2-{\bar \tau}\cr \tau_3-{\bar \tau}\cr \tau_4-{\bar \tau}\cr \end{matrix}}\right)=\left({\begin{matrix} {\bar \mu}\cr \mu_1 -{\bar \mu}\cr \mu_2 -{\bar \mu}\cr \mu_3 -{\bar \mu}\cr \mu_4 -{\bar \mu}\cr \end{matrix}}\right), \qquad {\bar \mu}=\textstyle\frac{1}{5}\sum_{i=1}^{5}\mu_{i}. $$
(A.6)

Thus, the coefficients \({\hat \tau }_{1}^{\ast}, {\hat \tau}_{2}^{\ast}, {\hat \tau}_{3}^{\ast}\), and \({\hat \tau}_{4}^{\ast}\) in \({\hat {\varvec\beta}}^{\ast}\) provide autocorrelation indices (\(\tau_{i}-{\bar \tau}\)) for lags 1–4; significance tests for the coefficients are performed using the MRM permutation method described in this paper and in Legendre et al. (1994).

All that remains now is to estimate \(\tau_{5}-{\bar \tau}\) and to determine whether the estimate is significantly different from zero. An unbiased estimator for \(\tau_{5}-{\bar \tau}\) is (Rawlings et al. 1998:279):

$$ {\hat \tau}_{5}^{\ast}=-({\hat \tau}_{1}^{\ast} +{\hat \tau}_{2}^{\ast}+{\hat \tau}_{3}^{\ast}+{\hat \tau}_{4}^{\ast}). $$
(A.7)
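Eqs. (A.5) to (A.7) can be checked numerically. The sketch below makes these assumptions:

- it uses NumPy and invented distances;
- `sum_to_zero_design` is a hypothetical helper that builds \({\bf X}^{\ast}\) as described above.

It compares the fitted intercept and lag coefficients, plus \({\hat \tau}_{5}^{\ast}\) from Eq. (A.7), with the lag means \(\mu_i\) and their unweighted mean \({\bar \mu}\). Negative indices mark lags whose pairs are less dissimilar than average.

```python
import numpy as np


def sum_to_zero_design(lag, drop):
    """X* as in Eq. (A.5): an intercept, then one column per retained lag,
    coded 1 for pairs in that lag, -1 for pairs in the dropped lag, 0 otherwise."""
    K = int(lag.max()) + 1
    keep = [k for k in range(K) if k != drop]
    cols = [np.ones(len(lag))]
    for k in keep:
        cols.append((lag == k).astype(float) - (lag == drop).astype(float))
    return np.column_stack(cols), keep


lag = np.repeat(np.arange(5), 2)   # two pairs per lag, as in Eq. (A.1)
y = np.array([0.21, 0.25, 0.34, 0.30, 0.45, 0.41, 0.52, 0.58, 0.60, 0.66])

Xs, keep = sum_to_zero_design(lag, drop=4)       # Eq. (A.5): lag 5 eliminated
beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)    # mu*, tau1*, ..., tau4*
tau5 = -beta[1:].sum()                           # Eq. (A.7)

m = np.array([y[lag == k].mean() for k in range(5)])          # lag means mu_i
print(np.isclose(beta[0], m.mean()))                          # True
print(np.allclose(np.append(beta[1:], tau5), m - m.mean()))   # True
```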

To implement a permutation test of the hypothesis \(\tau_{5}^{\ast}\ne 0\) (i.e., \(\tau_{5}-{\bar \tau}\ne 0\)), the model must be reparameterized in the same manner as before, but retaining \(\tau_{5}\) and instead eliminating any one of the other \(\tau_{i}\). For example, we could eliminate \(\tau_{1}\), in which case Eq. (A.5) would become:

$$\left({\begin{matrix} y_{11}\cr y_{12}\cr y_{21}\cr y_{22}\cr y_{31}\cr y_{32}\cr y_{41}\cr y_{42}\cr y_{51}\cr y_{52}\cr \end{matrix}} \right)= \left[{\begin{array}{rrrrr} 1 & {-1} & {-1} & {-1} & {-1} \cr 1 & {-1} & {-1} & {-1} & {-1} \cr 1 & 1 & 0 & 0 & 0 \cr 1 & 1 & 0 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 \cr 1 & 0 & 1 & 0 & 0 \cr 1 & 0 & 0 & 1 & 0 \cr 1 & 0 & 0 & 1 & 0 \cr 1 & 0 & 0 & 0 & 1 \cr 1 & 0 & 0 & 0 & 1 \cr \end{array}}\right] \left({\begin{matrix} \mu^{\ast}\cr \tau_{2}^{\ast}\cr \tau_{3}^{\ast}\cr \tau_{4}^{\ast}\cr \tau_{5}^{\ast}\cr \end{matrix}}\right)+ \left({\begin{matrix} \varepsilon_{11}\cr \varepsilon_{12}\cr \varepsilon_{21}\cr \varepsilon_{22}\cr \varepsilon_{31}\cr \varepsilon_{32}\cr \varepsilon_{41}\cr \varepsilon_{42}\cr \varepsilon_{51}\cr \varepsilon_{52}\cr \end{matrix}}\right). $$
(A.8)

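The estimates \({\hat \tau}_{2}^{\ast}\), \({\hat \tau}_{3}^{\ast}\), and \({\hat \tau}_{4}^{\ast}\), which appear in both models, are identical for Eqs. (A.5) and (A.8), and \({\hat \tau}_{5}^{\ast}\) from Eq. (A.8) equals the value given by Eq. (A.7). The permutation tests for the two models should therefore yield the same results.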
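The equivalence of the two parameterizations can be checked numerically. This standalone sketch assumes NumPy, uses a hypothetical helper `sum_to_zero_design`, and uses invented distances. It fits Eqs. (A.5) and (A.8) and compares the coefficients the two models share, then checks that \({\hat \tau}_{5}^{\ast}\) from Eq. (A.8) matches Eq. (A.7).

```python
import numpy as np


def sum_to_zero_design(lag, drop):
    """X* with lag `drop` eliminated: intercept, then one column per retained
    lag coded 1 in that lag, -1 in the dropped lag, and 0 elsewhere."""
    K = int(lag.max()) + 1
    keep = [k for k in range(K) if k != drop]
    cols = [np.ones(len(lag))] + [
        (lag == k).astype(float) - (lag == drop).astype(float) for k in keep
    ]
    return np.column_stack(cols)


lag = np.repeat(np.arange(5), 2)
y = np.array([0.21, 0.25, 0.34, 0.30, 0.45, 0.41, 0.52, 0.58, 0.60, 0.66])

b_a5, *_ = np.linalg.lstsq(sum_to_zero_design(lag, 4), y, rcond=None)  # mu*, tau1*..tau4*
b_a8, *_ = np.linalg.lstsq(sum_to_zero_design(lag, 0), y, rcond=None)  # mu*, tau2*..tau5*

print(np.allclose(b_a5[[0, 2, 3, 4]], b_a8[:4]))   # True: mu*, tau2*..tau4* agree
print(np.isclose(-b_a5[1:].sum(), b_a8[4]))        # True: Eq. (A.7) matches tau5* of (A.8)
```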

In summary, the fully specified lag matrices model is reparameterized and tested as follows:

  (a) Eliminate any one of the lags (\(\tau_{i}\)) from the model by removing its column from X and, for the rows in X corresponding to the removed lag, replacing each zero in the columns of the retained lags with negative one. For example, Eq. (A.5) is the reparameterization of Eq. (A.1) after removing the fifth lag.

  (b) For the lags retained in the reparameterized model, the ordinary least squares multiple regression coefficients provide autocorrelation indices (\(\tau_{i}-{\bar \tau}\)), and the MRM permutation method provides tests of the null hypotheses \(\tau_{i}-{\bar \tau}= 0\).

  (c) To obtain the autocorrelation index and significance test for the lag eliminated in step (a), repeat steps (a) and (b), but this time eliminate one of the other \(\tau_{i}\) instead. For example, Eq. (A.8) is the reparameterization of Eq. (A.1) after removing the first lag.
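Steps (a) to (c) can be combined into one routine. The sketch below is an illustration under stated assumptions, not the author's code:

- it uses NumPy;
- all function names and the transect data are hypothetical;
- it runs two-sided permutation tests on t statistics, permuting sites (rows and columns of Y together);
- it reuses the same set of site permutations for both reparameterizations, so the lag tested in step (c) is assessed against the same permutations as the others.

```python
import numpy as np


def sum_to_zero_design(lag, drop):
    """Step (a): intercept plus one column per retained lag, coded 1 in that
    lag, -1 in the dropped lag, and 0 elsewhere."""
    K = int(lag.max()) + 1
    keep = [k for k in range(K) if k != drop]
    cols = [np.ones(len(lag))] + [
        (lag == k).astype(float) - (lag == drop).astype(float) for k in keep
    ]
    return np.column_stack(cols), keep


def ols_t(X, XtX_inv, y):
    """OLS coefficients and t statistics for a fixed, full-rank design X."""
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, beta / np.sqrt(np.diag(XtX_inv) * s2)


def lag_autocorrelation(Y, lag, n_perm=999, seed=0):
    """Autocorrelation index (tau_i - mean tau) and permutation P-value for
    every lag of the fully specified model.

    Y: symmetric n x n ecological distance matrix.
    lag: lag class (0..K-1) of each pair, in np.triu_indices(n, 1) order.
    """
    n = Y.shape[0]
    iu = np.triu_indices(n, k=1)
    K = int(lag.max()) + 1
    rng = np.random.default_rng(seed)
    perms = [rng.permutation(n) for _ in range(n_perm)]  # shared by both fits
    index, pval = np.empty(K), np.empty(K)

    # Steps (a)-(b) with the last lag dropped, then step (c) dropping lag 0
    for drop, report in ((K - 1, range(K - 1)), (0, [K - 1])):
        X, keep = sum_to_zero_design(lag, drop)
        XtX_inv = np.linalg.inv(X.T @ X)
        beta, t_obs = ols_t(X, XtX_inv, Y[iu])
        ge = np.ones(len(keep))                 # observed data count once
        for p in perms:                         # permute sites, not unfolded pairs
            _, t_p = ols_t(X, XtX_inv, Y[np.ix_(p, p)][iu])
            ge += np.abs(t_p[1:]) >= np.abs(t_obs[1:])   # skip the intercept
        for c, k in enumerate(keep):
            if k in report:
                index[k], pval[k] = beta[c + 1], ge[c] / (n_perm + 1)
    return index, pval


# Hypothetical transect: 8 sites, dissimilarity grows with separation
x = np.arange(8.0)
D = np.abs(x[:, None] - x[None, :])
noise = np.random.default_rng(3).normal(0, 0.05, (8, 8))
Y = D + (noise + noise.T) / 2
lag = np.digitize(D[np.triu_indices(8, k=1)], [0, 1.5, 2.5, 4.5, np.inf]) - 1
index, pval = lag_autocorrelation(Y, lag, n_perm=199)
```

Because each index is \(\tau_i-{\bar \tau}\), the indices for all lags sum to zero. In this example the shortest lag should come out negative (pairs more similar than average) and the longest positive.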

Finally, two notes of caution are in order:

(1) Standardizing the unfolded distance vectors to mean zero and unit variance, which puts the regression coefficients for different explanatory variables on a common scale, is often desirable. However, the coefficients of the fully specified lag matrices model are not interpretable in terms of the original response (e.g., Bray–Curtis distances) if the columns of X are standardized. Therefore, the X matrix for the fully specified lag matrices model should be coded with ones and zeros as in Eq. (A.1), and the redefined \({\bf X}^{\ast}\) matrices with ones, zeros, and negative ones as in Eqs. (A.5) and (A.8). Note that the program Permute! (Casgrain 2002) reports standardized regression coefficients. This standardization affects the values of the coefficients, but not their P-values. Nonstandardized coefficients (which reflect the original coding of the variables) may be obtained from any standard regression software.

There is a second reason not to standardize the X matrix in the fully specified lag matrices model. If the lag classes contain unequal numbers of plot pairs, standardization gives the columns of X different codings, and Eqs. (A.2)–(A.8) no longer hold.

(2) Reparameterizing the lag matrices model is neither necessary nor desirable if only some of the lags are included in the model. For example, suppose the Mantel correlogram is used to determine at which lags there is significant autocorrelation, and only these lags are included in an MRM model. There is then no reason to reparameterize the lag matrices, for two reasons. First, the linear dependency between the intercept and the lag matrices arises only if all lag matrices are included in the model. Second, the interpretation of the reparameterization discussed above holds only for the fully specified model.
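Both notes of caution can be checked numerically. This hypothetical example assumes NumPy, and its lag class sizes and distances are invented. It illustrates three points:

- Standardizing \({\bf X}^{\ast}\) and the response rescales the coefficients but leaves their t statistics unchanged. Hence permutation P-values computed from those t statistics, with the same permutations, are also unchanged.
- With unequal lag class sizes, the standardized code given to a "1" differs among the lag columns.
- A model with an intercept plus only some of the lags is of full rank.

```python
import numpy as np


def ols_t(X, y):
    """OLS coefficients and their t statistics (X assumed to be full rank)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return beta, beta / np.sqrt(np.diag(XtX_inv) * s2)


def standardize(a):
    """Center to mean zero and scale to unit variance (column-wise for 2-D input)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


lag = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 4])  # hypothetical, unequal class sizes
y = np.array([0.21, 0.25, 0.27, 0.34, 0.30, 0.45, 0.41, 0.52, 0.58, 0.63])
L = np.eye(5)[lag]                               # 0/1 lag vectors, as in Eq. (A.1)

# Note (1): standardizing changes the coefficients but not their t statistics
Xs = np.column_stack([np.ones(10)] + [L[:, k] - L[:, 4] for k in range(4)])  # X*
b_raw, t_raw = ols_t(Xs, y)
b_std, t_std = ols_t(np.column_stack([np.ones(10), standardize(Xs[:, 1:])]),
                     standardize(y))
scale = Xs[:, 1:].std(axis=0, ddof=1) / y.std(ddof=1)
print(np.allclose(t_raw[1:], t_std[1:]))          # True
print(np.allclose(b_std[1:], b_raw[1:] * scale))  # True

# Note (1), continued: with unequal class sizes, a "1" receives a different
# code in each standardized lag column
Z = standardize(L)
codes = [float(Z[lag == k, k][0]) for k in range(5)]  # ~1.449, 1.897, 1.897, 1.897, 2.846

# Note (2): an intercept plus only some lags (here lags 1 and 2) is full rank
Xp = np.column_stack([np.ones(10), L[:, [0, 1]]])
print(np.linalg.matrix_rank(Xp) == Xp.shape[1])   # True
```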

About this article

Cite this article

Lichstein, J.W. Multiple regression on distance matrices: a multivariate spatial analysis tool. Plant Ecol 188, 117–131 (2007). https://doi.org/10.1007/s11258-006-9126-3

  • DOI: https://doi.org/10.1007/s11258-006-9126-3
