A permutation-based combination of sign tests for assessing habitat selection

Abstract

Habitat selection in radio-tagged animals is usually analysed by comparing, for each habitat type, the portion of use with the portion of availability. Because these data are linearly dependent and have singular variance-covariance matrices, standard multivariate statistical tests cannot be applied. To bypass the problem, compositional data analysis is customarily performed on log-ratio transforms of the sample observations. This paper criticizes that procedure, pointing out several drawbacks which may arise from the use of compositional analysis. An alternative nonparametric solution is proposed in the framework of multiple testing: habitat use is assessed separately for each habitat type by means of the sign test performed on the original observations, and the resulting p values are combined into an overall test statistic whose significance is determined by permuting the sample observations. The theoretical findings are checked by simulation studies, and applications to case studies previously considered in the literature are discussed.
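The procedure summarised in the abstract can be sketched in a few lines of Python. This is only an illustration under assumptions that the abstract does not state: the per-habitat p values are combined with Fisher's function \(-2\sum _j \ln p_j\), and the permutation distribution is obtained by swapping the use and availability vectors within each animal, which amounts to flipping the sign of that animal's whole difference vector and so keeps the dependence among habitats intact. All function names are hypothetical.

```python
import math
import random

def sign_test_p(diffs):
    """Two-sided exact sign test p value for H0: Pr(diff > 0) = 0.5 (zeros dropped)."""
    nonzero = [d for d in diffs if d != 0]
    m = len(nonzero)
    if m == 0:
        return 1.0
    pos = sum(1 for d in nonzero if d > 0)
    k = min(pos, m - pos)
    # P(X <= k) for X ~ Binomial(m, 1/2), doubled and capped at 1
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    return min(1.0, 2.0 * tail)

def fisher_stat(diff_rows):
    """Assumed combining function: -2 * sum of log p values over habitats."""
    n_habitats = len(diff_rows[0])
    pvals = [sign_test_p([row[j] for row in diff_rows]) for j in range(n_habitats)]
    return -2.0 * sum(math.log(p) for p in pvals)

def combined_sign_test(use, avail, n_perm=999, seed=0):
    """Return (observed statistic, permutation p value); rows are animals."""
    rng = random.Random(seed)
    diffs = [[u - a for u, a in zip(ur, ar)] for ur, ar in zip(use, avail)]
    t_obs = fisher_stat(diffs)
    exceed = 0
    for _ in range(n_perm):
        # Assumed permutation scheme: one random sign per animal,
        # applied to all habitats of that animal at once.
        flipped = []
        for row in diffs:
            s = 1 if rng.random() < 0.5 else -1
            flipped.append([s * d for d in row])
        if fisher_stat(flipped) >= t_obs - 1e-12:
            exceed += 1
    return t_obs, (exceed + 1) / (n_perm + 1)
```

Calling `combined_sign_test(use, avail)` with two lists of per-animal portion vectors returns the combined statistic and its permutation p value. Other combining functions or permutation schemes would fit the same skeleton.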

Abbreviations

RHU: Proportional or random habitat use

PAT: Portion of animal trajectory

PAHR: Portion of animal home range

CODA: Compositional data analysis

References

  1. Aebischer NJ, Robertson PA, Kenward RE (1993) Compositional analysis of habitat use from animal radio-tracking data. Ecology 74:1315–1325

  2. Aitchison J (1986) The statistical analysis of compositional data. Chapman and Hall, London

  3. Aitchison J (1994) Principles of compositional data analysis. In: Anderson TW, Fang KT, Olkin J (eds) Multivariate analysis and its applications. Institute of Mathematical Statistics, Hayward, pp 73–81

  4. Calenge C (2006) The package “adehabitat” for the R software: A tool for the analysis of space and habitat use by animal. Ecol Model 197:516–519

  5. Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate distributions. Chapman and Hall, London

  6. Johnson DH (1980) The comparison of usage and availability measurements for evaluating resource preference. Ecology 61:65–71

  7. Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2. Wiley, New York

  8. Johnson DS, Thomas DL, Ver Hoef TJ, Christ A (2008) A general framework for the analysis of animal resource selection from telemetry data. Biometrics 64:968–976

  9. Kneib T, Knauer F, Küchenhoff H (2011) A general approach to the analysis of habitat selection. Environ Ecol Stat 18:1–25

  10. Kooper N, Manseau M (2009) Generalized estimating equations and generalized linear mixed-effects models for modelling resource selection. J Appl Ecol 46:590–599

  11. Manly BFJ, McDonald LL, Thomas DL, McDonald TL, Erickson WP (2002) Resource selection by animals. Kluwer, Dordrecht

  12. Pesarin F (1992) A resampling procedure for nonparametric combination of several dependent tests. J Italian Stat Soc 1:87–101

  13. Pesarin F (2001) Multivariate permutation tests: with applications in biostatistics. Wiley, New York

  14. Randles RH, Wolfe DA (1979) Introduction to the theory of nonparametric statistics. Wiley, New York

  15. Strickland MD, McDonald LL (2006) Introduction to the special section on resource selection. J Wildl Manag 70:321–323

  16. Westfall PH, Young SS (1993) Resampling-based multiple testing. Wiley, New York

  17. Worton BJ (1989) Kernel methods for estimating the utilization distribution in home-range studies. Ecology 70:164–168

Acknowledgments

The authors thank Luca Pratelli for his helpful suggestions on the theoretical aspects of the work.

Author information

Corresponding author

Correspondence to Lorenzo Fattorini.

Additional information

Handling Editor: Ashis SenGupta.

Appendices

Appendix 1: Different expressions for the hypothesis of random habitat use

The RHU hypothesis (2) actually constitutes a multivariate hypothesis which can be rewritten as

$$\begin{aligned} \text{ H }_{X0} :\bigcap _{j=1}^K \left\{ \text{ E }(X_{Uj} -X_{Aj})=0\right\} \end{aligned}$$
(12)

where \(\text{ E }(X_{Uj} -X_{Aj})=0\) is the univariate hypothesis that the expected use of habitat \(j\) coincides with its expected (or constant) availability. Clearly, \(\text{ H }_{X0}\) in (12) is true if and only if all the univariate hypotheses are true. In turn, once a reference habitat \(h\) is chosen, (12) is equivalent to

$$\begin{aligned} \text{ H }_{X0}:\bigcap _{j\ne h=1}^K \left\{ \frac{\text{ E }(X_{Uj})}{\text{ E }(X_{Uh})}= \frac{\text{ E }(X_{Aj})}{\text{ E }(X_{Ah})}\right\} \end{aligned}$$
(13)

Indeed, if (2) is true, then for any habitat \(j\) it follows from (12) that \(\text{ E }(X_{Uj})=\text{ E }(X_{Aj})\), from which \(\text{ E }(X_{Uj})/\text{ E }(X_{Aj})=1\). Accordingly, for the reference habitat \(h\) and for each \(j\ne h\), it follows that \(\text{ E }(X_{Uj})/\text{ E }(X_{Aj})=\text{ E }(X_{Uh})/\text{ E }(X_{Ah})\), from which \(\text{ E }(X_{Uj})/\text{ E }(X_{Uh})=\text{ E }(X_{Aj})/\text{ E }(X_{Ah})\). Conversely, if (13) is true, then for the reference habitat \(h\) and for each \(j\ne h\) it holds that \(\text{ E }(X_{Uj})/\text{ E }(X_{Uh})=\text{ E }(X_{Aj})/\text{ E }(X_{Ah})\) or equivalently \(\text{ E }(X_{Uj})/\text{ E }(X_{Aj})=\text{ E }(X_{Uh})/\text{ E }(X_{Ah})\), i.e. the ratio \(\text{ E }(X_{Uj})/\text{ E }(X_{Aj})\) takes a common value \(c\) for each \(j=1, \ldots , K\), or equivalently \(\text{ E }(X_{Uj})=c\text{ E }(X_{Aj})\).

But since \(\sum _{j=1}^K {\text{ E }(X_{Uj})} =\sum _{j=1}^K {\text{ E }(X_{Aj})} =1\), then \(c=1\), which obviously implies (2).

In a similar way, once a reference habitat \(h\) is chosen, (3) constitutes a multivariate hypothesis which is equivalent to

$$\begin{aligned} \text{ H }_{Y0} :\bigcap _{j\ne h=1}^K {\left\{ \text{ E }(Y_{Uj}-Y_{Aj}) =0\right\} } \end{aligned}$$

or, more explicitly, to

$$\begin{aligned} \text{ H }_{Y0} :\bigcap _{j\ne h=1}^K \left\{ \text{ E }\left( \ln \frac{X_{Uj}}{X_{Uh}}\right) =\text{ E }\left( \ln \frac{X_{Aj}}{X_{Ah}}\right) \right\} \end{aligned}$$
(14)

From (13) and (14), it is at once apparent that (3) is equivalent to (2) if

$$\begin{aligned} \ln \left\{ \frac{\text{ E }(X_{Uj})}{\text{ E }(X_{Uh})}\right\} -\ln \left\{ \frac{\text{ E }(X_{Aj})}{\text{ E }(X_{Ah})} \right\} =\text{ E }\left\{ \ln \left( \frac{X_{Uj}}{X_{Uh}} \right) \right\} -\text{ E }\left\{ \ln \left( \frac{X_{Aj}}{X_{Ah}} \right) \right\} =0 \end{aligned}$$
(15)

for each \(j\ne h=1,\ldots , K\). Since \(\text{ E }\left\{ \ln (X)\right\} \) generally differs from \(\ln \text{ E }(X)\), relation (15) does not generally hold.
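A small numerical illustration may help here. The two-habitat values below are hypothetical and chosen only so that the use vector has the same mean as the constant availability vector: the hypothesis on the original scale holds, yet the mean log-ratio of use differs from the log-ratio of availability.

```python
import math

# Hypothetical example: use takes two equally likely values,
# availability is the constant vector a.
use_values = [(0.2, 0.8), (0.6, 0.4)]
a = (0.4, 0.6)

n = len(use_values)
mean_use = tuple(sum(x[j] for x in use_values) / n for j in range(2))

# Log of the ratio of expectations (left-hand side of the comparison)
log_ratio_of_means = math.log(mean_use[0] / mean_use[1])
# Expectation of the log-ratio (what the log-ratio hypothesis is about)
mean_of_log_ratios = sum(math.log(x[0] / x[1]) for x in use_values) / n
# Log-ratio of the constant availability
log_ratio_avail = math.log(a[0] / a[1])

print(mean_use, log_ratio_of_means, mean_of_log_ratios, log_ratio_avail)
```

Here the log of the ratio of expected uses matches the log-ratio of availability, while the expected log-ratio of use does not, so a test on the log-ratio scale would be addressing a different hypothesis.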

Appendix 2: Dirichlet distributions and log-ratio transforms

The Dirichlet distribution is probably the most familiar model adopted for positive random vectors \(\mathbf{X}=\left[ X_1, \ldots , X_K\right] ^{\mathrm{T}}\) subject to the constraint \(\mathbf{1}^{\mathrm{T}}\mathbf{X}=1\). A \(K\)-variate random vector \(\mathbf{X}\) is said to have a Dirichlet distribution with parameters \(\delta >0\) and \({\varvec{\uptheta }}=\left[ \theta _1, \ldots , \theta _K\right] ^{\mathrm{T}}\) with \(\theta _{j}>0\) for each \(j=1,\ldots , K\) if the joint probability density function at \(\mathbf{x}=\left[ x_1, \ldots , x_K \right] ^{\mathrm{T}}\) with \(\mathbf{1}^{\mathrm{T}}\mathbf{x}=1\) is given by

$$\begin{aligned} f(\mathbf{x})=\frac{\Gamma (\delta \theta )}{\prod \nolimits _{j=1}^{K} {\Gamma (\delta \theta _j)}}\prod _{j=1}^K {x_j^{\delta \theta _j -1}} \end{aligned}$$

where \(\theta =\mathbf{1}^{\mathrm{T}}{\varvec{\uptheta }}\). As is well known (e.g. Fang et al. 1990), each marginal variable \(X_j\) has a beta distribution on [0,1] with shape parameters \(\delta \theta _j\) and \(\delta (\theta -\theta _j)\) in such a way that

$$\begin{aligned} \text{ E }(X_j)=\frac{\theta _j}{\theta } \end{aligned}$$

and

$$\begin{aligned} \text{ V }(X_j)=\frac{\theta _j (\theta -\theta _j)}{\theta ^{2}(\delta \theta +1)} \end{aligned}$$

Accordingly, marginal expectations do not depend on \(\delta \), and marginal variances increase as \(\delta \) decreases. In the framework of habitat selection analysis, \(\delta \) accounts for the variability of the portions of animal trajectories or home ranges within habitat types. However, when these quantities are estimated in the field from the animals’ radio locations, \(\delta \) also accounts for the number of radio locations used in the study, since marginal variances decrease as the \(r_i\)s increase and the estimates approach the real values.
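The role of \(\delta \) can be checked numerically. The sketch below evaluates the two moment formulas above and compares them with a Monte Carlo sample drawn by normalising independent gamma variables, a standard way of simulating Dirichlet vectors. The function names and the parameter values used in the check are illustrative assumptions.

```python
import random

def marginal_moments(delta, theta):
    """Marginal means and variances of a Dirichlet vector with parameters delta, theta."""
    total = sum(theta)
    means = [t / total for t in theta]
    variances = [t * (total - t) / (total ** 2 * (delta * total + 1.0)) for t in theta]
    return means, variances

def sample_dirichlet(delta, theta, rng):
    """One Dirichlet draw: independent Gamma(delta * theta_j, 1) variables, normalised."""
    g = [rng.gammavariate(delta * t, 1.0) for t in theta]
    s = sum(g)
    return [x / s for x in g]

def empirical_moments(delta, theta, n_draws=20000, seed=0):
    """Monte Carlo means and variances of each component."""
    rng = random.Random(seed)
    draws = [sample_dirichlet(delta, theta, rng) for _ in range(n_draws)]
    k = len(theta)
    means = [sum(d[j] for d in draws) / n_draws for j in range(k)]
    variances = [sum((d[j] - means[j]) ** 2 for d in draws) / (n_draws - 1) for j in range(k)]
    return means, variances
```

Evaluating `marginal_moments` for the same \({\varvec{\uptheta }}\) and two values of \(\delta \) leaves the means unchanged and shows the variances shrinking as \(\delta \) grows.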

If \(\mathbf{X}\) has a Dirichlet distribution with parameters \(\delta \) and \({\varvec{\uptheta }}\), the log-ratio transform \(\mathbf{Y}=lr_h (\mathbf{X})\) is a random vector on \(\mathbb{R}^{K-1}\) whose \(j\)-th marginal random variable \(Y_j =\ln (X_j/X_h)\) has a generalized logistic distribution of type IV with expectation

$$\begin{aligned} \text{ E }(Y_j)=\varphi (\delta \theta _j)-\varphi (\delta \theta _h) \end{aligned}$$
(16)

where \(\varphi (x)=\partial \ln \Gamma (x)/\partial x\) denotes the digamma function (e.g. Johnson et al. 1995, p. 142, Fang et al. 1990, Problem 1.5).
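Expression (16) can be evaluated with the standard library alone. The sketch below implements the digamma function through its recurrence and asymptotic series (an implementation choice made here, not taken from the paper) and compares the expected log-ratio with \(\ln (\theta _j/\theta _h)\). Indices are zero-based in the code and the function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 10, then asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def expected_log_ratio(delta, theta, j, h):
    """E(Y_j) from (16) for component j against reference component h."""
    return digamma(delta * theta[j]) - digamma(delta * theta[h])

def log_ratio_gap(delta, theta, j, h):
    """Absolute difference between E{ln(X_j/X_h)} and ln(theta_j/theta_h)."""
    return abs(expected_log_ratio(delta, theta, j, h) - math.log(theta[j] / theta[h]))
```

For a fixed \({\varvec{\uptheta }}\), `log_ratio_gap` is noticeably larger for small \(\delta \) than for large \(\delta \), which is the discrepancy between the two scales discussed in Appendix 1.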

In the case of Johnson’s second order selection, denote by \(\mathbf{a}\) the vector of portions of habitat types in the study area and suppose that \(\mathbf{X}_U\) has a Dirichlet distribution with parameters \(\delta _U\) and \(\mathbf{a}\), in such a way that \(\text{ H }_{X0}\) is true. Thus, in accordance with (16), the squared value of the unreliability measure of the CODA-based procedure turns out to be

$$\begin{aligned} \Delta ^{2}=\frac{1}{K}\sum _{h=1}^K {\sum _{j\ne h} {\left\{ \varphi (\delta _U a_j)-\varphi (\delta _U a_h)-\ln (a_j/a_h)\right\} ^{2}}} \end{aligned}$$
(17)

In a similar way, in the case of Johnson’s third order selection, suppose that \(\mathbf{X}_U\) and \(\mathbf{X}_A\) have Dirichlet distributions with the same parameter \(\mathbf{a}\) and variability parameters \(\delta _U\) and \(\delta _A\), respectively, in such a way that \(\text{ H }_{X0}\) is true. From (16), the squared value of the unreliability measure is

$$\begin{aligned} \Delta ^{2}=\frac{1}{K}\sum _{h=1}^K {\sum _{j\ne h}{\left\{ \varphi (\delta _U a_j)-\varphi (\delta _U a_h)-\varphi (\delta _A a_j)+\varphi (\delta _A a_h)\right\} ^{2}}} \end{aligned}$$
(18)
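The two unreliability measures can be computed directly. In the sketch below the second order version follows (17), and the third order version takes each summand as the difference between the two expected log-ratios given by (16), one for use and one for availability. The digamma routine (recurrence plus asymptotic series) and all function names are assumptions made for the illustration.

```python
import math

def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 10, then asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def delta2_second_order(a, delta_u):
    """Squared unreliability when availability is the constant vector a."""
    k = len(a)
    total = 0.0
    for h in range(k):
        for j in range(k):
            if j != h:
                e_use = digamma(delta_u * a[j]) - digamma(delta_u * a[h])
                total += (e_use - math.log(a[j] / a[h])) ** 2
    return total / k

def delta2_third_order(a, delta_u, delta_a):
    """Squared unreliability when use and availability are both Dirichlet with mean a."""
    k = len(a)
    total = 0.0
    for h in range(k):
        for j in range(k):
            if j != h:
                e_use = digamma(delta_u * a[j]) - digamma(delta_u * a[h])
                e_avail = digamma(delta_a * a[j]) - digamma(delta_a * a[h])
                total += (e_use - e_avail) ** 2
    return total / k
```

With this form the third order measure vanishes when \(\delta _U=\delta _A\), and the second order measure shrinks as \(\delta _U\) grows.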

Appendix 3: Generating dependent compositional data

It is worth noting that \(\mathbf{X}_U\) and \(\mathbf{X}_A\) arise from the same animal and as such should realistically be presumed to be dependent random vectors. However, the general problem of constructing dependent random vectors \(\mathbf{X}_1 =\left[ X_{11}, \ldots , X_{1K}\right] ^{\mathrm{T}}\) and \(\mathbf{X}_2 =\left[ X_{21}, \ldots , X_{2K}\right] ^{\mathrm{T}}\) subject to the constraint \(\mathbf{1}^{\mathrm{T}}\mathbf{X}_1 =\mathbf{1}^{\mathrm{T}}\mathbf{X}_2 =1\) is difficult to solve in the framework of the Dirichlet model, since any pair of subvectors \(\mathbf{X}_1,\,\mathbf{X}_2\) partitioning a vector \(\mathbf{X}\) with a Dirichlet distribution turns out to be independent with marginal Dirichlet distributions (see Fang et al. 1990, Theorem 1.4).

For this purpose, it is convenient to take one vector, say \(\mathbf{X}_1\), distributed as a Dirichlet random vector with parameters \(\delta >0\) and \({\varvec{\uptheta }}\), so that \(\mathbf{1}^{\mathrm{T}}\mathbf{X}_1 =1\), and then to obtain \(\mathbf{X}_2\) as \(\mathbf{X}_1 +\mathbf{U}\), where \(\mathbf{U}\) is a random vector whose first \(K-1\) components, \(U_1, \ldots , U_{K-1}\), are random variables in the range \((-W,W)\) with

$$\begin{aligned} W=\min \left( X_{11}, \ldots , X_{1,K-1}, \frac{X_{1K}}{K-1}\right) \end{aligned}$$

and \(U_K =-(U_1 +\cdots +U_{K-1})\). Indeed, after straightforward algebra it can be proven that \(0<X_{2j} <1\) for each \(j=1,\ldots , K\), while \(\mathbf{1}^{\mathrm{T}}\mathbf{X}_2 =1\) by construction. Obviously \(\text{ E }(X_{2j})=\text{ E }(X_{1j})+\text{ E }(U_j)\), while \(\text{ V }(X_{2j})=\text{ V }(X_{1j})+\text{ V }(U_j)\), provided that \(\mathbf{X}_1\) and \(\mathbf{U}\) are independent. If \(\text{ E }(\mathbf{U})=\mathbf 0 \), then \(\mathbf{X}_1\) and \(\mathbf{X}_2\) are dependent with the same mean vector. Moreover, if the \(U_j\)s are symmetrically distributed around 0, then \(\text{ Pr }(X_{2j}>X_{1j})=0.5\) for each \(j=1,\ldots , K\). These last two features can be readily achieved if the \(U_j\)s are independent beta variables on \((-W,W)\) with shape parameters both equal to \(\beta >0\), so that they are symmetric around 0, with variance

$$\begin{aligned} \text{ V }(U_j\mid W)=\frac{W^{2}}{2\beta +1} \end{aligned}$$

Accordingly the \(U_j\text{ s }\) inflate the variances of the \(X_{1j}\) by a term which increases as \(\beta \) approaches 0.

If \(\mathbf{X}_1\) coincides with the vector of constants \(\mathbf{a}\), then, provided that \(\text{ E }(\mathbf{U})=\mathbf 0 \) and the \(U_j\)s are symmetrically distributed around 0, \(\text{ E }(\mathbf{X}_2)=\mathbf{a},\,\text{ Pr }(X_{2j} >a_j)=0.5\) and \(\text{ V }(X_{2j})=\text{ V }(U_j)\) for each \(j=1,\ldots , K\). Obviously, in this case the \(U_j\)s vary on \((-w,w)\) with \(w=\min (a_1, \ldots , a_{K-1}, \frac{a_K}{K-1})\).
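The construction of this appendix can be sketched as a short generator. The code assumes that \(\mathbf{X}_1\) is drawn by normalising independent gamma variables and that each \(U_j\) is a symmetric beta variable rescaled from \((0,1)\) to \((-W,W)\). The function names and the parameter values in any call are illustrative only.

```python
import random

def dirichlet(delta, theta, rng):
    """One Dirichlet draw with parameters delta and theta (gamma normalisation)."""
    g = [rng.gammavariate(delta * t, 1.0) for t in theta]
    total = sum(g)
    return [x / total for x in g]

def dependent_pair(delta, theta, beta, rng):
    """Return (x1, x2) with x1 Dirichlet and x2 = x1 + u, both summing to one."""
    k = len(theta)
    x1 = dirichlet(delta, theta, rng)
    # Bound W: keeps every component of x2 strictly inside (0, 1)
    w = min(min(x1[:-1]), x1[-1] / (k - 1))
    # Symmetric Beta(beta, beta) variables rescaled from (0, 1) to (-w, w)
    u = [w * (2.0 * rng.betavariate(beta, beta) - 1.0) for _ in range(k - 1)]
    # Last component makes the perturbation sum to zero
    u.append(-sum(u))
    x2 = [a + b for a, b in zip(x1, u)]
    return x1, x2

def fixed_availability_draw(a, beta, rng):
    """Variant in which x1 is the constant vector a, so only x2 is random."""
    k = len(a)
    w = min(min(a[:-1]), a[-1] / (k - 1))
    u = [w * (2.0 * rng.betavariate(beta, beta) - 1.0) for _ in range(k - 1)]
    u.append(-sum(u))
    return [x + y for x, y in zip(a, u)]
```

A call such as `dependent_pair(10.0, [0.5, 0.3, 0.2], 2.0, random.Random(1))` returns one pair of compositions. Repeating it gives dependent samples whose component means agree, up to simulation error.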

About this article

Cite this article

Fattorini, L., Pisani, C., Riga, F. et al. A permutation-based combination of sign tests for assessing habitat selection. Environ Ecol Stat 21, 161–187 (2014). https://doi.org/10.1007/s10651-013-0250-7

Keywords

  • Compositional data analysis
  • Johnson’s second order selection
  • Johnson’s third order selection
  • Monte Carlo studies
  • Multiple testing
  • Random habitat use