Effective Sample Size for Line Transect Sampling Models with an Application to Marine Macroalgae

Abstract

This paper provides a framework for estimating the effective sample size in a spatial regression model when the data have been collected along line transects and exhibit serial correlation arising from the chronological order in which the observations were taken. We propose a linear regression model with a partially linear covariance structure to compute the effective sample size when both spatial and serial correlations are present. A recursive algorithm is described that estimates the linear and nonlinear parameters of the covariance structure separately. The kriging equations are also presented in order to compare the kriging variance under our proposal with that of a typical spatial regression model. An application to marine macroalgae, which motivated the present work, is also presented.

References

  1. Anderson, T. W. (1973), Asymptotically efficient estimation of the covariance matrices with linear structure, The Annals of Statistics 1: 135–141.

  2. Banerjee, S., Carlin, B., and Gelfand, A. (2004), Hierarchical Modeling and Analysis for Spatial Data, Chapman & Hall, London.

  3. Barndorff-Nielsen, O., Kent, J., Sørensen, M. (1982). Normal variance-mean mixtures and \(z\) distributions. International Statistical Review 50: 145–159.

  4. Box, G. (1954a), Some theorems on quadratic forms applied in the study of analysis of variance problems. I. Effect of inequality of variance in the one–way classification, Annals of Mathematical Statistics 25: 290–302.

  5. Box, G. (1954b), Some theorems on quadratic forms applied in the study of analysis of variance problems. II. Effects of inequality of variance and of correlation between errors in the two–way classification, Annals of Mathematical Statistics 25: 484–498.

  6. Box, G. E. P., and Cox, D. R. (1964), An analysis of transformations, Journal of the Royal Statistical Society. Series B 26: 211–252.

  7. Brockwell, P., and Davis, R. (2006), Time Series: Theory and Methods, Springer, New York.

  8. Clifford, P., Richardson, S., and Hémon, D. (1989), Assessing the significance of the correlation between two spatial processes, Biometrics 45: 123–134.

  9. Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.

  10. Cressie, N., and Lahiri, S. N. (1996), Asymptotics for REML estimation of spatial covariance parameters, Journal of Statistical Planning and Inference 50: 327–341.

  11. Crujeiras, R. M., and Van Keilegom, I. (2010), Least squares estimation of non linear spatial trends, Computational Statistics and Data Analysis 54: 452–465.

  12. Dale, M. R. T., and Fortin, M.-J. (2009), Spatial autocorrelation and statistical tests: some solutions, Journal of Agricultural, Biological, and Environmental Statistics 14: 188–206.

  13. Dutilleul, P. (1993), Modifying the \(t\) test for assessing the correlation between two spatial processes, Biometrics 49: 305–314.

  14. Faes, C., Molenberghs, G., Aerts, M., Verbeke, G., and Kenward, M. (2009), Effective sample size and an alternative small-sample degrees-of-freedom method, The American Statistician 63: 389–399.

  15. Field, C., and Genton, M. G. (2006). The multivariate \(g\) distribution, Technometrics 48: 104–111.

  16. Griffith, D. (2005), Effective geographic sample size in the presence of spatial autocorrelation, Annals of the Association of American Geographers 95: 740–760.

  17. Headrick, T. C., Kowalchuk, R. K., and Sheng, Y. (2008). Parametric probability densities and distribution functions for Tukey \(g\) transformations and their use for fitting data, Applied Mathematical Sciences 2: 449–462.

  18. Hedley, S. L., and Buckland, S. T. (2004), Spatial models for line transect sampling, Journal of Agricultural, Biological, and Environmental Statistics 9: 181–199.

  19. Kutner, M., Nachtsheim, C., Neter, J., and Li, W. (2004), Applied Linear Statistical Models, McGraw-Hill/Irwin, Homewood, IL.

  20. Mardia, K. V., and Marshall, R. J. (1984), Maximum likelihood estimation of models for residual covariance in spatial regression, Biometrika 71: 135–146.

  21. Protassov, R. S. (2004). EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions with fixed \(\lambda \), Statistics and Computing 14: 67–77.

  22. R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

  23. Rasmussen, C. E., and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, The MIT Press, Massachusetts.

  24. Ribeiro, P. J., and Diggle, P. J. (2015). geoR: Analysis of Geostatistical Data. R package version 1.7-5.1. http://CRAN.R-project.org/package=geoR

  25. Rubin, D. B., Szatrowski, T. H. (1982), Finding maximum likelihood estimates of patterned covariance matrices by the EM algorithm, Biometrika 69: 657–660.

  26. Szatrowski, T. H. (1980), Necessary and sufficient conditions for explicit solutions in the multivariate normal estimation problem for patterned means and covariances, The Annals of Statistics 8: 802–810.

  27. Vallejos, R., and Osorio, F. (2014), Effective sample size of spatial process models, Spatial Statistics 9: 66–92.

  28. Vásquez, J. A., and Santelices, B. (1990), Ecological effects of harvesting Lessonia (Laminariales, Phaeophyta) in central Chile, Hydrobiologia 204/205: 41–47.

Acknowledgments

Ronny Vallejos was partially supported by FONDECYT Grant 1120048, Chile, AC3E Grant FB-0008, and USM Grant 12.15.09. Felipe Osorio was partially supported by FONDECYT Grant 1140580. Jonathan Acosta was partially supported by PIIC at UTFSM, Chile. The authors are indebted to Luis Aris, Luis Figueroa, and Carlos Cortés from IFOP for providing the macroalgae dataset and for helpful discussions. The authors are also grateful to Diego Alvarez from UTFSM for providing preliminary computational results regarding the macroalgae dataset. In addition, the authors would like to thank Dr. Emilio Porcu at UTFSM for his constant support. The authors also thank two anonymous referees, an associate editor, and the editor of JABES for suggestions that improved the manuscript.

Author information

Corresponding author

Correspondence to Ronny Vallejos.

Appendices

Appendix 1: The Estimation Algorithm

[Algorithm listings (figures a and b) are not reproduced here.]

Appendix 2: The Box–Cox Transformation

The following function was proposed by Box and Cox (1964):

$$\begin{aligned} Z_{\varvec{\delta }}(\varvec{s}) = \displaystyle \left\{ \begin{array}{ccc} \frac{(Y(\varvec{s})+\delta _2)^{\delta _1}-1}{\delta _1} &{};&{} \delta _1\ne 0,\\ \ln (Y(\varvec{s})+\delta _2)&{};&{}\delta _1=0, \end{array}\right. \end{aligned}$$
(18)

where \(Y(\varvec{s})\) is the original variable and \(\varvec{\delta }=(\delta _1,\delta _2)\) is an unknown parameter vector to be estimated so that the transformed variable \(Z_{\varvec{\delta }}(\cdot )\) is approximately normal. Given the vector of transformed spatial observations \((Z_{\varvec{\delta }}(\varvec{s}_1),\ldots ,Z_{\varvec{\delta }}(\varvec{s}_n))^{\top }\) with sample mean \(\bar{Z}_{\varvec{\delta }}=n^{-1}\sum _{i=1}^{n}Z_{\varvec{\delta }}(\varvec{s}_i)\), \(\varvec{\delta }\) can be estimated (Box and Cox 1964) by maximizing the profile log-likelihood function

$$\begin{aligned} L(\varvec{\delta })= -\frac{n}{2} \ln \left( \frac{1}{n}\sum _{i=1}^{n}\left( Z_{\varvec{\delta }}(\varvec{s}_i)-\bar{Z}_{\varvec{\delta }}\right) ^2\right) +(\delta _1-1)\sum _{i=1}^{n}\ln (Y(\varvec{s}_i)+\delta _2). \end{aligned}$$
(19)
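
As an illustration only, the transformation (18) and the criterion (19) can be sketched in a few lines of Python. The function names `box_cox`, `profile_loglik`, and `fit_delta1` are hypothetical, the shift \(\delta _2\) is treated as fixed, and a plain grid search over \(\delta _1\) stands in for whatever optimizer one would use in practice; none of this is taken from the authors' implementation.

```python
import numpy as np

def box_cox(y, delta1, delta2=0.0):
    """Shifted Box-Cox transform, as in equation (18)."""
    shifted = np.asarray(y, dtype=float) + delta2
    if np.any(shifted <= 0):
        raise ValueError("y + delta2 must be strictly positive")
    if abs(delta1) < 1e-12:           # delta1 = 0 branch: logarithm
        return np.log(shifted)
    return (shifted**delta1 - 1.0) / delta1

def profile_loglik(y, delta1, delta2=0.0):
    """Criterion (19): profile log-likelihood up to an additive constant."""
    z = box_cox(y, delta1, delta2)
    n = z.size
    log_jacobian = np.sum(np.log(np.asarray(y, dtype=float) + delta2))
    return -0.5 * n * np.log(np.mean((z - z.mean())**2)) + (delta1 - 1.0) * log_jacobian

def fit_delta1(y, delta2=0.0, grid=None):
    """Grid search for delta1 with delta2 held fixed (an assumption of this sketch)."""
    grid = np.linspace(-2.0, 2.0, 401) if grid is None else np.asarray(grid)
    values = [profile_loglik(y, d, delta2) for d in grid]
    return float(grid[int(np.argmax(values))])

# Simulated lognormal data: the log transform (delta1 near 0) should be favored.
rng = np.random.default_rng(0)
y = np.exp(rng.normal(0.0, 1.0, size=1000))
print(fit_delta1(y))
```

Note that the criterion treats the observations as independent; it is used here only to choose the transformation, not to model the spatial or serial dependence.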

An alternative way to estimate \(\varvec{\delta }\) is to find the value that maximizes the correlation between \(\Phi ^{-1}\left( (i-0.5)/n\right) \) and \(Z_{(i)},\) where \(\Phi ^{-1}\) is the inverse of the standard normal cumulative distribution function and \(Z_{(i)}\) is the \(i\)th order statistic of the transformed observations \(Z(\varvec{s}_1),\ldots ,Z(\varvec{s}_n)\), for \(i=1,\ldots ,n\) (see Kutner et al. 2004).
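
The correlation-based alternative can be sketched in the same spirit. The names `normal_scores`, `ppcc`, and `fit_delta1_ppcc` are hypothetical, standard normal quantiles are assumed for \(\Phi ^{-1}\) (the correlation does not depend on the location or scale of the reference normal distribution), and \(\delta _2\) is again held fixed.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    p = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()
    return np.array([std_normal.inv_cdf(pi) for pi in p])

def ppcc(y, delta1, delta2=0.0, scores=None):
    """Correlation between normal scores and the ordered transformed sample."""
    shifted = np.asarray(y, dtype=float) + delta2
    if abs(delta1) < 1e-12:
        z = np.log(shifted)
    else:
        z = (shifted**delta1 - 1.0) / delta1
    if scores is None:
        scores = normal_scores(z.size)
    return float(np.corrcoef(scores, np.sort(z))[0, 1])

def fit_delta1_ppcc(y, delta2=0.0, grid=None):
    """Grid search for the delta1 that maximizes the correlation."""
    grid = np.linspace(-2.0, 2.0, 81) if grid is None else np.asarray(grid)
    scores = normal_scores(len(y))    # computed once and reused
    values = [ppcc(y, d, delta2, scores) for d in grid]
    return float(grid[int(np.argmax(values))])

rng = np.random.default_rng(0)
y = np.exp(rng.normal(0.0, 1.0, size=1000))
print(fit_delta1_ppcc(y))
```

Unlike criterion (19), this approach needs no Jacobian term, since it only measures how straight the normal probability plot of the transformed sample is.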

Using the logarithmic transformation (\(\delta _1=0\)), it is possible to obtain an approximate expression that relates the effective sample size (ESS) of the original variable to that of the transformed variable.

Proposition 1

Suppose that \(\varvec{Z}=(Z(\varvec{s}_1),\ldots ,Z(\varvec{s}_n))^{\top }\sim \mathcal {N}(\varvec{\mu }_{Z},\varvec{\Sigma }_Z )\) and consider the transformation

$$\begin{aligned} Z(\varvec{s}_i)=\ln ( Y(\varvec{s}_i)+\delta _2), \quad i=1,\dots ,n, \end{aligned}$$
(20)

where \(\varvec{Y}=(Y(\varvec{s}_1),\dots ,Y(\varvec{s}_n))^{\top }\) is the original spatial sample. Let \(\varvec{R}_{\varvec{Z}}\) and \(\varvec{R}_{\varvec{Y}}\) be the correlation matrices of \(\varvec{Z}\) and \(\varvec{Y}\), respectively, and write \(\sigma _{ij}=(\varvec{\Sigma }_{\varvec{Z}})_{ij}\). If \((\varvec{\mu }_{\varvec{Z}})_i= \mu \) and \(\sigma _{ii}=\tilde{\sigma }^2\) for \(i=1,\ldots ,n,\) then

$$\begin{aligned} (\varvec{R}_{\varvec{Y}})_{ij} = c\cdot (\varvec{R_Z})_{ij}k_{ij}+o\left( |\tilde{\sigma }^2|^{m}\right) , \end{aligned}$$
(21)

where \(\displaystyle c=\dfrac{\tilde{\sigma }^2}{\exp (\tilde{\sigma }^2)-1}\) and \(\displaystyle k_{ij}=1+\dfrac{\sigma _{ij}}{2}+\cdots +\dfrac{\sigma _{ij}^{m-1}}{m!}\). Furthermore, \((\varvec{R}_{\varvec{Y}})_{ij} = \dfrac{c}{\tilde{\sigma }^2}(e^{\sigma _{ij}}-1)\) as \(m\rightarrow \infty \).

Proof

Because \(Z(\varvec{s}_i)\) is normally distributed, \(Y(\varvec{s}_i)+\delta _2\) has a lognormal distribution, so that

$$\begin{aligned} {{\mathrm{E}}}[Y(\varvec{s}_i)]&=e^{\mu _i+0.5\tilde{\sigma }^2}-\delta _2,\\ {{\mathrm{cov}}}(Y(\varvec{s}_i),Y(\varvec{s}_j))&=e^{\mu _i+\mu _j+\tilde{\sigma }^2}\left( e^{\sigma _{ij}}-1\right) , \end{aligned}$$

for \(i,j=1,\dots ,n\).

Using a Taylor expansion for the function \(e^{\sigma _{ij}},\) one obtains

$$\begin{aligned} {{\mathrm{cor}}}(Y(\varvec{s}_i),Y(\varvec{s}_j))= & {} \dfrac{e^{\mu _i+\mu _j+\tilde{\sigma }^2}\left( e^{\sigma _{ij}}-1\right) }{e^{\mu _i+\mu _j+\tilde{\sigma }^2}\left( e^{\tilde{\sigma }^2}-1\right) }\\= & {} \dfrac{1}{e^{\tilde{\sigma }^2}-1}\sigma _{ij}\left( 1+\frac{\sigma _{ij}}{2}+\dots +\frac{\sigma ^{m-1}_{ij}}{m!}\right) +\dfrac{1}{e^{\tilde{\sigma }^2}-1}o\left( |\tilde{\sigma }^2|^{m}\right) \\= & {} \dfrac{\tilde{\sigma }^2}{e^{\tilde{\sigma }^2}-1}\dfrac{\sigma _{ij}}{\tilde{\sigma }^2}k_{ij}+o\left( |\tilde{\sigma }^2|^{m}\right) \\= & {} c\cdot (\varvec{R_Z})_{ij}k_{ij}+o\left( |\tilde{\sigma }^2|^{m}\right) , \end{aligned}$$

where \(\displaystyle {{\mathrm{cor}}}(Z(\varvec{s}_i),Z(\varvec{s}_j))=(\varvec{R_Z})_{ij}=\dfrac{\sigma _{ij}}{\tilde{\sigma }^2}\), \(\displaystyle k_{ij}=1+\dfrac{\sigma _{ij}}{2}+\cdots +\dfrac{\sigma _{ij}^{m-1}}{m!}\), and \(\displaystyle c=\dfrac{\tilde{\sigma }^2}{\exp (\tilde{\sigma }^2)-1}\). Moreover, if \(m\rightarrow \infty \) we have that

$$\begin{aligned} {{\mathrm{cor}}}(Y(\varvec{s}_i),Y(\varvec{s}_j)) = \dfrac{e^{\sigma _{ij}}-1}{e^{\tilde{\sigma }^2}-1} = \dfrac{c}{\tilde{\sigma }^2}(e^{\sigma _{ij}}-1). \end{aligned}$$

\(\square \)
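
A small numerical check of Proposition 1 may help fix ideas. The sketch below builds the exact correlation matrix \(\varvec{R}_{\varvec{Y}}\) from \(\varvec{R}_{\varvec{Z}}\) and \(\tilde{\sigma }^2\), compares it with the truncated expression \(c\cdot (\varvec{R_Z})_{ij}k_{ij}\), and evaluates the ESS of both matrices. Several ingredients are assumptions of the sketch rather than part of this appendix: the function names are hypothetical, the AR(1)-type matrix \(\rho ^{|i-j|}\) is only a convenient test case, and the formula \(\mathrm{ESS}=\varvec{1}^{\top }\varvec{R}^{-1}\varvec{1}\) is taken to be the definition of Vallejos and Osorio (2014), which is not restated in this appendix.

```python
import math
import numpy as np

def corr_y_exact(R_z, sigma2):
    """Limiting expression: (exp(sigma_ij) - 1) / (exp(sigma2) - 1)."""
    Sigma_z = sigma2 * R_z            # sigma_ij = sigma2 * (R_Z)_ij
    return np.expm1(Sigma_z) / math.expm1(sigma2)

def corr_y_truncated(R_z, sigma2, m):
    """Order-m expression c * (R_Z)_ij * k_ij from equation (21)."""
    Sigma_z = sigma2 * R_z
    c = sigma2 / math.expm1(sigma2)
    # k_ij = 1 + sigma_ij / 2 + ... + sigma_ij^(m-1) / m!
    k = sum(Sigma_z**(r - 1) / math.factorial(r) for r in range(1, m + 1))
    return c * R_z * k

def ess(R):
    """ESS = 1' R^{-1} 1 (assumed definition, see the lead-in)."""
    ones = np.ones(R.shape[0])
    return float(ones @ np.linalg.solve(R, ones))

# Illustrative test case: n = 10 sites on a transect, correlation rho^|i-j|.
n, rho, sigma2 = 10, 0.6, 0.5
idx = np.arange(n)
R_z = rho ** np.abs(idx[:, None] - idx[None, :])

R_y = corr_y_exact(R_z, sigma2)
for m in (1, 2, 4, 8):
    err = np.max(np.abs(corr_y_truncated(R_z, sigma2, m) - R_y))
    print(m, err)                     # truncation error shrinks as m grows
print(ess(R_z), ess(R_y))
```

For nonnegative \(\sigma _{ij}\) the entries of \(\varvec{R}_{\varvec{Y}}\) never exceed those of \(\varvec{R}_{\varvec{Z}}\), because \((e^{x}-1)/x\) is increasing in \(x\); the back-transformed variable is therefore no more strongly correlated than the Gaussian one.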

About this article

Cite this article

Acosta, J., Osorio, F. & Vallejos, R. Effective Sample Size for Line Transect Sampling Models with an Application to Marine Macroalgae. JABES 21, 407–425 (2016). https://doi.org/10.1007/s13253-016-0252-7

Keywords

  • Effective sample size
  • Line transects
  • Spatial association
  • ARMA processes
  • Marine macroalgae