## Abstract

This paper provides a framework for estimating the effective sample size in a spatial regression model context when the data have been sampled using a line transect scheme and there is an evident serial correlation due to the chronological order in which the observations were collected. We propose a linear regression model with a partially linear covariance structure to address the computation of the effective sample size when spatial and serial correlations are present. A recursive algorithm is described to separately estimate the linear and nonlinear parameters involved in the covariance structure. The kriging equations are also presented to explore the kriging variance between our proposal and a typical spatial regression model. An application in the context of marine macroalgae, which motivated the present work, is also presented.

### Similar content being viewed by others

## References

Anderson, T. W. (1973), Asymptotically efficient Estimation of the covariance matrices with linear structure,

*The Annals of Statistics*1: 135–141.Banerjee, S., Carlin, B., and Gelfand, A. (2004),

*Hierarchical Modeling and Analysis for Spatial Data*, Chapman & Hall, London.Barndorff-Nielsen, O., Kent, J., Sørensen, M. (1982). Normal variance-mean mixtures and \(z\) distributions.

*International Statistical Review*50: 145–159.Box, G. (1954a), Some Theorems on quadratic forms applied in the study of analysis of variance problems. I. Effect of inequality of variance in the one–way classification,

*Annals of Mathematical Statistics*25: 290–302.——– (1954b), Some theorems on quadratic forms applied in the study of analysis of variance problems. II. Effects of inequality of variance and of correlation between errors in the two–way classification,

*Annals of Mathematical Statistics*25: 484–98.Box, G. E. P., and Cox, D. R. (1964), An analysis of transformations,

*Journal of the Royal Statistical Society. Series B*26: 211–252.Brockwell, P., and Davis, R. (2006),

*Time Series: Theory and Methods*, Springer, New York.Clifford, P., Richardson, S., and Hémon, D. (1989), Assessing the significance of the correlation between two spatial processes,

*Biometrics*45: 123–34.Cressie, N. (1993),

*Statistics for Spatial Data*, Wiley, New York.Cressie, N., and Lahiri, S. N. (1996), Asymptotics for REML estimation of spatial covariance parameters,

*Journal of Statistical Planning and Inference*50: 327–341.Crujeiras, R. M., and Van Keilegom I. (2010), Least squares estimation of non linear spatial trends,

*Computational Statistics and Data Analysis*54: 452–465.Dale, M. R. T., and Fortin M-J. (2009), Spatial autocorrelation and statistical tests: some solutions,

*Journal of the Agricultural, Biological, and Environmental Statistics*14: 188–206.Dutilleul, P. (1993), Modifying the \(t\) test for assessing the correlation between two spatial processes,

*Biometrics*49: 305–314.Faes, C., Molenberghs, G., Aerpts, M., Verbeke, G., and Kenward, M. (2009), Effective sample size and an alternative small-sample degrees-of-freedom method,

*The American Statistician*63: 389–399.Field, C., and Genton, M. G. (2006). The multivariate \(g\) distribution,

*Technometrics*48: 104–111.Griffith, D. (2005), Effective geographic sample size in the presence of spatial autocorrelation,

*Annals of the Association of American Geographers*95: 740–760.Headrick, T. C., Kowalchuk, R. K., and Sheng, Y. (2008). Parametric probability densities and distribution functions for Tukey \(g\) transformations and their use for fitting data,

*Applied Mathematical Sciences*2: 449–462.Hedley, S. L., and Buckland, S. T. (2004), Spatial Models for line transect sampling,

*Journal of the Agricultural, Biological, and Environmental Statistics*9: 181–199.Kutner, M., Nachtsheim, C., Neter, J., and Li, W. (2004),

*Applied Linear Statistical Models*, McGraw-Hill/Irwin, Homewood, IL.Mardia, K. V., and Marshall, R. J. (1984), Maximum likelihood estimation of models for residual covariance in spatial regression,

*Biometrika*71: 135–146.Protassov, R. S. (2004). EM-based maximum likelihood parameter estimation for multivariate generalized hyperbolic distributions with fixed \(\lambda \),

*Statistics and Computing*14: 67–77.R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

Rasmussen, C. E., and William, C. K. I. (2006),

*Gaussian Processes for Machine Learning*, The MIT Press, Massachusetts.Ribeiro, P. J., Diggle, P. J. (2015). geoR: Analysis of Geostatistical Data. R package version 1.7-5.1.http://CRAN.R-project.org/package=geoR

Rubin, D. B., Szatrowski, T. H. (1982), Finding maximum likelihood estimates of patterned covariance matrices by the EM algorithm,

*Biometrika*69: 657–660.Szatrowski, T. H. (1980), Necessary and sufficient conditions for explicit solutions in the multivariate normal estimation problem for patterned means and covariances,

*The Annals of Statistics*8: 802–810.Vallejos, R., and Osorio, F. (2014), Effective sample size of spatial process models,

*Spatial Statistics*9: 66–92.Vásquez, J., B., and Santelices, J. A. (1990), Ecological effects of harvesting Lessonia (Laminariales, Phaeophyta) in central Chile,

*Hydrobiología*204/205: 41–47.

## Acknowledgments

Ronny Vallejos was partially supported by Fondecyt Grant 1120048, Chile, AC3E Grant FB-0008, and USM Grant 12.15.09. Felipe Osorio was partially supported by FONDECYT Grant 1140580. Jonathan Acosta was partially supported by PIIC at UTFSM, Chile. The authors are indebted to Luis Aris, Luis Figueroa, and Carlos Cortés from IFOP for providing the macroalgae dataset and for helpful discussions. The authors are also grateful to Diego Alvarez from UTFSM for providing preliminary computational results regarding the macroalgae dataset. In addition, the authors would like to thank Dr. Emilio Porcu at UTFSM for his constant support. The authors acknowledge the suggestions from two anonymous referees and an associate editor and the editor of JABES that improved the manuscript.

## Author information

### Authors and Affiliations

### Corresponding author

## Appendices

### Appendix 1: The Estimation Algorithm

### Appendix 2: The Box–Cox Transformation

The following function was proposed by Box and Cox (1964):

where \(Y(\varvec{s})\) is the original variable and \(\varvec{\delta }=(\delta _1,\delta _2)\) is an unknown parameter vector to be estimated to achieve normality of the transformed variable \(Z(\cdot ).\) Given the vector of spatial observations \((Z(\varvec{s}_1),\ldots ,Z(\varvec{s}_n))^{\top }\), \(\varvec{\delta }\) can be estimated (Box and Cox 1964) by maximizing the likelihood function

An alternative way to estimate \(\varvec{\delta }\) is to find the optimal value that maximizes the correlation between \(\Phi ^{-1}\left( (i-0.5)/n\right) \) and \(Z_{(i)},\) where \(\Phi ^{-1}\) is the inverse of the cumulative distribution function of \(Z(\varvec{s}_i)\) and \(Z_{(i)}\) is the order statistic associated with \(Z(\varvec{s}_i)\), for \(i=1,\ldots ,n\) (see Kutner et al. 2004).

Using the logarithm transformation (\(\delta _1=0\)), it is possible to obtain an approximated expression that relates the ESS for the original and transformed variables.

###
**Proposition 1**

Suppose that \(\varvec{Z}=(Z(\varvec{s}_1),\ldots ,Z(\varvec{s}_n))^{\top }\sim \mathcal {N}(\varvec{\mu }_{Z},\varvec{\Sigma }_Z )\) and consider the transformation

where \(\varvec{Y}=(Y(\varvec{s}_1),\dots ,Y(\varvec{s}_n))^{\top }\) is the original spatial sample. Let \(\varvec{R}_{\varvec{Z}}\) and \(\varvec{R}_{\varvec{Y}}\) be the correlation matrices of \(\varvec{Z}\) and \(\varvec{Y}\), respectively. If \((\varvec{\mu }_{\varvec{Z}})_i= \mu ,\) and \((\varvec{\Sigma }_{\varvec{Z}})_{ii}=\sigma _{ii}=\widetilde{\sigma }^2, i=1,\ldots ,n,\) then

where \(\displaystyle c=\dfrac{\tilde{\sigma }^2}{\exp (\tilde{\sigma }^2)-1}\) and \(\displaystyle k_{ij}=1+\dfrac{\sigma _{ij}}{2}+\cdots +\dfrac{\sigma _{ij}^{m-1}}{m!}\). Furthermore, \((\varvec{R}_{\varvec{Y}})_{ij} = \dfrac{c}{\tilde{\sigma }^2}(e^{\sigma _{ij}}-1)\) as \(m\rightarrow \infty \).

###
*Proof*

Because \(Z(\varvec{s}_i)\) is normally distributed, \(Y(\varvec{s}_i)\) has a lognormal distribution with

for \(i,j=1,\dots ,n\).

Using a Taylor expansion for the function \(e^{\sigma _{ij}},\) one obtains

where \(\displaystyle {{\mathrm{cor}}}(Z(\varvec{s}_i),Z(\varvec{s}_j))=(\varvec{R_Z})_{ij}=\dfrac{\sigma _{ij}}{\tilde{\sigma }^2}\), \(\displaystyle k_{ij}=1+\dfrac{\sigma _{ij}}{2}+\cdots +\dfrac{\sigma _{ij}^{m-1}}{m!}\), and \(\displaystyle c=\dfrac{\tilde{\sigma }^2}{\exp (\tilde{\sigma }^2)-1}\). Moreover, if \(m\rightarrow \infty \) we have that

\(\square \)

## Rights and permissions

## About this article

### Cite this article

Acosta, J., Osorio, F. & Vallejos, R. Effective Sample Size for Line Transect Sampling Models with an Application to Marine Macroalgae.
*JABES* **21**, 407–425 (2016). https://doi.org/10.1007/s13253-016-0252-7

Received:

Accepted:

Published:

Issue Date:

DOI: https://doi.org/10.1007/s13253-016-0252-7