Abstract
Modeling compositional data is of interest in many environmental and ecological studies. One approach is to consider positive random vectors that are subject to a unit-sum constraint. In landscape ecological studies, compositional data are commonly also sampled in space, with some elements of the composition absent at certain sampling sites. In this paper, we first propose a practical spatial multivariate ordered probit model for multivariate ordinal data, where the response variables can be viewed as discretized non-negative compositions without the unit-sum constraint. We then propose a novel two-stage spatial mixture Dirichlet regression model. The first stage models the spatial dependence and the presence of exact zero values, and the second stage models all the non-zero compositional data. A maximum composite likelihood approach is developed for parameter estimation and inference in both the spatial multivariate ordered probit model and the two-stage spatial mixture Dirichlet regression model. The standard errors of the parameter estimates are computed from an estimate of the Godambe information matrix. A simulation study is conducted to evaluate the performance of the proposed models and methods. A land cover data example in landscape ecology further illustrates that accounting for spatial dependence can improve the accuracy of predicting both the presence/absence of different land covers and the magnitude of land cover compositions.
Acknowledgments
Funding for this research was provided by a USDA Cooperative State Research, Education and Extension Service (CSREES) McIntire-Stennis project and by the National Science Foundation PalEON MacroSystems Biology under grant no. DEB1241868. The authors thank Dr. Mark D.O. Adams for database development assistance. We also thank the co-editor, an associate editor, and three anonymous referees for constructive comments that improved the content and presentation of this paper.
Additional information
Handling Editor: Pierre Dutilleul.
Appendices
Appendix 1: Technical details for Sect. 2.2
To obtain the first-order derivatives efficiently, for each response type we define a new parameter vector that contains the cutoff parameters and regression coefficients: \(\tilde{{\varvec{\beta }}}_i = (\alpha _{i0}, \alpha _{i1}, {\varvec{\alpha }}_i', \alpha _{iK}, {\varvec{\beta }}_i')'\), and let \(\tilde{{\varvec{\beta }}} = (\tilde{{\varvec{\beta }}}_1', \ldots , \tilde{{\varvec{\beta }}}_I')'\). We set \(\alpha _{i0} \equiv -\infty , \alpha _{i1} \equiv 0\), and \(\alpha _{iK} \equiv +\infty \) only to simplify the notation; because these values are fixed, the corresponding derivatives are not included in the score function. For response type i, we define new design matrices for the upper and lower limits in the integrals of (7) as
respectively, where \({\varvec{e}}_n\) is a \((p+1)\)-dimensional basis vector with the nth entry being 1 and others being 0. For example, if \(y_{ij} = 1\), then \({\varvec{e}}_{y_{ij}+1} = (0,1,0, \ldots ,0)'\). We then define \({\varvec{X}}_{\text {up}}\) and \({\varvec{X}}_{\text {lo}}\) as the matrices that include design matrices for all response types, that is,
It follows that the bivariate probability \(P(Y_{ij} = y_{ij}, Y_{i'j'}=y_{i'j'})\) can be rewritten as
where \(\tilde{{\varvec{\theta }}} = (\tilde{{\varvec{\beta }}}', {\varvec{\gamma }}', \sigma ^2, \rho )'\), and \({\varvec{x}}_{\text {up},(i-1)N+j}\) and \({\varvec{x}}_{\text {lo},(i-1)N+j}\) are the \(\{(i-1)N+j\}\)th row of \({\varvec{X}}_{\text {up}}\) and \({\varvec{X}}_{\text {lo}}\), respectively.
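As a rough illustration of how a pairwise probability of two ordinal responses can be assembled from bivariate normal CDFs evaluated at upper and lower limits, the following Python sketch computes the standard rectangle probability. It is only a sketch under stated assumptions: the function names (`Phi2`, `pair_prob`), the toy cutoffs, the latent means, the category indexing (0, 1, 2), and the fixed correlation `rho` are all hypothetical, and the bivariate CDF is evaluated by a simple Simpson rule rather than by whatever routine the authors used.

```python
import math


def phi1(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi1(t):
    """Standard normal CDF (math.erf handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def Phi2(a, b, rho, n=2000, lower=-8.0):
    """Bivariate standard normal CDF P(Z1 <= a, Z2 <= b) with correlation rho.

    Uses Phi2(a, b) = int_{-inf}^{a} phi1(t) Phi1((b - rho t)/sqrt(1 - rho^2)) dt,
    approximated by composite Simpson's rule on [lower, min(a, -lower)].
    """
    a = min(a, -lower)
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n

    def f(t):
        return phi1(t) * Phi1((b - rho * t) / s)

    total = f(lower) + f(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lower + k * h)
    return total * h / 3.0


def pair_prob(lo1, up1, lo2, up2, rho):
    """P(lo1 < Z1 <= up1, lo2 < Z2 <= up2) as a signed sum of four CDFs."""
    return (Phi2(up1, up2, rho) - Phi2(lo1, up2, rho)
            - Phi2(up1, lo2, rho) + Phi2(lo1, lo2, rho))


# Toy setting (all values hypothetical): three ordered categories 0, 1, 2,
# cutoffs (-inf, 0, 1.2, +inf), latent means m1 and m2, latent correlation rho.
cut = [-math.inf, 0.0, 1.2, math.inf]
m1, m2, rho = 0.4, -0.3, 0.5
table = [[pair_prob(cut[y1] - m1, cut[y1 + 1] - m1,
                    cut[y2] - m2, cut[y2 + 1] - m2, rho)
          for y2 in range(3)] for y1 in range(3)]
```

The entries of `table` sum to one up to quadrature error, and each row sum recovers the corresponding univariate ordered probit probability, which is a quick sanity check on the signs of the four CDF terms.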
Further, let \(\tilde{{\varvec{x}}}_{\text {up},ij} = {\varvec{x}}_{\text {up},(i-1)N+j}\) and \(\tilde{{\varvec{x}}}_{\text {lo},ij} = {\varvec{x}}_{\text {lo},(i-1)N+j}\). The derivatives for any of the bivariate CDFs in (14) can be obtained as
where \(\phi _1(t) = (2\pi )^{-1/2}\exp \left( -t^2/2\right) \), \(\xi (ij,i'j') = \big (\tilde{{\varvec{x}}}_{i'j'}'\tilde{{\varvec{\beta }}}- \tilde{\rho }_{ii'jj'}\tilde{{\varvec{x}}}_{ij}'\tilde{{\varvec{\beta }}}\big )(1-\tilde{\rho }_{ii'jj'}^2)^{-1/2}\), and \(\tilde{{\varvec{x}}}_{ij}\) is either \(\tilde{{\varvec{x}}}_{\text {up},ij}\) or \(\tilde{{\varvec{x}}}_{\text {lo},ij}\). The partial derivatives with respect to \(\gamma _{i_1i_2}\), \(\sigma ^2\), and \(\rho \) are obtained by the chain rule, that is,
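The role of \(\phi _1\) and \(\xi \) in these derivatives can be checked numerically. The Python sketch below relies on two standard identities for the bivariate standard normal CDF \(\Phi _2(u, v; r)\): its derivative with respect to \(u\) is \(\phi _1(u)\Phi _1\{(v - ru)(1-r^2)^{-1/2}\}\), and its derivative with respect to \(r\) is the bivariate density at \((u, v)\). Everything else is an assumption made for illustration: the function names, the two toy design rows `x1` and `x2`, the coefficient vector `beta`, and the fixed correlation are hypothetical, and the code is not taken from the paper.

```python
import math


def phi1(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi1(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def Phi2(a, b, rho, n=2000, lower=-8.0):
    """Bivariate standard normal CDF by Simpson's rule (illustrative only)."""
    a = min(a, -lower)
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = phi1(lower) * Phi1((b - rho * lower) / s) + phi1(a) * Phi1((b - rho * a) / s)
    for k in range(1, n):
        t = lower + k * h
        total += (4.0 if k % 2 else 2.0) * phi1(t) * Phi1((b - rho * t) / s)
    return total * h / 3.0


def phi2(a, b, rho):
    """Bivariate standard normal density; equals dPhi2/drho."""
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def xi(u, v, rho):
    """xi in the form used in the text: (v - rho*u) / sqrt(1 - rho^2)."""
    return (v - rho * u) / math.sqrt(1.0 - rho * rho)


def dot(x, b):
    return sum(xk * bk for xk, bk in zip(x, b))


def grad_beta(x1, x2, beta, rho):
    """Chain-rule gradient of Phi2(x1'beta, x2'beta; rho) with respect to beta."""
    u, v = dot(x1, beta), dot(x2, beta)
    w1 = phi1(u) * Phi1(xi(u, v, rho))  # derivative w.r.t. the first limit
    w2 = phi1(v) * Phi1(xi(v, u, rho))  # derivative w.r.t. the second limit
    return [w1 * a + w2 * b for a, b in zip(x1, x2)]


def numeric_grad_beta(x1, x2, beta, rho, step=1e-5):
    """Central finite differences, used only to check grad_beta."""
    out = []
    for k in range(len(beta)):
        bp = list(beta); bp[k] += step
        bm = list(beta); bm[k] -= step
        out.append((Phi2(dot(x1, bp), dot(x2, bp), rho)
                    - Phi2(dot(x1, bm), dot(x2, bm), rho)) / (2.0 * step))
    return out


# Hypothetical toy inputs.
x1, x2, beta, rho = [1.0, -0.5], [1.0, 0.8], [0.3, 0.7], 0.4
analytic = grad_beta(x1, x2, beta, rho)
numeric = numeric_grad_beta(x1, x2, beta, rho)
```

Comparing `analytic` with `numeric` gives a simple check of the closed form. The same finite-difference idea applies to the correlation: differencing `Phi2` in `rho` should reproduce `phi2`, which is the factor that the chain rule then multiplies by the derivative of the correlation with respect to each dependence parameter.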
Appendix 2: Technical details for Sect. 3.1
The conditional density is given by
Thus, the conditional expectations can be expressed as
for \(i \ne i'\), where \({\varvec{Y}}^{(j)} = (Y_{1j}, \ldots , Y_{Ij})'\) with \(Y_{Ij} \equiv 1\).
The covariance between two response types at any two different sites j and \(j'\) is given by
where \(P_\mathrm{B}(\cdot )\) is the probability mass function of a multivariate Bernoulli distribution. The third equality in (18) follows from the definition of expectation, and the last equality holds because \(E(\tilde{Y}_{ij} \tilde{Y}_{i'j'} | {\varvec{y}}^{(j)}, {\varvec{y}}^{(j')}) = E(\tilde{Y}_{ij} | {\varvec{y}}^{(j)}) E(\tilde{Y}_{i'j'} |{\varvec{y}}^{(j')})\) by definition (9).
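As a reading aid, the chain of equalities described here can be sketched as follows. This is a reconstruction from the surrounding text, assuming that the covariance is expanded by conditioning on the presence/absence vectors at the two sites; the summation range and the layout are assumptions and may differ from the original display (18).

```latex
% Sketch only; \varvec is the bold-vector macro used throughout this article.
\begin{aligned}
\mathrm{Cov}(\tilde{Y}_{ij}, \tilde{Y}_{i'j'})
 &= E(\tilde{Y}_{ij}\tilde{Y}_{i'j'}) - E(\tilde{Y}_{ij})\,E(\tilde{Y}_{i'j'}) \\
 &= E\{E(\tilde{Y}_{ij}\tilde{Y}_{i'j'} \mid {\varvec{Y}}^{(j)}, {\varvec{Y}}^{(j')})\}
    - E(\tilde{Y}_{ij})\,E(\tilde{Y}_{i'j'}) \\
 &= \sum_{{\varvec{y}}^{(j)},\, {\varvec{y}}^{(j')}}
    E(\tilde{Y}_{ij}\tilde{Y}_{i'j'} \mid {\varvec{y}}^{(j)}, {\varvec{y}}^{(j')})\,
    P_\mathrm{B}({\varvec{y}}^{(j)}, {\varvec{y}}^{(j')})
    - E(\tilde{Y}_{ij})\,E(\tilde{Y}_{i'j'}) \\
 &= \sum_{{\varvec{y}}^{(j)},\, {\varvec{y}}^{(j')}}
    E(\tilde{Y}_{ij} \mid {\varvec{y}}^{(j)})\,
    E(\tilde{Y}_{i'j'} \mid {\varvec{y}}^{(j')})\,
    P_\mathrm{B}({\varvec{y}}^{(j)}, {\varvec{y}}^{(j')})
    - E(\tilde{Y}_{ij})\,E(\tilde{Y}_{i'j'})
\end{aligned}
```

Under this reading, spatial dependence between the compositions at sites j and \(j'\) enters only through the joint probability \(P_\mathrm{B}\) of the two presence/absence vectors, since the conditional expectations factor across sites.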
Cite this article
Feng, X., Zhu, J., Lin, PS. et al. Composite likelihood approach to the regression analysis of spatial multivariate ordinal data and spatial compositional data with exact zero values. Environ Ecol Stat 24, 39–68 (2017). https://doi.org/10.1007/s10651-016-0360-0
DOI: https://doi.org/10.1007/s10651-016-0360-0