Abstract
Logistic regression is commonly used to estimate the association of one or more independent variables with a binary dependent outcome. In many applications, latent sources are both spatially dependent and non-Gaussian, so it is desirable to exploit both properties jointly. Spatial logistic regression is a well-established technique for including spatial dependence in logistic regression models. In this paper, we develop a spatial logistic regression model based on a valid skew-Gaussian random field. For parameter estimation, we use a Monte Carlo extension of the EM algorithm together with an approximation based on the standard logistic function. A simulation study is conducted to assess the performance of the proposed model and to compare it with a recently introduced model of established efficiency. The identifiability of the parameters is also investigated. For illustration, an application to the Meuse heavy metals dataset is presented.
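To make the model setup concrete, the following Python sketch simulates binary data from a spatial logistic regression whose latent field is skewed. The latent construction used here, \(W(s) = \alpha|U(s)| + V(s)\) with \(U\) and \(V\) independent zero-mean Gaussian fields sharing an exponential covariance, is an assumption for illustration. Its marginals are (scaled) skew-normal, but it is not necessarily the valid skew-Gaussian field adopted in the paper. The parameter values, the unit-square design and the function name `simulate_spatial_logistic` are hypothetical.

```python
import numpy as np


def simulate_spatial_logistic(n=200, beta=(-0.5, 1.0), sigma2=1.0, phi=0.2,
                              alpha=1.0, tau2=0.1, seed=0):
    """Simulate binary data from a spatial logistic model with a skewed latent field.

    Hypothetical construction, for illustration only:
        Y_i = x_i^T beta + W(s_i) + eps_i,   P(Z_i = 1 | Y_i) = 1 / (1 + exp(-Y_i)),
        W(s) = alpha * |U(s)| + V(s),
    where U and V are independent zero-mean Gaussian fields with covariance
    sigma2 * exp(-d / phi), and eps_i ~ N(0, tau2) is a nugget term.
    """
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0.0, 1.0, size=(n, 2))            # locations s_i in the unit square
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-d / phi)                          # exponential covariance matrix
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))          # small jitter for numerical stability
    U = L @ rng.standard_normal(n)                           # first Gaussian field
    V = L @ rng.standard_normal(n)                           # independent second Gaussian field
    W = alpha * np.abs(U) + V                                # right-skewed latent field
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
    eps = rng.normal(0.0, np.sqrt(tau2), size=n)             # nugget
    Y = X @ np.asarray(beta) + W + eps                       # latent logit
    p = 1.0 / (1.0 + np.exp(-Y))                             # standard logistic function
    Z = rng.binomial(1, p)                                   # observed binary responses
    return coords, X, Y, Z
```

For example, `coords, X, Y, Z = simulate_spatial_logistic(n=100, alpha=2.0)` produces a dataset with stronger skewness in the latent logit; setting `alpha=0` recovers a Gaussian latent field.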
Supplementary materials accompanying this paper appear online.
References
Afroughi S (2015) Bayesian inference of spatially correlated binary data using skew-normal latent variables with application in tooth caries analysis. Open J Stat 5:127–139
Billingsley P (2008) Probability and measure. Wiley, Hoboken
Burrough PA, McDonnell RA (1998) Principles of geographical information systems: spatial information systems and geostatistics
Chang W, Haran M, Applegate P, Pollard D (2016) Calibrating an ice sheet model using high-dimensional binary spatial data. J Am Stat Assoc 111(513):57–72
Diggle PJ, Giorgi E (2016) Model-based geostatistics for prevalence mapping in low-resource settings. J Am Stat Assoc 111(515):1096–1120
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Hardouin C (2019) A variational method for parameter estimation in a logistic spatial regression. Spatial Stat 31(1):1–45
Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecol Appl 16(1):87–98
Hosseini F, Eidsvik J, Mohammadzadeh M (2011) Approximate Bayesian inference in spatial GLMM with skew normal latent variables. Comput Stat Data Anal 55(4):1791–1806
Jaakkola TS, Jordan MI (2000) Bayesian parameter estimation via variational methods. Stat Comput 10(1):25–37
Lin P-S, Clayton MK (2005) Analysis of binary spatial data by quasi-likelihood estimating equations. Ann Stat 33(2):542–555
Mahmoudian B (2018) On the existence of some skew-Gaussian random field models. Stat Prob Lett 137:331–335
Mardia KV, Marshall RJ (1984) Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71(1):135–146
Nisa H, Mitakda MB, Astutik S et al (2019) Estimation of propensity score using spatial logistic regression. In: IOP conference series: materials science and engineering, vol 546, p 052048. IOP Publishing
Paciorek CJ (2007) Computational techniques for spatial logistic regression with large data sets. Comput Stat Data Anal 51(8):3631–3653
Rikken M, Van Rijn R (1993) Soil pollution with heavy metals: an inquiry into spatial variation, cost of mapping and the risk evaluation of Copper, Cadmium, Lead and Zinc in the floodplains of the Meuse West of Stein, The Netherlands: field study report. University of Utrecht, Utrecht
Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian J Stat 31(2):129–150
Sengupta A, Cressie N, Kahn BH, Frey R (2016) Predictive inference for big, spatial, non-Gaussian data: MODIS cloud data and its change-of-support. Aust New Zealand J Stat 58(1):15–45
Tadayon V, Rasekh A (2019) Non-Gaussian covariate-dependent spatial measurement error model for analyzing big spatial data. J Agric Biol Environ Stat 24(1):49–72
Tadayon V, Torabi M (2019) Spatial models for non-Gaussian data with covariate measurement error. Environmetrics 30(3):e2545
Tadayon V, Torabi M (2022) Sampling strategies for proportion and rate estimation in a spatially correlated population. Spatial Stat 47:100564
Tayyebi A, Delavar MR, Yazdanpanah MJ, Pijanowski BC, Saeedi S, Tayyebi AH (2010) A spatial logistic regression model for simulating land use patterns: a case study of the Shiraz metropolitan area of Iran. In: Advances in earth observation of global change. Springer, Berlin, pp 27–42
Wu W, Zhang L (2013) Comparison of spatial and non-spatial logistic regression models for modeling the occurrence of cloud cover in north-eastern Puerto Rico. Appl Geogr 37:52–62
Xie C, Huang B, Claramunt C, Chandramouli C (2005) Spatial logistic regression and GIS to model rural-urban land conversion. In: Proceedings of PROCESSUS Second International Colloquium on the Behavioural Foundations of Integrated Land-use and Transportation Models: frameworks, models and applications, pp 12–15. University of Toronto
Zhang Z, Arellano-Valle RB, Genton MG, Huser R (2021) Tractable Bayes of skew-elliptical link models for correlated binary data. arXiv preprint arXiv:2101.02233
Zhu J, Huang H-C, Wu J (2005) Modeling spatial-temporal binary data using Markov random fields. J Agric Biol Environ Stat 10(2):212
Zhu J, Zheng Y, Carroll AL, Aukema BH (2008) Autologistic regression analysis of spatial-temporal binary data via Monte Carlo maximum likelihood. J Agric Biol Environ Stat 13(1):84–98
Acknowledgements
We would like to thank the Associate Editor and two reviewers for the constructive comments and suggestions, which led to an improved version of this paper.
Ethics declarations
Author Contribution
VT conceived the study, conceptualized the review, and reviewed and revised the manuscript. The simulation study, the fitting of the model to the real data, and the writing of the whole manuscript were carried out by the first author. MMS played the major role in the theoretical part of the modeling and also identified an appropriate real dataset. Exploratory data analysis of the real data and some of the R functions were provided by the second author.
Appendix
In what follows, we use the notation \(\vartheta_i\) to denote \(\vartheta\left( s_i \right) \). Equation (4) can be written as
in which the fourth line follows from the identities \(x^\mathrm{T}Ax = \mathrm{trace}\left( x^\mathrm{T}Ax \right) \) and \(\mathrm{trace}\left( AB \right) = \mathrm{trace}\left( BA \right) \). Closer inspection, however, shows that one of the main problematic terms of (A1) is \(\sum \nolimits _i E_1\left[ \ln \left( 1 + e^{Y_i} \right) \mid \mathbf{Z} \right] \), which has no closed form. Hardouin (2019) proposed a variational method that replaces this term with an approximation based on a lower bound for the logistic function \(\kappa \left( x \right) = e^x/ \left( 1 + e^x \right) = 1/ \left( 1 + e^{-x} \right) \), studied by Jaakkola and Jordan (2000):
\[ \kappa(x) \ge \kappa(\theta)\exp\left\{ \frac{x-\theta}{2} - \lambda(\theta)\left( x^2-\theta^2 \right) \right\}, \qquad \lambda(\theta) = \frac{\tanh(\theta/2)}{4\theta} = \frac{\kappa(\theta)-1/2}{2\theta}. \]
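This variational lower bound involves the model parameters and the so-called variational parameter \(\theta \). Letting \(\Theta = \left( \theta _1, \ldots ,\theta _n \right) ^\mathrm{T}\), we apply this lower bound, with \(\theta = \theta_i\) for the \(i\)th term, to \( - \sum \nolimits _i \ln \left( 1 + e^{Y_i} \right) = \sum \nolimits _i \ln \kappa \left( - Y_i \right) \), the first term of (3). Therefore,
\[ \sum\nolimits_i \ln\kappa(-Y_i) \ge \sum\nolimits_i \left\{ \ln\kappa(\theta_i) - \frac{Y_i+\theta_i}{2} - \lambda(\theta_i)\left( Y_i^2-\theta_i^2 \right) \right\}. \tag{A2} \]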
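The bound and its termwise use can be checked numerically. The Python sketch below assumes the standard form of the Jaakkola and Jordan (2000) bound, \(\ln\kappa(x) \ge \ln\kappa(\theta) + (x-\theta)/2 - \lambda(\theta)(x^2-\theta^2)\) with \(\lambda(\theta)=\tanh(\theta/2)/(4\theta)\), applied with \(x=-Y_i\) and \(\theta=\theta_i\). The helper names (`kappa`, `lam`, `log_kappa_lower_bound`, `a2_rhs`) are illustrative and are not taken from the authors' R functions.

```python
import numpy as np


def kappa(x):
    """Standard logistic function kappa(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def lam(theta):
    """lambda(theta) = tanh(theta/2) / (4 theta), using its limit 1/8 at theta = 0."""
    theta = np.asarray(theta, dtype=float)
    safe = np.where(theta == 0.0, 1.0, theta)            # avoid division by zero
    return np.where(theta == 0.0, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))


def log_kappa_lower_bound(x, theta):
    """ln kappa(theta) + (x - theta)/2 - lambda(theta) (x^2 - theta^2) <= ln kappa(x)."""
    return np.log(kappa(theta)) + (x - theta) / 2.0 - lam(theta) * (x**2 - theta**2)


def a2_rhs(y, theta):
    """Lower bound on sum_i ln kappa(-Y_i), i.e. on -sum_i ln(1 + exp(Y_i))."""
    return np.sum(log_kappa_lower_bound(-np.asarray(y), np.asarray(theta)))
```

The bound is tight at \(\theta_i = |Y_i|\); since \(Y_i\) is latent, \(\Theta\) has to be chosen by optimization, as in the second stage of the M-step.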
By the monotonicity of expectation, taking the conditional expectation of (A2) given \(\mathbf{Z}\) preserves the inequality. We can therefore write \(Q\left( \varvec{\eta} \mid \varvec{\eta}^t \right) \ge \tilde{Q}\left( \varvec{\eta},\Theta \mid \varvec{\eta}^t,\Theta^t \right) \), where \(\tilde{Q}\) is obtained by replacing the first term of \(Q\) with the conditional expectation, given \(\mathbf{Z}\), of the right-hand side of (A2). This replacement eliminates \(E_1\left[ \ln \left( 1 + \exp \left\{ Y_i \right\} \right) \mid \mathbf{Z} \right] \) and brings \(E_8\left[ W_i^2 \mid \mathbf{Z} \right] \) and \(E_9\left[ \varepsilon_i^2 \mid \mathbf{Z} \right] \) into the inference. We then use a two-stage estimation procedure in the M-step. In the first stage, \(\tilde{Q}\left( \varvec{\eta},\Theta \mid \varvec{\eta}^t,\Theta^t \right) \) is maximized with respect to the model parameters for fixed \(\Theta \), which yields \(\tilde{Q}\left( \varvec{\eta}^{t+1},\Theta \mid \varvec{\eta}^t,\Theta^t \right) \); in the second stage, the updated variational parameters \(\Theta^{t+1}\) are obtained by maximizing \(\tilde{Q}\left( \varvec{\eta}^{t+1},\Theta \mid \varvec{\eta}^t,\Theta^t \right) \) with respect to \(\Theta \). The model parameters are updated as follows: \(\left( \tau^2 \right)^{t+1} = n^{-1} \mathbb{E}_4^t\), and \(\varvec{\beta}^{t+1}\) is obtained as the solution of the system of linear equations
in which the left-hand side can be rewritten as \(\left[ \sum \nolimits _i \mathbf{x}_i \mathbf{x}_i^\mathrm{T} \right] \varvec{\beta}^{t+1}\); hence,
Moreover,
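The two-stage M-step can be sketched as follows. Under the standard form of the Jaakkola and Jordan (2000) bound, the expected right-hand side of (A2) depends on \(\theta_i\) only through \(f(\theta_i) = \ln\kappa(\theta_i) - \theta_i/2 - \lambda(\theta_i)\{E[Y_i^2 \mid \mathbf{Z}] - \theta_i^2\}\), whose derivative simplifies to \(-\lambda'(\theta_i)\{E[Y_i^2 \mid \mathbf{Z}] - \theta_i^2\}\). Because \(\lambda\) is decreasing for \(\theta > 0\), the maximizer is \(\theta_i = \sqrt{E[Y_i^2 \mid \mathbf{Z}]}\). In a Monte Carlo EM, this expectation would be approximated by averaging over E-step draws of \(\mathbf{Y}\). The draws, the helper names, and the argument `rhs` (a generic stand-in for the right-hand side of the linear system for \(\varvec{\beta}^{t+1}\)) are hypothetical.

```python
import numpy as np


def update_variational(y_draws):
    """Second stage: theta_i = sqrt(E[Y_i^2 | Z]), estimated by a Monte Carlo
    average over draws of Y (rows = draws, columns = sites)."""
    return np.sqrt(np.mean(np.asarray(y_draws) ** 2, axis=0))


def expected_bound_term(theta, m1, m2):
    """Expected lower bound ln kappa(theta) - (E[Y] + theta)/2 - lambda(theta)(E[Y^2] - theta^2),
    given m1 = E[Y | Z] and m2 = E[Y^2 | Z]; assumes theta > 0."""
    lam = np.tanh(theta / 2.0) / (4.0 * theta)
    return np.log(1.0 / (1.0 + np.exp(-theta))) - (m1 + theta) / 2.0 - lam * (m2 - theta**2)


def solve_beta(X, rhs):
    """First stage (beta part): solve [sum_i x_i x_i^T] beta = rhs.
    The rows of X are x_i^T, so sum_i x_i x_i^T equals X^T X."""
    return np.linalg.solve(X.T @ X, rhs)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of \(\sum_i \mathbf{x}_i \mathbf{x}_i^\mathrm{T}\) explicitly.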
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tadayon, V., Saber, M.M. A Spatial Logistic Regression Model Based on a Valid Skew-Gaussian Latent Field. JABES 28, 59–73 (2023). https://doi.org/10.1007/s13253-022-00512-3
DOI: https://doi.org/10.1007/s13253-022-00512-3