
A Spatial Logistic Regression Model Based on a Valid Skew-Gaussian Latent Field

Journal of Agricultural, Biological and Environmental Statistics

Abstract

Logistic regression is commonly used to estimate the association of one or more independent variables with a binary dependent outcome. In many applications, latent sources are both spatially dependent and non-Gaussian; thus, it is desirable to exploit both properties jointly. Spatial logistic regression is a well-established technique for including spatial dependence in logistic regression models. In this paper, we develop a spatial logistic regression model based on a valid skew-Gaussian random field. For parameter estimation, we use a Monte Carlo extension of the EM algorithm along with an approximation based on the standard logistic function. A simulation study is conducted to assess the performance of the proposed model and to compare the results with those of a recently introduced model with established efficiency. The identifiability of the parameters is investigated as well. For illustration, an application to the Meuse heavy metals dataset is presented.

Supplementary materials accompanying this paper appear online.


References

  • Afroughi S (2015) Bayesian inference of spatially correlated binary data using skew-normal latent variables with application in tooth caries analysis. Open J Stat 5:127–139

  • Billingsley P (2008) Probability and measure. Wiley, Hoboken

  • Burrough PA, McDonnell RA (1998) Principles of geographical information systems: spatial information systems and geostatistics

  • Chang W, Haran M, Applegate P, Pollard D (2016) Calibrating an ice sheet model using high-dimensional binary spatial data. J Am Stat Assoc 111(513):57–72

  • Diggle PJ, Giorgi E (2016) Model-based geostatistics for prevalence mapping in low-resource settings. J Am Stat Assoc 111(515):1096–1120

  • Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472

  • Hardouin C (2019) A variational method for parameter estimation in a logistic spatial regression. Spatial Stat 31(1):1–45

  • Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecol Appl 16(1):87–98

  • Hosseini F, Eidsvik J, Mohammadzadeh M (2011) Approximate Bayesian inference in spatial GLMM with skew normal latent variables. Comput Stat Data Anal 55(4):1791–1806

  • Jaakkola TS, Jordan MI (2000) Bayesian parameter estimation via variational methods. Stat Comput 10(1):25–37

  • Lin P-S, Clayton MK et al (2005) Analysis of binary spatial data by quasi-likelihood estimating equations. Ann Stat 33(2):542–555

  • Mahmoudian B (2018) On the existence of some skew-Gaussian random field models. Stat Prob Lett 137:331–335

  • Mardia KV, Marshall RJ (1984) Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71(1):135–146

  • Nisa H, Mitakda MB, Astutik S et al (2019) Estimation of propensity score using spatial logistic regression. In: IOP conference series: materials science and engineering, vol 546, p 052048. IOP Publishing

  • Paciorek CJ (2007) Computational techniques for spatial logistic regression with large data sets. Comput Stat Data Anal 51(8):3631–3653

  • Rikken M, Van Rijn R (1993) Soil pollution with heavy metals: an inquiry into spatial variation, cost of mapping and the risk evaluation of Copper, Cadmium, Lead and Zinc in the floodplains of the Meuse West of Stein, The Netherlands: field study report. University of Utrecht, Utrecht

  • Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian J Stat 31(2):129–150

  • Sengupta A, Cressie N, Kahn BH, Frey R (2016) Predictive inference for big, spatial, non-Gaussian data: MODIS cloud data and its change-of-support. Aust New Zealand J Stat 58(1):15–45

  • Tadayon V, Rasekh A (2019) Non-Gaussian covariate-dependent spatial measurement error model for analyzing big spatial data. J Agric Biol Environ Stat 24(1):49–72

  • Tadayon V, Torabi M (2019) Spatial models for non-Gaussian data with covariate measurement error. Environmetrics 30(3):e2545

  • Tadayon V, Torabi M (2022) Sampling strategies for proportion and rate estimation in a spatially correlated population. Spatial Stat 47:100564

  • Tayyebi A, Delavar MR, Yazdanpanah MJ, Pijanowski BC, Saeedi S, Tayyebi AH (2010) A spatial logistic regression model for simulating land use patterns: a case study of the Shiraz metropolitan area of Iran. Advances in earth observation of global change. Springer, Berlin, pp 27–42

  • Wu W, Zhang L (2013) Comparison of spatial and non-spatial logistic regression models for modeling the occurrence of cloud cover in north-eastern Puerto Rico. Appl Geogr 37:52–62

  • Xie C, Huang B, Claramunt C, Chandramouli C (2005) Spatial logistic regression and GIS to model rural-urban land conversion. In: Proceedings of PROCESSUS Second International Colloquium on the Behavioural Foundations of Integrated Land-use and Transportation Models: frameworks, models and applications, pp 12–15. University of Toronto

  • Zhang Z, Arellano-Valle RB, Genton MG, Huser R (2021) Tractable Bayes of skew-elliptical link models for correlated binary data. arXiv preprint arXiv:2101.02233

  • Zhu J, Huang H-C, Wu J (2005) Modeling spatial-temporal binary data using Markov random fields. J Agric Biol Environ Stat 10(2):212

  • Zhu J, Zheng Y, Carroll AL, Aukema BH (2008) Autologistic regression analysis of spatial-temporal binary data via Monte Carlo maximum likelihood. J Agric Biol Environ Stat 13(1):84–98

Acknowledgements

We would like to thank the Associate Editor and two reviewers for the constructive comments and suggestions, which led to an improved version of this paper.

Author information

Corresponding author

Correspondence to Mohammad Mehdi Saber.

Ethics declarations

Author Contribution

VT conceived the study, conceptualized the review, and reviewed and revised the manuscript. The simulation study, the fitting of the model to the real data, and the documentation of the whole manuscript were performed by the first author. MMS had the major role in the theoretical part of the modeling and also found an appropriate real data set. Exploratory data analysis of the real data and some parts of the R functions were provided by the second author.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (zip 10 KB)

Appendix


In what follows, we use the notation \({{{\vartheta }}_i}\) to denote \({{\vartheta }}\left( {{s_i}} \right) \). Equation (4) can be written as

$$\begin{aligned} Q\left( {\mathbf{{{\varvec{\eta }}}}\left| {{\mathbf{{{\varvec{\eta }}}}^t}} \right. } \right)= & {} - \sum \limits _i {{E_1}\left[ {\left. {\ln \left\{ {1 + \exp \left( {\mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}} + {\gamma }{W_i} + {\varepsilon _i}} \right) } \right\} } \right| \mathbf{{Z}}} \right] } \nonumber \\&+ \sum \limits _i {{Z_i}{} \mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}} + \gamma \sum \limits _i {{Z_i}{E_2}\left[ {{W_i}\left| \mathbf{{Z}} \right. } \right] } + \sum \limits _i {{Z_i}{E_3}\left[ {{\varepsilon _i}\left| \mathbf{{Z}} \right. } \right] } \nonumber \\&- \frac{1}{{2{\tau ^2}}}{E_4}\left[ {{\varvec{\varepsilon }}^\mathrm{T}{\varvec{\varepsilon }}\left| \mathbf{{Z}} \right. } \right] - \ln \left| {H + {\delta ^2}{I_n}} \right| - \frac{n}{2}\ln {\pi ^2}{\tau ^2}\nonumber \\&- \frac{1}{2} \mathrm{trace}\left\{ {{{\left( {H + {\delta ^2}{I_n}} \right) }^{ - 1}}{E_5}\left( {\mathbf{{WW^\mathrm{T}}}\left| \mathbf{{Z}} \right. } \right) } \right\} \nonumber \\&- \sqrt{\frac{2}{\pi }} \delta {{\mathbf{{1}}}_n^\mathrm{T}}{\left[ {H + {\delta ^2}{I_n}} \right] ^{ - 1}}{E_6}\left[ {\mathbf{{W}}\left| \mathbf{{Z}} \right. } \right] - \frac{1}{\pi }{\delta ^2}{\mathbf{{1}}_n^\mathrm{T}} {\left[ {H + {\delta ^2}{I_n}} \right] ^{ - 1}}{\mathbf{{1}}_n} \nonumber \\&+ {E_7}\left[ {\left. {\ln {\Phi _n}\left\{ {\delta {{\left( {H + {\delta ^2}{I_n}} \right) }^{ - 1}}(\mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n});\mathbf{{0}},\Delta } \right\} } \right| \mathbf{{Z}}} \right] , \end{aligned}$$
(A1)

in which the fourth line has been derived from the properties \(x^\mathrm{T}Ax = \mathrm{trace}\left( {x^\mathrm{T}Ax} \right) \) and \(\mathrm{trace}\left( {AB} \right) = \mathrm{trace}\left( {BA} \right) \). On closer inspection, however, one of the main problematic terms of (A1) is \(\sum \nolimits _i {{E_1}\left[ {\left. {\ln \left( {1 + {e^{{Y_i}}}} \right) } \right| \mathbf{{Z}}} \right] }\), where \({Y_i} = \mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }} + \gamma {W_i} + {\varepsilon _i}\). Hardouin (2019) proposed a variational method that replaces this term by an approximation built on a lower bound for the logarithm of the logistic function \(\kappa \left( x \right) = {e^x}/ \left( {1 + {e^x}} \right) = 1/ \left( {1 + {e^{ - x}}} \right) \), which had been studied by Jaakkola and Jordan (2000) as

$$\begin{aligned} \ln \kappa \left( x \right) \ge \ln \kappa \left( \theta \right) + \frac{{x - \theta }}{2} - \lambda \left( \theta \right) \left( {{x^2} - {\theta ^2}} \right) , \qquad \lambda \left( \theta \right) = \frac{{\kappa \left( \theta \right) - 1/2}}{{2\theta }}. \end{aligned}$$
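A small numerical check can make the bound concrete. The sketch below is illustrative only and is not taken from the paper's R code; the function names `kappa`, `lam`, and `log_kappa_bound` are hypothetical, and the value 1/8 used for \(\lambda (0)\) is the limit of \(\lambda (\theta )\) as \(\theta \rightarrow 0\). It evaluates the gap between \(\ln \kappa (x)\) and the quadratic lower bound on a grid, and checks that the bound is tight when \(x = \pm \theta \).

```python
import math

def kappa(x):
    """Standard logistic function kappa(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def lam(theta):
    """lambda(theta) = (kappa(theta) - 1/2) / (2 theta), with limit 1/8 at 0."""
    if abs(theta) < 1e-8:
        return 0.125
    return (kappa(theta) - 0.5) / (2.0 * theta)

def log_kappa_bound(x, theta):
    """Quadratic lower bound on ln kappa(x) with variational parameter theta."""
    return (math.log(kappa(theta)) + 0.5 * (x - theta)
            - lam(theta) * (x * x - theta * theta))

# Grid of x and theta values in [-5, 5] (an arbitrary illustrative range).
grid = [k / 4.0 for k in range(-20, 21)]
gaps = {(x, th): math.log(kappa(x)) - log_kappa_bound(x, th)
        for x in grid for th in grid}

worst_gap = min(gaps.values())  # should never be (meaningfully) negative
tight_gap = max(abs(g) for (x, th), g in gaps.items() if abs(x) == abs(th))

print("smallest gap on the grid:", worst_gap)
print("largest gap where |x| = |theta|:", tight_gap)
```

The gap is zero (up to rounding) exactly where \(x^2 = \theta ^2\), which is why the variational parameter is later updated through a second moment.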

This variational lower bound involves the model parameters and the so-called variational parameter \(\theta \). Letting \(\Theta = {\left( {{\theta _1}, \ldots ,{\theta _n}} \right) ^\mathrm{T} }\), we apply this lower bound, with \(x = - {Y_i}\) and \(\theta = {\theta _i}\), to \( - \sum \nolimits _i {\ln \left( {1 + {e^{{Y_i}}}} \right) } = \sum \nolimits _i {\ln \kappa \left( { - {Y_i}} \right) }\), the first term of (3). Therefore,

$$\begin{aligned} - \sum \nolimits _i {\ln \left( {1 + {e^{{Y_i}}}} \right) }\ge & {} \sum \nolimits _i[\ln \kappa \left( {\theta _i} \right) - \frac{{\theta _i} }{2} + {{\theta _i^2}}\lambda \left( {\theta _i} \right) ]\nonumber \\&- \frac{1}{2}\left[ {\sum \nolimits _i {\mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}+ \gamma {W_i} + {\varepsilon _i}} } \right] \nonumber \\&-\sum \nolimits _i \lambda \left( {\theta _i} \right) \left[ \frac{}{} {{\left( {\mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}} \right) }^2} + {\gamma ^2}W_i^2 + \varepsilon _i^2 \right. \nonumber \\&\left. +2\gamma \mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}{W_i} + 2\mathbf{{x}}_i^\mathrm{T} {\varvec{\beta }}{\varepsilon _i} + 2\gamma {W_i}{\varepsilon _i}\frac{}{} \right] . \end{aligned}$$
(A2)
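The expansion in (A2) can be checked numerically. The following sketch uses simulated values for \(\mathbf{x}_i^\mathrm{T} {\varvec{\beta }}\), \(W_i\), and \(\varepsilon _i\) (all hypothetical, as are the function names) and evaluates the right-hand side of (A2) term by term. It then compares the result with the exact value of \(-\sum \nolimits _i \ln (1 + e^{Y_i})\) for an arbitrary \(\Theta \) and for the choice \(\theta _i = |Y_i|\), at which the bound should be tight.

```python
import math
import random

def kappa(x):
    return 1.0 / (1.0 + math.exp(-x))

def lam(theta):
    # lambda(theta) = (kappa(theta) - 1/2) / (2 theta); the limit at 0 is 1/8.
    if abs(theta) < 1e-8:
        return 0.125
    return (kappa(theta) - 0.5) / (2.0 * theta)

def a2_right_hand_side(xb, w, eps, gamma, thetas):
    """Right-hand side of (A2), written with the same three groups of terms."""
    const = sum(math.log(kappa(t)) - t / 2.0 + t * t * lam(t) for t in thetas)
    linear = -0.5 * sum(a + gamma * b + c for a, b, c in zip(xb, w, eps))
    quad = -sum(
        lam(t) * (a * a + gamma ** 2 * b * b + c * c
                  + 2 * gamma * a * b + 2 * a * c + 2 * gamma * b * c)
        for t, a, b, c in zip(thetas, xb, w, eps)
    )
    return const + linear + quad

# Simulated inputs (illustrative assumptions, not data from the paper).
random.seed(0)
n, gamma = 50, 0.7
xb = [random.gauss(0.0, 1.0) for _ in range(n)]    # x_i' beta
w = [random.gauss(0.0, 1.0) for _ in range(n)]     # W_i
eps = [random.gauss(0.0, 0.5) for _ in range(n)]   # epsilon_i
y = [a + gamma * b + c for a, b, c in zip(xb, w, eps)]

exact = -sum(math.log1p(math.exp(v)) for v in y)
loose = a2_right_hand_side(xb, w, eps, gamma, [1.0] * n)
tight = a2_right_hand_side(xb, w, eps, gamma, [abs(v) for v in y])

print("exact:", exact)
print("bound with theta_i = 1:", loose)
print("bound with theta_i = |Y_i|:", tight)
```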

The monotonicity of expectation implies that taking the conditional expectation of (A2) given \(\mathbf{Z}\) preserves the inequality. We can therefore write \(Q\left( {{{{\varvec{\eta }}}}\left| {{{{{\varvec{\eta }}}}^t}} \right. } \right) \ge {{\tilde{Q}}}\left( {{{{\varvec{\eta }}},\Theta }\left| {{{{{\varvec{\eta }}}}^t}},\Theta ^t \right. } \right) \), where \({{\tilde{Q}}}\) is obtained by replacing the first term of Q with the conditional expectation of the right-hand side of (A2) given \(\mathbf{Z}\). This eliminates \({E_1}\left[ {\left. {\ln \left( {1 + \exp \left\{ {{Y_i}} \right\} } \right) } \right| \mathbf{{Z}}} \right] \) and brings \({E_8}\left[ {{W_i^2}\left| \mathbf{{Z}} \right. } \right] \) and \({E_9}\left[ {{\varepsilon _i^2}\left| \mathbf{{Z}} \right. } \right] \) into the inference. We then use a two-stage estimation procedure in the M-step. In the first stage, \({{\tilde{Q}}}\left( {{{{\varvec{\eta }}},\Theta }\left| {{{{{\varvec{\eta }}}}^t}},\Theta ^t \right. } \right) \) is maximized with respect to the model parameters for fixed \(\Theta \), which yields \({{\tilde{Q}}}\left( {{{{\varvec{\eta }}}^{t+1},\Theta }\left| {{{{{\varvec{\eta }}}}^t},\Theta ^t} \right. } \right) \). In the second stage, the updated variational parameters \(\Theta ^{t+1}\) are obtained by maximizing \({{\tilde{Q}}}\left( {{{{\varvec{\eta }}}^{t+1},\Theta }\left| {{{{{\varvec{\eta }}}}^t},\Theta ^t} \right. } \right) \) with respect to \(\Theta \). The updates of the model parameters are as follows. First, \({\left( {{\tau ^2}} \right) ^{t + 1}} = {n^{ - 1}} {{\mathbb {E}}}_4^t\). Next, \({\varvec{\beta }}^{t+1}\) is obtained as the solution of the system of linear equations

$$\begin{aligned} \sum \nolimits _i {\lambda \left( {\theta _i^t} \right) \left( {\mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}^{t + 1}}} \right) {\mathbf{{x}}_i}} = - \sum \nolimits _i {\left[ {\frac{1}{4} - \frac{{{Z_i}}}{2} + \lambda \left( {\theta _i^t} \right) \left( {{\gamma ^t}{\mathbb {E}}_2^t + {{\mathbb {E}}}_3^t} \right) } \right] } {\mathbf{{x}}_i}, \end{aligned}$$

in which the left-hand side can be rewritten as \(\left[ {\sum \nolimits _i {\lambda \left( {\theta _i^t} \right) {\mathbf{{x}}_i}{} \mathbf{{x}}_i^\mathrm{T} } } \right] {{\varvec{\beta }}^{t + 1}}\); then,

$$\begin{aligned} {{\varvec{\beta }}^{t + 1}} = - {\left[ {\sum \nolimits _i {\lambda \left( {\theta _i^t} \right) {\mathbf{{x}}_i}{} \mathbf{{x}}_i^\mathrm{T} } } \right] ^{ - 1}}\sum \nolimits _i {\left[ {\frac{1}{4} - \frac{{{Z_i}}}{2} + \lambda \left( {\theta _i^t} \right) \left( {{\gamma ^t}{\mathbb {E}}_2^t + {{\mathbb {E}}}_3^t} \right) } \right] {\mathbf{{x}}_i}}. \end{aligned}$$
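As a rough illustration of this step, the sketch below works directly from the part of \({{\tilde{Q}}}\) that depends on \({\varvec{\beta }}\), namely \(\sum \nolimits _i (Z_i - 1/2)\,\mathbf{x}_i^\mathrm{T} {\varvec{\beta }} - \sum \nolimits _i \lambda (\theta _i)\left[ (\mathbf{x}_i^\mathrm{T} {\varvec{\beta }})^2 + 2(\gamma {\mathbb {E}}_2 + {\mathbb {E}}_3)\,\mathbf{x}_i^\mathrm{T} {\varvec{\beta }}\right] \), which follows from (A1) and (A2). Setting its gradient to zero gives a weighted least-squares type linear system. All inputs are simulated, the arrays `e2` and `e3` merely stand in for the Monte Carlo estimates \({\mathbb {E}}_2^t\) and \({\mathbb {E}}_3^t\), and the helper names are hypothetical; this is not the authors' implementation.

```python
import math
import random

def kappa(x):
    return 1.0 / (1.0 + math.exp(-x))

def lam(theta):
    # lambda(theta) = (kappa(theta) - 1/2) / (2 theta); the limit at 0 is 1/8.
    if abs(theta) < 1e-8:
        return 0.125
    return (kappa(theta) - 0.5) / (2.0 * theta)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def surrogate_beta(beta, X, Z, lams, gamma, e2, e3):
    """beta-dependent part of the variational surrogate described above."""
    total = 0.0
    for x, z, l, w, e in zip(X, Z, lams, e2, e3):
        xb = sum(a * b for a, b in zip(x, beta))
        total += (z - 0.5) * xb - l * (xb * xb + 2.0 * (gamma * w + e) * xb)
    return total

def update_beta(X, Z, lams, gamma, e2, e3):
    """Solve [sum_i lam_i x_i x_i'] beta = sum_i c_i x_i, where
    c_i = Z_i/2 - 1/4 - lam_i (gamma e2_i + e3_i)."""
    p = len(X[0])
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for x, z, l, w, e in zip(X, Z, lams, e2, e3):
        c = 0.5 * z - 0.25 - l * (gamma * w + e)
        for j in range(p):
            b[j] += c * x[j]
            for k in range(p):
                A[j][k] += l * x[j] * x[k]
    return solve(A, b)

# Simulated inputs (illustrative assumptions only).
random.seed(1)
n, gamma = 40, 0.8
X = [[1.0, random.gauss(0.0, 1.0)] for _ in range(n)]   # intercept + covariate
Z = [random.choice([0, 1]) for _ in range(n)]           # binary responses
lams = [lam(random.uniform(0.5, 2.0)) for _ in range(n)]
e2 = [random.gauss(0.0, 1.0) for _ in range(n)]         # stand-in for E_2
e3 = [random.gauss(0.0, 0.3) for _ in range(n)]         # stand-in for E_3

beta_hat = update_beta(X, Z, lams, gamma, e2, e3)
print("updated beta:", beta_hat)
```

Because \(\lambda (\theta ) > 0\), the surrogate is concave in \({\varvec{\beta }}\), so the solution of the linear system is its maximizer whenever the weighted design matrix has full rank.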

Moreover,

$$\begin{aligned} {\gamma ^{t + 1}}&= \frac{{\sum \nolimits _i {\left[ {{Z_i} - 0.5 - 2\lambda \left( {\theta _i^t} \right) \left( {\mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}^t} + {{\mathbb {E}}}_3^t} \right) } \right] {{\mathbb {E}}}_2^t} }}{{2\sum \nolimits _i {\lambda \left( {\theta _i^t} \right) {\mathbb {E}}_8^t} }},\\ {\delta ^{t + 1}}&= \mathop {\arg \max }\limits _\delta \Big \{ - \ln \left| {{H^t} + {\delta ^2}{I_n}} \right| - \frac{1}{2}\mathrm{trace}\left\{ {{{\left( {{H^t} + {\delta ^2}{I_n}} \right) }^{ - 1}}{{\mathbb {E}}}_5^t} \right\} \\&\quad - \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n^\mathrm{T}} {\left( {{H^t} + {\delta ^2}{I_n}} \right) ^{ - 1}} {{\mathbb {E}}}_6^t - \frac{1}{\pi }{\delta ^2}{\mathbf{{1}}_n^\mathrm{T}} {\left( {{H^t} + {\delta ^2}{I_n}} \right) ^{ - 1}}{\mathbf{{1}}_n}\\&\quad + {{\mathbb {E}}}_7^t\left[ {\left. {\ln {\Phi _n}\left\{ {\delta {{\left( {{H^t} + {\delta ^2}{I_n}} \right) }^{ - 1}}\left( \mathbf{{W}} + \sqrt{\frac{2}{\pi }} \delta {\mathbf{{1}}_n}\right) ;\mathbf{{0}},{I_n} - {\delta ^2}{{\left( {{H^t} + {\delta ^2}{I_n}} \right) }^{ - 1}}} \right\} } \right| \mathbf{{Z}}} \right] \Big \} ,\\ {\psi ^{t + 1}}&= \mathop {\arg \max }\limits _\psi \Big \{ - \ln \left| {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right| - \frac{1}{2}\mathrm{trace}\left\{ {{{\left( {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right) }^{ - 1}}{{\mathbb {E}}}_5^t} \right\} \\&\quad - \sqrt{\frac{2}{\pi }} {\delta ^t} {\mathbf{{1}}_n^\mathrm{T}} {\left( {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right) ^{ - 1}} {\mathbb {E}}_6^t - \frac{1}{\pi }{\left( {{\delta ^t}} \right) ^2}{\mathbf{{1}}_n^\mathrm{T}} {\left( {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right) ^{ - 1}}{\mathbf{{1}}_n}\\&\quad + {{\mathbb {E}}}_7^t\left[ {\left. {\ln {\Phi _n}\left\{ {{\delta ^t} {{\left( {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right) }^{ - 1}}\left( \mathbf{{W}} + \sqrt{\frac{2}{\pi }} {\delta ^t} {\mathbf{{1}}_n}\right) ;\mathbf{{0}},{I_n} - {{\left( {{\delta ^t}} \right) }^2}{{\left( {H + {{\left( {{\delta ^t}} \right) }^2}{I_n}} \right) }^{ - 1}}} \right\} } \right| \mathbf{{Z}}} \right] \Big \} ,\\ {\left( {\theta _i^{t + 1}} \right) ^2}&= {\left( {\mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}^{t + 1}}} \right) ^2} + {\left( {{\gamma ^{t + 1}}} \right) ^2}{{\mathbb {E}}}_8^{t+1} + {\mathbb {E}}_9^{t+1} \\&\quad + 2{\gamma ^{t + 1}}{} \mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}^{t + 1}}{{\mathbb {E}}}_2^{t+1} + 2\mathbf{{x}}_i^\mathrm{T} {{\varvec{\beta }}^{t + 1}}{{\mathbb {E}}}_3^{t+1} + 2{\gamma ^{t + 1}}{{\mathbb {E}}}_2^{t+1} {{\mathbb {E}}}_3^{t+1}. \end{aligned}$$
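The closed-form updates for \(\gamma \) and \(\theta _i\) above are simple enough to sketch in a few lines. The code below is a hedged illustration with simulated inputs: the arrays `e2`, `e3`, `e8`, and `e9` are hypothetical stand-ins for the Monte Carlo estimates \({\mathbb {E}}_2\), \({\mathbb {E}}_3\), \({\mathbb {E}}_8\), and \({\mathbb {E}}_9\), and, mirroring the displayed formulas, the cross moment of \(W_i\) and \(\varepsilon _i\) is taken to be \({\mathbb {E}}_2 {\mathbb {E}}_3\). The \(\delta \) and \(\psi \) updates are omitted because they require numerical optimization involving \(\Phi _n\).

```python
import math
import random

def kappa(x):
    return 1.0 / (1.0 + math.exp(-x))

def lam(theta):
    # lambda(theta) = (kappa(theta) - 1/2) / (2 theta); the limit at 0 is 1/8.
    if abs(theta) < 1e-8:
        return 0.125
    return (kappa(theta) - 0.5) / (2.0 * theta)

def update_gamma(xb, Z, lams, e2, e3, e8):
    """Closed-form gamma update, following the displayed ratio."""
    num = sum((z - 0.5 - 2.0 * l * (a + e)) * w
              for a, z, l, w, e in zip(xb, Z, lams, e2, e3))
    den = 2.0 * sum(l * v for l, v in zip(lams, e8))
    return num / den

def surrogate_gamma(g, xb, Z, lams, e2, e3, e8):
    """gamma-dependent part of the surrogate, with E[W eps | Z] ~ E_2 E_3."""
    return sum((z - 0.5) * g * w
               - l * (g * g * v + 2.0 * g * a * w + 2.0 * g * w * e)
               for a, z, l, w, e, v in zip(xb, Z, lams, e2, e3, e8))

def update_theta(xb, g, e2, e3, e8, e9):
    """theta_i = sqrt of the approximate conditional second moment of Y_i."""
    return [math.sqrt(a * a + g * g * v8 + v9
                      + 2.0 * g * a * w + 2.0 * a * e + 2.0 * g * w * e)
            for a, w, e, v8, v9 in zip(xb, e2, e3, e8, e9)]

# Simulated inputs (illustrative assumptions only).
random.seed(2)
n = 40
xb = [random.gauss(0.0, 1.0) for _ in range(n)]              # x_i' beta
Z = [random.choice([0, 1]) for _ in range(n)]
lams = [lam(random.uniform(0.5, 2.0)) for _ in range(n)]
e2 = [random.gauss(0.0, 1.0) for _ in range(n)]              # E[W_i | Z]
e3 = [random.gauss(0.0, 0.3) for _ in range(n)]              # E[eps_i | Z]
e8 = [w * w + random.uniform(0.1, 1.0) for w in e2]          # E[W_i^2 | Z]
e9 = [e * e + random.uniform(0.1, 0.5) for e in e3]          # E[eps_i^2 | Z]

gamma_hat = update_gamma(xb, Z, lams, e2, e3, e8)
thetas = update_theta(xb, gamma_hat, e2, e3, e8, e9)
print("updated gamma:", gamma_hat)
print("first three thetas:", thetas[:3])
```

The square root in `update_theta` is well defined here because the simulated second moments satisfy \({\mathbb {E}}_8 \ge {\mathbb {E}}_2^2\) and \({\mathbb {E}}_9 \ge {\mathbb {E}}_3^2\); Monte Carlo estimates computed from the same draws have this property as well.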


About this article


Cite this article

Tadayon, V., Saber, M.M. A Spatial Logistic Regression Model Based on a Valid Skew-Gaussian Latent Field. JABES 28, 59–73 (2023). https://doi.org/10.1007/s13253-022-00512-3


  • DOI: https://doi.org/10.1007/s13253-022-00512-3
