Spatio-temporal additive regression model selection for urban water demand

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Understanding the factors that influence urban water use is critical for meeting demand and conserving resources. To analyze the relationships between household-level urban water demand and its potential drivers, we develop a method for Bayesian variable selection in partially linear additive regression models that is particularly suited to high-dimensional, spatio-temporally dependent data. Our approach combines a spike-and-slab prior with a modified version of the Bayesian group lasso to simultaneously select among null, linear, and nonlinear effects for each predictor and penalize the regression splines to prevent overfitting. We investigate the effectiveness of the proposed method through a simulation study and compare it with existing methods. We illustrate the methodology with a case study in Tampa, FL, that estimates the associations between several environmental and demographic predictors and spatio-temporally varying household-level urban water demand, and quantifies the uncertainty of those associations.
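The abstract describes each predictor's effect as null, linear, or nonlinear, with a spike-and-slab prior deciding which and a group-lasso-type prior shrinking the spline coefficients. The Python sketch below illustrates that structure with a single prior draw for one predictor. It is not the authors' implementation: the truncated-linear spline basis, its orthogonalization against the intercept and linear term, the independent Bernoulli inclusion indicators, and the gamma scale-mixture form of the group-lasso slab are illustrative assumptions, and the names `spline_basis`, `draw_effect`, `classify`, `pi_lin`, `pi_nl`, and `lam` are hypothetical.

```python
import numpy as np


def spline_basis(x, n_knots=10):
    """Truncated-linear spline basis for the nonlinear part of one predictor.

    Illustrative assumption: knots at interior quantiles, and columns
    projected off span{1, x} so the spline block carries no linear trend.
    """
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior quantile levels
    knots = np.quantile(x, probs)
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # (x - k)_+ columns
    X_lin = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X_lin, Z, rcond=None)  # regress each column on [1, x]
    return Z - X_lin @ coef  # residuals are orthogonal to 1 and x


def classify(gamma_lin, gamma_nl):
    """Map the two inclusion indicators to a model class."""
    if gamma_nl:
        return "nonlinear"
    if gamma_lin:
        return "linear"
    return "null"


def draw_effect(x, Z, rng, pi_lin=0.5, pi_nl=0.5, lam=1.0, sigma=1.0):
    """One draw of f(x) = beta * x + Z @ b from a hypothetical prior.

    Spike-and-slab: beta and the whole group b are each either exactly 0
    (spike) or drawn from a slab. The slab for b is a group-lasso-type
    scale mixture: tau2 ~ Gamma(shape=(m+1)/2, rate=lam^2/2) and
    b | tau2 ~ N(0, sigma^2 * tau2 * I_m), which shrinks b as a group.
    """
    gamma_lin = rng.random() < pi_lin  # linear term included?
    gamma_nl = rng.random() < pi_nl  # spline (nonlinear) group included?
    beta = rng.normal(0.0, sigma) if gamma_lin else 0.0
    m = Z.shape[1]
    if gamma_nl:
        tau2 = rng.gamma(shape=(m + 1) / 2, scale=2.0 / lam**2)  # scale = 1/rate
        b = rng.normal(0.0, sigma * np.sqrt(tau2), size=m)
    else:
        b = np.zeros(m)
    return beta * x + Z @ b, classify(gamma_lin, gamma_nl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=200)  # synthetic predictor, not study data
    Z = spline_basis(x)
    f, label = draw_effect(x, Z, rng)
    print(label, f[:3])
```

Projecting the spline columns off the intercept and linear term keeps the two parts identifiable: any linear trend is carried by `beta` alone, and the spline block captures only departures from linearity. A full analysis would embed such priors in an MCMC sampler together with spatio-temporal dependence terms, which this sketch omits.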

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Author information

Correspondence to Nikolay Bliznyuk.

Electronic supplementary material

Supplementary material 1 (pdf 309 KB)

About this article

Cite this article

Merrill, H.R., Tang, X. & Bliznyuk, N. Spatio-temporal additive regression model selection for urban water demand. Stoch Environ Res Risk Assess 33, 1075–1087 (2019). https://doi.org/10.1007/s00477-019-01682-2

  • DOI: https://doi.org/10.1007/s00477-019-01682-2
