Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates

Psychometrika (Theory and Methods)

Abstract

In this paper, we present and evaluate a novel Bayesian regime-switching zero-inflated multilevel Poisson (RS-ZIMLP) regression model for forecasting alcohol use dynamics. The model partitions each individual’s data into two phases, known as regimes: (1) a zero-inflation regime that accommodates the high frequency of zeros (non-drinking), and (2) a multilevel Poisson regression regime in which variations in individuals’ log-transformed average rates of alcohol use are captured by an autoregressive process with exogenous predictors and a person-specific intercept. The times at which individuals are in each regime are unknown, but may be estimated from the data. We assume that the regime indicator follows a first-order Markov process whose transition probabilities are related to exogenous predictors of interest. The forecast performance of the proposed model was evaluated using a Monte Carlo simulation study and further demonstrated using substance use and spatial covariate data from the Colorado Online Twin Study (CoTwins). Results showed that the proposed model yielded better forecast performance than both a baseline model that predicted all cases as non-drinking and a reduced ZIMLP model without the RS structure, as indicated by higher scores for the area under the receiver operating characteristic (ROC) curve (AUC) and lower mean absolute errors (MAEs) and root-mean-square errors (RMSEs). The improvements in forecast performance were even more pronounced when we limited the comparisons to participants who showed at least one instance of transition to drinking.
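The two-regime structure and the forecast metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or their exact specification: the logistic form of the regime transitions, the AR(1) recursion on the log rate, all parameter values, and every function name below are assumptions chosen only for illustration. The "oracle" forecasts use the true simulation parameters rather than a fitted model, so the sketch shows how the metrics are computed, not how well any model performs.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def poisson_draw(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_person(n_times, rng, a=(-2.0, 1.0), b=0.5, phi=0.4, beta=0.3, sd=0.3):
    """Return (counts, p_any, expected) for one hypothetical person.

    a[s] is the assumed log-odds of being in the drinking regime at time t
    given regime s at time t-1 (0 = zero-inflation, 1 = drinking).
    """
    mu = rng.gauss(0.0, sd)       # person-specific intercept
    regime, log_rate = 0, mu      # start in the zero-inflation regime
    counts, p_any, expected = [], [], []
    for _ in range(n_times):
        x = rng.gauss(0.0, 1.0)                    # exogenous predictor
        p_drink = logistic(a[regime] + b * x)      # first-order Markov switch
        log_rate = mu + phi * log_rate + beta * x  # assumed AR(1) on the log rate
        rate = math.exp(log_rate)
        # One-step-ahead oracle forecasts computed from the true parameters.
        p_any.append(p_drink * (1.0 - math.exp(-rate)))
        expected.append(p_drink * rate)
        regime = 1 if rng.random() < p_drink else 0
        counts.append(poisson_draw(rate, rng) if regime == 1 else 0)
    return counts, p_any, expected


def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(1)
counts, p_any, expected = [], [], []
for _ in range(50):               # 50 hypothetical people, 30 occasions each
    c, p, e = simulate_person(30, rng)
    counts += c
    p_any += p
    expected += e

drank = [1 if c > 0 else 0 for c in counts]
baseline = [0.0] * len(counts)    # predicts non-drinking everywhere

print("baseline AUC", auc(baseline, drank), "MAE", mae(baseline, counts),
      "RMSE", rmse(baseline, counts))
print("oracle   AUC", auc(p_any, drank), "MAE", mae(expected, counts),
      "RMSE", rmse(expected, counts))
```

Because the all-non-drinking baseline assigns the same score to every occasion, every positive-negative pair is a tie and its AUC is 0.5 by construction; any model has to beat that value to discriminate between drinking and non-drinking occasions.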

Notes

  1. Note that in this article, the terms “ZI regime” and “drinking regime” were used in the context of alcohol use to correspond to the ZI and Poisson processes in the ZIP model, respectively.

  2. One “drink” is equal to 1 can or bottle of beer, a glass of wine, or a shot of hard liquor.

  3. It should be noted that these Euclidean distances were crude measures and did not measure the actual road distances that the participants had to travel (by walking or transportation) to the landmarks. Future research could consider road network-based distance measures.
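The straight-line measure described in note 3 can be illustrated with a short Python sketch. The coordinates, the function name, and the assumption that locations are already projected to planar (x, y) positions in meters are all hypothetical; this is not the procedure used to build the study's spatial covariates.

```python
import math


def nearest_landmark_distance(point, landmarks):
    """Smallest straight-line (Euclidean) distance from point to any landmark.

    Assumes planar (x, y) coordinates in meters. Raw latitude/longitude
    would need to be projected first.
    """
    return min(math.dist(point, lm) for lm in landmarks)


home = (0.0, 0.0)                              # hypothetical participant location
outlets = [(300.0, 400.0), (-1200.0, 50.0)]    # hypothetical landmark locations

# The nearest outlet is about 500 m away (a 3-4-5 triangle scaled by 100).
print(nearest_landmark_distance(home, outlets))
```

A straight-line distance of this kind is a lower bound on the travel distance, which is why the note calls it crude: the actual road route to the same landmark can only be as long or longer.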

Author information

Correspondence to Yanling Li.

Additional information

Funding for this study was provided by the NIH Intensive Longitudinal Health Behavior Cooperative Agreement Program under U24AA027684 and U01DA046413 (SV/NF), National Science Foundation grants BCS-1052736, IGE-1806874, and SES-1823633, the Eunice Kennedy Shriver National Institute of Child Health and Human Development under P2C HD041025, and the Pennsylvania State University Quantitative Social Sciences Initiative and UL TR000127 from the National Center for Advancing Translational Sciences. Part of the computations for this research were performed on the Pennsylvania State University’s Institute for Computational and Data Sciences’ Roar supercomputer.

Cite this article

Li, Y., Oravecz, Z., Zhou, S. et al. Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates. Psychometrika 87, 376–402 (2022). https://doi.org/10.1007/s11336-021-09831-9

  • DOI: https://doi.org/10.1007/s11336-021-09831-9
