Abstract
In this paper, we present and evaluate a novel Bayesian regime-switching zero-inflated multilevel Poisson (RS-ZIMLP) regression model for forecasting alcohol use dynamics. The model partitions individuals’ data into two phases, known as regimes, with: (1) a zero-inflation regime that is used to accommodate high instances of zeros (non-drinking) and (2) a multilevel Poisson regression regime in which variations in individuals’ log-transformed average rates of alcohol use are captured by means of an autoregressive process with exogenous predictors and a person-specific intercept. The times at which individuals are in each regime are unknown, but may be estimated from the data. We assume that the regime indicator follows a first-order Markov process as related to exogenous predictors of interest. The forecast performance of the proposed model was evaluated using a Monte Carlo simulation study and further demonstrated using substance use and spatial covariate data from the Colorado Online Twin Study (CoTwins). Results showed that the proposed model yielded better forecast performance compared to a baseline model which predicted all cases as non-drinking and a reduced ZIMLP model without the RS structure, as indicated by higher AUC (the area under the receiver operating characteristic (ROC) curve) scores, and lower mean absolute errors (MAEs) and root-mean-square errors (RMSEs). The improvements in forecast performance were even more pronounced when we limited the comparisons to participants who showed at least one instance of transition to drinking.
Similar content being viewed by others
Notes
Note that in this article, the “ZI regime” and “drinking regime” were used in the context of alcohol use to correspond to the ZI and Poisson process in the ZIP model, respectively.
One “drink” is equal to 1 can or bottle of beer, a glass of wine, or a shot of hard liquor.
It should be noted that these Euclidean distances were crude measures and did not measure the actual road distances that the participants had to travel (by walking or transportation) to the landmarks. Future research could consider road network-based distance measures.
References
Arminger, G. (1986). Linear stochastic differential equation models for panel data with unobserved variables. Sociological Methodology, 16, 187–212.
Berry, L. R., & West, M. (2020). Bayesian forecasting of many count-valued time series. Journal of Business and Economic Statistics, 38(4), 872–887.
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145–1159.
Bronfenbrenner, U. (1992). Ecological systems theory. Jessica Kingsley Publishers.
Byrnes, H. F., Miller, B. A., Morrison, C. N., Wiebe, D. J., Remer, L. G., & Wiehe, S. E. (2016). Brief report: Using global positioning system (GPS) enabled cell phones to examine adolescent travel patterns and time in proximity to alcohol outlets. Journal of Adolescence, 50, 65–68.
Byrnes, H. F., Miller, B. A., Morrison, C. N., Wiebe, D. J., Woychik, M., & Wiehe, S. E. (2017). Association of environmental indicators with teen alcohol use and problem behavior: Teens’ observations vs. objectively-measured indicators. Health and Place, 43, 151–157.
Cao, H., Li, X.-L., Woon, D.Y.-K., & Ng, S.-K. (2013). Integrated oversampling for imbalanced time series classification. IEEE Transactions on Knowledge and Data Engineering, 25(12), 2809–2822.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Chow, S.-M. (2019). Practical tools and guidelines for exploring and fitting linear and nonlinear dynamical systems models. Multivariate Behavioral Research, 54(5), 690–718.
Chow, S.-M., Witkiewitz, K., Grasman, R. P. P. P., & Maisto, S. A. (2015). The cusp catastrophe model as cross-sectional and longitudinal mixture structural equation models. Psychological Methods, 20, 142–164. https://doi.org/10.1037/a0038962.
Chow, S.-M., & Zhang, G. (2013). Nonlinear regime-switching state-space (RSSS) models. Psychometrika Application Reviews and Case Studies, 78(4), 740–768.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures. Multivariate Behavior Research, 18, 147–167.
De Jong, P. (1988). A cross-validation filter for time series models. Biometrika, 75, 594–600.
Elkan, C. (2001). The foundations of cost-sensitive learning. In International joint conference on artificial intelligence (Vol .17, pp. 973–978).
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd (Vol 96, pp. 226–231).
Gelfand, A. E., Dey, D. K. & Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods. Bayesian Statistics 4 (p. 147–159). Oxford University Press.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. New York: CRC Press.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741.
Geng, Y., & Luo, X. (2019). Cost-sensitive convolutional neural networks for imbalanced time series classification. Intelligent Data Analysis, 23(2), 357–370.
Hahsler, M., Piekenbrock, M., & Doran, D. (2019). dbscan: Fast density-based clustering with R. Journal of Statistical Software, 25, 409–416.
Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: A case study. Biometrics, 56(4), 1030–1039.
Hamilton, J. D. (1994). Time series analysis (Vol. 2). Princeton New Jersey.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Harvey, A. C. (2001). Forecasting, structural time series models and the Kalman filter. Cambridge: Cambridge University Press.
Helske, J. (2017). KFAS: Exponential family state space models in R. Journal of Statistical Software, 78(10), 1–39.
Howard, A. L., Patrick, M. E., & Maggs, J. L. (2015). College student affect and heavy drinking: Variable associations across days, semesters, and people. Psychology of Addictive Behaviors, 29(2), 430.
Jacobson, N. C., Chow, S.-M., & Newman, M. G. (2019). The differential time-varying effect model (DTVEM): Identifying optimal time lags in intensive longitudinal data. Behavioral Research Methods, 51(1), 295–315. https://doi.org/10.3758/s13428-018-1101-0.
James, P., Berrigan, D., Hart, J. E., Hipp, J. A., Hoehner, C. M., Kerr, J., & Laden, F. (2014). Effects of buffer size and shape on associations between the built environment and energy balance. Health and Place, 27, 162–170.
Jane-Llopis, E., & Matytsina, I. (2006). Mental health and alcohol, drugs and tobacco: A review of the comorbidity between mental disorders and the use of alcohol, tobacco and illicit drugs. Drug and Alcohol Review, 25(6), 515–536.
Ji, L., Chen, M., Oravecz, Z., Cummings, E. M., Lu, Z.-H., & Chow, S.-M. (2020). A Bayesian vector autoregressive model with nonignorable missingness in dependent variables and covariates: Development, evaluation, and application to family processes. Structural Equation Modeling: A Multidisciplinary Journal, 27(3), 442–467.
Kim, C.-J., & Nelson, C. R. (1999). State-space models with regime switching: classical and Gibbs-sampling approaches with applications. MIT Press Books.
Kuiper, R. M., & Ryan, O. (2018). Drawing conclusions from cross-lagged relationships: Re-considering the role of the time-interval. Structural Equation Modeling: A Multidisciplinary Journal, 25(5), 809–823.
Kuppens, P., Allen, N. B., & Sheeber, L. B. (2010). Emotional inertia and psychological maladjustment. Psychological Science, 21(7), 984–991.
Lambert, D. (1992). Zero-inflated poisson regression, with an application to defects in manufacturing. Technometrics, 34(1), 1–14.
Lee, A. H., Wang, K., Scott, J. A., Yau, K. K., & McLachlan, G. J. (2006). Multi-level zero-inflated Poisson regression modelling of correlated count data with excess zeros. Statistical Methods in Medical Research, 15(1), 47–61.
Li, Y., Ji, L., Oravecz, Z., Brick, T. R., Hunter, M. D., & Chow, S.-M. (2019). dynr.mi: An R program for multiple imputation in dynamic modeling. International Journal of Computer Electrical Automation Control and Information Engineering. 13(5), 302–311.
Li, Y., Wood, J., Ji, L., Chow, S.-M., & Oravecz, Z. (2021). Fitting multilevel vector autoregressive models in Stan, JAGS, and Mplus. Structural Equation Modeling A Multidisciplinary Journal, 5, 1–24.
Litt, M. D., Cooney, N. L., & Morse, P. (1998). Ecological momentary assessment (EMA) with treated alcoholics: Methodological problems and potential solutions. Health Psychology, 17(1), 48.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Lu, Z.-H., Chow, S.-M., Ram, N., & Cole, P. M. (2019). Zero-inflated regime-switching stochastic differential equation models for highly unbalanced multivariate, multi-subject time-series data. Psychometrika, 84(2), 611–645.
Lu, Z.-H., Chow, S.-M., Sherwood, A., & Zhu, H. (2015). Bayesian analysis of ambulatory cardiovascular dynamics with application to irregularly spaced sparse data. Annals of Applied Statistics, 9, 1601–1620. https://doi.org/10.1214/15-AOAS846.
Lütkepohl, H. (2005). Introduction to multiple time series analysis (2nd ed.). New York: Springer-Verlag.
MacCallum, R. C., Roznowski, M., Mar, C. M., & Reith, J. V. (1994). Alternative strategies for cross-validation of covariance structure models. Multivariate Behavioral Research, 29(1), 1–32.
Maisto, S. A., Xie, F. C., Witkiewitz, K., Ewart, C. K., Connors, G. J., Zhu, H., & Chow, S.-M. (2017). How chronic self-regulatory stress, poor anger regulation, and momentary affect undermine treatment for alcohol use disorder: Integrating social action theory and the dynamic model of relapse. Journal of Social and Clinical Psychology, 36, 238–263. https://doi.org/10.1521/jscp.2017.36.3.238.
Min, Y., & Agresti, A. (2005). Random effect models for repeated measures of zero-inflated count data. Statistical Modelling, 5(1), 1–19.
Moniz, N., Branco, P., & Torgo, L. (2017). Resampling strategies for imbalanced time series forecasting. International Journal of Data Science and Analytics, 3(3), 161–181.
Neal, R. M. (2003). Slice sampling. Annals of Statistics, 31(3), 705–741.
Neelon, B. H., O’Malley, A. J., & Normand, S.-L.T. (2010). A Bayesian model for repeated measures zero-inflated count data with application to outpatient psychiatric service use. Statistical Modelling, 10(4), 421–439.
Oravecz, Z., Tuerlinckx, F., & Vandekerckhove, J. (2011). A hierarchical latent stochastic differential equation model for affective dynamics. Psychological Methods, 16(4), 468.
Orrù, G., Monaro, M., Conversano, C., Gemignani, A., & Sartori, G. (2020). Machine learning in psychometrics and psychological research. Frontiers in Psychology, 10, 2970.
Oud, J. H., & Jansen, R. A. (2000). Continuous time state space modeling of panel data by means of SEM. Psychometrika, 65(2), 199–215.
Pasch, K. E., Hearst, M. O., Nelson, M. C., Forsyth, A., & Lytle, L. A. (2009). Alcohol outlets and youth alcohol use: Exposure in suburban areas. Health and Place, 15(2), 642–646.
Perchoux, C., Chaix, B., Brondeel, R., & Kestens, Y. (2016). Residential buffer, perceived neighborhood, and individual activity space: New refinements in the definition of exposure areas-the RECORD Cohort Study. Health and Place, 40, 116–122.
Piironen, J., & Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3), 711–735. https://doi.org/10.1007/s11222-016-9649-y.
Plummer, M., et al. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing (Vol. 124, pp. 1-10).
Reboussin, B. A., Song, E.-Y., & Wolfson, M. (2011). The impact of alcohol outlet density on the geographic clustering of underage drinking behaviors within census tracts. Alcoholism Clinical and Experimental Research, 35(8), 1541–1549.
Roychoudhury, S., Ghalwash, M., & Obradovic, Z. (2017). Cost sensitive time-series classification. In Joint European conference on machine learning and knowledge discovery in databases (pp. 495–511).
Russell, M. A., Almeida, D. M., & Maggs, J. L. (2017). Stressor-related drinking and future alcohol problems among university students. Psychology of Addictive Behaviors, 31(6), 676.
Russell, M. A., & Odgers, C. L. (2020). Adolescents’ subjective social status predicts day-to-day mental health and future substance use. Journal of Research on Adolescence, 30, 532–544.
Sánchez-Sánchez, P. A., García-González, J. R., & Coronell, L. H. P. (2019). Encountered problems of time series with neural networks: Models and architectures. IntechOpen: In Recent trends in artificial neural networks-from training to prediction.
Shen, H. (2010). Exponentially weighted methods for forecasting intraday time series with multiple seasonal cycles: Comments. International Journal of Forecasting, 26, 653–654.
Substance Abuse and Mental Health Services Administration, Office of Applied Studies. (2008). Results from the 2007 National Survey on Drug Use and Health: National Findings (DHHS Publication No. SMA 08-4343, NSDUH Series H-34). Rockville, MD: Substance Abuse and Mental Health Services Administration.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.
Voelkle, M. C., Oud, J. H., Davidov, E., & Schmidt, P. (2012). An SEM approach to continuous time modeling of panel data: Relating authoritarianism and anomia. Psychological Methods, 17(2), 176.
West, M., & Harrison, J. (1997). Bayesian forecasting and dynamic models (2nd ed.). New York: Springer-Verlag.
Wilhelm, F. H., Grossman, P., & Muller, M. I. (2012). Bridging the gap between the laboratory and the real world: Integrative ambulatory psychophysiology. In Handbook of research methods for studying daily life (pp. 210–234). Guilford: New York.
Wray, T. B., Merrill, J. E., & Monti, P. M. (2014). Using ecological momentary assessment (EMA) to assess situation-level predictors of alcohol use and alcohol-related consequences. Alcohol Research: Current Reviews, 36(1), 19.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122.
Yau, K. K., & Lee, A. H. (2001). Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Statistics in Medicine, 20(19), 2907–2920.
You, D., Hunter, M., Chen, M., & Chow, S.-M. (2019). A diagnostic procedure for detecting outliers in linear state-space models. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2019.1627659 ((PMID: 31264463)).
Zhou, S., Li, Y., Bodovski, Y., Chi, G., & Chow, S.-M. (2021a). GPS2space: An open-source Python library for spatial data building and spatial measure extraction. https://github.com/shuai-zhou/gps2space. https://doi.org/10.5281/zenodo.4672651.
Zhou, S., Li, Y., Chi, G., Yin, J., Oravecz, Z., Bodovski, Y., ... & Chow, S. M. (2021b). GPS2space: an open-source Python library for spatial measure extraction from GPS data. Journal of Behavioral Data Science, 1(2), 127–155.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Funding for this study was provided by the NIH Intensive Longitudinal Health Behavior Cooperative Agreement Program under U24AA027684 and U01DA046413 (SV/NF), National Science Foundation grants BCS-1052736, IGE-1806874, and SES-1823633, the Eunice Kennedy Shriver National Institute of Child Health and Human Development under P2C HD041025, and the Pennsylvania State University Quantitative Social Sciences Initiative and UL TR000127 from the National Center for Advancing Translational Sciences. Part of the computations for this research were performed on the Pennsylvania State University’s Institute for Computational and Data Sciences’ Roar supercomputer.
Rights and permissions
About this article
Cite this article
Li, Y., Oravecz, Z., Zhou, S. et al. Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates. Psychometrika 87, 376–402 (2022). https://doi.org/10.1007/s11336-021-09831-9
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11336-021-09831-9