
Autoregressive Generalized Linear Mixed Effect Models with Crossed Random Effects: An Application to Intensive Binary Time Series Eye-Tracking Data


As a method to ascertain person and item effects in psycholinguistics, a generalized linear mixed effect model (GLMM) with crossed random effects is limited in its handling of serial dependence across persons and items. This paper presents an autoregressive GLMM with crossed random effects that accounts for variability in lag effects across persons and items. The model is shown to be applicable to intensive binary time series eye-tracking data when researchers are interested in detecting experimental condition effects while controlling for previous responses. In addition, a simulation study shows that ignoring lag effects can lead to biased estimates and underestimated standard errors for the experimental condition effects.
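
For orientation, a first-order autoregressive logistic GLMM of the kind described above might be written as follows. This is a sketch under stated assumptions, not a reproduction of the paper's Eq. 6. The indices follow the notes (\(t\) for time point, \(l\) for trial, \(j\) for person, \(i\) for item). The symbols \(\beta\), \(\lambda\), \(\theta\), \(b\), and \(u\), the single condition covariate \(x_{lji}\), and the normal distributions for the random effects are illustrative choices.

```latex
\begin{align*}
\operatorname{logit} \Pr\bigl(y_{tlji} = 1 \mid y_{(t-1)lji},
    \theta_j, \lambda_j, b_i, \lambda_i, u_l\bigr)
  &= \beta_0 + \beta_1 x_{lji}
   + (\lambda + \lambda_j + \lambda_i)\, y_{(t-1)lji}
   + \theta_j + b_i + u_l, \\
(\theta_j, \lambda_j)^\top &\sim N(\mathbf{0}, \Sigma_{\text{person}}), \qquad
(b_i, \lambda_i)^\top \sim N(\mathbf{0}, \Sigma_{\text{item}}), \qquad
u_l \sim N(0, \sigma^2_{\text{trial}}).
\end{align*}
```

In this sketch, \(\lambda\) is the average lag effect and \(\lambda_j\) and \(\lambda_i\) are its person-specific and item-specific deviations. Fixing all three at zero removes the dependence on the previous response and leaves a GLMM with crossed random effects only.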



  1.

    In this paper, the terms participant and person are used interchangeably, as are the terms word and item.

  2.

    The observed binary responses used as covariates are \(y_{(t-1)lji}\) in Eq. 6.

  3.

    When each person and each item are analyzed, trial clustering does not exist.

  4.

    The number of trials varies across participants because the experiment was set within a natural, unscripted conversation to balance ecological validity with experimental control. By design, each participant should have had 96 trials, but only if everything went as planned. Trials were excluded for reasons such as the partner saying the wrong thing (e.g., “the elephant oh big one” instead of “the small elephant”) or the participant moving so much that the eye tracker stopped working.

  5.

    In our empirical example, there are 8886 unique cases by trial, person, and item. There are 7902 unique cases by person and item because in 984 cases (11.07% of the 8886 cases) an item was presented twice, on two different trials. We created the lagged response within 112 time points for the 7902 unique combinations of person and item because the lagged effect is considered for persons and items, and the trial is considered a clustering factor. Dependency in the response variable over time for the same item with the same trial number is expected to be explained by a trial random effect in the model. In cases where a given person saw the same item on two different trials, we sorted the data by person, item, and time, in that order, and randomly designated one of the two trials to have its first time point replaced by NA. Note that for data sets in which there are many repetitions of each item, we recommend creating \(y_{0lji}\) for the unique combination of trial, person, and item.

  6.

    One replication required 30 min for Model 1 and 10 min for Model 0 on cluster computers with 8 CPUs and 4 GB of RAM.
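
The lagged-response construction in note 5 can be illustrated with a small sketch. It follows the variant that the note recommends for data with many item repetitions, in which the lag is taken within each unique combination of trial, person, and item, rather than the random-designation procedure used in the empirical example. The field names (`trial`, `person`, `item`, `time`, `y`), the helper `add_lag1`, and the toy data are hypothetical. Only the case counts at the end come from the note.

```python
from collections import defaultdict


def add_lag1(rows):
    """Return copies of rows with a 'y_lag1' field added.

    The lag is taken within each (trial, person, item) series, ordered by
    time. The first time point of a series has no previous response, so its
    lag is None (the analogue of NA).
    """
    series = defaultdict(list)
    for row in rows:
        series[(row["trial"], row["person"], row["item"])].append(row)

    out = []
    for key in sorted(series):
        previous = None  # no observed response before the first time point
        for row in sorted(series[key], key=lambda r: r["time"]):
            out.append({**row, "y_lag1": previous})
            previous = row["y"]
    return out


# Hypothetical toy data: one person sees the same item on two trials.
toy = [
    {"trial": 1, "person": 1, "item": 7, "time": 1, "y": 0},
    {"trial": 1, "person": 1, "item": 7, "time": 2, "y": 1},
    {"trial": 1, "person": 1, "item": 7, "time": 3, "y": 1},
    {"trial": 2, "person": 1, "item": 7, "time": 1, "y": 1},
    {"trial": 2, "person": 1, "item": 7, "time": 2, "y": 0},
    {"trial": 2, "person": 1, "item": 7, "time": 3, "y": 0},
]
lagged = add_lag1(toy)
lags = [row["y_lag1"] for row in lagged]

# Case counts reported in note 5.
total_cases = 8886      # unique trial x person x item cases
repeated_cases = 984    # cases where an item appeared on a second trial
unique_person_item = total_cases - repeated_cases
repeated_percent = round(100 * repeated_cases / total_cases, 2)
```

Because the lag never crosses a trial boundary here, each trial contributes its own missing first lag, which is why this variant scales to designs where items repeat many times.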




We thank Dr. Paul De Boeck (Ohio State University and KU Leuven) for comments on an earlier draft and the reviewers for constructive comments that improved the first version of this paper.

Funding

The original data collection and this work were supported in part by National Science Foundation Grants BCS 12-57029 and BCS 15-56700 to Sarah Brown-Schmidt.

Author information

Correspondence to Sun-Joo Cho.


Cite this article

Cho, S., Brown-Schmidt, S. & Lee, W. Autoregressive Generalized Linear Mixed Effect Models with Crossed Random Effects: An Application to Intensive Binary Time Series Eye-Tracking Data. Psychometrika 83, 751–771 (2018). https://doi.org/10.1007/s11336-018-9604-2



  • eye-tracking data
  • generalized linear mixed effect model
  • intensive binary time series data
  • random item effect