
Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression

  • Original Manuscript
  • Published in Behavior Research Methods

Abstract

With mixed-effects regression models now a mainstream tool for psycholinguists, the need to understand them fully has grown. Over the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure so as to minimize error in evaluating the statistical significance of fixed-effect predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods succeed in determining which group each participant belongs to. The difference in BIC between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. The researcher can then define and justify a new post hoc variable specifying the groups to which participants or items belong, and incorporate it into the regression analysis.
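The clustering step described above (fit one- and two-cluster solutions to the estimated random effects and compare their BICs) can be illustrated with a minimal NumPy sketch. This is not the authors' code (which uses R tooling such as ClusterR); it is a hand-rolled univariate EM on hypothetical simulated random intercepts, intended only to make the BIC comparison concrete.

```python
import numpy as np

def one_cluster_bic(x):
    """BIC of a single-Gaussian fit (2 parameters: mean, SD)."""
    mu, sd = x.mean(), x.std()
    ll = np.sum(-0.5 * np.log(2 * np.pi * sd**2) - (x - mu)**2 / (2 * sd**2))
    return 2 * np.log(len(x)) - 2 * ll

def two_cluster_bic(x, n_iter=200):
    """BIC of a two-component Gaussian mixture fit by EM
    (5 parameters: two means, two SDs, one mixing weight)."""
    mu = np.quantile(x, [0.25, 0.75])            # crude initialization
    sd = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: responsibility of each component for each point
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means, and SDs
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-3)                # guard against variance collapse
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    ll = np.sum(np.log(dens.sum(axis=1)))
    return 5 * np.log(len(x)) - 2 * ll

rng = np.random.default_rng(1)
# Hypothetical by-subject random intercepts: two subpopulations vs. one
two_pop = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])
one_pop = rng.normal(0.0, 1.0, 200)
print(two_cluster_bic(two_pop) < one_cluster_bic(two_pop))  # two clusters preferred
print(one_cluster_bic(one_pop) < two_cluster_bic(one_pop))  # one cluster preferred
```

In practice one would use a tested mixture-model implementation (e.g., the ClusterR or mclust packages in R) rather than this sketch, and fit the full multivariate random-effects matrix rather than a single intercept column.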


Data availability

All data are available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee

Notes

  1. Although there has been increasing use of Bayesian models, e.g., using the brms package (Bürkner, 2017), brms with default priors shows qualitatively similar behavior to lme4 in our simulations.

  2. Shrinkage can of course move points into positions corresponding to unobservable probabilities, but points do not appear to move enough in practice for striations to disappear.

  3. We also ran these models with probabilities converted into logits (an empirical logit analysis), since a linear model on probabilities fits poorly at the limits of the probability space. This did not improve fit, and all interactions mentioned below remained significant. For the empirical logit analysis, we added .001 in probability to address cases where the false alarm rate was zero. Because false alarm rates are based on 1000 observations per combination of parameters, this should not strongly affect inferences from the empirical logit analysis (cf. Donnelly & Verkuilen, 2017, for situations in which empirical logit analysis is problematic).

  4. These models have fits between adjusted R² = 82% and 87%, except for HH vs. ACR, which has a fit of 77%. Therefore, a necessary caveat is that the variance in HH is not fully captured by our model, possibly because the null hypothesis for HH is that the distribution is uniform rather than normal.
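The adjustment described in note 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: only the .001 constant comes from the note, and the sample false alarm rates are hypothetical.

```python
import math

def empirical_logit(p, eps=0.001):
    # Logit transform with a small additive constant in probability,
    # so that an observed rate of exactly 0 stays finite.
    # (Intended for rates bounded away from 1, such as false alarm rates.)
    p_adj = p + eps
    return math.log(p_adj / (1 - p_adj))

# Hypothetical false alarm rates, including one of exactly zero
rates = [0.0, 0.05, 0.5]
print([round(empirical_logit(p), 3) for p in rates])  # → [-6.907, -2.924, 0.004]
```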

References

  • Ameijeiras-Alonso, J., Crujeiras, R. M., & Rodriguez-Casal, A. (2021). multimode: An R package for mode assessment. Journal of Statistical Software, 97(9), 1–32. https://doi.org/10.18637/jss.v097

  • Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.

  • Barth, D., & Kapatsinski, V. (2018). Evaluating logistic mixed-effects models of corpus-linguistic data in light of lexical diffusion. In Mixed-effects regression models in linguistics (pp. 99–116). Springer.

  • Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.

  • Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.

  • Burnham, K. P., & Anderson, D. R. (2004). Model selection and multimodel inference: A practical information-theoretic approach. Springer.

  • Cheng, M. Y., & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B, 60, 579–589.

  • Clark, R. G., Blanchard, W., Hui, F. K., Tian, R., & Woods, H. (2023). Dealing with complete separation and quasi-complete separation in logistic regression for linguistic data. Research Methods in Applied Linguistics, 2(1), 100044.

  • Dąbrowska, E. (2012). Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism, 2(3), 219–253.

  • Dąbrowska, E., & Divjak, D. (2019). Individual differences in grammatical knowledge. Cognitive Linguistics, 3, 231–250.

  • Donnelly, S., & Verkuilen, J. (2017). Empirical logit analysis is not logistic regression. Journal of Memory and Language, 94, 28–42.

  • Doornik, J. A., & Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics, 70, 927–939.

  • Drager, K., & Hay, J. (2012). Exploiting random intercepts: Two case studies in sociophonetics. Language Variation and Change, 24(1), 59–78.

  • Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). Diagnosing misspecification of the random-effects distribution in mixed models. Biometrics, 73(1), 63–71.

  • Eager, C., & Roy, J. (2017). Mixed effects models are sometimes terrible. arXiv preprint arXiv:1701.04858.

  • Efendi, A., Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). A goodness-of-fit test for the random-effects distribution in mixed models. Statistical Methods in Medical Research, 26(2), 970–983.

  • Fisher, N. I., & Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika, 88, 499–517.

  • Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

  • Gleitman, L. R., January, D., Nappa, R., & Trueswell, J. C. (2007). On the give and take between event apprehension and utterance formulation. Journal of Memory and Language, 57, 544–569.

  • Hall, P., & York, M. (2001). On the calibration of Silverman’s test for multimodality. Statistica Sinica, 11, 515–536.

  • Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. Annals of Statistics, 13, 70–84.

  • Heagerty, P. J., & Kurland, B. F. (2001). Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika, 88, 973–985.

  • Henze, N., & Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics – Theory and Methods, 19(10), 3595–3617.

  • Hodges, J. S. (2014). Richly parameterized linear models: Additive, time series, and spatial models using random effects. Chapman and Hall/CRC.

  • Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

  • Huang, X. (2009). Diagnosis of random-effect model misspecification in generalized linear mixed models for binary response. Biometrics, 65(2), 361–368.

  • Huang, X. (2011). Detecting random-effects model misspecification via coarsened data. Computational Statistics & Data Analysis, 55(1), 703–714.

  • Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1(2), 151–195.

  • Idemaru, K., Holt, L. L., & Seltman, H. (2012). Individual differences in cue weights are stable across time: The case of Japanese stop lengths. The Journal of the Acoustical Society of America, 132(6), 3950–3964.

  • Kievit, R. A., Frankenhuis, W. E., Waldorp, L. J., & Borsboom, D. (2013). Simpson's paradox in psychological science: A practical guide. Frontiers in Psychology, 4, 513.

  • Kimball, A. E., Shantz, K., Eager, C., & Roy, J. (2019). Confronting quasi-separation in logistic mixed effects for linguistic data: A Bayesian approach. Journal of Quantitative Linguistics, 26(3), 231–255.

  • Korkmaz, S., Göksülük, D., & Zararsiz, G. (2014). MVN: An R package for assessing multivariate normality. The R Journal, 6(2), 151–162.

  • Litière, S., Alonso, A., & Molenberghs, G. (2007). Type I and type II error under random-effects misspecification in generalized linear mixed models. Biometrics, 63(4), 1038–1044.

  • Litière, S., Alonso, A., & Molenberghs, G. (2008). The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Statistics in Medicine, 27(16), 3125–3144.

  • Liu, J., & Hodges, J. S. (2003). Posterior bimodality in the balanced one-way random-effects model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 247–255.

  • Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.

  • Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.

  • McCulloch, C. E., & Neuhaus, J. M. (2011a). Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter. Statistical Science, 26(3), 388–402.

  • McCulloch, C. E., & Neuhaus, J. M. (2011b). Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics, 67(1), 270–279.

  • Menn, L., & Vihman, M. (2011). Features in child phonology. In G. N. Clements & R. Ridouane (Eds.), Where do phonological features come from? (pp. 261–301). John Benjamins.

  • Mielke, J., Baker, A., & Archangeli, D. (2016). Individual-level contact limits phonological complexity: Evidence from bunched and retroflex /ɹ/. Language, 92(1), 101–140.

  • Miglio, V. G., Gries, S. T., Harris, M. J., Wheeler, E. M., & Santana-Paixão, R. (2013). Spanish lo(s)-le(s) clitic alternations in psych verbs: A multifactorial corpus-based analysis. In J. Cabrelli Amaro, G. Lord, A. de Prada Pérez, & J. E. Aaron (Eds.), Selected proceedings of the 15th Hispanic Linguistics Symposium (pp. 268–278). Cascadilla Press.

  • Móri, T. F., Székely, G. J., & Rizzo, M. L. (2021). On energy tests of normality. Journal of Statistical Planning and Inference, 213, 1–15.

  • Mouselimis, L. (2023). ClusterR: Gaussian mixture models, k-means, mini-batch k-means, k-medoids and affinity propagation clustering. R package version 1.3.0. https://CRAN.R-project.org/package=ClusterR

  • Piccini, R. (2019). Statistical learning and the update of sensory priors in human participants (M.S. thesis, University of Edinburgh).

  • Roettger, T. B., Mahrt, T., & Cole, J. (2019). Mapping prosody onto meaning: The case of information structure in American English. Language, Cognition and Neuroscience, 34(7), 841–860.

  • Royston, P. (1991). Estimating departure from normality. Statistics in Medicine, 10(8), 1283–1293.

  • Schertz, J., Cho, T., Lotto, A., & Warner, N. (2015). Individual differences in phonetic cue use in production and perception of a non-native sound contrast. Journal of Phonetics, 52, 183–204.

  • Schertz, J., Cho, T., Lotto, A., & Warner, N. (2016). Individual differences in perceptual adaptability of foreign sound categories. Attention, Perception, & Psychophysics, 78(1), 355–367.

  • Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., & Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152.

  • Schumacher, R. A., & Pierrehumbert, J. B. (2021). Familiarity, consistency, and systematizing in morphology. Cognition, 212, 104512.

  • Siffer, A. (2018). Rfolding: The folding test of unimodality. R package version 1.0. https://CRAN.R-project.org/package=Rfolding

  • Silk, M. J., Harrison, X. A., & Hodgson, D. J. (2020). Perils and pitfalls of mixed-effects regression models in biology. PeerJ, 8, e9522.

  • Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society: Series B, 43, 97–99.

  • Smolek, A. (2019). Teaching papa to cha-cha: How change magnitude, temporal contiguity, and task affect alternation learning (Ph.D. dissertation, University of Oregon).

  • Sonderegger, M. (2023). Regression modeling for linguistic data. MIT Press.

  • Stengård, E., Juslin, P., Hahn, U., & van den Berg, R. (2022). On the generality and cognitive basis of base-rate neglect. Cognition, 226, 105160.

  • Székely, G. J., & Rizzo, M. L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93(1), 58–80.

  • Tomlin, R. S. (1995). Focal attention, voice, and word order. In M. Noonan & P. A. Downing (Eds.), Word order in discourse (pp. 517–552). John Benjamins.

  • Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.

  • White, J. (2014). Evidence for a learning bias against saltatory phonological alternations. Cognition, 130(1), 96–115.

  • Wilson, C. (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30(5), 945–982.

  • Zuraw, K. (2016). Polarized variation. Catalan Journal of Linguistics, 15, 145–171.

  • Zymet, J. (2018). Lexical propensities in phonology: Corpus and experimental evidence, grammar, and learning (Ph.D. dissertation, University of California).


Acknowledgments

Many thanks to Dr. Santiago Barreda for useful discussion of Bayesian statistics and appropriate priors for the brms analysis. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The experimental work was conducted when Zachary Houghton was at the Department of Linguistics, University of Oregon, and formed part of his Honors Thesis in Linguistics.

Open practices statement

The code, data, and aggregated simulation results are available on OSF at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee

Code availability

All code is available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Contributions

ZNH conceived of the original empirical study. ZNH and VK designed the experiment. ZNH conducted the experiment. VK conducted the statistical analyses reported here and wrote the code. ZNH reviewed the code for errors. ZNH and VK both contributed to writing the paper, revising it, and responding to reviewers.

Corresponding author

Correspondence to Vsevolod Kapatsinski.

Ethics declarations

Conflicts of interest/competing interests

The authors have no relevant financial or non-financial competing interests to disclose.

Consent to participate

Participants in the empirical study provided written consent to participate in the study.

Consent for publication

Participants in the empirical study provided written consent for publication of their deidentified data.

Ethics approval

The empirical study analyzed here was approved by the University of Oregon Institutional Review Board. The study was performed in accordance with the ethical standard as laid down in the 1964 Declaration of Helsinki and its later amendments.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Houghton, Z.N., Kapatsinski, V. Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression. Behav Res (2023). https://doi.org/10.3758/s13428-023-02287-y

