Abstract
As mixed-effects regression models have become a mainstream tool in psycholinguistics, the need to understand them fully has grown. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure so as to minimize error in evaluating the statistical significance of fixed-effect predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear bi- or multimodal. However, there is no established way to assess whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items exhibit near-categorical behavior. In the absence of quasi-separability, several clustering methods succeed in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to establish that subjects (or items) do not come from a single population. This allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can then be incorporated into the regression analysis.
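As a hypothetical illustration of the BIC comparison described in the abstract (this is not the authors' code, which uses R; here scikit-learn's `GaussianMixture` stands in for an R mixture-model fit), one can fit one- and two-component Gaussian mixtures to the by-subject random effects extracted from a fitted model and compare their BICs:

```python
# Hedged sketch: compare BIC of 1- vs. 2-component Gaussian mixtures
# fit to (simulated) by-subject random slopes. A substantially lower
# BIC for the 2-component solution suggests that subjects do not come
# from a single population.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Simulated random slopes drawn from two well-separated subpopulations;
# in practice these would come from the fitted mixed-effects model.
ranef = np.concatenate([rng.normal(-2.0, 0.5, 50),
                        rng.normal(2.0, 0.5, 50)]).reshape(-1, 1)

bic = {k: GaussianMixture(n_components=k, random_state=0)
          .fit(ranef).bic(ranef)
       for k in (1, 2)}
# scikit-learn's bic(): lower is better.
two_populations = bic[2] < bic[1]
print(bic, two_populations)
```

The same comparison extends to the multivariate case by passing a matrix with one column per random effect.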
Data availability
All data are available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Notes
Although there has been increasing use of Bayesian models, e.g., using the brms package (Bürkner, 2017), brms with default priors shows qualitatively similar behavior to lme4 in our simulations.
Shrinkage can of course move points into positions corresponding to unobservable probabilities, but points do not appear to move enough in practice for striations to disappear.
We also ran these models with probabilities converted into logits (an empirical logit analysis), since a linear model on probabilities fits poorly at the limits of the probability space. This did not improve fit, and all interactions mentioned below remained significant. For the empirical logit analysis, we added .001 in probability to address cases where the false alarm rate was zero. Because false alarm rates are based on 1000 observations per combination of parameters, this should not strongly affect inferences from the empirical logit analysis (cf. Donnelly & Verkuilen, 2017, for situations in which empirical logit analysis is problematic).
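As a hypothetical illustration of the adjustment described in this note (the exact formula the authors used is not given here, so the symmetric form below is an assumption), adding a small constant such as .001 keeps the logit finite when the false alarm rate is zero:

```python
# Hedged sketch of an empirical logit transform with a small additive
# constant (here .001, as in the note). The symmetric form, adding the
# constant to both outcomes, is an assumption for illustration.
import math

def empirical_logit(p, eps=0.001):
    # Log-odds of p after adding a small probability to both outcomes,
    # so p = 0 and p = 1 map to finite values.
    return math.log((p + eps) / (1 - p + eps))

print(empirical_logit(0.0))   # finite and strongly negative
print(empirical_logit(0.5))   # 0.0
```

With 1000 observations per cell, the smallest nonzero rate is .001, so the constant is on the order of the measurement granularity and should barely shift the transformed values.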
These models achieve fits of adjusted R² = 82% to 87%, except for HH vs. ACR, which has a fit of 77%. A necessary caveat, therefore, is that the variance in HH is not fully captured by our model, possibly because the null hypothesis for HH is that the distribution is uniform rather than normal.
References
Ameijeiras-Alonso, J., Crujeiras, R. M., & Rodriguez-Casal, A. (2021). Multimode: An R package for mode assessment. Journal of Statistical Software, 97(9), 1–32. https://doi.org/10.18637/jss.v097
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Barth, D., & Kapatsinski, V. (2018). Evaluating logistic mixed-effects models of corpus-linguistic data in light of lexical diffusion. In Mixed-effects regression models in linguistics (pp. 99–116). Springer.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bürkner, P.-C. (2017). Brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Burnham, K. P., & Anderson, D. R. (2004). Model selection and multimodel inference: A practical information-theoretic approach. Springer.
Cheng, M. Y., & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society. Series B, 60, 579–589.
Clark, R. G., Blanchard, W., Hui, F. K., Tian, R., & Woods, H. (2023). Dealing with complete separation and quasi-complete separation in logistic regression for linguistic data. Research Methods in Applied Linguistics, 2(1), 100044.
Dąbrowska, E. (2012). Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism, 2(3), 219–253.
Dąbrowska, E., & Divjak, D. (2019). Individual differences in grammatical knowledge. Cognitive Linguistics, 3, 231–250.
Donnelly, S., & Verkuilen, J. (2017). Empirical logit analysis is not logistic regression. Journal of Memory and Language, 94, 28–42.
Doornik, J. A., & Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics, 70, 927–939.
Drager, K., & Hay, J. (2012). Exploiting random intercepts: Two case studies in sociophonetics. Language Variation and Change, 24(1), 59–78.
Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). Diagnosing misspecification of the random-effects distribution in mixed models. Biometrics, 73(1), 63–71.
Eager, C., & Roy, J. (2017). Mixed effects models are sometimes terrible. arXiv preprint arXiv:1701.04858.
Efendi, A., Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). A goodness-of-fit test for the random-effects distribution in mixed models. Statistical Methods in Medical Research, 26(2), 970–983.
Fisher, N. I., & Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika, 88, 419–517.
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.
Gleitman, L. R., January, D., Nappa, R., & Trueswell, J. C. (2007). On the give and take between event apprehension and utterance formulation. Journal of Memory and Language, 57, 544–569.
Hall, P., & York, M. (2001). On the calibration of Silverman’s test for multimodality. Statistica Sinica, 11, 515–536.
Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. Annals of Statistics, 13, 70–84.
Heagerty, P. J., & Kurland, B. F. (2001). Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika, 88, 973–985.
Henze, N., & Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics-Theory and Methods, 19(10), 3595–3617.
Hodges, J. S. (2014). Richly parameterized linear models: Additive, time series, and spatial models using random effects. Chapman and Hall/CRC.
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.
Huang, X. (2009). Diagnosis of random-effect model misspecification in generalized linear mixed models for binary response. Biometrics, 65(2), 361–368.
Huang, X. (2011). Detecting random-effects model misspecification via coarsened data. Computational Statistics & Data Analysis, 55(1), 703–714.
Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1(2), 151–195.
Idemaru, K., Holt, L. L., & Seltman, H. (2012). Individual differences in cue weights are stable across time: The case of Japanese stop lengths. The Journal of the Acoustical Society of America, 132(6), 3950–3964.
Kimball, A. E., Shantz, K., Eager, C., & Roy, J. (2019). Confronting quasi-separation in logistic mixed effects for linguistic data: A Bayesian approach. Journal of Quantitative Linguistics, 26(3), 231–255.
Kievit, R. A., Frankenhuis, W. E., Waldorp, L. J., & Borsboom, D. (2013). Simpson's paradox in psychological science: A practical guide. Frontiers in Psychology, 4, 513.
Korkmaz, S., Göksülük, D., & Zararsiz, G. (2014). MVN: An R package for assessing multivariate normality. R Journal, 6(2), 151–162.
Litière, S., Alonso, A., & Molenberghs, G. (2007). Type I and type II error under random-effects misspecification in generalized linear mixed models. Biometrics, 63(4), 1038–1044.
Litière, S., Alonso, A., & Molenberghs, G. (2008). The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Statistics in Medicine, 27(16), 3125–3144.
Liu, J., & Hodges, J. S. (2003). Posterior bimodality in the balanced one-way random-effects model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 247–255.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
McCulloch, C. E., & Neuhaus, J. M. (2011a). Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter. Statistical Science, 26(3), 388–402.
McCulloch, C. E., & Neuhaus, J. M. (2011b). Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics, 67(1), 270–279.
Menn, L., & Vihman, M. (2011). Features in child phonology. In G. N. Clements & R. Ridouane (Eds.), Where do phonological features come from? (pp. 261–301). John Benjamins.
Mielke, J., Baker, A., & Archangeli, D. (2016). Individual-level contact limits phonological complexity: Evidence from bunched and retroflex /ɹ/. Language, 92(1), 101–140.
Miglio, V. G., Gries, S. T., Harris, M. J., Wheeler, E. M., & Santana-Paixão, R. (2013). Spanish lo(s)-le(s) clitic alternations in psych verbs: A multifactorial corpus-based analysis. In J. Cabrelli Amaro, G. Lord, A. de Prada Pérez, & J. E. Aaron (Eds.), Selected proceedings of the 15th Hispanic linguistics symposium (pp. 268–278). Cascadilla Press.
Móri, T. F., Székely, G. J., & Rizzo, M. L. (2021). On energy tests of normality. Journal of Statistical Planning and Inference, 213, 1–15.
Mouselimis, L. (2023). ClusterR: Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering. R package version 1.3.0. https://CRAN.R-project.org/package=ClusterR
Piccini, R. (2019). Statistical learning and the update of sensory priors in human participants (M.S. thesis). University of Edinburgh.
Roettger, T. B., Mahrt, T., & Cole, J. (2019). Mapping prosody onto meaning: The case of information structure in American English. Language, Cognition and Neuroscience, 34(7), 841–860.
Royston, P. (1991). Estimating departure from normality. Statistics in Medicine, 10(8), 1283–1293.
Schertz, J., Cho, T., Lotto, A., & Warner, N. (2015). Individual differences in phonetic cue use in production and perception of a non-native sound contrast. Journal of Phonetics, 52, 183–204.
Schertz, J., Cho, T., Lotto, A., & Warner, N. (2016). Individual differences in perceptual adaptability of foreign sound categories. Attention, Perception, & Psychophysics, 78(1), 355–367.
Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., & Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152.
Schumacher, R. A., & Pierrehumbert, J. B. (2021). Familiarity, consistency, and systematizing in morphology. Cognition, 212, 104512.
Siffer, A. (2018). Rfolding: The folding test of unimodality. R package version 1.0. https://CRAN.R-project.org/package=Rfolding
Silk, M. J., Harrison, X. A., & Hodgson, D. J. (2020). Perils and pitfalls of mixed-effects regression models in biology. PeerJ, 8, e9522.
Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B, 43, 97–99.
Smolek, A. (2019). Teaching papa to cha-cha: How change magnitude, temporal contiguity, and task affect alternation learning (Ph.D. dissertation). University of Oregon.
Sonderegger, M. (2023). Regression modeling for linguistic data. Cambridge, MA: MIT Press.
Stengård, E., Juslin, P., Hahn, U., & van den Berg, R. (2022). On the generality and cognitive basis of base-rate neglect. Cognition, 226, 105160.
Székely, G. J., & Rizzo, M. L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93(1), 58–80.
Tomlin, R. S. (1995). Focal attention, voice, and word order. In M. Noonan & P. A. Downing (Eds.), Word order in discourse (pp. 517–552). Amsterdam: John Benjamins.
Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
White, J. (2014). Evidence for a learning bias against saltatory phonological alternations. Cognition, 130(1), 96–115.
Wilson, C. (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30(5), 945–982.
Zuraw, K. (2016). Polarized variation. Catalan Journal of Linguistics, 15, 145–171.
Zymet, J. (2018). Lexical propensities in phonology: Corpus and experimental evidence, grammar, and learning (Ph.D. dissertation). University of California.
Acknowledgments
Many thanks to Dr. Santiago Barreda for useful discussion of Bayesian statistics and appropriate priors for the brms analysis. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The experimental work was conducted when Zachary Houghton was at the Department of Linguistics, University of Oregon, and formed part of his Honors Thesis in Linguistics.
Open practices statement
The code, data, and aggregated simulation results are available on OSF at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Code availability
All code is available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
ZNH conceived of the original empirical study. ZNH and VK designed the experiment. ZNH conducted the experiment. VK conducted the statistical analyses reported here and wrote the code. ZNH reviewed the code for errors. ZNH and VK both contributed to writing the paper, revisions and responding to reviewers.
Corresponding author
Ethics declarations
Conflicts of interest/competing interests
The authors have no relevant financial or non-financial competing interests to disclose.
Consent to participate
Participants in the empirical study provided written consent to participate in the study.
Consent for publication
Participants in the empirical study provided written consent for publication of their deidentified data.
Ethics approval
The empirical study analyzed here was approved by the University of Oregon Institutional Review Board. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Houghton, Z.N., Kapatsinski, V. Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression. Behav Res (2023). https://doi.org/10.3758/s13428-023-02287-y
Accepted:
Published:
DOI: https://doi.org/10.3758/s13428-023-02287-y