Abstract
As mixed-effects regression models have become a mainstream tool in psycholinguistics, the need to understand them fully has grown. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure so as to minimize error in evaluating the statistical significance of fixed-effect predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear bi- or multimodal. However, there is no established way to assess whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items exhibit near-categorical behavior. In the absence of quasi-separability, several clustering methods succeed in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to establish that subjects (or items) do not come from a single population. This allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can then be incorporated into the regression analysis.
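As a hypothetical illustration of the BIC comparison described in the abstract (this is not the authors' code, which uses R; here scikit-learn's `GaussianMixture` stands in for an R mixture-model fit), one can fit one- and two-component Gaussian mixtures to the by-subject random effects extracted from a fitted model and compare their BICs:

```python
# Hedged sketch: compare BIC of 1- vs. 2-component Gaussian mixtures
# fit to (simulated) by-subject random slopes. A substantially lower
# BIC for the 2-component solution suggests that subjects do not come
# from a single population.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Simulated random slopes drawn from two well-separated subpopulations;
# in practice these would come from the fitted mixed-effects model.
ranef = np.concatenate([rng.normal(-2.0, 0.5, 50),
                        rng.normal(2.0, 0.5, 50)]).reshape(-1, 1)

bic = {k: GaussianMixture(n_components=k, random_state=0)
          .fit(ranef).bic(ranef)
       for k in (1, 2)}
# scikit-learn's bic(): lower is better.
two_populations = bic[2] < bic[1]
print(bic, two_populations)
```

The same comparison extends to the multivariate case by passing a matrix with one column per random effect.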
Data availability
All data are available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Notes
Although there has been increasing use of Bayesian models, e.g., using the brms package (Bürkner, 2017), brms with default priors shows qualitatively similar behavior to lme4 in our simulations.
Shrinkage can of course move points into positions corresponding to unobservable probabilities, but points do not appear to move enough in practice for striations to disappear.
We also ran these models with probabilities converted into logits (an empirical logit analysis), since a linear model on probabilities fits poorly at the limits of the probability space. This did not improve fit, and all interactions mentioned below remained significant. For the empirical logit analysis, we added .001 in probability to address cases where the false alarm rate was zero. Because false alarm rates are based on 1000 observations per combination of parameters, this should not strongly affect inferences from the empirical logit analysis (cf. Donnelly & Verkuilen, 2017, for situations in which empirical logit analysis is problematic).
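As a hypothetical illustration of the adjustment described in this note (the exact formula the authors used is not given here, so the symmetric form below is an assumption), adding a small constant such as .001 keeps the logit finite when the false alarm rate is zero:

```python
# Hedged sketch of an empirical logit transform with a small additive
# constant (here .001, as in the note). The symmetric form, adding the
# constant to both outcomes, is an assumption for illustration.
import math

def empirical_logit(p, eps=0.001):
    # Log-odds of p after adding a small probability to both outcomes,
    # so p = 0 and p = 1 map to finite values.
    return math.log((p + eps) / (1 - p + eps))

print(empirical_logit(0.0))   # finite and strongly negative
print(empirical_logit(0.5))   # 0.0
```

With 1000 observations per cell, the smallest nonzero rate is .001, so the constant is on the order of the measurement granularity and should barely shift the transformed values.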
These models achieve fits of adjusted R² = 82% to 87%, except for HH vs. ACR, which has a fit of 77%. A necessary caveat, therefore, is that the variance in HH is not fully captured by our model, possibly because the null hypothesis for HH is that the distribution is uniform rather than normal.
References
Ameijeiras-Alonso, J., Crujeiras, R. M., & Rodriguez-Casal, A. (2021). Multimode: An R package for mode assessment. Journal of Statistical Software, 97(9), 1–32. https://doi.org/10.18637/jss.v097
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Barth, D., & Kapatsinski, V. (2018). Evaluating logistic mixed-effects models of corpus-linguistic data in light of lexical diffusion. In Mixed-effects regression models in linguistics (pp. 99–116). Springer.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Bürkner, P.-C. (2017). Brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Burnham, K. P., & Anderson, D. R. (2004). Model selection and multimodel inference: A practical information-theoretic approach. Springer.
Cheng, M. Y., & Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society. Series B, 60, 579–589.
Clark, R. G., Blanchard, W., Hui, F. K., Tian, R., & Woods, H. (2023). Dealing with complete separation and quasi-complete separation in logistic regression for linguistic data. Research Methods in Applied Linguistics, 2(1), 100044.
Dąbrowska, E. (2012). Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism, 2(3), 219–253.
Dąbrowska, E., & Divjak, D. (2019). Individual differences in grammatical knowledge. Cognitive Linguistics, 3, 231–250.
Donnelly, S., & Verkuilen, J. (2017). Empirical logit analysis is not logistic regression. Journal of Memory and Language, 94, 28–42.
Doornik, J. A., & Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics, 70, 927–939.
Drager, K., & Hay, J. (2012). Exploiting random intercepts: Two case studies in sociophonetics. Language Variation and Change, 24(1), 59–78.
Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). Diagnosing misspecification of the random-effects distribution in mixed models. Biometrics, 73(1), 63–71.
Eager, C., & Roy, J. (2017). Mixed effects models are sometimes terrible. arXiv preprint arXiv:1701.04858.
Efendi, A., Drikvandi, R., Verbeke, G., & Molenberghs, G. (2017). A goodness-of-fit test for the random-effects distribution in mixed models. Statistical Methods in Medical Research, 26(2), 970–983.
Fisher, N. I., & Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika, 88, 419–517.
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.
Gleitman, L. R., January, D., Nappa, R., & Trueswell, J. C. (2007). On the give and take between event apprehension and utterance formulation. Journal of Memory and Language, 57, 544–569.
Hall, P., & York, M. (2001). On the calibration of Silverman’s test for multimodality. Statistica Sinica, 11, 515–536.
Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. Annals of Statistics, 13, 70–84.
Heagerty, P. J., & Kurland, B. F. (2001). Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika, 88, 973–985.
Henze, N., & Zirkler, B. (1990). A class of invariant consistent tests for multivariate normality. Communications in Statistics-Theory and Methods, 19(10), 3595–3617.
Hodges, J. S. (2014). Richly parameterized linear models: Additive, time series, and spatial models using random effects. Chapman and Hall/CRC.
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.
Huang, X. (2009). Diagnosis of random-effect model misspecification in generalized linear mixed models for binary response. Biometrics, 65(2), 361–368.
Huang, X. (2011). Detecting random-effects model misspecification via coarsened data. Computational Statistics & Data Analysis, 55(1), 703–714.
Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1(2), 151–195.
Idemaru, K., Holt, L. L., & Seltman, H. (2012). Individual differences in cue weights are stable across time: The case of Japanese stop lengths. The Journal of the Acoustical Society of America, 132(6), 3950–3964.
Kimball, A. E., Shantz, K., Eager, C., & Roy, J. (2019). Confronting quasi-separation in logistic mixed effects for linguistic data: A Bayesian approach. Journal of Quantitative Linguistics, 26(3), 231–255.
Kievit, R. A., Frankenhuis, W. E., Waldorp, L. J., & Borsboom, D. (2013). Simpson's paradox in psychological science: A practical guide. Frontiers in Psychology, 4, 513.
Korkmaz, S., Göksülük, D., & Zararsiz, G. (2014). MVN: An R package for assessing multivariate normality. R Journal, 6(2), 151–162.
Litière, S., Alonso, A., & Molenberghs, G. (2007). Type I and type II error under random-effects misspecification in generalized linear mixed models. Biometrics, 63(4), 1038–1044.
Litière, S., Alonso, A., & Molenberghs, G. (2008). The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Statistics in Medicine, 27(16), 3125–3144.
Liu, J., & Hodges, J. S. (2003). Posterior bimodality in the balanced one-way random-effects model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 247–255.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
McCulloch, C. E., & Neuhaus, J. M. (2011a). Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter. Statistical Science, 26(3), 388–402.
McCulloch, C. E., & Neuhaus, J. M. (2011b). Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics, 67(1), 270–279.
Menn, L., & Vihman, M. (2011). Features in child phonology. In G. N. Clements & R. Ridouane (Eds.), Where do phonological features come from? (pp. 261–301). John Benjamins.
Mielke, J., Baker, A., & Archangeli, D. (2016). Individual-level contact limits phonological complexity: Evidence from bunched and retroflex /ɹ/. Language, 92(1), 101–140.
Miglio, V. G., Gries, S. T., Harris, M. J., Wheeler, E. M., & Santana-Paixão, R. (2013). Spanish lo(s)-le(s) clitic alternations in psych verbs: A multifactorial corpus-based analysis. In J. Cabrelli Amaro, G. Lord, A. de Prada Pérez, & J. E. Aaron (Eds.), Selected proceedings of the 15th Hispanic linguistics symposium (pp. 268–278). Cascadilla Press.
Móri, T. F., Székely, G. J., & Rizzo, M. L. (2021). On energy tests of normality. Journal of Statistical Planning and Inference, 213, 1–15.
Mouselimis, L. (2023). ClusterR: Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering. R package version 1.3.0. https://CRAN.R-project.org/package=ClusterR
Piccini, R. (2019). Statistical learning and the update of sensory priors in human participants (M.S. thesis). University of Edinburgh.
Roettger, T. B., Mahrt, T., & Cole, J. (2019). Mapping prosody onto meaning: The case of information structure in American English. Language, Cognition and Neuroscience, 34(7), 841–860.
Royston, P. (1991). Estimating departure from normality. Statistics in Medicine, 10(8), 1283–1293.
Schertz, J., Cho, T., Lotto, A., & Warner, N. (2015). Individual differences in phonetic cue use in production and perception of a non-native sound contrast. Journal of Phonetics, 52, 183–204.
Schertz, J., Cho, T., Lotto, A., & Warner, N. (2016). Individual differences in perceptual adaptability of foreign sound categories. Attention, Perception, & Psychophysics, 78(1), 355–367.
Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., & Araya-Ajoy, Y. G. (2020). Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152.
Schumacher, R. A., & Pierrehumbert, J. B. (2021). Familiarity, consistency, and systematizing in morphology. Cognition, 212, 104512.
Siffer, A. (2018). Rfolding: The folding test of unimodality. R package version 1.0. https://CRAN.R-project.org/package=Rfolding
Silk, M. J., Harrison, X. A., & Hodgson, D. J. (2020). Perils and pitfalls of mixed-effects regression models in biology. PeerJ, 8, e9522.
Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B, 43, 97–99.
Smolek, A. (2019). Teaching papa to cha-cha: How change magnitude, temporal contiguity, and task affect alternation learning (Ph.D. dissertation). University of Oregon.
Sonderegger, M. (2023). Regression modeling for linguistic data. Cambridge, MA: MIT Press.
Stengård, E., Juslin, P., Hahn, U., & van den Berg, R. (2022). On the generality and cognitive basis of base-rate neglect. Cognition, 226, 105160.
Székely, G. J., & Rizzo, M. L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93(1), 58–80.
Tomlin, R. S. (1995). Focal attention, voice, and word order. In M. Noonan & P. A. Downing (Eds.), Word order in discourse (pp. 517–552). Amsterdam: John Benjamins.
Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.
White, J. (2014). Evidence for a learning bias against saltatory phonological alternations. Cognition, 130(1), 96–115.
Wilson, C. (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30(5), 945–982.
Zuraw, K. (2016). Polarized variation. Catalan Journal of Linguistics, 15, 145–171.
Zymet, J. (2018). Lexical propensities in phonology: Corpus and experimental evidence, grammar, and learning (Ph.D. dissertation). University of California.
Acknowledgments
Many thanks to Dr. Santiago Barreda for useful discussion of Bayesian statistics and appropriate priors for the brms analysis. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The experimental work was conducted when Zachary Houghton was at the Department of Linguistics, University of Oregon, and formed part of his Honors Thesis in Linguistics.
Open practices statement
The code, data, and aggregated simulation results are available on OSF at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Code availability
All code is available at https://osf.io/prj34/?view_only=9fe15609f61746c5853c9c6dacf7e6ee
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
ZNH conceived of the original empirical study. ZNH and VK designed the experiment. ZNH conducted the experiment. VK conducted the statistical analyses reported here and wrote the code. ZNH reviewed the code for errors. ZNH and VK both contributed to writing the paper, revisions and responding to reviewers.
Corresponding author
Ethics declarations
Conflicts of interest/competing interests
The authors have no relevant financial or non-financial competing interests to disclose.
Consent to participate
Participants in the empirical study provided written consent to participate in the study.
Consent for publication
Participants in the empirical study provided written consent for publication of their deidentified data.
Ethics approval
The empirical study analyzed here was approved by the University of Oregon Institutional Review Board. The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Houghton, Z.N., Kapatsinski, V. Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression. Behav Res (2023). https://doi.org/10.3758/s13428-023-02287-y
Accepted:
Published:
DOI: https://doi.org/10.3758/s13428-023-02287-y