Mixed-Effects Regression Modeling

Schäfer, Roland

doi:10.1007/978-3-030-46216-1_22

Abstract

In this chapter, mixed-effects regression modeling is introduced, mostly using alternation modeling as an example. Mixed-effects modeling is one option for dealing with cases where observations vary by group (such as speakers, registers, or lemmas): so-called random effects are introduced into the model specification. It is stressed that using a categorical variable as a random effect is just an alternative to using it as a normal fixed effect in a Generalised Linear Model (GLM) as introduced in Chap. 21, but that the two options have different mathematical advantages and disadvantages. Simple random intercepts, which capture per-group tendencies, are introduced, as are random slopes (for situations where fixed effects vary per group) and multilevel models (for situations where group-wise tendencies can be predicted from other variables, for example when lemma frequency is useful for predicting lemma-specific tendencies). Criteria for including random effects in models and for evaluating the model fit (for example through pseudo-coefficients of determination) are discussed. The demonstration in R uses the popular lme4 package.
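The model types named in the abstract can be sketched in lme4 syntax as follows. This is only a minimal illustration on simulated data, not the chapter's own analysis: the data frames, the variable names (lemma, cond, logfreq, y), the model names, and all simulation parameters are assumptions made up for this example.

```r
# Minimal sketch on simulated data. All object and variable names
# (lemmas, d, cond, logfreq, glm.fixed, glmm.ri, glmm.ml) are hypothetical.
library(lme4)

set.seed(1)
n.lemmas <- 30   # number of groups (levels of the grouping factor)
n.obs    <- 60   # observations per group

# Second-level data: one row per lemma with a simulated log frequency.
lemmas <- data.frame(
  lemma   = factor(sprintf("lemma%02d", seq_len(n.lemmas))),
  logfreq = rnorm(n.lemmas)
)

# First-level data: one row per observation.
d <- data.frame(
  lemma = rep(lemmas$lemma, each = n.obs),
  cond  = rbinom(n.lemmas * n.obs, 1, 0.5)   # binary first-level predictor
)

# Pull the lemma-level predictor into the observation-level data frame.
d <- merge(d, lemmas, by = "lemma")
d$lemma <- factor(d$lemma)

# Simulate a binary outcome with lemma-specific intercepts.
u   <- rnorm(n.lemmas, sd = 0.7)             # per-lemma deviations
eta <- 1.0 * d$cond + 0.5 * d$logfreq + u[as.integer(d$lemma)]
d$y <- rbinom(nrow(d), 1, plogis(eta))

# (a) Lemma as an ordinary fixed effect in a GLM.
glm.fixed <- glm(y ~ cond + lemma, data = d, family = binomial)

# (b) Lemma as a random intercept in a GLMM.
glmm.ri <- glmer(y ~ cond + (1 | lemma), data = d, family = binomial)

# (c) Random intercept plus a second-level predictor (lemma frequency).
glmm.ml <- glmer(y ~ cond + logfreq + (1 | lemma), data = d, family = binomial)

# A random slope for cond would be specified as (1 + cond | lemma).

VarCorr(glmm.ri)             # estimated variance of the lemma intercepts
head(ranef(glmm.ri)$lemma)   # conditional modes for the first lemmas
```

In (a), each lemma receives its own coefficient relative to a reference level, whereas in (b) only the variance of the lemma-specific intercepts is estimated and the per-lemma values are obtained as conditional modes.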

Notes

1.
Trivially, grouping factors should never be ordinal variables. They are always categorical. Terminologically, the “groups” are the collections of observations, and each such group corresponds to one of the “levels” of the categorical grouping factor.
2.
The fixed effects discussed so far, which are interpreted at the level of observations, are consequently called “first-level fixed effects” or “first-level predictors”.
3.
Choosing one dummy as a reference level is necessary because otherwise, infinitely many equivalent estimates of the model coefficients exist, as one could simply add any arbitrary constant to the intercept and shift the other coefficients accordingly. However, the estimator works under the assumption that there is a unique maximum likelihood estimate. This extends to any other appropriate coding for categorical variables.
4.
There is one other practical difference. If models are used to make actual predictions (which is rarely the case in linguistics), a random effect allows one to make predictions for unseen groups. See Gelman and Hill (2007, 272–275).
5.
Shrinkage is thus stronger (and the conditional mode/mean is closer to 0) if there is less evidence that a group deviates from the overall tendency. The lower the number of observations per group, the less evidence there is.
6.
Again, we do not assume them to be fixed population parameters, which would be the case for true estimates such as fixed effects coefficients.
7.
There are, of course, elegant ways of pulling the frequency values from another data frame on the fly in R.
8.
The variance-covariance matrix of glmm.01 can also be extracted directly using the VarCorr(glmm.01) command.
9.
Since the bootstrap (especially with smaller original sample sizes) tends to run into replications where the estimation of the variance fails and the variance is thus returned as 0, the bootstrap interval is sometimes skewed towards 0 when the profile confidence interval frames the true value symmetrically. The bootstrap is thus not always more robust or intrinsically better. Comparing both methods is recommended.
10.
Again, the accompanying script contains all necessary code.
11.
This entails that GLMMs with only one simple random effect cannot be compared with a model without it, as such a model would be a GLM and not a nested GLMM.
12.
Notice that the results reported in the paper differ slightly from the sample script included with this chapter because the random number generator was in a different state.
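The identifiability problem described in note 3 can be illustrated with a small base R sketch. The three-level factor g and the coefficient values below are hypothetical and serve only to show why one dummy has to be dropped as a reference level.

```r
# Hypothetical three-level grouping factor, purely for illustration.
g <- factor(rep(c("a", "b", "c"), each = 4))

# Treatment coding (R's default): intercept plus dummies for "b" and "c";
# level "a" serves as the reference level.
X.ref <- model.matrix(~ g)

# Intercept plus one dummy for every level, i.e. no reference level.
X.all <- cbind(Intercept = 1, model.matrix(~ 0 + g))

qr(X.ref)$rank   # equals ncol(X.ref): the coefficients are identified
qr(X.all)$rank   # smaller than ncol(X.all): columns are linearly dependent

# Adding a constant k to the intercept and subtracting it from all dummy
# coefficients leaves the linear predictor unchanged.
beta  <- c(0.5, 1.0, -1.0, 2.0)
k     <- 3
beta2 <- beta + c(k, -k, -k, -k)
all.equal(drop(X.all %*% beta), drop(X.all %*% beta2))
```

Because both coefficient vectors yield the same linear predictor, the likelihood cannot distinguish between them, which is why a coding with a reference level (or another full-rank coding) is needed.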

References

Bates, D. M. (2010). Lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf.
Bates, D., Kliegl, R., Vasishth, S., & Baayen, R. (2015a). Parsimonious mixed models. https://arxiv.org/abs/1506.04967.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015b). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01.
Biber, D., Finegan, E., & Atkinson, D. (1994). ARCHER and its challenges: Compiling and exploring a representative corpus of historical English registers. In U. Fries, P. Schneider, & G. Tottie (Eds.), Creating and using English language corpora (pp. 1–13). Amsterdam: Rodopi.
Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87, 178–183. https://doi.org/10.2307/2290467.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Halekoh, U., & Højsgaard, S. (2014). A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models – the R package pbkrtest. Journal of Statistical Software, 59(9), 1–30. https://doi.org/10.18637/jss.v059.i09.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001.
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142. https://doi.org/10.1111/j.2041-210x.2012.00261.x.
Schäfer, R. (2018). Abstractions and exemplars: The measure noun phrase alternation in German. Cognitive Linguistics, 29(4), 729–771. https://doi.org/10.1515/cog-2017-0050.
Schäfer, R., Barbaresi, A., & Bildhauer, F. (2013). The good, the bad, and the hazy: Design decisions in web corpus construction. In S. Evert, E. Stemle, & P. Rayson (Eds.), Proceedings of the 8th Web as Corpus Workshop (WAC-8) (pp. 7–15). Lancaster: SIGWAC.
Schäfer, R., & Bildhauer, F. (2012). Building large corpora from the web using a new efficient tool chain. In N. C. (Chair), K. Choukri, T. Declerck, M. U. Doan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12) (pp. 486–493). Istanbul: European Language Resources Association (ELRA).
Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support: Overconfident estimates in mixed models. Behavioral Ecology, 20(2), 416–420. https://doi.org/10.1093/beheco/arn145.
Zuur, A. F., Ieno, E. N., & Elphick, C. S. (2010). A protocol for data exploration to avoid common statistical problems. Methods in Ecology and Evolution, 1(1), 3–14. https://doi.org/10.1111/j.2041-210x.2009.00001.x.
Zuur, A. F., Ieno, E. N., Walker, N., Saveliev, A. A., & Smith, G. M. (2009). Mixed effects models and extensions in ecology with R. Berlin: Springer.

Author information

Authors and Affiliations

Department of German Studies and Linguistics, Humboldt-Universität zu Berlin, Berlin, Germany
Roland Schäfer

Corresponding author

Correspondence to Roland Schäfer.

Editor information

Editors and Affiliations

FNRS Centre for English Corpus Linguistics, Language and Communication Institute, UCLouvain, Louvain-la-Neuve, Belgium
Magali Paquot
Department of Linguistics, University of California, Santa Barbara, CA, USA
Stefan Th. Gries

1 Electronic Supplementary Materials

22_Schäfer (ZIP 63 kb)

About this chapter

Cite this chapter

Schäfer, R. (2020). Mixed-Effects Regression Modeling. In: Paquot, M., Gries, S.T. (eds) A Practical Handbook of Corpus Linguistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46216-1_22

DOI: https://doi.org/10.1007/978-3-030-46216-1_22
Published: 05 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46215-4
Online ISBN: 978-3-030-46216-1
eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)
