Abstract
A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular type of statistical model, the linear mixed model (LMM), which is widely used in many disciplines, e.g. agriculture, ecology, econometrics and psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models, incorporate a combination of fixed and random effects, with LMMs being a special case. The inclusion of random effects in particular gives LMMs considerable flexibility in accounting for the many types of complex correlation structures often found in data. This flexibility, however, has given rise to a number of ways in which an end-user can specify the precise form of the LMM that they wish to fit in statistical software. In this paper, we review the software design for specification of the LMM (and its special case, the linear model), focusing in particular on the use of high-level symbolic model formulae and on two popular but contrasting R packages, lme4 and asreml.
Supported by R Consortium.
References
Aitkin, M., Anderson, D., Francis, B., Hinde, J.: Statistical Modelling in GLIM. Oxford University Press, Oxford (1989)
Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1), 1–48 (2015). https://doi.org/10.18637/jss.v067.i01
Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122 (2013)
Butler, D.G., Cullis, B.R., Gilmour, A.R., Gogel, B.J.: Mixed models for S language environments: ASReml-R reference manual (2009)
Butler, D.G., Gogel, B.J., Cullis, B.R., Thompson, R.: Navigating from ASReml-R version 3 to 4 (2018)
Bürkner, P.-C.: brms: an R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80(1), 1–28 (2017)
Bürkner, P.-C.: Advanced Bayesian multilevel modeling with the R package brms. R J. 10(1), 395–411 (2018). https://doi.org/10.32614/RJ-2018-017
CAIGE: Caige project (2016). http://www.caigeproject.org.au
Chambers, J.M., Hastie, T.: Statistical models in S. Wadsworth & Brooks/Cole Computer Science Series. Wadsworth & Brooks/Cole Advanced Books & Software (1992). ISBN 9780534167646. http://books.google.fr/books?id=uyfvAAAAMAAJ
Crowder, M., Hand, D.: Analysis of Repeated Measures. Chapman and Hall, London (1990)
Csárdi, G.: cranlogs: download logs from the ‘RStudio’ ‘CRAN’ mirror (2019). https://CRAN.R-project.org/package=cranlogs. R package version 2.1.1
Cullis, B.R., Smith, A.B., Coombes, N.E.: On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4), 381–393 (2006). https://doi.org/10.1198/108571106X154443. ISSN 1085–7117
Gilmour, A.R., Cullis, B.R., Verbyla, A.P.: Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Biol. Environ. Stat. 2(3), 269–293 (1997). https://doi.org/10.2307/1400446
Gilmour, A.R., Gogel, B.J., Cullis, B.R., Thompson, R.: ASReml user guide release 3.0 (2009)
Kuhn, M.: parsnip: a common API to modeling and analysis functions (2018). https://topepo.github.io/parsnip. R package version 0.0.0.9003
Mrode, R.A.: Linear Models for the Prediction of Animal Breeding Values, 3rd edn. CABI, Wallingford (2014). ISBN 1780643918, 9781780643915
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., R Core Team: nlme: linear and nonlinear mixed effects models (2019). https://CRAN.R-project.org/package=nlme. R package version 3.1-140
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
Ryan, T.A., Joiner, B.L., Ryan, B.F.: The Minitab Student Handbook. Duxbury Press, London (1976)
Seabold, S., Perktold, J.: Statsmodels: econometric and statistical modeling with Python. In: 9th Python in Science Conference (2010)
Smith, N.J., et al.: pydata/patsy: v0.5.1, October 2018. https://doi.org/10.5281/zenodo.1472929
Stan Development Team: RStan: the R interface to Stan (2019). http://mc-stan.org/. R package version 2.19.2
Van Rossum, G., Drake Jr, F.L.: Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands (1995). http://www.python.org
Vazquez, A.I., Bates, D.M., Rosa, G.J.M., Gianola, D., Weigel, K.A.: Technical note: an R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci. 88, 497–504 (2010)
VSN International: Genstat for Windows 19th Edition. VSN International, Hemel Hempstead, UK (2017). Genstat.co.uk
Welham, S.J., Gezan, S.A., Clark, S.J., Mead, A.: Statistical Methods in Biology: Design and Analysis of Experiments and Regression. Chapman and Hall, London (2015)
Wickham, H., François, R., Henry, L., Müller, K.: dplyr: a grammar of data manipulation (2019). https://CRAN.R-project.org/package=dplyr. R package version 0.8.3
Wilkinson, G.N., Rogers, C.E.: Symbolic description of factorial models for analysis of variance. J. Roy. Stat. Soc.: Ser. C (Appl. Stat.) 22(3), 392–399 (1973)
Wright, K.: agridat: agricultural datasets (2018). https://CRAN.R-project.org/package=agridat. R package version 1.16
Xie, Y., Allaire, J.J., Grolemund, G.: R Markdown: The Definitive Guide. Chapman and Hall/CRC, Boca Raton (2018). ISBN 9781138359338. https://bookdown.org/yihui/rmarkdown
Acknowledgement
This paper benefited from a Twitter conversation with Thomas Lumley. This paper was made using R Markdown (Xie et al. 2018). Huge thanks go to the teams behind the lme4 and asreml R packages, which make fitting general LMMs accessible to wider audiences. All materials used to produce this paper, and its history of changes, can be found on GitHub at https://github.com/emitanaka/paper-symlmm.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tanaka, E., Hui, F.K.C. (2019). Symbolic Formulae for Linear Mixed Models. In: Nguyen, H. (eds) Statistics and Data Science. RSSDS 2019. Communications in Computer and Information Science, vol 1150. Springer, Singapore. https://doi.org/10.1007/978-981-15-1960-4_1
DOI: https://doi.org/10.1007/978-981-15-1960-4_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1959-8
Online ISBN: 978-981-15-1960-4
eBook Packages: Computer Science, Computer Science (R0)