Symbolic Formulae for Linear Mixed Models

  • Emi Tanaka
  • Francis K. C. Hui
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1150)

Abstract

A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular type of statistical model, the linear mixed model (LMM), that is widely used in many disciplines such as agriculture, ecology, econometrics and psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models, incorporate a combination of fixed and random effects, with LMMs being a special case. The inclusion of random effects in particular gives LMMs considerable flexibility in accounting for the many types of complex correlation structures often found in data. This flexibility, however, has given rise to a number of ways in which an end-user can specify the precise form of the LMM that they wish to fit in statistical software. In this paper, we review the software design for specification of the LMM (and its special case, the linear model), focusing in particular on the use of high-level symbolic model formulae and on two popular but contrasting R packages, lme4 and asreml.
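
For orientation, one common way to write the LMM described above is sketched below. The notation is an assumption chosen for illustration and is not taken from this page: y is the response vector, X and Z are the design matrices for the fixed effects β and the random effects b, and G and R are the covariance matrices of the random effects and the residuals.

```latex
% Illustrative notation only; the symbols are assumptions, not the paper's own.
\begin{align}
  \boldsymbol{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{b} + \boldsymbol{e},
  &
  \boldsymbol{b} &\sim N(\boldsymbol{0}, \mathbf{G}),
  &
  \boldsymbol{e} &\sim N(\boldsymbol{0}, \mathbf{R}),
  \\
  \boldsymbol{y} &\sim N\!\left(\mathbf{X}\boldsymbol{\beta},\;
  \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \mathbf{R}\right)
  && \text{(with $\boldsymbol{b}$ and $\boldsymbol{e}$ independent).}
\end{align}
```

Under this sketch, the linear model is the special case with no random effects term, and a symbolic model formula is a compact way of telling the software how to build X and Z and which structures to assume for G and R. In lme4, for example, a random intercept for a hypothetical grouping factor g is written as y ~ x + (1 | g).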

Keywords

Multi-level model, Hierarchical model, Model specification, Model formulae, Model API, Fixed effects, Random effects

Notes

Acknowledgement

This paper benefited from a Twitter conversation with Thomas Lumley. This paper was made using R Markdown (Xie et al. 2018). Huge thanks go to the teams behind the lme4 and asreml R packages, which make the fitting of general LMMs accessible to wider audiences. All materials used to produce this paper, and its history of changes, can be found on GitHub at https://github.com/emitanaka/paper-symlmm.

References

  1. Aitkin, M., Anderson, D., Francis, B., Hinde, J.: Statistical Modelling in GLIM. Oxford University Press, Oxford (1989)
  2. Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1) (2015). https://doi.org/10.18637/jss.v067.i01
  3. Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122 (2013)
  4. Butler, D.G., Cullis, B.R., Gilmour, A.R., Gogel, B.J.: Mixed models for S language environments: ASReml-R reference manual (2009)
  5. Butler, D.G., Gogel, B.J., Cullis, B.R., Thompson, R.: Navigating from ASReml-R version 3 to 4 (2018)
  6. Bürkner, P.-C.: brms: an R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80(1), 1–28 (2017)
  7. Bürkner, P.-C.: Advanced Bayesian multilevel modeling with the R package brms. R J. 10(1), 395–411 (2018). https://doi.org/10.32614/RJ-2018-017
  8. CAIGE: Caige project (2016). http://www.caigeproject.org.au
  9. Chambers, J.M., Hastie, T.: Statistical models in S. Wadsworth & Brooks/Cole Computer Science Series. Wadsworth & Brooks/Cole Advanced Books & Software (1992). ISBN 9780534167646. http://books.google.fr/books?id=uyfvAAAAMAAJ
  10. Crowder, M., Hand, D.: Analysis of Repeated Measures. Chapman and Hall, London (1990)
  11. Csárdi, G.: cranlogs: download logs from the ‘RStudio’ ‘CRAN’ mirror (2019). https://CRAN.R-project.org/package=cranlogs. R package version 2.1.1
  12. Cullis, B.R., Smith, A.B., Coombes, N.E.: On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4), 381–393 (2006). https://doi.org/10.1198/108571106X154443. ISSN 1085-7117
  13. Gilmour, A.R., Cullis, B.R., Verbyla, A.P.: Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Biol. Environ. Stat. 2(3), 269–293 (1997). https://doi.org/10.2307/1400446
  14. Gilmour, A.R., Gogel, B.J., Cullis, B.R., Thompson, R.: ASReml user guide release 3.0 (2009)
  15. Kuhn, M.: parsnip: a common API to modeling and analysis functions (2018). https://topepo.github.io/parsnip. R package version 0.0.0.9003
  16. Mrode, R.A.: Linear Models for the Prediction of Animal Breeding Values, 3rd edn. CABI, Wallingford (2014). https://doi.org/10.1017/CBO9781107415324.004. ISBN 1780643918, 9781780643915
  17. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
  18. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., R Core Team: nlme: linear and nonlinear mixed effects models (2019). https://CRAN.R-project.org/package=nlme. R package version 3.1-140
  19. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
  20. Ryan, T.A., Joiner, B.L., Ryan, B.F.: The Minitab Student Handbook. Duxbury Press, London (1976)
  21. Seabold, S., Perktold, J.: Statsmodels: econometric and statistical modeling with Python. In: 9th Python in Science Conference (2010)
  22. Smith, N.J., et al.: pydata/patsy: v0.5.1, October 2018. https://doi.org/10.5281/zenodo.1472929
  23. Stan Development Team: RStan: the R interface to Stan (2019). http://mc-stan.org/. R package version 2.19.2
  24. Van Rossum, G., Drake Jr, F.L.: Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands (1995). http://www.python.org
  25. Vazquez, A.I., Bates, D.M., Rosa, G.J.M., Gianola, D., Weigel, K.A.: Technical note: an R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci. 88, 497–504 (2010)
  26. VSN International: Genstat for Windows 19th Edition. VSN International, Hemel Hempstead, UK (2017). Genstat.co.uk
  27. Welham, S.J., Gezan, S.A., Clark, S.J., Mead, A.: Statistical Methods in Biology: Design and Analysis of Experiments and Regression. Chapman and Hall, London (2015)
  28. Wickham, H., François, R., Henry, L., Müller, K.: dplyr: a grammar of data manipulation (2019). https://CRAN.R-project.org/package=dplyr. R package version 0.8.3
  29. Wilkinson, G.N., Rogers, C.E.: Symbolic description of factorial models for analysis of variance. J. Roy. Stat. Soc.: Ser. C (Appl. Stat.) 22(3), 392–399 (1973)
  30. Wright, K.: agridat: agricultural datasets (2018). https://CRAN.R-project.org/package=agridat. R package version 1.16
  31. Xie, Y., Allaire, J.J., Grolemund, G.: R Markdown: The Definitive Guide. Chapman and Hall/CRC, Boca Raton (2018). ISBN 9781138359338. https://bookdown.org/yihui/rmarkdown

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. The University of Sydney, Camperdown, Australia
  2. Monash University, Clayton, Australia
  3. Australian National University, Acton, Australia