Symbolic Formulae for Linear Mixed Models

  • Conference paper
Statistics and Data Science (RSSDS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1150)

Abstract

A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular type of statistical model, the linear mixed model (LMM), which is widely used in many disciplines, such as agriculture, ecology, econometrics and psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models, incorporate a combination of fixed and random effects, with LMMs being a special case. The inclusion of random effects in particular gives LMMs considerable flexibility in accounting for the many types of complex correlation structures often found in data. This flexibility, however, has given rise to a number of ways in which an end-user can specify the precise form of the LMM that they wish to fit in statistical software. We review the software design for specification of the LMM (and its special case, the linear model), focusing in particular on the use of high-level symbolic model formulae and on two popular but contrasting R packages, lme4 and asreml.

Supported by R Consortium.

References

  • Aitkin, M., Anderson, D., Francis, B., Hinde, J.: Statistical Modelling in GLIM. Oxford University Press, Oxford (1989)

  • Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1) (2015). https://doi.org/10.18637/jss.v067.i01

  • Buitinck, L., et al.: API design for machine learning software: experiences from the scikit-learn project. In: ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pp. 108–122 (2013)

  • Butler, D.G., Cullis, B.R., Gilmour, A.R., Gogel, B.J.: Mixed models for S language environments: ASReml-R reference manual (2009)

  • Butler, D.G., Gogel, B.J., Cullis, B.R., Thompson, R.: Navigating from ASReml-R version 3 to 4 (2018)

  • Bürkner, P.-C.: brms: an R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80(1), 1–28 (2017)

  • Bürkner, P.-C.: Advanced Bayesian multilevel modeling with the R package brms. R J. 10(1), 395–411 (2018). https://doi.org/10.32614/RJ-2018-017

  • CAIGE: Caige project (2016). http://www.caigeproject.org.au

  • Chambers, J.M., Hastie, T.: Statistical models in S. Wadsworth & Brooks/Cole Computer Science Series. Wadsworth & Brooks/Cole Advanced Books & Software (1992). ISBN 9780534167646. http://books.google.fr/books?id=uyfvAAAAMAAJ

  • Crowder, M., Hand, D.: Analysis of Repeated Measures. Chapman and Hall, London (1990)

  • Csárdi, G.: cranlogs: download logs from the ‘RStudio’ ‘CRAN’ mirror (2019). https://CRAN.R-project.org/package=cranlogs. R package version 2.1.1

  • Cullis, B.R., Smith, A.B., Coombes, N.E.: On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 11(4), 381–393 (2006). https://doi.org/10.1198/108571106X154443. ISSN 1085–7117

  • Gilmour, A.R., Cullis, B.R., Verbyla, A.P.: Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Biol. Environ. Stat. 2(3), 269–293 (1997). https://doi.org/10.2307/1400446

  • Gilmour, A.R., Gogel, B.J., Cullis, B.R., Thompson, R.: ASReml user guide release 3.0 (2009)

  • Kuhn, M.: parsnip: a common API to modeling and analysis functions (2018). https://topepo.github.io/parsnip. R package version 0.0.0.9003

  • Mrode, R.A.: Linear Models for the Prediction of Animal Breeding Values, 3rd edn. CABI, Wallingford (2014). ISBN 1780643918, 9781780643915

  • Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  • Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., R Core Team: nlme: linear and nonlinear mixed effects models (2019). https://CRAN.R-project.org/package=nlme. R package version 3.1-140

  • R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/

  • Ryan, T.A., Joiner, B.L., Ryan, B.F.: The Minitab Student Handbook. Duxbury Press, London (1976)

  • Seabold, S., Perktold, J.: Statsmodels: econometric and statistical modeling with Python. In: 9th Python in Science Conference (2010)

  • Smith, N.J., et al.: pydata/patsy: v0.5.1, October 2018. https://doi.org/10.5281/zenodo.1472929

  • Stan Development Team: RStan: the R interface to Stan (2019). http://mc-stan.org/. R package version 2.19.2

  • Van Rossum, G., Drake Jr, F.L.: Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam, The Netherlands (1995). http://www.python.org

  • Vazquez, A.I., Bates, D.M., Rosa, G.J.M., Gianola, D., Weigel, K.A.: Technical note: an R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci. 88, 497–504 (2010)

  • VSN International: Genstat for Windows 19th Edition. VSN International, Hemel Hempstead, UK (2017). Genstat.co.uk

  • Welham, S.J., Gezan, S.A., Clark, S.J., Mead, A.: Statistical Methods in Biology: Design and Analysis of Experiments and Regression. Chapman and Hall, London (2015)

  • Wickham, H., François, R., Henry, L., Müller, K.: dplyr: a grammar of data manipulation (2019). https://CRAN.R-project.org/package=dplyr. R package version 0.8.3

  • Wilkinson, G.N., Rogers, C.E.: Symbolic description of factorial models for analysis of variance. J. Roy. Stat. Soc.: Ser. C (Appl. Stat.) 22(3), 392–399 (1973)

  • Wright, K.: agridat: agricultural datasets (2018). https://CRAN.R-project.org/package=agridat. R package version 1.16

  • Xie, Y., Allaire, J.J., Grolemund, G.: R Markdown: The Definitive Guide. Chapman and Hall/CRC, Boca Raton (2018). ISBN 9781138359338. https://bookdown.org/yihui/rmarkdown

Acknowledgement

This paper benefited from a Twitter conversation with Thomas Lumley. It was produced using R Markdown (Xie et al. 2018). Huge thanks go to the teams behind the lme4 and asreml R packages, which make fitting general LMMs accessible to a wider audience. All materials used to produce this paper, along with its history of changes, can be found on GitHub at https://github.com/emitanaka/paper-symlmm.

Corresponding author

Correspondence to Emi Tanaka.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tanaka, E., Hui, F.K.C. (2019). Symbolic Formulae for Linear Mixed Models. In: Nguyen, H. (eds) Statistics and Data Science. RSSDS 2019. Communications in Computer and Information Science, vol 1150. Springer, Singapore. https://doi.org/10.1007/978-981-15-1960-4_1

  • DOI: https://doi.org/10.1007/978-981-15-1960-4_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1959-8

  • Online ISBN: 978-981-15-1960-4

  • eBook Packages: Computer Science, Computer Science (R0)
