
Linear Models

A Tiny Handbook of R

Part of the book series: SpringerBriefs in Statistics


Abstract

Fitting general linear models in R.


Notes

  1. Wilkinson, G. N. and Rogers, C. E. (1973) Symbolic descriptions of factorial models for analysis of variance. Applied Statistics, 22, 392–399.

  2. See: help(formula).

  3. Logical vectors are treated numerically, TRUE => 1, FALSE => 0. Character vectors are internally converted to factors.
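
    A minimal sketch of this coercion (the data frame here is made up for illustration):

        # Logical predictors enter the model numerically (TRUE -> 1, FALSE -> 0);
        # character predictors are converted to factors and dummy-coded.
        d <- data.frame(y    = rnorm(6),
                        flag = rep(c(TRUE, FALSE), 3),
                        grp  = rep(c("a", "b", "c"), 2))
        fit <- lm(y ~ flag + grp, data = d)
        coef(fit)   # flagTRUE is one numeric slope; grp contributes k-1 dummies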

  4. For further examples see the section “Defining statistical models; formulae” in the manual “An Introduction to R”, displayed by help.start().

  5. See: help(swiss).

  6. For normality see help(qqnorm) and transformations (Tukey’s “ladder of powers”). For unequal variances see help("leveneTest", package = "car") (named levene.test in older versions of car) and the weights argument of lm, for example: weights = 1/predict(fit). For outlier diagnostics see also help(influence.measures), help(cooks.distance), help(dffits), and help(dfbetas).
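
    A sketch of some of these diagnostics, using the swiss data for an illustrative fit:

        fit <- lm(Fertility ~ Education, data = swiss)
        qqnorm(residuals(fit)); qqline(residuals(fit))   # check normality of residuals
        summary(influence.measures(fit))                 # potentially influential cases
        head(cooks.distance(fit))                        # Cook's distance per observation
        # Weighted least squares, down-weighting cases with larger fitted values:
        wfit <- update(fit, weights = 1/predict(fit))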

  7. R has several “generic” functions. A generic function selects a more specialized function, called a “method”, depending upon the class of the object it is passed as an argument. The aim is to provide a unified point of call that implements the generic meaning of the function for different kinds of data object. See help(UseMethod) and help(Methods). By convention a method’s name is the generic function name followed by the class. For example the summary method for lm objects (objects returned by the linear model function lm) is called summary.lm. Similarly the summary method for aov objects (returned by the analysis of variance function aov) is called summary.aov. The help page for a generic function describes just the generic behaviour of the function. Usually you’ll want the help page for the particular method, which describes the arguments and the object returned. For example help(summary) describes generic behaviour, but help(summary.lm) describes the function that is selected when an lm object is passed to summary.
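
    For example, dispatch on the two kinds of fitted object:

        fit.lm  <- lm(Fertility ~ Education, data = swiss)
        fit.aov <- aov(Fertility ~ Education, data = swiss)
        class(fit.lm)      # "lm"
        class(fit.aov)     # "aov" "lm"
        summary(fit.lm)    # dispatches to summary.lm: coefficients and t tests
        summary(fit.aov)   # dispatches to summary.aov: an ANOVA table
        methods(summary)   # list the summary methods currently available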

  8. Since drop1 tests each term as if it were the last term in the model, it gives the same tests as a type-III ANOVA.
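
    A minimal sketch, using an illustrative two-predictor fit to the swiss data:

        fit <- lm(Fertility ~ Education + Agriculture, data = swiss)
        drop1(fit, test = "F")   # F test for each term as if it were fitted last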

  9. See also the functions logLik and AIC.

  10. A data frame of factors may be constructed to suit the design using functions such as gl, rep, and expand.grid. For example: expand.grid(wool = gl(2, 1, 18), tension = gl(3, 1)), as sketched below.
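
    That example generates the 2 x 3 layout of the warpbreaks data, with nine replicates per cell:

        design <- expand.grid(wool = gl(2, 1, 18), tension = gl(3, 1))
        nrow(design)                          # 54 rows
        table(design$wool, design$tension)    # 9 observations in each of the 6 cells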

  11. See also the function leveneTest (formerly levene.test) in package car to test homogeneity of variance.

  12. Swap the order of the arguments to transpose the table and the plot. If the data contain NAs these can be omitted from the calculation of cell means by passing na.rm=TRUE as an argument to mean via tapply. See the section: “Passing arguments to the mapped function”. The interaction.plot function does not have an argument to control NAs directly, but this can be done via an anonymous mean function; see the examples in help(interaction.plot) and the sketch below. See also the function plotmeans in package gplots for cell means with error bars, and the function intxplot in package HH for an interaction plot with error bars.
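
    A sketch of both idioms on the warpbreaks data (which has no NAs, so the results are unchanged; the point is where na.rm is passed):

        # na.rm is passed through tapply to mean:
        with(warpbreaks, tapply(breaks, list(wool, tension), mean, na.rm = TRUE))
        # interaction.plot has no na.rm argument, so wrap mean in an anonymous function:
        with(warpbreaks, interaction.plot(tension, wool, breaks,
                                          fun = function(x) mean(x, na.rm = TRUE)))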

  13. Function aov is essentially an ordinary least-squares estimator, the same as function lm. The main difference is that the object returned has class aov, so that the generic print and summary functions select methods that display sums-of-squares (rather than regression coefficients) and an ANOVA table of mean squares and F tests (rather than a table of regression coefficients and t tests). See help(aov) and help(summary.aov). The formula argument of aov also allows terms to be given within a function named Error, to specify nested structures of residual error for repeated-measures and split-plot designs. For further details see the section “Analysis of variance and model comparison” in the manual “An Introduction to R”, displayed by help.start().
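
    For example, the same least-squares fit returned as the two classes:

        fit.lm  <- lm(breaks ~ wool * tension, data = warpbreaks)
        fit.aov <- aov(breaks ~ wool * tension, data = warpbreaks)
        summary(fit.lm)    # table of regression coefficients and t tests
        summary(fit.aov)   # ANOVA table of sums-of-squares, mean squares, F tests
        # An Error term specifies nested error strata; schematically:
        # aov(y ~ treatment + Error(subject/treatment), data = d)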

  14. Function Anova (names are case-sensitive, so the upper-case A is important) is provided in package car. The :: operator can be used to access a function within a package without attaching the whole package. See help("::").

  15. Type I: terms of the same degree (such as all the main-effect terms, or all the second-order product terms) are assessed sequentially, in the order that they appear in the model. Higher-order terms are always assessed after lower-order terms; for example, interactions are always assessed after their marginal main effects.
    Type II: terms of the same degree are assessed after accounting for all other terms of the same degree, as if each were the last term of that degree in the model. Higher-order terms are assessed after lower-order terms.
    Type III: each term is assessed after accounting for all other terms in the model, irrespective of degree.
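
    A sketch comparing the three types on an unbalanced version of warpbreaks (car is assumed to be installed; sum-to-zero contrasts are used so that the type-III tests are meaningful):

        ub  <- warpbreaks[-(1:2), ]   # drop two rows to make the layout unbalanced
        fit <- lm(breaks ~ wool * tension, data = ub,
                  contrasts = list(wool = contr.sum, tension = contr.sum))
        anova(fit)                  # type I: sequential sums-of-squares
        car::Anova(fit, type = 2)   # type II
        car::Anova(fit, type = 3)   # type III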

  16. The object returned by anova.lm is a single ANOVA table. The object returned by summary.aov is a list of ANOVA tables, one for each “error stratum”.

  17. Partial eta squared is the effect size of each effect, derived from the sums-of-squares as SS/(SS + SS.error). Partial omega squared is the estimated population effect size of each effect, derived from variances rather than sums-of-squares.
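
    A sketch of the partial eta squared computation from an ANOVA table (the variable names here are illustrative):

        fit <- aov(breaks ~ wool * tension, data = warpbreaks)
        tab <- summary(fit)[[1]]       # the ANOVA table as a data frame
        ss  <- tab[["Sum Sq"]]
        ss.error  <- ss[length(ss)]    # residual sum-of-squares (last row)
        ss.effect <- ss[-length(ss)]
        partial.eta.sq <- ss.effect / (ss.effect + ss.error)
        names(partial.eta.sq) <- trimws(rownames(tab))[-nrow(tab)]
        partial.eta.sq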

  18. See help(p.adjust) for alternative methods of adjusting for the family-wise error rate.
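
    For example, adjusting a small family of illustrative p-values:

        p <- c(0.010, 0.020, 0.035, 0.200)
        p.adjust(p, method = "holm")          # Holm step-down (the default)
        p.adjust(p, method = "bonferroni")    # classical Bonferroni
        p.adjust.methods                      # the available adjustment methods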

  19. The glht function has methods for lmer (linear mixed-effects) fits, providing a route to post-hoc tests for repeated-measures designs.

  20. Comparisons between all pairs of groups cannot all be tested in a single model, because only k−1 dummy variables are linearly independent. More than this leads to “aliasing”, where some contrasts are linearly dependent upon others and a unique solution for all the parameters does not exist.

  21. Lower-order terms are said to be “marginal” to an interaction that contains them, and should be present in the model. See: Nelder, J. A. (1977) A reformulation of linear models. Journal of the Royal Statistical Society, Series A, 140, 48–77; Nelder, J. A. (1994) The statistics of linear models: back to basics. Statistics and Computing, 4, 221–234.

  22. Treatment contrasts are not proper “contrasts”, since the coefficients do not sum to 0; neither are they orthogonal. Nevertheless they are the default coding for dummy variables in R, mainly because they contrive comparisons that are relatively easy to interpret, even with unbalanced layouts.
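
    The point can be checked directly from the coding matrices for a 3-level factor:

        contr.treatment(3)            # columns are indicators for levels 2 and 3
        colSums(contr.treatment(3))   # 1 1: the coefficients do not sum to 0
        contr.helmert(3)              # a proper orthogonal coding, for comparison
        colSums(contr.helmert(3))     # 0 0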

  23. The p-values for the corresponding 1-df tests are not the same, because the default contrasts are not orthogonal and so the partitioned sums-of-squares are not additive. The corresponding p-values are the same if orthogonal contrasts, such as Helmert contrasts, are used instead.

  24. Several convenience functions are provided to generate contrast matrices for commonly used contrasts. See help(contr.treatment). These include:
      • contr.treatment, for treatment contrasts, aka simple or dummy coding (the default for unordered factors in R). The intercept is the mean of the reference cell, and the other regression coefficients represent mean differences with respect to the reference cell.
      • contr.sum, for sum contrasts, aka deviation or effects coding (the default in SPSS). The intercept is the grand mean, and the other regression coefficients represent deviations from the grand mean.
      • contr.SAS, for contrasts like the default in SAS. Essentially treatment contrasts, but using the last level of the factor to define the reference cell.
      • contr.helmert, for Helmert contrasts, aka contrast coding (the default in S-Plus). Helmert contrasts in R are defined differently from Helmert contrasts in SAS or SPSS: there each level of the factor is compared with the mean of the succeeding levels, whereas the R Helmert contrasts are more like “reverse” Helmert contrasts, where each level of the factor is compared with the mean of the preceding levels.
      • contr.poly, for polynomial contrasts, for example for trend analysis (the default for ordered factors in R).
    See also contr.sdif in the MASS package.
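
    Each of these functions returns the coding matrix it generates; for a 3-level factor:

        contr.treatment(3)   # reference-cell (dummy) coding
        contr.sum(3)         # deviation (effects) coding
        contr.SAS(3)         # reference cell is the last level
        contr.helmert(3)     # each level vs the mean of the preceding levels
        contr.poly(3)        # orthogonal polynomial (linear, quadratic) trends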

  25. See help(options) and options("contrasts").
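
    For example, inspecting and changing the session-wide defaults:

        options("contrasts")   # typically contr.treatment (unordered), contr.poly (ordered)
        old <- options(contrasts = c("contr.sum", "contr.poly"))   # switch defaults
        options(old)                                               # restore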

  26. If you provide fewer than k−1 contrasts for a k-level factor these functions will add orthogonal “filler” contrasts to make up a “complete comparison”. See also make.contrasts and estimable in package gmodels, and glht in package multcomp.
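
    A sketch of the filling-in behaviour described above, for a made-up 3-level factor (the contrast name is illustrative):

        f <- gl(3, 2, labels = c("a", "b", "c"))
        contrasts(f) <- cbind(aVSbc = c(2, -1, -1))   # one contrast of interest
        contrasts(f)   # orthogonal filler completes the k-1 columns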

  27. Two contrasts are orthogonal if their sum-of-products (their inner product) is 0. The matrix cross-product contains the sums-of-products of all pairs of column vectors; the off-diagonal elements are the sums-of-products of pairs of different contrasts.
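
    For example, checking a Helmert coding with crossprod:

        C <- contr.helmert(4)
        crossprod(C)   # t(C) %*% C: zero off-diagonals, so the contrasts are orthogonal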

  28. It is not possible to assign contrasts directly to an interaction, because contrasts for interactions are derived internally from the products of the component dummy variables. So it is necessary to create a new factor to represent the interaction, and assign the contrasts to that factor. Assign the contrasts using the contrasts function in order to obtain an orthogonal “complete comparison”. Here there are three contrasts of interest, but the contrasts function automatically adds a further two orthogonal “filler” contrasts.
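
    A sketch of the idiom on the warpbreaks layout (the single contrast here is made up for illustration):

        wt <- with(warpbreaks, interaction(wool, tension))   # one 6-level factor
        levels(wt)                                           # "A.L" "B.L" "A.M" "B.M" "A.H" "B.H"
        contrasts(wt) <- cbind(A.LvsB.L = c(1, -1, 0, 0, 0, 0))  # one contrast of interest
        fit <- lm(warpbreaks$breaks ~ wt)
        summary(fit)   # the coefficient for the assigned contrast tests it directly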

  29. This approach is appropriate when the group variances are reasonably similar. It is used here for illustration, although the variances are not very similar.

Author information


Correspondence to Mike Allerhand.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

Cite this chapter

Allerhand, M. (2011). Linear Models. In: A Tiny Handbook of R. SpringerBriefs in Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17980-8_5
