ANOVA
Abstract
In Chap. 3, we examined how to compare the means of two groups. In this chapter, we will examine how to compare the means of more than two groups.
6.1 One-Way Independent Measures ANOVA
In principle, we could compute three t-tests to compare all possible pairs of means (equator vs 49, equator vs 60, and 49 vs 60). However, in this case, as shown in Chap. 5, we would face the multiple testing problem, with the unpleasant side effect that our Type I error rate increases with the number of comparisons. Situations like this are a case for an analysis of variance (ANOVA), which uses a clever trick to avoid the multiple testing problem.
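The inflation of the Type I error rate is easy to demonstrate by simulation. The sketch below (with invented population parameters and sample sizes) draws three groups from the *same* population, so the null hypothesis is true, and counts how often at least one of the three pairwise t-tests is significant at α = 0.05:

```python
# Illustration of the multiple testing problem: three pairwise t-tests on
# three groups drawn from the SAME population. The familywise Type I error
# rate ends up well above the per-test alpha of 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sim, n_per_group = 5000, 10

false_alarms = 0
for _ in range(n_sim):
    # All three "regions" share one population: the null hypothesis is true.
    a, b, c = (rng.normal(20.0, 4.0, n_per_group) for _ in range(3))
    pvals = [stats.ttest_ind(x, y).pvalue for x, y in ((a, b), (a, c), (b, c))]
    false_alarms += min(pvals) < alpha  # any single rejection is a Type I error

print(f"Familywise Type I error rate: {false_alarms / n_sim:.3f}")
```

The simulated rate comes out clearly above the nominal 5%, which is exactly the problem the ANOVA is designed to avoid.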
6.2 Logic of the ANOVA
Terms

way = factor

group = treatment = level
The logic of the ANOVA is simple. We simplify our alternative hypothesis by asking whether or not at least one of the three tree populations differs from the others. Hence, we are stating one hypothesis instead of three by lumping all alternative hypotheses together:

\(H_{0}: \mu_{1} = \mu_{2} = \mu_{3}\)

\(H_{1}: \mu_{j} \neq \mu_{l}\) for at least one pair of groups \(j, l\)
The ANOVA assumes, similarly to the t-test, that all groups have the same population variance σ². If the null hypothesis is true, then the population means are equal for all three groups of trees. Any observed differences in the sample means then come from the variance σ² alone, which is due to random differences in tree heights (noise), not to systematic differences in tree heights with geographic region (Fig. 6.1). It turns out that when the null hypothesis is true, the variability between the group means can be used to estimate σ² (by multiplying by the sample sizes). An ANOVA compares this between-means estimate to a direct estimate computed within each group.
\[ F = \frac{MS_{between}}{MS_{within}} = \frac{\sum_{j=1}^{k} n_{j}\,(M_{j} - M_{G})^{2}\,/\,(k-1)}{\sum_{j=1}^{k}\sum_{i=1}^{n_{j}} (x_{ij} - M_{j})^{2}\,/\,(n-k)} \]

where k is the number of groups (three tree populations), n_{j} is the number of scores in group j (the number of trees within each sampled geographic region), n is the total number of scores pooled over all groups, M_{j} is the mean for group j (mean of geographic region sample j), M_{G} is the grand mean of all scores pooled together, and x_{ij} is the ith score for group j (the height of a single tree). To make it easier to distinguish the means from individual scores, we use the symbols M_{j} and M_{G} rather than the traditional symbol for a sample mean \(\bar {x}\). The multiplication by n_{j} in the numerator weights the deviations of the group means around the grand mean by the number of trees in each group, so that the numbers of scores contributing to the variance estimates are equated between the numerator and denominator.
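The F-ratio can be checked numerically against a library routine. The sketch below uses made-up tree heights for the three regions (the numbers are illustrative, not from the text) and compares the hand computation with `scipy.stats.f_oneway`:

```python
# Hand computation of the one-way ANOVA F-ratio, checked against scipy.
import numpy as np
from scipy import stats

groups = [np.array([21.0, 23.5, 19.8, 24.1, 22.0]),   # "equator" sample (invented)
          np.array([18.2, 17.5, 19.9, 16.8, 18.4]),   # "49" sample (invented)
          np.array([15.1, 16.3, 14.2, 15.8, 16.9])]   # "60" sample (invented)

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of scores
M_G = np.concatenate(groups).mean()      # grand mean

# Between-groups estimate: deviations of group means, weighted by n_j.
ss_between = sum(len(g) * (g.mean() - M_G) ** 2 for g in groups)
# Within-groups estimate: deviations of scores around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

F = (ss_between / (k - 1)) / (ss_within / (n - k))
F_scipy, p_scipy = stats.f_oneway(*groups)
print(F, F_scipy)  # the hand computation matches scipy.stats.f_oneway
```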
Just as in the t-test, a criterion is chosen for statistical significance to set the Type I error rate to a desired rate (e.g., α = 0.05). When F exceeds the criterion, we conclude that there is a significant difference (i.e., we reject the null hypothesis of equality between the group means).
The tree example is a one-way ANOVA, where there is one factor (tree location) with three groups (regions) within the factor. The groups are also called levels and the factors are also called ways. There can be as many levels as you wish within a factor, e.g. many more regions from which to sample trees. A special case is a one-way independent measures ANOVA with two levels, which compares two means as does the t-test. In fact, there is a close relationship between the two tests and in this case it holds that: F = t^{2}. The p-value here will be the same for the ANOVA and the two-tailed t-test. Hence, the ANOVA is a generalization of the t-test.
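The relationship F = t^{2} is easy to verify numerically. The sketch below draws two made-up samples and runs both tests on them:

```python
# Checking F = t^2: a one-way ANOVA with two levels and a two-tailed
# independent t-test agree exactly (data invented for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10.0, 2.0, 8)
g2 = rng.normal(12.0, 2.0, 8)

t, p_t = stats.ttest_ind(g1, g2)
F, p_F = stats.f_oneway(g1, g2)

print(F, t ** 2)   # F equals t squared
print(p_F, p_t)    # identical p-values
```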
As with the t-test, the degrees of freedom play an important role in computing the p-value. For a one-way independent measures ANOVA with k levels, there are two types of degrees of freedom, df_{1} and df_{2}. In general, df_{1} = k − 1 and df_{2} = n − k, where n is the total number of sampled scores pooled over all groups, e.g., all trees in the three groups. The total of the degrees of freedom is df_{1} + df_{2} = n − 1.
6.3 What the ANOVA Does and Does Not Tell You: Post-Hoc Tests
Here, the ANOVA offers a second trick. If we rejected the null hypothesis, it is appropriate to compare pairs of means with what are called "post-hoc tests," which, roughly speaking, amount to computing pairwise comparisons. Contrary to the multiple testing situations discussed in Chap. 5, these multiple comparisons do not inflate the Type I error rate because they are only conducted if the ANOVA finds a main effect.
There are many post-hoc tests in the statistical literature. Commonly used post-hoc tests include: Scheffé, Tukey, and REGWQ. The process is best described with an example, which is provided at the end of this chapter.
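As a concrete sketch, one common textbook formulation of the Scheffé test evaluates each pairwise contrast against (k − 1) · MS_{within} with the ANOVA's own degrees of freedom (k − 1, n − k). The data below are invented, not the example from this chapter:

```python
# A sketch of the Scheffé post-hoc test for pairwise comparisons, using a
# common textbook formulation (assumed here, not quoted from the text).
import numpy as np
from scipy import stats

def scheffe_pairwise(groups, i, j):
    """Scheffé F and p-value for comparing groups i and j."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    # Pooled within-groups variance, the same denominator as in the ANOVA.
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    gi, gj = groups[i], groups[j]
    F = (gi.mean() - gj.mean()) ** 2 / (
        (k - 1) * ms_within * (1 / len(gi) + 1 / len(gj)))
    p = stats.f.sf(F, k - 1, n - k)  # survival function: P(F' >= F)
    return F, p

groups = [np.array([12.0, 14.0, 11.0, 13.0, 12.5]),   # invented samples
          np.array([12.5, 13.0, 12.0, 14.5, 13.5]),
          np.array([17.0, 18.5, 16.0, 19.0, 17.5])]
for i, j in [(0, 1), (0, 2), (1, 2)]:
    F, p = scheffe_pairwise(groups, i, j)
    print(f"group {i + 1} vs. {j + 1}: F = {F:.2f}, p = {p:.4f}")
```

With these numbers, only the comparisons involving the third group come out significant, mirroring the kind of pattern a post-hoc table reports.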
6.4 Assumptions
1. Independent samples.
2. Gaussian distributed populations.
3. The independent variable is discrete, while the dependent variable is continuous.
4. Homogeneity of variance: all groups have the same variance.
5. The sample size needs to be determined before the experiment.
6.5 Example Calculations for a One-Way Independent Measures ANOVA
6.5.1 Computation of the ANOVA
Our final F-value is 9.14. This means that the variability of the group means around the grand mean is 9.14 times the variability of the data points around their individual group means. Hence, much of the variability comes from differences between the means, and much less from variability within each population. An F-value of 9.14 leads to a p-value of 0.0039 < 0.05 and we conclude that our results are significant, i.e., we reject the null hypothesis that all three sword types yield equal mean numbers of wins (F(2, 12) = 9.14, p = 0.0039). Furthermore, we can conclude that at least one sword type yields a different number of wins than the other sword types. We can now use one of the various post-hoc tests to find out which sword(s) is/are superior.
6.5.2 Post-Hoc Tests
Various procedures exist for performing post-hoc tests, but we will focus here on the Scheffé test in order to illustrate some general principles.
Post-hoc Scheffé test results for our three comparisons

Comparison   Result
1 vs. 2      F(2, 12) = 0.33, p = 0.728
1 vs. 3      F(2, 12) = 5.22, p = 0.023
2 vs. 3      F(2, 12) = 8.16, p = 0.006
6.6 Effect Size
Effect size guidelines according to Cohen

              Small   Medium   Large
Effect size   0.01    0.09     0.25
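A common effect size for the ANOVA is η², the proportion of the total variability explained by group membership. The sketch below computes it for made-up data (the groups and scores are invented for illustration):

```python
# Eta squared as an ANOVA effect size: SS_between / SS_total, i.e. the
# proportion of total variability explained by group membership.
import numpy as np

groups = [np.array([4.0, 5.0, 6.0, 5.0]),     # invented samples
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([9.0, 10.0, 11.0, 10.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()

eta_squared = ss_between / ss_total  # 0 = no effect, 1 = all variance explained
print(f"eta^2 = {eta_squared:.3f}")  # large by the guidelines above (> 0.25)
```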
6.7 Two-Way Independent Measures ANOVA
The one-way independent measures ANOVA generalizes nicely to cases with more than one factor. Here, we will discuss the simplest of such cases, the two-factor design.
1. H_{0}: There is no effect of time of day on the number of villains caught.
   H_{1}: The number of villains caught during the day is different from the number of villains caught at night.
2. H_{0}: There is no effect of costume material on the number of villains caught.
   H_{1}: At least one costume material yields a different number of villains caught than the other costume materials.
3. H_{0}: The effect of time of day on the number of villains caught does not depend on costume material.
   H_{1}: The effect of time of day on the number of villains caught does depend on costume material.
The first two null hypotheses relate to what are called main effects. Testing these two main hypotheses is exactly the same as computing two one-way ANOVAs. The third hypothesis is a new type of hypothesis and pertains to the interaction between the two factors, costume material and time of day. To measure the main effect of costume material, we take the average number of villains caught in the spandex group, averaging over both day and night conditions, and compare this with the same averages for the cotton and leather costume conditions. To measure the main effect of time of day, we look at the average number of villains caught for the day condition, averaging over the spandex, cotton, and leather costume material conditions, and compare this with the same average for the night condition.
For the interaction, we consider all groups separately, looking at the number of villains caught for the spandex, cotton, and leather costume groups separately as a function of day and nighttime crime-fighting conditions. If there is a significant interaction, then the effect of time of day on the number of villains caught will depend on which costume material we are looking at. Conversely, the effect of costume material on the number of villains caught will depend on the time of day at which our friends are fighting crime.
Testing these three null hypotheses requires three separate F-statistics. Each F-statistic will use the same denominator as in the one-way ANOVA (i.e., the pooled variance of the data about the treatment means, or MS_{within} as shown in Fig. 6.3), but the numerators (MS_{between}) will be specific for the particular hypotheses tested.
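For a balanced design, the three F-ratios can be sketched directly from sums of squares. The array below is invented (axis 0 playing the role of costume material, axis 1 the time of day); the point is only to show the shared denominator and the three separate numerators:

```python
# Three F-ratios of a balanced two-way independent measures ANOVA,
# computed from sums of squares on an invented data array.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a_levels, b_levels, n_cell = 3, 2, 5
# Invented data with a built-in main effect of factor B plus noise.
data = rng.normal(10.0, 2.0, (a_levels, b_levels, n_cell))
data[:, 1, :] += 5.0  # factor B shifts scores at its second level

grand = data.mean()
cell_means = data.mean(axis=2)        # mean of each of the 6 cells
a_means = data.mean(axis=(1, 2))      # factor A (row) means
b_means = data.mean(axis=(0, 2))      # factor B (column) means

n = data.size
ss_a = b_levels * n_cell * ((a_means - grand) ** 2).sum()
ss_b = a_levels * n_cell * ((b_means - grand) ** 2).sum()
ss_cells = n_cell * ((cell_means - grand) ** 2).sum()
ss_ab = ss_cells - ss_a - ss_b        # interaction: what cells add beyond main effects
ss_within = ((data - cell_means[:, :, None]) ** 2).sum()

df_a, df_b = a_levels - 1, b_levels - 1
df_ab = df_a * df_b
df_within = n - a_levels * b_levels

ms_within = ss_within / df_within     # shared denominator of all three F-ratios
for name, ss, df in [("A (material)", ss_a, df_a),
                     ("B (time of day)", ss_b, df_b),
                     ("A x B interaction", ss_ab, df_ab)]:
    F = (ss / df) / ms_within
    p = stats.f.sf(F, df, df_within)
    print(f"{name}: F({df},{df_within}) = {F:.2f}, p = {p:.4f}")
```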
Another virtue of a twofactor design relative to a onefactor design is that variability that would otherwise be included in the error term (i.e., MS_{within}) is now partly explained by variability due to another factor, thereby reducing MS_{within} and increasing the power for detecting effects when present.
Thus, it may seem that the more factors we add, the better we will understand the data and the more significant results we will obtain. However, this is not true: we lose power with each factor we add, because fewer scores contribute to each mean. Typically, larger samples are needed as the number of factors increases.
Importantly, if we find a significant interaction, the effect of one factor varies depending on the level of the other factor. Thus, we should usually refrain from drawing conclusions about the main effects if there is an interaction.
The one-way ANOVA avoids the multiple testing problem. However, a multi-way ANOVA reintroduces a kind of multiple testing problem. For example, consider a 2 × 2 ANOVA with a significance criterion of 0.05. A truly null data set (where all four population means are equal to each other) has a 14% chance of producing at least one p < 0.05 among the two main effects and the interaction. If you use ANOVA to explore your data set by identifying significant results, you should understand that such an approach has a higher Type I error rate than you might have intended.
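The 14% figure follows from treating the three tests (two main effects plus the interaction) as approximately independent, each at α = 0.05:

```python
# Familywise error rate for the three tests of a 2 x 2 ANOVA, assuming
# approximately independent tests at alpha = 0.05.
alpha = 0.05
n_tests = 3  # two main effects + one interaction
familywise = 1 - (1 - alpha) ** n_tests
print(f"{familywise:.3f}")  # 0.143, i.e. about 14%
```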
Typical statistical software outputs for a two-way ANOVA

Source            SS      df   MS      F      p         η²
Costume material  1.67    2    0.83    0.083  0.920     0.0069
Time of day       0.83    1    0.83    0.083  0.775     0.0035
Costume × time    451.67  2    225.83  22.58  0.000003  0.6530
Error             240.00  24   10.00
6.8 Repeated Measures ANOVA
Typical statistical software outputs for a repeated measures ANOVA

Source              SS    df   MS    F    p           η²
Between times       70    2    35    70   0.00000009  0.94
Within times        40    12
  Between subjects  36    4
  Error             4     8    0.5
Total               110   14
Take Home Messages
1. With an ANOVA you can avoid the multiple testing problem, to some extent.
2. More factors may improve or reduce power.
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.