ONE FACTOR ANOVA (EXAMPLES)


Watch this space

Detailed text explanation coming soon. In the meantime, enjoy our video.

The text below is a transcript of the video.



Connect with StatsExamples here



LINK TO SUMMARY SLIDE FOR VIDEO 2:


StatsExamples-ANOVA-one-factor-examples.pdf

TRANSCRIPT OF VIDEO:


Slide 1.

The one factor ANOVA is the most common technique to see if the means of more than two populations are different from each other. Let's look at some step-by-step examples of how this works.

Slide 2.

If the ANOVA is new to you, or you're not very familiar with it, check out our intro to the one factor ANOVA video first.

Slide 3.

►But now, a quick review.
The question we're interested in is whether any population means from a set of three or more populations differ from one another. Our approach to answering this question will be to perform an analysis of variance, an ANOVA.
The ANOVA is a homoscedastic test: it requires that all the populations have equal variance, so we have to do a prerequisite test for the equality of the population variances. Assuming the data passes that test, we can then do the ANOVA.
►We have conceptual hypotheses about the means which we really care about, but technically we test these with formal hypotheses about the mean sums among and within. Mean sums is another term for variances, which is why the ANOVA is an analysis of variance.
►We calculate the sums of squares, mean sums, and then perform a one-tailed F test to see if the mean sums among is significantly larger than the mean sums within, if MSA is larger than MSW.
►We summarize the results in an ANOVA table and use the p value to "reject" or "fail to reject" our null hypothesis. This tells us whether we have evidence that some of the population means differ or not.
If we fail to reject the null hypothesis then we lack evidence to decide that any means are different.
►If we reject the null hypothesis then we have evidence that at least one of the means differs, but it doesn't tell us which ones differ. We then have two choices to figure out which means differ - performing a series of Bonferroni corrected t-tests or calculating the minimum significant difference and comparing the Tukey-Cramer intervals based on this.
Let's quickly look at the details of these steps and then look at two examples.

Slide 4.

The first thing is to calculate the SST, SSA, and SSW values.
►For the SST, the sum of squares total, we calculate the sum of squares for all data values comparing the values to the overall mean.
►For the SSA, the sum of squares among, we calculate a sum of squares for the mean values of each group by comparing these values to the overall mean, then multiplying by group sample size, n.
►For the SSW, the sum of squares within, we calculate the sum of squares values separately for each of the k groups, comparing the values in each group to the mean for their group, then sum them.
These are the sums of squares, to get the variances we need to divide them by the appropriate degrees of freedom.

Slide 5.

Typically, we arrange our data as shown to the right, with a column for each of the k groups and a row for each of the n values within each group. Each group has the same number of values. Technically this isn't required, but the calculations get much more complicated with unequal group sizes. We'll use a capital N, which is k times lowercase n, to represent the number of values in the entire data set.
►The degrees of freedom for the among group calculations is the number of groups minus 1, that's k minus 1.
►The degrees of freedom for the within group calculations is lowercase n minus one for each group, which is also the total number of data values minus the number of groups, capital N minus k.
►The degrees of freedom for the entire data set would then be capital N minus 1. ► MSA is then SSA divided by k minus 1.
► MSW is then SSW divided by capital N minus k.
►The F calculated value is MSA divided by MSW and we do a standard one-tailed F test using an alpha value of 0.05 to test for significance. You can watch our F test video for more about doing an F-test.

Slide 6.

As mentioned, the results are usually presented in an ANOVA table.
There are columns for the source of variance, degrees of freedom, sums of squares, mean sums, the F calculated value, and the p value of the F test.
►The value of the p value indicates if any means are significantly different. If the p value is larger than 0.05 then there isn't evidence for any differences in the means and we're done. But if the p value is less than 0.05 then there is evidence for one or more differences in the means and there are two options for what to do next.

Slide 7.

After we've done the ANOVA itself, assuming it gives us a significant result, we have two options to figure out which means differ.
►Option 1 is to do Bonferroni corrected t-tests. We go back to the data sets and do all the pairwise t-tests, but with a smaller threshold alpha value of 0.05 divided by the number of t-tests.
►Option 2 is to calculate Tukey-Cramer comparison intervals, which are one-half the minimum significant difference, around each sample mean and look to see which intervals overlap and which don't. Non-overlapping intervals indicate differing means. ►The equation to get the value for the minimum significant difference, also known as the honest significant difference, uses a value Q from a statistical table combined with the square root of the MSW divided by the sample size for each group.

Slide 8.

For our first example, consider a study that collects 4 snakes from each of 3 different locations and measures their lengths (SVL) to see if the mean SVL in any locations differ from the others.
By the way, SVL stands for snout vent length, which is from the tip of the nose to the ... "vent", not all the way to the end of the tail. This is a better measure of body size for many animals than total length because animals can sometimes lose pieces of their tails to predators, and we're usually interested in using the snout vent length to estimate overall size due to physiology, not bad luck.
The data for our study is shown to the right, there is a column for each of the 3 sites, A, B, and C with the values for the 4 individuals listed in the rows.
We'll ask two questions - are any of the mean snout vent length values different and if so, which ones?

Slide 9.

Here's our nine-step plan.
First, we'll calculate overall mean and variance, group means and variances.
Then, we'll conduct an Fmax test for equality of variances. Other, better tests exist for this, but for this example we'll do the Fmax test since it's easy.
Then, we'll either proceed with the ANOVA or transform the data values, in which case we would retry step 2.
Assuming the variances are equal, we'll calculate SST, SSA, and SSW. Then, we'll calculate MSA, MSW, and the F value.
Then, we'll assess the significance of our F value.
If p<0.05 there are significant differences and we would do steps 7 & 8, the Bonferroni corrected t-tests and a comparison of the Tukey-Cramer comparison intervals.
Finally, we interpret the results and clearly present them.

Slide 10.

First up is calculating the means and variances for each of our 3 groups so we can do an Fmax test.
Note that the channel has videos about calculating these summary statistics and doing the F max test.
►The means and variances for each group are shown at the bottom of their columns and the overall mean value is 34.
►The largest variance is 4.667 and the smallest is 2.000 so the F max statistic is 4.667 divided by 2.000 equals 2.333 which gives us a p value larger than 0.05.
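The Fmax calculation above can be sketched in a few lines of Python. Note that only site A's raw values appear in this transcript; the site B and C values below are hypothetical, chosen to be consistent with the stated group means (34 and 36) and sample variances (4.667 and 2.000).

```python
from statistics import variance

# Snake SVL data: site A values are from the transcript; site B and C
# values are hypothetical, chosen to match the stated means and variances.
sites = {
    "A": [32, 34, 32, 30],
    "B": [33, 32, 34, 37],  # hypothetical values, mean 34, variance 4.667
    "C": [35, 36, 35, 38],  # hypothetical values, mean 36, variance 2.000
}

# Sample variance for each group, then the ratio of largest to smallest.
variances = {name: variance(vals) for name, vals in sites.items()}
f_max = max(variances.values()) / min(variances.values())

print({name: round(v, 3) for name, v in variances.items()})
print(round(f_max, 3))  # 2.333 -> compare to an Fmax table at alpha = 0.05
```

A nonsignificant Fmax (p > 0.05) is what licenses the ANOVA in the next step.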

Slide 11.

The nonsignificant Fmax test result indicates equality of variances and we can do the ANOVA.

Slide 12.

Next we need to calculate SST, SSA, and SSW.
►The SST is the sum of squares using every value comparing them to 34. It's the sum shown, every value in the table minus 34, squared and summed up. When we do this we get a value of 60.
►The SSA is the sum of squares using the three group mean values and comparing each of them separately to 34. We also multiply this value by 4, the number of values in each group. This is 4 times the overall sum of 32 minus 34 squared, 34 minus 34 squared, and 36 minus 34 squared - which is 4 times the values four plus zero plus four - which is 4 times 8 to give 32.
►Lastly, the SSW is really the sum of three different sums of squares, one for each group.
The first is the sum of the squares for the first group, calculating the difference between each value, the 32, 34, 32, and 30 and the group mean of 32 and summing the squares to get 8.
The second is the sum of the squares for the second group, calculating the difference between each value and the group mean of 34 and summing the squares to get 14.
Then the sum of the squares for the third group, calculating the difference between each value and the group mean of 36 and summing the squares to get 6.
Adding these up, 8 plus 14 plus 6 equals 28 for the SSW.
►Now that we've calculated these we should also check to make sure that the SSA and SSW add up to the SST. In this case 32 plus 28 equals 60 so it's looking good.
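The three sums of squares, and the partition check, can be computed directly. As before, only site A's values are given in the transcript; the B and C values here are hypothetical but consistent with the stated group means and sums of squares.

```python
from statistics import mean

# Site A values are from the transcript; B and C are hypothetical values
# consistent with the stated group means and within-group sums of squares.
groups = [[32, 34, 32, 30], [33, 32, 34, 37], [35, 36, 35, 38]]
n = len(groups[0])                                  # 4 values per group
grand_mean = mean(x for g in groups for x in g)     # 34

# SST: every value compared to the overall mean.
sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
# SSA: each group mean compared to the overall mean, times group size n.
ssa = n * sum((mean(g) - grand_mean) ** 2 for g in groups)
# SSW: each value compared to its own group's mean.
ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

print(sst, ssa, ssw)        # 60 32 28
assert sst == ssa + ssw     # the partition check from the transcript
```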

Slide 13.

Next are the MSA, MSW, and F values. Note that I've added the sums of squares values to the data table to the right.
►The MSA is the SSA divided by k minus 1. Remembering that k is the number of groups, this is 3 minus 1 to give 2. Our MSA is therefore 32 divided by 2 to give 16.
►The MSW is the SSW divided by capital N minus k. Capital N is the total number of data values, which is 12, and k is 3, so we get 9 for the denominator. Our MSW is therefore 28 divided by 9 to give 3.111.
►The F calculated value is 16 divided by 3.111 to give us 5.143.
►At this point we have most of the values for our ANOVA table and we can enter them. You can see how the MS column values come from dividing the two columns to their left and the F comes from dividing the top MS term by the lower one.
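These mean sums and the F statistic follow mechanically from the sums of squares and degrees of freedom described above; a minimal sketch:

```python
# Mean sums and F for the snake example, using the values derived in the
# transcript: SSA = 32, SSW = 28, k = 3 groups, N = 12 total values.
k, N = 3, 12
ssa, ssw = 32, 28

msa = ssa / (k - 1)     # 32 / 2 = 16
msw = ssw / (N - k)     # 28 / 9 = 3.111
f_calc = msa / msw      # 16 / 3.111 = 5.143

print(round(msa, 3), round(msw, 3), round(f_calc, 3))
```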

Slide 14.

Now we need to assess the significance of the F value.
►For this we go to our table of F critical values for alpha equals 0.05. This is a one-tailed test so we can use these values directly. From our ANOVA table we can see that the numerator of our F test has 2 degrees of freedom and the denominator has 9.
►Looking in the table we see that the critical F value for this F test will be 4.26. Since the 5.143 is larger than 4.26 we know that p is less than 0.05.

Slide 15.

In contrast, when we look at the critical values for an alpha value of 0.025 we get a critical value of 5.71. Since 5.143 is smaller than this, we know that p is larger than 0.025.

Slide 16.

From our tables we therefore know that the p value is between 0.025 and 0.05.
►If we use a computer, it can give us the precise value of 0.0324 for our ANOVA F test. So we have the result that one or more means differ, but we don't know which ones.

Slide 17.

The first way to determine which means differ is by doing Bonferroni corrected t-tests.
►With three groups there are 3 possible pairwise t-tests, so 0.05 divided by 3 equals 0.01667, which is the corrected critical value threshold. In other words, we would only reject the null hypothesis if the p value from our t-tests is less than this, not when it's less than 0.05.
►These are the results from doing 3 t-tests and we can see that while two of the comparisons show a nonsignificant difference, the comparison between the means for site A and site C results in a significant difference between the means where the mean of group C is larger.
Since we did the ANOVA first, we don't have to worry that the risk of type I error added up from doing multiple t-tests; the ANOVA and the corrected tests ensure that the overall risk is the standard 5%.
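The correction logic can be sketched as follows. The individual t-test p values here are hypothetical, used only to illustrate the decision rule; in the transcript only the A-versus-C comparison fell below the corrected threshold.

```python
from itertools import combinations
from math import comb

groups = ["A", "B", "C"]
n_tests = comb(len(groups), 2)      # 3 possible pairwise comparisons
alpha_corrected = 0.05 / n_tests    # ~0.0167 instead of 0.05

print(list(combinations(groups, 2)), round(alpha_corrected, 4))

# Hypothetical t-test p values for illustration only: a pair is declared
# significantly different when its p value beats the corrected threshold.
p_values = {("A", "B"): 0.08, ("A", "C"): 0.01, ("B", "C"): 0.10}
significant = [pair for pair, p in p_values.items() if p < alpha_corrected]
print(significant)                  # only A vs C clears the threshold
```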

Slide 18.

The second way to determine which means differ is by calculating Tukey-Cramer comparison intervals.
►The equation for the minimum significant difference is shown here. We use a Q value from a table multiplied by the square root of the MSW divided by the number of values in each group. The Q value varies based on the desired alpha value, the number of groups, and the degrees of freedom associated with the MSW term.
►Looking at a Q table from the StatsExamples website for an alpha value of 0.05 we look in the column for 3 groups and the row for 9 degrees of freedom to get a Q value of 3.95.
►Our MSD then becomes 3.95 times the square root of 3.111 divided by 4 which gives us 3.4836 for our minimum significant difference.
►Adding and subtracting half of this value to each of our means would result in the comparison intervals shown. We can see that while the intervals for A and B overlap and the intervals for B and C overlap, the intervals for A and C do not overlap. This gives us the same conclusion as the Bonferroni corrected t-tests did - there is one significant difference and it's the one comparing sites A and C.
And from looking at the means we can see that the value for C is significantly larger than the value for A.
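The Tukey-Cramer step above reduces to a small amount of arithmetic: two means differ significantly exactly when they are more than one MSD apart, which is the same as their comparison intervals (mean ± MSD/2) failing to overlap. A minimal sketch using the values from this example:

```python
from math import sqrt
from itertools import combinations

# Tukey-Cramer comparison intervals for the snake example.
# Q = 3.95 (Q table: alpha = 0.05, 3 groups, 9 df); MSW = 28/9; n = 4.
q, msw, n = 3.95, 28 / 9, 4
msd = q * sqrt(msw / n)     # ~3.4836, the minimum significant difference

means = {"A": 32, "B": 34, "C": 36}
intervals = {g: (m - msd / 2, m + msd / 2) for g, m in means.items()}

# Non-overlapping intervals (means more than one MSD apart) indicate a
# significant difference.
for g1, g2 in combinations(means, 2):
    differ = abs(means[g1] - means[g2]) > msd
    print(g1, g2, "differ" if differ else "overlap")
# Only A vs C (|32 - 36| = 4 > 3.4836) shows a significant difference.
```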

Slide 19.

Now let's put all this together and interpret the results.
►A concise way of describing everything goes as follows - a one-factor ANOVA indicates that the mean snout vent lengths values differ significantly between the 3 snake populations (F2,9=5.143; p=0.0324). Bonferroni corrected t-tests and the Tukey-Cramer comparison intervals indicate that the mean of population A is significantly smaller than the mean of population C.
We can see how the ANOVA table summarizes these numbers nicely.
►One thing to note is that when we publish or present data like this, we almost never show the Tukey-Cramer comparison intervals themselves; it is much more common to show data with error bars indicating standard errors or confidence intervals.
This plot shows the 95% confidence intervals which do overlap even though the means are significantly different. Confidence intervals themselves can't be used to judge significance, only comparison intervals can do that.
To indicate the significant difference, we place square brackets with an asterisk as shown. The most common convention is to use a single asterisk for significance at the 5% level and two for significance at the 1% level, but the figure legend should always make that clear if you're looking at a published figure.

Slide 20.

For our second example, consider a study that tests 4 different diets for lab mice and has 5 mice in each treatment to see if the mean mass at maturity differs for any of the diets.
The data for our study is shown to the right, there is a column for each of the 4 diets, A, B, C, and D with the 5 individuals listed in the rows.
We'll ask two questions - are any of the mean mass at maturity values different and if so, which ones?

Slide 21.

First up is calculating the means and variances for each of our 4 groups so we can do the Fmax test.
I've added the means and variance for each group at the bottom of their columns and the overall mean value is 20.5.
The largest variance is 5.00 and the smallest is 2.000 so the F max statistic is 2.50 which gives us a p value larger than 0.05 and the nonsignificant Fmax test result tells us we can do the ANOVA.

Slide 22.

Now for the SST, SSA, and SSW.
►The SST is the sum of squares using every value comparing them to 20.5. It's the sum shown, every value in the table minus 20.5, squared and summed up. When we do this we get a value of 85.
►The SSA is the sum of squares using the 4 group mean values and comparing each of them to 20.5. We also multiply this value by 5, the number of values in each group. This is 5 times the sum of 19 minus 20.5 squared, 22 minus 20.5 squared, 20 minus 20.5 squared, and 21 minus 20.5 squared - which is 5 times 5 to give 25.
►Lastly, the SSW is really the sum of 4 different sums of squares, one for each group. We know that it should be 60 because 85 minus 25 is 60, but let's make sure.
The first is the sum of the squares for the first group, calculating the difference between each value, the 21, 16, 19, 20, and 19 and the group mean of 19 and summing the squares to get 14.
The second is the sum of the squares for the second group, calculating the difference between each value, the 20, 23, 21, 23, and 23, and the group mean of 22 and summing the squared differences to get 8.
The last two groups work the same way to give us 20 and 18.
Adding these up gives us 60 for the SSW, as we expected.

Slide 23.

Next are the MSA, MSW, and F values.
►The MSA is the SSA divided by k minus 1 which is the 4 groups minus one to give 3 and MSA is therefore 25 divided by 3 to give 8.333.
►The MSW is the SSW divided by the total number of data values minus the number of groups - 20 minus 4 equals 16. Our MSW is therefore 60 divided by 16 to give 3.750.
►The F calculated value is the 8.33 divided by the 3.750 to give us 2.222.
►Our ANOVA table looks like this. The next step is to determine the p value that corresponds to an F value of 2.222 with 3 degrees of freedom in the numerator and 16 in the denominator.
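The full calculation for this second example can be sketched the same way as the first. The transcript gives the raw values for diets A and B only; the diet C and D values below are hypothetical, chosen to match the stated group means (20 and 21) and within-group sums of squares (20 and 18).

```python
from statistics import mean

# Diets A and B are from the transcript; C and D are hypothetical values
# consistent with the stated means and within-group sums of squares.
diets = [
    [21, 16, 19, 20, 19],   # A, mean 19
    [20, 23, 21, 23, 23],   # B, mean 22
    [17, 19, 20, 21, 23],   # C (hypothetical), mean 20
    [18, 21, 21, 21, 24],   # D (hypothetical), mean 21
]
k, n = len(diets), len(diets[0])
N = k * n                                               # 20 values total
grand = mean(x for g in diets for x in g)               # 20.5

sst = sum((x - grand) ** 2 for g in diets for x in g)   # 85
ssa = n * sum((mean(g) - grand) ** 2 for g in diets)    # 25
ssw = sst - ssa                                         # 60

msa = ssa / (k - 1)     # 25 / 3  = 8.333
msw = ssw / (N - k)     # 60 / 16 = 3.750
f_calc = msa / msw      # 2.222

print(sst, ssa, ssw, round(f_calc, 3))
```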

Slide 24.

From our StatsExamples table of F critical values for alpha equals 0.05 and using the column for 3 degrees of freedom and the row for 16 degrees of freedom we get a critical value of 3.24.
►Since the 2.222 is not as large as 3.24 we know that p is larger than 0.05.
►Using a computer will give us the precise p value of 0.1643 for our ANOVA F test.
Our F test fails to reject the null hypothesis and we lack the evidence to decide that any of the means are different.

Slide 25.

Even though we know our overall result is that none of the populations have different means, let's look at the Bonferroni corrected t-tests anyway.
►With 4 groups there are 6 possible pairwise t-tests so 0.05 divided by 6 equals 0.008333 is the corrected critical value threshold. We would only reject the null hypothesis if the p value for a t-test is less than 0.008333, not 0.05.
►These are the results from doing the 6 t-tests and we can see that none of them are significant when using our Bonferroni corrected threshold of 0.008333.
This is a nice example of why we do the ANOVA, however: one of these t-test p values is less than 0.05 even though the ANOVA showed that there aren't any significant differences in the population means.

Slide 26.

If we had just done t-tests without the ANOVA, this would be a type I error. We would have thought that the low p value indicated a genuine difference between the means of populations A and B when that t-test result really arises from the increased chances of type I error from doing 6 comparisons.

Slide 27.

Now let's look at the Tukey-Cramer comparison intervals.
►The equation for the minimum significant difference is shown, and looking at a Q table from the StatsExamples website for an alpha value of 0.05, in the column for 4 groups and the row for 16 degrees of freedom, we get a Q value of 4.05.
Our MSD is Q times the square root of the MSW divided by the group sample size which is 4.05 times the square root of 3.75 divided by 5 which gives us 3.5074.
►Adding and subtracting half of this value to each of our 4 group means gives us the comparison intervals shown. We can see that all four of the intervals overlap, none are separated from the others. Not a surprise since the ANOVA told us that none of the means are significantly different.
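The same overlap check as in the first example, now with four groups, confirms the all-overlap result: every pair of means is closer together than the MSD, so every pair of comparison intervals overlaps.

```python
from math import sqrt
from itertools import combinations

# Tukey-Cramer intervals for the mouse diet example.
# Q = 4.05 (alpha = 0.05, 4 groups, 16 df); MSW = 3.75; n = 5 per group.
q, msw, n = 4.05, 3.75, 5
msd = q * sqrt(msw / n)     # ~3.5074

means = {"A": 19, "B": 22, "C": 20, "D": 21}
# The most separated means (A and B) differ by 3, which is still less
# than the MSD, so no pair of comparison intervals is disjoint.
for g1, g2 in combinations(means, 2):
    assert abs(means[g1] - means[g2]) <= msd

print(round(msd, 4))
```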

Slide 28.

Now the interpretation.
A one-factor ANOVA indicates that the mean sizes of the mice at maturity in the 4 diet groups do not differ significantly from one another (F3,16=2.222; p=0.1643).
The ANOVA table with its p value of 0.1643 makes the lack of any significant difference clear.
The bar chart is a little different. Since the 95% confidence intervals are shown we can't use the overlap directly to judge significant differences, but the lack of the bracket with an asterisk is a good sign that none of these means are significantly different.

Slide 29.

Before we finish, two more things.
First, it's helpful to visualize what we do during an ANOVA with this flowchart. We start at the upper left by asking - are the variances equal?
If not, then we need to transform the data into values that meet the equal variance assumption or use a different test like the Kruskal-Wallis test. That test is less powerful and more complicated, however, so we generally prefer the ANOVA whenever possible.
On the other hand, if the variances are equal then we do the ANOVA and look at the p value for the F test.
If p is not less than 0.05, then we fail to reject the null hypothesis for the ANOVA and the means appear equal based on our data.
If p is less than 0.05, then we do reject the ANOVA null hypothesis and some of the means differ. We follow this up by doing Bonferroni corrected t-tests or making Tukey-Cramer comparisons to determine which means differ.

Slide 30.

In case you feel like you'd like some practice here is a data set with 5 groups, each of which has 6 values.
With what we’ve done in this video, and using the tables from the StatsExamples website, you should be able to answer the ANOVA questions listed.
What's the ANOVA table, what are the comparison intervals, and which means, if any, differ from one another?
Go ahead and try this, at the very end of this video are the answers - you can pause the video there and see if you got it right.

Zoom out

The one factor ANOVA can be overwhelming the first few times you work through it. Hopefully, the examples in this video will help you with doing your own one factor ANOVAs. As always, a high-resolution PDF of this image is available on the StatsExamples website via the link below.

End screen

If you found this video useful - like, subscribe and share it so that others can find it too.






This information is intended for the greater good; please use statistics responsibly.