VARIANCE RATIO F TEST (EXAMPLES)


Watch this space

Detailed text explanation coming soon. In the meantime, enjoy our video.

The text below is a transcript of the video.



Connect with StatsExamples here



LINK TO SUMMARY SLIDE FOR VIDEO:


StatsExamples-f-test-examples.pdf

TRANSCRIPT OF VIDEO:


Slide 1.

The variance ratio F test is used to test whether two population variances are equal based on samples from each. Let's take a look at some examples of how to do this test.

Slide 2.

First let's review the steps in a variance ratio F test; I'm just going to call it an F test from now on.
You can watch our intro to the F test video for more detailed information, but here's a quick summary.
We want to know if population variances differ between two populations.
We can't measure the entire populations; that's almost always impractical.
Instead, we take random samples from the population and calculate sample variances.
Our null hypothesis for this test will be that the population variances are equal and the alternative will be that they are not equal.
If the data shows that the sample variances are similar, that's consistent with the null hypothesis being true.
If the data shows that the sample variances are very different from each other, that's what we expect to see if the population variances aren't the same.
Sampling error will always cause the sample variances to differ somewhat, though, even if the population variances are equal. The question is: by how much?
So we ask: what are the chances that the population variances are the same (i.e., the null hypothesis is true), given how much the sample variances differ from one another?

Slide 3.

Here's the formal procedure.
First, we create our null and alternative hypotheses. For a two-tailed F test, just seeing if the variances are different, the null hypothesis is that the population variances are equal and the alternative hypothesis is that they are not equal.
Then we calculate our F calculated value, which is just the larger of the two variances divided by the smaller of the two variances.
Then we compare the F calculated value to F critical values from an F table.
Keep in mind that since this is a two-tailed test, we will be doubling the alpha values for the tables.
Then, we determine the P-value, the probability of seeing an F calculated value as large as we do.
Technically, this is the smallest alpha value we could choose and still reject the null hypothesis, but a better way to think about it is that the P-value is the probability that sampling error alone could make the two sample variances as different as we see if the population variances are the same.
Then we decide to "reject the null hypothesis" or "fail to reject the null hypothesis" based on the P-value.
The null hypothesis is consistent with non-small p values.
On the other hand, if the alternative hypothesis is true, our data would usually give us small p values.
At this point we would return to our data and ask ourselves what it means that the population variances appear the same or appear different based on the sample. Some other statistical tests require equal variances, and this test tells us if they're likely to be good options. Or maybe, and this is less common, we genuinely care about the variances themselves.
Lastly, we should always keep in mind the risk of type one or type two error when doing any statistical test.
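The steps above can be sketched in a few lines of Python. This is only a sketch, assuming SciPy is available; the function name and argument names are our own, not from the video:

```python
# Sketch of the two-tailed variance ratio F test described above.
from scipy.stats import f


def f_test_two_tailed(var_a, n_a, var_b, n_b):
    """Return (F, dfn, dfd, p) for H0: the population variances are equal."""
    # The larger sample variance goes in the numerator.
    if var_a >= var_b:
        F, dfn, dfd = var_a / var_b, n_a - 1, n_b - 1
    else:
        F, dfn, dfd = var_b / var_a, n_b - 1, n_a - 1
    # Doubling the upper-tail probability gives the two-tailed P-value,
    # mirroring the doubled alpha values for the table lookups.
    p = 2 * f.sf(F, dfn, dfd)
    return F, dfn, dfd, p
```

A small P-value leads us to reject the null hypothesis, exactly as in the decision step above.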

Slide 4.

For our first example, let's think about how big some meerkats are at two sites of interest. Consider two samples of meerkat lengths, measured to the nearest centimeter, with the values shown.
The questions we will ask are:
Are the variances of the populations at these sites different or not?
With what degree of confidence do we make this conclusion?
► The first step will be to calculate the sample variances. If you need a reminder of how to do this, see our video about calculating summary statistics. Then we calculate their ratio, putting the larger of the two values in the numerator.
► The variances we get are shown to the left. The variance for site 2 is larger than the variance for site 1 so we put that value in the numerator. The calculated F value is therefore 15.143 divided by 2.000 equals 7.571.
► Now we need the degrees of freedom for our two samples to determine our critical values. The degrees of freedom for each sample is the sample size minus one. We also need to keep track of which sample's variance went in the numerator and which went in the denominator; the tables of critical values treat these differently.
In this case, since site 2 showed the larger variance, the degrees of freedom for the numerator is the 8 values at site 2 minus one, which gives 7.
The degrees of freedom for the denominator is the 10 values at site 1 minus one, which gives 9.

Slide 5.

Now we just need to compare our calculated F value to the critical values that correspond to 7 degrees of freedom in the numerator and 9 in the denominator.
► Looking at our F critical tables for alpha of 0.025 and 0.05, these are from the StatsExamples website, we use the column for 7 degrees of freedom and the row for 9.
This gives us critical values of 4.20 and 3.29. Since this is a two-tailed F test, we need to remember to double the alpha values from the table when thinking about the overall alpha value for our test.
► Comparing our F calculated value of 7.571 to these values we see that it is larger than the 4.20 which indicates that the P-value for this F test is less than 0.05.
► We can therefore say that "The variance of lengths at site 2 is significantly higher than the variance of lengths at site 1 ( p < 0.05 )."
We would therefore conclude that the variances in these two sites are different.
► If we have access to a computer, it can tell us that the exact P-value is 0.0072 which is clearly less than 0.05.
This is very strong evidence that these variances are different: there is less than a one percent chance that sampling error all by itself would give us sample variances this different if the population variances are the same.
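The table lookups and the exact P-value for this example can be checked in Python. This is a sketch assuming SciPy; the variances and sample sizes are the ones from the slide:

```python
# Checking the meerkat example with SciPy instead of printed tables.
from scipy.stats import f

F = 15.143 / 2.000        # larger variance over smaller, about 7.571
dfn, dfd = 8 - 1, 10 - 1  # site 2 in the numerator, site 1 in the denominator

crit_025 = f.ppf(1 - 0.025, dfn, dfd)  # upper-tail critical value, about 4.20
crit_05 = f.ppf(1 - 0.05, dfn, dfd)    # upper-tail critical value, about 3.29
p_two_tailed = 2 * f.sf(F, dfn, dfd)   # about 0.0072, as stated above
```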

Slide 6.

For our second example, let's think about the number of scales on certain snakes for females and males. Consider two samples of scale numbers with the values shown.
The questions we will ask are:
Are the scale number variances of the sexes different or not?
With what degree of confidence do we make this conclusion?
► The first step will be to calculate the sample variances. Then we calculate their ratio, putting the larger of the two values in the numerator.
► The variances we get are shown to the left. The variance for females is larger than the variance for males so we put that value in the numerator. The calculated F value is therefore 5.800 divided by 2.500 equals 2.320.
► Now we also need the degrees of freedom for our two samples to determine our critical values. Since females showed the larger variance, the degrees of freedom for the numerator is the 11 female values minus one, which gives 10.
The degrees of freedom for the denominator is the 13 male values minus one, which gives 12.

Slide 7.

Now we just need to compare our calculated F value to the critical values that correspond to 10 degrees of freedom in the numerator and 12 in the denominator.
► Looking at our F critical tables for alpha of 0.025 and 0.05, we use the column for 10 degrees of freedom and the row for 12.
This gives us critical values of 3.37 and 2.75. Again, since this is a two-tailed F test, we need to remember to double the alpha values from the table when thinking about the overall alpha value for our test.
► Comparing our F calculated value of 2.320 to these values we see that it is smaller than both of these critical values. This indicates that the P-value for this F test is larger than 0.1.
► We can therefore say that "The variances of the scale numbers in females and males are not significantly different ( p > 0.1 )."
We would decide that the variances in these two sexes do not differ.
► If we have access to a computer, it can tell us that the exact P-value is 0.1692 which is clearly more than 0.1.
Keep in mind that the right way to think about this is not that we have very strong evidence that these variances are the same - instead, we looked for evidence that they are different and didn't find convincing evidence.
We would still conclude that the variances are likely the same or similar, but the strength of that conclusion is perhaps undermined by our small sample sizes.
In fact, a sneaky technique used by some statisticians is to deliberately perform small experiments that they know can't give them small P-values, and then claim that this proves the null hypothesis.
Watch out for this.
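As with the first example, these numbers can be reproduced in Python. This is a sketch assuming SciPy; the variances and sample sizes come from the slide:

```python
# Checking the snake scale example with SciPy.
from scipy.stats import f

F = 5.800 / 2.500          # females in the numerator, about 2.320
dfn, dfd = 11 - 1, 13 - 1  # 10 and 12 degrees of freedom
p_two_tailed = 2 * f.sf(F, dfn, dfd)  # about 0.1692, larger than 0.1
```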

Slide 8.

Sometimes we prefer to do a one-tailed F test: when we want to know whether one specific variance is bigger than the other, and we don't care whether it's smaller. For simplicity, let's assume that it's the first population variance that we care about being larger than the second. Here's the formal procedure for that.
First, we create our null and alternative hypotheses. The null hypothesis is now that the population variance for population 1 is less than or equal to the variance for population 2. The alternative hypothesis is that the variance for population 1 is greater than the variance for population 2.
The rest is pretty much the same except we calculate our F calculated value by dividing the first variance by the second, not the larger by the smaller.
Then we compare the F calculated value to F critical values from an F table.
Now, since this is a one-tailed test, we will use the alpha values from the tables directly.
Then, we determine the P-value.
This is the probability that sampling error alone could make the first sample variance larger to the degree we see, if the true population variance of population 1 was less than or equal to the variance of population 2.
Then we decide to "reject the null hypothesis" or "fail to reject the null hypothesis" based on the P-value.
The null hypothesis is consistent with non-small p values.
On the other hand, if the alternative hypothesis is true, our data would usually give us small p values.
Keep in mind that to do one-tailed tests, we need a genuine a priori reason to test only the one direction - we can't look at our data and then decide.
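The one-tailed procedure can be sketched the same way as the two-tailed one. Again this is only a sketch assuming SciPy, with our own function and argument names:

```python
# Sketch of the one-tailed variance ratio F test described above.
from scipy.stats import f


def f_test_one_tailed(var_1, n_1, var_2, n_2):
    """Return (F, p) for H0: variance 1 <= variance 2 vs H1: variance 1 > variance 2."""
    # Population 1 always goes in the numerator, even if its sample
    # variance turns out to be the smaller one.
    F = var_1 / var_2
    dfn, dfd = n_1 - 1, n_2 - 1
    # One-tailed: use the upper-tail probability directly, no doubling.
    p = f.sf(F, dfn, dfd)
    return F, p
```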

Slide 9.

For our first one-tailed example, let's think about how many parasites some birds have at two sites of interest. Before we start collecting data, we will predict that the birds in the disturbed area will have higher variance in parasite number. Consider the values shown for the two areas.
The questions we will ask are:
Is the variance in the disturbed area larger than in the preserved area?
With what degree of confidence do we make this conclusion?
► As always, the first step will be to calculate the sample variances. Then we calculate their ratio, but now instead of automatically putting the larger one in the numerator, we will put the variance for the disturbed area in the numerator.
► The variances we get are shown to the left and the calculated F value is 16.500 divided by 3.143 equals 5.250.
► The degrees of freedom for each sample is the sample size minus one. We put the variance for the disturbed sample in the numerator so the degrees of freedom for the numerator is the 5 disturbed area values minus one equals 4. The degrees of freedom for the denominator is the 8 preserved area values minus one equals 7.

Slide 10.

Now we compare our calculated F value to the critical values that correspond to 4 degrees of freedom in the numerator and 7 in the denominator.
► We use the column for 4 degrees of freedom and the row for 7 from our tables of F critical tables.
The critical values are 5.52 and 4.12 and since this is a one-tailed F test we don't double the alpha values from the table.
► Comparing our F calculated value of 5.250 to these values we see that it is larger than the 4.12 which indicates that the P-value for this F test is less than 0.05.
However, 5.250 is smaller than 5.52 so the P-value for this F test is larger than 0.025.
► We can therefore say that "The variance of parasite number in the disturbed habitat is significantly larger than the variance in the preserved area ( 0.025 < p < 0.05 )."
► If we have access to a computer, it can tell us that the exact P-value is 0.0283 which is between 0.025 and 0.05.
It looks like habitat disturbance is associated with the birds having more variance in the number of parasites. This doesn't prove that environmental destruction is the cause, but it would support an argument that it is.
Also, this is fairly weak evidence for a real pattern. The P-value isn't super small and if we had done a two-tailed test then the result would have been nonsignificant. The risk of type one error with this data set is pretty high.
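The critical values and exact P-value for this one-tailed example can be checked in Python. This is a sketch assuming SciPy; the variances and sample sizes are the ones from the slide:

```python
# Checking the bird parasite example with SciPy.
from scipy.stats import f

F = 16.500 / 3.143       # disturbed area in the numerator, about 5.250
dfn, dfd = 5 - 1, 8 - 1  # 4 and 7 degrees of freedom

crit_05 = f.ppf(1 - 0.05, dfn, dfd)    # about 4.12
crit_025 = f.ppf(1 - 0.025, dfn, dfd)  # about 5.52
p_one_tailed = f.sf(F, dfn, dfd)       # about 0.0283, between 0.025 and 0.05
```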

Slide 11.

For our second one-tailed example, let's think about how many different kinds of bacteria thrive on two different surfaces of interest. Before we start collecting data, we will predict that the bacteria on doorknobs will be more variable than on keyboards, because many people touch doorknobs while keyboards usually have just one user. Consider the values shown for the two sampling sites.
The questions we will ask are:
Is the variance on doorknobs larger than on keyboards?
With what degree of confidence do we make this conclusion?
► We calculate the sample variances and their ratio, putting the variance for the doorknobs samples in the numerator. The calculated F value is 17.250 divided by 5.600 equals 3.080.
► The doorknob data is in the numerator so the degrees of freedom for the numerator is the 9 doorknob sample values minus one equals 8. The degrees of freedom for the denominator is the 6 keyboard samples minus one equals 5.

Slide 12.

Our calculated F value is 3.080 and our degrees of freedom are 8 for the numerator and 5 for the denominator.
► From the table, the critical values are 6.76 and 4.82 and since this is a one-tailed F test we don't double the alpha values.
► Comparing our F calculated value of 3.080 to these values we see that it is smaller than both, which indicates that the P-value for this F test is larger than 0.05.
► We can therefore say that "The variance of the number of bacterial cultures on doorknobs is not significantly larger than the variance on keyboards ( p > 0.05 )."
► The exact P-value is 0.1154, larger than 0.05.
As before, we shouldn't conclude that these variances are the same or that the keyboard variance is higher - instead, we looked for convincing evidence that the doorknob variance is larger and didn't find it.
Keep in mind that this is a fairly small experiment so we may be making a type two error when we fail to reject the null hypothesis.
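This example can also be reproduced in Python. This is a sketch assuming SciPy; the variances and sample sizes come from the slide:

```python
# Checking the doorknob versus keyboard example with SciPy.
from scipy.stats import f

F = 17.250 / 5.600       # doorknobs in the numerator, about 3.080
dfn, dfd = 9 - 1, 6 - 1  # 8 and 5 degrees of freedom
p_one_tailed = f.sf(F, dfn, dfd)  # about 0.1154, larger than 0.05
```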

Slide 13.

Finally, a word of caution about the F test.
A strong assumption of the F test is normal population distributions.
If the populations are not normally distributed, then the F test can easily give type one or type two errors.
For this reason, even though the F test is OK as a first check, if we really want to test for equality of variances, we should do a Levene's test, Bartlett's test, or Brown-Forsythe test. These are more accurate, but the math is more complicated.
Note, however, that the one-tailed F tests used in the ANOVA procedure are generally OK, unless you have extremely small sample sizes, because the central limit theorem makes our estimated means approximately normal.
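SciPy provides all three of the more robust alternatives mentioned above, so the complicated math is handled for us. This is only a usage sketch; the two samples here are made-up placeholder numbers, not data from the video:

```python
# Robust alternatives to the F test for comparing variances.
from scipy.stats import bartlett, levene

group_a = [12, 15, 11, 18, 14, 16, 13]  # placeholder data
group_b = [22, 9, 25, 7, 30, 12, 20]    # placeholder data

stat_b, p_b = bartlett(group_a, group_b)               # still assumes normality
stat_l, p_l = levene(group_a, group_b, center='mean')  # Levene's test
stat_bf, p_bf = levene(group_a, group_b, center='median')  # Brown-Forsythe
```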

Zoom out.

I hope you found these examples of the F test useful.
Despite its weakness if the population values are not normally distributed, this test is commonly used in the scientific and statistical literature.

End screen.

Go ahead and comment or click to like if you found this helpful. Subscribe to make it easier to find this channel again in the future.




This information is intended for the greater good; please use statistics responsibly.