TWO SAMPLE UNPAIRED T-TEST EXAMPLES


Watch this space

Detailed text explanation coming soon. In the meantime, enjoy our video.

The text below is a transcript of the video.



Connect with StatsExamples here



LINK TO SUMMARY SLIDE FOR VIDEO:


StatsExamples-two-sample-t-test-unpaired-examples.pdf

TRANSCRIPT OF VIDEO:


Slide 1.

The two-sample unpaired T test is used to compare the means of two populations. Let's look at three step-by-step examples.

Slide 2.

But first, a quick review of the test.
We want to know if two population means differ, but we can't measure both populations so we take samples from each of them and calculate the sample means.
The sample means are estimates of the population means, but sampling error makes them imprecise. So the question we ask is: how plausible is it that the population means are the same, given how much the sample means differ from one another?
If the sample means are almost equal and there's lots of variation or noise, that's poor evidence that the population means are different - but if the sample means are very different and there's little noise, that's good evidence that the population means are different.
The T test is how we take these two things into account.

Slide 3.

There are two basic kinds of T tests, paired and unpaired.
►Paired tests look at cases such as before and after values for the same individuals, or twins where one of each pair is in each treatment.
Each value has a single unique paired value in the other population.
►See our other video for more about the pairing process and examples of the calculations for that kind of T test.
►In this video we're going to look at unpaired T tests, ones in which the values don't have to be paired.
In these the individual data values don't directly correspond to one another, they come from two distinct populations - like taking values from two different times or locations - or studying the effects of two different treatments in two distinct populations.
Let's look at how the unpaired T test works.

Slide 4.

The way the unpaired T test works is like this. We look at the samples collected from each population, and then calculate a new distribution based on the difference in the sample means with a width based on the combined standard error of both data sets.
In a situation like the one shown at the top and left, a big difference between the means of the samples and little variation in the data will result in a large mean for the differences and a confidence interval that doesn't include zero. This suggests that the true difference between the populations is very unlikely to be zero because the zero is outside the confidence interval. The populations therefore seem to have different means.
In a situation like the one shown at the bottom and right, a small difference between the means of the samples and more variation in the data will result in a small mean for the differences and a wider confidence interval that does include zero. This suggests that the true difference between the populations could easily be zero because the zero is inside the confidence interval. The populations therefore don't seem to have different means.
►Instead of actually creating these confidence intervals and doing this visual comparison, we calculate a T value, which represents the difference between the sample means relative to the width of the distribution. When the T calculated value exceeds a critical value, the confidence interval would not include zero; smaller T calculated values correspond to situations in which the confidence interval would include zero.
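To see why comparing T to a critical value is the same as asking whether the confidence interval contains zero, here is a small sketch. The helper names are hypothetical, and the numbers come from the reef example later in this video: a difference in sample means of 2, a standard error of about 0.8506 (roughly 2 / 2.3514), and a two-tailed critical value of 2.093 for 19 degrees of freedom.

```python
# Hypothetical helpers; numbers taken from the reef example in this video:
# difference in sample means = 2, standard error ~ 0.8506 (about 2 / 2.3514),
# two-tailed critical value for alpha = 0.05 with 19 df = 2.093.

def t_exceeds_critical(diff, se, t_crit):
    """True if the magnitude of T = diff / se is larger than the critical value."""
    return abs(diff / se) > t_crit


def ci_excludes_zero(diff, se, t_crit):
    """True if the interval diff +/- t_crit * se lies entirely above or below zero."""
    lower = diff - t_crit * se
    upper = diff + t_crit * se
    return lower > 0 or upper < 0


for diff in (2.0, 1.0):
    print(diff, t_exceeds_critical(diff, 0.8506, 2.093), ci_excludes_zero(diff, 0.8506, 2.093))
# 2.0 True True
# 1.0 False False
```

Both checks reduce to |diff| > t_crit × se, which is why they always agree.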

Slide 5.

We actually have some choices to make when doing a two-sample t-test.
►The first choice is based on whether the variances of the populations are known. Keep in mind that these are the population variances, not the sample variances, which we can easily calculate from our data.
Knowing the population variances is an unrealistic situation: if we somehow knew the variances of the populations, we would presumably also know their means and could just look to see if they're different, no T test required.
Nevertheless, if this were the case, then technically we could do what is called a two-sample Z test.
For this test our Z calculated value comes from the equation shown, and we compare it to the Z, or normal, distribution instead of the T distribution to determine the significance of the test. Unlike the T distribution, the normal distribution doesn't depend on a degrees of freedom value.
In reality you should never do this test unless you have huge sample sizes and feel very secure in your estimates of the population variances.
This is somewhat unrealistic so I personally ► don't recommend ever doing this test.
Let's look at the two unpaired tests you should consider doing.

Slide 6.

If we don't know the population variances, but we're fairly sure that they're equal, because we've done some other statistical test to compare them, we can do the unpaired homoscedastic T test.
For this test, the T calculated value comes from the equation shown, which incorporates the sample means, the sample sizes, and a value called the pooled variance, s_p².
The pooled variance is a weighted average of the sample variances where the weighting is done based on the degrees of freedom for each sample.
The degrees of freedom value is the sum of the sample sizes minus two.
We compare this calculated T value to the T distribution for the matching degrees of freedom value to determine the significance of the test.
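As a rough illustration of the homoscedastic calculation, here is a minimal Python sketch. The function names are hypothetical, chosen for this example, and it works from summary statistics (means, sample variances, sample sizes) rather than raw data.

```python
import math


def pooled_variance(var1, n1, var2, n2):
    """Weighted average of two sample variances; each weight is that sample's df (n - 1)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def homoscedastic_t(mean1, var1, n1, mean2, var2, n2):
    """Return (T calculated, degrees of freedom) for the equal-variance unpaired test."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2
```

For instance, homoscedastic_t(11, 23.6, 6, 6, 14.0, 8) uses the crack-count numbers from the third example and should return a T value of about 2.1822 with 12 degrees of freedom.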

Slide 7.

If we don't know the population variances, and we don't want to assume that they're equal, either because we don't know or because we've tested and know they aren't, we can do the unpaired heteroscedastic T test.
This is the most versatile T test, since it works with equal or unequal population variances; it is also called "Welch's t-test."
It's slightly less powerful than the homoscedastic test, but less likely to be performed under faulty assumptions since it doesn't rely on getting a comparison of the population variances right.
For this test, the T calculated value comes from the equation shown which incorporates the sample means, the sample variances and the sample sizes.
Unfortunately, the degrees of freedom value for the heteroscedastic T test is much more complicated than the ones we've seen before. The ugly equation shown there is used to calculate the overall degrees of freedom. It takes into account the two variances and the sample sizes and will return a non-integer value that we always round downwards.
As before, we compare this calculated T value to the T distribution for the degrees of freedom to determine the significance of the test.
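The heteroscedastic T value and its degrees of freedom can be sketched the same way. The function names below are hypothetical; welch_df implements the Welch-Satterthwaite formula that the "ugly equation" refers to, and table_df applies this video's round-down convention.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """T calculated for the unequal-variance (Welch) unpaired test."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom (usually not an integer)."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def table_df(df):
    """Round down, as done in this video, before looking up a row of a T table."""
    return math.floor(df)


t_calc = welch_t(11, 23.6, 6, 6, 14.0, 8)
df_raw = welch_df(23.6, 6, 14.0, 8)
print(round(t_calc, 4), round(df_raw, 4), table_df(df_raw))   # 2.0973 9.1458 9
```

If SciPy happens to be available, scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=False) runs this test from summary statistics. Note that it takes standard deviations rather than variances and uses the non-integer degrees of freedom, so its P values can differ slightly from the rounded-down table approach.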

Slide 8-9.

OK, so here's the formal procedure.
►First, we create a null hypothesis and alternative hypothesis. The null is that the means of the two populations are equal and the alternative is that the means are not equal.
►Then, we get a t-calculated value using one of the equations from before, depending on whether we are assuming equality of the population variances or not.
►Then, we compare that t-calculated value to various t-critical values taken from the t-distribution for the number of degrees of freedom for our sample - these correspond to different confidence intervals and probabilities.
►Then, we determine the probability, the p value, of seeing a t-calculated value as extreme as ours. What's the smallest alpha value we could pick, giving rise to the widest confidence interval, such that a difference of zero between the population means would still lie outside that interval?
This p-value is effectively the probability of seeing a difference between the sample means at least as extreme as ours if the null hypothesis is true.
►Finally, we decide to "fail to reject our null hypothesis" or "reject our null hypothesis" based on the p-value.
If the p-value is not small, then we would fail to reject the null hypothesis and generally conclude that we lack evidence to decide that the population means are different.
If the p-value is small, then we would decide to reject the null hypothesis and thereby conclude that we have good evidence to decide that the population means are different.
OK, now let's look at three concrete examples. All three illustrate different important points.
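The "computer" step of this procedure, getting an exact P value, can be sketched without any statistics library by integrating the T distribution's density numerically. This is one possible approach under stated assumptions (Simpson's rule with 2000 intervals, hypothetical function names), not the method used to produce the numbers in this video.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_calc, df, steps=2000):
    """P(|T| >= |t_calc|), from Simpson's rule over [0, |t_calc|]; steps must be even."""
    x_max = abs(t_calc)
    if x_max == 0:
        return 1.0
    h = x_max / steps
    total = t_pdf(0.0, df) + t_pdf(x_max, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # area under the curve from 0 to |t|
    return 1 - 2 * central_half           # area in both tails beyond |t|


def decide(t_calc, df, alpha=0.05):
    """Return the P value and the decision about H0: the population means are equal."""
    p = two_tailed_p(t_calc, df)
    return p, ("reject H0" if p < alpha else "fail to reject H0")


print(decide(2.3514, 19))   # reef example, homoscedastic
print(decide(1.6330, 14))   # recovery-time example, homoscedastic
```

The degrees of freedom can be non-integer here, since math.lgamma accepts any positive float.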

Slide 10-12.

For our first example let's think about the masses of fish caught at each of two different reefs to see if their sizes are different.
Our data is the individual masses of a sample of fish from each population and the questions we want to ask are - are the means of the fish at the two reefs equal and with what confidence do we decide?
►When doing an unpaired T test the first step is to calculate the means and variances for each sample.
If you need a reminder on how to calculate these check out the StatsExamples video on calculating summary statistics.
►In this case reef A gives us a mean of 31 with a variance of 3.200 and reef B gives us a mean of 29 with a variance of 4.444.
These are the values we will use for our unpaired t-test.

Slide 13-16.

First let's look at how we would do a homoscedastic test, this is one in which we assume that the population variances are equal.
We won't worry about how we determine equality of variances in this video, but StatsExamples has other videos on this topic available.
►First we need the degrees of freedom which is the sum of the sample sizes minus two which is 11 plus 10 minus 2 equals 19.
►Then we should calculate the pooled variance since we'll be using it in our T calculated equation. For this we use the equation shown, the variances of 3.200 and 4.444 weighted by the degrees of freedom and divided by the overall degrees of freedom to give us a pooled variance of 3.78947.
►Then we plug the pooled variance into the equation shown along with the sample means and sample sizes to get a final T calculated value of 2.3514.
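The arithmetic for this homoscedastic reef calculation can be checked with a few lines of Python. One assumption: the variance shown as 4.444 for reef B is taken to be exactly 40/9, which matches the pooled variance of 3.78947 quoted above.

```python
import math

# Reef A: n = 11, mean 31, variance 3.200.
# Reef B: n = 10, mean 29, variance 4.444 (assumed to be exactly 40/9).
n_a, mean_a, var_a = 11, 31.0, 3.2
n_b, mean_b, var_b = 10, 29.0, 40 / 9

df = n_a + n_b - 2
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df           # pooled variance
t_calc = (mean_a - mean_b) / math.sqrt(sp2 * (1 / n_a + 1 / n_b))

print(df)                # 19
print(round(sp2, 5))     # 3.78947
print(round(t_calc, 4))  # 2.3514
```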

Slide 17-22.

Now we take our values and compare them to values from the T distribution. The first question is how the magnitude, the absolute value, of the calculated T value compares to the critical value that corresponds to an overall alpha value of 5%.
We go to our T-table and look at the row for 19 degrees of freedom and the column for an alpha value of 0.025 since this is a two-tailed test.
►The value in the table is 2.093 which means that the middle 95% of this T distribution runs from negative 2.093 to positive 2.093. Now our question is how the 2.3514 compares to this region.
► The 2.3514 is larger than the positive 2.093, which means it is outside of the center region. This means that our test will reject the null hypothesis of equal means for the two reefs.
► We can therefore say that "The mean masses are significantly different." We're not done, however; there are two more questions we have to answer.
►First, which population appears to have the larger mean? That's easy: we just compare the sample means, and that tells us which population mean appears larger. Of course, if our result weren't significant, the sample means wouldn't tell us anything, but it is, so they do.
►Second, what is the P value of our test? For this we'll need to examine our T table in more detail.

Slide 23-27.

To figure out the degree of significance for our test we compare our calculated T value to the whole set of critical values in our T table. Remember that these values correspond to areas more extreme than a central region encompassing most of the T distribution for 19 degrees of freedom as shown.
►The 2.3514 lies between two of the critical values in the table - the 2.205 and the 2.539.
►From our calculated T value being more extreme than 2.205 we know that the P value is less than 0.04.
►However, our calculated T value is not more extreme than the 2.539 so we know that the P value is not less than 0.02, it must be larger.
►We can therefore state that, "The mean mass of fish at reef A is significantly larger than the mean mass of fish at reef B (0.02 < P < 0.04)."
►If we have access to a computer, it can provide the exact P value of 0.0296, which we can see is less than 0.04 but more than 0.02.
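Bracketing the P value from one row of a T table, as just done by hand, can be sketched as follows. The function name is hypothetical, and the row holds only the three 19-degrees-of-freedom critical values quoted in this video, keyed by two-tailed alpha.

```python
# Two-tailed alpha -> critical value for 19 degrees of freedom,
# using only the entries quoted in this video.
ROW_DF19 = {0.05: 2.093, 0.04: 2.205, 0.02: 2.539}


def bracket_p(t_calc, row):
    """Return (lower, upper) bounds on the two-tailed P value from one table row.

    upper is the smallest alpha whose critical value |t_calc| exceeds;
    lower is the largest alpha whose critical value it does not exceed.
    None means the columns provided don't pin that bound down.
    """
    t_abs = abs(t_calc)
    lower, upper = None, None
    for alpha, crit in sorted(row.items(), reverse=True):   # alpha from large to small
        if t_abs > crit:
            upper = alpha
        elif lower is None:
            lower = alpha
    return lower, upper


print(bracket_p(2.3514, ROW_DF19))   # (0.02, 0.04)
```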

Slide 28-30.

Now let's look at how we would do a heteroscedastic test, one in which we think the population variances aren't equal or we just want to be cautious and not make that assumption.
►First we calculate the T value, which is fairly straightforward. The difference in sample means goes in the numerator and the combined standard error goes in the denominator. Plugging the values in gives us 2 divided by 0.85753, which is 2.3323 for our T value.
►Then we need the degrees of freedom value, which is obtained using this huge and complicated equation that uses the variances and sample sizes in several places. We just need to go slowly and plug everything in carefully. When we do, we get 0.54074 divided by 0.03041, which is a value of about 17.781.
The degrees of freedom in a T table are always integers, so we round this calculated value. We always round downwards to simulate having less data and being conservative, rather than rounding upwards and acting as if we have more data than we do. The 17.781 gets rounded down to 17 degrees of freedom.
Now that we have a T calculated value and some degrees of freedom, the rest is just like the previous T test.
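Here is the same heteroscedastic reef calculation as a short Python check, again assuming that the reef B variance shown as 4.444 is exactly 40/9.

```python
import math

# Reef example, heteroscedastic (Welch) version.
# Assumption: the variance shown as 4.444 for reef B is exactly 40/9.
n_a, mean_a, var_a = 11, 31.0, 3.2
n_b, mean_b, var_b = 10, 29.0, 40 / 9

a = var_a / n_a                      # squared standard error of mean A
b = var_b / n_b                      # squared standard error of mean B
se = math.sqrt(a + b)                # combined standard error
t_calc = (mean_a - mean_b) / se
df_raw = (a + b) ** 2 / (a ** 2 / (n_a - 1) + b ** 2 / (n_b - 1))
df = math.floor(df_raw)              # round down before using the T table

print(round(se, 5), round(t_calc, 4), round(df_raw, 3), df)   # 0.85753 2.3323 17.781 17
```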

Slide 31-35.

Now, how does the magnitude, the absolute value, of this calculated T value compare to the critical value that corresponds to an overall alpha value of 5%?
►We go to our T-table and look at the row for 17 degrees of freedom and the column for an alpha value of 0.025 since this is a two-tailed test.
►The value in the table is 2.110; how does our 2.3323 compare to this?
► The 2.3323 is larger than the positive 2.110, which means it is outside of the center region. This means that our test will reject the null hypothesis of equal means for the two reefs.
► We can therefore say that "The mean masses are significantly different," and just like earlier, we know that the mean for reef A is larger than the mean for reef B, but we still have one more question.
►What is the P value of our test? As before, for this we'll need to examine our T table in more detail.

Slide 36-40.

We compare our calculated T value to the whole set of critical values in our T table for 17 degrees of freedom as shown.
►The 2.3323 lies between two of the critical values in the table - the 2.224 and the 2.567.
►From this we know that the P value is less than 0.04, but not less than 0.02.
►We can therefore state that, "The mean mass of fish at reef A is significantly larger than the mean mass of fish at reef B (0.02 < P < 0.04)."
►If we have access to a computer, it can provide the exact P value of 0.0317, which we can see is less than 0.04 but more than 0.02.
We got the same result from the heteroscedastic test that we did for the homoscedastic test, with a couple of minor differences.
Our T calculated value was slightly larger for the homoscedastic test and the threshold T critical value was slightly smaller due to the higher degrees of freedom value.
This illustrates the slightly higher statistical power that the homoscedastic test has compared to a heteroscedastic test.

Slide 41-42.

Now let's consider a scenario in which doctors are testing a new medical procedure. The metric they're interested in is whether it will result in a shorter or longer recovery time, so they perform the old procedure on 8 patients as a control and the new procedure on another 8.
This isn't a paired test even though the sample sizes are the same because there's nothing that connects specific patients across the two treatments, so we have to use an unpaired technique.
We then ask if the mean recovery times for these two procedures are equal and with what confidence do we decide?
►The first step is to calculate the sample means and variances for our equations.
The mean for the old procedure is 16 with a variance of 5.1429, and the mean for the new procedure is 14 with a variance of 6.8571.
Based on these it looks like the new procedure may have a shorter recovery time, but we don't know if this is genuine unless this difference is statistically significant.

Slide 43-46.

This is our data.
►The degrees of freedom is the sum of the sample sizes minus two which is 8 plus 8 minus 2 equals 14.
►Next we calculate the pooled variance using the equation shown: the variances of 5.1429 and 6.8571, weighted by their degrees of freedom and divided by the overall degrees of freedom, give us a pooled variance of 6.000.
►Then we plug the pooled variance into the equation shown along with the sample means and sample sizes to get a final T calculated value of 1.6330.
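The homoscedastic arithmetic for the recovery-time example can be reproduced like this, using the summary statistics exactly as quoted (8 patients per procedure).

```python
import math

# Recovery-time example, homoscedastic version (8 patients per procedure).
n_old, mean_old, var_old = 8, 16.0, 5.1429
n_new, mean_new, var_new = 8, 14.0, 6.8571

df = n_old + n_new - 2
sp2 = ((n_old - 1) * var_old + (n_new - 1) * var_new) / df      # pooled variance
t_calc = (mean_old - mean_new) / math.sqrt(sp2 * (1 / n_old + 1 / n_new))

print(df, round(sp2, 3), round(t_calc, 4))   # 14 6.0 1.633
```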

Slide 47-50.

Comparing our T calculated value to the critical values for 14 degrees of freedom we can see that 1.6330 is not as extreme as the critical value for an overall alpha of 0.05 which is 2.145.
►We won't be able to reject the null hypothesis of equal means because P is larger than 0.05, and the mean recovery times for the old and new procedure are not significantly different.
►To get a better sense of what the nonsignificant P value is, we look at the rest of the T values on the row and we see that the T calculated value of 1.6330 lies between the values of 1.345 and 1.761 which correspond to overall P values of 0.2 and 0.1.
►We therefore state that, "The mean recovery times for the old and new procedures are not significantly different (0.1 < P < 0.2)."
►With a computer we can obtain the exact P value, which is 0.1248.
The data doesn't prove that the new procedure doesn't alter the recovery times, but it demonstrates a lack of evidence that it does.

Slide 51-54.

If we do a heteroscedastic T test instead, we start with the same data shown here.
►First we use the heteroscedastic T test equation; plugging in the means and variances gives us 2 divided by 1.2247, for a T calculated value of 1.6330.
►It's not a coincidence that our T calculated value is the same as the one we got for the homoscedastic test - this will always happen if the sample sizes are equal.
►Next the degrees of freedom using the complicated equation. Plugging in values carefully gives us a value of 13.72 for the degrees of freedom which we round down to 13.
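A quick check that, with equal sample sizes, the heteroscedastic and homoscedastic T values coincide, along with the 13.72 degrees of freedom for this example.

```python
import math

n_old, mean_old, var_old = 8, 16.0, 5.1429
n_new, mean_new, var_new = 8, 14.0, 6.8571

# Heteroscedastic (Welch) T and degrees of freedom.
a = var_old / n_old
b = var_new / n_new
t_welch = (mean_old - mean_new) / math.sqrt(a + b)
df_raw = (a + b) ** 2 / (a ** 2 / (n_old - 1) + b ** 2 / (n_new - 1))

# Homoscedastic T for comparison.
sp2 = ((n_old - 1) * var_old + (n_new - 1) * var_new) / (n_old + n_new - 2)
t_homo = (mean_old - mean_new) / math.sqrt(sp2 * (1 / n_old + 1 / n_new))

print(round(t_welch, 4), round(t_homo, 4))     # 1.633 1.633
print(round(df_raw, 2), math.floor(df_raw))    # 13.72 13
```

With n1 = n2 = n, the pooled variance is (s1² + s2²)/2, so s_p²(2/n) equals s1²/n + s2²/n and the two standard errors are identical.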

Slide 55-57.

Comparing our T calculated value to the critical values for 13 degrees of freedom we can see that 1.6330 is not as extreme as the critical value for an overall alpha of 0.05 which is now 2.160 because our degrees of freedom decreased from 14 to 13.
As before, we won't be able to reject the null hypothesis of equal means because P is larger than 0.05, and "the mean recovery times for the old and new procedure are not significantly different."
►Looking at the rest of the T values on the row, we see that the T calculated value of 1.6330 lies between the values of 1.350 and 1.771, which correspond to overall P values of 0.2 and 0.1.
We therefore state that, "The mean recovery times for the old and new procedures are not significantly different (0.1 < P < 0.2)."
►With a computer we can obtain the precise P value, which is 0.1252.
The slightly higher P value reflects the lower degrees of freedom value because this is a heteroscedastic test instead of a homoscedastic test.

Slide 58-59.


Now let's consider a scenario in which we've measured the number of cracks in structures in two different locations. Perhaps we're interested in whether the different locations alter the long-term integrity of the buildings.
Our questions will be - are the mean numbers of cracks the same and with what degree of confidence do we decide?
►The first step is to calculate the sample means and variances for our two sites.
The means for the sites are 11 and 6 and the variances are 23.600 and 14.000.
Based on these it looks like site 1 has more cracks, but we don't know if this is genuine unless this difference is statistically significant.

Slide 60-63.

Here's our data again.
►The degrees of freedom is the sum of the sample sizes minus two, which is 6 plus 8 minus 2 equals 12.
►We calculate the pooled variance using the equation shown and the variances of 23.6 and 14 to get a pooled variance of 18.000.
►Then we plug the pooled variance into the equation shown along with the sample means and sample sizes to get a final T calculated value of 2.1822.
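The homoscedastic arithmetic for the crack example, reproduced in Python. Site 1 has 6 structures and site 2 has 8, which is the assignment consistent with the pooled variance of 18.000.

```python
import math

# Crack example, homoscedastic version: site 1 has n = 6, site 2 has n = 8.
n1, mean1, var1 = 6, 11.0, 23.6
n2, mean2, var2 = 8, 6.0, 14.0

df = n1 + n2 - 2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df     # pooled variance
t_calc = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(df, round(sp2, 3), round(t_calc, 4))   # 12 18.0 2.1822
```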

Slide 64-66.

Comparing our T calculated value to the critical values for 12 degrees of freedom, we can see that 2.1822 is slightly more extreme than 2.179, which is the critical value for an overall alpha of 0.05.
We can therefore reject the null hypothesis of equal means because P is smaller than 0.05, and conclude that "the mean number of cracks at site 1 is significantly larger than the mean number of cracks at site 2."
►To narrow down the P value we look at the rest of the T values on the row and we see that the T calculated value of 2.1822 lies between the values of 2.179 and 2.303 which correspond to overall P values of 0.05 and 0.04.
We therefore state that, "The mean number of cracks at site 1 is significantly larger than the mean number of cracks at site 2 (0.04 < P < 0.05)."
►With a computer we can obtain the precise P value, which is 0.0497.
A P value only barely less than 0.05 isn't super convincing, but our result is a significant difference between these two sites.

Slide 67-69.

Now if we do a heteroscedastic T test we would start with the same data as before.
►First we use the heteroscedastic T test equation, plugging in the means and variances gives us a T calculated value of 2.0973, slightly less than the one from the homoscedastic T test.
►Next the degrees of freedom using the complicated equation. Plugging in values carefully gives us a value of 9.1458 for the degrees of freedom which we round down to 9. Note that this is considerably lower than the 12 we had in the homoscedastic case.
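And the heteroscedastic version of the crack calculation, showing the drop from 12 to 9 degrees of freedom.

```python
import math

# Crack example, heteroscedastic (Welch) version.
n1, mean1, var1 = 6, 11.0, 23.6
n2, mean2, var2 = 8, 6.0, 14.0

a = var1 / n1
b = var2 / n2
t_calc = (mean1 - mean2) / math.sqrt(a + b)
df_raw = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
df = math.floor(df_raw)     # round down before using the T table

print(round(t_calc, 4), round(df_raw, 4), df)   # 2.0973 9.1458 9
```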

Slide 70-72.

Comparing our T calculated value to the critical values for 9 degrees of freedom we can see that 2.0973 is not as extreme as the critical value for an overall alpha of 0.05 which is now 2.262 because our degrees of freedom decreased from 12 to 9.
Unlike before, we now aren't able to reject the null hypothesis of equal means because P is larger than 0.05, and we therefore conclude that "the mean number of cracks at site 1 is not significantly different from the mean number of cracks at site 2."
►Looking at the rest of the T values on the row, we see that the T calculated value of 2.0973 lies between the values of 1.833 and 2.262, which correspond to overall P values of 0.1 and 0.05.
We would therefore state that, "The mean number of cracks at site 1 is not significantly different from the mean number of cracks at site 2 (0.05 < P < 0.1)."
►With a computer we can obtain the precise P value, which is 0.0649.
This pair of tests shows that the result we get can depend on the test we use: the exact same data analyzed with a homoscedastic test gave a significant difference, whereas an analysis using the heteroscedastic test gave a nonsignificant difference.
Our conclusion comes down to our assumption about the population variances. A type II error in the test comparing the variances, failing to detect that they differ, could lead us to perform a homoscedastic test and get a significant result when we should have done a heteroscedastic test and not rejected the null hypothesis.
This shows the double-edged sword of the homoscedastic test: it has more power, but it carries an extra risk from that possible type II error in the variance comparison, a risk the heteroscedastic test avoids because it doesn't rely on the variance equality assumption.
This is also one of the reasons why we should not overinterpret borderline P values. In reality, if we get a test with a P value this close to 0.05 we should probably redo the entire experiment with a larger sample size and see what happens when we have better data.

Slide 73-76.

Let's go over a few things to remember.
First, when the sample sizes are equal, the t calculated value will be the same for both the homoscedastic and heteroscedastic procedure.
►The homoscedastic t-test usually has slightly more power, for 2 reasons. First, its T calculated value is often larger than the heteroscedastic one: it is larger whenever the larger sample has the smaller variance, as in our reef and crack examples, though it is smaller in the reverse situation. A larger T value is more likely to exceed the critical value for an alpha of 0.05. Second, the degrees of freedom will be equal or larger, which means the critical values will be equal or smaller, also making it more likely to exceed the critical value for an alpha of 0.05.
►Even though the homoscedastic test seems better because it has more power, it carries an extra risk: our decision that the population variances are equal may itself be a type II error.
►For this reason, my advice is that we should generally stick to the heteroscedastic t-test in the real world. In practice the two tests rarely differ in their conclusions unless P is close to 0.05, when we should be extra cautious anyway.
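Whether the homoscedastic T value comes out larger depends on how the sample sizes and variances line up. It is larger when the bigger sample has the smaller variance, smaller in the reverse case, and identical when the sample sizes are equal. The sketch below (hypothetical function name) checks this using the crack example, a hypothetical swap of its variances, and the recovery-time example.

```python
import math


def both_t_values(mean1, var1, n1, mean2, var2, n2):
    """Return (homoscedastic T, heteroscedastic/Welch T) for the same summary statistics."""
    diff = mean1 - mean2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t_homo = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_welch = diff / math.sqrt(var1 / n1 + var2 / n2)
    return t_homo, t_welch


# Crack example: the larger sample (n = 8) has the smaller variance (14.0).
print(both_t_values(11, 23.6, 6, 6, 14.0, 8))
# Hypothetical swap: the larger sample now has the larger variance.
print(both_t_values(11, 14.0, 6, 6, 23.6, 8))
# Equal sample sizes (recovery-time example): the two T values coincide.
print(both_t_values(16, 5.1429, 8, 14, 6.8571, 8))
```

Algebraically, the pooled squared standard error minus the Welch one is proportional to (n1 − n2)(s1² − s2²), which gives exactly these three cases.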

Zoom out.

I hope you found these examples useful. The StatsExamples website has a high resolution PDF of this final screen and a bunch of other completely free statistics resources.

End screen.


Please help others who are having trouble with the T test find this video by liking, subscribing, or commenting.





This information is intended for the greater good; please use statistics responsibly.