TWO SAMPLE T-TEST INTRODUCTION
Watch this space: a detailed text explanation is coming soon. In the meantime, enjoy our video. The text below is a transcript of the video.
Connect with StatsExamples here
LINK TO SUMMARY SLIDE FOR VIDEO:
TRANSCRIPT OF VIDEO:
The 2 sample T test is the most common way to see if the means of two different populations are different from each other. Let's take a look at the concepts behind it and how it works.
The two sample T test is used for comparing population means.
We want to know if the population means differ, but we can't measure the entire populations because that's impractical.
We therefore take random samples from the populations and calculate the sample means.
The sample means are estimates of the population means but sampling error makes them inexact. Even if the populations do have exactly the same mean, the sample means are going to differ from each other because of sampling error.
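To make this concrete, here is a short Python sketch (not from the video; the population parameters are made up) showing that two samples drawn from the very same population still produce different sample means:

```python
import random
import statistics

# Sketch with made-up parameters: draw two samples from the SAME normal
# population, so the true population means are identical.
random.seed(1)  # fixed seed so the example is reproducible
pop_mean, pop_sd, n = 50, 10, 20

sample1 = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
sample2 = [random.gauss(pop_mean, pop_sd) for _ in range(n)]

mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)

# Sampling error alone makes the two sample means differ.
print(mean1, mean2, mean1 - mean2)
```

The difference printed here is due purely to sampling error, which is exactly the noise the T test has to account for.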
The way the T test works is by asking the question: what are the chances that the population means are the same, in other words that our null hypothesis of equal means is true, based on how much the sample means differ from one another?
Obviously, if the sample means are almost exactly the same that would be weak evidence that the population means are different and we would probably be fine with thinking they're the same.
Likewise, if the sample means are extremely different, and there isn't a lot of noise in our data, that would be better evidence that the population means are different.
But exactly how do we figure this out?
One way to figure out if the population means are probably different would be by using the confidence intervals for comparing the means.
If you're unfamiliar with confidence intervals, watch our confidence interval video for more about calculating confidence intervals and what they represent. A link to this video is in the video description.
We could take our two samples and calculate the 95% confidence interval for each of them and look to see whether they overlap.
We know that the true population means are probably in the middle part of these confidence intervals, so if they overlap it could easily be the case that both populations have the same mean in the overlapping region.
When this happens this would represent a lack of evidence that the population means are different from each other.
On the other hand, we may get a situation like this where the confidence intervals don't overlap.
We know that the true population means are probably in the middle part of these confidence intervals, but if the intervals don't overlap it would be very unlikely that both population means are the same, because any shared value would have to lie outside one or both of the confidence intervals for the individual populations.
When this happens this would represent good evidence that the population means are different from each other.
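The overlap check described above can be sketched in code. The data below are hypothetical samples of size 10, and 2.262 is the two tailed 5% critical value for 9 degrees of freedom taken from a t table:

```python
import math
import statistics

def ci95(sample, t_crit):
    """95% confidence interval for the mean: mean +/- t_crit * SE.
    t_crit must be the two tailed 5% critical value for n - 1 degrees
    of freedom (looked up in a t table; not computed here)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    m = statistics.mean(sample)
    return (m - t_crit * se, m + t_crit * se)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical data; 2.262 is the two tailed critical value for df = 9.
x = [12.1, 11.4, 13.0, 12.7, 11.9, 12.4, 13.2, 11.8, 12.6, 12.0]
y = [14.8, 15.3, 14.1, 15.0, 14.6, 15.5, 14.3, 14.9, 15.1, 14.4]
ci_x, ci_y = ci95(x, 2.262), ci95(y, 2.262)
print(intervals_overlap(ci_x, ci_y))  # non-overlap suggests different means
```

With these particular numbers the intervals do not overlap, which is the "good evidence of different means" situation described above.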
The procedure outlined is very similar to the one used in the one sample T test, where we compared some hypothetical value to the confidence interval we obtained from one sample. If you're unfamiliar with that test, watch our one sample T test video for more about testing hypotheses about population means. A link to this video is in the video description.
So basically the T test is a comparison of confidence intervals. We could calculate both confidence intervals and see if they overlap.
When they overlap the population means may well be the same, our null hypothesis, but when they don't overlap the population means are probably different, our alternative hypothesis.
Instead of actually doing this, however, we usually calculate a single T or Z value for the difference and compare that to 0, which is what the difference is predicted to be when the null hypothesis is true.
We can do this because it turns out the distribution of differences between two normal distributions is itself normal, and we can calculate a confidence interval for these differences based on our sample data.
Let's see why that is and how this works.
The T test uses the additive property of variances.
There's a mathematical theorem that states that for two uncorrelated or independent data sets A and B, the variance of the combined data set is the sum of the separate variances, as shown.
The implication of this is that if the means of two populations are equal, then the expected difference between the two sample means is 0 and the combined variance will be the sum of the original two variances. From that combined variance we can get a standard error and a confidence interval for that set of differences.
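The additive property can be checked numerically. In this Python sketch (simulated data with made-up parameters), the variance of the set of differences comes out close to the sum of the two separate variances:

```python
import random
import statistics

# Sketch of the additive property: for independent A and B,
# Var(A - B) = Var(A) + Var(B).  Simulated with large samples,
# so the two sides agree only approximately.
random.seed(7)
N = 100_000
a = [random.gauss(10, 3) for _ in range(N)]   # Var(A) = 9
b = [random.gauss(4, 2) for _ in range(N)]    # Var(B) = 4
diffs = [ai - bi for ai, bi in zip(a, b)]

var_sum = statistics.pvariance(a) + statistics.pvariance(b)
var_diff = statistics.pvariance(diffs)
print(var_sum, var_diff)  # both close to 9 + 4 = 13
```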
If the confidence interval of the set of differences includes zero that's what we expect to see if the original two population means are the same.
If the confidence interval of the set of differences does not include zero, that's not what we would expect to see if the two original population means are the same.
The application of this is that a T test of the differences, where the null hypothesis is that the differences are equal to 0, can be used to see if the means of two populations differ.
This figure illustrates how the T test uses the additive property of variances.
If our population means are the same, then it's very unlikely that the difference between those two sample means will be outside of the 95% confidence interval for a difference of zero between the population means. That confidence interval will be centered around 0 and with a width related to the 95% confidence intervals we would get for estimates of the separate population means.
On the other hand, if our population means are different, then it's more likely that the difference between those two sample means will be outside of the 95% confidence interval for a difference of zero between the population means.
Instead of looking directly at the overlap between the original two confidence intervals, we look to see whether the difference between the sample means lies within the middle region of this distribution.
There is a direct analogy between the one sample and two sample T tests.
In a one sample T test we looked to see whether a hypothesized population mean is within the 95% confidence interval constructed around the sample mean. If it's outside of the interval, then the population mean probably isn't the hypothesized value. If it's within the interval, then the population mean may well be the hypothesized value.
When we do a one sample T test, instead of actually making these confidence intervals and looking at a diagram as shown on the left, we calculate a T value that represents how far apart our sample mean is from the hypothesized population mean, relative to the width of the confidence interval in terms of standard errors.
Likewise, in a two-sample T test we are looking to see whether the confidence interval for the difference between our two sample means includes a hypothesized difference of zero. If zero is outside of the interval, then the population means probably aren't the same. If zero is within the interval, then the population means may well be equal.
Instead of actually making this confidence interval and looking at a diagram as shown on the right, we calculate a T value that represents how far apart our sample means are from one another, relative to the width of the confidence interval in terms of standard errors.
The widths of confidence intervals for single samples were just a particular value times the standard error for that particular sample. Since we're combining two different samples, we need some sort of equation to represent the combined standard error that we're comparing to the difference between the sample means.
How we get that combined standard error can depend on what type of data we're working with, and there are four different types of tests to think about.
OK, so what is the combined standard error that we're going to use in our T calculated equation? Recall that the standard error when we're calculating confidence intervals for one sample is the standard deviation of the sample divided by the square root of the sample size. It's multiples of this that created our confidence intervals.
The first scenario, shown on the left, is when our population variances are known. In this situation the standard error is the square root of the sum of the variance of population 1 divided by the sample size of the sample from population 1 and the variance of population 2 divided by the sample size of the sample from population 2.
This is actually a very unrealistic situation, because how would we know the variance of the populations without knowing their means? But if this is the case, then we can use a normal distribution instead of the T distribution to create the confidence interval.
The second scenario, shown in the center, is when our population variances are unknown but we can assume them to be equal. In this situation we have two equations. The standard error is the square root of the pooled variance multiplied by, in parentheses, one over the sample size of sample 1 plus one over the sample size of sample 2. The pooled variance is the weighted average of the sample variances, where the weighting is done by the degrees of freedom for each sample as shown.
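This pooled standard error can be written out directly as a small sketch (the sample variances and sizes below are hypothetical):

```python
import math

def pooled_se(var1, n1, var2, n2):
    """Standard error for the equal-variance (homoscedastic) two sample
    T test: sqrt(pooled variance * (1/n1 + 1/n2)).  The pooled variance
    is the degrees-of-freedom-weighted average of the sample variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical sample variances 4.0 and 6.0 with sample sizes 10 and 12.
print(pooled_se(4.0, 10, 6.0, 12))
```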
This is a more realistic situation because, while we almost never know the variances of the populations, it may well be the case that the population variances are similar and we can assume that they are equal.
The third scenario, shown on the right, is when the population variances are unknown and we do not assume they are equal. In this case the standard error is the square root of the sum of the variance of sample 1 divided by the sample size of the sample from population 1 and the variance of sample 2 divided by the sample size of the sample from population 2.
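This unequal-variance standard error can likewise be sketched directly (hypothetical sample variances and sizes):

```python
import math

def welch_se(var1, n1, var2, n2):
    """Standard error for the unequal-variance (heteroscedastic) two
    sample T test: sqrt(var1/n1 + var2/n2)."""
    return math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical sample variances 4.0 and 6.0 with sample sizes 10 and 12.
print(welch_se(4.0, 10, 6.0, 12))
```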
This is the situation in which we make the fewest assumptions about our populations so, of these three, this is the most likely scenario.
There is also a paired T test that is more powerful than the previous three.
If every value in each data set has a specific partner in the other, then the set of individual differences should have a mean of 0 if the population means are equal.
We can then think of this set of differences as a set of values with a hypothetical population mean of 0 which can be tested with a one sample T test.
As shown here, we would take each value in data set 1, pair it up with its partner from data set 2, and calculate a difference; this set of differences would then be used to create a T calculated value.
Again, a link to the video for a one sample T test is in the description below.
This type of test requires a very specific experimental design with paired values that match up. For example, twin studies where one twin from each pair is in each group, or a set of individuals where you compare before and after values. You can't create a paired test just by arbitrarily deciding that certain values pair up with others; there has to be something in the experimental design that justifies the pairing.
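A sketch of the paired calculation, using hypothetical before-and-after measurements on the same eight individuals:

```python
import math
import statistics

def paired_t(before, after):
    """Paired T statistic: a one sample T test on the per-pair
    differences against a hypothesized mean difference of 0."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1  # T value, degrees of freedom

# Hypothetical before/after values for the same 8 individuals.
before = [140, 152, 138, 145, 160, 149, 155, 142]
after  = [135, 150, 133, 140, 154, 146, 149, 137]
t, df = paired_t(before, after)
print(t, df)
```

Because each difference uses its own matched pair, individual-to-individual variation cancels out, which is where the extra power of the paired test comes from.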
Now let's think about the overall equations and degrees of freedom for the two sample T tests.
For the scenario where we know the population variances we will actually calculate a Z calculated value because we will be comparing it to the normal distribution. This type of test is called an unpaired 2 sample Z test.
In this case the degrees of freedom will be the sample size of sample one plus the sample size of sample 2 - 2. That's the same thing as the first sample size minus one plus the second sample size minus one.
This type of test should be extremely rare because it's highly unlikely we would know the population variances without knowing the population means.
For the scenario where we don't know the population variances, but we are assuming they are equal, we calculate the T calculated value using the equation shown. We call this an unpaired 2 sample homoscedastic T test. It is also called the 2 sample homoscedastic Student's T test; the name Student was a pseudonym of William Sealy Gosset, who invented the technique while working for the Guinness beer brewery.
In this case the degrees of freedom will be the sample size of sample one plus the sample size of sample 2 - 2, same as for the Z-test.
For our paired 2 sample T test, the T calculated equation is just based on the one sample T test, and the degrees of freedom will be the number of difference values minus one just as it is for the one sample T test.
For the scenario where we don't know the population variances and we don't want to assume they are equal, and thereby run the risk of extra type 1 error if they're not, we calculate the T calculated value using the equation shown. We call this an unpaired 2 sample heteroscedastic T test. It is also called the 2 sample heteroscedastic Student's T test.
For this test the degrees of freedom is much more complicated and comes from the equation shown.
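The degrees of freedom equation for this case is the Welch-Satterthwaite approximation, which can be sketched as follows (the variances and sample sizes are hypothetical):

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom for the heteroscedastic
    two sample T test (often rounded down before using a t table)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical variances 4.0 and 6.0 with sample sizes 10 and 12; the
# result always falls between min(n1, n2) - 1 and n1 + n2 - 2.
print(welch_df(4.0, 10, 6.0, 12))
```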
These are the two most important T tests.
The paired T test is the most powerful T test and definitely the one you should use if your experimental design allows it. The heteroscedastic T test is not quite as powerful as the homoscedastic T test, but avoids the added risk of type 1 error that comes from assuming the variances are equal, which the homoscedastic T test has.
OK, let's look at the formal procedure for conducting a 2 tailed 2 sample T test.
First we need to create a null hypothesis and alternative hypothesis.
The null hypothesis will be that the means are equal, that is, the mean of population 1 is equal to the mean of population 2.
The alternative hypothesis will be that the means are not equal, the mean of population one is not equal to the mean of population 2.
Then we need to calculate our T calculated value which for the unpaired tests is the difference between the sample means divided by the standard error.
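That calculation can be sketched as a small Python function taking sample summaries (the numbers in the example call are hypothetical):

```python
import math

def t_calculated(mean1, var1, n1, mean2, var2, n2, equal_var=True):
    """Unpaired two sample T statistic: (mean1 - mean2) / SE, with the
    standard error chosen by the equal-variance assumption."""
    if equal_var:  # homoscedastic: pooled variance
        pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:          # heteroscedastic
        se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se

# Hypothetical summaries: means 18 and 20, variances 4 and 6, n = 10 and 12.
print(t_calculated(18, 4.0, 10, 20, 6.0, 12))
```

The sign of the result just reflects which sample mean is larger; for a two tailed test it is the absolute value that gets compared to the critical values.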
Once we have our T calculated value, we compare it to various T critical values, which correspond to the widths of the confidence intervals around 0, the difference between the means under the null hypothesis.
Again, if you haven't looked at this before, you can watch our one sample T test video for more about comparing T calculated values to T critical values.
Using those values, we determine the smallest alpha value that corresponds to the region containing our T calculated value. This allows us to determine the probability, the P value, of seeing a T calculated value as extreme as the one we do.
Finally we decide to reject our null hypothesis or fail to reject our null hypothesis based on the P value we obtained.
The null hypothesis of equal means is consistent with P values that are not small.
The alternative hypothesis of unequal means is what would give us small P values.
The usual threshold to make this decision is a p value of 0.05.
Let's look at this procedure again in a more practical manner.
We start the same way by creating our null hypothesis and alternative hypothesis: the means of the populations are equal, or they are not equal.
Then we calculate our T calculated value using one of our equations and compare that T calculated value to T critical values from a statistical table.
From that we can determine the P value.
For example, if our T calculated value is 2.2 and our degrees of freedom is 18, we could look in our table at the row corresponding to 18 degrees of freedom, where we could obtain a range of critical values.
The critical value for an alpha of 0.025 is 2.101, which is smaller than 2.2, but 2.2 is smaller than the critical value of 2.214, which is the one corresponding to an alpha value of 0.02. Because we're doing a two tailed test, this means that the P value is less than 0.05 but larger than 0.04.
If we were using a computer, it could calculate an exact P value of 0.041 for a T value of 2.2 with 18 degrees of freedom. This is less than 0.05 but larger than 0.04.
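The table lookup in this example can be reproduced as a short sketch; the one tail critical values for 18 degrees of freedom (1.734, 2.101, 2.214) come from a standard t table:

```python
def bracket_two_tailed_p(t_calc, one_tail_crits):
    """Bracket the two tailed P value for t_calc between table entries.
    one_tail_crits maps a one tail alpha to its critical t value for
    the given degrees of freedom."""
    p_low, p_high = 0.0, 1.0
    for alpha, crit in one_tail_crits.items():
        if abs(t_calc) > crit:          # beyond this cutoff: p < 2*alpha
            p_high = min(p_high, 2 * alpha)
        else:                           # inside this cutoff: p > 2*alpha
            p_low = max(p_low, 2 * alpha)
    return p_low, p_high

# One tail critical values for 18 degrees of freedom, from a t table.
crits_df18 = {0.05: 1.734, 0.025: 2.101, 0.02: 2.214}
print(bracket_two_tailed_p(2.2, crits_df18))   # (0.04, 0.05)
print(bracket_two_tailed_p(2.0, crits_df18))   # (0.05, 0.1)
```

The second call matches the alternative scenario discussed in this example, where the T calculated value is 2.0.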
In this situation we would use the small P value to reject the null hypothesis.
The null hypothesis of equal population means is not consistent with the P value of 0.041 which is less than 0.05.
The alternative hypothesis where the population means are not equal to each other is consistent with a small P value of 0.041 which is less than 0.05.
Alternatively, if instead we had gotten a T calculated value of 2.0 with 18 degrees of freedom, things are a little different.
Now when we compare that value to the critical values, the 2.0 is larger than 1.734, the critical value corresponding to an Alpha of 0.05, but not as large as the 2.101 corresponding to an Alpha value of 0.025. Therefore, because this is a two tailed test, our P value is less than 0.1 but larger than 0.05.
A computer calculation would give us the exact answer of 0.061 for the P value.
In this situation we can't use the moderate P value to reject the null hypothesis.
The null hypothesis of equal means is consistent with a P value of 0.061, which is larger than 0.05.
The alternative hypothesis where the population means are not equal to each other is not consistent with a small P value of 0.061 which is larger than 0.05.
The use of a P value of 0.05, that is 5%, as the threshold for deciding whether or not to reject the null hypothesis is arbitrary, but it is the standard.
The term statistically significant is a technical phrase used to indicate that a test has returned a P value less than the threshold and the null hypothesis has been rejected.
For example, if we have sample means of 18 and 20 and they are significantly different, that is, our T test gave us a P value less than 0.05, then we reject the null hypothesis that the means are equal. In fact, we can do even better than just saying they're different; we can say which mean appears to be larger than the other based on the values of our sample means.
On the other hand, if we have sample means of 18 and 20 and they are not significantly different, that is, our T test gave us a P value larger than 0.05, then we fail to reject the null hypothesis that the population means are equal. We may assume they're equal, but we haven't proven it. What we've really done is look for evidence that they weren't equal and did not find it.
The two sample T tests can also be one-tailed.
We can create a null hypothesis in which the mean of population one is less than, or equal to, the mean of population 2 and the alternative hypothesis is that the mean of population one is larger than the mean of population 2.
In this case we would only care about whether the T calculated value has a positive value that is larger than the critical value corresponding directly to an alpha value of 0.05.
Similarly, we can create a null hypothesis in which the mean of population one is greater than, or equal to, the mean of population 2 and the alternative hypothesis is that the mean of population one is smaller than the mean of population 2.
In this case we would only care about whether the T calculated value has a negative value that is less than the negative version of the critical value corresponding directly to an alpha value of 0.05.
As discussed in more detail in the video on the one sample T test, we can only do one tailed T tests when we have a reason aside from our data to do so. We cannot look at our sample means first and then decide to do a one tailed test.
Let's think of a flow chart of which T test to use in different circumstances.
First, we ask if the individual data values are paired in some way. If the answer is yes, then we use the paired T test.
Next, if the data values are not paired then we ask ... are the population variances known? If the answer is yes, then we do a Z-test. This is an unrealistic situation which is why we haven't looked at it in detail in this video.
If the population variances are not known, which is the typical situation, we then ask are the population variances equal?
If we don't know the population variances but we know they're equal, or we're willing to assume that they are, we can use the unpaired homoscedastic T test.
If we don't know the population variances and we don't know (or don't want to assume) that they're equal, then we will use the unpaired heteroscedastic T test. This test is also perfectly valid if the variances are equal.
Since we can do the heteroscedastic test either way, you may wonder why we even bother with the homoscedastic test. The homoscedastic T test is slightly more powerful than the heteroscedastic T test, that is, less likely to make a type 2 error, because it has that extra little bit of information about equal variances.
In my opinion, that slight increase in power is probably not worth the extra risk of type 1 error that we take on when we decide whether the variances are equal or not. If the variances are not equal, but we think they are and do a homoscedastic test, then we are using a T test that now has an increased risk of type 1 error. Since the increase in power is marginal, and will only come into play in very borderline cases where the P value is close to 0.05 anyway, it's probably best to be cautious and just stick with the heteroscedastic T test.
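The decision process just described can be sketched as a small helper function (the test names are the ones used in this video):

```python
def choose_test(paired, variances_known, assume_equal_variances):
    """Flow chart sketch for picking which two sample test to use."""
    if paired:
        return "paired T test"
    if variances_known:
        return "unpaired two sample Z test"       # rare in practice
    if assume_equal_variances:
        return "unpaired homoscedastic T test"
    return "unpaired heteroscedastic T test"      # fewest assumptions

print(choose_test(paired=False, variances_known=False,
                  assume_equal_variances=False))
```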
To summarize everything with a couple of figures, this is what we're doing.
The left side illustrates a case in which two populations have equal means, in which case the confidence intervals obtained from their samples would overlap.
The difference between the sample means, divided by a combination of those confidence intervals, which is essentially what our T calculated equation is doing, would give us a small T value.
The small T value would end up in the middle of that T probability distribution and result in a large P value so we would fail to reject our null hypothesis of equal population means.
The right side illustrates a case in which two populations have different means, in which case the confidence intervals obtained from their samples would not overlap.
The difference between the sample means, divided by a combination of those confidence intervals, which is essentially what our T calculated equation is doing, would give us a large T value.
The large T value would end up on the edge of that T probability distribution and result in a small P value so we would reject our null hypothesis of equal population means.
I hope you found this description of the two sample T test interesting and useful. There are links in the description to other videos on this channel that give step-by-step examples of how to do each of the tests described in this video.
Click to like, subscribe, or otherwise show your appreciation.
This information is intended for the greater good; please use statistics responsibly.