TWO SAMPLE PAIRED T-TEST EXAMPLES



The text below is a transcript of the video.



Connect with StatsExamples here



LINK TO SUMMARY SLIDE FOR VIDEO:


StatsExamples-two-sample-t-test-paired-examples.pdf

TRANSCRIPT OF VIDEO:


Slide 1.

The 2 sample paired T test is used to compare the means of two populations when each value in one population has a direct and specific relationship to exactly one value in the other. Let's look at three step by step examples.

Slide 2.

But first, a quick review of the test.
We want to know if the population means differ, but we can't measure the populations so we take random samples from the populations and calculate the sample means.
The sample means are estimates of the population means, but sampling error makes them inexact so we ask ourselves, what are the chances that the population means are the same based on how much the sample means differ from one another?
If the sample means are similar, that's weak evidence that the population means are different, but if the sample means are different, then that's better evidence that the population means are different - but we also need to take into account the variation or noise in the data values.
The T test is how we do this.

Slide 3-5.

There are two basic kinds of T tests, unpaired and paired. In unpaired tests the data values don't directly correspond to one another because they come from two distinct populations.
This is like taking values from two different times or locations - or studying the effects of two different treatments in two distinct populations.
►See our other video for examples of the calculations for that kind of T test.
►In this video we're going to look at paired T tests, in which the values are paired.
These are studies where we're looking at essentially the same population of subjects, but technically two different populations of data, by looking at data sets such as before and after values for the same individuals, or twins where one of each pair is in each treatment.
The key thing is that each value has a single unique paired value in the other population. It's not that we've paired them up ourselves; there's something inherent about the data that makes only one arrangement of pairing possible.
Given that we have this, let's look at how this test works.

Slide 6.

The way the paired T test works is like this. We look at the samples collected from each population, shown in red and blue, and then calculate a new set of values based on the differences between each pair of values.
The difference between the sample means will be the mean of the differences, and those values can be used to create a confidence interval around that value, as shown in purple on the right.
In a situation like the one shown at the top, a big difference between the means of the samples will result in a large mean for the differences and a confidence interval that doesn't include zero - indicating that the true difference between the populations is very unlikely to be zero. The populations therefore seem to have different means.
In a situation like the one shown at the bottom, a small difference between the means of the samples will result in a small mean for the differences and a confidence interval that includes zero - indicating that the true difference between the populations may well be zero. There isn't good evidence that the populations have different means.
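
One common way to write the confidence interval described above is shown here. The notation is assumed for illustration rather than taken from the slides: $\bar{d}$ is the mean of the differences, $s_d$ is their standard deviation, $n$ is the number of pairs, and $t_{\text{crit}}$ is the critical t value for the chosen confidence level with $n - 1$ degrees of freedom.

```latex
\bar{d} \;\pm\; t_{\text{crit}} \cdot \frac{s_d}{\sqrt{n}}
```

If zero falls outside this interval, the data suggest the population means differ; if zero falls inside it, they may well be equal.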

Slide 5.

This is another way of visualizing it. If we had samples of 8 data values from each population, we would compare each pair and create the new data set of differences.
In the left example we see that not only are the overall means different, the difference between every pair shows that the red data set values are consistently larger.
In the right example, even though the means are different, the relationship between the values in each pair is less consistent with a smaller mean and higher variation in magnitude and sign.
The paired T test ignores the values in the original data sets and focuses on these values of the differences to create their distribution and then looks at the mean and confidence interval of these values.

Slide 6.

In practice what we’ll do is list the values in each data set as shown and compute the difference for each pair. We calculate the means of all three sets of values and the standard deviation of the differences.
The means of the samples won't be used in our test, but will allow us to say which is larger or smaller if we get a significant difference.
►The mean and standard deviation of the differences go into the T test equation that we'll use to conduct a one-sample t-test with a hypothetical mean of zero. In other words, is the mean of the set of differences significantly different from zero?
The T calculated value is the mean of the differences divided by the standard error of the differences, which is the standard deviation of the differences divided by the square root of the sample size. The degrees of freedom for the test is just the number of differences minus 1.
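
Written as an equation, the calculation just described looks roughly like this. The symbols are chosen here for illustration and may not match the slide: $\bar{d}$ is the mean of the differences, $s_d$ is their standard deviation, and $n$ is the number of differences. The zero is the hypothetical mean from the null hypothesis.

```latex
t_{\text{calc}} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
```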

Slide 7-8.

OK, so here it all is at once.
►First, we create a null hypothesis and alternative hypothesis. The null hypothesis is that the mean of the differences is zero and the alternative is that the mean is not equal to zero.
►Then, we get a t-calculated value using the equation from before.
►Then, we compare that t-calculated value to various t-critical values, which correspond to different confidence intervals and probabilities, based on the t-distribution for the number of degrees of freedom for our sample.
►Then, we determine the probability, the p value, of seeing a t-calculated value as extreme as we do. What's the smallest alpha value we could pick, giving rise to the widest confidence interval, so that the hypothetical population mean would be outside that interval?
This p-value is essentially the probability of seeing sample data at least as extreme as ours if the null hypothesis is true.
►Finally, we decide to "fail to reject our null hypothesis" or "reject our null hypothesis" based on the p-value.
If the p-value is not small, then we would fail to reject the null hypothesis and generally conclude that we lack evidence to decide that the population means are different.
If the p-value is small, then we would decide to reject the null hypothesis and thereby conclude that we have good evidence to decide that the population means are different.
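
The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the method used in the video: the function name `paired_t_test` and its arguments are made up here, and the critical value is assumed to be looked up in a t-table by the reader rather than computed.

```python
import math
import statistics


def paired_t_test(first, second, t_critical):
    """Return (t_calc, df, reject_null) for two paired samples.

    t_critical is the two-tailed critical value from a t-table for
    df = n - 1 at the chosen alpha; it is supplied, not computed here.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")

    # Step 1: reduce the two samples to one set of differences.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)

    # Step 2: one-sample t statistic against a hypothetical mean of zero.
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_calc = mean_d / (sd_d / math.sqrt(n))
    df = n - 1

    # Step 3: reject the null hypothesis if t_calc is more extreme
    # than the critical value.
    reject_null = abs(t_calc) > t_critical
    return t_calc, df, reject_null
```

Passing in the critical value keeps the sketch free of any statistics library; a package that can evaluate the t-distribution would return the p-value directly instead.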
OK, now let's look at some concrete examples.

Slide 9-20.

For our first example let's think about the number of rays in the pectoral fins on the left and right sides of some fish and whether they're exactly the same or there's some kind of bias to one side, an asymmetry.
Our data will be the number of left and right fin rays for a set of fish and the questions we want to ask are - are the means of the two sides equal and with what confidence do we know?
►When doing a paired T test the first step is to create the set of differences. Keeping track of whether they're positive or negative does matter.
This would be the difference of the first pair 36 minus 42 equals negative 6.
41 minus 38 equals positive 3.
44 minus 43 equals positive 1.
40 minus 45 equals negative 5.
39 minus 42 equals negative 3.
37 minus 45 equals negative 8.
30 minus 36 equals negative 6.
41 minus 43 equals negative 2.
34 minus 35 equals negative 1.
These are the values we will actually use for the t-test.
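
As a quick check, the same subtraction can be done in Python. The two lists hold the left and right counts as read out above; the variable names are just illustrative.

```python
# Left and right fin-ray counts from the transcript, one pair per fish.
left = [36, 41, 44, 40, 39, 37, 30, 41, 34]
right = [42, 38, 43, 45, 42, 45, 36, 43, 35]

# Always subtract in the same order (left minus right) and keep the sign.
differences = [l - r for l, r in zip(left, right)]

print(differences)  # [-6, 3, 1, -5, -3, -8, -6, -2, -1]
```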

Slide 21-23.

Our second step is to calculate the means of the samples and the mean and standard deviation for the set of differences we just created.
►If you need a reminder on how to calculate these check out the StatsExamples video on calculating summary statistics.
For this data we get a mean of 38 for the left side, 41 for the right side, and the set of differences has a mean of negative 3 with a standard deviation of 3.6056 and this is based on a sample size of 9 differences.
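
These summary values can be reproduced with Python's standard `statistics` module. This is only a sketch with illustrative variable names; note that `statistics.stdev` is the sample standard deviation, which divides by n minus 1.

```python
import statistics

left = [36, 41, 44, 40, 39, 37, 30, 41, 34]
right = [42, 38, 43, 45, 42, 45, 36, 43, 35]
differences = [l - r for l, r in zip(left, right)]

mean_left = statistics.mean(left)         # 38
mean_right = statistics.mean(right)       # 41
mean_diff = statistics.mean(differences)  # -3
sd_diff = statistics.stdev(differences)   # square root of 13

print(round(sd_diff, 4))  # 3.6056
print(len(differences))   # 9
```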

Slide 24.

Now that we have these values for the set of differences, here is how we use them.
►The calculated T value will be the mean of the differences divided by the standard error, which is itself the standard deviation divided by the square root of the sample size. So that's negative 3 divided by the result of 3.6056 divided by the square root of 9, which is a standard error of about 1.2019. This gives us a T calculated value of negative 2.4962.
►The degrees of freedom for our test will be the sample size minus one, nine minus one is eight.
►It's always good to make sure that we understand exactly what allows us to use the paired T test. In this case it's the fact that each individual fish creates a distinct left and right pair that pair with one another, but not with any other values.
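
Here is a short Python sketch of that calculation, starting from the raw fin-ray counts. It uses the unrounded standard deviation, which is why it reproduces the negative 2.4962; plugging in the rounded 3.6056 by hand gives a value that differs slightly in the fourth decimal place. Variable names are illustrative.

```python
import math
import statistics

left = [36, 41, 44, 40, 39, 37, 30, 41, 34]
right = [42, 38, 43, 45, 42, 45, 36, 43, 35]
differences = [l - r for l, r in zip(left, right)]

n = len(differences)
# Standard error = standard deviation of the differences / sqrt(n).
standard_error = statistics.stdev(differences) / math.sqrt(n)
t_calc = statistics.mean(differences) / standard_error
df = n - 1

print(round(t_calc, 4), df)  # -2.4962 8
```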

Slide 25-28.

Now we take our values and compare them to values from the T distribution. The first question will be how the magnitude, the absolute value, of the calculated T value compares to the critical value that corresponds to an overall alpha value of 5%.
►We go to our T-table and look at the row for 8 degrees of freedom and the column for an alpha value of 0.025 since this is a two-tailed test.
►The value in the table is 2.306 which means that the middle 95% of this T distribution runs from negative 2.306 to positive 2.306. Now our question is how the negative 2.4962 compares to this region.
► The negative 2.4962 is smaller than the negative 2.306 which means it is outside of the center region. This means that our test will reject the null hypothesis of equal means for the two sides.
► We can therefore say that "The mean numbers of fin rays of the left and right sides are significantly different."
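
The comparison against the middle 95% of the distribution can be written as a one-line check. This sketch simply hard-codes the two numbers quoted above; the variable names are illustrative.

```python
t_calc = -2.4962    # calculated T value for the fin-ray data
t_critical = 2.306  # t-table value for 8 df, two-tailed alpha of 0.05

# The null hypothesis survives only if t_calc lies inside the middle 95%.
inside_middle_95 = -t_critical <= t_calc <= t_critical
reject_null = not inside_middle_95

print(reject_null)  # True
```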

Slide 29.

So far we have this statement about how the means of the sides are different, but we can do better than this.
►In particular, this is missing 2 vital pieces of information.
►First, we should describe which side has the larger mean and which has the smaller mean. We have this information, why withhold it?
►Second, what degree of confidence do we have? This comes down to the exact P value that our test gives us. We'll get to calculating that in just a second.
►We know that the P value is less than 0.05, but if it's only slightly less, then the data just barely convinces us. It's a borderline result that technically allows us to make a statement about the difference being significant, but it's not the kind of data that would make us super confident about the result.
►On the other hand, if the P value is very small and much less than 0.05, then the data is very convincing. Extremely small P values are the ones that provide us with the evidence to make important decisions.
Now let's look at how we answer these two questions.

Slide 30-31.

For determining which side has the significantly larger mean we have two methods.
The first is using the sign of the T calculated value.
If tcalc is positive, then the mean of the first data set is larger, but if tcalc is negative, then the mean of the second data set is larger.
I personally don't recommend this method since it is easy to forget rules like this.
Instead, I recommend that we go back and look at the two sample means to answer this. When we look at the means, one will be larger and the other smaller, so the question is answered without having to worry about negative numbers and which data set we put first.
►In this case the means were 38 for the left and 41 for the right so we can modify our conclusion to state that, "The mean number of fin rays on the left sides of these fish is significantly smaller than the mean on the right."

Slide 32-36.

To figure out the degree of significance for our test we compare our calculated T value to the whole set of critical values in our T table. Remember that these values correspond to areas more extreme than a central region encompassing most of the T distribution for 8 degrees of freedom as shown.
►The negative 2.4962 lies between two of the critical values in the table - the negative 2.449 and the negative 2.896.
►From our calculated T value being more extreme than the negative 2.449 we know that the P value is less than 0.04.
►However, our calculated T value is not more extreme than the negative 2.896 so we know that the P value is not less than 0.02, it must be larger.
►We can therefore state that, "The mean number of fin rays on the left sides of these fish is significantly smaller than the mean on the right (0.02 < P < 0.04)."
►If we have access to a computer, it can provide the exact P value of 0.0372 which we can see is less than 0.04, but more than 0.02.
This is moderately convincing evidence that would cause us to believe that there is some sort of asymmetry in the number of fin rays in these fish.
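
Reading a P value range off one row of a T table can be sketched like this. The helper `bracket_p` and the dictionary `row_8_df` are hypothetical names made up for this illustration; the three critical values are the ones quoted above for 8 degrees of freedom.

```python
def bracket_p(t_calc, table_row):
    """Return (lower, upper) bounds on the two-tailed P value.

    table_row maps a two-tailed alpha to its critical t value for one
    degrees-of-freedom row. A bound of None means the row does not
    provide a limit on that side.
    """
    t = abs(t_calc)
    # Alphas whose critical value we exceed: P is smaller than each of these.
    exceeded = [alpha for alpha, crit in table_row.items() if t > crit]
    # Alphas whose critical value we do not exceed: P is at least this large.
    not_exceeded = [alpha for alpha, crit in table_row.items() if t <= crit]
    upper = min(exceeded) if exceeded else None
    lower = max(not_exceeded) if not_exceeded else None
    return lower, upper


# Critical values for 8 degrees of freedom, as read out in the transcript.
row_8_df = {0.05: 2.306, 0.04: 2.449, 0.02: 2.896}

print(bracket_p(-2.4962, row_8_df))  # (0.02, 0.04)
```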

Slide 37.

Now consider a scenario where we are interested in whether two flower species differ in their abundances in a large area. What we do is create transects, randomly placed subregions which we sample, and count the number of plants of each species in them.
We then ask, are the means of the populations equal and with what confidence do we know?
We need to calculate the means of the samples, create the set of differences, and then calculate the mean and standard deviation for the set of differences.

Slide 38.

The differences for our 8 transects are shown to the left. You can see how each difference value depends on two of the original values. The means for the species are 22.125 and 19.250 for species A and B respectively. For the set of differences the mean is 2.875 with a standard deviation of 4.0510 based on 8 differences.

Slide 39-42.

This is our data.
►Now we calculate the T value with this equation, plugging in the mean, standard deviation, and sample size to get a T calculated value of 2.0073.
►The degrees of freedom is the 8 differences minus 1 to give us 7.
►The pairing that allows us to use the paired T test is the fact that each transect creates a distinct pair of values, one for each species, specific to that transect only.
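
Since the transect counts themselves are only shown on the slide, this sketch starts from the summary values quoted above rather than from raw data. Variable names are illustrative.

```python
import math

mean_diff = 2.875  # mean of the 8 differences
sd_diff = 4.0510   # standard deviation of the differences
n = 8              # number of transects, so number of differences

t_calc = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(round(t_calc, 4), df)  # 2.0073 7
```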

Slide 43-46.

Now we compare our T calculated value to the critical value for 7 degrees of freedom which is 2.365. Our calculated value of 2.0073 is not as extreme as this critical value.
The T calculated value ends up within the center 95% of the T distribution, indicating that we won't be able to reject the null hypothesis of equal means because P is larger than 0.05.
►We can get a better sense of what the P value is by looking at the rest of the T values on the row and we see that the T calculated value of 2.0073 lies between the values of 1.895 and 2.365 which correspond to overall P values of 0.1 and 0.05.
►We can therefore state that, "The mean numbers of each flower species are not significantly different (0.05 < P < 0.1)."
►With a computer we can obtain the precise P value which is 0.0847.
The data in this example would not allow us to decide that the two flower species differ in their abundance.

Slide 47.

Now consider a scenario where we are interested in whether an experimental drug alters the cholesterol level of people who take it. We will collect a set of study subjects and measure their cholesterol levels before and after taking the drug.
We then ask, are the means of the cholesterol levels equal and with what confidence do we know?
We need to calculate the means of the samples, create the set of differences, and then calculate the mean and standard deviation for the set of differences.

Slide 48.

The differences for our 6 individuals are shown to the left. The mean cholesterol level before taking the drug was 264 and the mean value afterwards is 260.833 so maybe the drug lowers the values, but we don't know for sure until we see if this difference is significant.
For the set of differences the mean is 3.167 with a standard deviation of 1.4720 based on 6 differences.

Slide 49.

Now we calculate the T value with our equation, plugging in the mean, standard deviation, and sample size values to get a T calculated value of 5.2697.
Our degrees of freedom is the 6 differences minus 1, which equals 5.
The pairing that allows us to use the paired T test is the fact that each pair of cholesterol values is specific to a unique individual before and after taking the drug.
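
Written out with the rounded summary values quoted above, the arithmetic looks like this. Because the mean and standard deviation are rounded, it matches the reported 5.2697 only to about two decimal places. The symbols $\bar{d}$, $s_d$, and $n$ are notation chosen here for the mean of the differences, their standard deviation, and the number of differences.

```latex
t_{\text{calc}} = \frac{\bar{d}}{s_d / \sqrt{n}}
               = \frac{3.167}{1.4720 / \sqrt{6}}
               \approx \frac{3.167}{0.6009}
               \approx 5.27,
\qquad \text{df} = 6 - 1 = 5
```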

Slide 50-54.

Now we compare our T calculated value to the T value for 5 degrees of freedom. First we see that it is larger than 2.571 so we know the difference is significant.
Continuing on through the rest of the row in the table we see that our calculated value is between 4.773 and 5.893 which correspond to overall alpha values of 0.005 and 0.002.
►We can therefore state that, "The mean cholesterol value after treatment is significantly lower than the mean value before treatment (0.002 < P < 0.005)."
►With a computer we can obtain the precise P value, which for a T value of 5.2697 with 5 degrees of freedom is about 0.0033.
There are a couple of other very important points that this example demonstrates.
►First, is a reduction of 3.167 a meaningful change? Our data shows that it is a significant, non-random, difference, but that doesn't automatically mean that the difference is relevant or important.
Maybe such a small change would not be worth the expense or possible side-effects from the drug. Maybe a difference of only 3 points on the cholesterol scale is clinically meaningless.
Statistics can tell us whether things are significant, but knowledge of the field tells us whether they are meaningful or relevant.
►Second, if we had done an unpaired T test instead, we would have obtained a nonsignificant P value of 0.06 and concluded that there was no difference in the means. I've mentioned that the paired T test is more powerful than the unpaired test, but why?
►This figure shows the before and after values for our 6 subjects. The fact that there is a clear difference is visually obvious and revealed by a test that compares each pair.
However, if we just combined all the data into two unpaired sets then the large differences between subjects, which affect both of each subject's values in the same way, would have obscured the effects of the drug.
The paired T test gets its power from filtering out the patterns of variation that exist for both of the values in the pairs so that the test can focus on the difference between the values in each pair caused by the treatment.
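
To see this effect in numbers, here is a small sketch using made-up cholesterol readings. These values are hypothetical and are NOT the data from the video; they were chosen so that subjects differ a lot from one another while each subject drops by only a few points. The unpaired statistic shown is the difference in means divided by the square root of the summed variances over n, which for equal sample sizes is the same number whether or not the variances are pooled.

```python
import math
import statistics

# Hypothetical readings for 6 subjects (not the values from the video).
before = [230, 245, 260, 270, 285, 300]
after = [227, 243, 256, 268, 280, 298]
n = len(before)

# Paired: only the within-subject differences enter the test.
diffs = [b - a for b, a in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired: the standard error now includes the subject-to-subject spread.
se_unpaired = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)
t_unpaired = (statistics.mean(before) - statistics.mean(after)) / se_unpaired

print(round(t_paired, 2), round(t_unpaired, 2))  # 5.81 0.2
```

The mean difference is the same 3 points in both calculations; only the denominator changes, and that is where the paired design gains its power.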

Slide 55.

There are a few things to keep in mind when doing paired T tests.
Pairing must be based on inherent criteria; we can't just choose the pairs. We looked at examples where the pairing was forced with transects, left vs right, and before vs after, but there are also examples of studies using twins, clones, etc.
Even if a paired design goes bad, the unpaired test could be used. It's not as good, but it's a backup if something goes wrong during the experiment.
Lastly, if possible, use the paired test since it is much more powerful.
In fact, scientists go out of their way and spend lots of money to design paired experiments.
►For example, every year since 1976 there has been a twins festival in Twinsburg, Ohio where twins from around the world gather to meet with others who understand their unusual lives. It started off as just a fun event, but now this festival also draws tons of scientists who recruit these individuals for their studies. The availability of genetic pairs of individuals is a powerful tool for scientific studies.
A copy of the link to a story about this festival is in the video description below.

Zoom out.

I hope you found these examples of the two-sample paired T test useful. A copy of this screen is available on the StatsExamples website.

End screen.

Click to like, subscribe, or share - you can even do two of these things to make a pair.





This information is intended for the greater good; please use statistics responsibly.