CONFIDENCE INTERVALS FOR THE POPULATION MEAN (EXAMPLES)


Watch this space

Detailed text explanation coming soon. In the meantime, enjoy our video.

The text below is a transcript of the video.



Connect with StatsExamples here



LINK TO SUMMARY SLIDE FOR VIDEO:


StatsExamples-confidence-intervals-examples.pdf

TRANSCRIPT OF VIDEO:


Slide 1

Confidence intervals are widely used in statistics, but calculating them can sometimes be a little confusing. Let's look at some examples of calculating confidence intervals and by the end of this video hopefully it will be a piece of cake.


Slide 2

First let's look at a review of what confidence intervals are.
You can watch our intro to confidence intervals video for more information about this, but here's a quick summary.
Confidence intervals are used when we want to describe a population mean. Since we generally can't measure the whole population, we have to take a random sample from the population. We then calculate a sample mean from that sample.
The sample mean is an estimate of the population mean, but sampling error makes it inexact. Because of randomness, our sample is not guaranteed to perfectly match our population.
So what do we do to convey that uncertainty?
What we do is we specify a region around the sample mean which probably includes the population mean. Basically, we can make a good guess that the population mean is somewhat similar to our sample mean within a region that we will call the confidence interval for the population mean.


Slide 3

The theoretical justification for our confidence intervals comes from something called the central limit theorem.
This theorem tells us that the distribution of sample means from a population is approximately normal, provided the sample size is large enough.
As represented in the diagram on the right, no matter what the population distribution looks like, if we take a bunch of samples and calculate the means of those samples, most of those means will be close to the population mean. Only a few will be far away from the population mean and the distribution of those sample means will be normal. The sample means have a normal distribution centered around the population mean with a variance based on the population variance divided by the sample size of our samples.
This result is useful because any given sample is probably from the middle region of this distribution of all the possible sample means. Based on the properties of normal distributions, if we know the population variance and standard deviation, we can specify the exact probability of a particular sample being within a certain distance from the population mean.
However, we don't know the population variance so we have to estimate it from the sample and this estimate might be inaccurate.
To avoid bias arising from the fact that samples tend to underestimate the population variance, we use a t distribution to describe the middle region of our sample means instead of a normal distribution.
We can then use this T distribution just like a normal distribution to calculate probabilities for where the population mean probably is, based on our sample mean.
►Technically you can make confidence intervals with a normal distribution, but you shouldn't unless you are certain about the population variance or you have an extremely large sample size. In these cases, the t distribution will end up matching the normal distribution.
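The central limit theorem behaviour described on this slide can be checked with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 25, and the 5000 repeated samples are illustrative choices that do not come from the video.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 25     # assumed size of each sample (illustrative)
NUM_SAMPLES = 5000   # assumed number of repeated samples (illustrative)

def draw_sample(n):
    """Draw n values from a strongly skewed population.

    An exponential distribution with rate 1 has mean 1 and variance 1,
    and it looks nothing like a normal distribution.
    """
    return [random.expovariate(1.0) for _ in range(n)]

# Take many samples and record the mean of each one.
sample_means = [statistics.fmean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(f"mean of the sample means:     {mean_of_means:.3f} (population mean is 1)")
print(f"variance of the sample means: {variance_of_means:.4f} "
      f"(population variance / n is {1 / SAMPLE_SIZE:.4f})")
```

The sample means should cluster around the population mean, with a variance close to the population variance divided by the sample size, even though the population itself is skewed.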


Slide 4

Since the sample mean we have is probably inside the t distribution for the sample means, we create a t distribution around our sample mean. The center of that t distribution probably corresponds to a region around the population mean.
Conceptually we are using the idea from the central limit theorem picture shown in the previous slide, but in reverse. We're using the fact that our sample is in that T distribution around the population mean to make an inference about the location of the mean of the population that would result in that T distribution.
As shown in the figure, we create the T distribution around our sample mean. Identifying the middle 95% of that T distribution would allow us to say there's a 95% probability, or that we have a 95% confidence, that the population mean is in that certain region.
95% in the middle means 2.5% on each side of that middle region.
To make our confidence interval, we just need to identify that middle region.


Slide 5

There isn't just one T distribution; there's a different T distribution for each sample size. The standard deviation of each T distribution, something called the standard error, is equal to the sample standard deviation divided by the square root of the sample size.
Regions around the sample mean which contain certain proportions of the overall area are multiples of these standard errors. These regions correspond to probabilities in the same way that we can use the normal distribution to calculate probabilities.
►The widths of these regions are calculated and tabulated in T tables like the one shown here from the StatsExamples website.
The values that make up the body of this table are the values on the X axis, the T values that represent how far to the right and left you have to go to define certain regions in the middle.
This table is organized with rows for each degrees of freedom value. Degrees of freedom is related to the sample size and for our confidence interval calculations the degrees of freedom will be equal to the sample size minus one.
Each column in this table corresponds to an alpha value which is the area to the right of the T value in the table. Since we're interested in the middle region, we will be multiplying these alpha values by two. For example, in order to get the middle 95%, we would want to look in our table for the alpha value that corresponds to 2.5% so that we have 2.5% on each end which leaves 95% in the middle.
Whichever value we identify in the table is the number of standard errors above and below the sample mean that we need to go so that the area in the middle of that T distribution is what we want it to be.
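The row and column to use in the table follow directly from the confidence level and the sample size. Here is a minimal sketch of that bookkeeping; the function name table_coordinates is a hypothetical helper for illustration, not something from the video or the StatsExamples site.

```python
def table_coordinates(confidence_level, n):
    """Return (degrees of freedom, alpha per tail) for a T table lookup.

    confidence_level is a proportion such as 0.95, and n is the sample size.
    Half of the leftover area goes in each tail, and the degrees of freedom
    value is the sample size minus one.
    """
    alpha_per_tail = (1 - confidence_level) / 2
    degrees_of_freedom = n - 1
    # Round away floating-point noise so the value matches a column header.
    return degrees_of_freedom, round(alpha_per_tail, 6)

print(table_coordinates(0.95, 16))   # (15, 0.025)
print(table_coordinates(0.99, 16))   # (15, 0.005)
print(table_coordinates(0.998, 16))  # (15, 0.001)
```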
If this is a little confusing, let's look at a couple of examples that will hopefully make everything clear.


Slide 6

For our first example let's consider a sample of 16 values with a mean of 12 and a standard deviation of 5.0. This mean and standard deviation refer to our sample, which is what we have the data for, not the population, which is what we want to know about but can't measure completely. We're looking at a sample and using that sample data to make an inference about where the population mean probably is.
For this scenario we're going to calculate 3 different confidence intervals: the 95% confidence interval for the population mean, the 99% confidence interval for the population mean, and the 99.8% confidence interval for the population mean.
Once we get these intervals, we will be able to say that we are a certain percentage confident that the true population mean lies within this interval. In other words, the probability that the population mean lies within our confidence interval is going to be either 95%, 99%, or 99.8%.
►The first step is to calculate the standard error because this is the value that we will be multiplying by the T value from our table to figure out how wide to make our confidence interval.
The equation for the standard error is the standard deviation divided by the square root of N. We don't know the population standard deviation sigma so we will estimate it with the sample standard deviation s. This gives us 5 divided by the square root of 16 which is 5 divided by 4 which is a standard error of 1.25.
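The standard error calculation above can be written as a one-line function. This is a small sketch; the function name standard_error is chosen here for illustration.

```python
import math

def standard_error(sample_sd, n):
    """Estimated standard error of the mean: s divided by the square root of n."""
    return sample_sd / math.sqrt(n)

se = standard_error(5.0, 16)
print(se)  # 1.25
```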

Slide 7

Let's look at our first question, what is the 95% confidence interval for the population mean. We're going to be taking our standard error of 1.25 and multiplying it by the appropriate T value from our T table. To figure out what value that should be we need the degrees of freedom and the Alpha value.
►First, the degrees of freedom value is the sample size minus one, or N - 1. Since N = 16 for this example the degrees of freedom will be 15. So we would go to our T table and look at the row that corresponds to degrees of freedom 15.
►Next, we need our Alpha value. A 95% confidence interval uses a 2.5% Alpha value on each side. So in the table we'll be using the column for Alpha equals 0.025.
►The T value to use is therefore 2.131. Our 95% confidence interval will be 12 (the sample mean), plus or minus 2.131 (our T value), multiplied by 1.25 (our standard error).
Multiplying this out gives us 12 plus or minus 2.66375 which would give us a range of 9.336 to 14.664 for our 95% confidence interval.
We can then say, based on our sample data, that we are 95% confident that the population mean is somewhere between 9.336 and 14.664. There is a 5% chance that it's not in that range, but there's a much higher probability that it is inside our confidence interval.
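The whole calculation for this slide fits in a few lines. The sketch below takes the T value as an input because it comes from the table lookup; the helper name t_confidence_interval is hypothetical.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_value):
    """Return (lower, upper) bounds: sample mean plus or minus t times the standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_value * standard_error
    return sample_mean - margin, sample_mean + margin

# Values from the transcript: mean 12, standard deviation 5.0, n = 16,
# and T = 2.131 for 15 degrees of freedom with alpha = 0.025 in each tail.
lower, upper = t_confidence_interval(12, 5.0, 16, 2.131)
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

Up to rounding in the last digit, this reproduces the range of 9.336 to 14.664 worked out above.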


Slide 8

Let's look at our second question, what is the 99% confidence interval for the population mean. Again, we're going to use our standard error of 1.25 and multiply it by the appropriate T value from our T table.
►Just as before, the degrees of freedom value is the sample size minus one which is 16 minus one equals 15. We will use the row that corresponds to degrees of freedom 15.
►Next, we need our Alpha value. A 99% confidence interval uses a 0.5% Alpha value on each side. So in the table we'll be using the column for Alpha equals 0.005.
►The T value to use is therefore 2.947. Our 99% confidence interval will be 12 (the sample mean), plus or minus 2.947 (our T value), multiplied by 1.25 (our standard error).
Multiplying this out gives us 12 plus or minus 3.68375 which would give us a range of 8.316 to 15.684 for our 99% confidence interval.
We can then say, based on our sample data, that we are 99% confident that the population mean is somewhere between 8.316 and 15.684. There is a 1% chance that it's not in that range, but there's a much much higher probability that it is inside our confidence interval.


Slide 9

Let's look at our third question, what is the 99.8% confidence interval for the population mean. Again, we're going to use our standard error of 1.25 and multiply it by the appropriate T value from our T table.
►Just as before, the degrees of freedom value is 15.
►Next, we need our Alpha value. A 99.8% confidence interval uses a 0.1% Alpha value on each side. So in the table we'll be using the column for Alpha equals 0.001.
►The T value to use is therefore 3.733. Our 99.8% confidence interval will be 12 (the sample mean), plus or minus 3.733 (our T value), multiplied by 1.25 (our standard error).
Multiplying this out gives us 12 plus or minus 4.66625 which would give us a range of 7.334 to 16.666 for our 99.8% confidence interval.
We can then say, based on our sample data, that we are 99.8% confident that the population mean is somewhere between 7.334 and 16.666. There is a 0.2% chance that it's not in that range, but the probability that it's inside our confidence interval is almost one.


Slide 10

Let's take a look at the three confidence intervals we just calculated. When we wanted an interval with a greater degree of confidence, we had to use a smaller Alpha value which resulted in a larger T value which meant that our interval was larger.
There's a tradeoff between the degree of confidence we have in our interval and the width of that interval. This makes conceptual sense: if we put a larger interval around our sample mean, we are more confident that the interval will include the population mean.
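To see the tradeoff side by side, the three intervals can be recomputed in one loop. This is a sketch that hard-codes the three T values quoted in the transcript for 15 degrees of freedom; the variable names are illustrative.

```python
import math

SAMPLE_MEAN, SAMPLE_SD, N = 12, 5.0, 16

# T values for 15 degrees of freedom, as read from the table in the transcript.
T_VALUES_DF15 = {"95%": 2.131, "99%": 2.947, "99.8%": 3.733}

standard_error = SAMPLE_SD / math.sqrt(N)

widths = {}
for level, t_value in T_VALUES_DF15.items():
    margin = t_value * standard_error
    widths[level] = 2 * margin
    print(f"{level:>5}: {SAMPLE_MEAN - margin:.2f} to {SAMPLE_MEAN + margin:.2f} "
          f"(width {widths[level]:.2f})")
```

Each step up in confidence uses a larger T value with the same standard error, so the interval can only get wider.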
In the real world these larger intervals might be less useful and which confidence interval we work with can vary. To be honest, 95% confidence intervals are by far the most common confidence intervals you will see calculated in scientific reports and papers or be asked to calculate when preparing those things yourself.
If you know about P values and the importance of a P value of 0.05 you can see how that corresponds with the 95% we're working with here. The 5% probability that the population mean is outside of our confidence interval matches with the 5% chance of rejecting a null hypothesis when it is correct.
Let's look at another example, but this time we'll vary the sample size.


Slide 11

For our second example let's consider a sample with a mean of 18 and a standard deviation of 5.2. As before, these mean and standard deviation values refer to our sample.
For this scenario we're going to calculate 3 different 95% confidence intervals. What is the 95% confidence interval for the population mean if the sample size is 9, 18, or 36?
Once we get each of these intervals, we'll be able to say that we are 95% confident that the true population mean lies within this interval. Before we even start, we can predict that the confidence intervals will get narrower as our sample size gets larger. This is because larger samples are less prone to the randomness that produces sample means far from the population mean, so we'll be able to be more confident about where the population mean is.
►Unlike the previous example, we can't calculate a single standard error since the sample sizes differ. Remember that the standard error term had the square root of the sample size in the denominator.


Slide 12

Let's look at our first sample, what is the 95% confidence interval? We're going to calculate the standard error and multiply it by the appropriate T value from our T table. As before, we'll need the degrees of freedom and the Alpha value.
►First, a 95% confidence interval uses a 2.5% Alpha value on each side. So in the table we'll be using the column for Alpha equals 0.025.
►Then, the degrees of freedom value. This is the sample size minus one, which for this example is 9 - 1 = 8. So we will use the row that corresponds to degrees of freedom 8.
The row and column intersect to provide a T value of 2.306.
►Lastly, we need the standard error. This will be the standard deviation for our sample divided by the square root of our sample size. This is 5.2 divided by the square root of nine, which is 3, to give us 1.7333.
►Our 95% confidence interval will be 18 (the sample mean), plus or minus 2.306 (our T value), multiplied by 1.7333 (our standard error).
Multiplying this out gives us 18 plus or minus 3.997 which would give us a range of 14.003 to 21.997 for our 95% confidence interval when we have a sample size of 9.


Slide 13

Now for the next 95% confidence interval. We have to recalculate the standard error since our sample size has increased. The appropriate degrees of freedom value for our T table is also different because our sample size has increased. The Alpha value stays the same, however.
►First, the 95% confidence interval means we'll use the column for Alpha equals 0.025.
►The degrees of freedom value is the sample size minus one, which is now 18 minus 1 equals 17. So we will use the row that corresponds to degrees of freedom 17.
The row and column intersect to provide a T value of 2.110.
►Lastly, the standard error, the standard deviation for our sample divided by the square root of our sample size. This is 5.2 divided by the square root of 18, which is 4.2426, to give us 1.2257.
►Our 95% confidence interval will be 18 (the sample mean), plus or minus 2.110 (our T value), multiplied by 1.2257 (our standard error).
Multiplying this out gives us 18 plus or minus 2.586 which would give us a range of 15.414 to 20.586 for our 95% confidence interval when we have a sample size of 18.


Slide 14

Now for our last 95% confidence interval. Again, we have to recalculate the standard error and select a different appropriate degrees of freedom value because our sample size has increased. The Alpha value stays the same.
►First, the 95% confidence interval means we'll use the column for Alpha equals 0.025.
►The degrees of freedom value is now 36 minus 1 equals 35. We'll use the row that corresponds to degrees of freedom 35.
The row and column intersect to provide a T value of 2.030.
►Lastly, the standard error, the standard deviation for our sample divided by the square root of our sample size. This is 5.2 divided by the square root of 36, which is 6, to give us 0.8667.
►Our 95% confidence interval will be 18 (the sample mean), plus or minus 2.030 (our T value), multiplied by 0.8667 (our standard error).
Multiplying this out gives us 18 plus or minus 1.759 which would give us a range of 16.241 to 19.759 for our 95% confidence interval when we have a sample size of 36.
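The three calculations for this example differ only in the sample size and the matching T value, so they can be sketched as one loop. The T values are the ones quoted in the transcript; the dictionary layout and names are illustrative.

```python
import math

SAMPLE_MEAN, SAMPLE_SD = 18, 5.2

# Sample size -> T value for alpha = 0.025 at (sample size - 1) degrees of freedom,
# as read from the table in the transcript.
T_VALUES = {9: 2.306, 18: 2.110, 36: 2.030}

intervals = {}
for n, t_value in T_VALUES.items():
    standard_error = SAMPLE_SD / math.sqrt(n)  # must be recomputed for each n
    margin = t_value * standard_error
    intervals[n] = (SAMPLE_MEAN - margin, SAMPLE_MEAN + margin)
    print(f"n = {n:>2}: SE = {standard_error:.4f}, margin = {margin:.3f}, "
          f"interval = {intervals[n][0]:.3f} to {intervals[n][1]:.3f}")
```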
This situation raises another rule for using the t-distribution. In this example we have 35 degrees of freedom and so does the table. But if we had a slightly larger sample, with 38 degrees of freedom for example, our table doesn't have a row for that. So what should we do?
If a situation arises when you don't have t-values for the exact degrees of freedom value you need you should always round the degrees of freedom value down. If our degrees of freedom value was 38 and all we have is this table, we should use the row for 35 degrees of freedom instead of rounding up. In fact, even if the degrees of freedom was 39 we should round down to 35 instead of up to 40.
If we round down, we are calculating the confidence intervals as if we had a bit less data and being conservative in our conclusions. We are not overstating the strength of our evidence.
On the other hand, if we round upwards then we are overstating the strength of our conclusion and portraying a conclusion with more confidence than is justified. This is not a good idea.
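The round-down rule can be sketched as a lookup that picks the largest tabulated degrees of freedom value that does not exceed the one needed. The small table below is an assumed subset for illustration: most values are the alpha = 0.025 entries quoted in the transcript, and the entry for 40 degrees of freedom (2.021) is the standard published value rather than one taken from the video.

```python
from bisect import bisect_right

# Assumed subset of a printed T table, alpha = 0.025 column only.
T_TABLE_025 = {8: 2.306, 15: 2.131, 17: 2.110, 35: 2.030, 40: 2.021}

def lookup_t(df, table=T_TABLE_025):
    """Return (row used, T value), rounding the degrees of freedom DOWN.

    Rounding down treats the data as if the sample were slightly smaller,
    which gives a slightly larger T value and a more conservative interval.
    """
    rows = sorted(table)
    index = bisect_right(rows, df)  # number of tabulated rows that are <= df
    if index == 0:
        raise ValueError(f"no tabulated row at or below {df} degrees of freedom")
    row = rows[index - 1]
    return row, table[row]

print(lookup_t(35))  # (35, 2.03)
print(lookup_t(38))  # (35, 2.03)
print(lookup_t(39))  # (35, 2.03)
print(lookup_t(40))  # (40, 2.021)
```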


Slide 15

Let's take a look at the three confidence intervals we just calculated. When we had a larger sample size two things happened to make the confidence interval narrower.
First, with more data the standard error value became smaller because of the square root of the sample size in the denominator.
Second, with more degrees of freedom we used T values that were lower in the table and they tended to be smaller, indicating that we needed to go fewer standard errors above and below our sample mean to create our confidence interval.
►An interesting thing to notice about our confidence intervals is that there are diminishing returns for increased sample size. Doubling and quadrupling the sample size didn't cut the width of the confidence interval in half or down to one quarter; they only reduced the width to 65% and 44% of the original.
This brings up an important issue in experimental design. When we design experiments the sample size is often a major part of the cost or time of doing the study. But there are diminishing returns for increasing the sample size.
This means that sometimes, to get to a level of precision that we need, the experiment may need to be larger than we can afford. Or it may not even be possible to collect enough data to get a confidence interval that is narrow enough for our practical purposes.
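The diminishing returns can be quantified by comparing each margin with the one for the smallest sample. This sketch reuses the T values quoted in the transcript and also prints the square-root factor on its own, to show how much of the narrowing comes from the standard error alone; the names are illustrative.

```python
import math

SAMPLE_SD = 5.2

# Sample size -> T value for alpha = 0.025, as quoted in the transcript.
T_VALUES = {9: 2.306, 18: 2.110, 36: 2.030}

# Margin of error (half the interval width) for each sample size.
margins = {n: t * SAMPLE_SD / math.sqrt(n) for n, t in T_VALUES.items()}
baseline = margins[9]

relative_widths = {}
for n, margin in margins.items():
    relative_widths[n] = margin / baseline
    sqrt_factor = math.sqrt(9 / n)  # narrowing due to the standard error alone
    print(f"n = {n:>2}: margin {margin:.3f}, {relative_widths[n]:.0%} of the n = 9 width "
          f"(square-root factor alone: {sqrt_factor:.0%})")
```

Because the width scales roughly with one over the square root of the sample size, halving the width takes about four times as much data, not twice as much.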


Slide 16

For our last point let's return to something I mentioned earlier.
The T distribution is used instead of the normal distribution because we don't know the population variance and samples typically underestimate it, so we use a T distribution, which is wider than the normal distribution, to eliminate this bias.
But what if we use the normal distribution instead of the T distribution?
This slide shows the widths of the three confidence intervals we just calculated for sample sizes 9, 18, and 36. We used the T distribution to calculate these confidence intervals.
What would the confidence intervals look like if we use the normal distribution to determine how many standard errors above and below our sample mean to go when we created our confidence interval?
►For a sample size of 9, the interval is 85% as wide. This portrays a much more restricted region for where the population mean is than we have actual justification for.
►For a sample size of 18, the interval is 93% as wide. This portrays a somewhat more restricted region for where the population mean is than we have actual justification for.
►For a sample size of 36, the interval is 97% as wide. This confidence interval is only slightly smaller than the one we have justification for, but it's still inaccurate.
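These percentages can be reproduced by comparing the normal critical value with each T value, since the standard error cancels out of the ratio. The sketch below uses NormalDist from Python's standard statistics module to get the normal value (about 1.96) and the T values quoted in the transcript; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal critical value that leaves 2.5% in the upper tail (about 1.96).
z_value = NormalDist().inv_cdf(0.975)

# Sample size -> T value for alpha = 0.025, as quoted in the transcript.
T_VALUES = {9: 2.306, 18: 2.110, 36: 2.030}

# Both intervals use the same standard error, so the width ratio is just z / t.
ratios = {n: z_value / t_value for n, t_value in T_VALUES.items()}

for n, ratio in ratios.items():
    print(f"n = {n:>2}: normal-based interval is {ratio:.0%} as wide as the T-based one")
```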
Using the normal distribution is like rounding our sample size upwards by a lot. Technically, this is like rounding the sample size up to infinity. As mentioned earlier, this is a bad idea because it would give us unwarranted overconfidence in our mathematical conclusions.
If we use confidence intervals that are too narrow, then we increase our risk of thinking we know where the population mean is when we don't. We become more likely to think our population mean is in a certain range when it isn't.
►This is a form of statistical error where we accept or reject a null hypothesis incorrectly. See our video about type I and II errors for more about this.
Since we want to avoid making the wrong conclusions whenever we can, it's best to always use the t-distribution instead of the normal distribution when calculating confidence intervals for all but the most gigantic data sets.

Zoom out

Published scientific data almost always includes calculations of confidence intervals. Larger sample sizes will tend to result in narrower confidence intervals, but there are diminishing returns. Higher degrees of confidence will result in wider confidence intervals which are more likely to be correct, but may not be as useful.
In any case, as I hope these examples have shown, calculating these intervals is quick and straightforward.
A link to a PDF of the slides used in this video, essentially this slide, can be found on the StatsExamples website and using the link below.

End screen

If you found this video useful, please help others find it too. Commenting, liking, and subscribing all increase the chances that YouTube will recommend this video to other people looking to figure out how to calculate confidence intervals.






This information is intended for the greater good; please use statistics responsibly.