FALSE POSITIVES AND NEGATIVES
The text below is a transcript of the video.
Connect with StatsExamples here
TRANSCRIPT OF VIDEO:
The risk of false positives and negatives when doing statistical tests is a real issue that we need to be aware of. How those risks are influenced by the probabilities and frequencies of what we are studying can be very surprising.
When we do formal statistical testing we look at the null hypothesis and alternative hypothesis. The null hypothesis is either true or false, and we make a decision after analyzing our data to either accept or reject the null hypothesis.
When the null hypothesis is true and we accept it, or when it is false and we reject it then we've made the correct decision.
But when the null hypothesis is true and we reject it, or when the null hypothesis is false and we accept it, we have made either a type 1 or type 2 error.
These errors correspond to false positives or false negatives when we think about this in the context of a medical test or some other model where the null hypothesis can be described as a negative result.
In the other video on this channel, which introduces the type 1 and type 2 terminology and discusses these issues in more detail, the main focus was the trade-off between making type 1 and type 2 errors. But it turns out that the risk of type 1 and type 2 errors can also be influenced by how likely the null hypothesis is when you perform your test.
Let's look at an example. Say we have a medical test for cancer or HIV or something else.
The null hypothesis when doing such tests would be that the person is not sick and the alternative would be that the person is sick.
So in that scenario, the type 1 error is a false positive: the person gets a positive test result when in fact they don't have the condition. The negative consequences of that include scaring the person and causing more follow-up tests, which costs money.
The type 2 error would be a false negative: the person has the condition but the test result says they don't. The negative consequence of this is that they would go untreated until some later time when their symptoms have gotten worse and they pursue more tests.
In this scenario, the type 2 error seems more serious, but the relative costs to society can change as the frequency of the condition varies.
Let's see how this works by looking at an example. Let's consider a hypothetical test which has a false positive rate of 2% and a false negative rate of 3%.
What if we test a hundred thousand people, 10% of whom have the condition we are concerned about? Each of those hundred thousand individuals will get either a positive or a negative test result.
Let's think about the number of individuals for whom the null hypothesis is true and false. That's 90,000 individuals without the condition and 10,000 individuals with the condition.
Of the 90,000 individuals without the condition, the false positive rate of 2% means that 1,800 individuals will get a positive test result and the remaining 88,200 will get a negative test result.
Of the 10,000 individuals that do have the condition, the 3% false negative rate will result in 300 individuals getting a negative test and the remaining 9,700 individuals getting a positive test result.
Now let's focus on the individuals who receive a positive test result. Keep in mind that this is what we see, we don't see the true values for the sick and healthy, just the test results.
How many of the positive test results represent type 1 errors?
A total of 1800 + 9700 equals 11,500 individuals receive a positive test result.
Of those, 1,800 are individuals that don't have the condition.
That would mean that 1,800 divided by 11,500 equals 16% of people with a positive test result are, in fact, not sick.
So even though our test is pretty good, 97% or 98% accurate depending on how you think about it, 16% of its positive results are mistakes.
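The arithmetic above can be checked with a short script. This is a sketch in Python; the function name is our own, and the 2% false positive and 3% false negative rates are the ones assumed in this example.

```python
# Fraction of positive test results that are false positives (type 1 errors).
# Default error rates match this example: 2% false positive, 3% false negative.
def false_positive_fraction(prevalence, fp_rate=0.02, fn_rate=0.03):
    false_pos = (1 - prevalence) * fp_rate  # healthy people who test positive
    true_pos = prevalence * (1 - fn_rate)   # sick people who test positive
    return false_pos / (false_pos + true_pos)

print(false_positive_fraction(0.10))  # about 0.157, i.e. roughly 16%
```

Note that the counts drop out: 1,800 / 11,500 is the same ratio whether we test a hundred thousand people or a million.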
Now consider a second scenario, identical except for one thing: the percentage of individuals with the condition has dropped from 10% down to 1%.
Now if we test a hundred thousand individuals, 99,000 will not have the condition and 1000 will have the condition.
Of the 99,000 without the condition, 2% will test positive, so that's 1,980. The rest will test negative, as they should, and that's 97,020.
Of the 1,000 individuals with the condition, 3% will test negative so that's 30. The remaining 970 will test positive as they should.
Now how many of the positive test results represent type 1 errors?
A total of 1,980 + 970 equals 2,950 individuals receive a positive test result.
Of those, 1,980 are individuals that don't have the condition.
That would mean that 1,980 divided by 2,950 equals 67% of people with a positive test result are, in fact, not sick.
So now, even though our test is still pretty good with that 97% or 98% accuracy, about two-thirds of its positive results are mistakes.
Let's look at a third scenario. Now the percentage of individuals who have the condition is one tenth of a percent.
Now if we test a hundred thousand individuals, 99,900 will not have the condition and 100 will have the condition.
Of the 99,900 without the condition, 2% will test positive, so that's 1,998. The rest will test negative, as they should, and that's 97,902.
Of the 100 individuals with the condition, 3% will test negative, so that's 3. The remaining 97 will test positive, as they should.
Now how many of the positive test results are false positives?
A total of 1,998 + 97 equals 2,095 individuals receive a positive test result.
Of those, 1,998 are individuals that don't have the condition.
That would mean that 1,998 divided by 2,095 equals 95% of people with a positive test result are, in fact, not sick.
Think about what this means: 95% of the positive results from our 97% or 98% accurate test are now wrong.
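All three scenarios use the same calculation; the only input that changes is the frequency of the condition. A quick Python sketch, again assuming this example's 2% false positive and 3% false negative rates:

```python
# Fraction of positives that are false, as a function of condition frequency,
# for a test with a 2% false positive rate and a 3% false negative rate.
def false_positive_fraction(prevalence, fp_rate=0.02, fn_rate=0.03):
    false_pos = (1 - prevalence) * fp_rate
    true_pos = prevalence * (1 - fn_rate)
    return false_pos / (false_pos + true_pos)

for p in (0.10, 0.01, 0.001):
    print(f"prevalence {p}: {false_positive_fraction(p):.0%} of positives are false")
# prevalence 0.1: 16% of positives are false
# prevalence 0.01: 67% of positives are false
# prevalence 0.001: 95% of positives are false
```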
These three scenarios show how the frequency of a condition we are testing for can dramatically skew the accuracy rates of the overall results.
For any particular person the test is good, but when the true risk is low and we administer the test broadly, it gives us bad information much more often than it gives us good information.
Let's look at these results in a slightly different way.
This figure shows the 3 values we just calculated.
On the y-axis is the percent of people who test positive but don't have the condition. On the x-axis is the frequency of the condition in the population, on a log scale spanning the three values from our examples.
The overall pattern would look like this if we calculate the overall false positive rate across the entire range of values on the x-axis.
But this is just the pattern for one example test, now let's consider a range of tests and frequencies.
This figure shows how the overall false positive rate corresponds to the condition frequency for 3 example tests.
The solid red line shows the values for a test that only has a false positive and negative rate of 1% - a truly 99% accurate test.
The dotted blue line shows the values for a test with a false positive and negative rate of 2% - a 98% accurate test.
The dashed blue line shows the values for a test with a false positive and negative rate of 3% - a 97% accurate test.
You can see that as the condition gets rare, toward the left side of the figure, even for very good tests, almost everybody who gets a positive test result is not in fact sick.
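To see how little extra accuracy helps at low prevalence, we can run the same calculation for the three hypothetical tests in the figure at a condition frequency of one tenth of a percent. This sketch assumes, as the figure does, that each test's false positive and false negative rates are equal.

```python
# Fraction of positives that are false for a test whose false positive
# and false negative rates are both equal to error_rate.
def false_positive_fraction(prevalence, error_rate):
    false_pos = (1 - prevalence) * error_rate
    true_pos = prevalence * (1 - error_rate)
    return false_pos / (false_pos + true_pos)

for err in (0.01, 0.02, 0.03):
    frac = false_positive_fraction(0.001, err)
    print(f"{1 - err:.0%} accurate test: {frac:.0%} of positives are false")
# 99% accurate test: 91% of positives are false
# 98% accurate test: 95% of positives are false
# 97% accurate test: 97% of positives are false
```

Even the truly 99% accurate test gives mostly false positives at this condition frequency.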
What that means is that if we test a large number of people for a rare condition, most of the positive test results will be people who are not sick and that can be a serious problem.
Even if we have a very good diagnostic test, if we test for rare conditions, most of the positive results we get are type 1 errors.
The challenge is then how to balance these type 1 and type 2 errors.
If we don't test people, we'll miss serious conditions. But even when we do test people we'll get false negatives.
This costs lives.
The solution to missing people would be to do more testing, and a possible solution to the false negative issue would be multiple tests for each person, even after a negative initial result.
The solution therefore seems to be more testing.
However, if we do lots of testing we'll get lots of false positives, and as we saw, when the condition is rare most of the positive tests will be false positives.
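As a rough illustration of both effects, suppose each person is tested twice, and assume the two tests err independently (often an optimistic assumption in practice), using this example's 3% false negative and 2% false positive rates:

```python
fn_rate = 0.03  # chance a sick person tests negative once
fp_rate = 0.02  # chance a healthy person tests positive once

# Chance a sick person tests negative twice in a row (independence assumed)
double_false_negative = fn_rate ** 2  # 0.0009, i.e. 0.09%

# Chance a healthy person gets at least one false positive across two tests
at_least_one_false_positive = 1 - (1 - fp_rate) ** 2  # 0.0396, i.e. about 4%
```

Retesting shrinks the false negative risk dramatically, but it also roughly doubles each healthy person's chance of a false positive.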
Each false positive costs money, adds stress, and undermines public confidence in modern medicine.
This last point is important because if people across society start constantly getting incorrect test results, they will start to doubt all kinds of things about medicine. For example, "if their tests are always wrong, maybe they're wrong about vaccines too."
The solution therefore seems to be less testing.
This trade-off, and calculations like the ones we've just looked at, are why the advice on mammograms, prostate exams, and other kinds of diagnostic tests keeps changing.
Here's a screenshot of a news story from a few years ago about the government getting involved in mammogram recommendations.
These recommendations don't always change because we develop better tests or discover problems with the ones we have; sometimes they change out of a desire to avoid flooding the medical community and society with too many false positives.
There is no perfect answer to the tradeoff.
More testing has a downside.
Less testing has a downside.
Both downsides cost money and lives.
No test is perfect, and resources are limited.
We can't afford to test everyone multiple times to eliminate all initial false positives for rare conditions.
Therefore, for very good reasons we don't test as much as we could, and some people die.
As mentioned, there is no single solution to the trade-off problem, but learning about statistics and recognizing that this can happen can go a long way toward thinking about it correctly.
Understanding the relationship between the frequency of a disorder and the risk of obtaining false positives and negatives is important.
Knowing this adds context to the resistance from doctors when patients constantly request the latest tests for every rare ailment they've researched on the internet. Good doctors know about this false positive risk, but their patients typically don't.
Show your positive reactions, but not any negative ones, by clicking as appropriate.
This information is intended for the greater good; please use statistics responsibly.