The power of P numbers
Sep. 18th, 2018 08:53 pm

Let me start by saying that I am not a statistician. I am merely an amateur number cruncher. If anyone spots any mistakes or bits that could be clarified, please tell me.
Sorry, ioplokon, for this taking so long. The main problem was that I was making it over-complicated, so I went back to basics.
For the purpose of this example, imagine we are testing to see if using drug A improves cure rates compared to the standard of care. It's a very early trial, so we're only going to test it on 10 patients per arm: 10 patients treated with drug A and 10 patients treated with the standard of care (this is your control arm).
1 - Despite what you've heard, most clinical trials do not test against a placebo. They test against standard of care. New drugs have to be as good as what's already out there.
The hypothesis we are testing is: Drug A is better than the standard of care in disease x.
2 - If someone ever tries to blind you with stats, make them tell you what their hypothesis is. If there isn't a hypothesis, if the hypothesis is not testable, or if what they're doing will not help to test the hypothesis, they are trying to bamboozle you.
So we run our experiment. 6 patients in the drug A arm are cured, while 5 people in the control arm are cured.
Is that a real difference, or could that have happened by sheer luck?
I mean, obviously, we've tried to make the two groups of patients as similar as we could in terms of disease stage and physical condition, but we could have missed something important that makes one person more responsive to treatment than another, and therefore not fully or properly balanced the groups.
We are going to use stats to try to answer that question.
What statistical significance tells us is: "if there were really no difference, what would be the chance of seeing a result like this by sheer luck?" So if something is significant at p = 0.1, a result like this would turn up by sheer luck about 1 time in 10. In science, we say something is statistically significant if p ≤ 0.05. That means there is a 1/20 or smaller chance of a result like this being due to luck alone.
3 - Statistical significance does not mean a result is true/real, it means it is unlikely to be due to random chance.
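To make "sheer luck" concrete, here is a minimal simulation sketch (my own illustration, not part of the trial maths, assuming both arms really have the same 50% cure rate, matching our control arm): how often does chance alone hand one arm of 10 patients at least one extra cure?

```python
import random

# Assume the null hypothesis is true: both arms have the same 50% cure
# rate (matching the 5 out of 10 we saw in the control arm).
trials = 100_000
lucky_gaps = 0
for _ in range(trials):
    drug_cures = sum(random.random() < 0.5 for _ in range(10))
    control_cures = sum(random.random() < 0.5 for _ in range(10))
    if drug_cures - control_cures >= 1:  # a gap at least as big as ours
        lucky_gaps += 1

print(lucky_gaps / trials)  # roughly 0.4: luck alone does this about 40% of the time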
When we run a test on this data, we are trying to reject the null hypothesis. The null hypothesis is "the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error."
First, we need to know the degrees of freedom in the data. The equation for this is: DF = k - 1
or degrees of freedom = number of categories − 1.
In this example, there are two categories, cured and not cured, so the degrees of freedom is 1.
Next, we need to work out the expected value for the data. There is a proper equation for this, but here we have a ready-made expected value in the control population: the expected number of people cured is the number cured in our control arm, so our expected value is 5.
We now work out the test statistic: χ² = ∑[(Oᵢ − Eᵢ)² / Eᵢ]
∑ means "sum over", Oᵢ is the observed value, and Eᵢ is the expected value for category i.
In other words: take (the observed − the expected), square it, then divide by the expected. We do this for the cured patients and for the not cured patients, then add the two results together.
In our case, for the cured the observed is 6 and the expected is 5.
So (6-5) = 1
1 squared = 1
1/5 = 0.2
For the not cured, the observed is 4 and the expected is 5.
So (4-5) = -1
-1 squared = 1
1/5 = 0.2
Adding 0.2 for the cured patients, and the 0.2 for the not cured patients gives 0.4.
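If you'd rather let the computer do the arithmetic, here is the same sum as a few lines of Python (a sketch of the by-hand calculation above):

```python
observed = [6, 4]  # cured and not cured in the drug A arm
expected = [5, 5]  # taken from the control arm

# chi-squared = sum over categories of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)  # 0.4
```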
We then take that value and look it up on a chi squared table. Do not worry about the length and size of the table. The rows are labelled with degrees of freedom. The degree of freedom in our example is 1, so we are interested in the first row.
Go across this row from left to right until you find the two entries your test statistic of 0.4 falls between, then look up to the top of those columns to read the associated p values.
In our case, a χ² value of 0.4 lies between the columns for p = 0.975 and p = 0.2, so our p value is somewhere between 0.975 and 0.2. As both of these values are higher than 0.05, we can't reject the null hypothesis. Or, to put it in plainer English: the result of the experiment, where the new treatment had 6 people cured while the old treatment had 5, could have been due to sheer luck. We cannot say that the extra cured person was due to the new treatment.
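Instead of scanning the table, you can also ask the chi-squared distribution for the exact p value. A sketch using scipy (assuming you have it installed):

```python
from scipy.stats import chi2

# The survival function gives the probability of a chi-squared value at
# least this large with 1 degree of freedom - that probability is our p value.
p_value = chi2.sf(0.4, df=1)
print(round(p_value, 3))  # 0.527
```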
Now, as you can imagine, with more complicated numbers the maths can get a bit tricky, so there are online calculators. I tend to use this one: https://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx
On the first page, our two categories are cured and not cured. The observed cured is 6, so the observed not cured is 4. Our expected values, taken from our control arm, are 5 and 5, and we're looking at a 0.05 significance level.
The programme returns a result for us and says "The Chi^2 value is 0.4. The P-Value is 0.527. The result is not significant at p=0.05."
Which means exactly the same as the by-hand version: we can't say that the new treatment was the reason the "new treatment" arm had one more person cured than the old treatment arm.
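And if you'd rather skip the website altogether, scipy will run the whole goodness-of-fit test in one call (again a sketch, assuming scipy is installed):

```python
from scipy.stats import chisquare

# Same goodness-of-fit test the online calculator runs.
result = chisquare(f_obs=[6, 4], f_exp=[5, 5])
print(result.statistic)  # 0.4
print(result.pvalue)     # about 0.527 - not significant at the 0.05 level
```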