redfiona99 | Entries tagged with statistics

When I posted about the 2023 Tour de France withdrawals (https://fulltimesportsfan.wordpress.com/2023/11/18/withdrawals-in-week-3-of-the-2023-tour-de-france-an-overall-round-up-and-confirmation-that-the-olympics-didnt-cause-more-withdrawals/), ioplokon wondered whether the fact that the teams on the women's tour had fewer resources for recuperation, etc., might affect which racers completed the Tour de France.

Using the stats from Procycling Stats (https://www.procyclingstats.com/), I wanted to see if there was anything in that theory.

Attempt one - if it is a "recovery problem", I'd expect the riders who raced the week before to be less likely to finish the Tour de France Feminine (TDFF).

	Raced within 1 week of the TDFF	Did not race within one week of the TDFF
Completed the TDFF	32	91
Did not complete the TDFF	7	24

Putting those numbers into a 2 x 2 Chi-Squared table suggested that there was no statistically significant difference in whether racers finished the Tour de France Feminine depending on whether or not they had raced the week before.

Okay, so I thought, maybe those were only smaller races. Maybe a bigger race would take it out of the racers more. So, was there any difference in whether or not a racer finished the Tour de France Feminine based on whether they'd raced the Giro d'Italia Donne?

	Raced the Giro	Did not race the Giro
Completed the TDFF	34	89
Did not complete the TDFF	9	22

Again, using a 2 x 2 Chi-Squared table suggested that there was no statistically significant difference.
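For anyone who wants to reproduce these 2 x 2 tests, here is a minimal Python sketch (standard library only) of a Pearson chi-squared test on a 2 x 2 table. It's an assumption that this matches the calculation behind the post: the post doesn't say whether a continuity correction was used, and this sketch leaves it out (scipy's `chi2_contingency` applies Yates' correction to 2 x 2 tables by default, so its numbers will differ slightly). The function name `chi_squared_2x2` is made up for illustration.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no Yates correction) for the 2 x 2 table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count = row total x column total / grand total
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-squared >= x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: completed / did not complete the TDFF
raced_week_before = chi_squared_2x2(32, 91, 7, 24)
raced_giro = chi_squared_2x2(34, 89, 9, 22)

for label, (stat, p) in [("Raced within a week", raced_week_before),
                         ("Raced the Giro", raced_giro)]:
    print(f"{label}: chi-squared = {stat:.3f}, p = {p:.3f}")
```

Both p values come out well above 0.05, in line with the conclusion in the post.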

I'm going to say, the big thing that surprised me was how small the crossover was between people who raced in the Giro Donne and the TDFF. I would have expected there to have been more.

Final theory was, hey, maybe cumulative damage would affect this. So, of the 43 riders who did both races, did finishing the Giro Donne have any effect?

	Finished the Giro	Did not finish the Giro
Completed the TDFF	31	3
Did not complete the TDFF	7	2

Sadly for L's hopes that I will make strong conclusions to my posts, nope, there was no statistical significance there either.
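One thing worth checking with this third table: the chi-squared test relies on expected counts that aren't too small (a common rule of thumb is at least 5 per cell), and the "did not complete the TDFF / did not finish the Giro" cell has an expected count of only about 1 (9 × 5 / 43). Fisher's exact test is the usual alternative for small tables. Below is a minimal standard-library sketch of the two-sided version; the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # keeping every row and column total fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # Add up every possible table no more likely than the observed one;
    # the small tolerance stops floating-point noise dropping exact ties
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))

# Smallest expected count in the table: row total x column total / grand total
row_totals, col_totals, n = (34, 9), (38, 5), 43
smallest_expected = min(r * c / n for r in row_totals for c in col_totals)

# Rows: completed / did not complete the TDFF
# Columns: finished / did not finish the Giro
p = fisher_exact_two_sided(31, 3, 7, 2)
print(f"smallest expected count = {smallest_expected:.2f}")
print(f"Fisher's exact test: p = {p:.3f}")
```

With these numbers Fisher's test gives p ≈ 0.28, so the conclusion of no significant difference doesn't change.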

So what are we seeing? If there is any factor that affects the likelihood of finishing the Tour de France Feminine, these numbers give no evidence that it is having raced the week before, having competed in the grand tour before it, or having completed the grand tour before it.

A further investigation would be whether this extends to the men's Tour de France too, given the greater overlap in competitors between the grand tours.

I never actually drop projects, I just don't update them for a while.

So let us return to the Benford's Law project, with information about the first digits in the top news article on the BBC website on 26 out of the 31 days of August 2021. In those 26 articles, there were 398 numbers with leading digits. That's ~15 per day, which is about the same as June, but more than July.

Most of those numbers came from the article on the 8th of August (https://www.bbc.co.uk/sport/olympics/58112331) which was about the performance of different sports at the Tokyo Olympics compared to their funding.

( The stats for August only )

No number appeared exactly as often as expected; 5 was the closest, but even that was 1% away from expected. 1 and 2 are the most different from their expected values, and both are over-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 8.5, the highest since February.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.
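The monthly check can be sketched in Python as below, assuming the nine leading-digit counts have already been tallied. Benford's expected proportion for digit d is log10(1 + 1/d); with 9 categories there are 8 degrees of freedom, and for an even number of degrees of freedom the chi-squared p value has a simple closed form, so no statistics library is needed. The BBC counts themselves aren't reproduced in the text, so the demonstration uses the leading digits of 2^1 to 2^1000, a sequence known to follow Benford's Law closely. The function names are made up for illustration.

```python
import math

# Benford's Law: expected proportion of numbers whose leading digit is d
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
DF = 8  # 9 digit categories - 1

def benford_chi_squared(counts):
    """counts[i] = how many numbers had leading digit i + 1.
    Returns the sum over digits of (observed - expected)**2 / expected."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

def chi_squared_sf_even_df(x, df):
    """P(chi-squared >= x) for an even number of degrees of freedom:
    exp(-x/2) * sum of (x/2)**i / i! for i = 0 .. df/2 - 1."""
    if df % 2:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Demonstration data (not the BBC counts): leading digits of 2**1 ... 2**1000
counts = [0] * 9
for k in range(1, 1001):
    counts[int(str(2 ** k)[0]) - 1] += 1

stat = benford_chi_squared(counts)
print(f"powers of 2: statistic = {stat:.2f}, p = {chi_squared_sf_even_df(stat, DF):.3f}")
print(f"p at the critical value 15.507: {chi_squared_sf_even_df(15.507, DF):.4f}")
print(f"p for August's reported 8.5: {chi_squared_sf_even_df(8.5, DF):.2f}")
```

The last two lines show that 15.507 sits at p ≈ 0.05, and that August's reported statistic of 8.5 corresponds to p ≈ 0.39.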

If we look at the rolling total from February to the end of August, there have been 2258 numbers with leading digits.

( Rolling total from February to the end of August )

No number is exactly at its expected value; 5 is the closest. 1 is the number furthest from its expected value and remains over-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 3.00, not reducing the way I expected it to with the addition of more first digits that obey Benford's Law. However, as the critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507, the test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford’s Law.

The test statistic continues to fluctuate rather than reduce.
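A caveat on expecting the statistic to fall as numbers are added: if the digits genuinely follow Benford's Law, the average of the chi-squared statistic equals the number of degrees of freedom (8 here) whatever the sample size, and it is a real departure from Benford that makes it grow with more data. So fluctuation is what you would expect from data that fits. This minimal simulation (standard library only, fixed seed, names made up for illustration) draws digits straight from Benford's distribution at roughly one month's and seven months' worth of numbers and averages the statistic; both averages should land close to 8.

```python
import math
import random

# Benford's expected proportion for each leading digit 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def benford_chi_squared(counts):
    """Pearson chi-squared statistic of nine digit counts against Benford's Law."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, BENFORD))

def mean_statistic(sample_size, trials, rng):
    """Average statistic over `trials` samples of `sample_size` leading digits
    drawn from Benford's distribution itself (so the null hypothesis is true)."""
    total = 0.0
    for _ in range(trials):
        digits = rng.choices(range(1, 10), weights=BENFORD, k=sample_size)
        counts = [digits.count(d) for d in range(1, 10)]
        total += benford_chi_squared(counts)
    return total / trials

rng = random.Random(2021)  # fixed seed so the run is repeatable
for size in (250, 2250):  # roughly one month vs. seven months of numbers
    print(f"n = {size}: average statistic = {mean_statistic(size, 300, rng):.2f}")
```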

Today's post was supposed to be about cycling, and withdrawals from the Giro Rosa/Giro d'Italia Femminile compared to withdrawals in the men's Tour de France, but it requires more prose than I am presently capable of (running fencing competitions takes it out of you). Instead, let us return to an update to the Benford's Law project which has been chugging along in the background.

In July, I recorded the first digits in the top news article on the BBC website on 25/31 days. In those 25 articles, there were 261 numbers with leading digits. That's 10-11 per day, which is less than February but the same as March and May.

( July's numbers )

No number appeared exactly as often as expected; 8 was the closest, only 0.1% away from expected. 1 and 7 are the most different from their expected values, with 1 being over-represented and 7 under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 3.6, the lowest monthly value so far.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

If we look at the rolling total from February to the end of July, there have been 1860 numbers with leading digits.

( Numbers from February to July )

No number is exactly at its expected value. 1 is the number furthest from its expected value and remains over-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 2.45, reducing as I expected it to with more numbers.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford’s Law.

This is a reduction from the test statistic of the total to the end of June (2.71), but it's not as low as it was in April.

In June, I recorded the first digits in the top news article on the BBC website on 24/30 days. In those 24 articles, there were 353 numbers with leading digits. That's 14-15 per day, which is a lot more than in March, April and May, but about the same as in February.

( Table for June's results only )

2 appears the expected percentage of times. 1 and 8 are the most different from their expected values, with 1 being over-represented and 8 under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 4.9, lower than May's 6.67.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

If we look at the rolling total from February to the end of June, there have been 1599 numbers with leading digits.

( Table for the rolling total )

2 is exactly at its expected value. 1 is the number furthest from its expected value and remains over-represented; the next furthest is 6, which is under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 2.71.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford’s Law.

This is a reduction from the test statistic of the total to May, but it's not as low as it was before May.
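The monthly counts and rolling totals quoted across these posts can be cross-checked against each other with a few lines of Python. With the figures as posted, every month lines up except June: the rolling total jumps from 1254 to 1599 (345), while June's own count is given as 353. One of those two June figures may be a typo, and the sketch can't tell which. Treating February's rolling total as simply its monthly count is an assumption, since it is the first month.

```python
months = ["Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
# Monthly counts of numbers with leading digits, as reported in each post
monthly = [436, 273, 232, 313, 353, 261, 398]
# Rolling totals as reported (February's total is just its monthly count)
rolling = [436, 709, 941, 1254, 1599, 1860, 2258]

mismatches = []
for i in range(1, len(months)):
    implied = rolling[i] - rolling[i - 1]  # this month's count implied by the totals
    if implied != monthly[i]:
        mismatches.append((months[i], monthly[i], implied))

for month, reported, implied in mismatches:
    print(f"{month}: post reports {reported}, rolling totals imply {implied}")
```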

Crossposts: https://redfiona99.livejournal.com/1303332.html

This follows the three previous posts.

I was better at remembering to add the daily article in May, adding articles on 29 of 31 days.

Looking at May's articles only, 313 leading digit numbers were used (10-11 per day, slightly more than April, about the same as March and less than February).

( Table for May's results only )

3 appears the expected percentage of times. 1 and 7 are the most different from their expected values, with 1 being over-represented and 7 under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 6.67, slightly higher than April.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

If we look at the rolling total from February to the end of May, there have been 1254 numbers with leading digits.

( Rolling total to the end of May )

2 and 3 are the numbers closest to their expected values. 1 is the number furthest from its expected value and remains over-represented; the next furthest is 6, which is under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 2.84.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford’s Law.

Interestingly, as more numbers from articles are added, I would expect the calculated test statistic to reduce. Previously, it has (February = 8.6, February + March = 3.49, February + March + April = 2.29), but this time it has increased to 2.84, possibly because the articles from the 1st, 7th and 8th of May were very skewed towards the number 1 and had a lot of numbers in them.

Crossposts: https://redfiona99.livejournal.com/1300956.html

These are the results of the third month of monitoring news articles for which numbers they contain.

I missed a couple more days in April (I blame Easter), and I will catch these up at the end of the year.

In the 27 days I did manage to capture, 232 numbers were used in the leading news articles on bbc.co.uk (~ 8 to 9 per day). This is slightly less than the 9-10 in March and a lot less than the 15 per day from February.
( Chi-squared table for April )

9 is the number closest to its expected value. 2 is over-represented, 8 is under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 5.7.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

If you look at the rolling total of February to the end of April, the numbers are starting to add up. Since the start of February, there have been 941 digits in headline news articles.

( Chi-squared table for February to the end of April combined )

5 is the number closest to its expected value. 1 remains over-represented, while 6 is under-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 2.29.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

Interestingly, as more numbers from articles have been added, the calculated test statistic has reduced (February = 8.6, February + March = 3.49, February + March + April = 2.29). This is consistent with the numbers in the articles fulfilling Benford's law.

Crossposts: https://redfiona99.livejournal.com/1285668.html

These are the results of the second month of monitoring news articles for which numbers they contain.

March featured the first days I missed (I blame Easter), so I will have to add two days on at the end of the year.

In the 29 days I did manage to capture, 273 numbers were used (~ 9 to 10 per day). This is less than the ~15 per day from February.
( Chi-squared table for March )

1 and 8 are the closest to expected. 5 is over-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 5.6.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

If you look at the rolling total of February and March, the numbers are starting to add up. There were 709 digits in headline news articles.

( Chi-squared table for February and March combined )

7 and 8 are the closest to expected. 1 remains over-represented, as it was in February. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 3.49.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

Interestingly, as more numbers from articles have been added, the calculated test statistic has reduced (February = 8.6, February + March = 3.49). This is consistent with the numbers in the articles fulfilling Benford's law.

Crossposts: https://redfiona99.livejournal.com/1280851.html

Benford's Law gains its power with larger samples, and I started my Benford's law project in the shortest month. I don't think these things through, do I? But you have to start somewhere.

The 28 daily news articles contained 436 numbers written as digits (~15 per day).

3 and 7 are found pretty much exactly as often as expected. 1 is over-represented. If you add up (observed-expected) squared divided by the expected for each digit, the calculated test statistic is 8.6.

The critical chi squared value at the 0.05 significance level for 9 categories (8 degrees of freedom) is ~15.507.

The test statistic is smaller than the critical value, so the difference is not significant. This data does not disobey Benford's Law.

Crossposts: https://redfiona99.livejournal.com/1277214.html

Introduction:

Some years ago, I read the book “How Long Is a Piece of String? More Hidden Mathematics of Everyday Life” by Rob Eastaway (as reviewed here), and one chapter fascinated me. The chapter was chapter 12, “Is it a fake?”, and the section that particularly caught my interest was about Benford’s Law. Excessively simplifying: in naturally occurring numbers, the leading digits will follow a distinct pattern, and will not be evenly distributed.

The expected % of leading numbers for each digit can be seen in the table below:

If you have a large naturally occurring data set that doesn’t conform to this, it tells you there are either constraints on it so that the data doesn’t cover all of the possibilities (e.g. human heights in metres will start with a 1 or a 2; no one has ever been 4 m tall) or something else is going on.
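For reference, Benford's Law gives the expected share of each leading digit d as log10(1 + 1/d). A short Python sketch that prints these percentages (roughly 30.1% for 1 down to 4.6% for 9):

```python
import math

# Benford's Law: the probability that the leading digit is d
# is log10(1 + 1/d), for d = 1, 2, ..., 9.
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {100 * p:.1f}%")
```

The nine shares add up to exactly 1, because the product of (d + 1)/d for d = 1 to 9 is 10.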

Testing this theory:

I wanted to test this out on *something*. Problem was, what? Most sports data is possibility-limited, e.g. fewer goals will be scored in football in minutes starting with a 9 (the 9th minute or the 90-somethings) than in minutes starting with an 8 (the 8th minute or the 80-somethings), not because of the minute, but because the game stops at the 90th minute. Other data isn’t big enough. I needed a source of numbers that was large and unlimited.

Eventually, possibly in a fit of cynicism, I decided to try the leading digits of numbers reported in the news. Advantages to this plan - I can use a single, traceable data source - one article a day from the BBC news website. The BBC doesn’t tend to delete pages so if someone wanted to double check my numbers, I could give them the links.

Disadvantages to this plan - when I first attempted it, Article 50 was in the news, and skewing my results.

Having looked at the results, realised this and a few methodological errors, and gone a bit stir-crazy because of lockdown 3, I decided to try it again.

Attempt Number 2:

These were the rules I developed to try to avoid that and similar pitfalls:
1 - no numbers in names e.g. 19 in COVID-19 does not count as a leading digit
2 - no numbers from dates (I had done this originally, but worth restating)
3 - only digits written as digits. This threw up an unexpected problem - the BBC has somewhat intermittent editorial control on whether digits under 10 are written as words or numbers, and this may skew results. I’ve saved the links to the articles I’ve used to put the project together so I can go through them again if I want to (or if someone else wants to look at them).
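The three rules above can be roughed out as a small Python sketch for pulling leading digits out of article text. This is a hypothetical helper, not necessarily how the counting was done for this project: the date patterns only cover forms like "26 August 2021", "1st of February 2021" and "August 2021", and treating any number directly after a letter, or after a letter and a hyphen, as part of a name is a heuristic that will miss some cases.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Rule 2 (hypothetical heuristic): dates such as "26 August 2021",
# "1st of February 2021" or "August 2021" are removed before counting
DATE = re.compile(
    rf"\b\d{{1,2}}(?:st|nd|rd|th)?(?:\s+of)?\s+(?:{MONTHS})(?:\s+\d{{4}})?\b"
    rf"|\b(?:{MONTHS})\s+\d{{4}}\b"
)
# Rule 3: only numbers written as digits. Rule 1: skip digits that are part
# of a name, i.e. preceded by a letter or digit ("A380") or a letter + hyphen ("COVID-19")
NUMBER = re.compile(r"(?<![A-Za-z0-9])(?<![A-Za-z]-)\d[\d,]*(?:\.\d+)?")

def leading_digits(text):
    """Return the first non-zero digit of each qualifying number in `text`."""
    text = DATE.sub(" ", text)
    digits = []
    for match in NUMBER.finditer(text):
        first = next((ch for ch in match.group() if ch in "123456789"), None)
        if first is not None:  # skip numbers that are all zeros, e.g. "0"
            digits.append(int(first))
    return digits

sample = ("On 26 August 2021, COVID-19 cases rose by 1,234 to 56,789, "
          "costing £5m and 0.07% of 300 beds.")
print(leading_digits(sample))
```

Here COVID-19 and the date are skipped, £5m counts as a 5, and 0.07 counts as a 7 (taking the first non-zero digit is itself an assumption).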

I started on the 1st of February 2021, and will carry on till 1st of February 2022 (barring disaster). The other advantage of this system is that if I miss a day, I can make it up with extra days at the end. I will give monthly updates and running totals, plus some commentary if I have any.

I start with a statement saying that I am not a statistician. I am merely an amateur number cruncher. If anyone spots any mistakes or bits that could be clarified, please tell me.

Sorry ioplokon for this taking so long. The main problem was I was making it over-complicated, so I went back to basics.

For the purpose of this example, imagine we are testing to see if using drug A improves cure rates compared to the standard of care. It's a very early trial, so we're only going to test it on 10 patients. So that's 10 patients being treated with A and 10 patients being treated with the standard of care (this is your control arm).

1 - Despite what you've heard, most clinical trials do not test against a placebo. They test against standard of care. New drugs have to be as good as what's already out there.

The hypothesis we are testing is: Drug A is better than the standard of care in disease x.

2 - If someone ever tries to blind you with stats, make them tell you what their hypothesis is. If there isn't a hypothesis, if the hypothesis is not testable, or if what they're doing will not help to test it, they are trying to bamboozle you.

So we run our experiment. 6 patients in the drug A arm are cured, while 5 people in the control arm are cured.

Is that a real difference, or could that have happened by sheer luck?

I mean, obviously, we’ve tried to make the two groups of patients as similar as we could in terms of disease stage and physical condition but we could have missed something important that makes one person more responsive to treatment than another and therefore not fully or properly balanced the groups.

We are going to use stats to try to answer that question.

What statistical significance tells us is "if there were really no difference, how likely is it that we would get a result at least this big through sheer luck?" So if a result has p = 0.1, luck alone would produce a difference at least that big about 1 time in 10. In science, we say something is statistically significant if p ≤ 0.05. That means that luck alone would produce a result like this 1 time in 20 or less.

3 - Statistical significance does not mean a result is true/real; it means the result would be unlikely if random chance were the only thing going on.

When we run a test of this data, we are trying to disprove the null hypothesis. The null hypothesis is "the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error."

First, we need to know the degrees of freedom in the data. The equation for this is: DF = k - 1, or degrees of freedom = number of categories - 1.

In this example, there are two categories, cured and not cured, so the degrees of freedom is 1.

Next we need to work out the expected value for the data. There is a proper equation for this (please see here) but we have a pre-built expected in the control population. The expected number of people cured is the number cured in our control population so our expected value is 5.

We now work out the test statistic, $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

$\sum$ is "the sum of", $O_i$ is the observed value and $E_i$ is the expected value for each category $i$.

Or, for each category, we square (the observed - the expected), then divide it by the expected. We do this for the cured patients and the not cured patients, then add the two results together.

In our case, for the cured the observed is 6 and the expected is 5.

So (6-5) = 1

1 squared = 1

1/5 = 0.2

For the not cured, the observed is 4 and the expected is 5.

So (4-5) = -1

-1 squared = 1

1/5 = 0.2

Adding 0.2 for the cured patients, and the 0.2 for the not cured patients gives 0.4.
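The hand calculation above, as a few lines of Python (the dictionary layout is just one way to write it):

```python
observed = {"cured": 6, "not cured": 4}  # drug A arm
expected = {"cured": 5, "not cured": 5}  # taken from the control arm

# Degrees of freedom = number of categories - 1
degrees_of_freedom = len(observed) - 1

# Sum of (observed - expected) squared / expected over both categories
chi_squared = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

print(degrees_of_freedom, chi_squared)
```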

We then take that value and look it up on a chi squared table. Do not worry about the length and size of the table. The rows are labelled with degrees of freedom. The degree of freedom in our example is 1, so we are interested in the first row.
Go across this row from left to right until you find where our value of 0.4 would sit, then look up to the top of the columns on either side to read the associated p values.

In our case, a chi squared value of 0.4 lies between the values in the p = 0.975 and p = 0.2 columns, so our p value is somewhere between 0.975 and 0.2. As both of these values are higher than 0.05, we can't reject the null hypothesis, or, to put it in plainer English, the result of the experiment, where the new treatment had 6 people cured while the old treatment had 5, could have been due to sheer luck. We cannot say that the extra cured person was due to the new treatment.
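Instead of reading a printed table, the p value can be computed directly. For 1 degree of freedom, the chi squared distribution is the square of a standard normal variable, so P(χ² ≥ x) = erfc(√(x/2)). A minimal sketch using only Python's `math` module, which reproduces the 0.527 the online calculator gives:

```python
import math

chi_squared = 0.4
# For 1 degree of freedom the chi-squared distribution is the square of a
# standard normal variable, so P(chi-squared >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_squared / 2))

print(f"p = {p_value:.3f}")
print("significant at 0.05" if p_value <= 0.05 else "not significant at 0.05")
```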

Now, as you can imagine, with more complicated numbers the maths can get a bit tricky, so there are online calculators; I tend to use this one - https://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx

On the first page, our two categories are cured and not cured. The observed cured is 6, so the observed not cured is 4. Our expected values, taken from our control, are 5 and 5, and we're looking at a 0.05 significance level.

The programme returns a result for us and says "The Chi^2 value is 0.4. The P-Value is 0.527. The result is not significant at p=0.05."

Which means exactly the same as the by-hand version: we can't say that the new treatment was the reason for the one extra person cured in the "new treatment" arm compared to the old treatment.
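Using the control arm as fixed expected values treats its 5 out of 10 as if it were known exactly. One alternative is a 2 x 2 chi-squared test (the same kind of test the cycling post on this page uses) that treats both arms as samples. The sketch below (standard library only, no continuity correction, function name made up for illustration) gives a smaller statistic, about 0.20 with p ≈ 0.65, so the conclusion is the same. With only 10 patients per arm, some expected counts are 4.5, just under the usual rule-of-thumb minimum of 5, so Fisher's exact test would also be a reasonable choice.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))  # p value with 1 degree of freedom

# Rows: drug A arm, control arm. Columns: cured, not cured.
stat, p = chi_squared_2x2(6, 4, 5, 5)
print(f"chi-squared = {stat:.3f}, p = {p:.3f}")
```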