Monday
Jan 02 2012

Is Premier League defending getting worse?

In Radio 4's More or Less, sports statistician Rob Mastrodomenico takes a look at a BBC blog that discusses whether Premier League defending is getting worse. He focusses on the fact that the number of goals per game has risen from 2.59 in 2010/11 to 2.97 in 2011/12.

After discussing the football and the pundits, he turns to the statistics. It is unarguable that 2.97 is bigger than 2.59, but he asks whether the difference is statistically significant. He concludes that it is not and calculates how great the difference would need to be for it to be statistically significant. (I addressed the meaning of statistical significance in my blog posting Significance isn't always significant.)

Now the question of statistical significance arises when we have the necessarily limited information that we gain from a sample and we want to know what that tells us about a population. Because our sample is an incomplete picture of the population, we are met with uncertainty, and that's where probability comes in. But if we have all of the data, there is no uncertainty and no role for statistical inference. If there is a difference between two things then we must conclude that, well, there is a difference between the two things.

Mastrodomenico's analysis of the BBC's football data treats the games played during the first half of each season as a sample of all the games that could theoretically be played. He sees each season as having an unknown, underlying goals-per-game average and the figures of 2.97 and 2.59 as estimates of their respective averages. It is thus reasonable to carry out standard statistical inference procedures.

It's rather like tossing a coin. We can suppose that any given coin has a particular propensity to come down heads. If we were to toss it many, many times we might end up with a very good idea of what that propensity is. It might be 50:50 heads:tails or, in the case of a biased coin, it might be 60:40. In practice we wouldn't toss the coin millions of times, say, but a couple of hundred times isn't too arduous and the resulting proportion of heads would give us an insight into the coin's underlying propensity towards heads. If we obtained 110 heads out of 200, we might reasonably argue that this is not inconsistent with a long-term propensity of 50:50 heads:tails. Although we would expect 100 heads out of 200 tosses, the process is a random one and a small deviation is not unlikely.
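To put a rough number on "not unlikely", here is a minimal sketch in Python. It assumes the tosses are independent and that the coin really is fair; the function name prob_at_least is my own, purely for illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n independent tosses with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a fair coin gives 110 or more heads in 200 tosses.
upper_tail = prob_at_least(110, 200)

# A fair coin is symmetric, so a result at least as far from 100 in either
# direction (110 or more heads, or 90 or fewer) is twice as likely.
two_sided = 2 * upper_tail

print(f"P(110 or more heads) = {upper_tail:.3f}")
print(f"P(10 or more away from 100) = {two_sided:.3f}")
```

If the second figure is well above the conventional 5% benchmark, 110 heads gives us little reason to doubt a 50:50 coin.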

The problem for me in this approach is that I don't believe football matches are like coin tosses. Every coin toss is essentially the same as every other coin toss: though the outcome varies, the circumstances surrounding it are the same. And you can, in principle, repeatedly toss the coin as often as you want. But football matches are subject to many different variables: no two matches are the same. And you cannot endlessly repeat matches in an attempt to discern an underlying goals-per-game average.

In some respects probability and statistics are not like other mathematical disciplines: there are differing schools of thought. There isn't universal agreement, for example, on what a probability actually is and how you can calculate it. Some people believe that a probability is inherent to a situation so that all observers would agree on what that probability is. Others argue that if you have different information to someone else you can reasonably come to a different assessment of the probability of a particular event happening.

Mastrodomenico believes that any given football season has an underlying goals-per-game average and that we can gain insight into what that average is by looking at the actual games played. I look at it differently. I believe we can only go by those actual games. I think football matches are fundamentally different to coin tosses and cannot be treated in the same way. He concludes that we have insufficient evidence of a difference between last season and this season; I say there is a difference. Of course, whether that difference is meaningful and what caused it are completely different questions . . .

Monday
Nov 28 2011

Is Lord Sugar a psychic octopus?

During the World Cup last year, a psychic octopus rose to fame by correctly predicting the outcome of eight football matches. Despite the relatively low probability of this happening by chance, most people are content to believe that the octopus was just guessing.

Does Young Apprentice work in the same way? Harry M has now been in the boardroom on the losing team six times out of six. Is it fair to conclude that he's the weakest contestant?

Perhaps he is, though he's had some notable and impressive moments. Yet given that the teams are regularly mixed up, is it not inevitable – or, at least, not unlikely – that one person would find themselves on the losing team every week simply by chance?

And does the winning team always win because they're the best? What role does randomness play? There seems to be a lot of post hoc ergo propter hoc analysis in the boardroom. James made great play of the fact that his team found a pocket watch at a considerably cheaper price than the opposition. But might that not have been more luck than judgement? As Zara countered, the watch she bought was the cheapest of the ones she saw; indeed, it was the cheapest by a large margin.

If the World Cup's octopus was lucky rather than psychic, might not Harry M be unlucky rather than a weak contestant?
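A rough simulation gives a feel for the numbers. This is only a sketch under assumptions of my own: the octopus is making eight independent 50:50 guesses, there are 12 contestants who are split at random into two equal teams each week, the losing team is decided by pure chance, and nobody is fired along the way (which the real programme does not respect).

```python
import random

def someone_always_loses(contestants=12, weeks=6, rng=random):
    """Simulate one series; return True if anyone is on the losing team every week."""
    always_lost = set(range(contestants))
    for _ in range(weeks):
        people = list(range(contestants))
        rng.shuffle(people)
        # The first half of the shuffle is the losing team, which amounts to
        # a random split into two teams followed by a coin-flip result.
        losing_team = set(people[: contestants // 2])
        always_lost &= losing_team
    return bool(always_lost)

def estimate(trials=20_000, seed=1):
    """Estimate the chance that at least one contestant loses every week."""
    rng = random.Random(seed)
    hits = sum(someone_always_loses(rng=rng) for _ in range(trials))
    return hits / trials

octopus_by_luck = 0.5 ** 8     # eight correct calls from eight 50:50 guesses
one_named_person = 0.5 ** 6    # a specific contestant losing six out of six
anyone_at_all = estimate()     # at least one contestant losing six out of six

print(octopus_by_luck, one_named_person, anyone_at_all)
```

The point of the comparison is that "Harry in particular loses six times" is unlikely, whereas "somebody or other loses six times" is a good deal less so.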

More formal analyses along similar lines to the ones I'm suggesting here have been done in, for example, investment management. There is evidence to suggest that many apparently successful fund managers may merely be consistently lucky.

Friday
Oct 21 2011

Significance isn't always significant

The BBC report the new climate change data with the headline Global warming since 1995 "now significant". While there is an attempt to explain what this means, I think the headline is deceptive and the explanation is unclear. Let me try to do better.

Suppose I have a coin which I suspect is biased in favour of heads. I toss the coin ten times and it comes down heads six times. What conclusions can I draw from this? Well, if the coin is not biased I would have expected it to come down heads five times out of ten. But, crucially, coin tosses are random: if you repeatedly toss a fair coin ten times it doesn't come down heads exactly five times out of ten every single time. Sometimes you get six heads; sometimes you get four. So the fact that I got heads six times is entirely consistent with the coin being a fair coin: for an unbiased coin to come down heads six times out of ten is not that unusual. This evidence is not, therefore, sufficient to support my suspicion that the coin is biased.

On the other hand, if my coin had come down heads nine times out of ten, I may start to feel justified in my belief that it is biased. Fair coins do sometimes come down heads nine times out of ten, but only quite rarely. If my coin did so I may prefer to believe that it is biased, since coming down heads nine times out of ten is not that unusual for a biased coin.
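For anyone who wants the actual figures, the two cases can be worked out exactly. This is a small sketch assuming a fair coin and independent tosses; prob_at_least is just a name I have made up.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n):
    """Exact probability of k or more heads in n tosses of a fair coin."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2**n)

six_or_more = prob_at_least(6, 10)
nine_or_more = prob_at_least(9, 10)

print(f"6 or more heads out of 10: {float(six_or_more):.3f}")   # 0.377
print(f"9 or more heads out of 10: {float(nine_or_more):.3f}")  # 0.011
```

So a fair coin gives six or more heads more than a third of the time, but nine or more only about once in a hundred attempts.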

An outcome is said to be statistically significant if the probability of it occurring by chance is very low. How low is low is entirely subjective. A benchmark that is often used is 5%, though the researchers at CERN who are looking for the Higgs boson use a far lower probability, as did the physicists who announced they may have found particles travelling at a speed faster than that of light.

But what statisticians mean by "significant" is simply that there is evidence to suggest, for example, that the coin is biased. The term says nothing about whether the bias is meaningful in practical terms: practical significance. A conclusion of statistical significance does not distinguish between a coin that is biased in such a way that it comes down heads 50.1% of the time and a coin that comes down heads 90% of the time.
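To see how a practically trivial bias can still be statistically significant, here is a sketch that uses the usual normal approximation (my assumption, along with the conventional cut-off of about 1.96 standard errors for the 5% benchmark). It supposes the observed fraction of heads is exactly 50.1% and asks how far that is from 50% for different numbers of tosses.

```python
from math import sqrt

def z_score(heads_fraction, tosses):
    """Number of standard errors between the observed fraction of heads and 0.5."""
    standard_error = sqrt(0.25 / tosses)  # spread of the fraction for a fair coin
    return (heads_fraction - 0.5) / standard_error

# The same tiny bias looks like noise in a small sample and like strong
# evidence in a very large one.
for tosses in (1_000, 10_000_000):
    print(tosses, round(z_score(0.501, tosses), 2))
```

With enough tosses even the 50.1% coin clears the bar, which is exactly why "significant" on its own tells you nothing about the size of the effect.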

The latest climate change data does offer evidence that the temperature has increased. But there are two reasons to be cautious. First, it remains possible (though unlikely) that the increase is simply a random variation, equivalent to a fair coin coming up heads nine times out of ten. Second, even if the increase is real, it may not be significant in practical terms.

Wednesday
Aug 17 2011

What’s the point of Fermat’s last theorem?

Today Google celebrates Pierre de Fermat’s 410th birthday. He is most famously known for his “last theorem”, the subject of Simon Singh’s excellent book. (If you know about Fermat’s last theorem, jump down to "Molina’s Urns" below.)

The theorem is a generalisation of Pythagoras’s theorem. Every schoolchild knows that, for a right-angled triangle,

$$x^2 + y^2 = z^2$$

where x, y and z are the lengths of the three sides of the triangle, with z the length of the longest side.

Some right-angled triangles have sides whose lengths are whole numbers. For example, a triangle whose sides have lengths 3, 4 and 5 is right-angled because

$$3^2 + 4^2 = 9 + 16 = 25 = 5^2$$

Fermat was considering the more general equation

$$x^n + y^n = z^n$$

where n is a positive whole number. He wondered whether positive whole numbers x, y and z could be found that satisfied this equation if n was larger than 2.

He claimed that no such values could be found and he noted the fact in the margin of an algebra textbook he was reading. He wrote, “I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain.”
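You can get a feel for the claim with a brute-force search. The sketch below is mine and proves nothing, of course: it only checks values up to a limit I picked arbitrarily, and the function name solutions is hypothetical.

```python
def solutions(n, limit):
    """All (x, y, z) with 1 <= x <= y <= limit, z <= limit and x**n + y**n == z**n."""
    powers = {z**n: z for z in range(1, limit + 1)}  # look up z from its nth power
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            z = powers.get(x**n + y**n)
            if z is not None:
                found.append((x, y, z))
    return found

print(solutions(2, 20))   # includes Pythagorean triples such as (3, 4, 5)
print(solutions(3, 200))  # an empty list
```

For n = 2 the whole-number solutions turn up straight away; for n = 3 the search comes back empty, as Fermat said it would.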

Well, that was in 1637. It wasn't until 1995 that Andrew Wiles and Richard Taylor published a proof of the theorem, some 358 years later.

(Incidentally, Fermat was both a lawyer and a mathematician. As am I :P)

Molina’s Urns

Some time before Fermat’s last theorem was proved, E. C. Molina invented a problem whose solution relied upon it. The problem would only have a solution if Fermat’s last theorem was proved to be false.

There are two urns each containing the same number of balls. Each ball is either black or white.

From each urn the same number of balls is selected. Each time a ball is selected from an urn its colour is noted and it is then put back into the urn before the next ball is selected.

The problem is this. Can the distributions of black and white balls in each urn be chosen such that the following condition is satisfied: the probability that the balls drawn from the first urn are all white must equal the probability that the balls drawn from the second urn are either all white or all black.

Let’s analyse the problem.

We define the following variables:

x = the number of white balls in the second urn

y = the number of black balls in the second urn

z = the number of white balls in the first urn

n = the number of balls selected from each urn

Since the total number of balls in each urn is the same, there are x + y balls in each urn.

The probability that a white ball is selected from the first urn is

$$\frac{z}{x+y}$$

Therefore the probability that all of the balls drawn from the first urn are white is

$$\left(\frac{z}{x+y}\right)^n$$

Similarly, the probability that all of the balls drawn from the second urn are white is

$$\left(\frac{x}{x+y}\right)^n$$

and the probability that all of the balls drawn from the second urn are black is

$$\left(\frac{y}{x+y}\right)^n$$

We require the probability that all of the balls drawn from the first urn are white to be equal to the probability that the balls drawn from the second urn are either all white or all black. In other words

$$\left(\frac{z}{x+y}\right)^n = \left(\frac{x}{x+y}\right)^n + \left(\frac{y}{x+y}\right)^n$$

which reduces to

$$x^n + y^n = z^n$$

So in order to satisfy the requirement of the problem we must be able to find whole numbers which satisfy this equation.

We can certainly do this if n = 2, since the equation becomes Pythagoras’s theorem. Thus we can have five white balls and two black balls in the first urn, and three white balls and four black balls in the second urn.

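If we draw two balls from each urn, the probability that both balls drawn from the first urn are white is

$$\left(\frac{5}{7}\right)^2 = \frac{25}{49}$$

and the probability that the two balls drawn from the second urn are either both white or both black is

$$\left(\frac{3}{7}\right)^2 + \left(\frac{4}{7}\right)^2 = \frac{9}{49} + \frac{16}{49} = \frac{25}{49}$$

which satisfies the requirement of the problem.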

If, however, we choose more than two balls from each urn, Fermat’s last theorem tells us that there are no mixtures of black and white balls in each urn that will satisfy the requirement of the problem.
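The urn arithmetic is easy to check with exact fractions. The sketch below uses the same urns as the example above (x = 3, y = 4, z = 5); the function names are my own, and it simply shows the two probabilities agreeing for two draws and parting company for three.

```python
from fractions import Fraction

def all_white_first(z, total, n):
    """Probability that n draws (with replacement) from the first urn are all white."""
    return Fraction(z, total) ** n

def all_same_second(x, y, n):
    """Probability that n draws from the second urn are all white or all black."""
    total = x + y
    return Fraction(x, total) ** n + Fraction(y, total) ** n

x, y, z = 3, 4, 5
for n in (2, 3):
    first = all_white_first(z, x + y, n)
    second = all_same_second(x, y, n)
    print(n, first, second, first == second)
```

With two draws both probabilities are 25/49; with three draws they are 125/343 and 91/343, so these particular urns no longer work, and Fermat’s last theorem says no others will either.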

(I found this interesting problem in Fifty Challenging Problems in Probability, by Frederick Mosteller.)

Monday
Jul 04 2011

Why I (mostly) hate league tables

There are many reasons to object to league tables, and the table above illustrates one of them very well: a tiny change in what you measure results in a big change in league table position.

The three columns rank research institutes according to three different, but similar, metrics. Yet the resulting rankings give very different results. For example, the Max Planck Society tops the first two tables, yet falls to 20th place in the third.

That 20th place ranking illustrates another objection I have to tables such as these. The metric on which the institutes are judged in the third column is measured to two decimal places. This is bizarrely specific for something that has a number of subjective features. The definition of a "highly cited paper" is arbitrary, and being cited is itself somewhat arbitrary (and, indeed, may not be evidence that the paper being cited is good: the citation may be a criticism, contradiction or refutation).

Of course, there is an exception to every rule. The table below, in which the University of Cambridge is ranked number one, is entirely beyond any criticism.

(As is this year’s Tompkins table, which ranks Cambridge colleges by exam success. Trinity College comes top. By happy coincidence, I was at Trinity :)