You want to go out for a walk this afternoon, but you’re worried that it might rain. You turn on the television: the forecast is for rain. Should you give up on your walk?

You decide to do a little research. You go to the weather forecaster’s website and discover that they claim a 90% accuracy rate: out of 100 days on which it rained, they predicted it would rain on 90 of those days. Sounds pretty good.

Digging a little deeper, you discover that out of 100 days on which it did not rain, they correctly predicted it would be dry on 80 of those days. That’s not too bad, either.

It looks like the forecaster is pretty reliable. You decide to go ahead with your walk but you take an umbrella with you, trusting the forecast of rain.

It’s bright sunshine the whole time! You didn’t need the umbrella at all!

Why?

Because you didn’t use **Bayes’ theorem**.

You see, it turns out that it rains only 10% of the time where you live. So in 100 days, it rains on 10 of those days. And the weather forecaster, with its 90% accuracy rate, would correctly predict rain on 9 of those 10 days.

However, it doesn’t rain on the other 90 days out of 100, and the weather forecaster would wrongly predict rain on 20% of these. So on 18 days the forecast would be for rain when it didn’t actually rain.

In total then, the weather forecaster predicts rain on 9 + 18 = 27 days out of 100. But on only 9 of those days does it actually rain. So the proportion of days on which it rains when the weather forecaster has predicted rain is 9/27, which is only one third. That’s pretty unreliable.

The impressive statistic (“90% accuracy!”) on the weather forecaster’s website was the answer to the following question: “Given that it did in fact rain, what is the probability that the *forecast* was for rain?”

The problem arose because this question is the wrong way round. What you really want to know is, “Given that the forecast is for rain, what is the probability that it will actually rain?” The statistic here is much less impressive: about 33%.
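The day-counting argument above can be sketched in a few lines of Python. This is only an illustration: the function and variable names are my own, and the three rates are the ones quoted in the story.

```python
# Count days, exactly as in the story: a run of 100 days.
DAYS = 100
RAIN_RATE = 0.10          # it rains on 10% of days
HIT_RATE = 0.90           # forecast says "rain" on 90% of rainy days
DRY_CORRECT_RATE = 0.80   # forecast says "dry" on 80% of dry days


def rain_given_forecast(days, rain_rate, hit_rate, dry_correct_rate):
    """Return (correct rain forecasts, false alarms, P(rain | forecast of rain))."""
    rainy_days = days * rain_rate
    dry_days = days - rainy_days
    true_alarms = rainy_days * hit_rate                # rain forecast, and it rained
    false_alarms = dry_days * (1 - dry_correct_rate)   # rain forecast, but it stayed dry
    return true_alarms, false_alarms, true_alarms / (true_alarms + false_alarms)


true_alarms, false_alarms, p_rain = rain_given_forecast(
    DAYS, RAIN_RATE, HIT_RATE, DRY_CORRECT_RATE
)
print(f"{true_alarms:.0f} correct rain forecasts, {false_alarms:.0f} false alarms")
print(f"P(rain | forecast of rain) = {p_rain:.3f}")
```

Changing `RAIN_RATE` to something like 0.5 shows how much the answer depends on how often it rains in the first place.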

Why did this happen?

Although the weather forecaster often correctly predicts rain when it actually rains, it doesn’t rain very often, so the number of days on which it rains and on which rain is predicted is small (9 days). And although the weather forecaster rarely predicts rain when it doesn’t rain, there are many days on which it doesn’t rain, so there are many opportunities for an incorrect forecast (18 days out of 100).

Thus a prediction of rain is more often associated with a dry day than with a wet day. And that’s what happened to you today.

A similar problem arises in diagnostic testing for diseases: for *rain* read ‘disease’, for *forecast* read ‘diagnostic test’. Bayes’ theorem says that the question of interest is “Given that the test is positive, what is the probability that the patient actually has the disease?”

There are two things we wish to avoid. A **false positive** occurs when a healthy patient is diagnosed as having the disease. (Statisticians creatively call these **Type I errors**.) A **false negative** occurs when a patient with the disease is diagnosed as being healthy. (Statisticians creatively call these **Type II errors**.)

Pregnancy isn’t a disease, but the picture below illustrates the distinction between the two types of error.

The answer to our question – “Given that the test is positive, what is the probability that the patient actually has the disease?” – is the ratio of ‘the number of sick patients who get a positive test result’ to ‘the number of patients (both sick and healthy) who get a positive test result’. (If you like: *true positives* divided by *all positives*.)

For this ratio to be high (i.e. for the diagnostic test to be reliable) we need the number of false positives to be very low.

For example if we have 10 true positives and 1 false positive, then the proportion of true positives is 10/11, which is very high. But if we have 10 true positives and 10 false positives, then the proportion is 10/20, which is no better than diagnosis by tossing a coin!

Problems arise when the **base rate** of the disease amongst people who are tested is low. In a screening programme for a rare disease, even a low rate of false positives will throw up a large number of positive test results: so many of the people tested are healthy that a small proportion of them is still a sizeable number of people, all of whom will be wrongly diagnosed. And even if the test is very good at identifying sick people, the actual number of sick people is low (because the disease is rare), so the number of true positives may not be very high. The ratio of *true positives* to *all positives* may, therefore, not be very high, as in my rain example.
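To see how strongly the base rate matters, here is a rough sketch that computes *true positives* divided by *all positives* for a range of base rates. The test characteristics (99% of sick patients test positive, 5% of healthy patients test positive) are invented purely for illustration, as are the function and variable names.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Proportion of positive results that belong to patients who really are sick."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed figures for a hypothetical test, not taken from any real one.
SENSITIVITY = 0.99           # chance a sick patient tests positive
FALSE_POSITIVE_RATE = 0.05   # chance a healthy patient tests positive

for base_rate in (0.5, 0.1, 0.01, 0.001):
    ppv = positive_predictive_value(base_rate, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"base rate {base_rate:>6.1%}: P(sick | positive) = {ppv:.1%}")
```

With these made-up figures the same test goes from very trustworthy to nearly useless as the disease becomes rarer, which is the rain problem all over again.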

School league tables (or, as the *Department for Education* calls them, **School Performance Tables**) were published last month, to **much complaint from some private schools**. Schools such as Eton, Harrow and Winchester scored 0% because they no longer enter their students for GCSEs, preferring International GCSEs, which are thought to be more challenging and therefore better suited to more able pupils.

I’ve long been troubled by league tables, though for a different reason: an A grade is not the same as an A grade.

League tables are based on grades obtained in public examinations. I’m simplifying here, but basically you add up the number of A grades obtained in each school, divide by the number of students in that school and you get an average. Good schools have high averages, bad schools have low averages. Pretty uncontroversial, surely?

I argue that pretty much every step of the process is flawed. For example, adding up the number of A grades. This is only meaningful if all A grades are the same. Does an A grade in Maths mean the same thing as an A grade in Theatre Studies? (If so, *what* does it mean?)

Let’s try an easier question. Does an A grade in Maths mean the same thing as an A grade in Maths? That's a ridiculous question, surely? Well, no. Different schools use different exam boards for their maths exams. Can we be certain that A grades given by different exam boards mean the same thing? (OCR's exams have always struck me as much harder than Edexcel’s, for example.)

Let’s narrow it down further. Does an A grade in Edexcel’s Maths mean the same thing as an A grade in Edexcel’s Maths? Not necessarily. Students can choose which modules they sit. Are the exams on the Statistics modules directly comparable to those on the Mechanics modules? (Edexcel’s module in Decision Maths is often seen as markedly easier than the other modules, despite counting equally for the final grade.)

Hmm. OK, then. Does an A grade in Edexcel’s Maths mean the same thing as an A grade in Edexcel’s Maths where the modules taken are the same? Not if they’re not taken at the same time. Can we be certain that the C3 exam and the marks obtained in it by candidates are consistent from one exam sitting to another? (Edexcel’s C3 exam in June 2013 was **an internet sensation** within hours of the end of the exam because it was considered unusually difficult. Edexcel responded by dropping the grade boundaries quite markedly. How precise is that process? How precise could it possibly be?)

Surely I’ll concede that if two students sit the same maths modules set by the same exam board at the same time and both get A grades, then those two A grades are equal?

Nope. An A grade requires an average of 80 marks per module. Or more. One candidate could have got an average of 80. The second could have got an average of 90.

Which makes the second one better? Not necessarily. Maybe the second one got very high marks on the easier modules which boosted his average. C1 is the simplest module, but it counts equally. Very high marks in C1 can make up for low marks in, say, C3. Maybe the second student was sitting some of the modules for the second time, having tried them a year earlier and not done so well.

I think it’s perfectly possible that a student with a B grade is meaningfully better at maths than a student with an A grade. Yet the A grade student will be off to a top-ranked university, and the B grade student will have to settle for his second choice.

But now **I** have fallen into the league table trap. *Top-ranked university.* What does that mean? If you can’t even compare A grades in the same subject and be sure that you’re making a meaningful, consistent judgement, how can you compare entire universities and say that some are ‘better’ than others?

I'll bet there are some lecturers at London Metropolitan (ranked bottom of **the Guardian’s table**) who are better than some lecturers at Cambridge. (Uh oh. *Better.* What does that mean?) Stephen Hawking was a professor at Cambridge: that didn’t mean you’d be certain to be taught by him or even that you’d ever see him at all. And just because he’s incredibly clever doesn’t mean he’s an incredible teacher. I know I’m not the only person who gave up on **A Brief History of Time** well before the final chapter.

Malcolm Gladwell agrees with me. He wrote **an excellent piece** for the *New Yorker* on the subject of ranking colleges in the USA.

*The Economist* published **a story** about a competition that tested Stanley Milgram's famous ‘**six degrees of separation**’ claim.

DARPA, the research arm of the Department of Defense in the US, staged the **Red Balloon Challenge** in 2009. Competitors had to locate ten red weather balloons that had been tethered at random locations across the US.

The intention was not that one person drive around the country with a pair of binoculars. Rather, I might ask all my friends on Facebook to look out for a red balloon and tell me if they saw one. They might then ask all their friends, and so on.

The winning team from MIT found all of the balloons in just nine hours using this type of strategy. But to encourage participation they offered $2,000 to the first person to send them the co-ordinates of a balloon. On its own this may not have been very efficient. So, crucially, they also offered $1,000 to whoever recruited *that* person to the challenge, and $500 to whoever recruited *that* person to the challenge, and so on.

One interesting question mathematically is how much money did the MIT team stand to lose? The Red Balloon Challenge offered a prize of $40,000.

In principle, the sender of a winning set of co-ordinates might have been at the top of a long line of recruiters. Doesn’t this mean that the MIT team risked an enormous payout?

Well, no. Consider a recruitment chain of seven people. The total payout would be:

$2,000+$1,000+$500+$250+$125+$62.50+$31.25 = $3,968.75

The seventh payment of $31.25 is pretty small. If there were more people in the chain, their payments would be even smaller. Nonetheless, lots of small amounts can quickly add up to a large amount.

Suppose there were 17 people in the chain. The total payout would be $3,999.97 (to the nearest cent). The seventeenth person would have got 3¢.

Even so, there could have been many more people in the chain, and might not the total have slowly grown to an unaffordable amount?

Suppose the total amount payable is T and that there are infinitely many people in the chain. Then,

T = 2000 + 1000 + 500 + 250 + ...

Now multiply both sides of this by 2:

2T = 4000 + 2000 + 1000 + 500 + ...

Finally subtract the first equation from the second:

2T – T = 4000 + **2000** + **1000** + **500** + ... – (**2000** + **1000** + **500** + ...)

The terms in the brackets cancel out all of the terms at the beginning except the first one. Thus we end up with T = 4000.

So even if there had been infinitely many people in the chain, the total payout would have been $4,000. Since there were ten balloons that gives a grand total of $40,000 which was the value of the prize. MIT were certain to make at least a little profit, provided they actually won the prize. Since the kudos of winning was worth more than the prize, their investment was well worth the risk.
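If you would rather check the arithmetic than trust the algebra, a short sketch like this adds up the payments for chains of various lengths. The function name and the chain lengths I try are my own choices; the $2,000 first payment and the halving rule are from the story.

```python
def total_payout(chain_length, first_payment=2000.0):
    """Add up the payments along one recruitment chain, halving each time."""
    payment = first_payment
    total = 0.0
    for _ in range(chain_length):
        total += payment
        payment /= 2   # each recruiter gets half of what the previous person got
    return total


for chain_length in (1, 7, 17, 30):
    print(f"{chain_length:>2} people: ${total_payout(chain_length):,.2f}")
```

Chains of 7 and 17 people reproduce the $3,968.75 and $3,999.97 above, and however long you make the chain the total creeps towards $4,000 without ever passing it.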

Series such as this one are called **geometric series**. One of the earliest examples of them was **Zeno’s dichotomy paradox**. For me to walk 4000 metres, I first have to walk 2000m, i.e. half the total distance. I then have to walk 1000m, i.e. half the remaining distance. Then 500m. Then 250m. And so on. No matter how far I've travelled there's always half the remaining distance left to go. So I have infinitely many stages to complete and will therefore never get to the end of them.

The flaw in the argument is that last sentence. It assumes that the infinitely many stages of the journey will take infinitely long to get through. But they won't.

Suppose I walk at 1m/s. Then the first stage will take me 2000s. The second will take 1000s. The third 500s, and so on. So the total time taken will be T = 2000 + 1000 + 500 + ... We now know that this adds up to 4000. So I can finish the journey in a finite amount of time. (Indeed the 4000 seconds you would expect me to need to walk 4000m at 1m/s.)

The above screenshot shows an interesting point in a sequence of games of roulette. The results of the previous nine spins are shown at the bottom of the screen in reverse order, so black 17 was the most recent.

Above this we see the bets that have been made on the next spin. Note the huge stack of chips on red (circled) compared with the tiny stack betting on black. The gamblers clearly believe that a red is “overdue”, since the last seven spins have all been black. This is an example of the so-called *gambler’s fallacy*. Probability theory tells us that a black is equally likely on the next spin.

So what did happen next?

Not only was the next spin a black, it was black 17 *again*. In fact, black 17 had come up three times in the last nine spins.

This is a great example of how clumpy randomness is. People tend to associate randomness with evenness. In the very long run they’re right. In a very large number of spins, you would expect to see black about half the time, and black 17 about one time in 37. But in the short run, you often get clumpy results such as this.
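You can test the gamblers’ belief with a quick simulation. The sketch below assumes a European wheel (18 black, 18 red and one green zero, which matches the “one time in 37” above); the function name, the number of spins and the seed are arbitrary choices of mine.

```python
import random

BLACK, RED, GREEN = "black", "red", "green"
# Assumed European wheel: 18 black pockets, 18 red, 1 green zero.
WHEEL = [BLACK] * 18 + [RED] * 18 + [GREEN]


def black_after_streak(num_spins, streak_length=7, seed=0):
    """Estimate P(black | the previous `streak_length` spins were all black)."""
    rng = random.Random(seed)
    streak = 0          # current run of consecutive blacks
    opportunities = 0   # spins that came straight after a full streak
    blacks = 0          # how many of those spins were black
    for _ in range(num_spins):
        colour = rng.choice(WHEEL)
        if streak >= streak_length:
            opportunities += 1
            if colour == BLACK:
                blacks += 1
        streak = streak + 1 if colour == BLACK else 0
    return blacks / opportunities


estimate = black_after_streak(500_000)
print(f"P(black | seven blacks in a row) is roughly {estimate:.3f}")
print(f"P(black) on any spin is {18 / 37:.3f}")
```

The two numbers should come out close to each other: the wheel has no memory, so a run of blacks makes red no more likely.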

It’s because clumpiness doesn’t *feel* random that **Apple had to fiddle the shuffle feature** in iTunes to make it seem random by avoiding playing two songs by the same artist one after the other.

An exercise I like to use with students is to ask them to write down a sequence of 100 random digits generated from their own heads. Two things typically happen. First, they find it surprisingly hard. Initially they write their digits quite quickly, but they soon slow down. This is because they’re *thinking*: they’re trying to make the digits look random. (Sometimes they just give up and start writing down sequences they already know, like phone numbers.)

But the second thing that happens is that they fail. For example, about one time in ten you would expect the same digit to be repeated. So in a list of 100 random digits, you’d expect about ten repeats. Typically students will generate fewer than this. You’d also expect to see one example (on average) of three of the same digits in a row in a set of 100 random digits. In a class of students, this very rarely happens. Indeed, in a class of 30 students, you’d expect to see about three examples of four identical digits in a row. This never seems to happen because it just doesn’t feel random.

But it is. Randomness is clumpy.
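The expected counts quoted above come from a simple calculation, sketched here. The function name is mine, and I count overlapping runs separately (so three identical digits in a row count as two repeats), which is an assumption about how to do the counting.

```python
def expected_runs(num_digits, run_length):
    """Expected number of places where `run_length` identical digits sit in a row."""
    starting_positions = num_digits - run_length + 1
    # The first digit of a run can be anything; each later digit
    # matches it with probability 1/10.
    return starting_positions * (1 / 10) ** (run_length - 1)


CLASS_SIZE = 30

pairs = expected_runs(100, 2)
triples = expected_runs(100, 3)
quads_in_class = CLASS_SIZE * expected_runs(100, 4)

print(f"Repeated digits per student: about {pairs:.1f}")
print(f"Three in a row per student: about {triples:.2f}")
print(f"Four in a row across the class: about {quads_in_class:.1f}")
```

So a genuinely random list of 100 digits should contain roughly ten repeats and one triple, and a class of 30 should manage about three runs of four between them.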

Of course you know the answer to this: the Sun. (Though often when I ask people they tend to look sheepish and say nothing, for fear of being wrong!)

Actually, it’s not at all obvious which one is bigger: after all, they look the same size. You know which one is bigger because you were told it when you were very young. Which rather spoils the pleasure of trying to work it out. But *how* do we know the Sun is bigger?

It’s fairly easy to establish that the Sun is at least a bit bigger than the Moon. They look the same size in the sky, but during a solar eclipse the Moon completely covers the face of the Sun. So the Sun must be further away than the Moon and therefore it must be bigger. But *how* much bigger?

To establish this we need another rare sighting in the sky: have you ever seen the Moon and the Sun in the sky at the same time? It does happen, as the picture above shows. (Indeed, Lewis Carroll wrote about it in the first two verses of **The Walrus and the Carpenter**.)

But imagine that you can see both the Sun and the Moon in the sky together and that the Moon is a *half* Moon. What must the geometry of the solar system be at that moment?

The Sun, Moon and Earth form a giant right-angled triangle, with the Moon at the right angle. If we can measure the Sun-Earth-Moon angle, then we can use trigonometry to find the relative distances of the Sun and the Moon from Earth. Since the Sun and the Moon look the same size in the sky, these relative distances must also be the relative sizes of the Sun and Moon themselves.

The hard part is measuring the angle. You need it to be *exactly* a half Moon. You need to measure the angle between the Sun (**without** looking at it directly), you and the Moon.

But it gets worse: if you’re even *slightly* wrong with your measurement, the relative distances and sizes change by a surprisingly large amount. This is because the angle is very nearly 90°, and that is because the Sun is really *very* much farther away than the Moon and *very* much bigger. In fact the angle is about 89 5/6°, which indicates that the Sun is about 340 times the size of the Moon. But if you measure it as 89 4/6°, you’ll get the Sun being only 170 times the size of the Moon, in other words *half* the size (and therefore one-eighth of the volume, and one-eighth of the mass). So your measurement needs to be incredibly precise!

**Notes**

The diagram above is merely intended to indicate the relative positions of the Sun, Moon and Earth. The sizes are not to the same scale (compared with the Earth, the Sun should be vastly bigger than shown and the Moon slightly smaller) and the distances are not in proportion (the Earth-Sun distance is substantially bigger than the Earth-Moon distance – indeed, the triangle should look almost like two parallel lines).

The Sun is, in fact, almost exactly 400 times the size of the Moon, by which I mean its radius is about 400 times the radius of the Moon. This means that its volume is 400×400×400 times the volume of the Moon, which makes it 64 million times as large. (The ratio of the masses of the Sun and the Moon is different, however, because they do not have the same average density.)

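The figures of 340 and 170 come from tan(89 5/6°) and tan(89 4/6°), which strictly give the ratio of the Moon–Sun distance to the Moon–Earth distance. (At angles this close to 90° that is practically the same as the ratio of the Earth–Sun distance to the Earth–Moon distance.) It looks like an incredible coincidence that, although the angle differs by only one-sixth of a degree, one tangent is (almost exactly) double the size of the other. (In fact, it’s double to one part in a hundred thousand!) It is less mysterious than it seems: the two angles fall short of 90° by 1/6° and 2/6°, and for such tiny shortfalls the tangent is very nearly inversely proportional to the shortfall.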
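The figures in these notes are easy to reproduce with a few lines of Python. This is just a sketch using the standard `math` module; the function and variable names are my own.

```python
import math


def tangent_ratio(angle_degrees):
    """Moon-Sun distance divided by Moon-Earth distance for a given Sun-Earth-Moon angle."""
    return math.tan(math.radians(angle_degrees))


def hypotenuse_ratio(angle_degrees):
    """Earth-Sun distance divided by Earth-Moon distance for the same angle."""
    return 1 / math.cos(math.radians(angle_degrees))


accurate = tangent_ratio(90 - 1 / 6)   # angle of 89 5/6 degrees
sloppy = tangent_ratio(90 - 2 / 6)     # angle of 89 4/6 degrees

print(f"tan(89 5/6 degrees) = {accurate:.2f}")
print(f"tan(89 4/6 degrees) = {sloppy:.2f}")
print(f"ratio of the two    = {accurate / sloppy:.6f}")
print(f"Earth-Sun : Earth-Moon at 89 5/6 degrees = {hypotenuse_ratio(90 - 1 / 6):.2f}")
```

The two tangents come out at roughly 343.77 and 171.89, so an error of a sixth of a degree halves your estimate of the Sun’s size.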

Why do we use radians? After all, there’s nothing wrong with degrees. Everyone understands degrees. Almost any fraction of a circle is a whole number of degrees. In radians, it seems like every angle is an irrational number. So what’s the point?

The truth is that no-one uses radians to measure angles. We use radians because it makes the graph of sin(*x*) look nice. If you draw the **graph of sin(*x*) in degrees**, it is a virtually flat, featureless graph: the

This is not a trivial point. Look at the **graph of sin(*x*) in radians near to the origin**. It looks like a straight line. Specifically, it looks like the line

So what is sin(18°)? Well, 18° is a tenth of 180°, so it’s a tenth of π, i.e. about 0.31. So the sine of 18° is equal to the sine of 0.31 radians. But the sine of a small angle in radians is approximately equal to the angle itself. Thus sin(18°) ≈ sin(0.31) ≈ 0.31. Easy!

(If you check this on a calculator, you’ll see it’s correct!)

But how do we find sin(54°)? For angles that large, the sine graph doesn’t look at all like a straight line. So what do we do then? What does the calculator do? (Hint: the **graph of sin(*x*) looks a little bit like a cubic** equation between –180° and +180°.)
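Here is a rough sketch comparing the straight-line approximation with one possible cubic. The cubic I have chosen, *x* – *x*³/6 with *x* in radians, is the standard Taylor polynomial for sine; that choice is my assumption, and I make no claim that it is what any particular calculator really does. The function names are my own.

```python
import math


def straight_line_sine(degrees):
    """Approximate sin by the angle itself, measured in radians."""
    return math.radians(degrees)


def cubic_sine(degrees):
    """Approximate sin by the cubic x - x^3/6, with x in radians."""
    x = math.radians(degrees)
    return x - x**3 / 6


for degrees in (18, 54):
    exact = math.sin(math.radians(degrees))
    print(
        f"{degrees} degrees: sin = {exact:.4f}, "
        f"straight line = {straight_line_sine(degrees):.4f}, "
        f"cubic = {cubic_sine(degrees):.4f}"
    )
```

At 18° the straight line is already good to two decimal places. At 54° it is badly off, while the cubic is still within about 0.01 of the true value.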

I was idly browsing **Quora**, when I came upon the following sequence:

1, ∞, 5, 6, 3, 3, 3, ...

It is highly unusual to have infinity as a term in the middle of a sequence – whatever could it be?

Well, the *n*th term is the number of **convex regular polytopes in *n* dimensions**. Ah! That explains it?

It’s easiest to start with the second term of the sequence. It’s the number of **regular polygons** that you can make. A regular polygon is a plane figure with straight sides all of equal length and with all of its angles equal. The most obvious is perhaps a square. But we additionally require the polygons to be *convex:* this means that the sides cannot turn in on themselves. The figure below on the left is a convex regular pentagon; the figure on the right is also a regular pentagon, but it is not convex.

It should be fairly obvious that you can construct regular polygons with any number of sides:

So the number of convex regular polygons is infinite. And that’s the second term of the sequence.

What about the others? Let’s look at the third term: 5. This is the number of **regular polyhedra**. That is, the number of three-dimensional shapes in which every face is the same regular polygon and the same number of faces meet at every corner. (Again, we require them to be convex.) The five regular polyhedra are called the **Platonic solids**, and they are illustrated in the photograph at the top of this blog. From left to right: icosahedron, dodecahedron, cube, octahedron and tetrahedron. It is not possible to construct any other regular polyhedra: the angles won’t fit together to form a closed solid.
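The “angles won’t fit” argument can be turned into a short search. At each corner of a convex solid the face angles that meet there must add up to less than 360°, otherwise the faces lie flat or overlap. The sketch below tries every combination of *p*-sided faces with *q* faces meeting at a corner; the function name and the search limit of 20 are my own choices. It only checks this one necessary condition, so it shows that nothing else can exist, not that each survivor really can be built.

```python
def angle_test_survivors(max_value=20):
    """Return the (p, q) pairs whose face angles at a corner total less than 360 degrees."""
    survivors = []
    for p in range(3, max_value + 1):        # number of sides on each face
        for q in range(3, max_value + 1):    # number of faces meeting at each corner
            # A regular p-gon has interior angles of 180 * (p - 2) / p degrees.
            # Requiring q of them to total less than 360 simplifies to q * (p - 2) < 2 * p.
            if q * (p - 2) < 2 * p:
                survivors.append((p, q))
    return survivors


NAMES = {
    (3, 3): "tetrahedron",
    (3, 4): "octahedron",
    (3, 5): "icosahedron",
    (4, 3): "cube",
    (5, 3): "dodecahedron",
}

for p, q in angle_test_survivors():
    print(f"{q} faces with {p} sides each at every corner: {NAMES[(p, q)]}")
```

Only five combinations pass the test, and they are exactly the five Platonic solids. Raising the search limit makes no difference, because three hexagons already use up the full 360°.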

In informal language, then, the sequence is the number of regular shapes in one dimension, two dimensions, three dimensions, and so on. The word “polytope” is the generic term that covers polygons (two dimensions), polyhedra (three dimensions), and all the others in higher dimensions. Perhaps surprisingly, it is the two-dimensional world that offers the greatest variety.
