The **BBC reports** that 2014 Oscar winner Dan Piponi, who was part of the team which pioneered simulating smoke and fire in films such as Avatar and Puss In Boots, said: "Nobody told me if I wanted to get an Academy Award, I should study mathematics." Joshua Pines, who worked on the film Coraline, was honoured for developing image-processing mathematics to standardise colour.

How many people do you think need to be in a room such that it is more likely than not that two of them share the same birthday?

The answer may surprise you. It's 23.

I mentioned this to one of my students yesterday and he didn't believe me. OK, I said, what's your favourite football club? **Chelsea**. So we looked at the biographies of the members of the **first team** to see if any share a birthday. Chelsea's website listed 24 members of the first team, though only 23 had their birthdays listed. So there was a 50:50 chance we'd find a match.

And we did! **David Luiz** and **John Mikel Obi** both have their birthday on 22nd April.

Here's the best part: so do I! What are the chances of that?!

Let's start with the Chelsea players. We looked at 23 of them. We needed to find a matching birthday. How many potential matches are there? First on the list is Petr Cech. We can compare his birthday with any of the remaining 22 players. So that's 22 potential matches. Next on the list is Branislav Ivanovic. We've already compared him with Petr Cech, but there are still 21 other players to compare his birthday with. That's a total of 43 comparisons so far. Then there's Ashley Cole. We've already compared him with both Petr Cech and Branislav Ivanovic, but that still leaves 20 other players to compare his birthday with. So now we've looked at 63 different pairs of players.

There's a pattern here. The total number of comparisons we can make amongst the 23 players is 22+21+20+19+...+3+2+1 = 253. That's a surprisingly large number and suddenly it's not looking quite so surprising that we've got a 50:50 chance of finding a match.

For any given pair, the probability they share a birthday is 1/365, ignoring leap-year birthdays. So the probability they don't share a birthday is 1–1/365 = 364/365.

The probability no pair shares a birthday is (364/365)^253, since there are 253 pairs, each of which must not share a birthday.

So the probability that there is (at least one) pair that does share a birthday is 1–(364/365)^253, which is 0.5005 to 4 decimal places. Pretty much bang on 50:50.
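This calculation is easy to check in a few lines of Python (a minimal sketch using the pairwise approximation above, which ignores leap years and treats the 253 pairs as independent):

```python
from math import comb

# Number of distinct pairs among 23 people: 22 + 21 + ... + 2 + 1
pairs = comb(23, 2)
print(pairs)  # 253

# Probability that no pair shares a birthday, treating pairs as independent
p_no_match = (364 / 365) ** pairs

# Probability that at least one pair shares a birthday
p_match = 1 - p_no_match
print(round(p_match, 4))  # 0.5005
```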

The probability we would find two Chelsea first team players sharing a birthday *with me* is 0.5005 × 1/365 ≈ 0.00137, or about 1 in 730. Which is pretty small!

All this inspired me to dig a little deeper.

First I drew a graph. This shows the probability of two people having the same birthday on the y-axis and the number of people in the room on the x-axis. The red line shows a probability of 50%, which crosses the graph at 23 people. The blue line shows a probability of 95%, which crosses the graph at 48. So if you have 48 people in a room it is very likely there'll be two people sharing a birthday.
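The crossing points can be reproduced with a short search (a sketch; `threshold` is a helper name of my own, and it uses the same pairwise approximation as above; an exact product calculation puts the 95% crossing at 47 rather than 48):

```python
def threshold(target):
    """Smallest number of people for which the (approximate) probability
    of a shared birthday first reaches the target probability."""
    n = 1
    while True:
        n += 1
        pairs = n * (n - 1) // 2  # number of distinct pairs among n people
        if 1 - (364 / 365) ** pairs >= target:
            return n

print(threshold(0.5))   # 23
print(threshold(0.95))  # 48
```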

I then looked at three people sharing a birthday, and four, and five, and six, and seven, and . . . then my computer could no longer handle the calculations.

I discovered, for example, that if you have 800 friends on Facebook, it is *certain* that there will be at least one day in the year on which seven of your friends share a birthday. By "certain" I mean that the probability is so close to 1 that there is only a tiny, tiny chance that you won't have such a day. (It's **0.00002**!)

I produced a **graph** on **Desmos** to illustrate all this. The slider on the left varies the value of k, the number of people sharing a birthday: you can vary it from 2 to 7. The horizontal axis is the number of people in the room (or the number of Facebook friends you have) and the vertical axis is the probability that you will have k friends who share the same birthday.

In fact this problem has been exhaustively researched and discussed. It's called the **Birthday Problem**, or sometimes the Birthday Paradox (as the number of people required is much smaller than you might imagine).

**UPDATE**

I tried this with a couple of other students, with equally impressive results. When we went through the **Arsenal first team**, we found a pair *in the first two players* we tried: **Wojciech Szczesny** and **Lukasz Fabianski** were both born on 18th April. With **Manchester United**, it took five players to find a match: **David De Gea** and **Rio Ferdinand** were both born on 7th November.

The Daily Telegraph gives some excellent advice on applying to medical schools.

Getting into medical school is hard. I didn’t realise just how hard until I started to research it. According to Ucas figures for 2012 entry, there were 82,489 applications to medical courses for only 7,805 places. This means there were 10.6 applicants for every place.

To give yourself the best possible chance of success – or at least a fighting chance of an interview – your application needs to fulfil the tough academic requirements and have an “X factor” that will catch the eye of the admissions tutor, too.

For prospective medical students like me, it is a daunting prospect. How to succeed? I talked to doctors, medical students and academics involved in the admissions process to find out.

First, the basics: you need top grades – not just at A-level, but also at GCSE. Candidates with A/A* GCSE results in English language, maths and science are preferred, and in reality most successful applicants will boast As and A*s in a wide range of subjects. Nearly all universities ask for chemistry A-level and at least one other science: some insist this should be biology. A third A-level is needed, and it can be any subject (although most medical schools will not accept general studies or critical thinking). Realistically, to gain an offer your predicted grades must be AAA at least. If you do get a conditional offer, your place at medical school will be assured by meeting the required grades: AAA, or even A*AA at some universities.

Most universities ask applicants to take either the BMAT or the UKCAT aptitude tests, which examine GCSE scientific knowledge and aptitude for medicine by assessment of verbal reasoning, data analysis, abstract reasoning, decision-making and judgment in real-life situations.

Students are told that there is a limited amount of work they can do to prepare for the aptitude test, but Joe Hamilton, a third-year medical student, told me otherwise. Hamilton was rejected by all four of his chosen universities the first time round. His below-par UKCAT mark was partly responsible. “Two of the rejections I received were due to the fact that I did not score highly enough in the UKCAT.” So how did he make sure he got a better score the following year? “The second time around I did a two-day course in London and a lot more practice before sitting the test.” He dramatically improved his score.

The course Hamilton took is run by Kaplan, an international exam-preparation organisation, and teaches techniques for answering questions from each section of the test. For instance, careful time-management counts: it is crucial that you attempt all sections, as often the questions that carry more marks are towards the end of the paper. That’s a useful insight, but at £315 the course is not cheap.

“They say you can’t prepare for the UKCAT, only familiarise yourself with the questions,” Hamilton says. “I found that was not the case and the more practice you do, the higher the score you will get. I know a lot of others who are at medical school with me now had exactly the same experience of the UKCAT.”

Universities will also be looking for evidence that you are genuinely interested in medicine and have read widely around the subject, gaining insight into the NHS and health care generally. Dr Lawrence Seymour, a consultant in acute medicine at a teaching hospital, recommends starting as early as GCSE year. “I would advise a would-be doctor to keep a folder and collect anything in the general press or from medical journals such as the BMJ [formerly the British Medical Journal] that relates to medical advances, new treatments – anything that catches their interest.”

Before applying, students should make sure they have a clear idea of what being a doctor is about, says David Bender, an emeritus professor of nutritional biochemistry at University College London, and a former member of the medical admissions team. “Students thinking about applying to medical school should talk to doctors and medical students to find out what the course and the job is really like,” he says. “It is not all the glamour you see on television.”

Nearly all medical schools require applicants to have some sort of health-care-related work experience. I asked Dr Patrick Harkin, the deputy director of medical admissions at the University of Leeds, what counted as relevant experience. “Volunteering in a hospice is work experience, even if it’s not necessarily what you think of first. In fact, anything that has clinical relevance is work experience. Care homes, hospices, pharmacies, all places where something clinical is happening.” You don’t need a long list of placements, Dr Harkin says, as long as it is clear that you have learnt from what you’ve done. “It’s not about what you do; it’s about how much you get out of it. Some people get more out of a week than others get out of a month.”

Having said that, working or volunteering in a clinical setting for a prolonged period of time is valuable. “If you stick at something for six months, that shows dedication and an interest. If you’ve been at 15 different things we might start to wonder about your commitment, or your ability to get on well with other people.”

Work experience can also enhance your vital communication skills. Leo Feinberg, president of the University of Birmingham’s MedSoc and a third-year medical student, volunteered in an acute medical unit, where he learnt what he says is one of medicine’s most important lessons: that “Patients want to talk. They may be nervous, and they need someone to offload to.”

Only three medical schools – Belfast’s Queen’s, Edinburgh and Southampton – do not interview prospective medical students. But certain medical schools place more emphasis on personal statements than others, and information about this can be found on their websites. (Many universities provide a guide to writing the personal statement, as does UCAS.)

Dr Harkin stresses that a personal statement must concentrate on the individual’s unique experiences relevant to their choice of career. “Your personal statement is personal. It is about you. We are not after great prose. This is not a creative writing course.” Hamilton agrees: “Anything that I thought was relevant to my application, that I had gained something from, I put into my personal statement.”

At the interview, tutors are looking for commitment and enthusiasm. They also assess aptitude, empathy, communication skills and social awareness. Preparation is vital, says Dr Seymour. “Interview practice is really, really important. Most candidates are stumped or struggle to sound sincere when you ask them 'So why do you want to study medicine?’ My advice is: practise.”

A realistic understanding of the highs and lows of being a doctor is required. Prof Bender has a way of investigating this. “One of the questions I often ask applicants is: 'Has anyone tried to persuade you that medicine is an awful career?’”

Given the competition, it is inevitable that even some of the best candidates will be turned down. But this should not be a deterrent. Hamilton made good use of his enforced gap year by working as a health care assistant at a local hospital, which he believes boosted his application the second time around. “There are a lot of people who are second-time applicants, possibly as many as a third or half of the year group. I have not met anyone who has regretted having a year out – but I have met people who wish that they had had the opportunity.” The greatest benefit of his year out, Hamilton says, was the chance “to experience health care from the nurses’ perspective. From their point of view, doctors who started out as health care assistants make the best doctors!”

After speaking to Joe Hamilton – and so many other helpful people – I am encouraged, though the anxiety hasn’t completely left me. At least I know what I need to do to give myself a chance of being accepted – and maybe one day I will be able to look a patient in the eye and say: “Hello, I’m Dr Ford. How can I help you?”

The Economist recently published a story about a competition that tested Stanley Milgram's famous "six degrees of separation" claim.

DARPA, the research arm of the Department of Defense in the US, staged the Red Balloon Challenge in 2009. Competitors had to locate ten red weather balloons that had been tethered at random locations across the US.

The intention was not that one person drive around the country with a pair of binoculars. Rather, I might ask all my friends on Facebook to look out for a red balloon and tell me if they saw one. They might then ask all their friends, and so on.

The winning team from MIT found all of the balloons in just nine hours using this type of strategy. But to encourage participation they offered $2,000 to the first person to send them the co-ordinates of a balloon. On its own this may not have been very efficient. So, crucially, they also offered $1,000 to whomever recruited *that* person to the challenge, and $500 to whomever recruited *that* person to the challenge, and so on.

One mathematically interesting question is how much money the MIT team stood to pay out. The Red Balloon Challenge offered a prize of $40,000.

In principle, the sender of a winning set of co-ordinates might have been at the top of a long line of recruiters. Doesn't this mean that the MIT team risked an enormous payout?

Well, no. Consider a recruitment chain of seven people. The total payout would be:

$2,000 + $1,000 + $500 + $250 + $125 + $62.50 + $31.25 = $3,968.75

The seventh payment of $31.25 is pretty small. If there were more people in the chain, their payments would be even smaller. Nonetheless, lots of small amounts can quickly add up to a large amount.

Suppose there were 17 people in the chain. The total payout would be $3,999.97 (to the nearest cent). The seventeenth person would have got 3¢.
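The payouts are easy to tabulate (a minimal sketch of the halving scheme described above; `total_payout` is a name of my own):

```python
def total_payout(chain_length):
    """Total paid along one recruitment chain: $2,000 to the finder,
    halving for each recruiter further down the chain."""
    return sum(2000 / 2 ** i for i in range(chain_length))

print(round(total_payout(7), 2))   # 3968.75
print(round(total_payout(17), 2))  # 3999.97

# The 17th person's payment, to the nearest cent:
print(round(2000 / 2 ** 16, 2))    # 0.03
```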

Even so, there could have been many more people in the chain, and might not the total have slowly grown to an unaffordable amount?

Suppose the total amount payable is T and that there are infinitely many people in the chain. Then,

T = 2000 + 1000 + 500 + 250 + ...

Now multiply both sides of this by 2:

2T = 4000 + 2000 + 1000 + 500 + ...

Finally subtract the first equation from the second:

2T – T = 4000 + 2000 + 1000 + 500 + ... – (2000 + 1000 + 500 + 250 + ...)

Which gives:

T = 4000

So even if there had been infinitely many people in the chain, the total payout would have been $4,000. Since there were ten balloons that gives a grand total of $40,000 which was the value of the prize. MIT were certain to make at least a little profit, provided they actually won the prize. Since the kudos of winning was worth more than the prize, their investment was well worth the risk.

Series such as this one are called **geometric series**. One of the earliest examples of them was Zeno’s dichotomy paradox. For me to walk 4000 metres, I first have to walk 2000m, i.e. half the total distance. I then have to walk 1000m, i.e. half the remaining distance. Then 500m. Then 250m. And so on. No matter how far I've travelled there's always half the remaining distance left to go. So I have infinitely many stages to complete and will therefore never get to the end of them.

The flaw in the argument is that last sentence. It assumes that the infinitely many stages of the journey will take infinitely long to get through. But they won't.

Suppose I walk at 1m/s. Then the first stage will take me 2000s. The second will take 1000s. The third 500s, and so on. So the total time taken will be T = 2000 + 1000 + 500 + ... We now know that this adds up to 4000. So I can finish the journey in a finite amount of time. (Indeed the 4000 seconds you would expect me to need to walk 4000m at 1m/s.)

The above letter was published in today's *Times Educational Supplement*. The full text of the letter was:

Conrad Wolfram is right to argue that we need to rethink our priorities for mathematics education ("Computers do it better", TES, 30 March 2012). It is bizarre, for example, that we still teach students to use tables of probabilities three decades after the calculator ousted the books of logarithms so hated by our parents. Worse, exam boards often ask questions that specifically test the ability to use the tables, rather than focussing on candidates' understanding of what the probabilities really mean.

While such approaches rob the subject of its interest and practicality, resulting in fewer students pursuing it at higher levels, there is even more at stake. Dr Keith Devlin of Stanford University -- in an article titled "All the math taught at university can be outsourced. What now?" -- argues that the West's competitive advantage must come from mathematical creativity; countries like India are already our technical equals and charge a lot less for their skills.

Indeed, Hal Varian, chief economist at Google, has said that "the sexy job in the next ten years" will be that of statistician. Andreas Weigend, formerly chief scientist at Amazon, agrees: "Data is the new oil". Company surveys suggest there is likely to be a serious shortage of suitably qualified data analysts in the years ahead, yet we still seem to expect our bright students to become engineers.

If schools are serious when they claim to be preparing young people for the challenges of the 21st Century, they need to understand that forcing another generation to spend hours solving trigonometric equations and reading books of tables will only result in even more people telling me cheerfully how much they hated maths when they were at school.

In Radio 4's *More or Less*, sports statistician Rob Mastrodomenico took a look at a BBC blog that discusses whether premier league defending is getting worse. He focusses on the fact that the number of goals per game has risen from 2.59 in 2010/11 to 2.97 in 2011/12.

After discussing the football and the pundits, he turns to the statistics. It is unarguable that 2.97 is bigger than 2.59, but he asks whether the difference is statistically significant. He concludes that it is not and calculates how great the difference would need to be for it to be statistically significant. (I addressed the meaning of statistical significance in my blog posting Significance isn't always significant.)

Now the question of statistical significance arises when we have the necessarily limited information that we gain from a **sample** and we want to know what that tells us about a **population**. Because our sample is an *incomplete* picture of the population, we are met with *uncertainty* – and that's where probability comes in. But if we have *all* of the data, there is no uncertainty and no role for statistical inference. If there is a difference between two things then we must conclude that, well, there is a difference between the two things.

Mastrodomenico's analysis of the BBC's football data treats the games played during the first half of each season as a sample of all the games that could theoretically be played. He sees each season as having an unknown, underlying goals-per-game average and the figures of 2.97 and 2.59 as estimates of their respective averages. It is thus reasonable to carry out standard statistical inference procedures.

It's rather like tossing a coin. We can suppose that any given coin has a particular propensity to come down heads. If we were to toss it many, many times we might end up with a very good idea of what that propensity is. It might be 50:50 heads:tails or, in the case of a biased coin, it might be 60:40. In practice we wouldn't toss the coin millions of times, say, but a couple of hundred times isn't too arduous and the resulting proportion of heads would give us an insight into the coin's underlying propensity towards heads. If we obtained 110 heads out of 200, we might reasonably argue that this is not inconsistent with a long-term propensity of 50:50 heads:tails -- although we would expect 100 heads out of 200 tosses, the process is a random one and a small deviation is not unlikely.
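The 110-heads-in-200 figure can be checked with an exact binomial tail sum (a minimal sketch, assuming a fair coin):

```python
from math import comb

# P(at least 110 heads in 200 tosses of a fair coin)
tosses = 200
p_tail = sum(comb(tosses, k) for k in range(110, tosses + 1)) / 2 ** tosses
print(p_tail)  # about 0.09
```

Since this is comfortably above the conventional 5% threshold, 110 heads out of 200 is indeed not inconsistent with a fair coin.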

The problem for me in this approach is that I don't believe football matches are like coin tosses. Every coin toss is essentially the same as every other coin toss: though the outcome varies, the circumstances surrounding it are the same. And you can, in principle, repeatedly toss the coin as often as you want. But football matches are subject to many different variables: no two matches are the same. And you cannot endlessly repeat matches in an attempt to discern an underlying goals-per-game average.

In some respects probability and statistics are not like other mathematical disciplines: there are differing schools of thought. There isn't universal agreement, for example, on what a probability actually is and how you can calculate it. Some people believe that a probability is inherent to a situation so that all observers would agree on what that probability is. Others argue that if you have different information to someone else you can reasonably come to a different assessment of the probability of a particular event happening.

Mastrodomenico believes that any given football season has an underlying goals-per-game average and that we can gain insight into what that average is by looking at the actual games played. I look at it differently. I believe we can only go by those actual games. I think football matches are fundamentally different to coin tosses and cannot be treated in the same way. He concludes that we have insufficient evidence of a difference between last season and this season; I say there is a difference. Of course, whether that difference is meaningful and what caused it are completely different questions . . .

During the World Cup last year, a psychic octopus rose to fame by correctly predicting the outcome of eight football matches. Despite the relatively low probability of this happening by chance, most people are content to believe that the octopus was just guessing.

Does *Young Apprentice* work in the same way? Harry M has now been in the boardroom on the losing team six times out of six. Is it fair to conclude that he's the weakest contestant?

Perhaps he is, though he's had some notable and impressive moments. Yet given that the teams are regularly mixed up, is it not inevitable – or, at least, not unlikely – that one person would find themselves on the losing team every week simply by chance?

And does the winning team always win because they're the best? What role does randomness play? There seems to be a lot of post hoc ergo propter hoc analysis in the boardroom. James made great play of the fact that his team found a pocket watch at a considerably cheaper price than the opposition. But might that not have been more luck than judgement? As Zara countered, the watch she bought was the cheapest of the ones she saw; indeed it was cheapest by a large margin.

If the World Cup's octopus was lucky rather than psychic, might not Harry M be unlucky rather than a weak contestant?

More formal analyses along similar lines to the ones I'm suggesting here have been done, for example, in the sphere of investment management. There is evidence to suggest that many apparently successful fund managers may simply be consistently lucky.

The BBC report the new climate change data with the headline **Global warming since 1995 "now significant"**. While there is an attempt to explain what this means, I think the headline is deceptive and the explanation is unclear. Let me try to do better.

Suppose I have a coin which I suspect is biased in favour of heads. I toss the coin ten times and it comes down heads six times. What conclusions can I draw from this? Well, if the coin is not biased I would have expected it to come down heads five times out of ten. But, crucially, coin tosses are random: if you repeatedly toss a fair coin ten times it doesn't come down heads exactly five times out of ten every single time. Sometimes you get six heads; sometimes you get four. So the fact that I got heads six times is entirely consistent with the coin being a fair coin: for an unbiased coin to come down heads six times out of ten is not that unusual. This evidence is not, therefore, sufficient to support my suspicion that the coin is biased.

On the other hand, if my coin had come down heads nine times out of ten, I may start to feel justified in my belief that it is biased. Fair coins do sometimes come down heads nine times out of ten, but only quite rarely. If my coin did so I may prefer to believe that it is biased, since coming down heads nine times out of ten is not that unusual for a biased coin.
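Both scenarios can be made precise with an exact tail calculation (a minimal sketch, assuming a fair coin; `tail` is a helper name of my own):

```python
from math import comb

def tail(heads, tosses=10):
    """P(at least `heads` heads in `tosses` tosses of a fair coin)."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2 ** tosses

print(tail(6))  # 386/1024, about 0.38: unremarkable for a fair coin
print(tail(9))  # 11/1024, about 0.01: rare for a fair coin
```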

An outcome is said to be **statistically significant** if the probability of it occurring by chance is very low. How low is low is entirely subjective. A benchmark that is often used is 5%, though the researchers at CERN who are looking for the Higgs boson use a far lower probability, as did the physicists who announced they may have found particles travelling at a speed faster than that of light.

But what statisticians mean by "significant" is simply that there is evidence to suggest, for example, that the coin is biased. The term says nothing about whether the bias *is meaningful in practical terms*: **practical significance**. A conclusion of statistical significance does not distinguish between a coin that is biased in such a way that it comes down heads 50.1% of the time and a coin that comes down heads 90% of the time.

The latest climate change data does offer evidence that the temperature has increased. But there are two reasons to be cautious. First, it remains possible (though unlikely) that the increase is simply a random variation, equivalent to a fair coin coming up heads nine times out of ten. Second, even if the increase is real, it may not be significant in practical terms.

Today Google celebrates Pierre de Fermat’s 410th birthday. He is most famously known for his “last theorem”, the subject of Simon Singh’s excellent book. (If you know about Fermat’s last theorem, jump down to "Molina’s Urns" below.)

The theorem is a generalisation of Pythagoras’s theorem. Every schoolchild knows that, for a right-angled triangle,

x² + y² = z²

where x, y and z are the lengths of the three sides of the triangle, with z the length of the longest side.

Some right-angled triangles have sides whose lengths are whole numbers. For example, a triangle whose sides have lengths 3, 4 and 5 is right-angled because

3² + 4² = 5²

Fermat was considering the more general equation

xⁿ + yⁿ = zⁿ

where n is a positive whole number. He wondered whether positive whole numbers x, y and z could be found that satisfied this equation if n was larger than 2.

He claimed that no such values could be found and he noted the fact in the margin of an algebra textbook he was reading. He wrote, “I have discovered a truly marvelous demonstration of this proposition that this margin is too narrow to contain.”

Well, that was in 1637. It wasn't until 1995 that Andrew Wiles and Richard Taylor published a proof of the theorem, some 358 years later.

(Incidentally, Fermat was both a lawyer and a mathematician. As am I :P)

**Molina’s Urns**

Some time before Fermat’s last theorem was proved, E. C. Molina invented a problem whose solution relied upon it. The problem would only have a solution if Fermat’s last theorem were proved to be false.

There are two urns each containing the same number of balls. Each ball is either black or white.

From each urn the same number of balls is selected. Each time a ball is selected from an urn its colour is noted and it is then put back into the urn before the next ball is selected.

The problem is this: can the distributions of black and white balls in each urn be chosen so that the probability that the balls drawn from the first urn are all white equals the probability that the balls drawn from the second urn are either all white or all black?

Let’s analyse the problem.

We define the following variables:

x = the number of white balls in the second urn

y = the number of black balls in the second urn

z = the number of white balls in the first urn

n = the number of balls selected from each urn

Since the total number of balls in each urn is the same, there are x + y balls in each urn.

The probability that a white ball is selected from the first urn is

z/(x + y)

Therefore the probability that all of the balls drawn from the first urn are white is

(z/(x + y))ⁿ

Similarly, the probability that all of the balls drawn from the second urn are white is

(x/(x + y))ⁿ

and the probability that all of the balls drawn from the second urn are black is

(y/(x + y))ⁿ

We require the probability that all of the balls drawn from the first urn are white to be equal to the probability that the balls drawn from the second urn are either all white or all black. In other words

(z/(x + y))ⁿ = (x/(x + y))ⁿ + (y/(x + y))ⁿ

which reduces to

zⁿ = xⁿ + yⁿ

So in order to satisfy the requirement of the problem we must be able to find whole numbers which satisfy this equation.

We can certainly do this if n = 2, since the equation becomes Pythagoras’s theorem. Thus we can have 5 white balls and 2 black balls in the first urn, and 3 white balls and 4 black balls in the second urn.

If we draw two balls from each urn, the probability that both balls drawn from the first urn are white is

(5/7)² = 25/49

and the probability that the two balls drawn from the second urn are either both white or both black is

(3/7)² + (4/7)² = 9/49 + 16/49 = 25/49

which satisfies the requirement of the problem.
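The n = 2 solution can also be verified with exact rational arithmetic (a minimal sketch of the check above):

```python
from fractions import Fraction

n = 2                          # balls drawn from each urn
white_first = Fraction(5, 7)   # 5 of the 7 balls in the first urn are white
white_second = Fraction(3, 7)  # 3 of the 7 balls in the second urn are white
black_second = Fraction(4, 7)  # 4 of the 7 balls in the second urn are black

p_first = white_first ** n                        # all draws white, first urn
p_second = white_second ** n + black_second ** n  # all white or all black
print(p_first, p_second)  # 25/49 25/49
```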

If, however, we choose more than two balls from each urn, Fermat’s last theorem tells us that there are no mixtures of black and white balls in each urn that will satisfy the requirement of the problem.

(I found this interesting problem in *Fifty Challenging Problems in Probability* by Frederick Mosteller.)

There are many reasons to object to league tables, and the table above illustrates one of them very well: a tiny change in what you measure results in a big change in league table position.

The three columns rank research institutes according to three different, but similar, metrics. Yet the resulting rankings give very different results. For example, the Max Planck Society tops the first two tables, yet falls to 20th place in the third.

That 20th place ranking illustrates another objection I have to tables such as these. The metric on which the institutes are judged in the third column is measured to two decimal places. This is bizarrely specific for something that has a number of subjective features. The definition of a "highly cited paper" is arbitrary, and being cited is itself somewhat arbitrary (and, indeed, may not be evidence that the paper being cited is good: the citation may be a criticism, contradiction or refutation).

Of course, there is an exception to every rule. The table below, in which the University of Cambridge is ranked number one, is entirely beyond any criticism.

(As is this year’s **Tompkins table**, which ranks Cambridge colleges by exam success. Trinity College comes top. By happy coincidence, I was at Trinity :)