Chapter 8 Other applications

The applications of probability away from dice, casino gambling, and various aspects of natural science can be overlooked. In this chapter, I have picked out some of its appearances in law, social science, sport, and economics to emphasize its ubiquity. The common theme is that decisions we make will depend on the probabilities of various outcomes, so we need methods that lead to reasonably reliable estimates of those different probabilities.

Legal matters

Although Lord Denning, one of the best-known UK judges in the 20th century, had a mathematics degree, few lawyers feel comfortable with probability. This ought to be astonishing, as phrases relating to the subject are used freely in courts. In civil cases, such as libel, to say ‘on the balance of probabilities’ clearly puts the dividing line at 50%. But in criminal cases, where a jury is asked to convict only if they are ‘sure’ of Guilt, there is no consensus on a figure. Some people would wish to convict if they were 80% certain of Guilt, others would use 95% or even higher. These are plainly subjective probabilities. And although the same phrase is used whatever the offence, some would apply a lower threshold of proof for a relatively minor offence. This could make it harder to convict mass murderers than fare dodgers.

Suppose an expert witness testifies that the DNA of the accused matches DNA found at a crime scene, and that the chance of a match between the latter and an innocent person chosen at random is one in several million. Jurors may have two distinct problems with this statement. The first is that they may think that it is equivalent to saying that the chance the crime scene DNA is NOT that of the defendant is one in several million. The second is that they may treat all such tiny figures as equivalent, even though one in ten million differs from one in a billion by a factor of a hundred.

The first error has been termed ‘The Prosecutor’s Fallacy’. Starkly, it is equating the chance of Innocence, given a DNA match, to the chance of a DNA match, given Innocence. This is logical nonsense: the chance of zero arising, given a fair roulette wheel, is not the same as the chance that the wheel is fair, given that zero occurred. This trap can be avoided by giving the jury an estimate of how many citizens might match the crime scene DNA. With a population of around 60 million, if the match chance is one in 2 million, there might be thirty or so; if it is 1 in 20 million, there might be about three; it is unlikely there are more than half a dozen. But do not overlook the phrase ‘chosen at random’: the more close relatives the criminal has, the more matches we would expect, and the weaker this evidence against the accused becomes.

The second error can best be avoided by remembering how Bayes’ Rule measures the usefulness of any piece of evidence. Before this evidence is presented, you have some idea of the odds that the accused is Guilty. If the evidence is ten times more likely under Guilt than Innocence, then the odds of Guilt get multiplied by ten; while evidence that is three times more likely under Innocence than Guilt reduces the odds by a factor of three, and so on. With DNA evidence, it often happens that the chance of the evidence, assuming Guilt, is 100%, which makes the impact of the evidence clear: the odds of Guilt should be multiplied by whatever the ‘several million’ figure actually is.
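The arithmetic of this odds update is simple enough to sketch in a few lines of Python. The prior odds and the match chance below are invented for illustration, not taken from any real case.

```python
from fractions import Fraction

# Bayes' Rule in odds form: posterior odds = prior odds * likelihood ratio,
# where the likelihood ratio is P(evidence | Guilt) / P(evidence | Innocence).

def update_odds(prior_odds, likelihood_ratio):
    return prior_odds * likelihood_ratio

# Illustrative figures: prior odds of Guilt of 1 to 1000, and DNA evidence
# that is certain under Guilt but has a 1-in-2,000,000 match chance under
# Innocence, giving a likelihood ratio of 2,000,000.
posterior = update_odds(Fraction(1, 1000), 2_000_000)
print(posterior)  # 2000 -- the odds of Guilt become 2000 to 1
```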

Randomized response

A head teacher wishes to ascertain what proportion of his senior students smoke cannabis. Direct questions are unlikely to produce truthful answers, but a technique known as randomized response is available. The main idea is that the teacher recording the answers does not know what question is actually being asked, so that cannabis users are able to answer honestly without fear of being identified.

The words ‘I smoke cannabis’ are written on each of 80 cards, and ‘I do not smoke cannabis’ on another 20. Each card is placed in an identical envelope, and these 100 envelopes are mixed thoroughly in a large bag. The students should see this operation being done, so that they know that the bag contains both versions of the question, and in those proportions.

Angela selects one envelope at random, opens it, reads the question to herself, and simply says either ‘Agree’ or ‘Disagree’. She then puts the card back in the envelope, returns the envelope to the bag, and shakes the bag up ready for the next student.

Suppose that one-third of the responses are ‘Agree’. Because the students are picking the envelopes at random, ‘Agree’ is the honest response from 80% of users, and 20% of non-users. A few lines of algebra show that this is consistent with 2/9 of the students being users. The head teacher has his answer, and no individual student has been identified.

Alternatively, replace ‘I do not smoke cannabis’ with a question on an unrelated subject, for which the proportion of ‘Agree’ answers is known. If a previous survey has established that half the students own a pet, and there is no reason to link pet-owning with cannabis smoking, the statement on 20 cards could be ‘I own a pet’. Then, if one-third of the responses are ‘Agree’, we estimate that 7/24 of the students are users.

The calculations giving these estimates are in the Appendix.
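For readers who prefer code to algebra, both estimates can also be reproduced directly. This is a sketch of my own, not the Appendix's working; each student draws the sensitive card with probability 4/5.

```python
from fractions import Fraction

def users_opposite_card(agree_rate):
    # 80 cards say 'I smoke cannabis', 20 say 'I do not', so
    # P(Agree) = (4/5)*u + (1/5)*(1 - u); solve for u.
    p = Fraction(4, 5)
    return (agree_rate - (1 - p)) / (2 * p - 1)

def users_unrelated_card(agree_rate, p_agree_unrelated):
    # Here the 20 other cards carry an unrelated statement ('I own a pet')
    # with a known Agree rate, so P(Agree) = (4/5)*u + (1/5)*p_agree_unrelated.
    p = Fraction(4, 5)
    return (agree_rate - (1 - p) * p_agree_unrelated) / p

print(users_opposite_card(Fraction(1, 3)))                   # 2/9
print(users_unrelated_card(Fraction(1, 3), Fraction(1, 2)))  # 7/24
```

Using exact fractions rather than floating-point numbers lets the answers 2/9 and 7/24 appear exactly as in the text.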

The uncertainty as to which question is being asked each time leads to some imprecision in the final estimate. The proportion of envelopes that contain the sensitive question should be as high as possible, but low enough for genuine cannabis users to believe that giving an honest answer will not have repercussions. Putting the sensitive question on as many as 95% of the cards would not work.

WADA

The World Anti-Doping Agency seeks to promote sport as a healthy activity, by identifying athletes who take performance-enhancing drugs, and excluding them from competitions. But whatever methods are used, any testing programme is liable to two opposing types of error: claiming that an athlete is using drugs when they are innocent, and passing an athlete as clean when they are a drug user.

Unfortunately, methods of reducing the chance of making either type of error often tend to increase the chance of the other. For example, one test measures the ratio of testosterone to epitestosterone. The body normally produces these substances in fairly equal amounts, but those athletes who seek to cheat by injecting testosterone will have a high T/E ratio. Athletes whose ratio is above some specified amount, say six to one, will be banned. However, the T/E ratio varies naturally: it changes over a menstrual cycle, it will increase if you catch flu. Set the critical T/E ratio too high and no drug cheat will fail it; set it too low, and many innocent athletes will be wrongly accused.

Suppose the chance that a particular test makes a mistake is 1%. That means that if the athlete is innocent, the chance they fail is 1%, if they are users the chance they pass is also 1%. Sam fails the test: what is the chance she is innocent?

Put like that, the temptation to say ‘1%’ looks overwhelming – this test gets things wrong one time in a hundred, so if it says she has failed, that will be wrong one time in a hundred. Resist this temptation. The only valid answer is ‘We do not know. It could be any figure. We need to know the proportion of drug cheats in the population.’

For, suppose that proportion is 1% or so. Then among 10,000 athletes we expect 100 drug cheats, and 9,900 innocents. In testing, we expect just one drug cheat to pass, leaving 99 who fail. But 1% of the 9,900 innocents, i.e. another 99 athletes, will also fail the test. Among those who fail, half are innocent: the chance Sam is innocent would be 50%.

If the proportion of cheats differs from 1%, this conclusion changes. If it is higher, the chance that Sam is innocent will be less, but if it is lower, her chance of innocence will be even higher. The lower the proportion of drug cheats, the less satisfactory is this test, despite its apparently impressive performance.
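A few lines of Python make the dependence on the base rate explicit. The 1% error rate follows the text; the prevalences are illustrative.

```python
# P(innocent | fail) for a test whose two error rates are both 1%,
# at various prevalences of drug cheats among those tested.

def p_innocent_given_fail(prevalence, error_rate=0.01):
    cheats_fail = prevalence * (1 - error_rate)     # true positives
    innocents_fail = (1 - prevalence) * error_rate  # false positives
    return innocents_fail / (cheats_fail + innocents_fail)

print(p_innocent_given_fail(0.01))   # 0.5 -- the 50% chance in the text
print(p_innocent_given_fail(0.001))  # about 0.91: rarer cheats, more false alarms
```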

This same logic applies when we consider how to detect potential terrorists at airports. Whatever screening devices are used, they cannot be perfect, but suppose that the probability a real terrorist evades these checks is tiny, 1/10,000, while the chance that an innocent person is led away for intensive interrogation is a minuscule 1/100,000. How likely is it that someone picked out is guilty?

We cannot answer the question without having some idea of the proportion of would-be passengers who are terrorists. Try one in a million – frighteningly high, given that Heathrow handles over fifty million passengers a year. But the figures assure us that, even with fifty potential terrorists, it is overwhelmingly likely that all will be detected.

Unfortunately, five hundred innocent passengers will also be detained! Among those stopped by this system, fewer than 10% are terrorists. And if there are fewer than fifty terrorists, the chance that someone who is stopped is indeed guilty is even lower. Detection methods must have much better performance figures if they are to be useful.
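The airport arithmetic can be checked the same way; the passenger total and error rates are those quoted in the text, treated as exact for the sake of illustration.

```python
# Expected counts at an airport handling 50 million passengers a year,
# one in a million of whom is a would-be terrorist.

passengers = 50_000_000
terrorists = passengers // 1_000_000          # 50 terrorists
p_evade = 1 / 10_000                          # chance a terrorist slips through
p_false_alarm = 1 / 100_000                   # chance an innocent is stopped

caught = terrorists * (1 - p_evade)                       # essentially all 50
false_alarms = (passengers - terrorists) * p_false_alarm  # innocents detained
share_guilty = caught / (caught + false_alarms)
print(round(false_alarms))                    # 500
print(round(share_guilty, 3))                 # 0.091 -- under 10% of those stopped
```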

Football results (1)

Betting on the results of soccer matches generates substantial interest in the UK. All sorts of exotic bets can be made – on the time of the first throw-in, the sum of the shirt numbers worn by all the goal-scorers, how many red and yellow cards will be flourished during the game – but most interest is on which of the three results, Home win, Draw, or Away win, will occur. A rational punter will assess the respective probabilities of these results, and his decision on whether to bet, and how much, will rest on these assessments as well as the payout prices offered by the bookies.

But how might the punter deduce his degrees of belief in the different outcomes? In May 2009, statistician David Spiegelhalter took up the challenge on BBC Radio’s More or Less by analysing the ten games to be played in the Premier League two days later. For each game, he estimated the number of goals each team might score, on average, taking account of its own strength in attack and its opponents’ defensive capabilities. For example, a strong Home team (Arsenal) were estimated to score 2.1 goals, on average, against Stoke City.

No team can score 2.1 goals, but that figure is just the average over a hypothetical number of matches. The crucial step is to assess the probabilities of 0, 1, 2, 3, . . . goals in a single game, and Spiegelhalter used the Poisson distribution. Data over many years show that this is pretty good at describing how the actual number of goals tends to vary around its average. With Arsenal’s figure of 2.1, the chance of no goals came out as 12%, one goal as 26%, two goals as 27%, three as 19%, and so on.

Data for Stoke put their mean score as 0.67 goals. This translates into a 51% chance of no goals, a 34% chance of just one goal, 11% for two goals, and so on. With a leap of faith, take the numbers of goals scored by each team as independent. So the probability of a 2-1 score comes from multiplying the chance the Home team scores twice by the chance the Away team scores once – in this case, 27% * 34%, around 9%.

In this way, the probability of any possible score is estimated. Then the probabilities for each of Home win, Draw, and Away win are found from the Addition Law, by adding up the separate probabilities of all the scores that lead to those three respective results. This gave Arsenal a 72% chance of victory, Stoke had a 10% chance, leaving a chance of 18% for the Draw. The score given the highest probability, at 14%, was 2-0.
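Under the assumption of independent Poisson scores, the whole calculation fits in a few lines. The means 2.1 and 0.67 are those quoted above; this sketch lands within a percentage point or so of the published 72%, 18%, and 10%, the small differences presumably due to rounding in the quoted means.

```python
from math import exp, factorial

def poisson(mean, k):
    # Poisson probability of exactly k goals, given the mean.
    return exp(-mean) * mean ** k / factorial(k)

home_mean, away_mean = 2.1, 0.67   # Arsenal and Stoke, from the text
MAX = 10                           # scores above 10 have negligible probability

# Independence: multiply the two teams' goal probabilities for each score.
score_prob = {(i, j): poisson(home_mean, i) * poisson(away_mean, j)
              for i in range(MAX + 1) for j in range(MAX + 1)}

# Addition Law: sum the scores leading to each match result.
home_win = sum(p for (i, j), p in score_prob.items() if i > j)
draw = sum(p for (i, j), p in score_prob.items() if i == j)
away_win = sum(p for (i, j), p in score_prob.items() if i < j)

print(round(home_win, 2), round(draw, 2), round(away_win, 2))
print(round(score_prob[(2, 0)], 2))   # the most likely score, 2-0, at about 0.14
```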

Do not scoff! In the ten games, the exact score given the highest probability happened twice, and eight of the ten match results were those that were identified as being the most likely. A betting man, who had placed money on each ‘most likely result’ and on each ‘predicted’ exact score, would have smiled happily as the match scores unfolded.

How can we reconcile a 72% degree of belief that Arsenal would win that match with ideas of frequency, as there is no question of playing this game hundreds of times and counting how often Arsenal won? Recall how we judged the reliability of a weather forecaster when she says that the chance of rain tomorrow is 30%: there is only one tomorrow, it will either rain or it will not. However, we can look at all the occasions when she gives rain a 30% chance, and check its actual frequency. We shall believe her claim about tomorrow, or not, on the basis of her overall record. With soccer matches, we can make similar calculations for all games played over the season. Among these, there might be forty or so where some result was given a probability close to 72% – we can check whether the ‘predicted’ result did occur with a frequency around 72%, as a way of validating our methods.

Can a gambler expect to make money by using these ideas? The payout prices depend heavily on how much is staked on each outcome, and the largest sums are usually staked on one team or the other to win. Bets on a Draw tend not to attract committed fans. If the chance of a Draw is assessed as 25%, and the payout price is better than three to one, the opportunity to profit is there.

Do not assume that the best bet is on the outcome with the highest predicted probability!

Football results (2)

Before the 2010 soccer World Cup Finals began, statistician Ian McHale published the results of his calculations, which allocated to each of the 32 teams some non-zero probability of winning the trophy. He made Spain the favourites, albeit with a winning chance of only 11.6%, followed by Brazil, whose chance was put at 10.3%.

To obtain these figures, McHale used an approach similar to that described above for each match. However, he did not make a direct calculation of the probabilities of the distinct match outcomes; instead, he relied on a Monte Carlo simulation.

Thus, for a match in which England’s mean score was put at 1.5 goals, the Poisson model gives a 22% chance of no goals, a 33% chance of one goal, and so on. The computer’s random number generator selected one of the values 0, 1, 2, 3, . . . with the appropriate probabilities, and did the same thing for England’s opponents, leading to some simulated score such as a 2-2 draw. Similar simulations were made for every scheduled match, leading to simulated group tables, and then to matches in the knockout stages all the way to the final. This process was repeated 100,000 times, and the number of simulations in which each team emerged as champions was recorded. Spain ‘won’ 11,633 times, hence the 11.6% figure noted earlier. The Law of Large Numbers, as usual, is the justification.
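One building block of such a simulation can be sketched as follows: draw a Poisson score for each side and tally the outcomes over many repetitions. England's mean of 1.5 is from the text; the opponents' mean of 1.1 is an invented illustration.

```python
import random
from math import exp, factorial

def poisson_sample(mean, rng):
    """Sample a Poisson count by inverting the cumulative distribution."""
    u, k, cumulative = rng.random(), 0, 0.0
    while True:
        cumulative += exp(-mean) * mean ** k / factorial(k)
        if u <= cumulative:
            return k
        k += 1

rng = random.Random(2010)
tally = {"England win": 0, "Draw": 0, "Opponents win": 0}
for _ in range(100_000):
    eng, opp = poisson_sample(1.5, rng), poisson_sample(1.1, rng)
    if eng > opp:
        tally["England win"] += 1
    elif eng == opp:
        tally["Draw"] += 1
    else:
        tally["Opponents win"] += 1

# By the Law of Large Numbers, each tally divided by 100,000 approaches
# the corresponding probability under the model.
print(tally)
```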

And Spain did win! Were McHale’s probabilities ‘correct’? We cannot know. Perhaps Spain would have won 65% of the time, had it been possible to make indefinite repetitions of the tournament. But the best evidence that his methods make good sense is that bookmakers follow a similar path to set their initial payout prices to attract punters.

Black-Scholes

Share prices on stock markets fluctuate, sometimes for no apparent reason. If the price is £5 today, you do not know what the price will be next month. However, you can buy an option – the right to buy (or sell) that share at the strike price of £5.20 at a given future time. If, at that time, the market price is less than £5.20, you will not exercise your option to buy, but if it is above that price, you can make an instant profit by taking up the option, and immediately selling. Corresponding remarks apply to an option to sell. What are fair prices for these options?

Fischer Black and Myron Scholes addressed this question in 1973. At the heart of their work was the assumption that the changes in share prices varied randomly, but in a particular way related to the Gaussian distribution. The fair prices for both buy and sell options were found to depend on the current price, the price at which the option would be exercised, the intervening time period, prevailing interest rates, and the volatility of the underlying share price (as measured by the standard deviation over a period): but not on the mean amount by which the share price was expected to change!

This last point may be surprising, but that is how things work out. It is also quite useful, as it means that we have no need to add to any uncertainty by estimating the trend in prices. If you want to discover the fair price for some particular option, free software is widely available – just type ‘Black-Scholes’ into your favourite search engine. Given the current price and the strike price, the fair cost of a buy option would increase if the time period were longer, or if interest rates were higher, or if the volatility of the share price were higher.
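As a concrete sketch, the formula for a buy option can be written down directly; the interest rate, volatility, and time period below are invented for illustration. Note that the expected trend in the share price appears nowhere in the function, exactly as described above.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def call_price(spot, strike, rate, volatility, years):
    """Black-Scholes fair price of a buy (call) option.
    spot: current share price; strike: agreed purchase price;
    rate: annual risk-free interest rate; volatility: annual standard
    deviation of the share's log-returns; years: time until expiry."""
    N = NormalDist().cdf
    d1 = (log(spot / strike) + (rate + volatility ** 2 / 2) * years) \
         / (volatility * sqrt(years))
    d2 = d1 - volatility * sqrt(years)
    return spot * N(d1) - strike * exp(-rate * years) * N(d2)

# The text's share at £5 with a £5.20 strike; a 5% interest rate, 20%
# volatility, and six-month period are illustrative assumptions.
price = call_price(5.00, 5.20, 0.05, 0.20, 0.5)
print(round(price, 2))  # roughly £0.25 under these assumptions
```

Increasing any of the last three arguments raises the price, in line with the claims in the text.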

How do the claims in this last sentence accord with your intuition? The first does seem reasonable, as the longer you are prepared to wait, the higher the chance of an increase in the underlying share price, but the other two claims are more subtle. An increase in volatility raises the chance of a jump in the share price, but it also happens to decrease the mean change in price – and the former effect turns out to be bigger.

The volatility is measured by looking at the changes in the share price over the 250 or so trading days in one year. This should give enough data for an estimate to be reliable, but not stretch so far into the past as to be irrelevant for current conditions. A poor estimate of the volatility will lead to an unreasonable price for an option.

A model is only useful when its key assumptions are not violated. And, as Figure 7 shows, taking the Gaussian distribution as a model for price fluctuations implies a really tiny probability for catastrophic events, such as the price dropping by more than three or four standard deviations. When the actual probability of such an event is significantly underestimated, the model is undermined, and the conclusions it indicates may have no sound basis at all. The extreme-value distributions, mentioned in Chapter 4, have been used to address this problem.

Share portfolios

Companies A and B are both expected to make profits. With low interest rates, A is expected to return 20%, while B should return 40%; with high interest rates, the positions are reversed – A should gain 40%, B 20%. Suppose Nick is a risk-averse investor, while Mary is risk-attracted.

If low or high interest rates are seen as equally likely, both companies may look equally attractive, with a mean return of 30%. In accordance with their respective attitudes to risk, Nick could divide his funds equally between the two companies, and guarantee to get 30%, whether rates are high or low: Mary could plump for one company or the other, hoping to get 40% but accepting she might get only 20%.

Suppose B is replaced by company C, which will return 10% with low interest rates, or 50% if rates are high – again an average of 30%, like company B. But now mixing A and C makes no sense to either investor: Nick prefers A alone, Mary puts everything into C.

The essential difference is that the returns from A and B are negatively correlated – in conditions when one is high, the other tends to be low; but returns for A and C are positively correlated – they do better or worse together. ‘Correlation’ is measured on a scale from –1 (total negative correlation) to +1 (total positive correlation). If two assets fluctuate in value independently of each other, their correlation will be zero.

Risk-averse investors are encouraged to diversify their holdings, so that any losses might be balanced by gains elsewhere. They wish to hold negatively correlated assets. But there is an inescapable piece of logic: if X is negatively correlated with Y, and Y is negatively correlated with Z, then X and Z will tend to be positively correlated!

However, all is not lost. A mathematical result, due to Salomon Bochner, proves that it is indeed possible for each pair of assets in a large portfolio to be negatively correlated; but the greater the number of assets, the harder it is to achieve mutual negative correlation.