Chapter 2: The workings of probability

As well as the subjective, objective, and frequentist approaches to probability, there are other standpoints. For example, should one always insist on associating a probability with a number? Might it be enough to say that one probability was greater, or one degree of belief was more intense, than another? And should we necessarily offer an initial set of axioms – self-evident truths – on which to erect a theory?

Many distinguished writers have felt it useful to have two separate approaches, one for degrees of belief and one for objective probabilities. Both would have the same rules of logic, free from contradictions, but how values of probabilities were arrived at, and how they are interpreted, could differ. Any theory should be consistent with the classical view, based on repeatable experiments with equally likely outcomes. So we will focus on that case, seeking any rules that the notion of probability must obey.

The Addition Law

Deal one card from a well-shuffled pack. We take all cards as equally likely, so the probability of any event, such as obtaining a Club, or a Spade, or an Ace, is found by calculating the proportion of all possible outcomes that lead to that event. How might we find the probability that either of two such events occurs?

If those events have no outcomes in common, we say that they are mutually exclusive, or disjoint. The events ‘Get a Spade’ and ‘Get a Club’ are disjoint, but the events ‘Get a Spade’ and ‘Get an Ace’ are not, as the Ace of Spades belongs to both. When two events are mutually exclusive, the total number of outcomes that lead to either event is just the sum of the numbers for each event separately, so we have a simple result: whenever two events are mutually exclusive,

the probability that at least one occurs is the sum of their individual probabilities.

This is the Addition Law. It plainly holds in all experiments where we would take the classical view: using the balls-in-a-bag analogy, it is the same as saying that the total number of balls that are either Red or Blue is the sum of the number of Red balls and the number of Blue balls. And in any repeatable experiment, such as rolling dice or spinning roulette wheels, the sum of the individual frequencies of two disjoint events is inevitably the frequency that at least one of them occurs. So frequentists accept the Addition Law too.
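For a concrete check, the classical calculation behind the Addition Law can be reproduced by enumerating a full deck. The sketch below is illustrative Python; the suit names and the `prob` helper are just conveniences for this example:

```python
# Verify the Addition Law for the disjoint events 'Spade' and 'Club'
# by enumerating a 52-card deck of equally likely outcomes.
from fractions import Fraction

suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
ranks = list(range(1, 14))          # 1 = Ace, ..., 13 = King
deck = [(s, r) for s in suits for r in ranks]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_spade = prob(lambda c: c[0] == "Spades")                # 13/52
p_club = prob(lambda c: c[0] == "Clubs")                  # 13/52
p_either = prob(lambda c: c[0] in ("Spades", "Clubs"))

# Disjoint events: the chance of 'at least one' equals the sum.
assert p_either == p_spade + p_club == Fraction(1, 2)
```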

A subjectivist also accepts this Law. For otherwise, there would be two disjoint events, call them A and B, where it did not hold. In that case, the subjectivist could be confronted by three bets: one about A, one about B, and one about either A or B, and would accept each bet on its own as fair. But he could be guaranteed to lose money if all three bets were struck! The Addition Law forbids this inconsistency.

This Addition Law extends to a collection of many events, provided no two of them have any outcomes in common – they are pairwise disjoint. The probability that at least one among even millions of pairwise disjoint events occurs is just the sum of their individual probabilities. But suppose the number of outcomes is no longer finite: for example, tossing an ordinary coin repeatedly until Heads appears for the first time.

The possible outcomes of this experiment are the unending list {1, 2, 3, 4, …}, each with its own non-zero probability. What is the chance that we take some even number of throws to get a Head? That event happens when any of the outcomes {2, 4, 6, 8, …} happens. Could we compute its probability by adding up the corresponding individual probabilities?

There is no mathematical difficulty in doing this adding up, but that action falls outside the scope of the classical view of probability, which deals only with a finite list of outcomes. There is no consensus as to whether the Addition Law for such an unending list should be part of the workings of probability. In favour of including it, we may be able to find the probabilities of a wider class of events than without it; against inclusion, as it is not part of the classical theory, we should be cautious about taking steps that might have hidden pitfalls. There is no right or wrong answer.

I’m a pragmatist. I am content to extend the use of the Addition Law in this way, and I have never felt uncomfortable with the results of doing so. This position is a standard part of the account given in most books used to teach the subject at university. But de Finetti cautiously declined to make this extension, and others have felt the same way.

The Multiplication Law

If you toss an ordinary coin, you will expect to guess Heads or Tails correctly half the time. If you shuffle a deck of cards, and predict whether the top card is Red or Black, you also expect to be correct half the time. When you guess both a coin toss and a card colour, how likely are you to get both correct?

Think of conducting this double experiment a hundred times. You expect to guess the coin correctly about fifty times, and when you do so, you expect to go on to guess the card colour half the time.

That suggests you expect to get both right on about twenty-five occasions, and it looks sensible to offer 25%, or 1/4, as the chance of being right both times. For these experiments, the chance of being correct both times is found just by multiplying their individual chances.
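The four equally likely combinations of coin face and card colour can simply be listed. This minimal Python sketch (the fixed guess is arbitrary, since any guess fares the same) confirms the 1/4:

```python
# Enumerate the four equally likely (coin, card) outcomes and count
# those matching a fixed guess on both.
from fractions import Fraction
from itertools import product

outcomes = list(product(["Heads", "Tails"], ["Red", "Black"]))
guess = ("Heads", "Red")                 # any fixed guess will do
both_right = [o for o in outcomes if o == guess]

assert Fraction(len(both_right), len(outcomes)) == Fraction(1, 4)
```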

Ten balls of the same size and composition are labelled with the numbers zero up to nine, and one of them is selected completely at random. So it is equally likely to show a Low number (zero to four), or a High number (five to nine). Five of these numbers are coloured Green, the rest are Blue, so Green and Blue are also equally likely. Trying to guess the colour, or whether it is Low or High, we have a 50% chance either time. What about the chance that the ball we draw is both Low and Green?

The argument above with the coin and the cards suggests 1/4 as the answer, but a moment’s thought shows this cannot be correct. With ten balls, it is impossible that 1/4 of them (two and a half!) will be both Low and Green! The correct answer depends on which numbers are coloured Green, and which Blue. So suppose numbers one to five are Green, the rest are Blue.

In that case, four of the ten numbers (one, two, three, and four) are both Low and Green, so the chance is 0.4. But, as we did with the first problem, we can also use a two-stage process: in one hundred repetitions of this experiment, we expect to get a Low number fifty times. Four of the five Low numbers are Green, so when we did get a Low number, we expect it to be Green 4/5 of the time. Overall, we expect a Low Green number forty times, leading again to the answer 0.4.
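Both routes to 0.4 can be checked by listing the ten balls. In this Python sketch, the colouring (Green on one to five) follows the example above:

```python
# Check the two-stage calculation: P(Low) = 1/2, P(Green | Low) = 4/5,
# so P(Low and Green) = 1/2 * 4/5 = 0.4.
from fractions import Fraction

balls = range(10)                          # labels zero to nine
low = {n for n in balls if n <= 4}         # zero to four
green = {n for n in balls if 1 <= n <= 5}  # one to five, as in the text

p_low = Fraction(len(low), 10)
p_green_given_low = Fraction(len(low & green), len(low))
p_both = Fraction(len(low & green), 10)    # direct count: 4 out of 10

assert p_both == p_low * p_green_given_low == Fraction(2, 5)
```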

With the coin and cards, the outcome of the coin toss has no bearing on the card drawn. We do not change our minds about the chance of a Red card when told whether the coin falls Heads – the conditional probability of the second event, given the first, is just its ordinary probability. When this happens, the two events are said to be independent, and the chance that both occur is the product of their individual probabilities.

With the ten balls, the chance both events occur also arises as a product, in which the first factor is the probability of one event (Low number), and the second is the conditional probability of Green, given that this happens. So really the two calculations are identical in form; the only difference is whether the outcome of the first event affects the chance of the second. Each time, we have used the Multiplication Law, which says:

the probability that both of two events occur is the probability of the first, multiplied by the probability of the second, conditional on the first happening.

Independence

We used the term ‘independent’ to describe the circumstances when the occurrence of the first event does not change our assessment of the chance of the second. Suppose this holds, but we learn that the second event has happened; might this affect our assessment of the chance of the first?

No. Whenever the fact that one event has or has not occurred makes no difference to the chances of another event, it turns out that whether or not this second event occurs makes no difference to the chances of the first. Two events are independent when the occurrence or non-occurrence of either makes no difference to the probability of the other. To find the chance both occur, multiply their individual chances.

Events that have no bearing at all on each other, like rain today in Tunis and the gender of the next birth in Paris, are surely independent. But sometimes independence is not obvious. Using an ordinary fair die, consider the events ‘Get an even number’ and ‘Get a multiple of three’, with respective chances one-half and one-third. The only way both occur is when we get a Six, having probability one-sixth. And since multiplying one-half and one-third gives one-sixth, those two events are independent. The chance of getting an even number does not change if we are told whether or not a multiple of three occurs (and vice versa).

Now consider the same problem when you have an eight-sided fair die, or a ten-sided fair die, with the sides labelled one to eight or one to ten respectively. Do the arithmetic: you should find that the two events are independent in one of these cases, but not independent in the other. Intuition about independence is useful, but not always enough.
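If you want to confirm your arithmetic, the test for independence (does the chance of both equal the product of the individual chances?) can be run for each die in a few lines of Python. The function name is illustrative:

```python
# Are 'even' and 'multiple of three' independent on a fair die with
# the given number of sides, labelled 1..sides?
from fractions import Fraction

def independent(sides):
    faces = range(1, sides + 1)
    even = [f for f in faces if f % 2 == 0]
    mult3 = [f for f in faces if f % 3 == 0]
    both = [f for f in faces if f % 2 == 0 and f % 3 == 0]
    return Fraction(len(both), sides) == \
        Fraction(len(even), sides) * Fraction(len(mult3), sides)

print(independent(6))   # True:  1/2 * 1/3 = 1/6
print(independent(8))   # True:  1/2 * 1/4 = 1/8, only the 6 qualifies
print(independent(10))  # False: 1/2 * 3/10 = 3/20, but P(both) = 1/10
```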

Assuming two factors to be independent, when they are not, is one of the most common mistakes made in assessing probabilities. Suppose that, in a university’s graduate school, half the students are female, and one in five study engineering. Select one student at random: the probability this student is female can be taken as one-half, the probability the student studies engineering will be one-fifth. However, you will find that the probability the student is a female engineer is much less than their product, one-tenth.

Events with overlap

The Addition Law shows how to find the chance of at least one of two events, provided they are disjoint. What if they are not disjoint? For example, in drawing one card at random, what is the chance it is either a Spade or an Ace? The Ace of Spades falls in both categories, so if we just added the respective probabilities, we would count that card twice. To correct for the outcomes that would be double-counted and find the probability that at least one of two events occurs,

add their individual chances, then subtract the chance both occur.

Of course, if the two events are disjoint, it is impossible that both happen, so this extra term has value zero, and we are back to the original Addition Law.

Let’s see this notion in action in the two earlier examples. With the coin and Red/Black card, the chance we get at least one guess right comes from the arithmetic 1/2+1/2-1/4, which is 3/4. In the other example, the chance a randomly chosen numbered ball is either Low or Green is 1/2+1/2-0.4 = 0.6.

And the chance of drawing either a Spade or an Ace arises as 13/52+4/52-1/52 = 16/52, confirmed by noting that exactly 16 of the 52 cards satisfy this condition.
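The same enumeration idea verifies this inclusion-exclusion count. A minimal Python sketch, with rank 1 standing for the Ace:

```python
# Overlapping events: P(Spade or Ace) = 13/52 + 4/52 - 1/52 = 16/52,
# checked against a direct count over the deck.
from fractions import Fraction

suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
ranks = range(1, 14)                      # 1 = Ace
deck = [(s, r) for s in suits for r in ranks]

spades = [c for c in deck if c[0] == "Spades"]
aces = [c for c in deck if c[1] == 1]
either = [c for c in deck if c[0] == "Spades" or c[1] == 1]

lhs = Fraction(len(either), 52)           # direct count: 16 cards
rhs = Fraction(len(spades), 52) + Fraction(len(aces), 52) - Fraction(1, 52)
assert lhs == rhs == Fraction(16, 52)
```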

This last calculation should warn you against making early arithmetic simplifications. Yes, 13/52 is the same as 1/4, and 4/52 is the same as 1/13, but to add 1/4 and 1/13, you are better off with their original fractions. And it is seldom helpful to re-write a friendly fraction like 5/13 as its ugly decimal approximation 0.38461538 . . .

More than two events

A collection of many events is described as independent whenever knowing whether or not some of them occur makes no difference to the probabilities of any of the others. In this case, the Multiplication Law means that, whatever selection of events we make from this collection, the probability that all of them occur is just the product of their individual probabilities.

But how might we find the probability that three or more events all occur, when they are not independent? For example, in whist or bridge, the cards are randomly shuffled and shared equally among four players. How likely is it that they all receive exactly one Ace?

Consider the four separate events: Anne gets exactly one Ace; Brian gets one Ace; Colin has one Ace; Debby gets one Ace. Plainly, these four events are not independent, as if any three of them happen, the other is certain. We will find the probability they all occur via a three-stage process.

First, we find the probability that Anne has exactly one Ace. Assuming all possible ways of dealing the cards are equally likely, we have an exercise in counting: count the total number of possible deals, and then count in how many of them she gets exactly one Ace. Believe me, the chance works out as just under 44%.

Assume Anne has just one Ace (and hence twelve non-Aces). That leaves three Aces and thirty-six non-Aces for the other players, and Brian gets thirteen of them, chosen at random. A similar counting exercise on this smaller pack shows that the chance he would get exactly one Ace is just over 46%. The Multiplication Law then tells us that the chance of both events, i.e. that both Anne and Brian have exactly one Ace, is the product of these two values, just over 20%.

So now assume Anne and Brian each have exactly one Ace. Then Colin receives thirteen cards at random from the two Aces and twenty-four non-Aces that remain: the chance that he gets exactly one Ace is found to be 52%.

The final step is to use the Multiplication Law once more, to combine these last two calculations: the chance that Anne and Brian, and then also Colin, all have exactly one Ace is a little over 10%. If this happens, Debby inevitably has the final Ace, so we have found the answer we seek.
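The three-stage computation translates directly into binomial coefficients: each stage is (ways to choose the Aces) times (ways to choose the non-Aces), divided by (ways to choose the whole hand). A Python sketch using the standard-library `math.comb`:

```python
# Stage-by-stage Multiplication Law for 'every player gets one Ace'.
from fractions import Fraction
from math import comb

# P(Anne gets exactly one Ace in her 13 cards from 52)
p_anne = Fraction(comb(4, 1) * comb(48, 12), comb(52, 13))

# Given that, P(Brian gets one of the 3 remaining Aces among 39 cards)
p_brian = Fraction(comb(3, 1) * comb(36, 12), comb(39, 13))

# Given both, P(Colin gets one of the 2 remaining Aces among 26 cards)
p_colin = Fraction(comb(2, 1) * comb(24, 12), comb(26, 13))

p_all = p_anne * p_brian * p_colin       # Debby's Ace is then certain
print(float(p_anne))    # just under 0.44
print(float(p_colin))   # 0.52
print(float(p_all))     # a little over 0.10
```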

This answer itself is of no real consequence – although the deal is totally random, the most equitable outcome for the Aces is rather unlikely – but the method used is universal. To find the chance that every event in a collection occurs, break things down into stages. Find the chance for one event; then, assuming this event occurs, find the chance of a second; now assuming both of these occur, find the chance of a third; then assuming all these three events happen, find the chance of a fourth – and so on. Finally, multiply all these quantities together.

Where else might we have to follow this path? Suppose my journey has three stages, and I can assess their separate chances of having no delay: however, all the stages will be affected by the weather, and delay on one stage will change the chance of delay elsewhere. In manufacturing industry, the safety of a piece of equipment will rely on several components which do not operate independently – some may use the same water supply, others may have been inadequately tested by the same unreliable employee. With a medical procedure, whether or not the things that can go wrong are independent of each other can make a huge difference to the overall chance that all turns out well.

If events are independent, then the chance they all occur is just the product of their individual chances. But we are seldom lucky enough for this condition to hold: a stage-by-stage assessment, with probabilities changing as the work progresses, is the norm.

What about the chance that at least one of three or more events occurs? The Addition Law does extend to this case, but as the expression is cumbersome, I will not write it down. Its recipe follows the same path as described when using the Multiplication Law for the chance that all of many events occur: take it one step at a time.

Using the word independent when disjoint is meant, and vice versa, is a common error. The example of choosing one card at random can help you see how to avoid it. Here, the events ‘Get a Spade’ and ‘Get a Club’ are disjoint, but far from independent: if either occurs, the other cannot, so the chance both occur is zero! Also ‘Get a Spade’ and ‘Get an Ace’ are independent (yes?), but plainly not disjoint.

Remember: the Addition Law is used to find the chance of at least one event, the Multiplication Law is used to give the chance they all occur.

It is sometimes said that counting really goes one, two, infinity. This aphorism carries the truth that if we can make the step from dealing with one case to dealing with two cases, then subsequent steps to three, four, five, etc. cases are trivial in comparison. This surely applies to both the Addition and Multiplication Laws.

A neat trick

Any event either happens, or it does not. The total probability is split between the event happening, and it failing to happen. So if we can find the chance an event does not happen, we can deduce the chance it does occur by subtraction from 100%.

To illustrate, let’s find the chance of at least one Six when an ordinary fair die is rolled twice. Any outcome is a pair of numbers showing the scores of the first roll, then the second, e.g. (5, 2) or (4, 4), and we take all such outcomes as equally likely. Each roll has six possible results, leading to 6 × 6 = 36 outcomes altogether. Our event does not happen when neither roll shows a Six, for which there are 5 × 5 = 25 outcomes. The chance of no Sixes is 25/36, so the chance of at least one Six is 11/36, a bit less than one in three.
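The complement trick in code: list the 36 pairs, count those with no Six, and subtract from one. An illustrative Python sketch:

```python
# P(at least one Six in two rolls) = 1 - P(no Sixes) = 1 - 25/36.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
no_six = [o for o in outcomes if 6 not in o]      # 25 such outcomes

p_at_least_one = 1 - Fraction(len(no_six), len(outcomes))
assert p_at_least_one == Fraction(11, 36)
```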

This leads on to a junior version of a gambling problem solved by Blaise Pascal and Pierre de Fermat in 1654. How often must we roll a die to make it more likely than not that we get at least one Six, i.e. that the chance of getting a Six is more than one half? We have just seen that two rolls are not enough.

Each extra roll increases the number of possible outcomes by a factor of six, while the number of outcomes without a Six gets multiplied by five. So a third throw generates 216 outcomes in all, and 125 of them – over half – contain no Six; three rolls are not enough either. However, four rolls give 1,296 outcomes, and only 625 of them have no Sixes – fewer than half. That leaves more outcomes that include a Six than outcomes with no Sixes, so now a Six is more likely than not. Four rolls suffice.

The actual game analysed by Pascal and Fermat involved rolling not one die, but two dice together; and asking how often this needs to be done to make it more likely than not that a Double-Six turns up at least once. The method of solution is the same, but the raw arithmetic is formidable. Today, we can quickly reach the answer with a microcomputer or a pocket calculator, whereas logarithms and slide rules had conveniently just become available in the 17th century. With up to 24 rolls, it is more likely than not that no Double-Six appears, but a 25th roll tips the balance the other way.
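Today’s quick route to both answers is a short loop: keep multiplying the chance of ‘no success yet’ by the chance of missing once more, until it drops below one half. A Python sketch (the function name is illustrative):

```python
# Smallest number of rolls for 'at least one success' to be more
# likely than not, given the chance of missing on a single roll.
from fractions import Fraction

def rolls_needed(p_miss):
    """First n with P(no success in n independent rolls) below one half."""
    n, p_none = 0, Fraction(1)
    while p_none >= Fraction(1, 2):
        n += 1
        p_none *= p_miss
    return n

print(rolls_needed(Fraction(5, 6)))     # 4: one die, seeking a Six
print(rolls_needed(Fraction(35, 36)))   # 25: two dice, seeking a Double-Six
```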

Most problems of the form ‘Find the chance of at least one of these events happening’ are best solved in this manner: work out the chance that none of them arises, and then subtract from unity.