Search for notes by fellow students, in your own course and all over the country.

Browse our notes for titles that look like what you need; you can preview any set of notes via a sample of its contents. Once you're happy that these are the notes you're after, simply pop them into your shopping cart.


Title: Probability - Everything You Need To Know
Description: These are notes on everything in probability.

Contents:
1. Basic ideas: Sample space, events; What is probability?; Kolmogorov's Axioms; Proving things from the axioms; Inclusion-Exclusion Principle; Other results about sets; Sampling; Stopping rules; Questionnaire results; Independence; Mutual independence; Properties of independence; Worked examples
2. Conditional probability: What is conditional probability?; Genetics; The Theorem of Total Probability; Sampling revisited; Bayes' Theorem; Iterated conditional probability; Worked examples
3. Random variables: What are random variables?; Probability mass function; Expected value and variance; Joint p.m.f. of two random variables; Some discrete random variables; Continuous random variables; Median, quartiles, percentiles; Some continuous random variables; On using tables; Worked examples
4. More on joint distribution: Covariance and correlation; Conditional random variables; Joint distribution of continuous r.v.s; Transformation of random variables; Worked examples
Appendix A. Mathematical notation
Appendix B. Probability and random variables

Document Preview

Extracts from the notes are below; to see the PDF you'll receive, please use the links above.


Notes on Probability
Peter J
...

The description of the course is as follows: This course introduces the basic notions of probability theory and develops them to the stage where one can begin to use probabilistic ideas in statistical inference and modelling, and the study of stochastic processes. Topics covered include conditional probability and independence; continuous distributions; independence; and mean, variance, covariance and correlation.

The syllabus is as follows:

1. Sample spaces, events, relative frequency, probability axioms.
… Finite sample spaces. Combinatorial probability.
… Conditional probability. Bayes theorem. Independence of two events.
… Sampling with and without replacement.
… Random variables. Standard distributions - hypergeometric, binomial, geometric, Poisson, uniform, normal, exponential.
… Probabilities of events in terms of random variables. Transformations of a single random variable.
7. Marginal and conditional distributions.
8. Means and variances of linear functions of random variables. Limiting distributions in the Binomial case.
...
They have been “field-tested” on the class of 2000. …

Set books. The notes cover only material in the Probability I course. … You need at most one of the three textbooks listed below, but you will need the statistical tables. … Devore (fifth edition), published by Wadsworth. … However, the lectures go into more detail at several points, especially proofs. …

Other books which you can use instead are:

• Probability and Statistics in Engineering and Management Science by W. Hines and D. Montgomery, published by Wiley, Chapters 2–8.
• … Rice, published by Wadsworth, Chapters 1–4.
• … V. … F. …

You need to become familiar with the tables in this book, which will be provided for you in examinations. …

The next book is not compulsory but introduces the ideas in a friendly way:

• Taking Chances: Winning with Probability, by John Haigh, published by Oxford University Press. …

Other web pages of interest include:

http://www.….edu/~chance/teaching_aids/books_articles/probability_book/pdf… (… Grinstead and J. …)

http://www.….uah.…

http://www.….cam.….uk/wmy2kposters/july/ (The Birthday Paradox; poster in the London Underground, July 2000)

…combinatorics….html (An article on Venn diagrams by Frank Ruskey, with history and many nice pictures.)

Contents: see the chapter and section list in the Description above.
Chapter 1
Basic ideas

In this chapter, we don't really answer the question 'What is probability?' Nobody has a really good answer to this question. … We also look at different kinds of sampling, and examine what it means for events to be independent.

1.1 Sample space, events

The general setting is: we perform an experiment which can have a number of different outcomes. … We usually call it S. … For example, if I plant ten bean seeds and count the number that germinate, the sample space is

S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

Sometimes we can assume that all the outcomes are equally likely. (… In the beans example, it is most unlikely. …) If all outcomes are equally likely, then each has probability 1/|S|.

… In particular, “cases of equal probability” are often hypothetically stipulated when the theoretical methods employed are definite enough to permit a deduction rather than a stipulation. …

We can specify an event by listing all the outcomes that make it up. … Then

A = {HHH, HHT, HTH, THH},
B = {HHH, HTH, THH, TTH}.

So, if all outcomes are equally likely, we have

P(A) = |A| / |S|.

An event is simple if it consists of just a single outcome, and is compound otherwise. … If A = {a} is a simple event, then the probability of A is just the probability of the outcome a, and we usually write P(a), which is simpler to write than P({a}). (…)

We can build new events from old ones:

• A ∪ B (read 'A union B') consists of all the outcomes in A or in B (or both!);
• A ∩ B (read 'A intersection B') consists of all the outcomes in both A and B;
• A \ B (read 'A minus B') consists of all the outcomes in A but not in B;
• A′ (read 'A complement') consists of all outcomes not in A (that is, S \ A);
• ∅ (read 'empty set') is the event which doesn't contain any outcomes.

In the example, A′ is the event 'more tails than heads', and A ∩ B is the event {HHH, THH, HTH}.
1.2 What is probability?

There is really no answer to this question. … That is, to say that the probability of getting heads when a coin is tossed is 1/2 means that, if the coin is tossed many times, it is likely to come down heads about half the time. … You wouldn't be surprised to get only 495. … But this argument doesn't work in all cases, and it doesn't explain what probability means. … You say that the probability of heads in a coin toss is 1/2 because you have no reason for thinking either heads or tails more likely; you might change your view if you knew that the owner of the coin was a magician or a con man. …

We regard probability as a mathematical construction satisfying some axioms (devised by the Russian mathematician A. N. Kolmogorov). … The answer agrees well with experiment.
1.3 Kolmogorov's Axioms

Remember that an event is a subset of the sample space S. … Events A1, A2, … are called mutually disjoint or pairwise disjoint if Ai ∩ Aj = ∅ for any two of the events Ai and Aj; that is, no two of the events overlap. … These numbers satisfy three axioms:

Axiom 1: For any event A, we have P(A) ≥ 0.

Axiom 2: P(S) = 1.

Axiom 3: If the events A1, A2, … are pairwise disjoint, then

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·

Note that in Axiom 3, we have the union of events and the sum of numbers. … Sometimes we separate Axiom 3 into two parts: Axiom 3a, if there are only finitely many events A1, A2, …, An, … We will only use Axiom 3a, but 3b is important later on.


1.4 Proving things from the axioms

… That means, every step must be justified by appealing to an axiom. … Here are some examples of things proved from the axioms. … Usually, a theorem is a big, important statement; a proposition a rather smaller statement; and a corollary is something that follows quite easily from a theorem or proposition that came before. …

Proposition 1.1 If the event A contains only a finite number of outcomes, say A = {a1, a2, …, an}, then

P(A) = P(a1) + P(a2) + · · · + P(an).

To prove the proposition, we define a new event Ai containing only the outcome ai, that is, Ai = {ai}, for i = 1, …, n. Then A1, …, An are pairwise disjoint events whose union is A, and the result follows from Axiom 3. …

Corollary 1.2 If S = {a1, …, an}, then

P(a1) + P(a2) + · · · + P(an) = 1.

(The left-hand side is P(S) by Proposition 1.1, and P(S) = 1 by Axiom 2.) …

Now we see that, if all the n outcomes are equally likely, and their probabilities sum to 1, then each has probability 1/n, that is, 1/|S|. … By Proposition 1.1, we see that, if all outcomes are equally likely, then

P(A) = |A| / |S|

for any event A, justifying the principle we used earlier. …

Proposition 1.3 P(A′) = 1 − P(A) for any event A.

To prove this, take A1 = A and A2 = A′. Then A1 ∩ A2 = ∅ (that is, the events A1 and A2 are disjoint), and A1 ∪ A2 = S. … So P(A) = P(A1) = 1 − P(A2) = 1 − P(A′).

Corollary 1.4 P(A) ≤ 1 for any event A.

For P(A′) = 1 − P(A) by Proposition 1.3, and P(A′) ≥ 0 by Axiom 1; so 1 − P(A) ≥ 0, from which we get P(A) ≤ 1.

Corollary 1.5 P(∅) = 0.

For P(∅) = 1 − P(S) by Proposition 1.3; and P(S) = 1 by Axiom 2, so P(∅) = 0.

Here is another result. …

Proposition 1.6 If A ⊆ B, then P(A) ≤ P(B).

This time, take A1 = A and A2 = B \ A. … So by Axiom 3,

P(A1) + P(A2) = P(A1 ∪ A2) = P(B).

Now P(B \ A) ≥ 0 by Axiom 1; so P(A) ≤ P(B), as we had to show.
1.5 Inclusion-Exclusion Principle

(Venn diagram of two sets A and B.)

A Venn diagram for two sets A and B suggests that, to find the size of A ∪ B, we add the size of A and the size of B, but then we have included the size of A ∩ B twice, so we have to take it off. …

Proposition 1.7 P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

… We see that A ∪ B is made up of three parts, namely

A1 = A ∩ B,   A2 = A \ B,   A3 = B \ A.

… Similarly we have A1 ∪ A2 = A and A1 ∪ A3 = B. … (We have three pairs of sets to check. …) The arguments for the other two pairs are similar – you should do them yourself. …

From this we obtain

P(A) + P(B) − P(A ∩ B) = (P(A1) + P(A2)) + (P(A1) + P(A3)) − P(A1)
= P(A1) + P(A2) + P(A3)
= P(A ∪ B)

as required. …

Here it is for three events; try to prove it yourself. … The parts in common have been counted twice, so we subtract P(A ∩ B), P(A ∩ C) and P(B ∩ C). …

Proposition 1.8 P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

Can you extend this to any number of events?
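The two-set identity is easy to check by brute force on a small sample space. The sketch below is not part of the notes; it simply enumerates the eight equally likely outcomes of three coin tosses and uses the two events listed in Section 1.1 to compare P(A ∪ B) with P(A) + P(B) − P(A ∩ B).

```python
from fractions import Fraction
from itertools import product

# Sample space: all 2^3 = 8 outcomes of three coin tosses, equally likely.
S = {"".join(t) for t in product("HT", repeat=3)}

def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

# The events A and B as listed in Section 1.1.
A = {"HHH", "HHT", "HTH", "THH"}
B = {"HHH", "HTH", "THH", "TTH"}

lhs = P(A | B)                    # P(A union B)
rhs = P(A) + P(B) - P(A & B)      # inclusion-exclusion
print(lhs, rhs, lhs == rhs)       # 5/8 5/8 True
```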

1.6 Other results about sets

… Here are some examples. …

Proposition 1.9 Let A, B, C be subsets of S. …

De Morgan's Laws: (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′.

… You should draw Venn diagrams and convince yourself that they work.
1.7 Sampling

… I draw a pen; each pen has the same chance of being selected. … In this case, if A is the event 'red or green pen chosen', then

P(A) = |A| / |S| = 2/4 = 1/2.

What if we choose more than one pen? We have to be more careful to specify the sample space. …

Sampling with replacement means that we choose a pen, note its colour, put it back and shake the drawer, then choose a pen again (which may be the same pen as before or a different one), and so on until the required number of pens have been chosen. …

Sampling without replacement means that we choose a pen but do not put it back, so that our final selection cannot include two pens of the same colour. …

Now there is another issue, depending on whether we care about the order in which the pens are chosen. … It doesn't really matter in this case whether we choose the pens one at a time or simply take two pens out of the drawer; and we are not interested in which pen was chosen first. … (Each element is written as a set since, in a set, we don't care which element is first, only which elements are actually present. …) We should not be surprised that this is the same as in the previous case. …
… These involve the following functions:

n! = n(n − 1)(n − 2) · · · 1
nPk = n(n − 1)(n − 2) · · · (n − k + 1)
nCk = nPk / k!

Note that n! is the product of all the whole numbers from 1 to n; and

nPk = n! / (n − k)!,

so that

nCk = n! / (k! (n − k)!).

Proposition 1.10 The number of selections of k objects from a set of n objects is given in the following table. …

Here are the proofs of the other three cases. … (Think of the choices as being described by a branching tree. …) … As before, we multiply. … For sampling without replacement and unordered sample, think first of choosing an ordered sample, which we can do in nPk ways. … Each unordered sample arises from k! ordered ones, so we divide by k!, obtaining nPk/k! = nCk choices.
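As a quick sanity check on these counting formulae, the sketch below (an illustration, not part of the notes; the values of n and k are arbitrary) computes n^k, nPk and nCk with the standard library and confirms them by brute-force enumeration.

```python
from itertools import combinations, permutations, product
from math import comb, factorial, perm

n, k = 4, 2   # small enough to enumerate directly

with_repl_ordered = n ** k        # ordered sample, with replacement
no_repl_ordered = perm(n, k)      # ordered sample, without replacement: nPk
no_repl_unordered = comb(n, k)    # unordered sample, without replacement: nCk

# Brute-force counts over the objects 0, 1, ..., n-1.
objects = range(n)
assert with_repl_ordered == len(list(product(objects, repeat=k)))     # 16
assert no_repl_ordered == len(list(permutations(objects, k)))         # 12
assert no_repl_unordered == len(list(combinations(objects, k)))       # 6
assert perm(n, k) == factorial(n) // factorial(n - k)                 # nPk = n!/(n-k)!
assert comb(n, k) == perm(n, k) // factorial(k)                       # nCk = nPk/k!
print(with_repl_ordered, no_repl_ordered, no_repl_unordered)
```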

Note that, if we use the phrase 'sampling without replacement, ordered sample', or any other combination, we are assuming that all outcomes are equally likely. …

Example … Three names are drawn out; these will be the days of the Probability I lectures. … For ordered samples, the size of the sample space is 7P3 = 7 · 6 · 5 = 210. … Thus, P(A) = 60/210 = 2/7. …

Example A six-sided die is rolled twice. … So the number of elements in the sample space is 6^2 = 36. … So the probability of the event is 6/36 = 1/6. …

Example A box contains 20 balls, 10 red and 10 blue. We draw ten balls from the box, and we are interested in the event that exactly 5 of the balls are red and 5 are blue. …
Consider sampling with replacement. … What is |A|? The number of ways in which we can choose first five red balls and then five blue ones (that is, RRRRRBBBBB), is 10^5 · 10^5 = 10^10. … In fact, the five red balls could appear in any five of the ten draws, and there are 10C5 = 252 such patterns. … So we have

|A| = 252 · 10^10,

and so

P(A) = 252 · 10^10 / 20^10 = 0.246.

Now consider sampling without replacement. … There are 10P5 ways of choosing five of the ten red balls, and the same for the ten blue balls, and as in the previous case there are 10C5 patterns of red and blue balls. … So P(A) = 10C5 · (10P5)^2 / 20P10 = 0.343. …

With unordered samples, there are 10C5 choices of the five red balls and the same for the blue balls. … So

|A| = (10C5)^2,

and

P(A) = (10C5)^2 / 20C10 = 0.343.

This is the same answer as in the case before, as it should be; the question doesn't care about order of choices! So the event is more likely if we sample without replacement.
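The two answers can be reproduced in a few lines. This sketch is an illustration only (not from the notes) and assumes, as in the example, a box of ten red and ten blue balls from which ten balls are drawn.

```python
from math import comb

red = blue = 10
draws = 10

# Sampling with replacement: each draw is red with probability 1/2,
# so P(exactly 5 red) is the binomial probability C(10,5) / 2^10.
p_with = comb(draws, 5) * (0.5 ** draws)

# Sampling without replacement: |A| = C(10,5)^2 out of |S| = C(20,10) unordered samples.
p_without = comb(red, 5) * comb(blue, 5) / comb(red + blue, draws)

print(round(p_with, 4), round(p_without, 4))   # approximately 0.2461 and 0.3437
```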
… I take out three coins at random. … So |S| = 13C3 = 286. … So the probability is 72/286 = 0.252. …
… If it is without replacement, decide whether the sample is ordered (e.g. does the question say anything about the first object drawn?). … If not, then you can use either ordered or unordered samples, whichever is convenient; they should give the same answer. …
1
...
You are allowed to take the test
up to three times
...

So the sample space is
S = {p, f p, f f p, f f f },
where for example f f p denotes the outcome that you fail twice and pass on your
third attempt
...

But it is unreasonable here to assume that all the outcomes are equally likely
...
Let us assume
that the probability that you pass the test is 0
...
(By Proposition 3, your chance
of failing is 0
...
) Let us further assume that, no matter how many times you have
failed, your chance of passing at the next attempt is still 0
...
Then we have
P(p)
P( f p)
P( f f p)
P( f f f )

=
=
=
=

0
...
2 · 0
...
16,
0
...
8 = 0
...
23 = 0
...


Thus the probability that you eventually get the certificate is P({p, f p, f f p}) =
0
...
16 + 0
...
992
...
008 = 0
...

A stopping rule is a rule of the type described here, namely, continue the experiment until some specified occurrence happens. …

For example, if you toss a coin repeatedly until you obtain heads, the sample space is

S = {H, TH, TTH, TTTH, …}.

(We have to allow all possible outcomes. …) … This ensures that the sample space is finite. … In the meantime you might like to consider whether it is a reasonable assumption for tossing a coin, or for someone taking a series of tests. …

For example, the number of coin tosses might be determined by some other random process such as the roll of a die; or we might toss a coin until we have obtained heads twice; and so on.
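A tiny script makes the driving-test numbers concrete. This is an illustration rather than part of the notes; it assumes, as in the example above, a pass probability of 0.8 on every attempt, and checks that the four outcome probabilities sum to 1.

```python
p_pass = 0.8          # assumed probability of passing any single attempt
p_fail = 1 - p_pass   # 0.2

probs = {
    "p":   p_pass,                 # pass first time
    "fp":  p_fail * p_pass,        # fail once, then pass: 0.16
    "ffp": p_fail**2 * p_pass,     # fail twice, then pass: 0.032
    "fff": p_fail**3,              # fail all three attempts: 0.008
}

print(probs)
print("certificate:", probs["p"] + probs["fp"] + probs["ffp"])  # about 0.992
print("total:", sum(probs.values()))                             # about 1.0
```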
...


1
...
I have a hat containing 20 balls, 10 red and 10 blue
...
I am interested in the event that I draw exactly five red and
five blue balls
...
What colour are your eyes?
Blue 2

Brown 2

3
...
BASIC IDEAS

What can we conclude?
Half the class thought that, in the experiment with the coloured balls, sampling
with replacement make the result more likely
...
(This doesn’t matter,
since the students were instructed not to think too hard about it!)
You might expect that eye colour and mobile phone ownership would have no
influence on your answer
...
If true, then of the 87 people with brown
eyes, half of them (i
...
43 or 44) would answer “with replacement”, whereas in
fact 45 did
...
So
perhaps we have demonstrated that people who own mobile phones are slightly
smarter than average, whereas people with brown eyes are slightly less smart!
In fact we have shown no such thing, since our results refer only to the people who filled out the questionnaire
...

On the other hand, since 83 out of 104 people have mobile phones, if we
think that phone ownership and eye colour are independent, we would expect
that the same fraction 83/104 of the 87 brown-eyed people would have phones,
i
...
(83 · 87)/104 = 69
...
In fact the number is 70, or as near as we can
expect
...


1
...

This is the definition of independence of events
...
Do not say that two
events are independent if one has no influence on the other; and under no circum/
stances say that A and B are independent if A ∩ B = 0 (this is the statement that
A and B are disjoint, which is quite a different thing!) Also, do not ever say that
P(A ∩ B) = P(A) · P(B) unless you have some good reason for assuming that A
and B are independent (either because this is given in the question, or as in the
next-but-one paragraph)
...
Suppose that a student is chosen
at random from those who filled out the questionnaire
...
10
...
Then
P(A) = 52/104 = 0
...
8365,
P(C) = 83/104 = 0
...

Furthermore,
P(A ∩ B) = 45/104 = 0
...
375,
P(B ∩C) = 70/104 = 0
...
4183,
P(A) · P(C) = 0
...
6676
...

In practice, if it is the case that the event A has no effect on the outcome
of event B, then A and B are independent
...
There might be a very definite connection between A and B, but still it
could happen that P(A ∩ B) = P(A) · P(B), so that A and B are independent
...

Example If we toss a coin more than once, or roll a die more than once, then
you may assume that different tosses or rolls are independent
...
But (1/36) = (1/6) · (1/6), and the separate
probabilities of getting 4 on the first throw and of getting 5 on the second are both
equal to 1/6
...
This would work just as well for
any other combination
...
This holds even if the examples
are not all equally likely
...

Example I have two red pens, one green pen, and one blue pen
...
Let A be the event that I choose exactly one red pen,
and B the event that I choose exactly one green pen
...
BASIC IDEAS

We have P(A) = 4/6 = 2/3, P(B) = 3/6 = 1/2, P(A∩B) = 2/6 = 1/3 = P(A)P(B),
so A and B are independent
...
This time, if you write down the sample space
and the two events and do the calculations, you will find that P(A) = 6/10 = 3/5,
P(B) = 4/10 = 2/5, P(A ∩ B) = 2/10 = 1/5 = P(A)P(B), so adding one more
pen has made the events non-independent!
We see that it is very difficult to tell whether events are independent or not
...
(There is
one other case where you can assume independence: this is the result of different
draws, with replacement, from a set of objects
...
Each of the eight possible outcomes has probability 1/8
...
Then
• A = {HHH, HHT, HT H, T HH}, P(A) = 1/2,
• B = {HHH, HHT, T T H, T T T }, P(B) = 1/2,
• A ∩ B = {HHH, HHT }, P(A ∩ B) = 1/4;
so A and B are independent
...
For example, let C be the
event ‘heads on the last toss’
...

Are B and C independent?

1
...
You will need to know the conclusions, though the
arguments we use to reach them are not so important
...


1
...
PROPERTIES OF INDEPENDENCE

17

If all three pairs of events happen to be independent, can we then conclude
that P(A ∩ B ∩C) = P(A) · P(B) · P(C)? At first sight this seems very reasonable;
in Axiom 3, we only required all pairs of events to be exclusive in order to justify
our conclusion
...

Example In the coin-tossing example, let A be the event ‘first and second tosses
have same result’, B the event ‘first and third tosses have the same result, and
C the event ‘second and third tosses have same result’
...
Thus any pair of the three
events are independent, but
P(A ∩ B ∩C) = 1/4,
P(A) · P(B) · P(C) = 1/8
...

The correct definition and proposition run as follows
...
, An be events
...
, ik with k ≥ 1, the events
Ai1 ∩ Ai2 ∩ · · · ∩ Aik−1

and Aik

are independent
...

Proposition 1
...
, An be mutually independent
...


Now all you really need to know is that the same ‘physical’ arguments that
justify that two events (such as two tosses of a coin, or two throws of a die) are
independent, also justify that any number of such events are mutually independent
...
In other words, all 64 possible outcomes are equally likely
...
12

Properties of independence

Proposition 1
...


18

CHAPTER 1
...

From Corollary 4, we know that P(B ) = 1 − P(B)
...
Thus,
P(A ∩ B ) = P(A) − P(A ∩ B)
= P(A) − P(A) · P(B)
(since A and B are independent)
= P(A)(1 − P(B))
= P(A) · P(B ),
which is what we were required to prove
...
13 If A and B are independent, so are A and B
...

More generally, if events A1 ,
...
We have to be a bit careful though
...

Proposition 1
...
Then A and B ∩C
are independent, and A and B ∪C are independent
...
You are allowed up to three attempts to pass the test
...
8
...
Then, by Proposition 1
...
That is,
P(p) = 0
...
2) · (0
...
2)2 · (0
...
2)3 ,
as we claimed in the earlier example
...


1
...
PROPERTIES OF INDEPENDENCE
Example
The electrical apparatus in the diagram works so long as current can flow from left to right. … The probability that component A works is 0.8; the probability that component B works is 0.9; and the probability that component C works is 0.75. … Find the probability that the apparatus works. …

Now the apparatus will work if either A and B are working, or C is working (or possibly both). … Now

P((A ∩ B) ∪ C) = P(A ∩ B) + P(C) − P(A ∩ B ∩ C)   (by Inclusion-Exclusion)
= P(A) · P(B) + P(C) − P(A) · P(B) · P(C)   (by mutual independence)
= (0.8) · (0.9) + (0.75) − (0.8) · (0.9) · (0.75)
= 0.93.

The problem can also be analysed in a different way. … Thus, the event that the apparatus does not work is (A′ ∪ B′) ∩ C′, which by the distributive law is (A′ ∩ C′) ∪ (B′ ∩ C′). … We have

P((A′ ∩ C′) ∪ (B′ ∩ C′)) = P(A′ ∩ C′) + P(B′ ∩ C′) − P(A′ ∩ B′ ∩ C′)   (by Inclusion-Exclusion)
= P(A′) · P(C′) + P(B′) · P(C′) − P(A′) · P(B′) · P(C′)   (by mutual independence of A′, B′, C′)
= (0.2) · (0.25) + (0.1) · (0.25) − (0.2) · (0.1) · (0.25)
= 0.07,

so the apparatus works with probability 1 − 0.07 = 0.93. … You might be tempted to say P(A′ ∩ C′) = (0.2) · (0.25) = 0.05 and P(B′ ∩ C′) = (0.1) · (0.25) = 0.025, and conclude that

P((A′ ∩ C′) ∪ (B′ ∩ C′)) = 0.05 + 0.025 − (0.05 · 0.025) = 0.07375.

But this is not correct, since the events A′ ∩ C′ and B′ ∩ C′ are not independent!
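The reliability figure in this example is easy to reproduce. The sketch below is an illustration only; it takes the component probabilities 0.8, 0.9 and 0.75 from the example and computes the answer three ways, including a brute-force check over all work/fail patterns.

```python
pA, pB, pC = 0.8, 0.9, 0.75   # probabilities that components A, B, C work

# Direct route: the apparatus works if (A and B) work, or C works.
works = pA * pB + pC - pA * pB * pC
print(round(works, 4))                      # 0.93

# Complement route: it fails only if C fails and at least one of A, B fails.
qA, qB, qC = 1 - pA, 1 - pB, 1 - pC
fails = qA * qC + qB * qC - qA * qB * qC    # inclusion-exclusion with independence
print(round(1 - fails, 4))                  # 0.93

# Brute-force check over all 2^3 work/fail patterns of the independent components.
total = 0.0
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            prob = (pA if a else qA) * (pB if b else qB) * (pC if c else qC)
            if (a and b) or c:
                total += prob
print(round(total, 4))                      # 0.93
```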

20

CHAPTER 1
...
Suppose that I have a coin which has
probability 0
...
I toss the coin three times
...
6)3 = 0
...

The probability of tails on any toss is 1 − 0
...
4
...
Each outcome
has probability (0
...
6) · (0
...
144
...
144) = 0
...

Similarly the probability of one head is 3 · (0
...
4)2 = 0
...
4)3 = 0
...

As a check, we have
0
...
432 + 0
...
064 = 1
...
13

Worked examples

Question
(a) You go to the shop to buy a toothbrush
...
The probability that you buy a red toothbrush is
three times the probability that you buy a green one; the probability that you
buy a blue one is twice the probability that you buy a green one; the probabilities of buying green, purple, and white are all equal
...
For each colour, find the probability that you
buy a toothbrush of that colour
...
On the first day of term they both go to the shop to
buy a toothbrush
...
Find the probability that they buy toothbrushes of the same
colour
...
On the first day of each term
they buy new toothbrushes, with probabilities as in (b), independently of
what they had bought before
...
Find the probablity that James and Simon have differently
coloured toothbrushes from each other for all three terms
...
13
...
Let x = P(G)
...


Since these outcomes comprise the whole sample space, Corollary 2 gives
3x + 2x + x + x + x = 1,
so x = 1/8
...

(b) Let RB denote the event ‘James buys a red toothbrush and Simon buys a blue
toothbrush’, etc
...

The event that the toothbrushes have the same colour consists of the five
outcomes RR, BB, GG, PP, WW , so its probability is
P(RR) + P(BB) + P(GG) + P(PP) + P(WW )
9
1
1
1
1
1
=
+ + + +
=
...
So the event ‘different
coloured toothbrushes in all three terms’ has probability
3 3 3 27
· · =
...
So it is
more likely that they will have the same colour in at least one term
...
The warden tags six of the
elephants with small radio transmitters and returns them to the reserve
...
He counts how many
of these elephants are tagged
...
Find the probability that exactly two of the
selected elephants are tagged, giving the answer correct to 3 decimal places
...
BASIC IDEAS

Solution The experiment consists of picking the five elephants, not the original
choice of six elephants for tagging
...
Then |S | = 24C5
...
This involves
choosing two of the six tagged elephants and three of the eighteen untagged ones,
so |A| = 6C2 · 18C3
...
288

to 3 d
...

Note: Should the sample should be ordered or unordered? Since the answer
doesn’t depend on the order in which the elephants are caught, an unordered sample is preferable
...
288,

since it is necessary to multiply by the 5C2 possible patterns of tagged and untagged elephants in a sample of five with two tagged
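For reference, the hypergeometric count in this example is quick to verify. The sketch below is not part of the notes; it uses the numbers given in the question (24 elephants of which 6 are tagged, a catch of 5) and checks that the ordered and unordered counts agree.

```python
from math import comb, perm

tagged, untagged, caught = 6, 18, 5

# Unordered sample: choose 2 of the 6 tagged and 3 of the 18 untagged.
p_unordered = comb(tagged, 2) * comb(untagged, 3) / comb(tagged + untagged, caught)

# Ordered sample: same probability, with the 5C2 patterns made explicit.
p_ordered = comb(caught, 2) * perm(tagged, 2) * perm(untagged, 3) / perm(tagged + untagged, caught)

print(round(p_unordered, 3), round(p_ordered, 3))   # 0.288 0.288
```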
...
They decide to stop having
children either when they have two boys or when they have four children
...

(a) Write down the sample space
...
Find P(E) and P(F) where
E = “there are at least two girls”, F = “there are more girls than boys”
...

(b) E = {BGGB, GBGB, GGBB, BGGG, GBGG, GGBG, GGGB, GGGG},
F = {BGGG, GBGG, GGBG, GGGB, GGGG}
...
So P(E) = 8/16 = 1/2, P(F) = 5/16
...


2
...
They toss a fair coin ‘best of three’ to
decide who pays: if there are more heads than tails in the three tosses then Alice
pays, otherwise Bob pays
...
The sample space is

S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T },
and the events ‘Alice pays’ and ‘Bob pays’ are respectively
A = {HHH, HHT, HT H, T HH},
B = {HT T, T HT, T T H, T T T }
...
How should
we now reassess their chances? We have
E = {HHH, HHT, HT H, HT T },
and if we are given the information that the result of the first toss is heads, then E
now becomes the sample space of the experiment, since the outcomes not in E are
no longer possible
...

23

24

CHAPTER 2
...

In general, suppose that we are given that an event E has occurred, and we
want to compute the probability that another event A occurs
...
The correct definition
is as follows
...
The
conditional probability of A given E is defined as
P(A | E) =

P(A ∩ E)

...
If you are asked for the definition
of conditional probability, it is not enough to say “the probability of A given that
E has occurred”, although this is the best way to understand it
...
This is P(A | E), not P(A/E) or P(A \ E)
...

To check the formula in our example:
P(A ∩ E) 3/8 3
=
= ,
P(E)
1/2 4
P(B ∩ E) 1/8 1
=
=
...
Thus, for example,
P(A | B) =

P(A ∩ B)
P(B)

if P(B) = 0
...
The probability that the car is yellow is 3/100: the
probability that the driver is blonde is 1/5; and the probability that the car is
yellow and the driver is blonde is 1/50
...


2
...
GENETICS

25

Solution: If Y is the event ‘the car is yellow’ and B the event ‘the driver is blonde’,
then we are given that P(Y ) = 0
...
2, and P(Y ∩ B) = 0
...
So
P(B | Y ) =

P(B ∩Y ) 0
...
667
P(Y )
0
...
p
...

There is a connection between conditional probability and independence:
Proposition 2
...
Then A and B are independent if and only if P(A | B) = P(A)
...

So first suppose that A and B are independent
...
Then
P(A | B) =

P(A ∩ B) P(A) · P(B)
=
= P(A),
P(B)
P(B)

that is, P(A | B) = P(A), as we had to prove
...
In other words,
P(A ∩ B)
= P(A),
P(B)
using the definition of conditional probability
...

This proposition is most likely what people have in mind when they say ‘A
and B are independent means that B has no effect on A’
...
2

Genetics

Here is a simplified version of how genes code eye colour, assuming only two
colours of eyes
...
Each gene is either B or b
...
The gene it receives from its father
is one of its father’s two genes, each with probability 1/2; and similarly for its
mother
...

If your genes are BB or Bb or bB, you have brown eyes; if your genes are bb,
you have blue eyes
...
CONDITIONAL PROBABILITY

Example Suppose that John has brown eyes
...
His
sister has blue eyes
...

Thus each of John’s parents is Bb or bB; we may assume Bb
...
(For example, John gets his father’s B gene with probability 1/2 and his mother’s B gene with probability 1/2, and these are independent, so the probability that he gets BB is 1/4
...
)
Let X be the event ‘John has BB genes’ and Y the event ‘John has brown
eyes’
...
The question asks us to calculate
P(X | Y )
...
3

P(X ∩Y ) 1/4
=
= 1/3
...

Example An ice-cream seller has to decide whether to order more stock for the
Bank Holiday weekend
...
According to the weather forecast, the probability of sunshine
is 30%, the probability of cloud is 45%, and the probability of rain is 25%
...
) What is the overall probability that the salesman will sell all his
stock?
This problem is answered by the Theorem of Total Probability, which we now
state
...
The events A1 , A2 ,
...


2
...
THE THEOREM OF TOTAL PROBABILITY

27

Another way of saying the same thing is that every outcome in the sample space
lies in exactly one of the events A1 , A2 ,
...
The picture shows the idea of a
partition
...
An

Now we state and prove the Theorem of Total Probability
...
2 Let A1 , A2 ,
...
Then
n

P(B) = ∑ P(B | Ai ) · P(Ai )
...
Multiplying up, we find that
P(B ∩ Ai ) = P(B | Ai ) · P(Ai )
...
, B ∩ An
...
Moreover, the union of all
these events is B, since every outcome lies in one of the Ai
...


i=1

Substituting our expression for P(B ∩ Ai ) gives the result
...
An
Consider the ice-cream salesman at the start of this section
...
Then
A1 , A2 and A3 form a partition of the sample space, and we are given that
P(A1 ) = 0
...
45,

P(A3 ) = 0
...


28

CHAPTER 2
...
The other information we are
given is that
P(B | A1 ) = 0
...
6,

P(B | A3 ) = 0
...


By the Theorem of Total Probability,
P(B) = (0
...
3) + (0
...
45) + (0
...
25) = 0
...

You will now realise that the Theorem of Total Probability is really being used
when you calculate probabilities by tree diagrams
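The ice-cream calculation is a one-liner once the partition and the conditionals are written down. The sketch below is an illustration only, with the figures taken from the example above (30%/45%/25% for the weather, and 0.9/0.6/0.2 for selling all the stock).

```python
# Weather partition: sunshine, cloud, rain (probabilities from the forecast).
p_weather = {"sun": 0.30, "cloud": 0.45, "rain": 0.25}

# P(sells all his stock | weather), as given in the example.
p_sell_given = {"sun": 0.9, "cloud": 0.6, "rain": 0.2}

# Theorem of Total Probability: P(B) = sum over i of P(B | A_i) * P(A_i).
p_sell = sum(p_sell_given[w] * p_weather[w] for w in p_weather)
print(round(p_sell, 2))   # 0.59
```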
...

One special case of the Theorem of Total Probability is very commonly used,
and is worth stating in its own right
...
To say that both A and A have non-zero probability is just to say
that P(A) = 0, 1
...
3 Let A and B be events, and suppose that P(A) = 0, 1
...


2
...

Example I have two red pens, one green pen, and one blue pen
...

(a) What is the probability that the first pen chosen is red?
(b) What is the probability that the second pen chosen is red?
For the first pen, there are four pens of which two are red, so the chance of
selecting a red pen is 2/4 = 1/2
...
Let A1 be the event ‘first pen red’,
A2 the event ‘first pen green’ and A3 the event ‘first pen blue’
...
Let B be the event ‘second pen red’
...
On the other hand, if the first pen is green or blue, then two of
the remaining pens are red, so P(B | A2 ) = P(B | A3 ) = 2/3
...
5
...

We have reached by a roundabout argument a conclusion which you might
think to be obvious
...
This argument happens to be correct
...

Beware of obvious-looking arguments in probability! Many clever people have
been caught out
...
5

Bayes’ Theorem

There is a very big difference between P(A | B) and P(B | A)
...
Of course, no test is perfect; there will be
some carriers of the defective gene who test negative, and some non-carriers who
test positive
...

The scientists who develop the test are concerned with the probabilities that
the test result is wrong, that is, with P(B | A ) and P(B | A)
...
If I tested positive, what is the
chance that I have the disease? If I tested negative, how sure can I be that I am not
a carrier? In other words, P(A | B) and P(A | B )
...
4 Let A and B be events with non-zero probability
...

P(B)

The proof is not hard
...
(Note that we need both A
and B to have non-zero probability here
...


30

CHAPTER 2
...

P(B | A) · P(A) + P(B | A ) · P(A )

Bayes’ Theorem is often stated in this form
...
3
...
) Using the same notation that we used before, A1
is the event ‘it is sunny’ and B the event ‘the salesman sells all his stock’
...
We were given that P(B | A1 ) = 0
...
3, and
we calculated that P(B) = 0
...
So by Bayes’ Theorem,
P(A1 | B) =

P(B | A1 )P(A1 ) 0
...
3
=
= 0
...
59

to 2 d
...

Example Consider the clinical test described at the start of this section
...
Suppose also that the
probability that a carrier tests negative is 1%, while the probability that a noncarrier tests positive is 5%
...
) Let A be the event ‘the patient is a carrier’, and B the event ‘the
test result is positive’
...
001 (so that P(A ) = 0
...
99,
P(B | A ) = 0
...

(a) A patient has just had a positive test result
...
99 × 0
...
99 × 0
...
05 × 0
...
00099
=
= 0
...

0
...
What is the probability that the
patient is a carrier? The answer is
P(A | B ) =

P(B | A)P(A)
P(B | A)P(A) + P(B | A )P(A )

2
...
ITERATED CONDITIONAL PROBABILITY

31

0
...
001
(0
...
001) + (0
...
999)
0
...
00001
...
94095

=

So a patient with a negative test result can be reassured; but a patient with a positive test result still has less than 2% chance of being a carrier, so is likely to worry
unnecessarily
...
If the patient has a family history of the disease, the
calculations would be quite different
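The arithmetic in this example is worth replaying. The short sketch below (not part of the notes) applies Bayes' Theorem with the stated inputs: one person in a thousand is a carrier, a carrier tests negative with probability 1%, and a non-carrier tests positive with probability 5%.

```python
p_carrier = 0.001            # P(A): prior probability of being a carrier
p_pos_carrier = 0.99         # P(B | A): a carrier tests positive
p_pos_noncarrier = 0.05      # P(B | A'): a non-carrier tests positive

p_pos = p_pos_carrier * p_carrier + p_pos_noncarrier * (1 - p_carrier)
p_carrier_given_pos = p_pos_carrier * p_carrier / p_pos

p_neg = 1 - p_pos
p_carrier_given_neg = (1 - p_pos_carrier) * p_carrier / p_neg

print(round(p_carrier_given_pos, 4))   # about 0.0194: under 2% despite the positive test
print(f"{p_carrier_given_neg:.2e}")    # about 1.05e-05
```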
...
A new blood test is
developed; the probability of testing positive is 9/10 if the subject has the serious
form, 6/10 if the subject has the mild form, and 1/10 if the subject doesn’t have
the disease
...
What is the probability that I have the serious form
of the disease?
Let A1 be ‘has disease in serious form’, A2 be ‘has disease in mild form’, and
A3 be ‘doesn’t have disease’
...
Then we are given that A1 ,
A2 , A3 form a partition and
P(A1 ) = 0
...
1
P(A3 ) = 0
...
9 P(B | A2 ) = 0
...
1
Thus, by the Theorem of Total Probability,
P(B) = 0
...
02 + 0
...
1 + 0
...
88 = 0
...
9 × 0
...
108
P(B)
0
...
p
...
6

Iterated conditional probability

The conditional probability of C, given that both A and B have occurred, is just
P(C | A ∩ B)
...
It is given by
P(C | A, B) =

P(C ∩ A ∩ B)
,
P(A ∩ B)

32

CHAPTER 2
...

Now we also have
P(A ∩ B) = P(B | A)P(A),
so finally (assuming that P(A ∩ B) = 0), we have
P(A ∩ B ∩C) = P(C | A, B)P(B | A)P(A)
...
5 Let A1 ,
...
Suppose that P(A1 ∩ · · · ∩ An−1 ) = 0
...
, An−1 ) · · · P(A2 | A1 )P(A1 )
...

The birthday paradox is the following statement:
If there are 23 or more people in a room, then the chances are better
than even that two of them have the same birthday
...
(This is not quite true
but not inaccurate enough to have much effect on the conclusion
...
, pn
...
Then P(A2 ) = 1 − 365 , since whatever p1 ’s birthday is, there is a 1 in
365 chance that p2 will have the same birthday
...
It is not
straightforward to evaluate P(A3 ), since we have to consider whether p1 and p2
have the same birthday or not
...
But we can calculate that P(A3 |
2
A2 ) = 1 − 365 , since if A2 occurs then p1 and p2 have birthdays on different days,
and A3 will occur only if p3 ’s birthday is on neither of these days
...


What is A2 ∩ A3 ? It is simply the event that all three people have birthdays on
different days
...
If Ai denotes the event ‘pi ’s birthday is not on the
same day as any of p1 ,
...
, Ai−1 ) = 1 − i−1 ,
365

2
...
ITERATED CONDITIONAL PROBABILITY

33

and so by Proposition 2
...

365

Call this number qi ; it is the probability that all of the people p1 ,
...

The numbers qi decrease, since at each step we multiply by a factor less than 1
...
5,

qn ≤ 0
...

By calculation, we find that q22 = 0
...
4927 (to 4 d
...
); so 23
people are enough for the probability of coincidence to be greater than 1/2
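The numbers q22 and q23 quoted here are easy to recompute. The sketch below is an illustration, not part of the notes; it multiplies the factors (1 − i/365) and finds the smallest group size for which the probability of a shared birthday exceeds 1/2.

```python
def q(n, days=365):
    """Probability that n people all have different birthdays."""
    prob = 1.0
    for i in range(1, n):
        prob *= 1 - i / days
    return prob

print(round(q(22), 4), round(q(23), 4))   # 0.5243 0.4927

# Smallest n for which a shared birthday is more likely than not:
n = 2
while q(n) > 0.5:
    n += 1
print(n)   # 23
```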
...
What is the probability of the
event A3 ? (This is the event that p3 has a different birthday from both p1 and p2
...
On the other hand, if p1 and p2 have the same birthday,
1
then the probability is 1 − 365
...
So, by the Theorem of Total Probability,
P(A3 ) = P(A3 | A2 )P(A2 ) + P(A3 | A2 )P(A2 )
1
1
1
2
= (1 − 365 )(1 − 365 ) + (1 − 365 ) 365
= 0
...
p
...
, pi−1 , then as before we find that
P(Bi | B1 , · · · , Bi−1 ) = 1 − i−1 ,
12
so
1
2
P(B1 ∩ · · · ∩ Bi ) = (1 − 12 )(1 − 12 )(1 − i−1 )
...
5729

34

CHAPTER 2
...
3819
for i = 5
...

A true story
...
He said to the class, “I bet that
no two people in the room have the same birthday”
...
859
...
Why?
(Answer in the next chapter
...
7

Worked examples

Question Each person has two genes for cystic fibrosis
...
Each child receives one gene from each parent
...

(a) Neither of Sally’s parents has cystic fibrosis
...
However, Sally’s
sister Hannah does have cystic fibrosis
...

(b) In the general population the ratio of N genes to C genes is about 49 to 1
...
Harry does
not have cystic fibrosis
...

(c) Harry and Sally plan to have a child
...

Solution During this solution, we will use a number of times the following principle
...
Then A ∩ B = A, and so
P(A | B) =

P(A ∩ B) P(A)
=

...
We are given
that Sally’s sister has genes CC, and one gene must come from each parent
...
7
...
Now by the basic rules of
genetics, all the four combinations of genes for a child of these parents, namely
CC,CN, NC, NN, will have probability 1/4
...

Then
P(S1 ∩ S2 ) 2/4 2
P(S1 | S2 ) =
=
=
...
We are given that the
probability of a random gene being C or N is 1/50 and 49/50 respectively
...
So, if H1 is the
event ‘Harry has at least one C gene’, and H2 is the event ‘Harry does not have
cystic fibrosis’, then
P(H1 | H2 ) =

P(H1 ∩ H2 )
(49/2500) + (49/2500)
2
=
=
...
As in
(a), this can only occur if Harry and Sally both have CN or NC genes
...
Now if Harry and Sally are
both CN or NC, these genes pass independently to the baby, and so
P(X | S3 ∩ H3 ) =

1
P(X)
=
...

P(S2 ∩ H2 )
Now Harry’s and Sally’s genes are independent, so
P(S3 ∩ H3 ) = P(S3 ) · P(H3 ),
P(S2 ∩ H2 ) = P(S2 ) · P(H2 )
...
CONDITIONAL PROBABILITY
1 P(S1 ∩ S2 ) P(H1 ∩ H2 )
·
·
4
P(S2 )
P(H2 )
1
· P(S1 | S2 ) · P(H1 | H2 )
=
4
1 2 2
=
· ·
4 3 51
1
=

...

Question The Land of Nod lies in the monsoon zone, and has just two seasons,
Wet and Dry
...
During the Wet season, the probability that it is raining is 3/4; during
the Dry season, the probability that it is raining is 1/6
...
What is the
probability that it is raining when I arrive?
(b) I visit Oneirabad on a random day, and it is raining when I arrive
...
Given this
information, what is the probability that it will be raining when I return to
Oneirabad in a year’s time?
(You may assume that in a year’s time the season will be the same as today but,
given the season, whether or not it is raining is independent of today’s weather
...
We are given that P(W ) =
1/3, P(D) = 2/3, P(R | W ) = 3/4, P(R | D) = 1/6
...

(b) By Bayes’ Theorem,
P(W | R) =

P(R | W )P(W ) (3/4) · (1/3)
9
=
=
...
7
...
The information we are
given is that P(R ∩ R | W ) = P(R | W )P(R | W ) and similarly for D
...

P(R)
13/36
156

38

CHAPTER 2
...


3
...
Similarly, a random variable is neither random nor a
variable:
A random variable is a function defined on a sample space
...
The standard abbreviation for ‘random variable’ is r
...

Example I select at random a student from the class and measure his or her
height in centimetres
...
(Remember that a function is nothing but a rule for
associating with each element of its domain set an element of its target or range
set
...
)
Example I throw a six-sided die twice; I am interested in the sum of the two
numbers
...
RANDOM VARIABLES

and the random variable F is given by F(i, j) = i + j
...
, 12}
...
These definitions are not quite
precise, but more examples should make the idea clearer
...

For example, F is discrete if it can take only finitely many values (as in the second
example above, where the values are the integers from 2 to 12), or if the values of
F are integers (for example, the number of nuclear decays which take place in a
second in a sample of radioactive material – the number is an integer but we can’t
easily put an upper limit on it
...
In the first example, the height of a student could in principle be any real
number between certain extreme limits
...

One could concoct random variables which are neither discrete nor continuous
(e
...
the possible, values could be 1, 2, 3, or any real number between 4 and 5),
but we will not consider such random variables
...


3
...
The most basic question we can ask is: given
any value a in the target set of F, what is the probability that F takes the value a?
In other words, if we consider the event
A = {x ∈ S : F(x) = a}
what is P(A)? (Remember that an event is a subset of the sample space
...

(There is a fairly common convention in probability and statistics that random
variables are denoted by capital letters and their values by lower-case letters
...
But
remember that this is only a convention, and you are not bound to it
...
3
...
If F takes only a few values, it is convenient to list it in a table; otherwise
we should give a formula if possible
...
m
...

Example I toss a fair coin three times
...
The possible values of X are 0, 1, 2, 3, and its p
...
f
...
The event X = 1, for example, when written as a
set of outcomes, is equal to {HT T, T HT, T T H}, and has probability 3/8
...
We write X ∼ Y
in this case
...


3
...
, an
...

i=1

That is, we multiply each value of X by the probability that X takes that value,
and sum these terms
...

There is an interpretation of the expected value in terms of mechanics
...
, n, where pi = P(X = ai ), then
the centre of mass of all these masses is at the point E(X)
...
, then
we define the expected value of X to be the infinite sum


E(X) = ∑ ai P(X = ai )
...
RANDOM VARIABLES

Of course, now we have to worry about whether this means anything, that is,
whether this infinite series is convergent
...
We won’t worry about it too much
...
In the proofs
below, we assume that the number of values is finite
...

Here, X 2 is just the random variable whose values are the squares of the values of
X
...
The next theorem shows that, if E(X) is a kind
of average of the values of X, then Var(X) is a measure of how spread-out the
values are around their average
...
1 Let X be a discrete random variable with E(X) = µ
...

i=1

For the second term is equal to the third by definition, and the third is
n

∑ (ai − µ)2P(X = ai)

i=1
n

=

∑ (a2 − 2µai + µ2)P(X = ai)
i

i=1

n

=

n

∑ a2P(X = ai) − 2µ
i

i=1

∑ aiP(X = ai) + µ2

i=1

n

∑ P(X = ai)


...
We add it up by columns instead of by rows, getting three parts with
n terms in each part
...
(Remember that E(X) = µ, and that ∑n P(X = ai ) = 1 since
i=1
the events X = ai form a partition
...
4
...
M
...
OF TWO RANDOM VARIABLES

43

Some people take the conclusion of this proposition as the definition of variance
...
What are the
expected value and variance of X?
E(X) = 0 × (1/8) + 1 × (3/8) + 2 × (3/8) + 3 × (1/8) = 3/2,
Var(X) = 02 × (1/8) + 12 × (3/8) + 22 × (3/8) + 32 × (1/8) − (3/2)2 = 3/4
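These two values can be checked mechanically. The sketch below is not part of the notes; it computes E(X) and Var(X) straight from the p.m.f. of the number of heads in three tosses of a fair coin.

```python
from fractions import Fraction

# p.m.f. of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

E = sum(a * p for a, p in pmf.items())        # expected value
E2 = sum(a * a * p for a, p in pmf.items())   # E(X^2)
Var = E2 - E**2                               # Var(X) = E(X^2) - E(X)^2

print(E, Var)   # 3/2 3/4
```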
...
1, we get
3
Var(X) = −
2

2

1
1
× + −
8
2

2

3
1
× +
8
2

2

3
3
× +
8
2

2

×

1 3
=
...

• The expected value of X always lies between the smallest and largest values
of X
...
(For the formula in Proposition 3
...


3
...
m
...
of two random variables

Let X be a random variable taking the values a1 ,
...
, bm
...

Here P(X = ai ,Y = b j ) means the probability of the event that X takes the value
ai and Y takes the value b j
...

Note the difference between ‘independent events’ and ‘independent random variables’
...
RANDOM VARIABLES

Example In Chapter 2, we saw the following: I have two red pens, one green
pen, and one blue pen
...
Then the events
‘exactly one red pen selected’ and ‘exactly one green pen selected’ turned out to
be independent
...
Then
P(X = 1,Y = 1) = P(X = 1) · P(Y = 1)
...

On the other hand, if I roll a die twice, and X and Y are the numbers that come
up on the first and second throws, then X and Y will be independent, even if the
die is not fair (so that the outcomes are not all equally likely)
...
(You may
want to revise the material on mutually independent events
...
2 Let X and Y be random variables
...

(b) If X and Y are independent, then Var(X +Y ) = Var(X) + Var(Y )
...

If two random variables X and Y are not independent, then knowing the p
...
f
...
The joint probability mass function
(or joint p
...
f
...
We arrange the table so
that the rows correspond to the values of X and the columns to the values of Y
...
m
...
of X
...
m
...
of Y
...
)
In particular, X and Y are independent r
...
s if and only if each entry of the
table is equal to the product of its row sum and its column sum
...
4
...
M
...
OF TWO RANDOM VARIABLES

45

Example I have two red pens, one green pen, and one blue pen, and I choose
two pens without replacement
...
Then the joint p
...
f
...
m
...
s for X and Y :
a 0 1 2
P(X = a)

1
6

2
3

1
6

b 0 1
P(Y = b)

1
2

1
2

Now we give the proof of Theorem 3
...

We consider the joint p
...
f
...
The random variable X +Y takes the
values ai + b j for i = 1,
...
, m
...
Thus,
E(X +Y ) =

∑ ck P(X +Y = ck )
k
n

=

m

∑ ∑ (ai + b j )P(X = ai,Y = b j )

i=1 j=1
n

=

m

m

∑ ai ∑ P(X = ai,Y = b j ) +

i=1

n

∑ b j ∑ P(X = ai,Y = b j )
...
m
...
table, so is equal to
j=1
P(X = ai ), and similarly ∑n P(X = ai ,Y = b j ) is a column sum and is equal to
i=1
P(Y = b j )
...

The variance is a bit trickier
...
RANDOM VARIABLES

using part (a) of the Theorem
...
For this, we
have to make the assumption that X and Y are independent, that is,
P(X = a1 ,Y = b j ) = P(X = ai ) · P(Y = b j )
...

So
Var(X +Y ) =
=
=
=

E((X +Y )2 ) − (E(X +Y ))2
(E(X 2 ) + 2E(XY ) + E(Y 2 )) − (E(X)2 + 2E(X)E(Y ) + E(Y )2 )
(E(X 2 ) − E(X)2 ) + 2(E(XY ) − E(X)E(Y )) + (E(Y 2 ) − E(Y )2 )
Var(X) + Var(Y )
...
(If the thought
of a ‘constant variable’ worries you, remember that a random variable is not a
variable at all but a function, and there is nothing amiss with a constant function
...
3 Let C be a constant random variable with value c
...

(a) E(C) = c, Var(C) = 0
...

(c) E(cX) = cE(X), Var(cX) = c2 Var(X)
...
So
E(C) = c · 1 = c
...

(For C2 is a constant random variable with value c2
...
5
...
2, once we observe that the
constant random variable C and any random variable X are independent
...
) Then
E(X + c) = E(X) + E(C) = E(X) + c,
Var(X + c) = Var(X) + Var(C) = Var(X)
...
, an are the values of X, then ca1 ,
...
So
n

E(cX) =

∑ caiP(cX = cai)

i=1
n

= c ∑ ai P(X = ai )
i=1

= cE(X)
...
5

E(c2 X 2 ) − E(cX)2
c2 E(X 2 ) − (cE(X))2
c2 (E(X 2 ) − E(X)2 )
c2 Var(X)
...
We describe for each type the situations in which it arises, and
give the p
...
f
...
If the variable is tabulated
in the New Cambridge Statistical Tables, we give the table number, and some
examples of using the tables
...

A summary of this information is given in Appendix B
...
They
don’t give the probability mass function (or p
...
f
...
It is defined for a discrete random
variable as follows
...
, an
...
The cumulative distribution
function, or c
...
f
...


48

CHAPTER 3
...
m
...
of X as follows:
i

FX (ai ) = P(X = a1 ) + · · · + P(X = ai ) =

∑ P(X = a j )
...
m
...
from the c
...
f
...

We won’t use the c
...
f
...
It is much more important for continuous random variables!

Bernoulli random variable Bernoulli(p)
A Bernoulli random variable is the simplest type of all
...
So its p
...
f
...

Necessarily q (the probability that X = 0) is equal to 1 − p
...

For a Bernoulli random variable X, we sometimes describe the experiment as
a ‘trial’, the event X = 1 as‘success’, and the event X = 0 as ‘failure’
...

More generally, let A be any event in a probability space S
...

/
The random variable IA is called the indicator variable of A, because its value
indicates whether or not A occurred
...
(The event IA = 1 is just the event A
...

Calculation of the expected value and variance of a Bernoulli random variable
is easy
...
(Remember that ∼ means “has the same p
...
f
...
)
E(X) = 0 · q + 1 · p = p;
Var(X) = 02 · q + 12 · p − p2 = p − p2 = pq
...
)

3
...
SOME DISCRETE RANDOM VARIABLES

49

Binomial random variable Bin(n, p)
Remember that for a Bernoulli random variable, we describe the event X = 1 as a
‘success’
...

For example, suppose that we have a biased coin for which the probability of
heads is p
...
This
number is a Bin(n, p) random variable
...
, n, and the p
...
f
...
, n, where q = 1 − p
...

Note that we have given a formula rather than a table here
...

k=0

(This argument explains the name of the binomial random variable!)
If X ∼ Bin(n, p), then
E(X) = np,

Var(X) = npq
...
The easy way
only works for the binomial, but the harder way is useful for many random variables
...

Here is the easy method
...
Then X is our
Bin(n, p) random variable
...


50

CHAPTER 3
...

Now we have
X = X1 + X2 + · · · + Xn
(can you see why?), and X1 ,
...
So, as we saw earlier, E(Xi ) =
p, Var(Xi ) = pq
...

The other method uses a gadget called the probability generating function
...
Let X be a random variable whose values are non-negative integers
...
To save space, we write pk for the probability P(X = k)
...

(The sum is over all values k taken by X
...

Proposition 3
...
Then
(a) [GX (x)]x=1 = 1;
(b) E(X) =
(c) Var(X) =

d
dx GX (x) x=1 ;
d2
G (x)
+ E(X) − E(X)2
...

For part (b), when we differentiate the series term-by-term (you will learn later in Analysis
that this is OK), we get
d
GX (x) = ∑ kpk xk−1
...

For part (c), differentiating twice gives
d2
GX (x) = ∑ k(k − 1)pk xk−2
...

Adding E(X) and subtracting E(X)2 gives E(X 2 ) − E(X)2 , which by definition is Var(X)
...
5
...
We have
pk = P(X = k) = nCk qn−k pk ,
so the probability generating function is
n

∑ nCk qn−k pk xk = (q + px)n ,
k=0

by the Binomial Theorem
...
4(a)
...
Putting x = 1 we find that
E(X) = np
...
Putting x = 1 gives n(n − 1)p2
...


The binomial random variable is tabulated in Table 1 of the Cambridge Statistical Tables [1]
...

For example, suppose that the probability that a certain coin comes down
heads is 0
...
If the coin is tossed 15 times, what is the probability of five or
fewer heads? Turning to the page n = 15 in Table 1 and looking at the row 0
...
2608
...
2608 − 0
...
1404
...
5
...
So the probability of five heads in 15 tosses of a coin with p = 0
...
9745−
0
...
0514
...

Suppose that we have N balls in a box, of which M are red
...
What is the distribution of X? Since each ball has probability
M/N of being red, and different choices are independent, X ∼ Bin(n, p), where
p = M/N is the proportion of red balls in the sample
...
We sample n balls
from the box without replacement
...
RANDOM VARIABLES

red balls in the sample
...

The random variable X can take any of the values 0, 1, 2,
...
Its p
...
f
...

N Cn
For the number of samples of n balls from N is N Cn ; the number of ways of
choosing k of the M red balls and n − k of the N − M others is MCk · N−MCn−k ; and
all choices are equally likely
...

N −1

You should compare these to the values for a binomial random variable
...

In particular, if the numbers M and N − M of red and non-red balls in the
hat are both very large compared to the size n of the sample, then the difference
between sampling with and without replacement is very small, and indeed the
‘correction factor’ is close to 1
...

Consider our example of choosing two pens from four, where two pens are
red, one green, and one blue
...
We calculated earlier that P(X = 0) = 1/6, P(X = 1) = 2/3 and P(X =
2) = 1/6
...

These agree with the formulae above
...
We have again a coin whose probability of heads is p
...
Thus, the values of the variable are the positive integers 1, , 2, 3,
...
e
...
)
We always assume that 0 < p < 1
...


3
...
SOME DISCRETE RANDOM VARIABLES

53

The p
...
f of a Geom(p) random variable is given by
P(X = k) = qk−1 p,
where q = 1 − p
...

Let’s add up these probabilities:


p

∑ qk−1 p = p + qp + q2 p + · · · = 1 − q = 1,

k=1

since the series is a geometric progression with first term p and common ratio
q, where q < 1
...
)
We calculate the expected value and the variance using the probability generating function
...


We have


GX (x) =

px

∑ qk−1 pxk = 1 − qx ,

k=1

again by summing a geometric progression
...

2
dx
(1 − qx)
(1 − qx)2
Putting x = 1, we obtain
E(X) =

1
p
=
...

p3
p p
p

For example, if we toss a fair coin until heads is obtained, the expected number
of tosses until the first head is 2 (so the expected number of tails is 1); and the
variance of this number is also 2
...
RANDOM VARIABLES

Poisson random variable Poisson(λ)
The Poisson random variable, unlike the ones we have seen before, is very closely
connected with continuous things
...

The best example is radioactive decay: atomic nuclei decay randomly, but the
average number λ which will decay in a given interval is constant
...

So if, on average, there are 2
...
4) random variable
...

Although we will not prove it, the p
...
f
...

k!
Let’s check that these probabilities add up to one
...

By analogy with what happened for the binomial and geometric random variables, you might have expected that this random variable would be called ‘exponential’
...
However, if you speak a little French,
you might use as a mnemonic the fact that if I go fishing, and the fish are biting at
the rate of λ per hour on average, then the number of fish I will catch in the next
hour is a Poisson(λ) random variable
...

Again we use the probability generating function
...

Differentiation gives λeλ(x−1) , so E(X) = λ
...


3
...
CONTINUOUS RANDOM VARIABLES

55

The cumulative distribution function of a Poisson random variable is tabulated
in Table 2 of the New Cambridge Statistical Tables
...
4 fish bite per hour on average, then the probability that I will
catch no fish in the next hour is 0
...
9643 (so that the probability that I catch six or more is 0
...

There is another situation in which the Poisson distribution arises
...
So I conduct 1000 independent trials
...

But it turns out to be Poisson(1), to a very good approximation
...

The general rule is:
If n is large, p is small, and np = λ, then Bin(n, p) can be approximated by Poisson(λ)
...
6

Continuous random variables

We haven’t so far really explained what a continuous random variable is
...
The crucial property is that, for any real number a, we have (X = a) = 0;
that is, the probability that the height of a random student, or the time I have to
wait for a bus, is precisely a, is zero
...

We use the cumulative distribution function or c
...
f
...
Remember from
last week that the c
...
f
...

Note: The name of the function is FX ; the lower case x refers to the argument
of the function, the number which is substituted into the function
...
Note that FX (y) is the same function written in terms of
the variable y instead of x, whereas FY (x) is the c
...
f
...
)
Now let X be a continuous random variable
...

Proposition 3
...
d
...
is an increasing function (this means that FX (x) ≤
FX (y) if x < y), and approaches the limits 0 as x → −∞ and 1 as x → ∞
...
RANDOM VARIABLES
The function is increasing because, if x < y, then
FX (y) − FX (x) = P(X ≤ y) − P(X ≤ x) = P(x < X ≤ y) ≥ 0
...
It is obtained by differentiating the c
...
f
...

dx
Now fX (x) is non-negative, since it is the derivative of an increasing function
...
Because FX (−∞) = 0, we
have
Z x
FX (x) =
fX (t)dt
...
Note also that
P(a ≤ X ≤ b) = FX (b) − FX (a) =

Z b
a

fX (t)dt
...
d
...
like this: the probability that the value of X lies in a
very small interval from x to x + h is approximately fX (x) · h
...

There is a mechanical analogy which you may find helpful
...
Then the total mass is one, and the expected value of X is
the centre of mass
...
Then again the total mass is one; the mass to the left of
x is FX (x); and again it will hold that the centre of mass is at E(X)
...
m
...
by the p
...
f
...
Thus, the expected value of
X is given by
Z


E(X) =
−∞

x fX (x)dx,

and the variance is (as before)
Var(X) = E(X 2 ) − E(X)2 ,
where
2

x2 fX (x)dx
...


E(X ) =
It is also true that Var(X) = E((X

Z ∞

3
...
MEDIAN, QUARTILES, PERCENTILES

57

We will see examples of these calculations shortly
...
The support of a continuous random variable is the smallest
interval containing all values of x where fX (x) > 0
...
d
...
given by
2x if 0 ≤ x ≤ 1,
0 otherwise
...
We check the integral:
fX (x) =

Z 1

Z ∞
−∞

fX (x)dx =

2x dx = x2

0

x=1
x=0

= 1
...

(Study this carefully to see how it works
...


Var(X) =
2
3
18
Z ∞

E(X) =

3
...
More formally, m should satisfy FX (m) = 1/2
...
In the example at the end of the last section, we saw that
E(X) = 2/3
...
Since

FX (x) = x2 for 0 ≤ x ≤ 1, we see that m = 1/ 2
...

The lower quartile l and the upper quartile u are similarly defined by
FX (l) = 1/4,

FX (u) = 3/4
...
More generally, we
define the nth percentile of X to be the value of xn such that
FX (xn ) = n/100,

58

CHAPTER 3
...

Reminder If the c
...
f
...
d
...
is fX (x), then
• differentiate FX to get fX , and integrate fX to get FX ;
• use fX to calculate E(X) and Var(X);
• use FX to calculate P(a ≤ X ≤ b) (this is FX (b) − FX (a)), and the median
and percentiles of X
...
8

Some continuous random variables

In this section we examine three important continuous random variables: the uniform, exponential, and normal
...


Uniform random variable U(a, b)
Let a and b be real numbers with a < b
...
In other
words, its probability density function is constant on the interval [a, b] (and zero
outside the interval)
...
d
...
is the area of a rectangle of height c and base b − a; this must be 1, so
c = 1/(b − a)
...
d
...
of the random variable X ∼ U(a, b) is given by
fX (x) = 1/(b − a) if a ≤ x ≤ b,
0
otherwise
...
d
...
is
FX (x) =

0
if x < a,
(x − a)/(b − a) if a ≤ x ≤ b,
1
if x > b
...
d
...
) shows that the expected value
and the median of X are both given by (a + b)/2 (the midpoint of the interval),
while Var(X) = (b − a)2 /12
...
However, it is very useful for simulations
...
Of course, they are not really random, since
the computer is a deterministic machine; but there should be no obvious pattern to

3
...
SOME CONTINUOUS RANDOM VARIABLES

59

the numbers produced, and in a large number of trials they should be distributed
uniformly over the interval
...
Its
great simplicity makes it the best choice for this purpose
...
g
...
The
Poisson random variable, which is discrete, counts how many events will occur
in the next unit of time
...
Not that it
takes non-negative real numbers as values
...
d
...
of X is
0
λe−λx

fX (x) =

if x < 0,
if x ≥ 0
...
d
...
to be
FX (x) =

0
1 − e−λx

if x < 0,
if x ≥ 0
...


The median m satisfies 1 − e−λm = 1/2, so that m = log 2/λ
...
69314718056 approximately
...
There is a theorem called the central limit theorem which says that, for
virtually any random variable X which is not too bizarre, if you take the sum (or
the average) of n independent random variables with the same distribution as X,
the result will be approximately normal, and will become more and more like a
normal variable as n grows
...


60

CHAPTER 3
...
(If you are approximating any discrete random variable by a
continuous one, you should make a “continuity correction” – see the next section
for details and an example
...
d
...
of the random variable X ∼ N(µ, σ2 ) is given by the formula
2
2
1
fX (x) = √ e−(x−µ) /2σ
...
The picture below shows the graph of this
function for µ = 0, the familiar ‘bell-shaped curve’
...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


...


The c
...
f
...
d
...
However, it is not
possible to write the integral of this function (which, stripped of its constants, is
2
e−x ) in terms of ‘standard’ functions
...

The crucial fact that means that we don’t have to tabulate the function for all
values of µ and σ is the following:
Proposition 3
...

So we only need tables of the c
...
f
...
d
...
of any normal random variable
...
d
...
of the standard normal is given in Table 4 of the New Cambridge
Statistical Tables [1]
...

For example, suppose that X ∼ N(6, 25)
...
4
...
4) = 0
...

The p
...
f
...
v
...
This means
that, for any positive number c,
Φ(−c) = P(Y ≤ −c) = P(Y ≥ c) = 1 − P(Y ≤ c) = 1 − Φ(c)
...

So, if X ∼ N(6, 25) and Y = (X − 6)/5 as before, then
P(X ≤ 3) = P(Y ≤ −0
...
6) = 1 − 0
...
2743
...
9
...
9

61

On using tables

We end this section with a few comments about using tables, not tied particularly
to the normal distribution (though most of the examples will come from there)
...
Tabulating something
with the input given to one extra decimal place would make the table ten times as
bulky! Interpolation can be used to extend the range of values tabulated
...
It is probably true that F is changing at a roughly constant rate
between, say, 0
...
29
...
283) will be about three-tenths of the way
between F(0
...
29)
...
d
...
of the normal distribution, then Φ(0
...
6103 and Φ(0
...
6141, so Φ(0
...
6114
...
0038 is
0
...
)

Using tables in reverse
This means, if you have a table of values of F, use it to find x such that F(x) is a
given value c
...

For example, if Φ is the c
...
f
...
67) = 0
...
68) = 0
...
6745 (since 0
...
0031 = 0
...

In this case, the percentile points of the standard normal r
...
are given in Table
5 of the New Cambridge Statistical Tables [1], so you don’t need to do this
...


Continuity correction
Suppose we know that a discrete random variable X is well approximated by a
continuous random variable Y
...
d
...
of Y and want to
find information about X
...
This probability is
equal to
P(X = a) + P(x = a + 1) + · · · + P(X = b)
...
d
...
of Y
...
RANDOM VARIABLES

of a rectangle of height fY (a) and base 1 (from a − 0
...
5)
...
5 to
x = a + 0
...
Similarly for the other values
...


...


...


...


...


...


...


...


...
5

a

a+0
...
we find that P(a ≤ X ≤ b) is approximately equal to
the area under the curve y = fY (x) from x = a − 0
...
5
...
5) − FY (a − 0
...
Said otherwise,
this is P(a − 0
...
5)
...
Then
P(a ≤ X ≤ b) ≈ P(a−0
...
5) = FY (b+0
...
5)
...
) Similarly, for example, P(X ≤ b) ≈
P(Y ≤ b + 0
...
5)
...
75, and light
bulbs fail independently
...
Then X ∼
Bin(192, 3/4), and so E(X) = 144, Var(X) = 36
...
5 ≤ Y ≤ 150
...


3
...
WORKED EXAMPLES

63

Let Z = (Y − 144)/6
...
5 − 144
139
...
75 ≤ Z ≤ 1
...
8606 − 0
...
6338
...
5 ≤ Y ≤ 150
...
10

Worked examples

Question I roll a fair die twice
...

(a) Write down the joint p
...
f
...

(b) Write down the p
...
f
...

(c) Write down the p
...
f
...

(d) Are the random variables X and Y independent?
Solution (a)
Y
0

1
0

1
2
3
X 4
5
6

3
0

4
0

5
0

2
36
2
36
2
36
2
36
2
36

1
36
1
36
1
36
1
36
1
36
1
36

2
0
0

0

0

0

2
36
2
36
2
36
2
36

0

0

0

2
36
2
36
2
36

0

0

2
36
2
36

0
2
36

The best way to produce this is to write out a 6 × 6 table giving all possible values
for the two throws, work out for each cell what the values of X and Y are, and
then count the number of occurrences of each pair
...

(b) Take row sums:
x

1

2

3

4

5

6

P(X = x)

1
36

3
36

5
36

7
36

9
36

11
36

64

CHAPTER 3
...

1296

(c) Take column sums:
y

0

1

2

3

4

5

P(Y = y)

6
36

10
36

8
36

6
36

4
36

2
36

and so

35
665
,
Var(Y ) =

...
g
...


Question An archer shoots an arrow at a target
...
d
...
is given by
fX (x) =

(3 + 2x − x2 )/9 if x ≤ 3,
0
if x > 3
...
5 0
...
5 1
...

Solution First we work out the probability of the arrow being in each of the
given bands:
P(X < 0
...
5) − FX (0) =

Z 0
...

216

dx
1/2
0

Similarly we find that P(0
...
5) = 47/216,
P(1
...
So the p
...
f
...
10
...

216
27

Question Let T be the lifetime in years of new bus engines
...

(a) Find the value of d
...

(c) Suppose that 240 new bus engines are installed at the same time, and that
their lifetimes are independent
...

Solution (a) The integral of fT (x), over the support of T , must be 1
...

(b) The c
...
f
...
d
...
; that is, it is

0

for x < 1
FT (x) =
 1 − 1 for x > 1

x2
The mean of T is
Z ∞
1

x fT (x) dx =

Z ∞
2
1

x2

dx = 2
...
That is, 1 − 1/m2 = 1/2,
or m = 2
...
RANDOM VARIABLES
(c) The probability that an engine lasts for four years or more is
1 − FT (4) = 1 − 1 −

1
4

2

=

1

...

We approximate X by Y ∼ N(15, (15/4)2 )
...
5)
...
5) = P(Z ≤ −1
...
2)
= 0
...

Note that we start with the continuous random variable T , move to the discrete
random variable X, and then move on to the continuous random variables Y and
Z, where finally Z is standard normal and so is in the tables
...
)
But what went wrong with our argument for the Birthday Paradox? We assumed (without saying so) that the birthdays of the people in the room were independent; but of course the birthdays of twins are clearly not independent!

Chapter 4
More on joint distribution
We have seen the joint p
...
f
...
Now we examine
this further to see measures of non-independence and conditional distributions of
random variables
...
1

Covariance and correlation

In this section we consider a pair of discrete random variables X and Y
...
We introduce a number (called
the covariance of X and Y ) which gives a measure of how far they are from being
independent
...
We found that, in any case,
Var(X +Y ) = Var(X) + Var(Y ) + 2(E(XY ) − E(X)E(Y )),
and then proved that if X and Y are independent then E(XY ) = E(X)E(Y ), so that
the last term is zero
...
We
write Cov(X,Y ) for this quantity
...
1 (a) Var(X +Y ) = Var(X) + Var(Y ) + 2 Cov(X,Y )
...

67

68

CHAPTER 4
...


(4
...
It is defined as
follows:
Cov(X,Y )

...

Theorem 4
...
Then
(a) −1 ≤ corr(X,Y ) ≤ 1;
(b) if X and Y are independent, then corr(X,Y ) = 0;
(c) if Y = mX + c for some constants m = 0 and c, then corr(X,Y ) = 1 if m > 0,
and corr(X,Y ) = −1 if m < 0
...
But note that
this is another check on your calculations: if you calculate a correlation coefficient
which is bigger than 1 or smaller than −1, then you have made a mistake
...

For part (c), suppose that Y = mX + c
...

Now we just calculate everything in sight
...


Thus the correlation coefficient is a measure of the extent to which the two
variables are related
...
More generally, a
positive correlation indicates a tendency for larger X values to be associated with
larger Y values; a negative value, for smaller X values to be associated with larger
Y values
...
1
...
Let X be the number of red pens that I choose and
Y the number of green pens
...
m
...
of X and Y is given by the
following table:
Y
0 1
1
0 0 6
X 1
2

1
3
1
6

1
3

0

From this we can calculate the marginal p
...
f
...


Also, E(XY ) = 1/3, since the sum
E(XY ) = ∑ ai b j P(X = ai ,Y = b j )
i, j

contains only one term where all three factors are non-zero
...

3
1/12

The negative correlation means that small values of X tend to be associated with
larger values of Y
...

Example We have seen that if X and Y are independent then Cov(X,Y ) = 0
...
Consider the following joint
p
...
f
...
MORE ON JOINT DISTRIBUTION

Now calculation shows that E(X) = E(Y ) = E(XY ) = 0, so Cov(X,Y ) = 0
...

We call two random variables X and Y uncorrelated if Cov(X,Y ) = 0 (in other
words, if corr(X,Y ) = 0)
...

Here is the proof that the correlation coefficient lies between −1 and 1
...

This depends on the following fact:
Let p, q, r be real numbers with p > 0
...
Then q2 ≤ pr
...
This means that the quadratic equation px2 + 2qx + r = 0 either has no real roots, or
has two equal real roots
...

Now let p = Var(X), q = Cov(X,Y ), and r = Var(Y )
...
1) shows that
px2 + 2qx + r = Var(xX +Y )
...
Now our argument above shows that q2 ≤ pr, that is, Cov(X,Y )2 ≤ Var(X) · Var(Y ),
as required
...
2

Conditional random variables

Remember that the conditional probability of event B given event A is P(B | A) =
P(A ∩ B)/P(A)
...
Then the conditional probability
that X takes a certain value ai , given A, is just
P(X = ai | A) =

P(A holds and X = ai )

...

So we can, for example, talk about the conditional expectation
E(X | A) = ∑ ai P(X = ai | A)
...
2
...
In this case, we have
P(X = ai | Y = b j ) =

P(X = ai ,Y = b j )

...
m
...
table of X and Y
corresponding to the value Y = b j
...
We divide the entries in the column by
this value to obtain a new distribution of X (whose probabilities add up to 1)
...

i

Example I have two red pens, one green pen, and one blue pen, and I choose
two pens without replacement
...
Then the joint p
...
f
...

3
3
If we know the conditional expectation of X for all values of Y , we can find
the expected value of X:

Proposition 4
...

j

Proof:

E(X) =

∑ aiP(X = ai)
i

72

CHAPTER 4
...

j

In the above example, we have
E(X) = E(X | Y = 0)P(Y = 0) + E(X | Y = 1)P(Y = 1)
= (4/3) × (1/2) + (2/3) × (1/2)
= 1
...
Recall the situation: I have a coin with probability p of showing heads; I
toss it repeatedly until heads appears for the first time; X is the number of tosses
...
If Y = 1, then we stop the experiment then and
there; so if Y = 1, then necessarily X = 1, and we have E(X | Y = 1) = 1
...
So
E(X) = E(X | Y = 0)P(Y = 0) + E(X | Y = 1)P(Y = 1)
= (1 + E(X)) · q + 1 · p
= E(X)(1 − p) + 1;
rearranging this equation, we find that E(X) = 1/p, confirming our earlier value
...
1, we saw that independence of events can be characterised
in terms of conditional probabilities: A and B are independent if and only if they
satisfy P(A | B) = P(A)
...
4 Let X and Y be discrete random variables
...

This is obtained by applying Proposition 15 to the events X = ai and Y = b j
...
m
...
of X | (Y = b j ) is equal to the p
...
f
...


4
...
JOINT DISTRIBUTION OF CONTINUOUS R
...
S

4
...
v
...
1) remains valid
...
The formalism here needs even more concepts from calculus than we
have used before: functions of two variables, partial derivatives, double integrals
...

Let X and Y be continuous random variables
...

We define X and Y to be independent if P(X ≤ x,Y ≤ y) = P(X ≤ x) · P(Y ≤ y),
for any x and y, that is, FX,Y (x, y) = FX (x) · FY (y)
...
)
The joint probability density function of X and Y is
fX,Y (x, y) =

∂2
FX,Y (x, y)
...
)
The probability that the pair of values of (X,Y ) corresponds to a point in some
region of the plane is obtained by taking the double integral of fX,Y over that
region
...
)
The marginal p
...
f
...
d
...
of Y is similarly
Z ∞

fY (y) =

−∞

fX,Y (x, y)dx
...
MORE ON JOINT DISTRIBUTION

Then the conditional p
...
f
...

fY (b)

The expected value of XY is, not surprisingly,
Z ∞Z ∞

E(XY ) =
−∞ −∞

xy fX,Y (x, y)dx dy,

and then as in the discrete case
Cov(X,Y ) = E(XY ) − E(X)E(Y ),

corr(X,Y ) =

Cov(X,Y )

...

As usual this holds if and only if the conditional p
...
f
...
d
...
of X, for any value b
...


4
...
v
...

Example Let X √ Y be random variables
...
What is the support of Y ? Find the cumulative distribution
function and the probability density function of Y
...
Now
FY (y) =
=
=
=


X, so the support of Y is [0, 2]
...
4
...
(Note that
y
2 , since Y = X
...


The argument in (b) is the key
...
This means that y = g(x) if and

only if x = h(y)
...
) Thus
FY (y) = FX (h(y)),
and so, by the Chain Rule,
fY (y) = fX (h(y))h (y),
where h is the derivative of h
...
)
Applying this formula in our example we have
fY (y) =

1
y
· 2y =
4
2

for 0 ≤ y ≤ 2, since the p
...
f
...

Here is a formal statement of the result
...
5 Let X be a continuous random variable
...
Let Y = g(X)
...
d
...
of Y is given by fY (y) = fX (h(y))|h (y)|, where h is the inverse
function of g
...
6: if X ∼ N(µ, σ2 ) and Y =
(X − µ)/σ, then Y ∼ N(0, 1)
...

σ 2π

76

CHAPTER 4
...
Thus, h (y) = σ, and
2
1
fY (y) = fX (σy + µ) · σ = √ e−y /2 ,


the p
...
f
...

However, rather than remember this formula, together with the conditions for
its validity, I recommend going back to the argument we used in the example
...
For example, if X is a random
variable taking both positive and negative values, and Y = X 2 , then a given value


y of Y could arise from either of the values y and − y of X, so we must work
out the two contributions and add them up
...
Find the p
...
f
...


2
The p
...
f
...
Let Φ(x) be its c
...
f
...




Now Y = X 2 , so Y ≤ y if and only if − y ≤ X ≤ y
...

So
d
FY (y)
dy
1

= 2Φ ( y) · √
2 y
1 −y/2
= √
e

...
d
...
is zero
...
5
...
If you blindly applied
the formula of Theorem 4
...
d
...
of X
...
5

Worked examples

Question Two numbers X and Y are chosen independently from the uniform
distribution on the unit interval [0, 1]
...

Find the p
...
f
...

Solution The c
...
f
...

(The variable can be called x in both cases; its name doesn’t matter
...


(For, if both X and Y are smaller than a given value x, then so is their maximum;
but if at least one of them is greater than x, then again so is their maximum
...

Thus P(Z ≤ x) = x2
...
So
the c
...
f
...

The√
median of Z is the value of m such that FZ (m) = 1/2, that is m2 = 1/2, or
m = 1/ 2
...
d
...
of Z by differentiating:
fZ (x) =

2x
0

if 0 < x < 1,
otherwise
...

18

78

CHAPTER 4
...
If N is the number showing
on the die, I then toss a fair coin N times
...

(a) Write down the p
...
f
...

(b) Calculate E(X) without using this information
...
So P(X = k | N = n) = nCk (1/2)n
...


n=1

Clearly P(N = n) = 1/6 for n = 1,
...
So to find P(X = k), we add up the
probability that X = k for a Bin(n, 1/2) r
...
for n = k,
...
(We
start at k because you can’t get k heads with fewer than k coin tosses!) The answer
comes to
k 0
1
2
3
4
5
6
63
120
99
64
29
8
1
P(X = k) 384 384 384 384 384 384 384
For example,
P(X = 4) =

4C (1/2)4 + 5C (1/2)5 + 6C (1/2)6
4
4
4

6

=

4 + 10 + 15

...
3,
6

E(X) =

∑ E(X | (N = n))P(N = n)
...
So
6

E(X) =

∑ (n/2) · (1/6) =

n=1

1+2+3+4+5+6 7
=
...
m
...
to check that the answer is the same!

Appendix A
Mathematical notation
The Greek alphabet

Mathematicians use the Greek alphabet for an extra supply of symbols
...

You don’t need to learn this; keep it
for reference
...


79

Name
Capital Lowercase
alpha
A
α
beta
B
β
gamma
Γ
γ
delta

δ
epsilon
E
ε
zeta
Z
ζ
eta
H
η
theta
Θ
θ
iota
I
ι
kappa
K
κ
lambda
Λ
λ
mu
M
µ
nu
N
ν
xi
Ξ
ξ
omicron
O
o
pi
Π
π
rho
P
ρ
sigma
Σ
σ
tau
T
τ
upsilon
ϒ
υ
phi
Φ
φ
chi
X
χ
psi
Ψ
ψ
omega

ω

80

APPENDIX A
...

(some people include 0)

...

, −2,
1
2 , 2, π,
...
5

a divides b

4 | 12

m choose n

5C
2

n factorial

5! = 120

∑ xi

xa + xa+1 + · · · + xb

∑ i2 = 12 + 22 + 32 = 14

x≈y

(see section on Summation below)
x is approximately equal to y

mC or m
n
n
n!

= 10

3

b
i=a

i=1

Sets
Notation
{
...
}
or {x |
...


A∪B
A∩B
A\B
A⊆B
A
/
0
(x, y)
A×B

cardinality of A
(number of elements in A)
A union B
(elements in either A or B)
A intersection B
(elements in both A and B)
set difference
(elements in A but not B)
A is a subset of B (or equal)
complement of A
empty set (no elements)
ordered pair
Cartesian product
(set of all ordered pairs)

Example
{1, 2, 3}
NOTE: {1, 2} = {2, 1}
2 ∈ {1, 2, 3}
{x : x2 = 4} = {−2, 2}
|{1, 2, 3}| = 3
{1, 2, 3} ∪ {2, 4} = {1, 2, 3, 4}
{1, 2, 3} ∩ {2, 4} = {2}
{1, 2, 3} \ {2, 4} = {1, 3}
{1, 3} ⊆ {1, 2, 3}
everything not in A
/
{1, 2} ∩ {3, 4} = 0
NOTE: (1, 2) = (2, 1)
{1, 2} × {1, 3} =
{(1, 1), (2, 1), (1, 3), (2, 3)}

81

Summation
What is it?
Let a1 , a2 , a3 ,
...
The notation
n

∑ ai

i=1

(read “sum, from i equals 1 to n, of ai ”), means: add up the numbers a1 , a2 ,
...


i=1
n

The notation

∑ a j means exactly the same thing
...

m

The notation

∑ ai is not the same, since (if m and n are different) it is telling

i=1

us to add up a different number of terms
...
For example,
20

∑ ai = a10 + a11 + · · · + a20
...
For example, if X is a discrete random
variable, then we say that
E(X) = ∑ ai P(X = ai )
i

where the sum is over all i such that ai is a value of the random variable X
...

n

n

n

∑ (ai + bi) = ∑ ai + ∑ bi
...
1)

i=1

Imagine the as and bs written out with a1 + b1 on the first line, a2 + b2 on the
second line, and so on
...
MATHEMATICAL NOTATION

and then add up all the results
...
The answers
must be the same
...


(A
...
A simple
example shows how it works: (a1 + a2 )(b1 + b2 ) = a1 b1 + a1 b2 + a2 b1 + a2 b2
...

(A
...
The right
says: differentiate each function and add up the derivatives
...

k=0

Infinite sums


Sometimes we meet infinite sums, which we write as

∑ ai for example
...
We
need Analysis to give us a definition in general
...
You also need to know
the sum of the “exponential series”
xi
x2 x3 x4
= 1 + x + + + + · · · = ex
...
In Analysis you will see some answers to this question
...


Appendix B
Probability and random variables
Notation
In the table, A and B are events, X and Y are random variables
...
m
...
or same p
...
f
...
48)
• Occurs when there is a single trial with a fixed probability p of success
...

• p
...
f
...

• E(X) = p, Var(X) = pq
...
PROBABILITY AND RANDOM VARIABLES

Binomial random variable Bin(n, p) (p
...
g
...
Also, sampling with replacement from a population
with a proportion p of distinguished elements
...

• Values 0, 1, 2,
...

• p
...
f
...

• E(X) = np, Var(X) = npq
...
51)
• Occurs when we are sampling n elements without replacement from a population of N elements of which M are distinguished
...
, n
...
m
...
P(X = k) = (MCk · N−MCn−k )/N Cn
...

N −1

• Approximately Bin(n, M/N) if n is small compared to N, M, N − M
...
52)
• Describes the number of trials up to and including the first success in a
sequence of independent Bernoulli trials, e
...
number of tosses until the
first head when tossing a coin
...
(any positive integer)
...
m
...
P(X = k) = qk−1 p, where q = 1 − p
...


85

Poisson random variable Poisson(λ) (p
...
g
...

• Values 0, 1, 2,
...
m
...
P(X = k) = e−λ λk /k!
...

• If n is large, p is small, and np = λ, then Bin(n, p) is approximately equal
to Poisson(λ) (in the sense that the p
...
f
...


Uniform random variable U[a, b] (p
...

• p
...
f
...


• c
...
f
...


• E(X) = (a + b)/2, Var(X) = (b − a)2 /12
...
59)
• Occurs in the same situations as the Poisson random variable, but measures
the time from now until the first occurrence of the event
...
d
...
f (x) =

0
λ e−λx

• c
...
f
...

if x < 0,
if x ≥ 0
...

• However long you wait, the time until the next occurrence has the same
distribution
...
PROBABILITY AND RANDOM VARIABLES

Normal random variable N(µ, σ2 ) (p
...
This also works for many other types of random variables: this
statement is known as the Central Limit Theorem
...
d
...
f (x) = √ e−(x−µ) /2σ
...
d
...
; use tables
...

• For large n, Bin(n, p) is approximately N(np, npq)
...
If X ∼ N(µ, σ2 ), then (X −
µ)/σ ∼ N(0, 1)
...
d
...
s of the Binomial, Poisson, and Standard Normal random variables
are tabulated in the New Cambridge Statistical Tables, Tables 1, 2 and 4
Title: Probability - Everything You Need To Know
Description: These are notes on everything in probability. Contents 1 Basic ideas 1 1.1 Sample space, events . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 What is probability? . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Kolmogorov’s Axioms . . . . . . . . . . . . . . . . . . . . . . . 3 1.4 Proving things from the axioms . . . . . . . . . . . . . . . . . . . 4 1.5 Inclusion-Exclusion Principle . . . . . . . . . . . . . . . . . . . . 6 1.6 Other results about sets . . . . . . . . . . . . . . . . . . . . . . . 7 1.7 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.8 Stopping rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.9 Questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . 13 1.10 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.11 Mutual independence . . . . . . . . . . . . . . . . . . . . . . . . 16 1.12 Properties of independence . . . . . . . . . . . . . . . . . . . . . 17 1.13 Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2 Conditional probability 23 2.1 What is conditional probability? . . . . . . . . . . . . . . . . . . 23 2.2 Genetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3 The Theorem of Total Probability . . . . . . . . . . . . . . . . . 26 2.4 Sampling revisited . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.5 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.6 Iterated conditional probability . . . . . . . . . . . . . . . . . . . 31 2.7 Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3 Random variables 39 3.1 What are random variables? . . . . . . . . . . . . . . . . . . . . 39 3.2 Probability mass function . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Expected value and variance . . . . . . . . . . . . . . . . . . . . 41 3.4 Joint p.m.f. of two random variables . . . . . . . . . . . . . . . . 43 3.5 Some discrete random variables . . . . . . . . . . . . . . . . . . 47 3.6 Continuous random variables . . . . . . . . . . . . . . . . . . . . 55 vii viii CONTENTS 3.7 Median, quartiles, percentiles . . . . . . . . . . . . . . . . . . . . 57 3.8 Some continuous random variables . . . . . . . . . . . . . . . . . 58 3.9 On using tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.10 Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4 More on joint distribution 67 4.1 Covariance and correlation . . . . . . . . . . . . . . . . . . . . . 67 4.2 Conditional random variables . . . . . . . . . . . . . . . . . . . . 70 4.3 Joint distribution of continuous r.v.s . . . . . . . . . . . . . . . . 73 4.4 Transformation of random variables . . . . . . . . . . . . . . . . 74 4.5 Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . 77 A Mathematical notation 79 B Probability and random variables 83