

Title: Probability and Statistics
Description: Basic overview of the most important concepts, definitions, facts and lemmas of probability and statistics. Especially useful for reviewing for an exam.

Document Preview

Extracts from the notes are below; to see the PDF you'll receive, please use the links above.


Statistics notes
Andre Sostar
November 27, 2014

Contents

1 Discrete Random Variables
    Bernoulli Random Variables
    The Binomial Distribution
    The Geometric Distribution
    The Negative Binomial Distribution
    The Hypergeometric Distribution
    The Poisson Distribution

2 Continuous Random Variables
    The Exponential Density
    The Gamma Density
    The Beta Density
    The Normal Distribution

3 Functions of a Random Variable
    The cdf Method
    The Transformation Method
    Generating pseudorandom variables

4 The Expected Value
    The Expected Value of a random variable
    Calculating Expected Values
    Expected values of functions of random variables

5 The Variance
    The Variance of a random variable
    Chebyshev's inequality

6 Moment Generating Functions
    Moments
    Moment Generating Functions

7 Joint and Marginal Distributions
    Multivariate Random Variables
    Continuous r.v.'s, joint pdf
    Marginal cdf and marginal pdf

8 Independent Random Variables
    Independent Random Variables
    Conditional probability mass function
    Conditional probability density function
    The Law of Total Probability

9 Functions of Jointly Distributed r.v.'s
    Expected values of functions of jointly distributed r.v.'s

10 Covariance and Correlation
    The concept of dependence
    Linear dependency
    Covariance
    The Correlation Coefficient
    Covariance of linear combinations

11 Conditional expectations
    Conditional expectations
    Conditional variance

12 The Law of Large Numbers
    Unbiasedness
    Consistency
    The Law of Large Numbers

13 Convergence in distribution
    Convergence in distribution
    Theorem proving Convergence in distribution
    Standardizing

14 The Central Limit Theorem
    The Central Limit Theorem

15 Other Important Laws
    The Multiplicative Law
    The Multiplicative Law (bivariate)
    The conditional pmf/pdf
    Hierarchic Models
    Independent Addition Law

1 Discrete Random Variables

Definition: The probability that Y takes on the value y, P(Y = y), is defined as the sum of the probabilities of all sample points in S that are assigned the value y.

For any discrete probability distribution, the following must be true:
1. 0 ≤ p(y) ≤ 1 for every value y.
2. Σ_y p(y) = 1, where the sum is over all values y with nonzero probability.


Definition: A random variable on Ω is a real-valued function X : Ω → R, that is, a function that assigns a real value to each possible outcome in Ω.

Definition: The probability mass function (pmf) is the probability measure on the sample space that determines the probabilities of the various values of X; if those values are denoted by x1, x2, ···, then there is a function p such that

p(xi) = P(X = xi)  and  Σ_i p(xi) = 1


Question 1. Can the random experiment be described as drawing balls from an urn, e.g. with each ball marked "success" or "failure"?
Question 2. If so, is the draw done with or without replacement, i.e. is the probability of success always the same?
Question 3. In what way does our random variable evaluate the outcome of the random experiment, i.e. the drawn balls?

The cumulative distribution function (cdf) of a random variable is defined to be

F(x) = P(X ≤ x),  −∞ < x < ∞

The cumulative distribution function is non-decreasing and satisfies lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
Bernoulli Random Variables: A Bernoulli random variable takes on only the values 0 and 1, with probabilities 1 − p and p, respectively. Its pmf is thus

p(x) = p^x (1 − p)^(1−x)   if x = 0 or x = 1
p(x) = 0                   otherwise

A Bernoulli random variable might take on the value 1 or 0 according to whether a guess was a success or a failure. We draw one ball and the random variable is

X = the number of "successes" among the drawn balls


The Binomial Distribution: Any particular sequence of k successes occurs with probability p^k (1 − p)^(n−k), from the multiplication principle. P(X = k) is thus the probability of any particular sequence times the number of such sequences:

p(k) = (n choose k) p^k (1 − p)^(n−k)

The draw is done with replacement, the number of balls to be drawn is fixed, and the random variable is

X = the number of "successes" among the drawn balls
A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials.
2. Each trial results in one of two outcomes: a success or a failure.
3. The probability of success on a single trial is equal to p and remains the same from trial to trial. The probability of a failure is equal to q = (1 − p).
4. The trials are independent.
5. The random variable of interest is Y, the number of successes observed during the n trials.
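As a quick illustration of the binomial pmf above, here is a minimal Python sketch (not part of the original notes; n = 10 and p = 0.3 are arbitrary example values) that evaluates p(k) and checks that the probabilities sum to 1.

from math import comb

n, p = 10, 0.3                      # arbitrary example values

def binom_pmf(k, n, p):
    # p(k) = (n choose k) * p^k * (1 - p)^(n - k), k = 0, 1, ..., n
    return comb(n, k) * p**k * (1 - p)**(n - k)

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))                     # ~1.0, as any pmf must
print(binom_pmf(3, n, p))           # P(X = 3) for these example values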
The Geometric Distribution: In order that X = k, there must be k − 1 failures followed by a success, so that

P(X = k) = (1 − p)^(k−1) p,  k = 1, 2, 3, · · ·

The Negative Binomial Distribution: The last trial is a success, and the remaining r − 1 successes can be assigned to the remaining k − 1 trials in (k−1 choose r−1) ways, so that

P(X = k) = (k−1 choose r−1) p^r (1 − p)^(k−r),  k = r, r + 1, · · ·

We now draw balls until the r'th "success".

The Hypergeometric Distribution: Let X denote the number of black balls drawn when taking m balls without replacement.

The draw is done without replacement, the number of balls to be drawn is fixed, and the random variable is

X = the number of "successes" among the drawn balls

The Poisson frequency function with parameter λ (λ > 0) is

P(X = k) = (λ^k / k!) e^(−λ),  k = 0, 1, 2, · · ·

The Poisson Distribution can be derived as the limit of a binomial distribution as the number of trials, n, approaches infinity and the probability of success on each trial, p, approaches zero, in such a way that np = λ.
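A small numerical check of this limit (a sketch, not from the notes; λ = 2 and n = 10000 are arbitrary choices): with p = λ/n, the Binomial(n, p) probabilities are close to the Poisson(λ) probabilities.

from math import comb, exp, factorial

lam, n = 2.0, 10_000
p = lam / n                                   # chosen so that np = lambda

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

for k in range(5):
    print(k, binom_pmf(k), poisson_pmf(k))    # the two columns nearly agree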


2 Continuous Random Variables

The cumulative distribution function (cdf) of X is the function F : R → [0, 1] with

F(x) = Pr(X ≤ x) = Pr(X ∈ (−∞, x])

Basic properties of a cdf:
• F is a non-decreasing function
• F(−∞) ≡ lim_{x→−∞} F(x) = 0,  F(∞) ≡ lim_{x→∞} F(x) = 1
• Pr(a < X ≤ b) = F(b) − F(a),  a, b ∈ R, a < b

The p'th quantile of the distribution is defined to be that (in this situation unique) value x_p such that

F(x_p) = Pr(X ≤ x_p) = p  ⇔  x_p = F^(−1)(p)

• The median, η, splits the distribution 50/50:  F(x_0.5) = 0.5  ⇔  η = F^(−1)(0.5)

The Gamma Density depends on two parameters, α and λ (both positive):

g(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx),  x ≥ 0

The gamma function, Γ(α), is defined as

Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy,  α > 0

The Beta Density function depends on two parameters, α and β:

f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β),  0 ≤ x ≤ 1

The beta function, B(α, β), is defined as the integral

B(α, β) = ∫_0^1 y^(α−1) (1 − y)^(β−1) dy = Γ(α)Γ(β) / Γ(α + β)

The Normal Distribution function depends on two parameters, µ and σ (where −∞ < µ < ∞, σ > 0):

f(x) = (1 / (σ√(2π))) e^(−(x−µ)² / (2σ²)),  −∞ < x < ∞

µ is called the mean and σ is called the standard deviation.

3 Functions of a Random Variable

The cdf Method:
1) Use the fact that Y = g(X) to express F_Y in terms of F_X.
2) In order to find f_Y, we differentiate F_Y using the chain rule.
3) Use the given information to find the pdf of Y.

The Transformation Method: When the function Y = g(X) is strictly monotone, the cdf method can be formalized. Any linear transformation of a Normal random variable is itself Normal. Suppose that f(x) = 0 if x is not in I. Here g^(−1) is the inverse function of g; that is, g^(−1)(y) = x if y = g(x).

Proposition D: Let U be uniform on [0, 1], and let X = F^(−1)(U). Then the cdf of X is F.


Generating pseudorandom variables: These numbers are called pseudorandom because they are generated according to some rule or algorithm and thus are not really random.
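Proposition D is what makes the inverse-cdf method for generating pseudorandom variables work. A minimal Python sketch (not from the notes; the Exponential(λ) distribution, whose quantile function is F^(−1)(u) = −ln(1 − u)/λ, and λ = 2 are arbitrary choices):

import random
from math import log

random.seed(0)
lam = 2.0                                   # example rate parameter

def exp_inverse_cdf(u, lam):
    # quantile function of Exponential(lam): F^{-1}(u) = -ln(1 - u)/lam
    return -log(1 - u) / lam

# by Proposition D, X = F^{-1}(U) with U uniform on [0, 1] has cdf F
sample = [exp_inverse_cdf(random.random(), lam) for _ in range(100_000)]
print(sum(sample) / len(sample))            # close to the true mean 1/lam = 0.5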
4 The Expected Value

The possible values of the random variable are weighted by their probabilities, as specified in the following definition.

Definition: If X is a discrete random variable with pmf p(x), the expected value of X is

E(X) = Σ_x x p(x)

provided the sum converges absolutely.


A standard technique:
1) Try to merge x and the pmf/pdf of X.
2) Rewrite the summand/integrand, adjusting constants, so that it becomes the pmf/pdf of some (other) known distribution.
3) Since the summand/integrand is a pdf/pmf it sums to 1.


Expected values of functions of random variables: If we are interested in finding E(Y) = E[g(X)], there is no need to first derive the distribution of Y. The expected value of g(Y) is given by

E[g(Y)] = Σ_{∀y} g(y) p(y)
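For instance, the formula can be applied directly to a small pmf, as in this Python sketch (a fair six-sided die, an arbitrary example not taken from the notes):

# pmf of a fair die: p(y) = 1/6 for y = 1, ..., 6
pmf = {y: 1 / 6 for y in range(1, 7)}

def expectation(g, pmf):
    # E[g(Y)] = sum over all y of g(y) * p(y)
    return sum(g(y) * p for y, p in pmf.items())

print(expectation(lambda y: y, pmf))        # E(Y)   = 3.5
print(expectation(lambda y: y**2, pmf))     # E(Y^2) = 91/6 ≈ 15.17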

5 The Variance

Definition: If X is a random variable with expected value E(X), the variance of X is

Var(X) = E{[X − E(X)]²}

provided that the expectation exists. Variance is defined as the expected squared distance between X and E(X).

If Y = aX + b is a linear transformation of X, then

µ_Y = aµ_X + b
σ_Y² = a²σ_X²

A good example would be the Fahrenheit-Celsius transformation.
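For the Fahrenheit-Celsius transformation F = 1.8·C + 32, the rules above give µ_F = 1.8µ_C + 32 and σ_F² = 1.8²σ_C². A quick simulation sketch (µ_C = 20 and σ_C = 5 are arbitrary example values, not from the notes):

import random
import statistics

random.seed(1)
mu_c, sigma_c = 20.0, 5.0                      # arbitrary example values
celsius = [random.gauss(mu_c, sigma_c) for _ in range(100_000)]
fahrenheit = [1.8 * c + 32 for c in celsius]   # Y = aX + b with a = 1.8, b = 32

print(statistics.mean(fahrenheit))             # ~ 1.8 * 20 + 32 = 68
print(statistics.variance(fahrenheit))         # ~ 1.8^2 * 25 = 81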
Chebyshev's inequality: Let X be a random variable with mean µ and variance σ². Then, for any t > 0,

Pr(|X − µ| > t) ≤ σ² / t²

The theorem says that if σ² is very small, there is a high probability that X will not deviate much from µ. Equivalently, taking t = kσ for any k > 0,

Pr(|X − µ| > kσ) ≤ 1 / k²
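A simulation sketch comparing the Chebyshev bound with an actual tail probability (the Exponential(1) distribution, for which µ = σ = 1, and t = 2 are arbitrary choices not from the notes):

import random

random.seed(2)
mu = sigma = 1.0                      # for Exponential(1), mean = sd = 1
t = 2.0
x = [random.expovariate(1.0) for _ in range(200_000)]

empirical = sum(abs(xi - mu) > t for xi in x) / len(x)
bound = sigma**2 / t**2
print(empirical, "<=", bound)         # empirical tail (~0.05) vs Chebyshev bound 0.25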

6 Moment Generating Functions

The k'th moment of a random variable Y taken about the origin is defined to be E(Y^k) and is denoted by µ′_k.


Moments: E(X) is the first moment of X. The k'th central moment of X is defined by

E[(X − µ)^k] = ∫_{−∞}^{∞} (x − µ)^k f_X(x) dx

The moment-generating function (mgf) of a random variable X is M(t) = E(e^{tX}), if the expectation is defined; for a continuous random variable with density f_X this is M(t) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx.

Theorem: If m(t) exists, then for any positive integer k,

d^k m(t) / dt^k |_{t=0} = m^(k)(0) = µ′_k

In other words, if you find the k'th derivative of m(t) with respect to t and then set t = 0, the result will be µ′_k. In summary, a moment-generating function is a mathematical expression that sometimes (but not always) provides an easy way to find moments associated with random variables.
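The theorem can be checked symbolically. A sympy sketch for the Exponential(λ) mgf M(t) = λ/(λ − t) (this particular mgf is a standard fact, not stated in the notes above); differentiating k times and setting t = 0 gives µ′_k = k!/λ^k:

import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lam', positive=True)
M = lam / (lam - t)                   # mgf of Exponential(lam), valid for t < lam

for k in range(1, 4):
    moment = sp.simplify(sp.diff(M, t, k).subs(t, 0))
    print(k, moment)                  # 1/lam, 2/lam**2, 6/lam**3, i.e. k!/lam**k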



Linear transformation: Let X have mgf M_X(t) and let Y = aX + b, where a and b are constants. Then M_Y(t) = e^{bt} M_X(at).

7 Joint and Marginal Distributions

The joint probability mass function of X and Y is defined by

p_{X,Y}(x, y) = Pr(X = x, Y = y),  x, y ∈ R

If X and Y are discrete random variables with joint probability function p(x, y), then
1. p(x, y) ≥ 0 for all x, y.
2. Σ_x Σ_y p(x, y) = 1, where the sum is over all pairs (x, y) with nonzero probability.

The marginal probability mass functions of X and Y can be derived via:

p_X(x) = Σ_y p_{X,Y}(x, y)  and  p_Y(y) = Σ_x p_{X,Y}(x, y)
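A small Python sketch showing the marginals obtained by summing a joint pmf over the other variable (the joint table is an invented example, not from the notes):

# joint pmf p_{X,Y}(x, y) stored as a dictionary: an arbitrary example table
p_xy = {(0, 0): 0.10, (0, 1): 0.20,
        (1, 0): 0.30, (1, 1): 0.40}

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p      # p_X(x) = sum over y of p_{X,Y}(x, y)
    p_y[y] = p_y.get(y, 0.0) + p      # p_Y(y) = sum over x of p_{X,Y}(x, y)

print(p_x)   # marginal of X: P(X=0) = 0.3, P(X=1) = 0.7 (up to rounding)
print(p_y)   # marginal of Y: P(Y=0) = 0.4, P(Y=1) = 0.6 (up to rounding)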

The joint cumulative distribution function (cdf) of two random variables X and Y is defined by

F_{X,Y}(x, y) = Pr(X ≤ x, Y ≤ y),  x, y ∈ R

For the joint distribution function it holds that

1) F_{X,Y}(−∞, −∞) = F_{X,Y}(−∞, y) = F_{X,Y}(x, −∞) = 0
2) F_{X,Y}(∞, ∞) = 1
3) F_{X,Y} is increasing, that means if x* ≥ x and y* ≥ y, then
   0 ≤ Pr(x < X ≤ x*, y < Y ≤ y*) = F_{X,Y}(x*, y*) − F_{X,Y}(x*, y) − F_{X,Y}(x, y*) + F_{X,Y}(x, y)

When X and Y are jointly discrete, we have the following relationship between the joint distribution function and the joint probability mass function:

F_{X,Y}(x, y) = Σ_{t1 ≤ x} Σ_{t2 ≤ y} p_{X,Y}(t1, t2)

Let X and Y be continuous random variables with joint distribution function F_{X,Y}(x, y). (Throughout, r.v. is short for random variable and i.i.d. for independent and identically distributed random variables.)

8 Independent Random Variables

X and Y are independent if:
1. The domain of (X, Y) is rectangular, that is, there exist constants a, b, c, and d such that a ≤ x ≤ b, c ≤ y ≤ d.
2. The joint pmf/pdf factors into a product of a function of x alone and a function of y alone.


Conditional probability: Let A and B be two events, with Pr(B) > 0. The conditional probability of A given B is then Pr(A|B) = Pr(A ∩ B) / Pr(B). The conditional pmf for X given Y = y is then given by

p_{X|Y}(x|y) = Pr(X = x | Y = y) = Pr({X = x} ∩ {Y = y}) / Pr(Y = y) = p_{X,Y}(x, y) / p_Y(y)

Conditional pdf: Let the continuous random variables X and Y have joint pdf f_{X,Y}(x, y), and marginal pdf's f_X(x) and f_Y(y), respectively. The conditional pdf of X given Y = y is then given by

f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)

9 Functions of Jointly Distributed r.v.'s

Let X and Y be continuous r.v.'s and consider U = h(X, Y).
1. Determine the region U = u in the (x, y)-space.
2. Determine the region U ≤ u.
3. Find F_U(u) = Pr(U ≤ u) by integrating f_{X,Y}(x, y) over the region U ≤ u.
4. Find the density f_U(u) by differentiating F_U(u).


The discrete case: Let X and Y be discrete random variables with joint pmf p_{X,Y}(x, y).

The objective is to find:

M_U(t) = E(e^{tU}) = E(e^{t·h(X1, X2, ···, Xn)})

The method of moment generating functions is best suited for situations where X1, X2, ···, Xn is a collection of independent random variables and where the function U = h(X1, X2, ···, Xn) is a linear combination of the X's.


The bivariate case: Let X and Y be jointly distributed with pmf p_{X,Y}(x, y) or pdf f_{X,Y}(x, y), and consider the function g(X, Y). Its expected value is obtained by weighting g(x, y) by the joint pmf (or pdf) and summing (or integrating) over all pairs (x, y).

10 Covariance and Correlation

Linear dependency: If lower values of X tend to appear with lower values of Y, and if higher values of X tend to appear with higher values of Y, it is clear that we have a dependency.


Covariance: Let X and Y be jointly distributed with expectations µ_X and µ_Y, respectively. The covariance of X and Y is

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

Theorem: Let X and Y be independent and let g(X, Y) = g1(X)g2(Y). Then

E[g(X, Y)] = E[g1(X)g2(Y)] = E[g1(X)] E[g2(Y)]

Corollary (Uncorrelated): Let X and Y be independent random variables. Then Cov(X, Y) = 0, that is, X and Y are uncorrelated.


Definition: The correlation coefficient of X and Y is defined by

ρ_{X,Y} = Corr(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y)) = σ_{XY} / (σ_X · σ_Y)

Cov(X, Y) depends on the scales on which X and Y are measured; the correlation coefficient, by contrast, is scale-free.


Conclusion:
1. −1 ≤ ρ_{X,Y} ≤ 1.
2. If X and Y are independent it follows that ρ_{X,Y} = 0.
3. The closer ρ_{X,Y} is to −1 or 1, the stronger is the linear dependence between X and Y.
4. ρ_{X,Y} = ±1 means that the linear dependence between X and Y is perfect, that is, it is in the form of a straight line.
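These properties show up clearly in simulation. A Python sketch (the distributions and coefficients are arbitrary examples) computing the correlation coefficient from its definition: independent variables give ρ ≈ 0, an exact linear relation gives ρ = 1.

import random

random.seed(3)

def corr(xs, ys):
    # sample version of Corr(X, Y) = Cov(X, Y) / (sd(X) * sd(Y))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = (sum((a - mx) ** 2 for a in xs) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in ys) / n) ** 0.5
    return cov / (sx * sy)

x = [random.gauss(0, 1) for _ in range(50_000)]
y_indep = [random.gauss(0, 1) for _ in range(50_000)]   # independent of x
y_line = [2 * xi + 1 for xi in x]                        # perfect linear dependence

print(corr(x, y_indep))   # close to 0
print(corr(x, y_line))    # 1.0 (up to rounding)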
If X1, X2, ···, Xn are independent, so that all covariance terms vanish, it then follows that

Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i)

11 Conditional expectations

A conditional probability distribution is in itself a proper probability distribution, so conditional expectations are defined in the usual way:

E(h(Y)|X = x) = Σ_{∀y} h(y) p_{Y|X}(y|x)            (discrete case)
E(h(Y)|X = x) = ∫_{−∞}^{∞} h(y) f_{Y|X}(y|x) dy     (continuous case)

Conditional variance: Let X and Y be jointly distributed random variables. It then holds that

Var(Y) = E[Var(Y|X)] + Var[E(Y|X)]
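A simulation sketch of this identity for a simple hierarchical example (invented for illustration, not from the notes): X is a fair coin flip and Y | X is Normal with mean and standard deviation depending on X.

import random
import statistics

random.seed(4)
params = {0: (0.0, 1.0), 1: (3.0, 2.0)}   # X -> (mean, sd) of Y given X

ys = []
for _ in range(200_000):
    x = random.randint(0, 1)              # X is a fair coin flip
    mu, sd = params[x]
    ys.append(random.gauss(mu, sd))

# theory: E[Var(Y|X)] = 0.5*1 + 0.5*4 = 2.5 and Var[E(Y|X)] = 0.5*9 - 1.5^2 = 2.25
print(statistics.pvariance(ys))           # ~ 2.5 + 2.25 = 4.75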

12 The Law of Large Numbers

It is commonly believed that if a fair coin is tossed many times and the proportion of heads is calculated, that proportion will be close to 1/2. John Kerrich, for example, tossed a coin 10,000 times and observed 5067 heads.


Definition: If E(θ̂) = θ, then θ̂ is said to be an unbiased estimator of θ.


Definition: If for any ϵ > 0

lim_{n→∞} Pr(|θ̂ − θ| ≤ ϵ) = 1,  or equivalently  lim_{n→∞} Pr(|θ̂ − θ| > ϵ) = 0,

then θ̂ is said to be a consistent estimator of θ.


Theorem (The Law of Large Numbers): Let X1, X2, ··· be a sequence of i.i.d. random variables with E(X_i) = µ and finite variance. Then

X̄_n = (Σ_{i=1}^{n} X_i) / n  →_p  µ  as n → ∞
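The coin-tossing example can be reproduced in a few lines of Python (a sketch; 10,000 tosses to match the experiment mentioned above, the seed is arbitrary):

import random

random.seed(5)
tosses = [random.randint(0, 1) for _ in range(10_000)]   # 1 = heads, fair coin
heads = sum(tosses)
print(heads, heads / len(tosses))   # the proportion of heads is close to 1/2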

13 Convergence in distribution

Definition: Let X1, X2, ··· be a sequence of random variables with cdf's F1, F2, ···, respectively, and let X be a random variable with cdf F. We say that X_n converges in distribution to X if lim_{n→∞} F_n(x) = F(x) at every point x at which F is continuous. Notation: X_n →_d X.

Lemma: Let a_n → a as n → ∞. Then

lim_{n→∞} (1 + a_n / n)^n = e^a
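A quick numerical check of this limit (taking a_n = a constant, with a = −2 chosen arbitrarily):

from math import exp

a = -2.0
for n in (10, 100, 10_000, 1_000_000):
    print(n, (1 + a / n) ** n)      # approaches e^a
print(exp(a))                       # e^(-2) ≈ 0.1353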

Steps in finding the convergence in distribution:
1. Find the mgf M_n(t) of X_n.
2. Take the limit of M_n(t) as n → ∞.
3. Identify the limit M(t) as the mgf of a known distribution, with cdf F.
4. Conclude that X_n converges in distribution to that distribution.


Theorem: Let F1, F2, ··· be a sequence of cdf's with corresponding mgf's M1, M2, ···, and let F be a cdf with mgf M. If

lim_{n→∞} M_n(t) = M(t)

for all t in an open interval including zero, then X_n →_d X as n → ∞.

Standardizing means subtracting the mean and dividing by the standard deviation, i.e. Z_n = (X_n − µ_{X_n}) / σ_{X_n}. For example, with the Gamma distribution (X_n ~ Gamma(n, λ), so µ_{X_n} = n/λ and σ_{X_n} = √n / λ) we standardize:

Z_n = (X_n − µ_{X_n}) / σ_{X_n} = (X_n − n/λ) / (√n / λ) = (λ / √n) X_n − √n


14 The Central Limit Theorem

Theorem (The Central Limit Theorem, CLT): Let X1, X2, ··· be a sequence of i.i.d. random variables with mean µ, variance σ², and mgf M_X(t), and set S_n = X1 + X2 + ··· + Xn. Then

(S_n − nµ) / (σ√n)  →_d  N(0, 1)  as n → ∞

Rule of thumb: the normal approximation is usually adequate once n is moderately large; n ≥ 30 is a common guideline.
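A simulation sketch of the CLT (Exponential(1) summands and n = 50 are arbitrary choices, not from the notes): the standardized sum (S_n − nµ)/(σ√n) behaves approximately like N(0, 1), e.g. about 95% of its values fall in (−1.96, 1.96).

import random
from math import sqrt

random.seed(6)
n, lam = 50, 1.0
mu, sigma = 1 / lam, 1 / lam            # mean and sd of one Exponential(1) summand

def standardized_sum():
    s = sum(random.expovariate(lam) for _ in range(n))
    return (s - n * mu) / (sigma * sqrt(n))

z = [standardized_sum() for _ in range(20_000)]
print(sum(abs(zi) < 1.96 for zi in z) / len(z))   # ~0.95, as for a standard normal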

15 Other Important Laws

The multiplicative law:

P(A ∩ B) = P(A) P(B|A),

where P(A) is the unconditional probability of A and P(B|A) is the probability of B given that A has occurred.

Hierarchic Models: We further assume that Y is itself a random variable.

The marginal distribution of X:
1. First find the joint distribution of X and Y, e.g. via f_{X,Y}(x, y) = f_{X|Y}(x|y) f_Y(y).
2. We then find the marginal distribution of X, f_X(x), by integrating (or summing) over the possible values of Y.

Independent Addition Law: For independent events A and B,

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = Pr(A) + Pr(B) − Pr(A) · Pr(B)

