Title: Probability and Statistics
Description: A basic overview of the most important concepts, definitions, facts and lemmas of probability and statistics. Especially useful when reviewing for an exam.
Statistics notes
Andre Sostar
November 27, 2014
Contents

1 Discrete Random Variables
    Bernoulli Random Variables
    The Binomial Distribution
    The Geometric Distribution
    The Negative Binomial Distribution
    The Hypergeometric Distribution
    The Poisson Distribution
2 Continuous Random Variables
    The Exponential Density
    The Gamma Density
    The Beta Density
    The Normal Distribution
3 Functions of a Random Variable
    The cdf Method
    The Transformation Method
    Generating pseudorandom variables
4 The Expected Value
    The Expected Value of a random variable
    Calculating Expected Values
    Expected values of functions of random variables
5 The Variance
    The Variance of a random variable
    Chebyshev's inequality
6 Moment Generating Functions
    Moments
    Moment Generating Functions
7 Joint and Marginal Distributions
    Multivariate Random Variables
    Continuous r.v.'s, joint pdf
    Marginal cdf and marginal pdf
8 Independent Random Variables
    Independent Random Variables
    Conditional probability mass function
    Conditional probability density function
    The Law of Total Probability
9 Functions of Jointly Distributed r.v.'s
    Expected values of functions of jointly distributed r.v.'s
10 Covariance and Correlation
    The concept of dependence
    Linear dependency
    Covariance
    The Correlation Coefficient
    Covariance of linear combinations
11 Conditional expectations
    Conditional expectations
    Conditional variance
12 The Law of Large Numbers
    Unbiasedness
    Consistency
    The Law of Large Numbers
13 Convergence in distribution
    Convergence in distribution
    Theorem proving Convergence in distribution
    Standardizing
14 The Central Limit Theorem
    The Central Limit Theorem
15 Other Important Laws
    The Multiplicative Law
    The Multiplicative Law (bivariate)
    The conditional pmf/pdf
    Hierarchic Models
    Independent Addition Law
1 Discrete Random Variables
Definition: The probability that Y takes on the value y, P(Y = y), is defined as the sum of the probabilities of all sample points in S that are assigned the value y...
For any discrete probability distribution, the following must be true:
1. 0 ≤ p(y) ≤ 1 for all y.
2. ∑_y p(y) = 1, where the sum is over all values y with nonzero probability.
Definition: A random variable on Ω is a real-valued function X : Ω → R, that is, a function that assigns a real value to each possible outcome in Ω...
Definition: The probability mass function (pmf) is the probability measure on the sample space that determines the probabilities of the various values of X; if those values are denoted by x1, x2, ···, then there is a function p such that

p(x_i) = P(X = x_i)  and  ∑_i p(x_i) = 1...
Question 1: ...
Question 2: If so, is the draw done with or without replacement, i.e. is the probability of success always the same?
Question 3: In what way does our random variable evaluate the outcome of the random experiment, i.e. the drawn balls?
The cumulative distribution function (cdf) of a random variable is defined to be

F(x) = P(X ≤ x),  −∞ < x < ∞.

The cumulative distribution function is non-decreasing and satisfies

lim_(x→−∞) F(x) = 0  and  lim_(x→∞) F(x) = 1...
Bernoulli Random Variables: The pmf of a Bernoulli random variable is

p(x) = p^x (1 − p)^(1−x)   if x = 0 or x = 1,
p(x) = 0                   otherwise.

A Bernoulli random variable might take on the value 1 or 0 according to whether a guess was a success or a failure...

We draw one ball and the random variable is X = the number of "successes" among the drawn balls.
The Binomial Distribution: Any particular sequence of k successes occurs with probability p^k (1 − p)^(n−k), from the multiplication principle... P(X = k) is thus the probability of any particular sequence times the number of such sequences:

p(k) = (n choose k) p^k (1 − p)^(n−k)

The draw is done with replacement, the number of balls to be drawn fixed, and the random variable is X = the number of "successes" among the drawn balls.
A binomial experiment possesses the following properties:
1. The experiment consists of a fixed number, n, of identical trials.
2. Each trial results in one of two outcomes: a success or a failure.
3. The probability of success on a single trial is equal to p and remains the same from trial to trial; the probability of a failure is equal to q = (1 − p).
4. The trials are independent.
5. The random variable of interest is Y, the number of successes observed during the n trials...
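To check the binomial pmf numerically, here is a short Python sketch (assuming SciPy is available for the cross-check; the values n = 10, p = 0.3 are purely illustrative):

import math
from scipy import stats   # assumed available; used only as a cross-check

def binom_pmf(k, n, p):
    # p(k) = (n choose k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3   # illustrative values
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(total)                                                # 1.0: the pmf sums to one
print(abs(binom_pmf(4, n, p) - stats.binom.pmf(4, n, p)))   # ~0: agrees with SciPy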
The Geometric Distribution: So that X = k, there must be k − 1 failures followed by a success...

The Negative Binomial Distribution: The last trial is a success, and the remaining r − 1 successes can be assigned to the remaining k − 1 trials in (k−1 choose r−1) ways, so that

P(X = k) = (k−1 choose r−1) p^r (1 − p)^(k−r)

We now draw balls until the r'th "success".
The Hypergeometric Distribution: Let X denote the number of black balls drawn when taking m balls without replacement... The draw is done without replacement, the number of balls to be drawn fixed, and the random variable is X = the number of "successes" among the drawn balls.
The Poisson frequency function with parameter λ (λ > 0) is

P(X = k) = (λ^k / k!) e^(−λ),  k = 0, 1, 2, ···

The Poisson Distribution can be derived as the limit of a binomial distribution as the number of trials, n, approaches infinity and the probability of success on each trial, p, approaches zero in such a way that np = λ...
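The limiting claim can be illustrated numerically; a sketch assuming SciPy, with an arbitrary λ = 3:

import numpy as np
from scipy import stats   # assumed available
lam = 3.0                  # arbitrary rate, held fixed as np = lambda
ks = np.arange(10)
pois = stats.poisson.pmf(ks, lam)
for n in (10, 100, 10_000):
    binom = stats.binom.pmf(ks, n, lam / n)    # Binomial(n, lambda/n)
    print(n, np.abs(binom - pois).max())        # maximal difference shrinks as n grows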
The cumulative distribution function (cdf) of X is the function F : R → [0, 1] with

F(x) = Pr(X ≤ x) = Pr(X ∈ (−∞, x])

Basic properties of a cdf:
• F is a non-decreasing function
• F(−∞) ≡ lim_(x→−∞) F(x) = 0,  F(∞) ≡ lim_(x→∞) F(x) = 1
• Pr(a < X ≤ b) = F(b) − F(a),  a, b ∈ R, a < b

The p'th quantile of the distribution is defined to be that (in this situation unique) value x_p such that

F(x_p) = Pr(X ≤ x_p) = p  ⇔  x_p = F^(−1)(p)
• The median, η, splits the distribution 50/50:  F(x_0.5) = 0.5  ⇔  η = F^(−1)(0.5)...
The Gamma Density: g(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx),  x ≥ 0.

The gamma function, Γ(α), is defined as

Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy,  α > 0
The Beta Density depends on two parameters, α and β:

f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β),  0 ≤ x ≤ 1

The beta function, B(α, β), is defined as the integral

B(α, β) = ∫_0^1 y^(α−1) (1 − y)^(β−1) dy = Γ(α)Γ(β) / Γ(α + β)
The Normal Distribution depends on two parameters, µ and σ (where −∞ < µ < ∞, σ > 0):

f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²)),  −∞ < x < ∞

µ is called the mean and σ is called the standard deviation...
The cdf Method:
1) Use the fact that Y = g(X) to express F_Y in terms of F_X.
2) In order to find f_Y, we differentiate F_Y using the chain rule...
Use the given information to find the pdf of Y...
The Transformation Method: When the function Y = g(X) is strictly monotone, the cdf method can be formalized... Any linear transformation of a Normal random variable is itself Normal... Suppose that f(x) = 0 if x is not in I... Here g^(−1) is the inverse function of g; that is, g^(−1)(y) = x if y = g(x)...
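An illustrative simulation (assuming NumPy; the parameter values are arbitrary) of the fact that a linear transformation Y = aX + b of a Normal variable is again Normal, with mean aµ + b and standard deviation |a|σ:

import numpy as np
rng = np.random.default_rng(0)
mu, sigma, a, b = 5.0, 2.0, -3.0, 7.0           # arbitrary illustrative values
x = rng.normal(mu, sigma, size=200_000)          # X ~ N(mu, sigma^2)
y = a * x + b                                     # linear transformation
print(y.mean(), a * mu + b)                       # both close to -8
print(y.std(), abs(a) * sigma)                    # both close to 6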
Generating pseudorandom variables. Proposition D: Let U be uniform on [0, 1], and let X = F^(−1)(U); then the cdf of X is F... These numbers are called pseudorandom because they are generated according to some rule or algorithm and thus are not really random...
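Proposition D underlies the inverse-cdf way of generating pseudorandom numbers. A minimal sketch (assuming NumPy), using the Exponential(λ) distribution, for which F(x) = 1 − e^(−λx) and hence F^(−1)(u) = −ln(1 − u)/λ:

import numpy as np
rng = np.random.default_rng(42)
lam = 2.0                                    # arbitrary rate
u = rng.uniform(0.0, 1.0, size=100_000)      # U ~ Uniform[0, 1]
x = -np.log(1.0 - u) / lam                    # X = F^{-1}(U) ~ Exponential(lam)
print(x.mean(), 1.0 / lam)                    # sample mean close to 1/lambda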
The possible values of the random variable are weighted by their probabilities, as specified in the following definition...
A standard technique:
1) Try to merge x and the pmf/pdf of X...
...
3) Since the summand/integrand is a pdf/pmf it sums to 1...
Expected values of functions of random variables: If we are interested in finding E(Y) = E[g(X)]... Then the expected value of g(Y) is given by

E[g(Y)] = ∑_(∀y) g(y) p(y)
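A tiny Python sketch with a hypothetical pmf p(y) and g(y) = y², evaluating E[g(Y)] = ∑ g(y)p(y):

pmf = {0: 0.2, 1: 0.5, 2: 0.3}                       # hypothetical pmf p(y); sums to 1
g = lambda y: y ** 2
expected_g = sum(g(y) * p for y, p in pmf.items())    # E[g(Y)] = sum over all y of g(y) p(y)
print(expected_g)                                      # 0.2*0 + 0.5*1 + 0.3*4 = 1.7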
Definition: If X is a random variable with expected value E(X), the variance of X is

Var(X) = E{[X − E(X)]²},

provided that the expectation exists... Variance is defined as the expected squared distance between X and E(X)...
For a linear transformation Y = aX + b,

µ_Y = aµ_X + b  and  σ_Y² = a²σ_X²

A good example would be the Fahrenheit-Celsius transformation...
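A quick simulation check of these formulas (assuming NumPy; the Celsius temperatures are hypothetical):

import numpy as np
rng = np.random.default_rng(1)
celsius = rng.normal(20.0, 5.0, size=100_000)           # hypothetical Celsius temperatures
fahrenheit = 1.8 * celsius + 32.0                        # Y = aX + b with a = 1.8, b = 32
print(fahrenheit.mean(), 1.8 * celsius.mean() + 32.0)    # means agree
print(fahrenheit.var(), 1.8**2 * celsius.var())          # variances agree: sigma_Y^2 = a^2 sigma_X^2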
Chebyshev's inequality: Let X be a random variable with mean µ and variance σ². Then, for any t > 0,

Pr(|X − µ| > t) ≤ σ² / t²

The theorem says that if σ² is very small, there is a high probability that X will not deviate much from µ... or, with t = kσ,

Pr(|X − µ| > kσ) ≤ 1 / k²
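To see how conservative the bound is, a sketch (assuming SciPy) comparing the Chebyshev bound 1/k² with the exact tail probability of a Normal random variable:

from scipy import stats   # assumed available
for k in (1, 2, 3):
    exact = 2 * stats.norm.sf(k)    # P(|X - mu| > k sigma) for a Normal X
    bound = 1 / k**2                 # Chebyshev bound, valid for any distribution
    print(k, round(exact, 4), bound)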
6 Moment Generating Functions
The k'th moment of a random variable Y taken about the origin is defined to be E(Y^k) and is denoted by µ'_k...
Moment: E(X) is the first moment of X... The k'th central moment of X is defined by

E[(X − µ)^k] = ∫_(−∞)^∞ (x − µ)^k f_X(x) dx
The moment-generating function (mgf) of a random variable X is M(t) = E(e^(tX)), if the expectation is defined...
Theorem: If M(t) exists, then for any positive integer k,

[d^k M(t) / dt^k]_(t=0) = M^(k)(0) = µ'_k

In other words, if you find the k'th derivative of M(t) with respect to t and then set t = 0, the result will be µ'_k.

In summary, a moment-generating function is a mathematical expression that sometimes (but not always) provides an easy way to find moments associated with random variables...
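The differentiation can be carried out symbolically; a sketch assuming SymPy, using the mgf of a N(µ, σ²) variable, M(t) = e^(µt + σ²t²/2):

import sympy as sp   # assumed available
t, mu, sigma = sp.symbols('t mu sigma', real=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)         # mgf of N(mu, sigma^2)
m1 = sp.diff(M, t, 1).subs(t, 0)                  # first moment: mu
m2 = sp.expand(sp.diff(M, t, 2).subs(t, 0))       # second moment: mu^2 + sigma^2
print(m1, m2)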
Linear transformation: Let X have mgf M_X(t) and let a and b be constants...
The joint probability mass function of X and Y is defined by

p_(X,Y)(x, y) = Pr(X = x, Y = y),  x, y ∈ R
If X and Y are discrete random variables with joint probability function p(x, y), then
1. p(x, y) ≥ 0 for all x, y...
2. ∑_x ∑_y p(x, y) = 1...
The marginal probability mass functions of X and Y can be derived via:

p_X(x) = ∑_y p_(X,Y)(x, y)  and  p_Y(y) = ∑_x p_(X,Y)(x, y)
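A small numerical sketch (assuming NumPy) with a hypothetical joint pmf stored as a table; each marginal is obtained by summing over the other variable's axis:

import numpy as np
# hypothetical joint pmf p_{X,Y}(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
print(joint.sum())              # 1.0, so this is a valid joint pmf
p_x = joint.sum(axis=1)         # marginal pmf of X: sum over y
p_y = joint.sum(axis=0)         # marginal pmf of Y: sum over x
print(p_x, p_y)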
The joint cumulative distribution function (cdf) of two random variables X and Y is defined by

F_(X,Y)(x, y) = Pr(X ≤ x, Y ≤ y),  x, y ∈ R
For the joint distribution function it holds that
1) F_(X,Y)(−∞, −∞) = F_(X,Y)(−∞, y) = F_(X,Y)(x, −∞) = 0
2) F_(X,Y)(∞, ∞) = 1
3) F_(X,Y) is increasing, that means if x* ≥ x and y* ≥ y, then
   0 ≤ Pr(x < X ≤ x*, y < Y ≤ y*) = F_(X,Y)(x*, y*) − F_(X,Y)(x*, y) − F_(X,Y)(x, y*) + F_(X,Y)(x, y)
When X and Y are jointly discrete, we have the following relationship between the joint distribution function and the joint probability mass function:

F_(X,Y)(x, y) = ∑_(t1 ≤ x) ∑_(t2 ≤ y) p_(X,Y)(t1, t2)
Let X and Y be continuous random variables with joint distribution function F_(X,Y)(x, y)...

(Notation: i.i.d. stands for independent and identically distributed (random variables); r.v. stands for random variable.)
1. If the domain of (X, Y) is rectangular, that is, there exist constants a, b, c, and d such that a ≤ x ≤ b, c ≤ y ≤ d...
2. ...
Conditional probability: Let A and B be two events, with Pr(B) > 0... The conditional pmf for X given Y = y is then given by

p_(X|Y)(x|y) = Pr(X = x | Y = y) = Pr({X = x} ∩ {Y = y}) / Pr(Y = y) = p_(X,Y)(x, y) / p_Y(y)
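Continuing the hypothetical joint-pmf table from the sketch above, the conditional pmf of X given Y = y is the y-column of the table divided by the marginal p_Y(y):

import numpy as np
joint = np.array([[0.10, 0.20, 0.10],     # same hypothetical joint pmf as above
                  [0.25, 0.15, 0.20]])
p_y = joint.sum(axis=0)                    # marginal pmf of Y
y = 1                                       # condition on Y = 1
p_x_given_y = joint[:, y] / p_y[y]          # p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
print(p_x_given_y, p_x_given_y.sum())       # conditional pmf, sums to 1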
Conditional pdf: Let the continuous random variables X and Y have joint pdf f_(X,Y)(x, y), and marginal pdf's f_X(x) and f_Y(y), respectively...

Let X and Y be continuous r.v.'s and consider U = h(X, Y)...
1. Determine the region U = u in the (x, y)-space...
2. ...
3. ...
4. ...
The discrete case: Let X and Y be discrete random variables with joint pmf p_(X,Y)(x, y)... The objective is to find:

M_U(t) = E(e^(tU)) = E(e^(t·h(X1, X2, ···, Xn)))

The method of moment generating functions is best suited for situations where X1, X2, ···, Xn is a collection of independent random variables and where the function U = h(X1, X2, ···, Xn) is a linear combination of the X's...
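As an illustration (a symbolic sketch assuming SymPy): the mgf of a sum of independent variables is the product of the individual mgfs, and for two independent Poisson variables this product is again a Poisson mgf with parameter λ1 + λ2:

import sympy as sp   # assumed available
t, lam1, lam2 = sp.symbols('t lambda1 lambda2', positive=True)
M1 = sp.exp(lam1 * (sp.exp(t) - 1))    # mgf of Poisson(lambda1)
M2 = sp.exp(lam2 * (sp.exp(t) - 1))    # mgf of Poisson(lambda2)
target = sp.exp((lam1 + lam2) * (sp.exp(t) - 1))   # mgf of Poisson(lambda1 + lambda2)
print(sp.simplify(M1 * M2 / target))    # 1, so the sum has the Poisson(lambda1 + lambda2) mgf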
The bivariate case: Let X and Y be jointly distributed with pmf p_(X,Y)(x, y) or pdf f_(X,Y)(x, y), and consider the function g(X, Y)...
Linear dependency: If lower values of X tend to appear with lower values of Y, and if higher values of X tend to appear with higher values of Y, it is clear that we have a dependency...
Covariance: Let X and Y be jointly distributed with expectations µ_X and µ_Y, respectively; the covariance of X and Y is then

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]...

If X and Y are independent and g(X, Y) = g1(X)g2(Y), then

E[g(X, Y)] = E[g1(X)g2(Y)] = E[g1(X)]E[g2(Y)]

Corollary (Uncorrelated): Let X and Y be independent random variables; then Cov(X, Y) = 0...
Definition: The correlation coefficient of X and Y is defined by

ρ_(X,Y) = Corr(X, Y) = σ_(XY) / (σ_X · σ_Y) = Cov(X, Y) / √(Var(X) · Var(Y))

Cov(X, Y) depends on the scales on which X and Y are measured...
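A numerical sketch (assuming NumPy, with arbitrary simulated data) computing the covariance and the scale-free correlation coefficient:

import numpy as np
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=50_000)
y = 2.0 * x + rng.normal(0.0, 1.0, size=50_000)      # Y depends linearly on X plus noise
cov_xy = np.cov(x, y)[0, 1]
rho = cov_xy / (x.std(ddof=1) * y.std(ddof=1))        # Cov(X, Y) / (sigma_X sigma_Y)
print(cov_xy, rho, np.corrcoef(x, y)[0, 1])            # rho matches np.corrcoef, about 0.89 here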
Conclusion:
1. ...
2. ...
If X and Y are independent it follows that ρ_(X,Y) = 0...
The closer ρ_(X,Y) is to −1 or 1, the stronger the linear dependence between X and Y...
ρ_(X,Y) = ±1 means that the linear dependence between X and Y is perfect, that is, it is in the form of a straight line...
It then follows that

Var(∑_(i=1)^n X_i) = ∑_(i=1)^n Var(X_i)
11 Conditional expectations
A conditional probability distribution is in itself a proper probability distribution.

E(h(Y) | X = x) = ∑_(∀y) h(y) p_(Y|X)(y|x)          in the discrete case
E(h(Y) | X = x) = ∫_(−∞)^∞ h(y) f_(Y|X)(y|x) dy     in the continuous case
Conditional variance: Let X and Y be jointly distributed random variables... It then holds that

Var(Y) = E[Var(Y | X)] + Var[E(Y | X)]
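A simulation check of this identity (assuming NumPy) for a hypothetical hierarchical model in which X is uniform on {1, ..., 5} and, given X = x, Y ~ Poisson(x), so that E(Y|X) = Var(Y|X) = X:

import numpy as np
rng = np.random.default_rng(7)
x = rng.integers(1, 6, size=500_000)    # X uniform on {1, ..., 5}
y = rng.poisson(x)                       # given X = x, Y ~ Poisson(x)
lhs = y.var()                            # Var(Y)
rhs = x.mean() + x.var()                 # E[Var(Y|X)] + Var[E(Y|X)] = E(X) + Var(X)
print(lhs, rhs)                           # both close to 3 + 2 = 5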
12 The Law of Large Numbers
It is commonly believed that if a fair coin is tossed many times and the proportion of heads is calculated, that proportion will be close to 1/2... He tossed a coin 10,000 times and observed 5067 heads...
Definition: If E(θ̂) = θ, then θ̂ is said to be an unbiased estimator of θ...
Definition: If for any ϵ > 0

lim_(n→∞) Pr(|θ̂ − θ| ≤ ϵ) = 1   or   lim_(n→∞) Pr(|θ̂ − θ| > ϵ) = 0,

then θ̂ is said to be a consistent estimator of θ...
Theorem: The Law of Large Numbers. Let X1, X2, ··· be a sequence of i.i.d. random variables with E(X_i) = µ... Then

X̄_n = (∑_(i=1)^n X_i) / n  →  µ  in probability as n → ∞
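A simulation sketch of the coin-tossing version (assuming NumPy): the running proportion of heads drifts toward 1/2:

import numpy as np
rng = np.random.default_rng(2024)
tosses = rng.integers(0, 2, size=10_000)                        # fair coin: 0 = tails, 1 = heads
running_mean = np.cumsum(tosses) / np.arange(1, tosses.size + 1)
for n in (10, 100, 1_000, 10_000):
    print(n, running_mean[n - 1])                                # proportion of heads approaches 0.5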
13 Convergence in distribution
Definition: Let X1, X2, ··· be a sequence of random variables with cdf's F1, F2, ···, respectively, and let X be a random variable with cdf F... Notation...
Let a_n → a as n → ∞; then

lim_(n→∞) (1 + a_n/n)^n = e^a
Steps in finding the convergence in distribution:
1. ...
2. ...
3. ...
4. ...
Theorem: Let F1, F2, ··· be a sequence of cdf's with corresponding mgf's M1, M2, ···... If

lim_(n→∞) M_n(t) = M(t)

for all t in an open interval including zero, then X_n converges in distribution to X as n → ∞...
For example, with the Gamma distribution we standardize:

Z_n = (X_n − µ_(X_n)) / σ_(X_n) = (X_n − n/λ) / (√n / λ) = (λ X_n − n) / √n
14 The Central Limit Theorem
Theorem: The Central Limit Theorem (CLT). Let X1, X2, ··· be i.i.d. random variables with mean µ, variance σ², and mgf M_X(t), and set S_n = X1 + X2 + ··· + X_n... Then (S_n − nµ) / (σ√n) converges in distribution to the standard Normal distribution as n → ∞.

Rule of thumb: ...
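A simulation sketch of the CLT (assuming NumPy): standardized sums of i.i.d. Exponential(1) variables (µ = 1, σ² = 1) behave approximately like a standard Normal even though the summands are skewed:

import numpy as np
rng = np.random.default_rng(11)
n, reps = 50, 100_000
samples = rng.exponential(1.0, size=(reps, n))       # i.i.d. Exponential(1): mu = 1, sigma = 1
z = (samples.sum(axis=1) - n * 1.0) / np.sqrt(n)      # (S_n - n mu) / (sigma sqrt(n))
print(z.mean(), z.std())                               # close to 0 and 1
print((z <= 1.0).mean())                               # close to Phi(1) = 0.8413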
15 Other Important Laws
The multiplicative law:

P(A ∩ B) = P(A) P(B|A),

where P(A) is the unconditional probability of A and P(B|A) is the probability of B given that A has occurred...
Hierarchic Models: We further assume that Y is itself a random variable...
The marginal distribution of X:
1. ...
We then find the marginal distribution of X, f_X(x), by integrating (or summing) over the possible values of Y...
The Independent Addition Law: for independent events A and B,

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = Pr(A) + Pr(B) − Pr(A) · Pr(B)