Title: Foundations of Stochastic Modeling – 2010 Lecture Notes
Description: Columbia Business School – First Year of the Doctoral Program in Decisions, Risk and Operations. Notes on stochastic processes from Prof. Assaf Zeevi's "Foundations of Stochastic Modelling".

Foundations of Stochastic Modeling

LECTURE 1 – 19th January 2010

Probability Preliminaries


Set operations
o   When we talk of a probability space, we quote a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where
    - $\Omega$ is the space of outcomes
    - $\mathcal{F}$ is a $\sigma$-field – the space of "admissible sets/events"
    ...
    - $\mathbb{P}$ is a probability distribution/measure based on which statements are made about the probability of various events
    ...
    In particular, for $A \in \mathcal{F}$, $\mathbb{P}(A) = \mathbb{P}(\omega \in \Omega : \omega \in A)$
    ...
    We will assume this always holds throughout this course and omit this condition
    ...
    Then $\mathbb{P}(\bigcup_n A_n) = 1$
    ...

o   Definition (limsup and liminf): Let $\{a_n\}$ be a real-valued sequence
    ...
    Similarly
        $\liminf_n a_n := \sup_m \inf_{n \ge m} a_n$

    [Figure (source: Wikipedia): a sequence $a_n$ plotted against $n$, with the levels $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$ marked.]

    Note that these limits need not be finite
    ...
    If $\limsup_n a_n > a$, then $a_n > a$ infinitely often – in other words, there are infinitely many elements in the sequence above $a$
    ...
    and $\liminf_n a_n < a \Rightarrow a_n < a$ i.o.
    ...


We now attempt to extend this definition to sets
...
Let $\{A_n\}$ be a sequence of events. Then
    $\limsup_n A_n := \bigcap_m \bigcup_{n \ge m} A_n = \lim_{m\to\infty} \bigcup_{n \ge m} A_n$
    $\liminf_n A_n := \bigcup_m \bigcap_{n \ge m} A_n = \lim_{m\to\infty} \bigcap_{n \ge m} A_n$

(Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011)

We can interpret these as follows
...
o   ...
    $\{\liminf_n A_n\} = \{\omega \in \Omega : \omega \in A_n \text{ ev.}\}$
    ...
    Remark: Take $\{A_n \text{ ev.}\}$
    ...
    Then $\{A_n \text{ ev.}\}^c = \{A_n^c \text{ i.o.}\}$
o   ...

    Proof of (i): Let us set $B_m = \bigcap_{n \ge m} A_n$
    ...
    In other words, the sets $B_m$ increase monotonically to
        $B = \bigcup_m B_m = \bigcup_m \bigcap_{n \ge m} A_n = \liminf_n A_n$
    Since the events are increasing, a simple form of monotone convergence gives us that $\mathbb{P}(B_m) \to \mathbb{P}(B)$
    ...
    Thus
        $\mathbb{P}(B_m) \le \inf_{n \ge m} \mathbb{P}(A_n)$
        $\sup_m \mathbb{P}(B_m) \le \sup_m \inf_{n \ge m} \mathbb{P}(A_n)$
        $\mathbb{P}(B) \le \liminf_n \mathbb{P}(A_n)$
    And given how we have defined B:
        $\mathbb{P}(\liminf_n A_n) \le \liminf_n \mathbb{P}(A_n)$
    As required
    ...
o   Proposition (first Borel–Cantelli Lemma): Let $\{A_n\}$ be a sequence of measurable events such that $\sum_n \mathbb{P}(A_n) < \infty$. Then
        $\mathbb{P}(A_n \text{ i.o.}) = 0$
    We offer two proofs – the first is somewhat mechanical, the second is more intuitive
    ...




    The second proof will require a lemma:
    Lemma (Fubini's): If $\{X_n\}$ is a sequence of real-valued random variables with $X_n \ge 0$, then
        $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$ (which could be infinite)
    ...
o   ...
    To see this more clearly, note that the first statement is simply a shorthand for $\mathbb{P}(\omega \in \Omega : \omega \in A_n \text{ i.o.}) = 0$
    ...


    effectively a condition under which we can exchange a summation and an integration
    ...
    We also have that
        $\sum_n \mathbb{P}(A_n) = \sum_n \mathbb{E}(\mathbf{1}_{A_n}) = \mathbb{E}\left(\sum_n \mathbf{1}_{A_n}\right)$
    This means that the random variable $\sum_n \mathbf{1}_{A_n}$ must have finite expectation; in other words
        $\sum_n \mathbf{1}_{A_n} < \infty$ a.s.
    This must mean that $\mathbf{1}_{A_n} = 1$ only finitely many times
    ...

o   ...
    Definition (independence): A sequence of random variables $\{X_1, \ldots, X_n\}$ are independent if and only if
        $\mathbb{P}(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^n \mathbb{P}(X_i \le x_i)$
    for all $x_1, \ldots, x_n \in \mathbb{R}$
    ...

o   Proposition (second Borel–Cantelli Lemma): Let $\{A_n\}$ be a sequence of independent measurable events such that $\sum_n \mathbb{P}(A_n) = \infty$. Then $\mathbb{P}(A_n \text{ i.o.}) = 1$
    Proof: We showed in a remark above that
        $\{A_n \text{ i.o.}\}^c = \bigcup_m \bigcap_{n \ge m} A_n^c$
    Let us fix $m > 1$
    ...
o   ...


        $\mathbb{P}\left(\bigcap_{n \ge m} A_n^c\right) = \prod_{n \ge m} \mathbb{P}(A_n^c)$
        $= \prod_{n \ge m} \left(1 - \mathbb{P}(A_n)\right)$
        $\le \prod_{n \ge m} \exp(-\mathbb{P}(A_n))$
        $= \exp\left(-\sum_{n \ge m} \mathbb{P}(A_n)\right)$
        $= 0$
    But we have that
        $\mathbb{P}\left(\bigcup_m \bigcap_{n \ge m} A_n^c\right) \le \sum_m \mathbb{P}\left(\bigcap_{n \ge m} A_n^c\right) = 0$
    As required
    ...
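The contrast between the two Borel–Cantelli lemmas is easy to see numerically. Below is a minimal sketch (my own illustration, not from the notes), using independent uniforms $U_n$: the events $A_n = \{U_n \le 1/n^2\}$ have a summable probability series and so occur only finitely often, while $B_n = \{U_n \le 1/n\}$ have a divergent probability sum and keep occurring.

```python
import random

random.seed(0)
N = 200_000
count_A = 0   # occurrences of A_n = {U_n <= 1/n^2}; E[count] = sum 1/n^2 ~ 1.64
count_B = 0   # occurrences of B_n = {U_n <= 1/n};   E[count] = sum 1/n   ~ 12.8
last_A = 0    # index of the last occurrence of A_n

for n in range(1, N + 1):
    u = random.random()
    if u <= 1.0 / n ** 2:
        count_A += 1
        last_A = n
    if u <= 1.0 / n:
        count_B += 1

print(count_A, last_A, count_B)
```

With high probability every occurrence of $A_n$ happens very early in the run, while occurrences of $B_n$ accumulate throughout – exactly the "finitely often" versus "infinitely often" dichotomy of the two lemmas.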

Let us understand these two statements intuitively
    The first statement tells us that for every single outcome in the sample space $\Omega$, the sequence of random variables $X_n$ tends to $X$
    ...
o   Example: Consider the sequence $X_n = n\,\mathbf{1}\{U \le \tfrac{1}{n}\}$, with $U \sim U[0,1]$
    ...
    Proof: In this case, $\Omega = [0,1]$
    ...

o   Definition (convergence in probability): $X_n \to_p X$ as $n \to \infty$ if, for all $\varepsilon > 0$,
        $\mathbb{P}(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$
    Let us understand this concept as compared to convergence almost surely
    ...
    This, however, allows the possibility that for some outcomes $\omega$, $X_n(\omega) \nrightarrow X(\omega)$ infinitely often, provided that the probability of these $\omega$ is small and gets smaller
    ...
    For every outcome, we require $X_n(\omega) \to X(\omega)$
    ...
o   ...
    If $A_n$ is the event that "$X_n$ is far from $X$", convergence in probability requires $\lim_{n\to\infty} \mathbb{P}(A_n) = 0$, whereas convergence almost surely requires that
        $\mathbb{P}(A_n \text{ i.o.}) = 0$
    – which would, for example, be satisfied if $\sum_n \mathbb{P}(A_n) < \infty$
    ...
    Imagine carrying out an "experiment" by generating random variables $X_1, X_2, \ldots$
    ...
    If $X_n \to X$ almost surely, then it is certain that in every experiment we might carry out, the values $X_1(\omega), X_2(\omega), \ldots$ will tend to $X(\omega)$
    ...
    The probability of this happening decreases with the number of X generated, but it is still possible that in certain experiments, it will happen
    ...
    Let $X_n = n\,\mathbf{1}\{U_n \le \tfrac{1}{n}\}$ with the $U_n$ IID uniform. Let us fix $\varepsilon \in (0,1)$, and consider the definition
        $\mathbb{P}(|X_n - 0| > \varepsilon) = \mathbb{P}(X_n > \varepsilon) = \mathbb{P}(U_n \le \tfrac{1}{n}) = \tfrac{1}{n} \to 0$
    This proves the $X_n$ do indeed converge to 0 in probability
    ...
o   ...
    So $X_n$ cannot be converging to 0 almost surely
    ...
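This example is easy to probe numerically. A quick sketch (my own, with arbitrary run lengths): the chance that $X_n = n\,\mathbf{1}\{U_n \le 1/n\}$ is non-zero at a fixed large n is 1/n, yet along one sample path the event recurs forever, as the second Borel–Cantelli lemma predicts.

```python
import random

random.seed(1)
N = 100_000

# One long sample path: record every n at which X_n = n*1{U_n <= 1/n} != 0.
hits = [n for n in range(1, N + 1) if random.random() <= 1.0 / n]
print("hits along one path:", len(hits), "latest:", hits[-1])

# Many independent paths: estimate P(X_n != 0) at a fixed n.
n, trials = 1000, 20_000
p_hat = sum(random.random() <= 1.0 / n for _ in range(trials)) / trials
print("estimated P(X_1000 != 0):", p_hat)
```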

o   Claim: $X_n \to X$ a.s. $\Rightarrow X_n \to_p X$ as $n \to \infty$
    ...
    Proof: Recall that convergence almost surely can be written as
        $\mathbb{P}(|X_n - X| > \varepsilon \text{ i.o.}) = 0$
    or in other words
        $\mathbb{P}\left(\limsup_n \{|X_n - X| > \varepsilon\}\right) = 0$
    We have, however, that $\liminf_n A_n \subseteq \limsup_n A_n$
    ...



o   Remark: It is interesting to note that if $X_n \to_p X$, it is possible to find a subsequence $\{n_k\}$, $k = 1, 2, \ldots$, such that $X_{n_k} \to X$ a.s.
    ...
    Proof: Assume we have $X_n$
    ...
    The convergence of $X_{n_k}$ almost surely follows by the BC Lemma
    ...
We then define random variables $X_1$ and $X_2$ as follows:
    $X_1(\omega) = \begin{cases} 1 & \omega \in [0, \tfrac{1}{2}) \\ 0 & \text{otherwise} \end{cases}$
    $X_2(\omega) = \begin{cases} 0 & \omega \in [0, \tfrac{1}{2}) \\ 1 & \text{otherwise} \end{cases}$

[Figure: $X_1(\omega)$ is the indicator of $[0, \tfrac{1}{2})$ and $X_2(\omega)$ the indicator of $[\tfrac{1}{2}, 1]$.]

Similarly, we define $X_3$, $X_4$ and $X_5$ to be random variables divided into three parts:

[Figure: $X_3$, $X_4$, $X_5$ are the indicators of $[0, \tfrac{1}{3})$, $[\tfrac{1}{3}, \tfrac{2}{3})$ and $[\tfrac{2}{3}, 1]$ respectively.]

We continue this pattern to form a pyramid of random variables
...
Now, consider that $\mathbb{P}(X_1 = 1) = \mathbb{P}(X_2 = 1) = \tfrac{1}{2}$
...
However, if we fix any $\omega \in [0,1]$, $X_n(\omega) = 1$ infinitely often (because every set of random variables will involve at least one that is equal to 1 at that $\omega$)
...
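The pyramid can be written down explicitly. A sketch (mine), assuming the standard construction in which block $k = 2, 3, 4, \ldots$ consists of the $k$ indicators of the intervals $[j/k, (j+1)/k)$: then $\mathbb{P}(X_n = 1) = 1/k \to 0$, yet any fixed $\omega$ is covered exactly once per block.

```python
def typewriter(n):
    """Return w -> X_n(w) for the n-th variable of the pyramid (1-indexed)."""
    # Block k holds the k indicators of [j/k, (j+1)/k); block 2 starts at n = 1,
    # block 3 at n = 3, block 4 at n = 6, and so on.
    k, start = 2, 1
    while start + k <= n:
        start += k
        k += 1
    j = n - start  # position within block k
    return lambda w: 1 if j / k <= w < (j + 1) / k else 0

w = 0.7
hits = [n for n in range(1, 56) if typewriter(n)(w) == 1]
print(hits)  # exactly one hit inside each block, so X_n(0.7) = 1 i.o.
```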

o   Definition (convergence in expectation) – also called $L^1$ convergence
    ...
    Proof: For the first part, consider that
        $|X_n| = |X_n - X + X| \le |X_n - X| + |X|$
        $\Rightarrow |X_n| - |X| \le |X_n - X|$
    Taking expectations yields the result
    ...
    Taking limits yields the answer required
    ...
    Proposition (Markov's Inequality): Let X be a non-negative random variable. Then for all $a > 0$
        $\mathbb{P}(X > a) \le \frac{\mathbb{E}(X)}{a}$
    Proof:
        $\mathbb{E}(X) = \mathbb{E}(X ; X > a) + \mathbb{E}(X ; X \le a)$
        $\ge \mathbb{E}(X ; X > a)$
        $\ge a\,\mathbb{P}(X > a)$
    As required
    ...

o   Claim: $X_n \to_{L^1} X \Rightarrow X_n \to_p X$
    Proof: Fix an $\varepsilon > 0$ and use Markov's Inequality
        $\mathbb{P}(|X_n - X| > \varepsilon) \le \frac{\mathbb{E}|X_n - X|}{\varepsilon} \to 0$
    As required
    ...
Clearly, X n  0 a
...
(Note
{ n}

that this does not hold true for w = 0
...


However,

it

is

clear

that

 (Xn ) = n  (U £ n1 ) = 1
...
(This illustrates an interesting example where
taking limits inside and outside expectations gives a different result
...
)
o   Example: Let $X_n$ be a sequence of IID random variables with exponential distribution with parameter 1
    ...
    Claim: $\limsup_n X_n / \log n = 1$ a.s.
    ...
    Consider
        $\mathbb{P}(X_n > (1+\varepsilon)\log n) = e^{-(1+\varepsilon)\log n} = \frac{1}{n^{1+\varepsilon}}$
    This means that
        $\sum_n \mathbb{P}(X_n > (1+\varepsilon)\log n) < \infty$
    So by BC-1,
        $\mathbb{P}(X_n > (1+\varepsilon)\log n \text{ i.o.}) = 0$
    As such,
        $X_n \le (1+\varepsilon)\log n$ ev. a.s.
o   ...
    Note: This only applies to the lim sup – to say that the actual sequence tends to 1 is nonsensical, because most of the $X_n$ will indeed be very small!
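The claim is easy to probe numerically. A rough sketch (my own): the running maximum of n IID Exp(1) draws should hover around $\log n$, so its ratio to $\log n$ settles near 1.

```python
import math
import random

random.seed(2)
n = 1_000_000
running_max = 0.0
for _ in range(n):
    # max over the first n exponential draws tracks log n
    running_max = max(running_max, random.expovariate(1.0))
ratio = running_max / math.log(n)
print(ratio)  # close to 1 for large n
```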

LECTURE 2 – 26th January 2010

Interchange arguments
o   Example: In our last lecture, we considered the sequence of random variables $X_n = n\,\mathbf{1}\{U \le \tfrac{1}{n}\}$
    ...
    We showed, however, that $\mathbb{E}(X_n) = 1$ for all n
    ...
    It looks like certain conditions need to be satisfied
    ...
    (Clearly, in the example above, the n that sits outside the indicator prevents us from bounding $X_n$)
    ...
    Because $X_n \to X$ a.s., $\mathbb{P}(B_n) \to 0$
    ...
    Remark: It is clear that this theorem still holds if $X_n$ only converges in probability to X rather than almost surely, because all that is required is for the probability of rogue events to fall to 0
    ...

o   Lemma (Fatou): If $\{X_n\}$ is a sequence of non-negative random variables, then
        $\mathbb{E}(\liminf_n X_n) \le \liminf_n \mathbb{E}(X_n)$
    (Note: there is no converse for lim sup)
    ...
    Note that:
    - $Y_m \uparrow Y$ a.s.
    ...
    - $\inf_{n \ge m} \mathbb{E}(X_n) \ge \mathbb{E}(\inf_{n \ge m} X_n) = \mathbb{E}(Y_m)$
    ...
    The LHS then has value $\int \inf_n X_n(\omega)\,\mathbb{P}(d\omega)$
    ...
    So we can apply bounded convergence:
        $\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}\left(\lim_{m\to\infty} \min[Y_m, k]\right) = \mathbb{E}(\min[Y, k])$
    As $k \to \infty$, the last line tends to $\mathbb{E}(Y) = \mathbb{E}(\liminf_n X_n)$
    ...

    Proof: Since $|X_n| \le Y$, we have that $Y + X_n \ge 0$ and $Y - X_n \ge 0$
    ...
o   Theorem (MON – Monotone Convergence): Let $\{X_n\}$ be a sequence of random variables such that $0 \le X_1 \le X_2 \le \cdots$ almost surely, then
        $X_n \uparrow X$ (possibly $\infty$) almost surely
    ...
    Proof: Since $X_n \ge 0$, we can apply Fatou's Lemma, to get
        $\liminf_n \mathbb{E}(X_n) \ge \mathbb{E}(\liminf_n X_n) = \mathbb{E}(X)$
    However, the fact that $X_n \le X$ also gives $\mathbb{E}(X_n) \le \mathbb{E}(X) \Rightarrow \limsup_n \mathbb{E}(X_n) \le \mathbb{E}(X)$
    ...



o   Proposition (Fubini I): Let $\{X_n\}$ be a sequence of random variables with $X_n \ge 0$ for all n, then
        $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$
    Proof: Define
        $S_n = \sum_{m=1}^n X_m \uparrow S = \sum_{m=1}^\infty X_m$ (possibly $\infty$)
    (The last step follows because the $X_m$ are non-negative, which means the sequence $\{S_n\}$ is non-decreasing almost surely – there is no outcome in the probability space in which adding an extra term to the sum decreases it)
    ...
    By monotone convergence, $\mathbb{E}(S_n) \to \mathbb{E}(S)$. However, for finite n,
        $\mathbb{E}(S_n) = \mathbb{E}\left(\sum_{m=1}^n X_m\right) = \sum_{m=1}^n \mathbb{E}(X_m) \to \sum_{m=1}^\infty \mathbb{E}(X_m)$
    And
        $\mathbb{E}(S) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
    Combining these two results yields
        $\sum_{m=1}^\infty \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
    As required
...
o   Proposition (Fubini II): Let $\{X_n\}$ be a sequence of random variables with $\sum_n \mathbb{E}|X_n| < \infty$. Then
        $\mathbb{E}\left(\sum_n X_n\right) = \sum_n \mathbb{E}(X_n)$
    Proof: As above, set $S_n = \sum_{m=1}^n X_m$
    ...
    By Fubini I, $\mathbb{E}\left(\sum_m |X_m|\right) = \sum_m \mathbb{E}|X_m| < \infty$, and so $\sum_m |X_m| < \infty$ a.s. (If the sum were equal to infinity on a set of non-zero measure, the expectation would blow up
    ...
    )
    Now, note that
        $\left|S_n - \sum_{m=1}^\infty X_m\right| = \left|\sum_{m=n+1}^\infty X_m\right| \le \sum_{m=n+1}^\infty |X_m| \to 0$ a.s. as $n \to \infty$
    (The last line follows from the fact the sum is finite almost surely
    ...
    )
    The above implies that $S_n \to \sum_{m=1}^\infty X_m$ almost surely
    ...
    First note that for finite n
        $\sum_{m=1}^n \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^n X_m\right) = \mathbb{E}(S_n)$
    Now take limits of both sides
        $\lim_n \sum_{m=1}^n \mathbb{E}(X_m) = \sum_{m=1}^\infty \mathbb{E}(X_m) = \lim_n \mathbb{E}(S_n)$
    By dominated convergence, we have
        $\lim_n \mathbb{E}(S_n) = \mathbb{E}(\lim_n S_n) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
    Combining the last two equations, we do indeed find that
        $\sum_{m=1}^\infty \mathbb{E}(X_m) = \mathbb{E}\left(\sum_{m=1}^\infty X_m\right)$
    As required
    ...
Is there a "common thread" that runs through these conditions?
o   Definition (Uniform Integrability): A sequence of random variables $\{X_n\}$ is said to be uniformly integrable (u.i.) if for all $\varepsilon > 0$, there exists a $K(\varepsilon) < \infty$ such that
        $\sup_n \mathbb{E}\left(|X_n| ; |X_n| > K(\varepsilon)\right) \equiv \sup_n \mathbb{E}\left(|X_n|\,\mathbf{1}_{\{|X_n| > K(\varepsilon)\}}\right) \le \varepsilon$
    Note that the supremum implies that the K we find must work uniformly for the entire sequence – in other words, for a given $\varepsilon$, there must be a single $K(\varepsilon)$ which makes the tail expectation small for every n
    ...

o   Proposition: Let $\{X_n\}$ be a sequence of random variables and suppose $X_n \to X$ almost surely
    - If $\{X_n\}$ are uniformly integrable, then $\mathbb{E}(X_n) \to \mathbb{E}(X)$
    ...
    - If $X_n \ge 0$ for all n and $\mathbb{E}(X_n) \to \mathbb{E}(X) < \infty$, then $\{X_n\}$ are uniformly integrable
    ...
    We have almost identified the "bare minimum" condition for interchange
    ...
        $\le \sup_n (a + \varepsilon)$
    ...
    Now, let $Y_n = |X_n - X|$
    ...
    Hence, $\{Y_n\}$ is also uniformly integrable (since $\{X_n\}$ is uniformly integrable, and X is integrable)
    ...
    Thus, $\mathbb{E}(Y_n) \to 0$, as required
    ...
    Thus, applying bounded convergence, $\mathbb{E}\left[f_a(|X_n|)\right] \to \mathbb{E}\left[f_a(|X|)\right]$
    ...
    Thus, the RHS above can be made arbitrarily close to the difference between these two expectations
    ...
o   Proposition:
    - If $\sup_n \mathbb{E}|X_n|^p < \infty$ for some p > 1, then $\{X_n\}$ is uniformly integrable
    - If $|X_n| \le Y_n$ and $\{Y_n\}$ is uniformly integrable, then $\{X_n\}$ is uniformly integrable
    ...
    Proof (part 1): We will prove the first part of the statement above
    ...

o   Proposition (Hölder's Inequality): Let X and Y be random variables such that $\mathbb{E}|X|^p < \infty$ and $\mathbb{E}|Y|^q < \infty$, where $\tfrac{1}{p} + \tfrac{1}{q} = 1$. Then
        $\mathbb{E}|XY| \le \left(\mathbb{E}|X|^p\right)^{1/p}\left(\mathbb{E}|Y|^q\right)^{1/q}$
    When p = q = 2, this inequality is none other than the Cauchy–Schwarz inequality
        $\mathbb{E}(XY) \le \sqrt{\mathbb{E}(X^2)}\sqrt{\mathbb{E}(Y^2)}$
    Proof: If X = 0 or Y = 0 a.s., the inequality holds trivially
    ...
    Suppose we are able to show that, for every point $\omega$ in the sample space,
        $\frac{|X(\omega)Y(\omega)|}{xy} \le \frac{1}{p}\frac{|X(\omega)|^p}{x^p} + \frac{1}{q}\frac{|Y(\omega)|^q}{y^q}$
    where $x = (\mathbb{E}|X|^p)^{1/p}$ and $y = (\mathbb{E}|Y|^q)^{1/q}$. Then, taking expectations and rearranging, this becomes Hölder's Inequality
    ...
    The statement then becomes
        $ab \le \frac{a^p}{p} + \frac{b^q}{q}$
    with a and b non-negative
    ...
    The expression then follows trivially from the convexity of the exponential function:
        $\exp\left(\frac{s}{p} + \frac{t}{q}\right) \le \frac{e^s}{p} + \frac{e^t}{q}$
    We have therefore proved Hölder's Inequality
    ...


    Proof:
        $\mathbb{E}|X|^q = \mathbb{E}\left(|X|^q ; |X| < 1\right) + \mathbb{E}\left(|X|^q ; |X| \ge 1\right)$
        $\le 1 + \mathbb{E}\left(|X|^p ; |X| \ge 1\right)$
        $< \infty$
    As required
    ...
    We then use Hölder's Inequality to obtain
        $\mathbb{E}\left(|X_n| ; |X_n| > K\right) = \mathbb{E}\left(|X_n|\,\mathbf{1}_{\{|X_n| > K\}}\right) \le \left(\mathbb{E}|X_n|^p\right)^{1/p}\left(\mathbb{E}\,\mathbf{1}_{\{|X_n| > K\}}\right)^{1/q}$
    Taking a supremum
        $\sup_n \mathbb{E}\left(|X_n| ; |X_n| > K\right) \le \sup_n \left(\mathbb{E}|X_n|^p\right)^{1/p}\left(\mathbb{P}(|X_n| > K)\right)^{1/q}$
    By the assumption in the theorem, the first expression is finite
    ...

    Proof (part 1 – alternative): We propose an alternative, somewhat simpler proof to part 1:
        $\mathbb{E}\left(|X_n| ; |X_n| > a\right) \le \mathbb{E}\left(|X_n|\left\{\frac{|X_n|}{a}\right\}^{p-1} ; |X_n| > a\right)$
        $= \frac{1}{a^{p-1}}\,\mathbb{E}\left(|X_n|^p ; |X_n| > a\right)$
        $\le \frac{K}{a^{p-1}}$
    Choosing a large enough that $K/a^{p-1} \le \varepsilon$ gives the required result
    ...
    What condition is needed on the distribution of the X for this to happen? Clearly, if the $X_n$ are IID, this will not be the case, since every additional variable in the sum adds something to the sum – IID variables are either "dead at the start" or "never die out"
    ...
o   Theorem (Kolmogorov's Three-Series Theorem): Let $\{X_n\}$ be a sequence of independent random variables. Then $\sum_n X_n$ converges if and only if for some (and therefore every) K > 0, the following conditions hold:
    - $\sum_n \mathbb{P}(|X_n| > K) < \infty$ (clearly, for example, if the $X_n$ are IID without bounded support, we'll never be able to satisfy this)
    ...
    - $\sum_n \mathbb{E}\left(X_n ; |X_n| \le K\right)$ converges
    - $\sum_n \mathrm{Var}\left(X_n ; |X_n| \le K\right) < \infty$
    Note that there are no issues of existence of the expected value and the variance in the last two points because these are taken over a finite range – namely on $X_n \in (-K, K)$
    ...
    We will cover it later in this course
    ...
o   Corollary: Let $\{X_n\}$ be independent with $\mathbb{E}(X_n) = 0$. If $\sum_n \mathrm{Var}(X_n) < \infty$, then $\sum_m X_m$ converges almost surely
    ...

o   Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID exponential random variables with mean 1
    ...
    In light of the result we derived in the previous lecture (that draws from an exponential will be arbitrarily large infinitely often), one might expect this series not to converge
    ...
    First condition:
        $\mathbb{P}(|X_n| > K) = \mathbb{P}\left(\tfrac{Y_n}{n} > K\right) = \mathbb{P}(Y_n > nK) = e^{-nK}$
    This does indeed sum to a finite number:
        $\sum_n \mathbb{P}(|X_n| > K) < \infty$
    The first condition is therefore met
    ...
    So, as we expected, $S_n$ does not converge almost surely
    ...
    The intuition here is that $\sum_n (-1)^n / n$ does converge
    ...
o   Example: Set $X_n = Y_n / n$, where the $Y_n$ are IID with $\mathbb{P}(Y_n = 1) = p$ and $\mathbb{P}(Y_n = -1) = 1 - p$. We wonder whether this randomness will "spoil" the convergence
    ...
    Second condition:
        $\mathbb{E}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n}\,\mathbb{E}\left(Y_n ; |Y_n| \le n\right) = \frac{1}{n}\,\mathbb{E}(Y_n) = \frac{2p-1}{n}$
    We know, however, that $\sum_n \frac{k}{n}$ diverges, so the infinite sum of these expectations only converges if p = ½, in which case the numerator is 0
    ...
    Third condition (assuming p = ½):
        $\mathrm{Var}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n^2}\,\mathrm{Var}\left(Y_n ; |Y_n| \le n\right) = \frac{\mathrm{Var}(Y_n)}{n^2} = \frac{\mathbb{E}(Y_n^2) - 0}{n^2}$
    The infinite sum of the variances is therefore finite
    ...
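A numerical sketch of the contrast (my own illustration): the random-sign series $\sum_n Y_n/n$ with $Y_n = \pm 1$ (p = ½) settles down, while the same series with $Y_n \sim$ Exp(1) drifts off like the harmonic series.

```python
import random

random.seed(3)
N = 100_000
s_signs, s_exp = 0.0, 0.0
for n in range(1, N + 1):
    s_signs += random.choice((-1.0, 1.0)) / n   # converges a.s. (3-series)
    s_exp += random.expovariate(1.0) / n        # grows like log N
print(s_signs, s_exp)
```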

o   Example: Set $X_n = Y_n / \sqrt{n}$, where the $Y_n$ are IID random variables with probability ½ of being 1 and probability ½ of being –1
    ...
    Since, in this case, $\mathbb{E}(X_n) = 0$, we can simply apply the second form of the 3-series theorem:
        $\mathrm{Var}\left(X_n ; |X_n| \le 1\right) = \frac{1}{n}\,\mathrm{Var}\left(Y_n ; |Y_n| \le \sqrt{n}\right) = \frac{\mathbb{E}(Y_n^2)}{n} = \frac{1}{n}$
    The infinite sum of these variances clearly diverges, and so $S_n$ does not converge
    ...






The Strong Law of Large Numbers
o   Theorem (SLLN): Let $\{X_n\}$ be IID with $\mathbb{E}|X_1| < \infty$, then
        $\frac{1}{n}\sum_{k=1}^n X_k \to \mathbb{E}(X_1)$ a.s.
    Note: there is a simpler proof of this theorem that assumes the fourth moment of $X_1$ is bounded
    ...
    Proof: Let us fix n
    ...
    Define the truncated variables $Y_n = X_n\,\mathbf{1}_{\{|X_n| \le n\}}$. We will first prove the result for the truncated form, and later show that the result then carries over to the non-truncated form
    ...
    Note that
    1. $|X_1\,\mathbf{1}_{\{|X_1| \le n\}}| \le |X_1|$, which is integrable
    ...
    2. Since $X_1$ has finite mean, it is the case that
        $X_1\,\mathbf{1}_{\{|X_1| \le n\}} \to X_1\,\mathbf{1}_{\{|X_1| < \infty\}} = X_1$ a.s.
    (1) justifies using dominated convergence, and (2) gives us the limit
    ...
...
Then
1
n
a  a¥
å
k =1 k
n

as n  ¥

Proof: Let e > 0 , and choose an N such that

ak > a¥ - e whenever k ³ N
Then
lim infn ¥

ì1
ü
ï
ï
1
n -N
n
N
ï
ï
³
+
lim
in
f
(
)
a
a
a
e
í
ý
å
å
k
n
k
¥
k =1
k =1
ï
ï
n
n
ï
ï
în
þ
³ 0 + a¥ - e

By a similar argument, lim sup £ a ¥
...

Lemma (Kronecker’s Lemma): Let

sequence
...
By the assumption in our proof,

u¥ = limn un exists
...
By the Cesaro Sum
Property, so does the second term
...




    Using these two lemmas, our (rough) strategy will be as follows:
    - Step 3a: Show that $\sum_{k=1}^n \frac{w_k}{k}$ converges almost surely, using the Kolmogorov 3-series theorem
    ...
    - Step 3c: Note that since $\sum_{k=1}^n \frac{w_k}{k}$ converges almost surely, $\frac{1}{n}\sum w_k$ converges to 0 almost surely
    ...
    Furthermore, the $w_k$ are independent of each other (because they are built up from the $Y_k$, which are built up from the $X_k$, which are independent)
    ...
    (Note that some steps in the exposition below require some results about moments of random variables which we will prove in a homework – these are indicated by a *)
    ...
    Thus, we can write
        $\sum_{k=1}^\infty \mathrm{Var}\left(\frac{w_k}{k}\right) \le \int_0^\infty \frac{C_1}{C_2 + y}\,2y\,\mathbb{P}(|X_1| > y)\,dy \le C_3 \int_0^\infty \mathbb{P}(|X_1| > y)\,dy \stackrel{*}{=} C_3\,\mathbb{E}|X_1|$
    So the sum does indeed converge
    ...
    But by Step 3a, $\mathbb{P}(A) = 1$, since A occurs almost surely
    ...
    All that remains to be shown is that the result also applies to the full series $\{X_n\}$
    ...
    Thus, for large enough n, $X_n = Y_n$ almost surely
    ...
    As such
        $\frac{1}{n}\sum_{k=1}^n X_k(\omega) - \frac{1}{n}\sum_{k=1}^n Y_k(\omega) \to \lim_n \left[X_n(\omega) - Y_n(\omega)\right] = 0$ a.s.
    ...
    Thus, the SLLN holds for the original sequence $X_n$
    ...
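A one-line numerical check of the SLLN (my own sketch): the sample mean of IID Exp(1) draws along a single path approaches $\mathbb{E}(X_1) = 1$.

```python
import random

random.seed(4)
n = 1_000_000
total = 0.0
for _ in range(n):
    total += random.expovariate(1.0)  # IID Exp(1), mean 1
mean = total / n
print(mean)  # near 1
```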
Let $S_n = \sum_{k=1}^n X_k$
...
We might wonder, however, whether this still holds true if, instead of dividing by n, we divide by some quantity $a_n$, where $a_n \to \infty$, but possibly slower than n
...
Let us consider, for example, $a_n = \sqrt{n}\,(\log n)^{\frac{1+\varepsilon}{2}}$
...
Consider that this is really very close to $\sqrt{n}$.
    $\mathrm{Var}\left(\frac{X_n}{a_n}\right) = \frac{\sigma^2}{n(\log n)^{1+\varepsilon}}$
And this implies that
    $\sum_n \mathrm{Var}\left(\frac{X_n}{a_n}\right) < \infty$
Thus, by the Kolmogorov 3-series theorem, $\sum_n \frac{X_n}{a_n}$ converges almost surely, and by Kronecker's Lemma, this means that $S_n / a_n \to 0$
...
However, it is interesting to note that if we take $a_n = \sqrt{n}$ (slightly smaller), then no almost sure result holds anymore
...
These arguments are clearly extremely sensitive to scaling
...

Comments:
    - This is clearly a very difficult definition to use, because there are "too many" bounded continuous functions
    ...
    - If $X_n \to X$ almost surely, then $f(X_n) \to f(X)$, because f is continuous, and $\mathbb{E}\left[f(X_n)\right] \to \mathbb{E}\left[f(X)\right]$, by bounded convergence (since the functions are bounded)
    ...
    - Each member of $\{X_n\}$ as well as X need not even live on the same probability space
    ...
    Note that this is radically different to the context of almost sure convergence, which fixes an $\omega$ in the probability space and requires convergence
    ...

o   Definition (Weak convergence): A sequence of real-valued random variables $\{X_n\}$ converges weakly to X if and only if
        $F_n(x) \to F(x)$
    for all continuity points of F, where $X_n \sim F_n$ and $X \sim F$
    ...
o   Example: Take $X_1, X_2, \ldots$ IID, with $\mathbb{E}(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$, then
        $\frac{\sum_{i=1}^n (X_i - \mu)}{\sqrt{n}} \Rightarrow \sigma N(0,1)$
    This is none other than the central limit theorem
    ...
    We will prove this later in this lecture
    ...
for a match), and let

Transcribed by Daniel Guetta from
lectures by Assaf Zeevi, Spring 2011

Foundations of Stochastic Modeling

Page 31

Yn = min {i : X i = X j for some j < i }
This is effectively the smallest i for which Xi matches an earlier X
...
This is somewhat inconvenient – let’s try to use weak convergence
arguments to get an approximation to this instead
...
Similarly, an ~ bn if

= 1
...


(

)

Now, note that log(1 - x ) ~ -x as x  0
...
Therefore
æY
ö÷
êx n ú i - 1
ê
ú
log  ççç n > x ÷÷ ~ -å ië =2 û
÷
çè n
n
ø÷
ê
ú
1
êx n ú
= - å ië =2 û (i - 1)
n
11 2
xn
~n2
= -x 2 / 2
As such
æY
÷ö
 ççç n > x ÷÷  exp - 12 x 2
÷÷ø
çè n

(

Transcribed by Daniel Guetta from
lectures by Assaf Zeevi, Spring 2011

)

Foundations of Stochastic Modeling

Page 32

The RHS is clearly a distribution, because it takes value 1 at x = 0 and decreases
thereafter, and we therefore have weak convergence by the second definition
...
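This approximation is easy to test. A sketch (my own illustration) with the classical n = 365: the first repeat index $Y_n$, scaled by $\sqrt{n}$, should satisfy $\mathbb{P}(Y_n/\sqrt{n} > x) \approx e^{-x^2/2}$.

```python
import math
import random

random.seed(5)
n, trials = 365, 20_000

def first_match(n):
    """Index of the first draw that repeats an earlier draw (this is Y_n)."""
    seen = set()
    i = 0
    while True:
        i += 1
        b = random.randrange(n)
        if b in seen:
            return i
        seen.add(b)

x = 1.0
freq = sum(first_match(n) / math.sqrt(n) > x for _ in range(trials)) / trials
print(freq, math.exp(-x * x / 2))  # the two should be close
```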
    For the lower bound:
        $\int_x^\infty e^{-y^2/2}\,dy \ge \int_x^\infty \left(1 - \frac{3}{y^4}\right)e^{-y^2/2}\,dy = \left(\frac{1}{x} - \frac{1}{x^3}\right)e^{-x^2/2}$
    (Which can be shown by differentiating the result)
    ...
    Thus $\int_x^\infty e^{-y^2/2}\,dy \sim \frac{1}{x}e^{-x^2/2}$ as $x \to \infty$. Making this more specific to a normally distributed random variable, suppose $X \sim N(0, \sigma^2)$, then
        $\mathbb{P}(X > x) = \frac{1}{\sqrt{2\pi\sigma^2}}\int_x^\infty e^{-y^2/2\sigma^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{x/\sigma}^\infty e^{-y^2/2}\,dy \sim \frac{1}{\sqrt{2\pi}}\frac{\sigma}{x}e^{-x^2/2\sigma^2}$
    It is interesting to note, therefore, how similar our result looks to the central limit theorem, despite the fact we did not center the original variables $Y_n$
    ...
    (In fact, the random variable with this tail integral is a Rayleigh random variable)
    ...

o   Example: Let $X_1, X_2, \ldots$ be IID $N(0, \sigma^2)$ random variables, and let $M_n = \max(X_1, \ldots, X_n)$
    ...
    We would like to show that
        $\mathbb{P}\left\{a_n(M_n - a_n) \le x\right\} \to \exp\left\{-e^{-x}\right\} \qquad \forall x \in \mathbb{R}$
    We first note the RHS has no discontinuity points
    ...
    Finally, we note the RHS is monotone
    ...

    Let us now show convergence:
        $\mathbb{P}\left(M_n \le \frac{x}{a_n} + a_n\right) = \mathbb{P}\left(X_1 \le \frac{x}{a_n} + a_n, \ldots, X_n \le \frac{x}{a_n} + a_n\right)$
        $= \left[\mathbb{P}\left(X_1 \le \frac{x}{a_n} + a_n\right)\right]^n$
        $= \left[1 - \mathbb{P}\left(X_1 > \frac{x}{a_n} + a_n\right)\right]^n$

    We will now require a lemma
    Lemma: Consider two (possibly non-real) sequences $\{a_n\}$ and $\{b_n\}$ with $a_n \to 0$, $b_n \to \infty$ and $a_n b_n \to c$; then $(1 + a_n)^{b_n} \to e^c$
    ...
    This is precisely the situation in this case, with $b_n = n \to \infty$ and
        $a_n = -\mathbb{P}\left(X_1 > \frac{x}{a_n} + a_n\right) \to 0$, since $a_n \to \infty$
    ...
    To do this, we will make extensive use of the Lemma in the previous example (namely that $\mathbb{P}(X_1 > x) \sim \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}$):
        $n \cdot \mathbb{P}\left(X_1 > \frac{x}{a_n} + a_n\right) \sim n \cdot \frac{1}{\sqrt{2\pi}\left(\frac{x}{a_n} + a_n\right)}\exp\left\{-\frac{1}{2}\left(\frac{x}{a_n} + a_n\right)^2\right\}$
        $= n \cdot \frac{1}{\sqrt{2\pi}\left(\frac{x}{a_n} + a_n\right)}\exp\left\{-\frac{x^2}{2a_n^2} - x - \frac{a_n^2}{2}\right\}$
        $= \frac{a_n}{\frac{x}{a_n} + a_n}\cdot e^{-x}\,e^{-x^2/2a_n^2}\cdot \frac{n}{a_n\sqrt{2\pi}}e^{-a_n^2/2}$
        $\sim \frac{a_n}{\frac{x}{a_n} + a_n}\cdot e^{-x}\,e^{-x^2/2a_n^2}\cdot n\,\mathbb{P}(X_1 > a_n)$
        $\to e^{-x}$
    where the last line uses the fact that $a_n$ was chosen so that $\mathbb{P}(X_1 > a_n) = 1/n$, and that $a_n \to \infty$.
    As such, by the Lemma
        $\mathbb{P}\left(M_n \le \frac{x}{a_n} + a_n\right) = \left[1 - \mathbb{P}\left(X_1 > \frac{x}{a_n} + a_n\right)\right]^n \to e^{-e^{-x}}$
    As required
    ...
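A numerical sketch of this Gumbel limit (my own; n and the number of trials are arbitrary choices): taking $a_n$ to solve $\mathbb{P}(X_1 > a_n) = 1/n$ for standard normals, the scaled maximum should be approximately Gumbel distributed.

```python
import math
import random

random.seed(6)
n, trials = 1_000, 2_000

def tail(a):
    """P(N(0,1) > a), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

# Solve P(X > a_n) = 1/n by bisection (the tail is decreasing).
lo, hi = 0.0, 10.0
for _ in range(80):
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if tail(mid) > 1.0 / n else (lo, mid)
a_n = lo

x = 0.0
hits = 0
for _ in range(trials):
    m = max(random.gauss(0.0, 1.0) for _ in range(n))
    hits += a_n * (m - a_n) <= x
print(hits / trials, math.exp(-math.exp(-x)))  # both near 0.368
```

At x = 0 the agreement is essentially exact, since $\mathbb{P}(M_n \le a_n) = (1 - 1/n)^n \approx e^{-1}$ by construction.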
    To see why, take K > 0, and consider that by Markov's Inequality
        $\mathbb{P}(|X_n| > K) \le \frac{\mathbb{E}|X_n|}{K} \le \frac{\sup_n \mathbb{E}|X_n|}{K}$
o   Proposition: Let $\{X_n\}$ be a sequence of random variables
    - If $X_n \Rightarrow X$, then $\{X_n\}$ is tight
    - If $\{X_n\}$ is tight, then there exists a subsequence $\{n_k\}$ such that $\{X_{n_k}\}$ converges weakly
    ...

o   Definition (Characteristic Function): A characteristic function (CF) of a random variable X is given by
        $\varphi_X(\theta) = \mathbb{E}\left[\exp(i\theta X)\right] \qquad \theta \in \mathbb{R}$
    Remarks
    - $\varphi_X(0) = 1$
    - $|\varphi_X(\theta)| \le 1$, because by Jensen's inequality, $|\varphi_X(\theta)| \le \mathbb{E}|\exp(i\theta X)| = 1$
    - $\varphi_X^{(n)}(0) = i^n\,\mathbb{E}(X^n)$, provided $\mathbb{E}|X|^n < \infty$
    - If X and Y are independent, $\varphi_{X+Y}(\theta) = \varphi_X(\theta)\varphi_Y(\theta)$, though the converse
    ...

o   Examples: If $X \sim N(\mu, \sigma^2)$, then
        $\varphi_X(\theta) = \exp\left(i\mu\theta - \tfrac{1}{2}\sigma^2\theta^2\right)$
    Note also that if $Y = \mu + \sigma X$, then
        $\varphi_Y(\theta) = \mathbb{E}\left(e^{i\theta[\mu + \sigma X]}\right) = e^{i\theta\mu}\,\varphi_X(\sigma\theta)$
    This is convenient in relating all normally distributed random variables to the standard normal
    ...
    If
    - $\varphi_{X_n}(\theta) \to \varphi(\theta)$ for all $\theta \in \mathbb{R}$
    - $\varphi(\theta)$ is continuous at 0
    ...

    Comments:
    - If $X_n \Rightarrow X$, then the two conditions in the proposition trivially hold – this is really an "if and only if" statement
    ...
    To see why, consider, for example, $X_n \sim N(0, n)$
    ...
    To see why, consider that with K > 0,
        $\mathbb{P}(X_n > K) = \mathbb{P}\left(N(0,1) > \frac{K}{\sqrt{n}}\right) \to \frac{1}{2}$
    In other words, the distribution in question is not tight – tail probabilities grow with n
    ...

The Central Limit Theorem
o   Theorem: Let $X_1, X_2, \ldots$ be IID with mean $\mathbb{E}(X_1) = \mu$ and variance $\sigma^2 = \mathrm{Var}(X_1)$
    ...
    Assume, for simplicity, that $\mu = 0$ (otherwise, replace $X_i$ by $X_i - \mu$). Let $S_n = \sum_{i=1}^n X_i$, and consider its characteristic function
        $\varphi_{S_n/\sqrt{n}}(\theta) = \mathbb{E}\left[\exp\left(i\theta S_n/\sqrt{n}\right)\right]$
        $= \mathbb{E}\left[\exp\left(\frac{i\theta}{\sqrt{n}}\sum_{j=1}^n X_j\right)\right]$
        $= \mathbb{E}\left[\prod_{j=1}^n \exp\left(\frac{i\theta}{\sqrt{n}}X_j\right)\right]$
        $= \prod_{j=1}^n \mathbb{E}\left[\exp\left(\frac{i\theta}{\sqrt{n}}X_j\right)\right]$
        $= \left[\varphi_X\left(\frac{\theta}{\sqrt{n}}\right)\right]^n$
    Now, let us take a Taylor Expansion
        $\varphi_X\left(\frac{\theta}{\sqrt{n}}\right) = \varphi_X(0) + \frac{\theta}{\sqrt{n}}\varphi_X'(0) + \frac{\theta^2}{2n}\varphi_X''(0) + R_n$
        $= \varphi_X(0) + \frac{\theta}{\sqrt{n}}\,i\,\mathbb{E}(X) + \frac{\theta^2}{2n}\,i^2\,\mathbb{E}(X^2) + R_n$
        $= 1 - \frac{\theta^2\sigma^2}{2n} + R_n$
    Now, consider that
        $\varphi_{S_n/\sqrt{n}}(\theta) = \left[\varphi_X\left(\frac{\theta}{\sqrt{n}}\right)\right]^n = \left(1 - \frac{\theta^2\sigma^2}{2n} + R_n\right)^n = (1 + a_n)^{b_n}$
    Now, suppose we can show that $a_n b_n = n\left[-\frac{\theta^2\sigma^2}{2n} + R_n\right] = -\frac{\theta^2\sigma^2}{2} + nR_n \to -\frac{\theta^2\sigma^2}{2}$ (or in other words, that $nR_n \to 0$) as $n \to \infty$; then we would have
        $\varphi_{S_n/\sqrt{n}}(\theta) \to \exp\left(-\frac{\theta^2\sigma^2}{2}\right) = \varphi(\theta)$
    This is the characteristic function of $N(0, \sigma^2)$, and it is continuous at $\theta = 0$
    ...
    All we now need to do is to prove that $nR_n \to 0$ as $n \to \infty$
    ...
    Let us apply expectations to the standard bound
        $\left|e^{ix} - \sum_{m=0}^n \frac{(ix)^m}{m!}\right| \le \min\left(\frac{|x|^{n+1}}{(n+1)!}, \frac{2|x|^n}{n!}\right)$
    to obtain
        $\mathbb{E}\left|e^{iX} - \sum_{m=0}^n \frac{(iX)^m}{m!}\right| \le \mathbb{E}\left(\min\left[\frac{|X|^{n+1}}{(n+1)!}, \frac{2|X|^n}{n!}\right]\right)$
    Now, use Jensen's Inequality on the LHS to obtain
        $\left|\mathbb{E}\left(e^{i\theta X}\right) - \mathbb{E}\left(\sum_{m=0}^2 \frac{(iX\theta)^m}{m!}\right)\right| \le \mathbb{E}\left(\min\left[\frac{|X\theta|^3}{3!}, \frac{2|X\theta|^2}{2}\right]\right) = \mathbb{E}\left[f(X, \theta)\right]$
    Now, observe that if $\theta \to 0$, $f(X, \theta) \to 0$ almost surely, since $|X|$ is finite almost surely
    ...
    We can therefore apply the dominated convergence theorem to conclude that
        $\mathbb{E}\left[f(X, \theta)\right] \to 0$ as $\theta \to 0$
    But we have
        $nR_n = n\,\mathbb{E}\left(f\left(X, \frac{\theta}{\sqrt{n}}\right)\right) = n\,\mathbb{E}\left(\min\left[\frac{|X|^3|\theta|^3}{n^{3/2}\,3!}, \frac{|X|^2\theta^2}{n}\right]\right) = \mathbb{E}\left(\min\left[\frac{|X|^3|\theta|^3}{\sqrt{n}\,3!}, |X|^2\theta^2\right]\right)$
    The term inside the expectation goes to 0 as $n \to \infty$ for every $\theta$, and is dominated by $|X|^2\theta^2$, which has finite expectation; hence by dominated convergence, $nR_n \to 0$
    ...
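A quick numerical sketch of the theorem (my own, with arbitrary parameters): standardized sums of IID uniforms should be approximately standard normal.

```python
import math
import random

random.seed(7)
n, trials = 400, 5_000
var = 1.0 / 12.0  # variance of U[0,1]
count_below = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * 0.5) / math.sqrt(n * var)  # standardize the sum
    count_below += z <= 1.0
p_hat = count_below / trials
print(p_hat)  # near Phi(1) ~ 0.841
```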




Odds & Ends
o   Proposition (The Skorohod Representation): Let $\{X_n\}$ be a sequence of random variables such that $X_n \Rightarrow X$
    ...
    Proof: Let $F_n$ be the distribution of $X_n$ and F be the distribution of X
    ...
    Now, define
        $F^{-1}(p) = \inf\{x : F(x) \ge p\}$
    (This generalized inverse deals with cases in which there are discontinuities in the distribution)
    ...
    Now, take $(\Omega, \mathcal{F}, \mathbb{P}) = ([0,1], \mathcal{B}, \text{uniform distribution})$
    ...
    Let
        $X_n' = F_n^{-1}(U) \qquad X' = F^{-1}(U)$
    Note that
        $\mathbb{P}(X_n' \le x) = \mathbb{P}(F_n^{-1}(U) \le x) = \mathbb{P}(U \le F_n(x)) = F_n(x)$
    Similarly, $\mathbb{P}(X' \le x) = F(x)$
    ...
    Note also, however, that $X_n'(\omega) \to X'(\omega)$ for all but countably many points in $\Omega$
    ...
    Thus, $X_n' \to X'$ almost surely
    ...
    Let f be a function with $\mathbb{P}(X \in D_f) = 0$, with $D_f$ being the set of discontinuity points of f
    ...

LECTURE 4 – 9th February 2011

Introduction to Large Deviations
o   Consider a sequence of IID random variables $X_1, X_2, \ldots \sim N(\mu, \sigma^2)$, and let $S_n = \sum_{i=1}^n X_i$
    ...
    Fix $a > \mu$
    ...
    It is easy to derive in the case of a Gaussian variable – we seek to develop this result in more generality
    ...
    The central limit theorem would tell us that
        $S_n \approx n\mu + \sigma\sqrt{n}\,Z \qquad Z \sim N(0,1)$
    A "natural" question, therefore, would be to consider
        $\mathbb{P}(S_n > n\mu + \sigma K\sqrt{n})$
    The theory of large deviations goes even further – since $a > \mu$, we can write
        $\mathbb{P}(S_n > na) = \mathbb{P}(S_n > n\mu + \sigma K_n\sqrt{n})$
    In other words, the central limit theorem deals with "normal" deviations, of order $\sqrt{n}$ from the mean
    ...
    We will now see why the central limit theorem is powerless to deal with these large deviations
    ...
    Fix $a > \mu$
    ...
    The central limit theorem is clearly not equipped to deal with large deviations
    ...
o   Theorem (Cramér): ... Then for any $a > \mu$,
        $\frac{1}{n}\log \mathbb{P}(S_n > na) \to -I(a)$
    where
        $I(a) = \sup_{\theta \in \mathbb{R}}\left[\theta a - \log M(\theta)\right] = \sup_{\theta \in \mathbb{R}}\left[\theta a - \varphi(\theta)\right]$
    (We denote $\varphi(\theta) = \log M(\theta)$)
    ...
    Remark: The same conclusions hold if $M(\theta)$ exists in a neighborhood of the origin (plus some mild technicalities)
    ...
    In this case,
        $I(a) = \sup_\theta\left\{\theta a - \theta\mu - \tfrac{1}{2}\theta^2\sigma^2\right\}$
    This is simply a quadratic, with argmax $\theta^* = \frac{a - \mu}{\sigma^2}$ and
        $I(a) = \frac{(a-\mu)^2}{2\sigma^2}$
    ...
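For the Gaussian case the rate can be checked directly, since $\mathbb{P}(S_n > na) = \mathbb{P}(N(0,1) > a\sqrt{n}/\sigma)$ is available in closed form. A sketch (mine) with $\mu = 0$, $\sigma = 1$ and $a = 1$, where $I(a) = a^2/2 = 0.5$:

```python
import math

a = 1.0
rates = []
for n in (10, 100, 1000):
    # P(S_n > n*a) for S_n a sum of n IID N(0,1) variables: S_n ~ N(0, n)
    p = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))
    rates.append(math.log(p) / n)
    print(n, rates[-1])  # approaches -I(a) = -0.5 from below
```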

o

Proof of Cramér’s Theorem: We will carry out this proof in two steps.
    Step 1: establish an upper bound
        limsup_n (1/n) log P(S_n > na) ≤ −I(a)
    Step 2: establish a lower bound
        liminf_n (1/n) log P(S_n > na) ≥ −I(a)
Fix a > μ and θ > 0.
    Lemma (Generalized Form of Markov’s Inequality): Let g : ℝ⁺ → ℝ⁺ be an
    increasing function such that E g(X_1) < ∞. Then P(X > a) ≤ E g(X)/g(a).
Now, consider that in our case, taking g(x) = e^{θx},
    P(S_n > na) ≤ e^{−θna} E[e^{θS_n}] = e^{−θna} M(θ)^n = exp(−n[θa − φ(θ)])
(Note that we have used the generalized form of Markov’s Inequality in the last
line.) To get the tightest bound, we choose θ such that
    θ ∈ argsup_{θ>0} {θa − φ(θ)}
Note, however, that
    the derivative of our objective, at θ = 0, is given by
        a − φ′(0) = a − μ > 0
    and at θ = 0 the objective equals 0.
Together, these two points imply that extending our optimization problem to
θ ∈ ℝ will not change our optimum, since it necessarily lies on the positive
half-line. Going back to the above, we therefore find that
    P(S_n > na) ≤ exp(−nI(a))
precisely as requested.
(Transcribed by Daniel Guetta from lectures by Assaf Zeevi, Spring 2011.)
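The upper bound can be illustrated numerically – a sketch, not from the notes, using Bernoulli(1/2) increments (so φ(θ) = log((1 + e^θ)/2)), where the exact tail is available from the binomial distribution and must sit below the Chernoff/Cramér bound:

```python
import math

def cramer_bound(n, a, steps=20001):
    """Chernoff/Cramér upper bound exp(-n*I(a)) for X_i ~ Bernoulli(1/2)."""
    best = 0.0
    for k in range(steps):
        th = 10.0 * k / (steps - 1)                    # theta grid on [0, 10]
        phi = math.log((1.0 + math.exp(th)) / 2.0)     # phi(theta) = log M(theta)
        best = max(best, th * a - phi)
    return math.exp(-n * best)

def exact_tail(n, a):
    """P(S_n > n*a) computed exactly from the binomial pmf."""
    return sum(math.comb(n, k) for k in range(math.floor(n * a) + 1, n + 1)) / 2.0**n

n, a = 50, 0.7
print(exact_tail(n, a), cramer_bound(n, a))
```

Since the grid search only under-estimates the supremum, the printed bound is a genuine upper bound on the exact tail.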
Step 2: establish the lower bound. Assume, for notational convenience, that X
has a density p(x) – this is not, however, required. Fix θ ∈ ℝ and define the
exponentially tilted (twisted) density
    p_θ(x) = e^{θx} p(x)/M(θ)
Suppose g : ℝ → ℝ is such that E g(X_1) < ∞; then
    E[g(X)] = ∫ g(x) p(x) dx = ∫ g(x) [p(x)/p_θ(x)] p_θ(x) dx
            = E_θ[g(X_1) p(X_1)/p_θ(X_1)]
Similarly,
    E[g(X_1, …, X_n)] = E_θ[g(X_1, …, X_n) p(X_1, …, X_n)/p_θ(X_1, …, X_n)]
where here g : ℝⁿ → ℝ and p_θ(X_1, …, X_n) = ∏_i p_θ(X_i). Now, consider
    P(S_n > na) = E[1_{S_n > na}]
                = E_θ[1_{S_n > na} p(X_1, …, X_n)/p_θ(X_1, …, X_n)]
                = E_θ[1_{S_n > na} ∏_i p(X_i)/p_θ(X_i)]
                = E_θ[1_{S_n > na} ∏_i p(X_i)/(e^{θX_i} p(X_i)/M(θ))]
                = E_θ[1_{S_n > na} M(θ)^n / exp(θ Σ_i X_i)]
                = E_θ[1_{S_n > na} exp(−θS_n + nφ(θ))]
Now, fix a δ > 0:
    P(S_n > na) ≥ E_θ[1_{na < S_n ≤ na + δ√n} exp(−θS_n + nφ(θ))]
                ≥ E_θ[1_{na < S_n ≤ na + δ√n} exp(−θ(na + δ√n) + nφ(θ))]
                = e^{−n(θa − φ(θ))} e^{−θδ√n} P_θ(na < S_n ≤ na + δ√n)
This is starting to look like what we want – it is crucial, however, to ensure
that the last probability stays bounded away from 0 – if not, it could ruin the
exponent. Our hope is to choose our new distribution (characterized by θ) to
ensure we get what we want. Choose θ = θ* so that E_θ*[X_1] = φ′(θ*) = a. Then
the mean of S_n under our new distribution would indeed be na, which gives us
hope that the probability above will be bounded away from 0: by the central
limit theorem under P_θ*, P_θ*(na < S_n ≤ na + δ√n) converges to a strictly
positive constant. Under the conditions of Cramér’s Theorem, it turns out this
is also satisfied. Now, returning to our expression above,
    (1/n) log P(S_n > na)
        ≥ −[θ*a − φ(θ*)] − θ*δ/√n + (1/n) log P_θ*(na < S_n ≤ na + δ√n)
Taking a liminf (the last two terms vanish),
    liminf_n (1/n) log P(S_n > na) ≥ −[θ*a − φ(θ*)]
The last thing we now need to verify is that the θ* required is indeed the
maximizer of the function above, so that θ*a − φ(θ*) = I(a). The first-order
conditions for sup_θ [θa − φ(θ)] are a = φ′(θ), precisely as desired. The extra
conditions in the theorem are precisely those that ensure such a θ* does exist.
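The change-of-measure identity at the heart of the lower bound can be verified numerically – a sketch, not from the notes; the three-point distribution, θ = 0.7, and g are arbitrary choices. The tilted pmf p_θ(x) = e^{θx}p(x)/M(θ) sums to one, and sampling under it while reweighting by the likelihood ratio recovers E[g(X)]:

```python
import math, random

random.seed(0)
xs = [-1.0, 0.0, 2.0]          # support (arbitrary)
ps = [0.2, 0.5, 0.3]           # original pmf
theta = 0.7
M = sum(p * math.exp(theta * x) for x, p in zip(xs, ps))      # M(theta)
pt = [p * math.exp(theta * x) / M for x, p in zip(xs, ps)]    # tilted pmf

# (1) the tilted pmf is a genuine probability distribution
total = sum(pt)

# (2) E[g(X)] = E_theta[g(X) * p(X)/p_theta(X)], estimated by sampling
#     under the tilted measure and reweighting by the likelihood ratio
g = lambda x: x * x + 1.0
ratio = {x: p / q for x, p, q in zip(xs, ps, pt)}
sample = random.choices(xs, weights=pt, k=200000)
estimate = sum(g(x) * ratio[x] for x in sample) / len(sample)
exact = sum(g(x) * p for x, p in zip(xs, ps))                 # = 2.4
print(total, estimate, exact)
```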
o   Application (an almost sure envelope): Fix a > μ. By Cramér’s Theorem the
    probabilities P(S_n > na) are summable, so by Borel–Cantelli S_n ≤ na
    eventually almost surely. Using this result, however, we can be more exact –
    we can find a sequence a_n such that S_n eventually lies below nμ + a_n
    almost surely. Let us consider, for δ > 0,
        a_n = √(2(1 + δ)σ²n log n)
    With that choice of envelope (using the Gaussian-like rate a_n²/(2σ²n) for
    deviations of this order),
        P(S_n > nμ + a_n) ≤ exp(−(1 + δ) log n) = n^{−(1+δ)}
    This is indeed summable, so {S_n ≤ nμ + a_n} eventually almost surely.
    Using a similar kind of argument, we could show that {S_n ≥ nμ − a_n}
    eventually almost surely.
o   Moderate deviations theory. Large deviations concern deviations of order n
    from the mean; the central limit theorem concerns deviations of order √n.
    What about deviations in between?
    Theorem (Moderate Deviations): Under the conditions of Cramér’s Theorem,
    for any sequence a_n such that
        a_n/√n → ∞    and    a_n/n → 0
    we have, for any a > 0,
        (n/a_n²) log P(S_n > nμ + a·a_n) → −a²/(2σ²)
    Proof of upper bound: WLOG, set μ = 0 and a = 1. Let us now use a Chernoff
    bound to deduce that
        P(S_n/n > a_n/n) ≤ exp(−θa_n/n) E[exp(θS_n/n)] = e^{nφ(θ/n)} e^{−θa_n/n}
    But φ(θ/n) ~ θ²σ²/(2n²), so
        P(S_n/n > a_n/n) ≤ C exp(θ²σ²/(2n) − θa_n/n)
    Optimizing over θ (the minimizer is θ = a_n/σ²), we find
        P(S_n/n > a_n/n) ≤ C exp(−a_n²/(2nσ²))
    As required.

Remarks: Let {X_n} be IID and S_n = Σ_{i=1}^n X_i with S_0 = 0. Then:
    E(S_n) = nE(X_1) and Var(S_n) = nVar(X_1)
    By the Strong Law of Large Numbers, S_n ≈ nE(X_1)
    By the Central Limit Theorem, S_n ≈ nE(X_1) + σ√n·N(0,1)
o   Question: Suppose T is a positive, integer-valued random variable,
    independent of the X’s. What is E(S_T)? We then have
        E(S_T) = E[Σ_{i=1}^T X_i] = E[E(Σ_{i=1}^T X_i | T)] = E(T·E[X_1]) = E(T)E(X_1)
    Unfortunately, this answer is not particularly interesting or useful, simply
    because in all “interesting” situations, T is allowed to depend on the
    behavior of the system.

o

Example: Consider a random walk defined as follows:
        X_i ~ IID, equal to +1 with prob p and −1 with prob 1 − p,   p > ½
        S_n = Σ_{i=1}^n X_i
In this case, S_n/n → 2p − 1 > 0 a.s., and so S_n → ∞ almost surely. Now let T
be the time of the last visit of the walk to 0 (finite almost surely, since
S_n → ∞); note that T “looks into the future”. Furthermore, by definition,
S_T = 0, so E(S_T) = 0, whereas E(T)E(X_1) > 0. Thus, it is clear, in this
case, that
        E(S_T) ≠ E(T)E(X_1)
Clearly, therefore, there can be some issues; this motivates restricting
attention to random times that do not look into the future.

o

Definition (Filtration): We previously defined a probability space by the
triplet (Ω, F, P), where F represented possible events. Now define
    F_n = σ(X_0, …, X_n)
This sigma-algebra contains all the information that is available from knowing
the value of the variables X_0, …, X_n. The sequence {F_n} is a filtration,
and clearly, F_n ⊆ F_{n+1} for every n.
o   Definition (Stopping time): Suppose T is a non-negative integer-valued
    random variable. T is a stopping time with respect to {F_n} if
        {T = k} ∈ F_k    ∀k
o   Example (Hitting Times): Let T = inf{n ≥ 0 : X_n ∈ A}. Then T is a stopping
    time, since {T = k} = {X_0 ∉ A, …, X_{k−1} ∉ A, X_k ∈ A} ∈ F_k.




o   Theorem (Wald’s First Identity): Let {X_n} be IID and T a stopping time
    with respect to {F_n}.
        If the X_i are non-negative, then E(S_T) = E(T)E(X_1)
        If E|X_i| < ∞ and E(T) < ∞, then E(S_T) = E(T)E(X_1)
    Proof: Write
        S_T = Σ_{i=1}^T X_i = Σ_{i=1}^∞ X_i 1_{i ≤ T}
    Note that 1_{i ≤ T} = 1 − 1_{T ≤ i−1} ∈ F_{i−1}. Going back to our sum
    (swapping expectation and sum, valid by Fubini I since the summands are
    non-negative),
        E(S_T) = Σ_{i=1}^∞ E[E(X_i 1_{i ≤ T} | F_{i−1})]
               = Σ_{i=1}^∞ E[1_{i ≤ T} E(X_i | F_{i−1})]
               = Σ_{i=1}^∞ E[1_{i ≤ T} E(X_1)]
               = E(X_1) Σ_{i=1}^∞ P(i ≤ T)
    Using Fubini I again:
        E(S_T) = E(X_1) E[Σ_{i=1}^∞ 1_{i ≤ T}] = E(X_1)E(T)
    Second part: In this case, consider that, by the triangle inequality,
        |S_T| ≤ Σ_{i=1}^T |X_i|
    By part 1, however,
        E[Σ_{i=1}^T |X_i|] = E(T) E|X_1| < ∞
    The result then follows by Fubini II.
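Wald's first identity is easy to check by simulation – a sketch, not from the notes; the bias p = 0.7 and barrier b = 10 are arbitrary choices. For the biased ±1 walk stopped when it first hits b, S_T = b, so Wald gives E(T) = b/E(X_1) = b/(2p − 1) = 25:

```python
import random

def hitting_time(p, b, rng):
    """Steps until a +/-1 random walk with P(step = +1) = p first hits level b."""
    s, n = 0, 0
    while s < b:
        s += 1 if rng.random() < p else -1
        n += 1
    return n

rng = random.Random(42)
p, b, reps = 0.7, 10, 4000
avg_T = sum(hitting_time(p, b, rng) for _ in range(reps)) / reps
print(avg_T)   # Wald: E(T) = 10 / 0.4 = 25
```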
o   Example (Gambler’s Ruin): Let X_i = ±1 with probability ½ each, and let
    Z_n = k + S_n, with S_0 = 0; intuitively, k is our initial wealth, and Z_n
    is our wealth at time n. Let T = inf{n ≥ 0 : Z_n ∈ {0, N}}, the first time
    we are ruined or reach wealth N. By definition,
        E(Z_T) = 0·P(Z_T = 0) + N·P(Z_T = N) = N·P(Z_T = N)
    Now, unfortunately, the X_i are not non-negative, so we need the second
    part of Wald’s identity. Let us now assume E(T) < ∞ (this will be proved
    in homework 2). Then E(S_T) = E(T)E(X_1) = 0, so E(Z_T) = k, and therefore
        P(Z_T = N) = k/N
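The conclusion P(Z_T = N) = k/N can be checked by solving the harmonic equations h(k) = ½h(k−1) + ½h(k+1) with h(0) = 0, h(N) = 1 – a sketch, not from the notes; fixed-point (Jacobi) iteration is just one convenient solver:

```python
N = 10
h = [0.0] * (N + 1)
h[N] = 1.0
# Jacobi iteration on the interior states; converges to the harmonic solution
for _ in range(20000):
    h = [0.0] + [0.5 * (h[k - 1] + h[k + 1]) for k in range(1, N)] + [1.0]
print(h)   # close to [0, 0.1, 0.2, ..., 1.0]
```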
o   Theorem (Wald’s Second Identity): Let {X_n} be IID with E(X_1) = 0 and
    Var(X_1) = σ² < ∞, and let T be a stopping time with respect to {F_n} such
    that E(T²) < ∞. Then
        E(S_T²) = σ²E(T)
    The identity is immediate for deterministic times; we seek to extend this
    to stopping times. Now
        E(S_T²) = E[(Σ_{i=1}^T X_i)²]
                = E[Σ_{i=1}^T X_i²] + 2E[Σ_{i=1}^{T−1} Σ_{j=i+1}^T X_i X_j]
    But by Wald I,
        E[Σ_{i=1}^T X_i²] = E(T)E(X_1²) = σ²E(T)
    Now, consider the second term:
        E[Σ_{i=1}^{T−1} Σ_{j=i+1}^T X_i X_j]
            = E[Σ_{i=1}^∞ Σ_{j=i+1}^∞ X_i X_j 1_{T−1 ≥ i} 1_{j ≤ T}]
    Now, imagine it were possible to apply Fubini 2 and exchange the
    expectation and the double sum. For j ≥ i + 1 we have {T ≥ j} ⊆ {T ≥ i+1},
    so that {T ≥ j} ∩ {T ≥ i+1} = {T ≥ j}; since 1_{T ≥ j} ∈ F_{j−1} and
    E[X_j | F_{j−1}] = 0, every term has zero expectation, and the cross term
    vanishes. To justify the exchange, suppose for simplicity that X is
    bounded, so |X| ≤ K. Then
        S_T² = (Σ_{i=1}^T X_i)² ≤ (Σ_{i=1}^T |X_i|)² ≤ (KT)² = K²T²
    This has a finite expectation, since T has a second moment.
Martingales
o   Recall that {M_n} is a martingale with respect to a filtration {F_n} if
    (1) E|M_n| < ∞, (2) M_n ∈ F_n, and (3) E[M_n | F_{n−1}] = M_{n−1}.
o   Example (random walk): Let {X_n} be IID, F_n = σ(X_1, …, X_n), and
    S_n = Σ_{i=1}^n X_i. Then
        S_n ∈ F_n
        Provided E|X_i| < ∞, E|S_n| ≤ n E|X_1| < ∞
        E[S_n | F_{n−1}] = E[S_{n−1} + X_n | F_{n−1}] = S_{n−1} + E(X_n)
    Thus, if we let E(X_n) = 0, then our random walk is a martingale.
o   Martingale differences: Let {M_n} be a martingale with respect to {F_n}
    and put D_i = M_i − M_{i−1}. Then
        M_n = M_0 + Σ_{i=1}^n D_i
    Indeed, this expresses our martingale as a sum of mean-0, uncorrelated
    random variables: for i < j, when the differences are square-integrable,
        E[D_i D_j] = E[D_i E(D_j | F_{j−1})] = 0
    As we will see later, a boundedness condition is required to be able to
    argue, via Cauchy–Schwarz, that
        E|D_i D_j| ≤ √(E D_i²) √(E D_j²)
    (Note: in this case, it would be enough to require the second moment of
    each difference to be bounded, but it seems less clunky to require the
    slightly more stringent uniform boundedness condition.)
o   Example: Let {X_n} be IID with E(X_i) = 1 and E|X_i| < ∞, and define
    M_n = ∏_{i=1}^n X_i. This is a martingale:
        Condition 1 is satisfied, since E|M_n| = (E|X_1|)^n < ∞
        M_n ∈ F_n = σ(X_1, …, X_n)
        E[M_n | F_{n−1}] = M_{n−1} E(X_n) = M_{n−1}
o   Optional stopping. Let us consider some simple examples.
        Let M_n = S_n − nμ, a martingale. When is it the case that
        E(M_T) = E(M_0) = 0? Effectively, we are asking when it is the case that
            E(S_T) = μE(T)
        This is precisely the subject matter of Wald’s First Identity.
        Similarly, let M_n = S_n² − nσ² (with μ = 0), also a martingale. When
        is it true that E(M_T) = E(M_0) = 0? Effectively, we are asking when it
        is the case that
            E(S_T²) = σ²E(T)
        This is precisely the subject matter of Wald’s Second Identity.
o   Theorem (Optional Stopping): Let {M_n} be a martingale with respect to
    {F_n} and T a stopping time. Then for each m ≥ 1,
        E(M_{T∧m}) = E(M_0)
    where T∧m = min(T, m).
    Remark: Consider that
        lim_{n→∞} M_{T∧n} = M_T on {T < ∞}
        By the theorem above, however, E[M_{T∧n}] = E[M_0], which implies that
        lim_n E[M_{T∧n}] = E[M_0]
    The question is thus whether the limit and the expectation can be
    interchanged, to conclude that E(M_T) = E(M_0).
o   Corollary II: Let |M_{T∧n}| ≤ Z, with E(Z) < ∞. Then E(M_T) = E(M_0), by
    dominated convergence.
o   Corollary III: Let (M_n : n ≥ 0) be a martingale with respect to {F_n}
    whose differences are uniformly bounded, |D_n| ≤ C, and let T be a stopping
    time such that E(T) < ∞. Then E(M_T) = E(M_0).
    Remark: This is a stronger version of Wald I. In the case of a random
    walk, the X’s are exactly the martingale differences.

Proof: We have
    M_{T∧n} = Σ_{i=1}^{T∧n} D_i + M_0
M_0 is integrable, so it suffices to dominate the sum. Consider
    Σ_{i=1}^∞ |D_i| 1_{T ≥ i}
By Fubini I, we can swap the expectation and sum, since the summands are
positive:
    E[Σ_{i=1}^∞ |D_i| 1_{T ≥ i}] = Σ_{i=1}^∞ E[|D_i| 1_{T ≥ i}]
        = Σ_{i=1}^∞ E[1_{T ≥ i} E(|D_i| | F_{i−1})]
Since the martingale differences are uniformly bounded by C, this is ≤ C E(T),
which is finite. As such, our martingale is bounded by an integrable variable,
and the OST holds by Corollary II.
We now consider very large times T – that is, the limiting behavior of M_n
itself.
o   Theorem (Martingale Convergence): Let {M_n} be a martingale with
    sup_n E|M_n| < ∞. Then
        M_n → M_∞
    almost surely, for some finite random variable M_∞.
o   Writing M_n = M_0 + Σ_k D_k, this is a random-walk-like structure, even
    though the D_k may not be independent. We would like an analogue of the
    SLLN; if we succeed in showing it, we have effectively shown that M_n/n
    converges to 0.
o   Proposition (SLLN for Martingales): Let {M_n} be a martingale with
    sup_n E(D_n²) < ∞. Then
        M_n/n → 0    a.s.
    Proof: Put M̃_n = Σ_{k=1}^n D_k/k (where, as usual, D_k = M_k − M_{k−1}).
    M̃_n ∈ F_n and E|M̃_n| < ∞; thus, it is a martingale. Since martingale
    differences are uncorrelated,
        E M̃_n² = Σ_{k=1}^n E(D_k²)/k² ≤ sup_k E(D_k²) · Σ_{k=1}^∞ 1/k² < ∞
    This implies that
        sup_n E M̃_n² < ∞  ⟹  sup_n E|M̃_n| < ∞
    (Notice that it is often easier to work with second moments rather than
    first moments, despite the fact that finiteness of second moments is a
    stronger condition than finiteness of first moments.) By the martingale
    convergence theorem, M̃_n converges a.s.; by Kronecker’s Lemma,
    convergence of Σ_k D_k/k implies (1/n)Σ_{k=1}^n D_k → 0. Thus, P(B) = 1,
    where B is the event on which M_n/n → 0.
    Remark: When we proved the SLLN, we went to great pains to work with a
    first moment only. It is interesting to note, however, that the scaling
    D_k/k is “overkill”: we could instead use D_k/k^{(1/2)+δ}. Carrying the
    entire proof through, we obtain M_n/n^{(1/2)+δ} → 0 a.s. Thus, assuming
    more than first moments has indeed resulted in a stronger conclusion.
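A small simulation consistent with the proposition – a sketch, not from the notes; the particular bounded martingale differences below (a fair sign times a state-dependent magnitude) are an arbitrary choice:

```python
import math, random

random.seed(1)
M, n = 0.0, 200000
for k in range(n):
    # D_k = eps * (1 + 0.5*sin(M)): the fair sign makes E[D_k | F_{k-1}] = 0,
    # and |D_k| <= 1.5, so sup_k E(D_k^2) < infinity
    eps = 1.0 if random.random() < 0.5 else -1.0
    M += eps * (1.0 + 0.5 * math.sin(M))
print(M / n)   # should be close to 0
```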


o   Proposition (Central Limit Theorem for Martingales): Let {M_n : n ≥ 0} be
    a martingale with respect to {F_n} and put V_n = max_{1≤i≤n} |D_i|.
    Roughly: if V_n/√n → 0 (in an appropriate sense) and the conditional
    variances stabilize, (1/n) Σ_{i=1}^n E[D_i² | F_{i−1}] → σ², then
        M_n/√n ⇒ σN(0,1)
    These conditions make our martingale “very similar” to a random walk: for
    a random walk S_n = Σ_i Y_i, it is the case that Var(S_n) = nVar(Y_1), and
    the condition above asks the accumulated conditional variance to behave
    likewise.

o

Proposition (Azuma–Hoeffding Inequality): Let {M_n : n ≥ 0} be a martingale
with respect to {F_n} such that
    |M_n − M_{n−1}| = |D_n| ≤ c_n    ∀n
Then, for all n > 0, λ > 0,
    P(|M_n − M_0| ≥ λ) ≤ 2 exp(−λ²/(2 Σ_{k=1}^n c_k²))
Remark: Suppose that |D_k| ≤ c for all k; then we can re-write this as
    P(|M_n − M_0| ≥ λ) ≤ 2 exp(−λ²/(2c²n))
    P(|M_n − M_0| ≥ xc√n) ≤ 2 exp(−x²/2)
Or, choosing λ = √(Kn log n), we obtain
    P(|M_n − M_0| ≥ √(Kn log n)) ≤ 2 exp(−K log n/(2c²))
Choosing, for example, K = 4c² produces a summable sequence (2n^{−2}), which
can be used to obtain an almost sure result.
Proof (sketch): Let θ > 0, and write
    D_k = −½c_k(1 − D_k/c_k) + ½c_k(1 + D_k/c_k)
Both terms in parentheses are non-negative (since the martingale differences
are bounded by c_k) and the weights ½(1 ∓ D_k/c_k) add up to one. By convexity
of the exponential,
    e^{θD_k} ≤ ½(1 − D_k/c_k)e^{−θc_k} + ½(1 + D_k/c_k)e^{θc_k}
so that, taking conditional expectations and using E[D_k | F_{k−1}] = 0,
    E[e^{θD_k} | F_{k−1}] ≤ (e^{−θc_k} + e^{θc_k})/2
Recall, in addition, that (e^{−x} + e^{x})/2 ≤ e^{x²/2}, which can be proved
using a Taylor expansion. Iterating over k, applying a Chernoff bound, and
optimizing over θ, and substituting back into the equation, we obtain the
required result. (This material logically belongs with the earlier martingale
sections; for expositional purposes, we chose to present it here instead.)
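For the simple ±1 random walk (c = 1) the Azuma–Hoeffding bound can be compared against the exact binomial tail – a sketch, not from the notes (writing S_n = 2B − n with B ~ Bin(n, ½)):

```python
import math

def exact_two_sided(n, lam):
    """P(|S_n| >= lam) for S_n a sum of n fair +/-1 steps (S_n = 2B - n)."""
    hi = math.ceil((n + lam) / 2)      # B >= hi  <=>  S_n >= lam
    lo = math.floor((n - lam) / 2)     # B <= lo  <=>  S_n <= -lam
    up = sum(math.comb(n, k) for k in range(hi, n + 1)) / 2.0**n
    dn = sum(math.comb(n, k) for k in range(0, lo + 1)) / 2.0**n
    return up + dn

def azuma(n, lam):
    return 2.0 * math.exp(-lam * lam / (2.0 * n))

n = 100
for lam in (10, 20, 30):
    print(lam, exact_two_sided(n, lam), azuma(n, lam))
```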
o   Example (pattern counting): Let X_1, …, X_n be IID draws from a finite
    alphabet, and let B = (b_1, …, b_k), with b_i in the alphabet and k ≪ n.
    We wonder how many times the specific sequence B will appear in X_1^n;
    call this count R_{n,k}. Finding the distribution of R_{n,k} is difficult.
    Instead, consider the Doob martingale M_i = E[R_{n,k} | F_i]. Since each
    X_i can only be part of at most k of the candidate occurrences,
        |M_i − M_{i−1}| = |E[R_{n,k} | F_i] − E[R_{n,k} | F_{i−1}]| ≤ k
    Using the A–H inequality (note M_n = R_{n,k} and M_0 = E R_{n,k}), we then
    have
        P(|R_{n,k} − E(R_{n,k})| > λ) ≤ 2 exp(−λ²/(2nk²))
        P(|R_{n,k} − E(R_{n,k})| > xk√n) ≤ 2 exp(−x²/2)
    We can then construct a confidence interval
        R_{n,k} ∈ [E(R_{n,k}) − c_α k√n, E(R_{n,k}) + c_α k√n]
    where c_α is chosen to make the probability of the interval 1 − α.
o   Proposition: Let {X_n : n ≥ 0} be an irreducible, recurrent Markov chain
    with transition matrix P. Then, the only bounded solution to
        Pf = f
    is a constant vector (a multiple of 1).
    Remark: We can think of the vector f as a function f : S → ℝ, where S is
    the set of states of the chain; solutions of Pf = f are called harmonic.
    Proof: Let M_n = f(X_n) for an f that satisfies the assumptions of the
    proposition, with |f| ≤ c. Let us first show that M_n is a martingale:
        E|M_n| = E|f(X_n)| ≤ c
        M_n ∈ F_n = σ(X_1, …, X_n) (in fact, it only depends on the last X_n)
        E[f(X_{n+1}) | F_n] = (Pf)(X_n) = f(X_n)
    Note that sup_n E|M_n| < ∞, because we have assumed that f is bounded, so
    by martingale convergence f(X_n) converges almost surely. Suppose there
    exist x, y ∈ S such that
        f(x) < f(y)
    Since {X_n : n ≥ 0} is irreducible and recurrent, every state is visited
    infinitely often, so
        liminf_n f(X_n) ≤ f(x)    a.s.
        limsup_n f(X_n) ≥ f(y)    a.s.
    This means, however, that liminf_n f(X_n) ≠ limsup_n f(X_n), which
    contradicts the convergence statement.

Now, consider an “energy” function g : S → ℝ⁺, with
    dg(X(t))/dt ≤ −ε,    ε > 0
(Some parts of this topic were covered at the end of the previous lecture.)
This is effectively a statement of the fact that the energy of the system is
“forced” down to 0, since it is “constantly decreasing”. As such, there exists
t_0(x, ε) such that g(X(t)) = 0 for all t ≥ t_0; in other words, our dynamical
process is “pushed” towards a “stable” state.
o   In discrete time, we can write our “equation of motion” as
        X_{n+1} − X_n = f(X_n) + ε_{n+1}
    with {ε_n} an IID noise sequence. This is effectively the definition of a
    Markov chain, provided ε_n is independent of X_0. Now consider the
    “energy” function – it would be too strong to ask for the energy of the
    process to decrease along every path; we instead ask for it to decrease in
    expectation.
o   Consider a Markov chain {X_n : n ≥ 0} that is irreducible; we would like
    to establish positive recurrence (stability). For a finite state space,
    all we need is to check for solutions to
        πP = π,    π·1 = 1
    indeed, the existence of such a π is associated with positive recurrence.
    For countably infinite state spaces this is harder to do directly. Here,
    we attempt to find simpler conditions for stability to hold.
o   Proposition (Foster’s criterion): Let K ⊆ S be a set containing a finite
    number of states, and suppose g : S → ℝ⁺ and ε > 0 are such that
        (Ag)(x) ≤ −ε    for all x ∉ K
    where (Ag)(x) = E_x[g(X_1)] − g(x) is the drift of g. Then the hitting
    time T_K = inf{n ≥ 0 : X_n ∈ K} satisfies E_x[T_K] ≤ g(x)/ε.
    Proof: Construct
        M_n = g(X_n) − g(x) − Σ_{k=0}^{n−1} (Ag)(X_k)
    where (Ag)(X_k) = E[g(X_{k+1}) | X_k] − g(X_k); this is a martingale. Now,
    consider the set K in the proposition and start the chain at x ∉ K.
    T_K∧m is a bounded stopping time, so we can apply the OST:
        E[g(X_{T_K∧m})] − g(x) = E[Σ_{k=0}^{(T_K∧m)−1} (Ag)(X_k)]
    Before time T_K the chain is outside K, so each summand is ≤ −ε. As such,
        Σ_{k=0}^{(T_K∧m)−1} (Ag)(X_k) ≤ −ε(T_K∧m)
    and so
        −E[Σ_{k=0}^{(T_K∧m)−1} (Ag)(X_k)] ≥ ε E_x[T_K∧m]
    Combining our two inequalities, and using g ≥ 0 on the left-hand side,
        −g(x) ≤ E[g(X_{T_K∧m})] − g(x) ≤ −ε E_x[T_K∧m]
    and so
        E_x[T_K∧m] ≤ g(x)/ε
    But 0 ≤ T_K∧m ↑ T_K, so by monotone convergence E_x[T_K] ≤ g(x)/ε < ∞;
    in particular T_K < ∞ a.s.

o   “Application”: Consider the stochastic system X_{n+1} = aX_n + Z_{n+1}
    with {Z_n} IID and X_0 = x. When is this system stable? Use g(x) = |x|:
        E_x g(X_1) = E|ax + Z_1| ≤ |a||x| + E|Z_1|
    So if E|Z_1| < ∞ and |a| < 1, then we might be able to make this work,
    because we would then have
        E_x g(X_1) − g(x) ≤ (|a| − 1)|x| + E|Z_1| ≤ −ε
    provided |x| ≥ (E|Z_1| + ε)/(1 − |a|). Taking K to be the finite-width
    set where this fails, Foster’s criterion applies.
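A quick simulation of this system – a sketch, not from the notes; a = 0.5 and Z ~ Uniform(−1, 1) are arbitrary choices. With X_0 = 0 the path satisfies |X_n| ≤ max|Z|/(1 − a) = 2 deterministically, consistent with stability:

```python
import random

random.seed(3)
a, x = 0.5, 0.0
peak = 0.0
for _ in range(100000):
    x = a * x + random.uniform(-1.0, 1.0)   # X_{n+1} = a X_n + Z_{n+1}
    peak = max(peak, abs(x))
print(peak)   # never exceeds 1/(1-a) = 2
```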

Queuing

    Sample path methods
o   We will consider a simple queuing model:
        One single buffer, with unlimited space
        Jobs arrive at times {t_n}, the nth bringing a workload S_n > 0; the
        server processes work at unit rate
    This model is often called the G/G/1 queuing model: arrival times are
    arbitrarily distributed, workloads are arbitrarily distributed (not
    necessarily independent) and there is a single server. The points in our
    sample space are ω = {(t_n, S_n) : n ∈ ℤ}, with S_n > 0 and both S_n and
    t_n finite. Pictorially, a single point ω ∈ Ω is a time axis with arrival
    epochs t_1 < t_2 < … , each carrying a workload S_1, S_2, … (figure
    omitted).
o   We assume the flow is stationary – in other words, θ_τω =_dist ω for any
    ω ∈ Ω, where θ_τω = (t + τ, S) and ω = (t, S). Define the traffic
    intensity
        ρ = E[Σ_n S_n 1_{t_n ∈ [0,1]}]
    the expected amount of work arriving per unit time; the positioning of the
    interval [0,1] does not matter, since the flow is stationary. We will
    mostly assume ρ < 1; the case ρ = 1 needs separate analysis. Let W_t
    denote the workload (remaining work) in the system at time t; we will
    simply denote this as W_t.


o   Now, if the system starts at time 0 with workload a,
        W_t = (a − t)⁺                                  0 ≤ t < t_1
        W_t = (W_{t_n} − (t − t_n))⁺                    t_n ≤ t < t_{n+1}
        W_{t_{n+1}} = (W_{t_n} − (t_{n+1} − t_n))⁺ + S_{n+1}
    (here W_{t_n} denotes the workload just after the nth arrival). These are
    called Lindley’s Equations.
o   Define X_t^s(ω) to be the work in the system at time t given the system is
    empty at time s ≤ t. As s decreases, X_t^s(ω) is non-decreasing, so the
    limit
        X_t*(ω) = lim_{s→−∞} X_t^s(ω)
    exists (possibly infinite). This is the “steady state” of the system; what
    we would observe if we had started the system a very, very long time in
    the past. By stationarity of the flow, we have
        lim_{s→−∞} X_t^s(θ_τω) = lim_{s→−∞} X_{t+τ}^{s+τ}(ω) = X_{t+τ}*(ω) = X_t*(θ_τω)
    As such, X_t*(ω) =_d X_t*(θ_τω): the steady-state workload is stationary.
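Lindley's recursion at arrival epochs is easy to step through by hand – a sketch, not from the notes; the arrival times and workloads below are arbitrary:

```python
def workload_after_arrivals(times, loads, a=0.0):
    """Workload just after each arrival, from Lindley's equations:
    W_{t_{n+1}} = (W_{t_n} - (t_{n+1} - t_n))^+ + S_{n+1}."""
    out, w, last = [], a, 0.0
    for t, s in zip(times, loads):
        w = max(w - (t - last), 0.0) + s
        out.append(w)
        last = t
    return out

print(workload_after_arrivals([1.0, 2.0, 4.0], [2.0, 1.0, 3.0]))  # [2.0, 2.0, 3.0]
```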


LECTURE 8 – 23rd March 2011

o   We have seen a number of properties of X_t*, including the fact that it
    exists. We now look for conditions under which X_t* is finite.
o   Proposition: If ρ < 1, then X_t* < ∞ a.s.
    Proof: Fix t ∈ ℝ and ω ∈ Ω and define
        T_t^s(ω) = sup{τ ≤ t : X_τ^s(ω) = 0}
    Intuitively, this is the last empty time before time t (this is clearly
    not a stopping time; just a random time). Now, consider that (dropping
    the argument ω for simplicity) the work at t equals the work at T_t^s
    (zero), plus the work entering the system in [T_t^s, t], minus the work
    processed in [T_t^s, t] (the server works continuously on a nonempty
    system):
        X_t^s = Σ_n S_n 1_{t_n ∈ [T_t^s, t]} − (t − T_t^s)
              ≤ Σ_n S_n 1_{t_n ∈ [T_t^s, t]}
    Now clearly, for s′ < s, T_t^{s′} ≤ T_t^s; in other words, T_t^s
    decreases as s decreases. This implies that lim_{s→−∞} T_t^s = T_t^{−∞}
    exists (possibly −∞). If T_t^{−∞} > −∞, there is some finite time in the
    past at which the system was empty, which implies that the amount of work
    at t, X_t*, is finite. Suppose instead that T_t^s → −∞. From the line
    above, we have
        X_t^s = Σ_n S_n 1_{t_n ∈ [T_t^s, t]} − (t − T_t^s)
    Now, however, X_t^s ≥ 0 (there can never be negative work in the system),
    and so, dividing by t − T_t^s throughout, we obtain
        [Σ_n S_n 1_{t_n ∈ [T_t^s, t]}] / (t − T_t^s) ≥ 1
    Note that if we were to replace T_t^s with s, the LHS of this expression
    would form a sequence with limit ρ as s → −∞. But since we have assumed
    T_t^s → −∞ as s → −∞, the LHS above is such a subsequence, so it
    converges to ρ < 1 – a contradiction. As such, T_t^{−∞} > −∞ a.s. and
    X_t* < ∞ a.s.


Definition (Coupling Time): Let W_t^{a,τ}(ω) denote the load in a system, at
time t and along sample path ω, given that the system is allowed to start with
load a at time τ. Define
    T_0 = inf{t ≥ 0 : W_t^{a,0} = X_t*}
Then T_0, called the coupling time, is the first time at which both the paths
“couple” – past that point, the two paths behave identically.
o   Claim: if ρ < 1, T_0 < ∞ a.s. Assume, WLOG, that a > X_0*(ω) (ie: the
    transient process starts higher than the steady-state process). Since the
    transient process is more loaded than the steady-state process, T_0 is
    effectively the time at which the transient process first empties. If the
    paths never coupled, we would have W_t^a > 0 for all t ≥ 0, so the server
    would be constantly working for all t > 0 while work arrives at rate
    ρ < 1 – which cannot persist forever. Hence T_0 < ∞ a.s.
o   Consider the following estimator, based on a queue started at an arbitrary
    point a:
        p̂_t(A) = (1/t) ∫_0^t 1_{W_s^a ∈ A} ds
    As t grows, one might expect this to be a good estimator for P(X_t* ∈ A).
    Either way, we find that p̂_t(A) → P(X_t* ∈ A) a.s. as t → ∞.
    Intuitively, this is because the chains eventually couple.
is called convergence is total variation and is stronger than weak convergence
...
This generalizes straightforwardly
...
Now

{

}

{

 Wt a+t Î A1 ,Wt a+t Î A2 -  X t* Î A1 , X t* Î A2
1

{

2

1

}

{

2

}

=  Wt a+t Î A1 ,Wt a+t Î A2 -  X t* +t Î A1 , X t* +t Î A2
1

2

1

2

}

Calling the first event t and the second t , we can write this as
 (t ,T0 > t ) +  (t ,T0 £ t ) -  (t ,T0 > t ) -  (t ,T0 £ t )
Note, however, that if T0 £ t , the steady-state process is identical to the
transient process
...
s
...
Since this result
does not depend on the sets A1, A2, we can take a supremum over all such sets
and obtain our required result
...
Consider a finite-state, irreducible Markov chain, and consider the quantity
    |P{X_n = i | X_0 = j} − π{X_n = i}| ≤ max_{i,j} P_j{T_i > n}
                                        ≤ max_{i,j} E_j(e^{θT_i}) e^{−θn}
Here, the first probability is taken with respect to a Markov chain that
starts in an arbitrary state j (the transient process). Since the exponential
moment of the hitting time T_i is finite (see homework 2), this implies that
this difference falls exponentially fast.
o   The case ρ > 1: here the workload increases linearly with time. Let I_t
    denote the cumulative idle time of the server in [0, t]. We then have
        W_t^a = a + Σ_n S_n 1_{t_n ∈ [0,t]} − (t − I_t)
              ≥ Σ_n S_n 1_{t_n ∈ [0,t]} − t
    Dividing by t,
        W_t^a/t ≥ [Σ_n S_n 1_{t_n ∈ [0,t]}]/t − 1
    Letting t → ∞, we get that
        liminf_t W_t^a/t ≥ ρ − 1 > 0
    This is intuitively sensible – the amount that accumulates in the system
    is whatever load there is in excess of 1.
o   Consider a general system in which the nth job arrives at time t_n and
    departs at time d_n. We let q_n be the sojourn time of the nth job in the
    system, given by d_n − t_n. We make no assumptions about how the system
    operates; for example, the server could spend some of its time idling.
    We define the following quantities:
        A(t) = sup{n : t_n ≤ t} = number of arrivals in [0, t]
        D(t) = number of departures in [0, t]
        N(t) = A(t) − D(t) = number of jobs in the system at time t
    The only two assumptions we make is that the following two statements are
    true for every sample path:
        lim_{t→∞} A(t)/t = λ ∈ (0, ∞)
        lim_{n→∞} (Σ_{k=1}^n q_k)/n = q ∈ (0, ∞)
o   Proposition (Little’s Law): Under these assumptions,
        (1/t) ∫_0^t N(s) ds → λq
    Intuitively, this states that the average number of jobs in the system at
    any given time is equal to the average number of arrivals per unit time
    multiplied by the average sojourn time.
    Proof: Write N(s) = Σ_n 1_{t_n ≤ s ≤ d_n}. By swapping the summation and
    integration (valid by Fubini), we obtain
        ∫_0^t N(s) ds = Σ_n ∫_0^t 1_{t_n ≤ s ≤ d_n} ds
                      = Σ_n {amount of time job n was in the system during [0, t]}
    We can bound this above by considering the sojourn time of all arrivals up
    to and including time t (though some of them may overrun past t) and lower
    bound it by considering the sojourn time of all jobs that depart before
    time t (even though some jobs that leave after t do spend some time in the
    system before t):
        Σ_{n=1}^{D(t)} q_n ≤ ∫_0^t N(s) ds ≤ Σ_{n=1}^{A(t)} q_n
    Before we continue, we will need the following claim.
        Claim: Under the two assumptions above,
            q_n/t_n → 0    as n → ∞
        Intuitively, the second assumption states q_n grows slower than n, so
        we would expect this to be true (note t_n/n → 1/λ by assumption 1).
    Now by our claim, for all ε > 0, there exists an N(ε) such that for
    n > N(ε),
        q_n/t_n ≤ ε,    i.e.    (d_n − t_n)/t_n ≤ ε
    This implies that d_n ≤ (1 + ε)t_n for all n > N(ε); in particular, every
    such job arriving by time t departs by time t(1 + ε). Therefore
        Σ_{n=N(ε)+1}^{A(t)} q_n ≤ Σ_{i=0}^{D(t(1+ε))} q_i
    Putting this together with the bounds developed above, we find that
        [1/(t(1+ε))] Σ_{n=N(ε)+1}^{A(t)} q_n
            ≤ [1/(t(1+ε))] ∫_0^{t(1+ε)} N(s) ds
            ≤ [1/(t(1+ε))] Σ_{n=1}^{A(t(1+ε))} q_n
    Let’s first consider the upper bound:
        [1/(t(1+ε))] Σ_{n=1}^{A(t(1+ε))} q_n
            = [A(t(1+ε))/(t(1+ε))] · [1/A(t(1+ε))] Σ_{n=1}^{A(t(1+ε))} q_n → λq
    The first term tends to λ, by assumption 1; since A(t(1+ε)) → ∞, the
    second term is a subsequence of the Cesàro averages in assumption 2, so it
    tends to q. The lower bound tends to λq/(1+ε) by a similar logic as above,
    and by noting that since N(ε) < ∞, the contribution of the first N(ε)
    jobs, divided by t, tends to 0. Letting ε ↓ 0 completes the proof.
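The key identity in the proof – that ∫_0^t N(s)ds equals the total time-in-system accumulated by the jobs – can be checked on a tiny deterministic example (a sketch, not from the notes; arrivals at 0, 1, 2 and departures at 3, 4, 5 are arbitrary):

```python
arrivals = [0.0, 1.0, 2.0]
departures = [3.0, 4.0, 5.0]
T = 5.0

# integrate N(s) = #{n : t_n <= s < d_n} by midpoint rule on a fine grid
steps = 100000
dt = T / steps
integral = 0.0
for i in range(steps):
    s = (i + 0.5) * dt
    integral += dt * sum(1 for a, d in zip(arrivals, departures) if a <= s < d)

total_sojourn = sum(d - a for a, d in zip(arrivals, departures))  # = 9.0
print(integral, total_sojourn)
```

Here the time-average number in system is 9/5, which indeed equals λq̄ = (3/5)·3.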




LECTURE 9 – 25th March 2011

    Single-server queue; IID case
o   We now specialize our analysis to a situation in which the workloads {S_n}
    and the inter-arrival times {τ_n = t_n − t_{n−1}} are IID. There is, once
    again, only one server.
o   We also denote by w_n the time that the nth job has to wait in queue
    before it is served; the nth job therefore departs at d_n = t_n + w_n + S_n.
    Now, consider that
        w_{n+1} = 0                if d_n ≤ t_{n+1}
                = d_n − t_{n+1}    if d_n > t_{n+1}
    i.e. w_{n+1} = (d_n − t_{n+1})⁺. As such,
        w_{n+1} = (w_n + S_n − τ_{n+1})⁺
    Let Z_n = S_{n−1} − τ_n, so that w_{n+1} = (w_n + Z_{n+1})⁺. Letting
    s_n = Σ_{j=1}^n Z_j (with s_0 = 0 and w_0 = 0),
        w_1 = (Z_1)⁺ = s_1 − min(0, s_1)
        w_2 = (w_1 + Z_2)⁺ = s_2 + max(0, −s_1, −s_2) = s_2 − min_{0≤k≤2} s_k
    In fact, carrying this analysis forwards, we find that
        w_n = s_n − min_{0≤k≤n} s_k
    The second term takes into account the fact we have a reflected random
    walk, and prevents the walk from going negative. Since the Z’s are IID, we
    can change indices on the Z at will and still maintain equality in
    distribution; reversing their order gives
        w_n =_d max_{0≤k≤n} s_k
    If E(Z) < 0, then s_n → −∞ a.s., and so M_∞ = max_{k≥0} s_k < ∞ a.s. As
    such, w_n ⇒ M_∞ as n → ∞: the waiting times converge in distribution to a
    finite steady-state waiting time. (However, since this is a Markov chain,
    a stationary distribution is all we could really want.) This is consistent
    with our findings in the G/G/1 queue, since
        E(τ_{n+1}) > E(S_n) ⟺ ρ < 1
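The two representations of w_n agree path by path; a quick check (a sketch, not from the notes; the Z values are arbitrary):

```python
def lindley(zs):
    """Waiting times via the recursion w_{n+1} = (w_n + Z_{n+1})^+."""
    w, out = 0.0, [0.0]
    for z in zs:
        w = max(w + z, 0.0)
        out.append(w)
    return out

def reflected(zs):
    """Same quantity via w_n = s_n - min_{0<=k<=n} s_k."""
    out, s, m = [0.0], 0.0, 0.0
    for z in zs:
        s += z
        m = min(m, s)
        out.append(s - m)
    return out

zs = [1.0, -2.0, 3.0, -1.0, -1.0, 2.0]
print(lindley(zs))    # [0.0, 1.0, 0.0, 3.0, 2.0, 1.0, 3.0]
print(reflected(zs))  # identical
```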




    The single-server M/M/1 Markovian queue
o   We now consider the most tractable of all single-server queue models: the
    M/M/1 queue. Arrivals form a Poisson process of rate λ, and processing
    times are IID exponential with rate μ; write ρ = λ/μ.
o   Consider the process
        X(t) = number of jobs in system at time t ≥ 0
    {X(t) : t ≥ 0} is a continuous-time Markov chain with countable state
    space. We have (writing o(h) for any f(h) with f(h)/h → 0)
        P{X(t + h) = j | X(t) = i} = λh + o(h)              j = i + 1
                                   = μh + o(h)              j = i − 1, i ≥ 1
                                   = 1 − (λ + μ)h + o(h)    j = i ≥ 1
                                   = o(h)                   otherwise
    (at state 0 there is no departure, so the diagonal entry is 1 − λh + o(h)).
    Our transition matrix P_h then takes the form
        P_h = | 1 − λh    λh            0             0     …  |
              | μh        1 − (λ+μ)h   λh            0     …  |
              | 0         μh           1 − (λ+μ)h    λh    …  |
              | …                                              |
    We can write this as
        P_h = I + Qh + o(h)
    where
        Q = | −λ    λ         0        0    …  |
            |  μ   −(λ+μ)     λ        0    …  |
            |  0    μ        −(λ+μ)    λ    …  |
            |  …                               |
    Q is the rate matrix (generator), defined by Q = lim_{h↓0} (P_h − I)/h.

o

We can use Little’s Law to good effect; consider the following two examples;
the first is trivial, the second less so.
    Consider the server as the “system”. Let S be the number of items in the
    server. Jobs enter at rate λ and spend mean time 1/μ in the server, so by
    Little’s Law
        E_π(S) = λ/μ = ρ
    Note, however, that S = 1·1_{system is busy} + 0·1_{system is idle}, so
    E_π(S) = P(busy); the result above therefore says P(busy) = ρ, consistent
    with what we would expect.
    Now consider the queue (excluding the server) as the “system”. The sojourn
    time in the queue is simply the waiting time, w_i, and the number in queue
    is Q(t) = (X(t) − 1)⁺ (we must subtract any item currently in the server).
    By Little’s Law, E_π(Q) = λE(w).
o   Distribution of the waiting time: recall w is the wait time experienced by
    a random job entering the queue. By PASTA (discussed below), an arriving
    job finds the system distributed according to π(k) = (1 − ρ)ρ^k; given
    that it finds k ≥ 1 jobs, its wait is a sum of k independent exp(μ)
    variables (an Erlang distribution, using memorylessness for the job in
    service). As such, for x > 0,
        f_w(x) = Σ_{k=1}^∞ [e^{−μx}(μx)^{k−1}/(k−1)!] μ · ρ^k(1 − ρ)
               = e^{−μx} μ(1 − ρ)ρ Σ_{k=1}^∞ (μx)^{k−1} ρ^{k−1}/(k−1)!
               = e^{−μx} μ(1 − ρ)ρ Σ_{k=0}^∞ (μxρ)^k/k!
               = ρ μ(1 − ρ) e^{−μ(1−ρ)x}
    And so
        w ~ exp(μ(1 − ρ))    with prob ρ
          = 0                with prob 1 − ρ
    And
        P(w > x) = ρ e^{−μx(1−ρ)}


The M/G/1 queue
o   We now consider a slightly more general case in which the times between
    arrivals {τ_n} are exponential, but the processing times {S_n} now follow
    a general distribution. The number-in-system process X(t) is then no
    longer Markov on its own: we also require R(t), the residual processing
    time of the work currently in service, to describe the state.
o   To sidestep this complication, we will consider the following embedded
    Markov chain: let X_n be the number of jobs in the system immediately
    after the nth departure. Then
        X_{n+1} = (X_n − 1)⁺ + A_{n+1}
    where A_n is the number of arrivals during the nth service. In this case,
    by assumption, the A_n are IID, and
        (A_n | S_n = s) ~ Po(λs)
    where λ is the rate of arrivals.

o

Theorem: π = π̂, where π is the stationary distribution of the embedded chain
{X_n} and π̂ is the time-stationary distribution of {X(t)}.
Proof: Define the following two processes:
    A_j(t) = number of arrivals in [0, t) that “find” the system in state j
    D_j(t) = number of departures in [0, t) that “leave” the system in state j
Also let A(t) and D(t) be the total number of arrivals and departures up to
time t, respectively. The proof proceeds in steps:
1. For any given sample path ω, it is clear that
       |A_j(t) − D_j(t)| ≤ 1    ∀t ≥ 0
   This is because A_j(t) is the number of “up-crossings” of X(t) over the
   line X(t) = j whereas D_j(t) is the number of “down-crossings”.
2. A(t)/t → λ and D(t)/t → λ a.s.
3. Combining 1 and 2, A_j(t)/A(t) and D_j(t)/D(t) have the same a.s. limit.
4. Letting τ_k be the time of the kth arrival, we have
       lim_t A_j(t)/A(t) = lim_n (1/n) Σ_{k=1}^n 1_{X(τ_k⁻) = j}
5. By PASTA,
       lim_t (1/t) ∫_0^t 1_{X(s) = j} ds = lim_n (1/n) Σ_{k=1}^n 1_{X(τ_k⁻) = j}
   where τ_k is the time of the kth arrival to the system.
6. By ergodicity, (1/t) ∫_0^t 1_{X(s) = j} ds → π̂(j) a.s. Finally, combine
   all the above:
       π̂(j) = lim_t (1/t) ∫_0^t 1_{X(s)=j} ds        (by 6)
            = lim_n (1/n) Σ_{k=1}^n 1_{X(τ_k⁻)=j}    (by 5)
            = lim_t A_j(t)/A(t)                      (by 4)
            = lim_t D_j(t)/D(t)                      (by 1, 2, 3)
            = π(j)
   the last equality because departures sample the embedded chain.
Which proves the required result.
o   Moments of the queue (Pollaczek–Khinchine): First, consider that by the
    above,
        E_π̂[f(X(t))] = E_π[f(X_n)]
    Recall further that
        X_{n+1} = (X_n − 1)⁺ + A_{n+1},    (A_n | S_n = s) ~ Po(λs)
    We can write this as follows (note X_n 1_{X_n ≥ 1} = X_n):
        X_{n+1} = (X_n − 1)1_{X_n ≥ 1} + A_{n+1} = X_n − 1_{X_n ≥ 1} + A_{n+1}
    Squaring this expression, we obtain (we simply denote 1 ≡ 1_{X_n ≥ 1} to
    save space, and use 1² = 1)
        X_{n+1}² = X_n² + 1 + A_{n+1}² − 2X_n·1 + 2X_n A_{n+1} − 2A_{n+1}·1
    Taking expectations in stationarity (so E_π X_{n+1}² = E_π X_n²), using
    E_π(1_{X_n ≥ 1}) = P(busy) = ρ, the independence of A_{n+1} from X_n, and
    X_n·1 = X_n,
        0 = ρ + E(A²) − 2E_π(X) + 2E(A)E_π(X) − 2E(A)ρ
    Note that
        E(A) = E[E(A | S)] = λE(S) = ρ
        Var(A) = E[Var(A | S)] + Var[E(A | S)] = λE(S) + Var(λS) = ρ + λ²σ_S²
    so E(A²) = ρ + λ²σ_S² + ρ². Substituting,
        0 = 2ρ + λ²σ_S² + ρ² − 2(1 − ρ)E_π(X) − 2ρ²
        2(1 − ρ)E_π(X) = 2ρ − ρ² + λ²σ_S²
        E_π(X) = (2ρ − ρ² + λ²σ_S²) / (2(1 − ρ))
    Writing CV² = σ_S²/(E S)², so that λ²σ_S² = ρ²CV², this is the
    Pollaczek–Khinchine formula
        E_π(X) = ρ + ρ²(1 + CV²)/(2(1 − ρ))
    For M/M/1, CV = 1, and we recover E_π(X) = ρ/(1 − ρ), as in the
    expression above.
o   Remark: mimicking this argument with third powers shows that
    E(S³) < ∞ ⟹ E_π(X²) < ∞. This condition is excessive, because we took
    the simplest route to prove it; the easy approach to the problem leads to
    an excessive condition.
o   PASTA (Poisson Arrivals See Time Averages): the equality of the embedded
    (arrival-sampled) and time-stationary distributions does not usually
    happen; for Poisson arrivals it does. This is really quite a surprising
    result! The time average has full information and sees the system at all
    times, whereas the arrival average only samples the system at arrival
    epochs.
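The Pollaczek–Khinchine mean formula is easy to tabulate – a sketch, not from the notes. As a consistency check, CV = 1 must reproduce the M/M/1 mean ρ/(1 − ρ), and deterministic service (CV = 0) halves the queueing term:

```python
def pk_mean(rho, cv):
    """Mean number in system for the M/G/1 queue (Pollaczek-Khinchine):
    E[X] = rho + rho^2 * (1 + CV^2) / (2 * (1 - rho))."""
    return rho + rho**2 * (1.0 + cv**2) / (2.0 * (1.0 - rho))

for rho in (0.3, 0.6, 0.9):
    print(rho, pk_mean(rho, 1.0), pk_mean(rho, 0.0))
```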

o

For an intuitive proof, write A(t) = cumulative number of arrivals up to time
t, and consider the compensated integral
    M(t) = ∫_0^t 1_{X(s⁻) ∈ A} (dA(s) − λ ds)
This is a martingale, because the increments dA(s) − λds are independent of
the past (by the “independent increments” property of Poisson processes) and
mean 0 (because of our normalization). As such,
    M(t)/t → 0    a.s.
In other words,
    lim_t [A(t)/(λt)] · [1/A(t)] ∫_0^t 1_{X(s⁻) ∈ A} dA(s)
        = lim_t (1/t) ∫_0^t 1_{X(s) ∈ A} ds
Finally, use lim_t A(t)/t = λ to conclude that the first factor converges to
1, so the arrival averages and the time averages coincide.


LECTURE 10 – 30TH MARCH 2011

Renewal & Regenerative Process


Renewal processes
o

Let

{X }
n

be IID, with (X1 ) = m < ¥ and  (X1 = 0) < 1
...
Let N (t ) = sup {n ³ 1 : Sn £ t }
...

o Definition (renewal function): The renewal function is m(t) = E[N(t)].

o Example: If the X_i are exponentially distributed with rate λ, {N(t) : t ≥ 0} is then a Poisson process. Much of our work in this section will be concerned with generalizing these results to general renewal processes:

    E[N(t)] = m(t) = λt = t/E[X_1]  (generalizes to the elementary renewal theorem)

o Proposition: m(t) = Σ_{n=1}^∞ F_n(t), where F_n(t) = P(S_n ≤ t).

o Remark: The CDF of S_n is the n-fold convolution of the CDF of each individual variable X.

o The Laplace transform of a distribution with CDF F(x) is defined as

    F̂(s) = ∫₀^∞ e^{−sx} dF(x) = ∫₀^∞ e^{−sx} f(x) dx

(the second equality when F has a density f). A few important results:

 • Laplace transform of convolutions: the transform of a convolution is the product of the transforms; in particular, F̂_n(s) = F̂(s)ⁿ.

o Theorem: If b(t) is bounded on any finite interval, then the solution to the following renewal equation

    a = b + a * F,  i.e.  a(t) = b(t) + ∫₀ᵗ a(t − s) dF(s)

is

    a = b + b * m,  i.e.  a(t) = b(t) + ∫₀ᵗ b(t − s) dm(s)

Verification: Taking the Laplace transform of the first equation,

    â(s) = b̂(s) + â(s) F̂(s)

As such,

    â(s) = b̂(s)/(1 − F̂(s)) = b̂(s) + b̂(s)F̂(s)/(1 − F̂(s)) = b̂(s) + b̂(s) m̂(s)

using m̂(s) = F̂(s)/(1 − F̂(s)), which follows from m = Σ_{n≥1} F_n and F̂_n(s) = F̂(s)ⁿ. Taking an inverse Laplace transform of the proposed solution, we do indeed find this equation is satisfied. In particular, taking b = F, the renewal function itself solves a renewal equation:

    m(t) = F(t) + ∫₀ᵗ F(t − s) dm(s)

o Example: Let us consider the example X ~ exp(1/μ), i.e. exponential with mean 1/μ, so that m(t) = μt and dm(t) = μ dt. Then

    a(t) = b(t) + μ ∫₀ᵗ b(t − s) ds = b(t) + μ ∫₀ᵗ b(s) ds

As such, we might expect that

    a(t) ≈ μ ∫₀ᵗ b(s) ds  as t → ∞

It turns out this last conclusion holds generally, not just for Poisson processes.
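The renewal equation can also be solved numerically by discretizing the convolution (a simple scheme of my own construction, not from the notes). For exponential inter-renewal times the computed m(t) should track μt:

```python
import math

def solve_renewal_equation(F, t_max, n=1500):
    """Solve m(t) = F(t) + int_0^t m(t-s) dF(s) on a grid of n steps,
    approximating the convolution integral with increments of F."""
    h = t_max / n
    grid = [i * h for i in range(n + 1)]
    dF = [F(grid[j]) - F(grid[j - 1]) for j in range(1, n + 1)]
    m = [0.0] * (n + 1)
    for i in range(1, n + 1):
        conv = sum(m[i - j] * dF[j - 1] for j in range(1, i + 1))
        m[i] = F(grid[i]) + conv
    return grid, m

mu = 2.0   # rate of the exponential inter-renewal times
grid, m = solve_renewal_equation(lambda t: 1.0 - math.exp(-mu * t), t_max=5.0)
print(m[-1])   # close to mu * t_max = 10.0
```

The discretization slightly underestimates m (it evaluates m at the right endpoint of each sub-interval), but the error vanishes as the grid is refined.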
o Proposition (SLLN for renewal processes): t/N(t) → E[X_1] a.s. as t → ∞.

Proof: Note that S_{N(t)} ≤ t < S_{N(t)+1}, and N(t) → ∞ a.s. As such,

    S_{N(t)}/N(t) ≤ t/N(t) ≤ ( S_{N(t)+1}/(N(t)+1) ) · ( (N(t)+1)/N(t) )

with (N(t)+1)/N(t) → 1 a.s. However, by the SLLN, S_n/n → E[X_1] a.s.; both sides of the sandwich therefore converge to E[X_1].

o Proposition (Elementary/Baby Renewal Theorem):

    m(t)/t = E[N(t)]/t → 1/E[X_1]  as t → ∞

Proof: See homework 4.
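A quick Monte Carlo illustration of the two results above (the uniform inter-renewal setup and all names are mine): N(t)/t should approach 1/E[X_1], here equal to 2.

```python
import random

def count_renewals(t_max, rng):
    """N(t_max) for IID Uniform(0, 1) inter-renewal times (mean 1/2)."""
    s, n = 0.0, 0
    while True:
        s += rng.uniform(0.0, 1.0)
        if s > t_max:
            return n
        n += 1

rng = random.Random(42)
t_max = 1000.0
estimates = [count_renewals(t_max, rng) / t_max for _ in range(50)]
print(sum(estimates) / len(estimates))   # close to 1 / E[X1] = 2
```

Averaging the 50 independent runs also gives a Monte Carlo estimate of E[N(t)]/t, illustrating the elementary renewal theorem.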
o CLT for renewal processes: with μ = E[X_1] and σ² = Var(X_1) < ∞,

    ( N(t) − t/μ ) / √(σ² t / μ³) ⇒ N(0,1)  as t → ∞

Step 1: Writing P(N(t) ≥ n) = P(S_n ≤ t) and centering, the relevant quantity is (S_n − nμ)/(σ√n). This latter expression does tend to N(0,1) in distribution, by the standard CLT. But can we substitute the random index N(t) for n? To answer this question, we will require a side-lemma (the "converging together" lemma). Intuitively, this works because we need variances to accumulate fast enough for the CLT to be valid. Since N(t) → ∞ a.s., we can then use the converging together lemma on the expression quoted above to obtain the desired result for this step.
Step 2: Consider that

    S_{N(t)}/N(t) ≤ t/N(t) ≤ ( S_{N(t)+1}/(N(t)+1) ) · ( (N(t)+1)/N(t) )

with (N(t)+1)/N(t) → 1 a.s. Both sides of the "sandwich" are in the form used in step 1.

o Definition (Lattice/Arithmetic distribution): A distribution F is said to be of lattice type if there exists an h > 0 such that F is supported on {nh : n ∈ ℕ}. For example, the Poisson distribution is lattice with h = 1.


o Theorem (Blackwell): If F is non-lattice, then for every h > 0,

    m(t + h) − m(t) → h/E[X_1]  as t → ∞

 Remark: This result is not implied by the fact that m(t)/t → 1/E[X_1], because this theorem concerns increments of the renewal function m.


Remark: If f : ℝ₊ → ℝ, we simply require both f₊ and f₋ to be dRi for f to be dRi.

Remark: A sufficient condition for a non-negative f to be dRi is that

 • f is bounded and continuous, and
 • ∫₀^∞ f̄_h(x) dx < ∞ for some h > 0

where f̄_h denotes the upper step envelope of f on a grid of width h (constant on each interval [nh, (n+1)h), equal there to the supremum of f over that interval).


LECTURE 11 – 6TH APRIL 2011



Regenerative processes
o A regenerative process is a stochastic process with time points at which, from a probabilistic point of view, the process restarts itself. In a queue, for example, we consider that the process "resets" each time the queue empties. Without loss of generality, we let the renewals occur at times when X(t) = 0.




o Let τ(n) denote the time of the n-th regeneration, and write

 • τ_{n+1} = τ(n+1) − τ(n), the inter-renewal time
 • X_n = (X(τ(n) + s) : 0 ≤ s < τ_{n+1}), the n-th cycle of the process

Then X is said to be regenerative if

 • τ_n < ∞ a.s. ∀n
 • τ(n) → ∞ as n → ∞
 • X_0, X_1, … are independent
 • X_1, X_2, … are identically distributed (the first cycle might have a different distribution if the system doesn't start "empty")
o Example (DTMC): Let (X_n : n ≥ 0) be a DTMC on S = ℤ. We can then define a renewal process using any (recurrent) state x ∈ S as our "renewal state".

o Definition (recurrence): A regenerative process is positive-recurrent if E[τ_1] < ∞ and null-recurrent otherwise.
o Proposition (SLLN for regenerative processes): (1/t) ∫₀ᵗ f(X(s)) ds → E[Y_1(f)]/E[τ_1] a.s., where Y_k(f) = ∫_{τ(k−1)}^{τ(k)} f(X(s)) ds.

Proof: Let N(t) be the number of complete cycles by time t. Now

    (1/t) Σ_{k=1}^{N(t)} Y_k(f) = (N(t)/t) · (1/N(t)) Σ_{k=1}^{N(t)} Y_k(f)

Note, however, that the Y_i(f) are IID, and N(t)/t → 1/E[τ_1] a.s.; so the right-hand side converges a.s. to E[Y_1(f)]/E[τ_1]. All we now need to do is to show that the last term, the contribution (1/t) ∫_{τ(N(t))}^t f(X(s)) ds of the incomplete cycle, vanishes.


Lemma: If {a_n} is a real-valued sequence with a_n → 0 as n → ∞, then (1/n) Σ_{k=1}^n a_k → 0 as n → ∞. Notice that, as a consequence,

    (1/N(t)) Σ_{k=1}^{N(t)} a_k → 0  a.s. as t → ∞

as well, since N(t) → ∞ a.s.

o We have just proved that

    (1/t) ∫₀ᵗ f(X(s)) ds → E[Y_1(f)]/E[τ_1]  a.s. as t → ∞
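A seeded simulation sketch of this ratio formula (the setup and names are mine): for an M/M/1 queue regenerating each time it empties, take f = 1{x = 0}, so Y_1(f) is the idle time per cycle; the ratio E[Y_1(f)]/E[τ_1] should equal 1 − ρ, here 0.5.

```python
import random

def mm1_cycles(lam, mu, n_cycles=20_000, seed=7):
    """Simulate n_cycles regeneration cycles of an M/M/1 queue, where a
    cycle starts each time the system becomes empty.
    Returns (mean idle time per cycle, mean cycle length)."""
    rng = random.Random(seed)
    total_idle = total_cycle = 0.0
    for _ in range(n_cycles):
        idle = rng.expovariate(lam)          # idle period: wait for an arrival
        x, busy = 1, 0.0                     # busy period: run until empty
        while x > 0:
            busy += rng.expovariate(lam + mu)
            x += 1 if rng.random() < lam / (lam + mu) else -1
        total_idle += idle
        total_cycle += idle + busy
    return total_idle / n_cycles, total_cycle / n_cycles

y1, tau1 = mm1_cycles(lam=0.5, mu=1.0)
print(y1 / tau1)   # close to 1 - rho = 0.5
```

Here E[idle] = 1/λ = 2 and the M/M/1 busy period has mean 1/(μ − λ) = 2, so E[τ_1] = 4 and the ratio is 2/4 = 0.5.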

o Proposition (CLT for regenerative processes): Let (X(t) : t ≥ 0) be a regenerative process on a state space S with E[τ_1²] < ∞. Then

    √t ( (1/t) ∫₀ᵗ f(X(s)) ds − E[Y_1(f)]/E[τ_1] ) ⇒ σ N(0,1)

with σ² = E[Y_1²(f_c)]/E[τ_1], where f_c(·) = f(·) − E[Y_1(f)]/E[τ_1].
o Theorem: Let (X(t) : t ≥ 0) be a positive-recurrent regenerative process, and suppose that either of the following conditions holds:

 • F(x) = P(τ_1 ≤ x) has a density, and there is a function h : S → ℝ that is bounded
 • F(x) = P(τ_1 ≤ x) is non-lattice, and there is a function h that is bounded and continuous

Then

    E[h(X(t))] → E[ ∫_{τ(0)}^{τ(1)} h(X(s)) ds ] / E[τ_1]  as t → ∞

and in particular X(t) ⇒ X(∞) as t → ∞.

Furthermore,

    P(X(t) ∈ A) → E[ ∫_{τ(0)}^{τ(1)} 1{X(s) ∈ A} ds ] / E[τ_1]

Remark: If X(0) =_d X(∞), then (X(t) : t ≥ 0) is stationary.

a(t ) =  éêëh(X (t ))ùúû
=  éëêh(X (t )), t1 > t ùûú +  éëêh(X (t )), t1 £ t ùûú
t
= b(t ) + ò  éêëh(X (t )) | t1 = s ùúû dF (s )
0
t
= b(t ) + ò  éëêh(X (t(1) + t - s ))ùûú dF (s )
0
t
= b(t ) + ò  éêëh(X (t(0) + t - s ))ûùú dF (s )
0
(Where, in the last step, we have used the fact that every cycle has the same
distribution)
...
As such, by the conditions for direct

Reimann integrability, we have that g is dRi
...
And since F is non-lattice, all the conditions of the key renewal theorem hold, and therefore

    a(t) → (1/E[τ_1]) ∫₀^∞ b(s) ds
         = (1/E[τ_1]) ∫₀^∞ E[h(X(s)), τ_1 > s] ds
         = (1/E[τ_1]) ∫₀^∞ E[h(X(s)) 1{τ_1 > s}] ds
         = (1/E[τ_1]) E[ ∫₀^{τ_1} h(X(s)) ds ]

where the last step follows by Fubini, since h is bounded.
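To make this ratio formula concrete, here is a small seeded simulation (all names are mine) of a two-state continuous-time chain that regenerates each time it enters state 0: with exp(a) holding times in state 0 and exp(b) holding times in state 1, the formula gives P(X(∞) = 1) = (1/b) / (1/a + 1/b), which the empirical estimate should match.

```python
import random

def two_state_limit(a, b, n_cycles=100_000, seed=3):
    """Estimate P(X(infinity) = 1) for a CTMC with rates a (0 -> 1)
    and b (1 -> 0), via E[time in 1 per cycle] / E[cycle length]."""
    rng = random.Random(seed)
    time_in_1 = cycle_time = 0.0
    for _ in range(n_cycles):
        t0 = rng.expovariate(a)   # holding time in state 0
        t1 = rng.expovariate(b)   # holding time in state 1
        time_in_1 += t1
        cycle_time += t0 + t1
    return time_in_1 / cycle_time

est = two_state_limit(a=1.0, b=3.0)
print(est)   # close to (1/3) / (1 + 1/3) = 0.25
```

Each cycle is one visit to 0 followed by one visit to 1, so the estimator is exactly the ratio of cycle averages appearing in the theorem.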

o Example: Let (X(t) : t ≥ 0) be a positive-recurrent regenerative process, and let T(A) denote the hitting time of a set A. We are now interested in the expression P(T(A) > t) = ?, especially for t large. Let a(t) = P(T(A) > t). Conditioning on the first regeneration,

    a(t) = P(T(A) ∧ τ_1 > t) + ∫₀ᵗ P(T(A) > t − s) P(T(A) > τ_1 | τ_1 = s) dF(s)
         = P(T(A) ∧ τ_1 > t) + ∫₀ᵗ P(T(A) > t − s) P(τ_1 ∈ ds | T(A) > τ_1) P(T(A) > τ_1)
         = b(t) + β ∫₀ᵗ a(t − s) dF̃(s)

where b(t) = P(T(A) ∧ τ_1 > t), β = P(T(A) > τ_1), and

    F̃(s) = P(τ_1 ≤ s | T(A) > τ_1)

So

    a = b + β (a * F̃)

a defective renewal equation (β < 1). For fixed λ ∈ ℝ, multiply through by e^{λt}:

    a(t)e^{λt} = b(t)e^{λt} + β e^{λt} ∫₀ᵗ a(t − s) dF̃(s)
    a_λ(t) = b_λ(t) + ∫₀ᵗ e^{λ(t−s)} a(t − s) · β e^{λs} dF̃(s)

so that, with dF_λ(s) = β e^{λs} dF̃(s),

    a_λ(t) = b_λ(t) + ∫₀ᵗ a_λ(t − s) dF_λ(s)

Suppose ∃λ s.t. β ∫₀^∞ e^{λs} dF̃(s) = 1. Then F_λ is a bona fide distribution. By appealing to the key renewal theorem,

    a_λ(t) → (1/E_λ[τ_1]) ∫₀^∞ e^{λs} P(T(A) ∧ τ_1 > s) ds =: η  as t → ∞

where E_λ[τ_1] = ∫₀^∞ x dF_λ(x). And so

    a(t) = P(T(A) > t) ~ η e^{−λt}

So the probability has an exponential-type tail.
Italicized numbers indicate examples.

Azuma's Inequality ... 92. See Bounds on normal tails
Baby Renewal Theorem ... 56
Bounds on normal tails ... 50
CLT for Renewal Processes ... 57
Coupling time ... 85
Elementary Renewal Theorem ... 85
Filtration ... 68
G/G/1 queue ... 74
GI/GI/1 queue ... 70, 72
Little's Law ... 78
M/G/1 queue ... 20
M/M/1 queue ... 89
Random walk ... 93
  as a martingale ... 69
  Asymmetric ... 74
  Symmetric ... 68
Renewal equation ... 83
Renewal function ... 85
Sample path methods ... 79
Martingale ... 54
  Azuma's Inequality ... 60
  convergence theorem ... 63
SLLN for renewal processes ... 64
  Deterministic motivation ... 50
Wald's First Identity ... 51, 52
Wald's Second Identity ... 59
