Title: Foundations of Optimization - Lecture Notes
Description: Columbia Business School – First Year of the Doctoral Program in Decisions, Risk and Operations. Notes covering the more technical material in the "Foundations of Optimization" class, including duality, KKT conditions, non-convex programs and more.


FOUNDATIONS OF OPTIMIZATION
Daniel Guetta

Basics


• Optimization problems
  o An optimization problem is
        minimize $f(x)$ subject to $x \in \mathcal{X}$
    where $f$ is the (real-valued) objective and $\mathcal{X}$ is the constraint set / feasible set / search space.
    ...


  o We consider problems in the following form:
        minimize   $f(x)$
        subject to $h_i(x) = 0 \quad \forall\, 1 \le i \le m$
                   $g_i(x) \le 0 \quad \forall\, m \le i \le r$
                   $x \in \mathcal{X} \subseteq \mathbb{R}^n$
  o We consider the following special cases of this problem:
    - In linear programming, all functions are linear.
      ...


  o If $\mathcal{X}$ is the feasible set of a problem, a point $x \in \mathcal{X}$ is a local minimum if there exists a neighborhood $N_r(x)$ such that $f(x) \le f(y) \ \forall y \in \mathcal{X} \cap N_r(x)$.
    ... (Strict equivalents exist)
    ...

  o A point $x \in \mathcal{X} \subset \mathbb{R}^n$ is an interior point if there exists an open ball such that $N_r(x) \subset \mathcal{X}$.
    ...

  o A point $x \in \mathcal{X} \subset \mathbb{R}^n$ is a closure point if, for every open ball $N_r(x)$, there exists $y \in \mathcal{X}$ with $y \in N_r(x)$.
    ...

  o The set of reals is both closed and open.
    ... The intersection of a finite number of open sets is open.
    ... The union of a finite number of closed sets is closed.
    ...

  o A set $\mathcal{X} \subset \mathbb{R}^n$ is (sequentially) compact if, given a sequence $\{x_k\} \subset \mathcal{X}$, there is a subsequence converging to an element $x \in \mathcal{X}$.
    ...
    - Theorem: A closed subset of a compact set is compact.
      ...

  o A real-valued function $f$ defined on a domain $\mathcal{X} \subset \mathbb{R}^n$ is continuous at the point $x \in \mathcal{X}$ if, for every sequence $\{x_k\} \subset \mathcal{X}$ with $x_k \to x$, $\lim f(x_k) = f(x)$.
    ...
  o A function $f$ is coercive over a set $\mathcal{X} \subset \mathbb{R}^n$ if, for every sequence $\{x_k\} \subset \mathcal{X}$ with $\|x_k\| \to \infty$, we have $\lim f(x_k) = \infty$.
    ...



    - Theorem: If $f$ is continuous and $\mathcal{Y}$ is open (resp. closed), then $f^{-1}(\mathcal{Y})$ is also open (resp. closed).
      ...


  o Definition: A function is convex if $f(\lambda x_1 + (1 - \lambda)x_2) \le \lambda f(x_1) + (1 - \lambda)f(x_2)$.
    ... We say $f$ is convex over $\mathcal{X} = \operatorname{dom} f$ if it is convex when restricted to $\mathcal{X}$.
    ...

  o Definition: If $f$ is convex with a convex domain $\mathcal{X}$, we define the extended-value extension $\tilde{f} : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ by
        $\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in \mathcal{X} \\ \infty & \text{otherwise} \end{cases}$
    and we let
        $\operatorname{dom} f = \{x \in \mathbb{R}^n : \tilde{f}(x) < \infty\}$
    An extended-value function is convex if
    - Its domain is convex
      ...

  o Given $\mathcal{X} \subset \mathbb{R}^n$, the indicator function $I_{\mathcal{X}} : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is defined as
        $I_{\mathcal{X}}(x) = \begin{cases} 0 & \text{if } x \in \mathcal{X} \\ \infty & \text{otherwise} \end{cases}$
    If $\mathcal{X}$ is a convex set, then $I_{\mathcal{X}}$ is a convex function.
    ... The converse is not true (eg: $\log x$ on $(0, \infty)$).
    ...
• Calculus
  o A function $f : \mathcal{X} \to \mathbb{R}$ with $\mathcal{X} \subset \mathbb{R}^n$ is differentiable at $x \in \operatorname{int} \mathcal{X}$ if there exists a vector $\nabla f(x) \in \mathbb{R}^n$, known as the gradient, such that
        $\lim_{\delta \to 0} \frac{f(x + \delta) - f(x) - \nabla f(x) \cdot \delta}{\|\delta\|} = 0$
    and
        $\nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right]^{T} \in \mathbb{R}^n, \qquad \frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h}$
    $f$ is differentiable over an open set $\mathcal{S} \subset \mathcal{X}$ if it is differentiable at every point in the set.
    ...

  o If, for a point $x \in \operatorname{int} \mathcal{X}$, each component of the gradient is differentiable, we say $f$ is twice differentiable at $x$, and we define the Hessian matrix $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ by
        $\nabla^2 f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \right]_{ij}$
    If $f$ is twice continuously differentiable in a neighborhood of $x$, then the Hessian is symmetric.
    ...

  o Consider a vector-valued function $F : \mathcal{X} \to \mathbb{R}^m$, $\mathcal{X} \subset \mathbb{R}^n$, and a point $x \in \operatorname{int} \mathcal{X}$.
    ... Then the gradient matrix $\nabla F(x)$ has entries $[\nabla F(x)]_{ij} = \frac{\partial F_j(x)}{\partial x_i}$.
  o The chain rule states that for interior points, if $h(x) = g(f(x))$, then
        $\nabla h(x) = \nabla f(x) \nabla g(f(x))$
  o $\ker A = \{x \in \mathbb{R}^n : Ax = 0\}$, $\operatorname{im} A = \{y \in \mathbb{R}^m : y = Ax, \ x \in \mathbb{R}^n\}$.
    Given a set $\mathcal{S} \subset \mathbb{R}^n$, $\mathcal{S}^{\perp} = \{x \in \mathbb{R}^n : x \cdot y = 0 \ \forall y \in \mathcal{S}\}$.

  o Lemma: $\operatorname{im} A = [\ker(A^{T})]^{\perp}$.
    ...
    - Example: The empty set, a line and any subspace are affine.
      ...
    - Definition: Given a set of points $\mathcal{X} \subset \mathbb{R}^n$, the affine hull $\operatorname{aff} \mathcal{X}$ is the set of points $\lambda_1 x_1 + \cdots + \lambda_k x_k$, where $k \ge 1$, $\{x_i\} \subset \mathcal{X}$ and $\sum_i \lambda_i = 1$.
      ...
  o Convex sets
    - Definition: The set $\mathcal{X}$ is convex if, for all points $x_1, x_2 \in \mathcal{X}$ and any scalar $\lambda \in (0, 1)$, $\lambda x_1 + (1 - \lambda)x_2 \in \mathcal{X}$.
      ...
    - Definition: Given a set of points $\mathcal{X} \subset \mathbb{R}^n$, the convex hull $\operatorname{conv} \mathcal{X}$ is the set of points $\lambda_1 x_1 + \cdots + \lambda_k x_k$, where $k \ge 1$, $\{x_i\} \subset \mathcal{X}$, $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$.
      ...
    - Theorem (vector sum): If $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}^n$ are convex sets, then the set $\mathcal{X} + \mathcal{Y} = \{x + y : x \in \mathcal{X}, y \in \mathcal{Y}\}$ is also convex.
      ...
    - Theorem: If $\{\mathcal{X}_i\}$ is an arbitrary collection of convex sets, then the intersection $\bigcap_i \mathcal{X}_i$ is also convex.
      ...
    - Polyhedra are convex.
      ... If the cone is convex (ie, if $\lambda_1 x_1 + \lambda_2 x_2 \in \mathcal{C}$ for all $x_1, x_2 \in \mathcal{C}$, $\lambda_1, \lambda_2 \ge 0$), it is a convex cone.
      ... It is a convex cone.
      ... When the norm in question is the 2-norm, the cone is called the second-order cone.
      ...

  o Hyperplanes and halfspaces
    - Definition: A hyperplane is a set of the form $\{x \in \mathbb{R}^n : a \cdot x = b\}$, where $a \in \mathbb{R}^n \setminus \{0\}$ is called the normal vector.
      ...
    - Definition: A halfspace is a set of the form $\{x \in \mathbb{R}^n : a \cdot x \le b\}$.
      Halfspaces are convex but not affine.
      ...

    - The L2-norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x \cdot x}$
    - $\|x\|_G = \sqrt{x^{T} G x}$ (when $G \succ 0$ and symmetric)
    - The p-norm: $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$ for $p \ge 1$
    - $\|x\|_{\infty} = \max\{|x_1|, \ldots, |x_n|\}$
    - Given a norm, the (closed) ball with centre $x_0$ and radius $r > 0$ is $\{x \in \mathbb{R}^n : \|x - x_0\| \le r\}$, and it is convex.
    ...
  o Fix the vector $x \in \mathbb{R}^n$.
    ... s.t. ... A vector $x' \in \mathcal{X}$ is equal to $x^*$ if and only if
        $(x - x') \cdot (z - x') \le 0 \quad \forall z \in \mathcal{X}$
    Geometrically, the angle between $x' \to x$ and $x' \to z$ must be at least $90^{\circ}$ for all points in the set.
    [Figure: the point $x$, its projection $x'$ onto the convex set, and a point $z$ in the set.]
    Proof: Existence follows from the fact that $\|z - x\|$ is coercive and $\mathcal{X}$ is closed.
    ... Now, consider that $\nabla f(x^*) = 2(x^* - x)$.
    ...
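    A quick numerical illustration of this projection characterization (a sketch of my own, not part of the original notes): project a point onto a closed convex set – here the nonnegative orthant, where projection is just clipping – and check that $(x - x') \cdot (z - x') \le 0$ holds for sampled feasible $z$.

```python
import numpy as np

def project_onto_nonnegative_orthant(x):
    # Projection onto {z : z >= 0} is componentwise clipping.
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])            # point to project
x_proj = project_onto_nonnegative_orthant(x)

# Check the optimality condition (x - x') . (z - x') <= 0 for random feasible z.
for _ in range(1000):
    z = rng.uniform(0.0, 5.0, size=3)      # any z >= 0 is feasible
    assert np.dot(x - x_proj, z - x_proj) <= 1e-12

print("projection:", x_proj)               # [1.0, 0.0, 0.5]
```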


  o Application: Suppose we want to approximate $f(x)$ over a set of points $\{x_1, \ldots, x_m\}$ using $g(x) = \sum_{\ell=1}^{k} \rho_\ell f_\ell(x)$, where the $f_\ell$ are basis functions and $\rho$ is a vector of weights.
    ... s.t. ... This problem is equivalent to
        $\min \|y - z\|$  s.t.  $z \in \{F\rho : \rho \in \mathbb{R}^k\}$
    This is a projection problem, and so a unique optimizer exists.
    ...
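    As a concrete sketch of this projection viewpoint (my own example; the basis choice and function are not from the notes): approximating sampled function values by a weighted sum of basis functions is an ordinary least-squares problem, i.e. projecting the vector of samples onto the column space of the basis matrix $F$.

```python
import numpy as np

# Approximate f(x) = sin(x) on sample points using polynomial basis functions 1, x, x^2.
x_pts = np.linspace(0.0, np.pi, 25)
y = np.sin(x_pts)                                              # values to approximate

F = np.column_stack([np.ones_like(x_pts), x_pts, x_pts**2])    # basis matrix, one column per basis function

# Projection of y onto the column space of F = least-squares choice of weights rho.
rho, *_ = np.linalg.lstsq(F, y, rcond=None)

residual = y - F @ rho
print("weights:", rho)
print("residual norm:", np.linalg.norm(residual))
```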
  o Theorem: Consider the problem $\min f(x)$ s.t. ... Then, if
    - $\mathcal{X}$ is non-empty
    - $f$ is lower semicontinuous over $\mathcal{X}$
    and one of the following conditions holds:
    1. ...
    2. $\mathcal{X}$ is closed, and $f$ is coercive
    3. ...
    then the set of optimal minimizing solutions of $f$ is non-empty and compact.
    ... Then, given $\gamma > f^*$, the level set $\mathcal{L}(\gamma) = \{x \in \mathcal{X} : f(x) \le \gamma\}$ must be non-empty.
    ... Thus, since $\mathcal{X}$ is compact, so is this set.
    ... Since $f$ is coercive, $\mathcal{L}(\gamma)$ is non-empty and bounded for any $\gamma > f^*$.
    ... Thus, $\mathcal{L}(\gamma)$ is compact.
    ...
    Note that the lower semicontinuity of $f$ was only used in proving that $\mathcal{L}(\gamma)$ is closed.
    ... If $\lambda$ is the smallest eigenvalue of $G$, we ... $\tfrac{1}{2}\lambda \|x\|^2 - b^{T} x$, which is coercive if $\lambda > 0$.
    ...
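    A minimal numerical sketch of this coercivity bound (the positive definite $G$ and vector $b$ below are arbitrary data of my own): the quadratic $\tfrac{1}{2}x^{T}Gx - b^{T}x$ is bounded below by $\tfrac{1}{2}\lambda_{\min}\|x\|^2 - \|b\|\,\|x\|$, which grows without bound as $\|x\| \to \infty$.

```python
import numpy as np

G = np.array([[4.0, 1.0], [1.0, 3.0]])      # symmetric positive definite (example data)
b = np.array([1.0, -2.0])
lam_min = np.linalg.eigvalsh(G).min()        # smallest eigenvalue of G

def f(x):
    return 0.5 * x @ G @ x - b @ x

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=2) * rng.uniform(0.1, 100.0)
    lower_bound = 0.5 * lam_min * np.dot(x, x) - np.linalg.norm(b) * np.linalg.norm(x)
    assert f(x) >= lower_bound - 1e-9        # f dominates a coercive lower bound
```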


Unconstrained local optimality

• In this section, we consider the problem $\min f(x)$ s.t. ... $x \in \mathcal{X} \subset \mathbb{R}^n$, but we focus on minima that lie in the interior of $\mathcal{X}$.
  ...

• Local optimality
  o Theorem – necessary conditions: let $x^* \in \operatorname{int} \mathcal{X}$ be an unconstrained local minimum.
    ...
    Proof:
    - First order: fix $d \in \mathbb{R}^n \setminus \{0\}$.
      ... Thus, $\nabla f(x^*) \cdot d \ge 0$.
      ...
    - Second order: fix $d \in \mathbb{R}^n$.
      ...
  o Theorem – sufficient conditions: Consider a point $x^* \in \operatorname{int} \mathcal{X}$.
    ... The geometric interpretation is as above – the only difference is that we now require a positive definite instead of a positive semidefinite matrix.
    ...


  o Find the set of possible unconstrained local minima using $\nabla f(x) = 0$.
    ...
  o Example: Consider $\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^{T} G x - b^{T} x$ with $G \succ 0$.
    ... Furthermore, $\mathcal{X} \setminus \operatorname{int} \mathcal{X}$ is empty, and so the global minimum must be an unconstrained local minimum.
    ...
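  For this quadratic example the first-order condition $\nabla f(x) = Gx - b = 0$ can be solved directly; a short sketch (with made-up $G$ and $b$) checking that the linear solve agrees with a generic numerical minimizer:

```python
import numpy as np
from scipy.optimize import minimize

G = np.array([[4.0, 1.0], [1.0, 3.0]])          # G positive definite, example data
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ G @ x - b @ x
grad = lambda x: G @ x - b

x_star = np.linalg.solve(G, b)                   # first-order condition: G x = b
result = minimize(f, x0=np.zeros(2), jac=grad)   # generic solver for comparison

print(np.allclose(x_star, result.x, atol=1e-6))  # True
print(np.allclose(grad(x_star), 0.0))            # gradient vanishes at the minimum
```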




• Sensitivity analysis
  o Consider the problem $\min f(x, a)$ s.t. ... $x \in \mathbb{R}^n$.
    ... The first-order conditions are
        $\nabla_x f(x^*(a), a) = 0$
    Taking the derivative with respect to $a$, we obtain
        $\nabla x^*(a)\, \nabla^2_{xx} f(x^*(a), a) + \nabla^2_{xa} f(x^*(a), a) = 0$
    From this expression, we can obtain expressions for the sensitivity of the optimum, and of the optimal value:
        $\nabla x^*(a) = -\nabla^2_{xa} f(x^*(a), a) \left[ \nabla^2_{xx} f(x^*(a), a) \right]^{-1}$
        $\nabla f^*(a) = \nabla_a \left[ f(x^*(a), a) \right] = \nabla x^*(a)\, \nabla_x f(x^*(a), a) + \nabla_a f(x^*(a), a) = \nabla_a f(x^*(a), a)$
  o The implicit function theorem tells us when this exists.
    ...
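  A scalar sanity check of this envelope-type formula (my own toy example, $f(x, a) = \tfrac{1}{2} a x^2 - x$, not from the notes): the derivative of the optimal value with respect to the parameter equals the partial derivative $\partial f / \partial a$ evaluated at the optimizer.

```python
import numpy as np

def f(x, a):
    return 0.5 * a * x**2 - x

def x_star(a):
    # First-order condition: a*x - 1 = 0  =>  x*(a) = 1/a
    return 1.0 / a

a, eps = 2.0, 1e-6

# Numerical derivative of the optimal value f*(a) = f(x*(a), a)
df_star_da = (f(x_star(a + eps), a + eps) - f(x_star(a - eps), a - eps)) / (2 * eps)

# Envelope formula: d f*/da = partial f / partial a at x*(a), here 0.5 * x*(a)**2
envelope = 0.5 * x_star(a)**2

print(df_star_da, envelope)    # both approximately 0.125
```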
• ... s.t. ... We are interested in characterizing local minima that are not in $\operatorname{int} \mathcal{X}$.
  ...
  o Definition: The set of descent directions of $f$ at $x^*$ is the set
        $\mathcal{D}(x^*) = \{d \in \mathbb{R}^n : \nabla f(x^*) \cdot d < 0\}$
    ...
    For example: [Figure: the cone of descent directions $\mathcal{D}(x^*)$ at a point $x^*$ in the $(x_1, x_2)$ plane.]
  o Theorem – necessary condition: If $x^*$ is a local minimum, there is no descent direction in the tangent cone:
        $\mathcal{D}(x^*) \cap \mathcal{T}(x^*) = \emptyset$
    Geometrically, the tangent cone contains the "directions in which we can move".
    ... If any direction fulfils both these conditions, then we can clearly improve on our current point.
    ...
    Define
        $z_k = \frac{x_k - x^*}{\|x_k - x^*\|} - \frac{d}{\|d\|} \to 0, \qquad d_k = d + \|d\| z_k \to d$
    Now, if $\tilde{x}_k$ is a point on the line segment between $x_k$ and $x^*$, the mean value theorem tells us that
        $f(x_k) = f(x^*) + \nabla f(\tilde{x}_k) \cdot (x_k - x^*)$
    Note, however, that we can write
        $(x_k - x^*) = \frac{\|x_k - x^*\|}{\|d\|} \left( \|d\| z_k + d \right) = \frac{\|x_k - x^*\|}{\|d\|} d_k$
    And so we can re-write the above as
        $f(x_k) = f(x^*) + \frac{\|x_k - x^*\|}{\|d\|} \nabla f(\tilde{x}_k) \cdot d_k$
    Now, if $d \in \mathcal{D}(x^*)$ as well, then $\nabla f(x^*) \cdot d < 0$.
    ... This contradicts the local minimality of $x^*$.
    ...


Equality constrained optimization

• Consider the problem $\min f(x)$ s.t. ... $h(x) = 0$, $x \in \mathbb{R}^n$, where $h : \mathbb{R}^n \to \mathbb{R}^m$.
  ...
• In this particular case, we will show we can characterize $\mathcal{T}(x^*)$ in a simple way.
  ... We now formalize this statement…
  o Definition: the cone of first-order feasible variations at $x^* \in \mathbb{R}^n$ is the set
        $\mathcal{V}(x^*) = \{d \in \mathbb{R}^n : \nabla h(x^*)^{T} d = 0\} = \ker\left[\nabla h(x^*)^{T}\right]$
    Note that $d \in \mathcal{V}(x^*) \Leftrightarrow -d \in \mathcal{V}(x^*)$.
    ...
  o Definition: A point $x^* \in \mathbb{R}^n$ is a regular point if it is feasible and the constraint gradients $\nabla h_i(x^*)$ are linearly independent.
    ... If $m > n$, no regular points exist, and if $m = 1$, this reduces to $\nabla h_1(x^*) \ne 0$.
    ...
  o Theorem: ... Then $\mathcal{V}(x^*) = \mathcal{T}(x^*)$.
    Proof: This theorem is hard.
    ... We'll start by showing that for any direction
    $d \in \mathcal{V}(x^*)$, there is such a path that starts by walking forward or backward along the direction $d$.
    ... It's therefore in $\mathcal{T}(x^*)$.
    ... Given a scalar $t$, consider the curve $x(t) = x^* + td$.
    ... However, there's no guarantee we stay on the constraints.
    ... This seems sensible – we are correcting our path to reflect how $h$ might change.
    ...
    Now, take the gradient of the boxed equation with respect to $u$ and evaluate it at $(t, u) = 0$.
    ...
    The two results above allow us to use the implicit function theorem to deduce that a solution $u(t)$ to the boxed equation exists for all $t \in (-\tau, \tau)$, for some $\tau$.
    ...
  o All we now need to prove is that the initial direction in which we move is $d$.
    ... We get
        $\left( d + \nabla h(x^*)\, \dot{u}(0) \right)^{T} \nabla h(x^*) = 0$
    But since $d \in \mathcal{V}(x^*)$, we also have that $d^{T} \nabla h(x^*) = 0$, and so $\dot{u}(0) = 0$, and $\dot{x}(0) = d$, as required.
    ... [Though I'm not quite sure how to prove that result]
    ...


  o $\mathcal{V}(x^*) \subset \mathcal{T}(x^*)$: choose $d \in \mathcal{V}(x^*) \setminus \{0\}$ and let $x(t)$ be the curve discussed above.
    ... Then, by the mean value theorem, there is some $\tilde{t} \in [0, t_k]$ such that
        $x(t_k) - x(0) = \dot{x}(\tilde{t})(t_k - 0)$
        $\frac{x(t_k) - x^*}{\|x(t_k) - x^*\|} = \frac{\dot{x}(\tilde{t})}{\|x(t_k) - x^*\| / t_k}$
    As $t_k \to 0$ and therefore $\tilde{t} \to 0$, this tends to
        $\frac{\dot{x}(0)}{\|\dot{x}(0)\|} = \frac{d}{\|d\|}$
    So $d \in \mathcal{T}(x^*)$.
  o $\mathcal{T}(x^*) \subset \mathcal{V}(x^*)$: ... By the mean value theorem, there is some $\tilde{x}_k \in [x_k, x^*]$ such that
        $h(x_k) - h(x^*) = \nabla h(\tilde{x}_k)^{T}(x_k - x^*)$
    But since $x^*$ and every $x_k$ are in the feasible set, $h(x_k) = h(x^*) = 0$, and
        $0 = h(x_k) = h(x^*) + \nabla h(\tilde{x}_k)^{T}(x_k - x^*) = \nabla h(\tilde{x}_k)^{T}(x_k - x^*)$
        $\nabla h(\tilde{x}_k)^{T} \frac{x_k - x^*}{\|x_k - x^*\|} = 0 \;\Rightarrow\; \nabla h(x^*)^{T} d = 0$
    And so $d \in \mathcal{V}(x^*)$.
    ...

    Proof: Since $x^*$ is a local minimum, $\mathcal{D}(x^*) \cap \mathcal{T}(x^*) = \emptyset$, and since $x^*$ is regular, $\mathcal{D}(x^*) \cap \mathcal{V}(x^*) = \emptyset$.
    ... However, since we also have $-d \in \mathcal{V}(x^*)$, we must have $\nabla f(x^*) \cdot d = 0$.
    ...
  o The last part of the previous theorem is important, because it provides a "simple" way to characterize the tangent cone, and a "recipe" to find optimal points.
    ... It effectively states that $\nabla f(x^*)$ [the direction in which we might increase our objective] must be a linear combination of the $\nabla h_i(x^*)$ [the perpendiculars to the constraints $h_i(x^*) = 0$].
    ... Here is an example, in which $\nabla f(x)$ is constant:
    [Figure: the constraint set $\{x : h(x) = 0\}$ in the plane, with $\nabla h(x^*)$ and $\nabla f(x)$ drawn at $x^*$; darker shading implies a larger value of $f$.]
    Proof: The existence of $\lambda^*$ is simply a restatement of the previous theorem.
    ...

    For the second-order condition, consider a $d \in \mathcal{V}(x^*)$, and use the first part of the regularity lemma to define a path $x(t)$ either side of $x^*$, which always stays on the constraints and such that $\dot{x}(0) = d$.
    ... Thus
        $\ddot{g}(0) = d^{T} \nabla^2 f(x^*) d + \ddot{x}(0)^{T} \nabla f(x^*) \ge 0$
    Finally, consider $\phi(t) = \lambda^{*T} h(x(t)) = 0$ and differentiate it twice, to get
        $\ddot{\phi}(0) = d^{T} \left( \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right) d + \ddot{x}(0)^{T} \nabla h(x^*) \lambda^* = 0$
    Finally, add the last two equations, and apply the first order condition.
    ... We effectively allow the constraint $h(x) = 0$ to be broken, but we associate a cost $\lambda$ with breaking it.
    ...

  o As such, we have the following simple recipe to solve an equality-constrained problem (assuming $f$ and $h$ are continuously differentiable on $\mathbb{R}^n$):
    - Check that global minima exist
    - Find the set of $(x^*, \lambda^*)$ satisfying $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$ and $\nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0$.
      ...
  o A few examples:
    - $\min f(x) = x_1 + x_2$ s.t. ... $h(x) = x_1^2 + x_2^2 - 2 = 0$, $x \in \mathbb{R}^2$.
      ... In this case, $\mathcal{L}(x, \lambda) = x_1 + x_2 + \lambda(x_1^2 + x_2^2 - 2)$, and since all points are regular, we simply need to consider all points satisfying the first-order conditions and choose the smallest one. (A numerical check of this example appears after these examples.)
      ...
    - ... s.t. ...
      ... However, it is not a regular point.
      ...
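  A quick numerical check of the first example above (a sketch; the solver choice and starting point are mine): the first-order conditions give $x_1 = x_2 = -1$ with $\lambda = \tfrac{1}{2}$, and a constrained solver agrees.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]
constraint = {"type": "eq", "fun": lambda x: x[0]**2 + x[1]**2 - 2.0}

res = minimize(f, x0=np.array([-1.0, -0.5]), constraints=[constraint])
print(res.x)     # approximately [-1, -1]
print(res.fun)   # approximately -2

# Lagrange conditions by hand: 1 + 2*lambda*x_i = 0 and x_1^2 + x_2^2 = 2
# give x_1 = x_2 = -1 and lambda = 1/2, with objective value -2.
```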

  o Constraint qualification: In summary, what we have proved so far is
        $x^*$ local minimum $\Rightarrow \mathcal{D}(x^*) \cap \mathcal{T}(x^*) = \emptyset \Rightarrow \mathcal{D}(x^*) \cap \mathcal{V}(x^*) = \emptyset \Rightarrow$ Lagrange multipliers exist
    In proving this sequence of statements, we made use of the fact that $x^*$ was a regular point.
    ... If the constraints are linear, for example, Lagrange multipliers are guaranteed to exist.
    ...

o

Theorem – Sufficient Conditions: Assume that f and h are both twice
continuously differentiable, and that x * Î n and l * Î m satisfy
x L(x * , l * ) = 0
l L(x *, l * ) = 0
2
L(x *, l * )d > 0
d xx

"d Î  (x * ) \ {0 }

Then x * is a strict local minimum
...
Suppose it
is not a strict local minimum; then there exists a sequence {xk } Ì n such that

xk ¹ x * and xk  x * which lies entirely in the feasible region of the problem [ie:
h(xk ) = 0 ] and f (xk ) £ f (x * )
...


h(xk )dk = 0
Taking the limit as k  ¥ , we get h(x * )d = 0 , and so d Î  (x * )
...
This gives

Daniel Guetta

Foundations of Optimization Notes

Page 20

hi (xk ) = dk li*hi (x * ) ⋅ dk + 12 dk2dkli*2hi (xˆi )dk = 0
Adding these m + 1 equations, we get

(

)

(

)

dk f (x * ) + å i =1 li*hi (x * ) ⋅ dk + 12 dk2dk 2 f (xˆ0 ) + å i =1 li*2hi (xˆi ) dk £ 0
m

(

m

)

dk x L(x *, l * ) ⋅ dk + 12 dk2dk 2 f (xˆ0 ) + å i =1 li*2hi (xˆi ) dk £ 0
m

Noting that, by the first order conditions, x L(x *, l * ) and then dividing by

1
2

dk2

and taking the limit as k  ¥ , this becomes

(

)

d  2 f (x * ) + å i =1 li*2hi (x * ) d £ 0
m

But since d Î  (x * ) \ {0 } , this violates our assumed second order condition
...
  o Consider the program
        $\min_{x \in \mathbb{R}^n} \sigma^2 = x^{T} G x$  s.t. ... $\mathbf{1}^{T} x = 1, \ \mu^{T} x = m$
    which might represent minimizing the variance in a portfolio while keeping total sales equal to 1 unit, and keeping the expected return equal to a certain value $m$.
    ... We then get
        $\begin{pmatrix} \lambda_1^* \\ \lambda_2^* \end{pmatrix} = \begin{pmatrix} \eta_1 + \zeta_1 m \\ \eta_2 + \zeta_2 m \end{pmatrix}$
    where the constants depend on $G$ and $\mu$.
    ...
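  A sketch of this portfolio problem with made-up data ($G$, $\mu$ and the target return $m$ below are mine): because the objective is quadratic and the constraints are linear, the Lagrange first-order conditions form a linear system that can be solved directly.

```python
import numpy as np

# Example data (not from the notes): covariance matrix G, mean returns mu, target return m.
G = np.array([[0.10, 0.02, 0.01],
              [0.02, 0.08, 0.03],
              [0.01, 0.03, 0.06]])
mu = np.array([0.05, 0.07, 0.06])
m = 0.06
ones = np.ones(3)

# Lagrangian: x'Gx + lam1*(1'x - 1) + lam2*(mu'x - m).
# The first-order conditions give a linear KKT system in (x, lam1, lam2).
KKT = np.block([
    [2 * G,         ones[:, None],    mu[:, None]],
    [ones[None, :], np.zeros((1, 1)), np.zeros((1, 1))],
    [mu[None, :],   np.zeros((1, 1)), np.zeros((1, 1))],
])
rhs = np.concatenate([np.zeros(3), [1.0], [m]])

sol = np.linalg.solve(KKT, rhs)
x_star, lam = sol[:3], sol[3:]
print("portfolio weights:", x_star)   # sum to 1, expected return m
print("multipliers:", lam)
```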




• Sensitivity analysis
  o We now look at what happens when we slightly vary our constraint.
    ... s.t. ... Suppose there exists a local pair $(x^*, \lambda^*)$ satisfying the second-order sufficient conditions when $u = 0$, and $x^*$ is regular.
    ...
    - $x^*(u)$ is continuously differentiable.
      ...
    The last part of the theorem is the interesting one – it tells us that changing $u$ by an infinitesimal amount around our solution point will lead to a change of $-\lambda^*$ in the objective.
    ... This implies that
        $\nabla^2_{xx} \mathcal{L}(x^*, \lambda^*)\, y + \nabla_x h(x^*)\, z = 0$
        $\nabla h(x^*)^{T} y = 0$
    The second equation implies that $y \in \mathcal{V}(x^*)$.
    ... Thus, our matrix is nonsingular.
    ... Second-order conditions follow, for $u$ sufficiently close to 0, by continuity assumptions.
    ... Using that result, we can re-write the previous equation as
        $\nabla_u f(x^*(u)) + I \lambda^*(u) = 0 \quad\Rightarrow\quad \nabla_u f(x^*(u)) = -\lambda^*(u)$
    as required.
    ...
Inequality constrained optimization

• ... s.t. ... Note also that when we say $x \le 0$, we mean that every component of $x$ is less than or equal to 0.
  ...
  o Definition: Given a feasible point $x^* \in \mathbb{R}^n$, the set of active inequality constraints $\mathcal{A}(x^*)$ is defined as
        $\mathcal{A}(x^*) = \{j : g_j(x^*) = 0\} \subseteq \{1, \ldots, r\}$
  o Lemma: Let $x^*$ be a local minimum for the inequality constrained program above (IneqCP).
    ... Then it is also a local minimum of the equality constrained program (EqCP): $\min f(x)$ s.t. $h(x) = 0$, $g_j(x) = 0 \ \forall j \in \mathcal{A}(x^*)$.
    Proof: Suppose $x^*$ is not a local optimum for EqCP.
    ... Since $g$ is continuous, we have $g(x_k) \to g(x^*)$.
    ... Thus, for sufficiently large $k$, $x_k$ is feasible for IneqCP.
    ...



  o Definition: Consider the inequality constrained program above.
    ...
  o Definition: The cone $\mathcal{V}^{EQ}(x^*)$ at the point $x^* \in \mathbb{R}^n$ is the set of vectors $d \in \mathbb{R}^n$ such that
        $d \cdot \nabla h_i(x^*) = 0 \quad \forall\, 1 \le i \le m$
        $d \cdot \nabla g_j(x^*) = 0 \quad \forall\, j \in \mathcal{A}(x^*)$
    Intuitively, it is the set of directions in which we can move while keeping the equality constraints and the active inequality constraints satisfied.
    ... First, consider that
    - Because the $h$ are equalities, we cannot move along $\nabla h_i(x)$ [the perpendiculars to the equality constraints] in any direction (either forwards or backwards).
      ... Moving along $+\nabla g_i(x)$ would leave the feasible region.
      ...
    All the first-order condition states is that there is no way to move in such a way that we remain in the feasible region and decrease $f$ below $f(x^*)$.
    ...
    - A positive linear combination of the perpendiculars to the active inequality constraints (along which we cannot move).
      ...


    Proof: Everything follows from applying Lagrange multipliers to the equality constrained program, except for the fact that $\mu_j^* \ge 0$ for all $j \in \mathcal{A}(x^*)$.
    ... Then let $\mathcal{X}_j \subset \mathbb{R}^n$ be the set of points feasible for all other active constraints, and let $\mathcal{V}_j^{EQ}(x^*)$ be the corresponding cone of first-order feasible directions:
        $\mathcal{X}_j = \{x : h(x) = 0,\ g_\ell(x) = 0 \ \forall \ell \in \mathcal{A}(x^*) \setminus \{j\}\}$
        $\mathcal{V}_j^{EQ}(x^*) = \{d : d \cdot \nabla h(x^*) = 0,\ d \cdot \nabla g_\ell(x^*) = 0 \ \forall \ell \in \mathcal{A}(x^*) \setminus \{j\}\}$
    By regularity, there must exist a $d \in \mathcal{V}_j^{EQ}(x^*)$ with $d \cdot \nabla g_j(x^*) < 0$ (if it were the case that $d \cdot \nabla g_j(x^*) = 0 \ \forall d \in \mathcal{V}_j^{EQ}(x^*)$, then the point would not be regular, and if $d' \cdot \nabla g_j(x^*) > 0$ for some $d' \in \mathcal{V}_j^{EQ}(x^*)$, then just take $d = -d' \in \mathcal{V}_j^{EQ}(x^*)$).
    ... Now, if we let $g(t) = f[x(t)]$, we find that
        $\dot{g}(t) = \dot{x}(t) \cdot \nabla f[x(t)]$
    Evaluating at $t = 0$:
        $\dot{g}(0) = d \cdot \nabla f(x^*) = -d \cdot \left( \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) + \sum_{j} \mu_j^* \nabla g_j(x^*) \right) = -d \cdot \nabla g_j(x^*)\, \mu_j^* < 0$
    This contradicts the local optimality of $x^*$.
    ...


  o For example, consider the problem
        $\min_{x \in \mathbb{R}^3} \tfrac{1}{2}(x_1^2 + x_2^2 + x_3^2)$  s.t. ... $x_1 + x_2 + x_3 \le -3$
    The objective and constraints are continuously differentiable, and minima exist (by coerciveness).
    ... Furthermore, all points are regular, so this is the global minimum. (A numerical check of this example appears below.)
    ...
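  A quick check of this example (a sketch; the solver and starting point are my choice): the KKT conditions give $x^* = (-1, -1, -1)$ with multiplier $\mu^* = 1$, and a generic solver agrees.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: 0.5 * np.dot(x, x)
# scipy expects inequality constraints in the form c(x) >= 0, so rewrite
# x1 + x2 + x3 <= -3  as  -3 - (x1 + x2 + x3) >= 0.
con = {"type": "ineq", "fun": lambda x: -3.0 - np.sum(x)}

res = minimize(f, x0=np.zeros(3), constraints=[con])
print(res.x)   # approximately [-1, -1, -1]

# KKT by hand: grad f + mu * grad g = x + mu*(1,1,1) = 0 with the constraint active,
# so x = -mu*(1,1,1) and -3*mu = -3, giving mu = 1 and x = (-1,-1,-1).
```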

    Proof: We follow the equality case.
    ... Then there exists $\{x_k\} \subset \mathbb{R}^n$, $h(x_k) = 0$, $g(x_k) \le 0$, $x_k \ne x^*$, $x_k \to x^*$ with $f(x_k) \le f(x^*)$.
    ... Using the same mean-value-theorem argument as in the sufficient conditions proof for Lagrange multipliers, we find that $\nabla h(x^*)^{T} d = 0$.
    ... If $j \in \mathcal{A}(x^*)$, then $g_j(x^*) = 0$, and so since $x_k$ is feasible, $g_j(x_k) - g_j(x^*) \le 0$.
    ... Together, these imply that
        $\nabla g_j(x^*)^{T} d \le 0$
    If the inequality is tight for all $j \in \mathcal{A}(x^*)$, then $d \in \mathcal{V}^{EQ}(x^*)$, and we proceed as in the equality case.
    ... However, by the definition of our sequence, $f(x_k) - f(x^*) \le 0$, and by the mean value theorem,
        $f(x_k) - f(x^*) = \nabla f(\tilde{x}_k)^{T}(x_k - x^*) \to \nabla f(x^*)^{T} d$
    for some $\tilde{x}_k \in [x_k, x^*]$.
    ... This, together with the statement $d^{T} \nabla f(x^*) > 0$, is a contradiction.
    ... If either of these facts changed (eg: max and/or $\ge$), the sign of the Lagrange multiplier required would be flipped.
    ...




• Sensitivity analysis
  o Consider the optimization problem
        $\min_{x \in \mathbb{R}^n} f(x)$  s.t. ... $h(x) = u, \ g(x) \le v$
  o Theorem: Suppose there exists a triple $(x^*, \lambda^*, \mu^*)$ satisfying the second order sufficient conditions when $(u, v) = (0, 0)$, with $x^*$ regular.
    ...
    - $x^*(\cdot, \cdot)$ is continuously differentiable.
      ...
  o Lemma: a vector $z \in \mathbb{R}^m$ satisfies $z \cdot y \le 0$ for all $y \in \mathbb{R}^m$ such that $A^{T} y \le 0$ if and only if $z = Ax$ for some $x \in \mathbb{R}^n$ with $x \ge 0$.
  o Theorem: Consider the standard inequality optimization program above.
    ... Then there exist Lagrange multipliers $(\lambda^*, \mu^*)$ satisfying the FOCs
        $\nabla f(x^*) + \nabla h(x^*)\lambda^* + \nabla g(x^*)\mu^* = 0$
        $\mu^* \ge 0$
        $\mu_j^* g_j(x^*) = 0 \quad \forall\, 1 \le j \le r$
    if and only if
        $\mathcal{D}(x^*) \cap \mathcal{V}^{EQ}(x^*) = \emptyset$
    Proof: Suppose there are no equality constraints.
    ... If there are equality constraints $h(x) = 0$, we can replace them with inequality constraints $h(x) \le 0$ and $-h(x) \le 0$.
    ...
  o However, it is possible to prove the existence of Lagrange multipliers under weaker assumptions called constraint qualifications.
    ... The weakest form of constraint qualification is quasiregularity, which requires that $\mathcal{T}(x^*) = \mathcal{V}(x^*)$.
    ... s.t. ... Then $x^*$ is quasi-regular and Lagrange multipliers exist.
    ...


  o Theorem (General Sufficiency Condition): Consider the optimization program
        $\min_{x \in \Omega} f(x)$  s.t. ... $g(x) \le 0$
    Let $x^* \in \mathbb{R}^n$ be a feasible point, and $\mu^* \in \mathbb{R}^r$ be a vector such that
        $\mu^* \ge 0$
        $\mu_j^* = 0 \quad \forall j \notin \mathcal{A}(x^*)$
        $x^* \in \operatorname{argmin}_{x \in \Omega} \mathcal{L}(x, \mu^*)$
    Then $x^*$ is a global minimum.
    ...
    Proof:
        $f(x^*) = f(x^*) + \mu^* \cdot g(x^*)$
        $\phantom{f(x^*)} = \min_{x \in \Omega} \{ f(x) + \mu^{*T} g(x) \}$
        $\phantom{f(x^*)} \le \min_{x \in \Omega, g(x) \le 0} \{ f(x) + \mu^{*T} g(x) \}$
        $\phantom{f(x^*)} \le \min_{x \in \Omega, g(x) \le 0} f(x)$

Optimization over Convex Sets

• Consider the problem $\min f(x)$ subject to $x \in \mathcal{X} \subset \mathbb{R}^n$ where $\mathcal{X}$ is a convex set.
  ...
  o Consider applying the mean value theorem to the function $g(\varepsilon) = f[x^* + \varepsilon(x - x^*)]$:
        $g(\varepsilon) - g(0) = g'(c)[\varepsilon - 0]$
        $f[x^* + \varepsilon(x - x^*)] = f(x^*) + \varepsilon \nabla f\big(x^* + \varepsilon s (x - x^*)\big) \cdot (x - x^*)$
    where $s \in (0, 1)$.
    ... This, however, implies that $f(x^*) > f[x^* + \varepsilon(x - x^*)]$.
    ...
  o Theorem (optimality for convex optimization): Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is convex over $\mathcal{X}$.
    ... Any local minimum of $f$ is also a global minimum.
    ... If $f$ is strictly convex, then there exists at most one global minimum.
    ... Suppose $x^*$ is a local minimum, and there exists some $x \ne x^*$ with $f(x) < f(x^*)$.
    ... The fact that the value of $f$ at that point is less than at $x^*$ contradicts our local optimality assumption.
    ... Suppose $x_0 \ne x_1$ are two global minima.
    ...
  o Theorem (necessary & sufficient conditions): If $f : \mathbb{R}^n \to \mathbb{R}$ is convex over $\mathcal{X}$ and differentiable, and $x^* \in \mathcal{X}$ is a feasible point, then $x^*$ is globally optimal if and only if
        $\nabla f(x^*) \cdot (x - x^*) \ge 0 \quad \forall x \in \mathcal{X}$
    Proof: Necessity follows from the previous theorem.
    ...


  o Theorem (supporting hyperplane): Let $\mathcal{X} \subset \mathbb{R}^n$ be a convex set and $\bar{x} \in \mathbb{R}^n$ be a point that is not in the interior of $\mathcal{X}$.
    ...
    Proof: The intuition behind this proof will be to consider a sequence $\{x_k\}$ of points outside the closure of $\mathcal{X}$ converging to $\bar{x}$.
    ... We consider the projection of each of these $x_k$ onto $\mathcal{X}$ (denoted $\hat{x}_k$), and we consider the hyperplane perpendicular to the line from $x_k$ to $\hat{x}_k$.
    [Figure: the supporting hyperplane of interest at $\bar{x}$, with the sequence $x_1, x_2, x_3$ outside the set and their projections $\hat{x}_1, \hat{x}_2, \hat{x}_3$.]
    We then show that
    - The set $\mathcal{X}$ lies on one side of each of these hyperplanes.
      ... In other words, $\bar{x}$ lies on the limiting supporting hyperplane.
      ...
    Define $\bar{\mathcal{X}} = \operatorname{cl} \mathcal{X}$, and note that $\bar{\mathcal{X}}$ is convex.
    ... Then, by the projection theorem,
        $(\hat{x}_k - x_k) \cdot (x - \hat{x}_k) \ge 0 \quad \forall x \in \bar{\mathcal{X}}$
    And this means that for all $k$ and $x \in \bar{\mathcal{X}}$,
        $(\hat{x}_k - x_k) \cdot x \ge (\hat{x}_k - x_k) \cdot \hat{x}_k = (\hat{x}_k - x_k) \cdot (\hat{x}_k - x_k) + (\hat{x}_k - x_k) \cdot x_k \ge (\hat{x}_k - x_k) \cdot x_k$
    More succinctly,
        $(\hat{x}_k - x_k) \cdot x \ge (\hat{x}_k - x_k) \cdot x_k = \text{constant} \quad \forall x \in \bar{\mathcal{X}}$
    In other words, $\bar{\mathcal{X}}$ lies on one side of each of those hyperplanes.
    ... Letting $k \to \infty$, we get $\mu_k \to \mu$ and $x_k \to \bar{x}$, so
        $\mu \cdot x \ge \mu \cdot \bar{x} \quad \forall x \in \bar{\mathcal{X}}$
    As required.
    ...
  o Theorem (separating hyperplane): ... Then there exists a hyperplane that separates them; ie: a vector $\mu \in \mathbb{R}^n$, $\mu \ne 0$ and a scalar $b \in \mathbb{R}$ with
        $\mu \cdot x_1 \le b \le \mu \cdot x_2 \quad \forall x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2$
    [Figure: two disjoint convex sets $\mathcal{X}_1$ and $\mathcal{X}_2$ separated by a hyperplane with normal $\mu$.]
    Proof: Consider the convex set
        $\mathcal{X} = \mathcal{X}_1 - \mathcal{X}_2 = \{x_1 - x_2 : x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2\}$
    Since the two sets are disjoint, $0 \notin \mathcal{X}$.
    ...


  o Theorem (strictly separating hyperplane): Let $\mathcal{X} \subset \mathbb{R}^n$ be a closed convex set and $\bar{x} \notin \mathcal{X}$ a point.
    ... In other words, $\exists \mu \in \mathbb{R}^n \setminus \{0\},\ b \in \mathbb{R}$ such that
        $\mu \cdot \bar{x} < b < \inf_{x \in \mathcal{X}} \mu \cdot x$
    [Figure: a closed convex set, the point $\bar{x}$ outside it, and a strictly separating hyperplane with normal $\mu$.]
    Proof: Define $r = \min_{x \in \mathcal{X}} \|x - \bar{x}\|$.
    ... Now, let $\mathcal{B} = \{x \in \mathbb{R}^n : \|x - \bar{x}\| \le r/2\}$.
    ... Diagrammatically: [Figure: the set $\mathcal{X}$, the ball $\mathcal{B}$ around $\bar{x}$, and the separating hyperplane with normal $\mu$.]
    ...
  o Corollary: If $C \subsetneq \mathbb{R}^n$ is a closed convex set, then it is the intersection $H$ of all closed halfspaces that contain it.
    ... Since $C \ne \mathbb{R}^n$, the strictly separating hyperplane theorem implies that this collection of halfspaces is non-empty (since there is a point $\bar{x} \in \mathbb{R}^n$ with $\bar{x} \notin C$) and clearly, $C \subset H$.
    ...
    Thus, $\bar{x} \notin H$.
    ...

• Farkas' Lemma
  o Lemma (Farkas): Consider a matrix $A \in \mathbb{R}^{m \times n}$.
    ... 1. $z \cdot y \le 0$ for all $y \in \mathbb{R}^m$ with $A^{T} y \le 0$.
    ... 2. $z = Ax$ for some $x \in \mathbb{R}^n$ with $x \ge 0$.
    ... [Geometrically, (1) says] $z$ makes an obtuse angle with all vectors $y$ that make an obtuse angle with every column of $A$;
    ... [(2) says] $z$ lies in the cone formed by the columns of $A$.
    ...
    - $1 \Rightarrow 2$: Suppose $z$ satisfies (1) [it makes an obtuse angle to $y$, which makes an obtuse angle to all the columns of $A$], but that there is no $x \ge 0$ with $z = Ax$ [it lies outside the cone defined by the columns of $A$].
      ... Clearly, $z \ne 0$.
      ...
      Obtuse angle with the columns of $A$: Note that $\lambda A_{\mathrm{col}\, i} \in \mathcal{K}$, and so $y \cdot z > \lambda\, y \cdot A_{\mathrm{col}\, i}$.
      ... This implies that the hyperplane must be "tilted away" from every edge of the cone (the columns of $A$), and that the normal must make an obtuse angle to the said column.
      ...
      We have therefore reached a contradiction.
      ...

  o Application (arbitrage): Consider a market with $n$ assets.
    ... The current price of asset $i$ is $v_i$, and so the current price of portfolio $x$ is $v \cdot x$.
    ... In scenario $m$, asset $n$ yields $R_{mn}$.
    ...
    - Definition: An arbitrage opportunity is a portfolio $x$ such that
          $v \cdot x < 0, \qquad Rx \ge 0$
      In other words, the portfolio costs a negative amount to buy right now and will always yield nonnegative returns.
      ...
    - By Farkas' Lemma, a market is consistent if and only if there exists a vector $q \ge 0$ with $v = R^{T} q$.
      ... We also have that
          $v_i = \frac{1}{1 + r} \left( p \cdot R_{\mathrm{col}\, i} \right)$
      In other words, the prices today are the discounted value of the future prices under the distribution $p$.
      ...
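    A sketch of this no-arbitrage check with toy numbers (the payoff matrix $R$, prices $v$, and the use of a feasibility LP are my own illustration): by Farkas' lemma, either some $q \ge 0$ satisfies $R^{T} q = v$ (the market is consistent), or an arbitrage portfolio exists.

```python
import numpy as np
from scipy.optimize import linprog

# Toy market: 2 scenarios (rows) x 2 assets (columns). R[m, n] = payoff of asset n in scenario m.
R = np.array([[1.0, 2.0],
              [1.0, 0.5]])
v = np.array([0.9, 1.1])        # current asset prices

# Feasibility LP: find q >= 0 with R^T q = v (zero objective).
res = linprog(c=np.zeros(2), A_eq=R.T, b_eq=v, bounds=[(0, None)] * 2)

if res.success:
    print("consistent market, state prices q =", res.x)
else:
    print("no nonnegative q solves R^T q = v: an arbitrage portfolio exists")
```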




The primal and the dual

• Definition: Consider the primal problem $\min f(x)$ s.t. ... $g(x) \le 0_r$, $x \in \Omega \subset \mathbb{R}^n$, and define
      $f^* = \inf_{x \in \Omega, g(x) \le 0} f(x)$
  Assumption: The feasible set is non-empty and the optimal cost is bounded below.
  ...
  o Definition (Lagrangian function): For $x \in \Omega$, $\mu \in \mathbb{R}^r$,
        $\mathcal{L}(x, \mu) = f(x) + \mu \cdot g(x)$
    We say $\mu^*$ is a geometric multiplier if
    - $\mu^* \ge 0$
    - $f^* = \inf_{x \in \Omega} \mathcal{L}(x, \mu^*)$
  o Definition: The set $\mathcal{S} \subset \mathbb{R}^{r+1}$ of constraint-cost pairs is defined by
        $\mathcal{S} = \{(g(x), f(x)) : x \in \Omega\}$
    Given $(\mu, \mu_0) \in \mathbb{R}^{r+1} \setminus \{0\}$, the hyperplane passing through $(\bar{z}, \bar{w})$ is
        $\{(z, w) : \mu \cdot z + \mu_0 w = \mu \cdot \bar{z} + \mu_0 \bar{w}\}$
    The positive/negative halfspace has the $=$ replaced by $>$/$<$.
    ... In such a case, it can be normalized such that $\mu_0 = 1$.
    ...



    [Figure: the set $\mathcal{S}$ in the $(g(x), f(x))$ plane; a hyperplane with normal $(\mu, 1)$ through the point $(g(x), f(x))$ crosses the vertical axis at $(0, \mathcal{L}(x, \mu))$.]
    In other words, the simple interpretation of $\mathcal{L}(x, \mu)$ is the point at which a given hyperplane crosses the vertical axis.
    ...
    [Figure: the same picture, with the hyperplane supporting $\mathcal{S}$; it crosses the vertical axis at $(0, \inf_{x \in \Omega} \mathcal{L}(x, \mu))$.]
    - $\mu^*$ is a geometric multiplier if and only if $\mu^* \ge 0$ and, among all hyperplanes with normal $(\mu^*, 1)$ that contain $\mathcal{S}$ in the positive halfspace, the highest interception of the vertical axis is attained at $f^*$ (in other words $\inf_{x \in \Omega} \mathcal{L}(x, \mu^*) = f^*$).
      ... Thus, there is no $\mu$ for which $\inf_{x \in \Omega} \mathcal{L}(x, \mu) = f^*$, because, as we saw in part (2), $\inf_{x \in \Omega} \mathcal{L}(x, \mu)$ considers the highest intercept for planes that contain the whole of $\mathcal{S}$ in one of their halfspaces.
      ...
    - The hyperplane is the set of $(z, w)$ satisfying
          $\mu \cdot z + w = \mu \cdot g(x) + f(x)$
      Clearly, if $z = 0$, we must have $w = \mathcal{L}(x, \mu)$.
      ...
    - The hyperplane with normal $(\mu, 1)$ that intercepts the axis at level $c$ is the set of $(z, w)$ with
          $\mu \cdot z + w = c$
      If $\mathcal{S}$ lies in the positive halfspace, then
          $\mathcal{L}(x, \mu) = \mu \cdot g(x) + f(x) \ge c \quad \forall x \in \Omega$
      Thus, the maximum intercept is $\inf_{x \in \Omega} \mathcal{L}(x, \mu)$.
      ... We need to show that $f^* = \inf_{x \in \Omega} \mathcal{L}(x, \mu^*)$.
      ...
o

Theorem: Let m * be a geometric multiplier
...

f (x )





f (x )

m
m

g(x )

Daniel Guetta

g(x )

Foundations of Optimization Notes

Page 41

Proof: Assume x * is a global minimum
...
Thus, since m* ³ 0 , we must have

m*j g j (x * ) = 0
Conversely, if (x *, m* ) = infx ÎW (x, m* ) and m* ⋅ g (x * ) :
f (x * ) = f (x * ) + m * ⋅ g (x * ) = L(x * , m * ) = min x ÎW L(x , m * ) = f *

o

We define the dual function q(m) = infx ÎW (x, m) ; it is concave over its domain,
which is convex [because q is a point-wise minimum of concave – actually linear –
functions]
...




Find the value at which it intercepts the f(x) axis
...


Diagrammatically:
f (x )



(0, q(m))
m

g(x )

o

We then define the dual problem as

max q(m) s
...
m ³ 0
And the dual optimal value is given by q * = maxm³0 q(m)
...


Daniel Guetta

Foundations of Optimization Notes

Page 42

Geometrically, the dual problem takes all of the snuggly fitting hyperplanes
defined by different values of m and finds the one with the highest intercept:
f (x )


q*

g(x )

[Parenthetically, recall that we proved above that a convex set can be described
as the intersection of half-spaces containing it
...

  o We now consider a number of examples to illustrate the concepts above.
    - Consider the program
          $\min_{x \in \mathbb{R}^2_+} f(x) = x_1 - x_2$  s.t. ... $g(x) = x_1 + x_2 - 1 \le 0$
      It turns out the solution to this problem is $f^* = -1$, $x^* = (0, 1)$.
      ... The dual function is
          $q(\mu) = \min_{x \in \mathbb{R}^2_+} \mathcal{L}(x, \mu) = \min_{x \in \mathbb{R}^2_+} \left\{ (\mu + 1)x_1 + (\mu - 1)x_2 - \mu \right\}$
      Clearly, this is only bounded if both the coefficients of $x_1$ and $x_2$ are nonnegative.
      ... Thus
          $q(\mu) = \begin{cases} -\infty & \mu < 1 \\ -\mu & \mu \ge 1 \end{cases}$
      Maximising this function, we clearly find $\mu^* = 1 \Rightarrow q^* = -1 = f^*$. (A numerical version of this example appears after these examples.)
      ...



    - Consider the program
          $\min_{x = (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge 0} f(x) = x_1 + x_2$  s.t. ... $g(x) = x_1 \le 0$
      It turns out the solution to this problem is $f^* = 0$, $x^* = (0, 0)$.
      ... If $|\mu| \le 1$, the first term will dominate over the last term, and $x_1 = 0$ will be the bounded solution.
      ... So once again, no duality gap.
      ...
    - Consider the program
          $\min_{x \in \mathbb{R}} f(x) = x$  s.t. ... $g(x) = x^2 \le 0$
      It turns out the solution to this problem is $f^* = 0$, $x^* = 0$.
      ...
    - Consider the program
          $\min_{x \in \{0, 1\}} f(x) = -x$  s.t. ... $g(x) = x - \tfrac{1}{2} \le 0$
      By inspection, the solution is $f^* = 0$, $x^* = 0$.
      ... There is a duality gap.
      ...
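    To make the first of these examples concrete, here is a small numerical sketch (the grid choices are mine): evaluate $q(\mu)$ by minimising the Lagrangian over the nonnegative quadrant and then maximise over $\mu \ge 0$; the best dual value matches $f^* = -1$.

```python
import numpy as np

# First example above: min x1 - x2 over x >= 0, subject to x1 + x2 - 1 <= 0.
f = lambda x1, x2: x1 - x2
g = lambda x1, x2: x1 + x2 - 1.0

# Dual function q(mu) = min_{x >= 0} f(x) + mu * g(x), approximated on a coarse grid of x.
grid = np.linspace(0.0, 5.0, 501)
X1, X2 = np.meshgrid(grid, grid)

def q(mu):
    # (on the truncated grid q stays finite for mu < 1; the true dual is -infinity there)
    return np.min(f(X1, X2) + mu * g(X1, X2))

mus = np.linspace(0.0, 3.0, 301)
q_vals = np.array([q(mu) for mu in mus])

best = np.argmax(q_vals)
print("mu* ~", mus[best], " q* ~", q_vals[best])   # approximately mu* = 1, q* = -1 = f*
```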
  o Theorem (weak duality): ... Whatever the shape of the set $\mathcal{S}$, the "snug hyperplane" with the highest cutting point will be lower than, or at the same place as, $f^*$:
    [Figure: a constraint-cost set for which the highest supporting-hyperplane intercept $q^*$ lies below $f^*$.]
    Proof: Consider feasible $x$ and $\mu$; in other words, $\mu \ge 0$, $x \in \Omega$ and $g(x) \le 0$.
    ... If $q^* < f^*$ then there is a duality gap.
    ... If there is a duality gap, the set of geometric multipliers is empty.
    ... By weak duality, this holds if and only if there is no duality gap.
    ... By weak duality, this means that $q(\mu) = -\infty \ \forall \mu \ge 0$, and so the dual problem is infeasible.
    ... However, if the primal is infeasible, we can say nothing about the dual.
    ...
  o ... s.t. $g(x) \le 0$ if and only if
    1. ...
    2. DUAL FEASIBILITY: $\mu^* \ge 0$
    3. ...
    4. COMPLEMENTARY SLACKNESS: $\mu_j^* g_j(x^*) = 0 \ \forall\, 1 \le j \le r$
    Note that this theorem is only useful if there is no duality gap.
    ... 3 and 4 hold by earlier theorems.
    ... Therefore $f^* = q^*$, $f^*$ is primal optimal and $q^*$ is dual optimal.
    ...
    Proof: Two parts:
    - Optimal $\Rightarrow$ Saddle: If $(x^*, \mu^*)$ is an optimal solution/geometric multiplier pair, then from optimality condition (3), we have
          $\mathcal{L}(x^*, \mu^*) \le \mathcal{L}(x, \mu^*) \quad \forall x \in \Omega$
      Furthermore, since $g(x^*) \le 0$, for any $\mu \ge 0$ we must have
          $\mathcal{L}(x^*, \mu) \le f(x^*) = \mathcal{L}(x^*, \mu^*)$
      ...
    - Therefore, the first of the two options for $\sup_{\mu \ge 0} \mathcal{L}(x^*, \mu)$ must be correct, and $g(x^*) \le 0$.
      ...
      Condition 3: Obvious, by the definition of a saddle point.
      ... Thus, $\mathcal{L}(x^*, \mu^*) = f(x^*)$ and this means that $\mu^* \cdot g(x^*) = 0$.
      ...




• Extension to equality constraints
  o The theory above can be extended to equality constraints by simply introducing a pair of inequality constraints for each equality constraint.
    ...
  o It turns out this is equivalent to simply eliminating the non-negativity constraints on the Lagrange multipliers for equality constraints.
    ...




Strong Duality

  o Theorem (Linear Constraints): Consider the following problem:
        $(P): \ \min_x f(x)$  s.t. ... $Ax \le b, \ x \in \mathbb{R}^n$
    The Lagrangian is $\mathcal{L}(x, \mu) = f(x) + \mu^{T}(Ax - b)$ and $q(\mu) = \inf_{x \in \mathbb{R}^n} \mathcal{L}(x, \mu)$, and the dual is
        $\max_\mu q(\mu)$  s.t. ... $\mu \ge 0, \ \mu \in \mathbb{R}^m$
    If $f$ is convex over $\mathbb{R}^n$ and continuously differentiable, then if the primal has an optimal solution, there is no duality gap, and at least one geometric multiplier exists.
    ... Then, by the KKT conditions, there exists $\mu^* \in \mathbb{R}^m$ such that
        $\mu^* \ge 0$
        $\mu^{*T}(Ax^* - b) = 0$
        $\nabla f(x^*) + A^{T} \mu^* = 0$
    (We do not need to check for regularity, since the constraints are linear.)
    ...
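    A small numerical illustration of this statement (my own toy quadratic program, not from the notes): for $\min \tfrac{1}{2}\|x\|^2$ s.t. $x_1 + x_2 \le -2$ the dual can be written in closed form, and its optimal value matches the primal optimum, i.e. there is no duality gap.

```python
import numpy as np

# Primal: min 0.5*||x||^2  s.t.  A x <= b  with a single linear constraint.
A = np.array([[1.0, 1.0]])
b = np.array([-2.0])

# Primal solution: project the unconstrained minimiser (the origin) onto the halfspace.
x_star = A[0] * (b[0] / (A[0] @ A[0]))           # = (-1, -1)
f_star = 0.5 * x_star @ x_star                    # = 1

# Dual: q(mu) = inf_x 0.5*||x||^2 + mu*(A x - b); the infimum is at x = -A^T mu, giving
# q(mu) = -0.5*||A^T mu||^2 - mu*b.  Maximise over mu >= 0 on a grid.
mus = np.linspace(0.0, 5.0, 5001)
q_vals = -0.5 * (mus**2) * (A[0] @ A[0]) - mus * b[0]
q_star = q_vals.max()

print(f_star, q_star)    # both equal 1: no duality gap
```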

o

Extension to equality constraints: The above trivially extends to linear
equality constraints
...
t
...

Then, provided that


There exists an optimal solution x *



f and g are continuously differentiable



There exists multipliers (l *, m* ) satisfying the KKT conditions (ie: some
sort of regularity)
...

(Note that we are effectively requiring inequality constraints to be convex and
equality constraints the be linear
...
Indeed, since m is positive, mg (x ) is convex for
all g
...

  o Theorem (Slater's Condition): Consider the problem
        $\min_x f(x)$  s.t. ...
    - There exists a vector $\bar{x}$ with $g(\bar{x}) < 0$
    Then there is no duality gap, and there exists at least one geometric multiplier.
    ... Note also that $(0, f^*)$ is not in the interior of $\bar{\mathcal{S}}$.
    ... By the supporting hyperplane theorem, there exists a normal vector $(\mu, \beta) \ne (0, 0)$ such that
        $\beta f^* \le \beta w + \mu \cdot z \quad \forall (z, w) \in \bar{\mathcal{S}}$
    Now, consider: if $(z, w) \in \bar{\mathcal{S}}$, then $(z, w + \gamma) \in \bar{\mathcal{S}}$, for all $\gamma \ge 0$.
    ... Similarly, $\mu \ge 0$.
    ... This is a contradiction, which means that we must have $\beta > 0$.
    ...


    Diagrammatically, if the set $\mathcal{S} = \{(g(x), f(x)) : x \in \Omega\}$ and the function $f(x)$ look like this:
    [Figure: left, the graph of $f(x)$ over $\Omega$; right, the set $\mathcal{S}$ in the $(g(x), f(x))$ plane, with leftmost constraint value $g_{\min}$.]
    Then the set $\bar{\mathcal{S}} = \{(z, w) \in \mathbb{R}^{r+1} : \exists x \in \Omega \text{ with } g(x) \le z, f(x) \le w\}$ looks like
    [Figure: the set $\bar{\mathcal{S}}$ in the $(z, w)$ plane, with the point $(0, f^*)$ on its boundary and a supporting line with normal $(\mu, \beta)$; the set extends to the right of $g_{\min}$.]
    The line must be sloping as above (ie: $\mu$ and $\beta$ must be positive).
    ... Therefore, we can't have $\beta = 0$ (ie: the line can't be vertical).
    ...

  o Note that the requirement that there be an $\bar{x}$ with $g(\bar{x}) < 0$ is crucial.
    ... These are harder to deal with and require a new definition.
  o Definition (Relative interior): Suppose $\mathcal{X} \subset \mathbb{R}^n$ is a convex set.
    ... For example, consider the set $\mathcal{X} = [0, 1] \subset \mathbb{R}^2$.
    ... However, it has a relative interior:
    [Figure: the segment from 0 to 1 drawn in the plane, with its relative interior $\operatorname{relint} \mathcal{X}$ highlighted.]

o

Theorem (Slater’s Conditions with Mixed Constraints): Consider the
program (E is a matrix)

minx ÎW f (x ) s
...
g(x ) £ 0 , Ax £ b, Ex = d
Suppose that the optimal value f * is finite and that


W is the intersection of a convex set  and a polyhedron



The functions f and g are convex over W



There is a feasible vector x with g(x ) < 0



There is a vector x with Ax £ b , Ex = d , x Î relint  and x Î W

Then there is no duality gap, and there exists at least one Lagrange multiplier