Title: Convex Optimization - Lecture Notes
Description: Columbia Business School - First Year of the Doctoral Program in Decisions, Risk and Operations. Condensed notes roughly following two courses I took - "Foundations of Optimization" (taught by Prof Ciamac Moallemi) and "Convex Optimization" (taught by Prof Garud Iyengar). These notes are also heavily based on Boyd and Vandenberghe's book "Convex Optimization" (available online) and Luenberger's "Optimization by Vector Space Methods". The chapter numbers in these notes refer to Boyd and Vandenberghe's text. Rough list of topics covered: convexity of sets and functions, formulation of convex programs (from linear programs to semi-definite programs), duality, applications, Hilbert and Banach spaces, minimum-norm problems in Banach spaces, the Hahn-Banach Theorem.

Document Preview




CONVEX OPTIMIZATION
Chapter 2 – Convex Sets


Basics
o	A set is affine if it contains every line through two of its points.
o	The affine hull of a set of points is the set of all affine combinations of these points. Its relative interior is its interior relative to its affine hull:
	$\operatorname{relint} C = \{x \in C : B(x, r) \cap \operatorname{aff} C \subseteq C \text{ for some } r > 0\}$
o	The most general form of a convex combination is $\mathbb{E}[x]$, where $\mathbb{P}(x \in C) = 1$.




	The set $\{(x, t) : \|x\| \le t\}$ (the norm cone) is convex.
	The conic hull of $\{x_i\}$ is $\{\lambda_1 x_1 + \cdots + \lambda_k x_k : \lambda_i \ge 0\}$.
	A hyperplane with normal vector $a$ and offset $b$ from the origin, $\{x : a \cdot x = b\}$, can be written as $\{x : a \cdot (x - x_0) = 0\} = x_0 + a^{\perp}$ for any point $x_0$ on it.

o

Given $k + 1$ affinely independent points (i.e., $v_i - v_0$ linearly independent), the $k$-dimensional simplex determined by these points is $C = \{\textstyle\sum_i \theta_i v_i : \theta \ge 0,\ \mathbf{1} \cdot \theta = 1\}$. All points $x \in C$ can then be expressed as $x = v_0 + B\theta'$ provided $\theta' \ge 0$ and $\mathbf{1} \cdot \theta' \le 1$.
	$B$ has rank $k$ (by assumption) and $k < n$, so there exists a nonsingular $A \in \mathbb{R}^{n \times n}$ such that $AB = \begin{bmatrix} A_1 B \\ A_2 B \end{bmatrix} = \begin{bmatrix} I \\ 0 \end{bmatrix}$. We can therefore express $\theta' \ge 0$ and $\mathbf{1} \cdot \theta' \le 1$ as linear inequalities in $x$, which shows the simplex is a polyhedron.




Operations that preserve convexity
o

Intersection (including infinite intersection) – also preserves subspaces, affine sets and convex cones:
	Example: The positive semidefinite cone $\mathbb{S}^n_+ = \{X \in \mathbb{S}^n : z^{\top} X z \ge 0\ \forall z\}$ can be written as $\bigcap_{z \ne 0} \{X : z^{\top} X z \ge 0\}$, an intersection of halfspaces (in the space of symmetric matrices), and so is convex.
...



	Example: $C = \{x \in \mathbb{R}^m : |\textstyle\sum_i x_i \cos(it)| \le 1 \text{ for } t \in [-\tfrac{\pi}{3}, \tfrac{\pi}{3}]\}$ can be written as $\bigcap_{t \in [-\pi/3, \pi/3]} \{x : -1 \le (\cos t, \ldots, \cos mt) \cdot x \le 1\}$, and so is convex
...
o	Affine functions: the image and the inverse image of a convex set under an affine function are convex.



	Example: $\{x : Ax \le b,\ Cx = d\}$ is the inverse image of $\mathbb{R}^m_+ \times \{0\}$ under $f(x) = (b - Ax,\ d - Cx)$.
	Example: $\{x : A(x) = x_1 A_1 + \cdots + x_n A_n \preceq B\}$ is the inverse image of the positive semidefinite cone under an affine map.
	Example: $\{x : (x - x_c)^{\top} P^{-1} (x - x_c) \le 1\}$, where $P \in \mathbb{S}^n_{++}$, is the image of the unit Euclidean ball under $f(u) = P^{1/2} u + x_c$.
o	The perspective function $P(x, t) = x / t$ (defined for $t > 0$) preserves convexity. It normalizes the last component of a vector to 1 and then gets rid of that component.
o	Linear-fractional function: a linear-fractional function is formed by composing the perspective function with an affine function, and it also preserves convexity.
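The perspective and linear-fractional maps are simple enough to sanity-check numerically. Below is a minimal numpy sketch (not part of the notes; the data are made up) of $P(x,t) = x/t$ and of a linear-fractional function $f(x) = (Ax + b)/(c \cdot x + d)$.

```python
import numpy as np

def perspective(v):
    """Perspective map P(x, t) = x / t for t > 0: scale so the last
    component becomes 1, then drop it."""
    x, t = v[:-1], v[-1]
    assert t > 0, "the perspective map is only defined for t > 0"
    return x / t

def linear_fractional(x, A, b, c, d):
    """Linear-fractional map f(x) = (A x + b) / (c . x + d): the perspective
    map composed with the affine map x -> (A x + b, c . x + d)."""
    z = np.concatenate([A @ x + b, [c @ x + d]])
    return perspective(z)

# Tiny made-up example.
A = np.eye(2)
b = np.zeros(2)
c = np.array([1.0, 1.0])
d = 1.0
print(linear_fractional(np.array([0.5, 0.25]), A, b, c, d))
```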



Separating & Supporting Hyperplanes
o

Theorem: If $C$ and $D$ are disjoint convex sets ($C \cap D = \emptyset$), then $\exists a \ne 0$ and $b$ such that $a \cdot x \le b\ \forall x \in C$ and $a \cdot x \ge b\ \forall x \in D$.
o	Example: Consider an affine set $\{Fu + g : u \in \mathbb{R}^m\}$ and a convex set $C$ which are disjoint. Applying the theorem, $a \cdot (Fu + g) \ge b$ for all $u$; the only way a linear function of $u$ can be bounded below is if it is 0 – as such, $a^{\top} F = 0$, and $b \le a \cdot g$.
o	Strict separation: provided at least one of the two sets is open, they are disjoint if and only if there exists a separating hyperplane. (If a set is open and $a \cdot x \le 0$ on it, then $a \cdot x$ is strictly less than 0 for all points in the open set.)


Consider $C = \{b - Ax : x \in \mathbb{R}^n\}$ and $D = \mathbb{R}^m_{++}$. The system $Ax < b$ has a solution if and only if $C$ and $D$ intersect. Since $D$ is open, they fail to intersect if and only if there is a separating hyperplane, which works out to feasibility of
	$\lambda \ne 0,\quad \lambda \ge 0,\quad A^{\top}\lambda = 0,\quad \lambda^{\top} b \le 0$
Thus, only one of this system and $Ax < b$ can have a solution.


o

Theorem: $f$ is convex over a convex set $C$ iff $f(y) \ge f(x) + \nabla f(x) \cdot (y - x)$ over $C$. ... Apply the inequality with $y = z$ and $x = x_i$. ... Add the two. ... By convexity, $f(x + t[y - x]) \le (1 - t) f(x) + t f(y)$ for $t \in (0, 1)$. ... For the general case, consider $g(t) = f(ty + (1 - t)x)$, with $g'(t) = \nabla f(ty + (1 - t)x) \cdot (y - x)$. ... Apply the scalar inequality at $ty + (1 - t)x$ and $\tilde{t}y + (1 - \tilde{t})x$. ...
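As a quick illustration of the first-order condition (not part of the notes), the sketch below checks $f(y) \ge f(x) + \nabla f(x) \cdot (y - x)$ numerically for the convex log-sum-exp function at a few random points.

```python
import numpy as np

def f(x):
    return np.log(np.sum(np.exp(x)))          # log-sum-exp, a convex function

def grad_f(x):
    e = np.exp(x)
    return e / e.sum()

rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # the affine function on the right is a global underestimator of f
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-12
print("first-order condition verified on sampled points")
```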


o

Theorem: 2 f (x )  0 over  convex  f convex over 
...
If 2 f is positive definite, get FOC for convexity
...
Some standard examples (function – convexity – domain):
	$\log(\sum_i e^{x_i})$ (log-sum-exp): convex on $\mathbb{R}^n$
	$(\prod_i x_i)^{1/n}$ (the geometric mean): concave on $(0, \infty)^n$
	$\log \det X$ (the log determinant): concave on $\mathbb{S}^n_{++}$
	$a \cdot x + b$ (affine): both convex and concave
	$\sum_i w_i f_i(x)$ with $w_i > 0$: same as the $f_i$, provided they are all convex (resp. all concave)

Ways to find convexity:
o	Directly verify the definition.
o	Check the Hessian: for example, for $f(x, y) = x^2 / y$,
	$\nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y^2 & -xy \\ -xy & x^2 \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y \\ -x \end{bmatrix}^{\top} \succeq 0 \quad \text{for } y > 0$
o	Restrict to a line: $f$ is convex if and only if $g(t) = f(t x_1 + (1 - t) x_2)$ is convex over $[0, 1]$ for all $x_1, x_2$. ... For the log determinant, take the line $X = Z + tV$, restricting to values of $t$ for which $X \succ 0$, and wlog assume it contains $t = 0$. ... Taking derivatives of $g$, we find that the second derivative is always $\le 0$, so $g$ (and hence $\log\det$) is concave. ...

o

Use the epigraph: Consider $f(x, Y) = x^{\top} Y^{-1} x$. ...
	$\operatorname{epi} f = \{(x, Y, t) : x^{\top} Y^{-1} x \le t,\ Y \succ 0\} = \left\{(x, Y, t) : \begin{bmatrix} Y & x \\ x^{\top} & t \end{bmatrix} \succeq 0,\ Y \succ 0\right\}$
	(We used Schur complements.) ...

o



Jensen’s Inequality: f convex  f ( [x ]) £  éêë f (x )ùúû , "x s
...
(x Î dom f ) = 1
...


o

The perspective function g(x, t ) = tf (x / t ) [t > 0] is convex if f is convex
...

o



The pointwise supremum $\sup_{y \in \mathcal{A}} f(x, y)$ is an extended-value convex function if $f(\cdot, y)$ is convex for each $y \in \mathcal{A}$ [note that we do not require joint convexity of $f$]. This corresponds to intersections of epigraphs. ... Example: the sum of the $r$ largest elements of $x$ – we can write $f(x)$ as the maximum of all the possible sums of $r$ elements of $x$, so it is convex. ...

Note: Every convex function $f : \mathbb{R}^n \to \mathbb{R}$ can be written as
	$f'(x) = \sup\{g(x) : g \text{ affine},\ g(z) \le f(z)\ \forall z\}$
Clearly, $f'(x) \le f(x)$. ... For equality, apply a supporting hyperplane to $\operatorname{epi} f$ at $(x, f(x))$: $\exists(\lambda, \mu) \ne 0$ such that $(\lambda, \mu) \cdot \big((y, t) - (x, f(x))\big) \ge 0$ for all $(y, t) \in \operatorname{epi} f$. Now, (1) we must have $\mu \ge 0$, else the inequality is violated as we take $t \to \infty$, and (2) we must have $\mu \ne 0$, else we get $\lambda = 0$. ... Dividing through by $\mu$ gives an affine global underestimator $g$ with $g(x) = f(x)$. ...
o	Partial minimization: $\inf_y f(x, y)$ is convex if $f$ is jointly convex. This corresponds to the projection of a (convex) epigraph onto the subspace of $x$. ...


o

The general composition $f(x) = h(g_1(x), \ldots, g_k(x))$ behaves as follows:
	$f$ is convex if $h$ is convex and nondecreasing and $g$ is convex
	$f$ is convex if $h$ is convex and nonincreasing and $g$ is concave
	$f$ is concave if $h$ is concave and nondecreasing and $g$ is concave
	$f$ is concave if $h$ is concave and nonincreasing and $g$ is convex
These can be derived (for the case $k = 1$ and $x$ scalar) by noting that the second derivative of $f$ is given by
	$f''(x) = h''(g(x))\,[g'(x)]^2 + h'(g(x))\,g''(x)$



Conjugate Functions
o	$f^*(y) = \sup_{x \in \operatorname{dom} f} (y \cdot x - f(x))$. ... If $f$ is differentiable, this occurs at a point at which $f'(x) = y$. ... Example: $f(x) = \tfrac12 x^{\top} Q x$ with $Q \succ 0$. Then $y \cdot x - \tfrac12 x^{\top} Q x$ is bounded above and maximized at $y - Qx = 0 \Rightarrow x = Q^{-1} y$, so $f^*(y) = \tfrac12 y^{\top} Q^{-1} y$. ...
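A small numpy check of the quadratic example (illustrative only; $Q$ below is made up): the supremum of $y \cdot x - \tfrac12 x^{\top} Q x$ is attained at $x = Q^{-1} y$ and equals $\tfrac12 y^{\top} Q^{-1} y$.

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # positive definite
y = np.array([0.3, -0.7])

x_star = np.linalg.solve(Q, y)      # maximizer of y.x - 1/2 x'Qx
sup_val = y @ x_star - 0.5 * x_star @ Q @ x_star
print(sup_val, 0.5 * y @ np.linalg.solve(Q, y))   # both equal f*(y)
```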
	Example: the indicator $I_C$ of a set $C$. The conjugate of this indicator is the support function $I_C^*(y) = \sup_{x \in C}(y \cdot x) = S_C(y)$. ...
	Example: $f(x) = \|x\|$. Then $f^*(y) = 0$ if $\|y\|_* \le 1$ and $\infty$ otherwise. To see why: if $\|y\|_* > 1$, then $\exists z$ s.t. $y \cdot z > 1$, $\|z\| \le 1$, and scaling $z$ upward makes $y \cdot x - \|x\|$ unbounded; if $\|y\|_* \le 1$, then $x \cdot y \le \|x\| \|y\|_* \le \|x\|$. ...
	Example: $f(x) = \log(\sum_i e^{x_i})$. Maximizing $y \cdot x - f(x)$ we get $y_i = e^{x_i} / \sum_j e^{x_j}$, which gives $f^*(y) = \sum_i y_i \log y_i$ for $y \ge 0$, $\mathbf{1} \cdot y = 1$ (and $\infty$ otherwise). ...




Example: a company uses resources $r$, bought at prices $p$, to produce revenue $S(r)$
...


price

is


o

From the definition, we get $f(x) + f^*(y) \ge x \cdot y$ (Fenchel's inequality). ... Thus, for any $z$ for which $y = \nabla f(z)$, $f^*(y) = z \cdot \nabla f(z) - f(z)$.
o	The conjugate of $g(x) = f(Ax + b)$ is $g^*(y) = f^*(A^{-\top} y) - b^{\top} A^{-\top} y$. ...


o

Let f be a proper convex function
...
¶f (x ) is the set of all subgradients at x
...
If x is in the interior of the domain of f, ¶f (x ) is nonempty, and if f is differentiable at x, ¶f (x ) = {f (x )}
...

Proof: Follows directly from the definition of f*
...


o	Define the cumulant generating function $f(\lambda) = \log \mathbb{E}[e^{\lambda X}]$, and define $f_+(\lambda) = f(\lambda)$ if $\lambda \ge 0$ and $\infty$ otherwise. ...


Then,

defining

f (l) = log [e l ⋅X ] , we find that
log 
(X Î  ) £ infl ,m {m + f (l ) : -l ⋅ z £ m "z Î  }
= infl {supz Î (-l ⋅ z ) + f (l )}

= infl {S  (-l) + f (l)} = -f* (0)

o

Example: Let X be multivariate Gaussian: X ~ N (0, I )
...
Consider  = {x : Ax £ b}
...
t
...




Intepreting l as a slack variable, this becomes supAx £b - 12 x



As such, (X Î  ) £ exp éëê- 12 dist(0,  )ùûú
...
t
...


o

Change of variables – consider a one-to-one function $\phi : \mathbb{R}^n \to \mathbb{R}^n$ whose image covers the problem domain; substituting $x = \phi(z)$ gives an equivalent problem. ...
	Example (linear-fractional programming): Consider $\min \frac{c \cdot x + d}{e \cdot x + f}$ s.t. $Ax = b,\ Gx \le h$. Simply write the objective as $c \cdot \big(\frac{x}{e \cdot x + f}\big) + d\big(\frac{1}{e \cdot x + f}\big)$, and minimize $c \cdot y + dz$ over the new variables $y = \frac{x}{e \cdot x + f}$ and $z = \frac{1}{e \cdot x + f}$, with the constraints rewritten in terms of $(y, z)$ (the standard reduction to an LP). ...
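A hedged sketch of this transformation using cvxpy (cvxpy is not used in the notes, and the data below are randomly generated): solve the LP in the variables $y \approx x/(e \cdot x + f)$ and $z \approx 1/(e \cdot x + f)$, then recover $x = y/z$.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 3, 2, 4
A, b = rng.normal(size=(m, n)), rng.normal(size=m)
G, h = rng.normal(size=(p, n)), rng.normal(size=p) + 5.0
c, d = rng.normal(size=n), 1.0
e, f = np.abs(rng.normal(size=n)), 2.0

y = cp.Variable(n)
z = cp.Variable(nonneg=True)
prob = cp.Problem(cp.Minimize(c @ y + d * z),
                  [A @ y == z * b,            # Ax = b  becomes  Ay = bz
                   G @ y <= z * h,            # Gx <= h becomes  Gy <= hz
                   e @ y + f * z == 1])       # normalization constraint
prob.solve()
if prob.status == cp.OPTIMAL and z.value > 1e-9:
    x = y.value / z.value                     # recover a solution of the original problem
    print(x, (c @ x + d) / (e @ x + f))
```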




y1 , , ym :    satisfy yi (u ) £ u  u £ 0



ym +1 , , ym + p :    satisfy yi (u ) = 0  u = 0

Then composing f and h with these functions leads to the same problem
...
Then we can eliminate the equality constraints and optimize
f0 (f(z )) s
...
fi (f(z )) £ 0 over z
...
This preserves convexity, since this is an affine transformation
...
For example, take minx :f (x
i

1

)£0

x Px
...
Thus, the problem is equivalent to

minx :f (x
1 i

o

1

)£0

x1Sx1
...
Particularly useful for minimax problem
...
t
...




Implicit & Explicit constraints – if the objective function has a restricted

domain, it can often be unrestricted by adding additional constraints instead, and
vice versa
...
t
...
In either case,
the e -suboptimal sets are convex
...
Alternatively,
anywhere feasible we try to move from x yields an increase in objective
...
If the optimality condition

is met, f0 (y ) ³ f0 (x ) "y
...
t
...
Then consider

z(t ) = ty + (1 - t )x, t Î [0,1]
...
So

close to 0, we can decrease f0 by moving away from f
...
t
...




For problems with x  0 , need f0 (x ) ³ 0 , else f0 (x ) ⋅ (y - x ) unbounded
below
...
This gives complementary slackness…

Examples of convex optimization problems
o	Example (Chebyshev center): to fit the largest Euclidean ball $\{x_c + u : \|u\| \le r\}$ inside a polyhedron, require that the ball lies on one side of each halfspace $a \cdot x \le b$; this is equivalent to requiring
	$\sup_{\|u\| \le r}\{a \cdot x_c + a \cdot u\} \le b \iff a \cdot x_c + r\|a\| \le b$
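A minimal cvxpy sketch of the resulting Chebyshev-center LP (cvxpy and the polyhedron below are illustrative assumptions, not from the notes).

```python
import cvxpy as cp
import numpy as np

# Polyhedron {x : a_i . x <= b_i} (made-up data).
A = np.array([[ 1.0,  1.0],
              [-1.0,  2.0],
              [ 0.0, -1.0],
              [-2.0, -1.0]])
b = np.array([2.0, 3.0, 1.0, 2.0])

x_c = cp.Variable(2)          # center of the ball
r = cp.Variable(nonneg=True)  # radius
constraints = [A[i] @ x_c + r * np.linalg.norm(A[i]) <= b[i] for i in range(len(b))]
cp.Problem(cp.Maximize(r), constraints).solve()
print(x_c.value, r.value)
```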


o

Example: $\min_x \max_i (a_i \cdot x + b_i)$ can be linearised using the epigraph: minimize $t$ s.t. $a_i \cdot x + b_i \le t\ \forall i$. ...
	For the linear-fractional program above, $x = y / z$ is feasible for the original problem if $z \ne 0$. ... If $z = 0$ and $x_0$ is feasible for the original problem, then $x = x_0 + ty$ is feasible for the original problem for every $t \ge 0$; taking $t \to \infty$ allows us to get the two objectives arbitrarily close to each other. ...
...
t
...



Example: Distance between polyhedra min x1 - x 2



Example:

Consider

an

LP

minimizing

c ⋅x

2
2

s
...
x1 Î 1 , x 2 Î 2
...
Then ar(c ⋅ x ) = x Sx
...





Example (portfolio problems): pi is relative price change of asset i

(change/start) and xi is amount of asset bought
...
Minimize variance subject to given mean return
...
Short sales: Can set x = x long - x short and
1 ⋅ x short £ h1 ⋅ x long
...

o

Quadratically constrained quadratic programs (QCQP):
	$\min_x\ \tfrac12 x^{\top} P_0 x + q_0 \cdot x + r_0$ s.t. $\tfrac12 x^{\top} P_i x + q_i \cdot x + r_i \le 0,\ x \in \mathcal{F}$
	where the $P_i$ are positive definite. ...

o

Second-order cone program (SOCP):
	$\min_x\ f \cdot x$ s.t. $\|A_i x + b_i\|_2 \le c_i \cdot x + d_i,\ i = 1, \ldots, m,\ x \in \mathcal{F}$
	If $c_i = 0\ \forall i$, this reduces to a QCQP (square both sides to see). ...
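A hedged cvxpy sketch of an SOCP in exactly this form, with made-up data and a norm bound standing in for $x \in \mathcal{F}$ (cvxpy is an assumption; it is not used in the notes).

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 3, 5
f = rng.normal(size=n)
x = cp.Variable(n)

constraints = [cp.norm(x, 2) <= 10]                    # stands in for x in F
for _ in range(m):
    A_i, b_i = rng.normal(size=(k, n)), rng.normal(size=k)
    c_i, d_i = rng.normal(size=n), 10.0
    constraints.append(cp.norm(A_i @ x + b_i, 2) <= c_i @ x + d_i)

prob = cp.Problem(cp.Minimize(f @ x), constraints)
prob.solve()
print(prob.status, x.value)
```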
t
...
Example (robust LP): $\min c \cdot x$ s.t. $a_i \cdot x \le b_i$ for all $a_i \in \mathcal{E}_i = \{\bar{a}_i + P_i u : \|u\| \le 1\}$. Can write the constraint as
	$\bar{a}_i \cdot x + \sup_{\|u\| \le 1} (P_i u) \cdot x \le b_i \iff \bar{a}_i \cdot x + \|P_i^{\top} x\| \le b_i$
	which makes the problem an SOCP. ...




Note: If  =  = {x : Ax = f } is a polyhedron with k vertices, then

sup {a ⋅ x : x Î  } must occur at a vertex, so 1 inequality just turns into k
inequalities
...



Example

(uncertain

LP):

Say

a i ~ N (a i , Si )
...
Note that D(a i ⋅ x ) = x Si x = S1/2
i

2

ask


...

o

for



Semidefinite programs (SDP):
	$\min_x\ c \cdot x$ s.t. $x_1 F_1 + \cdots + x_n F_n + G \preceq 0,\ x \in \mathcal{F}$
	Multiple LMIs are easily dealt with by forming a large block-diagonal LMI from the individual LMIs. ...
	Example (bounds on eigenvalues): Consider $\min \|A\|_2$ ($\|A\|_2$ is the maximum singular value). ... Using Schur complements,
	$A^{\top} A \preceq s^2 I \iff \begin{bmatrix} sI & A \\ A^{\top} & sI \end{bmatrix} \succeq 0$
	(Clearly $sI \succeq 0$, and the Schur complement is $S = sI - A^{\top} A / s$.) ...
The original LMI was quadratic – Schur complements allowed us to make it linear. ...
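A hedged cvxpy sketch of this SDP (an assumption, not the notes' code; it needs an SDP-capable solver such as cvxpy's default SCS): minimize $s$ subject to the linear matrix inequality above and compare with the largest singular value from numpy.

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])
m, n = A.shape

s = cp.Variable()
M = cp.bmat([[s * np.eye(m), A],
             [A.T,           s * np.eye(n)]])
prob = cp.Problem(cp.Minimize(s), [M >> 0])
prob.solve()
print(prob.value, np.linalg.norm(A, 2))   # both approximately the max singular value
```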


Given a portfolio x, can maximize x Sx s
...
that constraint and S  0
to get worst-case variance
...



Factor models: Say p = Fz + d , where z are random factors and

d

represents

additional

randomness
...



Correlation coefficients: rij =

Sij
Sii Sjj


...
K = 1 gives a monomial (closed under ´, ¸ )
...
t
...
Can deal with f (x ) £ g(x ) and h(x ) = g(x )
by dividing
...

o

Example: $\max x_1 x_2 x_3$ s.t. $x_1 x_2 + x_2 x_3 + x_1 x_3 \le c/2,\ x > 0$ [maximum-volume box with a surface-area budget] can be written as $\min x_1^{-1} x_2^{-1} x_3^{-1}$ s.t. $(2 x_1 x_2 / c) + (2 x_2 x_3 / c) + (2 x_1 x_3 / c) \le 1,\ x > 0$. ... Feed in $x_i = e^{y_i}$ and then take logs of the objective and constraints. ...
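A hedged sketch of this geometric program in cvxpy's DGP mode (`gp=True`); the tooling is an assumption on my part, not something the notes use.

```python
import cvxpy as cp

c = 10.0                                   # surface-area budget (made up)
x1, x2, x3 = (cp.Variable(pos=True) for _ in range(3))

objective = cp.Minimize(x1**-1 * x2**-1 * x3**-1)              # minimize 1/volume
constraints = [(2*x1*x2 + 2*x2*x3 + 2*x1*x3) / c <= 1]         # posynomial <= 1
prob = cp.Problem(objective, constraints)
prob.solve(gp=True)                        # solve as a geometric program
print(x1.value, x2.value, x3.value)        # the optimal box is a cube
```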




Existence of Solutions
o

Theorem (Weierstrass): Consider the problem min f (x ) s
...
x Î  Ì n
...

Proof: Let f * be the optimal objective, and gk  f *
...
If (3) is true, this is an intersection of nested non-empty

compact sets – it is therefore non-empty and compact
...
Since  is compact,
this closed subset is also compact
...

Since f is coercive, (g ) is also bounded
...

o



Example: Consider min 12 x Px - b ⋅ x , x Î 
...
This is coercive if

l > 0
...




Chapter 5 – Duality


The Lagrangian Function
o

min f0 (x ) s
...
f (x ) £ 0 ,h(x ) = 0
...
This clearly underestimates the
optimal value, because everywhere in the feasible region, (x, l, n ) £ f0 (x )
...

o

Example: min x ⋅ x s
...
Ax = b
...
Differentiating, set to

0, minimum is at 2x + Al = 0 , so g(l ) =  (- 12 Al, l )
...
t
...
Involves partitioning the xi into either a “+1”

group or a “–1” group; Wij is the cost of having xi and xj in the same/different
partitions
...
Minimizing
over x, we find that g(n ) = -1 ⋅ n if éêëW + diag(n )ùúû  0 and -¥ o
...
Can use to
find bound – for example, using n = -lmin (W )1 , feasible because W - lmin I  0
...
t
...
The dual function is

g(l, n ) = infx éëê f (x ) + l ⋅ (Ax - b ) + n ⋅ (Cx - d )ùûú
= -l ⋅ b - n ⋅ d + infx éê f (x ) + (l A + n C )x ùú
ë
û
= -l ⋅ b - n ⋅ d - supx éê(-Al - C n ) x - f (x )ùú
ë
û
= -l ⋅ b - n ⋅ d - f * (-Al - C n )
Example: f (x ) = x and only equality constraints
...

the indicator of the unit ball in its dual norm, so g(n ) = í
ïï -¥
otherwise
ïî
Example: Min entropy: min å x i log x i s
...
Ax £ b, 1 ⋅ x = 1
...

ë
û



The Lagrange Dual Problem
o

The dual problem is max g(l, n ) s
...
l ³ 0 , with domain {(l, n ) : g(l, n ) > -¥}
...
If the primal is unbounded below, the dual is
infeasible
...


o

The dual is always convex and can be used to find a good lower bound
...
If the
constraints are linear, only feasibility is needed, not strict feasibility
...
t
...
The Lagrangian is
P (l )
q (l )
r (l )




1 
(x , l ) = 2 x (P0 + å li Pi ) x + (q 0 + å liqi ) ⋅ x + (r0 + l ⋅ ri )

l ³ 0 and so P(l )  0
...
t
...
p * = d * if we have
strict feasibility
...


)
...
Then

p * = inf {t : (u, v, t ) Î , u £ 0 , v = 0 } ,

and g(l, n ) = inf {(l, n,1) ⋅ (u, v, t ) : (u, v, t ) Î  }
...
Looking at the expression for p * and g(l, n ) , it is clear that
p * ³ g(l, n ) when l ³ 0 , because it involves more constraints
...
This then allows us to write p * = inf {t : (0 , 0 , t ) Î  } ,
and, for l ³ 0 , g(l, n ) = inf {(l, n,1) ⋅ (u, v, t ) : (u, v, t ) Î  }
...


Since

(0 , 0 , p * ) Î Bd( ) ,

we

have

p * = (0 , 0 , p * ) ⋅ (l, n,1) ³ g(l, n ) , which is weak duality
...

o

To prove strong duality, let  = {(0 , 0 , s ) : s < p * }
...
Create a separating hyperplane (l, n, m) so that


(l, n, m) ⋅ (u, v, t ) £ a "(u, v, t ) Î   mt £ mp * £ a



(l, n, m) ⋅ (u, v, t ) ³ a ³ mp * "(u, v, t ) Î   l ³ 0 , m ³ 0
...



Divide first equation by m and substitute in the second to find the Lagrangian is
> p*
...


Applying this to the strictly feasible point with v = 0 and u < 0, we find that
l = 0 , and so n ¹ 0 [supporting hyperplane cannot be 0 vector], so n ⋅ v ³ 0
...
But since we have an interior point
with n ⋅ v = 0 , there are points around that interior point with n ⋅ v < 0
...


Geometrically, this is equivalent to saying the hyperplane must pass to the left of
our interior point
...
One way to obtain
every pareto-optimal point is to minimize l ⋅ F
...


o

Shadow prices: The dual problem is the lowest cost given we can buy some
“constraint violation” and be rewarded when we don’t violate them
...
When strong duality holds,
there is a price that makes us indifferent
...




Optimality conditions
o

Complementary Slackness: Consider that, if x * and (l * , n * ) are primal and
dual optimal points with zero duality gap

f0 (x * ) = g(l * , n * ) = infx (x, l * , n * ) £ (x * , l * , n * ) £ f0 (x * )
As such, every inequality in this line must be an equality
...
This implies that at optimality
l * ⋅ f (x * ) = 0
...
Only
active constraints at the optimum can have non-zero multipliers
...
Finally,

complementary slackness shows we have 0 duality gap
...
t
...
For the equality-constrained QP $\min \tfrac12 x^{\top} P x + q \cdot x + r$ s.t. $Ax = b$, the KKT conditions are $Ax^* = b$ and
	$\begin{bmatrix} P & A^{\top} \\ A & 0 \end{bmatrix} \begin{bmatrix} x^* \\ \nu^* \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$
	...
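A small numpy sketch (not from the notes) that solves this KKT system directly for made-up data.

```python
import numpy as np

P = np.array([[3.0, 0.5],
              [0.5, 2.0]])
q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = P.shape[0], A.shape[0]
KKT = np.block([[P, A.T],
                [A, np.zeros((m, m))]])      # KKT matrix [[P, A'], [A, 0]]
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x_star, nu_star = sol[:n], sol[n:]
print("x* =", x_star, " nu* =", nu_star, " Ax* =", A @ x_star)
```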
t
...
Example (water-filling): $\min -\sum_i \log(\alpha_i + x_i)$ s.t. $x \ge 0,\ \mathbf{1} \cdot x = 1$. This attempts to maximize communication capacity given a total available power of 1 to split across the channels. ... The KKT conditions are
	$x_i \ge 0,\quad \mathbf{1} \cdot x = 1,\quad \nu - \frac{1}{\alpha_i + x_i} \ge 0\ \forall i,\quad \Big(\nu - \frac{1}{\alpha_i + x_i}\Big) x_i = 0$
	The first condition gives $x_i = \frac{1}{\nu} - \alpha_i$, but this can only work if $\alpha_i \le \frac{1}{\nu} \iff \nu \le \frac{1}{\alpha_i}$. Thus
	$x_i = \begin{cases} \frac{1}{\nu} - \alpha_i & \nu < 1/\alpha_i \\ 0 & \nu \ge 1/\alpha_i \end{cases} \;=\; \max\{0, \tfrac{1}{\nu} - \alpha_i\}$
	Using the sum constraint, $\sum_{i=1}^n \max\{0, \tfrac{1}{\nu} - \alpha_i\} = 1$, which determines $\nu$. ...
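A numpy sketch of the resulting water-filling rule (illustrative data; bisection over $\nu$ is one simple way to satisfy the sum constraint).

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0, 4.0])            # made-up channel parameters

def total_power(nu):
    return np.sum(np.maximum(0.0, 1.0 / nu - alpha))

lo, hi = 1e-6, 1e6                                # total_power is decreasing in nu
for _ in range(100):                              # bisect for total_power(nu) = 1
    mid = 0.5 * (lo + hi)
    if total_power(mid) < 1.0:
        hi = mid
    else:
        lo = mid

nu = 0.5 * (lo + hi)
x = np.maximum(0.0, 1.0 / nu - alpha)             # x_i = max{0, 1/nu - alpha_i}
print(nu, x, x.sum())                             # allocations sum to ~1
```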




Using the dual to solve the primal
o

Example: min å i =1 fi (x i ) s
...
a ⋅ x = b , (x, n ) = å i =1 fi (x i ) + n(a ⋅ x - b )
...

ë
û

The dual therefore involves a single scalar variable (simple to solve)
...






Sensitivity analysis
o

Consider min f0 (x ) s
...
f (x ) £ u, h(x ) = v
...
If
the problem is convex, this is jointly convex (its epigraph is cl() , above)
...
To prove,


p * (0 , 0 ) = g(l * , n * ) £ f0 (x ) + l * ⋅ f (x ) + n * ⋅ h(x ) £ f0 (x ) + l * ⋅ u + n * ⋅ v
o

Local Result: If p differentiable & SD, u p * (0 , 0 ) = -l * and v p * (0 , 0 ) = -n *
p * (tei , 0) - p *
¶p * (0, 0)
= limt 0
³ -l *
¶ ui
t

[By global inequality]

Taking t negative gives the opposite inequality
...
The dual isn’t particularly interesting
...
t
...
The

o



same is true of Ax - b
...




Theorems of the Alternative
o

Weak alternatives:


Non-strict Inequalities: Consider
f (x ) £ 0 , h(x ) = 0 has soln  min 0 s
...
 = 0 (and not ¥)
The dual function has the property g(al, an ) = ag(l, n ) and is
g(l, n ) = infx éëêl ⋅ f (x ) + n ⋅ h(x )ùûú
Because of the homogeneity, if there is any g(l, n ) > 0 with l ³ 0 ,
d * = ¥
...
Thus, since d * £ p * , we find that

g(l, n ) > 0, l ³ 0 feasible  f (x ) £ 0, h(x ) = 0 infeasible
In fact, at most one of the two is feasible – weak alternatives
...
t
...

o

Strong alternatives – when fi are convex and hi are affine, we might be able to
prove strong alternatives; that exactly one of them must hold


Strict inequalities: First, consider

(

)

f (x ) < 0 , Ax = b feasible  p * = min s s
...
fi (x ) £ s, Ax = b < 0


The dual function is g(l, n ) = n ⋅ (Ax - b ) + mins éêë(1 - 1 ⋅ l )s ùúû
...
So the dual is d * = (max g(l, n ) s
...
1 ⋅ l = 1, l ³ 0 )
...
So if
the original system is infeasible (p* > 0), then there exists a

g(l, n ) ³ 0, l > 0
...
In that case, l ³ 0 , g(l, n ) > 0 is clearly feasible
...
Then g(l ) = -l ⋅ b if Al = 0 and -¥ o
...
The
strong system of alternative inequalities is l ³ 0 , Al = 0, l ⋅ b < 0
...

We ask if the intersection has a non-empty interior
...
Here, g(l ) = infx x  ( å li Ai ) x + 2 ( å lib i ) ⋅ x + ( å lic i )
...

As such, the alternative system is l > 0 , -blAl-1bl + cl ³ 0
...
This ellipsoid is empty if
and only if the alternative is satisfied [prove by finding inf f (x ) ]
...


Duality & Decentralization
o

Consider
represents

min å i =1 fi (x i ) s
...

k

a

number

of

å

k
i =1

g i (x i ) £ 0 , x i Î Wi

inequality

constraints]
...
The dual is g(m) = å i =1 gi (m) s
...
m ³ 0
k

k

where gi (m) = infx i ÎW fi (x i ) + m ⋅ g i (x i )
i

o

For example, $x_{ij}$ could be the quantity of resource $j$ allocated to activity $i$, and we might want to maximize utility
...


The resulting geometric multipliers can be considered as

prices for a given resources
...




Duality & Combinatorial Optimization
o

The knapsack problem is max v ⋅ x s
...
w ⋅ x £ C , x Î {0,1}n
...
There is then a breakpoint I * (m) until
I * (m)

which vi / wi ³ m
...
Consider
I * (m)
æ I * (m ) ö
æ
ö
that g(m) = ççå i =1 vi ÷÷ + m ççC - å i =1 wi ÷÷
...
We then want m to take its smallest
I

possible value, which isd m * = vI opt / w I opt

o

I opt
æ I opt ö
æ
ö
This gives an upper bound çç å i =1 vi ÷÷ + m * ççC - å i =1 wi ÷÷
...

This is clearly feasible, and corresponds to the greatest “bang for buck” policy
...
t
...


o

Sufficient conditions: If x * is in the interior of the feasible region and
f (x * ) = 0 and 2 f (x * )  0 , then x * is a strict local minimum
...


o

Consider, instead f (x, a )
...
t
...

o

The set of descent directions is (x * ) = {d Î n : f (x * ) ⋅ d < 0}
...
A necessary condition for x * to be optimal is (x * ) Ç  (x * ) = Æ
...
But since d Î   -d Î  , f (x * ) ⋅ d = 0 ]
...
t
...

Example: min x  Gx s
...
1 ⋅ x = 1, m ⋅ x = m
...
Feeding into the others gives a system of
equations for l , whence l = h + zm  x * = mv + w  s 2 = (am + b )2 + g
...
It
can be shown that all constraints but the active ones at the optimum can be
ignored
...


o

When using such conditions, it is important to check for non-regular points as
well
...




Subgradients – another way of expressing optimality conditions is as follows


x Î arg min { f (x ) : x Î  }  x Î arg min { f (x ) +   (x )}
 0 Î ¶ éëê f (x ) +   (x )ùûú = ¶f (x ) + ¶  (x )
 $g Î ¶f (x ) s
...
- g Î ¶  (x )
 $g Î ¶f (x ) s
...
g ⋅ (y - x ) ³ 0 "y Î 
For the last step, note g Î ¶  (x )    (y ) ³   (x ) + g ⋅ (y - x )  0 ³ g ⋅ (y - x ) "y Î 
...
Has solution 0 if b Î (A)
...


o

Letting y = Ax + v (v is noise), and guessing x based on y, assuming noise small

o

minu Î(A) u - b ; projecting b onto (A)
...


Examples
2

o

min Ax - b ; least square
...


o

min Ax - b

o

min Ax - b ; Robust estimator
...
t
...
Slowest

2

2
¥

; Chebyshev approx problem
...
t
...

o

min f(r1 ) +  + f(rn ) s
...
r = Ax - b is penalty function approximation
...






Least norm problems are min x s
...
Ax = b
...




Regularization

problems

are

bi-objective

(

)

problems; min Ax - b , x
...

o

Example (Tikhonov regularization):
	$\min \|Ax - b\|_2^2 + \gamma \|x\|_2^2 = x^{\top}(A^{\top} A + \gamma I)x - 2 b^{\top} A x + b^{\top} b = \left\| \begin{bmatrix} A \\ \sqrt{\gamma}\, I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2$
	The solution follows from the normal equations:
	$\begin{bmatrix} A \\ \sqrt{\gamma}\, I \end{bmatrix}^{\top} \begin{bmatrix} A \\ \sqrt{\gamma}\, I \end{bmatrix} x = \begin{bmatrix} A \\ \sqrt{\gamma}\, I \end{bmatrix}^{\top} \begin{bmatrix} b \\ 0 \end{bmatrix} \;\Rightarrow\; x = (A^{\top} A + \gamma I)^{-1} A^{\top} b$



Example (minimizing derivatives and curvature): We can replace our
second objective ( x ) by Dx
...
Two
useful examples of D are


D has 1s across its diagonal and a –1 to the left of each 1
...




D has 2s across its diagonal, and 1s to its left and right
...

2

o

Example (LASSO): min Ax - b + g x
2

1



can be written as QP

2

min Ax - b + g1 ⋅ y s
...
- y £ x £ y



2



Stochastic Robust approximation: minx Ax - b
probability distribution
...
If (A = Ai ) = pi , this becomes

min p ⋅ t s
...
Ai x - b £ ti
...
For a 1-norm or ¥ -norm,
this can be written as an LP
...
We can write A = A + U , and we can then write

2
æ
the objective as min çç Ax - b + P 1/2x
2
è

2
2

ö÷

÷÷ , with P = (U U )
...



Worst-case

Robust

approximation:

We

let

AÎ,

and

we

solve

minx supAÎ Ax - b
...
t
...
For a
polyhedron, try this out for the vertices
...

Consider the approximation problem with the Euclidean norm and the maxsingular-value norm
...
Letting U = a(Ax - b )x  / Ax - b

Daniel Guetta

2

x

2

Convex Optimization

Page 25

achieves that
...
This is solvable as an
2

SOCP: min t1 + at2 s
...
Ax - b
o

Uncertainty

2

ellipsoids:

i = {Arow i + Pi u | u

2

£ t1 , x

2

Assume

2

£ t2
...
Then supAÎ Ax - b can be found by individually

{

maximizing sup Arow i ⋅ x - bi = supu Arow i ⋅ x - bi + (Pi u ) ⋅ x | u

2

}

£ 1
...

2

We then have minx supAÎ Ax - b = minx ,t t

2

s
...
Arow i ⋅ x - bi + Pi x

2

£ ti
...


Chapter 7 – Statistical Estimation


In a parametric estimation problem, we believe that a random variable of interest has a
density that is part of a family px (y ) indexed by x Î W
...
t
...




Example: Consider a model yi = a i ⋅ x + vi , where yi are the observed quantities, x is to
be estimated and the vi are IID noises with a density p
...
ML is least-squares
...


This is  1 - norm approximation
...
The ML estimate is then any x with yi - a i ⋅ x £ a "i
...


Then

We can feed pi into this expression
...

i

i


Chapter 8 – Geometric problems


In a classification problem, we are given two sets of points {x 1 , , x N } and {y 1 , , y M }
and wish to find a function f (within a family) s
...
f (x i ) > 0 "i and f (y i ) < 0 "i
...

Interestingly, the strong alternative of this system of equations is
l ³ 0 , l ³ 0 ,(l, l ) ¹ 0 , å l x i = å l y i , 1 ⋅ l = 1 ⋅ l
i

i

By dividing by 1 ⋅ l , this becomes l, l ³ 0 , 1 ⋅ l = 1, 1 ⋅ l = 1, å li x i =

å l y
i

i


...



Robust linear discrimination: There are lots of possible solutions to the problem
above; we’d like to find the one that separates the points the most
...


To find the distance between them, take a point

with a ⋅ x + b = 1 and solve a ⋅ (x + ta ) + b = -1  t a
we want to solve min a
1
2

i

2

2
2

2

= -2  Distance = 2 / a
...
t
...


The dual function is g(l, n ) =

1
2

a + t(l ⋅ 1) + t(n ⋅ 1) + éêën ⋅ 1 - l ⋅ 1ùúû b + (n Y - l X )a
...

2

We then note that by Cauchy-Schwarz, (n Y - l X )a £ n Y - l X




2

a

2

and so the

dual is max l ⋅ 1 + n ⋅ 1 s
...
1 ⋅ n = 1 ⋅ l, n Y - l X £ 12 , l, n ³ 0
...
t
...


This is the minimum distance between the convex hull of the points
...
The dual is relatively
2

simple to construct as a QP, and the primal solution can be recovered: a = l X - n Y


Approximate linear separation: If the points cannot be exactly separated, we might
try to solve min 1 ⋅ u + 1 ⋅ v s
...
a ⋅ x i - b ³ 1 - ui "i,a ⋅ y i - b £ -(1 - vi ) "i, u ³ 0 , v ³ 0
...

The support vector classifier minimizes a trade-off between 1 ⋅ u + 1 ⋅ v and
n  M , N , this can efficiently be solved by taking the dual
...
If
2


Non-linear classification: In the simplest case, we can use a linearly parameterized
family of functions f (z ) = q ⋅ F(z )
...


Infinite-dimensional optimization


Hilbert spaces
o

Definition: A pre-Hilbert Space consists of a vector space V over  with an
inner

x , y :V ´V  
...
x =

x, x

Satisfies

x, y = y, x

(a)

for all l (d)

lx, y = l x, y

(b)

x, x ³ 0 , with

is a norm, and inner product is continuous in

both its arguments under that norm (Proof on Luenberger, pp49)
...


Define

x , y = å i =1 x i yi
...

¥



V = 2 [a, b ] = space of measurable functions x : [a, b ]   such that
2

x(t ) is Lesbegue integrable
...


a

ò

a

b

x(t )y(t ) dt

is Cauchy iff xn - xm  0 as n, m  ¥ ,

y, y
...



Example of an incomplete space: Take V = C [0,1] , the space of
continuous functions on [0,1]
...
However:


V is complete under the first norm
...



V

not

is

complete

under

the

starred

norm,

because

fn - step  0 , so the sequence is Cauchy and leaves the space
...

o



Definition (Orthogonality): We say x and y are orthogonal iff x, y = 0
...
By the joint continuity of the inner
product, this is always closed
...
Then, for any x Î H , mink ÎK x - k has an optimal
solution k0 called the projection of x onto K
...

Proof: Luenberger, pp
...

o

Theorem: If M is a closed subspace of H, then H = M Å M ^ and M = M ^^
...

Proof: Luenberger, pp
...


o

Definition (Linear functional): A function j :V   is a linear functional if

j(ax + by ) = aj(x ) + bj(y )
...
If j is continuous at x0,
it is continuous everywhere
...




Bounded

$M s
...
j(y ) £ M y "y
...


Notes:


If

continuous,

j(z ) =

z

d

continuous

at

( ) = j (d ) £

j d

z
z

z

d

z
z


z

d

0,

j(y ) £ 1 " y £ d
...


If
bounded,

j(z ) £ M z "z

and

j(z ) £ e "z : z £

so

e
M


...



Example of a non-bounded linear functional: Let V be the space of
all sequences with finitely many

non-zero elements,

x = maxk x k
...

o



Theorem (Riesz-Frechet): If j(x ) is a continuous linear functional, then
there exists a z Î H such that j(x ) = x , z
Proof: Let M = {y : j(y ) = 0}
...
If
M = H, set z = 0
...


(

j(x )

)

j (x )

j x - j( g ) g = j(x ) - j(x ) = 0  x - j( g ) g Î M
j(x )

j (x )

 0 = x - j ( g ) g , g = x , g - j ( g ) g, g
 j(x ) = x,

j( g )
g

2

g = x, z

Note also that by Cauchy-Schwarz, j(x ) £ z x  j = z
...


o

Theorem (Special case of the Hahn-Banach Theorem): Let M Í H be a
closed subspace and jM be a continuous linear functional on M
...

Proof: Easy in the case of a Hilbert space
...
Then define j(x ) = x, m for

x Î H
...




Banach Spaces & Their Duals
o

A Banach space is a normed, complete vector space with no inner product
...
An

example of a linear functional on this space is
j(f ) =

ò

1
0

x (t ) dv(t ) £ x

ò

1
0

dv(t ) £ x TV(v )

Provided the total variation of v, TV(v ) < ¥ , where
TV(v ) = supAll partitions 0=t 1



{

 p = x Î ¥ : x

x


p

p

=1
n

2

å

n
i =1

v(ti ) - v(ti -1 )

}

< ¥ , where

æ ¥
= ççå i =1 x i
è

p

1/ p

ö÷
÷ø

or x

p

= supi x i if p = ¥

1
p
ì
ü
p [0,1] = ï
íx : ò x(t ) dt < ¥ï
ý , with
0
ï
ï
ï
ï
î
þ
1/ p

f

o

p

p
æ 1
ö
= ççç ò f (t ) dt ÷÷÷
è 0
ø

1£ p <¥



Definition: We say V * = {j : j is continuous linear functional on V } is the
dual space of V, with norm j

*

} (

{

= sup j(x ) : x £ 1
...


{ }

" x n* Í V *

Proof: Want to show that

with

x n* - x m*

*

£ e "n, m ³ M e

converges to a point x * = limn ¥ xn* Î V *
...
Since



is complete,

x * (x ) = limn ¥ xn* (x ) exists
...
Now


Linearity: By linearity of expectations, x * is linear
...

Then by the definition of x * (x ) , x * (x ) - x m* (x ) £ e x , and

(

x * (x ) £ x * (x ) - xm* (x ) + x m* (x ) £ e + xm*
0

0

0

*

)x

 bounded



Examples


We have already shown (Riesz-Frechet Theorem) that Hilbert spaces are
self-dual
...
In other words

1
...

2
...

3
...

q

Proof: We prove each step separately
1
...
Clearly, the proposed functional is linear
...

q

As such, the functional is also bounded, with f £ y
...
We prove this in four steps
...



Step 1 – approximate x using “basis” functions
...
We then have

x N = å i =1 x ie i  x
N



Step 2 – find the image under f
...
We then have, by linearity of f

f (x N ) = å i =1 y i x i  å i =1 y i x i
¥

N

By continuity of f, we have f (x N )  f (x )
...
Define the
vector yN Î  p as a “truncated y” series
...
Thus
*

p

1/q

ö÷
÷÷
ø

£ f

"N

*

£ f , and y Î  q
...
We have shown that y



q /p

q

q

£ f and y

q

³ f
...


Theorem: *1 =  ¥
...




Theorem: For p Î [1, ¥) , L*p [0,1] = Lq [0,1] , where p -1 + q -1 = 1
...

q

Theorem: C [0,1]* = NBV[0,1] ; we will prove this later using the HB
Theorem
...

Then, there exists a linear functional F : V   such that F (x ) £ p(x ) "x Î V

and F (x ) = f (x ) "x Î M
...
Clearly, the condition of the theorem

x
...

*


The only reason we generalize p to a seminorm is to prove the geometric HB
theorem (see later)
...
The crux of the theorem is that this extension has
bounded norm
...
t
...

*

Note 3: Let X be a normed vector space
...
t
...

Define f (ax ) = a x

on the subspace generated by x; this has norm unity
...
This satisfies
the requirements for the point x
...
Then f : C [0,1]  
defined by f (x ) =



ò

1
0

x(t ) dv(t ) is a bounded linear functional in C [0,1]*
...
Then there is a function v
of bounded variation on [0, 1] such that f (x ) =



For the function defined in (2), f = TV(v )
...


ò

1
0

x(t ) dv(t )
...
Clearly, any f defined in this fashion is linear
...
Note that C[0,1] (space of continuous functions on [0,1]) is a subset of
B[0,1], the space of bounded functions on [0,1]
...
t
...
Define a set of
step functions us (t ) = {t £s } Î B[0,1]
...
We can write

(

)

x(t ) » z p (t ) = å i =1 x (ti ) ut (t ) - ut (t ) Î B[0, 1]
n

i -1

i

Where p is some partition of [0,1]
...



Step 2 – find the image under F
...
Since F is linear and the first
term in the sum is a constant, we can write
F (z p ) = å i =1 x(ti )(v(ti ) - v(ti -1 )) 
n



ò

1
0

x(t ) dv(t ) Î 

Step 3 – bridge the gap between B and C
...
Since F is also continuous using the max norm, this
implies that F (z p )  F (x ) = f (x )
...
By linearity of F, we have

å

n
i =1

v(ti ) - v(ti -1 ) = F



£ F
= F

*
*

n

å

(

))

e ut - ut

i =1 i
n

(

e ut - ut

i =1 i

= f

i -1

i

i -1

i

)

*

Where ei = 1 takes the absolute value into account, and the
jump from line 2 to 3 follows since u are step functions
...


Note, however, that the F produced by the HB Theorem is not
necessarily unique
...
This
theorem only states there exists such a v
...
From (1), it is clear that f
f

*

³ TV(v )
...
From (2, Step 4) it is clear that

= TV(v )
...
Because of the non-uniqueness noted above, $C[0,1]^* \ne \mathrm{BV}[0,1]$
...
As such, we
define the space NBV[0,1] (normalized bounded variation), consisting of
all functions of bounded variations that vanish at 0 and are rightcontinuous
...






Odds & Ends
o

We will sometimes abuse notation and write x * (x ) = x , x * , for x Î V , x * Î V *
...
In a Banach
space, it is convenient notation (see Hyperplanes section)
...
Now, we have



x, x * £ x x *

By Corollary 2 of H-B (below), "x Î X , $x * Î X * s
...
x , x * = x x *

As such, if we consider x , x *

as a functional on X**, its norm is x

X


...
The

previously,

norm-preserving;


...
If X = X ** , X is called reflexive
...

o

We have

x , x * £ x * x
...
In a Banach space, we say x * Î X * is aligned with x Î X
x , x * = x * x
...
Similarly, if

S Í X , we say S ^ = x * Î X * : x, x * = 0 "x Î S Í X *
...


o

Theorem: If M Í X , then ^ (M ^ ) = M
...


To

show

the

converse,

we’ll

show

that

x Ï M  x Ï ^(M ^ )
...
It can be shown that f < ¥ ,
and so by the HB Theorem, we can extend it to some F which also vanishes on
M
...
However, F (x ) = F , x = 1 ¹ 0 , and so x Ï ^(M ^ )
...
There are clearly two ways to take the norm of
that vector – as an element of X or as an element of X ** (a functional on X * )
...
Let us now restrict
ourselves to a subspace M of X
...
The second is the most x can yield under a functional of
norm 1 that annihilates any element of M
...

So it makes sense that the two should be equal
...
Let

x Î X
...

÷÷
ø


...



Intuitively, this is because at the optimal m, the residual x – m0 is aligned to
some vector in M ^
...
For every
other x * , it’ll be smaller than that
...


M

^

x

m0

x - m0

M
This also implies that a vector m0 is the minimum-norm projection if and only if
there is a non-zero vector x * Î M ^ aligned with x – m0
...
Let x * Î X *
...
If the supremum
is achieved for some x 0 Î M , then x * - m0* is aligned with x0
...


o

In many optimization problems, we seek to minimize a norm over an affine
subset of a dual space rather than subspace
...
In that case, if x * is some vector that
satisfies these constraints,


min

yi ,x * =ci

x * = minm * ÎM ^ x * - m *
= sup x £1 x , x *
x ÎM

= sup
= sup

å ai yi £1
å ai yi £1

åa y ,x
i

*

i

c ⋅a

Where M is the space generated by the yi
...
Note that the optimal

åa y

is aligned with the

i i

optimal x*
...


minimizing

Assume

the

motor

is

governed

by

q(t ) + q(t ) = u(t )
...

Choose, u(t ) Î L¥ [0,1] , which is the dual of L1[0,1]
...

We can also integrate the governing equation directly to get
q(1) - q(0) + q(1) - q(0) =

ò

1
0

u(t ) dt  q(1) =

ò

1
0

u(t ) dt - q(1)

Feeding in the results of the previous equation
q(1) =



ò (1 - e )u(t ) dt 
1

t -1

0

q(1) = 1 - e t -1 , u

As such, our problem boils down to minimizing the norm of u subject to
e t -1 , u = 0

and

1 - e t -1 , u = 1
...
As such, we want to
maximize a2 subject to


1

ò (a
0

1

- a2 )e t -1 + a2 dt £ 1
...
For x Î L1 and u Î L¥
to be aligned, we require
x, u =

ò

1
0

x (t )u(t ) dt = x u

*

=

ò

1
0

x (t ) dt ⋅ maxt Î[0,1] u(t )

For this to be true, it is clear that u can only take two values (+M) and
that it must have the same sign as x at any given value of t
...
Clearly, it

changes sign at most once
...

o

Example: Consider selecting a thrust program u(t) for a vertically ascending

rocket subject only to gravity and thrust in order to reach a given altitude (say
1) with minimum fuel expenditure

ò

T
0

u(t ) dt
...
The equation of motion is x(t ) = u(t ) - 1
...
Instead, consider it in NBV[0,1], and associate with every u a

v Î NBV [0,1] so that u(t ) dt = dv(t )
...
We denote
this by an unknown T and then optimize over this parameter
...



Our problem is then a minimum norm problem subject to a single linear
constraint
...
The norm is in C[0,1], the space to
which NBV is dual
...
We then have min T -t ,v =1+ 1T 2 v = T1 + 12 T
...


To find the optimal u, note that the optimal v must be aligned to

(T - t )a
...




Hyperplanes & the Geometric Hahn-Banach Theorem
o

Definition: A hyperplane H of a normed linear space X is a maximal proper
affine set
...


o

Theorem: A set H is a hyperplane if and only if it is of the form

{x Î X : f (x ) = c }

where f is a non-zero linear functional, and c is a scalar
...
Thus,
X = span (x 0 , M )
...
Define

x ÎX

as

f (ax 0 + m ) = a
...



If x 0 Î M , then H = M
...




Only if: Suppose f ¹ 0 and let M = {x : f (x ) = 0}
...
Furthermore, there exists x0 such that f (x 0 ) = 1
...
So we only
require one extra vector (x0) to expand M into the whole subspace
...
Thus H = {x : f (x ) = c } = {x = cx 0 + M } is a
hyperplane
...

o

Theorem (Geometric Hahn-Banach): Let K be a convex set having a
nonempty interior in a real normed linear vector space X
...
Then there is a closed hyperplane in X containing V but containing no interior point of K
...


o

We will abuse notation and write x * (x ) = x , x * , for x Î V , x * Î V *
...




It allows to represent all hyperplanes as x : x , a = 0
...


Proof: If y = 0, the result is simple
...



Parallelogram



2

x +y + x -y

Law:

2

2

= 2 x +2 y

2

(prove

by

extending norms as inner products)
...
Equivalent

x, x

satisfies triangle (expand x + y , and use C-S)
...

The induced norm is the Frobenius Norm, X
o

F


...
A seminorm might not satisfy (b)
...
x


p

1/ p

i

¥

The dual norm of a norm


...


is

z

*

{

}

= supx z ⋅ x : x £ 1
...
Note that x ⋅ y £ x y


For p Î (1, ¥) , the dual norm of


o

**

=

p

is

q

, where

1
p

+ q1 = 1
...


Open/closed sets

{

} is a neighborhood of x (“open ball”)



N r (x ) = y Î X : x - y < r



x Î int  if $r s
...
N r (x ) Ì 
...




x Î cl  if for every N r (x ) , N r (x ) Ç  ¹ Æ
...
The intersection of a finite number of
open sets is open
...
The union of a
finite number of closed sets is closed
...




Theorem: f -1 ( ) = {x Î dom f : f (x ) Î  }
...

o

 Ì n is (sequentially) compact if for every sequence
subsequence

{x }
ki

{x } Í 
k

there exists a

converging to an element x Î 
...



Theorem (Heine-Borel): in finite dimensional spaces, a set is compact if
and only if it is closed and bounded
...
The intersection
of a sequence of non-empty, nested compact sets is non-empty
...


o

A subspace of a vector space is a subset of the vector space that contains the 0
vector and that satisfies closure under addition and scalar multiplication
...


o

Second-order Taylor expansion: if f is twice-continuously differentiable over
2

N r (x ) , then " d Î N r (0 ) , f (x + d ) = f (x ) + f (x ) d + 12 d 2 f (x )d + o( d )



o

For a vector-value function F, éêF (x )ùú = ¶Fj (x ) / ¶x i
...


Linear algebra
o

The range or image of A Î m´n , (A) is the set of all vectors that can be
written as Ax
...
Note that  (A) Å (A ) = n ; in other words (A) = éê  (A )ùú
...


o

A real symmetric matrix A Î n Í n´n can be factored as A = Q LQ  , where

L = diag(l1 , , ln )
...


n++ is the set of symmetric, positive definite matrix; all their eigenvectors are

positive, and A Î n++ , x ¹ 0  x Ax > 0
...
As such, lmin (A)x x £ x Ax £ lmax (A)x x

with rank r
...
Writing
AA = V SU U SV  = V S2V  we see that these singular values are the square
root of the non-zero eigenvalues of AA , and the right singular vectors V are the
eigenvectors of AA
...
We have
smax (A) = supx ,y ¹0

x Ay
x

y

2

denote cond(A) = A
o

2

2

= supy ¹0
A-1

2

y AAy
y


...
We
2

= smax (A) / smin (A)
...
If A is square and non-singular, then
A† = A-1
...




The optimal value of minx 12 x Px + q ⋅ x + r for P Î  can be expressed

2



n

as - 12 q P †q + r if P  0 and q Î (P ) , and -¥ otherwise
...
If $\det A \ne 0$, consider the matrix $X = \begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix}$. Then $S = C - B^{\top} A^{-1} B$ is the Schur complement of $A$ in $X$.
	Let $A \succ 0$, write $x = \begin{bmatrix} u \\ v \end{bmatrix}$, and consider $\min_u x^{\top} X x = \min_u \big(u^{\top} A u + 2 v^{\top} B^{\top} u + v^{\top} C v\big) = v^{\top} S v$. Thus
	$X \succ 0 \iff A \succ 0$ and $S \succ 0$
	If $A \succ 0$, then $X \succeq 0 \iff S \succeq 0$
	Consider $\begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}$ with $\det A \ne 0$. Eliminate $x$, solve for $y$, and substitute back to find $x$; we get
	$\begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} B^{\top} A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} B^{\top} A^{-1} & S^{-1} \end{bmatrix}$
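A numpy sketch verifying the block-inverse formula on made-up (invertible) data; purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)
B = rng.normal(size=(3, 2))
C = rng.normal(size=(2, 2)) + 3 * np.eye(2)

Ainv = np.linalg.inv(A)
S = C - B.T @ Ainv @ B                     # Schur complement of A
Sinv = np.linalg.inv(S)

block_inv = np.block([[Ainv + Ainv @ B @ Sinv @ B.T @ Ainv, -Ainv @ B @ Sinv],
                      [-Sinv @ B.T @ Ainv,                   Sinv]])
X = np.block([[A, B],
              [B.T, C]])
print(np.allclose(block_inv, np.linalg.inv(X)))   # True
```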

