Title: Linear Algebra for AI/ML
Description: These notes cover the linear algebra needed for AI and ML, from the basics through to more advanced topics.
Document Preview
Extracts from the notes are below.
Notes on Linear Algebra
Peter J. Cameron
...
Abstractly, it is the study of vector spaces over
fields, and their linear maps and bilinear forms
...
So in a course of this kind, it is necessary to touch on
both the abstract and the concrete aspects, though applications are not treated in
detail
...
Vector spaces over a field K are particularly attractive algebraic objects, since each vector space is completely determined by a single number, its
dimension (unlike groups, for example, whose structure is much more complicated)
...
On the practical side, the subject is really about one thing: matrices
...
As this suggests, matrices represent several different kinds of things
...
This gives rise to several equivalence relations on the set of matrices, summarised
in the following table:
Equivalence           Similarity            Congruence            Orthogonal similarity
Same linear map       Same linear map       Same bilinear         Same self-adjoint
α : V → W             α : V → V             form b on V           α : V → V w.r.t.
                                                                  orthonormal basis
A′ = Q−1 AP           A′ = P−1 AP           A′ = P> AP            A′ = P−1 AP = P> AP
P, Q invertible       P invertible          P invertible          P orthogonal
...
We will see several such “canonical form theorems” in the notes
...
The course description reads as follows:
This module is a mixture of abstract theory, with rigorous proofs, and
concrete calculations with matrices
...
e
...
The concrete
applications involve ways to reduce a matrix of some specific type
(such as symmetric or skew-symmetric) to as near diagonal form as
possible
...
Of course, some revision is necessary, and I have tried to make the
notes reasonably self-contained
...
The notes for the prerequisite course, Linear Algebra I, by Dr Francis Wright,
are currently available from
http://centaur. ... .qmul.ac.uk/Lin Alg I/
I have by-and-large kept to the notation of these notes
...
I have included in the appendices some extra-curricular applications of linear algebra, including some special determinants, the method for solving a cubic equation, the proof of the "Friendship Theorem" and the problem of deciding the winner of a football league, as well as some worked examples.
Peter J. Cameron
September 5, 2008
Contents
1 Vector spaces
  1.1 Definitions
  1.2 Bases
  1.3 Row and column vectors
  1.4 Change of basis
  1.5 Subspaces and direct sums
2 Matrices and determinants
  2.1 ...
  2.2 Row and column operations
  2.3 Rank
  2.4 Determinants
  2.5 Calculating determinants
  2.6 The Cayley–Hamilton theorem
3 Linear maps between vector spaces
  3.1 Definition and basic properties
  3.2 Representation by matrices
  3.3 Change of basis
  3.4 Canonical form revisited
4 Linear maps on a vector space
  4.1 Projections and direct sums
  4.2 Linear maps and matrices
  4.3 Eigenvalues and eigenvectors
  4.4 Diagonalisability
  4.5 Characteristic and minimal polynomials
  4.6 Jordan form
  4.7 Trace
5 Linear and quadratic forms
  5.1 Linear forms and dual space
  5.2 Quadratic forms
6 Inner product spaces
  6.1 Inner products and orthonormal bases
  6.2 Adjoints and orthogonal linear maps
7 ...
  7.1 Orthogonal projections and orthogonal decompositions
8 The complex case
9 Skew-symmetric matrices
A Fields and vector spaces
B Vandermonde and circulant matrices
C The Friendship Theorem
D Who is top of the league?
E Other canonical forms
F Worked examples
Chapter 1
Vector spaces
These notes are about linear maps and bilinear forms on vector spaces, how we
represent them by matrices, how we manipulate them, and what we use this for
...
1.1 Definitions
Definition 1
...
If you don’t know what an abelian group is, then you can find it spelled out in
detail in Appendix A
...
I will not stop to prove that these structures really are fields
...
Definition 1.2 A vector space V over a field K is an algebraic system consisting
of a non-empty set V equipped with a binary operation + (vector addition), and
an operation of scalar multiplication
(a, v) ∈ K ×V 7→ av ∈ V
such that the following rules hold:
(VA) (V, +) is an abelian group, with identity element 0 (the zero vector).
(VM1) For any a ∈ K, u, v ∈ V , we have a(u + v) = au + av.
(VM2) For any a, b ∈ K, v ∈ V , we have (a + b)v = av + bv.
(VM3) For any a, b ∈ K, v ∈ V , we have (ab)v = a(bv).
(VM4) For any v ∈ V , we have 1v = v (where 1 is the identity element of K).
Since we have two kinds of elements, namely elements of K and elements of
V , we distinguish them by calling the elements of K scalars and the elements of
V vectors
...
Example 1.1 The Euclidean plane R2 is a real vector space.
...
There are two ways we can make these definitions.
• The geometric definition. Think of a vector as an arrow starting at the origin and ending at a point of the plane (see Figure 1.1). ...
• The algebraic definition. ... Thus, a vector v is just a pair (x, y) of real numbers.
[Figure 1.1]
Let v = (x1 , y1 ) and w = (x2 , y2 )
...
In the algebraic definition, we say that the operations of addition and scalar
multiplication are coordinatewise: this means that we add two vectors coordinate
by coordinate, and similarly for scalar multiplication
...
Example 1.2 Let V = Kn , the set of all n-tuples of elements of K, with addition and scalar multiplication defined coordinatewise:
(a1 , a2 , . . . , an ) + (b1 , b2 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn ),
c(a1 , a2 , . . . , an ) = (ca1 , ca2 , . . . , can ).
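The coordinatewise rules above are exactly what a numerical array library implements. The following small check is a sketch in Python with numpy (not part of the notes); the particular vectors and scalar are arbitrary.

    import numpy as np

    # Two vectors of K^n (here K = R, n = 4) and a scalar, chosen arbitrarily.
    a = np.array([1.0, 2.0, 3.0, 4.0])
    b = np.array([5.0, 6.0, 7.0, 8.0])
    c = 2.5

    # Coordinatewise addition: (a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn)
    assert np.allclose(a + b, [a[i] + b[i] for i in range(4)])

    # Coordinatewise scalar multiplication: c(a1, ..., an) = (c a1, ..., c an)
    assert np.allclose(c * a, [c * a[i] for i in range(4)])
    print(a + b, c * a)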
1.2 Bases
This example is much more general than it appears: Every finite-dimensional vector space looks like Example 1
...
Here’s why
...
Definition 1.3 Let V be a vector space over the field K, and let v1 ,
...
, vn are linearly independent if, whenever we have
scalars c1 , c2 ,
...
(b) The vectors v1 , v2 ,
...
, cn ∈ K such that
v = c1 v1 + c2 v2 + · · · + cn vn
...
, vn i
...
, vn form a basis for V if they are linearly independent
and spanning
...
A list containing
the zero vector is never linearly independent
...
I will say “Let B = (v1 ,
...
, vn is a basis, and to refer to this list as B
...
4 Let V be a vector space over the field K
...
, vn ∈ V which form a basis for V
...
If you study Functional Analysis, Quantum Mechanics, or various other
subjects, you will meet vector spaces which are not finite dimensional
...
1 The following three conditions are equivalent for the vectors
v1 ,
...
, vn is a basis;
(b) v1 ,
...
, vn is a minimal spanning set (that is, if we remove any vector from
the list, then the result is no longer spanning)
...
Theorem 1
...
Suppose
that the vectors v1 ,
...
, wm
are linearly independent, where m > n
...
, vn , wi are linearly independent
...
Lemma 1
...
, xm ) (that
is, x1 ,
...
Proof This is proved by induction on the number of variables
...
, an1 of x1 are all zero, then putting x1 = 1 and the other variables zero
gives a solution
...
By hypothesis, n − 1 < m − 1
...
Computing
the value of x1 gives a solution to the original equations
...
Let us argue for a contradiction, by assuming that the result is false: that is, assume that none of the vectors
wi can be added to the list (v1 ,
...
This means that, for all j, the list (v1 ,
...
So there
are coefficients c1 ,
...
We cannot have d = 0; for this would mean that we had a linear combination of
v1 ,
...
So we can divide the equation through by d, and take wi to the other
side, to obtain (changing notation slightly)
n
wi = a1i v1 + a2i v2 + · · · + ani vn =
∑ a jiv j
...
We do this for each value of i = 1,
...
Now take a non-zero solution to the set of equations (∗) above: that is,
m
∑ a jixi = 0
i=1
for j = 1,
...
Multiplying the formula for wi by xi and adding, we obtain
!
x1 w1 + · · · + xm wm =
n
m
j=1
i=1
∑ ∑ a jixi
v j = 0
...
, wm )
are not linearly dependent, contrary to hypothesis
...
, vn ) to get a linearly
independent set must be wrong, and the proof is complete
...
4 Let V be a finite-dimensional vector space over a field K
...
The number of elements in a basis is called the dimension of the vector space
V
...
We denote the dimension of V by dim(V )
...
(a) Let (v1 ,
...
, wm ) be two bases for V
...
Both lists of vectors are linearly independent; so, according to
the Exchange Lemma, we can add some vector wi to the first list to get a larger
linearly independent list
...
, vn was not a maximal linearly
independent set, and so (by Proposition 1
...
We conclude that m = n, as required
...
, vn ) be linearly independent and let (w1 ,
...
Necessarily n ≤ m, since otherwise we could add one of the vs to (1 ,
...
But now we can
add some ws to (v1 ,
...
Remark We allow the possibility that a vector space has dimension zero
...
Now let V be an n-dimensional vector space over K
...
, vn for V
...
, cn ∈ K
...
, cn are the coordinates of
v (with respect to the given basis), and the coordinate representation of v is the
n-tuple
(c1 , c2 ,
...
Now the coordinate representation is unique
...
, c0n
...
Now the vectors v1 , v2
...
, cn − c0n = 0; that is,
c1 = c01 ,
c2 = c02 ,
...
Now it is easy to check that, when we add two vectors in V , we add their
coordinate representations in Kn (using coordinatewise addition); and when we
multiply a vector v ∈ V by a scalar c, we multiply its coordinate representation
by c
...
This is why we only need to consider vector spaces of the form Kn , as in Example 1
...
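As a concrete illustration of coordinate representations (my own addition, not part of the notes): if the basis vectors are stored as the columns of a matrix, the coordinates of a vector are obtained by solving one linear system. A sketch in Python/numpy, with an arbitrary basis of R^3:

    import numpy as np

    # Columns of B_mat are the basis vectors v1, v2, v3 of R^3 (an arbitrary example basis).
    B_mat = np.array([[1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0],
                      [0.0, 0.0, 1.0]])
    v = np.array([2.0, 3.0, 4.0])

    # [v]_B solves B_mat @ c = v, i.e. v = c1*v1 + c2*v2 + c3*v3.
    c = np.linalg.solve(B_mat, v)
    print("coordinates of v relative to B:", c)

    # Check the defining property of the coordinate representation.
    assert np.allclose(B_mat @ c, v)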
Here is how the result would be stated in the language of abstract algebra:
Theorem 1
...
1.3 Row and column vectors
There are two different ways that we can represent an n-tuple: as a row, or as
a column.
Thus, the vector with components 1, 2 and −3 can be represented as a
row vector
[ 1 2 −3 ]
or as a column vector
1
2
−3
...
But
you will see the notation (1, 2, −3) and the equivalent for columns in other books!)
Both systems are in common use, and you should be familiar with both
...
There are arguments for and against both systems
...
The most powerful argument will appear when we consider representing
linear maps by matrices
...
y
9
Statisticians also prefer column vectors: to a statistician, a vector often represents
data from an experiment, and data are usually recorded in columns on a datasheet
...
So we make a formal definition:
Definition 1
...
, vn )
...
cn
In order to save space on the paper, we often write this as
[v]B = [ c1
The symbol > is read “transpose”
...
vn ]>
...
1.4 Change of basis
The coordinate representation of a vector is always relative to a basis
...
Definition 1
...
, vn ) and B0 = (v01 ,
...
The transitition matrix P from B to B0 is the n × n
matrix whose jth column is the coordinate representation [v0j ]B of the jth vector
of B0 relative to B
...
Proposition 1
...
Then, for any vector v ∈ V , the coordinate representations of v with
respect to B and B0 are related by
[v]B = P [v]B0
...
By definition, we have
v0j
n
= ∑ pi j vi
...
, cn ]> ,
[v]B0 = [d1 ,
...
This means, by definition, that
n
v = ∑ ci vi =
i=1
n
∑ d j v0j
...
Reversing the order of summation, we get
n
v= ∑
i=1
n
∑ pi j d j
!
vi
...
By
the uniqueness of the coordinate representation, they are the same: that is,
n
ci =
∑ pi j d j
...
In matrix form, this says
[ c1 c2 . . . cn ]> = P [ d1 d2 . . . dn ]> .
In this course, we will see four ways in which matrices arise in linear algebra
...
The next corollary summarises how transition matrices behave
...
Given a matrix P, we denote by P−1 the inverse of P, the matrix Q satisfying
PQ = QP = I
...
Corollary 1
...
(a) PB,B = I
...
(c) PB,B00 = PB,B0 PB0 ,B00
...
For example, for (b) we have
[v]B = PB,B0 [v]B0 ,
[v]B0 = PB0 ,B [v]B ,
so
[v]B = PB,B0 PB0 ,B [v]B
...
Corollary 1
...
This follows immediately from (b) of the preceding Corollary
...
r
...
the new
basis in terms of that w
...
t
...
13
1
...
SUBSPACES AND DIRECT SUMS
Example Consider the vector space R2 , with the two bases
B = ( [ 1 0 ]> , [ 0 1 ]> ),   B0 = ( [ 1 1 ]> , [ 2 3 ]> ).
The transition matrix from B to B0 is
P = PB,B0 =
1 2
1 3
and its inverse is
P−1 =
3 −2
−1 1
So the theorem tells us that, for any x, y ∈ R, we have
[ x y ]> = x [ 1 0 ]> + y [ 0 1 ]> = (3x − 2y) [ 1 1 ]> + (−x + y) [ 2 3 ]> ,
as is easily checked.
...
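The example can be checked mechanically. The sketch below (Python/numpy, not part of the notes) builds the transition matrix P whose columns are the B-coordinates of the vectors of B0, and verifies [v]B = P [v]B0 for a sample vector.

    import numpy as np

    # B is the standard basis of R^2; the vectors of B' are (1,1) and (2,3).
    P = np.array([[1.0, 2.0],
                  [1.0, 3.0]])      # columns = B-coordinates of the vectors of B'

    v_B = np.array([5.0, 7.0])              # an arbitrary vector, written in the basis B
    v_Bprime = np.linalg.solve(P, v_B)      # its coordinates in B', i.e. P^{-1} [v]_B

    # [v]_B = P [v]_{B'}
    assert np.allclose(P @ v_Bprime, v_B)

    # The explicit coefficients from the example: 3x - 2y and -x + y.
    x, y = v_B
    assert np.allclose(v_Bprime, [3 * x - 2 * y, -x + y])
    print(v_Bprime)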
1.5 Subspaces and direct sums
Definition 1
...
We write U ≤ V to mean “U is a subspace of V ”
...
Subspaces can be constructed in various ways:
(a) Let v1 ,
...
The span of (v1 ,
...
, cn ∈ K}
...
Moreover, (v1 ,
...
We denote the span of v1 ,
...
, vn i
...
Then
– the intersection U1 ∩ U2 is the set of all vectors belonging to both U1
and U2 ;
– the sum U1 +U2 is the set {u1 + u2 : u1 ∈ U1 , u2 ∈ U2 } of all sums of
vectors from the two subspaces
...
...
Proofs are left
to the reader
...
9 Let V be a vector space over K
...
, vn ∈ V , the dimension of hv1 ,
...
, vn are linearly independent
...
An important special case occurs when U1 ∩ U2 is the zero subspace {0}
...
For suppose that we had
two different expressions for a vector v, say
v = u1 + u2 = u01 + u02 ,
u1 , u01 ∈ U1 , u2 , u02 ∈ U2
...
But u1 − u01 ∈ U1 , and u02 − u2 ∈ U2 ; so this vector is in U1 ∩U2 , and by hypothesis
it is equal to 0, so that u1 = u01 and u2 = u02 ; that is, the two expressions are
not different after all! In this case we say that U1 + U2 is the direct sum of the
subspaces U1 and U2 , and write it as U1 ⊕U2
...
The notion of direct sum extends to more than two summands, but is a little
complicated to describe
...
Definition 1
...
,Ur be subspaces of the vector space V
...
,Ur , and write
V = U1 ⊕
...
, r
...
10 If V = U1 ⊕ · · · ⊕Ur , then
(a) dim(V ) = dim(U1 ) + · · · + dim(Ur );
(b) if Bi is a basis for Ui for i = 1,
...
Chapter 2
Matrices and determinants
You have certainly seen matrices before; indeed, we met some in the first chapter
of the notes
...
Then we define the determinant of
a square matrix axiomatically and prove that it exists (that is, there is a unique
“determinant” function satisfying the rules we lay down), and give some methods
of calculating it and some of its properties
...
2
...
Definition 2.1 A matrix of size m × n over a field K, where m and n are positive
integers, is an array with m rows and n columns, where each entry is an element
of K
...
Example 2
...
Definition 2
...
(a) Let A and B be matrices of the same size m × n over K
...
(b) Let A be an m × n matrix and B an n × p matrix over K
...
element in the ith row of A by the corresponding element in the jth column
of B and summing:
(AB)i j = ∑ Aik Bk j ,  where the sum is over k = 1, . . . , n
...
In particular, for a fixed value of n, we can add and multiply n × n matrices
...
The zero matrix, which we denote by O, is the matrix
with every entry zero, while the identity matrix, which we denote by I, is the matrix with entries 1 on the main diagonal and 0 everywhere else
...
We already met matrix multiplication in Section 1 of the notes: recall that if
PB,B0 denotes the transition matrix between two bases of a vector space, then
PB,B0 PB0 ,B00 = PB,B00
...
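The entry-by-entry rule (AB)ij = Σk Aik Bkj can be spelled out directly. The following sketch (Python/numpy, added here for illustration only) compares a three-loop implementation of the definition with the library product.

    import numpy as np

    def matmul_by_definition(A, B):
        """Compute AB from the rule (AB)_ij = sum_k A_ik * B_kj."""
        m, n = A.shape
        n2, p = B.shape
        assert n == n2, "number of columns of A must equal number of rows of B"
        C = np.zeros((m, p))
        for i in range(m):
            for j in range(p):
                C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
        return C

    A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])      # 2 x 3
    B = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])    # 3 x 2
    assert np.allclose(matmul_by_definition(A, B), A @ B)
    print(matmul_by_definition(A, B))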
2.2 Row and column operations
Given an m × n matrix A over a field K, we define certain operations on A called
row and column operations
...
Definition 2.3 Elementary row operations There are three types:
Type 1 Add a multiple of the jth row to the ith, where j ≠ i.
Type 2 Multiply the ith row by a non-zero scalar.
Type 3 Interchange the ith and jth rows, where j ≠ i.
Elementary column operations There are three types:
Type 1 Add a multiple of the jth column to the ith, where j ≠ i.
Type 2 Multiply the ith column by a non-zero scalar.
Type 3 Interchange the ith and jth columns, where j ≠ i.
By applying these operations, we can reduce any matrix to a particularly simple form:
Theorem 2
...
Then it is possible to
change A into B by elementary row and column operations, where B is a matrix
of the same size satisfying Bii = 1 for 1 ≤ i ≤ r, for some r ≤ min{m, n}, and all other
entries of B are zero
...
Definition 2
...
We can write the canonical form matrix in “block form” as
Ir O
B=
,
O O
where Ir is an r × r identity matrix and O denotes a zero matrix of the appropriate
size (that is, r × (n − r), (m − r) × r, and (m − r) × (n − r) respectively for the three
Os)
...
Proof We outline the proof that the reduction is possible
...
The proof is by induction on the size of the matrix A: in other words, we
assume as inductive hypothesis that any smaller matrix can be reduced as in the
theorem
...
We proceed in steps as follows:
• If A = O (the all-zero matrix), then the conclusion of the theorem holds,
with r = 0; no reduction is required
...
• If A11 6= 0, then skip this step
...
• Now we can assume that A11 6= 0
...
• Now by row and column operations of Type 1, we can assume that all the
other elements in the first row and column are zero
...
Repeat this until all non-zero elements have been removed
...
MATRICES AND DETERMINANTS
• Now let B be the matrix obtained by deleting the first row and column of A
...
The same
sequence of operations applied to A now finish the job
...
Example 2.2 Here is a small example. Let
A =
1 2 3
4 5 6
We have A11 = 1, so we can skip the first three steps. Subtracting twice the first column from the second, and three times the first column from the third, gives
1 0 0
4 −3 −6
Now subtracting four times the first row from the second gives
1 0 0
0 −3 −6
Multiply the second row by −1/3 to get
1 0 0
0 1 2
Finally, subtract twice the second column from the third to obtain
1 0 0
0 1 0
We have finished the reduction, and we conclude that the rank of the original matrix A is equal to 2.
...
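The reduction in the example can be reproduced numerically. The sketch below (Python/numpy, not from the notes) applies the same row and column operations to A and confirms that the result is the canonical form of rank 2; numpy's own rank computation agrees.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    B = A.copy()

    B[:, 1] -= 2 * B[:, 0]   # subtract twice the first column from the second
    B[:, 2] -= 3 * B[:, 0]   # subtract three times the first column from the third
    B[1, :] -= 4 * B[0, :]   # subtract four times the first row from the second
    B[1, :] *= -1.0 / 3.0    # multiply the second row by -1/3
    B[:, 2] -= 2 * B[:, 1]   # subtract twice the second column from the third

    print(B)                                  # [[1, 0, 0], [0, 1, 0]]
    assert np.allclose(B, [[1, 0, 0], [0, 1, 0]])
    assert np.linalg.matrix_rank(A) == 2      # the rank is unchanged by the operations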
For each elementary row operation on an n-rowed matrix A, we define the corresponding elementary matrix by applying the same operation to the n × n identity
matrix I
...
We don’t have to distinguish between rows and columns for our elementary
matrices
...
2
...
For the other types, the matrices for row operations and column
operations are identical
...
2 The effect of an elementary row operation on a matrix is the same as
that of multiplying on the left by the corresponding elementary matrix
...
The proof of this lemma is somewhat tedious calculation
...
3 We continue our previous example
...
(Here 2 × 2
matrices are row operations while 3 × 3 matrices are column operations)
...
−4 1
0 −1/3
0 0 1
0 0 1
0 0 1
So the whole process can be written as a matrix equation:
1 −2 0
1 0 −3
1 0
1
0
1 0
A 0 1 00 1 0 0 1
0 −1/3 −4 1
0 0 1
0 0 1
0 0
0
−2 = B,
1
or more simply
1
4/3
1
−2 = B,
1
1 −2
0
A 0 1
−1/3
0 0
where, as before,
1
A=
4
2
5
3
,
6
1
B=
0
0
1
0
...
For example, the effect of adding twice the
first row to the second is undone by adding −2 times the first row to the second,
so that
−1
1 −2
1 2
=
...
First, one more definition:
20
CHAPTER 2
...
5 The m × n matrices A and B are said to be equivalent if B = PAQ,
where P and Q are invertible matrices of sizes m × m and n × n respectively
...
3 Given any m × n matrix A, there exist invertible matrices P and Q
of sizes m × m and n × n respectively, such that PAQ is in the canonical form for
equivalence
...
When mathematicians talk about a “canonical form” for an equivalence relation, they mean a set of objects which are representatives of the equivalence
classes: that is, every object is equivalent to a unique object in the canonical form
...
This is our job for the next section
...
2.3 Rank
We have the unfinished business of showing that the rank of a matrix is well defined; that is, no matter how we do the row and column reduction, we end up with
the same canonical form
...
Definition 2
...
We say that the column
rank of A is the maximum number of linearly independent columns of A, while
the row rank of A is the maximum number of linearly independent rows of A
...
)
Now we need a sequence of four lemmas
...
4 (a) Elementary column operations don’t change the column rank
of a matrix
...
(c) Elementary column operations don’t change the row rank of a matrix
...
Proof (a) This is clear for Type 3 operations, which just rearrange the vectors
...
So suppose that v1 ,
...
Consider a Type 1 operation involving adding c times the jth column to the ith; the new columns are
v01 ,
...
Suppose that the new vectors are linearly dependent
...
, an , not all zero, such
that
0 = a1 v01 + · · · + an v0n
= a1 v1 + · · · + ai (vi + cv j ) + · · · + a j v j + · · · + an vn
= a1 v1 + · · · + ai vi + · · · + (a j + cai )v j + · · · + an vn
...
, vn are linearly independent, we conclude that
a1 = 0,
...
, a j + cai = 0,
...
So the new
columns are linearly independent
...
(b) It is easily checked that, if an elementary row operation is applied, then the
new vectors satisfy exactly the same linear relations as the old ones (that is, the
same linear combinations are zero)
...
(c) Same as (b), but applied to rows
...
Theorem 2
...
In particular, the rank is independent of the row and column operations
used to compute it
...
These elementary operations don’t change the row or column
rank, by our lemma; so the row ranks of A and B are equal, and their column ranks
are equal
...
So the theorem is proved
...
Let A be an
invertible n × n matrix
...
This means that there are matrices P and Q, each a product of elementary
matrices, such that
PAQ = In
...
MATRICES AND DETERMINANTS
From this we deduce that
A = P−1 In Q−1 = P−1 Q−1 ;
in other words,
Corollary 2
...
In fact, we learn a little bit more
...
So, when we have written A as a product of elementary
matrices, we can choose to regard them as representing column operations, and
we see that A can be obtained from the identity by applying elementary column
operations
...
In other words, the following is
true:
Corollary 2
...
2
...
It has some very important properties: perhaps most important is the fact that a
matrix is invertible if and only if its determinant is not equal to zero
...
For a matrix written out as an array, the determinant is denoted by replacing the square brackets by vertical bars. You have met small determinants before: for a 2 × 2 matrix,
| a b |
| c d | = ad − bc,
and there is a similar (longer) formula for 3 × 3 determinants (with rows a b c, d e f, g h i).
Our first job is to define the determinant for square matrices of any size
...
4
...
7 A function D defined on n × n matrices is a determinant if it satisfies the following three conditions:
(D1) For 1 ≤ i ≤ n, D is a linear function of the ith column: this means that, if A
and A0 are two matrices which agree everywhere except the ith column, and
if A00 is the matrix whose ith column is c times the ith column of A plus c0
times the ith column of A0 , but agreeing with A and A0 everywhere else, then
D(A00 ) = c D(A) + c0 D(A0 )
...
(D3) D(In ) = 1, where In is the n × n identity matrix
...
8 There is a unique determinant function on n × n matrices, for any n
...
(a) If B is obtained from A by adding c times the jth column to the ith, then
D(B) = D(A)
...
(c) If B is obtained from A by interchanging two columns, then D(B) = −D(A)
...
By rule (D2), D(A0 ) = 0
...
Part (b) follows immediately from rule (D3)
...
24
CHAPTER 2
...
The first, third and fourth steps don’t change the value of D, while the second
multiplies it by −1
...
The
overall effect is to multiply D(A) by a certain non-zero scalar c, depending on the
operations
...
• If A is not invertible, then its column rank is less than n
...
Applying axiom (D1), we see that D(A) is a linear combination of values D(A0 ), where A0 are matrices with two equal columns; so
D(A0 ) = 0 for all such A0 , whence D(A) = 0
...
We show its
existence in the next section, by giving a couple of formulae for it
...
The proof of the theorem shows an important corollary:
Corollary 2
...
Proof See the case division at the end of the proof of the theorem
...
Theorem 2
...
Proof Suppose first that B is not invertible
...
Also, AB is not
invertible
...
Then XA is the inverse
of B
...
In the other case, B is invertible, so we can apply a sequence of elementary
column operations to B to get to the identity
...
Now these operations are represented by
elementary matrices; so we see that BQ = I, where Q is a product of elementary
matrices
...
5
...
The determinant is multiplied by the same
factor, so we find that c det(AB) = det(A)
...
Finally, we have defined determinants using columns, but we could have used
rows instead:
Proposition 2
...
The proof of uniqueness is almost identical to that for columns
...
Corollary 2
...
For, if D denotes the “determinant” computed by row operations, then det(A) =
D(A) = det(A> ), since row operations on A correspond to column operations on
A>
...
2.5 Calculating determinants
We now give a couple of formulae for the determinant
...
Definition 2
...
, n} is a bijection from the set {1,
...
The symmetric group Sn consists of all permutations of the set {1,
...
(There are n! such permutations
...
A
transposition is a permutation which interchanges two symbols and leaves all the
others fixed
...
26
CHAPTER 2
...
We need one more fact about signs: if π is any permutation and τ is a transposition, then sign(πτ) = − sign(π), where πτ denotes the composition of π and
τ (apply first τ, then π)
...
Theorem 2.9 Let A be an n × n matrix over K. Then the function
det(A) = ∑ sign(π) A1π(1) A2π(2) · · · Anπ(n)
(the sum over all permutations π ∈ Sn ) satisfies the rules (D1)–(D3).
Proof In order to show that this is a good definition, we need to verify that it
satisfies our three rules (D1)–(D3)
...
Each term, apart
from a sign, is the product of n elements, one from each row and column
...
(D2) Suppose that the ith and jth columns of A are equal
...
Then
π(τ(i)) = π( j) and π(τ( j)) = π(i), whereas π(τ(k)) = π(k) for k 6= i, j
...
But sign(πτ) = − sign(π)
...
The elements of Sn can be divided up
into n!/2 pairs of the form {π, πτ}
...
We conclude that det(A) = 0
...
(D3) If A = In , then the only permutation π which contributes to the sum is the
identity permutation ι: for any other permutation π satisfies π(i) 6= i for
some i, so that Aiπ(i) = 0
...
This gives us a nice mathematical formula for the determinant of a matrix
...
For n of moderate size, this will take a very long time! (For example,
10! = 3628800
...
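The permutation formula can be implemented directly, although (as the text warns) it is only practical for small n. A sketch in Python (itertools and numpy are assumed; neither appears in the notes):

    import itertools
    import numpy as np

    def sign(perm):
        """Sign of a permutation (p(0), ..., p(n-1)), computed by counting inversions."""
        inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                  if perm[i] > perm[j])
        return -1 if inv % 2 else 1

    def det_by_permutations(A):
        """det(A) = sum over pi of sign(pi) * A[0, pi(0)] * ... * A[n-1, pi(n-1)]."""
        n = A.shape[0]
        return sum(sign(p) * np.prod([A[i, p[i]] for i in range(n)])
                   for p in itertools.permutations(range(n)))

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])
    print(det_by_permutations(A), np.linalg.det(A))   # both -3 (up to rounding)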
Definition 2
...
For 1 ≤ i, j ≤ n, we define the (i, j)
minor of A to be the (n − 1) × (n − 1) matrix obtained by deleting the ith row and
jth column of A
The (i, j) cofactor of A is (−1)^{i+j} times the determinant of the (i, j) minor; we denote it by Ki j (A).
(These signs have a chessboard pattern, starting
with sign + in the top left corner
...
Finally, the adjugate of A is the n × n matrix Adj(A) whose (i, j) entry is the ( j, i)
cofactor K ji (A) of A
...
Theorem 2.13
(a) For 1 ≤ j ≤ n, we have
det(A) = ∑ Ai j Ki j (A),  where the sum is over i = 1, . . . , n
(expansion along the jth column).
(b) For 1 ≤ i ≤ n, we have
det(A) = ∑ Ai j Ki j (A),  where the sum is over j = 1, . . . , n
(expansion along the ith row).
This theorem says that, if we take any column or row of A, multiply each
element by the corresponding cofactor, and add the results, we get the determinant
of A
...
Example 2.4 Using a cofactor expansion along the first column, we see that
1 2 3
4 5 6
7 8 10
has determinant
1 · (5 · 10 − 6 · 8) − 4 · (2 · 10 − 3 · 8) + 7 · (2 · 6 − 3 · 5) = 2 + 16 − 21 = −3,
using the standard formula for a 2 × 2 determinant.
...
Let D(A) be the function defined by the right-hand side of (a) in the
theorem, using the jth column of A
...
(D1) It is clear that D(A) is a linear function of the jth column
...
28
CHAPTER 2
...
The harder case is
when the jth column is equal to another, say the kth
...
In the resulting sum, it is easy to see that
each such determinant occurs twice with opposite signs and multiplied by
the same factor
...
(D3) Suppose that A = I
...
So D(I) = 1
...
At first sight, this looks like a simple formula for the determinant, since it is
just the sum of n terms, rather than n! as in the first case
...
Working down the chain we find that this method
is just as labour-intensive as the other one
...
14 For any n × n matrix A, we have
A · Adj(A) = Adj(A) · A = det(A) · I
...
Recall that the (i, j) entry of Adj(A) is
K ji (A)
...
On the other hand, if i 6= j, then the (i, j) entry of the
product is
n
∑ Aik (Adj(A))k j =
k=1
n
∑ Aik K jk (A)
...
(Note
that changing the jth row of a matrix has no effect on the cofactors of elements in
this row
...
But A0 has two equal rows, so its determinant is
zero
...
The proof for the product the other way around is the same, using columns
instead of rows
...
6
...
15 If the n × n matrix A is invertible, then its inverse is equal to
(det(A))−1 Adj(A)
...
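The corollary gives a formula for the inverse that is easy to code for small matrices (and, as the text goes on to say, not an efficient method in general). A sketch in Python/numpy; the helper names are my own and the 3 × 3 matrix is the one used in this chapter's examples.

    import numpy as np

    def minor(A, i, j):
        """Delete row i and column j of A."""
        return np.delete(np.delete(A, i, axis=0), j, axis=1)

    def adjugate(A):
        """Adj(A): the (i, j) entry is the (j, i) cofactor (-1)^(j+i) det(minor(A, j, i))."""
        n = A.shape[0]
        return np.array([[(-1) ** (i + j) * np.linalg.det(minor(A, j, i))
                          for j in range(n)] for i in range(n)])

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])
    inv = adjugate(A) / np.linalg.det(A)
    assert np.allclose(inv, np.linalg.inv(A))
    assert np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3))   # A.Adj(A) = det(A).I
    print(inv)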
Apply elementary operations to the matrix, keeping track of the factor by
which the determinant is multiplied by each operation
...
Often it is
simpler to stop at an earlier stage when you can recognise what the determinant is
...
, an , and all off-diagonal
entries are zero, then det(A) is just the product a1 · · · an
...
Example 2.5 Let
A =
1 2 3
4 5 6
7 8 10
Subtracting twice the first column from the second, and three times the first column from the third, gives
1 0 0
4 −3 −6
7 −6 −11
Now the cofactor expansion along the first row gives
det(A) = (−3)(−11) − (−6)(−6) = 33 − 36 = −3.
...
)
2.6 The Cayley–Hamilton theorem
Since we can add and multiply square matrices, we can substitute them into polynomials.
For example, if
A =
0 1
−2 3
then the result of substituting A into the polynomial x2 − 3x + 2 is
A2 − 3A + 2I =
−2 3      0 −3      2 0      0 0
−6 7   +  6 −9   +  0 2   =  0 0
...
We say that the matrix A satisfies the equation x2 − 3x + 2 = 0
...
)
It turns out that, for every n × n matrix A, we can calculate a polynomial equation of degree n satisfied by A
...
Definition 2.11 Let A be an n × n matrix over K. The characteristic polynomial of A is
cA (x) = det(xI − A).
This is a polynomial in x of degree n. For the matrix A above,
cA (x) = det
x    −1
2   x − 3
= x(x − 3) + 2 = x2 − 3x + 2.
Indeed, it turns out that this is the polynomial we want in general:
Theorem (the Cayley–Hamilton Theorem) Let A be an n × n matrix over K, with characteristic polynomial cA (x). Then cA (A) = O.
...
Example 2.6 Let us just check the theorem for 2 × 2 matrices.
...
Proof We use the theorem
A · Adj(A) = det(A) · I
...
Now it is very tempting just to substitute x = A into this formula: on the
right we have cA (A)I = cA (A), while on the left there is a factor AI − A = O
...
The matrix Adj(xI − A)
is an n × n matrix whose entries are determinants of (n − 1) × (n − 1) matrices
with entries involving x
...
As we have said, Adj(xI − A) is a matrix whose
entries are polynomials, so we can write it as a sum of powers of x times matrices,
that is, as a polynomial whose coefficients are matrices
...
3x − 4 x + 2
0 0
3 1
−4 2
The entries in Adj(xI − A) are (n − 1) × (n − 1) determinants, so the highest
power of x that can arise is xn−1
...
, Bn−1
...
So, if we let
cA (x) = xn + cn−1 xn−1 + · · · + c1 x + c0 ,
then we read off that
−ABn−1
−AB1
−AB0
Bn−1 =
I,
+ Bn−2 = cn−1 I,
···
+ B0 = c1 I,
= c0 I
...
, and the last by A0 = I
...
On the right, we have
An + cn−1 An−1 + · · · + c1 A + c0 I = cA (A)
...
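A quick computational check of the Cayley–Hamilton Theorem, using the 2 × 2 example from this chapter. This is a sketch with sympy, which is not used in the notes; it forms cA(x) = det(xI − A) and substitutes A back in.

    import sympy as sp

    A = sp.Matrix([[0, 1],
                   [-2, 3]])
    x = sp.symbols('x')

    cA = (x * sp.eye(2) - A).det()            # characteristic polynomial det(xI - A)
    print(sp.expand(cA))                       # x**2 - 3*x + 2

    # Substitute A into cA(x): the result is the zero matrix.
    coeffs = sp.Poly(cA, x).all_coeffs()       # [1, -3, 2]
    result = sp.zeros(2, 2)
    power = sp.eye(2)
    for c in reversed(coeffs):                 # c0*I + c1*A + c2*A**2
        result += c * power
        power = power * A
    print(result)                              # Matrix([[0, 0], [0, 0]])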
Chapter 3
Linear maps between vector spaces
...
We will see that these maps can be represented by matrices, decide when
two matrices represent the same linear map, and give another proof of the canonical form for equivalence
...
1
Definition and basic properties
Definition 3
...
A function α from
V to W is a linear map if it preserves addition and scalar multiplication, that is, if
• α(v1 + v2 ) = α(v1 ) + α(v2 ) for all v1 , v2 ∈ V ;
• α(cv) = cα(v) for all v ∈ V and c ∈ K
...
We can combine the two conditions into one as follows:
α(c1 v1 + c2 v2 ) = c1 α(v1 ) + c2 α(v2 )
...
In other literature the term “linear transformation” is often used instead of
“linear map”
...
2 Let α : V → W be a linear map
...
Proposition 3.1 Let α : V → W be a linear map.
...
Proof We have to show that each is closed under addition and scalar multiplication
...
For the kernel, if α(v1 ) = α(v2 ) = 0 then
α(v1 + v2 ) = α(v1 ) + α(v2 ) = 0 + 0 = 0,
and if α(v) = 0 then
α(cv) = cα(v) = c0 = 0
...
3 We define the rank of α to be ρ(α) = dim(Im(α)) and the nullity
of α to be ν(α) = dim(Ker(α))
...
2 (Rank–Nullity Theorem) Let α : V → W be a linear map
...
Proof Choose a basis u1 , u2 ,
...
The vectors u1 ,
...
, uq , v1 ,
...
We claim that the vectors α(v1 ),
...
We have
to show that they are linearly independent and spanning
...
Then α(c1 v1 +
· · · + cs vs ) = 0, so that c1 v1 + · · · + cs vs ∈ Ker(α)
...
But the us and vs form a basis for V , so they are linearly independent
...
The fact that c1 = · · · =
cs = 0 shows that the vectors α(v1 ,
...
35
3
...
REPRESENTATION BY MATRICES
Spanning: Take any vector in Im(α), say w
...
Write v in terms of the basis for V :
v = a1 u1 + · · · + aq uq + c1 v1 + · · · + cs vs
for some a1 ,
...
, cs
...
So the vectors w1 ,
...
Thus, ρ(α) = dim(Im(α)) = s
...
3
...
Let α : V → W be a linear map, where dim(V ) = m and dim(W ) = n
...
Let e1 ,
...
, fn the standard
basis for V
...
, m, the vector α(ei ) belongs to W , so we can write
it as a linear combination of f1 ,
...
Definition 3
...
, em ) for V and C = ( f1 ,
...
, n
...
Take α(ei ) and write it as a column vector
[ a1i a2i · · · ani ]>
...
So, for example, if m = 3, n = 2, and
α(e1 ) = f1 + f2 ,
α(e2 ) = 2 f1 + 5 f2 ,
α(e3 ) = 3 f1 − f2 ,
36
CHAPTER 3
...
−1
Now the most important thing about this representation is that the action of α
is now easily described:
Proposition 3
...
Choose bases for V and W and
let A be the matrix representing α
...
Proof Let e1 ,
...
, fn for W
...
v =
...
In our example, if v = 2e1 + 3e2 + 4e3 = [ 2 3 4 ]> , then
2
1 2 3
20
α(v) = Av =
3 =
...
Definition 3
...
Define their sum α + β
by the rule
(α + β )(v) = α(v) + β (v)
for all v ∈ V
...
3
...
CHANGE OF BASIS
37
Proposition 3
...
The proof of this is not difficult: just use the definitions
...
6 Let U,V,W be vector spaces over K, and let α : U → V and β :
V → W be linear maps
...
Again it is easily checked that β α is a linear map
...
So β α means “apply α, then β ”
...
5 If α : U → V and β : V → W are linear maps represented by
matrices A and B respectively, then β α is represented by the matrix BA
...
Of course it follows that a linear
map is invertible (as a map; that is, there is an inverse map) if and only if it is
represented by an invertible matrix
...
The significance of all this is that the strange rule for multiplying matrices is
chosen so as to make Proposition 3
...
The definition of multiplication of
linear maps is the natural one (composition), and we could then say: what definition of matrix multiplication should we choose to make the Proposition valid? We
would find that the usual definition was forced upon us
...
3
Change of basis
The matrix representing a linear map depends on the choice of bases we used to
represent it
...
Remember the notion of transition matrix from Chapter 1
...
, vm )
and B0 = (v01 ,
...
Then we have
[v]B = P[v]B0 ,
38
CHAPTER 3
...
The inverse of PB,B0 is PB0 ,B
...
Now let C = (w1 ,
...
, w0n ) be two bases for a space W ,
with transition matrix QC,C0 and inverse QC0 ,C
...
Let α be a linear map from V to W
...
What is the
relation between A and A0 ?
We just do it and see
...
We have
m
v0j = ∑ pi j vi ,
i=1
so
α(v0j ) =
=
=
m
∑
pi j α(vi )
i=1
m m
∑∑
pi j Aki wk
i=1 k=1
m n n
∑ ∑ ∑ pi j Akirlk w0l
...
Proposition 3
...
If P = PB,B0 and Q = PC,C0 are the transition matrices from the
unprimed to the primed bases, then
A0 = Q−1 AP
...
Recall that two matrices A and
B are equivalent if B is obtained from A by multiplying on the left and right by
invertible matrices
...
)
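The relation A0 = Q−1 AP is easy to check numerically. The sketch below (Python/numpy, not part of the notes) picks an arbitrary map α : R^3 → R^2 together with arbitrary invertible transition matrices, and confirms that the two representing matrices act consistently on coordinates.

    import numpy as np

    rng = np.random.default_rng(0)

    A = rng.standard_normal((2, 3))     # matrix of alpha w.r.t. the old bases B (of V) and C (of W)
    P = rng.standard_normal((3, 3))     # transition matrix P_{B,B'} (invertible with probability 1)
    Q = rng.standard_normal((2, 2))     # transition matrix Q_{C,C'}

    A_new = np.linalg.inv(Q) @ A @ P    # matrix of alpha w.r.t. the new bases B', C'

    # Consistency check: [alpha(v)]_C = A [v]_B, [alpha(v)]_{C'} = A' [v]_{B'},
    # where [v]_B = P [v]_{B'} and [w]_C = Q [w]_{C'}.
    v_new = rng.standard_normal(3)          # coordinates of v in the new basis B'
    w_C = A @ (P @ v_new)                   # alpha(v), expressed in the old basis C
    w_C_via_new = Q @ (A_new @ v_new)       # alpha(v) computed via A', then converted to C
    assert np.allclose(w_C, w_C_via_new)
    print("A' = Q^-1 A P verified on a random example")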
Proposition 3
...
This holds because
• transition matrices are always invertible (the inverse of PB,B0 is the matrix
PB0 ,B for the transition in the other direction); and
• any invertible matrix can be regarded as a transition matrix: for, if the n × n
matrix P is invertible, then its rank is n, so its columns are linearly independent, and form a basis B0 for Kn ; and then P = PB,B0 , where B is the
“standard basis”
...
3.4 Canonical form revisited
Now we can give a simpler proof of Theorem 2
...
First, we make the following observation
...
8 Let α : V → W be a linear map of rank r = ρ(α)
...
O O
Proof As in the proof of Theorem 3
...
, us for Ker(α), and
extend to a basis u1 ,
...
, vr for V
...
, α(vr ) is a basis for
Im(α), and so can be extended to a basis α(v1 ),
...
, xt for W
...
, vr , vr+1 = u1 ,
...
, wr = α(vr ), wr+1 = x1 ,
...
We have
α(vi ) =
wi
0
if 1 ≤ i ≤ r,
otherwise;
so the matrix of α relative to these bases is
Ir O
O O
as claimed
...
LINEAR MAPS BETWEEN VECTOR SPACES
We recognise the matrix in the theorem as the canonical form for equivalence
...
8 with Proposition 3
...
9 A matrix of rank r is equivalent to the matrix
Ir O
...
So all our
definitions of rank agree!
The conclusion is that
two matrices are equivalent if and only if they have the same rank
...
Chapter 4
Linear maps on a vector space
In this chapter we consider a linear map α from a vector space V to itself
...
However, this time we have less freedom: instead of
having two bases to choose, there is only one
...
4.1 Projections and direct sums
We begin by looking at a particular type of linear map whose importance will be
clear later on
...
1 The linear map π : V → V is a projection if π 2 = π (where, as
usual, π 2 is defined by π 2 (v) = π(π(v)))
...
1 If π : V → V is a projection, then V = Im(π) ⊕ Ker(π)
...
We
claim that v − w ∈ Ker(π)
...
Now v = w + (v − w) is the sum of a vector in Im(π) and one
in Ker(π)
...
Then v = π(w) for some vector
w; and
0 = π(v) = π(π(w)) = π 2 (w) = π(w) = v,
as required (the first equality holding because v ∈ Ker(π))
...
LINEAR MAPS ON A VECTOR SPACE
It goes the other way too: if V = U ⊕W , then there is a projection π : V → V
with Im(π) = U and Ker(π) = W
...
Now
the assertions are clear
...
1 shows geometrically what a projection is
...
Im(π)
v
π(v)
Ker(π)
Figure 4
...
First, notice that
if π is a projection and π 0 = I − π (where I is the identity map, satisfying I(v) = v
for all vectors v), then π 0 is also a projection, since
(π 0 )2 = (I − π)2 = I − 2π + π 2 = I − 2π + π = I − π = π 0 ;
and π + π 0 = I; also ππ 0 = π(I − π) = π − π 2 = O
...
In this form the result extends:
Proposition 4
...
, πr are projections on V satisfying
(a) π1 + π2 + · · · + πr = I, where I is the identity transformation;
(b) πi π j = O for i 6= j
...
Proof We have to show that any vector v can be uniquely written in the form
v = u1 + u2 + · · · + ur , where ui ∈ Ui for i = 1,
...
We have
v = I(v) = π1 (v) + π2 (v) + · · · + πr (v) = u1 + u2 + · · · + ur ,
where ui = πi (v) ∈ Im(πi ) for i = 1,
...
So any vector can be written in this
form
...
, r
...
On the other hand, for j 6= i, we have
πi (u0j ) = πi π j (v j ) = 0,
since πi π j = O
...
So the only possible expression
is given by ui = πi (v), and the proof is complete
...
, πr
satisfying the conditions of the above Proposition
...
, r; then we define πi (v) = ui
...
4
...
If we choose a basis v1 ,
...
j=1
Then just as in the last section, the action of α on V is represented by the action of
A on Kn : α(v) is represented by the product Av
...
What happens if we change the basis? This also follows from the formula we
worked out in the last chapter
...
44
CHAPTER 4
...
3 Let α be a linear map on V which is represented by the matrix A
relative to a basis B, and by the matrix A0 relative to a basis B0
...
Then
A0 = P−1 AP
...
6, since P and Q are the same here
...
2 Two n × n matrices A and B are said to be similar if B = P−1 AP
for some invertible matrix P
...
There is no simple canonical form for similarity like the one for equivalence
that we met earlier
...
In the final section we give without proof a
general result for the complex numbers
...
4.3 Eigenvalues and eigenvectors
Definition 4
...
A vector v ∈ V is said to be an
eigenvector of α, with eigenvalue λ ∈ K, if v 6= 0 and α(v) = λ v
...
Note that we require that v 6= 0; otherwise the zero vector would be an eigenvector for any value of λ
...
The name eigenvalue is a mixture of German and English; it means “characteristic value” or “proper value” (here “proper” is used in the sense of “property”)
...
Here “latent” means “hidden”:
the idea is that the eigenvalue is somehow hidden in a matrix representing α, and
we have to extract it by some procedure
...
Example Let
A =
−6  6
−12 11
Then the vector v = [ 3 4 ]> is an eigenvector with eigenvalue 2, since Av = [ 6 8 ]> = 2v. Similarly, the vector w = [ 2 3 ]> is an eigenvector with eigenvalue 3.
In the next-but-one section, we will see how to find the eigenvalues, and the fact
that there cannot be more than n of them for an n × n matrix
...
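For the 2 × 2 example above, the eigenvalues and eigenvectors can also be found numerically. A sketch with numpy (not part of the notes); numpy scales eigenvectors to unit length, so they agree with the ones above only up to a scalar multiple.

    import numpy as np

    A = np.array([[-6.0, 6.0],
                  [-12.0, 11.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)            # 2.0 and 3.0 (in some order)

    # Check the defining equation A v = lambda v for each returned eigenvector.
    for lam, v in zip(eigenvalues, eigenvectors.T):
        assert np.allclose(A @ v, lam * v)

    # The eigenvectors are proportional to [3, 4] and [2, 3] respectively.
    assert np.allclose(A @ np.array([3.0, 4.0]), 2 * np.array([3.0, 4.0]))
    assert np.allclose(A @ np.array([2.0, 3.0]), 3 * np.array([2.0, 3.0]))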
4.4 Diagonalisability
Some linear maps have a particularly simple representation by matrices
...
4 The linear map α on V is diagonalisable if there is a basis of V
relative to which the matrix representing α is a diagonal matrix
...
, vn is such a basis showing that α is diagonalisable
...
, n, where aii is the ith diagonal entry of the diagonal
matrix A
...
Conversely, if we have a basis
of eigenvectors, then the matrix representing α is diagonal
...
4 The linear map α on V is diagonalisable if and only if there is a
basis of V consisting of eigenvectors of α
...
LINEAR MAPS ON A VECTOR SPACE
1 2
Example The matrix
is not diagonalisable
...
So we
cannot find a basis of eigenvectors
...
5 Let α : V → V be a linear map
...
, λr are the distinct eigenvalues of α,
and π1 ,
...
Proof Let λ1 ,
...
, vimi be a
basis for the λi -eigenspace of α
...
So (a) and (b) are equivalent
...
Proposition 4
...
, πr satisfying the conditions of (c) where Im(πi ) is the λi eigenspace
...
So (b) implies (c)
...
So (c) implies
(b), and we are done
...
Indeed, we see that
−6  6    3 2      3 2    2 0
−12 11   4 3   =  4 3    0 3 ,
so that P−1 AP is diagonal, where P is the matrix whose columns are the eigenvectors of A
...
The corresponding projection matrices are
P1 =
9  −6
12 −8
and P2 =
−8  6
−12 9 .
Check directly that P12 = P1 , P22 = P2 , P1 P2 = P2 P1 = 0, P1 +P2 = I, and 2P1 +3P2 =
A
...
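These identities can be checked by machine. The sketch below (Python/numpy, added for illustration) diagonalises A with P built from the eigenvectors, and verifies P1² = P1, P2² = P2, P1P2 = P2P1 = O, P1 + P2 = I and 2P1 + 3P2 = A for the matrices P1, P2 given above.

    import numpy as np

    A = np.array([[-6.0, 6.0],
                  [-12.0, 11.0]])
    P = np.array([[3.0, 2.0],
                  [4.0, 3.0]])          # columns are eigenvectors for eigenvalues 2 and 3

    D = np.linalg.inv(P) @ A @ P
    assert np.allclose(D, np.diag([2.0, 3.0]))

    P1 = np.array([[9.0, -6.0], [12.0, -8.0]])
    P2 = np.array([[-8.0, 6.0], [-12.0, 9.0]])
    assert np.allclose(P1 @ P1, P1) and np.allclose(P2 @ P2, P2)
    assert np.allclose(P1 @ P2, np.zeros((2, 2))) and np.allclose(P2 @ P1, np.zeros((2, 2)))
    assert np.allclose(P1 + P2, np.eye(2))
    assert np.allclose(2 * P1 + 3 * P2, A)
    print("all projection identities hold")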
Proposition 4
...
5, that is, ∑ri=1 Pi = I and Pi Pj = O for i 6= j
...
i=1
Proof (a) The proof is by induction on m, the case m = 1 being the given expression
...
Then
Ak = Ak−1 A
r
=
∑ λik−1Pi
!
r
∑ λiPi
i=1
!
i=1
When we multiply out this product, all the terms Pi Pj are zero for i 6= j, and we
obtain simply ∑ri=1 λik−1 λi Pi , as required
...
(b) If f (x) = ∑ am xm , we obtain the result by multiplying the equation of part
(a) by am and summing over m
...
)
4.5 Characteristic and minimal polynomials
We defined the determinant of a square matrix A
...
The obvious way to do this is to take the determinant
of any matrix representing α
...
But, if A0 = P−1 AP, then
det(P−1 AP) = det(P−1 ) det(A) det(P) = det(A),
since det(P−1 ) det(P) = 1
...
5 (a) The determinant det(α) of a linear map α : V → V is the
determinant of any matrix representing T
...
(c) The minimal polynomial mα (x) of a linear map α : V → V is the monic
polynomial of smallest degree which is satisfied by α
...
But the third part also creates a bit of a problem:
how do we know that α satisfies any polynomial? The Cayley–Hamilton Theorem
tells us that cA (A) = O for any matrix A representing α
...
Indeed, the Cayley–Hamilton
Theorem can be stated in the following form:
Proposition 4
...
Proof Suppose not; then we can divide cα (x) by mα (x), getting a quotient q(x)
and non-zero remainder r(x); that is,
cα (x) = mα (x)q(x) + r(x)
...
But the degree of r is less than the degree of mα , so this contradicts the definition of mα as the polynomial of least degree satisfied by α
...
8 Let α be a linear map on V
...
4
...
CHARACTERISTIC AND MINIMAL POLYNOMIALS
49
Remark: This gives us a recipe to find the eigenvalues of α: take a matrix A
representing α; write down its characteristic polynomial cA (x) = det(xI − A); and
find the roots of this polynomial
...
For example, for the matrix
A =
0.9 0.3
0.1 0.7
we have
cA (x) = det
x − 0.9   −0.3
−0.1    x − 0.7
= (x − 0.9)(x − 0.7) − 0.03 = x2 − 1.6x + 0.6 = (x − 1)(x − 0.6),
so the eigenvalues are 1 and 0.6, as we found.
...
Then
λ I − α is not invertible, so its kernel is non-zero
...
Then (λ I − α)v = 0, so that α(v) = λ v; that is, λ is an eigenvalue
of α
...
Then (x − λ ) divides
mα (x)
...
(a) implies (c): Let λ be an eigenvalue of A with eigenvector v
...
By induction, α k (v) = λ k v for any k, and so f (α)(v) = f (λ )(v)
for any polynomial f
...
Using this result, we can give a necessary and sufficient condition for α to be
diagonalisable
...
Lemma 4
...
, vr be eigenvectors of α with distinct eigenvalues λ1 ,
...
Then v1 ,
...
Proof Suppose that v1 ,
...
Some of these coefficients may be zero; choose a
relation with the smallest number of non-zero coefficients
...
(If c1 = 0 just re-number
...
Subtracting λ1 times the first equation from the second, we get
c2 (λ2 − λ1 )v2 + · · · + cr (λr − λ1 )vr = 0
...
So the coefficients in
50
CHAPTER 4
...
That is, ci (λi − λ1 ) = 0, so ci = 0 (since λi 6= λ1 ),
for i = 2,
...
This doesn’t leave much of the original equation, only c1 v1 = 0,
from which we conclude that c1 = 0, contrary to our assumption
...
Theorem 4
...
Proof Suppose first that α is diagonalisable, with eigenvalues λ1 ,
...
Then
there is a basis such that α is represented by a diagonal matrix D whose diagonal
entries are the eigenvalues
...
, r
...
Then all the diagonal entries of f (D) are zero; so f (D) = 0
...
We know that each λi is a root of mα (x), so that f (x) divides mα (x); and we also
know that f (α) = 0, so that the degree of f cannot be smaller than that of mα
...
Conversely, we have to show that if mα is a product of distinct linear factors
then α is diagonalisable
...
Let f (x) =
∏(x − λi ) be the minimal polynomial of α, with the roots λi all distinct
...
Then the polynomials h1 ,
...
Now
the Euclidean algorithm shows that we can write the h
...
f
...
i=1
Let Ui = Im(hi (α)
...
Moreover every vector can be written as a sum of vectors
from the subspaces Ui
...
The fact that the expression is unique follows
from the lemma, since the eigenvectors are linearly independent
...
6
...
The
matrix equation can be rewritten as AP = PD, from which we see that the columns
of P are the eigenvectors of A
...
Then P−1 AP = D
...
For example, if the
characteristic polynomial is (x−1)2 (x−2)3 , then the minimal polynomial must be
one of (x − 1)(x − 2) (this would correspond to the matrix being diagonalisable),
(x−1)2 (x−2), (x−1)(x−2)2 , (x−1)2 (x−2)2 , (x−1)(x−2)3 or (x−1)2 (x−2)3
...
1 2
For example, the characteristic polynomial of A =
is (x − 1)2 ; its
0 1
minimal polynomial is not (x − 1) (since A 6= I); so it is (x − 1)2
...
4.6 Jordan form
We finish this chapter by stating without proof a canonical form for matrices over
the complex numbers under similarity
...
6
(a) A Jordan block J(n, λ ) is a matrix of the form
λ 1 0 ··· 0
0 λ 1 ··· 0
,
···
0 0 0 ··· λ
that is, it is an n × n matrix with λ on the main diagonal, 1 in positions
immediately above the main diagonal, and 0 elsewhere
...
)
(b) A matrix is in Jordan form if it can be written in block form with Jordan
blocks on the diagonal and zeros elsewhere
...
11 Over C, any matrix is similar to a matrix in Jordan form; that
is, any linear map can be represented by a matrix in Jordan form relative to a
suitable basis
...
52
CHAPTER 4
...
Example
Any 3 × 3 matrix over C is similar to one of
λ 0 0
λ 1 0
λ 1
0 µ 0 , 0 λ 0 , 0 λ
0 0 ν
0 0 µ
0 0
0
1 ,
λ
for some λ , µ, ν ∈ C (not necessarily distinct)
...
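Jordan forms can be computed symbolically. A sketch with sympy (not used in the notes); the 3 × 3 matrix below is an arbitrary example of my own with a repeated eigenvalue, chosen so that the Jordan form has a non-trivial block.

    import sympy as sp

    # Eigenvalue 2 with algebraic multiplicity 3 but only two independent eigenvectors,
    # so the Jordan form consists of the blocks J(2, 2) and J(1, 2).
    A = sp.Matrix([[3, 1, 0],
                   [-1, 1, 0],
                   [0, 0, 2]])

    P, J = A.jordan_form()        # sympy returns P, J with A = P J P^{-1}
    print(J)
    assert P.inv() * A * P == J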
Its characteristic polyno−b a
mial is x2 − 2ax + (a2 + b2 ), so that the eigenvalues over C are a + bi and a − bi
...
But over the real numbers, A has no eigenvalues and no eigenvectors; it is not
diagonalisable, and cannot be put into Jordan form either
...
We can
always get around this by working in a larger field (as above, enlarge the
field from R to C)
...
This problem cannot be transformed away by enlarging the field; we
are stuck with what we have
...
4.7 Trace
Definition 4.7 The trace Tr(A) of a square matrix A is the sum of its diagonal entries.
Proposition 4
...
(a) For any two n × n matrices A and B, we have Tr(AB) = Tr(BA).
(b) Similar matrices have the same trace
...
Now obviously Tr(BA) is the same thing
...
The second part of this proposition shows that, if α : V → V is a linear map,
then any two matrices representing α have the same trace; so, as we did for the
determinant, we can define the trace Tr(α) of α to be the trace of any matrix
representing α
...
Proposition 4
...
(a) The coefficient of xn−1 is − Tr(α), and the constant term is (−1)n det(α)
...
Proof Let A be a matrix representing α
...
−a1n
...
...
x − ann
The only way to obtain a term in xn−1 in the determinant is from the product
(x − a11 )(x − a22 ) · · · (x − ann ) of diagonal entries, taking −aii from the ith factor
and x from each of the others
...
) So the
coefficient of xn−1 is minus the sum of the diagonal terms
...
If α is diagonalisable then the eigenvalues are the roots of cα (x):
cα (x) = (x − λ1 )(x − λ2 ) · · · (x − λn )
...
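Both statements are easy to test numerically: Tr(AB) = Tr(BA), and in the characteristic polynomial the coefficient of x^(n−1) is −Tr(α) while the constant term is (−1)^n det(α). A sketch in Python/numpy on random matrices (not part of the notes):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    # (a) Tr(AB) = Tr(BA)
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))

    # (b) Coefficients of the characteristic polynomial det(xI - A).
    coeffs = np.poly(A)          # [1, c_{n-1}, ..., c_0]
    n = A.shape[0]
    assert np.isclose(coeffs[1], -np.trace(A))                    # coefficient of x^{n-1}
    assert np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A))   # constant term
    print("trace and determinant read off the characteristic polynomial")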
Chapter 5
Linear and quadratic forms
...
The linear forms comprise the dual
space of V ; we look at this and define dual bases and the adjoint of a linear map
(corresponding to the transpose of a matrix)
...
We show that we can change
the basis to put any quadratic form into “diagonal form” (with squared terms only),
by a process generalising “completing the square” in elementary algebra, and that
further reductions are possible over the real and complex numbers
...
5.1 Linear forms and dual space
The definition is simple:
Definition 5
...
A linear form on V is a linear map
from V to K, where K is regarded as a 1-dimensional vector space over K: that is,
it is a function from V to K satisfying
f (v1 + v2 ) = f (v1 ) + f (v2 ),
f (cv) = c f (v)
for all v1 , v2 , v ∈ V and c ∈ K
...
If f = [ a1 a2
...
xn ]> we have
x1
x2
f (v) = [ a1 a2
...
= a1 x1 + a2 x2 + · · · + an xn
...
LINEAR AND QUADRATIC FORMS
Conversely, any row vector of length n represents a linear form on Kn
...
2 Linear forms can be added and multiplied by scalars in the obvious
way:
( f1 + f2 )(v) = f1 (v) + f2 (v),
(c f )(v) = c f (v)
...
Not surprisingly, we have:
Proposition 5
...
Proof We begin by observing that, if (v1 ,
...
, an
are any scalars whatsoever, then there is a unique linear map f with the property
that f (vi ) = ai for i = 1,
...
It is given by
f (c1 v1 + · · · + cn vn ) = a1 c1 + · · · + an cn ,
in other words, it is represented by the row vector [ a1 a2
action on Kn is by matrix multiplication as we saw earlier
...
...
, fn ) form a basis for V ∗ ; indeed, the linear form f defined in the
preceding paragraph is a1 f1 + · · · + an fn
...
Since it has n elements, we see that
dim(V ∗ ) = n = dim(V )
...
Definition 5
...
, n} is defined by the rule
that
1 if i = j,
δi j =
0 if i 6= j
...
Now, if (v1 ,
...
, fn ) satisfying
fi (v j ) = δi j
...
For example,
n
∑ δi j ai = a j
i=1
for fixed j ∈ {1,
...
This is because all terms of the sum except the term i = j
are zero
...
1
...
1
...
4 Let α : V → W be a linear map
...
The map α ∗ is called the adjoint of α
...
We are given α : V → W and asked to
define α ∗ : W ∗ → V ∗
...
This linear form must act on
vectors v ∈ V to produce scalars
...
Now α ∗ , being a linear map, is represented by a matrix when we choose bases
for W ∗ and V ∗
...
What is the matrix? Some calculation
shows the following, which will not be proved in detail here
...
2 Let α : V → W be a linear map
...
Let B∗ and C∗
denote the dual bases of V ∗ and W ∗ corresponding to B and C
...
5
...
2
Change of basis
Suppose that we change bases in V from B = (v1 ,
...
, v0n ), with
change of basis matrix P = PB,B0
...
, fn ) is the dual basis of B, and (B0 )∗ = ( f10 ,
...
Proposition 5
...
Then
−1
>
PB∗ ,(B0 )∗ = PB,B
...
If P = PB,B0 has (i, j)
entry pi j , and Q = PB∗ ,(B0 )∗ has (i, j) entry qi j , we have
v0i =
f j0 =
n
∑ pkivk ,
k=1
n
∑ ql j fl ,
l=1
58
CHAPTER 5
...
k=1
Now qk j is the ( j, k) entry of Q> , and so we have
I = Q> P,
whence Q> = P−1 , so that Q = (P−1 )> = (P> )−1 .
5.2 Quadratic forms
A lot of applications of mathematics involve dealing with quadratic forms: you
meet them in statistics (analysis of variance) and mechanics (energy of rotating
bodies), among other places
...
5
...
1
Quadratic forms
For almost everything in the remainder of this chapter, we assume that
the characteristic of the field K is not equal to 2
...
Of our list of
“standard” fields, this only excludes F2 , the integers mod 2
...
)
A quadratic form as a function which, when written out in coordinates, is a
polynomial in which every term has total degree 2 in the variables
...
We will meet a formal definition of a quadratic form later in the chapter, but
for the moment we take the following
...
2
...
5 A quadratic form in n variables x1 ,
...
In the above representation of a quadratic form, we see that if i 6= j, then the
term in xi x j comes twice, so that the coefficient of xi x j is ai j + a ji
...
That is, to obtain a
term cxi x j , we take ai j = a ji = c/2
...
)
Any quadratic form is thus represented by a symmetric matrix A with (i, j)
entry ai j (that is, a matrix satisfying A = A> )
...
We think of a quadratic form as defined above as being a function from the
vector space Kn to the field K
...
>
q(x1 ,
...
xn
Now if we change the basis for V , we obtain a different representation for the
same function q
...
Thus we have
v> Av = (Pv0 )>A(Pv0 ) = (v0 )> (P> AP)v0 ,
so we have the following:
Proposition 5
...
As for other situations where matrices represented objects on vector spaces,
we make a definition:
Definition 5
...
Proposition 5
...
60
CHAPTER 5
...
We will see that the answer to this question depends
on the field over which we work
...
5
...
2
Reduction of quadratic forms
Even if we cannot find a canonical form for quadratic forms, we can simplify them
very greatly
...
6 Let q be a quadratic form in n variables x1 ,
...
Then by a suitable linear substitution to new
variables y1 ,
...
, cn ∈ K
...
We call a quadratic form which is written
as in the conclusion of the theorem diagonal
...
Now assume that the theorem is true for forms
in n − 1 variables
...
, xn ) = ∑
n
∑ ai j xix j ,
i=1 j=1
where ai j = a ji for i 6= j
...
By a permutation of the variables (which
is certainly a linear substitution), we can assume that a11 6= 0
...
i=2
Then we have
n
a11 y21 = a11 x12 + 2 ∑ a1i x1 xi + q0 (x2 ,
...
, xn
...
So we have
q(x1 ,
...
, xn ),
61
5
...
QUADRATIC FORMS
where q00 is the part of q not containing x1 minus q0
...
, xn ) = ∑ ci y2i ,
i=2
and so we are done (taking c1 = a11 )
...
Now
xi j = 41 (xi + x j )2 − (xi − x j )2 ,
so taking xi0 = 12 (xi + x j ) and x0j = 12 (xi − x j ), we obtain a new form for q which
does contain a non-zero diagonal term
...
Case 3: All ai j are zero
...
Example 5.1 Let q(x, y, z) = x2 + 2xy + 4xz + y2 + 4z2 , with symmetric matrix
A =
1 1 2
1 1 0
2 0 4
We have
(x + y + 2z)2 = x2 + 2xy + 4xz + y2 + 4z2 + 4yz,
and so
q = (x + y + 2z)2 − 4yz
= (x + y + 2z)2 − (y + z)2 + (y − z)2
= u2 + v2 − w2 ,
where u = x + y + 2z, v = y − z, w = y + z. So q is congruent to the diagonal form with matrix A0 having diagonal entries 1, 1, −1.
Can you find an invertible matrix P such that P> AP = A0 ?
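One answer to this question can be checked by machine. The matrix P below is my own choice, obtained by inverting the substitution u = x + y + 2z, v = y − z, w = y + z used in the example; it is not given in the notes. A sketch in Python/numpy:

    import numpy as np

    # Matrix of the quadratic form q = x^2 + 2xy + 4xz + y^2 + 4z^2.
    A = np.array([[1.0, 1.0, 2.0],
                  [1.0, 1.0, 0.0],
                  [2.0, 0.0, 4.0]])

    # Inverting the substitution gives x = u + v/2 - 3w/2, y = v/2 + w/2, z = -v/2 + w/2.
    P = np.array([[1.0,  0.5, -1.5],
                  [0.0,  0.5,  0.5],
                  [0.0, -0.5,  0.5]])

    A_prime = P.T @ A @ P
    print(A_prime)                                   # diag(1, 1, -1)
    assert np.allclose(A_prime, np.diag([1.0, 1.0, -1.0]))
    assert np.linalg.matrix_rank(P) == 3             # P is invertible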
...
But this is still not a “canonical form for congruence”
...
In other words, we can multiply
any αi by any factor which is a perfect square in K
...
Suppose that
α1 ,
...
Putting
√
( αi )xi for 1 ≤ i ≤ r,
yi =
xi
for r + 1 ≤ i ≤ n,
we have
q = y21 + · · · + y2r
...
Over the real numbers R, things are not much worse
...
, αs > 0, αs+1 ,
...
, αn = 0
...
Again, we will see later that s and t don’t depend on how we do the reduction
...
]
5
...
3
Quadratic and bilinear forms
The formal definition of a quadratic form looks a bit different from the version we
gave earlier, though it amounts to the same thing
...
Definition 5
...
We say that b is a bilinear form if it is a linear function of
each variable when the other is kept constant: that is,
b(v, w1 + w2 ) = b(v, w1 ) + b(v, w2 ),
b(v, cw) = cb(v, w),
with two similar equations involving the first variable
...
63
5
...
QUADRATIC FORMS
(b) Let q : V → K be a function
...
Remarks The bilinear form in the second part is symmetric; and the division
by 2 in the definition is permissible because of our assumption that the characteristic of K is not 2
...
Note that the formula b(x, y) = 12 (q(x + y) − q(x) − q(y)) (which is known as
the polarisation formula) says that the bilinear form is determined by the quadratic
term
...
So these are equivalent objects
...
, vn ) is a basis for V ,
then we can represent b by the n × n matrix A whose (i, j) entry is ai j = b(vi , v j )
...
It is easy to see that this is the same as the
matrix representing the quadratic form
...
Let V ∗ be the dual
space of V , and let α : V → V ∗ be a linear map
...
The function
b(v, w) = α(v)(w)
is a bilinear form on V
...
Conversely, a symmetric bilinear form b gives rise to a linear
map α : V → V ∗ satisfying α(v)(w) = α(w)(v), by the rule that α(v) is the linear
map w 7→ b(v, w)
...
Then α is represented by a matrix A relative to the bases B and B∗
...
LINEAR AND QUADRATIC FORMS
Proposition 5
...
Moreover, if corresponding objects of these three types are represented by matrices as described above, then we get the same matrix A in each case
...
Proof Only the last part needs proof
...
So suppose that α : V → V ∗ , and
we change from B to B0 in V with transition matrix P
...
Now go back to the discussion
of linear maps between different vector spaces in Chapter 4
...
Apply this with Q = P> )−1 , so that
Q−1 = P> , and we see that the new matrix is P> AP, as required
...
2
...
Recall that two symmetric matrices A and A0 are congruent if A0 = P> AP for some
invertible matrix P; as we have seen, this is the same as saying that the represent
the same quadratic form relative to different bases
...
8 Any n × n complex symmetric matrix A is congruent to a matrix of
the form
Ir O
O O
for some r
...
Proof We already saw that A is congruent to a matrix of this form
...
65
5
...
QUADRATIC FORMS
The next result is Sylvester's Law of Inertia.

Theorem 5.9 Any n × n real symmetric matrix A is congruent to a matrix of the form
    ( Is    O   O )
    ( O   −It   O )
    ( O     O   O )
for some s, t; moreover, s and t are determined by A.
Proof Again we have seen that A is congruent to a matrix of this form
...
Suppose that two different reductions give the values s,t and s0 ,t 0 respectively,
with s + t = s0 + t 0 = n
...
Now let q be the quadratic form represented by A. Then there are two sets of variables y1, ..., yn and z1, ..., zn, each obtained from the original variables x1, ..., xn of q by an invertible linear change, such that
    q = y1² + ··· + ys² − ys+1² − ··· − ys+t² = z1² + ··· + zs′² − zs′+1² − ··· − zs′+t′².
Suppose, for a contradiction, that s < s′. Consider the equations
    y1 = 0, ..., ys = 0, zs′+1 = 0, ..., zn = 0,
regarded as linear equations in the variables x1, ..., xn. The number of equations is s + (n − s′), which is smaller than n.
According to a lemma from much earlier in the course (we used it in the proof of the Exchange Lemma!), the equations
have a non-zero solution
That is, there are values of x1, ..., xn, not all zero, for which the variables y1, ..., ys and zs′+1, ..., zn are all zero. For this choice, the first expression for q gives
    q = −ys+1² − ··· − ys+t² ≤ 0,
while, since zs′+1 = ··· = zn = 0, the second expression gives
    q = z1² + ··· + zs′² ≥ 0.
So q = 0, which forces z1 = ··· = zs′ = 0 as well. Then all of z1, ..., zn are zero; since the zi are obtained from x1, ..., xn by an invertible linear change, this means that x1 = ··· = xn = 0, contrary to our choice. So we cannot have s < s′; similarly we cannot have s′ < s. So we must have s = s′, as required.
...
We saw that s + t is the rank of A. The number s − t is called the signature of A.
Of course, both the rank and the signature are independent of how we reduce
the matrix (or quadratic form); and if we know the rank and signature, we can
easily recover s and t
...
Let q be a quadratic form in n variables represented by the real symmetric
matrix A
...
We say that q (or A) is
• positive definite if s = n (and t = 0), that is, if q(v) ≥ 0 for all v, with equality
only if v = 0;
• positive semidefinite if t = 0, that is, if q(v) ≥ 0 for all v;
• negative definite if t = n (and s = 0), that is, if q(v) ≤ 0 for all v, with
equality only if v = 0;
• negative semi-definite if s = 0, that is, if q(v) ≤ 0 for all v;
• indefinite if s > 0 and t > 0, that is, if q(v) takes both positive and negative values.

Chapter 6
Inner product spaces

The familiar geometric notions in Euclidean space (length, angle, and so on) can all be derived from a special kind of bilinear form on the space, known as an inner product.
...
One can also define inner products for complex vector spaces, but things are
a bit different: we have to use a form which is not quite bilinear
...
Definition 6.1 An inner product on a real vector space V is a function b : V × V → R satisfying
R satisfying
• b is bilinear (that is, b is linear in the first variable when the second is kept
constant and vice versa);
• b is positive definite, that is, b(v, v) ≥ 0 for all v ∈ V , and b(v, v) = 0 if and
only if v = 0
...
An inner product is sometimes called a dot
product (because of this notation)
...
|w| cos θ , where |v|
and |w| are the lengths of v and w, and θ is the angle between v and w
...
But
it is much easier to reverse the process
...
INNER PRODUCT SPACES
for any vector v ∈ V; and, if v, w ≠ 0, then we define the angle θ between them by
    cos θ = (v · w) / (|v| |w|).
For this definition to make sense, we need to know that
    −|v| |w| ≤ v · w ≤ |v| |w|
for any vectors v, w (since cos θ lies between −1 and 1).
...
1 If v, w are vectors in an inner product space then
(v · w)2 ≤ (v · v)(w · w)
Proof For any real number x we have (v + xw) · (v + xw) ≥ 0. Expanding, we obtain
x2 (w · w) + 2x(v · w) + (v · v) ≥ 0
...
Since it is non-negative for all real x, either it has
no real roots, or it has two equal real roots; thus its discriminant is non-positive,
that is,
(v · w)2 − (v · v)(w · w) ≤ 0,
as required
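As an illustrative aside (not from the notes), the inequality guarantees that the angle formula above always makes sense. A minimal NumPy check for two particular (hypothetical) vectors:

import numpy as np

v = np.array([3., 4., 0.])
w = np.array([1., 1., 1.])
cos_theta = (v @ w) / (np.sqrt(v @ v) * np.sqrt(w @ w))
print(cos_theta)                          # lies in [-1, 1], as the inequality promises
print(np.degrees(np.arccos(cos_theta)))   # the angle between v and w in degrees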
...
Definition 6
...
, vn ) for an inner product space is called orthonormal if vi · v j = δi j (the Kronecker delta) for 1 ≤ i, j ≤ n
...
, vn satisfy vi · v j = δi j , then they are necessarily linearly independent
...
Taking the inner product
of this equation with vi , we find that ci = 0, for all i
...
2 Let · be an inner product on a real vector space V
...
, vn ) for V
...
xn ]> and w = [ y1 y2
...
6
...
INNER PRODUCTS AND ORTHONORMAL BASES
69
Proof This follows from our reduction of quadratic forms in the last chapter
...
q = x12 + · · · + xs2 − xs+1
Now we must have s = n and t = 0
...
Either of these would contradict the positive definiteness of V
...
, xn ) = x12 + · · · + xn2 ,
and by polarisation we find that
b((x1 ,
...
, yn )) = x1 y1 + · · · + xn yn ,
as required
...
Let w1 ,
...
The Gram–Schmidt process works as follows
...
Put v1 = w1 /|w1 |; then
|v1 | = 1, that is, v1 · v1 = 1
...
, n, let w0i = wi − (v1 · wi )v1
...
, n
...
, w0n )
...
So if we end up with vectors
v2 ,
...
, n
...
, n; by what we have said, this holds if i or j is 1 as well
...
3 The inner product on Rn for which the standard basis is orthonormal (that is, the one given in the theorem) is called the standard inner product on
Rn
...
INNER PRODUCT SPACES
Example 6
...
To simplify things, I will write (a1 , a2 , a3 ) instead of [ a1 a2 a3 ]>
...
Now v1 · w2 = 1 and v1 · w3 = 13 , so in the second step we find
w02 = w2 − v1 = ( 23 , 13 , − 23 ),
w03 = w3 − 13 v1 = ( 89 , − 29 , 29 )
...
We have w02 · w02 = 1,
so v2 = w02 = ( 32 , 13 , − 32 )
...
Finally, w003 · w003 = 49 , so v3 = 32 w003 = ( 23 , − 23 , 13 )
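For readers who want to experiment, here is a short NumPy sketch of the Gram–Schmidt process (a standard iterative version of the procedure described above; it is not code from the notes). The starting vectors of Example 6.1 are not fully recoverable from this extract, so the ones below are only illustrative; note that with w1 = (1, 2, 2) the first output vector is (1/3, 2/3, 2/3), which is consistent with the v1 used implicitly in the example.

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for w in vectors:
        for v in basis:
            w = w - (v @ w) * v            # remove the component along each earlier basis vector
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

ws = [np.array([1., 2., 2.]), np.array([1., 1., 0.]), np.array([1., 0., 1.])]  # hypothetical input
V = gram_schmidt(ws)
print(np.round(V, 4))
print(np.round(V @ V.T, 10))               # identity: the rows are orthonormal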
...
6
...
The importance of an inner product is that the
corresponding linear map is a bijection which maps an orthonormal basis of V to
its dual basis in V ∗
...
Now suppose that
(v1 ,
...
Then, if α(vi ) = fi ,
we have fi (v j ) = δi j ; but this is exactly the statement that ( f1 ,
...
, vn )
...
Recall too that we defined the adjoint of α : V → V to be the map α ∗ : V ∗ → V ∗
defined by α ∗ ( f )(v) = f (α(v)), and we showed that the matrix representing α ∗
relative to the dual basis is the transpose of the matrix representing α relative to
the original basis
...
2
...
4 Let V be an inner product space, and α : V → V a linear map
...
Proposition 6
...
Now we define two important classes of linear maps on V
...
5 Let α be a linear map on an inner product space V
...
(b) α is orthogonal if it is invertible and α ∗ = α −1
...
4 If α is represented by a matrix A (relative to an orthonormal
basis), then
(a) α is self-adjoint if and only if A is symmetric;
(b) α is orthogonal if and only if A> A = I
...
6 Two real symmetric matrices are called orthogonally similar if
they represent the same self-adjoint map with respect to different orthonormal
bases
...
5 Two real symmetric matrices A and A0 are orthogonally similar
if and only if there is an orthogonal matrix P such that A0 = P−1 AP = P> AP
...
We see that orthogonal similarity is a
refinement of both similarity and congruence
...
72
CHAPTER 6
...
Theorem 6
...
Proof We have
α(v) · α(w) = v · α ∗ (α(w)),
by the definition of adjoint; so (a) and (b) are equivalent
...
, vn ) is an orthonormal basis, that is, vi · v j = δi j
...
, α(vn ) is an orthonormal basis,
and (c) holds
...
, vn ), so that v · w = ∑ xi yi
...
Corollary 6
...
Proof The columns of the matrix representing α are just the vectors α(v1 ),
...
, vn
...
Alternatively, the condition on columns shows that
A> A = I, where A is the matrix representing α; so α ∗ α = I, and α is orthogonal
...
Chapter 7
Symmetric and Hermitian matrices
We come to one of the most important topics of the course
...
But there is more to be said!
7
...
Definition 7
...
The
orthogonal complement of U is the set of all vectors which are orthogonal to
everything in U:
U ⊥ = {w ∈ V : w · u = 0 for all u ∈ U}
...
1 If V is an inner product space and U a subspace of V , with
dim(V ) = n and dim(U) = r, then U ⊥ is a subspace of V , and dim(U ⊥ ) = n − r
...
Proof Proving that U ⊥ is a subspace is straightforward from the properties of
the inner product
...
The argument for scalar
multiples is similar
...
Then apply the Gram–
Schmidt process to this basis (starting with the elements of the basis for U), to
obtain an orthonormal basis (v1 ,
...
Since the process only modifies vectors
by adding multiples of earlier vectors, the first r vectors in the resulting basis will
form an orthonormal basis for U
...
SYMMETRIC AND HERMITIAN MATRICES
U, and so lie in U ⊥ ; and they are clearly linearly independent
...
, vn ) is the orthonormal basis we constructed
...
, r; so w is a linear combination of the last n − r
basis vectors, which thus form a basis of U ⊥
...
Now the last statement of the proposition follows from the proof, since we
have a basis for V which is a disjoint union of bases for U and U ⊥
...
If
we have projections P1 ,
...
This can be refined
in an inner product space as follows
...
2 Let V be an inner product space
...
Proposition 7
...
Proof We know that V = Ker(π) ⊕ Im(π); we only have to show that these two
subspaces are orthogonal
...
Then
v · w = v · π(u) = π ∗ (v) · u = π(v) · u = 0,
as required
...
3 Let π1 ,
...
Let Ui = Im(πi )
for i = 1,
...
Then
V = U1 ⊕U2 ⊕ · · · ⊕Ur ,
and if ui ∈ Ui and u j ∈ U j , then ui and u j are orthogonal
...
2
...
So take ui and u j as in the
Proposition, say ui = πi (v) and u j = π j (w)
...
A direct sum decomposition satisfying the conditions of the theorem is called
an orthogonal decomposition of V
...
75
7
...
THE SPECTRAL THEOREM
7
...
I emphasise that these two
theorems are the same! Either of them can be referred to as the Spectral Theorem
...
4 If α is a self-adjoint linear map on a real inner product space V ,
then the eigenspaces of α form an orthogonal decomposition of V
...
Moreover, there
exist orthogonal projections π1 ,
...
, λr are the distinct eigenvalues of α
...
5 Let A be a real symmetric matrix
...
In other words, any real symmetric matrix
is orthogonally similar to a diagonal matrix
...
So we concentrate on
the first theorem
...
The proof will be by induction on n = dim(V )
...
So we assume that the theorem holds for (n − 1)-dimensional spaces
...
Choose an orthonormal basis; then α is represented by a real symmetric matrix A
...
(The
so-called “Fundamental Theorem of Algebra” asserts that any polynomial over C
has a root
...
Now we can find a
column vector v ∈ Cn such that Av = λ v
...
If v = [ z1 z2 · · · zn ]> , then we have
λ (|z1 |2 + |z2 |2 + · · · + |zn |2 ) =
=
=
=
=
λ v> v
(Av)> v
v> Av
v> (λ v)
λ (|z1 |2 + |z2 |2 + · · · + |zn |2 ),
so (λ − λ )(|z1 |2 + |z2 |2 + · · · + |zn |2 ) = 0
...
76
CHAPTER 7
...
Let U be the subspace v⊥ = {u ∈ V : v · u = 0}
...
We claim that α : U → U
...
Then
v · α(u) = α ∗ (v) · u = α(v) · u = λ v · u = 0,
where we use the fact that α is self-adjoint
...
So α is a self-adjoint linear map on the (n − 1)-dimensional inner product
space U
...
They are all orthogonal to the unit vector v; so, adding v to the
basis, we get an orthonormal basis for V , and we are done
...
If we require that the eigenvalues
occur in decreasing order down the diagonal, then the result is a true canonical
form: each matrix is orthogonally similar to a unique diagonal matrix with this
property
...
6 If α is self-adjoint, then eigenvectors of α corresponding to distinct eigenvalues are orthogonal
...
If α(v) = λ v
and α(w) = µw, then
λ v · w = α(v) · w = α ∗ (v) · w = v · α(w) = µv · w,
so, if λ 6= µ, then v · w = 0
...
1 Let
    A = ( 10   2   2 )
        (  2  13   4 )
        (  2   4  13 ).
The eigenvalues of A turn out to be 18, 9 and 9.
For the eigenvalue 18, the eigenvectors satisfy
    ( 10   2   2 ) ( x )   ( 18x )
    (  2  13   4 ) ( y ) = ( 18y ),
    (  2   4  13 ) ( z )   ( 18z )
so the eigenvectors are multiples of [ 1 2 2 ]>
...
For the eigenvalue 9, the eigenvectors satisfy
    ( 10   2   2 ) ( x )   ( 9x )
    (  2  13   4 ) ( y ) = ( 9y ),
    (  2   4  13 ) ( z )   ( 9z )
that is, x + 2y + 2z = 0
...
) Thus the eigenspace is 2dimensional
...
This can be done in many different ways: for example, we could choose [ 0  1/√2  −1/√2 ]⊤ and [ −4/(3√2)  1/(3√2)  1/(3√2) ]⊤.
...
We conclude that, if
    P = ( 1/3     0      −4/(3√2) )
        ( 2/3    1/√2     1/(3√2) )
        ( 2/3   −1/√2     1/(3√2) ),
then P is orthogonal, and
    P⊤AP = ( 18  0  0 )
           (  0  9  0 )
           (  0  0  9 ).
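A quick NumPy check of this example (illustrative only, not part of the notes):

import numpy as np

A = np.array([[10., 2., 2.], [2., 13., 4.], [2., 4., 13.]])
s = np.sqrt(2)
P = np.array([[1/3,    0,  -4/(3*s)],
              [2/3,  1/s,   1/(3*s)],
              [2/3, -1/s,   1/(3*s)]])
print(np.round(P.T @ P, 10))       # identity, so P is orthogonal
print(np.round(P.T @ A @ P, 10))   # diag(18, 9, 9)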
...
7.3 Quadratic forms revisited
This gives us a
new look at the reduction of real quadratic forms
...
78
CHAPTER 7
...
7 The rank of a real symmetric matrix is equal to the number of
non-zero eigenvalues, and the signature is the number of positive eigenvalues minus the number of negative eigenvalues (counted according to multiplicity)
...
, λn , say
...
, λs
are positive, λs+1 ,
...
Let D be the diagonal matrix with diagonal entries
    1/√λ1, ..., 1/√λs, 1/√(−λs+1), ..., 1/√(−λs+t), 1, ..., 1.
Then
    (PD)⊤A(PD) = D⊤P⊤APD = ( Is    O   O )
                            ( O   −It   O )
                            ( O     O   O ).
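In computational terms (an aside, not from the notes), the proposition says that the rank and signature of a real symmetric matrix can be read off from its spectrum. A minimal NumPy sketch, with a hypothetical tolerance parameter for deciding which eigenvalues count as zero:

import numpy as np

def rank_and_signature(A, tol=1e-9):
    """Rank and signature of a real symmetric matrix, computed from its eigenvalues."""
    w = np.linalg.eigvalsh(A)
    pos = int(np.sum(w > tol))
    neg = int(np.sum(w < -tol))
    return pos + neg, pos - neg

A = np.array([[10., 2., 2.], [2., 13., 4.], [2., 4., 13.]])
print(rank_and_signature(A))   # (3, 3): all eigenvalues (18, 9, 9) are positive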
7.4 Simultaneous diagonalisation
There are two important theorems which allow us to diagonalise more than one
matrix at the same time
...
Theorem 7
...
Then there exists an invertible matrix P such that P> AP = I and
P> BP is diagonal
...
Proof A is a real symmetric matrix, so there exists an invertible matrix P1 such
that P1> AP1 is in the canonical form for congruence (as in Sylvester’s Law of Inertia)
...
Now consider P1> BP = C
...
Moreover, P2 is orthogonal, so P2> P2 = I
...
Then
P> AP = P2> (P1> AP1 )P2 = P2> IP2 = I,
and
P> BP = P2> (P1> BP1 )P2 = P2>CP2 = D,
as required
...
4
...
Now we have
det(P1> ) det(xA−B) det(P1 ) = det(P1> (xA−B)P1 ) = det(xP1> AP1 −P1> BP1 ) = det(xI −C),
and det(P1> ) = det(P1 ) is non-zero; so the polynomials det(xA−B) and det(xI −C)
are non-zero multiples of each other and so have the same roots
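The proof above translates directly into a computation (this sketch is not from the notes): reduce A to the identity using its Cholesky factorisation, then orthogonally diagonalise the transformed B. The matrices A and B below are hypothetical examples; the diagonal entries produced are the roots of det(xA − B) = 0, as remarked above.

import numpy as np

def simultaneous_diagonalise(A, B):
    """Given A positive definite and B symmetric, return (P, w) with P^T A P = I, P^T B P = diag(w)."""
    L = np.linalg.cholesky(A)      # A = L L^T; L^{-T} plays the role of the first change of basis
    Linv = np.linalg.inv(L)
    C = Linv @ B @ Linv.T          # symmetric matrix representing B after the first change
    w, Q = np.linalg.eigh(C)       # orthogonal Q with Q^T C Q = diag(w)
    return Linv.T @ Q, w

A = np.array([[1., 1.], [1., 2.]])   # positive definite
B = np.array([[1., 2.], [2., 0.]])   # symmetric
P, w = simultaneous_diagonalise(A, B)
print(np.round(P.T @ A @ P, 10))     # identity
print(np.round(P.T @ B @ P, 10))     # diag(w)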
...
If a mechanical system has n coordinates x1 ,
...
, x˙n , and (from general physical principles) is positive definite (zero velocities correspond to minimum energy); near equilibrium, the potential energy is
approximated by a quadratic function of the coordinates x1 ,
...
If we simultaneously diagonalise the matrices of the two quadratic forms, then we can solve n
separate differential equations rather than a complicated system with n variables!
The second theorem can be stated either for linear maps or for matrices
...
9 (a) Let α and β be self-adjoint maps on an inner product space
V , and suppose that αβ = β α
...
(b) Let A and B be real symmetric matrices satisfying AB = BA
...
Proof Statement (b) is just a translation of (a) into matrix terms; so we prove (a)
...
, λr be the distinct eigenvalues of α
...
We claim that β maps Ui to Ui
...
Then
α(β (u)) = β (α(u)) = β (λi u) = λi β (u),
so β (u) is also an eigenvector of α with eigenvalue λi
...
Now β is a self-adjoint linear map on the inner product space Ui , and so by the
spectral theorem again, Ui has an orthonormal basis consisting of eigenvectors of
β
...
Finally, since we have an orthogonal decomposition, putting together all these
bases gives us an orthonormal basis of V consisting of simultaneous eigenvectors
of α and β
...
SYMMETRIC AND HERMITIAN MATRICES
Remark This theorem easily extends to an arbitrary set of real symmetric matrices such that any two commute
...
For an infinite
set, we use the fact that they span a finite-dimensional subspace of the space of
all real symmetric matrices; to diagonalise all the matrices in our set, it suffices to
diagonalise the matrices in a basis
...
However, some changes are required
...
Usually, the proofs are similar to those in the real
case
...
1
Complex inner products
There are no positive definite bilinear forms over the complex numbers; for we
always have (iv) · (iv) = −v · v
...
Definition 8
...
]
denotes complex conjugation
...
As before, we write b(v, w) as v · w
...
(Sometimes we say that b is semilinear (that is,
“ 21 ”-linear) as a function of its first variable, and describe it as a sesquilinear form
81
82
CHAPTER 8
...
A form satisfying (b) is called Hermitian, and one satisfying
(c) is positive definite
...
)
The definition of an orthonormal basis is exactly as in the real case, and the
Gram–Schmidt process allows us to find one with only trivial modifications
...
xn ]> , w = [ y1 · · · yn ]>
...
(Take the complex conjugates of all the entries in A, and then transpose
...
• a map which preserves the inner product (that is, which satisfies α(v) ·
α(w) = v · w, or α ∗ = α −1 ) is represented by a matrix A satisfying (A)> =
A−1 : such a matrix is called unitary
...
2
The complex Spectral Theorem
The spectral theorem for self-adjoint linear maps on complex inner product spaces
is almost identical to the real version
...
The definition of an orthogonal projection is the same: a projection which is
self-adjoint
...
1 If α is a self-adjoint linear map on a complex inner product space
V , then the eigenspaces of α form an orthogonal decomposition of V
...
Moreover, there
exist orthogonal projections π1 ,
...
, λr are the distinct eigenvalues of α
...
3
...
2 Let A be a complex Hermitian matrix
...
There is one special feature of the complex case:
Proposition 8
...
Proof Suppose that α is self-adjoint and α(v) = λ v
...
So (λ − λ )v · v = 0
...
We also have a theorem on simultaneous diagonalisation:
Proposition 8
...
Then there is an orthonormal basis for
V consisting of eigenvectors of both α and β
...
You are invited to formulate the theorem in
terms of commuting Hermitian matrices
...
3
Normal matrices
The fact that the eigenvalues of a complex Hermitian matrix are real leaves open
the possibility of proving a more general version of the spectral theorem
...
In fact,
the converse is also true
...
In other words, a real matrix is orthogonally similar to a diagonal matrix if and
only if it is symmetric
...
What really happens is the following
...
2 (a) Let α be a linear map on a complex inner-product space V
...
84
CHAPTER 8
...
We say that A is normal if it commutes with its conjugate transpose: AĀ⊤ = Ā⊤A.
...
5 (a) Let α be a linear map on a complex inner product space V
...
(b) Let A be an n × n matrix over C
...
Proof As usual, the two forms of the theorem are equivalent
...
If α has an orthonormal basis (v1 ,
...
, n, where λi are eigenvalues
...
Since αα ∗ and α ∗ α agree on the vectors of a basis, they are equal; so α is normal
...
Let
β = 12 (α + α ∗ ),
γ=
1
∗
2i (α − α )
...
The analogy is even closer, since
clearly we have α = β + iγ
...
For
β∗ =
where we use the fact
1
∗
2 (α + α) = β ,
1
γ ∗ = −2i
(α ∗ − α) = γ,
that (cα)∗ = cα ∗
...
For
1 2
(α − αα ∗ + α ∗ α − (α ∗ )2 ) =
4i
1 2
γβ =
(α + αα ∗ − α ∗ α − (α ∗ )2 ) =
4i
(Here we use the fact that αα ∗ = α ∗ α
...
4i
Hence, by the Proposition at the end of the last section, there is an orthonormal
basis B whose vectors are eigenvectors of β and γ, and hence are eigenvectors of
α = β + iγ
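A small numerical illustration (not from the notes): a plane rotation matrix is a normal matrix which is not symmetric, so by the theorem it is diagonalisable over C with orthonormal eigenvectors, even though it has no real eigenvectors.

import numpy as np

t = 0.3
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])          # rotation through angle t
print(np.allclose(A @ A.T, A.T @ A))             # True: A is normal
w, V = np.linalg.eig(A)
print(w)                                          # complex eigenvalues exp(±it)
print(np.allclose(V.conj().T @ V, np.eye(2)))     # True: the eigenvectors are orthonormal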
...
Chapter 9
Skew-symmetric matrices
We spent the last three chapters looking at symmetric matrices; even then we
could only find canonical forms for the real and complex numbers
...
We find a canonical form
for these matrices under congruence which works for any field whatever
...
)
9
...
1 Let V be a vector space over K
...
Proposition 9
...
Proof
0 = b(v + w, v + w) = b(v, v) + b(v, w) + b(w, v) + b(w, w) = b(v, w) + b(w, v)
for any v, w ∈ V , using the definition of an alternating bilinear form
...
85
86
CHAPTER 9
...
2 Let b be an alternating bilinear form on a vector space V
...
, us , w1 ,
...
, zt ) for V such that b(ui , wi ) = 1 and
b(wi , ui ) = −1 for i = 1,
...
Proof If b is identically zero, then simply choose a basis (z1 ,
...
So suppose not
...
Replacing w by
w/c, we have b(u, w) = 1
...
For suppose that cu + dw = 0
...
We take u1 = u and w1 = v as our first two basis vectors
...
We claim that
V = U ⊕W
...
So take a vector v ∈ V , and let x = −b(w, v)u + b(u, v)w
...
Thus v − x ∈ W
...
Now b is an alternating bilinear form on W , and so by induction there is a
basis of the required form for W , say (u2 ,
...
, ws , z1 ,
...
Putting in
u1 and w1 gives the required basis for V
...
2
Skew-symmetric and alternating matrices
A matrix A is skew-symmetric if A> = −A
...
If the
characteristic of the field K is not equal to 2, then any skew-symmetric matrix is
alternating; but if the characteristic is 2, then the extra condition is needed
...
, vn ):
its (i, j) entry is b(vi , v j )
...
3 An alternating bilinear form b on a vector space over K is represented by an alternating matrix; and any alternating matrix represents an alternating bilinear form
...
9
...
SKEW-SYMMETRIC AND ALTERNATING MATRICES
87
Proof This is obvious since if b is alternating then a ji = b(v j , vi ) = −b(vi , v j ) =
−ai j and aii = b(vi , vi ) = 0
...
4 Let A be an alternating matrix (or a skew-symmetric matrix over a
field whose characteristic is not equal to 2)
...
Moreover the number s is half the rank of A, and so is independent
of the choice of P
...
Also, the matrix in the
statement of the theorem is just the matrix representing b relative to the special
basis that we found in the preceding theorem
...
5 (a) The rank of a skew-symmetric matrix (over a field of characteristic not equal to 2) is even
...
Proof (a) The canonical form in the theorem clearly has rank 2s
...
that it is invertible
...
Each of these blocks has determinant 1,
−1 0
and hence so does the whole matrix
...
If the size n of A is odd, then the rank cannot be n (by (a)), and so det(A) = 0
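As a numerical aside (not from the notes), both observations are easy to see with NumPy for a randomly chosen skew-symmetric matrix: the rank comes out even, and for odd size the determinant is (essentially) zero.

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M - M.T                          # a random 5 x 5 skew-symmetric matrix
print(np.linalg.det(A))              # essentially 0, since the size is odd
print(np.linalg.matrix_rank(A))      # 4: the rank is even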
...
For example,
0
a
b
c
−a 0
0 a
d
e
= a f − be + cd
...
)
88
9
...
SKEW-SYMMETRIC MATRICES
Complex skew-Hermitian matrices
What if we play the same variation that led us from real symmetric to complex
Hermitian matrices? That is, we are working in a complex inner product space,
>
and if α is represented by the matrix A, then its adjoint is represented by A , the
conjugate transpose of A
...
So we
make the following definition:
>
Definition 9
...
Actually, things are very much simpler here, because of the following observation:
Proposition 9
...
Proof Try it and see!
Corollary 9
...
Proof This follows immediately from the Proposition preceding
...
5)
...
Appendix A
Fields and vector spaces
Fields
A field is an algebraic structure K in which we can add and multiply elements,
such that the following laws hold:
Addition laws
(FA0) For any a, b ∈ K, there is a unique element a + b ∈ K
...
(FA2) There is an element 0 ∈ K such that a + 0 = 0 + a = a for all a ∈ K
...
(FA4) For any a, b ∈ K, we have a + b = b + a
...
(FM1) For all a, b, c ∈ K, we have a(bc) = (ab)c
...
(FM3) For any a ∈ K with a 6= 0, there exists a−1 ∈ K such that aa−1 =
a−1 a = 1
...
Distributive law
(D) For all a, b, c ∈ K, we have a(b + c) = ab + ac
...
FIELDS AND VECTOR SPACES
Note the similarity of the addition and multiplication laws
...
Then (FM0)–(FM4) say that (K \ {0}, ·)
is also an abelian group
...
)
Examples of fields include Q (the rational numbers), R (the real numbers), C
(the complex numbers), and F p (the integers mod p, for p a prime number)
...
If there is a positive integer n such that 1+1+· · ·+1 = 0,
where there are n ones in the sum, then the smallest such n is prime
...
But in a field, the
product of two non-zero elements is non-zero
...
If no such n exists, we say that the characteristic of K is
zero
...
Vector spaces
Let K be a field
...
(VA1) For all u, v, w ∈ V , we have u + (v + w) = (u + v) + w
...
(VA3) For any v ∈ V , there exists −v ∈ V such that v + (−v) = (−v) + v = 0
...
Scalar multiplication laws
(VM0) For any a ∈ K, v ∈ V , there is a unique element av ∈ V
...
(VM2) For any a, b ∈ K, v ∈ V , we have (a + b)v = av + bv
...
91
(VM4) For any v ∈ V , we have 1v = v (where 1 is the element given by (FM2))
...
The most important example of a vector space over a field K is the set Kn of
all n-tuples of elements of K: the addition and scalar multiplication are defined
by the rules
(u1 , u2 ,
...
, vn ) = (u1 + v1 , u2 + v2 ,
...
, vn ) = (av1 , av2 ,
...
The fact that Kn is a vector space will be assumed here
...
Here is a particularly easy one, the proof of (VM4),
as an example
...
, vn ), then
1v = 1(v1 ,
...
, 1vn ) = (v1 ,
...
The second step uses the definition of scalar multiplication in K n , and the third
step uses the field axiom (FM2)
...
FIELDS AND VECTOR SPACES
Appendix B
Vandermonde and circulant
matrices
The Vandermonde matrix V (a1 , a2 ,
...
1
a1
a2
...
a
a
...
...
n−1
n−1
n−1
a1
a2
...
We can write down its determinant explicitly:
Theorem B
...
, an )) = ∏(a j − ai )
...
From this theorem, we draw the following conclusion:
Corollary B
...
, an ) is invertible if and only if the parameters a1 , a2 ,
...
For the determinant can be zero only if one of the factors vanishes
...
We see that f (ai ) = 0
for 1 ≤ i ≤ n − 1, since the result is the determinant of a matrix with two equal
columns
...
VANDERMONDE AND CIRCULANT MATRICES
where K is independent of x
...
In the same way, all differences (a j − ai ) for i < j are factors,
so that the determinant is K0 times the product of all these differences, where K0
does not contain any of a1 ,
...
To find K0 , we observe that the leading diagonal of the matrix gives us a term
in the determinant with sign +1; but this product is obtained by
a2 a23 · · · an−1
n
taking the term with larger index from each factor in the product, also giving sign
+1
...
Another general type of matrix whose determinant can be calculated explicitly
is the circulant matrix, whose general form is as follows:
a0
a1
a2
...
an−2
C(a0 ,
...
an−3
...
...
a0
Theorem B
...
, an−1 ) be a circulant matrix over the field C
...
Then
(a) C is diagonalisable;
jk
(b) the eigenvalues of C are ∑n−1
j=0 a j ω , for k = 0, 1,
...
Proof We can write down the eigenvectors
...
, n − 1, let
vk = [ 1 ω k
...
The jth entry in Cvk is
an− j + an− j+1 ω k + · · · + an− j−1 ω (n−1)k
= a0 ω jk + · · · + an− j−1 ω (n−1)k + an− j ω nk + · · · + an−1 ω (n+ j−1)k
= ω jk (a0 + a1 ω k + · · · + an−1 ω (n−1)k ),
using the fact that ω n = 1
...
So
Cvk = (a0 + a1 ω k + · · · + an−1 ω (n−1)k )vk ,
as required
...
, vn−1 are linearly independent
...
, ω n−1 ), and the powers of ω are
all distinct; so the first part of this appendix shows that the determinant of this
95
matrix is non-zero, so that the columns are linearly independent
...
Finally, for part (c), the determinant of a diagonalisable matrix is the product
of its eigenvalues
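A short numerical check of parts (b) and (c) (not from the notes; the first row below is a hypothetical example):

import numpy as np

a = np.array([2., 5., 1., 3.])                     # first row (a0, ..., a_{n-1}) of the circulant
n = len(a)
C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])
omega = np.exp(2j * np.pi / n)
predicted = np.array([sum(a[j] * omega**(j * k) for j in range(n)) for k in range(n)])
print(np.sort_complex(np.round(predicted, 8)))
print(np.sort_complex(np.round(np.linalg.eigvals(C), 8)))   # same multiset of eigenvalues
print(np.round(np.prod(predicted).real, 6), np.round(np.linalg.det(C), 6))   # det = product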
...
1 We have the identity
a b c
a3 + b3 + c3 − 3abc = c a b = (a + b + c)(a + ωb + ω 2 c)(a + ω 2 b + ωc),
b c a
where ω = e2πi/3
...
Consider the equation
x3 + ax2 + bx + c = 0
...
Now, as above, we have
y3 − 3uvy + u3 + v3 = (y + u + v)(y + ωu + ω 2 v)(y + ω 2 u + ωv),
so if we can find u and v satisfying −3uv = d and u3 + v3 = e, then the solutions
of the equation are y = −u − v, y = −ωu − ω 2 v, and y = −ω 2 u − ωv
...
Then U +V = e and UV = −d 3 /27
...
Now u is a cube
root of U, and then v = −d/(3u), and we are done
...
96
APPENDIX B
...
The theorem asserts that the configuration must look like this, where we represent people by dots and friendship by edges:
u
A
A
u
u
u
HH A
HHA
u
H
A H
A HH
H
Hu
u
A
A
Au
u
The proof of the theorem is in two parts
...
We argue by contradiction, and so we assume that we
have a counterexample to the theorem
...
[In the terminology of graph theory, this says that we have
a regular graph of valency m
...
For they have one common friend P3 ; any further
97
98
APPENDIX C
...
u
u
u
D
...
A D
A D
A D
AD
u
ADu
u
Now let us suppose that there are two people P and Q who have different
numbers of friends
...
They
have a common friend R
...
Now if S is the friend of P but not Q, and T is the friend of Q but not P,
then any possible choice of the common friend of S and T leads to a contradiction
...
But this means that we don’t have a counterexample after all
...
Step 2: Linear algebra We prove that m = 2
...
, Pn
...
Then by assumption,
A is an n × n symmetric matrix
...
Consider the product AJ
...
So every entry of AJ is m, whence AJ = mJ
...
Thus, A and J are commuting symmetric matrices, and so
by Theorem 7
...
We will calculate their
eigenvalues
...
If j is the column vector with all entries 1, then clearly
J j = n j, so j is an eigenvector of J with eigenvalue n
...
Now v · j = 0 means that the sum of the components of v is
zero; this implies that Jv = 0
...
Now we turn to A, and observe that
A2 = (m − 1)I + J
...
If i = j, this number is m, while if i 6= j then (by assumption)
it is 1
...
The all-one vector j satisfies A j = m j, so is an eigenvector of A with eigenvalue m
...
(Exercise: Prove this by a counting argument in the
graph
...
Thus, if v is an eigenvector of A with eigenvalue λ , not a multiple of j, then
λ 2 v = A2 v = ((m − 1)I + J)v = (m − 1)v,
√
so λ 2 = m − 1, and λ = ± m − 1
...
So if we let f and g
be the multiplicities of m − 1 and − m − 1 as eigenvalues of A, we have
√
√
√
0 = Tr(A) = m + f m − 1 + g(− m − 1) = m + ( f − g) m − 1
...
But the trace equation is 0 = m + ( f − g)u; this
says that 0 ≡ 1 mod u
...
But then m = 2, n = 3, and
we have the Three Musketeers (three individuals, any two being friends)
...
So
the theorem is proved
...
THE FRIENDSHIP THEOREM
Appendix D
Who is top of the league?
In most league competitions, teams are awarded a fixed number of points for a
win or a draw
...
How can we take this into account?
You might think of giving each team a “score” to indicate how strong it is, and
then adding the scores of all the teams beaten by team T to see how well T has
performed
...
So suppose we ask simply that the score of T
should be proportional to the sum of the scores of all the teams beaten by T
...
Let T1 ,
...
Let A be the n × n matrix whose (i, j) entry is equal to 1 if Ti beats Tj, and 0 otherwise.
...
xn ]> of scores, the
ith entry of Ax is equal to the sum of the scores x j for all teams T j beaten by Ti
...
Here is an example
...
Suppose that
A beats B, C, D, E;
B beats C, D, E, F;
C beats D, E, F;
D beats E, F;
E beats F;
F beats A
...
WHO IS TOP OF THE LEAGUE?
The matrix A is
0 1 1 1 1 0
0 0 1 1 1 1
0 0 0 1 1 1
0 0 0 0 1 1
...
Also, E and F have the fewest
wins, but F took A’s scalp and should clearly be better
...
7744
0
...
4307
0
...
1920
0
...
0085
...
But perhaps there is a different eigenvalue and/or eigenvector which would
give us a different result?
In fact, there is a general theorem called the Perron–Frobenius theorem which
gives us conditions for this method to give a unique answer
...
Definition D
...
We
say that A is indecomposable if, for any i, j with 1 ≤ i, j ≤ n, there is a number m
such that the (i, j) entry of Am is strictly positive
...
, Tkm with Tk0 = Ti and Tkm = T j ,
such that each team in the chain beats the next one.
...
In this case, obviously the teams in C occupy
the top places in the league, and we have reduced the problem to ordering these
teams
...
In our example, we see that B beats F beats A, so the (2, 1) entry in A2 is
non-zero
...
So A is indecomposable in this case
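For completeness (an aside, not in the notes), the Perron–Frobenius score vector for this league can be computed directly with NumPy; A comes out on top, and F is ranked above E, as the discussion above suggests it should be.

import numpy as np

# Results matrix of the six-team example: row i has a 1 in column j if team i beat team j.
A = np.array([
    [0, 1, 1, 1, 1, 0],   # A beats B, C, D, E
    [0, 0, 1, 1, 1, 1],   # B beats C, D, E, F
    [0, 0, 0, 1, 1, 1],   # C beats D, E, F
    [0, 0, 0, 0, 1, 1],   # D beats E, F
    [0, 0, 0, 0, 0, 1],   # E beats F
    [1, 0, 0, 0, 0, 0],   # F beats A
], dtype=float)
w, V = np.linalg.eig(A)
k = np.argmax(w.real)                       # the Perron-Frobenius eigenvalue is the largest one
x = np.abs(V[:, k].real)                    # the corresponding eigenvector, with positive entries
print(np.round(x / np.linalg.norm(x), 4))   # the scores: A is top, and F is ranked above E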
...
1 (Perron–Frobenius Theorem) Let A be a n × n real matrix with
all its entries non-negative, and suppose that A is indecomposable
...
xn ]> for A with
the property that xi > 0 for all i
...
So the Perron–Frobenius eigenvector solves the problem of ordering the teams
in the league
...
Suppose, for example, that there are five teams A, B, C, D, E; and suppose
that A beats B and C, B beats C and D, C beats D and E, D beats E and A, and E
beats A and B
...
The matrix A is
0 1 1 0 0
0 0 1 1 0
,
A=
0
0
0
1
1
1 0 0 0 1
1 1 0 0 0
which is easily seen to be indecomposable; and if v is the all-1 vector, then Av =
2v, so that v is the Perron–Frobenius eigenvector
...
In this case, it is clear that there is so much symmetry
between the teams that none can be put above the others by any possible rule
...
For example, instead of just
putting the (i, j) entry equal to 1 if Ti beats T j , we could take it to be the number
of goals by which Ti won the game
...
How does an Internet search
engine like Google find the most important web pages that match a given query?
An important web page is one to which a lot of other web pages link; this can be
described by a matrix, and we can use the Perron–Frobenius eigenvector to do the
ranking
...
WHO IS TOP OF THE LEAGUE?
Appendix E
Other canonical forms
One of the unfortunate things about linear algebra is that there are many types
of equivalence relation on matrices! In this appendix I say a few brief words
about some that we have not seen elsewhere in the course
...
)
Row-equivalence
Two matrices A and B of the same size over K are said to be row-equivalent if
there is an invertible matrix P such that B = PA
...
(This is true because any invertible matrix can be written as a product of
elementary matrices; see Corollary 2
...
)
A matrix A is said to be in echelon form if the following conditions hold:
• The first non-zero entry in any row (if it exists) is equal to 1 (these entries
are called the leading ones);
• The leading ones in rows lower in the matrix occur further to the right
...
For example, the matrix
0 1 a b 0 c
0 0 0 0 1 d
0 0 0 0 0 0
is in reduced echelon form, whatever the values of a,
...
105
106
APPENDIX E
...
1 Any matrix is row-equivalent to a unique matrix in reduced echelon
form
...
As far as I know it does
not have a standard name
...
(See page 16)
...
To see this, use row operations to put the matrix into reduced echelon form, then column permutations to move the columns containing the leading
ones to the front of the matrix
...
It would take us too far afield to explain why this equivalence relation is important in coding theory
...
This is the natural
relation arising from representing a quadratic form relative to different bases
...
In other cases, it is usually much harder to come up with a canonical form
...
I state the result for quadratic
forms
...
2 Let F p be the field of integers mod p, where p is an odd prime
...
A quadratic form q in n variables
over F p can be put into one of the forms
x12 + · · · + xr2 ,
2
+ cxr2
x12 + · · · + xr−1
by an invertible linear change of variables
...
Appendix F
Worked examples
1
...
3
(a) Find a basis for the row space of A
...
(d) Find invertible matrices P and Q such that PAQ is in the canonical form for equivalence
...
0 0 0 0 0
The first two rows clearly form a basis for the row space
...
(c) The column rank is equal to the row rank and so is also equal to 2
...
The first and second columns are not linearly independent, so we cannot
use these! (Note that we have to go back to the original A here; row operations
change the column space, so selecting two independent columns of B would not
be correct
...
WORKED EXAMPLES
(d) By step (a), we have PA = B, where P is obtained by performing the same
elementary row operations on the 3 × 3 identity matrix I3 :
1
0 0
P = 1 −1 0
...
Q=
0
1
0
0
−2
0 0
0 1 0
0 0
0 0 1
Remark: P and Q can also be found by multiplying elementary matrices, if
desired; but the above method is simpler
...
2
...
, Pn
...
During the year, some voters change their minds; a
proportion ai j of former supporters of P j will support Pi at the end
of the year
...
(a) Prove that the vector giving the support for the parties at the end
of the year is Av
...
Show that the vector giving the support for
the parties at the end of m years is Am v
...
9 0
...
0
...
7
Show that, after a long time, the support for the parties will be
approximately 0
...
25 for P2
...
From what we are given, the proportion supporting P j at the beginning
of the year was x j , and a fraction ai j of these changed their support to Pi
...
The total support for Pi is found by adding these
up for all j: that is,
n
yi =
∑ ai j x j ,
j=1
or v0 = Av, where v0 is the vector [ y1
...
(b) Let vk be the column vector whose ith component is the proportion of the
population supporting party Pi after the end of k years
...
An exactly similar argument shows that vk = Avk−1
for any k
...
(The result of (a) starts
the induction with m = 1
...
)
(c) The matrix P has characteristic polynomial
x − 0
...
3
2
−0
...
7 = x − 1
...
6 = (x − 1)(x − 0
...
So the eigenvalues of P are 1 and 0
...
We
find
bysolving
linear equations that
3
1
eigenvectors for the two eigenvalues are
and
respectively
...
75 0
...
25 −0
...
P1 =
−0
...
75
0
...
25
110
APPENDIX F
...
) Then P is diagonalisable:
A = P1 + 0
...
From this and Proposition 4
...
6)m P2
...
So in the limit, if v0 =
is
y
the matrix giving the initial support for the parties, with x + y = 1, then the matrix
giving the final support is approximately
0
...
75 x
0
...
75
=
=
...
25 0
...
25(x + y)
0
...
6)m
As a check, use the computer with Maple to work out Pm for some large value
of m
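The same check can be done in Python (an aside, not part of the notes). The transition matrix below is reconstructed from this example; it is consistent with the characteristic polynomial (x − 1)(x − 0.6) and with the limiting support 0.75, 0.25 found above.

import numpy as np

A = np.array([[0.9, 0.3],
              [0.1, 0.7]])                        # columns sum to 1
print(np.round(np.linalg.matrix_power(A, 50), 6)) # both columns are close to (0.75, 0.25)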
...
7515116544 0
...
0
...
2545349632
3
...
A second basis for V is given by w1 = v1 + v2 + v3 , w2 =
2v1 + v2 + v3 , w3 = 2v2 + v3
...
The first dual basis vector g1 satisfies g1 (w1 ) = 1, g1 (w2 ) = g1 (w3 ) = 0
...
So g1 = − f1 − 2 f2 + 4 f3
...
Alternatively, the transition matrix P from the vs to the ws is
1 2 0
P = 1 1 2,
1 1 1
and we showed in Section 5
...
2 that the transition matrix between the dual bases
is
−1 1
0
(P−1 )> = −2 1
1
...
111
4
...
0 1
Let A be the matrix
...
The equation for Fn is proved by induction on n
...
Suppose that it holds for n; then
Fn−1
Fn
0 1
Fn
Fn−1 + Fn
Fn
Fn+1
n+1
n
A
= A ·A =
=
=
...
To find a formula for Fn , we show that A is diagonalisable, and then write
A = λ1 P1 + λ2 P2 , where P1 and P1 are projection matrices with sum I satisfying
P1 P2 = P2 P1 = 0
...
From here it is just calculation
...
(Since the eigenvalues are
distinct, we know that A is diagonalisable, so the method will work
...
Rather than find P1 explicitly, we can now argue as follows: 1 = F1 = c(λ1 − λ2) = c√5, so that c = 1/√5 and
    Fn = (1/√5) ( ((1 + √5)/2)ⁿ − ((1 − √5)/2)ⁿ ).
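Both formulas are easy to check numerically (an aside, not from the notes):

import numpy as np

A = np.array([[0, 1], [1, 1]])
print(np.linalg.matrix_power(A, 10))        # [[34, 55], [55, 89]] = [[F9, F10], [F10, F11]]
phi, psi = (1 + 5**0.5) / 2, (1 - 5**0.5) / 2
print(round((phi**10 - psi**10) / 5**0.5))  # 55 = F10, agreeing with the closed formula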
...
Let Vn be the vector space of real polynomials of degree at most n
...
112
APPENDIX F
...
(c) Apply the Gram–Schmidt process to the basis (1, x, x2 ) to find
an orthonormal basis for V2
...
Let D : Wn →
Wn be the linear map given by differentiation: (D f )(x) = f 0 (x)
...
(a) Put b( f , g) = 01 f (x)g(x) dx
...
So
we have to show that it is linear in the first variable, that is, that
R
Z 1
0
Z 1
( f1 (x) + f2 (x))g(x) dx =
0
Z 1
Z 1
f1 (x)g(x) dx +
0
f2 (x)g(x) dx,
Z 1
(c f (x))g(x) dx = c
0
f (x)g(x) dx,
0
which are clear from elementary calculus
...
This is clear from properties of
integration
...
6
1
7
(c) The first basis vector is clearly 1
...
To make x2
orthogonal to the two preceding is the same as making it orthogonal to 1 and x, so
we replace it by x2 + bx + c; we find that
1
3
1
4
so that b = −1 and c = 16
...
√ (x − ),
1,
2
6
2 3
6 5
(d) Integration by parts shows that
f · D(g) =
Z 1
1
180 ;
so
f (x)g0 (x) dx
0
=
[ f (x)g(x)]10 −
Z 1
f 0 (x)g(x) dx
0
= −(D f ) · g,
where the first term vanishes because of the condition on polynomials in Wn
...
6
...
Is each of the following
statements true or false? Give brief reasons
...
If A and B are orthogonally similar then they are similar
...
If A and B are similar then they are orthogonally similar
...
Thus it is clear that both (a) and (b) are true
...
If A and B are similar, then
they have the same eigenvalues, and so are orthogonally congruent to the same
diagonal matrix, and so to each other
...
By Sylvester’s Law of Inertia, any real symmetric matrix is congruent to a
diagonal matrix with diagonal entries 1, −1 and 0
...
For example, the matrices I and 2I are congruent but not
orthogonally similar
...
7
...
A=
0 1 0 1
1 1 1 1
1 1 1 1
1 0 1 0
114
APPENDIX F
...
First solution We have to find an orthonormal basis which consists of eigenvectors for both matrices
...
If v1 = (1, 1, 1, 1) then Av1 =
4v1 and Bv1 = 2v1
...
Any
further eigenvector v = (x, y, z, w) should be orthogonal to both of these, that is,
x + y + z + w = 0 = x − y + z − w
...
Conversely, any such
vector satisfies Av = 0 and Bv = 0
...
Normalising,
we obtain√the
√
required basis: (1, 1, 1, 1)/2, (1, −1, 1, −1)/2, (1, 0, −1, 0)/ 2, (0, 1, 0, −1)/ 2
...
P=1
1
− √12
0
2
2
1
1
0
− √12
2 −2
Second solution Observe that both A and B are circulant matrices
...
The second and fourth columns have corresponding eigenvalues 0 for both matrices, and hence so do any linear combinations of them; in particular, we can replace these two columns by their real and
imaginary parts, giving (after a slight rearrangement) the matrix
1 1
1
0
1 −1 0
1
1 1 −1 0
...
The results of Appendix B also allow us to write down the eigenvalues of A
and B without any calculation
...
115
Remark
A more elegant solution is the matrix
1 1
1
1
1
1 −1 1 −1
...
It is an n × n
matrix H with all entries ±1 satisfying H > H = nI
...
The smallest order for which the existence of a Hadamard matrix is still in
doubt is (at the time of writing) n = 668
...
exercise, show that, if H is a Hadamard matrix of size n, then
As a further
H H
is a Hadamard matrix of size 2n
...
)
1 1
1 1
8
...
1 2
1 0
Find an invertible matrix P and a diagonal matrix D such that P> AP =
I and P> BP = D, where I is the identity matrix
...
The form is x2 + 2xy + 2y2 , which is (x + y)2 + y2
...
)
1 1
Now the matrix that transforms (x, y) to (x + y, y) is Q =
, since
0 1
Hence
[x
1
0
1
1
x
x+y
=
...
Now, if we put
P = Q−1
1
=
0
−1
, we see that P> AP = P> (Q> Q)P = I
...
WORKED EXAMPLES
What about P> BP? We find that
1 0 1 1 1
>
P BP =
−1 1 1 0 0
−1
1
=
1
0
0
= D,
−1
the required diagonal matrix
...
Remark 1: In general it is not so easy
...
So by the Spectral Theorem, we can find an
orthogonal matrix P2 such that P1> (P1> BP1 )P2 is diagonal
...
) Then because P2 is orthogonal,
we have
P2> (P1> AP1 )P2 = P2> IP2 = I,
so that P = P1 P2 is the required matrix
...
We saw in the lectures that the diagonal
entries of D are the roots of the polynomial det(xA − B) = 0
...
Index
abelian group, 3, 90, 91
addition
of linear maps, 36
of matrices, 15
of scalars, 3
of vectors, 4
adjoint, 57, 71
adjugate, 27
alternating bilinear form, 85
alternating matrix, 86
axioms
for field, 3, 89
for vector space, 3, 90
column operations, 16
column rank, 20
column vector, 10, 15
complex numbers, 3, 90
congruence, 59, 106
coordinate representation, 10
coordinatewise, 5
cubic equation, 95
data, 10
determinant, 23, 87
of linear map, 48
diagonalisable, 45
dimension, 8
direct sum, 14, 41
distributive law, 3
dot product, 67
dual basis, 56
dual space, 56
basis, 6
orthonormal, 68
bilinear form, 62
alternating, 85
symmetric, 62
canonical form, 20
for congruence, 64, 65
for equivalence, 17, 20, 39
for orthogonal similarity, 76
for similarity, 51
Cauchy–Schwarz inequality, 68
Cayley–Hamilton Theorem, 30, 48
characteristic of field, 58, 90
characteristic polynomial
of linear map, 48
circulant matrix, 94, 114
coding equivalence, 106
cofactor, 27
cofactor expansion, 27
echelon form, 105
eigenspace, 44
eigenvalue, 44
eigenvector, 44
elementary column operations, 16
elementary matrix, 18
elementary row operations, 16
equivalence, 20, 39
error-correcting codes, 106
Euclidean plane, 4
Exchange Lemma, 7
Fibonacci numbers, 111
field, 3
117
118
finite-dimensional, 6
Friendship Theorem, 97
Fundamental Theorem of Algebra, 75
Google (Internet search engine), 103
Gram–Schmidt process, 69, 85
graph theory, 97
Hadamard matrix, 115
Hermitian, 82, 83
identity linear map, 42
identity matrix, 12, 16
image, 33
indecomposable matrix, 102
indefinite, 66
inner product, 67, 81
standard, 69
integers mod p, 90
intersection, 13
inverse matrix, 12, 29
invertible
linear map, 37
matrix, 12, 24, 29, 37
Jordan block, 51
Jordan form, 51
kernel, 33
Kronecker delta, 56
linear form, 55
linear map, 33
linear transformation, 33
linearly independent, 6
Maple (computer algebra system), 102,
110
matrix, 15
alternating, 86
circulant, 94, 114
Hadamard, 115
Hermitian, 82
INDEX
identity, 16
indecomposable, 102
invertible, 19, 37
normal, 84
orthogonal, 71
skew-Hermitian, 88
skew-symmetric, 86
symmetric, 59, 71
unitary, 82
Vandermonde, 93, 114
minimal polynomial, 48
minor, 27
multiplication
of linear maps, 36
of matrices, 15
of scalars, 3
negative definite, 66
negative semi-definite, 66
non-singular, 12
normal linear map, 84
normal matrix, 84
nullity, 34
orthogonal complement, 73
orthogonal decomposition, 74
orthogonal linear map, 71
orthogonal projection, 74
orthogonal similarity, 71
orthogonal vectors, 73
orthonormal basis, 68
parallelogram law, 4
permutation, 25
Perron–Frobenius Theorem, 102
Pfaffian, 87
polarisation, 63
positive definite, 66, 67, 82
positive semi-definite, 66
projection, 41
orthogonal, 74
119
INDEX
quadratic form, 59, 63
rank
of linear map, 34, 40
of matrix, 17, 40
of quadratic form, 66, 78
Rank–Nullity Theorem, 34
rational numbers, 3, 90
real numbers, 3, 90
ring, 16
row operations, 16
row rank, 20
row vector, 10, 15, 55
row-equivalence, 105
scalar, 4
scalar multiplication, 4
self-adjoint, 71
semilinear, 82
sesquilinear, 82
sign, 25
signature, 66, 78
similarity, 44
orthogonal, 71
skew-Hermitian, 88
skew-symmetric matrix, 86
spanning, 6
Spectral Theorem, 75, 82, 83
standard inner product, 69, 82
subspace, 13
sum, 13
Sylvester’s Law of Inertia, 62, 65, 77
symmetric group, 25
trace, 52, 99
transition matrix, 11
transposition, 25
unitary, 82, 83
Vandermonde matrix, 93, 114
vector, 4
vector space, 4
complex, 4
real, 4
zero matrix, 16
zero vector, 4