
Title: Bioinformatics
Description: Scoring matrix determination



Scoring Matrices
Shifra Ben-Dor
Irit Orr

Scoring matrices
❆ Sequence alignment and database searching
programs compare sequences to each other
as a series of characters
...

❆ Scoring matrices are used to assign a score
to each comparison of a pair of characters
...

❆ In most cases a positive score is given to
identical or similar character pairs, and a
negative or zero score to dissimilar
character pairs
...
This scoring scheme is not much used
...
This matrix
scores identical bases 3, transitions 2 and
transversions 0
...
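As a minimal sketch of the nucleotide scheme just described (identity 3, transition 2, transversion 0), assuming two already-aligned, gap-free DNA strings of equal length over A, C, G and T; the names `pair_score` and `dna_score` are hypothetical:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_score(x: str, y: str) -> int:
    """Score one aligned base pair: identity 3, transition 2, transversion 0."""
    x, y = x.upper(), y.upper()
    if x == y:
        return 3
    # A transition swaps purine for purine, or pyrimidine for pyrimidine.
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return 2
    # Any other substitution (purine <-> pyrimidine) is a transversion.
    return 0


def dna_score(seq1: str, seq2: str) -> int:
    """Sum the pair scores over two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(dna_score("ACGT", "ACGT"))  # 12: four identities
print(dna_score("ACGT", "GTAC"))  # 8: four transitions
```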
(e.g. size, shape or charge of the amino acid)
...
These
matrices are constructed by analyzing the
substitution frequencies seen in the
alignments of known families of proteins
...

• This score is based on the observed
frequencies of such occurrences in
alignments of evolutionarily related proteins
...


Observed Scoring Matrices
• Identities are assigned the most positive
scores
• Frequently observed substitutions also
receive positive scores
• Mismatches, or matches that are unlikely
to have been a result of evolution, are
given negative scores
...

• These ratios are called odds scores
...

• Odds scores and log odds scores are used to
score protein alignments
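A minimal sketch of an odds score and its log, assuming the odds score is the observed frequency of an aligned pair divided by the frequency expected by chance (the product of the two residues' background frequencies). All frequencies below are hypothetical, chosen only for illustration, and the function names are not from any library:

```python
import math


def odds_score(observed_pair_freq: float, freq_a: float, freq_b: float) -> float:
    """How often a pair is observed, relative to how often chance predicts it."""
    expected = freq_a * freq_b
    return observed_pair_freq / expected


def log_odds_score(observed_pair_freq: float, freq_a: float, freq_b: float,
                   base: float = 2.0) -> float:
    """Log of the odds score: positive if seen more often than chance, negative if less."""
    return math.log(odds_score(observed_pair_freq, freq_a, freq_b), base)


# Hypothetical background frequencies for L and I.
p_L, p_I = 0.10, 0.05
print(round(odds_score(0.01, p_L, p_I), 3))        # 2.0: twice as often as chance
print(round(log_odds_score(0.01, p_L, p_I), 3))    # 1.0 bit
print(round(log_odds_score(0.001, p_L, p_I), 2))   # -2.32: rarer than chance
```

With base 2 the score is in bits; published matrices usually scale and round these values to integers.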

Different types of matrices
• Observed scoring matrices are superior to
simple identity scores, or to scores based
solely on the chemical propensities of the
amino acids

• The most frequently used observed log-odds
matrices are the PAM and BLOSUM
matrices
...

❆ Derived from global alignments of very
similar sequences (at least 85% identity), so
that an observed change is unlikely to be the
result of several successive mutations at the
same site and most likely reflects a single
mutation
...
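The 85% identity requirement can be pictured as a filter over aligned pairs. This is a minimal sketch assuming gap-free alignments of equal length; the sequences are invented and `select_close_pairs` is a hypothetical helper, not part of Dayhoff's published procedure:

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent of aligned positions with identical residues (gap-free alignment assumed)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


def select_close_pairs(pairs, threshold=85.0):
    """Keep only aligned pairs that are at least `threshold` percent identical."""
    return [(s, t) for s, t in pairs if percent_identity(s, t) >= threshold]


pairs = [
    ("ACDEFGHIKL", "ACDEFGHIKV"),  # 9/10 identical -> 90%, kept
    ("ACDEFGHIKL", "ACDEFGHVVV"),  # 7/10 identical -> 70%, dropped
]
print(select_close_pairs(pairs))
```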


❆ An accepted point mutation in a protein is a
replacement of one amino acid by another,
accepted by natural selection
...
To be
accepted, the new amino acid usually must
function in a way similar to the old one: chemical
and physical similarities are found between the
amino acids that are observed to interchange
frequently
...

❆ PAM gives the probability that a given
amino acid will be replaced by any other
particular amino acid after a given
evolutionary interval, in this case 1 accepted
point mutation per 100 amino acids
...
(this
lets us add the scores along a protein instead
of multiplying the probabilities)
❆ The resulting matrix is the “log-odds”
matrix, known as the PAM matrix
...
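To see why taking logs lets scores be added instead of probabilities multiplied, here is a small check with hypothetical per-position odds ratios for one ungapped alignment:

```python
import math

# Hypothetical odds ratios, one per aligned position.
odds = [2.0, 0.5, 4.0, 1.0]

product_of_odds = math.prod(odds)              # multiplying the ratios ...
sum_of_logs = sum(math.log2(r) for r in odds)  # ... or adding their log-odds scores

print(product_of_odds)  # 4.0
print(sum_of_logs)      # 2.0, which is log2(4.0)
```

Both routes rank alignments identically; the additive form is cheaper and avoids underflow when many small probabilities are multiplied.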
Greater PAM numbers correspond to greater
evolutionary distances
...
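In Dayhoff's approach, the mutation probability matrix for a larger distance is obtained by multiplying the PAM1 matrix by itself (PAM250 corresponds to 250 such steps). The sketch below assumes a hypothetical 3-letter alphabet whose PAM1-style matrix changes about 1 position in 100, and shows how the fraction of positions still unchanged shrinks as the PAM number grows; it uses plain Python lists rather than real amino acid data:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM1-style matrix: row = original residue, column = residue
# after one PAM; each row sums to 1 and about 1 position in 100 changes.
PAM1_TOY = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]
BACKGROUND = [1 / 3, 1 / 3, 1 / 3]  # hypothetical residue frequencies


def fraction_unchanged(pam1, background, n):
    """Expected fraction of positions identical to the ancestor after n PAMs."""
    pam_n = mat_pow(pam1, n)
    return sum(f * pam_n[i][i] for i, f in enumerate(background))


for n in (1, 100, 250):
    print(n, round(fraction_unchanged(PAM1_TOY, BACKGROUND, n), 3))
```

Because back-mutations are counted, the unchanged fraction levels off at the chance level for this toy alphabet instead of reaching zero.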


PAM250
❆ At this evolutionary distance, only one amino acid in
five remains unchanged
...
52% of the
cysteines and 27% of the glycines would still be
unchanged, but only 6% of the highly mutable
asparagines would remain
...
0 ... 231049, Expected score = -0 ... 354 bits
Lowest score = -8, Highest score = 17
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T  ...
A   2  -2   0   0  -2   0   0   1  -1  -1  -2  -1  -1  -3   1   1   1
R  -2   6   0  -1  -4   1  -1  -3   2  -2  -3   3   0  -4   0   0  -1
N   0   0   2   2  -4   1   1   0   2  -2  -3   1  -2  -3   0   1   0
D   0  -1   2   4  -5   2   3   1   1  -2  -4   0  -3  -6  -1   0   0
C  -2  -4  -4  -5  12  -5  -5  -3  -3  -2  -6  -5  -5  -4  -3   0  -2
Q   0   1   1   2  -5   4   2  -1   3  -2  -2   1  -1  -5   0  -1  -1
E   0  -1   1   3  -5   2   4   0   1  -2  -3   0  -2  -5  -1   0   0
G   1  -3   0   1  -3  -1   0   5  -2  -3  -4  -2  -3  -5   0   1   0
H  -1   2   2   1  -3   3   1  -2   6  -2  -2   0  -2  -2   0  -1  -1
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5   2  -2   2   1  -2  -1   0
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6  -3   4   2  -3  -3  -2
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5   0  -5  -1   0   0
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6   0  -2  -2  -1
F  -3  -4  -3  -6  -4  -5  -5  -5  -2   1   2  -5   0   9  -5  -3  -3
P   1   0   0  -1  -3   0  -1   0   0  -2  -3  -1  -2  -5   6   1   0
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   2   1
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -3   0   1   3
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0
B   0  -1   2   3  -4   1   3   0   1  -2  -3   1  -2  -4  -1   0   0
Z   0   0   1   3  -5   3   3   0   2  -2  -3   0  -2  -5   0   0  -1
X   0  -1   0  -1  -3  -1  -1  -1  -1  -1  -1  -1  -1  -2  -1   0   0
*  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8
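A minimal sketch of scoring an ungapped protein alignment with PAM250. It transcribes only the 17 × 17 block for residues A through T (the columns listed above), so sequences must use those letters; `parse_matrix` and `ungapped_score` are hypothetical helper names:

```python
ORDER = "ARNDCQEGHILKMFPST"

# PAM250 rows/columns A..T, transcribed from the table above.
PAM250_TEXT = """
A   2  -2   0   0  -2   0   0   1  -1  -1  -2  -1  -1  -3   1   1   1
R  -2   6   0  -1  -4   1  -1  -3   2  -2  -3   3   0  -4   0   0  -1
N   0   0   2   2  -4   1   1   0   2  -2  -3   1  -2  -3   0   1   0
D   0  -1   2   4  -5   2   3   1   1  -2  -4   0  -3  -6  -1   0   0
C  -2  -4  -4  -5  12  -5  -5  -3  -3  -2  -6  -5  -5  -4  -3   0  -2
Q   0   1   1   2  -5   4   2  -1   3  -2  -2   1  -1  -5   0  -1  -1
E   0  -1   1   3  -5   2   4   0   1  -2  -3   0  -2  -5  -1   0   0
G   1  -3   0   1  -3  -1   0   5  -2  -3  -4  -2  -3  -5   0   1   0
H  -1   2   2   1  -3   3   1  -2   6  -2  -2   0  -2  -2   0  -1  -1
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5   2  -2   2   1  -2  -1   0
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6  -3   4   2  -3  -3  -2
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5   0  -5  -1   0   0
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6   0  -2  -2  -1
F  -3  -4  -3  -6  -4  -5  -5  -5  -2   1   2  -5   0   9  -5  -3  -3
P   1   0   0  -1  -3   0  -1   0   0  -2  -3  -1  -2  -5   6   1   0
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   2   1
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -3   0   1   3
"""


def parse_matrix(text, order=ORDER):
    """Turn whitespace-separated rows into a nested dict: matrix[row][col] -> score."""
    matrix = {}
    for line in text.strip().splitlines():
        label, *values = line.split()
        matrix[label] = dict(zip(order, (int(v) for v in values)))
    return matrix


PAM250 = parse_matrix(PAM250_TEXT)


def ungapped_score(seq1, seq2, matrix=PAM250):
    """Sum the substitution scores of two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(matrix[a][b] for a, b in zip(seq1.upper(), seq2.upper()))


print(ungapped_score("KLMN", "RIMD"))  # 3 + 2 + 6 + 2 = 13
```

Since the matrix is symmetric, the order of the two sequences does not change the score.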

PET91 - an updated Dayhoff matrix
❆ Since the family of PAM matrices was
derived from a comparatively small number
of families, many of the possible mutations
were not observed
...
have derived an updated matrix
by examining a very large number of
families, and created the PET91 scoring
matrix
...

❆ First, multiple alignments of short regions
(without gaps) of related sequences were
gathered
...


BLOSUM Matrices
❆ Substitution frequencies for all pairs of
amino acids were calculated between the
groups, and these were used to create the log-odds
BLOSUM (BLOcks SUbstitution Matrix)
...
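A minimal sketch of the log-odds step for a single ungapped block, following the usual BLOSUM recipe: count residue pairs within each column, convert to observed pair frequencies q, derive background frequencies p, compare with the chance expectation (p_a² for identical pairs, 2·p_a·p_b otherwise) and scale to half-bit units. It deliberately skips the cluster weighting of real BLOSUM, and the 4-sequence block is hypothetical, so its numbers are not meaningful biologically:

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block):
    """Half-bit log-odds scores from one ungapped block (no cluster weighting)."""
    # Count every unordered pair of residues that share a column.
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency p_a: q_aa plus half of every q_ab with b != a.
    p = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # chance frequency
        scores[(a, b)] = round(2 * math.log2(freq / expected))  # half-bit units
    return scores


block = ["AC", "AC", "AC", "SC"]  # hypothetical block: 4 sequences, 2 columns
print(blosum_scores(block))  # {('A', 'A'): 2, ('A', 'S'): 3, ('C', 'C'): 2}
```

Pairs never seen in the block get no score here; a real matrix needs many blocks so that every pair is observed.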

❆ Thus, BLOSUM62 means that sequences in a
block that are at least 62% identical were
clustered together (and weighted as a single
sequence)
...
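The clustering step can be sketched as single-linkage grouping: a sequence joins (and merges) every cluster that already contains a member at least 62% identical to it. This assumes gap-free block rows of equal length; the sequences and the helper name `cluster_block` are hypothetical:

```python
def percent_identity(s, t):
    """Percent of positions with identical residues in two equal-length block rows."""
    return 100.0 * sum(a == b for a, b in zip(s, t)) / len(s)


def cluster_block(block, threshold=62.0):
    """Single-linkage clustering of block rows at `threshold` percent identity."""
    clusters = []
    for seq in block:
        # Every existing cluster holding a sufficiently similar member.
        linked = [c for c in clusters
                  if any(percent_identity(seq, member) >= threshold for member in c)]
        linked_ids = {id(c) for c in linked}
        # Merge the new sequence with all linked clusters into one cluster.
        merged = [seq] + [member for c in linked for member in c]
        clusters = [c for c in clusters if id(c) not in linked_ids] + [merged]
    return clusters


block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGWWWW", "MNPQRSTVWY"]
for cluster in cluster_block(block):
    print(cluster)
```

Lowering the threshold merges more sequences into each cluster, which is why low-numbered BLOSUM matrices reflect more distant relationships.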


BLOSUM62 MATRIX
Matrix made by matblas from blosum62 ... 0/blocks ... 6979, Expected = -0
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y
...

❆ Lower PAM matrices tend to find short
alignments of highly similar regions
...

❆ For local alignments use BLOSUM matrices
...

❆ BLOSUM matrices with a low number are
better for distantly related sequences
...

❆ When doing global alignment (and database
scanning) of related (similar) sequences, use
PAM200 or PAM250
...

❆ For local database scanning (e.g. ...


Tips…
...


