
Title: Bioinformatics
Description: Bioinformatics note



Phylogenetic Inference: Part II

Shifra Ben-Dor
Irit Orr

June 2010


The “ideal” method to build a phylogenetic tree
•  Will be based on sequences with biological relevance to the question being asked
•  Will extract the maximum amount of information available from the sequence data
•  Will combine this information with prior knowledge of patterns of sequence evolution (evolutionary models)
•  Will add model parameters (such as transition/transversion bias) whose values are not known a priori
...


Methods of tree searching

•  Exhaustive (impractical for all but the smallest datasets): branch addition
...


Then you add the fifth branch to all of the fourth-branch-level trees, on all possible branches…
...
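To make the growth of the search space concrete, here is a minimal Python sketch of exhaustive search by branch addition. It assumes an unrooted binary tree stored as a list of edges, with taxa as strings and internal nodes as integers; this representation and the helper names are our own, not taken from the notes. Each new taxon is attached to every branch of every tree built at the previous level.

```python
def add_taxon(edges, taxon, new_node):
    """Yield every tree made by attaching `taxon` to one existing branch.

    The chosen branch (u, v) is split by a new internal node, which also
    gets a terminal branch leading to the new taxon.
    """
    for i, (u, v) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        yield rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def all_unrooted_trees(taxa):
    """Enumerate all unrooted binary topologies for three or more taxa."""
    # The only tree for three taxa: a star around internal node 0.
    trees = [[(taxa[0], 0), (taxa[1], 0), (taxa[2], 0)]]
    for k in range(3, len(taxa)):
        # Add taxon k on every branch of every tree from the previous level.
        trees = [t for tree in trees for t in add_taxon(tree, taxa[k], k - 2)]
    return trees


for n in range(3, 9):
    print(n, len(all_unrooted_trees([f"T{i}" for i in range(n)])))
```

A tree with k taxa has 2k - 3 branches on which the next taxon can be placed, so the counts grow as (2n - 5)!! (1, 3, 15, 105, 945, 10395 for 3 to 8 taxa), which is why exhaustive search is only feasible for small datasets.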


Methods of tree searching

•  A shortcut for this is known as the branch-and-bound method
...


You add (go down) a branch
...


If the score becomes worse than that of the best complete tree found so far, you don’t follow that path anymore (adding more branches can only make the score worse), so you back up one level and try again until you get to the tip
...


The whole search tree is eventually covered (either explored or pruned), and you end up with the best one
...
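A minimal Python sketch of branch-and-bound for maximum parsimony follows. It assumes an unrooted binary tree stored as a list of edges (taxa as strings, internal nodes as integers), stepwise addition of taxa in input order, and Fitch's algorithm as the score; the notes do not say which score their example uses, and all function names are hypothetical. Because attaching another taxon can never shorten a parsimony tree, any partial tree already longer than the best complete tree can be abandoned together with every tree that would grow from it.

```python
def add_taxon(edges, taxon, new_node):
    """Yield every tree made by attaching `taxon` to one existing branch."""
    for i, (u, v) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        yield rest + [(u, new_node), (new_node, v), (new_node, taxon)]


def fitch_length(edges, seqs):
    """Fitch parsimony length of an unrooted binary tree, summed over sites.

    Leaves are keys of `seqs`; internal nodes are integers. The tree is
    rooted on the branch to one of its leaves, which does not change the length.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def site_sets(node, parent, site):
        if node in seqs:                       # leaf: its observed state
            return {seqs[node][site]}, 0
        (a, ca), (b, cb) = [site_sets(c, node, site) for c in adj[node] if c != parent]
        common = a & b
        return (common, ca + cb) if common else (a | b, ca + cb + 1)

    root = next(node for node in adj if node in seqs)
    (neighbour,) = adj[root]
    total = 0
    for site in range(len(seqs[root])):
        states, cost = site_sets(neighbour, root, site)
        total += cost + (0 if seqs[root][site] in states else 1)
    return total


def branch_and_bound(seqs):
    """Return (best length, list of all most-parsimonious trees)."""
    names = list(seqs)
    best = {"length": float("inf"), "trees": []}

    def search(edges, k):
        length = fitch_length(edges, seqs)
        if length > best["length"]:
            return                             # bound: prune everything below this tree
        if k == len(names):                    # a complete tree
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append(edges)
            return
        for tree in add_taxon(edges, names[k], k - 2):
            search(tree, k + 1)

    search([(names[0], 0), (names[1], 0), (names[2], 0)], 3)
    return best["length"], best["trees"]


seqs = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
length, trees = branch_and_bound(seqs)
print(length, len(trees))
```

Pruning uses a strict "greater than" so that equally parsimonious trees are all kept; the result is the same set an exhaustive search would return, usually after scoring far fewer trees.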


These are known as “hill-climbing” algorithms, where the idea is to get to the top of the hill (the maximum) and hope that it’s the global maximum
...
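A toy Python sketch of hill-climbing, assuming a one-dimensional list of scores stands in for tree space; in a real tree search the neighbours of a tree would be trees produced by rearranging its branches, which this sketch does not model. It shows how the hill that is climbed depends on the starting point.

```python
def hill_climb(scores, start):
    """Move to the best neighbouring position until no neighbour scores higher."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        best = max(neighbours, key=lambda j: scores[j])
        if scores[best] <= scores[i]:
            return i                 # top of the current hill
        i = best


landscape = [1, 3, 5, 4, 2, 6, 9, 7]   # made-up scores with two peaks
print(hill_climb(landscape, 0))        # stops on the local peak at index 2
print(hill_climb(landscape, 4))        # reaches the global peak at index 6
```

Starting from several different points and keeping the best result is a common way to reduce, though not remove, the risk of ending on a local maximum.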


Optimization is done so that the best neighbor (most closely related taxa) is chosen
...

All possible connections are made between a branch in one tree and a branch in the other, looking to find the best one


More on evolutionary models
...
Character Methods
Distance is the measure of how related the sequences are, as measured by observed differences in the sequence (number of changes)

Character states are the actual sequences: the character is the position, and what is there is the state. For example, for DNA at any given position (character) there are four possible states (A, C, G, T)


Character analysis lets us locate where in the tree each site changed, while distance analysis tells us how much change occurred along each branch
...
Character Methods
Distance methods are also generally considered algorithmic methods: they use an algorithm to construct a tree from the data (in this case, the distance matrix)
...

Character methods, on the other hand, are considered tree-searching methods: they build many trees and then have to decide which tree (or trees) is best
...


Distance (pairwise) Methods

•  Distance: the number of substitutions per site per time period
...

From this matrix the method estimates the phylogenetic relationships of the OTUs
...
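The first step of any distance method is the pairwise matrix. This minimal Python sketch computes the simplest, uncorrected measure, the p-distance (the proportion of aligned sites that differ), assuming a gap-free alignment; it is not yet the substitutions-per-site quantity defined above, and the sequences are made up.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def distance_matrix(seqs):
    """Return (names, matrix) of pairwise p-distances for a dict of aligned sequences."""
    names = list(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix


alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name, row)
```

Once the matrix is built the sequences themselves are no longer used, which is the loss of information discussed below.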





Mathematical models allow for correcting the percentage differences between sequences, based on the DNA models
...

•  Evolutionary distance is always bigger than the distance calculated by direct sequence comparison
...
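The notes do not name a specific model at this point. As an assumed illustration, the simplest one, Jukes-Cantor (JC69), converts an observed proportion of differing sites p into an evolutionary distance d = -(3/4) ln(1 - 4p/3) substitutions per site. The sketch shows that the corrected value exceeds p and that the gap widens as p grows, because more of the changes are hidden by repeated substitutions at the same site.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor (JC69) corrected distance from an observed p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.1, 0.3, 0.5, 0.7):
    print(f"observed {p:.2f} -> corrected {jukes_cantor(p):.4f}")
```

Richer models (for example ones with a transition/transversion bias) use different formulas, but follow the same idea.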



The disadvantage of distance methods:

Inevitable loss of evolutionary information when the method discards the actual sequences (character states of the taxa), since the sequence alignment is converted to pairwise distances
...


Distance method steps

•  First build a distance matrix
•  Find the most closely related taxa
•  Combine them (give them a distance) and remove them from the matrix
•  Build a new matrix with the remaining sequences
•  Continue until all sequences are connected
•  Build a tree from the resulting series of matrices, working from the last to the first (from the root, or the most distantly related taxa, to the most closely related, or least evolutionarily distant, ones)


Distance method steps

Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
•  The oldest method to reconstruct phylogenetic trees from distance data
...

The newly formed cluster replaces its OTUs in the distance matrix
...


Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
•  This process is repeated until all the OTUs are clustered
...

•  UPGMA is based on the molecular clock hypothesis: the evolutionary rate is the same in all branches (or all sequences are equally distant from the root)

This assumption is seldom true
...
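The clustering loop described above can be sketched in Python as follows. This is a minimal sketch, assuming a symmetric distance matrix given as a list of lists and, for compactness, using each cluster's Newick string as its name; the example matrix is made up and clock-like. Each merged pair is placed at half their distance (the molecular-clock assumption), and the new cluster's distance to every other cluster is the arithmetic mean over all the OTUs it contains.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a UPGMA tree and return it as a Newick string with branch lengths."""
    size = {n: 1 for n in names}          # number of OTUs in each cluster
    height = {n: 0.0 for n in names}      # distance from a cluster to its tips
    dist = {frozenset((a, b)): matrix[i][j]
            for (i, a), (j, b) in combinations(enumerate(names), 2)}
    active = list(names)
    while len(active) > 1:
        # Find the closest pair of clusters in the current matrix.
        a, b = min(combinations(active, 2), key=lambda pair: dist[frozenset(pair)])
        d_ab = dist[frozenset((a, b))]
        # Molecular-clock assumption: the new node sits at half their distance.
        new = f"({a}:{d_ab / 2 - height[a]:g},{b}:{d_ab / 2 - height[b]:g})"
        size[new], height[new] = size[a] + size[b], d_ab / 2
        # The new cluster replaces a and b; its distance to each other cluster
        # is the size-weighted arithmetic mean of the old distances.
        for c in active:
            if c not in (a, b):
                dist[frozenset((new, c))] = (size[a] * dist[frozenset((a, c))]
                                             + size[b] * dist[frozenset((b, c))]) / size[new]
        active = [c for c in active if c not in (a, b)] + [new]
    return active[0] + ";"


names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 10],
          [2, 0, 6, 10],
          [6, 6, 0, 10],
          [10, 10, 10, 0]]
print(upgma(names, matrix))
```

On data that violate the clock, the same code still returns a tree, but its branch lengths and even its topology can be wrong; that is the practical cost of the assumption.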

•  The NJ method constructs a phylogenetic tree by joining neighbors (OTUs) with a branch to the same node (common ancestor)
...




Distance Methods: Neighbor-Joining
•  NJ starts with a distance matrix, like UPGMA
•  It then calculates the “net divergence” of one OTU from all others as the sum of the distances to that OTU
...

•  The lowest-scoring pair is then chosen, and the distances to the node that joins them are taken
...


A tree is built from the series of matrices
...
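A minimal Python sketch of neighbor-joining, assuming the common formulation in which the "score" of a pair is Q = (n - 2) d(a, b) - r(a) - r(b), where r is the net divergence; the notes' slides may differ in detail, and the function name and example matrix are our own. Unlike UPGMA, the two branches to the new node can have different lengths, so no molecular clock is assumed.

```python
def neighbor_joining(names, matrix):
    """Neighbor-joining; returns an unrooted tree as a Newick string."""
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence: sum of distances from each OTU to all the others.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair with the lowest Q = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to the new node that joins them.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from the new node to every remaining OTU.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes to a single central node.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(names, matrix))
```

The result is unrooted; placing a root requires extra information such as an outgroup.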

This distance is inferred from the observed differences between sequences
...

The nucleotide or amino acid appearing in this position is a state
...

Character-state methods retain the original states of the taxa and can therefore be used to attempt the reconstruction of the character states of ancestral nodes
...


Taken from Dr
...
Itai Yanai

Maximum Parsimony Methods

The Maximum Parsimony method is good for similar sequences, i.e. a sequence group with a small amount of variation

Maximum Parsimony methods do not give the branch lengths, only the branch order
...


If more taxa were added, a truer picture might appear
...
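The parsimony score of a tree can be sketched with Fitch's algorithm, shown here as one common way to count the minimum number of changes (the notes do not say which algorithm they use). This minimal Python sketch assumes rooted trees written as nested tuples; for four taxa, each of the three unrooted topologies is rooted on its central branch, which does not change the count. The alignment is made up.

```python
def fitch(tree, column):
    """Fitch's algorithm for one site.

    `tree` is a taxon name or a (left, right) tuple; `column` maps each taxon
    to its state at this site. Returns (possible ancestral states, changes).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    (left, lc), (right, rc) = fitch(tree[0], column), fitch(tree[1], column)
    common = left & right
    if common:
        return common, lc + rc
    return left | right, lc + rc + 1     # no shared state: one extra change


def parsimony_length(tree, alignment):
    """Total number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_sites))


alignment = {"A": "ACGTT", "B": "ACGTC", "C": "ATGAC", "D": "ATGAA"}
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_length(t, alignment) for label, t in topologies.items()}
print(scores)
```

The topology with the fewest changes is the most parsimonious; the score ranks branching orders but, as noted above, says nothing about branch lengths. Constant sites add nothing to any score.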


Character-Based Methods: Maximum Likelihood
•  ML will give the most likely tree given the data under a particular model; if you change the model, you may get a different tree
•  The ML method (like the Maximum Parsimony method) performs its analysis on each position in the multiple alignment individually
...
•  Unlike parsimony, ML does take identical positions into account, and can give branch lengths


Character-Based Methods: Maximum Likelihood
•  Likelihood methods regard the observed data as a fixed observation and seek the values of the statistical parameters that provide the most probable description of the data, given the model of evolution
...

•  These properties make likelihood very well suited to historical inference problems, in which the observed data arise only once
...
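The smallest possible likelihood problem is a single branch between two aligned sequences. This minimal Python sketch assumes the JC69 model (not named in the notes) and treats the branch length d as the only parameter; it maximizes the log-likelihood numerically by golden-section search. Identical sites contribute to the likelihood just as differing sites do, and the optimum is a branch length.

```python
import math


def jc_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of an aligned pair under JC69 at distance d (subs/site)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e          # P(same base at both ends of the branch)
    p_each_diff = 0.25 - 0.25 * e     # P(one particular different base)
    # The leading 0.25 is the equilibrium frequency of the first base.
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_each_diff)


def ml_distance(seq1, seq2, lo=1e-8, hi=10.0, iterations=200):
    """Maximise the JC69 likelihood over d by golden-section search."""
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iterations):
        x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
        if jc_log_likelihood(x1, n_same, n_diff) > jc_log_likelihood(x2, n_same, n_diff):
            b = x2                    # the maximum lies in [a, x2]
        else:
            a = x1                    # the maximum lies in [x1, b]
    return (a + b) / 2


s1, s2 = "ACGTACGTAC", "ACGAACCTTC"   # 3 differences out of 10 sites
print(round(ml_distance(s1, s2), 4))
```

For this one-branch case the maximum coincides with the Jukes-Cantor corrected distance; on a real tree the same idea is applied to every branch length at once, with the likelihood summed over unobserved ancestral states.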


The first ball is thrown
...


If it’s to the left of the first ball, Player B (Irit) gets a point
...




•  The problem: You can’t see the table
...

You are only told who gets a point
...

1177‐8


Character-Based Methods: Bayesian Analysis

•  After 8 throws, the score is Shifra 5, Irit 3
...

•  The only thing we do know is the current standings in the game
...



Character-Based Methods: Bayesian Analysis

•  If we knew where the first ball fell, then we could calculate the probability
...

•  So we have to calculate a probability of, say, Irit winning (observing some outcome) given a model based on what happened in the past
...



Character-Based Methods: Bayesian Analysis

“The Bayesian approach is to write down exactly the probability we want to infer, in terms only of the data we know, and directly solve the resulting equation
...



Character-Based Methods: Bayesian Analysis

“Bayes theorem”

The probability of a particular choice p given the data (the posterior probability) is proportional to the likelihood of p (the probability that we would get the observed data if p were true), multiplied by the a priori probability (prior probability) of this p being true relative to all other values of p
...
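Bayes' theorem applied to the billiard game can be sketched in Python. Let p be Shifra's chance of scoring on one throw (set by where the first ball fell). Assuming a uniform prior on p, the likelihood of a 5 to 3 score is proportional to p^5 (1 - p)^3, so the posterior is proportional to that product. The notes do not state how many points win the game; the target of 6 used below is a hypothetical assumption, and the function names are our own. The Bayesian answer averages over all values of p, while the "plug-in" answer uses only the point estimate p = 5/8.

```python
from fractions import Fraction
from math import comb, factorial


def beta(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def trailing_win_prob_bayes(lead, trail, target):
    """P(trailing player wins), integrating over p with a uniform prior.

    p is the leader's chance of scoring on one throw. After `lead` and
    `trail` points the posterior of p is proportional to p**lead * (1-p)**trail.
    """
    need_lead, need_trail = target - lead, target - trail
    total = Fraction(0)
    # The trailing player's last needed point comes after k leader points, k < need_lead.
    for k in range(need_lead):
        total += comb(need_trail - 1 + k, k) * beta(lead + k + 1, trail + need_trail + 1)
    return total / beta(lead + 1, trail + 1)


def trailing_win_prob_plugin(lead, trail, target):
    """Same probability, but plugging in the point estimate p = lead / (lead + trail)."""
    p = Fraction(lead, lead + trail)
    need_lead, need_trail = target - lead, target - trail
    return sum(comb(need_trail - 1 + k, k) * p**k * (1 - p)**need_trail
               for k in range(need_lead))


# Shifra 5, Irit 3; the winning score of 6 is a hypothetical assumption.
print(trailing_win_prob_bayes(5, 3, 6), trailing_win_prob_plugin(5, 3, 6))
```

Integrating over the uncertainty in p gives Irit a noticeably better chance than the plug-in estimate does, which is the point of the Bayesian treatment.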



How does this apply to phylogenetics?

•  The data we have at hand are the current sequences (taxa)
...

•  Unfortunately, those are unknown
...



What Bayesian methods do is integrate over degrees of uncertainty
...

•  Very computationally intensive: we use Markov Chain Monte Carlo (MCMC) methods

•  It’s very hard to figure out what prior probability should be used
...

It’s using probability as a measure of confidence
...

a new state of the chain is defined (by moving a branch and/or changing a branch length)
...


Problems or rough spots
...
(burn-in)

•  How many generations to run the MCMC simulations
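The ideas of proposing a new state, discarding a burn-in, and choosing how many generations to run can be sketched on a toy problem. In this minimal random-walk Metropolis sampler in Python, the chain's state is a single number p (Shifra's chance per throw in the billiard game, with a uniform prior and a 5 to 3 score) rather than a tree; in phylogenetic MCMC the proposal would instead move a branch or change a branch length. The step size, chain length, burn-in, and seed are arbitrary assumptions, not recommended settings.

```python
import math
import random


def log_posterior(p, shifra=5, irit=3):
    """Unnormalised log posterior of p with a uniform prior (p = Shifra's chance per throw)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return shifra * math.log(p) + irit * math.log(1.0 - p)


def metropolis(n_steps=50_000, step=0.1, burn_in=5_000, seed=1):
    """Random-walk Metropolis sampler; returns the states kept after burn-in."""
    rng = random.Random(seed)
    p = 0.5                                      # arbitrary starting state
    kept = []
    for i in range(n_steps):
        proposal = p + rng.uniform(-step, step)  # symmetric proposal: a new state
        log_ratio = log_posterior(proposal) - log_posterior(p)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            p = proposal                         # accept; otherwise keep the old state
        if i >= burn_in:
            kept.append(p)
    return kept


samples = metropolis()
print(sum(samples) / len(samples))                         # posterior mean of p
print(sum((1 - p) ** 3 for p in samples) / len(samples))   # Irit wins the next 3 throws
```

Under the exact posterior the mean of p is 0.6, and, if the game is assumed to be first to 6 points, the average of (1 - p)^3 is Irit's chance of winning, 1/11; a chain that is too short or keeps too much of its burn-in will drift away from these values.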


How do we know if the results are reliable?

“The results of a phylogenetic analysis are explicitly uncertain; accuracy is a pipe dream
...

•  As with estimates of model parameters, a single point estimate is of little value without some measure of the confidence we can place in it
...



•  It is the creation of pseudoreplicate datasets by randomly resampling the original dataset
...

•  Bootstrapping is used to examine how often a particular cluster in a tree appears when the nucleotide or amino acid sites of the alignment are resampled

•  The frequency with which a given branch is found is recorded as the bootstrap proportion
...




Bootstrapping is…
•  How are bootstrapping and the construction of a consensus tree carried out in practice?

Take a dataset consisting of n sequences in total, with m sites each
...

However, each site is sampled at random and no more sites are sampled than there were original sites
...
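Building one pseudoreplicate can be sketched in a few lines of Python. This minimal sketch assumes the alignment is a dict of equal-length strings and resamples whole columns (sites) with replacement, exactly m of them, so every taxon's states stay aligned; some sites will appear more than once and others not at all. A tree would then be built from each replicate with the chosen method, which is not shown. The alignment and seed are made up.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudoreplicate: m alignment columns drawn with replacement."""
    names = list(alignment)
    m = len(alignment[names[0]])
    columns = [rng.randrange(m) for _ in range(m)]   # same number of sites as the original
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
rng = random.Random(42)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Passing an explicit random generator makes a bootstrap run reproducible, which helps when comparing methods on the same replicates.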


Statistical Methods

Phylogenetic trees are generated from all the datasets
...

•  In this final, consensus tree, the number of times a particular branch point occurred out of all the trees that were built will be displayed
...
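Counting how often each branch point occurs can be sketched in Python. This minimal sketch assumes each bootstrap tree is already reduced to its set of clades (each clade a frozenset of taxon names) and builds a majority-rule consensus, keeping clades found in more than half of the trees; the notes do not say which consensus rule they use, and the four trees below are hypothetical.

```python
from collections import Counter


def clade_support(trees):
    """Fraction of trees containing each clade; each tree is a set of clades."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: count / len(trees) for clade, count in counts.items()}


def majority_rule(trees, threshold=0.5):
    """Clades found in more than `threshold` of the trees, with their support."""
    return {clade: s for clade, s in clade_support(trees).items() if s > threshold}


# Hypothetical rooted trees from four bootstrap replicates, written as their clades.
trees = [
    {frozenset("AB"), frozenset("ABC")},   # ((A,B),C),D
    {frozenset("AB"), frozenset("ABC")},   # ((A,B),C),D
    {frozenset("AB"), frozenset("CD")},    # (A,B),(C,D)
    {frozenset("AC"), frozenset("ABC")},   # ((A,C),B),D
]
for clade, support in sorted(majority_rule(trees).items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), f"{support:.0%}")
```

Multiplying each support value by the number of trees gives the raw count that is displayed on the consensus tree.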
Itai Yanai

Taken from Dr
...

•  Or more accurately, what is your biological question?

•  Do you need the root?
A radial display is enough for most purposes

•  Creating a rooted tree means more search space

•  It is possible to “root” a tree, even if it was calculated as unrooted to begin with


Many phylogenies also include an outgroup, a taxon outside the group of interest
...

Hence, the outgroup stems from the base of the tree
...


Which method is the best for my analysis?

•  Choose a set of related sequences (DNA or protein)
•  Obtain a multiple alignment
•  Is there strong similarity?
   Strong similarity: Maximum Parsimony
   Distant (weak) similarity: Distance methods
   Very weak similarity: Maximum Likelihood
•  Check the validity of the results

So, bottom line, what do we need?

•  A substitution model
•  An evolutionary model
•  A rate model
•  A method of tree building/refining
•  A method of assessing the reliability of the tree



