Phylogenetic Inference:
Part II
Shifra Ben‐Dor
Irit Orr
June 2010
The “ideal” method to build a phylogenetic tree
• Will be based on sequences with biological relevance
to the question being asked
• Will extract the maximum amount of information
available from the sequence data
• Will combine this information with prior knowledge
of patterns of sequence evolution (evolutionary
models)
• Will add model parameters (such as transition/transversion
bias) whose values are not known a
priori
...
22E+020 8
...
84E+074 6
...
Methods of tree searching
• Exhaustive (impractical for all but the smallest
datasets) – branch addition
...
Then you add the fifth
branch to all of the fourth branch level trees,
on all possible branches…
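The growth in the number of trees under branch addition can be sketched in a few lines of Python. This is a minimal illustration, assuming unrooted, strictly bifurcating trees; the function name `count_unrooted_trees` is made up for this example. An unrooted tree with k taxa has 2k − 3 branches, so each existing tree gives 2k − 3 new trees when the next taxon is added.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted bifurcating trees produced by stepwise branch addition."""
    if n_taxa < 3:
        return 1
    count = 1  # there is exactly one unrooted tree for three taxa
    for k in range(3, n_taxa):
        # A tree with k taxa has 2k - 3 branches; the next taxon
        # can be attached to any one of them.
        count *= 2 * k - 3
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

With 20 taxa the count is already on the order of 10^20, which is why an exhaustive search is only practical for very small datasets.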
...
Methods of tree searching
• A shortcut for this is known as the branch‐and‐
bound method
...
You add (go down) a branch
...
If the score goes down, you don’t follow
that path anymore (the score will keep going
down) so you back up one level and try again
until you get to the tip
...
The whole tree is eventually
covered, and you end up with the best one
...
These are known as
“hill‐climbing” algorithms, where the idea is to
get to the top of the hill (the maximum) and
hope that it’s the global maximum
...
Optimization is done so that the best
neighbor (most closely related taxa) is chosen
...
All possible connections are made between a
branch in one tree, and a branch in the other,
looking to find the best one
More on evolutionary models
...
Character Methods
Distance is the measure of how related the sequences
are, as measured by observed differences in the
sequence (number of changes)
Character‐states are the actual sequences: the
character is the position, and what is there is the
state. For example: for DNA at any given position
(character) there are four possible states (A, C, G, T)
Character analysis lets us locate where in the tree
each site changed, while Distance analysis tells us
how much change occurred along each branch
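The two views of the same data can be illustrated with a short Python sketch. It assumes two already aligned sequences of equal length with no gaps; the function names and the example sequences are invented for illustration. The character view keeps the positions that differ, while the distance view collapses them into a single number (the proportion of differing sites).

```python
def differing_sites(seq_a: str, seq_b: str) -> list[int]:
    """Character view: the positions (characters) whose states differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def p_distance(seq_a: str, seq_b: str) -> float:
    """Distance view: the observed proportion of differing sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return len(differing_sites(seq_a, seq_b)) / len(seq_a)


seq_1 = "ACGTACGT"  # made-up example sequences
seq_2 = "ACGTTCGA"
print(differing_sites(seq_1, seq_2))  # which sites changed
print(p_distance(seq_1, seq_2))       # how much change overall
```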
...
Character Methods
Distance methods also are generally considered
algorithmic methods, where they use an algorithm
to construct a tree from the data (in this case, the
distance matrix)
...
Character methods, on the other hand, are
considered tree‐searching methods, where they
build many trees, and then have to decide which is
(or are) the best
...
Distance (pairwise) Methods
• Distance ‐ the number of substitutions per site
per time period
...
From this
matrix the method estimates the phylogenetic
relationships of the OTUs
...
Mathematical models allow for correcting the
percentage differences between sequences,
based on the DNA models
...
• Evolutionary distance is always bigger than the
distance calculated by direct sequence
comparison
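As an illustration of such a correction, here is a sketch using the Jukes-Cantor model, d = −(3/4) ln(1 − 4p/3), where p is the observed proportion of differing sites. Jukes-Cantor is chosen here only as the simplest DNA model and is an assumption of this example; other models use different formulas. The corrected distance exceeds the observed one whenever p > 0, because multiple substitutions at the same site are hidden from direct comparison.

```python
import math


def jukes_cantor_distance(p_observed: float) -> float:
    """Correct an observed proportion of differences with the Jukes-Cantor formula."""
    if not 0.0 <= p_observed < 0.75:
        # At p >= 0.75 the sequences look random and the formula is undefined.
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)


for p in (0.05, 0.25, 0.50):
    print(p, round(jukes_cantor_distance(p), 4))
```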
...
The disadvantage of distance methods:
Inevitable loss of evolutionary information
when the method discards the actual
sequences (character states of the taxa), since
the sequence alignment is converted to pairwise
distances
...
Distance method steps
• First build a distance matrix
• Find the most closely related taxa
• Combine them (give them a distance) and
remove them from the matrix
• Build a new matrix with the remaining sequences
• Continue until all sequences are connected
• Build a tree from the resulting series of matrices
working from the last to the first (the root or
more distantly related to the most closely related,
or least evolutionarily distant)
Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
• The oldest method to reconstruct phylogenetic
trees from distance data
...
The newly formed cluster
replaces its OTUs in the distance matrix
...
Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
• This process is repeated until all the OTUs are
clustered
...
• UPGMA is based on the molecular clock
hypothesis – the evolutionary rate is the same
in all branches (or that all sequences are
equally distant from the root)
This assumption is seldom true
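The UPGMA clustering loop can be sketched as follows. This is a minimal illustration, not a production implementation: the function name `upgma`, the use of `frozenset` keys for the distance matrix, and the four-taxon example distances are all assumptions of this sketch. Each node is placed at half the distance between the two clusters it joins, which is where the molecular clock assumption enters.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    Returns the final cluster label and the merges as (a, b, node height).
    """
    d = dict(dist)
    sizes = {name: 1 for name in names}  # number of OTUs in each cluster
    merges = []
    new = names[0]
    while len(sizes) > 1:
        # Find the closest pair of clusters in the current matrix.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # clock: both sides equally far from the node
        new = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average = arithmetic mean over all original OTU pairs.
            d[frozenset((new, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b  # the new cluster replaces its OTUs
        merges.append((a, b, height))
    return new, merges


# Made-up distances that happen to fit a molecular clock.
pairs = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0,
         ("A", "D"): 8.0, ("B", "D"): 8.0, ("C", "D"): 8.0}
dist = {frozenset(k): v for k, v in pairs.items()}
label, merges = upgma(["A", "B", "C", "D"], dist)
print(label)
print(merges)
```

When the rates differ between lineages, the same loop still returns a tree, but the closest pair in the matrix is no longer guaranteed to be a true pair of neighbors.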
...
• The NJ method constructs a phylogenetic tree by
joining neighbors (OTUs) by a branch to the same
node (common ancestor)
...
Distance Methods:
Neighbor-Joining
• NJ starts with a matrix like UPGMA
• It then calculates the “net divergence” of one
OTU from all others as the sum of distances to
that OTU
...
• The lowest scoring pair is then chosen, and the
distances to the node that joins them are calculated
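One NJ step can be sketched like this. The scoring formula used here, d(i,j) − (r(i) + r(j)) / (N − 2), with r the net divergence, is one common way of writing the NJ criterion and is an assumption of this sketch, as are the function name `nj_pick_pair` and the five-taxon example distances.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Choose the pair of OTUs to join in one neighbor-joining step.

    dist maps frozenset({a, b}) -> distance.
    Returns the chosen pair and the branch lengths from each OTU to the new node.
    """
    n = len(names)
    # Net divergence: sum of distances from one OTU to all others.
    r = {i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names}

    def score(pair):
        i, j = pair
        return dist[frozenset(pair)] - (r[i] + r[j]) / (n - 2)

    i, j = min(combinations(names, 2), key=score)  # lowest scoring pair
    d_ij = dist[frozenset((i, j))]
    # Branch lengths from i and j to the node that joins them.
    len_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d_ij - len_i
    return (i, j), (len_i, len_j)


pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(k): v for k, v in pairs.items()}
pair, branch_lengths = nj_pick_pair(["a", "b", "c", "d", "e"], dist)
print(pair, branch_lengths)
```

Note that the pair chosen is not simply the one with the smallest raw distance (d and e here); subtracting the net divergences is what lets NJ cope with unequal rates.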
...
A tree is built from the series of
matrices
...
This distance
is inferred from the observed differences
between sequences
...
The nucleotide or aa appearing in
this position is a state
...
Character‐state methods retain the original
status of the taxa, therefore can be used to
attempt the reconstruction of the character‐
state of ancestral nodes
...
Taken from Dr. Itai Yanai
Maximum Parsimony Methods
The Maximum Parsimony method is good for similar
sequences, a sequence group with a small amount
of variation
Maximum Parsimony methods do not give the branch
lengths, only the branch order
...
If more
taxa were added, a truer picture might appear
...
Character Based Methods:
Maximum Likelihood
• ML will give the most likely tree given the data
under a particular model – if you change the
model, you may get a different tree
• ML method (like the Maximum Parsimony
method) performs its analysis on each position
individually in the multiple alignment
...
• Unlike Parsimony, ML does take identical
positions into account, and can give branch
lengths
Character Based Methods:
Maximum Likelihood
• Likelihood methods regard the observed data as a
fixed observation and seek the values of the statistical
parameters that provide the most probable
description of the data, given the model of evolution
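A toy version of this idea: treat an alignment of two sequences as the fixed data and search for the single parameter (their distance d) that makes the data most probable. The sketch assumes the Jukes-Cantor model, under which a site differs with probability 3/4 · (1 − e^(−4d/3)); the counts (100 sites, 25 differences) are made up, and constant factors that do not depend on d are dropped from the likelihood.

```python
import math


def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood (up to a constant) of distance d for two aligned sequences."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(site differs | d)
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)


# The data stay fixed; only the parameter d is varied (simple grid search).
n_sites, n_diff = 100, 25
grid = [i / 1000 for i in range(1, 2001)]  # candidate distances 0.001 .. 2.0
best_d = max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))
print(best_d)
```

Real ML tree inference does the same thing over many parameters at once (tree topology, all branch lengths, model parameters), which is why it is so much more expensive.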
...
• These properties make likelihood very suited to
historical inference problems, in which the observed
data arise only once
...
The first
ball is thrown
...
If it's to the left of the first ball,
Player B (Irit) gets a point
...
• The problem: You can’t see the table
...
You are only told
who gets a point
...
1177‐8
Character Based Methods:
Bayesian Analysis
• After 8 throws, the score is Shifra 5, Irit 3
...
• The only thing we do know is the current
standings in the game
...
Character Based Methods:
Bayesian Analysis
• If we knew where the first ball fell, then we
could calculate the probability
...
• So we have to calculate a probability of, say,
Irit winning (observing some outcome) given a
model based on what happened in the past
...
Character Based Methods:
Bayesian Analysis
“The Bayesian approach is to write down exactly
the probability we want to infer, in terms only of
the data we know, and directly solve the resulting
equation
...
Character Based Methods:
Bayesian Analysis
“Bayes theorem”
The probability of a particular choice p
given the data (the Posterior Probability) is
proportional to the likelihood of p (the
probability that we would get the observed
data if p were true), multiplied by the a priori
probability (prior probability) of this p being
true relative to all other values of p
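Applied to the ball game with the score Shifra 5, Irit 3, the rule "posterior is proportional to likelihood times prior" can be evaluated numerically on a grid. In this sketch p is the unknown probability that Shifra wins a given point; the uniform prior on p (every position of the first ball equally likely) and the function name `grid_posterior` are assumptions of the example.

```python
def grid_posterior(wins_a: int, wins_b: int, n_grid: int = 1000):
    """Posterior over p = P(player A wins a point), evaluated on a grid."""
    step = 1.0 / n_grid
    grid = [(i + 0.5) * step for i in range(n_grid)]  # midpoints in (0, 1)
    prior = [1.0 for _ in grid]                       # uniform prior (assumption)
    # Likelihood of the observed score if p were true (constant factor dropped).
    likelihood = [p ** wins_a * (1.0 - p) ** wins_b for p in grid]
    unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]     # normalize to sum to 1
    return grid, posterior


grid, posterior = grid_posterior(wins_a=5, wins_b=3)
mean_p = sum(p * w for p, w in zip(grid, posterior))
print(round(mean_p, 3))
```

The result is a whole distribution over p rather than a single best value, so any later prediction can average over the uncertainty in p instead of fixing it at 5/8.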
...
How does this apply to phylogenetics?
• The data we have at hand are the current
sequences (taxa)
...
• unfortunately, those are unknown
...
What Bayesian methods do is integrate over
degrees of uncertainty
...
• Very computationally intensive – we use
Markov Chain Monte Carlo (MCMC) methods
• It's very hard to figure out what prior
probability should be used
...
It's using probability as a measure
of confidence
...
a new
state of the chain is defined (by moving a
branch and/or changing a branch length)
...
Problems or rough spots
...
(burn in)
• How many generations to run the MCMC
simulations
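Burn-in and the number of generations are easiest to see on a toy chain. The sketch below is a plain Metropolis sampler for a one-parameter posterior, proportional to p^5 (1 − p)^3 as in the ball game with a uniform prior. It is not a phylogenetic sampler: a real one would propose moving a branch or changing a branch length instead of nudging p. The starting value, step size, seed and run lengths are arbitrary choices for illustration.

```python
import math
import random


def log_posterior(p: float) -> float:
    """Unnormalized log posterior for the toy problem: p^5 * (1 - p)^3."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return 5 * math.log(p) + 3 * math.log(1.0 - p)


def metropolis(n_generations: int, burn_in: int, step: float = 0.2, seed: int = 1):
    """Run a Metropolis chain and return the samples kept after burn-in."""
    rng = random.Random(seed)
    current = 0.05  # deliberately poor starting point
    samples = []
    for generation in range(n_generations):
        proposal = current + rng.uniform(-step, step)  # new candidate state
        log_ratio = log_posterior(proposal) - log_posterior(current)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        if generation >= burn_in:  # discard the early, start-dependent part
            samples.append(current)
    return samples


samples = metropolis(n_generations=20000, burn_in=2000)
print(round(sum(samples) / len(samples), 2))
```

Too short a burn-in leaves samples that still reflect the starting point; too few generations leave a noisy estimate. In practice both are usually judged by running several chains and comparing them.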
How do we know if the results are
reliable?
“The results of a phylogenetic
analysis are explicitly uncertain;
accuracy is a pipe dream
...
• As with estimates of model parameters, a single
point estimate is of little value without some
measure of the confidence we can place in it
...
☺ It is the creation of pseudoreplicate datasets by
randomly resampling the original dataset
...
☺ Bootstrapping is used to examine how often a
particular cluster in a tree appears when nucleotide
or amino acid sequences are resampled
☺ The frequency with which a given branch is found is
recorded as the bootstrap proportion
...
Bootstrapping is…
• How are bootstrapping and the construction of a
consensus tree carried out in practice?
Take a dataset consisting of n sequences in total, with m
sites each
...
However, each site is sampled at random and no
more sites are sampled than there were original
sites
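The resampling step can be sketched as below: columns (sites) of the alignment are drawn at random with replacement until the pseudoreplicate has the same number of sites, m, as the original, so some sites appear several times and others not at all. The function name `bootstrap_alignment` and the three-sequence example alignment are invented for illustration.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudoreplicate by resampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw m column indices at random; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same columns are used for every sequence, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


alignment = {  # made-up example: n = 3 sequences, m = 8 sites
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTTCGA",
    "taxon3": "ACCTTCGA",
}
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```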
...
Statistical Methods
Phylogenetic trees are generated from all the datasets
...
• In this final, consensus tree, the # of times a
particular branch point occurred out of all the trees
that were built will be displayed
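Counting how often each branch point occurs can be sketched as follows. For simplicity each replicate tree is represented only by the set of groups (clusters of taxa) it contains; this representation, the function name `bootstrap_proportions`, and the four example trees are assumptions of the sketch, not the output format of any particular program.

```python
from collections import Counter


def bootstrap_proportions(replicate_trees):
    """Fraction of replicate trees in which each cluster of taxa appears.

    replicate_trees: list of trees, each given as a set of frozensets of taxon names.
    """
    counts = Counter()
    for clusters in replicate_trees:
        counts.update(set(clusters))  # count each cluster once per tree
    n_trees = len(replicate_trees)
    return {cluster: count / n_trees for cluster, count in counts.items()}


AB, AC = frozenset("AB"), frozenset("AC")
ABC, ABD = frozenset("ABC"), frozenset("ABD")
trees = [{AB, ABC}, {AB, ABC}, {AB, ABD}, {AC, ABC}]  # made-up replicate trees
for cluster, proportion in bootstrap_proportions(trees).items():
    print(sorted(cluster), proportion)
```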
...
Taken from Dr. Itai Yanai
...
• Or more accurately, what is your biological
question?
• Do you need the root? A radial display is
enough for most purposes
• Creating a rooted tree means more search
space
• It is possible to “root” a tree, even if it was
calculated as unrooted to begin with
Many phylogenies also include an outgroup — a
taxon outside the group of interest
...
Hence, the outgroup stems from the base of the
tree
...
Which method is the best for my analysis?
Choose set of related seqs (DNA or Protein)
Obtain Multiple Alignment
Is there a strong similarity?
• Strong similarity: Maximum Parsimony
• Distant (weak) similarity: Distance methods
• Very weak similarity: Maximum Likelihood
Check validity of the results
So, bottom line: what do we need?
• A substitution model
• An evolutionary model
• A rate model
• A method of tree building/refining
• A method of assessing the reliability of the
tree