Title: Bioinformatics
Description: Bioinformatics note

Phylogenetic Inference: 
Part II 
Shifra Ben‐Dor 
Irit Orr 
June 2010 

The “ideal” method to build a phylogenetic tree
•  Will be based on sequences with biological relevance 
to the question being asked 
•  Will extract the maximum amount of information 
available from the sequence data 
•  Will combine this information with prior knowledge 
of patterns of sequence evolution (evolutionary 
models) 
•  Will add model parameters (such as transition/
transversion bias) whose values are not known a 
priori
...
22E+020            8
...
84E+074            6
...
 

Methods of tree searching 
•  Exhaustive (impractical for all but the smallest 
datasets) – branch addition
...
  Then you add the fifth 
branch to all of the fourth branch level trees, 
on all possible branches…
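
To get a feel for why exhaustive search is impractical, the branch-addition picture can be turned into a count. This is a minimal sketch, assuming unrooted, strictly bifurcating trees; the function name `count_unrooted_trees` is made up for illustration. An unrooted tree with k taxa has 2k - 3 branches, so the next taxon can be attached in 2k - 3 places.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted bifurcating trees by repeated branch addition."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    count = 1  # three taxa can be joined in exactly one way
    for k in range(3, n_taxa):
        # A tree with k taxa has 2k - 3 branches; the next taxon
        # can be attached to any one of them.
        count *= 2 * k - 3
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_unrooted_trees(n))
```

Even at 20 taxa the count is already above 10^20, which is why the shortcuts described next are needed.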
...
 

Methods of tree searching 
•  A shortcut for this is known as the branch‐and‐
bound method
...
  You add (go down) a branch
...
  If the score goes down, you don’t follow 
that path anymore (the score will keep going 
down) so you back up one level and try again 
until you get to the tip
...
  The whole tree is eventually 
covered, and you end up with the best one
...
  These are known as 
“hill‐climbing” algorithms, where the idea is to 
get to the top of the hill (the maximum) and 
hope that it’s the global maximum
...
  Optimization is done so that the best 
neighbor (most closely related taxa) is chosen
...
 
All possible connections are made between a 
branch in one tree, and a branch in the other, 
looking to find the best one 

More on evolutionary models
...
Character Methods
Distance is the measure of how related the sequences 
are, as measured by observed differences in the 
sequence (number of changes). 
Character-states are the actual sequences: the 
character is the position, and what is there is the 
state. For example: for DNA at any given position 
(character) there are four possible states (A, C, G, T). 
Character analysis lets us locate where in the tree 
each site changed, while Distance analysis tells us 
how much change occurred along each branch
...
Character Methods
Distance methods also are generally considered 
algorithmic methods, where they use an algorithm 
to construct a tree from the data (in this case, the 
distance matrix)
...
 
Character methods, on the other hand, are 
considered tree‐searching methods, where they 
build many trees, and then have to decide which is 
(or are) the best
...
 

Distance (pairwise) Methods 
•  Distance - the number of substitutions per site 
per time period
...
 From this 
matrix the method estimates the phylogenetic 
relationships of the OTUs
...
 
    Mathematical models allow for correcting the 
percentage differences between sequences, 
based on the DNA models
...
 
•  Evolutionary distance is always bigger than the 
distance calculated by direct sequence 
comparison
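
As an illustration of such a correction, here is a minimal sketch that computes the observed proportion of differing sites (the p-distance) and then applies the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is only one possible DNA model and is chosen here as an assumption; the function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of differing sites, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under this model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTACGTTT")
    print(p, jukes_cantor(p))
```

For any pair of sequences that differ, the corrected value comes out larger than the raw proportion, because the model accounts for multiple substitutions at the same site.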
...
 
 

The disadvantage of distance methods: 
    Inevitable loss of evolutionary information 
when the method discards the actual 
sequences (the character states of the taxa), since 
the sequence alignment is converted to pairwise 
distances
...
 

Distance method steps 
•  First build a distance matrix 
•  Find the most closely related taxa 
•  Combine them (give them a distance) and 
remove them from the matrix 
•  Build a new matrix with the remaining sequences 
•  Continue until all sequences are connected 
•  Build a tree from the resulting series of matrices, 
working from the last to the first (from the root, or 
most distantly related taxa, to the most closely related, 
or least evolutionarily distant, taxa) 

Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
•  The oldest method to reconstruct phylogenetic 
trees from distance data
...
 The newly formed cluster 
replaces its OTUs in the distance matrix
...
 

Distance Methods: UPGMA
Unweighted Pair Group Method with Arithmetic Mean
•  This process is repeated until all the OTUs are 
clustered
...
 
•  UPGMA is based on the molecular clock 
hypothesis – the evolutionary rate is the same 
in all branches (or that all sequences are 
equally distant from the root) 
    This assumption is seldom true
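
The UPGMA clustering loop described above can be sketched as follows. This is an illustrative sketch, not a reference implementation: trees are represented as nested tuples, branch lengths are not tracked, and the names `upgma` and `dist` are assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA; returns the tree topology as nested tuples.

    names: list of OTU labels
    dist:  dict mapping a pair (a, b) of labels to their distance
    """
    sizes = {name: 1 for name in names}  # cluster -> number of OTUs in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Find the most closely related pair of clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the new cluster to every other cluster is the
        # size-weighted arithmetic mean of the two old distances.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        # The new cluster replaces its members in the matrix.
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


if __name__ == "__main__":
    example = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
               ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
    print(upgma(["A", "B", "C", "D"], example))
```

Because every merge simply averages distances, the result is only trustworthy when the rates really are similar across branches, which is the molecular clock caveat noted above.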
...
 
•  The NJ method constructs a phylogenetic tree by 
joining neighbors (OTUs) by a branch to the same 
node (common ancestor)
...
  

Distance Methods:

Neighbor-Joining
•  NJ starts with a matrix like UPGMA 
•  It then calculates the “net divergence” of one 
OTU from all others as the sum of distances to 
that OTU
...
 
•  The lowest scoring pair is then chosen, and the 
distances to the node that joins them are calculated
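
One round of this procedure can be sketched as below. The sketch assumes the standard neighbor-joining formulas (score Q(i,j) = (n-2)·d(i,j) - r(i) - r(j), where r is the net divergence), which the extract does not spell out; the function name `nj_step` and the example distances are made up for illustration.

```python
from itertools import combinations


def nj_step(names, d):
    """One neighbor-joining round: pick the pair to join and its branch lengths.

    names: list of OTU labels (at least 3)
    d:     dict mapping frozenset({a, b}) to the distance between a and b
    """
    n = len(names)
    # Net divergence: sum of distances from one OTU to all others.
    r = {i: sum(d[frozenset((i, j))] for j in names if j != i) for i in names}

    def score(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - r[i] - r[j]

    # The lowest scoring pair is joined.
    i, j = min(combinations(names, 2), key=score)
    d_ij = d[frozenset((i, j))]
    # Branch lengths from each member of the pair to the new node.
    len_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d_ij - len_i
    return (i, j), (len_i, len_j)


if __name__ == "__main__":
    raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    dist = {frozenset(k): v for k, v in raw.items()}
    print(nj_step(list("abcde"), dist))
```

Note that the pair chosen is not necessarily the pair with the smallest raw distance (d and e here), since the net divergences are subtracted first.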
...
  A tree is built from the series of 
matrices
...
 This distance 
is inferred from the observed differences 
between sequences
...
 The nucleotide or amino acid (aa) appearing in 
this position is a state
...
 
Character-state methods retain the original 
status of the taxa, and therefore can be used to 
attempt the reconstruction of the character 
states of ancestral nodes
...

Taken from Dr
...
Itai Yanai

Maximum Parsimony Methods 
The Maximum Parsimony method is good for similar 
sequences, that is, a sequence group with a small amount 
of variation. 
Maximum Parsimony methods do not give the branch 
lengths, only the branch order
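
To make the idea of scoring a branch order concrete, here is a minimal sketch that counts the state changes a given tree requires, using Fitch's algorithm. It assumes the usual parsimony criterion (the tree needing the fewest changes is preferred), which the extract does not state explicitly; trees are nested tuples and all names are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, number of changes) for one site."""
    if isinstance(tree, str):  # a leaf: its state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    # children disagree: one change is needed on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all sites, each site scored separately."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


if __name__ == "__main__":
    aln = {"A": "AAGT", "B": "AAGT", "C": "ATGC", "D": "ATGC"}
    print(parsimony_score((("A", "B"), ("C", "D")), aln))
    print(parsimony_score((("A", "C"), ("B", "D")), aln))
```

The score depends only on the branching order, not on any branch lengths, and the identical columns contribute nothing to it.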
...
  If more 
taxa were added, a truer picture might appear
...
 

Character Based Methods:  
Maximum Likelihood
•  ML will give the most likely tree given the data 
under a particular model – if you change the 
model, you may get a different tree 
•  The ML method, like the Maximum Parsimony 
method, performs its analysis on each position 
individually in the multiple alignment
...
 
•  Unlike Parsimony, ML does take identical 
positions into account, and can give branch 
lengths 
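
The simplest possible case, two sequences joined by a single branch, already shows these points. The sketch below assumes the Jukes-Cantor model (an assumption, any other model could be substituted) and uses a plain grid search instead of a real optimizer; the function names are made up for illustration.

```python
import math


def jc69_log_likelihood(seq_a: str, seq_b: str, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length d."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    log_lik = 0.0
    for a, b in zip(seq_a, seq_b):
        # Each position is scored on its own; identical positions count too.
        # 0.25 is the equilibrium frequency of the base in the first sequence.
        log_lik += math.log(0.25 * (p_same if a == b else p_diff))
    return log_lik


def ml_distance(seq_a: str, seq_b: str, step: float = 0.001, max_d: float = 3.0) -> float:
    """Branch length on a grid that maximizes the likelihood."""
    grid = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))


if __name__ == "__main__":
    print(ml_distance("ACGTACGTAC", "ACGTACGTTT"))
```

A full ML analysis repeats this kind of calculation over every branch of every candidate tree, which is where the computational cost comes from.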

Character Based Methods:  
Maximum Likelihood
•  Likelihood methods regard the observed data as a 
fixed observation and seek the values of the statistical 
parameters that provide the most probable 
description of the data, given the model of evolution
...
 
•  These properties make likelihood well suited to 
historical inference problems, in which the observed 
data arise only once
...
  The first 
ball is thrown
...
  If it’s to the left of the first ball, 
Player B (Irit) gets a point
...
   
•  The problem:  You can’t see the table
...
 You are only told 
who gets a point
...
 1177‐8 

Character Based Methods: 
Bayesian Analysis 
•  After 8 throws, the score is Shifra 5, Irit 3
...
 
•  The only thing we do know is the current 
standings in the game
...
 1177‐8 

Character Based Methods: 
Bayesian Analysis 
•  If we knew where the first ball fell, then we 
could calculate the probability
...
 
•  So we have to calculate a probability of, say, 
Irit winning (observing some outcome) given a 
model based on what happened in the past
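
A worked version of this calculation is sketched below. It rests on two assumptions that are not stated in the extract: the first player to reach 6 points wins, and every position of the first ball is equally likely beforehand (a uniform prior). Under those assumptions, averaging over all possible positions gives an exact answer; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_fn(a: int, b: int) -> Fraction:
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def prob_trailing_player_wins(lead_pts: int, trail_pts: int, target: int) -> Fraction:
    """Chance the trailing player wins when the leader needs one more point.

    With a uniform prior on p (the chance a throw favors the leader), the
    posterior after the observed score is Beta(lead_pts + 1, trail_pts + 1).
    The trailing player must win every remaining throw, so the answer is
    the posterior expectation of (1 - p) ** needed.
    """
    if target - lead_pts != 1:
        raise ValueError("this sketch only covers a leader one point from winning")
    needed = target - trail_pts
    return beta_fn(lead_pts + 1, trail_pts + 1 + needed) / beta_fn(lead_pts + 1, trail_pts + 1)


if __name__ == "__main__":
    print(prob_trailing_player_wins(5, 3, 6))  # integrates over the unknown p
    print(Fraction(3, 8) ** 3)                 # plug-in estimate p = 5/8, for comparison
```

The two numbers differ noticeably, which illustrates why plugging in a single best guess for the unknown position is not the same as accounting for the uncertainty about it.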
...
 1177‐8 

Character Based Methods: 
Bayesian Analysis 
“The Bayesian approach is to write down exactly 
the probability we want to infer, in terms only of 
the data we know, and directly solve the resulting 
equation
...
 1177‐8 

Character Based Methods: 
Bayesian Analysis 
“Bayes theorem”  
The probability of a particular choice p  
given the data (the Posterior Probability) is 
proportional to the likelihood of p (the 
probability that we would get the observed 
data if p were true), multiplied by the a priori 
probability (prior probability) of this p being 
true relative to all other values of p
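
Written as a formula, the statement above reads as follows. The symbol D for the observed data is notation introduced here for illustration; p is the quantity named in the text, and the integral in the denominator runs over all other possible values p'.

```latex
P(p \mid D) \;=\; \frac{P(D \mid p)\, P(p)}{\int P(D \mid p')\, P(p')\, dp'}
\;\propto\; P(D \mid p)\, P(p)
```

Here P(p | D) is the posterior probability, P(D | p) is the likelihood, and P(p) is the prior probability.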
...
 1177‐8 

How does this apply to phylogenetics? 
•  The data we have at hand are the current 
sequences (taxa)
...
 
•  Unfortunately, those are unknown
...
  
What Bayesian methods do is integrate over 
degrees of uncertainty
...
 
•  Very computationally intensive – we use 
Markov Chain Monte Carlo (MCMC) methods 
•  It’s very hard to figure out what prior 
probability should be used
...
  It’s using probability as a measure 
of confidence
...
 a new 
state of the chain is defined (by moving a 
branch and/or changing a branch length)
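
The mechanics of proposing and accepting new states can be sketched on a much smaller problem than a tree. The example below is an assumption-laden toy: the state of the chain is a single number p (the unknown from the ball-throwing game, with the score 5 to 3 and a uniform prior) rather than a tree, and the proposal is a small random nudge rather than a branch move. It uses the Metropolis acceptance rule; all names are hypothetical.

```python
import math
import random


def log_posterior(p: float, wins_a: int = 5, wins_b: int = 3) -> float:
    """Unnormalized log posterior of p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the allowed range: zero probability
    return wins_a * math.log(p) + wins_b * math.log(1.0 - p)


def metropolis(n_generations: int, burn_in: int, step: float = 0.2, seed: int = 0):
    """Run a Metropolis chain and return the states kept after burn-in."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current)
    samples = []
    for generation in range(n_generations):
        # Define a new state by nudging the current one.
        proposal = current + rng.uniform(-step, step)
        proposal_lp = log_posterior(proposal)
        # Always accept a better state; accept a worse one with
        # probability equal to the ratio of posterior densities.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if generation >= burn_in:
            samples.append(current)
    return samples


if __name__ == "__main__":
    kept = metropolis(50_000, 5_000, seed=1)
    print(sum(kept) / len(kept))
```

The fraction of kept samples falling in any region estimates the posterior probability of that region; in phylogenetics the same bookkeeping is done for trees and branch lengths.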
...
 

Problems or rough spots
...
(burn in) 
•  How many generations to run the MCMC 
simulations 

How do we know if the results are 
reliable? 

“The results of a phylogenetic 
analysis are explicitly uncertain; 
accuracy is a pipe dream
...
  
•  As with estimates of model parameters, a single 
point estimate is of little value without some 
measure of the confidence we can place in it
...
  
☺  It is the creation of pseudoreplicate datasets by 
randomly resampling the original dataset
...
 
☺ Bootstrapping is used to examine how often a 
particular cluster in a tree appears when nucleotide 
or amino acid sequences are resampled    
☺ The frequency with which a given branch is found is 
recorded as the bootstrap proportion
...
  

Bootstrapping is…
•  How are bootstrapping and the construction of a 
consensus tree carried out in practice? 
Take a dataset consisting of n sequences in total, with m 
sites each
...
 
However, each site is sampled at random and no 
more sites are sampled than there were original 
sites
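
Creating one pseudoreplicate can be sketched as below. The sketch assumes the standard bootstrap, in which whole alignment columns are drawn with replacement, so some sites appear more than once and others not at all; the function name `bootstrap_alignment` is made up for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by resampling alignment columns.

    alignment: dict mapping sequence name to aligned sequence (equal lengths)
    rng:       a random.Random instance, passed in so runs can be repeated
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw exactly as many sites as the original has, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same columns are used for every sequence, so each sampled
    # site stays intact across all taxa.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"t1": "ACGT", "t2": "AGGT", "t3": "ACGA"}
    print(bootstrap_alignment(aln, random.Random(42)))
```

Repeating this many times, and building a tree from each pseudoreplicate, gives the collection of trees used in the next step.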
...
 

Statistical Methods 
Phylogenetic trees are generated from all the datasets
...
 
•  In this final consensus tree, the number of times a 
particular branch point occurred out of all the trees 
that were built will be displayed
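
Counting how often each branch point recurs can be sketched as follows. As a simplification, the sketch treats trees as rooted nested tuples and identifies a branch point by the set of taxa below it; real programs usually compare unrooted splits instead. All names are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def bootstrap_proportions(reference_tree, replicate_trees):
    """Fraction of replicate trees that contain each clade of the reference tree."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(clades(tree))  # each clade counted once per tree
    total = len(replicate_trees)
    return {clade: counts[clade] / total for clade in clades(reference_tree)}


if __name__ == "__main__":
    reference = (("A", "B"), ("C", "D"))
    replicates = [
        (("A", "B"), ("C", "D")),
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        ((("A", "B"), "C"), "D"),
    ]
    for clade, support in bootstrap_proportions(reference, replicates).items():
        print(sorted(clade), support)
```

The resulting fractions are the values displayed at the branch points of the final tree.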
...
Itai Yanai

Taken from Dr
...
 
•  Or more accurately, what is your biological 
question? 
•  Do you need the root?  A radial display is 
enough for most purposes 
•  Creating a rooted tree means more search 
space 
•  It is possible to “root” a tree, even if it was 
calculated as unrooted to begin with 

Many phylogenies also include an outgroup — a 
taxon outside the group of interest
...
  
Hence, the outgroup stems from the base of the 
tree
...
 

Which method is the best for my analysis?
•  Choose a set of related sequences (DNA or protein) 
•  Obtain a multiple alignment 
•  Is there a strong similarity? 
    Strong similarity: Maximum Parsimony 
    Distant (weak) similarity: Distance methods 
    Very weak similarity: Maximum Likelihood 
•  Check the validity of the results 

So, bottom line, what do we need? 
•  A substitution model 
•  An evolutionary model 
•  A rate model 
•  A method of tree building/refining 
•  A method of assessing the reliability of the 
tree  
