Title: Protein Structure Briefly explained
Description: Briefly explained in-class notes for graduate and undergraduate programs.
BASICS OF PROTEIN STRUCTURE
DEFINITIONS
TORSION ANGLES AND THE RAMACHANDRAN PLOT
THE RAMACHANDRAN PLOT AND THE QUALITY OF A PROTEIN STRUCTURE
SECONDARY STRUCTURE ELEMENTS
STRUCTURAL MOTIFS: CONNECTIVITY BETWEEN SECONDARY STRUCTURE ELEMENTS
FOLDS AND FOLD CLASSIFICATION
DOMAINS AND DOMAIN CLASSIFICATION
PROTEIN DATABASES: SHORT OVERVIEW
THE PROTEIN DATABANK (PDB): FILE FORMAT AND CONTENT
Definitions
To understand the basic principles of protein three-dimensional
structure and its potential use in various areas of research,
academic or industrial (such as the pharmaceutical or biotech
industries), we first need to look at the four levels of protein structure
...
The first level is the
amino acid sequence - there are 20 different amino acids
most commonly found in proteins
...
The sequence
controls to a large extent the higher levels of the protein
structure – secondary, tertiary and quaternary structure
...
A domain is an independent folding unit of a protein
...
Some proteins consist of one single domain while
others may contain several domains
...
Domains with the same fold
may or may not be related to each other functionally or
evolutionarily
...
The
currently known protein three-dimensional structures have
been classified into more than 1000 different unique folds
...
The fourth structural level, the quaternary structure, is
an oligomeric structure and usually involves several
polypeptide chains (called subunits)
...
An oligomer is stabilized by
subunit interactions, and may involve hydrophobic
interactions, hydrogen bonds, salt bridges, etc
...
Since large variations in the sequence may result in the
same type of three-dimensional structure, we say that
structure has a higher degree of conservation than
sequence
...
This is why
you may hear that the determination of the structure of a
protein with unknown function may help in revealing the
function
...
Although the function of the protein was known
before structure determination, the similarity of the structure
to that of ferrochelatase, an enzyme active in heme
biosynthesis, could only be revealed after the structure
determination of cobalt chelatase
...
An example of a quaternary protein structure
...
The structure was
obtained using single-particle reconstruction from cryo-electron microscopy
(cryo-EM) images of the complex
...
Other domains were homology-modeled
based on known structures from other proteins
...
In this process the
peptide bond, the covalent bond between two amino acid
residues, is formed
...
Each of them has its specific
characteristics defined by the side chain, which provides it
with its unique role in a protein structure
...
The charged amino acid residues include lysine (+), arginine
(+), aspartate (-) and glutamate (-)
...
The hydrophobic amino acids include alanine,
valine, leucine, isoleucine, proline, phenylalanine,
tryptophan, cysteine and methionine
...
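The two residue classes listed above can be written down as simple lookup sets. The sketch below is only an illustration: the function names are hypothetical, the one-letter codes are the standard ones, and the net charge simply counts +1 for lysine/arginine and -1 for aspartate/glutamate, ignoring histidine, the chain termini and any pH dependence.

```python
# Residue classes as listed in the notes (standard one-letter codes).
POSITIVE = set("KR")             # lysine, arginine
NEGATIVE = set("DE")             # aspartate, glutamate
HYDROPHOBIC = set("AVLIPFWCM")   # Ala, Val, Leu, Ile, Pro, Phe, Trp, Cys, Met


def classify(residue):
    """Return the class of a single residue given its one-letter code."""
    residue = residue.upper()
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    return "unassigned"  # classes not covered by the lists above


def summarize(sequence):
    """Crude net charge and hydrophobic fraction of a sequence (assumed model)."""
    classes = [classify(r) for r in sequence]
    net_charge = classes.count("positive") - classes.count("negative")
    hydrophobic_fraction = classes.count("hydrophobic") / len(sequence)
    return {"net_charge": net_charge, "hydrophobic_fraction": hydrophobic_fraction}


summary = summarize("AVLK")  # hypothetical four-residue peptide
```

For the hypothetical peptide `AVLK`, three of the four residues fall in the hydrophobic set and the single lysine contributes a net charge of +1.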
However, glycine, being one of the common
amino acids, has only a hydrogen atom as its side chain, and for this reason it
is not straightforward to assign it to one of the above classes
...
This suggests that it is rather hydrophilic
...
In contrast to glycine, proline provides rigidity to the
polypeptide chain by imposing certain torsion angles on the
segment of the structure
...
Glycine and proline are often
highly conserved within a protein family since they are
essential for the conservation of a particular protein fold
...
While hydrophobic amino acids build up
the core of the molecule, polar and charged amino acids
preferentially cover the surface and are in contact with
solvent due to their ability to form hydrogen bonds
...
The hydrogen is covalently attached to one of the atoms
(called the hydrogen-bond donor), but interacts
electrostatically with the other atom (the hydrogen-bond
acceptor, O)
...
Due to their electronic structure, water molecules may
accept 2 hydrogen bonds, and donate 2, thus being
simultaneously engaged in a total of 4 hydrogen bonds
...
In addition,
water is often found to be involved in ligand binding to
proteins, mediating ligand interactions with polar or charged
side-chain or main-chain atoms
...
A detailed
atlas of hydrogen bonding for all 20 amino acids in protein
structures was compiled by Ian McDonald and Janet
Thornton
...
These interactions may be
important for the stabilization of the protein three-dimensional structure. For example, proteins from
thermophilic organisms (organisms that live at elevated
temperatures, up to 80-90 °C, or even higher) often have an
extensive network of salt bridges on their surface, which
contributes to the thermostability of these proteins,
preventing their denaturation at high temperatures
...
Below you can see a figure showing
the distribution of the different amino acids within protein
molecules:
While hydrophobic amino acids are mostly buried within the core
of the structure, a smaller fraction of polar groups are found to be
buried
...
The vertical axis shows the fraction of highly buried residues,
while the horizontal axis shows the amino acid names in one-letter
code
...
A special way for plotting protein torsion angles was also
introduced by Ramachandran and co-authors, and was
subsequently named the Ramachandran plot
...
It also provides an overview of excluded regions that
show which rotations of the polypeptide are not allowed due
to steric hindrance (collisions between atoms)
...
Torsion angles are among the most important local
structural parameters that control protein folding. Essentially, if we had a way to predict the
Ramachandran angles for a particular protein, we would be
able to predict its fold
...
This is due to the partial double-bond character of the
peptide bond, which restricts rotation around the C-N bond,
placing two successive α-carbons and C, O, N and H between
them in one plane
...
Illustration
Torsion angles are dihedral angles, which are defined by 4
points in space
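As a minimal sketch of this definition, the signed dihedral angle for four points A, B, C and D can be computed from the three bond vectors between them. The function and helper names below are hypothetical, and the sign convention is chosen so that a clockwise rotation, seen when looking from B towards C, is positive.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(a, b, c, d):
    """Signed dihedral angle A-B-C-D in degrees, in the range [-180, 180]."""
    b1 = _sub(b, a)          # bond A -> B
    b2 = _sub(c, b)          # central bond B -> C (the rotation axis)
    b3 = _sub(d, c)          # bond C -> D
    n1 = _cross(b1, b2)      # normal of the plane through A, B, C
    n2 = _cross(b2, b3)      # normal of the plane through B, C, D
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: looking from B towards C, D is reached from A
# by a quarter turn clockwise.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

With these example points the function gives +90 degrees; mirroring D to `(0, -1, 1)` flips the sign. For a protein backbone, φ would use the atoms C(i-1), N, Cα, C and ψ the atoms N, Cα, C, N(i+1).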
...
The standard IUPAC definition of a dihedral angle is
illustrated in the figure below
...
The
rotation takes place around the central B-C bond
...
The rotation around the B-C bond is described by
the A-B-D angle shown in the right figure. Positive angles
correspond to clockwise rotation.
The restriction of the Ramachandran angles in proteins to
certain values, mentioned above, is clearly visible in the
Ramachandran plot below
...
Each dot on the plot shows the angles for an
amino acid
...
This is a convenient presentation and allows clear
distinction of the characteristic regions of α-helices and β-sheets
...
Some values of φ and ψ are forbidden since
the involved atoms will come too close to each other,
resulting in a steric clash
...
But there are sometimes exceptions to this rule - such
values can be found and they most probably will result in
some strain in the polypeptide chain
...
They
may have functional significance and may be conserved
within a protein family
...
In this case the Ramachandran plot shows torsion angle
distribution for one single residue, glycine
...
That is
why glycine is often found in loop regions, where the polypeptide
chain needs to make a sharp turn
...
Another residue with special
properties is proline, which in contrast to glycine fixes the torsion
angles at a certain value, very close to that of an extended
β-strand
...
Theoretically, the average phi and psi values for α-helices and β-sheets should be clustered around -57, -47 and
-80, +150, respectively
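To illustrate how these cluster centres could be used, the sketch below assigns a (phi, psi) pair to the nearest of the two centres quoted above. This is a deliberately crude assumption and not how real validation tools define the allowed regions: the function names and the 60-degree cut-off are hypothetical, and angles are compared on a circle so that -170 and +170 count as 20 degrees apart.

```python
import math

# Cluster centres (phi, psi) in degrees, as quoted in the notes.
CENTERS = {
    "alpha-helix": (-57.0, -47.0),
    "beta-sheet": (-80.0, 150.0),
}


def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in the range (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def nearest_region(phi, psi, max_distance=60.0):
    """Name of the nearest cluster centre, or 'other' if none is close enough."""
    best_name, best_dist = "other", float("inf")
    for name, (phi0, psi0) in CENTERS.items():
        dist = math.hypot(angular_difference(phi, phi0),
                          angular_difference(psi, psi0))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else "other"


region = nearest_region(-60.0, -45.0)  # hypothetical residue
```

A residue at (-60, -45) lands in the helical cluster, while one at (60, 45) is far from both centres and is reported as `other`.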
...
The Ramachandran plot and the quality of a protein
structure:
In cases when the protein X-ray structure was not
properly refined, and especially for bad or wrong homology
models, we may find torsion angles in disallowed regions of
the Ramachandran plot. This type of deviation usually
indicates problems with the structure
...
The image below shows two Ramachandran plots for
the same structure refined at different resolutions
...
Red indicates low-energy regions, brown allowed regions,
yellow the so-called generously allowed regions and pale yellow marks disallowed regions
...
You may also notice that the torsion angles on the left
plot lack real clustering around secondary structure regions
and have a much wider distribution, compared to the plot on
the right (also compare to the left plot on the figure above)
...
They may indicate
problems in the structure, but they may also be true and may
provide some interesting insights into the function of the
protein
...
Some features are essential in practical
applications - for example in sequence alignment analysis,
in homology modeling and analysis of model quality, in
planning mutations in a protein or when analyzing protein-ligand interactions
...
Linus Pauling was the first to
predict the existence of α-helices
...
An example of an α-helix is shown in the
figure below
...
To give you a better
impression of what a helix looks like, only the main chain of
the polypeptide is shown in the figure, without side chains
...
There are 3.6 residues/turn in an α-helix, which means that there is
one residue every 100 degrees of rotation (360/3.6)
...
Each
residue is translated 1
...
4 Å between structurally equivalent
atoms in a turn (pitch of a turn)
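The helix geometry can be checked numerically with a short sketch. One residue every 100 degrees, as quoted above, corresponds to 3.6 residues per turn. The rise of 1.5 Å per residue and the 2.3 Å radius of the Cα trace are standard textbook values assumed here, not values taken from this extract, and the function name is hypothetical.

```python
import math

DEGREES_PER_RESIDUE = 100.0   # from the notes: one residue every 100 degrees
RISE_PER_RESIDUE = 1.5        # Å along the helix axis (assumed textbook value)
CA_RADIUS = 2.3               # Å, approximate radius of the Cα trace (assumed)

residues_per_turn = 360.0 / DEGREES_PER_RESIDUE
pitch = residues_per_turn * RISE_PER_RESIDUE  # Å travelled per full turn


def ideal_helix_ca_trace(n_residues):
    """Cα coordinates of an idealized right-handed helix along the z axis."""
    step = math.radians(DEGREES_PER_RESIDUE)
    return [
        (CA_RADIUS * math.cos(i * step),
         CA_RADIUS * math.sin(i * step),
         i * RISE_PER_RESIDUE)
        for i in range(n_residues)
    ]


trace = ideal_helix_ca_trace(19)  # residues 0..18 span exactly five turns
```

Under these assumptions the pitch comes out as 3.6 × 1.5 = 5.4 Å, and residue 18 sits directly above residue 0 after five full turns.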
...
The α-helix is the
major structural element in proteins
...
Together these
groups form a hydrogen bond, one of the main forces in the
stabilization of secondary structure in proteins
...
The α-helix is not the only helical structure in proteins
...
The 3_10 helix has a smaller radius, compared to the
α-helix, while the π-helix has a larger radius
...
β-sheets consist of several β-strands, stretched
segments of the polypeptide chain, kept together by a
network of hydrogen bonds
...
These segments do not
need to follow each other in the sequence and may be
located in different regions of the polypeptide chain
...
In the figure each strand is
represented by an arrow, which defines its direction starting
from the N-terminus to the C-terminus
...
In the next figure you can see an example of a
protein structure with an anti-parallel β-sheet:
When there are only 2 anti-parallel β-strands, like in the figure
below, it is called a β-hairpin
...
Short turns and
longer loops play an important role in protein 3D structures,
connecting together strands to strands, strands to α-helices,
or helices to helices
...
But
in some cases, when a loop has some specific function, for
example interaction with another protein, the sequence may
be conserved
...
β-hairpin
You may have heard the expression "Structure is
Function"
...
For this reason,
when working with or just viewing protein 3D structures, it is an
advantage to be able to recognize the secondary structure
elements and to identify structural motifs
...
To create the observed variety of
protein structures, proteins use these structural motifs as
building blocks
...
Also, from
known protein three-dimensional structures we have learned
that in nature there is a limited number of ways by which
secondary structure elements are combined
...
Here we will
look at some examples
...
One of the simplest protein structural motifs is a helical
bundle, shown in the schematic image below
...
Parallel and anti-parallel β-sheets are also connected by a
variety of connectivity types
...
If a connecting region cannot be classified as a
secondary structure, and it is not a short loop, it
is sometimes called a coil region
...
An example is shown on the figure below
...
In the first, two hairpins are connected to each
other making up the sheet, while in the second there is the so-called Greek-key motif type of connectivity:
The figure below shows the topology of the protein
plastocyanin, which contains only β-structures
...
Here we just want to explain
the concept by showing some examples
...
Fold analysis may reveal evolutionary
relationships, which sometimes are difficult to detect at the
sequence level. It may also help in better understanding the
mechanism of function of a protein, its activity and biological
role
...
The relationship between the amino acid sequence and
the three-dimensional structure of a protein is not unique – a
large number of modifications in the sequence within a
protein family can be tolerated and will result in a similar 3D
structure
...
In other words, the constraints imposed by nature during
evolution on the three-dimensional structure are
much tighter than those imposed on the amino acid sequence
...
A protein fold is defined by the arrangement of the
secondary structure elements of the structure relative to each
other in space
...
The 4-helix bundle
and the TIM barrel, for example, are two types of very
common protein folds
...
An additional example is shown
below
...
Rossmann, a protein
crystallographer who solved one of the very first structures
with this type of fold
...
On the right the nucleotide binding
domain of liver alcohol dehydrogenase is shown
...
There are of course many more
types of protein folds, but how many in total?
Taking into account the huge number of amino acid
sequences, one would expect a high number of different
folds
...
Nature
has re-used the same fold again and again for performing
totally new functions
...
This will bring us to a page where, among other things, the
following two options are shown:
• Folds As Defined By SCOP
• Topologies As Defined By CATH
SCOP and CATH are the two databases generally accepted
as the two main authorities in the world of fold
classification
...
Also notice the graph: the last time a new fold was
identified was 2008:
The next graph shows the folds identified by the CATH database, a
total of 1282 folds:
Apparently the two databases use slightly different ways for
fold definitions and classification, which results in different
total numbers of folds
...
Have we reached the limit? There is probably
still a chance that some new folds will be discovered
...
Knowing the fold of the different domains in a protein
molecule is important in many cases
...
Domains and domain classification
Many proteins only contain a single domain, while others
may have several domains
...
Such domains often “carry” their function
with them when they get inserted into different proteins
during evolution
...
A domain often has a specific function associated with it
...
The procedure followed by databases, for example
CATH or SCOP, includes:
• Assignment of secondary structures
• Assignment of domains
• Assignment of a structural class to each domain (3 possible
structural classes: alpha, beta and alpha/beta)
• Assignment of fold (called Architecture in the CATH database)
• Assignment of topology (homologous superfamily)
Secondary structure is usually assigned automatically, using
computer software
...
One needs to be aware that CATH and SCOP use
slightly different terminology in fold assignment and have a
different way of describing the entries
...
There are currently 53 million protein
domains classified into 2,737 superfamilies in the CATH
database
...
A subunit of
hemoglobin consists of a single α-helical domain
...
In the case of hemoglobin this will make 4 domains, while
for pyruvate kinase there will be 12 protein domains in the
functional unit
...
The top domain on the figure
below is built up by β-sheets, while the other two domains
contain a mixture of helices and strands
...
As an example,
performing a search with PDB ID 1E0T would return the
following result for the 3 domains
...
As mentioned above, there are in total 3 domains in each chain: 01,
02 and 03
...
This information is highly valuable in homology
modeling, especially in cases when we need to model
different domains using different modeling templates, the
so-called multi-template homology modeling
...
It is also interesting to know
what is actually inside a structural file, what type of
information is kept there and how structural information is
presented in the file
...
Only a few
structures existed at that time, and the only experimental
method for protein structure determination available was
protein X-ray crystallography
...
Before the cloning era people had to purify proteins from
cells, substantially limiting availability − to obtain a few
milligrams of a protein for crystallization one would need a
lot of cells
...
Another
important factor was the introduction of synchrotron
radiation
...
This eliminated the time-consuming stage of optimization
of the crystallization conditions, which was required for
obtaining crystals large enough for the relatively low X-ray
intensity of home sources
...
Cheaper computers also meant new
software, which also started to become user friendly, and in
addition new graphics capabilities of monitors became
available
...
That was when
the number of protein structures started to increase
dramatically
...
One such consortium is, for example, the Structural
Genomics Consortium (SGC)
...
Currently every newly determined protein structure
has to be deposited with the Protein Data Bank before the
scientific paper describing the structure can be published
...
However, one should remember that not all
structures in the PDB are unique
...
This may be a source of confusion if one
would try to fetch a structure from PDB - which one to
choose if there are many entries of the same protein? This
will be discussed later in the chapter on homology modeling
...
Coming back to our initial questions, how to download a
structure and what is inside the PDB file? First we need to
check if there is a structure for the protein we are interested
in
...
For example, enter the name of a
protein called magnesium chelatase
...
Some other proteins may be listed in the output,
some of them come from electron microscopy modeling,
others may be totally unrelated
...
Of course you may refine your
search using the options provided on the PDB page that show
up when you enter the name of the protein
...
Both PDB and PDBsum provide additional data on the
entry, including links to other databases, where more
information can be found
...
The PQS
database is also of interest; it is the Protein Quaternary
Structure database
...
The reason is that the information that can be
found in PQS is currently generated by the PISA server
(Protein Interfaces, Surfaces and Assemblies)
...
The biological unit in solution
may contain several subunits of the same protein, arranged as
dimers, trimers or higher-order oligomers
...
When the molecules are
crystallized, they get arranged in certain types of space
lattices, within which all molecules are ordered and related to
each other by symmetry operations of the particular
symmetry group of the crystal (possible symmetry groups
are listed in the International Tables for Crystallography)
...
In such cases, one unit within, for
example, a trimer becomes the asymmetric unit of the crystal
...
This is reflected in the
content of the files in the PDB: they contain coordinates for
the atoms of one subunit, the asymmetric unit
...
The file generated by the PISA server may also be
downloaded from the PDB
...
In the middle figure there are two
subunits in the unit cell related to each other by a two-fold
rotation symmetry axis
...
In the
last figure on the right the molecules in the unit cell are
related by a 4-fold crystallographic symmetry axis
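A rotation symmetry axis of this kind can be sketched as a plain coordinate transformation. The example below assumes, purely for illustration, that the axis runs along z through the origin and that the operation is a pure rotation with no translational component; the function names are hypothetical.

```python
import math


def rotate_about_z(point, n_fold, k=1):
    """Rotate a point by k steps of 360/n_fold degrees about the z axis."""
    angle = 2.0 * math.pi * k / n_fold
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def symmetry_mates(point, n_fold):
    """All n_fold copies of a point generated by an n-fold rotation axis."""
    return [rotate_about_z(point, n_fold, k) for k in range(n_fold)]


atom = (1.0, 2.0, 3.0)              # hypothetical atom position
two_fold = symmetry_mates(atom, 2)   # original plus one rotated copy
four_fold = symmetry_mates(atom, 4)  # original plus three rotated copies
```

A two-fold axis maps (x, y, z) to (-x, -y, z), and a four-fold axis maps it to (-y, x, z) at each step, so only the coordinates of one copy need to be stored while the others can be regenerated.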
...
One essential feature is a description of the amino acid
sequence in relation to the secondary structure of the protein
...
The image
below shows the page from PDBsum:
The information in this page is useful for quick identification
of the position of amino acids within the structure, for getting
an idea of the type of the protein (all α, α/β), the location of
active site residues, etc
...
The Protein Databank (PDB): File Format and Content
Here we will focus on the PDB
...
PDB files are simple text files
and can be opened by any text editor (in contrast, for example,
to MS Word files, which cannot be opened by all text
editors)
...
Each atom position
is defined by its x,y,z coordinates
...
) and the structure as such, like
geometry, secondary structure content, regions missing in the
structure, etc)
...
The
lower the value of the R-factor, the better the fit
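The crystallographic R-factor is commonly defined as the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The sketch below uses that common definition with made-up amplitudes; it assumes the two lists are already on the same scale and ignores the scale factor that real refinement programs apply.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matching reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for two reflections.
r = r_factor([10.0, 20.0], [9.0, 22.0])
```

For these made-up values the differences add up to 3 against a total observed amplitude of 30, giving R = 0.1; a model that reproduced the observed amplitudes exactly would give R = 0.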
...
The reason is that
there was no electron density for these residues (see for
example the discussion on structure quality in homology
modeling)
...
It is essentially
impossible to find the correct positions for amino acids
without the guiding electron density
...
The numbers after the first record in the file, ATOM, are
just sequential numbers of the atoms in the structure
...
The next carbon atom is C-β, and the following atoms
are named after the Greek alphabet, gamma, delta, etc
...
They are just C, O and N
...
In cases when the
structure consists of several polypeptide chains (a multi-subunit protein), each chain will get its own identifier, like
A, B, C, etc (as in the case of Pyruvate kinase discussed
earlier)
...
The 3 numbers which follow (e
...
, 14
...
369, 62
...
They describe the position of each atom in an orthogonal
coordinate system
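The fields of an ATOM record described above can be read with simple string slicing, because the PDB format places each field in fixed columns. The sketch below follows the standard column layout of the ATOM record; the sample record is entirely hypothetical (it is not taken from any real entry), and the function name is an assumption.

```python
# A hypothetical ATOM record, assembled piece by piece to show the columns.
SAMPLE = (
    "ATOM  "      # columns  1-6  record name
    "    1"       # columns  7-11 atom serial number
    " "           # column  12    blank
    " N  "        # columns 13-16 atom name
    " "           # column  17    alternate location indicator
    "MET"         # columns 18-20 residue name
    " "           # column  21    blank
    "A"           # column  22    chain identifier
    "   1"        # columns 23-26 residue sequence number
    "    "        # columns 27-30 insertion code and blanks
    "  11.104"    # columns 31-38 x coordinate (Å)
    "   6.134"    # columns 39-46 y coordinate (Å)
    "  -6.504"    # columns 47-54 z coordinate (Å)
    "  1.00"      # columns 55-60 occupancy
    " 20.00"      # columns 61-66 temperature factor
    "          "  # columns 67-76 blanks
    " N"          # columns 77-78 element symbol
)


def parse_atom_record(line):
    """Split one ATOM/HETATM line into its fixed-column fields."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom_record(SAMPLE)
```

Slicing by column is preferable to splitting on whitespace, since adjacent fields in real files can run together without a separating space.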
...