
Title: Biological Databases
Description: A brief note on Biological Databases



Biological Data
Data is a collection of facts such as values or measurements
...
Biological data are commonly stored in databases
...

BIOLOGICAL DATABASE
A biological database is a large organized body of persistent data, usually associated with computerized software
designed to update, query, and retrieve components of data stored within the system
...
For example, a record in a nucleotide sequence database typically contains information such as
the input sequence with a description of the type of molecule, the scientific name of the source organism from
which it was isolated, and the literature citations associated with the sequence
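As a rough illustration of "update, query, and retrieve", here is a minimal sketch that stores records with the fields listed above in an in-memory SQLite table. The table name, column names and accession IDs are hypothetical and do not reflect any real database's schema.

```python
import sqlite3

# Hypothetical schema: one row per nucleotide sequence record, holding the
# fields the notes list (sequence, molecule type, source organism, citation).
SCHEMA = """
CREATE TABLE records (
    accession     TEXT PRIMARY KEY,
    molecule_type TEXT NOT NULL,
    organism      TEXT NOT NULL,
    sequence      TEXT NOT NULL,
    citation      TEXT
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database containing the records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_record(conn, accession, molecule_type, organism, sequence, citation=None):
    """Insert a record, or replace it if the accession already exists (update)."""
    conn.execute(
        "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?, ?)",
        (accession, molecule_type, organism, sequence, citation),
    )
    conn.commit()


def find_by_organism(conn, organism):
    """Query and retrieve (accession, sequence) pairs for one source organism."""
    rows = conn.execute(
        "SELECT accession, sequence FROM records WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return rows.fetchall()


conn = open_db()
add_record(conn, "SEQ001", "mRNA", "Homo sapiens", "ATGGCC", "Example citation")
add_record(conn, "SEQ002", "genomic DNA", "Mus musculus", "ATGAAA")
print(find_by_organism(conn, "Homo sapiens"))  # [('SEQ001', 'ATGGCC')]
```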
...

Making biological information available for analysis and for developing applications is key, and a huge
number of databases in the public and private domains exist to do so
...
Protein Databases
Protein sequence databases are classified as primary databases and secondary (composite) databases
...
Primary Databases:
Primary protein databases contain over 300,000 protein sequences and function as repositories for the raw data
...

SWISS-PROT
SWISS-PROT is a curated protein sequence database; i.e., groups of designated scientists prepare the entries
from the literature or through contacts with external experts
...
It provides a high level of annotation, such as a description of the protein's function, its domain structure, post-translational modifications, etc
...

PIR
The Protein Information Resource (PIR) is the earliest developed, most comprehensive, and expertly annotated protein
sequence database
...
Although SWISS-PROT and PIR
overlap extensively, there are still many sequences that can be found in only one of them
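The overlap described above is a set relationship. This minimal sketch, assuming each collection is a hypothetical dict of entry ID to sequence and that identical sequence strings mean the same protein, splits sequences into shared and database-specific groups.

```python
def compare_collections(db_a, db_b):
    """Split sequences into shared and unique sets.

    db_a and db_b map entry IDs to sequences; identical sequence strings
    are treated as the same protein (a simplifying assumption).
    """
    seqs_a = set(db_a.values())
    seqs_b = set(db_b.values())
    return {
        "shared": seqs_a & seqs_b,   # found in both collections
        "only_a": seqs_a - seqs_b,   # found only in the first
        "only_b": seqs_b - seqs_a,   # found only in the second
    }


swiss = {"P1": "MKTAYIAK", "P2": "MADQLTEE"}  # hypothetical entries
pir = {"A1": "MKTAYIAK", "A2": "MSTNPKPQ"}    # hypothetical entries
result = compare_collections(swiss, pir)
print(sorted(result["only_b"]))  # ['MSTNPKPQ']
```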
...

ii
...
Primary databases are
combined and filtered to form non-redundant composite databases
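One way to picture "combined and filtered" is to merge several collections and keep each distinct sequence once while remembering every entry that carried it. This is a simplified sketch with hypothetical entry IDs; real composite databases apply more elaborate redundancy criteria than exact string identity.

```python
def make_nonredundant(*databases):
    """Merge several {entry_id: sequence} dicts into one non-redundant set.

    Each unique sequence is kept once, together with the list of every
    entry ID that carried it, so no cross-references are lost.
    """
    merged = {}
    for db in databases:
        for entry_id, seq in db.items():
            key = seq.upper()  # compare sequences case-insensitively
            merged.setdefault(key, []).append(entry_id)
    return merged


nr = make_nonredundant({"P1": "MKTAYIAK"}, {"A1": "mktayiak", "A2": "MSTNPKPQ"})
print(len(nr))         # 2
print(nr["MKTAYIAK"])  # ['P1', 'A1']
```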
...
PROSITE is a database of protein families and domains
...


PFAM
Pfam is a database of Protein FAMilies defined as domains (continuous segments of entire protein sequences)
...
It is licensed under the GNU General
Public License, which makes it available to anyone but requires that derivative works (new
modifications of the database) also be made available in source form
...
Structural Databases
Structural databases pertain to macromolecular structures
...
Analytical biologists usually deposit their molecular
structures in the PDB on publication
...
PDB provides a primary archive
of all 3-D structures for macromolecules such as proteins, RNA, DNA and various complexes
...
The summary information gives an at-a-glance overview of the contents of each PDB
entry in terms of the numbers of protein chains, ligands, metal ions, etc
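Counts like these can be derived from a PDB-format coordinate file, which uses fixed-width records. The sketch below assumes the classic PDB format columns (record name in columns 1-6, residue name in 18-20, chain ID in 22); ligands and metal ions both appear as HETATM records, so they are grouped together here. The example lines are made up.

```python
def summarize_pdb(lines):
    """Count polymer chains and non-water HETATM groups in PDB-format lines.

    Column positions follow the fixed-width PDB format: record name in
    columns 1-6, residue name in columns 18-20, chain ID in column 22.
    """
    chains = set()
    het_groups = set()
    for line in lines:
        record = line[0:6].strip()
        if record == "ATOM":
            chains.add(line[21])             # chain identifier
        elif record == "HETATM":
            res_name = line[17:20].strip()   # e.g. a ligand or metal ion
            if res_name != "HOH":            # skip water molecules
                het_groups.add(res_name)
    return {"chains": len(chains), "het_groups": sorted(het_groups)}


# Made-up example records, not taken from a real PDB entry.
PDB_LINES = [
    "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 53.31           N",
    "ATOM      2  CA  MET B   1      37.128  19.123  28.113  1.00 50.00           C",
    "HETATM  500 ZN    ZN A 201      10.000  10.000  10.000  1.00 20.00          ZN",
    "HETATM  600  O   HOH A 301      11.000  11.000  11.000  1.00 30.00           O",
]
print(summarize_pdb(PDB_LINES))  # {'chains': 2, 'het_groups': ['ZN']}
```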
...

CATH
The CATH database is a hierarchical classification of protein domain structures, which are clustered at 4 major
structural levels:
Class (C) – Derived from secondary structure content, which is assigned automatically for protein structures
...

Topology (T) – The topology level clusters structures according to their topological connections and number of
secondary structure elements
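Each CATH classification is written as a dotted code with one number per level, so the first number gives the class and the first three numbers together identify the topology. This sketch groups domains by a chosen depth of that code; the domain IDs and codes are illustrative only.

```python
from collections import defaultdict


def group_by_level(domains, level):
    """Group domains by the first `level` numbers of their dotted CATH code.

    domains maps a domain ID to a code such as "3.40.50.300".
    level=1 groups by class; level=3 groups by topology.
    """
    groups = defaultdict(list)
    for domain_id, code in domains.items():
        prefix = ".".join(code.split(".")[:level])
        groups[prefix].append(domain_id)
    return dict(groups)


# Illustrative domain IDs and codes.
domains = {"d1": "3.40.50.300", "d2": "3.40.50.720", "d3": "1.10.8.10"}
print(group_by_level(domains, 3))  # {'3.40.50': ['d1', 'd2'], '1.10.8': ['d3']}
```

Domains that land in the same group at level 3 share a topology but may still differ at the deeper level.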
...

SCOP
The Structural Classification Of Proteins database aims to provide a detailed and comprehensive description of the
structural and evolutionary relationships between all proteins whose structure is known
...

3
...
Each GenBank entry includes a concise description of the sequence, scientific
nomenclature, and taxonomy of the source organism, and a table of features that identifies coding regions and
other sites of biological significance, such as transcription units, sites of mutations or modifications
...
GenBank is part of the International Nucleotide
Sequence Database Collaboration (INSDC), which comprises DDBJ, EMBL and GenBank
...
EMBL database is maintained by the European Bioinformatics Institute in
Hinxton, UK
...
DDBJ (DNA Data Bank of Japan) is maintained by the NIG (National Institute of Genetics)
...

GDB
The Human Genome Database (GDB) stores data about genes, DNA markers, map locations and genetic diseases, and supports
clinical medicine and biomedical research
...

Other Databases:
There are many more databases which provide specialized information
...
It is a secondary database, which
contains many links to other databases
...
It is an effort to computerize current knowledge of
molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes
...

NCBI
The National Center for Biotechnology Information is part of the United States National Library of Medicine (NLM)
...
Major databases include
GenBank for DNA sequences and PubMed, a bibliographic database for biomedical literature
...
All these databases are available online through the
Entrez search engine
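Besides the web interface, Entrez can be queried programmatically through NCBI's E-utilities. The sketch below only builds an `esearch` request URL using the documented `db`, `term` and `retmax` parameters; it does not send the request, and the search term is just an example.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an Entrez esearch query URL (the request itself is not sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)  # spaces are encoded as '+'


print(esearch_url("pubmed", "biological databases", retmax=5))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=biological+databases&retmax=5
```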
...
The NCBI Bookshelf is a
collection of freely available, downloadable online versions of selected biomedical books
...
These can be classified as:
1
...
Thus the degree of
similarity between two sequences can be measured, whereas homology is simply either true or false: two sequences either share a common ancestor or they do not
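Similarity is a graded number, which is why it can be measured. A minimal sketch, assuming the two sequences are already aligned to equal length with '-' marking gaps, computes percent identity; skipping gapped columns is one common convention among several. Whether the sequences are homologous is an inference drawn from such scores, not something this function returns.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are excluded from the count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGT-A", "ACGTTC"))  # 80.0
```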
...


2
...
These allow you to approximate the biochemical function of the query
protein
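One way such tools estimate function is by scanning the query against motif patterns from a secondary database such as PROSITE. This sketch converts a simple PROSITE-style pattern into a Python regular expression. It assumes only the basic syntax (`x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, `<` and `>` terminal anchors) and does not handle rarer forms such as `>` inside brackets. The pattern below is illustrative.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                            # e.g. x(2,4) -> .{2,4}
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element               # single residue or [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

A query sequence can then be scanned with `re.search(regex, sequence)`.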
...
Structural Analysis
These sets of tools allow you to compare structures with those in known structural databases
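A common number reported when two structures are compared is the root-mean-square deviation (RMSD) of corresponding atoms. This sketch assumes the structures are already superimposed and that atom i in one list corresponds to atom i in the other; finding the optimal superposition is a separate step not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))  # 1.4142135623730951
```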
...

4
...

Examples of Bioinformatics Tools:
i
...
BLAST (Basic Local Alignment Search Tool) is used to perform fast similarity searches that find matches
to a nucleotide or protein query sequence in a sequence database
...

Depending on the type of sequences to compare, there are different programs:
blastp: compares an amino acid query sequence against a protein sequence database
...

blastx: compares a nucleotide query sequence, translated in all six reading frames, against a protein sequence database
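"Translated" here means the nucleotide query is converted into protein in every reading frame before comparison. This sketch performs a six-frame translation with the standard genetic code; it is meant to show the idea, not how BLAST implements it internally. Codons containing anything other than A, C, G or T become 'X', and '*' marks a stop codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(s[f:]) for s in (dna, reverse) for f in range(3)]


print(six_frames("ATGGCCTAA")[0])  # MA*
```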
...

Program    Query sequence             Database
blastp     Protein                    Protein
blastn     Nucleic acid               Nucleic acid
blastx     Translated nucleic acid    Protein
tblastn    ...
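Choosing a program is essentially a lookup on the query and database types given in the table above. This small sketch encodes only the fully listed rows; the type labels are assumed names chosen for illustration.

```python
# Rows of the program table: (query type, database type) -> program.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("translated nucleotide", "protein"): "blastx",
}


def choose_blast_program(query_type, database_type):
    """Look up the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no program listed for {query_type!r} vs {database_type!r}"
        ) from None


print(choose_blast_program("protein", "protein"))  # blastp
```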

ii
...
FASTA searches for similarities between one sequence (the query) and any group of sequences of the same
type (nucleic acid or protein) as the query sequence
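FASTA's heuristic starts by looking for short identical words (of length ktup) shared by the query and each library sequence. The toy sketch below shows only that word-counting idea for ranking candidates. It is not the FASTA program, which goes on to score diagonals, rescore regions and compute alignments. The library contents are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def rank_by_shared_words(query, library, k=2):
    """Rank library sequences by how many k-letter words they share with query.

    library maps sequence IDs to sequences of the same type as the query.
    Shared words are counted with multiplicity (Counter intersection).
    """
    query_words = Counter(kmers(query.upper(), k))
    scores = {}
    for seq_id, seq in library.items():
        shared = query_words & Counter(kmers(seq.upper(), k))
        scores[seq_id] = sum(shared.values())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


library = {"s1": "ACGTTT", "s2": "GGGGGG"}  # hypothetical library
print(rank_by_shared_words("ACGTAC", library, k=3))  # [('s1', 2), ('s2', 0)]
```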
...

A sequence in FASTA format is represented as a series of lines, each of which should not exceed 80 characters
...
The initial line is followed by the
actual sequence in standard one-letter code
...

>MCHU- Calmodulin-Human, rabbit, rat, and chicken
ACCTGAGGCTCCGAACGTCCGATTCAC*
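The format described above is simple enough to read and write by hand. This sketch parses FASTA text into (header, sequence) pairs and writes records with sequence lines wrapped at 80 characters. It assumes a trailing '*' marks the end of a sequence, as in the example, and strips it; other conventions exist.

```python
import textwrap


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):          # a new description line
            if header is not None:
                records.append((header, "".join(chunks).rstrip("*")))
            header, chunks = line[1:].strip(), []
        elif header is not None:          # sequence line of the current record
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).rstrip("*")))
    return records


def format_fasta(header, sequence, width=80):
    """Write one record, wrapping sequence lines at `width` characters."""
    lines = [">" + header] + textwrap.wrap(sequence, width)
    return "\n".join(lines) + "\n"


example = ">MCHU- Calmodulin-Human, rabbit, rat, and chicken\nACCTGAGGCTCCGAACGTCCGATTCAC*\n"
print(parse_fasta(example))
```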

