Title: Data Structures and Algorithms
Description: These notes are in non-vague bullet form, full of information on various advanced data structures and algorithms, with diagrams and pictures. Topics include: Binary Trees, Self-balancing Trees (AVL trees, Red-Black trees), Skip lists, Graphs
Data Structures & Algorithms 2 Notes
What is an ADT?
o Stands for Abstract Data Type
...
o Examples of ADTs include stacks, queues, trees, etc
...
Which operations are used most
...
Examples of ADT operations:
o Insert:
Add an item to the list of data items
...
Pointer-based: replace head
...
o List:
List all the items present in the ADT (regardless of sort order)
...
Pointer-based: traverse from one item to the other via
pointers, and output the item along the way
...
o Traverse:
Traverse the list in a particular order
...
o Traverse & Retrieve:
Traverse a sorted ADT and retrieve an item
...
Pointer-based: scan until the item is found (inefficient)
...
Table ADT:
o The table is a form of ADT which represents a list of records, much like a table
in a database
...
Each record has a number of different fields, also known as attributes,
associated to it
...
City       Country   Population
S...       ...       7,300,000
Paris      France    2,200,000
Rome       Italy     2,800,000
Toronto    Canada    3,200,000
Venice     Italy     300,000
o A table can be ordered by a certain field
...
o Finding the record containing a certain value in one of its fields becomes
easier if the table is sorted, since then, a binary search can be performed
...
Insert an item at a position based on the item’s search key
...
Retrieve an item having a specified search key value
...
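The binary-search retrieval described above can be sketched with the standard library's bisect module (the table records below are illustrative, not from the notes):

```python
from bisect import bisect_left

# A table sorted by its "city" field (illustrative records)
table = [("Paris", "France"), ("Rome", "Italy"),
         ("Toronto", "Canada"), ("Venice", "Italy")]
keys = [record[0] for record in table]  # the sorted search-key column

def retrieve(city):
    # Binary search on the key column: O(lg n) instead of a linear scan
    i = bisect_left(keys, city)
    if i < len(keys) and keys[i] == city:
        return table[i]
    return None

print(retrieve("Rome"))  # ('Rome', 'Italy')
```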
Binary trees:
o A binary tree is a structure containing data items (called nodes) which have at
most two children: a left child node and a right child node
...
o The left child node is used to store the data item which is smaller than the
parent’s, and the right child node is used to store the data item which is
larger
...
o However, a basic binary tree can degenerate into a glorified linked list:
o This does not improve search/retrieval efficiency at all: in the degenerate
case, operations cost O(n), the same as in a pointer-based linked list
...
o Find maximum:
Start at root and take the right branch until there is no child
...
Else, take right branch
...
o Insert:
If the node to be inserted is smaller than the current node, take left
branch
...
Keep on iterating until the proper position for the node is found
...
More on that in the next point
...
This is the easiest scenario, as the node can be removed without any
repercussions
...
Set the item’s parent to point to the item’s child, thus removing the
item from the tree
...
Let the two children be L and R, and the item to delete be X
...
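The find-maximum and insert procedures above can be sketched as a minimal Python binary search tree (the class and function names are my own):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    # Smaller values take the left branch, larger values the right,
    # recursing until the proper (empty) position is found
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def find_max(root):
    # Start at the root and take the right branch until there is no child
    while root.right is not None:
        root = root.right
    return root.value

root = None
for v in [50, 30, 70, 60, 80]:
    root = insert(root, v)
print(find_max(root))  # 80
```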
AVL trees:
o AVL trees are a form of self-balancing trees that try to minimise the height of
the tree
...
o The height of an empty tree is set to -1
...
o Example:
Adding a node with the value 60 will make the tree unbalanced, since
the height difference between the left and right subtrees of the root
node is 2, which is 1 more than the allowed difference
...
Thus, after inserting, the algorithm must walk up the tree from the
insertion node, and see if the AVL property is violated
...
Four kinds of violations:
Case 1
Case 2
Case 3
Case 4
o Rotate:
A rotation rearranges a subtree, promoting one child to be the new subtree root while preserving the search-tree ordering
...
The algorithm for an LL rotation is as follows:
Node RotateLL(Node n)
{
    Node k = n.Left;    // the left child becomes the new subtree root
    n.Left = k.Right;   // k's right subtree is re-attached as n's left subtree
    k.Right = n;        // n becomes the right child of k
    return k;
}
An RR rotation would be the same except with the “Left” and “Right”
instances swapped
...
A double rotation is required for these cases
...
With AVL trees, an insertion requires the algorithm to recursively go
down the tree to find the insertion point, and then go back up to
retrieve height information
...
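The height convention (an empty tree has height -1) and the balance rule can be verified with a small checker; the (value, left, right) tuple encoding below is my own:

```python
def check_avl(tree):
    # tree is (value, left, right) or None; an empty tree has height -1
    if tree is None:
        return -1
    value, left, right = tree
    hl, hr = check_avl(left), check_avl(right)
    if abs(hl - hr) > 1:
        raise ValueError(f"AVL property violated at {value}")
    return 1 + max(hl, hr)

balanced = (50, (30, None, None), (70, (60, None, None), (80, None, None)))
print(check_avl(balanced))  # 2: the tree's height, with no violation found
```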
Red-black trees:
o Properties:
Every node is “coloured” red or black
...
If a node is red, then both its children are black
...
The root node must be black
...
o Observations:
The shortest path will consist of only black nodes
...
The longest path can never be more than twice the length of the
shortest path
...
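These properties and observations can be checked mechanically. The checker below also uses the standard equal-black-height rule (every root-to-leaf path passes through the same number of black nodes), which the two observations above rely on; the (colour, left, right) tuple encoding is my own:

```python
def check_rb(node, is_root=True):
    # node is (colour, left, right) or None; nil leaves count as black
    if node is None:
        return 1  # black-height contribution of a nil leaf
    colour, left, right = node
    if is_root and colour != 'B':
        raise ValueError("root must be black")
    if colour == 'R':
        for child in (left, right):
            if child is not None and child[0] == 'R':
                raise ValueError("red node with a red child")
    bl = check_rb(left, False)
    br = check_rb(right, False)
    if bl != br:
        raise ValueError("paths have unequal black-heights")
    return bl + (1 if colour == 'B' else 0)

# A black root with two red children satisfies all the rules
print(check_rb(('B', ('R', None, None), ('R', None, None))))  # 2
```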
o If the new node is the root, then colour it black
...
o Otherwise, if the parent is red and the UNCLE is black:
If, relative to grandparent G, the node is on the outside, then perform
an LL or RR rotation (depending on whether the node inserted is to
the left or right of the node’s parent), and recolour the old root G as
red and the new root as black
...
o If both the parent and uncle are red, then the scheme above will not
necessarily work, and we will have to recursively fix further violations
through additional recolouring
...
...
o Example of adding a node 45:
The 40 and 55 nodes are both red, so colour them black and make the
parent red:
50 is on the outside of 70, so an LL rotation is performed:
Finally, the node with value 45 is added:
If more red violations are caused, then the rotation is performed
again
...
In this case, we are done
...
o The easiest case is if the node to be deleted is red
...
o Deleting black nodes is where the problems begin
...
o Conventions: let X be the node being deleted, T be X’s sibling, P be X’s parent,
R be T’s right child and L be T’s left child
...
The same rules apply, mirrored, for right children
...
Make the root red
...
Go to Step 2
...
Go to Step 2B
...
If X has at least one red child:
Go to Step 2B
...
If L is red:
Go to Step 2A2
...
If both L and R are red:
Go to either Step 2A2 or Step 2A3
...
o Step 2A1
Swap the colours of X, P and T
...
o Step 2A2
Rotate L around T
...
Swap the colour of X and P
...
o Step 2A3
Rotate T around P
Swap the colour of X, P, T and R
...
o Step 2B
Continue searching:
If the new X is red, continue searching:
If new X is black, rotate and recolour T and P, and go back to Step 2:
o Step 3:
Find the node to delete and delete it
...
If it was black, go to step 2
...
Other types of balancing trees:
o B-tree: each node can have a variable number of keys and children
...
Used for large datasets
...
What is file compression?
o File compression is the process of decreasing the size of data, with the intent
of making file transfer easier
...
When you compress a file in a lossless fashion, upon decompression, you
get exactly the same file that you compressed
...
o Lossy
...
These algorithms are generally clever enough to remove the least
important data from a file while keeping it as faithful to the original
file as possible
...
Consider:
o A small character set with seven characters:
a, e, l, s, t, sp (space) and nl (new line)
...
o You can create a statistics table out of this:
Character   Code   Frequency   Total Bits
a           000    10          30
e           001    15          45
l           010    12          36
s           011    3           9
t           100    4           12
sp          101    13          39
nl          110    1           3
Total                          174
o We can also make a coding tree out of it with values only in the leaf nodes,
called a trie:
[Trie diagram: leaves a, e, l, s, t, sp, nl]
o A left branching indicates a 0, and a right branching indicates a 1
...
o We can move the nl node up one:
[Trie diagram with nl moved up one level; leaves nl, a, e, l, s, t, sp]
o This suddenly allows nl to be expressed using just two bits: 11
...
Huffman coding:
o This is used to efficiently find a good prefix tree to do compression with
...
o The initial stage is a forest of C single-node trees, one for each of the C characters
...
o Taking the first example:
[Forest of single-node trees weighted by frequency: a = 10, e = 15, l = 12, s = 3, t = 4, sp = 13, nl = 1]
o Now, the idea is to create subtrees out of the lightest-weighted trees (in this
case, s and nl, being weighted as 3 and 1 respectively)
...
o The idea is that the higher the level in the forest, the higher the frequency, so
the more frequent characters need less bits to express
...
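The greedy construction above, repeatedly merging the two lightest trees, is Huffman's algorithm; it can be sketched on the frequencies from the statistics table (variable names are my own):

```python
import heapq
from itertools import count

freqs = {'a': 10, 'e': 15, 'l': 12, 's': 3, 't': 4, 'sp': 13, 'nl': 1}

tie = count()  # tie-breaker so tuples never compare on the tree payload
heap = [(f, next(tie), ch) for ch, f in freqs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    w1, _, a = heapq.heappop(heap)   # the two lightest trees...
    w2, _, b = heapq.heappop(heap)
    heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))  # ...become one subtree
tree = heap[0][2]

def codes(node, prefix=""):
    # A left branch is a 0 and a right branch is a 1, as above
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    out = codes(left, prefix + "0")
    out.update(codes(right, prefix + "1"))
    return out

total_bits = sum(freqs[ch] * len(code) for ch, code in codes(tree).items())
print(total_bits)  # 146, down from 174 with the fixed three-bit codes
```

Whichever way ties are broken, the merge order always yields an optimal prefix code, so the 146-bit total is stable.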
o Disadvantages:
The above compression algorithm requires you to be aware of the
frequencies of all of the different characters in the file
...
Doesn’t work as well when the different characters have similar frequencies
...
o The prefix tree is rearranged as data is received, but this makes the
algorithm sensitive to transmission errors
...
Lempel-Ziv-Welch:
o Lempel-Ziv-Welch is a lossless data compression algorithm
...
o It allocates additional bits to each character for a dictionary, which stores
sequences of characters
...
If a pair of two characters is not in the dictionary, add it to the
dictionary with its own unique code, and move on
...
If it isn’t, then add it to the
dictionary and move on
...
Lempel-Ziv-Welch example:
o We need to compress the string “thisisthe”, using ASCII representation (8
bits, for 0 … 255 different character codes) plus an extra bit for dictionary
entries (meaning 512 different codes overall)
...
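The compression loop described above, applied to “thisisthe”, can be sketched as follows (the function name is my own):

```python
def lzw_compress(text):
    table = {chr(i): i for i in range(256)}  # 8-bit ASCII seeds the dictionary
    next_code = 256                          # new sequences use the extra (9th) bit
    w, out = "", []
    for c in text:
        if w + c in table:
            w = w + c                        # keep extending the longest known match
        else:
            out.append(table[w])             # emit the code for the match so far
            table[w + c] = next_code         # record the new sequence
            next_code += 1
            w = c
    if w:
        out.append(table[w])
    return out

print(lzw_compress("thisisthe"))  # [116, 104, 105, 115, 258, 256, 101]
```

Seven 9-bit codes (63 bits) replace nine 8-bit characters (72 bits): the repeated “is” and “th” patterns are what make the saving possible.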
o While the compression algorithm worked well for “thisisthe”, it would not
perform as effectively on other strings with not as many patterns in the
characters
...
Lempel-Ziv-Welch decompression:
o Decompression is the same process as compression, except reversed
...
If it were
kept, then the filesize would possibly be even larger than the original
uncompressed data
...
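Decompression rebuilds the same dictionary from the code stream itself, which is why the dictionary never needs to be stored. A sketch (names are my own):

```python
def lzw_decompress(codes):
    table = {i: chr(i) for i in range(256)}  # rebuilt from scratch, never stored
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            entry = prev + prev[0]           # the only code that can arrive one step early
        out.append(entry)
        table[next_code] = prev + entry[0]   # mirror the compressor's dictionary growth
        next_code += 1
        prev = entry
    return "".join(out)

print(lzw_decompress([116, 104, 105, 115, 258, 256, 101]))  # thisisthe
```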
Skip lists:
o Skip lists are an efficient way to store a set of items which allows for quick
searching, inserting and deleting
...
o Skip lists vs linked lists:
Skip lists are based on linked lists, but unlike linked lists, skip lists
support random access
...
Skip lists are assumed to be sorted (this is taken care of during
insertion)
...
They can be compared to a subway which has a local line and an
express line:
As one can see in the above diagram, if we wanted to access “59”,
we’d need to travel from 14, to 34, to 42 using the express line, and
then take the local line to travel from 42 to 50 to 59, since if we’d
stayed on the express line we would’ve ended up at location 72,
which is too far ahead
...
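The subway search above can be sketched with two plain sorted lists; the stations 23 and 66 below are filler values of my own, the rest come from the example:

```python
local = [14, 23, 34, 42, 50, 59, 66, 72]   # the bottom ("local") line
express = [14, 34, 42, 72]                 # the top ("express") line

def search(target):
    # Ride the express line while the next stop doesn't overshoot the target
    i = 0
    while i + 1 < len(express) and express[i + 1] <= target:
        i += 1
    # Drop down to the local line at that station and walk forward
    j = local.index(express[i])
    while j < len(local) and local[j] < target:
        j += 1
    return j < len(local) and local[j] == target

print(search(59))  # True: express 14 -> 34 -> 42, then local 50 -> 59
```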
o The higher the level, the fewer elements there are (i.e. the more it “skips”)
...
o How do we decide which elements are in the higher list levels?
If this were a subway, then the higher lists (“express” routes) would
contain the most popular stops
...
Skip list searching:
o Initialisation: Let L = topmost list
...
o If we reach an item larger than the one we’re looking for (i.e. “we got too
far”), then:
If this is the bottom of the list, then the item isn’t found
...
o Restart from the second step
...
Going down to |L2| and going through |L2| / |L1| stops
...
o We want to minimise this value to optimise the searching
...
The formula will be optimised when
|L1| + |L2| / |L1|
is smallest
...
This is optimised when |L1| = √|L2|
o For example: if we have 150 items in L2 and √150 elements in L1, the cost is
√150 + 150/√150 = 2√150 ≈ 24.5
...
o This gives us a time complexity of Θ(√n) (worst case)
...
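The arithmetic above can be checked directly (a throwaway sketch):

```python
import math

# Worst-case stops for a two-level skip list search: |L1| + |L2| / |L1|
n = 150                      # items in the bottom list L2
x = math.sqrt(n)             # optimal size of the express list L1
print(round(x + n / x, 1))   # 24.5
```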
...
o The optimal number of lists is lg(n), which gives logarithmic time
complexity if you work it out
...
However, insertions and deletions can tamper with
the skip list, essentially “corrupting” it
...
Skip list insertion:
o Start by running a search to find where the item should fit in the bottom list
...
o This, however, “unidealises” the skip list
...
o Skip lists take a very simple approach to this
...
...
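The standard simple approach, which the proof below also relies on, is to flip a fair coin for each inserted item and promote it up one level per consecutive head. A sketch (the function name is my own):

```python
import random

def random_level(max_level):
    # Promote the inserted item up one level for every consecutive "heads"
    level = 0
    while level < max_level and random.random() < 0.5:
        level += 1
    return level

random.seed(1)
print(random_level(8))
```

On average half the items reach level 1, a quarter reach level 2, and so on, which keeps the list close to the ideal shape without any rebalancing.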
Skip list deletion:
o Deletion is trivial
...
Skip list search time complexity:
o The time complexity for a search in an n-element skip list is O(lg(n)) WHP (with high probability)
...
Proof:
o Lemma:
WHP, an n-element skip list has O(lg(n)) levels
...
If the current node was promoted up, then we move up, because
that’s where we would’ve come from
...
Keep on repeating until reaching -∞
...
o Number of “up” moves
< number of levels
≤ c(lg(n)), WHP (from lemma)
o We need to prove that, WHP, the number of moves is at most the number of
times we need to flip a coin to get c(lg(n)) heads
...
...
o Appended to each link from one node to another there should be a “width”
value, which is the number of bottom-list nodes it skips:
What is dynamic programming?
o Dynamic programming is an approach to algorithm design that involves
breaking a given problem into various sub-problems
...
o Take an example of the Longest Common Subsequence (LCS) problem:
The LCS problem deals with finding the longest common subsequence
in a set of sequences with respect to the order in which each item is
presented
...
o A naïve approach would be to create a brute-force algorithm that checks
every possible subsequence in X and sees if it’s inside the set of all the
possible subsequences in Y
...
...
o Then, if the row’s character is the same as the column’s character, take the
top left diagonal result and increment it
...
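The table-filling rule above (equal characters extend the top-left diagonal, otherwise take the best neighbouring result) can be sketched as follows; the test strings are illustrative:

```python
def lcs_length(x, y):
    # dp[i][j] = length of the LCS of x[:i] and y[:j]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # take the diagonal and increment
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA"
```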
Memoisation:
o Memoisation is a technique used in dynamic programming where “memos”
are kept
...
With a normal recursive approach, the Fibonacci function will
calculate the sum of the number before the current one in the
sequence, and the number before THAT number in the sequence (fib(i
- 1) + fib(i - 2)), and it will keep doing this, even when it had already
calculated these before
...
If it is, then just use that result
...
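The Fibonacci example above can be memoised with the standard library's functools.lru_cache, which keeps exactly the "memos" described:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # memo: results of earlier calls are reused
def fib(i):
    if i < 2:
        return i
    return fib(i - 1) + fib(i - 2)

print(fib(50))  # 12586269025, in linear rather than exponential time
```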
Graphs revision:
o A graph is a data structure with a set of nodes (vertices) and edges
connecting these vertices
...
It is written as (a, b) and is different from (b, a), i.e. a is known as the
first item and b is known as the second
...
o What is a set?
A set is an unordered collection of elements, in which order and repetition do not matter
...
Example:
{a, b} = {b, a} = {b, a, a, b, b}
o What is a digraph?
A digraph, short for directed graph, is an ordered pair G = (V, E),
where:
V is a finite, non-empty set of vertices
...
E is a finite set of ordered pairs called edges
...
E ⊆ V × V
An edge can also be written as a → b
...
a is the tail of the arc and b is the head
...
Since this is a directed graph, the order is important, so the ordered
pair of vertices (a, c) is distinct from the ordered pair (c, a)
...
o Example of a digraph:
G = (V, E), where:
V = {a, b, c, d}
E = {(a, b), (a, c), (b, c), (c, a), (c, d), (d, d)}
[Digraph diagram: vertices a, b, c, d with the edges listed above]
o If there is an edge a → b, then b is said to be adjacent to a, and the edge a →
b is said to emanate from a and be incident to b
...
|A(v)| is called the out-degree of a vertex and |I(v)| is called the in-degree of a vertex
...
It is simple if no vertices are visited more than once
...
It is open if the first vertex isn’t the same as the last vertex
...
A path is a simple walk
...
However, no vertex can be repeated
...
It can visit the same vertex twice, but not the same edge
...
This means that no vertex can be repeated EXCEPT the start
and end vertices
...
Example: {v, v}
...
A tree is a connected, simple, acyclic graph
...
Connected:
A connected graph is a graph with a path between every pair
of vertices
...
Planar:
A planar graph is a graph which can be drawn on a plane
without any edges intersecting
...
Spanning tree:
A spanning tree is a tree that connects all the vertices in a
graph but uses only some edges
...
This makes the
graph weighted
...
o Adjacency matrix:
An adjacency matrix is a 2D array used to store the edges of the
graph
...
Each value in the matrix corresponds to the weight of an edge
...
If there are no weights, then the adjacency matrix simply stores
boolean values
...
o Adjacency list:
An adjacency list is a list of linked lists
...
o An adjacency matrix is better at finding out if there is an edge between two
given vertices, whereas an adjacency list is better at listing all the vertices
adjacent to a given vertex
...
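Both representations can be sketched on the digraph example above, showing the trade-off just described:

```python
# The digraph from the earlier example
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "d")]
index = {v: i for i, v in enumerate(vertices)}

# Adjacency matrix: O(1) edge lookup, O(|V|^2) space
matrix = [[False] * len(vertices) for _ in vertices]
for u, v in edges:
    matrix[index[u]][index[v]] = True

# Adjacency list: lists neighbours in O(out-degree), O(|V| + |E|) space
adj = {v: [] for v in vertices}
for u, v in edges:
    adj[u].append(v)

print(matrix[index["a"]][index["c"]])  # True: the edge a -> c exists
print(adj["c"])                        # ['a', 'd']
```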
o Colloquially:
Consider a party
...
People
can’t shake their own hands and if Person A shakes Person B’s hand,
then it is said that Person B also shook Person A’s hand
...
o With graphs, consider the people to be vertices and the handshakes to be
undirected edges
...
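The party analogy above is the handshaking lemma: each handshake (edge) adds one to the degree of both people involved, so the degree total is always twice the number of edges. A sketch with a hypothetical guest list:

```python
# Hypothetical party: each pair is one handshake (an undirected edge)
handshakes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
degree = {}
for u, v in handshakes:
    degree[u] = degree.get(u, 0) + 1   # each handshake counts once for
    degree[v] = degree.get(v, 0) + 1   # each of the two people involved

print(sum(degree.values()) == 2 * len(handshakes))  # True
```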