DSA Fundamentals for Coding Excellence | More Info | Notesale | Buy and Sell Study Notes Online | Extra Student Income | University Notes

Search for notes by fellow students, in your own course and all over the country.

Browse our notes for titles which look like what you need, you can preview any of the notes via a sample of the contents. After you're happy these are the notes you're after simply pop them into your shopping cart.

My Basket

Buy These Notes

Network Theory PART 1 -BASICS£0.75

resonance circuits in ac£2.50

Internet of things£12.50

Study notes on system response to inputs.£2.00

FULL LAB RECORDS£0.50

Total£18.25

Checkout

Or: Edit My Basket

Title: DSA Fundamentals for Coding Excellence
Description: "This comprehensive set of notes on Data Structures and Algorithms (DSA) covers key concepts, techniques, and best practices for efficient problem-solving in computer science. It includes clear explanations of fundamental data structures such as arrays, linked lists, trees, and graphs, along with detailed walkthroughs of essential algorithms like sorting, searching, and dynamic programming. Designed to help students and professionals master DSA for interviews, coding challenges, and real-world applications, these notes also provide practical tips and code examples to enhance understanding."

Buy These Notes Preview

Document Preview

Extracts from the notes are below, to see the PDF you'll receive please use the links above

COMPLETE DATA STRUCTURE AND ALGORITHMS

Contents
1 Introduction

1

1
...

1
...
1

1
...
1

Big Oh notation
...
2
...

3

1
...
3

Object oriented concepts
...
3 Pseudocode
...
4 Tips for working through the examples
...
5 Book outline
1
...

...
7 Where can I get the code?
...
8 Final messages

7

I

...
1 Singly Linked List

...
1
...

10

2
...
2

Searching
...
1
...

11

2
...
4

Traversing the list

...
1
...

13

2
...
13
2
...
1

Insertion
...
2
...

15

2
...
3

Reverse Traversal
...
3 Summary
...
1 Insertion
...
2 Searching
...
3 Deletion
...
4 Finding the parent of a given node
...
5 Attaining a reference to a node
...
6 Finding the smallest and largest values in the binary search tree
3
...
26
3
...
1

Preorder
...
7
...

26

3
...
3

Inorder
...
7
...

30

3
...

31

4 Heap

32

4
...

33

4
...

37

4
...

38

4
...

41

4
...

42

5 Sets 44
5
...

46

5
...
1

46

Insertion
...
2 Ordered
...
3 Summary
...
1 A standard queue
...
2 Priority Queue
...
3 Double Ended Queue
...
4 Summary
...
1 Tree Rotations

...
2 Tree Rebalancing
...
3 Insertion
...
4 Deletion
...
5 Summary
...
1 Bubble Sort
...
2 Merge Sort
...
3 Quick Sort
...
4 Insertion Sort
...
5 Shell Sort
...
6 Radix Sort
...
7 Summary
...
1 Primality Test

...
2 Base conversions
...
3 Attaining the greatest common denominator of two numbers
...
4 Computing the maximum value for a number of a specific base
consisting of N digits
...
5 Factorial of a number

...
6 Summary
...
1 Sequential Search
...
2 Probability Search
...
3 Summary
...
1 Reversing the order of words in a sentence
...
2 Detecting a palindrome

...
3 Counting the number of words in a string
...
4 Determining the number of repeated words within a string
...
5 Determining the first matching character between two strings
...
6 Summary
...
1 Iterative algorithms
...
2 Recursive Algorithms
...
3 Summary
...
1 Summary
...
Iterative Solutions

92

93

C
...
94
C
...
95
C
...

D Testing

95

97

D
...

D
...

98

D
...

99

D
...

D
...

99

D
...
100
D
...
100
E Symbol Definitions 101
VI

Chapter 1

Introduction
1
...
It is not a definitive book on the theory of
data structures and algorithms
...

You should use this book alongside another on the same subject, but one that
contains formal proofs of the algorithms in question
...

1
...
We
assume that the reader is familiar with the following:

1
...
An imperative programming language
3
...
2
...
We have chosen to use big Oh notation for
a few reasons, the most important of which is that it provides an abstract
measurement by which we can judge the performance of algorithms without
using mathematical proofs
...
INTRODUCTION

2

Figure 1
...
1 shows some of the run times to demonstrate how important it is to
choose an efficient algorithm
...
Cubic and exponential algorithms should
only ever be used for very small problems (if ever!); avoid them if feasibly
possible
...
g
...

O(n) linear: the run time complexity is proportionate to the size of n
...
g
...

O(n log n) just n log n: usually associated with an algorithm that breaks the problem into
smaller chunks per each invocation, and then takes the results of these
smaller chunks and stitches them back together, e
...
quick sort
...
g
...

O(n3) cubic: very rare
...

If you encounter either of the latter two items (cubic and exponential) this is
really a signal for you to review the design of your algorithm
...
We would strongly advise that you always review
your algorithm design and optimise where possible—particularly loops and

CHAPTER 1
...

The biggest asset that big Oh notation gives us is that it allows us to essentially
discard things like hardware
...
This applies even if the former is ran on a machine that is far faster
than the latter
...
An algorithm with a quadratic run time grows faster than one
with a logarithmic run time
...

Big Oh notation also acts as a communication tool
...
You are
discussing prototype algorithms for node discovery in massive networks
...
Does this give you a good idea of how fast each
respective algorithm is? No
...
Replay the scene
back in your head, but this time as well as talking about algorithm design each
respective developer states the asymptotic run time of their algorithm
...

Some readers may actually work in a product group where they are given
budgets per feature
...
If you save some time in one feature it doesn’t necessarily
give you a buffer for the remaining features
...

Everything is great until your boss comes in and tells you that the start up time
should not exceed n ms
...
Even if you don’t
have these budgets you should still strive for optimal solutions
...

1
...
2

Imperative programming language

All examples are given in a pseudo-imperative coding format and so the reader
must know the basics of some imperative mainstream programming language to
port the examples effectively, we have written this book with the following target
languages in mind:
1
...
C#
3
...
INTRODUCTION

4

The reason that we are explicit in this requirement is simple—all our
implementations are based on an imperative thinking style
...

Two of the languages that we have listed (C# and Java) target virtual machines
which provide various things like security sand boxing, and memory management
via garbage collection algorithms
...
When porting to C++ you must remember to use pointers for certain
things
...
In C++
you should interpret the reference as a pointer to the next node and so on
...

It is essential that the user is familiar with primitive imperative language
constructs before reading this book otherwise you will just get lost
...
2
...
In particular, we never provide data structures or algorithms that work
on generic types—this is in order to make the samples as easy to follow as
possible
...
Inheritance
2
...
Polymorphism
This is especially important if you are planning on looking at the C# target that
we have implemented (more on that in §1
...
As a final note it is also desirable that the reader is familiar
with interfaces as the C# target uses interfaces throughout the sorting algorithms
...
3

Pseudocode

Throughout this book we use pseudocode to describe our solutions
...
Pre-conditions should always be enforced
2
...
INTRODUCTION

5

3
...
All primitive language constructs are explicitly begun and ended
If an algorithm has a return type it will often be presented in the postcondition,
but where the return type is sufficiently obvious it may be omitted for the sake of
brevity
...
Additionally, the name of the
parameter usually acts as the biggest clue to its type
...
g
...

The last major point of reference is that we always explicitly end a language
construct
...
While implicit scope closure works well in simple code, in complex
cases it can lead to ambiguity
...

All algorithms start with a simple algorithm signature, e
...

1) algorithm AlgorithmName(arg1, arg2,
...

n) end AlgorithmName
Immediately after the algorithm signature we list any Pre or Post conditions
...

n) end AlgorithmName
The example above describes an algorithm by the name of AlgorithmName,
which takes a single numeric parameter n
...

Normally what is listed as a pre-conidition is critical to the algorithms
operation
...
The post-condition mainly
describes the effect of the algorithms operation
...
For example, in
the C# target we have implemented, we consider non-conformance to
preconditions to be exceptional cases
...

CHAPTER 1
...
4

6

Tips for working through the examples

As with most books you get out what you put in and so we recommend that in
order to get the most out of this book you work through each algorithm with a
pen and paper to track things like variable names, recursive calls etc
...
This
will help you keep track of and visualise the mutations that are occurring
throughout the algorithm
...
We suggest you put
everything on paper irrespective of how trivial some variables and calculations
may be so that you always have a point of reference
...
This approach is a far cleaner way than drawing out an elaborate map of
function calls with arrows to one another, which gets large quickly and simply
makes things more complex to follow
...

1
...

The reader doesn’t have to read the book sequentially from beginning to end:
chapters can be read independently from one another
...

Each of the chapters on data structures present initially the algorithms
concerned with:
1
...
Deletion
3
...

For all readers we recommend that before looking at any algorithm you
quickly look at Appendix E which contains a table listing the various symbols used
within our algorithms and their meaning
...
You can think of yield in the same light as return
...
INTRODUCTION

7

returns each value to the caller
...

1
...
We
then transcribe these tests into unit tests satisfying them one by one
...

For the most part algorithms have fairly obvious cases which need to be
satisfied
...
With such algorithms we will point out the test cases which are tricky and
the corresponding portions of pseudocode within the algorithm that satisfy that
respective case
...
This in some cases will yield an overwhelming list of concerns
which will hinder your ability to design an algorithm greatly
...
Solving the smaller problems
and then composing them is a far easier task than clouding your mind with too
many little details
...
Because unit tests contribute such a core piece
of creating somewhat more stable software we invite the reader to view Appendix
D which describes testing in more depth
...
7

Where can I get the code?

This book doesn’t provide any code specifically aligned with it, however we do
actively maintain an open source project 1 that houses a C# implementation of all
the pseudocode listed
...
com/dsa
...
8

Final messages

We have just a few final messages to the reader that we hope you digest before
you embark on reading this book:
1
...

1

CHAPTER 1
...
Always work through the algorithms on paper to understand how they
achieve their outcome
If you always follow these key points, you will get the most out of this book
...
LINKED LISTS

Data Structure

10

Linked Lists
Linked lists can be thought of from a high level perspective as being a series of
nodes
...

In DSA our implementations of linked lists always maintain head and tail
pointers so that insertion at either the head or tail of the list is a constant time
operation
...

As such, linked lists in DSA have the following characteristics:
1
...
Deletion is O(n)
3
...
In DSA
we chose to always maintain pointers (or more aptly references) to the node(s) at
the head and tail of the linked list and so performing a traditional insertion to
either the front or back of the linked list is an O(1) operation
...
When the node we are inserting before is somewhere in the
middle of the linked list (known as random insertion) the complexity is O(n)
...
This traversal yields an O(n) run time
...
the list is dynamically resized, thus it incurs no copy penalty like an array or
vector would eventually incur; and
2
...

2
...
Each node that makes up a singly linked list consists of a value, and a
reference to the next node (if any) in the list
...
1: Singly linked list node

CHAPTER 2
...
2: A singly linked list populated with integers

2
...
1

Insertion

In general when people talk about insertion with respect to linked lists of any form
they implicitly refer to the adding of a node to the tail of the list
...

Adding a node to a singly linked list has only two cases:
1
...
we simply need to append our node onto the end of the list updating the tail
reference appropriately
...
Next ← n
tail ← n
end if
end Add

As an example of the previous algorithm consider adding the following
sequence of integers to the list: 1, 45, 60, and 12, the resulting list is that of Figure
2
...

2
...
2

Searching

Searching a linked list is straightforward: we simply traverse the list checking the
value we are looking for with the value of each node in the linked list
...
1
...

CHAPTER 2
...
Value 6= value
7)
n ← n
...
1
...
the list is empty; or
2
...
we are removing the head node; or
4
...
the node to remove is somewhere in between the head and tail; or 6
...
If you know
that items will only ever be removed from the head or tail of the list then you can
create much more concise algorithms
...

Remove(head, value) is the
head node in the list
value is the value to remove from the list
4)
Post: value is removed from the list, true;
otherwise false
5)
if head = ∅ 6) // case 1
7)
return false
8)
end if
9)
n ← head
10)
if n
...
LINKED LISTS

14

1) algorithm
2)
Pre: head
3)
11)
if head = tail
12)
// case 2
13)
head ← ∅
14)
tail ← ∅
15)
else
16)
// case 3
17)
head ← head
...
Next =6
∅ and n
...
Value 6= value
22)
n ← n
...
Next 6= ∅
25)
if n
...
Next ← n
...
Next
31)
return true
32)
end if
33)
// case 6
34)
return false
35)
end Remove

2
...
4

Traversing the list

Traversing a singly linked list is the same as that of traversing a doubly linked list
(defined in §2
...
You start at the head of the list and continue until you come
across a node that is ∅
...
node = ∅, we have exhausted all nodes in the linked list; or
2
...
Next
...

Traverse(head) is the head
node in the list
Post: the items in the list have been traversed
4)
n ← head
5)
while n 6= 0

CHAPTER 2
...
Value
7)
n ← n
...
1
...
e
...
1
...
However, what if we wanted to traverse the nodes in the
linked list in reverse order for some reason? The algorithm to perform such a
traversal is very simple, and just like demonstrated in §2
...
3 we will need to
acquire a reference to the predecessor of a node, even though the fundamental
characteristics of the nodes that make up a singly linked list make this an
expensive operation
...

Figure 2
...

1)
algorithm ReverseTraversal(head, tail)
2)
Pre: head and tail belong to the same list
3)
Post: the items in the list have been traversed in reverse order
4)
if tail 6= ∅
5)
curr ← tail
6)
while curr 6= head
7)
prev ← head
8)
while prev
...
Next
10)
end while
11)
yield curr
...
Value 15)
end if
16) end ReverseTraversal
This algorithm is only of real interest when we are using singly linked lists, as
you will soon see that doubly linked lists (defined in §2
...
2
...

2
...
The only difference is that
each node has a reference to both the next and previous nodes in the list
...
LINKED LISTS

16

Figure 2
...
4: Doubly linked list node
The following algorithms for the doubly linked list are exactly the same as
those listed previously for the singly linked list:

CHAPTER 2
...
Searching (defined in §2
...
2)
2
...
1
...
2
...
1
...

1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)

algorithm Add(value)
Pre: value is the value to add to the list
Post: value has been placed at the tail of the list
n ← node(value)
if head = ∅
head ← n
tail ← n
else
n
...
Next ← n
tail ← n
end if
end Add

Figure 2
...
1
...

Figure 2
...
2
...
1
...
Like insertion we have the added task
of binding an additional reference (Previous) to the correct value
...
LINKED LISTS
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)
20)
21)
22)
23)
24)
25)
26)
27)
28)
29)
30)
31)
32)

2
...
3

18

if value = head
...
Next
head
...
Next
while n 6= ∅ and value 6= n
...
Next
end while
if n = tail
tail ← tail
...
Next ← ∅
return true
else if n 6= ∅
n
...
Next ← n
...
Next
...
Previous
return true
end if
return false
end Remove

Reverse Traversal

Singly linked lists have a forward only design, which is why the reverse traversal
algorithm defined in §2
...
5 required some creative invention
...
1
...

Figure 2
...

Figure 2
...
LINKED LISTS
2)
3)
4)
5)
6)
7)
8)
9)

2
...
Value
n ← n
...

Using a data structure like an array would require you to specify the size up front;
exceeding that size involves invoking a resizing algorithm which has a linear run
time
...
This requires
maintaining pointers to the nodes at the head and tail of the list but the memory
overhead will pay for itself if this is an operation you will be performing many
times
...
At the expense of a little memory (in most cases 4 bytes
would suffice), and a few more read/writes you could maintain a count variable
that tracks how many items are contained in the list so that accessing such a
primitive property is a constant operation - you just need to update count during
the insertion and deletion algorithms
...
In general doubly linked lists are more accommodating for non-trivial
operations on a linked list
...
For the most cases this requirement is present
...

Sometimes you will have to backtrack in order to create the correct parse tree
...
LINKED LISTS
list
...
We start with a root
node with value x, where the left subtree of x contains nodes with values < x and
the right subtree contains nodes whose values are ≥ x
...

BSTs are of interest because they have operations which are favourably fast:
insertion, look up, and deletion can all be done in O(log n) time
...

In the following examples you can assume, unless used as a parameter alias
that root is a reference to the root node of the tree
...
1: Simple unbalanced binary search tree

19

3
...

1)
2)
3)

algorithm Insert(value)
Pre: value has passed custom type checks for type T
Post: value has been placed in the correct location in the tree

CHAPTER 3
...
Value
if current
...
Left ← node(value)
else
InsertNode(current
...
Right = ∅
current
...
Right, value)
end if
end if
end InsertNode

The insertion algorithm is split for a good reason
...
If
the tree is empty then we simply create our root node and finish
...
Note that at each stage we perform a
binary chop: we either choose to recurse into the left subtree or the right by
comparing the new value with that of the current node
...

CHAPTER 3
...
2

23

Searching

Searching a BST is even simpler than insertion
...

We have talked previously about insertion, we go either left or right with the
right subtree containing values that are ≥ x where x is the value of the node we are
inserting
...
the root = ∅ in which case value is not in the BST; or
2
...
Value = value in which case value is in the BST; or
3
...
Value, we must inspect the left subtree of root for value; or
4
...
Value, we must inspect the right subtree of root for value
...
Value = value
return true
else if value < root
...
Left, value)
else
return Contains(root
...
3

Deletion

Removing a node from a BST is fairly straightforward, with four cases to consider:
1
...
the value to remove has a right subtree, but no left subtree; or
3
...
the value to remove has both a left and right subtree in which case we promote the
largest value in the left subtree
...
This case is already covered by the first, but should be noted as a
possibility nonetheless
...
In such a case the first
occurrence of that value in the BST will be removed
...
BINARY SEARCH TREE

24

Figure 3
...
4 and §3
...

1)
algorithm Remove(value)
2)
Pre: value is the value of the node to remove, root is the root node of the BST
3)
Count is the number of items in the BST
3)
Post: node with value is removed if found in which case yields true, otherwise
false
4)
nodeToRemove ← FindNode(value)
5)
if nodeToRemove = ∅
6)
return false // value not in BST
7)
end if
8)
parent ← FindParent(value)
9)
if Count = 1
10)
root ← ∅ // we are removing the only node in the BST
11)
else if nodeToRemove
...
Right = null
12)
// case #1
13)
if nodeToRemove
...
Value
14)
parent
...
Right ← ∅
17)
end if
18)
else if nodeToRemove
...
Right 6= ∅
19)
// case # 2
20)
if nodeToRemove
...
Value
21)
parent
...
Right
22)
else
23)
parent
...
Right
24)
end if
25)
else if nodeToRemove
...
Right = ∅

CHAPTER 3
...
Value < parent
...
Left ← nodeToRemove
...
Right ← nodeToRemove
...
Left
while largestV alue
...
Right
end while
// set the parents’ Right pointer of largestV alue to ∅
FindParent(largestV alue
...
Right ← ∅
nodeToRemove
...
Value
end if
Count ← Count −1
return true
end Remove

3
...
We have found that such an algorithm
is very useful, especially when performing extensive tree transformations
...
Value
6)
return ∅
7)
end if
8)
if value < root
...
Left = ∅
10)
return ∅
11)
else if root
...
Value = value
12)
return root
13)
else
14)
return FindParent(value, root
...
Right = ∅
18)
return ∅
19)
else if root
...
Value = value
20)
return root
21)
else

CHAPTER 3
...
Right)
end if
end if
end FindParent

A special case in the above algorithm is when the specified value does not exist
in the BST, in which case we return ∅
...

3
...
4, but instead of returning a reference to the
parent of the node with the specified value, it returns a reference to the node itself
...

1)
algorithm FindNode(root, value)
2)
Pre: value is the value of the node we want to find the parent of
3)
root is the root node of the BST
4)
Post: a reference to the node of value if found; otherwise ∅
5)
if root = ∅
6)
return ∅
7)
end if
8)
if root
...
Value
11)
return FindNode(root
...
Right, value)
14)
end if
15)
end FindNode
Astute readers will have noticed that the FindNode algorithm is exactly the
same as the Contains algorithm (defined in §3
...
Given FindNode, the easiest way
of implementing Contains is to call FindNode and compare the return value with
∅
...
6 Finding the smallest and largest values in the
binary search tree
To find the smallest value in a BST you simply traverse the nodes in the left subtree
of the BST always going left upon each encounter with a node, terminating when
you find a node with no left subtree
...
BINARY SEARCH TREE

27

largest value in the BST
...

The base case in both FindMin, and FindMax algorithms is when the Left
(FindMin), or Right (FindMax) node references are ∅ in which case we have
reached the last node
...
7

algorithm FindMin(root)
Pre: root is the root node of the BST
root =6
∅
Post: the smallest value in the BST is located
if root
...
Value
end if
FindMin(root
...
Right = ∅
return root
...
Right) 9) end FindMax

Tree Traversals

There are various strategies which can be employed to traverse the items in a tree;
the choice of strategy depends on which node visitation order you require
...

3
...
1

Preorder

When using the preorder algorithm, you visit the root first, then traverse the left
subtree and finally traverse the right subtree
...
3
...
Value
Preorder(root
...
Right)
end if
end Preorder

CHAPTER 3
...
7
...
7
...
An example of postorder traversal
is shown in Figure 3
...

1)
2)
3)
4)
5)
6)
7)
8)
9)

algorithm Postorder(root)
Pre: root is the root node of the BST
Post: the nodes in the BST have been visited in postorder
if root 6= ∅
Postorder(root
...
Right)
yield root
...
3: Preorder visit binary search tree example

CHAPTER 3
...
4: Postorder visit binary search tree example

3
...
3

Inorder

Another variation of the algorithms defined in §3
...
1 and §3
...
2 is that of inorder
traversal where the value of the current node is yielded in between traversing the
left subtree and the right subtree
...
5
...
BINARY SEARCH TREE

30

Figure 3
...
Left)
yield root
...
Right)
end if
end Inorder

One of the beauties of inorder traversal is that values are yielded in their
comparison order
...

3
...
4

Breadth First

Traversing a tree in breadth first order yields the values of all nodes of a particular
depth in the tree before any deeper ones
...
An example of breadth
first traversal is shown in Figure 3
...

Traditionally breadth first traversal is implemented using a list (vector,
resizeable array, etc) to store the values of the nodes visited in breadth first order
and then a queue to store those nodes that have yet to be visited
...
BINARY SEARCH TREE

31

Figure 3
...
8

algorithm BreadthFirst(root)
Pre: root is the root node of the BST
Post: the nodes in the BST have been visited in breadth first order
q ← queue
while root =6
∅
yield root
...
Left 6= ∅
q
...
Left)
end if
if root
...
Enqueue(root
...
IsEmpty()
root ← q
...
With logarithmic
insertion, lookup, and deletion it is very effecient
...
Trees are recursive
data structures, so typically you will find that many algorithms that operate on a
tree are recursive
...
We
can only attain logarithmic run times for the algorithms presented earlier when

CHAPTER 3
...
A binary search tree does not enforce such a property, and the run
times for these operations on a pathologically unbalanced tree become linear:
such a tree is effectively just a linked list
...

Chapter 4

Heap
A heap can be thought of as a simple tree data structure, however a heap usually employs
one of two strategies:
1
...
max heap
Each strategy determines the properties of the tree and its values
...
For example, the node at the root of the tree will have the
smallest value in the tree
...
In this
book you should assume that a heap employs the min heap strategy unless
otherwise stated
...
The nodes are conceptually the same, however, having at most two
children
...
1 shows how the tree (not a heap data structure) (12 7(3 2) 6(9
)) would be represented as an array
...
1 is a result of simply
adding values in a top-to-bottom, left-to-right fashion
...
2 shows arrows to
the direct left and right child of each value in the array
...
3
shows a step by step process to represent a tree data structure as an array
...
3 you can assume that the default capacity of our array is eight
...
Often the run time behaviour of a program
can be unpredictable when it comes to the size of its internal data structures, so
we need to choose a more dynamic data structure that contains the following
properties:
1
...
the data structure encapsulates resizing algorithms to grow the array as required
at run time

32

CHAPTER 4
...
1: Array representation of a simple tree data structure

Figure 4
...
Vector
2
...
List
Figure 4
...
This varies from case to case; sometimes null values are prohibited entirely;
in other cases we may treat them as being smaller than any non-null value, or
indeed greater than any non-null value
...
For the sake of clarity we will avoid
the issue by prohibiting null values
...
The required expressions for this are
defined as follows for a node at index:
1
...
2 ∗ index + 1 (left child)
3
...
4 a) represents the calculation of the right child of 12 (2∗0+2); and b)
calculates the index of the parent of 3 ((3 − 1)/2)
...
1

Insertion

Designing an algorithm for heap insertion is simple, but we must ensure that heap
order is preserved after each insertion
...
Inserting a value into the next free slot in an array is simple: we just
need to keep track of the next free index in the array as a counter, and increment
it after each insertion
...
In the case of min-heap ordering
this requires us to swap the values of a parent and its child if the value of the child
is < the value of its parent
...

CHAPTER 4
...
3: Converting a tree data structure to its array counterpart

35

36

CHAPTER 4
...
4: Calculating node properties

The run time efficiency for heap insertion is O(log n)
...

Figure 4
...

CHAPTER 4
...
5: Inserting values into a min-heap 1) algorithm Add(value)
Pre: value is the value to add to the heap
Count is the number of items in the heap
Post: the value has been added to the heap
heap[Count] ← value
Count ← Count +1
MinHeapify() 8) end Add

1)
2)
3)

algorithm MinHeapify()
Pre: Count is the number of items in the heap
heap is the array used to store the heap items

CHAPTER 4
...

4
...
The
algorithm for deletion has three steps:
1
...
put the last value in the heap at the index location of the item to delete
3
...
HEAP

39

Figure 4
...
In Figure 4
...

Please note that in our deletion algorithm that we don’t default the removed
value in the heap array
...
e
...
This is important in
both unmanaged, and managed languages
...
If we were to
not null that hole then the object could still be reached and thus won’t be garbage
collected
...
3

Searching

Searching a heap is merely a matter of traversing the items in the heap array
sequentially, so this operation has a run time complexity of O(n)
...
7
...

2)
3)
4)

Figure 4
...
HEAP

5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)

40

Post: value is located in the heap, in which case true; otherwise false
i←0
while i < Count and heap[i] 6= value
i←i+1
end while
if i < Count
return true
else
return false
end if
end Contains

The problem with the previous algorithm is that we don’t take advantage of
the properties in which all values of a heap hold, that is the property of the heap
strategy being used
...
Factoring in what we know about the
heap we can optimise the search algorithm by including logic which makes use of
the properties presented by a certain heap strategy
...
As an example
consider a min-heap that doesn’t contain the value 5
...
If
this is the case then 5 cannot be in the heap and so we can provide an answer
without traversing the rest of the heap
...
The optimisation that we present can be very
common and so we feel that the extra logic within the loop is justified to prevent
the expensive worse case run time
...
To tailor the
algorithm for a max-heap the two comparison operations in the else if condition
within the inner while loop should be flipped
...
HEAP

16)
17)
18)
19)
20)
21)
22)
23)
24)
25)
26)

count ← count + 1
end if
start ← start + 1
end while
if count = nodes
return false
end if
nodes ← nodes ∗ 2
end while
return false
end Contains

The new Contains algorithm determines if the value is not in the heap by
checking whether count = nodes
...
As an example consider Figure 4
...
If we are
searching for the value 10 within the min-heap displayed it is obvious that we
don’t need to search the whole heap to determine 9 is not present
...

4
...
3 traversal of a heap is usually done like that of any other array
data structure which our heap implementation is based upon
...
You will note that in the search algorithm that we use Count as this upper
bound rather than the actual physical bound of the allocated array
...

Figure 4
...
HEAP

42

Figure 4
...
Potentially you will have at most
LengthOf(heapArray)−Count garbage values in the backing heap array data
structure
...
To make
things simple the garbage value of a reference type will be simple ∅ and 0 for a
value type
...
8 shows a heap that you can assume has been mutated many times
...
In Figure 4
...

From what you have read thus far you will most likely have picked up that
traversing the heap in any other order would be of little benefit
...
Heaps are not usually traversed in
any other way than the one prescribed previously
...
5

Summary

Heaps are most commonly used to implement priority queues (see §6
...
As discussed in both the
insertion §4
...
2 sections a heap maintains heap order according
to the selected ordering strategy
...
The former strategy enforces that the value of a parent node is less than
that of each of its children, the latter enforces that the value of the parent is greater
than that of each of its children
...
If the heap can be
configured otherwise, e
...
to use max-heap then this will often require you to state
this explicitly
...
The cost of such a policy is that upon each
insertion and deletion we invoke algorithms that have logarithmic run time
complexities
...
We will also have to factor in the cost of
dynamic array expansion at some stage
...
It may

CHAPTER 4
...

This will assist in minimising the impact of dynamic array resizing
...
The values within the set
are distinct from one another
...

This section does not cover set theory in depth; rather it demonstrates briefly
the ways in which the values of sets can be defined, and common operations that
may be performed upon them
...

Given the set A defined previously we can say that 4 is a member of A denoted
by 4 ∈ A, and that 99 is not a member of A denoted by 99 ∈/ A
...
A more concise way of
defining a set and its members is by providing a series of properties that the values
of the set must satisfy
...
x is an alias to the current value
we are inspecting and to the right hand side of | are the properties that x must
satisfy to be in the set A
...
You will be able to note from the previous
definition of the set A that the set can contain an infinite number of values, and
that the values of the set A will be all even integers that are a member of the
natural numbers set N, where N = {1,2,3,
...
The union set can be defined as follows A ∪ B = {x | x ∈ A or x ∈
B}, and intersection A ∩ B = {x | x ∈ A and x ∈ B}
...
1 demonstrates set
intersection and union graphically
...

Both set union and intersection are sometimes provided within the framework
associated with mainstream languages
...
NET 3
...
Linq
...
microsoft
...
microsoft
...
linq
...
aspx

45

CHAPTER 5
...
1: a) A ∩ B; b) A ∪ B
these algorithms
...
Linq
...

Set union can be implemented as a simple traversal of both sets adding each
item of the two sets to a new union set
...
Add(item)
end foreach
foreach item in set2
union
...
This runtime
applies only to sets that exhibit O(1) insertions
...
The only major thing worth
pointing out about our algorithm is that we traverse the set containing the fewest
items
...

46
1)
2)
3)
3)
4)
5)
6)
7)
8)
9)

algorithm Intersection(set1, set2)
Pre: set1, and set2 6= ∅
intersection, and smallerSet are sets
Post: An intersection of set1, and set2 has been created
if set1
...
Count
smallerSet ← set1
else
smallerSet ← set2
end if
foreach item in smallerSet

CHAPTER 5
...
Contains(item) and set2
...
Add(item)
end if
end foreach
return intersection
end Intersection

The run time of our Intersection algorithm is O(n) where n is the number of
items in the smaller of the two sets
...

5
...

For example the members of B = {6,2,9} conform to no ordering scheme because
it is not required
...

We will only look at insertion for an unordered set and cover briefly why a
hash table is an efficient data structure to use for its implementation
...
1
...
As mentioned previously we only add an item to a set if that item
is not already in the set, so the backing data structure we use must have a quick
look up and insertion run time complexity
...
O(1) for insertion
2
...

5
...

In DSA 0
...
From versions 0
...

The ordered set has its order realised by performing an inorder traversal upon
its backing tree data structure which yields the correct ordered sequence of set
members
...
SETS

47

Because an ordered set in DSA is simply a wrapper for an AVL tree that
additionally ensures that the tree contains unique items you should read §7 to
learn more about the run time complexities associated with its operations
...
3

Summary

Sets provide a way of having a collection of unique objects, either ordered or
unordered
...
As we discussed in §5
...
1 because we check first if
the item is already contained within the set before adding it we need this check to
be as quick as possible
...
Using a hash table this check results in a near constant run time
complexity
...

Another key property of sets implemented using the approach we describe is
that both have favourably fast look-up times
...
Ordered sets as
described in 3 perform a binary chop at each stage when searching for the
existence of an item yielding a logarithmic run time
...
For example in §11
...

Chapter 6

Queues
Queues are an essential data structure that are found in vast amounts of software
from user mode to kernel mode applications that are core to the system
...

A traditional queue only allows you to access the item at the front of the queue;
when you add an item to the queue that item is placed at the back of the queue
...
The following list
describes the operations performed upon the queue in Figure 6
...

Enqueue(10)

2
...

Enqueue(9)

4
...

Enqueue(3)

6
...

Peek()

48

1

This operation is sometimes referred to as Front

CHAPTER 6
...

Enqueue(33)

9
...
Dequeue()

6
...
In DSA we don’t
provide a standard queue because queues are so popular and such a core data
structure that you will find pretty much every mainstream library provides a
queue data structure that you can use with your language of choice
...

The main property of a queue is that we have access to the item at the front of
the queue
...
1)
...
The reason we have an O(1) run time complexity
for deletion is because we only ever remove items from the front of queues (with
the Dequeue operation)
...
The run time complexity for searching a queue remains the same as that of
a singly linked list: O(n)
...
2

Priority Queue

Unlike a standard queue where items are ordered in terms of who arrived first, a
priority queue determines the order of its items by using a form of custom
comparer to see which item has the highest priority
...

A sensible implementation of a priority queue is to use a heap data structure
(defined in §4)
...
A heap provides us with the
ability to construct a priority queue where the items with the highest priority are
either those with the smallest value, or those with the largest
...
3

Double Ended Queue

Unlike the queues we have talked about previously in this chapter a double ended
queue allows you to access the items at both the front, and back of the queue
...

A deque applies no prioritization strategy to its items like a priority queue
does, items are added in order to either the front of back of the deque
...

CHAPTER 6
...
1: Queue mutations Deque’s provide front and
back specific versions of common queue operations, e
...

you may want to enqueue an item to the front of the
queue rather than the back in which case you would use

51

CHAPTER 6
...

The following list identifies operations that are
commonly supported by deque’s:
• EnqueueFront
• EnqueueBack
• DequeueFront
• DequeueBack
• PeekFront
• PeekBack
Figure 6
...
EnqueueBack(12)
2
...
EnqueueBack(23)
4
...
DequeueFront()
6
...
In some cases the set of algorithms that add
an item to the back of the deque may be named as they are with normal queues,
e
...
EnqueueBack may simply be called Enqueue an so on
...
This is certainly
the case in
...
In such a scenario you can
safely assume that the Add method will simply enqueue an item to the back of the
deque
...
That is enqueueing an item to the back of a the queue is O(1),
additionally enqueuing an item to the front of the queue is also an O(1) operation
...
Using an array as the backing data structure would require the
programmer to be explicit about the size of the array up front, this would provide
an obvious advantage if the programmer could deterministically state the
maximum number of items the deque would contain at any one time
...
Such an approach would also leave the library
developer

CHAPTER 6
...
2: Deque data structure after several mutations to look at
array minimization techniques as well, it could be that after several
invocations of the resizing algorithm and various mutations on the
deque later that we have an array taking up a considerable amount of
memory yet we are only using a few small percentage of that memory
...

To bypass all the aforementioned issues a deque typically uses a doubly linked
list as its baking data structure
...

With a language that targets a garbage collected virtual machine memory
reclamation is an opaque process as the nodes that are no longer referenced
become unreachable and are thus marked for collection upon the next invocation
of the garbage collection algorithm
...

CHAPTER 6
...
4

53

Summary

With normal queues we have seen that those who arrive first are dealt with first;
that is they are dealt with in a first-in-first-out (FIFO) order
...
Normal queues have constant insertion and
deletion run times
...
Despite that, searching is usually
exposed on queues and typically the run time is linear
...
One
implementation of a priority queue is to use a heap data structure as its backing
store, so the run times for insertion, deletion, and searching are the same as those
for a heap (defined in §4)
...
For example the breadth first search
defined in §3
...
4 makes extensive use of queues
...
M
...
M
...

An AVL tree is a binary search tree (BST, defined in §3) with a self-balancing
condition stating that the difference between the height of the left and right
subtrees cannot be no more than one, see Figure 7
...
This condition, restored after
each tree modification, forces the general shape of an AVL tree
...
Consider a binary search tree obtained
by starting with an empty tree and inserting some values in the following order
1,2,3,4,5
...
2 represents the worst case scenario in which the running
time of all common operations such as search, insertion and deletion are O(n)
...
The height of an AVL tree with n nodes is O(log n)
regardless of the order in which values are inserted
...
This is combined with a
technique that efficiently restores the balance condition for the tree
...

h

h+1

Figure 7
...
AVL TREE

Figure 7
...
3: Avl trees, insertion order: -a)1,2,3,4,5 -b)1,5,4,3,2

7
...
There are left and
right rotations both of them decrease the height of a BST by moving smaller
subtrees down and larger subtrees up
...
AVL TREE

Figure 7
...
AVL TREE

57

1) algorithm
2)
Pre:
3)
Post:

4)
5)
6)
7)
8)

LeftRotation(node) node
...
Right is the new root of the subtree,
node has become node
...
Right
node
...
Left
RightNode
...
Left ! = ∅
Post: node
...
Left’s right child and,
BST properties are preserved
LeftNode ← node
...
Left ← LeftNode
...
Right ← node
end RightRotation

The right and left rotation algorithms are symmetric
...

7
...
If this property is not present then we
perform the correct rotation
...
These
algorithms are named LeftAndRightRotation, and RightAndLeftRotation
...
g
...

CheckBalance(current) current is the node
to start from balancing

CHAPTER 7
...
Left = ∅ and current
...
Height = -1;
7)
else
8)
current
...
Left),Height(current
...
Left) - Height(current
...
Left
...
Left
...
Left) - Height(current
...
Right
...
Right
...
3

Insertion

AVL insertion operates first by inserting the given value the same way as BST
insertion and then by applying rebalancing techniques if necessary
...
Each time we insert a node into an AVL tree:
1
...
AVL TREE

59

1) algorithm
2)
Pre:
3)
Post:
4)
2
...

Insert(value) value has passed custom type
checks for type T
value has been placed in the correct location in the tree
if root = ∅
5)
root ← node(value)
6)
else
7)
InsertNode(root, value)
8)
end if
9)
end Insert
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)

algorithm InsertNode(current, value)
Pre: current is the node to start from
Post: value has been placed in the correct location in the tree while
preserving tree balance
if value < current
...
Left = ∅
current
...
Left, value)
end if
else
if current
...
Right ← node(value)
else
InsertNode(current
...
AVL TREE

60

1) algorithm
2)
Pre:
3)
Post:
4)

7
...
3)
...
If the tree doesn’t need to be
rebalanced and the value we are removing is contained within the tree then no
further step are required
...

CHAPTER 7
...
V alue = V alue
parent = nodeToRemove
if value < nodeToRemove
...
Left
else
nodeToRemove ← nodeToRemove
...
Push(nodeToRemove)
end while
if nodeToRemove = ∅
return false // value not in Avl
end if
parent ← FindParent(value)
if count = 1 // count keeps track of the # of nodes in the Avl
root ← ∅ // we are removing the only node in the Avl
else if nodeToRemove
...
Right = null
// case #1
if nodeToRemove
...
Value
parent
...
Right ← ∅
end if
else if nodeToRemove
...
Right 6= ∅
// case # 2
if nodeToRemove
...
Value
parent
...
Right
else
parent
...
Right
end if
else if nodeToRemove
...
Right = ∅
// case #3
if nodeToRemove
...
Value
parent
...
Left
else
parent
...
Left
end if
else
// case #4
largestV alue ← nodeToRemove
...
Right 6= ∅
// find the largest value in the left subtree of nodeToRemove

50)
51)
52)
53)
54)
55)
56)
57)
58)
59)
60)
61)

7
...
Right
end while
// set the parents’ Right pointer of largestV alue to ∅
FindParent(largestV alue
...
Right ← ∅
nodeToRemove
...
Value
end if
while path
...
Pop()) // we trackback to the root node check
balance
end while
count ← count − 1
return true
end Remove

Summary

The AVL tree is a sophisticated self balancing tree
...
Unlike its older brother the
AVL tree avoids worst case linear complexity runtimes for its operations
...

63

CHAPTER 7
...
g
...
g
...

The algorithms discussed can easily be translated into generic sorting
algorithms within your respective language of choice
...
1

Bubble Sort

One of the most simple forms of sorting is that of comparing each item with every
other item in some list, however as the description may imply this form of sorting
is not particularly effecient O(n2)
...

1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)

8
...
Count − 1
for j ← 0 to list
...
The algorithm is based on splitting a list, into
two similar sized lists (left, and right) and sorting each list and then merging the
sorted lists back together
...

63

65

CHAPTER 8
...
1: Bubble Sort Iterations
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)

algorithm Mergesort(list)
Pre: list 6= ∅
Post: list has been sorted into values of ascending order
if list
...
Count / 2
left ← list(m)
right ← list(list
...
Count−1
left[i] ← list[i]
end for
for i ← 0 to right
...
SORTING

Figure 8
...
3

Quick Sort

Quick sort is one of the most popular sorting algorithms based on divide et impera
strategy, resulting in an O(n log n) complexity
...
This is the main quick sort operation, called partition,
recursively repeated on lesser and greater sub lists until their size is one or zero in which case the list is implicitly sorted
...

4

75

74

2

54

CHAPTER 8
...
3: Quick Sort Example (pivot median strategy)
1)
2)
3)
4)
5)
6)
7)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)
20)

algorithm QuickSort(list)
Pre: list 6= ∅
Post: list has been sorted into values of ascending order
if list
...
Count−1
if list[i] = pivot
equal
...
Insert(list[i])
end if
if list[i] > pivot
greater
...
SORTING

8
...
It can be best thought of as a sorting scheme similar to that of sorting a hand
of playing cards, i
...
you take one card and then look at the rest with the intent of
building up an ordered set of cards in your hand
...
4: Insertion Sort Iterations
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)

8
...
Count
hold ← list[unsorted]
i ← unsorted − 1
while i ≥ 0 and hold < list[i]
list[i + 1] ← list[i]
i←i−1
end while
list[i + 1] ← hold
unsorted ← unsorted + 1
end while
return list
end Insertionsort

Shell Sort

Put simply shell sort can be thought of as a more efficient variation of insertion
sort as described in §8
...

Shell sort is fairly straight forward but may seem somewhat confusing at first
as it differs from other sorting algorithms in the way it selects items to compare
...
5 shows shell sort being ran on an array of integers, the red coloured
square is the current value we are holding
...
Count / 2

CHAPTER 8
...
6

69

while increment 6= 0
current ← increment
while current < list
...
Normally a
bucket is a queue, each time radix sort is performed these buckets are emptied
starting the smallest key bucket to the largest
...
g
...
Because we are dealing with, in this example base 10 numbers
we have at any one point 10 possible key values 0
...
Before we show you this first simple version of radix sort let us clarify
what we mean by isolating keys
...
The number used as an example has in total three keys:

70

CHAPTER 8
...
5: Shell sort
1
...
Tens
3
...
SORTING

71

For further clarification what if we wanted to determine how many thousands
the number 102 has? Clearly there are none, but often looking at a number as final
like we often do it is not so obvious so when asked the question how many
thousands does 102 have you should simply pad the number with a zero in that
location, e
...
0102 here it is more obvious that the key value at the thousands
location is zero
...
The solution is actually very simple, but its not often you want to isolate
a key in a number so we will spell it out clearly here
...
As
a simple example lets say that we want to access the tens key of the number 1290,
the tens column is key 10 and so after substitution yields key ← (1290 / 10) % 10
= 9
...
The value of key is used in
the following algorithm to work out the index of an array of queues to enqueue
the item into
...
Enqueue(item)
end foreach
list ← CollapseQueues(queues)
ClearQueues(queues)
indexOfKey ← indexOfKey ∗ 10
end for
return list
end Radix

Figure 8
...
Omitted queues in Figure 8
...

8
...
g
...
3), some are not (e
...

72

CHAPTER 8
...
6: Radix sort base 10 algorithm
bubble sort defined in §8
...

Selecting the correct sorting algorithm is usually denoted purely by efficiency,
e
...
you would always choose merge sort over shell sort and so on
...

Some algorithms are very nicely expressed in a recursive fashion, however these
algorithms ought to be pretty efficient, e
...
implementing a linear, quadratic, or
slower algorithm using recursion would be a very bad idea
...

Chapter 9

Numeric
Unless stated otherwise the alias n denotes a standard 32 bit integer
...
1

Primality Test

A simple algorithm that determines whether or not a given integer is a prime
number, e
...
2, 5, 7, and 13 are all prime numbers, however 6 is not as it can be
the result of the product of two numbers that are <√ 6
...

1)
2)
3)
4)
5)
6)
7)
8)
9)
10)

9
...
For example 78 10 has a binary
representation of 10011102
...
1 shows the algorithm trace when the number to convert to binary is
74210
...
NUMERIC

5)
6)
7)
8)
9)

list
...
1: Algorithm trace of ToBinary

9
...
g
...
One of the most elegant solutions to this problem is based on Euclid’s algorithm
that has a run time complexity of O(n2)
...
4 Computing the maximum value for a number of a
specific base consisting of N digits
This algorithm computes the maximum value of a number for a given number of
digits, e
...
using the base 10 system the maximum number we can have made up
of 4 digits is the number 999910
...

The expression by which we can compute this maximum value for N digits is:
N
B − 1
...
As an example if we wanted to determine the maximum value for a

CHAPTER 9
...
The maximum value of the previous example would be
represented as FFFFFF16 which yields 1677721510
...
For this reason in our actual implementation numberBase
has an enumeration type
...
For our implementation we cast the
value of numberBase to an integer, as such we extract the value associated with
the relevant option in the Base enumeration
...
In the algorithm listed below
the cast is implicit so we just use the actual argument numberBase
...
5

algorithm MaxValue(numberBase, n)
Pre: numberBase is the number system to use, n is the number of digits
Post: the maximum value for numberBase consisting of n digits is computed
return Power(numberBase,n) −1 5) end MaxValue

Factorial of a number

Attaining the factorial of a number is a primitive mathematical operation
...
The iterative
solution is presented because it too is trivial to implement and doesn’t suffer from
the use of recursion (for more on recursion see §C)
...
The aforementioned acts as a base case that we
will build upon
...
We can indicate that we are after the factorial
of a number using the form N! where N is the number we wish to attain the
factorial of
...

1)
algorithm Factorial(n)
2)
Pre: n ≥ 0, n is the number to compute the factorial of
3)
Post: the factorial of n is computed
4)
if n < 2
5)
return 1
6)
end if
7)
factorial ← 1
8)
for i ← 2 to n
9)
factorial ← factorial ∗ i
10)
end for
11)
return factorial 12) end Factorial

CHAPTER 9
...
6

76

Summary

In this chapter we have presented several numeric algorithms, most of which are
simply here because they were fun to design
...
Numeric algorithms in
particular drive some of the most advanced systems on the planet computing such
data as weather forecasts
...
1

Sequential Search

A simple algorithm that search for a specific item inside a list
...

1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)

10
...
Count and list[index] 6= item
index ← index + 1
end while
if index < list
...
In addition to
searching for an item, it takes into account its frequency by swapping it with it’s
predecessor in the list
...

Figure 10
...

76

78

CHAPTER 10
...
1: a)
Search(12), b)
Search(101)
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)

algorithm ProbabilitySearch(list, item)
Pre: list 6= ∅
Post: a boolean indicating where the item is found or not;
in the former case swap founded item with its predecessor
index ← 0
while index < list
...
Count or list[index] 6= item
return false
end if
if index > 0
Swap(list[index],list[index − 1])
end if
return true
end ProbabilitySearch

10
...
We have
presented more efficient searching algorithms earlier on, like for instance the
logarithmic searching algorithm that AVL and BST tree’s use (defined in §3
...
We
decided not to cover a searching algorithm known as binary chop (another name
for binary search, binary chop usually refers to its array counterpart) as CHAPTER
10
...

79

Searching algorithms and their efficiency largely depends on the underlying
data structure being used to store the data
...
If you are going to search for data fairly often then we
strongly advise that you sit down and research the data structures available to
you
...
Model your data and then research the data structures that
best fit your scenario
...
The algorithms
presented are based on problems the authors have come across previously, or
were formulated to satisfy curiosity
...
1

Reversing the order of words in a sentence

Defining algorithms for primitive string operations is simple, e
...
extracting a substring of a string, however some algorithms that require more inventiveness can
be a little more tricky
...
This algorithm works
on the principal that words are all delimited by white space, and using a few
markers to define where words start and end we can easily reverse them
...
STRINGS

4)
5)
6)
7)
8)
9)
10)
11)
12)
13)
14)
15)
16)
17)
18)
19)
20)
21)
22)
23)
24)
25)
26)
27)
28)
29)
30)
31)
32)
33)

11
...
Length − 1
start ← last
while last ≥ 0
// skip whitespace
while start ≥ 0 and value[start] = whitespace
start ← start − 1
end while
last ← start
// march down to the index before the beginning of the word
while start ≥ 0 and start =6
whitespace
start ← start − 1
end while
// append chars from start + 1 to length + 1 to string buffer sb
for i ← start + 1 to last
sb
...
Append(‘ ’)
end if
last ← start − 1
start ← last
end while
// check if we have added one too many whitespace to sb
if sb[sb
...
Length ← sb
...

The algorithm that we present has a O(n) run time complexity
...
These pointers march in towards each other always checking that each
character they point to is the same with respect to value
...
1 shows the
IsPalindrome algorithm in operation on the string “Was it Eliot’s toilet I saw?” If
you remove all punctuation, and white space from the aforementioned string you
will find that it is a valid palindrome
...
STRINGS

82

Figure 11
...
Strip()
...
Length −1
while word[left] = word[right] and left < right
left ← left + 1
right ← right − 1
end while
return word[left] = word[right]
end IsPalindrome

In the IsPalindrome algorithm we call a method by the name of Strip
...
As a result
word contains a heavily compacted representation of the original string, each
character of which is in its uppercase representation
...

11
...
tracking when we are in a string
2
...
skipping white space that delimits the words
As an example consider the string “Ben ate hay” Clearly this string contains
three words, each of which distinguished via white space
...
index
2
...
inWord

83

CHAPTER 11
...
2: String with three words

Figure 11
...
If we are not currently hitting white space
we are in a word, the opposite is true if at the present index we are hitting white
space
...
We don’t take into account any particular splitting
symbols you may use, e
...
in
...
Split 1 can take a char (or array of
characters) that determines a delimiter to use to split the characters within the
string into chunks of strings, resulting in an array of sub-strings
...
2 we present a string indexed as an array
...
Figure
11
...

1)
algorithm WordCount(value)
2)
Pre:
value 6= ∅
3)
Post: the number of words contained within value is determined
4)
inWord ← true
5)
wordCount ← 0
6)
index ← 0
7)
// skip initial white space
8)
while value[index] = whitespace and index < value
...
Length and value[index] = whitespace
13)
return 0
14)
end if
15)
while index < value
...
Length −1

1

http://msdn
...
com/en-us/library/system
...
split
...
STRINGS

19)
20)
21)
22)
23)
24)
25)
26)
27)
28)
29)
30)
31)
32)

84

index ← index + 1
end while
inWord ← false
wordCount ← wordCount + 1
else
inWord ← true
end if
index ← index + 1
end while
// last word may have not been followed by whitespace
if inWord
wordCount ← wordCount + 1
end if
return wordCount 33) end WordCount

11
...

If we split all the words using a single occurrence of white space as our delimiter
we get all the words within the string back as elements of an array
...
All that is left
to do is subtract the unique word count from the total number of stings contained
in the array returned from the split operation
...
3
...
4: a) Undesired uniques set; b) desired uniques set
1)
2)
3)
4)

algorithm RepeatedWordCount(value)
Pre: value =6 ∅
Post: the number of repeated words in value is returned
words ← value
...
STRINGS

5)
6)
7)
8)
9)
10)

uniques ← Set
foreach word in words
uniques
...
Strip())
end foreach
return words
...
Count
end RepeatedWordCount

You will notice in the RepeatedWordCount algorithm that we use the Strip
method we referred to earlier in §11
...
This simply removes any punctuation from
a word
...
g
...
Figure 11
...

11
...
Put simply, we can parse the strings
considered using a double loop and check, discarding punctuation, the equality
between any characters thus returning a non-negative index that represents the
location of the first character in the match (Figure 11
...
This approach exhibit a run time complexity of O(n2)
...
5: a) First Step; b) Second Step c) Match Occurred
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
11)

algorithm Any(word,match)
Pre: word,match 6= ∅
Post: index representing match location if occured, −1 otherwise
for i ← 0 to word
...
Length − 1
while match[index] = whitespace
index ← index + 1
end while

0

CHAPTER 11
...
6

86

if match[index] = word[i]
return index
end if
end for
end for
return −1
end Any

Summary

We hope that the reader has seen how fun algorithms on string data types are
...
We for one find strings fascinating
...
Now that we have spurred you along a little with our
introductory algorithms you can devise some of your own
...
In most cases tracing an algorithm
only requires a single table
...
This diagram
will be used to visualise the problem more effectively
...

The trace table will store information about the variables used in your
algorithm
...
Such an approach allows you to attain a history of the various
values each variable has held
...

We have found this approach both simple, and powerful
...

In this chapter we will show you how to work through both iterative, and
recursive algorithms using the technique outlined
...
1

Iterative algorithms

We will trace the IsPalindrome algorithm (defined in §11
...
Before we even look at the variables the algorithm uses,
first we will look at the actual data structure the algorithm operates on
...
Each character in the string can be accessed via an
index much like you would do when accessing items within an array
...

For our example we will use IsPalindrome to operate on the string “Never odd
or even” Now we know how the string data structure is represented, and the value
of the string we will operate on let’s go ahead and draw it as shown in Figure A
...

86

Figure A
...
ALGORITHM WALKTHROUGH

value word left
right
Table A
...
value
2
...
left
4
...
1
...
Table A
...

While this approach may look a little bloated in print, on paper it is much more
compact
...

There is one other point that we should clarify at this time - whether to include
variables that change only a few times, or not at all in the trace table
...
2
we have included both the value, and word variables because it was convenient to
do so
...
1) and only use the trace table for variables whose values change
during the algorithm
...

value
“Never odd or even”

word
“NEVERODDOREVEN”

left
0
1

right
13
12

2

11

3

10

4

9

5

8

6

7

7

6

Table A
...
You can use these trace tables to verify algorithm correctness
...
Visualising the problem domain and

APPENDIX A
...
Moreover you
always have a point of reference which you can look back on
...
2

Recursive Algorithms

For the most part working through recursive algorithms is as simple as walking
through an iterative algorithm
...
Most recursive algorithms are much
simple to follow when you draw out the recursive calls rather than using a table
based approach
...

1)
2)
3)
4)
5)
6)
7)
8)
9)
10)

algorithm Fibonacci(n)
Pre: n is the number in the fibonacci sequence to compute
Post: the fibonacci sequence number n has been computed
if n < 1
return 0
else if n < 2
return 1
end if
return Fibonacci(n − 1) + Fibonacci(n − 2)
end Fibonacci

Before we jump into showing you a diagrammtic representation of the
algorithm calls for the Fibonacci algorithm we will briefly talk about the cases of
the algorithm
...
n < 1
2
...
n ≥ 2
The first two items in the preceeding list are the base cases of the algorithm
...
The third item from the list is our recursive case
...

Figure A
...
In
Figure A
...
Figure A
...
In Figure A
...

It is important to note that each recursive call only ever returns to its caller
upon hitting one of the two base cases
...
Upon hitting a base case you go back to

APPENDIX A
...
2: Call chain for Fibonacci algorithm

Figure A
...
Execution in the caller is
contiued at the next statement, or expression after the recursive
call was made
...
When the first recursive call (Fibonacci(n − 1))
returns to the caller we then execute the the second recursive call
(Fibonacci(n − 2))
...

Recursive algorithms are much easier to demonstrate diagrammatically as
Figure A
...
When you come across a recursive algorithm draw
method call diagrams to understand how the algorithm works at a high level
...
ALGORITHM WALKTHROUGH

A
...
In order to understand an algorithm try and work
through it using trace tables
...

In the vast majority of cases implementing an algorithm is simple provided
that you know how the algorithm works
...

92

Appendix B

Translation Walkthrough
The conversion from pseudo to an actual imperative language is usually very
straight forward, to clarify an example is provided
...
1 to the C# language
...
Floor(Math
...

A consideration to take note of is that many algorithms have fairly strict
preconditions, of which there may be several - in these scenarios you will need to
inject the correct code to handle such situations to preserve the correctness of the
algorithm
...

91
APPENDIX B
...
1

Summary

As you can see from the example used in this chapter we have tried to make the
translation of our pseudo code algorithms to mainstream imperative languages as
simple as possible
...

94

Appendix C

Recursive Vs
...
One of
the biggest advantages recursive methods bring to the table is that they usually
result in more readable, and compact solutions to problems
...
Generally a
recursive algorithms has two main properties:
1
...
A recursive case
For now we will briefly cover these two aspects of recursive algorithms
...
The trouble we speak of manifests itself typically as
a stack overflow, we will describe why later
...
An iterative solution uses no recursion whatsoever
...
g
...
The down
side to iterative algorithms is that they tend not to be as clear as to their recursive
counterparts with respect to their operation
...
Most production software you will find uses little or no
recursive algorithms whatsoever
...
g
...
Normally it is systems level code that has this zero tolerance policy for
recursive algorithms
...
O(n2)
2
...
RECURSIVE VS
...
O(2n)

If you use recursion for algorithms with any of the above run time efficiency’s
you are inviting trouble
...

While constantly splitting problems into smaller problems is good practice, in
these cases you are going to be spawning a lot of method calls
...
When you exceed the allotted stack space for a thread the process will be
shutdown by the operating system
...
g
...
You can ask for a bigger stack size, but you
typically only want to do this if you have a very good reason to do so
...
1

Activation Records

An activation record is created every time you invoke a method
...
Activation records take a small amount of time to create, and are pretty
lightweight
...
Consider an algorithm
that when invoked given a specific value it creates many recursive calls
...
We will have to wait until the
activation records start to be unwound after the nested methods in the call chain
exit and return to their respective caller
...
Unwinding an activation record results in several steps:
1
...
The return address is popped off the stack
3
...
RECURSIVE VS
...
Recursive algorithms can exhaust the stack size allocated to
the thread fairly fast given the chance
...
recursive solution in the form of the Fibonacci algorithm
...
The iterative solution is not as pretty, nor self documenting but it does

96

the job a lot quicker
...
The iterative version on the other hand has a O(n) run time
...
This example is mainly used to shock programmers into
thinking about the ramifications of recursion rather than warning them off
...
2

Some problems are recursive in nature

Something that you may come across is that some data structures and algorithms
are actually recursive in nature
...

A common tree node usually contains a value, along with two pointers to two
other nodes of the same node type
...

When using recursive algorithms on tree’s it makes sense as you are simply
adhering to the inherent design of the data structure you are operating on
...

We can also look at sorting algorithms like merge sort, and quick sort
...

C
...
Often
software projects will take a trade between readability, and efficiency in which
case recursion is great provided you don’t go and use it to implement an algorithm
with a quadratic run time or higher
...
Defensive coding will always prevail
...
Using recursion in such scenarios is
perfectly acceptable
...
Its iterative counterpart is probably less lines of code than its
recursive counterpart
...
It may be the case that your compiler recognises
things like tail recursion and can optimise them
...
The amount of optimisation compilers can

APPENDIX C
...
ITERATIVE SOLUTIONS

96

do though is somewhat limited by the fact that you are still using recursion
...

98

APPENDIX D
...
Testing has often been
discarded by many developers in the belief that the burden of proof of their
software is on those within the company who hold test centric roles
...
As a developer you should at least provide a suite of unit
tests that verify certain boundary conditions of your software
...
If
you add or tweak algorithms and then run your suite of tests you will be quickly
alerted to any cases that you have broken with your recent changes
...
Of course in order to attain such a standard you need
to think carefully about the tests that you construct
...
Most modern languages like C++, C#, and Java
offer an impressive catalogue of testing frameworks that you can use for unit
testing
...
, http://www
...
org/
NUnit: Can be used with languages that target Microsoft’s Common Language
Runtime
...
nunit
...
php
Boost Test Library: Targeted at C++
...
http://www
...
org
...
boost
...
html
CppUnit: Targeted at C++
...
sourceforge
...
The ones listed are the testing frameworks that we
believe are the most popular for C++, C#, and Java
...
1

What constitutes a unit test?

A unit test should focus on a single atomic property of the subject being tested
...
As an example if you were wanting to write a test that verified
that a particular value V is returned from a specific input I then your test should
do the smallest amount of work possible to verify that V is correct given I
...

As well as a unit test being relatively atomic you should also make sure that
your unit tests execute quickly
...
Failure to attain such a goal will most likely result in the
suite of tests not being ran that often by the developers on your team
...

Building up a test suite can help greatly in a team scenario, particularly when
using a continuous build server
...

Employing such strategies can help you catch niggling little error cases early
rather than via your customer base
...

D
...
In recent years a test driven approach to development has become very
popular
...

One of the founding principles of TDD is to write the unit test first, watch it fail
and then make it pass
...
We have
found this approach to provide a more structured intent to the implementation of
algorithms
...
Because TDD makes you write the tests up front you never find yourself in a
situation where you forget, or can’t be bothered to write tests for your code
...
We, as the authors of this book ourselves use TDD as our
preferred method
...

Green: The failing test now passes
...
Your task at this stage is solely to make the test
pass, that is to make the respective test green
...
TESTING

100

the restructuring of your program to make it as readable and maintainable as
possible
...
If you adhere to progressive revisions of your algorithm
restructuring when appropriate you will find that using TDD you can implement
very cleanly structured types and so on
...
3

How seriously should I view my test suite?

Your tests are a major part of your project ecosystem and so they should be
treated with the same amount of respect as your production code
...

Employing a methodology like TDD, or testing after implementing you will find
that you spend a great amount of time writing tests and thus they should be
treated no differently to your production code
...

D
...
A popular approach - the three A’s is described in the following list:
Assemble: Create the objects you require in order to perform the state based assertions
...

Assert: Specify what you expect to hold after the previous two steps
...
MethodA();
// assert
Assert
...
BoolExpr)
}

D
...
g
...

Typically all tests are abstracted from production code
...

We can also use things like inheritance etc when defining classes of tests
...

D
...
Code coverage is merely an indicator as to the portions of production
code that your units tests cover
...

D
...
Moreover unit testing
can be used to create a safety blanket when adding and removing features
providing an early warning for breaking changes within your production code
...
TESTING

Appendix E

Symbol Definitions
Throughout the pseudocode listings you will find several symbols used, describes
the meaning of each of those symbols
...

=
Equality
...

<
Less than
...

>
Greater than
...

∅
Null
...

or
Logical or
...

yield
Like return but builds a sequence
...
1: Pseudo symbol definitions
* This symbol has a direct translation with the vast majority of imperative
counterparts

Buy These Notes Preview

Notesale: Turn your study into money

Already a Member? >

Search for notes by fellow students, in your own course and all over the country.

My Basket

Document Preview