Title: DSA Notes
Description: This PDF contains lecture notes on Data Structures and Algorithms (DSA), intended to help you study and solve problems involving data structures and algorithms.

Document Preview


Lecture Notes for

Data Structures and Algorithms

Revised each year by John Bullinaria
School of Computer Science
University of Birmingham
Birmingham, UK

Version of 27 March 2019

These notes are currently revised each year by John Bullinaria. They include material produced by a number of previous authors and contributors, all of whom are members of the School of Computer Science, University of Birmingham, UK.
Contents

1 Introduction
  1.1 Algorithms as opposed to programs
  1.2 Fundamental questions about algorithms
  1.3 Data structures, abstract data types, design patterns

2 Arrays, Iteration, Invariants
  2.1 Arrays
  2.2 Loops and Iteration
  2.3 Invariants

3 Lists, Recursion, Stacks, Queues
  3.1 Linked Lists
  3.2 Recursion
  3.3 Stacks
  3.4 Queues
  3.5 Doubly Linked Lists
  3.6 Advantage of Abstract Data Types

4 Searching

5 Efficiency and Complexity
  5.1 Time versus space complexity
  5.2 Worst versus average complexity
  5.3 Concrete measures for performance
  5.4 Big-O notation for complexity class
  5.5 Formal definition of complexity classes

6 Trees
  6.1 General specification of trees
  6.2 Quad-trees
  6.3 Binary trees
  Primitive operations on binary trees
  The height of a binary tree
  Implementation of trees

7 Binary Search Trees
  7.9 Sorting using binary search trees
  7.10 Balancing binary search trees
  7.11 Self-balancing AVL trees
  7.12 B-trees

8 Priority Queues and Heap Trees
  8.1 Trees stored in arrays
  8.2 Priority queues and binary heap trees
  8.4 Inserting a new heap tree node
  8.5 Deleting a heap tree node
  8.6 Building a new heap tree from scratch

9 Sorting
  9.1 The problem of sorting
  9.2 Common sorting strategies
  9.3 How many comparisons must it take?
  9.4 Bubble Sort
  9.5 Insertion Sort
  9.6 Selection Sort
  9.7 Comparison of O(n^2) sorting algorithms
  9.8 Sorting algorithm stability
  9.9 Treesort
  9.10 Heapsort
  9.11 Divide and conquer algorithms
  9.12 Quicksort
  9.13 Mergesort
  9.14 Summary of comparison-based sorting algorithms

10 Hash Tables
  10.6 A simple Hash Table in operation
  10.7 Strategies for dealing with collisions
  10.8 Linear Probing
  10.9 Double Hashing
  10.10 Choosing good hash functions
  10.11 Complexity of hash tables

11 Graphs

12 Epilogue

A Some Useful Formulae
We shall see how they depend on the design of suitable data structures, and how some structures and algorithms are more efficient than others for the same task.

We will start by studying some key data structures, such as arrays, lists, queues, stacks and trees, and then move on to explore their use in a range of different searching and sorting algorithms. Finally, we will look at graph based representations and cover the kinds of algorithms needed to work efficiently with them.

We will not restrict ourselves to implementing the various data structures and algorithms in particular computer programming languages (e.g., Java, C, OCaml), but specify them in simple pseudocode that can easily be implemented in any appropriate language.
1.1 Algorithms as opposed to programs

An algorithm for a particular task can be defined as "a finite sequence of instructions, each of which has a clear meaning and can be performed with a finite amount of effort in a finite length of time". However, in order to be executed by a computer, we will generally need a program that is written in a rigorous formal language; and since computers are quite inflexible compared to the human mind, programs usually need to contain more details than algorithms.

The task of implementing the discussed algorithms as computer programs is important, of course, but these notes will concentrate on the theoretical aspects and leave the practical programming aspects to be studied elsewhere. It is also worth bearing in mind the distinction between different programming paradigms: Imperative Programming describes computation in terms of instructions that change the program/data state, whereas Declarative Programming specifies what the program should accomplish without describing how to do it.

Algorithms can obviously be described in plain English, and we will sometimes do that. Usually, however, they are written in a more structured form, somewhere between plain English and runnable program code. This is called pseudocode, which comes in a variety of forms.


1.2 Fundamental questions about algorithms

When developing an algorithm, three fundamental questions should be asked about it:

1. What is it supposed to do?
2. Does it really do what it is supposed to do?
3. How efficiently does it do it?

The technical terms normally used for these three aspects are:

1. Specification.
2. Verification.
3. Performance analysis.

The details of these three aspects will usually be rather problem dependent.

The specification describes what the algorithm is required to do. Sometimes that will be based on a particular representation of the associated data, and sometimes it will be presented more abstractly.

For simple problems, it is often easy to see that a particular algorithm will always work, i.e. that it satisfies its specification. For more complicated algorithms this is much less obvious, and in that case we need to spend some effort verifying whether the algorithm is indeed correct. However, since the number of different potential inputs for most algorithms is infinite in theory, and huge in practice, more than just testing on particular cases is needed to be sure that the algorithm satisfies its specification. Although we will discuss proofs in these notes, and useful relevant ideas like invariants, we will usually only do so in a rather informal manner (though, of course, we will attempt to be rigorous). Formal verification techniques are complex and will normally be left till after the basic ideas of these notes have been studied.

Finally, the performance analysis is concerned with how efficiently an algorithm uses resources such as time and memory. This will usually depend on the problem instance size, the choice of data representation, and the details of the algorithm. We shall study the general ideas concerning efficiency in Chapter 5, and then apply them throughout the remainder of these notes.
1.3 Data structures, abstract data types, design patterns

For many problems, the ability to formulate an efficient algorithm depends on being able to organize the data in an appropriate manner. These notes will look at numerous data structures ranging from familiar arrays and lists to more complex structures such as trees, heaps and graphs, and we will see how their choice affects the efficiency of the algorithms based upon them.

It is often useful to describe such structures at an abstract level, independently of any particular implementation. We can do this by formulating abstract mathematical models of particular classes of data structures or data types which have common features. Typically, we specify how they are built out of more primitive data types (e.g., integers or strings), how to extract that data from them, and some basic checks to control the flow of processing in algorithms. We shall see many examples of abstract data types throughout these notes.

At an even higher level of abstraction are design patterns. These embody and generalize important design concepts that appear repeatedly in many problem contexts. They can speed up the development of algorithms by providing familiar proven algorithm structures that can be applied straightforwardly to new problems.


1.4 Textbooks and web-resources

The lectures associated with these notes are designed to help you understand them and fill in some of the gaps they contain, but that is unlikely to be enough, because often you will need to see more than one explanation of something before it can be fully understood. For that reason, you are encouraged to consult textbooks as well. The subject of these notes is a classical topic, so there is no need to use a textbook published recently: these notes cover important fundamental material that is taught in all university degrees in computer science, and older textbooks cover it just as well. It is a good idea to go to your library and browse the shelves of books on data structures and algorithms. Wikipedia is generally a good source of fairly reliable information on all the relevant topics, but you hopefully shouldn't need reminding that not everything you read on the internet is necessarily true.


1.5 Overview

Data structures will be formulated to represent various types of information in such a way that it can be conveniently and efficiently manipulated by the algorithms we develop.

We shall begin by looking at some widely used basic data structures (namely arrays, linked lists, stacks and queues), and the advantages and disadvantages of the associated abstract data types. That will leave us with the necessary tools to study three particularly important data structures: trees (in particular, binary search trees and heap trees), hash tables, and graphs. The idea is that once the basic ideas and examples covered in these notes are understood, dealing with more complex problems in the future should be straightforward.
Chapter 2

Arrays, Iteration, Invariants

Generally, we need to build algorithms that manipulate collections of objects such as numbers or strings, so we need procedures for storing and sequentially processing them.

2.1 Arrays

In computer science, the obvious way to store an ordered collection of items is as an array. On paper, we can just write the items in order, separated by commas and enclosed by square brackets. If we call this array a, we can write it as:

a = [1, 4, 17, 3, 90, 79, 4, 6, 81]

This array a has 9 items, and hence we say that its size is 9. In everyday life we usually start counting from 1. When we work with arrays in computer science, however, we more often (though not always) start from 0, so the positions in a are 0, 1, 2, ..., 7, 8. More generally, for any integer i denoting a position, we write a[i] to denote the element in the ith position. Then, in the above example, a[0] = 1, a[1] = 4, a[2] = 17, and so on.

Note that the symbol = is used with more than one meaning here. In mathematics, it stands for equality; in many programming languages it stands for assignment. We will typically use = in its mathematical meaning, unless it is written as part of code or pseudocode.

Algorithms that process data stored as arrays will typically need to visit systematically all the items in the array, and apply appropriate operations on them.
2.2 Loops and Iteration

The standard approach in most programming languages for repeating a process a certain number of times, such as moving sequentially through an array to perform the same operations on each item, involves a loop. In pseudocode, this would typically take the form

   For i = 1,...,N,
      do something

and in programming languages like C and Java this would be written as the for-loop

   for( i = 0 ; i < N ; i++ ) {
      // do something
   }

in which a counter i keeps track of doing "the something" N times. The general for-loop structure is

   for( INITIALIZATION ; CONDITION ; UPDATE ) {
      REPEATED PROCESS
   }

in which any of the four parts are optional.
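As a concrete illustration of this pattern, here is a short, self-contained Java program (not part of the original notes; the array values are simply the example array used earlier) that uses a counter-controlled for-loop to visit every item of an array in turn:

   public class LoopDemo {
       public static void main(String[] args) {
           int[] a = {1, 4, 17, 3, 90, 79, 4, 6, 81};
           int sum = 0;
           // INITIALIZATION ; CONDITION ; UPDATE
           for (int i = 0; i < a.length; i++) {
               sum += a[i];   // the REPEATED PROCESS applied to item i
           }
           System.out.println("sum = " + sum);
       }
   }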
2.3 Invariants

An invariant is a condition that remains unchanged while an algorithm runs. It may be a simple inequality, such as "i < 20", or something more abstract, such as "the items in the array are sorted".

In particular, a loop-invariant is a condition that is true at the beginning and end of every iteration of the given loop. Consider the simple example of a procedure that finds the minimum of the n items stored in an array a:

   minimum(a) {
      min = a[0]
      // min equals the minimum item in a[0],...,a[0]
      for(int i = 1 ; i != n ; i++) {
         // min equals the minimum item in a[0],...,a[i-1]
         if ( a[i] < min ) min = a[i]
      }
      // min equals the minimum item in a[0],...,a[i-1], and i==n
      return min;
   }

At the beginning of each iteration, and at the end of any iterations before, the invariant "min equals the minimum item in a[0],...,a[i-1]" is true. Hence, when the loop terminates with "i == n", we know that "min equals the minimum item in a[0],...,a[n-1]", i.e. the minimum of the whole array. This is a kind of proof by induction: the invariant is true at the start of the loop, and is preserved by each iteration of the loop, therefore it must be true at the end of the loop.

We will also see how invariants (sometimes called inductive assertions) can be used to formulate similar correctness proofs concerning properties of data structures that are defined inductively.
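For comparison, the following is a hedged Java rendering of the minimum-finding loop just discussed, with the loop invariant written out as comments. The method name findMin and the use of IllegalArgumentException are our own choices, not part of the notes.

   public class InvariantDemo {
       static int findMin(int[] a) {
           if (a.length == 0) throw new IllegalArgumentException("empty array");
           int min = a[0];
           // Invariant: min equals the minimum of a[0],...,a[i-1]
           for (int i = 1; i != a.length; i++) {
               if (a[i] < min) min = a[i];
               // min now equals the minimum of a[0],...,a[i]; incrementing i re-establishes the invariant
           }
           // On exit i == a.length, so min is the minimum of the whole array
           return min;
       }
       public static void main(String[] args) {
           System.out.println(findMin(new int[]{1, 4, 17, 3, 90, 79, 4, 6, 81}));  // prints 1
       }
   }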
Chapter 3

Lists, Recursion, Stacks, Queues

Arrays are not always the most efficient way to store collections of items. As we explore the details of storing collections as lists, the advantages and disadvantages of doing so for different situations will become apparent.
3.1 Linked Lists

A list can involve virtually anything, for example, a list of integers [3, 2, 4, 2, 5], a shopping list [apples, butter, bread, cheese], or a list of web pages each containing a picture and a link to the next web page.

Graphical Representation

Non-empty lists can be represented by two-cells, in each of which the first cell contains a pointer to a list element and the second cell contains a pointer to either the empty list or another two-cell. For instance, the list [3, 1, 4, 2, 5] can be represented as a chain of five such two-cells, the first pointing at the element 3 and at the next two-cell, and the last pointing at the element 5 and at the empty list:

   3 -> 1 -> 4 -> 2 -> 5
Abstract Data Type "List"

On an abstract level, a list can be constructed by the two constructors:

• EmptyList, which gives you the empty list, and
• MakeList(element, list), which puts an element at the top of an existing list.

For example, the list [3, 1, 4, 2, 5] would be built as MakeList(3, MakeList(1, MakeList(4, MakeList(2, MakeList(5, EmptyList))))), and it is clearly possible to construct any list in this way. This inductive style of definition is one we shall meet repeatedly. It starts with the "base case", the EmptyList, and then builds up increasingly complex lists by repeatedly applying the "induction step", the MakeList(element, list) operator.

It is also important to be able to get back the elements of a list. The way to proceed is to note that a list is always constructed from the first element and the rest of the list. This can be done using the two selectors, also called accessor methods:

• first(list), and
• rest(list).

The selectors only work for non-empty lists, so we also need a condition which tells us whether a given list is empty:

• isEmpty(list)

We call everything a list that can be constructed by the constructors EmptyList and MakeList, so that with the selectors first and rest and the condition isEmpty, the following relationships are automatically satisfied (i.e. true):

• isEmpty(EmptyList)
• not isEmpty(MakeList(x, l)) (for any x and l)
• first(MakeList(x, l)) = x
• rest(MakeList(x, l)) = l

In addition to constructing and getting back the components of lists, one may also wish to destructively change lists, by replacing either the first element or the rest of a non-empty list. For example, if l is the list [3, 1, 4], then applying replaceFirst(9, l) changes it to [9, 1, 4], and then applying replaceRest([6, 2, 3, 4], l) changes it to [9, 6, 2, 3, 4]. Throughout these notes, we will be formulating our data representations and algorithms in terms of appropriate definitions of them.
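As an illustration only (not part of the notes), one possible Java realization of this abstract data type is sketched below. The static factory methods mirror the constructors EmptyList and MakeList, and the instance methods mirror first, rest and isEmpty; the class name, generics and exceptions are our own choices.

   public class ListADT<E> {
       private final E head;
       private final ListADT<E> tail;
       private final boolean empty;

       private ListADT(E head, ListADT<E> tail, boolean empty) {
           this.head = head; this.tail = tail; this.empty = empty;
       }
       public static <E> ListADT<E> emptyList() {                 // EmptyList
           return new ListADT<>(null, null, true);
       }
       public static <E> ListADT<E> makeList(E x, ListADT<E> l) { // MakeList(x, l)
           return new ListADT<>(x, l, false);
       }
       public boolean isEmpty() { return empty; }                 // isEmpty(list)
       public E first() {                                         // first(list)
           if (empty) throw new IllegalStateException("first of empty list");
           return head;
       }
       public ListADT<E> rest() {                                 // rest(list)
           if (empty) throw new IllegalStateException("rest of empty list");
           return tail;
       }
   }

By construction, this sketch satisfies the automatically-true relationships above, e.g. makeList(x, l).first() returns x and makeList(x, l).rest() returns l.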
...

XML Representation

For communicating data structures between different programs and computers, XML (eXtensible Markup Language) has become a popular standard. For example, the above list could be represented in XML as:

   <ol>
     <li>3</li>
     <li>1</li>
     <li>4</li>
     <li>2</li>
     <li>5</li>
   </ol>

However, there are usually many different ways to represent the same object in XML. XML is flexible enough to represent and communicate very complicated structures in a uniform way.

Implementation of Lists

There are many different ways lists can be implemented in practice. The programming language Lisp and its derivates, for instance, take lists as the most important primitive data structure. In some other languages it is more natural to implement lists as arrays. However, that can be problematic because lists are conceptually not limited in size, which means array based implementation with fixed-sized arrays can only approximate the general concept. More general purpose implementations follow a pointer based approach, which is close to the diagrammatic representation given above.

3.2 Recursion

When items are stored as linked-lists, there is no index for each item, and recursion provides the natural way to process them: we formulate procedures that involve at least one step which invokes (or calls) the procedure itself. We will now look at how to implement two important derived procedures on lists, last and append, which illustrate how recursion works.

The procedure last(l) returns the last element of a list l. This algorithm can be written in pseudocode as:

   last(l) {
      if ( isEmpty(l) )
         error(‘Error: empty list in last’)
      elseif ( isEmpty(rest(l)) )
         return first(l)
      else
         return last(rest(l))
   }

The running time of this depends on the length of the list, and is proportional to that length, since last is called as often as there are elements in the list. Compared to the constant time complexity which access to the last element of an array has, this is quite bad.

Another useful procedure allows us to append one list l2 to another list l1.
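Since the preview abridges the notes' own append procedure, the following is our own sketch of last and append as recursive Java methods on a minimal singly linked node type, in which null plays the role of the empty list. The class and method names are assumptions made purely for illustration.

   public class ListRecursion {
       static class Node { int value; Node rest; Node(int v, Node r) { value = v; rest = r; } }

       // last(l): takes time proportional to the length of l, as discussed above
       static int last(Node l) {
           if (l == null) throw new IllegalStateException("empty list in last");
           if (l.rest == null) return l.value;
           return last(l.rest);
       }

       // append(l1, l2): rebuilds l1 in front of l2, so it takes time proportional to the length of l1
       static Node append(Node l1, Node l2) {
           if (l1 == null) return l2;
           return new Node(l1.value, append(l1.rest, l2));
       }

       public static void main(String[] args) {
           Node l = new Node(3, new Node(1, new Node(4, null)));
           System.out.println(last(l));                                          // prints 4
           System.out.println(last(append(l, new Node(2, new Node(5, null)))));  // prints 5
       }
   }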
...


3.3 Stacks

A stack is a special kind of list in which items can only be added to, and removed from, the top. Stacks are the ideal data structure to model a First-In-Last-Out (FILO), or Last-In-First-Out (LIFO), strategy in search.

Graphical Representation

A stack can be represented graphically by the same kind of two-cell chain as a list. For instance, the stack created by inserting the numbers [3, 1, 4, 2, 5] in that order would be represented as the chain

   5 -> 2 -> 4 -> 1 -> 3

with the most recently inserted item, 5, at the top.

Abstract Data Type "Stack"

Despite their relation to linked lists, their different use means the primitive operators for stacks are usually given different names. The constructors are EmptyStack and push(element, stack), and the selectors are top(stack) and pop(stack). The selectors will work only for non-empty stacks, hence we need a condition which tells whether a stack is empty:

• isEmpty(stack)

We have equivalent automatically-true relationships to those we had for the lists:

• isEmpty(EmptyStack)
• not isEmpty(push(x, s)) (for any x and s)
• top(push(x, s)) = x
• pop(push(x, s)) = s

In summary, we have the direct correspondences:

                 List         Stack
   constructors  EmptyList    EmptyStack
                 MakeList     push
   selectors     first        top
                 rest         pop
   condition     isEmpty      isEmpty

So, stacks and linked lists are the same thing, apart from the different names that are used for their constructors and selectors.

Implementation of Stacks

There are two different ways we can think about implementing stacks. So far we have implied a functional approach, in which push does not change the given stack but creates a new one with the extra element on top. That is, there are at least two stacks around, the original one and the newly created one. If we apply top to a particular stack, we will always get the same element.

However, this is not always the most natural or efficient view. Instead it might be better to think of a single stack which is destructively changed, so that after applying push the original stack no longer exists, but has been changed into a new stack with an extra element. However, as long as we keep this difference in mind, ignoring such implementational details should not cause any problems.
3.4 Queues

A queue is a data structure used to model a First-In-First-Out (FIFO) strategy: elements are added at the back and removed from the front.

Graphical Representation

A queue can be graphically represented in a similar way to a list or stack, but with an additional two-cell in which the first element points to the front of the list of all the elements in the queue, and the second element points to the last element of the list. This extra two-cell allows the two essential operations, adding an element at the back and removing the element at the front, to be carried out efficiently. In particular, they can both be done with constant effort, i.e. independently of the queue length.

For instance, a queue holding 3, 1, 4, 2 is drawn as a header two-cell whose front pointer points at 3 and whose back pointer points at 2 in the chain

   3 -> 1 -> 4 -> 2

and by applying push(5, q), where q is the queue above, we get

   3 -> 1 -> 4 -> 2 -> 5

with the back pointer of the header now pointing at 5.

The two selectors are the same as for stacks:

• top(queue), which gives the top element of a queue, that is, 3 in the example, and
• pop(queue), which gives the queue without the top element.
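The following is a sketch of our own (not taken from the notes) showing how the front/back pointer idea just described gives constant-time push, top and pop in Java:

   public class LinkedQueue<E> {
       private static class Cell<E> { E value; Cell<E> next; Cell(E v) { value = v; } }
       private Cell<E> front;   // first element, or null if the queue is empty
       private Cell<E> back;    // last element, or null if the queue is empty

       public boolean isEmpty() { return front == null; }
       public void push(E x) {                  // add x at the back
           Cell<E> c = new Cell<>(x);
           if (isEmpty()) { front = c; } else { back.next = c; }
           back = c;
       }
       public E top() {                         // look at the front element
           if (isEmpty()) throw new IllegalStateException("empty queue");
           return front.value;
       }
       public E pop() {                         // remove and return the front element
           E x = top();
           front = front.next;
           if (front == null) back = null;
           return x;
       }
   }

Each operation follows at most a fixed number of pointers, so none of them depends on the queue length.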
...


3.5 Doubly Linked Lists

A doubly linked list allows each item to be reached from both its neighbours, so the list can be traversed in either direction. For a simple list of numbers, a linked list and a doubly linked list may look the same, e.g., [3, 1, 4, 2, 5]; the difference lies in how they can be processed.

Graphical Representation

Non-empty doubly linked lists can be represented by three-cells, where the first cell contains a pointer to another three-cell or to the empty list, the second cell contains a pointer to the list element, and the third cell contains a pointer to another three-cell or the empty list. For instance, [3, 1, 4, 2, 5] would be represented as a doubly linked list by the chain

   3 <-> 1 <-> 4 <-> 2 <-> 5

in which each element is linked to both of its neighbours.

Abstract Data Type "Doubly Linked List"

On an abstract level, a doubly linked list can be constructed by the three constructors:

• EmptyList, the empty list,
• MakeListLeft(element, list), which takes an element and a doubly linked list and returns a new doubly linked list with the element added to the left of the original doubly linked list, and
• MakeListRight(element, list), which similarly adds the element to the right of the original doubly linked list.

It is clear that it may be possible to construct a given doubly linked list in more than one way. Elements can be selected from either end using the selectors firstLeft, restLeft, firstRight and restRight. Then, since the selectors only work for non-empty lists, we also need a condition which returns whether a list is empty:

• isEmpty(list)

This leads to automatically-true relationships such as:

• isEmpty(EmptyList)
• not isEmpty(MakeListLeft(x, l)) (for any x and l)
• not isEmpty(MakeListRight(x, l)) (for any x and l)
• firstLeft(MakeListLeft(x, l)) = x
• restLeft(MakeListLeft(x, l)) = l
• firstRight(MakeListRight(x, l)) = x
• restRight(MakeListRight(x, l)) = l

Circular Doubly Linked List

As a simple extension of the standard doubly linked list, one can define a circular doubly linked list in which the left-most element points to the right-most element, and vice versa.

3.6 Advantage of Abstract Data Types

The choice of implementation can clearly affect how efficiently particular operations can be carried out. For instance, the standard abstract data type of a list does not offer an efficient procedure last(l) to give the last element in the list, whereas it would be trivial to find the last element of an array of a known number of elements. While last(l) and getItem(i, l) procedures can easily be implemented using the primitive constructors, selectors, and conditions, they are likely to be less efficient than making use of certain aspects of the underlying implementation.

That may look like an argument for accessing the underlying implementation directly. However, if we discipline ourselves to writing programs in terms of the operations for manipulating abstract data types rather than making use of particular implementation details, then we can modify programs more readily by reimplementing the operations rather than searching all programs for places where we have made accesses to the underlying data structures.

This advantage will become clearer when we study more complex abstract data types and algorithms in later chapters.
Chapter 4

Searching

A frequently occurring problem is that of locating a particular item within a collection of data. More succinctly, this problem is known as searching.

4.1 The problem of searching

Computers ultimately store all information as sequences of bits, which on their own are not a convenient form for either humans or algorithms to work with. This is where data structures come in. We need to develop and study useful data structures that are closer to the way humans think, or at least more structured than mere sequences of bits.

After we have chosen a suitable representation, the represented information has to be processed somehow; that is what leads to the need for algorithms. In this case, the process of interest is that of searching. To simplify matters, we will assume that the items to be searched are integers. To begin with, let us consider:

1. the most obvious and simple representation of the collection to be searched, and
2. two potential algorithms for processing with that representation.

As we have already noted, arrays are one of the simplest possible ways of representing collections of numbers (or strings, or whatever), so we shall use that to store the information to be searched.

Suppose, for example, that the set of integers we wish to search is {1,4,17,3,90,79,4,6,81}, stored as the array a = [1, 4, 17, 3, 90, 79, 4, 6, 81]. If we ask where 17 is, the answer is position 2. If we ask where 91 is, the answer is nowhere. It is useful to have a way of saying "nowhere" using a number that cannot be a valid position. Since we start our index counting from 0, any negative number would do, and we shall use -1. Other (perhaps better) conventions are possible, but we will stick to this here.

4.2 Specification of the search problem

We can now formulate a specification of our search problem using that data structure:

   Given an array a and integer x, find an integer i such that
   1. if there is no j such that a[j] is x, then i is -1;
   2. otherwise, i is any j for which a[j] is x.

If there is more than one position where x occurs, then this specification allows you to return any of them – for example, this would be the case if a were [17, 13, 17] and x were 17. In that sense, the specification is ambiguous. Hence different algorithms with different behaviours can satisfy the same specification – for example, one algorithm may return the smallest position at which x occurs, and another may return the largest. There is nothing wrong with ambiguous specifications. In fact, in practice, they occur quite often.

4.3 A simple algorithm: Linear Search

We can conveniently express the simplest possible algorithm in a form of pseudocode which reads like English, but resembles a computer program without some of the precision or detail that a computer usually requires:

   // This assumes we are given an array a of size n and a key x.
   For i = 0,1,...,n-1,
      if a[i] is equal to x,
      then we have a suitable i and can terminate returning i.
   If we reach this point,
      then x is not in a and hence we must terminate returning -1.

Some aspects, such as the ellipsis "...", are potentially ambiguous, but we, as human beings, know exactly what is meant, so we do not need to worry about them. In a programming language such as C or Java, one would write something that is more precise like:

   for ( i = 0 ; i < n ; i++ ) {
      if ( a[i] == x ) return i;
   }
   return -1;

In the case of Java, this would be within a method of a class, and more details are needed, such as the parameter a for the method and a declaration of the auxiliary variable i. In either, there would need to be additional code to output the result in a suitable format.

Note that because we start counting from zero, the last position of the array is its size minus one. If we forget this, and let i run from 0 to n instead, we get an incorrect algorithm that attempts to access the non-existent element a[n]. Depending on the particular language, operating system and machine you are using, the actual effect of this error will be different. In Java, you will always get an error message.

4.4 A more efficient algorithm: Binary Search

One always needs to consider whether it is possible to improve upon the performance of a particular algorithm, such as the one we have just created. In the worst case, linear search of an array of size n takes n steps. On average, it will take n/2 steps. If the items are stored in no particular order, we cannot do much better than that. Thus, we should try to organize the collection in such a way that a more efficient algorithm is possible. There are many ways to do this. Here we shall consider one of the simplest – we still represent the collections by arrays, but now we enumerate the elements in ascending order.

Thus, instead of working with the previous array [1, 4, 17, 3, 90, 79, 4, 6, 81], we would work with [1, 3, 4, 4, 6, 17, 79, 81, 90], which has the same items but listed in ascending order. We can then use the following algorithm, sketched here in the same English-like pseudocode:

   // This assumes we are given a sorted array a of size n and a key x.
   // Use integers left and right (initially set to 0 and n-1) and mid.
   While left is less than right,
      set mid to the integer part of (left+right)/2,
      and if x is greater than a[mid],
         then set left to mid+1,
         otherwise set right to mid.
   If a[left] is equal to x,
      then terminate returning left,
      otherwise terminate returning -1.

This is the binary search algorithm: at each stage the remaining range of the array is split into two segments. Because the array is sorted, it is easy to see which of each pair of segments the searched-for item x is in, and the search can then be restricted to that segment. Since the range is halved at each step, only about log2 n steps are needed. To see that this runtime behaviour is a big improvement, in practice, over the earlier linear-search algorithm, notice that log2 1000000 is approximately 20, so that for an array of size 1000000 only 20 iterations are needed in the worst case of the binary-search algorithm, whereas 1000000 are needed in the worst case of the linear-search algorithm.

A word of warning: this idea only works on sorted arrays, and the algorithm must be formulated carefully. Also, strictly speaking, this algorithm is not correct because it does not work for the empty array (that has size zero), but that can easily be fixed. Apart from that, is the algorithm correct? Try convincing yourself that it is by tracing it on a few small examples. Having done that, try to write down some convincing arguments, maybe one that involves a loop invariant and one that doesn't.
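Because the preview abridges the notes' exact pseudocode, the following iterative Java version is our own sketch of the binary search idea described above, and it handles the empty array explicitly:

   public class BinarySearch {
       // returns an index i with a[i] == x, or -1 if x does not occur in the sorted array a
       static int binarySearch(int[] a, int x) {
           int left = 0, right = a.length - 1;
           while (left < right) {
               int mid = (left + right) / 2;      // split [left..right] into two segments
               if (a[mid] < x) left = mid + 1;    // x, if present, is in the right segment
               else right = mid;                  // x, if present, is in the left segment
           }
           // at most one candidate position remains
           return (a.length > 0 && a[left] == x) ? left : -1;
       }
       public static void main(String[] args) {
           int[] a = {1, 3, 4, 4, 6, 17, 79, 81, 90};
           System.out.println(binarySearch(a, 17));   // prints 5
           System.out.println(binarySearch(a, 91));   // prints -1
       }
   }

A useful loop invariant here is: "if x occurs anywhere in a, then it occurs at some position between left and right inclusive".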
It is always important to be able to justify the correctness of an algorithm. Moreover, it is not unusual to end up with a better/clearer algorithm after it has been modified to make its correctness easier to argue.

Could we have used a sorted linked list rather than a sorted array? It is fairly clear that we could perform a linear search through a linked list in essentially the same way as with an array, with the relevant pointer returned rather than an index. However, binary search is not practical on a linked list, since we cannot jump straight to the middle element. It seems that our array-based approach is the best we can do with the data structures we have studied so far.

Notice that we have not yet taken into account how much effort will be required to sort the array so that the binary search algorithm can work on it. Whether that effort is worthwhile will depend on the application. That may also depend on further details, such as how many times we need to perform a search on the set of n items – just once, or as many as n times. First we need to consider in more detail how to compare algorithm efficiency in a reliable manner. So, before moving on to study increasingly complex data structures and algorithms, we first look in more detail at how to measure and describe their efficiency.
Chapter 5

Efficiency and Complexity

5.1 Time versus space complexity

When creating software for serious applications, there is usually a need to judge how quickly an algorithm or program can complete the given tasks. It certainly has to be ensured that the waiting time is reasonable for the size of the problem, and normally faster execution is better. How the execution time depends on the size of the problem is called the time complexity of the algorithm.

Another important efficiency consideration is how much memory a given program will require for a particular task, though with modern computers this tends to be less of an issue than it used to be. How the memory requirement depends on the size of the problem is correspondingly called the space complexity.

For a given task, there are often algorithms which trade time for space, and vice versa. It is usually up to the algorithm/program designer to decide how best to balance the trade-off for the application they are designing.

5.2 Worst versus average complexity

Another thing that has to be decided when making efficiency considerations is whether it is the average case performance of an algorithm/program that is important, or whether it is more important to guarantee that even in the worst case the performance obeys certain rules. For many applications the average case matters most. However, for time-critical problems, such as keeping track of aeroplanes in certain sectors of air space, it may be totally unacceptable for the software to take too long if the worst case arises. The two measures can point in different directions: for example, the most efficient algorithm on average might have a particularly bad worst case efficiency.

5.3 Concrete measures for performance

In these notes we are mostly concerned with time complexity. For this, we first have to decide how to measure it. One option would be to implement the algorithm and time how long the program takes to run, but that approach has several drawbacks. For one, if it is a big application and there are several potential algorithms, they would all have to be programmed first before they can be compared. Also, the machine on which the program is run, or even the compiler used, might influence the running time. The test data used would also have to be representative of the intended application. Again, particularly with big applications, this is not really feasible.

Therefore complexity is usually best measured in a different way: by counting, from the description of the algorithm itself, how many steps it will take on a problem of a given size. For this to be possible, however, the algorithm has to be described in a way which very much looks like the program to be implemented, which is why algorithms are usually best expressed in a form of pseudocode that comes close to the implementation language.

The size of a problem is typically expressed as an integer, and that is typically the number of items that are manipulated. So the complexity of an algorithm will be given by a function which maps the number of items to the (usually approximate) number of time steps the algorithm will take when performed on that many items.

At one time, different operations (such as multiplication and addition) were counted with different time costs. In today's world, where computers have become much faster, and often have dedicated floating-point hardware, the differences in time costs have become less important. Just counting the most costly operations is often a good strategy.
5.4 Big-O notation for complexity class

Very often, we are not interested in the actual function C(n) that describes the time complexity of an algorithm in terms of the problem size n, but just its complexity class. This ignores constant factors and lower-order contributions, and concentrates on how the complexity grows with n.

If an algorithm is such that we may consider all steps equally costly, then usually the complexity class of the algorithm is simply determined by the number of loops and how often the content of those loops are being executed.

There is a standard notation, called the Big-O notation, for expressing the fact that constant factors and other insignificant details are being ignored. For example, we saw that the number of steps taken by binary search grows like log2 n, so we say that binary search is O(log2 n). Similarly, linear search is O(n).

Before we define complexity classes in a more formal manner, it is worth trying to gain some intuition about what they actually mean. Recall that we are considering functions which map natural numbers (the size of the problem) to the set of non-negative real numbers R+, so the classes will correspond to common mathematical functions such as powers and logarithms.

The most common complexity classes (in increasing order) are the following:

• O(1), pronounced 'Oh of one', or constant complexity;
• O(log2 log2 n), 'Oh of log log en';
• O(log2 n), 'Oh of log en', or logarithmic complexity;
• O(n), 'Oh of en', or linear complexity;
• O(n log2 n), 'Oh of en log en';
• O(n^2), 'Oh of en squared', or quadratic complexity;
• O(n^3), 'Oh of en cubed', or cubic complexity;
• O(2^n), 'Oh of two to the en', or exponential complexity.

So assume we have algorithms with these functions describing their complexity.

[Table omitted from this extract: the values of these functions for problem sizes n = 1024 and n = 1048576; for instance, 2^n is roughly 1.80 × 10^308 for n = 1024 and roughly 1.74 × 10^315652 for n = 1048576.]

Some of these numbers are so large that it is rather difficult to imagine just how long a time span they describe.

[Table omitted from this extract: the corresponding execution times assuming a fixed time per step, ranging from microseconds for the logarithmic and linear classes, through minutes and weeks, up to tens of thousands of years and beyond.]

For algorithms with exponential complexity, O(2^n), even modest sized problems have run times that are greater than the age of the universe (about 1.4 × 10^10 yr). This is why complexity classes are so important – they tell us how feasible it is likely to be to run a program with a particular large number of data items.

Another useful way of thinking about growth classes involves considering how the compute time will vary if the problem size doubles. We can test our program on a problem that is a half or quarter or one eighth of the full size, and have a good idea of how long we will have to wait for the full size problem to finish.

The following graph plots some of the complexity class functions from the table.

[Graph omitted from this extract: plots of log n, n, n log n, n^2 and 2^n against n in the range 10 to 100.]

It is clear from these plots why the non-principal growth terms can be safely ignored when computing algorithm complexity.

5.5 Formal definition of complexity classes

We have noted that complexity classes are concerned with growth, and the tables and graph above have provided an idea of what different behaviours mean when it comes to growth. Let us now consider a more formal definition of a 'big O' class:

Definition. A function g belongs to the complexity class O(f) if there is a number n0 and a constant c > 0 such that for all n ≥ n0 we have g(n) ≤ c ∗ f(n). We say that the function g is 'eventually smaller' than the function c ∗ f.

There are a couple of things to note about this definition. First, we do not need to know exactly when g becomes smaller than c ∗ f. Second, we wish to consider the efficiency of an algorithm independently of the speed of the computer that is going to execute it. The idea is that when we measure the time of the steps of a particular algorithm, we are not sure how long each of them takes. This definition also makes it clear that constant factors do not change the growth class (or O-class) of a function. So we can write O(n^2) = O(1000000 ∗ n^2) = O(1/1000000 ∗ n^2); we normally quote the simplest representative, in this case O(n^2).

This allows us to simplify terms. For example, the class of a function such as 3n + 100 can be determined as follows: since 3n + 100 ≤ 4n whenever n ≥ 100, we can take c = 4 and n0 = 100 in the definition, so 3n + 100 is in O(n).

When we say that an algorithm 'belongs to' some class O(f), we mean that it is at most as fast growing as f. For example, linear search belongs to O(n), and this holds for the average case as well as the worst case. In the worst case, we have to check all n entries until we find the right one, which means we make n comparisons; on average, about n/2 comparisons are needed. Both those functions, C(n) = n and C(n) = n/2, belong to the same complexity class, namely O(n). It would also be correct, by the definition, to say that linear search belongs to O(n^2). But this would be less informative, and we would not say that an algorithm has quadratic complexity if we know that, in fact, it is linear.

The issue of efficiency and complexity class, and their computation, will be a recurring feature throughout the chapters to come. In most cases, we can determine the time complexity by a simple counting of the loops and tree heights.

Chapter 6

Trees
In computer science, a tree is a very general and powerful data structure that resembles a real
tree
...


6
...

It is usually easiest to represent trees pictorially, so we shall frequently do that
...
1:
8

3

1

11

6

9

7

10

14

12

15

Figure 6
...

More formally, a tree can be defined as either the empty tree, or a node with a list of successor
trees
...
We will refer to the label of a node as its value
...
g
...

In order to talk rigorously about trees, it is convenient to have some terminology: There
always has to be a unique ‘top level’ node known as the root
...
1, this is the node
labelled with 8
...
Then, given a node, every node on the next
level ‘down’, that is connected to the given node via a branch, is a child of that node
...
1, the children of node 8 are nodes 3 and 11
...
For instance,
node 11 is the parent of node 9 (and of node 14 as well)
...

If a node is the child of a child of
...
Conversely, the second node is an ancestor of the first node
...
g
...
1)
...
Trees have the property
that for every node there is a unique path connecting it with the root
...
The depth or level of a node is given by the length of this path
...
The maximal length of a
path in a tree is also called the height of the tree
...
The size of a tree is given by the number of nodes it contains
...

The tree in Figure 6
...
A tree consisting of just of one node has
height 0 and size 1
...

Like most data structures, we need a set of primitive operators (constructors, selectors
and conditions) to build and manipulate the trees
...
We will now look at some particularly useful types of tree
...
2

Quad-trees

A quadtree is a particular type of tree in which each leaf-node is labelled by a value and each
non-leaf node has exactly four children
...
g
...

Formally, a quadtree can be defined to be either a single node with a number or value
(e
...
, in the range 0 to 255), or a node without a value but with four quadtree children: lu,
ll, ru, and rl
...
A quad tree is either
(Rule 1) a root node with a value, or
(Rule 2) a root node without a value and four quad tree children: lu, ll, ru, and rl
...

We say that a quadtree is primitive if it consists of a single node/number, and that can
be tested by the corresponding condition:
• isValue(qt), which returns true if quad-tree qt is a single node
...

• makeQT(luqt, ruqt, llqt, rlqt), which builds a quad-tree from four constituent quadtrees luqt, llqt, ruqt, rlqt
...

• ru(qt), which returns the right-upper quad-tree
...

• rl(qt), which returns the right-lower quad-tree
...
For cases when isValue(qt) is true, we
could define an operator value(qt) that returns the value, but conventionally we simply say
that qt itself is the required value
...
A simple example would be:

0

50

10

60

70

110 120

80

100 90

40

20

30

We can then create algorithms using the operators to perform useful manipulations of the
representation
...

There exist numerous variations of this general idea, such coloured quadtrees which store
value-triples that represent colours rather than grey-scale, and edge quad-trees which store
lines and allow curves to be represented with arbitrary precision
...
3

Binary trees

Binary trees are the most common type of tree used in computer science
...
A binary tree is either
(Rule 1) the empty tree EmptyTree, or
(Rule 2) it consists of a node and two binary trees, the left subtree and right subtree
...
This definition may appear
circular, but actually it is not, because the subtrees are always simpler than the original one,
and we eventually end up with an empty tree
...
Day 0 is when you “get off the ground” by applying Rule 1 to get the empty tree
...
Thus, for example, on day 1 you can create exactly trees that have
a root with a value, but no children (i
...
both the left and right subtrees are the empty tree,
created at day 0)
...
Thus, binary trees are the objects created by the
above two rules in a finite number of steps
...
(Exercise: work out the sequence of steps needed to
create the tree in Figure 6
...
)

6
...
We have two constructors which
are used to build trees:
• EmptyTree, which returns an empty tree,
• MakeTree(v, l, r), which builds a binary tree from a root node with label v and two
constituent binary trees l and r,
a condition to test whether a tree is empty:
• isEmpty(t), which returns true if tree t is the EmptyTree,
and three selectors to break a non-empty tree into its constituent parts:
• root(t), which returns the value of the root node of binary tree t,
• left(t), which returns the left sub-tree of binary tree t,
• right(t), which returns the right sub-tree of binary tree t
...

For convenience though, it is often a good idea to define derived operators that allow us to
write simpler, more readable algorithms
...
Then the tree in Figure 6
...
For example, for the tree t
defined above we have
root(left(left(t)) = 1,
but the expression
root(left(left(left(t))))
does not make sense because
left(left(left(t))) = EmptyTree
and the empty tree does not have a root
...
In a language such as C , this would cause an unpredictable behaviour,
but if you are lucky, a core dump will be produced and the program will be aborted with
no further harm
...

The following equations should be obvious from the primitive operator definitions:
root(MakeTree(v,l,r)) = v
left(MakeTree(v,l,r)) = l
right(MakeTree(v,l,r)) = r
isEmpty(EmptyTree) = true
isEmpty(MakeTree(v,l,r)) = false
The following makes sense only under the assumption that t is a non-empty tree:
MakeTree(root(t),left(t),right(t)) = t
It just says that if we break apart a non-empty tree and use the pieces to build a new tree,
then we get an identical tree back
...
The concrete data type used in an implementation is called a data structure
...

The important advantage of abstract data types is that we can develop algorithms without
having to worry about the details of the representation of the data or the implementation
...

35

6
...
The maximum
height of a binary tree with n nodes is (n − 1), which happens when all non-leaf nodes have
precisely one child, forming something that looks like a chain
...
We can achieve
this by ‘filling’ each successive level in turn, starting from the root
...
Terminology varies, but we shall say that such
trees are perfectly balanced or height balanced , and we shall see later why they are optimal for
many of our purposes
...

We can easily determine the maximum number of nodes that can fit into a binary tree of
a given height h
...
This hypothesis
can be proved by induction using the definition of a binary tree as follows:
(a) The base case applies to the empty tree that has height h = −1, which is consistent
with s(−1) = 2−1+1 − 1 = 20 − 1 = 1 − 1 = 0 nodes being stored
...
By the induction hypothesis, each subtree can store s(h) = 2h+1 − 1 nodes,
so the total number of nodes that can fit in a height h + 1 tree is 1 + 2 × (2h+1 − 1) =
1 + 2h+2 − 2 = 2(h+1)+1 − 1 = s(h + 1)
...

An obvious potential problem with any proof by induction like this, however, is the need to
identify an induction hypothesis to start with, and that is not always easy
...
Sometimes, however, the relevant series is too complicated
to sum easily
...
Here, since level h of a tree clearly has 2h nodes,
we can explicitly add in the 2h+1 nodes of the last level of the height h + 1 tree to give
s(h + 1) = s(h) + 2h+1
Also, since a height h + 1 tree is made up of a root node plus two trees of height h
s(h + 1) = 1 + 2s(h)
Then subtracting the second equation from the first gives
s(h) = 2h+1 − 1
36

which is the required answer
...

Hence a perfectly balanced tree consisting of n nodes has height approximately log2 n
...
Therefore, for perfectly balanced trees we
can reduce the search time considerably as the table demonstrates
...


6
...
e
...

This is easy if we use recursion
...
Otherwise, any binary tree will always be assembled from a root node, a left sub-tree
l, and a right sub-tree r, and its size will be the sum of the sizes of its components, i
...
1
for the root, plus the size of l, plus the size of r
...
Thus we can easily define
the procedure size(t), which takes a binary tree t and returns its size, as follows:
size(t) {
if ( isEmpty(t) )
return 0
else return (1 + size(left(t)) + size(right(t)))
}
This recursively processes the whole tree, and we know it will terminate because the trees
being processed get smaller with each call, and will eventually reach an empty tree which
returns a simple value
...
7

Implementation of trees

The natural way to implement trees is in terms of records and pointers, in a similar way to how
linked lists were represented as two-cells consisting of a pointer to a list element and a pointer
to the next two-cell
...
The inductive definition
37

of trees then allows recursive algorithms on trees to operate efficiently by simply passing the
pointer to the relevant root-node, rather than having to pass complete copies of whole trees
...

A binary tree can be implemented as a data record for each node consisting simply of the
node value and two pointers to the children nodes
...
The absence of a child node can be simply represented by a Null Pointer
...
8

Recursive algorithms

Some people have difficulties with recursion
...

This way of putting things, although suggestive, can be misleading
...
What happens is
that a processor (which can be a machine or a person) executes the algorithm
...

For example, suppose that John (the first processor in this task) wants to compute the
size of a given tree t using the above recursive algorithm
...
If it is, he simply returns zero and finishes
his computation
...
He can then ask two of his students, say Steve and Mary, to execute the
same algorithm, but for the trees l and r
...
If Steve and Mary aren’t given empty trees, they will themselves
have to delegate executions of the same algorithm, with their sub-trees, to other people
...
What happens, is that there are many people running their
own copies of the same algorithm on different trees
...
However, the same processor, with some difficulty,
can impersonate several processors, in such a way that it achieves the same result as the
execution involving many processors
...

Note that there is nothing to stop us keeping count of the recursions by passing integers
along with any data structures being operated on, for example:
function(int n, tree t) {
// terminating condition and return

...

return function(n-1, t2)
}
38

so we can do something n times, or look for the nth item, etc
...
5):
F(int n) {
if ( n == 0 ) return 0
if ( n == 1 ) return 1
return F(n-1) + F(n-2)
}
though this is an extremely inefficient algorithm for computing these numbers
...
Is it possible to create an O(n)
recursive algorithm to compute these numbers?
In most cases, however, we won’t need to worry about counters, because the relevant data
structure has a natural end point condition, such as isEmpty(x), that will bring the recursion
to an end
...

Then we consider further elaborations of these trees, namely AVL trees and B-trees, which
operate more efficiently at the expense of requiring more sophisticated algorithms
...
1

Searching with arrays or lists

As we have already seen in Chapter 4, many computer science applications involve searching
for a particular item in a collection of data
...
On average, if there are n items, this will
take n/2 checks, and in the worst case, all n items will have to be checked
...
We also saw
that if the items are sorted before storing in an array, one can perform binary search which
only requires log2 n checks in the average and worst cases
...
The idea here is that, with the help of binary trees, we can speed up the
storing and search process without needing to maintain a sorted array
...
2

Search keys

If the items to be searched are labelled by comparable keys, one can order them and store
them in such a way that they are sorted already
...

In our examples, the search keys will, for simplicity, usually be integer numbers (such
as student ID numbers), but other choices occur in practice
...
In that case, comparability usually refers to the alphabetical order
...
If
w = bed and t = sky then the relation w < t holds, but this is not the case if w = bed and
t = abacus
...
Each entry of
the dictionary is a pair consisting of a word and a definition
...
The search key, in this example, is the word (to which a
definition is attached in the dictionary entry)
...
This is what matters
from the point of view of the search algorithms we are going to consider
...

Notice the use of the word “abstract” here
...
For example, a
dictionary usually comes in the form of a book, which is a sequence of pages – but for us,
the distribution of dictionary entries into pages is an accidental feature of the dictionary
...
So “abstraction” means
“getting rid of irrelevant details”
...

Note that we should always employ the data structure to hold the items which performs
best for the typical application
...

However, for many applications, the kind of binary trees we studied in the last chapter are
particularly useful here
...
3

Binary search trees

The solution to our search problem is to store the collection of data to be searched using
a binary tree in such a way that searching for a particular item takes minimal effort
...
For the moment, we shall assume that all the items in the data collection are distinct,
with different search keys, so each possible node value occurs at most once, but we shall see
later that it is easy to relax this assumption
...
A binary search tree is a binary tree that is either empty or satisfies the following
conditions:
• All values occurring in the left subtree are smaller than that of the root
...

• The left and right subtrees are themselves binary search trees
...
This
means we can inherit many of the operators and algorithms we defined for general binary
trees
...


7
...
So, to insert a new value v, the following cases arise:
• If the given tree is empty, then simply assign the new value v to the root, and leave the
left and right subtrees empty
...

– If v is larger than the value of the root: insert v into the right sub-tree
...

Thus, using the primitive binary tree operators, we have the procedure:
insert(v,bst) {
if ( isEmpty(bst) )
return MakeTree(v, EmptyTree, EmptyTree)
elseif ( v < root(bst) )
return MakeTree(root(bst), insert(v,left(bst)), right(bst))
elseif ( v > root(bst) )
return MakeTree(root(bst), left(bst), insert(v,right(bst)))
else error(‘Error: violated assumption in procedure insert
...
Note that the node
added is always a leaf
...
This can be
proved rigorously via an inductive argument
...
The original tree bst is not modified, it
is merely inspected
...
That can
easily be done by using pointers, similar to the way we set up linked lists
...


7
...
We simply have to compare the item being looked for with the root, and then keep
‘pushing’ the comparison down into the left or right subtree depending on the result of each
root comparison, until a match is found or a leaf is reached
...
Here is a concise description in words of the
search algorithm that we have just outlined:
In order to search for a value v in a binary search tree t, proceed as follows
...
Otherwise,
if v is equal to the root of t, then v does occur in t, and hence we stop returning
true
...
Hence replace t
by its left sub-tree and carry on in the same way
...

Notice that such a description of an algorithm embodies both the steps that need to be carried
out and the reason why this gives a correct solution to the problem
...

42

When we do want to run them, we need to provide a more precise specification, and would
normally write the algorithm in pseudocode, such as the following recursive procedure:
isIn(value v, tree t) {
if ( isEmpty(t) )
return false
elseif ( v == root(t) )
return true
elseif ( v < root(t) )
return isIn(v, left(t))
else
return isIn(v, right(t))
}
Each recursion restricts the search to either the left or right subtree as appropriate, reducing
the search tree height by one, so the algorithm is guaranteed to terminate eventually
...
The only way to leave the loop is to have found the required value, or to only
have an empty tree remaining, so the procedure only needs to return whether or not the final
tree is empty
...
For example,
if we are searching for a student ID, we usually want a pointer to the full record for that
student, not just a confirmation that they exist
...
Clearly,
the basic tree structures we have been discussing can be elaborated in many different ways
like this to form whatever data-structure is most appropriate for the problem at hand, but,
as noted above, we can abstract out such details for current purposes
...
6

Time complexity of insertion and search

As always, it is important to understand the time complexity of our algorithms
...
At worst, this will be the number of nodes in the tree
...
This can be calculated by taking all possible binary
search trees of a given size n and measuring each of their heights, which is by no means an
43

easy task
...

As we have seen above, perfectly balanced trees achieve minimal height for a given number
of nodes, and it turns out that the more balanced a tree, the more ways there are of building
it
...
The tree on the right, however, can be reached in two ways: Inserting in
the order 2, 1, 3 or in the order 2, 3, 1
...

Carrying out exact tree height calculations is not straightforward, so we will not do that
here
...
It follows that the average number of comparisons needed to
search a binary search tree is O(log2 n), which is the same complexity we found for binary
search of a sorted array
...

Interestingly, the average height of a binary search tree is quite a bit better than the average height of a general binary tree consisting of the same n nodes that have not been built into a binary search tree.

The reason for that is that there is a relatively large proportion of high binary trees that are not valid binary search trees.

7.7 Deleting nodes from a binary search tree

Suppose, for some reason, an item needs to be removed or deleted from a binary search tree. It would obviously be rather inefficient to rebuild the remaining tree from scratch by inserting all the remaining items into a new empty tree: for n items that would require n steps of O(log2 n) complexity, and hence have overall time complexity of O(nlog2 n). Instead, we need an algorithm that produces an updated binary search tree more efficiently. This is more complicated than one might at first expect, but the following cases cover everything that is required:

• If the node in question is a leaf, simply remove it.
• If only one of the node’s subtrees is non-empty, ‘move up’ the remaining subtree

• If both of the node’s subtrees are non-empty, find the left-most (i.e. smallest) node in the right sub-tree. Use this node to overwrite the one that is to be deleted, and then remove it from the right sub-tree.

The last part works because the left-most node in the right sub-tree is guaranteed to be bigger
than all nodes in the left sub-tree, smaller than all the other nodes in the right sub-tree, and
have no left sub-tree itself
For example, deleting the node with value 11 from the binary search tree shown on the left of the figure below results in the tree shown on the right.

[Figure: the example binary search tree with root 8, shown before (left) and after (right) deleting the node 11; the left-most node of 11’s right sub-tree, namely 12, moves up to take its place.]

In practice, we need to turn the above algorithm (specified in words) into a more detailed
algorithm specified using the primitive binary tree operators:
delete(value v, tree t) {
   if ( isEmpty(t) )
      error(‘Error: given item is not in given tree’)
   else
      if ( v < root(t) )
         // delete from left sub-tree
         return MakeTree(root(t), delete(v,left(t)), right(t));
      else if ( v > root(t) )
         // delete from right sub-tree
         return MakeTree(root(t), left(t), delete(v,right(t)));
      else
         // the item v to be deleted is root(t)
         if ( isEmpty(left(t)) )
            return right(t)
         elseif ( isEmpty(right(t)) )
            return left(t)
         else
            // difficult case with both subtrees non-empty
            return MakeTree(smallestNode(right(t)), left(t),
                            removeSmallestNode(right(t)))
}
If the empty tree condition is met, it means the search item is not in the tree, and an appropriate error message should be returned.

The difficult case, when the item to be deleted is the root of a tree with two non-empty subtrees, relies on two auxiliary procedures: smallestNode, which returns the smallest value in a given tree, and removeSmallestNode, which returns the given tree with its smallest node removed. Since the relevant sub-trees will always be non-empty, these sub-algorithms can be written with that precondition. It is often safest to start each procedure with a check to determine whether the preconditions are satisfied, with an appropriate error message produced when they are not, but that may have a significant time cost if the procedure is called many times. The procedure smallestNode is straightforward: it recursively looks in the left sub-tree till it reaches an empty tree, at which point it can return the root. The procedure removeSmallestNode follows the same route through the tree, but returns the tree with that node removed rather than the node’s value.
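One possible way to write these two procedures, using the same primitive tree operators and assuming (as stated) that they are only ever called on non-empty trees, is:

smallestNode(tree t) {
   // precondition: t is non-empty
   if ( isEmpty(left(t)) )
      return root(t)
   else
      return smallestNode(left(t))
}

removeSmallestNode(tree t) {
   // precondition: t is non-empty
   if ( isEmpty(left(t)) )
      return right(t)
   else
      return MakeTree(root(t), removeSmallestNode(left(t)), right(t))
}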

These procedures are further examples of recursive algorithms.

It is clear from the algorithm that the deletion of a node requires the same number of steps as searching for a node, or inserting a new node, i.e. the average height of the binary search tree, or O(log2 n) where n is the total number of nodes in the tree.

7.8 Checking whether a binary tree is a binary search tree

Building and using binary search trees as discussed above is usually enough, but sometimes we are presented with an arbitrary binary tree and need to check whether it satisfies the binary search tree conditions. We know that an empty tree is a (trivial) binary search tree, and also that all nodes in the left sub-tree must be smaller than the root and themselves form a binary search tree, and all nodes in the right sub-tree must be greater than the root and themselves form a binary search tree. This specification translates directly into a recursive algorithm, sketched below.
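A direct translation into pseudocode, using the same tree primitives (the procedure names isbst, allSmaller and allBigger are introduced here purely for illustration), might look like:

isbst(tree t) {
   if ( isEmpty(t) )
      return true
   else
      return ( allSmaller(left(t),root(t)) and isbst(left(t))
               and allBigger(right(t),root(t)) and isbst(right(t)) )
}

allSmaller(tree t, value v) {
   // checks that every value stored in t is smaller than v
   if ( isEmpty(t) )
      return true
   else
      return ( root(t) < v and allSmaller(left(t),v) and allSmaller(right(t),v) )
}

with allBigger defined analogously.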
Exercise: identify what is inefficient about this algorithm, and formulate a more efficient algorithm.

7.9 Sorting using binary search trees

Sorting is the process of putting a collection of items in order
...

The node values stored in a binary search tree can be printed in ascending order by
recursively printing each left sub-tree, root, and right sub-tree in the right order as follows:
printInOrder(tree t) {
if ( not isEmpty(t) ) {
printInOrder(left(t))
print(root(t))
printInOrder(right(t))
}
}
Then, if the collection of items to be sorted is given as an array a of known size n, they can
be printed in sorted order by the algorithm:
sort(array a of size n) {
   t = EmptyTree
   for i = 0,1,...,n-1
      t = insert(a[i],t)
   printInOrder(t)
}
Exercise: modify this algorithm so that instead of printing the sorted values, they are put back into the original array in ascending order.

7.10 Balancing binary search trees

If the items are added to a binary search tree in random order, the tree tends to be fairly well balanced with height not much more than log2 n. However, if the items are added in a systematic rather than random order, the tree can become very unbalanced. In the extreme case of the new items being added in ascending order, the tree will be one long branch off to the right, with height n ≫ log2 n.

If all the items to be stored are known in advance, it is easy to build a perfectly balanced binary search tree: one simply has to recursively build a binary tree with the middle (i.e., median) item as the root, the left subtree made up of the smaller items, and the right subtree made up of the larger items.

Another way to avoid unbalanced binary search trees is to rebalance them from time to time using tree rotations. Consider a tree whose root node D has left child node B, with A and C being the left and right subtrees of B, and E being the right subtree of D. Exactly the same set of nodes can be stored with B at the root, A as its left subtree, and D as its right child holding the subtrees C and E. The two forms are related by left and right tree rotations which clearly preserve the binary search tree property. A right rotation, which replaces the first form by the second, reduces the overall height whenever the subtree A is higher than E. For example, if the first form had A consisting of two nodes, and C and E consisting of one node, the height of the tree would be reduced by one and become perfectly balanced by a right tree rotation. Sometimes a single rotation is not enough. For example, if the first form had C consisting of two nodes, and A and E consisting of one node, the tree would be balanced by first performing a left rotation of the A-B-C sub-tree, followed by a right rotation of the whole tree.
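As a rough sketch using the tree primitives from earlier (assuming the relevant child is non-empty in each case, and not intended as the notes’ own implementation), the two single rotations could be written as:

rotateRight(tree t) {
   // the left child becomes the new root; the old root moves down to the right
   return MakeTree(root(left(t)), left(left(t)),
                   MakeTree(root(t), right(left(t)), right(t)))
}

rotateLeft(tree t) {
   // the right child becomes the new root; the old root moves down to the left
   return MakeTree(root(right(t)),
                   MakeTree(root(t), left(t), left(right(t))),
                   right(right(t)))
}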
...


Obviously, there will be a cost involved in such rebalancing, and there will be a trade-off between the time involved in rebalancing and the time saved by the reduced height of the tree, but generally it is worthwhile.

7.11 Self-balancing AVL trees

A self-balancing binary search tree is one that automatically keeps its height small after arbitrary insertions and deletions. The classic example is the AVL tree, named after its inventors G. M. Adelson-Velskii and E. M. Landis. These maintain the difference in heights of the two sub-trees of all nodes to be at most one.
...

The general idea is to keep track of the balance factor for each node, which is the height of the left sub-tree minus the height of the right sub-tree. In a valid AVL tree, every balance factor lies in the range [−1, 1]. However, insertion or deletion of a node could leave balance factors in the wider range [−2, 2], requiring a tree-rotation to bring them back into AVL form. Working out exactly which rotations are needed in each case is a useful exercise, as is comparing AVL trees with other self-balancing approaches such as red-black trees.
7.12 B-trees

A B-tree is a generalization of a self-balancing binary search tree in which each node can hold
more than one search key and have more than two children
...
The standard (Knuth) definition is:
Definition. A B-tree of order m is a tree which satisfies the following conditions:

• Every node has at most m children.

• Every non-leaf node (except the root node) has at least m/2 children.

• The root node, if it is not a leaf, has at least two children.

• All leaf nodes appear at the same level.

• A non-leaf node with c children contains c − 1 search keys which act as separation values to divide its sub-trees.

There appears to be no definitive answer to the question of what the “B” in “B-Tree” stands
for
...

The standard representation of simple order 4 example with 9 search keys would be:

The search keys held in each node are ordered (e.g., 1, 2, 5 in the example), and the non-leaf node’s search keys (i.e., the items 8 and 17 in the example) act as separation values to divide the contents of its sub-trees in much the same way that a node’s value in a binary search tree separates the values held in its two sub-trees.
For example, if a non-leaf node holds the two search keys s1 and s2 and hence has three children, then all values in the leftmost subtree will be less than s1, all values in the middle subtree will be between s1 and s2, and all values in the rightmost subtree will be greater than s2.
...

The restriction on the number of children to lie between m/2 and m means that the best
case height of an order m B-tree containing n search keys is logm n and the worst case height
is logm/2 n
...
The requirement
that all the leaf nodes are at the same level means that B-trees are always balanced and thus
have minimal height, though rebalancing will often be required to restore that property after
insertions and deletions
...
To maintain the conditions of the B-tree definition, non-leaf nodes often have to
be split or joined when new items are inserted into or deleted from the tree (which is why
there is a factor of two between the minimum and maximum number of children), and rebalancing is often required
...
An advantage of B-trees over self balancing binary
search trees, however, is that the range of child nodes means that rebalancing is required less
frequently
...
There is also the cost of keeping the items within each node ordered, and
having to search among them, but for reasonably small orders m, that cost is low
...


Chapter 8

Priority Queues and Heap Trees

8.1 Trees stored in arrays
If the tree in question is a complete
binary tree, there is a useful array based alternative
...
A binary tree is complete if every level, except possibly the last, is completely
filled, and all the leaves on the last level are placed as far to the left as possible
...
Complete binary trees always have minimal height for their size n, namely log2 n, and
are always perfectly balanced (but not every perfectly balanced tree is complete in the sense
of the above definition)
...

Notice that this time we have chosen to start the array with index 1 rather than 0. The nodes on level i then have indices 2^i, ..., 2^(i+1) − 1, and the parent of the node with index i (other than the root) has index i/2 using integer division. The children of a node with index i, if they exist, have indices 2i and 2i + 1.
This allows the following simple algorithms:
boolean isRoot(int i) {
return i == 1
}
51

int level(int i) {
   return floor(log2(i))
}
int parent(int i) {
return i / 2
}
int left(int i) {
return 2 * i
}
int right(int i) {
return 2 * i + 1
}
which make the processing of these trees much easier
...
Since keeping binary search trees balanced is a difficult problem, it is therefore not
really a viable option to adapt the algorithms for binary search trees to work with them stored
as arrays
...
However, we shall now see that there is another kind of binary tree for which array-based representations allow very efficient processing.

8.2 Priority queues and binary heap trees

While most queues in every-day life operate on a first come, first served basis, it is sometimes
important to be able to assign a priority to the items in the queue, and always serve the item
with the highest priority next
...
The structure of a complete binary
tree in array form is particularly useful for representing such priority queues
...
The idea is that the node labels, which
were the search keys when talking about binary search trees, are now numbers representing
the priority of each item in question (with higher numbers meaning a higher priority in our
examples)
...
This is because we only ever
want to remove one element at a time, namely the one with the highest priority present, and
the idea is that the highest priority item will always be found at the root of the tree
...
A binary heap tree is a complete binary tree which is either empty or satisfies
the following conditions:
• The priority of the root is higher than (or equal to) that of its children.

• Both subtrees of the root are themselves binary heap trees.

Alternatively, one could define a heap tree as a complete binary tree such that the priority of
every node is higher than (or equal to) that of all its descendants
...

The most obvious difference between a binary heap tree and a binary search trees is that
the biggest number now occurs at the root rather than at the right-most node
...

Three examples of binary trees that are valid heap trees are:

[Figure: three example binary heap trees.]

and three which are not valid heap trees are:

[Figure: three binary trees that are not valid heap trees.]

These fail to be heap trees: the first because 5 > 4 violates the required priority ordering, the second because it is not perfectly balanced and hence not complete, and the third because it is not complete due to the node on the last level not being as far to the left as possible.
8.3 Basic operations on binary heap trees

In order to develop algorithms using an array representation, we need to allocate memory and
keep track of the largest position that has been filled so far, which is the same as the current
number of nodes in the heap tree
...
e
...
We also
need to be able to determine when the queue/tree is empty
...


8.4 Inserting a new heap tree node

A new item is always added at the next available position in the array, i.e. as the left-most free leaf position on the last level.
The tree that results will still be a complete binary tree, but the heap tree
priority ordering property might have been violated
If it has been, the new element needs to be ‘bubbled up’ through the tree until it reaches a valid position. This can be done easily by comparing its priority with that of its parent, and if the new element has higher priority, then it is exchanged with its parent. Repeating this moves the new element up the tree until its parent has higher (or equal) priority, or it becomes the root. Hence an algorithm which inserts a new heap tree node with priority p is:
insert(int p, array heap, int n) {
   if ( n == MAX ) {
      error(‘Heap is full’)
   }
   else {
      heap[n+1] = p
      bubbleUp(n+1,heap,n+1)
   }
}

bubbleUp(int i, array heap, int n) {
   if ( isRoot(i) )
      return
   elseif ( heap[i] > heap[parent(i)] ) {
      swap heap[i] and heap[parent(i)]
      bubbleUp(parent(i),heap,n)
   }
}

Note that this insert algorithm does not increment the heap size n – that has to be done
separately by whatever algorithm calls it
...


8.5 Deleting a heap tree node

The item that always gets removed from a heap tree is the one with the highest priority, i.e. the root.
We will then be left with something which is not a
binary tree at all
The obvious fix is to move the last item in the array (i.e. the right-most item on the last level) into the root position, which restores the complete binary tree shape. However, as with insertion of a new item, the heap tree (priority ordering) property might be violated. In that case the new root needs to be ‘bubbled down’, by swapping it with whichever of its children has the higher priority. This process is then repeated until the new root element has found a valid place.

Since the original heap tree is ordered, items will only ever need to be bubbled up or down,
never both, so we can simply call both, because neither procedure changes anything if it is
not required
...
In
the case of two children, it is crucial that when both children have higher priority than the
given node, it is the highest priority one that is swapped up, or their priority ordering will be
violated
...
Note also that this algorithm does not attempt to be fair in the sense that
if two or more nodes have the same priority, it is not necessarily the one that has been waiting
longest that will be removed first
...

As with insertion, deletion takes at most O(log2 n) steps, because the maximum number
of times it may have to bubble down or bubble up the replacement element is the height of
the tree which is log2 n
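The ‘bubble down’ procedure needed here can be sketched in the same array-based style as insert and bubbleUp (this is an outline only; the name deleteRoot is introduced purely for illustration, and, as with insert, updating the stored heap size n is left to the caller):

bubbleDown(int i, array heap, int n) {
   if ( left(i) > n )                         // no children
      return
   elseif ( right(i) > n ) {                  // only a left child
      if ( heap[left(i)] > heap[i] )
         swap heap[i] and heap[left(i)]
   }
   elseif ( heap[left(i)] > heap[i] or heap[right(i)] > heap[i] ) {
      if ( heap[left(i)] > heap[right(i)] ) {
         swap heap[i] and heap[left(i)]       // swap up the higher priority child
         bubbleDown(left(i),heap,n)
      }
      else {
         swap heap[i] and heap[right(i)]
         bubbleDown(right(i),heap,n)
      }
   }
}

deleteRoot(array heap, int n) {
   heap[1] = heap[n]                          // move the last item into the root
   bubbleDown(1,heap,n-1)                     // and let it bubble down
}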
...
6

Building a new heap tree from scratch

Sometimes one is given a whole set of n new items in one go, and there is a need to build
a binary heap tree containing them
...
One obvious possibility would be to insert the n items one by one into a heap tree,
starting from an empty tree, using the O(log2 n) ‘bubble up’ based insert algorithm discussed
earlier
...

It turns out, however, that rearranging an array of items into heap tree form can be done more efficiently using ‘bubble down’. Suppose the n items are stored, in arbitrary order, in positions a[1], ..., a[n] of an array. Then all the items with an index greater than n/2 will be leaves, and not need bubbling down. So, if we bubble down each of the remaining items a[n/2], ..., a[1] by exchanging them with the larger of their children until they either are positioned at a leaf, or until their children are both smaller, we obtain a valid heap tree. Consider, for example, an array with the entries 5, 8, 3, 9, 1, 4, 7, 6, 2 in positions a[1] to a[9]. Here n/2 = 4, and a[4] = 9 is already larger than both of its children, so it does not move.
Next a[3] = 3 is bubbled down, swapping with a[7] = 7, giving:

   5 8 7 9 1 4 3 6 2

Next a[2] = 8 is bubbled down, swapping with a[4] = 9, giving:

   5 9 7 8 1 4 3 6 2

Finally, a[1] = 5 is bubbled down, swapping with a[2] = 9, to give first:

   9 5 7 8 1 4 3 6 2

then swapping with a[4] = 8 to give:

   9 8 7 5 1 4 3 6 2

and finally swapping with a[8] = 6 to give:

   9 8 7 6 1 4 3 5 2

which has the array rearranged as the required heap tree.
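This procedure for converting an arbitrary array into a binary heap tree is the heapify algorithm that Heapsort will use later. A minimal sketch of it, assuming a bubbleDown procedure of the kind outlined in the previous section, is:

heapify(array a, int n) {
   // items with index greater than n/2 are leaves and need no bubbling down
   for ( i = n/2 ; i >= 1 ; i-- )
      bubbleDown(i,a,n)
}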
At first sight, this does not appear to be an improvement: there are ⌊n/2⌋ items that may need bubbling down, each through up to log2 n levels, with up to two comparisons per level. So the total number of comparisons involved is at most (n/2) · 2 · log2 n = nlog2 n, which is the same as we would have by inserting the array items one at a time into an initially empty tree. In practice, however, heapify does better than that, because the number of bubble down steps will usually be less than the full height of the tree: only the root can need bubbling down the full height, the two nodes below it one level less, and so on. To be sure of the complexity class, we need to perform a more accurate calculation. For a tree of height h, the total number of bubble down steps C(h) is at most

   C(h) = Σ_{j=0..h−1} 2^j (h − j) = 2^h Σ_{k=1..h} k/2^k ,

and since Σ_{k≥1} k/2^k = 2, for large h we have C(h) ≈ 2^h · 2 = 2^(h+1). Thus, the total number of operations is O(2^(h+1)) = O(n), meaning that the complexity class of heapify is actually O(n), which is better than the O(nlog2 n) complexity of inserting the items one at a time.
8.7 Merging binary heap trees

Frequently one needs to merge two existing priority queues based on binary heap trees into
a single priority queue
There are three obvious approaches:

1. Move all the items from the smaller heap tree one at a time into the larger heap tree using the standard insert algorithm. That involves O(n) insertions, each of cost O(log2 n), giving an overall time complexity of O(nlog2 n).

2. Build a combined tree with a dummy root “0” that has the two given heap trees as its subtrees, rearranging items from the last level as necessary to keep the combined tree complete. Then move the last item of the new tree to replace the dummy root “0”, and bubble down that new root. On average, around half the items in the last level of one tree will need moving and bubbling, so that will be O(n) moves, each with a cost of O(log2 n), again giving an overall time complexity of O(nlog2 n).

3. Simply concatenate the array forms of the two heap trees and apply the heapify algorithm to the result. The heapify algorithm has time complexity O(n), and the concatenation need be no more than that, so this approach has O(n) overall time complexity, making it the best general approach of all three.
...


If the two binary heap trees are such that very few moves are required for the second
approach, then that may look like a better choice of approach than the third approach
...
So, for array-based
similarly-sized binary heaps, the third approach is usually best
...
In practice, a good general purpose merge algorithm would check the
sizes of the two trees and use them to determine the best approach to apply
8.8 Binomial heaps

A Binomial heap is similar to a binary heap as described above, but has the advantage of
more efficient procedures for insertion and merging
...

Definition. A binomial tree is defined recursively as follows:

• A binomial tree of order 0 is a single node.

• A binomial tree of order k has a root node with children that are roots of binomial trees of orders k − 1, k − 2, ..., 2, 1, 0 (in this order).

Thus, a binomial tree of order k has height k, contains 2^k nodes, and is trivially constructed by attaching one order k−1 binomial tree as the left-most child of another order k−1 binomial tree.
...

A Binomial heap is constructed as a collection of binomial trees with a particular structure
and node ordering properties:
• There can only be zero or one binomial tree of each order.

• Each constituent binomial tree must satisfy the priority ordering property, i.e. each node must have priority lower than or equal to that of its parent.

The structure of such a heap is easily understood by noting that a binomial tree of order k contains exactly 2^k nodes, and a binomial heap can only contain zero or one binomial tree of each order, so the total number of nodes in a Binomial Heap must be

   n = Σ_{k≥0} b_k 2^k ,   b_k ∈ [0, 1] ,

where b_k specifies the number of trees of order k. Thus the orders of the constituent binomial trees are given by the binary representation of n. The maximum number of trees in a heap with n nodes therefore equals the number of digits when n is written in binary without leading zeros, i.e. log2 n + 1.

The most important operation for binomial heaps is merge, because that can be used as
a sub-process for most other operations
...
By definition, that is achieved by adding one of
those trees as the left most sub-tree of the root of the other, and preservation of the priority
ordering simply requires that it is the tree with the highest priority root that provides the
root of the combined tree
...
Then merging two whole
binomial heaps is achieved by merging the constituent trees whenever there are two of the
same order, in a sequential manner analogous to the addition of two binary numbers
...
This is better than the O(n) complexity
of merging binary heaps that can be achieved by concatenating the heap arrays and using the
O(n) heapify algorithm
...
e
...
The average time complexity of that
insert is given by computing the average number of O(1) tree combinations required
...
5, the probability of needing a second
combination is 0
...
53 , and so on, which sum to one
...
That is better than the O(log2 n) complexity of insertion into
a standard binary heap
...
In this case, there
is no better process, so heapify here has the same time complexity as the heapify algorithm
for binary heaps
...
For standard binary heaps, that simply requires application of the usual
bubble-up process with O(log2 n) complexity
...

The highest priority node in a binomial heap will clearly be the highest priority root node,
and a pointer to that can be maintained by each heap update operation without increasing the
complexity of the operation
...
However, those trees can
easily be merged back into the original heap using the standard merge algorithm, with the
60

standard merge complexity of O(log2 n)
...
So, the complexity
of delete is always O(log2 n)
...


8
...
It can be used to implement a priority queue in a similar way to binary or binomial
heaps, but the structure of Fibonacci heaps are more flexible and efficient, which allows them
to have better time complexities
...

The flexibility and efficiency of Fibonacci heaps comes at the cost of more complexity: the
trees do not have a fixed shape, and in the extreme cases every element in the heap can be in
a separate tree
...
A pointer to the highest
priority root node is maintained, making it trivial to find the highest priority node in the
heap
...

Fibonacci heaps can easily be merged with O(1) complexity by simply concatenating the
two lists of root nodes, and then insertion can be done by merging the existing heap with a
new heap consisting only of the new node
...

Obviously, at some point, order needs to be introduced into the heap to achieve the overall
efficiency
...
The number of trees in the heap is decreased as part of the delete
operation that is used to remove the highest priority node and update the pointer to the
highest priority root
...
First it removes the highest
priority root, leaving its children to become roots of new trees within the heap, the processing
of which will be O(log2 n)
...
Finally the roots of those
trees are checked to reset the pointer to the highest priority
...

For each node, a record is kept of its number of children and whether it is marked
...
The mark is used by the algorithm for
increasing a node priority, which is also complex, but can be achieved with O(1) complexity
...

Finally, an arbitrary node can be deleted from the heap by increasing its node priority
to infinity and applying the delete highest priority algorithm, resulting in an overall time
complexity of O(log2 n)
...


8
...
The following table summarizes the average time
complexities of the crucial heap operations:
Heap type      Insert       Delete       Merge        Heapify   Up priority
Binary         O(log2 n)    O(log2 n)    O(n)         O(n)      O(log2 n)
Binomial       O(1)         O(log2 n)    O(log2 n)    O(n)      O(log2 n)
Fibonacci      O(1)         O(log2 n)    O(1)         O(n)      O(1)

Obviously it will depend on the application in question whether using a more complicated heap
is worth the effort
...


Chapter 9

Sorting

9.1 The problem of sorting

Sorting means arranging a collection of items into a particular order.
To be able to do this, we first need to specify the notion of order on the items we are
considering
...

Usually, what is meant by sorting is that once the sorting process is finished, there is
a simple way of ‘visiting’ all the items in order, for example to print out the contents of a
database
...

For example, if all the objects are sorted and stored in an array a of size n, then
for i = 0,
...
If the objects are stored in a linked list, we would
expect that the first entry is the smallest, the next the second-smallest, and so on
...

Sorting is important because having the items in order makes it much easier to find a given
item, such as the cheapest item or the file corresponding to a particular student
...

If the sorting can be done beforehand (off-line), this enables faster access to the required
item, which is important because that often has to be done on the fly (on-line)
...
So, if we often have to look
up items, it is worth the effort to sort the whole collection first
...

It follows that sorting algorithms are important tools for program designers
...
It
is worth noting that we will be far from covering all existing sorting algorithms – in fact, the
field is still very much alive, and new developments are taking place all the time
...


9
...
Some of the key strategies are:
enumeration sorting

Consider all items
...


exchange sorting

If two items are found to be out of order, exchange them
...


selection sorting

Find the smallest item, put it in the first position, find the smallest
of the remaining items, put it in the second position
...


divide and conquer

Recursively split the problem into smaller sub-problems till you
just have single items that are trivial to sort
...


All these strategies are based on comparing items and then rearranging them accordingly
...
We will later consider other noncomparison-based algorithms which are possible when we have specific prior knowledge about
the items that can occur, or restrictions on the range of items that can occur
...
If the whole set of items cannot be stored in the internal memory at one
time, different techniques have to be used
...
Suffice to say, they generally work by
splitting the set of items into subsets containing as many items as can be handled at one time,
sorting each subset in turn, and then carefully merging the results
...
3

How many comparisons must it take?

An obvious way to compute the time complexity of sorting algorithms is to count the number of
comparisons they need to carry out, as a function of the number of items to be sorted
...
We are more interested in
having a lower bound for the number of comparisons needed for the best algorithm in the
worst case
...
Then we can see
how well particular sorting algorithms compare against that theoretical lower bound
...
In fact, for some problems, optimal lower bounds are not yet known
...
In these cases, one generally
has to relax the problem to find solutions which are probably approximately correct
...

For sorting algorithms based on comparisons, however, it turns out that a tight lower
bound does exist
...
Thus, the lower
bound must be at least n, the number of items to be sorted, since we need at least n steps to
examine every element
...
However, as we shall
shortly see, no algorithm can actually take fewer than O(nlog2 n) comparisons in the worst
case
...
We shall start by demonstrating that every algorithm needs
at least O(nlog2 n) comparisons
...
If we have found
that i ≤ j and j ≤ k, then we know that the sorted order is: i, j, k
...
In some cases, however, it is clear that we will need as many as
three comparisons
...
A third comparison is needed
...
The decision tree for the three item example we were discussing is:

[Figure: the decision tree for sorting three items i, j, k. The root comparison is i <= j; each internal node makes one further comparison (i <= k or j <= k); the six leaves correspond to the six possible orders i <= j <= k, i <= k <= j, k <= i <= j, j <= i <= k, j <= k <= i and k <= j <= i.]

So what can we deduce from this about the general case? The decision tree will obviously
always be a binary tree
...
The leaves of the decision tree are
65

all the possible outcomes
...
The first item can
be any of the n items, the second can be any of the remaining n − 1 items, and so forth, so
their total number is n(n − 1)(n − 2) · · · 3 · 2 · 1 = n!
...
The number of leaves of a tree of
height h is at most 2h , so we want to find h such that
2h ≥ n!

h ≥ log2 (n!)

or

There are numerous approximate expressions that have been derived for log2 (n!) for large n,
but they all have the same dominant term, namely nlog2 n
...
) Hence,
no sorting algorithm based on comparing items can have a better average or worst case
performance than using a number of comparisons that is approximately nlog2 n for large n
...

To do this, we would have to exhibit at least one algorithm with this performance behaviour
(and convince ourselves that it really does have this behaviour)
...

We shall proceed now by looking in turn at a number of sorting algorithms of increasing
sophistication, that involve the various strategies listed above
...
We start with approaches
that work with simple arrays, and then move on to using more complex data structures that
lead to more efficient algorithms
...
4

Bubble Sort

Bubble Sort follows the exchange sort approach
...
Assume we have array a of size n that we wish to sort
...

It then compares a[n-2] and a[n-3] and swaps those if need be, and so on
...
It then starts from
the back again, comparing pairs of ‘neighbours’, but leaving the zeroth entry alone (which is
known to be correct)
...
It keeps making ‘passes’ over the array until it is sorted
...

The item with the lowest index that is compared to its right neighbour is a[i-1]
...
,a[i-1] are in their final position
...
Since they are not in order, it swaps
them, giving 4 1 2 3
...
Since those are in order,
it leaves them where they are
...
We get 1 4 2 3
...
This will always happen after Bubble Sort has done its first
‘pass’ over the array
...
These entries are in order, so nothing happens
...
These are not in order, so they have to be swapped, giving 1 2 4 3
...
Note that now the second-smallest entry is in place, too
...
Again
these are out of order and have to be swapped, giving 1 2 3 4
...
Furthermore, the
third-smallest item is in place now, which means that the fourth-smallest has to be correct,
too
...

It is now clear that Bubble Sort can be implemented as follows:
for ( i = 1 ; i < n ; i++ )
for ( j = n-1 ; j >= i ; j-- )
if ( a[j] < a[j-1] )
swap a[j] and a[j-1]
The outer loop goes over all n − 1 positions that may still need to be swapped to the left, and
the inner loop goes from the end of the array back to that position
...
The outer loop is carried out
n − 1 times
...
So the number of
comparisons is the same in each case, namely

   Σ_{i=1..n−1} Σ_{j=i..n−1} 1  =  Σ_{i=1..n−1} (n − i)  =  (n − 1) + (n − 2) + · · · + 1  =  n(n − 1)/2 .

...


9
...
It starts by treating the first
entry a[0] as an already sorted array, then checks the second entry a[1] and compares it with
the first
...
That leaves a[0],a[1] sorted
...
More generally, at the beginning of the ith stage, Insertion Sort has the
entries a[0],
...
,a[i]
...
Since a[1]=1 is smaller than a[0]=4, it has to be inserted in the zeroth slot,
67

but that slot is holding a value already
...

At the next step, the algorithm treats a[0],a[1] as an already sorted array and tries to
insert a[2]=3
...
This is achieved
by moving a[1] ‘up’ one slot to a[2] (the value of which we assume we have remembered),
allowing us to move the current value into a[1], giving 1 3 4 2
...
,a[2]
...
Comparison with
a[0]=1 shows that a[1] was the slot we were looking for, giving 1 2 3 4
...

However, this typically involves swapping each next item many times to get it into its right
position, so it is more efficient to store each next item in a temporary variable t and only
insert it into its correct position when that has been found and its content moved:
for ( i = 1 ; i < n ; i++ ) {
   j = i
   t = a[j]
   while ( j > 0 && t < a[j-1] ) {
      a[j] = a[j-1]
      j--
   }
   a[j] = t
}
The outer loop again goes over n−1 items, and the inner loop goes back through the currently
sorted portion till it finds the correct position for the next item to be inserted
...
The
outer loop is always carried out n − 1 times
...
In the worst case, it will be carried out i times; on average,
it will be half that often
...
Thus average and worst case
number of steps for of Insertion Sort are both proportional to n2 , and hence the average and
worst case time complexities are both O(n2 )
...
6

Selection Sort

Selection Sort is (not surprisingly) a form of selection sorting
...
Then
it finds the second-smallest item and exchanges it with the item in a[1]
...
More generally, at the ith stage, Selection Sort finds the
ith-smallest item and swaps it with the item in a[i-1]
...

For the example starting array 4 1 3 2 , Selection Sort first finds the smallest item in
the whole array, which is a[1]=1, and swaps this value with that in a[0] giving 1 4 3 2
...
Finally, it finds the smallest
of the reduced array a[2],a[3], that is a[2]=3, and swaps that into a[2], or recognizes that
a swap is not needed, giving 1 2 3 4
...
Note
that, unlike with Bubble Sort and Insertion Sort, there is exactly one swap for each iteration
of the outer loop,
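A sketch of Selection Sort in the same style of pseudocode as the previous algorithms (this particular formulation is an illustration rather than a quotation from the notes) is:

for ( i = 0 ; i < n-1 ; i++ ) {
   k = i                        // index of the smallest item found so far
   for ( j = i+1 ; j < n ; j++ )
      if ( a[j] < a[k] )
         k = j
   swap a[i] and a[k]           // exactly one swap per iteration of the outer loop
}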
The time complexity is again the number of comparisons carried out
...
In the inner loop, which is carried out (n − 1) − i = n − 1 − i times,
one comparison occurs
...

2
Therefore the number of comparisons for Selection Sort is proportional to n2 , in the worst
case as well as in the average case, and hence the average and worst case time complexities
are both O(n2 )
...

69

9
...
So one might imagine that it does not make
much difference which of these algorithms is used
...
The following table shows the measured running
times of the three algorithms applied to arrays of integers of the size given in the top row:
Algorithm         128    256    512    1024   O1024   R1024    2048
Bubble Sort        54    221    881    3621    1285    5627   14497
Insertion Sort     15     69    276    1137       6    2200    4536
Selection Sort     12     45    164     634     643     833    2497

Here O1024 denotes an array with 1024 entries which are already sorted, and R1024 is an
array which is sorted in the reverse order, that is, from biggest to smallest
...
Warning: tables of measurements like this are always dependent
on the random ordering used, the implementation of the programming language involved, and
on the machine it was run on, and so will never be exactly the same
...
Each swap requires three
assignments and takes, in fact, more time than a comparison
...
Insertion Sort does particularly well on data which is sorted already –
in such a case, it only makes n − 1 comparisons
...

These comparisons serve to show that complexity considerations can be rather delicate, and
require good judgement concerning what operations to count
...
For instance, we have assumed here that all comparisons cost
the same, but that may not be true for big numbers or strings of characters
...
You will have to gain experience before you feel comfortable with making
such decisions yourself
...
Something else to be aware of when making these calculations is that it is
not a bad idea to keep track of any constant factors, in particular those that go with the
dominating sub-term
...
It is 1/2 for the average case of Bubble Sort and Selection Sort, but only
1/4 for Insertion Sort
...
Or if you know that your program is only ever used on fairly
small samples, then using the simplest algorithm you can find might be beneficial overall – it
is easier to program, and there is not a lot of compute time to be saved
...
It is certainly easy to program, but that is about all it has
going for it
...

70

9
...
g
...
g
...
So, if we denote
the original order of an array of items by subscripts, we want the subscripts to end up in
order for each set of items with identical keys
...
Sorting algorithms which satisfy this useful property are said
to be stable
...
In this way, the stability of the
sorting algorithms studied so far can easily be established:
Bubble Sort

This is stable because no item is swapped past another unless they are
in the wrong order
...


Insertion Sort

This is stable because no item is swapped past another unless it has a
smaller key
...


Selection Sort

This is not stable, because there is nothing to stop an item being swapped
past another item that has an identical key
...


The issue of sorting stability needs to be considered when developing more complex sorting
algorithms
...


9
...
The idea here, which we have already seen before,
involves inserting the items to be sorted into an initially empty binary search tree
...
This sorting algorithm is called Treesort, and for the
basic version, we require that all the search keys be different
...
For a balanced tree that is O(log2 n)
...

Treesort can be difficult to compare with other sorting algorithms, since it returns a tree,
rather than an array, as the sorted data structure
...
This is usually the case if items
are frequently deleted or inserted, since a binary search tree allows these operations to be
implemented efficiently, with time complexity O(log2 n) per item
...

71

Even if we have an array of items to start with, and want to finish with a sorted array,
we can still use Treesort
...
That is easiest done by passing and returning an index j that keeps track of the next
array position to be filled
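One way to write that final traversal (a sketch only; the procedure name fillArray is introduced here for illustration) is:

int fillArray(tree t, array a, int j) {
   // copies the values of t into a in ascending order, starting at position j,
   // and returns the next free position
   if ( not isEmpty(t) ) {
      j = fillArray(left(t), a, j)
      a[j] = root(t)
      j = j + 1
      j = fillArray(right(t), a, j)
   }
   return j
}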
...

Since there are n items to insert into the tree, and each insertion has time complexity
O(log2 n), Treesort has an overall average time complexity of O(nlog2 n)
...

Note, however, that if the tree is not kept balanced while the items are being inserted, and
the items are already sorted, the height of the tree and number of comparisons per insertion
will be O(n), leading to a worst case time complexity of O(n2 ), which is no better than the
simpler array-based algorithms we have already considered
...
Find the simplest ways to relax that restriction and determine how
the choice of approach affects the stability of the associated Treesort algorithm
...
10

Heapsort

We now consider another way of implementing a selection sorting algorithm using a more
efficient data structure we have already studied
...
For that, remember the idea of a priority queue discussed earlier
...
Then, if we remove
the item with the highest priority at each step we can fill an array in order ‘from the rear’,
starting with the biggest item
...
However,
there may be a better way, so it is worth considering the other possibilities
...
Removing this item would be very simple, but
inserting a new item would always involve finding the right position and shifting a number of
items to the right to make room for it
...

Another approach would be to use an unsorted array
...
Then, after that, the last item would have to be swapped into
the gap, or all items with a higher index ‘shifted down’
...

Thus, of those three representations, only one is of use in carrying out the above idea
efficiently
...

To make use of binary heap trees, we first have to take the unsorted array and re-arrange
it so that it satisfies the heap tree priority ordering
...
Then we need to extract the sorted
array from it
...
In the sorted array, it should be in the last position a[n]
...
Since a[n] now contains the correct
item, we will never have to look at it again
...
,a[n-1]
and bring them back into a heap-tree form using the bubble down procedure on the new root,
which we know to have complexity O(log2 n)
...
Then we rearrange a[1],
...
And so on
...
,a[n] will have the correct
entries, and there will be a heap tree for the items a[1],
...
Note that the size,
and therefore the height, of the heap tree decreases at each step
...
This will take at most twice as many comparisons as
the height of the original heap tree, which is log2 n
...
The number of comparisons will actually
be less than that, because the number of bubble down steps will usually be less than the full
height of the tree, but usually not much less, so the time complexity is still O(nlog2 n)
...
First heapify converts the
73

array into a binary heap tree, and then the for loop moves each successive root one item at a
time into the correct position in the sorted array:
heapSort(array a, int n) {
heapify(a,n)
for( j = n ; j > 1 ; j-- ) {
swap a[1] and a[j]
bubbleDown(1,a,j-1)
}
}
It is clear from the swap step that the order of identical items can easily be reversed, so there
is no way to render the Heapsort algorithm stable
...
Thus the overall average and worst-case complexities are both O(nlog2 n), and
we now have a sorting algorithm that achieves the theoretical best worst-case time complexity
...

A useful feature of Heapsort is that if only the largest m  n items need to be found and
sorted, rather than all n, the complexity of the second stage is only O(mlog2 n), which can
easily be less than O(n) and thus render the whole algorithm only O(n)
...
11

Divide and conquer algorithms

All the sorting algorithms considered so far work on the whole set of items together
...
The idea is that it will usually be easier to sort many smaller collections of
items than one big one, and sorting single items is trivial
...
There are two main
approaches for doing this:
Assuming we are working on an array a of size n with entries a[0],
...
That is, we split the array at item n/2
and consider the two sub-arrays a[0],
...
,a[n-1]
...
However, the two sorted arrays that result from each
split have to be merged together carefully to maintain the ordering
...

Another approach would be to split the array in such a way that, at each stage, all the items
in the first collection are no bigger than all the items in the second collection
...
This is the
underlying idea for a sorting algorithm called Quicksort
...

74

9
...

How to partition
...
If the array is very simple, for example [4,3,7,8,1,6], then a good split would be to
put all the items smaller than 5 into one part, giving [4,3,1], and all the items bigger than
or equal to 5 into the other, that is [7,8,6]
...
The value that defines the split is called the
pivot
...

One situation that we absolutely have to avoid is splitting the array into an empty subarray and the whole array again
...
However, if the pivot is chosen to be an item in the array, and the
pivot is kept in between and separate from both sub-arrays, then the sub-arrays being sorted
at each recursion will always be at least one item shorter than the previous array, and the
algorithm is guaranteed to terminate
...
Moreover, to save
space, we do not actually split the array into smaller arrays
...
We say that we partition the array, and the Quicksort
algorithm is then applied to the sub-arrays of this partitioned array
...
Therefore,
Quicksort is called giving the lowest index (left) and highest index (right) of the sub-array
it must work on
...

The crucial part of this is clearly the partition(a, left, right) procedure that rearranges
the array so that it can be split around an appropriate pivot a[pivotindex]
...
If, on the other hand, we halve the array at each
stage, it would only need log2 n recursive calls
...
Ideally then, we would like to get two sub-arrays of roughly equal size
(namely half of the given array) since that is the most efficient way of doing this
...

75

Choosing the pivot
...
g
...
Unfortunately, there is no quick guaranteed way
of finding the optimal pivot
...
More importantly, it would not necessarily give a pivot that is a value in the
array
...

• Take a key from ‘the middle’ of the array, that is a[(n-1)/2]
...
g
...

Note that one should never simply choose the first or last key in the array as the pivot, because
if the array is almost sorted already, that will lead to the particularly bad choice mentioned
above, and this situation is actually quite common in practice
...

The partitioning
...
This is more easily demonstrated by
an example than put into words
...
The ordering
we choose is the standard lexicographic one, and let the chosen pivot be “fortran”
...
To the left of the left marker,
there will be items we know to have a key smaller than or equal to the pivot
...

In the middle, there will be the items we have not yet considered
...

We begin by swapping the pivot value to the end of the array where it can easily be kept
separate from the sub-array creation process, so we have the array: [|c, ocaml, java, ada,
pascal, basic, haskell | fortran]
...
Now “ocaml” is greater than “fortran”, so we stop on the left and proceed
from the right instead, without moving the left marker
...
Now “basic” is smaller than “fortran”, so we have two keys, “ocaml”
and “basic”, which are ‘on the wrong side’
...
This brings us to [c,
basic | java, ada, pascal | ocaml, haskell, fortran]
...
Then “pascal”
is bigger than “fortran”, so we move the right marker again
...
We have now got [c, basic | java, ada, | pascal, ocaml,
haskell, fortran]
...
Finally, we swap the pivot back from the last position into the position immediately
after the markers to give [c, basic, ada, java | | fortran, ocaml, haskell, pascal]
...
The markers are therefore ‘in the same place’ once rightmark becomes
smaller than leftmark, which is when we stop
...

Note that this algorithm doesn’t require any extra memory – it just swaps the items in the
original array
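One way to turn the above description into pseudocode (a sketch only, not necessarily identical to the partition procedure in the notes; the marker names match the description above) is:

quicksort(array a, int left, int right) {
   if ( left < right ) {
      pivotindex = partition(a, left, right)
      quicksort(a, left, pivotindex - 1)
      quicksort(a, pivotindex + 1, right)
   }
}

int partition(array a, int left, int right) {
   pivotindex = (left + right) / 2           // e.g. take the middle item as pivot
   pivot = a[pivotindex]
   swap a[pivotindex] and a[right]           // park the pivot at the end
   leftmark = left
   rightmark = right - 1
   while ( leftmark <= rightmark ) {
      while ( leftmark <= rightmark and a[leftmark] <= pivot )
         leftmark = leftmark + 1
      while ( leftmark <= rightmark and a[rightmark] >= pivot )
         rightmark = rightmark - 1
      if ( leftmark < rightmark )
         swap a[leftmark] and a[rightmark]   // both items were on the wrong side
   }
   swap a[leftmark] and a[right]             // move the pivot into its final place
   return leftmark
}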
...
To render
quicksort stable, the partitioning must be done in such a way that the order of identical items
can never be reversed
...

Complexity of Quicksort
...
The partitioning step compares each of n items against the pivot,
and therefore has complexity O(n)
...

In the worst case, when an array is partitioned, we have one empty sub-array
...
Those complexity functions then add up to
n + (n − 1) + (n − 2) + · · · 2 + 1 = n(n + 1)/2
Ignoring the constant factor and the non-dominant term n/2, this shows that, in the worst
case, the number of comparisons performed by Quicksort is O(n2 )
...
Then we have n comparisons in the first case, two lots of bn/2c comparisons
for the two sub-arrays, four times bn/4c, eight times bn/8c, and so on, down to 2log2 n−1 times
bn/2log2 n−1 c = b2c
...

More interesting and important is how well Quicksort does in the average case
...
The strategy for choosing a pivot at each stage
affects that, though as long as it avoids the problems outlined above, that does not change
the complexity class
...
In the end, all reasonable variations involve
comparing O(n) items against a pivot, for each of O(log2 n) recursions, so the total number
of comparisons, and hence the overall time complexity, in the average case is O(nlog2 n)
...
In this case, only
the first sub-array needs to be processed at each stage, until the sub-array sizes exceed m
...

rendering the time complexity of the whole modified algorithm only O(n)
...

78

Improving Quicksort
...
Generally, the pivot will be better if more items are sampled before it
is being chosen
...
Note that in order to find the median of all the
items, without sorting them first, we would end up having to make n2 comparisons, so we
cannot do that without making Quicksort unattractively slow
...
The reason for
this is all the overheads from the recursion (e
...
, storing all the return addresses and formal
parameters)
...


9
...
11, is called mergesort
...

However, that will obviously not result in a set of sorted sub-arrays that we can just append
to each other at the end
...
As with binary search in Section 4
...
Thus a suitable mergesort algorithm is:
mergesort(array a, int left, int right) {
if ( left < right ) {
mid = (left + right) / 2
mergesort(a, left, mid)
mergesort(a, mid+1, right)
merge(a, left, mid, right)
}
}
Note that it would be relatively simple to modify this mergesort algorithm to operate on
linked lists (of known length) rather than arrays
...
Of course, care needs to be taken to keep the list
size information intact, and effort is required to find the crucial pointer for each split
...
The principle of merging two sorted collections (whether they be
lists, arrays, or something else) is quite simple: Since they are sorted, it is clear that the
smallest item overall must be either the smallest item in the first collection or the smallest
item in the second collection
...
Now
the second smallest item overall must be either the second-smallest item in the first collection,
or the smallest item in the second collection, and so on
...

79

The implementation will be quite different, however, depending on which data structure
we are using
...
In contrast, when using
linked lists, it would be possible for merge to work by just changing the reference to the next
node
...

For arrays, a suitable merge algorithm would start by creating a new array b to store the
results, then repeatedly add the next smallest item into it until one sub-array is finished, then
copy the remainder of the unfinished sub-array, and finally copy b back into a:
merge(array a, int left, int mid, int right) {
create new array b of size right-left+1
bcount = 0
lcount = left
rcount = mid+1
while ( (lcount <= mid) and (rcount <= right) ) {
if ( a[lcount] <= a[rcount] )
b[bcount++] = a[lcount++]
else
b[bcount++] = a[rcount++]
}
if ( lcount > mid )
while ( rcount <= right )
b[bcount++] = a[rcount++]
else
while ( lcount <= mid )
b[bcount++] = a[lcount++]
for ( bcount = 0 ; bcount < right-left+1 ; bcount++ )
a[left+bcount] = b[bcount]
}
It is instructive to compare this with the partition2 algorithm for Quicksort to see exactly
where the two sort algorithms differ
...

Complexity of Mergesort
...
This holds for the worst case as well as the
average case
...
For arrays, 16 would once again
be a suitable size to switch to an algorithm like Selection Sort
...
This is because the
ordering of the required items only emerges at the very last stage after the large majority of
the comparisons have already been carried out
...
14

Summary of comparison-based sorting algorithms

The following table summarizes the key properties of all the comparison-based sorting algorithms we have considered:
Sorting          Strategy    Objects        Worst case    Average case   Stable
Algorithm        employed    manipulated    complexity    complexity
Bubble Sort      Exchange    arrays         O(n^2)        O(n^2)         Yes
Selection Sort   Selection   arrays         O(n^2)        O(n^2)         No
Insertion Sort   Insertion   arrays/lists   O(n^2)        O(n^2)         Yes
Treesort         Insertion   trees/lists    O(n^2)        O(nlog2 n)     Yes
Heapsort         Selection   arrays         O(nlog2 n)    O(nlog2 n)     No
Quicksort        D&C         arrays         O(n^2)        O(nlog2 n)     Maybe
Mergesort        D&C         arrays/lists   O(nlog2 n)    O(nlog2 n)     Yes

To see what the time complexities mean in practice, the following table compares the typical
run times of those of the above algorithms that operate directly on arrays:
Algorithm         128    256    512    1024   O1024   R1024    2048
Bubble Sort        54    221    881    3621    1285    5627   14497
Selection Sort     12     45    164     634     643     833    2497
Insertion Sort     15     69    276    1137       6    2200    4536
Heapsort           21     45    103     236     215     249     527
Quicksort          12     27     55     112    1131    1200     230
Quicksort2          6     12     24      57    1115    1191     134
Mergesort          18     36     88     188     166     170     409
Mergesort2          6     22     48     112      94      93     254

As before, arrays of the stated sizes are filled randomly, except O1024 that denotes an array
with 1024 entries which are already sorted, and R1024 that denotes an array which is sorted in
the reverse order
...
It should
be emphasized again that these numbers are of limited accuracy, since they vary somewhat
depending on machine and language implementation
...

It is up to the program designer to make sure that an appropriate one is picked, depending
on the properties of the data to be sorted, how it is best stored, whether all the sorted items
are required rather than some sub-set, and so on
...
15

Non-comparison-based sorts

All the above sorting algorithms have been based on comparisons of the items to be sorted,
and we have seen that we can’t get time complexity better than O(nlog2 n) with comparison
based algorithms
...

81

It is always worth thinking about the data that needs to be sorted, and whether comparisons really are required
...
How would you sort those? The answer is surprisingly simple
...
This is a very unusual situation as far as general sorting is concerned, yet
this kind of thing often comes up in every-day life
...
Rather than employing one of the comparison-based sorting
algorithms, in this situation we can do something much simpler
...
create array b of size n
for ( i = 0 ; i < n ; i++ )
   b[a[i]] = a[i]
copy array b into array a

Figure 9.1: Bin sorting the example array 3 0 4 1 2 into the sorted array 0 1 2 3 4.

This algorithm uses a second array b to hold the results, which is clearly not very memory
efficient, but it is possible to do without that
...
for ( i = 0 ; i < n ; i++ ) {
   while ( a[i] != i )
      swap a[a[i]] and a[i]
}

Figure 9.2: In-place bin sorting of the example array 3 0 4 1 2. During i = 0 the array passes through the states 1 0 4 3 2 and 0 1 4 3 2; nothing changes for i = 1; and at i = 2 it reaches the sorted state 0 1 2 3 4.

As far as time complexity is concerned, it is obviously not appropriate here to count the
number of comparisons
...
The
algorithm of Figure 9
...
The time complexity of the algorithm of
Figure 9
...
This algorithm performs at most n − 1 swaps, since
one item, namely a[a[i]] is always swapped into its final position
...

This example should make it clear that in particular situations, sorting might be performed by much simpler (and quicker) means than the standard comparison sorts, though
most realistic situations will not be quite as simple as the case here
...

82

9
...
For
example, suppose you are given a number of dates, by day and month, and need to sort them
into order
...
Then form one big queue out of these, by concatenating all the day queues starting
with day 1 and continuing up to day 31
...
Again form a big queue by concatenating these month queues in
order
...

This may seem surprising at first sight, so let us consider a simple example:
[25/12, 28/08, 29/05, 01/05, 24/04, 03/01, 04/01, 25/04, 26/12, 26/04, 05/01, 20/04]
...
Then concatenation of the queues gives:
[01/05, 03/01, 04/01, 05/01, 20/04, 24/04, 25/12, 25/04, 26/12, 26/04,28/08, 29/05]
...

This is called Two-phase Radix Sorting, since there are clearly two phases to it
...
Repeat this process for each sorting criterion
...

For example, if you know that your items to be sorted are all (at most) two-digit integers,
you can use Radix Sort to sort them
...
Similarly, if you know that your keys are all strings consisting of three characters, you
can again apply Radix Sort
...

Note that at no point, does the the algorithm actually compare any items at all
...

The complexity class of this algorithm is O(n), since at every phase, each item is dealt with
precisely once, and the number of phases is assumed to be small and constant
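As an illustration (this sketch is not taken from the notes; it assumes two-digit non-negative integer keys and simple queue operations of the kind described in earlier chapters), a two-phase Radix Sort could be written as:

radixSortTwoDigits(array a, int n) {
   create queues q[0],...,q[9]
   // phase 1: distribute by the least significant digit, then concatenate
   for ( i = 0 ; i < n ; i++ )
      add a[i] to the back of queue q[a[i] mod 10]
   copy the contents of q[0],...,q[9], in that order, back into a
   // phase 2: distribute by the most significant digit, then concatenate
   for ( i = 0 ; i < n ; i++ )
      add a[i] to the back of queue q[a[i] / 10]
   copy the contents of q[0],...,q[9], in that order, back into a
}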
...
The concatenation of the queues will involve some overheads, of course, but these will
be small if the sets are small and linked lists, rather than arrays, are used
...
Also, if the restricted sets are not known in advance,
and potentially large, the overheads of finding and sorting them could render Radix sort worse
than using a comparison-based approach
...


Chapter 10

Hash Tables

10.1 Storing data

We have already seen a number of different ways of storing collections of items in a computer, such as arrays, linked lists, and trees.
...
We have also seen that these approaches can
perform quite differently when it comes to the particular tasks we expect to carry out on the
items, such as insertion, deletion and searching, and that the best way of storing data does
not exist in general, but depends on the particular application
...
The idea is to simply put each item in an easily determined location, so
we never need to search for it, and have no ordering to maintain when inserting or deleting
items
...
e
...

We first need to specify what we expect to be able to do with this way of storing data,
without considering how it is actually implemented
...
This is similar to what you will generally do when first trying to implement
a class in Java: You should think about the operations you wish to perform on the objects
of that class
...

The approach we have been following for defining abstract data types in these notes is by
describing the crucial operations in plain English, trusting that they are simple enough to not
need further explanations
...
An important aspect of studying software engineering is to learn about and
use more formal approaches to this way of operating
...
The data structure to be considered in this
chapter is a particular type of table known as a hash table
...
2

The Table abstract data type

The specification of the table abstract data type is as follows:
1
...
The objects can be arbitrarily complicated
...
The keys are used in order to identify objects in much the way we have done
for searching and sorting
...
We assume that there are methods or procedures for:
(a) determining whether the table is empty or full;
(b) inserting a new object into the table, provided the table is not already full;
(c) given a key, retrieving the object with that key;
(d) given a key, updating the item with that key (usually by replacing the item with a
new one with the same key, which is what we will assume here, or by overwriting
some of the item’s variables);
(e) given a key, deleting the object with that key, provided that object is already stored
in the table;
(f) listing or traversing all the items in the table (if there is an order on the keys then
we would expect this to occur in increasing order)
...

In a programming language such as Java, we could write an interface for this abstract
data type as follows, where we assume here that keys are objects of a class we call Key and
we have records of a class called Record:
interface Table {
   Boolean isEmpty();
   Boolean isFull();
   void Insert(Record);
   Record Retrieve(Key);
   void Update(Record);
   void Delete(Key);
   void Traverse();
}
Note that we have not fixed how exactly the storage of records should work – that is something that comes with the implementation
...
You could certainly carry out all those operations
with binary search trees and sorted or unsorted arrays if you wished
...

This general approach follows the sensible and commonly used way to go about defining a
Java class: First think about what it is you want to do with the class, and only then wonder
86

about how exactly you might implement the methods
...
But notice that, as opposed to a specification
in plain English, such as the above, a definition of an interface is only a partial specification
of an abstract data type, because it does not explain what the methods are supposed to do;
it only explains how they are called
...
3

Implementations of the table data structure

There are three key approaches for implementing the table data structure
...
Let us assume that we want to implement the table
data structure using a sorted array
...
Then to insert an element we first have to find
its proper position, which will take on average the same time as finding an element
...
4, so this takes O(log2 n)
...
However, if we wish to delete or insert an item, we will have to shift
what is ‘to the right’ of the location in question by one, either to the left (for deletion) or to
the right (for insertion)
...
Traversal in order is simple, and is of O(n) complexity as well.

The second approach is to use a binary search tree. However, we know already that in the worst case, the tree
can be very deep and narrow, and that these trees will have linear complexity when it comes
to looking up an entry. We have also seen that such trees can be kept balanced, for example as AVL trees or B-trees. For those trees, insertion,
deletion, search, retrieval and update, can all be done with time complexity O(log2 n), and
traversal has O(n) complexity.
The third approach, which we concentrate on here, is to use a hash table. The remainder of this
chapter will describe how this is done, and what the various computational costs are.

10.4 Hash Tables

The underlying idea of a hash table is very simple, and quite appealing: Assume that, given
a key, there was a way of jumping straight to the entry for that key. Then we would never have to search at all.
Assume that we have an array data to hold our entries, and a hash function h that maps each key k to the index of the array location where the corresponding entry is stored.

It would be easiest if we could just make the data array big enough to hold all the keys
that might appear. In this case, the function h would be the identity function h(k) = k. However, this is usually not feasible, because the space of possible keys is far bigger than the number of items that actually need to be stored.
For example, many American companies use their employees’
9-digit social security number as a key, even though they have nowhere near 10^9 employees.
Clearly it would be very inefficient, if not impossible,
to reserve space for all 10^9 social security numbers which might occur. Instead, we use an array of manageable size, and a hash function h that maps each key to a location in that array.
For example, if we had to store entries
about 500 employees, we might create an array with 1000 entries and use three digits from
their social security number (maybe the first or last three) to determine the place in the array
where the records for each particular employee should be stored
Of course, this means that several keys may get mapped to the same array location, and such occurrences are called collisions. Much of the remainder of this chapter will be spent on the various
strategies for dealing with such collisions.
If the keys that are likely to
actually occur are not evenly spread throughout the space of all possible keys, particular
attention should be paid to choosing the hash function h in such a way that collisions among
them are less likely to occur
...
However, that problem might easily be avoided
by a more prudent choice, such as the last three digits.

10.5 Collision likelihoods and load factors for hash tables

One might be tempted to assume that collisions do not occur very often when only a small
subset of the set of possible keys is chosen, but this assumption is mistaken
...
As an example, consider a collection of people, and a
hash function that gives their birthdays as the number of the day in the year, i
...
1st January
is 1, 2nd January is 2,
...
One might think that if all we want to do
is store a modest number of 24 people in this way in an array with 365 locations, collisions
will be rather unlikely
...
This is so surprising at first sight that this phenomenon has become known as the von
Mises birthday paradox , although it is not really a paradox in the strict sense
...
Suppose we have a group of n people and
want to find out how likely it is that two of them have the same birthday, assuming that the
birthdays are uniformly distributed over the 365 days of the year
...
It is actually easier to first compute the probability q(n) that no two of them share a
birthday, and then p(n) = 1 − q(n). For n = 1 we clearly have q(1) = 1. For n = 2
we get q(2) = 364/365 because, for the added second person, 364 of the 365 days are not the
birthday of the first person. Each further person multiplies the probability by another factor, giving the recurrence q(n) = q(n − 1) ∗ (365 − n + 1)/365. Working out the numbers, we eventually find p(22) = 0.476 and p(23) = 0.507, so with 23 people a shared birthday is already more likely than not.
Note that in the real world, the distribution of birthdays over the year is not precisely
uniform, but this only increases the probability that two people have the same birthday.
...
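These probabilities are easy to check by computing q(n) directly. The following small Java sketch (our own illustration, with arbitrary class and variable names) just applies the recurrence described above:

public class Birthday {
   public static void main(String[] args) {
      double q = 1.0;                          // q(1) = 1: one person, no collision possible
      for (int n = 2; n <= 23; n++) {
         q = q * (365.0 - (n - 1)) / 365.0;    // add the n-th person
      }
      System.out.println("p(23) = " + (1.0 - q));   // prints a value just over 0.5
   }
}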

Implications for hash tables. Translated to hash tables, this means that if we insert just 23 randomly distributed keys into a table with 365 locations, there is already a better than even chance of a collision. And collisions will be even more
likely if the hash function does not distribute the items randomly throughout the table.

The load factor of a hash table. Suppose we have a hash table of size m, and it currently holds n items. Then we call λ = n/m the load factor of the hash table. The load factor is the natural way of describing how full the table currently is: a table with load factor 0.25 is 25% full, one with load factor 0.50 is half full, and so on. If we have a hash
table with load factor λ, and the hash function distributes the keys evenly over the table, then the probability that a collision occurs for the next key we wish to
insert is λ. If these optimistic
assumptions fail, then the probability may be even higher.
Fifty percent is
an often quoted good maximum figure, while beyond an eighty percent load the performance
deteriorates considerably.

10.6 A simple Hash Table in operation

As a small worked example, we shall use a hash table whose underlying array has only 11 locations. Obviously, this example is designed to illustrate the principle – typical real-world
hash tables are usually very much bigger, involving arrays that may have a size of thousands,
millions, or tens of millions, depending on the problem
...
Let us consider one of the many possibilities. Assume that the keys are strings of three upper-case letters, and associate with each letter its position in the alphabet counting from 0, so that A is 0, B is 1, and so on up to Z being 25. The string X1X2X3 therefore gives us three numbers from 0 to 25, say k1, k2,
and k3, which can be combined into the single number k = k1 ∗ 26^2 + k2 ∗ 26 + k3.
That is, we think of the strings as coding numbers in base 26. This number k then needs to be scaled down to the range of available indices, 0 to 10. For example, we can take the remainder the number leaves when divided by 11.
So our hash function is

   h(X1X2X3) = (k1 ∗ 26^2 + k2 ∗ 26 + k3) % 11 = k % 11
...
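As an illustration, here is a minimal Java sketch of this hash function; the class and method names, and the assumption that every key is a string of exactly three upper-case letters, are ours:

public class ThreeLetterHash {
   public static int hash(String key) {
      int k1 = key.charAt(0) - 'A';             // first letter as a number 0..25
      int k2 = key.charAt(1) - 'A';             // second letter
      int k3 = key.charAt(2) - 'A';             // third letter
      int k = k1 * 26 * 26 + k2 * 26 + k3;      // the string read as a base-26 number
      return k % 11;                            // primary position in a table of size 11
   }

   public static void main(String[] args) {
      System.out.println(hash("PHL"));          // prints 4, as in the example below
   }
}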

As a simple example of a hash table in operation, assume that we now wish to insert the
following three-letter airport acronyms as keys (in this order) into our hash table: PHL, ORY,
GCM, HKG, GLA, AKL, FRA, LAX, DCA
...

We naturally start off with an empty table of the required size, i.e. of size 11.
Clearly we have to be able to tell whether a particular location in the array is still empty, or
whether it has already been filled. We can do that by reserving a special key to mark the empty locations. However, for clarity, this key will not appear in the pictures we use.
The number associated with the first item
PHL is 4, so we place it at index 4, giving:
PHL
Next is ORY, which gives us the number 8, so we get:
PHL

ORY

Then we have GCM, with value 6, giving:
PHL

GCM

ORY

Then HKG, which also has value 4, results in our first collision since the corresponding position
has already been filled with PHL.

10.7 Strategies for dealing with collisions

We now need a strategy for dealing with such collisions. One obvious option is to reserve a two-dimensional array from the start
...
Then we could put HKG into the slot ‘beneath’ PHL, and
GLA in the one beneath ORY, and continue filling the table in the order given until we reach:
  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     | LAX |     |     | PHL | FRA | GCM | AKL | ORY |     |
     | DCA |     |     | HKG |     |     |     | GLA |     |
The disadvantage of this approach is that it has to reserve quite a bit more space than will be
eventually required, since it must take into account the likely maximal number of collisions
...

Moreover, when searching for a particular key, it will be necessary to search the entire column
associated with its expected position, at least until an empty slot is reached
...
The average complexity of searching for a particular item depends on how many entries
in the array have been filled already
...

Direct chaining. Rather than reserving whole columns in advance, we can instead link all the items that share a primary position into a linked list hanging off that position. The result for the above example can be pictured something like this:
  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     | LAX |     |     | PHL | FRA | GCM | AKL | ORY |     |
     | DCA |     |     | HKG |     |     |     | GLA |     |

(each non-empty location holds a pointer to a linked list containing the keys shown below it)

This approach does not reserve any space that will not be taken up, but has the disadvantage
that in order to find a particular item, lists will have to be traversed
...

We can compute the size of the average non-empty list occurring in the hash table as
follows. When n items are inserted into a table of size m, the probability that a particular slot receives none of them is ((m − 1)/m)^n. So the number of slots with at least one item falling in it is

   N(n, m) = m ∗ (1 − ((m − 1)/m)^n)

and since there are n items altogether, the average number of items in a non-empty list is:

   k(n, m) = n / N(n, m) = λ / (1 − ((m − 1)/m)^n)

Then a linear search for an item in a list of size k takes on average

   (1/k)(1 + 2 + · · · + k) = (1/k) ∗ k(k + 1)/2 = (k + 1)/2

comparisons. Substituting the above expression for k = k(n, m) and expanding it for small load factors λ shows that there are

   (k + 1)/2 = 1 + λ/4 + λ²/24 + O(λ³)

comparisons on average for a successful search, i.e. that this has O(1) complexity.
For an unsuccessful search, we have to traverse the whole list at the key’s primary position, so the relevant quantity is the average list length taken over all slots, including the empty ones. That
will clearly be n/m = λ, and so in an unsuccessful search the average number of comparisons
made to decide the item in question is not present will be λ, which is again O(1).
Note also that insertion is done even more speedily, since all we
have to do is to insert a new element at the front of the appropriate list, i.e. it takes O(1) time. For traversal, we need
to sort the keys, which can be done in O(n log2 n), as we know from Chapter 9. Searching could be made slightly faster by keeping each list sorted by key, so that an unsuccessful search can stop as soon as a larger key is reached. This speed-up would be paid for by making the insertion operation more expensive, i.e.
take slightly longer, but it will still have constant complexity.
...
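To make the direct chaining idea concrete, here is a minimal Java sketch. It is only an outline of ours: for simplicity it uses Java’s built-in hashCode rather than the three-letter hash function of the running example, and it omits deletion and traversal.

class ChainedHashTable {
   private static class Node {
      String key; Object value; Node next;
      Node(String key, Object value, Node next) {
         this.key = key; this.value = value; this.next = next;
      }
   }

   private final Node[] slots;

   ChainedHashTable(int size) { slots = new Node[size]; }

   private int hash(String key) { return (key.hashCode() & 0x7fffffff) % slots.length; }

   void insert(String key, Object value) {
      int i = hash(key);
      slots[i] = new Node(key, value, slots[i]);   // O(1): add at the front of the list
   }

   Object retrieve(String key) {
      for (Node n = slots[hash(key)]; n != null; n = n.next)
         if (n.key.equals(key)) return n.value;    // traverse the (short) list
      return null;                                 // key not present
   }
}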

Open addressing. The other option is to store all the keys directly in the hash table array itself: whenever a new key hashes to a location that is already occupied, we find it another, as yet unused, location within the array. The location that the hash function originally assigns to a key is what we refer to as that key’s primary position (so
in the earlier example, ORY and GLA have the same primary position)
The simplest strategy, known as linear probing, is to step down through the array from the primary position, one location at a time, until an empty location is found. If this reaches the beginning of the array, i.e. index 0, we
start again at the end. A better approach is to search
for an empty location using a secondary hash function; this is known as double hashing.
We will now look at both of these approaches in some detail.

10.8 Linear Probing

We now proceed with the earlier example using linear probing. We had reached the stage where the next key to be inserted, HKG, caused our first collision at its primary position 4, which was already occupied by PHL.
Linear probing reduces the index by one to 3, and finds an empty location in that position,
so we put HKG there giving:
  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     |     |     | HKG | PHL |     | GCM |     | ORY |     |

Next we wish to insert GLA, with hash value 8, but the location with that index is already
filled by ORY. Linear probing moves down to index 7, which is empty, so GLA goes there. The next key, AKL, has hash value 7, but that position has just been taken by GLA. So we try the next index down, but that contains GCM, so we
continue to the next one at index 5 which is empty, so we put AKL there:

  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     |     |     | HKG | PHL | AKL | GCM | GLA | ORY |     |

We now continue in the same way with the remaining keys, eventually reaching:
  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
 DCA | LAX | FRA | HKG | PHL | AKL | GCM | GLA | ORY |     |

This looks quite convincing - all the keys have been inserted in a way that seems to make
good use of the space we have reserved. However, what happens when we now search for a key that is not stored at its primary position? It is no longer enough to look only at the single location given by the hash function. Instead, we will have to
follow its possible insertion locations until we hit an empty one, which tells us that the key
we were looking for is not present, after all, because it would have been inserted there.
However, as we shall see,
hash tables lose much of their speed advantage if they have a high load factor, so as a matter
of policy, many more locations should be kept empty
...
Searching for JFK, on the other hand, we would start with its proper position,
given by the hash function value 8, so we would check indices 8, 7, 6,
...
This looks
pretty bad at first sight, but bear in mind that we said that we will aim towards keeping the
load factor at around 50 percent, so there would be many more empty slots which effectively
stop any further search
There is, however, a problem when it comes to deleting keys. Suppose we now delete GCM from the table
and then search for AKL again. Starting from AKL’s primary position 7, we would reach the now empty slot at index 6 and wrongly conclude that AKL is not in the table, even though it is stored at index 5. This is clearly not acceptable, but
equally, we do not wish to have to search through the entire array to be sure that an entry is
not there. The solution is to leave a special marker in any slot whose item has been deleted, so that a search knows it has to keep on looking past it. Let us assume that we use the character ‘!’ for that. If,
on the other hand, we are trying to insert a key, then we can ignore any exclamation marks
and fill the position once again. The drawback is that such markers accumulate over time. A large
number of exclamation marks means that we have to keep looking for a long time to find a
particular entry despite the fact that the load factor may not be all that high. In such cases, it may be better to re-fill a new hash table
again from scratch, or use another implementation.
...
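The following Java sketch illustrates open addressing with linear probing, including the deletion markers just described. The class design and the use of Java’s built-in hashCode are our own simplifications; probing moves to the left and wraps at index 0, as in the example, and the sketch assumes the table never becomes completely full.

class LinearProbingTable {
   private static final String DELETED = "!";       // special deletion marker
   private final String[] keys;

   LinearProbingTable(int size) { keys = new String[size]; }

   private int hash(String key) { return (key.hashCode() & 0x7fffffff) % keys.length; }

   void insert(String key) {
      int i = hash(key);
      while (keys[i] != null && !keys[i].equals(DELETED))
         i = (i - 1 + keys.length) % keys.length;   // step left, wrapping at 0
      keys[i] = key;
   }

   boolean search(String key) {
      int i = hash(key);
      while (keys[i] != null) {                     // a truly empty slot ends the search
         if (keys[i].equals(key)) return true;
         i = (i - 1 + keys.length) % keys.length;   // deletion markers are skipped over
      }
      return false;
   }

   void delete(String key) {
      int i = hash(key);
      while (keys[i] != null) {
         if (keys[i].equals(key)) { keys[i] = DELETED; return; }
         i = (i - 1 + keys.length) % keys.length;
      }
   }
}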
The complexity of open addressing with linear probing is rather difficult to compute, so we will not attempt to present a full account of it here. The standard approximations are that a successful search needs about (1 + 1/(1−λ))/2 comparisons on average, and an unsuccessful search about (1 + 1/(1−λ)²)/2. For
relatively small load factors, this is quite impressive, and even for larger ones, it is not bad, e.g. for λ = 0.5 these come to about 1.5 and 2.5 comparisons respectively.

Clustering. There is a snag with linear probing known as clustering. Consider what happens if we try to insert two keys that
have the same result when the hash function is applied to them, for example GLA and JFK, which share the primary position 8. So we
keep checking the same locations we only just checked in order to insert GLA, until we finally step past the whole block of occupied locations.
This effect is known as primary clustering because the
new key JFK will be inserted close to the previous key with the same primary position, GLA
...
So these blocks, or clusters, keep growing, not only if we hit the same primary
location repeatedly, but also if we hit anything that is part of the same cluster
...
Note that searching for keys is also adversely affected by these
clustering effects.

10.9 Double Hashing

The obvious way to avoid the clustering problems of linear probing is to do something slightly
more sophisticated than trying every position to the left until we find an empty one
...
We apply a secondary hash function to tell us how many slots to
jump to look for an empty slot if a key’s primary position has been filled already
...
In the above example, one thing we could do is take the same number k associated
with the three-character code, and use the result of integer division by 11, instead of the
remainder, as the secondary hash function
...
Thus we would like to use as our secondary hash function h2(n) = (k/11)%11. However, this will not quite do, because the result could be zero, and a jump of zero slots would leave us stuck at the occupied primary position. An easy solution is to simply make the secondary hash
function one if the above would evaluate to zero, that is:

   h2(n) = (k/11)%11   if (k/11)%11 ≠ 0,
   h2(n) = 1           otherwise.
Let us return to the earlier example, picking it up at the point where inserting HKG produced a collision with PHL at position 4. Since h2(HKG) = 3 we
now try every third location to the left in order to find a free slot, and index 1 is free:

  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     | HKG |     |     | PHL |     | GCM |     | ORY |     |
Note that this did not create a block. Next we have to insert GLA, whose primary position 8 is already occupied by ORY. Since h2(GLA) = 9, we now try every ninth location to the left, wrapping round at the start of the array, and the first location reached, index 10, is empty, so GLA goes there:

  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     | HKG |     |     | PHL |     | GCM |     | ORY |     | GLA
Note that we still have not got any blocks, which is good
...
Here is the result when filling the
table with the remaining keys given:

  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10
     | HKG | DCA |     | PHL | FRA | GCM | AKL | ORY | LAX | GLA

Our example is too small to show convincingly that this method also avoids secondary clustering, but in general it does
...
It is also worth noting that, in both cases, proceeding to secondary positions
to the left is merely a convention – it could equally well be to the right – but obviously it has
to be made clear which direction has been chosen for a particular hash table
...
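For concreteness, here is a small Java sketch (with names of our own choosing) of the primary and secondary hash functions used in this example:

class DoubleHashingExample {
   static int code(String key) {                     // the base-26 value k of a three-letter key
      return (key.charAt(0) - 'A') * 26 * 26
           + (key.charAt(1) - 'A') * 26
           + (key.charAt(2) - 'A');
   }

   static int h1(String key) { return code(key) % 11; }

   static int h2(String key) {
      int step = (code(key) / 11) % 11;
      return (step == 0) ? 1 : step;                 // never allow a step size of zero
   }

   public static void main(String[] args) {
      System.out.println(h1("HKG") + " " + h2("HKG"));   // prints 4 and 3, as in the example
   }
}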
The efficiency of double hashing is even more difficult to compute
than that of linear probing, and therefore we shall just give the results without a derivation. The standard approximations are that a successful search takes about (1/λ)ln(1/(1−λ)) comparisons on average, and an unsuccessful search about 1/(1−λ). Note that it is the natural logarithm (to base
e = 2.71828...) that occurs here, rather than the usual logarithm to base 2. Again, for realistic load factors the number of comparisons is a small constant, i.e. the complexity is O(1).

10.10 Choosing good hash functions

What
is important when choosing a good hash function is to make sure that it spreads the space of
possible keys onto the set of hash table indices as evenly as possible, or more collisions than
necessary will occur
necessary will occur. Therefore, when defining hash functions of strings of characters, it is never a
good idea to make the last (or even the first) few characters decisive, since in practice those characters are usually far from evenly distributed. Secondly, one has to be careful to ensure that the
secondary hash function cannot result in a number which has a common divisor with the
size of the hash table, because then only a fraction of the table locations would ever be probed when looking for a free slot. Choosing the table size to be a prime number, as we did with 11, is the usual way of avoiding this. Even for large hash tables, this can still be a problem if the secondary hash keys
can be similarly large.


10.11 Complexity of hash tables

As we have seen, hash tables only work efficiently if the load factor is kept reasonably low, e.g. below 0.5, but having higher load factors
can considerably slow down the operations
...
The following table shows the average number of locations that need to be checked to
conduct successful and unsuccessful searches in hash tables with different collision handling
strategies, depending on the load factor given in the top row
...

Strategy                0.25    0.50    0.75    0.90    0.99
Successful search
  Direct chaining       1.07    1.14    1.21    1.26    1.29
  Linear probing        1.17    1.50    2.50    5.50    50.5
  Double hashing        1.15    1.39    1.85    2.56    4.65
Unsuccessful search
  Direct chaining       0.25    0.50    0.75    0.90    0.99
  Linear probing        1.39    2.50    8.50    50.5    5000
  Double hashing        1.33    2.00    4.00    10.0    100

It also shows the considerable advantage that double hashing has over linear probing, particularly when the load factors become large
...

The following table shows a comparison of the average time complexities for the different
possible implementations of the table interface:

                 Search        Insert        Delete        Traverse
Sorted array     O(log2 n)     O(n)          O(n)          O(n)
Balanced BST     O(log2 n)     O(log2 n)     O(log2 n)     O(n)
Hash table       O(1)          O(1)          O(1)          O(n log2 n)

Hash tables are seen to perform rather well: the complexity of searching, updating and
retrieving are all independent of table size
In practice, though, the choice of implementation will also depend on the mix of operations that is typically required. For example, lots of
repeated deletions and insertions can cause efficiency problems with some hash table strategies,
as explained above. To illustrate the contrast in search performance: a large balanced binary search tree may need around 25 comparisons to complete a successful search, whereas on average only about 1.39 comparisons are needed if we use a hash table, provided that we
keep its load factor below 50 percent.

Chapter 11

Graphs
Often it is useful to represent information in a more general graphical form than considered
so far, such as the following representation of the distances between towns:
[Figure: a graph whose vertices are the towns Glasgow, Edinburgh, Newcastle, Manchester, Birmingham, Swansea, London and Exeter, and whose edges are labelled with the road distances between directly connected towns.]

With similar structures (maybe leaving out the distances, or replacing them by something
else), we could represent many other situations, like an underground tunnel network, or a
network of pipes (where the number label might give the pipe diameters), or a railway map,
or an indication of which cities are linked by flights, or ferries, or political alliances
...

There is much more that can be done with such a picture of a situation than just reading
off which place is directly connected with another place: For example, we can ask ourselves
whether there is a way of getting from A to B at all, or what is the shortest path, or what
would be the shortest set of pipes connecting all the locations
...


11
...
A graph consists of
a series of nodes (also called vertices or points), displayed as nodes, and edges (also called
lines, links or, in directed graphs, arcs), displayed as connections between the nodes
...
e
...
The remainder
of this Chapter will assume that, which is sufficient for most practical applications
...
We distinguish between directed and undirected graphs
...
Think of them as representing roads, where some roads may be one-way only
...
An example of an unweighted digraph is:
A

B

D

C

E

and an example of a weighted digraph, because it has labels on its edges, is:
4
A

D

1

2

1
2

2

E
6
1

2
2

B

C

3

3

In undirected graphs, we assume that every edge can be viewed as going both ways, that is,
an edge between A and B goes from A to B as well as from B to A
...

A path is a sequence of nodes or vertices v1 , v2 ,
...
Note that in a directed graph, the edge from vi to vi+1
is the one which has the corresponding direction
...
A path is simple if no vertex appears on it twice (with
the exception of a circle, where the first and last vertex may be the same – this is because we
have to ‘cut open’ the circle at some point to get a path, so this is inevitable)
...
For
directed graphs, the notion of connectedness has two distinct versions: We say that a digraph
is weakly connected if for every two vertices A and B there is either a path from A to B or a
path from B to A
...
So,
in a weakly connected digraph, there may be two vertices i and j such that there exists no
path from i to j
...
In fact, any tree can be viewed
as a simple graph of a particular kind, namely one that is connected and contains no circles
...
Instead, if two vertices A and B are connected by an edge e, we say that they are
neighbours, and the edge connecting them is said to be incident to A and B
...


11.2 Implementing graphs

In the data structures considered so far, the stored items were essentially independent of each other: at
no time was there ever a connection between all the items represented, apart from the order in
which their keys appeared
...
Now, on
the other hand, it is the connections that are the crucial information we need to encode in
the data structure
...

Array-based implementation
...

However, this only works if the graph is given explicitly, that is, if we know in advance how
many vertices there will be, and which pairs will have edges between them
...
For unweighted graphs, we can do this quite easily
in an n × n two-dimensional binary array adj, also called a matrix , the so-called adjacency
matrix
...

The array/matrix representations for the two example graphs shown above are then:

For the unweighted digraph:

        A  B  C  D  E
        0  1  2  3  4
A  0    0  1  0  1  0
B  1    0  0  1  0  0
C  2    1  0  0  0  1
D  3    0  0  1  0  1
E  4    0  0  0  0  0

and for the weighted digraph, where a blank entry indicates that there is no edge:

        A  B  C  D  E
        0  1  2  3  4
A  0    0  1     4
B  1    2  0  2  2  6
C  2       3  0     1
D  3          2  0  1
E  4          3  2  0

In the first case, for the unweighted graph, a ‘0’ in position adj[i][j] reads as false, that
is, there is no edge from the vertex i to the vertex j
...
It is often useful to use boolean values here, rather than the
numbers 0 and 1, because it allows us to carry out operations on the booleans
...

For an undirected graph, if there is a 1 in the ith column and the jth row, we know that
there is an edge from vertex i to the vertex with the number j, which means there is also an
edge from vertex j to vertex i
...
We say that such a
matrix is symmetric – it equals its mirror image along the main diagonal
...
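A minimal Java sketch of the weight matrix idea might look as follows, where Double.POSITIVE_INFINITY stands for a missing edge (the class name and details are our own choices):

class WeightMatrixGraph {
   final double[][] weight;                 // weight[i][j] = weight of the edge from i to j

   WeightMatrixGraph(int n) {
      weight = new double[n][n];
      for (int i = 0; i < n; i++)
         for (int j = 0; j < n; j++)
            weight[i][j] = (i == j) ? 0.0 : Double.POSITIVE_INFINITY;
   }

   void addEdge(int i, int j, double w) { weight[i][j] = w; }   // a directed edge from i to j

   boolean hasEdge(int i, int j) {
      return i != j && weight[i][j] < Double.POSITIVE_INFINITY;
   }
}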
There is a potential problem with the adjacency/weight matrix
representation: If the graph has very many vertices, the associated array will be extremely
large (e
...
, 10,000 entries are needed if the graph has just 100 vertices)
...
e
...

A solution to this problem is to number all the vertices as before, but, rather than using a
two-dimensional array, use a one-dimensional array that points to a linked list of neighbours
for each vertex
...
This implementation is using
so-called adjacency lists
...
In Java, this could
be accomplished with something like:
class Graph {
   Vertex[] heads;                  // one list head per vertex of the graph

   private class Vertex {
      int name;                     // the number of a neighbouring vertex
      double weight;                // the weight of the edge to that neighbour
      Vertex next;                  // the next neighbour in the list
   }

   //methods for graphs
}

Pointer-based implementation
...
In a
language such as Java, a class Graph might have the following as an internal class:
class Vertex {
   String name;                     // the label of this vertex
   Vertex[] neighbours;             // references to all neighbouring vertices
   double[] weights;                // weights of the edges to those neighbours
}
When each vertex is created, an array neighbours big enough to accommodate (pointers to)
all its neighbours is allocated, with (for weighted graphs) an equal sized array weights to
accommodate the associated weights
...
Any entries in the neighbours array that are not needed will
hold a null pointer as usual
...
[Figure: the pointer-based representation of the weighted digraph above, with each vertex holding an array of references to its neighbours and a matching array of edge weights.]

Relations between graphs

Many important theorems about graphs rely on formal definitions of the relations between
them, so we now define the main relevant concepts
...
e
...
A subgraph of a
graph G is defined as any graph that has a vertex set which is a subset of that of G, with
adjacency relations which are a subset of those of G
...
Finally, a graph G is said to contain
another graph H if there exists a subgraph of G that is either H or isomorphic to H
...
The reverse
operation of smoothing removes a vertex w with exactly two edges e1 and e2 , leaving an edge
e connecting the two adjacent vertices u and v:

102

A subdivision of a graph G can be defined as a graph resulting from the subdivision of edges
in G
...

An edge contraction removes an edge from a graph and merges the two vertices previously
connected by it
...
These are not allowed in simple graphs, in which case some
edges may need to be deleted
...


11.4 Planarity

A graph is said to be planar if it can be embedded in a plane. In other words, it can be drawn on a
sheet of paper in such a way that no edges cross each other
...

Note that it is clearly possible for planar graphs to be drawn in such a way that their
edges do cross each other, but the crucial thing is that they can be transformed (by moving
vertices and/or deforming the edges) into a form without any edges crossing
...
Clearly all sub-graphs
of this will also be planar
...
For small graphs, it is easy to check systematically that there are no
possible vertex repositionings or edge deformations that will bring the graph into explicitly
planar form
...
In fact, it can be proved that these two graphs form the basis of some useful
theorems about planarity
...
Another, based on the concept of minors, is Wagner’s
theorem which states that “a finite graph is planar if and only if it does not have K5 or K3,3
as a minor”
...
This is not entirely straightforward,
but algorithms do exist which allow a graph with n vertices to be tested for planarity with
time complexity O(n)
...


11.5 Traversals – systematically visiting all vertices

In order to traverse a graph, i.e. systematically visit all of its vertices, we need a strategy for exploring it which guarantees that no vertex is missed. Because,
unlike trees, graphs do not have a root vertex, there is no natural place to start a traversal,
and therefore we assume that we are given, or randomly pick, a starting vertex i
...

The first is known as breadth first traversal : We start with the given vertex i
...
We then remove the first vertex from the
queue and one by one put its neighbours at the end of the queue
...
We do this until
the queue is empty
...
If there is a
circle in the graph, like A, B, C in the first unweighted graph above, we would revisit a vertex
we have already visited, and thus we would run into an infinite loop (visiting A’s neighbours
puts B onto the queue, visiting that (eventually) gives us C, and once we reach C in the queue,
we get A again)
...
In the
above algorithm, we only add a vertex j to the queue if done[j] is false
...
This way, we will not visit any vertex more than once,
and for a finite graph, our algorithm is bound to terminate
...

To see why this is called breadth first search, we can imagine a tree being built up in this
way, where the starting vertex is the root, and the children of each vertex are its neighbours
(that haven’t already been visited)
...
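As an illustration, here is a compact Java sketch of breadth first traversal over an adjacency matrix, using a boolean done array as described above (the names are our own, and depth first traversal is obtained by simply replacing the queue with a stack):

import java.util.LinkedList;
import java.util.Queue;

class BreadthFirst {
   // Visits every vertex reachable from 'start' in the given adjacency matrix.
   static void traverse(boolean[][] adj, int start) {
      int n = adj.length;
      boolean[] done = new boolean[n];          // initially all false
      Queue<Integer> queue = new LinkedList<>();
      queue.add(start);
      done[start] = true;                       // mark vertices when they join the queue
      while (!queue.isEmpty()) {
         int u = queue.remove();
         System.out.println("visiting vertex " + u);
         for (int v = 0; v < n; v++) {
            if (adj[u][v] && !done[v]) {        // each unvisited neighbour joins the queue
               done[v] = true;
               queue.add(v);
            }
         }
      }
   }
}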

The second traversal strategy is known as depth first traversal : Given a vertex i to start
from, we now put it on a stack rather than a queue (recall that in a stack, the next item to
be removed at any time is the last one that was put on the stack)
...
We then repeatedly pop the next vertex from the stack,
mark it as done, and put its neighbours on the stack, provided they have not been marked as
done, just as we did for breadth first traversal
...
Again, we can see why this is called depth first by
104

formulating the traversal as a search tree and looking at the order in which the items are
added and processed
...
There is no reason why A’s neighbour B should be visited before D in the
example
...
Note also that the only vertices that will be listed are those in the same
connected component as A
...

Exercises: Write algorithms, in pseudocode, to (1) visit all nodes of a graph, and (2) decide
whether a given graph is connected or not
...


11.6 Shortest paths – Dijkstra’s algorithm

A common graph-based problem is to find, for two given vertices, a path between them for which the sum of the edge weights along the path is as small as possible. This number is called the length of the
path
...
Note that there need not be a unique
shortest path, since several paths might have the same length
...

Note that the weights do not necessarily have to correspond to distances; they could, for
example, be time (in which case we could speak of “quickest paths”) or money (in which case
we could speak of “cheapest paths”), among other possibilities
...
But notice that we do need to restrict the edge weights to be nonnegative numbers, because if there are negative numbers and cycles, we can have increasingly
long paths with lower and lower costs, and no path with minimal cost
...

Dijkstra’s algorithm
...
Given the start node, Dijkstra’s algorithm computes shortest paths starting from s and
ending at each possible node
...
Because the algorithm, although elegant
and short, is fairly complicated, we shall consider it one component at a time
...
We keep an array D of distances indexed by the
vertices
...
However, before the algorithm finishes, D[z] is the best overestimate
we currently have of the distance from s to z
...
Then the algorithm repeatedly decreases the
overestimates until it is no longer possible to decrease them further
...

Improving estimates
...
Suppose
that, for two given vertices u and z, it happens that D[u] + weight[u][z] < D[z]
...
This corresponds to the code fragment
if ( D[u] + weight[u][z] < D[z] )
D[z] = D[u] + weight[u][z]
of the full algorithm given below
...

Dijkstra’s algorithm, Version 1
...
(It is always a good idea to
start with an inefficient simple algorithm, so that the results from it can be used to check
the operation of a more complex efficient algorithm
...

The following algorithm implements that idea:
// Input: A directed graph with weight matrix ‘weight’ and
//        a start vertex ‘s’.
// Output: An array ‘D’ of distances as explained above.

// We begin by building the distance overestimates.
D[s] = 0        // The shortest path from s to itself has length zero.
for ( each vertex z of the graph ) {
   if ( z is not the start vertex s )
      D[z] = infinity            // This is certainly an overestimate.
}

// We use an auxiliary array ‘tight’, indexed by the vertices,
// that records which estimates are already known to be tight.
for ( each vertex z of the graph ) {
   tight[z] = false
}

// We now repeatedly update the arrays ‘D’ and ‘tight’ until
// all entries in the array ‘tight’ hold the value true.
repeat as many times as there are vertices in the graph {
   find a vertex u with tight[u] false and minimal estimate D[u]
   tight[u] = true
   for ( each vertex z adjacent to u )
      if ( D[u] + weight[u][z] < D[z] )
         D[z] = D[u] + weight[u][z]      // Lower overestimate exists.
}
// At this point, all entries of array ‘D’ hold tight estimates.
What is perhaps not so clear is why the estimates it holds
are actually tight, i
...
are the minimal path lengths
...
To see this, suppose that
you wish to navigate from a vertex s to a vertex z, and that the shortest path from s to
z happens to go through a certain vertex u
...
Given that the whole, unsplit path is a shortest path from s to z, the initial
sub-path has to be a shortest path from s to u, for if not, then you could shorten your path
from s to z by replacing the initial sub-path to u by a shorter path, which would not only give
a shorter path from s to u but also from s to the final destination z
...
The
reason is that shortest paths cannot have cycles
...

If, as tends to be the case in practice, we also wish to compute the route of shortest path,
rather than just its length, we also need to introduce a third array pred to keep track of the
‘predecessor’ or ‘previous vertex’ of each vertex, so that the path can be followed back from
the end point to the start point
...

The time complexity of this algorithm is clearly O(n2 ) where n is the number of vertices,
since there are operations of O(n) nested within the repeat of O(n)
...
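For reference, the whole of Version 1 can be written as a compact Java method. This sketch (with names of our own choosing) assumes a weight matrix that uses Double.POSITIVE_INFINITY for missing edges:

class Dijkstra {
   static double[] shortestDistances(double[][] weight, int s) {
      int n = weight.length;
      double[] D = new double[n];
      boolean[] tight = new boolean[n];
      for (int z = 0; z < n; z++) D[z] = Double.POSITIVE_INFINITY;
      D[s] = 0.0;                                   // distance from s to itself

      for (int round = 0; round < n; round++) {
         int u = -1;                                // vertex with minimal untight estimate
         for (int z = 0; z < n; z++)
            if (!tight[z] && (u == -1 || D[z] < D[u])) u = z;
         tight[u] = true;
         for (int z = 0; z < n; z++)                // try shortcuts via u
            if (!tight[z] && D[u] + weight[u][z] < D[z])
               D[z] = D[u] + weight[u][z];
      }
      return D;
   }
}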
Suppose we want to compute the shortest path from A (node 0) to E
(node 4) in the weighted graph we looked at before:
[Figure: the weighted digraph from the earlier example, as defined by the weight matrix given above.]

A direct implementation of the above algorithm, with some code added to print out the status
of the three arrays at each intermediate stage, gives the following output, in which“oo” is used
to represent the infinity symbol “∞”:
Computing shortest paths from A

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      oo     oo     oo     oo
tight   |  no     no     no     no     no
pred    |  none   none   none   none   none

Vertex A has minimal estimate, and so is tight.
Neighbour B has estimate decreased from oo to 1 taking a shortcut via A.
Neighbour D has estimate decreased from oo to 4 taking a shortcut via A.

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      1      oo     4      oo
tight   |  yes    no     no     no     no
pred    |  none   A      none   A      none

Vertex B has minimal estimate, and so is tight.
Neighbour A is already tight.
Neighbour C has estimate decreased from oo to 3 taking a shortcut via B.
Neighbour D has estimate decreased from 4 to 3 taking a shortcut via B.
Neighbour E has estimate decreased from oo to 7 taking a shortcut via B.

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      1      3      3      7
tight   |  yes    yes    no     no     no
pred    |  none   A      B      B      B

Vertex C has minimal estimate, and so is tight.
Neighbour B is already tight.
Neighbour E has estimate decreased from 7 to 4 taking a shortcut via C.

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      1      3      3      4
tight   |  yes    yes    yes    no     no
pred    |  none   A      B      B      C

Vertex D has minimal estimate, and so is tight.
Neighbour C is already tight.

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      1      3      3      4
tight   |  yes    yes    yes    yes    no
pred    |  none   A      B      B      C

Vertex E has minimal estimate, and so is tight.
Neighbour C is already tight.
Neighbour D is already tight.

        |  A      B      C      D      E
--------+---------------------------------------
D       |  0      1      3      3      4
tight   |  yes    yes    yes    yes    yes
pred    |  none   A      B      B      C

A shortest path from A to E is:  A B C E
...
For
example, using a “*” to represent tight, the distance, status and predecessor for each node at
each stage of the above example can be listed more concisely as follows:
Stage |  A       B        C        D        E
------+------------------------------------------------
  1   |  0       oo       oo       oo       oo
  2   |  0 *     1  A     oo       4  A     oo
  3   |  0 *     1 * A    3  B     3  B     7  B
  4   |  0 *     1 * A    3 * B    3  B     4  C
  5   |  0 *     1 * A    3 * B    3 * B    4  C
  6   |  0 *     1 * A    3 * B    3 * B    4 * C

A shortest path from A to E is:  A B C E
...
The time complexity of Dijkstra’s algorithm can be
improved by making use of a priority queue (e.g., some form of heap) to keep track of which
The previous algorithm then becomes:
109

// Input: A directed graph with weight matrix ‘weight’ and
//        a start vertex ‘s’.
// Output: An array ‘D’ of distances as explained above.

// We begin by building the distance overestimates.
D[s] = 0        // The shortest path from s to itself has length zero.
for ( each vertex z of the graph ) {
   if ( z is not the start vertex s )
      D[z] = infinity            // This is certainly an overestimate.
}

Create a priority queue containing all the vertices of the graph,
with the entries of D as the priorities.

// Then we implicitly build the path tree discussed above.
while ( the priority queue is not empty ) {
   u = remove vertex with smallest priority from queue
   for ( each vertex z in the queue which is adjacent to u ) {
      if ( D[u] + weight[u][z] < D[z] ) {
         D[z] = D[u] + weight[u][z]      // Lower overestimate exists.
         Change the priority of vertex z in the queue to D[z]
      }
   }
}
// At this point, all entries of array ‘D’ hold tight estimates.
If the priority queue is implemented as a Binary or Binomial heap, initializing D and creating
the priority queue both have complexity O(n), where n is the number of vertices of the graph,
and that is negligible compared to the rest of the algorithm
...
Removals happen O(n) times, and priority changes happen O(e) times,
where e is the number of edges in the graph, so the cost of maintaining the queue and updating
D is O((e + n)log2 n)
...
Using a Fibonacci heap for the priority queue allows priority updates of
O(1) complexity, improving the overall complexity to O(e + nlog2 n)
...
So, in that case, the time complexity is actually greater than or equal to the
previous simpler O(n2 ) algorithm
...
That is, there are usually not many more edges than vertices, and in
this case the time complexity for both priority queue versions is O(nlog2 n), which is a clear
improvement over the previous O(n2) algorithm.

11.7 Shortest paths – Floyd’s algorithm

If we are not only interested in finding the shortest path from one specific vertex to all the
others, but the shortest paths between every pair of vertices, we could, of course, apply
Dijkstra’s algorithm to every starting vertex
...
This maintains a square matrix ‘distance’ which contains
the overestimates of the shortest paths between every pair of vertices, and systematically
decreases the overestimates using the same shortcut idea as above
...

In the algorithm below, we attempt to decrease the estimate of the distance from each
vertex s to each vertex z by going systematically via each possible vertex u to see whether
that is a shortcut; and if it is, the overestimate of the distance is decreased to the smaller
overestimate, and the predecessor updated:
// Store initial estimates and predecessors
...

for ( each vertex u ) {
for ( each vertex s ) {
for ( each vertex z ) {
if ( distance[s][u]+distance[u][z] < distance[s][z] ) {
distance[s][z] = distance[s][u]+distance[u][z]
predecessor[s][z] = predecessor[u][z]
}
}
}
}
As with Dijkstra’s algorithm, this can easily be adapted to the case of non-weighted graphs
by assigning a suitable weight matrix of 0s and 1s
...
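A direct Java sketch of this pseudocode might look as follows (the class and parameter names are our own); the three nested loops give it time complexity O(n3):

class Floyd {
   // 'distance' starts out as the weight matrix (0 on the diagonal, infinity
   // for missing edges) and is updated in place, as is 'predecessor'.
   static void allPairsShortestPaths(double[][] distance, int[][] predecessor) {
      int n = distance.length;
      for (int u = 0; u < n; u++)                  // possible intermediate vertex
         for (int s = 0; s < n; s++)               // start vertex
            for (int z = 0; z < n; z++)            // end vertex
               if (distance[s][u] + distance[u][z] < distance[s][z]) {
                  distance[s][z] = distance[s][u] + distance[u][z];
                  predecessor[s][z] = predecessor[u][z];
               }
   }
}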

This is the same complexity as running the O(n2 ) Dijkstra’s algorithm once for each of the n
possible starting vertices
...
However, if the graph is sparse with e = O(n),
then multiple runs of Dijkstra’s algorithm can be made to perform with time complexity
O(n2 log2 n), and be faster than Floyd’s algorithm
...
Suppose we want to compute the lengths of the shortest paths between
all vertices in the following undirected weighted graph:

We start with distance matrix based on the connection weights, and trivial predecessors:

Start:

  distance |  A    B    C    D    E        predecessor |  A   B   C   D   E
  ---------+------------------------       ------------+--------------------
      A    |  0    1   14    4                   A     |  A   A   A   A   A
      B    |  1    0              2              B     |  B   B   B   B   B
      C    | 14         0    8   10              C     |  C   C   C   C   C
      D    |  4         8    0    1              D     |  D   D   D   D   D
      E    |       2   10    1    0              E     |  E   E   E   E   E

Then for each vertex in turn we test whether a shortcut via that vertex reduces any of the
distances, and update the distance and predecessor arrays with any reductions found
...
So the shortest distance from C to B is 11, and the predecessors of B are E,
then D, then C, giving the path C D E B
...


11
...
Suppose you have been given a
weighted undirected graph such as the following:
A
6

5
1

B

5

5

C

D
6
3

4
2

6
E

F

We could think of the vertices as representing houses, and the weights as the distances between
them
...
For obvious reasons, you will want to keep the amount of
digging and laying of pipes or cable to a minimum
...
e
...
For
example, if we have already chosen the edge between A and D, and the one between B and
D, then there is no reason to also have the one between A and B
...
Also, assuming that we have only one feeding-in point (it is of
no importance which of the vertices that is), we need the whole layout to be connected
...

113

Hence, what we are looking for is a minimal spanning tree of the graph
...
Here, minimal refers to the sum of all the weights of the
edges contained in that tree, so a minimal spanning tree has total weight less than or equal
to the total weight of every other spanning tree
...

Observations concerning spanning trees
...
So, to come up with some ideas which will allow us to develop an
algorithm for the minimal spanning tree problem, we shall need to make some observations
about minimal spanning trees
...
Here are some examples:

We can immediately notice that their general shape is such that if we add any of the remaining
edges, we would create a circle
...
These observations are not quite
sufficient to lead to an algorithm, but they are good enough to let us prove that the algorithms
we find do actually work
...
We say that an algorithm is greedy if it makes its decisions based only
on what is best from the point of view of ‘local considerations’, with no concern about how the
decision might affect the overall picture
...
The algorithm is
greedy in the sense that the decision at each step is based only on what is best for that next
step, and does not consider how that will affect the quality of the final overall solution
...

Prim’s Algorithm – A greedy vertex-based approach
...
Then we can consider all the edges which
connect a vertex in S to one outside of S, and add to S one of those that has minimal weight
...
This process can
be repeated, starting with any vertex to be the sole element of S, which is a trivial minimal
spanning tree containing no edges
...

When implementing Prim’s algorithm, one can use either an array or a list to keep track
of the set of vertices S reached so far
...
That is, the
114

vertex in S which has an edge to i with minimal weight
...

For the above graph, starting with S = {A}, the tree is built up as follows:
[Figure: the minimal spanning tree being built up step by step, starting from S = {A} and at each stage adding the minimal weight edge connecting S to a vertex outside S, until all six vertices are included.]
It is slightly more challenging to produce a convincing argument that this algorithm really
works than it has been for the other algorithms we have seen so far
...
There are several possible proofs that it is, but none
are straightforward
...

Let Y be the output of Prim’s algorithm, and X1 be any minimal spanning tree
...
Clearly, if X1 = Y , then Prim’s algorithm has generated a minimal
spanning tree
...
Then, since
X1 is a spanning tree, it must include a path connecting the two endpoints of e, and because
circles are not allowed, there must be an edge in X1 that is not in Y , which we can call f
...
Then create
tree X2 that is X1 with f replaced by e
...
Now we can repeat this process until we have replaced all the
edges in X1 that are not in Y , and we end up with the minimal spanning tree Xn = Y , which
completes the proof that Y is a minimal spanning tree
...
However, as with Dijkstra’s algorithm, a Binary
or Binomial heap based priority queue can be used to speed things up by keeping track of
which is the minimal weight vertex to be added next
...
Finally, using the more sophisticated
Fibonacci heap for the priority queue can improve this further to O(e + nlog2 n)
...
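The following Java sketch shows the simple array-based O(n2) version of Prim’s algorithm just described, for a connected graph given by a weight matrix (all names and details are our own):

class Prim {
   static double minimalSpanningTreeWeight(double[][] weight) {
      int n = weight.length;
      boolean[] inS = new boolean[n];
      double[] closest = new double[n];              // cheapest known edge from i into S
      for (int i = 0; i < n; i++) closest[i] = Double.POSITIVE_INFINITY;
      closest[0] = 0.0;                              // start the tree at vertex 0
      double total = 0.0;

      for (int step = 0; step < n; step++) {
         int u = -1;                                 // cheapest vertex not yet in S
         for (int i = 0; i < n; i++)
            if (!inS[i] && (u == -1 || closest[i] < closest[u])) u = i;
         inS[u] = true;
         total += closest[u];
         for (int i = 0; i < n; i++)                 // update cheapest connections via u
            if (!inS[i] && weight[u][i] < closest[i]) closest[i] = weight[u][i];
      }
      return total;
   }
}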

Just as with Floyd’s versus Dijkstra’s algorithm, we should consider whether it really is
necessary to process every vertex at each stage, because it could be sufficient to only check
actually existing edges
...
This algorithm does not consider the vertices directly at all, but builds a minimal spanning tree by considering and adding
edges as follows: Assume that we already have a collection of edges T
...
If we start with T being the empty set, and continue
until no more edges can be added, a minimal spanning tree will be produced
...

For the same graph as used for Prim’s algorithm, this algorithm proceeds as follows:
[Figure: Kruskal’s algorithm applied to the same graph, considering the edges in order of increasing weight and adding each one unless it would create a circle with the edges chosen so far.]
In practice, Kruskal’s algorithm is implemented in a rather different way to Prim’s algorithm
...
There are implementations
of that which can be achieved with overall time complexity O(elog2 e), which is dominated by
the O(elog2 e) complexity of sorting the e edges in the first place
...
If the graph is sparse, i.e. the
algorithm, but will be faster than the standard O(n2 ) Prim’s algorithm
...
e
...
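A Java sketch of this edge-based approach might look as follows; the simple union-find structure used here to detect circles, and all the names, are our own choices:

import java.util.Arrays;

class Kruskal {
   static int find(int[] parent, int x) {            // root of x's current component
      while (parent[x] != x) x = parent[x];
      return x;
   }

   // edges[i] = {u, v, w} with integer weight w; returns the total weight of the tree.
   static int minimalSpanningTreeWeight(int n, int[][] edges) {
      Arrays.sort(edges, (a, b) -> Integer.compare(a[2], b[2]));   // cheapest edges first
      int[] parent = new int[n];
      for (int i = 0; i < n; i++) parent[i] = i;
      int total = 0;
      for (int[] e : edges) {
         int ru = find(parent, e[0]), rv = find(parent, e[1]);
         if (ru != rv) {                             // different components: no circle created
            parent[ru] = rv;                         // merge the two components
            total += e[2];
         }
      }
      return total;
   }
}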


11.9 Travelling Salesmen and Vehicle Routing

There are further graph based problems that are even more complex. Probably the best known is the Travelling Salesman Problem, which asks for a shortest route through a weighted graph that visits every vertex exactly once and returns to its starting point. There are
currently no known polynomial time algorithms for solving this.
...
g
...
Exercise: write an algorithm in pseudocode
that solves the Travelling Salesman Problem, and determine its time complexity
...
This involves finding a series of routes to service
a number of customers with a fleet of vehicles with minimal cost, where that cost may be
the number of vehicles required, the total distance covered, or the total driver time required
...
In such cases, a multi-objective
optimization approach is required which returns a Pareto front of non-dominated solutions,
i
...
a set solutions for which there are no other solutions which are better on all objectives
...

Since exact solutions to these problems are currently impossible for all but the smallest
cases, heuristic approaches are usually employed, such as evolutionary computation, which
deliver solutions that are probably good but cannot be proved to be optimal
...
That has the additional
advantage of being able to generate a whole Pareto front of solutions rather than just a single
solution
...



Chapter 12

Epilogue
Hopefully the reader will agree that these notes have achieved their objective of introducing
the basic data structures used in computer science, and showing how they can be used in the
design of useful and efficient algorithms
...

It has been demonstrated how ideas from combinatorics and probability theory can be used
to compute the efficiency of algorithms as a function of problem size
...

General ideas such as recursion and divide-and-conquer have been used to provide more
efficient algorithms, and inductive definitions and invariants have been used to establish proofs
of correctness of algorithms
...

Clearly, these notes have only been able to provide a brief introduction to the topic, and
the algorithms and data structures and efficiency computations discussed have been relatively
simple examples
...



Appendix A

Some Useful Formulae
The symbols a, b, c, r and s represent real numbers, m and n are positive integers, and indices
i and j are non-negative integers.

A.1 Binomial formulae

(a + b)^2 = a^2 + 2ab + b^2
(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3

A.2 Powers and roots

a^(m/n)  = (a^m)^(1/n) = (a^(1/n))^m
a^(-m/n) = 1/(a^m)^(1/n) = 1/(a^(1/n))^m

A.3 Logarithms

Definition: The logarithm of c to base a, written as log_a c, is the real number b satisfying the
equation c = a^b, in which we assume that c > 0 and a > 1.

From the definition, we immediately see that:

   a^(log_a c) = c     and     log_a (a^b) = b

and we can move easily from one base a to another a' using:

   log_a' b = log_a' a ∗ log_a b
...
A.4 Sums

We often find it useful to abbreviate a sum as follows:
   s = Σ_{i=1}^{n} a_i = a1 + a2 + a3 + · · · + an

We can view this as an algorithm or program: Let s hold the sum at the end, and double[]
a be an array holding the numbers we wish to add, that is a[i] = ai , then:
double s = 0
for ( i = 1 ; i <= n ; i++ )
s = s + a[i]
computes the sum
...
For that, we often have to count a variant of
1 + 2 + · · · + n, so it is helpful to know that:
   Σ_{i=1}^{n} i = 1 + 2 + · · · + n = n(n + 1)/2

To illustrate this, consider the program in which k counts the instructions:
for ( i = 0 ; i < n ; i++ ) {
for( j = 0 ; j <= i ; j++ ) {
k++ // instruction 1
k++ // instruction 2
k++ // instruction 3
}
}

Using the above formula, the time complexity k is computed as follows:

   k = Σ_{i=0}^{n−1} Σ_{j=0}^{i} 3 = Σ_{i=0}^{n−1} 3(i + 1) = 3 Σ_{i=1}^{n} i = 3 n(n + 1)/2

Two further sums that often prove useful when computing complexities are:

   Σ_{i=0}^{∞} i/2^i = 0 + 1/2 + 2/4 + 3/8 + 4/16 + · · · = 2

   Σ_{i=0}^{n} i/2^i = 0 + 1/2 + 2/4 + 3/8 + · · · + n/2^n ≈ 2 = O(1)


A
...
Thus the sequence begins 0, 1, 1, 2, 3, 5, 8, 13, 21, 34,
55, 89, 144, 233, 377,
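A small Java sketch (our own illustration) that generates these numbers iteratively:

class Fibonacci {
   static long fib(int n) {
      long a = 0, b = 1;                  // F(0) and F(1)
      for (int i = 0; i < n; i++) {
         long next = a + b;
         a = b;
         b = next;
      }
      return a;                           // F(n)
   }

   public static void main(String[] args) {
      for (int i = 0; i < 15; i++) System.out.print(fib(i) + " ");
      // prints 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
   }
}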