Get Informed | More Info | Notesale | Buy and Sell Study Notes Online | Extra Student Income | University Notes

Search for notes by fellow students, in your own course and all over the country.

Browse our notes for titles which look like what you need, you can preview any of the notes via a sample of the contents. After you're happy these are the notes you're after simply pop them into your shopping cart.

My Basket

Buy These Notes

You have nothing in your shopping cart yet.

Title: Get Informed
Description: These notes are there for University students starting from 1st year beginners to 4th year exparts

Buy These Notes Preview

Document Preview

Extracts from the notes are below, to see the PDF you'll receive please use the links above

Introduction to

Computing
Explorations in

Language, Logic, and Machines

David Evans

University of Virginia

For the latest version of this book and supplementary materials, visit:

http://computingbook
...
0 United States License

Contents
1 Computing
1
...

1
...

1
...
1 Information
...
2
...

1
...
3 Growth of Computing Power
...
3 Science, Engineering, and the Liberal Arts
1
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

1
2
3
3
8
12
13
16

Part I: Deﬁning Procedures
2 Language
2
...
2 Language Construction
...
3 Recursive Transition Networks
2
...

2
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

19
19
20
22
26
32

3 Programming
3
...

3
...

3
...

3
...

3
...
1 Primitives
...
4
...

3
...

3
...

3
...
1 Making Procedures
...
6
...
7 Decisions
...
8 Evaluation Rules
...
9 Summary
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

35
36
37
39
40
40
41
44
45
45
46
48
50
52

4 Problems and Procedures
4
...

4
...

4
...
1 Procedures as Inputs and Outputs
4
...

4
...

4
...

4
...
1 Printing
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

4
...
2 Tracing
...
6 Summary
...
1 Types
...
2 Pairs
...
2
...

5
...
2 Triples to Octuples
...
3 Lists
...
4 List Procedures
...
4
...

5
...
2 Generic Accumulators
...
4
...
5 Lists of Lists
...
6 Data Abstraction
...
7 Summary of Part I
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...
75

...
79

...
81

...
83

...
86

...
92

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...
1 Empirical Measurements
...
2 Orders of Growth
...
2
...

7
...
2 Omega
...
2
...

7
...

7
...
1 Input Size
...
3
...

7
...
3 Worst Case Input
...
4 Growth Rates
...
4
...

7
...
2 Linear Growth
...
4
...

7
...
4 Exponential Growth
...
4
...
4
...

7
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

125
125
129
130
133
134
136
136
137
138
139
139
140
145
147
149
149
149

Part II: Analyzing Procedures
6 Machines
6
...
2 Mechanizing Logic
...
2
...

6
...
2 Composing Operations
...
2
...

6
...

6
...
1 Turing Machines
...
4 Summary
...

...

...

...

...

...

...

...

...

...

...

...

...
1 Sorting
...
1
...
153
8
...
2 Insertion Sort
...
1
...

8
...
4 Binary Trees
...
1
...

8
...

8
...
1 Unstructured Search
8
...
2 Binary Search
...
2
...

8
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

158
161
166
167
168
168
169
178

9 Mutation
9
...

9
...

9
...
1 Names, Places, Frames, and Environments
9
...
2 Evaluation Rules with State
...
3 Mutable Pairs and Lists
...
4 Imperative Programming
...
4
...

9
...
2 Imperative Control Structures
...
5 Summary
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

179
179
181
182
183
186
188
188
191
193

10 Objects
10
...

10
...
1 Encapsulation
...
1
...

10
...
3 Object Terminology
...
2 Inheritance
...
2
...
2
...

10
...

10
...

Part III: Improving Expressiveness

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

195
196
196
197
199
200
202
204
207
209

11 Interpreters
11
...

11
...
1 Python Programs
...
1
...

11
...
3 Applications and Invocations
11
...
4 Control Statements
...
2 Parser
...
3 Evaluator
...
3
...

11
...
2 If Expressions
...
3
...

11
...
4 Procedures
...
3
...

11
...
6 Finishing the Interpreter
...
4 Lazy Evaluation
...
4
...

11
...
2 Lazy Programming
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

11
...
235
Part IV: The Limits of Computing
12 Computability
12
...

12
...
1 G¨ del’s Incompleteness Theorem
o
12
...

12
...

12
...

12
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

237
237
240
241
244
245
251

Indexes
253
Index
...
256

List of Explorations
1
...
2
2
...
1
4
...
3
5
...
2
7
...
1
12
...
2

Guessing Numbers
...

Power of Language Systems
...

Recipes for π
...

Pegboard Puzzle
...

Searching the Web
...

Busy Beavers
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...
1 Using three bits to distinguish eight possible values
...
1
2
...
3
2
...
5
2
...
7
2
...
9

Simple recursive transition network
...

Recursive transition network with subnetworks
...

RTN generating “Alice runs”
...

Converting the Number productions to an RTN
...

Converting the Digit productions to an RTN
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

22
23
24
24
25
30
31
31
32

3
...

39

4
...
2
4
...
4
4
...

...

...

54
54
57
58
72

5
...

93

6
...
2
6
...
4
6
...
6

A procedure maps inputs to an output
...

Circular Composition
...

Cornering the Queen
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Computing logical or and not with wine
...

Turing Machine model
...

Checking parentheses Turing Machine
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

110
111
112
119
121
121

7
...
128
7
...
130
7
...
131
8
...
165
9
...
2
9
...
4
9
...
6

Sample environments
...

Environment after evaluating (deﬁne inc (make-adder 1))
...

Mutable pair created by evaluating (set-mcdr! pair pair)
...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

182
184
185
186
187
187

10
...
197
10
...
201
10
...
206
12
...
239
12
...
245
12
...
249

Image Credits
Most of the images in the book, including the tiles on the cover, were generated
by the author
...
INC, michaeldbeavers, and Oneras
...
2
...
The Apollo Guidance Computer image in Section 1
...
3 was released by
NASA and is in the public domain
...
1 is from iStockPhoto, and the rotary trafﬁc signal is from the Wikimedia Commons
...
The
playing card images in Chapter 4 are from iStockPhoto
...
The Dilbert comic in
Chapter 4 is licensed from United Feature Syndicate, Inc
...
1 is from Wikipedia and is in the public domain
...
The odomoter image in Chapter 7 is from iStockPhoto, as is
the image of the frustrated student
...
1 is
from iStockPhoto
...
The xkcd comic at the end of Chapter 11 is used under the creative
commons license generously provided by Randall Munroe
...
I had the privilege of taking 6
...
When I arrived as a new faculty
member at the University of Virginia in 1999, I was distraught to discover that
the introductory computing courses focused on teaching industrial skills, and
with so much of the course time devoted to explaining the technical complexities of using bloated industrial languages like C++ and Java, there was very little,
if any, time left to get across the core intellectual ideas that are the essence of
computing and the reason everyone should learn it
...
This course was
ﬁrst offered in Spring 2002, with the help of an extraordinary group of Assistant
Coaches
...

That course, and the next several offerings, used Abelson & Sussman’s outstanding Structure and Interpretation of Computer Programs (SICP) textbook along
with Douglas Hofstadter’s G¨ del, Escher, Bach: An Eternal Golden Braid
...

Front: Grace Deng, Rachel Dada, Jon Erdman (Assistant Coach)
...
I hope the resulting book captures the spirit and fun of computing
exempliﬁed by SICP but better suited to an introductory course for students
,
with no previous background while covering many topics not included in SICP
such as languages, complexity analysis, objects, and computability
...

I am indebted to many people who helped develop this course and book
...
Greg
Humphreys, Paul Reynolds, and Mark Sherriff have also taught versions of this
course, and contributed to its development
...
William Aiello, Anna Chefter, Chris Frost, Jonathan Grier,
Thad Hughes, Alan Kay, Tim Koogle, Jerry McGann, Gary McGraw, Radhika Nagpal, Shawn O’Hargan, Mike Peck, and Judith Shatin also made important contributions to the class and book
...

Finally, my thanks to all past, present, and future students who use this book,
without whom it would have no purpose
...
In their capacity as intellectual challenge, they are without precedent in
the cultural history of mankind
...
We have developed tools that amplify physical force by the trillions
and increase the speeds at which we can travel by the thousands
...
While some animals have
developed tools to amplify their physical abilities, only humans have developed
tools to substantially amplify our intellectual abilities and it is those advances
that have enabled humans to dominate the planet
...
Language provided the ability to transmit our thoughts to
others, as well as to use our own minds more effectively
...

Computing is the ultimate mental ampliﬁer—computers can mechanize any intellectual activity we can imagine
...
Computing has changed the world more than any other invention of the
past hundred years, and has come to pervade nearly all human endeavors
...

There are two reasons why everyone should study computing:
1
...

2
...

Anyone who has submitted a query to Google, watched Toy Story, had LASIK
eye surgery, used a smartphone, seen a Cirque Du Soleil show, shopped with a
credit card, or microwaved a pizza should be convinced of the ﬁrst reason
...

Although this book will touch on on some exciting applications of computing,
our primary focus is on the second reason, which may seem more surprising
...
We
teach them to read
for the higher
purpose of allowing
them access to
beautiful and
meaningful ideas
...
1
...
The goal of this book is to teach you that new way of thinking
...
1
information
processes

Processes, Procedures, and Computers

Computer science is the study of information processes
...
Each step changes the state of the world in some small way, and the
result of all the steps produces some goal state
...
Because they involve physical things like sugar and dirt, however, they are not pure information processes
...

The boundaries between the physical world and pure information processes,
however, are often fuzzy
...
g
...
g
...
By focusing on abstract information, instead of the
physical ways of representing and manipulating information, we simplify computation to its essence to better enable understanding and reasoning
...

Attributed to Paul
Erd¨ s
o

A procedure is a description of a process
...
The list of steps is the procedure; the act of following
them is the process
...
An algorithm is a mechanical procedure that is
guaranteed to eventually ﬁnish
...

2
...

4
...

6
...

8
...

Place a basket-type ﬁlter into the ﬁlter basket
...

Fill the decanter with cold, fresh water to the desired capacity
...

Close the lid
...

Press the ON button
...
First,
natural languages are very imprecise and ambiguous
...
For example, step three
assumes the operator understands the difference between coffee grounds and
ﬁnished coffee, and can infer that this use of “coffee” refers to coffee grounds
since the end goal of this process is to make drinkable coffee
...

One could, of course, add lots more details to our procedure and make the language more precise than this
...
This is why the United States tax code is 3
...

Another problem with this way of describing a procedure is that the size of the

Chapter 1
...
This is ﬁne
for simple processes that can be executed by humans in a reasonable amount
of time, but the processes we want to execute on computers involve trillions of
steps
...

To program computers, we need tools that allow us to describe processes precisely and succinctly
...
Instead,
we need mechanical procedures that can be followed without any thinking
...
Accept input
...

2
...

3
...
Output could be data displayed to a human, but it could
also be anything that effects the world outside the computer such as electrical signals that control how a device operates
...
Our primary focus is on universal computers, which are computers that can perform all possible mechanical computations on discrete inputs except for practical limits on space and
time
...

computer

A computer
terminal is not
some clunky old
television with a
typewriter in front
of it
...

Douglas Adams

1
...
This power can be captured with units like horsepower and
watt
...
Energy is consumed when a computer operates, but consuming energy
is not the purpose of using a computer
...
How much information it can process?
2
...

1
...
1

Information

Informally, we use information to mean knowledge
...

information

4

bit

binary question

1
...
Measuring Computing Power

The way computer scientists measure information is based on how what is known
changes as a result of obtaining the information
...
One bit of information halves the amount of uncertainty
...
Before learning the answer, there were two possibilities; after learning the answer, there is one
...
Since a bit can
have two possible values, we often represent the values as 0 and 1
...

Half of the time, the coin will land “heads”, and the other half of the time the
coin will land “tails”
...
One bit of information would be enough to
convey either “heads” or “tails”; we can use 0 to represent “heads” and 1 to represent “tails”
...

Similarly, one bit can distinguish between the values 0 and 1:

Is it 1?
No

Yes

0

1

Example 1
...
One bit is not enough
to identify the actual number, since one bit can only distinguish between two
values
...
Can we identify the
value with fewer than 5 questions?

5

Chapter 1
...
This is
not the case if we start with, “Is the value 6?”, since that answer is expected to be
“yes” only one time in six
...
Here, we expect the answer to be “yes” one half of the time,
and the “yes” and “no” answers are equally likely
...
With two more bits, we can distinguish between these
three values (note that two bits is actually enough to distinguish among four
different values, so some information is wasted here)
...
We need two more
bits to distinguish which of the three values it is
...

>= 4?
No

Yes

3?
No

Yes

1

No

3

2?
No

6?

No

Yes

2

Yes

5?

4

6
Yes

5

Three bits can convey more information that just six possible outcomes, however
...
Hence, we are not
obtaining a full bit of information with each question
...
In general, with n bits,
we can distinguish between 2n possibilities
...
The logarithm is deﬁned such that if a = bc
then logb a = c
...
For our six-sided die, log2 6 ≈ 2
...
58
binary questions
...
58 of a question, so
we need to use three binary questions
...
Figure 1
...
We call this structure a binary tree
...

Computer scientists draw trees upside down
...
, 7)
...
Thus, we can describe each of the eight

logarithm

binary tree

6

1
...
Measuring Computing Power

possible values using the answers to the questions down the tree
...
Since there are no more than two possible
answers for each node, we call this a binary tree
...
For example, if we wanted to distinguish between
16 possible numbers, we would add a new question, “Is is >= 8?” to the top
of the tree
...
1 to distinguish
numbers between 0 and 7
...
1, but add 8 to each of the numbers in the questions and the leaves
...
The
example tree has depth three
...

>= 4?
Yes

No
>= 2?
No

>= 6?
Yes

1?

3?
No

No Yes

0

No

1

2

Yes

5?

Yes

7?

No Yes

3

4

No

5

Yes

6

7

Figure 1
...
Using three bits to distinguish eight possible values
...
One byte is deﬁned as eight bits
...
For larger amounts of information, we use metric preﬁxes, but instead of scaling by factors of 1000 they scale by factors of 210 (1024)
...

Exercise 1
...
Draw a binary tree with the minimum possible depth to:
a
...
, 15
...
Distinguish among the 12 months of the year
...
Computing

7

Exercise 1
...
How many bits are needed:
a
...
To uniquely identify any human who ever lived?
c
...
To uniquely identify any atom in the observable universe?
Exercise 1
...
The examples all use binary questions for which there are two
possible answers
...
For
each trit, we can ask a ternary question (a question with three possible answers)
...
How many trits are needed to distinguish among eight possible values? (A
convincing answer would show a ternary tree with the questions and answers
for each node, and argue why it is not possible to distinguish all the values
with a tree of lesser depth
...
[ ] Devise a general formula for converting between bits and trits
...
1: Guessing Numbers
The guess-a-number game starts with one player (the chooser) picking a number between 1 and 100 (inclusive) and secretly writing it down
...
After each guess, the chooser responds with “correct” (the guesser guessed the number and the game is over),
“higher” (the actual number is higher than the guess), or “lower” (the actual
number is lower than the guess)
...
Explain why the guesser can receive slightly more than one bit of information
for each response
...
Assuming the chooser picks the number randomly (that is, all values between
1 and 100 are equally likely), what are the best ﬁrst guesses? Explain why
these guesses are better than any other guess
...
)
c
...
What is the average number of guesses needed (assuming the chooser picks
the number randomly as before)?
e
...

What number should she pick?
f
...
[ ] What are the best strategies for both players in the adversarial guess-anumber game where chooser’s goal is to pick a starting number that maximizes the number of guesses the guesser needs, and the guesser’s goal is to
guess the number using as few guesses as possible
...
2
...
2: Twenty Questions
The two-player game twenty questions starts with the ﬁrst player (the answerer)
thinking of an object, and declaring if the object is an animal, vegetable, or mineral (meant to include all non-living things)
...
The ﬁrst player answers each question “yes” or “no”
...
20q
...
The game is also sold
as a $10 stand-alone toy (shown in the picture)
...
How many different objects can be distinguished by a perfect questioner for
the standard twenty questions game?
b
...
Try playing the 20Q game at http://www
...
net
...
Instead of just “yes” and “no”, the 20Q game offers four different answers:
“Yes”, “No”, “Sometimes”, and “Unknown”
...
) If all four answers were
equally likely (and meaningful), how many items could be distinguished in
20 questions?
e
...
Is this a good ﬁrst
question?
f
...
[

1
...
2

] Speculate on how 20Q could build up its database
...
All we need to do
is think of the right binary questions for which the bits give answers that allow
us to represent each possible value
...

binary number
system
There are only 10
types of people in
the world:
those who understand binary,
and those who don’t
...
In the previous section, we identiﬁed a number using a tree where
each node asks a binary question and the branches correspond to the “Yes” and
“No” answers
...
This is
known as the binary number system
...

For example, the binary number 10010110 represents the decimal value 150:
Binary:
Value:
Decimal Value:

1
27
128

0
26
64

0
25
32

1
24
16

0
23
8

1
22
4

1
21
2

0
20
1

Chapter 1
...

By using more bits, we can represent larger numbers
...
The more bits we have, the larger the set
of possible numbers we can represent
...

Discrete Values
...
A set is countable if there is a countable
way to assign a unique natural number to each element of the set
...
Some, but not all, inﬁnite sets are countable
...
But, the integers are
in fact countable
...

and assign a unique natural number to each integer in turn
...
Georg Cantor proved
this using a technique known as diagonalization
...
This means we could list all the real numbers in order, so we could
assign a unique integer to each number
...
00000000000000
...
25000000000000
...
33333333333333
...
6666666666666
...
141592653589793
...
The trick is to produce a new real number that is not part of the enumeration
...
For the example enumeration above, we
might choose
...

The kth digit of the constructed number is different from the kth digit of the number k in the enumeration
...
Thus, there is a real number that is not included in the enumeration list, and it is impossible to enumerate all the real numbers
...
Continuous values, such as real numbers, can only be approximated by computers
...
1999999999999
...

even though they differ in each digit)
...

2 This is, indeed, part of the deﬁnition of a digital computer
...
In Chapter 6, we explain more of the inner workings of a computer and why
nearly all computers today are digital
...

The property that there are more real numbers than natural numbers has important implications
for what can and cannot be computed, which we return to in Chapter 12
...
2
...
The ﬁrst type, text, is discrete and can be represented exactly; images
are continuous, and can only be represented approximately
...
The set of all possible sequences of characters is countable
...
For example we could
enumerate all texts alphabetically by length (here, we limit the characters to lowercase letters): a, b, c,
...
, az, ba,
...

Since we have seen that we can represent all the natural numbers with a sequence of bits, so once we have the mapping between each item in the set and
a unique natural number, we can represent all of the items in the set
...

So, instead of enumerating a mapping between all possible character sequences
and the natural numbers, we need a process for converting any text to a unique
number that represents that text
...
If we include lower-case letters (26), upper-case
letters (26), and punctuation (space, comma, period, newline, semi-colon), we
have 57 different symbols to represent
...
For example, we could
encode using the mapping shown in Table 1
...
The ﬁrst bit answers the question: “Is it an uppercase letter after F or a special character?”
...

a
b
c
d
···
p
q
···
z

000000
000001
000010
000011
···
001111
010000
···
011001

A
B
C
···
F
G
···
Y
Z

011010
011011
011100
···
011111
100000
···
110010
110011

space
,

...
1
...

This is one way to encode the alphabet, but not the one typically used by computers
...
The
extra symbols are used to encode more special characters
...
So, the text CS is encoded as 011100 101100
...
To convert the number back into text, just invert the
mapping by replacing each group of six bits with the corresponding letter
...
We can also use bit sequences to represent complex data like pictures, movies, and audio recordings
...
Computing

Since the picture is divided into discrete squares known as pixels, we could encode this as a sequence of bits by using one bit to encode the color of each pixel
(for example, using 1 to represent black, and 0 to represent white)
...
We could represent the image using a sequence
of 256 bits (starting from the top left corner):
0000011111100000
0000100000010000
0011000000001100
0010000000000100
···
0000011111100000
What about complex pictures that are not divided into discrete squares or a ﬁxed
number of colors, like Van Gogh’s Starry Night?

Different wavelengths of electromagnetic radiation have different colors
...

But, each wavelength of light has a slightly different color; for example, light with
wavelength 650 nanometers would be a different color (albeit imperceptible to
humans) from light of wavelength 650
...
There are arguably
inﬁnitely many different colors, corresponding to different wavelengths of visible light
...

3 Whether there are actually inﬁnitely many different colors comes down to the question of
whether the space-time of the universe is continuous or discrete
...
In
reality, this may not be the case at extremely tiny scales
...

pixel

12

1
...
Measuring Computing Power

On the other hand, the human eye and brain have limits
...
Ability to distinguish colors
varies, but most humans can perceive only a few million different colors
...

A common way to represent color is to break it into its three primary components (red, green, and blue), and record the intensity of each component
...

Thus, we can represent a picture by recording the approximate color at each
point
...

But, as with color, once the points get smaller than a certain size they are imperceptible
...
The smaller the sample
regions, the more bits we will have and the more detail that will be visible in the
image
...

Summary
...
The more bits we use the more different values that can be represented;
with n bits we can represent 2n different values
...
Since the world we are trying to represent is continuous there are inﬁnitely many possible values, and we cannot represent these objects exactly
with any ﬁnite sequence of bits
...
Finding ways
to represent data that are both efﬁcient and easy to manipulate and interpret is
a constant challenge in computing
...
Chapter 5 focuses on ways to manage complex data
...
2
...
Looking at the number of bits different computers
can store over time gives us a rough indication of how computing power has
increased
...

AGC User Interface

The Apollo Guidance Computer was developed in the early 1960s to control the
ﬂight systems of the Apollo spacecraft
...
Most earlier computers required a full room,
and were far too expensive to be devoted to a single user; instead, they processed jobs submitted by many users in turn
...

Its volume was about a cubic foot and it weighed 70 pounds
...
Computing

13

of two values
...
The AGC consumed a
signiﬁcant fraction of all integrated circuits produced in the mid-1960s, and the
project spurred the growth of the integrated circuit industry
...
The smallest USB ﬂash memory you can buy today (from
SanDisk in December 2008) is the 1 gigabyte Cruzer for $9
...
6 billion bits, about 140 000 times the amount of
memory in the AGC (and all of the Cruzer memory is modiﬁable)
...

Improving by a factor of 4 million corresponds to doubling just over 22 times
...
This property of exponential improvement in computing power is known as Moore’s Law
...
This progress has
been driven by the growth of the computing industry, increasing the resources
available for designing integrated circuits
...
Improvement in computing power has followed this exponential growth remarkably closely over the
past 40 years, although there is no law that this growth must continue forever
...
Instead of 22 doublings in power since 1963, there should have
been 30 doublings (using the 18 month doubling rate)
...
The reason is our
comparison is very unequal relative to cost: the AGC was the world’s most expensive small computer of its time, reﬂecting many millions of dollars of government funding
...

1
...
The confusion stems from the
nature of computing as a new ﬁeld that does not ﬁt well into existing silos
...

Science
...
The goal of science is to develop general and predictive theories that allow
us to understand aspects of nature deeply enough to make accurate quantitative
predications
...
The more general a theory is the better
...

Everything gets
better and better
...
3
...

Science deals with real things (like bowling balls, planets, and electrons) and attempts to make progress toward theories that predict increasingly precisely how
these real things will behave in different situations
...
Instead of dealing with physical things in the real world, computer science concerns abstract things in a virtual world
...
But, since our focus is on abstract, artiﬁcial things rather than physical things, computer science
is not a traditional natural science but a more abstract ﬁeld like mathematics
...

In a deeper sense, computing pervades all of nature
...
One example of computing in nature comes from biology
...
People sometimes describe DNA
as a “blueprint”, but it is really much better thought of as a program
...
A human genome is not a blueprint that
describes the body plan of a human, it is a program that turns a single cell into
a complex human given the appropriate environment
...

Understanding how both these processes work is one of the most interesting
and important open scientiﬁc questions, and it involves deep questions in computer science, as well as biology, chemistry, and physics
...

Theodore von K´ rm´ n
a a

The questions we consider in this book focus on the question of what can and
cannot be computed
...

Engineering
...
Engineering is often distinguished from crafts in that engineers use scientiﬁc principles to create
their designs, and focus on designing under practical constraints
...
Our own favorite description of what engineers do is “design under constraint”
...
To be sure, the realities of nature is one
of the constraint sets we work under, but it is far from the only one, it is
4 William Wulf and George Fisher, A Makeover for Engineering Education, Issues in Science and
Technology, Spring 2002 (http://www
...
org/18
...
html)
...
Computing

15

seldom the hardest one, and almost never the limiting one
...
As we saw from the Apollo Guidance Computer comparison, practical constraints on computing power change
rapidly — the one billion times improvement in computing power is unlike any
change in physical materials5
...

Computer scientists, however, do face many constraints
...
As computing systems get more complex,
there is no way for a human to understand the entire system at once
...
The primary
tool computer scientists use to manage complexity is abstraction
...
By using carefully designed abstractions, we can construct complex systems with reliable properties while limiting the amount of information a human
designer needs to keep in mind at any one time
...
The notion of the liberal arts emerged during the middle ages to
distinguish education for the purpose of expanding the intellects of free people
from the illiberal arts such as medicine and carpentry that were pursued for
economic purposes
...
The traditional seven liberal arts started
with the Trivium (three roads), focused on language:6
• Grammar — “the art of inventing symbols and combining them to express
thought”
• Rhetoric — “the art of communicating thought from one mind to another,
the adaptation of language to circumstance”
• Logic — “the art of thinking”
The Trivium was followed by the Quadrivium, focused on numbers:
•
•
•
•

Arithmetic — “theory of number”
Geometry — “theory of space”
Music — “application of the theory of number”
Astronomy — “application of the theory of space”

All of these have strong connections to computer science, and we will touch on
each of them to some degree in this book
...
The next chapter discusses the structure of language and throughout this book we consider how to efﬁciently use and combine
5 For example, the highest strength density material available today, carbon nanotubes, are perhaps 300 times stronger than the best material available 50 years ago
...

abstraction

I must study politics
and war that my
sons may have
liberty to study
mathematics and
philosophy
...

John Adams, 1780

16

1
...
Summary and Roadmap

symbols to express meanings
...
In computing, we are not typically communicating directly between minds, but we see many forms of communication between entities: interfaces between components of a program, as well as protocols used to enable
multiple computing systems to communicate (for example, the HTTP protocol
deﬁnes how a web browser and web server interact), and communication between computer programs and human users
...
Hence, the traditional trivium liberal arts of language and logic permeate computer science
...
We have already seen how computers use sequences of bits to represent
numbers
...
Geometry is essential for computer graphics, and graph theory is also
important for computer networking
...
7 Unlike the other six liberal arts, astronomy is not
directly connected to computing, but computing is an essential tool for doing
modern astronomy
...

1
...
When confronted with a
problem, a computer scientist does not just attempt to solve it
...

The rest of this book presents a whirlwind introduction to computer science
...

Part I: Deﬁning Procedures
...
The nature of the computer forces solutions to
be expressed precisely in a language the computer can interpret
...
Natural languages like English are too complex and
inexact for this, so we need to invent and use new languages that are simpler,
more structured, and less ambiguously deﬁned than natural languages
...

The computer frees humans from having to actually carry out the steps needed
to solve the problem
...

Chapter 1
...
And it executes them at a remarkable rate — billions of simple steps in each second on a typical laptop
...
Problems like sequencing the human genome, simulating the global climate, and making a photomosaic not only could not have been solved without
computing, but perhaps could not have even been envisioned
...
To represent more interesting problems, we need
ways to manage more complex data
...

Part II: Analyzing Procedures
...
This requires understanding how machines can compute (Chapter 6), and mathematical tools for reasoning about
how cost grows with the size of the inputs to a procedure (Chapter 7)
...

Part III: Improving Expressiveness
...
Our goal, however, it to be able to deﬁne
concise, elegant, and efﬁcient procedures for performing desired computations
...

Part IV: The Limits of Computing
...
In Part IV, we consider the question of what can and cannot be done by a
mechanical computer
...

Themes
...
A recursive deﬁnition deﬁne a thing in terms of smaller
instances of itself
...
Recursive deﬁnitions can deﬁne
an inﬁnitely large set with a small description
...
We use recursive deﬁnitions to deﬁne
inﬁnite languages in Chapter 2, to solve problems in Chapter 4, to build complex
data structures in Chapter 5
...

Universality
...
Procedures themselves can be described
using just bits, so we can write procedures that process procedures as inputs and
that generate procedures as outputs
...
We introduce the use of procedures as
inputs and outputs in Chapter 4, see how generated procedures can be packaged with state to model objects in Chapter 10
...
4
...
We introduce a model of computation
in Chapter 6, and reason about the limits of computation in Chapter 12
...
Abstraction is a way of hiding details by giving things names
...
Good abstractions hide unnecessary details so they can be used to build complex systems without needing to understand all the details of the abstraction at once
...

Throughout this book, these three themes will recur recursively, universally, and
abstractly as we explore the art and science of how to instruct computing machines to perform useful tasks, reason about the resources needed to execute a
particular procedure, and understand the fundamental and practical limits on
what computers can do
...
For shame, Mr
...
This is true
whether we are considering communication between two humans, between a
human programmer and a computer, or between a network of computers
...
This chapter is about what
language is, how language works, and ways to deﬁne languages
...
1

Surface Forms and Meanings

A language is a set of surface forms and meanings, and a mapping between the
surface forms and their associated meanings
...

language

A natural language is a language spoken by humans, such as English or Swahili
...
We focus on designed languages that are created by humans for some a speciﬁc purpose such as for expressing procedures to be executed by computers
...
In a textual language,
the surface forms are linear sequences of characters
...
Each character is a symbol drawn from a ﬁnite set
known as an alphabet
...
, z} (for
the full language, capital letters, numerals, and punctuation symbols are also
needed)
...
For example, this table describes a communication system between trafﬁc lights and drivers:
Surface Form
Green
Yellow
Red

Meaning
Go
Caution
Stop

string
alphabet

20

2
...
Language Construction

Communication systems involving humans are notoriously imprecise and subjective
...
Communication systems for computers demand precision: we want to know what our programs will do, so it is important
that every step they make is understood precisely and unambiguously
...
The number
of possible meanings that can be expressed is limited by the number of entries in
the table
...
A useful language must be able to express inﬁnitely
many different meanings
...
1)
...
One way to generate inﬁnitely large sets is to use repeating
patterns
...
”
as the set of all natural numbers
...
” as meaning keep doing
the same thing for ever
...
Thus, with only a few numbers and symbols we can describe a set
containing inﬁnitely many numbers
...
2
...

But, ﬁnding a sensible mapping between most meanings and numbers is nearly
impossible
...

2
...
We need ways to describe languages that allow us to deﬁne an inﬁnitely large set of surface forms
and meanings with a compact notation
...

Components of Language
...

• means of combination — rules for building new language elements by combining simpler ones
...
A primitive cannot be broken into smaller parts whose
meanings can be combined to produce the meaning of the unit
...

Since we have rules for producing new words not all words are primitives
...

Chapter 2
...
Rules like this one mean anyone can invent a new word, and use
it in communication in ways that will probably be understood by listeners who
have never heard the word before
...
English speakers who know
the meaning of freeze and anti- could roughly guess the meaning of antifreeze
even if they have never heard the word before
...

Both anti and freeze are primitive; they cannot be broken into smaller parts
with meaning
...

Means of Abstraction
...

Means of abstraction allow us to give a simple name to a complex entity
...
The
meaning of a pronoun depends on the context in which it is used
...
For example, the it in the previous
sentence abstracts “the meaning of a pronoun”, but the it in the sentence before
that one abstracts “a pronoun”
...
English, in particular, has a very limited set of pronouns for abstracting people
...
The interpretation of what a pronoun abstract in natural languages is often confusing
...
Languages for programming computers need means of abstraction that are both powerful and unambiguous
...
1
...
This word was reputedly invented by a non-hippopotomonstrosesquipedaliophobic student at Eton
who combined four words in his Latin textbook
...
An English speaker (familiar with ﬂoccinaucinihilipiliﬁcation and the morphemes you use) should be able to deduce the meaning
of your word
...
2
...
Its deﬁnition is, “truth that
comes from the gut, not books”
...

1 Guessing that it is a verb meaning to pass from the solid to liquid state would also be reasonable
...

22

2
...
Recursive Transition Networks

Exercise 2
...
According to the Oxford English Dictionary, Thomas Jefferson is
the ﬁrst person to use more than 60 words in the dictionary
...
For each Jeffersonian
word, guess its derivation and explain whether or not its meaning could be inferred from its components
...
Society is the
workshop in which
new ones are
elaborated
...

Thomas Jefferson,
letter to John Adams,
1820

recursive transition
network

Exercise 2
...
Embiggening your vocabulary with anticromulent words ecdysiasts can grok
...
Invent a new English word by combining common morphemes
...
Get someone else to use the word you invented
...
[

2
...

Recursive Transition Networks

This section describes a more powerful technique for deﬁning languages
...
To
deﬁne a language, we need to deﬁne a system that produces all strings in the
language and no other strings
...
)
A recursive transition network (RTN) is deﬁned by a graph of nodes and edges
...
The nodes and edge structure provides the means of combination
...
One or more of the nodes may be designated as ﬁnal nodes
(indicated by an inner circle)
...

Figure 2
...
Starting at the node marked Noun, there are two possible edges to follow
...
From that node there are two output edges, each leading to the ﬁnal node marked S
...
Hence,
the RTN can produce four strings corresponding to the four different paths from
the start to ﬁnal node: “Alice jumps”, “Alice runs”, “Bob jumps”, and “Bob runs”
...
For example, adding one more edge from Noun to
Alice

jumps

Verb

Noun
Bob

S
runs

Figure 2
...
Simple recursive transition network
...
Language
Verb with label “Colleen” adds two new strings to the language
...
This is where the recursive
in the name comes from
...
2 below
...
2
...

Now, we can produce inﬁnitely many different strings! We can follow the “and”
edge back to the Noun node to produce strings like “Alice runs and Bob jumps
and Alice jumps” with as many conjuncts as we want
...
5
...

Exercise 2
...
How many different strings can be produced by the RTN below:
jumps
Alice
Noun

Adverb

Verb
Bob

quickly

eats

runs

S
slowly

Exercise 2
...
Recursive transition networks
...
How many nodes are needed for a recursive transition network that can produce exactly 8 strings?
b
...
[ ] Given a whole number n, how many edges are needed for a recursive
transition network that can produce exactly n strings?
Subnetworks
...
We can make more expressive RTNs by allowing edge labels to
also name subnetworks
...
3
...
When an edge labeled with a subnetwork is followed, the network traversal jumps to the subnetwork node
...
Upon reaching a ﬁnal node, the network traversal jumps back to
complete the edge
...
3
...
1, but uses subnetworks for Noun and Verb
...
The only edge out from Sentence is labeled Noun
...
Now, we can follow any path from Noun to a ﬁnal node
(in this cases, outputting either “Alice” or “Bob” on the path toward EndNoun
...
3
...

Suppose we replace the Noun subnetwork with the more interesting version
shown in Figure 2
...
This subnetwork includes an edge from Noun to N1 labeled
Noun
...
Starting from Noun, we can generate complex phrases like “Alice and Bob”
or “Alice and Bob and Alice” (ﬁnd the two different paths through the network
that generate this phrase)
...
4
...

To keep track of paths through RTNs without subnetworks, a single marker sufﬁces
...
Keeping track of paths on an RTN with
subnetworks is more complicated
...
Since we can enter subnetworks within subnetworks,
we need a way to keep track of arbitrarily many jump points
...
We can think of a stack
like a stack of trays in a cafeteria
...
We can pop the top tray off the stack, after which the next
tray is now on top
...

We use a stack of nodes to keep track of the subnetworks as they are entered
...
At each step, we pop
the node off the stack and follow a transition from that node
...
Language
Noun
Stack

Sentence

Step

1

EndNoun

S1
N = Sentence
2

S1

S1

S1

3, 4, 5, 6

N = Noun
2

3, 4, 5, 6

N = EndNoun
2, 3

Alice

Output
Verb
Stack
Step
Output

EndVerb

EndSentence
N = S1
2

EndSentence

EndSentence

3, 4, 5, 6

N = Verb
2

3, 4, 5, 6

EndSentence
N = EndVerb N = EndSentence
2
2, 3

runs

Figure 2
...
RTN generating “Alice runs”
...

2
...

4
...

If the stack is empty, stop
...

If the popped node, N, is a ﬁnal node return to step 2
...
Use D to denote the
destination of that edge, and s to denote the output symbol on the edge
...
Push D on the stack
...
If s is a subnetwork, push the node s on the stack
...

7
...

Consider generating the string “Alice runs” using the RTN in Figure 2
...
We start
following step 1 by pushing Sentence on the stack
...
Since Sentence is not a ﬁnal node, we do
nothing for step 3
...
There is
only one edge to choose and it leads to the node labeled S1
...
The edge we followed is labeled with the node Noun, so we
push Noun on the stack
...
Since
Noun is on top, this means we will ﬁrst traverse the Noun subnetwork, and then
continue from S1
...
It is not a ﬁnal node, so we continue to step 4, and
select the edge labeled “Alice” from Noun to EndNoun
...
The label on the edge is the terminal, “Alice”, so we output “Alice” following step 6
...
The full processing steps are shown in Figure 2
...

Exercise 2
...
Show the sequence of stacks used in generating the string “Alice
and Bob and Alice runs” using the network in Figure 2
...
4
...
RTNs can
have edges out of ﬁnal nodes (as in Figure 2
...

26

2
...
Replacement Grammars

Exercise 2
...
Identify a string that cannot be produced using the RTN from
Figure 2
...
4 without the stack
growing to contain ﬁve elements
...
10
...
Hence, it cannot follow all
possible paths for an RTN where there are edges out of a ﬁnal node
...

2
...
This is the most common
way languages are deﬁned by computer scientists today, and the way we will use
for the rest of this book
...
We use
the Backus-Naur Form (BNF) notation to deﬁne a grammar
...
1 explains
what this means and why it is the case), but easier to write down
...
Backus led efforts at IBM
to deﬁne and implement Fortran, the ﬁrst widely used programming language
...
In deﬁning the Fortran language, Backus and his
team used ad hoc English descriptions to deﬁne the language
...

Rules in a Backus-Naur Form grammar have the form:
nonterminal ::⇒ replacement

I ﬂunked out every
year
...
I hated
studying
...
It
had the delightful
consequence that
every year I went to
summer school in
New Hampshire
where I spent the
summer sailing and
having a nice time
...
The right side of a rule contains
one or more symbols
...
They may
also be terminals, which are output symbols that never appear as the left side
of a rule
...
The terminals are the primitives in the language; the grammar rules are its means of combination
...
g
...
Wherever we ﬁnd
a nonterminal on the left side of a rule, we can replace it with what appears on
the right side of any rule where that nonterminal matches the left side
...

Here is an example BNF grammar (that describes the same language as the RTN

27

Chapter 2
...
1):
1
...

3
...

5
...

A derivation shows how a grammar generates a given string
...
Such a tree is known as a parse
tree
...
For example, adding the rule, Noun ::⇒ Colleen, to the grammar adds two new strings
(“Colleen runs” and “Colleen jumps”) to the language
...
The real power of BNF as a compact notation for describing languages, though, comes once we start adding recursive rules to our grammar
...

Suppose we add the rule,
Sentence ::⇒ Sentence and Sentence
to our example grammar
...
2
...
This is very powerful: by using recursive rules a compact grammar can be
used to deﬁne a language containing inﬁnitely many strings
...
4
...
1: Whole Numbers
This grammar deﬁnes the language of the whole numbers (0, 1,
...
Recursive Deﬁnitions
...
This is sometimes written as to make it clear that the
replacement is empty: MoreDigits ::⇒
...
The key is that we can
only produce a string when all nonterminals in the string have been replaced
with terminals
...

The only rule we have with Number on the left side is the ﬁrst rule, which replaces Number with Digit MoreDigits
...
We can produce as many Digits as we want,
but without the MoreDigits ::⇒ rule we can never stop
...

Without the stopping rule, MoreDigits would be deﬁned in a circular way
...

With the MoreDigits ::⇒ rule, however, we have a way to produce something
terminal from MoreDigits
...

Condensed Notation
...
For example, the whole numbers grammar has ten rules
with Digit on the left side to produce the ten terminal digits
...
A compact notation for these types of rules is to use the vertical

29

Chapter 2
...
For example, we could write the ten
Digit rules compactly as:
Digit ::⇒ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Exercise 2
...
Suppose we replaced the ﬁrst rule (Number ::⇒ Digit MoreDigits)
in the whole numbers grammar with: Number ::⇒ MoreDigits Digit
...
How does this change the parse tree for the derivation of 37? Draw the parse
tree that results from the new grammar
...
Does this change the language? Either show some string that is in the language deﬁned by the modiﬁed grammar but not in the original language (or
vice versa), or argue that both grammars generate the same strings
...
12
...
Devise a grammar that
produces all whole numbers (including “0”), but no strings with unnecessary
leading zeros
...
13
...
14159, 0
...
2
...

Exercise 2
...
The BNF grammar below (extracted from Paul Mockapetris, Domain Names - Implementation and Speciﬁcation, IETF RFC 1035) describes the
language of domain names on the Internet
...
Label
Letter MoreLetters
LetterHyphens LetterDigit |
LDHyphen | LDHyphen LetterHyphens |
LetterDigit | Letter | Digit
A|B|
...
|z
0|1|2|3|4|5|6|7|8|9

a
...
virginia
...

b
...
-b
...
b-b
...
g
...
e
...
t
...
o
...
a
...
n-
...
1: Power of Language Systems
Section 2
...
What does it mean to say two systems are equally powerful?
A language description mechanism is used to deﬁne a set of strings comprising a
language
...

One approach to measure the power of language description mechanism would
be to count the number of languages that it can deﬁne
...
4
...
Both RTNs and BNFs can describe inﬁnitely many different languages
...

Instead, we need to consider the set of languages that each mechanism can deﬁne
...
This matches our
intuitive interpretation of more powerful — A is more powerful than B if it can
do everything B can do and more
...
6 show three possible scenarios
...
Hence, A is more powerful than B
...
This means every language that can be deﬁned by A can also be deﬁned by B, and every language that can be deﬁned by B
can also be deﬁned by A, and the systems are equally powerful
...
This means we cannot say
either one is more powerful; A can do some things B cannot do, and B can do
some things A cannot do
...
6
...

To determine the relationship between RTNs and BNFs we need to understand
if there are languages that can be deﬁned by a BNF that cannot be deﬁned by
a RTN and if there are languages that can be deﬁned by a RTN that cannot be
deﬁned by an BNF
...

proof by
construction

For the ﬁrst part, we prove that there are no languages that can be deﬁned by a
BNF that cannot be deﬁned by an RTN
...
Since there are inﬁnitely
many languages that can be deﬁned by BNF grammars, we cannot prove this
by enumerating each language and showing its corresponding RTN
...
We show an algorithm that given any BNF grammar constructs an RTN
that deﬁnes the same language as the input BNF grammar
...

For each rule where the nonterminal is on the left side, the right hand side is
converted to a path through that node’s subnetwork
...
Language

Before presenting the general construction algorithm, we illustrate the approach
with the example BNF grammar from Example 2
...
For each
nonterminal, we construct a subnetwork by ﬁrst creating two nodes corresponding to the start and end of the subnetwork for the nonterminal
...

Next, we need to add edges to the RTN corresponding to the production rules
in the grammar
...
To make the corresponding RTN, we need to introduce an intermediate node since each RTN edge can only contain one label
...
The resulting partial RTN is shown in Figure 2
...

StartDigit

StartNumber

StartMoreDigits

EndNumber

X0

Figure 2
...
Converting the Number productions to an RTN
...
The ﬁrst means
MoreDigits can be replaced with nothing
...
So, the equivalent of outputting nothing is to turn StartMoreDigits into a ﬁnal node
...
We do this in the RTN by adding an edge between StartMoreDigits and
EndMoreDigits labeled with Number, as shown in Figure 2
...

StartMoreDigits

Number

EndMoreDigits

Figure 2
...
Converting the MoreDigits productions to an RTN
...
For each rule, we add an edge
between StartDigit and EndDigit labeled with the digit terminal, as shown in
Figure 2
...

This example illustrates that it is possible to convert a particular grammar to an
RTN
...
For each nonterminal X in the grammar, construct two nodes, StartX and

32

2
...
Summary
0
1

...
9
...

EndX, where EndX is a ﬁnal node
...

2
...

All BNF rules have the form X ::⇒ replacement where X is a nonterminal
in the grammar and replacement is a sequence of zero or more terminals
and nonterminals: [ R0 , R1 ,
...

(a) If the replacement is empty, make StartX a ﬁnal node
...

(c) Otherwise:
i
...

ii
...
(For
example, for element R1 , a new node Xi,1 is added, and an edge
from Xi,0 to Xi,1 with edge label R1
...
Add an edge from Xi,n−1 to EndX with edge label Rn
...
Hence, we have proved that RTNs are at least as
powerful as BNF grammars
...
This part of the proof can be done
using a similar strategy in reverse: by showing a procedure that can be used to
construct a BNF equivalent to any input RTN
...

Exercise 2
...
Produce an RTN that deﬁnes the same languages as the BNF
grammar from Exercise 2
...

Exercise 2
...
[ ] Prove that BNF grammars are as powerful as RTNs by devising
a procedure that can construct a BNF grammar that deﬁnes the same language
as any input RTN
...
5

Summary

Languages deﬁne a set of surface forms and associated meanings
...
Language

33

deﬁning inﬁnite sets of surface forms using compact and precise notations
...
This system
can describe inﬁnite languages with small representations because of the power
of recursive rules
...

34

2
...
Summary

3

Programming
The Analytical Engine has no pretensions whatever to originate any thing
...
It can follow analysis; but it
has no power of anticipating any analytical relations or truths
...

Augusta Ada Countess of Lovelace, in Notes on the Analytical Engine, 1843

What distinguishes a computer from other machines is its programmability
...
With the right
program, though, a computer can be a tool for communicating across the continent, discovering a new molecule that can cure cancer, composing a symphony,
or managing the logistics of a retail empire
...
It is an intensely creative activity, involving aspects of art, engineering, and science
...
The best programs
are delightful in ways similar to the best architecture, elegant in both form and
function
...

Fortunately, it is not necessary to possess all of those rare qualities to be a good
programmer! Indeed, anyone who is able to master the intellectual challenge
of learning a language (which, presumably, anyone who has gotten this far has
done at least for English) can become a good programmer
...
Because the computer does exactly what it is told, a small mistake in a
program may prevent it from working as intended
...

In the previous chapter, we explored the components of language and mechanisms for deﬁning languages
...

Golden Gate Bridge

36

3
...
1
...
Why can’t we use
natural languages to program computers?
Next, we survey several of the reasons for this
...

Complexity
...
Despite using it for
most of their waking hours for many years, native English speakers know a small
fraction of the entire language
...

Ambiguity
...
Understanding the intended meaning of
an utterance requires knowing the context, and sometimes pure guesswork
...
Happening every two weeks
...
Happening twice a week; semiweekly
...
occurring twice a week
2
...

Even if we can agree on the deﬁnition of every word, the meaning of a sentence
is often ambiguous
...
In
order that there be no doubt as to which is the bottom and which is the
top, for storage purposes, it will be seen that the bottom of each warhead
has been labeled ’TOP’
...
Because natural languages evolve over time as different cultures
interact and speakers misspeak and listeners mishear, natural languages end up
a morass of irregularity
...
For example,
English has a rule that we can make a word plural by appending an s
...
answers
...

2 Merriam-Webster Online, Merriam-Webster, 2008 (http://www
...
com/dictionary/
biweekly)
...
Gaither and Alma E
...

Chapter 3
...
This rule works for
most words: word → words, language → languages, person → persons
...
The plural of goose is geese (and gooses
is not an English word), the plural of deer is deer (and deers is not an English
word), and the plural of beer is controversial (and may depend on whether you
speak American English or Canadian English)
...

There is no sure way to predict when the rule can be applied, and it is necessary
to memorize each of the irregular forms
...
It requires a lot of space to express a complex idea in a natural language
...
Since natural languages
evolved for everyday communication, they are not well suited to describing the
precise steps and decisions needed in a computer program
...

In English, we could describe it like this:

I have made this
letter longer than
usual, only because
I have not had the
time to make it
shorter
...
If the ﬁrst number is greater than the second number, the maximum is the ﬁrst number
...

Perhaps shorter descriptions are possible, but any much shorter description
probably assumes the reader already knows a lot
...
Natural languages provide small, ﬁxed sets of
pronouns to use as means of abstraction, and the rules for binding pronouns to
meanings are often unclear
...

3
...
A programming language is a language that is designed to be read and written by humans to create
programs that can be executed by computers
...
It is difﬁcult to simultaneously
satisfy all desired properties since simplicity is often at odds with economy
...
For the ﬁrst two parts of this book, we use
the Scheme programming language which was designed primarily for simplicity
...

4 Or is it

people? What is the singular of people? What about peeps? Can you only have one peep?

programming
language

38

3
...
Programming Languages

Another reason there are many different programming languages is that they
are at different levels of abstraction
...
Other languages hide most of the details of
the machine operation from the programmer, allowing them to focus on higherlevel actions
...
This means at the
lowest level we need languages the computer can understand directly
...

Code at this level is not easy for humans to understand or write, but it is easy
for a processor to execute quickly
...

For example, the bit sequence 1110101111111110 encodes an instruction in the
Intel x86 instruction set (used on most PCs) that instructs the processor to jump
backwards two locations
...
Hence, the processor gets stuck running forever without making
any progress
...
This means each
instruction can be executed very quickly
...
5
Until the early 1950s, all programming was done at the level of simple instructions
...

compiler

A compiler is a computer program that generates other programs
...
Admiral Grace Hopper developed the ﬁrst compilers in the 1950s
...
An interpreter is a tool that translates between a higher-level language and a lower-level language, but where a
compiler translates an entire program at once and produces a machine language
program that can be executed directly, an interpreter interprets the program a
small piece at a time while it is running
...
This makes it
easy to make small changes to a program and try it again, and to observe the
state of our program as it is running
...
They told
me computers could
only do arithmetic
...
Another advantage of compilers over
5 A “2GHz processor” executes 2 billion cycles per second
...

39

Chapter 3
...

This is especially important when writing critical programs such as ﬂight control software — we want to detect as many problems as possible in the ﬂight
control software before the plane is ﬂying!
Since we are more concerned with interactive exploration than with performance
and detecting errors early, we use an interpreter instead of a compiler
...
3

Scheme

The programming system we use for the ﬁrst part of this book is depicted in
Figure 3
...
The input to our programming system is a program written in a programming language named Scheme
...

Scheme was developed at MIT in the 1970s by Guy Steele and Gerald Sussman,
based on the LISP programming language that was developed by John McCarthy
in the 1950s
...
It is, however, a great language for learning about
computing and programming
...
The language is simple enough
that this chapter covers nearly the entire language (we defer describing a few
aspects until Chapter 9), and by the end of this book you will know enough to
implement your own Scheme interpreter
...

Scheme Program

(define (bigger a b)
(if (> a b) a b))
(bigger 3 4)

Interpreter
(DrRacket)

Processor

Figure 3
...
Running a Scheme program
...
4
...
org/
...
The selected language deﬁnes the grammar
and evaluation rules that will be used to interpret your program
...

3
...
5)
...

The act of determining the value associated with an expression is called evaluation
...
If you enter an expression into a Scheme
interpreter, the interpreter evaluates the expression and displays its value
...
Scheme also provides means of combination
for producing complex expressions from simple expressions
...
Section 3
...
7 describes expressions that can be used to make decisions
...
4
...
Hence,
the value of a primitive is its pre-deﬁned meaning
...
Three useful types of primitives are
described next: numbers, Booleans, and primitive procedures
...
Numbers represent numerical values
...

Example numbers include:
150
3
...
For example, the value of the primitive expression 1120 is 1120
...
Booleans represent truth values
...
In the DrRacket
interpreter, #t and #f are used to represent the primitive truth values
...

41

Chapter 3
...
1
...

All of these primitive procedures operate on numbers
...
Some of these procedures are
deﬁned for more inputs than just the ones shown here (e
...
, the subtract procedure also
works on one number, producing its negation)
...
Scheme provides primitive procedures corresponding to
many common functions
...
For each valid input to the function, there is exactly one associated
output
...
Its output is the sum of the values of the inputs
...
1 describes some primitive procedures for performing arithmetic and comparisons on numbers
...
4
...
The expression (+ 1 2) is an ApplicationExpression, consisting of three subexpressions
...
The same process will allow us to understand how any
expression is evaluated
...
4
...

The value of the ﬁrst expression should be a procedure; the remaining expressions are the inputs to the procedure known as operands
...

Here is a parse tree for the expression (+ 1 2):

Expression
ApplicationExpression
(

Expression

PrimitiveExpression

MoreExpressions
Expression

)
MoreExpressions

PrimitiveExpression

Expression

1

+

MoreExpressions

PrimitiveExpression
2

Following the grammar rules, we replace Expression with ApplicationExpression
at the top of the parse tree
...
The Expression term is replaced PrimitiveExpression, and ﬁnally, the primitive addition procedure +
...
The MoreExpressions term produces the two operand expressions: 1 and 2, both of which are
primitives that evaluate to their own values
...
Following the meaning
of the primitive procedure, (+ 1 2) evaluates to 3 as expected
...

We can build up complex expressions like (+ (∗ 10 10) (+ 25 25))
...
Programming
Expression
ApplicationExpression
(

Expression

MoreExpressions

PrimitiveExpression

+

Expression

MoreExpressions

ApplicationExpression
(∗ 10 10)

)

Expression

MoreExpressions

ApplicationExpression
(+ 25 25)

This tree is similar to the previous tree, except instead of the subexpressions
of the ﬁrst application expression being simple primitive expressions, they are
now application expressions
...
)
To evaluate the output application, we need to evaluate all the subexpressions
...
The second
subexpression, (∗ 10 10), evaluates to 100, and the third expression, (+ 25 25),
evaluates to 50
...

Exercise 3
...
Draw a parse tree for the Scheme expression (+ 100 (∗ 5 (+ 5 5)))
and show how it is evaluated
...
2
...
After making your prediction, try evaluating the expression in DrRacket
...

a
...
(+ 1120)
c
...
(= (+ 10 20) (∗ 15 (+ 5 5)))
e
...
(+ + <)
Exercise 3
...
For each question, construct a Scheme expression and evaluate it
in DrRacket
...
How many seconds are there in a year?
b
...
For what fraction of your life have you been in school?

44

3
...
Deﬁnitions

Exercise 3
...
Construct a Scheme expression to calculate the distance in inches
that light travels during the time it takes the processor in your computer to execute one cycle
...
Hence, light travels at 299, 792, 458 meters per second
...
One hertz means once per second, so 1 GHz means the
processor executes 1,000,000,000 cycles per second
...
Note that Scheme performs calculations exactly, so the result will be displayed as a fraction
...
)

3
...
A deﬁnition introduces a new name and gives it a value:
Deﬁnition ::⇒ (deﬁne Name Expression)
After a deﬁnition, the N ame in the deﬁnition is now associated with the value of
the expression in the deﬁnition
...

A name can be any sequence of letters, digits, and special characters (such as
−, >, ?, and !) that starts with a letter or special character
...
We don’t
recommend using some of these names in your programs, however! A good programmer will pick names that are easy to read, pronounce, and remember, and
that are not easily confused with other names
...
(Alert
readers should be worried that we need a more precise deﬁnition of the meaning
of deﬁnitions to know what it means for a value to be associated with a name
...
)
Below we deﬁne speed-of-light to be the speed of light in meters per second,
deﬁne seconds-per-hour to be the number of seconds in an hour, and use them
to calculate the speed of light in kilometers per hour:

> (deﬁne speed-of-light 299792458)
> speed-of-light
299792458

> (deﬁne seconds-per-hour (∗ 60 60))
> (/ (∗ speed-of-light seconds-per-hour) 1000)
1079252848 4/5

Chapter 3
...
6

45

Procedures

In Chapter 1 we deﬁned a procedure as a description of a process
...
Section 3
...
1 introduced some of Scheme’s primitive procedures
...

Procedures are similar to mathematical functions in that they provide a mapping between inputs and outputs, but they differ from mathematical functions
in two important ways:
State
...
This means that even when the same procedure is applied to the
same inputs, the output produced may vary
...
State makes procedures
much harder to reason about
...

Resources
...
The most important resources are space (memory) and time
...
Each step of a
procedure requires some time to execute
...
We consider this
throughout this book, and in particular in Chapter 7
...

3
...
1

Making Procedures

Scheme provides a general mechanism for making a procedure:
Expression
::⇒ ProcedureExpression
ProcedureExpression ::⇒ (lambda (Parameters) Expression)
Parameters
::⇒ | Name Parameters
Evaluating a ProcedureExpression produces a procedure that takes as inputs the
Parameters following the lambda
...
The body of the resulting procedure is the Expression, which is not
evaluated until the procedure is applied
...
This means anywhere an Expression is used we can create a new procedure
...
6
...

(lambda (a b) (+ a b))
Procedure that takes two inputs, and produces the sum of the input values
as its output
...
The result of
applying this procedure to any argument is always 0
...

This is an example of a higher-order procedure
...

This can be confusing, but is also very powerful
...
6
...
In Section 3
...
2, we saw the
syntax and evaluation rule for an ApplicationExpression when the procedure
to be applied is a primitive procedure
...
In this case, the ﬁrst Expression evaluates to a procedure that was
created using a ProcedureExpression, so the ApplicationExpression becomes:
ApplicationExpression ::⇒
((lambda (Parameters)Expression) MoreExpressions)
(The underlined part is the replacement for the ProcedureExpression
...
These expressions are known as the operands of the application
...
There must be exactly one expression in the MoreExpressions corresponding to each name in the
parameters list
...
Finally, evaluate the expression that is the body of the
procedure
...

Example 3
...
To evaluate the application, we evaluate all the
subexpressions and apply the value of the ﬁrst subexpression to the values of

Chapter 3
...
The ﬁrst subexpression evaluates to a procedure
that takes one parameter named x and has the expression body (∗ x x)
...

To evaluate the application we bind the ﬁrst parameter, x, to the value of the
ﬁrst operand, 2, and evaluate the procedure body, (∗ x x)
...
This is an application of the primitive
multiplication procedure
...

The procedure in our example, (lambda (x) (∗ x x)), is a procedure that takes a
number as input and as output produces the square of that number
...
5) to give this procedure a name so
we can reuse it:
(deﬁne square (lambda (x) (∗ x x)))
This deﬁnes the name square as the procedure
...
2: Make adder
The expression
((lambda (a)
(lambda (b) (+ a b)))
3)
evaluates to a procedure that adds 3 to its input
...
By using deﬁne, we can give these procedures sensible names:
(deﬁne make-adder
(lambda (a)
(lambda (b) (+ a b))))
Then, (deﬁne add-three (make-adder 3)) deﬁnes add-three as a procedure that
takes one parameter and outputs the value of that parameter plus 3
...
Since we commonly deﬁne new procedures, Scheme provides a condensed notation for deﬁning a procedure6 :
6 The condensed notation also includes a begin expression, which is a special form
...
We describe
the begin special form in Chapter 9
...
7
...
For example,
(deﬁne square (lambda (x) (∗ x x)))
can be written equivalently as:
(deﬁne (square x) (∗ x x))
Exercise 3
...
Deﬁne a procedure, cube, that takes one number as input and
produces as output the cube of that number
...
6
...
The output should be the total cost, which is computed as the
price of the item plus the sales tax on the item, which is its price times the sales
tax rate
...
05) should evaluate to 13
...

3
...
For example, we may want a procedure that takes two numbers as
inputs and evaluates to the greater of the two inputs
...
The IfExpression expression provides a way
of using the result of one expression to select which of two possible expressions
to evaluate:
Expression ::⇒ IfExpression
IfExpression ::⇒ (if ExpressionPredicate
ExpressionConsequent
ExpressionAlternate )
The IfExpression replacement has three Expression terms
...
To evaluate an IfExpression, ﬁrst evaluate the predicate expression,
ExpressionPredicate
...
If the predicate expression evaluates
to false, the value of the IfExpression is the value of ExpressionAlternate , the alternate expression, and the consequent expression is not evaluated at all
...
If the value of the predicate
is anything other than false, the consequent expression is used
...

special form

The if expression is a special form
...
Instead, we have

Chapter 3
...
The reason a special evaluation rule
is needed is because we do not want all the subexpressions to be evaluated
...
With the if special form evaluation rule, the predicate
expression is always evaluated ﬁrst and only one of the following subexpressions
is evaluated depending on the result of evaluating the predicate expression
...
For example,
(if (> 3 4) (∗ + +) 7)
evaluates to 7 even though evaluating the subexpression (∗ + +) would produce
an error
...

Example 3
...
The deﬁnition,
(deﬁne (bigger a b) (if (> a b) a b))
is a condensed procedure deﬁnition
...
This is a procedure that takes two inputs, named
a and b
...
The
predicate expression compares the value that is bound to the ﬁrst parameter, a,
with the value that is bound to the second parameter, b, and evaluates to true if
the value of the ﬁrst parameter is greater, and false otherwise
...
When the predicate evaluates to false, the value of
the if expression is the value of the alternate expression, b
...

Exercise 3
...
Follow the evaluation rules to evaluate the Scheme expression:
(bigger 3 4)
where bigger is the procedure deﬁned above
...
)

50

3
...
Evaluation Rules

Exercise 3
...
Deﬁne a procedure, xor, that implements the logical exclusive-or
operation
...
Otherwise, it outputs false
...

Exercise 3
...
Deﬁne a procedure, absvalue, that takes a number as input and
produces the absolute value of that number as its output
...

Exercise 3
...
Deﬁne a procedure, bigger-magnitude, that takes two inputs, and
outputs the value of the input with the greater magnitude (that is, absolute distance from zero)
...

Exercise 3
...
Deﬁne a procedure, biggest, that takes three inputs, and produces
as output the maximum value of the three inputs
...
Find at least two different ways to deﬁne biggest, one using
bigger, and one without using it
...
8

Evaluation Rules

Here we summarize the grammar rules and evaluation rules
...

Program
ProgramElement

::⇒
::⇒

| ProgramElement Program
Expression | Deﬁnition

A program is a sequence of expressions and deﬁnitions
...

Deﬁnition

::⇒

(deﬁne (Name Parameters) Expression)

Abbreviation for
(deﬁne Name (lambda Parameters) Expression)
Expression

::⇒

PrimitiveExpression | NameExpression
| ApplicationExpression
| ProcedureExpression | IfExpression

The value of the expression is the value of the replacement
expression
...
A primitive expression evaluates to
its pre-deﬁned value
...
Programming
NameExpression

::⇒

Name

Evaluation Rule 2: Names
...

ApplicationExpression ::⇒

(Expression MoreExpressions)

Evaluation Rule 3: Application
...
Evaluate all the subexpressions;
b
...

MoreExpressions
ProcedureExpression
Parameters

::⇒
::⇒
::⇒

| Expression MoreExpressions
(lambda (Parameters) Expression)
| Name Parameters

Evaluation Rule 4: Lambda
...

IfExpression

::⇒

(if ExpressionPredicate
ExpressionConsequent
ExpressionAlternate )

Evaluation Rule 5: If
...

The evaluation rule for an application (Rule 3b) uses apply to perform the application
...

To apply a primitive procedure, just do it
...

To apply a constructed procedure, evaluate the body of the procedure with
each parameter name bound to the corresponding input expression value
...
Thus,
the evaluation rules are deﬁned using the application rules, which are deﬁned
using the evaluation rules! This appears to be a circular deﬁnition, but as with
the grammar examples, it has a base case
...
g
...
Hence, the process of evaluating an expression will sometimes ﬁnish and when it does we end with the value of the
expression
...

52

3
...
9
...
In fact (as we show in
Chapter 12), we have covered enough to express every possible computation!
We just need to combine these constructs in more complex ways to perform
more interesting computations
...

4

Problems and Procedures
A great discovery solves a great problem, but there is a grain of discovery in the solution of
any problem
...

George P´ lya, How to Solve It
o

Computers are tools for performing computations to solve problems
...

4
...

Once the question is answered or the obstacle circumvented, the problem is
solved and we can declare victory and move on to the next one
...
We don’t just want to solve one instance of a problem, we want an
algorithm that can solve all instances of a problem
...
Recall from Chapter 1, that a
procedure is a precise description of a process and a procedure is guaranteed
to always ﬁnish is called an algorithm
...
Although the name algorithm was adopted after al-Khw¯ rizm¯’s book, algoa
ı
rithms go back much further than that
...
1)
...
There are inﬁnitely many possible
inputs that each specify different instances of the problem; a general solution to
the problem is a procedure that ﬁnds the best route for all possible inputs
...
A procedure
1 Actually

ﬁnding a general algorithm that does without needing to essentially try all possible
routes is a challenging and interesting problem, for which no efﬁcient solution is known
...
2
...
1
...
1
...

Our goal in solving a problem is to devise a procedure that takes inputs that
deﬁne a problem instance, and produces as output the solution to that problem
instance
...

There is no magic wand for solving problems
...
The creative challenge is to ﬁnd the simpler subproblems
that can be combined to solve the original problem
...

divide-and-conquer

The following sections describe a two key forms of divide-and-conquer problem
solving: composition and recursive problem solving
...

4
...
Each step can be deﬁned by one procedure, and the two
procedures can be combined to create one procedure that solves the problem
...
2 shows a composition of two functions, f and g
...

Input

f

g

Output

Figure 4
...
Composition
...
The written order appears to be reversed from the picture in Figure 4
...
This is because we apply a procedure to the values of its subexpressions:
2 Although procedures can produce more than one output, we limit our discussion here to procedures that produce no more than one output
...

Chapter 4
...
So, the inner subexpression (f x) is evaluated ﬁrst since the evaluation rule for the outer application expression is to ﬁrst
evaluate all the subexpressions
...
This works for any two
procedures that both take a single input parameter
...

4
...
1

Procedures as Inputs and Outputs

All the procedure inputs and outputs we have seen so far have been numbers
...
A higher-order procedure is a procedure that takes other procedures as inputs or that produces a procedure as its output
...

We can create a generic composition procedure by making f and g parameters:
(deﬁne fog (lambda (f g x) (g (f x))))
The fog procedure takes three parameters
...
The third parameter is a value that can be the input to the
ﬁrst procedure
...
In the second example, the ﬁrst parameter is the procedure produced by the lambda expression (lambda (x) (+ x 1))
...
We use
a deﬁnition to name this procedure inc (short for increment):
(deﬁne inc (lambda (x) (+ x 1)))
A more useful composition procedure would separate the input value, x, from
the composition
...
Hence, the result of applying fcompose to two procedures is not a simple
value, but a procedure
...

3 We name our composition procedure fcompose to avoid collision with the built-in compose procedure that behaves similarly
...
3
...
1
...
Assume fcompose and inc are deﬁned as above
...
((fcompose square square) 3)
b
...
((fcompose (lambda (x) (∗ x 2)) (lambda (x) (/ x 2))) 1120)
d
...
2
...

Exercise 4
...
Deﬁne a procedure fcompose3 that takes three procedures as input, and produces as output a procedure that is the composition of the three
input procedures
...
Deﬁne fcompose3 two different ways: once without using fcompose,
and once using fcompose
...
4
...
Deﬁne a f2compose procedure that composes two procedures
where the ﬁrst procedure takes two inputs, and the second procedure takes one
input
...

4
...
A particularly useful variation on this is when we can break a problem into a smaller
version of the original problem
...
3
...
Problems and Procedures

f
Input

Figure 4
...
Circular Composition
...
This never stops — no output is ever produced and
the interpreter will keep evaluating applications of f until it is stopped or runs
out of memory
...
To make progress, each subsequent application should have a smaller
input
...
The stopping condition is called the
base case, similarly to the grammar rules in Section 2
...
In our grammar examples, the base case involved replacing the nonterminal with nothing (e
...
,
MoreDigits ::⇒ ) or with a terminal (e
...
, Noun ::⇒ Alice)
...
When the input is a number, this
is often (but not necessarily) when the input is 0 or 1
...
If it does, the consequent expression is the known answer
for the base case
...
That application needs to make progress towards reaching
the base case
...
If the base case is for 0, and the original input is a positive
number, one way to get closer to the base case input is to subtract 1 from the
input value with each recursive application
...
4
...
For the base case
application, a result is returned to the previous application
...
Keeping track of where we are in a
recursive evaluation is similar to keeping track of the subnetworks in an RTN
traversal
...

Here is the corresponding procedure:
(deﬁne g
(lambda (n)
(if (= n 0) 1 (g (− n 1)))))
Unlike the earlier circular f procedure, if we apply g to any non-negative integer
it will eventually produce an output
...

4 Curious

readers should try entering this deﬁnition into a Scheme interpreter and evaluating (f

0)
...

58

4
...
Recursive Problem Solving

2

1

0
1
1

1

Figure 4
...
Recursive Composition
...
The subexpression, (− n
1) evaluates to 1, so the result is the result of applying g to 1
...
The next application leads to the application, (g
0)
...
The consequent expression is just 1, so no further applications of g
are performed and this is the result of the application (g 0)
...

We can think of the recursive evaluation as winding until the base case is reached,
and then unwinding the outputs back to the original application
...
To solve interesting problems with recursive
procedures, we need to accumulate results as the recursive applications wind
or unwind
...
1 and 4
...
Example 4
...

Example 4
...
The second card can be any of the cards except for
the card that is the top card, so there are 51 possible choices for the second card
...

52 ∗ 51 ∗ 50 ∗ · · · ∗ 2 ∗ 1
factorial

This is known as the factorial function (denoted in mathematics using the exclamation point, e
...
, 52!)
...
Problems and Procedures

59

Evaluating (factorial 52) produces the number of arrangements of a 52-card deck:
a sixty-eight digit number starting with an 8
...
The only difference is the alternative expression
for the if expression: in g we used (g (− n 1)); in factorial we added the outer
application of ∗: (∗ n (factorial (− n 1)))
...

Exercise 4
...
How many different ways are there of choosing an unordered 5card hand from a 52-card deck?
This is an instance of the “n choose k” problem (also known as the binomial
coefﬁcient): how many different ways are there to choose a set of k items from
n items
...
, and n − k + 1 ways to choose the kth item
...
The number of possible ways to
order the k items is k!, so we can compute the number of ways to choose k items
from a set of n items as:
n ∗ ( n − 1) ∗ · · · ∗ ( n − k + 1)
n!
=
k!
(n − k)!k!
a
...

b
...

c
...

In a standard 52-card deck, there are 13 cards of each of the four suits
...

Exercise 4
...
Reputedly, when Karl Gauss was in elementary school his teacher
assigned the class the task of summing the integers from 1 to 100 (e
...
, 1 + 2 +
3 + · · · + 100) to keep them busy
...
Had he been a computer scientist, however, and had access to a
Scheme interpreter in the late 1700s, he might have instead deﬁned a recursive
procedure to solve the problem
...
For example, (gauss-sum 100) should evaluate to 5050
...
7
...
6) and factorial
...
g
...
With your accumulate procedure, ((accumulate +)
100) should evaluate to 5050 and ((accumulate ∗) 3) should evaluate to 6
...
3
...

Hint: since your procedure should produce a procedure as its output, it could
start like this:
(deﬁne (accumulate f )
(lambda (n)
(if (= n 1) 1

...
2: Find Maximum
Consider the problem of deﬁning a procedure that takes as its input a procedure,
a low value, and a high value, and outputs the maximum value the input procedure produces when applied to an integer value between the low value and high
value input
...
To ﬁnd the maximum, the
ﬁnd-maximum procedure should evaluate the input procedure f at every integer value between the low and high, and output the greatest value found
...
For the base case, we need a case so simple we already
know the answer
...
Then, there
is only one value to use, and we know the value of the maximum is (f low)
...

How do we make progress towards the base case? Suppose the value of high is
equal to the value of low plus 1
...
We could select it using the bigger procedure
(from Example 3
...
We can extend this to the case
where high is equal to low plus 2:
(bigger (f low) (bigger (f (+ low 1)) (f (+ low 2))))
The second operand for the outer bigger evaluation is the maximum value of the
input procedure between the low value plus one and the high value input
...
This works whether high is
equal to (+ low 1), or (+ low 2), or any other value greater than high
...
8
...
Problems and Procedures

61

its input, we need to evaluate at all numbers in the range, not just the integers
...
We can approximate this, however, by evaluating the function at
many numbers in the range
...
As the value of epsilon decreases, ﬁnd-maximum-epsilon
should evaluate to a value that approaches the actual maximum value
...
5 x))) 1 10 1)
evaluates to 7
...
And,
(ﬁnd-maximum-epsilon (lambda (x) (∗ x (− 5
...
01)
evaluates to 7
...

Exercise 4
...
[ ] The ﬁnd-maximum procedure we deﬁned evaluates to the
maximum value of the input function in the range, but does not provide the
input value that produces that maximum output value
...

For example, (ﬁnd-maximum-input inc 1 10) should evaluate to 10 and (ﬁndmaximum-input (lambda (x) (∗ x (− 5
...

Exercise 4
...
[ ] Deﬁne a ﬁnd-area procedure that takes as input a function
f , a low range value low, a high range value high, and an increment epsilon,
and produces as output an estimate for the area under the curve produced by
the function f between low and high using the epsilon value to determine how
many regions to evaluate
...
3: Euclid’s Algorithm
In Book 7 of the Elements, Euclid describes an algorithm for ﬁnding the greatest
common divisor of two non-zero integers
...
For example, the greatest common divisor of 150 and 200 is 50 since
(/ 150 50) evaluates to 3 and (/ 200 50) evaluates to 4, and there is no number
greater than 50 that can evenly divide both 150 and 200
...
For example,
(modulo 6 3) evaluates to 0 and (modulo 7 3) evaluates to 1
...
If (modulo a b) evaluates to 0 then b is the greatest common divisor of a
and b
...
If (modulo a b) evaluates to a non-zero integer r, the greatest common
divisor of a and b is the greatest common divisor of b and r
...
3
...
For the
gcd-euclid procedure, the base case corresponds to the ﬁrst property above
...
The alternate expression, (gcd-euclid b (modulo a b)), is the recursive application
...
We do not need to combine
the result of the recursive application with some other value as was done in the
factorial deﬁnition, the result of the recursive application is the ﬁnal result
...
When no further evaluation is necessary to get from
the result of the recursive application to the ﬁnal result, a recursive deﬁnition is
said to be tail recursive
...
Since the ﬁnal call produces the ﬁnal result, there is no need for the
interpreter to unwind the recursive calls to produce the answer
...
11
...

Exercise 4
...
[ ] Provide a convincing argument why the evaluation of (gcdeuclid a b) will always ﬁnish when the inputs are both positive integers
...
13
...

To be tail recursive, the expression containing the recursive application cannot
be part of another application expression
...
)
Exercise 4
...
Provide a tail recursive deﬁnition of ﬁnd-maximum
...
15
...

Exploration 4
...
It
is known as Heron’s method after the Greek mathematician Heron of Alexandria
who lived in the ﬁrst century AD who described the method, although it was also
known to the Babylonians many centuries earlier
...
We
name our procedure gcd-euclid to avoid a clash with the build-in procedure
...
Problems and Procedures

more general method for estimating functions based on their derivatives known
as Netwon’s method, of which Heron’s method is a specialization
...
For many numbers (including 2), the square
root is irrational, so the best we can hope for with is a good approximation
...

Heron’s method works by starting with an arbitrary guess, g0
...

The deﬁnition is recursive since we compute gn as a function of gn−1 , so we can
deﬁne a recursive procedure that computes Heron’s method
...
It takes three inputs: the target number, a, the number of guesses to
make, n, and the value of the ﬁrst guess, g
...
The choice doesn’t really matter
— the method works with any starting guess (but will reach a closer estimate
quicker if the starting guess is good)
...
So, we
can deﬁne a ﬁnd-sqrt procedure that takes two inputs, the target number and
the number of guesses to make, and outputs an approximation of the square
root of the target number
...
4142135623730951

Heron of
Alexandria

64

4
...
Evaluating Recursive Applications

The actual square root of 2 is 1
...
, so our estimate is correct
to 16 digits after only ﬁve guesses
...
Instead, what is important to a square root
user is how close the estimate is to the actual value
...
Hence, we can stop when the square of
the guess is close enough to the target value
...
Otherwise, our deﬁnitions are the same as before
...
We are making the problem smaller by making each successive guess closer
to the required answer
...
01)))
2
...
0000001)))
2
...
How accurate is the built-in sqrt procedure?
b
...
Why doesn’t the built-in procedure do better?

4
...
It may be confusing, however, to see
that this works because of the apparent circularity of the procedure deﬁnition
...
The evaluation and application rules refer to the rules summary in Section 3
...
We ﬁrst
show the complete evaluation following the substitution model evaluation rules

65

Chapter 4
...

Stepping through even a fairly simple evaluation using the evaluation rules is
quite tedious, and not something humans should do very often (that’s why we
have computers!) but instructive to do once to understand exactly how an expression is evaluated
...
A Scheme interpreter is free to evaluate
them in any order
...
The value produced by an evaluation does not depend on
the order in which the subexpressions are evaluated
...
So, 2 represents the Scheme expression that evaluates to the number 2
...
Once we introduce side effects
and mutation, it is no longer the case, and expressions can produce different results depending on
the order in which they are evaluated
...
4
...
)))) 0)))

33

(* 2 (* 1 ((lambda (n) (if (= n 0) 1 (* n (factorial (- n 1))))) 0)))

34

(* 2 (* 1 (if (= 0 0) 1 (* 0 (factorial (- 0 1))))))

35

(* 2 (* 1 (if (= 0 0) 1 (* 0 (factorial (- 0 1))))))

36

(* 2 (* 1 (if (= 0 0) 1 (* 0 (factorial (- 0 1))))))

31

Evaluation Rule 4, Lambda
Evaluation Rule 3(b), Application Rule 2
Evaluation Rule 5(a): If predicate
Evaluation Rule 3(a): Application subexpressions
Evaluation Rule 1: Primitives
37

(* 2 (* 1 (if (= 0 0) 1 (* 0 (factorial (- 0 1))))))

38

(* 2 (* 1 (if true 1 (* 0 (factorial (- 0 1))))))

Evaluation Rule 3(b): Application, Application Rule 1

(* 2 (* 1 1))
(* 2 (* 1 1))
(* 2 1)

39
40
41
42

2

Evaluation Rule 5(b): If consequent
Evaluation Rule 1: Primitives
Evaluation Rule 3(b): Application, Application Rule 1
Evaluation Rule 3(b): Application, Application Rule 1
Evaluation ﬁnished, no unevaluated expressions remain
...
If the if expression were evaluated like a regular application all subexpressions would be evaluated, and the alternative expression containing the recursive call would never ﬁnish evaluating! Since the evaluation rule for if evaluates
the predicate expression ﬁrst and does not evaluate the alternative expression
when the predicate expression is true, the circularity in the deﬁnition ends when
the predicate expression evaluates to true
...
In the example,
this is the base case where (= n 0) evaluates to true and instead of producing
another recursive call it evaluates to 1
...
The structure of the evaluation is clearer from just the
most revealing steps:
1
17
31
40
41
42

(factorial 2)
(* 2 (factorial 1))
(* 2 (* 1 (factorial 0)))
(* 2 (* 1 1))
(* 2 1)
2

Step 1 starts evaluating (factorial 2)
...
To evaluate (factorial 2), we follow the evaluation rules, eventually reaching the body
expression of the if expression in the factorial deﬁnition in Step 17
...
At Step 17,
the ﬁrst evaluation is in progress, but to complete it we need the value resulting
from the second recursive application
...
At this point, the evaluation of (factorial 2) is stuck in

Chapter 4
...
The evaluation of the (factorial 1) application leads to the (factorial 0) subexpression,
which must be evaluated before the (factorial 1) evaluation can complete
...
Now, the (factorial 1) evaluation can complete, producing 1
as shown in Step 41
...

Each recursive application can be tracked using a stack, similarly to processing RTN subnetworks (Section 2
...
A stack has the property that the ﬁrst item
pushed on the stack will be the last item removed—all the items pushed on
top of this one must be removed before this item can be removed
...
To
ﬁnish evaluating the ﬁrst expression, all of its component subexpressions must
be evaluated
...

Exercise 4
...
This exercise tests your understanding of the (factorial 2) evaluation
...
In step 5, the second part of the application evaluation rule, Rule 3(b), is used
...
In step 11, the ﬁrst part of the application evaluation rule, Rule 3(a), is used
...
In step 25, the ﬁrst part of the application evaluation rule, Rule 3(a), is used
...
To evaluate (factorial 3), how many times would Evaluation Rule 2 be used to
evaluate the name factorial?
e
...
17
...
For values where the evaluation would not ﬁnish, explain why
...
5

Developing Complex Programs

To develop and use more complex procedures it will be useful to learn some
helpful techniques for understanding what is going on when procedures are
evaluated
...
Wise programmers build programs incrementally, by writing and testing small components one at a time
...
The key to debugging effectively is to be systematic and thoughtful
...
Thoughtless debugging can be very frustrating, and is unlikely to lead to a correct program
...
5
...
Ensure you understand the intended behavior of your procedure
...

2
...
Try your
program on simple inputs ﬁrst
...
Make changes to your procedure and retest it
...

First actual bug
Grace Hopper’s notebook, 1947

For more complex programs, follow this strategy at the level of sub-components
...
Break your program into several procedures so you can
test and debug each procedure independently
...

DrRacket provides many useful and powerful features to aid debugging, but the
most important tool for debugging is using your brain to think carefully about
what your program should be doing and how its observed behavior differs from
the desired behavior
...

4
...
1

Printing

One useful procedure built-in to DrRacket is the display procedure
...
Instead of producing an output, it prints out the
value of the input (it will appear in purple in the Interactions window)
...

For example, if we add a (display n) expression at the beginning of our factorial
procedure we can see all the intermediate calls
...
The newline procedure
prints a new line; it takes no inputs and produces no output
...

It takes one or more inputs
...
The string can include special ~a markers that print
out values of objects inside the string
...
Another special marker, ~n, prints out a new line inside the string
...
Problems and Procedures

69

(deﬁne (factorial n)
(printf "Enter factorial: ˜a˜n" n)
(if (= n 0) 1 (∗ n (factorial (− n 1)))))
The display, printf , and newline procedures do not produce output values
...
A side effect is something that
changes the state of a computation
...
Side effects make reasoning about what programs do
much more complicated since the order in which events happen now matters
...

4
...
2

Tracing

DrRacket provides a more automated way to observe applications of procedures
...

To use tracing, it is necessary to ﬁrst load the tracing library by evaluating this
expression:
(require racket/trace)
This deﬁnes the trace procedure that takes one input, a constructed procedure
(trace does not work for primitive procedures)
...
If there are other applications before the ﬁrst application
ﬁnishes evaluating, these will be printed indented so it is possible to match up
the beginning and end of each application evaluation
...
The outputs of each of these applications is lined up vertically below the application entry trace
...
2: Recipes for π
The value π is the deﬁned as the ratio between the circumference of a circle and
its diameter
...
5
...
The more terms that are included, the closer
the computed value will be to the actual value of π
...
[ ] Deﬁne a procedure compute-pi that takes as input n, the number of terms
to include and outputs an approximation of π computed using the ﬁrst n
terms of the Gregory-Leibniz series
...
For higher terms, use the built-in
procedure exact->inexact to see the decimal value
...
1414926535900434
...

M¯ dhava discovered another series for computing the value of π that converges
a
much more quickly:
π=

√

12 ∗ (1 −

1
1
1
1
−
+
−
...
14159265359
...
[ ] Deﬁne a procedure cherry-pi that takes as input n, the number of terms
to include and outputs an approximation of π computed using the ﬁrst n
terms of the M¯ dhava series
...
)
a
To deﬁne faster-pi, ﬁrst deﬁne two helper functions: faster-pi-helper, that takes
one input, n, and computes the sum of the ﬁrst n terms in the series without the
√
12 factor, and faster-pi-term that takes one input n and computes the value
of the nth term in the series (without alternating the adding and subtracting)
...
Then, deﬁne faster-pi as:
(deﬁne (faster-pi terms) (∗ (sqrt 12) (faster-pi-helper terms)))
This uses the built-in sqrt procedure that takes one input and produces as output an approximation of its square root
...
1 for ways to compute a more accurate approximation for the
square root of 12)
...

The built-in expt procedure takes two inputs, a and b, and produces ab as its
output
...

c
...
This point is named for
Richard Feynman, who quipped that he wanted to memorize π to that point
so he could recite it as “
...

7 To test its accuracy, try evaluating (square

(sqrt 12))
...
Problems and Procedures

71

Exploration 4
...
For this exploration, we
consider how to develop a winning strategy for some two-player games
...
The game ends when the player who’s turn it is cannot move; the
other player wins
...

One approach for developing a winning strategy is to work backwards from the
winning position
...
If the game reaches a winning position for player 1, then player 1 wins
...
Continuing backwards, if the game reaches a position where
it is player 1’s move, and there is a move that leads to a position where all possible moves for player 2 lead to a winning position for player 1, then player 1 is
guaranteed to win
...
Variants on Nim have been played
widely over many centuries, but no one is quite sure where the name comes
from
...

In this version of Nim, the game starts with a pile of 21 stones
...
The player who removes the last stone
wins, since the other player cannot make a valid move on the following turn
...
What should the player who moves ﬁrst do to ensure she can always win the
game? (Hint: start with the base case, and work backwards
...
)
b
...
For
what values of n should the ﬁrst player always win (when the game starts
with 21 stones)?
A standard Nim game starts with three heaps
...
We can describe the state of a 3-heap game of Nim using three
numbers, representing the number of stones in each heap
...
8
c
...
Which player should win if the starting state is (2 2 2)?
e
...
[ ] Describe a strategy for always winning a winnable game of Nim starting
from any position
...

9 If you get stuck, you’ll ﬁnd many resources about Nim on the Internet; but, you’ll get a lot more
out of this if you solve it yourself
...
5
...
10 The game is played using a single Queen on a arbitrarily large chessboard as shown in Figure 4
...

«
0

1

2

5

6

7

Figure 4
...
Cornering the Queen
...
As with the other games,
the last player to make a legal move wins
...
Hence,
the player who moves the Queen onto the wins the game
...

So, the Queen in the picture is on square (4 7)
...
Identify all the starting squares for which the ﬁrst played to move can win
right away
...
)
h
...
Explain why there
is no way you can avoid losing the game
...
Given the shown starting position (with the Queen at (4 7), would you rather
be the ﬁrst or second player?
j
...
Explain from
which starting positions it is not possible to win (assuming the other player
always makes the right move)
...
[ ] Deﬁne a variant of Nim that is essentially the same as the “Corner the
Queen” game
...
)
Developing winning strategies for these types of games is similar to deﬁning a
recursive procedure that solves a problem
...

10 Described in Martin Gardner, Penrose Tiles to Trapdoor Ciphers
...

Chapter 4
...
6

73

Summary

By breaking problems down into simpler problems we can develop solutions
to complex problems
...
When we deﬁne a procedure to solve a
problem this way, it needs to have a predicate expression to determine when the
base case has been reached, a consequent expression that provides the value for
the base case, and an alternate expression that deﬁnes the solution to the given
input as an expression using a solution to a smaller input
...
Be optimistic! Assume you can solve it
...
Think of the simplest version of the problem, something you can already
solve
...

3
...
This is the recursive
case
...
Combine the base case and the recursive case to solve the problem
...
The problem size is usually reduced is by subtracting 1 from one of the
inputs
...
For problems
involving complex data, the same strategy will work but with different base cases
and ways to shrink the problem size
...

Albert Einstein

74

4
...
Summary

5

Data

From a bit to a few hundred megabytes, from a microsecond to half an hour of computing
confronts us with the completely bafﬂing ratio of 109 !
...

Edsger Dijkstra

For all the programs so far, we have been limited to simple data such as numbers
and Booleans
...
As we saw in
Chapter 1, we can represent all discrete data using just (enormously large) whole
numbers
...
But, it would be very difﬁcult to design and understand
computations that use numbers to represent complex data
...
We want
to represent data in ways that allow us to think about the problem we are trying
to solve, rather than the details of how data is represented and manipulated
...

5
...
Internally, all data is stored just
as a sequence of bits, so the type of the data is important to understand what it
means
...

A datatype deﬁnes a set (often inﬁnite) of possible values
...
The Number type includes the
inﬁnite set of all whole numbers (it also includes negative numbers and rational
numbers)
...
On any real computer, the number of possible values of any data type is always ﬁnite
...

The type of a value determines what can be done with it
...
A
Boolean can be used as the ﬁrst subexpression of an if expression and as the

datatype

76

5
...
Types

input to the not procedure (—not— can also take a Number as its input, but for
all Number value inputs the output is false), but cannot be used as the input to
+, ∗, or =
...
There
are inﬁnitely many different types of Procedures, since the type of a Procedure
depends on its input and output types
...
We denote
this type as:
Number × Number → Number
The inputs to the procedure are shown on the left side of the arrow
...
2 The output type is
given on the right side of the arrow
...
How do we know the inputs must be Numbers and the output is
a Number?
The body of the bigger procedure is an if expression with the predicate expression (> a b)
...
The
type of the > procedure is Number × Number → Boolean
...
This means the
input values to bigger must both be Numbers
...

Starting with the primitive Boolean, Number, and Procedure types, we can build
arbitrarily complex datatypes
...

Exercise 5
...
Describe the type of each of these expressions
...
17
b
...
((lambda (a) (> a 0)) 3)
d
...
(lambda (a) a)
1 The primitive procedure equal? is a more general comparison procedure that can take as inputs any two values, so could be used to compare Boolean values
...

2 The notation using × to separate input types makes sense if you think about the number of
different inputs to a procedure
...
Each Boolean input can be one of two possible
values
...

77

Chapter 5
...
2
...

a
...
Number → Number
c
...
Number → (Number → (Number → Number))

5
...
We draw a Pair as two boxes,
each containing a value
...
Here is a Pair where the
ﬁrst cell has the value 37 and the second cell has the value 42:

37

42

Scheme provides built-in procedures for constructing a Pair, and for extracting
each cell from a Pair:
cons: Value × Value → Pair
Evaluates to a Pair whose ﬁrst cell is the ﬁrst input and second cell is the
second input
...

car: Pair → Value
Evaluates to the ﬁrst cell of the input, which must be a Pair
...

These rather unfortunate names come from the original LISP implementation
on the IBM 704
...
The name car is short for
“Contents of the Address part of the Register” and the name cdr (pronounced
“could-er”) is short for “Contents of the Decrement part of the Register”
...
6)
...

We can construct the Pair shown above by evaluating (cons 37 42)
...
42)
...

> (deﬁne mypair (cons 37 42))
> (car mypair)
37

> (cdr mypair)
42

The values in the cells of a Pair can be any type, including other Pairs
...
2
...
2), and (cdr doublepair)
evaluates to the Pair (3
...

We can compose multiple car and cdr applications to extract components from
nested pairs:

> (cdr (car doublepair))
2

> (car (cdr doublepair))
3

> ((fcompose cdr cdr) doublepair)

fcompose from Section 4
...
1

4

> (car (car (car doublepair)))
car: expects argument of type ; given 1
The last expression produces an error when it is evaluated since car is applied
to the scalar value 1
...
Hence, an error results when we attempt to apply car to
a scalar value
...
g
...
g
...
Every
procedure expects a certain type of inputs, and typically produces an error when
it is applied to values of the wrong type
...
For instance, try drawing (cons 1 (cons 2 (cons 3 (cons 4 5)))) this way
...

This is the structure of doublepair shown using arrows:

Using arrows to point to cell contents allows us to draw arbitrarily complicated
data structures such as (cons 1 (cons 2 (cons 3 (cons 4 5)))), keeping the cells
reasonable sizes:

1

2

3

4

5

Chapter 5
...
3
...

a
...
(car (car (car tpair)))
c
...
(car (cdr (cdr tpair)))
Exercise 5
...
Write expressions that extract each of the four elements from
fstruct deﬁned by (deﬁne fstruct (cons 1 (cons 2 (cons 3 4))))
...
5
...

5
...
1

Making Pairs

Although Scheme provides the built-in procedures cons, car, and cdr for creating Pairs and accessing their cells, there is nothing magical about these procedures
...

Here is one way to deﬁne the pair procedures (we prepend an s to the names to
avoid confusion with the built-in procedures):
(deﬁne (scons a b) (lambda (w) (if w a b)))
(deﬁne (scar pair) (pair true))
(deﬁne (scdr pair) (pair false))
The scons procedure takes the two parts of the Pair as inputs, and produces as
output a procedure
...
If the selector is true, the
value of the if expression is the value of the ﬁrst cell; if the selector is false, it is
the value of the second cell
...

80

5
...
Pairs

Exercise 5
...
Convince yourself the deﬁnitions of scons, scar, and scdr above
work as expected by following the evaluation rules to evaluate
(scar (scons 1 2))
Exercise 5
...
Show the corresponding deﬁnitions of tcar and tcdr that provide
the pair selection behavior for a pair created using tcons deﬁned as:
(deﬁne (tcons a b) (lambda (w) (if w b a)))

5
...
2

Triples to Octuples

Pairs are useful for representing data that is composed of two parts such as a
calendar date (composed of a number and month), or a playing card (composed
of a rank and suit)
...

A triple has three components
...

Another way to make a triple would be to combine two Pairs
...

A triple is a Pair whose second cell is a Pair
...

A quintuple is a Pair whose second cell is a quadruple
...

Building from the simple Pair, we can construct tuples containing any number
of components
...
Data

81

Exercise 5
...
Deﬁne a procedure that constructs a quintuple and procedures
for selecting the ﬁve elements of a quintuple
...
9
...
Provide deﬁnitions of make-triple, tripleﬁrst, triple-second, and triple-third for this construct
...
3

Lists

In the previous section, we saw how to construct arbitrarily large tuples from
Pairs
...
For many applications, we want to be able to manage data of any length
such as all the items in a web store, or all the bids on a given item
...
We need a data
type that can hold any number of items
...

This seems to allow an any-uple to contain any number of elements
...
With only the deﬁnition above, there is no
way to construct an any-uple without already having one
...

Recall the grammar rules for MoreExpressions:
MoreExpressions ::⇒ Expression MoreExpressions
MoreExpressions ::⇒
The rule for constructing an any-uple is analogous to the ﬁrst MoreExpression
replacement rule
...
Since it is hard to type and read nothing in a program, Scheme
has a name for this value: null
...
It is also known as the empty list,
since it represents the List containing no elements
...

Using null, we can now deﬁne a List:
A List is either (1) null or (2) a Pair whose second cell is a List
...
3
...
Starting from null, we can create Lists of any length:
null evaluates to a List containing no elements
...

(cons 1 (cons 2 null)) evaluates to a List containing two elements
...

•
•
•
•

Scheme provides a convenient procedure, list, for constructing a List
...
The following expressions are equivalent to the corresponding
expressions above: (list), (list 1), (list 1 2), and (list 1 2 3)
...
Here is the structure resulting from (list 1 2 3):

2

1

3

There are three Pairs in the List, the second cell of each Pair is a List
...

Table 5
...

Exercise 5
...
For each of the following expressions, explain whether or not the
expression evaluates to a List
...

a
...
(cons 1 2)
c
...
(cons (cons (cons 1 2) 3) null)
e
...
(cons (list 1 2 3) 4)
Type

Output

cons

Value × Value → Pair

a Pair consisting of the two inputs

car

Pair → Value

the ﬁrst cell of the input Pair

cdr

Pair → Value

the second cell of the input Pair

list

zero or more Values → List

a List containing the inputs

null?

Value → Boolean

true if the input is null, otherwise false

pair?

Value → Boolean

true if the input is a Pair, otherwise false

list?

Value → Boolean

true if the input is a List, otherwise false

Table 5
...
Selected Built-In Scheme Procedures for Lists and Pairs
...
Data

5
...
Whereas most recursive procedures on inputs that are Numbers usually used 0 as the base case, for lists the
most common base case is null
...
This means we often break
problems involving Lists into ﬁguring out what to do with the ﬁrst element of
the List and the result of applying the recursive procedure to the rest of the List
...
Be very optimistic! Since lists themselves are recursive data structures,
most problems involving lists can be solved with recursive procedures
...
Think of the simplest version of the problem, something you can already
solve
...
For lists, this is usually the empty list
...
Consider how you would solve a big version of the problem by using the
result for a slightly smaller version of the problem
...
For lists, the smaller version of the problem is usually the rest (cdr)
of the List
...
Combine the base case and the recursive case to solve the problem
...
Section 5
...
2 generalizes these procedures
...
4
...

5
...
1

Procedures that Examine Lists

All of the example procedures in this section take a single List as input and produce a scalar value that depends on the elements of the List as output
...

Example 5
...
” For this procedure, the simplest version of the problem
is when the input is the empty list, null
...
So, the base case test is (null? p) and the output for the base case is 0
...
Recall from our deﬁnition that a List is either null or (cons Value List )
...
The length of this List is one more than the length of
the List that is the cdr of the Pair
...
Here, we will deﬁne our own list-length procedure that does this (without
using the built-in length procedure)
...

84

5
...
List Procedures

(deﬁne (list-length p)
(if (null? p)
0

(+ 1 (list-length (cdr p)))))
Here are a few example applications of our list-length procedure:
> (list-length null)
0

> (list-length (cons 0 null))
1

> (list-length (list 1 2 3 4))
4

Example 5
...
As usual, the base case is
when the input is null: the sum of an empty list is 0
...

(deﬁne (list-sum p)
(if (null? p) 0 (+ (car p) (list-sum (cdr p)))))
We can deﬁne list-product similarly, using ∗ in place of +
...
We follow the mathematical convention that the
product of the empty list is 1
...
11
...
Your procedure should behave identically
to the built-in list? procedure, but you should not use list? in your deﬁnition
...
12
...
For example, (list-max
(list 1 1 2 0)) should evaluate to 2
...
4
...
The base case is when the input is the empty list, and the recursive case
involves doing something with the ﬁrst element of the List and recursively calling the procedure with the rest of the List:
(deﬁne (Recursive-Procedure p)
(if (null? p)
Base-Case-Result
(Accumulator-Function (car p) (Recursive-Procedure (cdr p)))))
We can deﬁne a generic accumulator procedure for lists by making the base case
result and accumulator function inputs:

Chapter 5
...
The recursive case in the
original list-length procedure is (+ 1 (list-length (cdr p))); it does not use the
value of the ﬁrst element of the List
...

We should follow our usual strategy: be optimistic! Being optimistic as in recursive deﬁnitions, the value of the second input should be the length of the rest of
the List
...
13
...
12)
...
14
...
11)
...
3: Accessing List Elements
The built-in car procedure provides a way to get the ﬁrst element of a list, but
what if we want to get the third element? We can do this by taking the cdr twice
to eliminate the ﬁrst two elements, and then using car to get the third:
(car (cdr (cdr p)))
We want a more general procedure that can access any selected list element
...
If we
start counting from 1 (it is often more natural to start from 0), then the base case
is when the index is 1 and the output should be the ﬁrst element of the List:
(if (= n 1) (car p)
...
We also need to adjust the index: since we have removed the ﬁrst element
of the list, the index should be reduced by one
...

(deﬁne (list-get-element p n)
(if (= n 1)
(car p)
(list-get-element (cdr p) (− n 1))))
What happens if we apply list-get-element to an index that is larger than the size
of the input List (for example, (list-get-element (list 1 2) 3))?

86

5
...
List Procedures

The ﬁrst recursive call is (list-get-element (list 2) 2)
...
At this point, n is 1, so the base case is reached and (car
p) is evaluated
...

A better version of list-get-element would provide a meaningful error message
when the requested element is out of range
...
The String datatype is a
sequence of characters; we can create a String by surrounding characters with
double quotes, as in the example
...

defensive
programming

Checking explicitly for invalid inputs is known as defensive programming
...

Exercise 5
...
Deﬁne a procedure list-last-element that takes as input a List
and outputs the last element of the input List
...

Exercise 5
...
Deﬁne a procedure list-ordered? that takes two inputs, a test
procedure and a List
...
For example, (list-ordered? < (list 1 2 3)) evaluates to true, and (list-ordered? < (list 1 2 3 2)) evaluates to false
...

5
...
3

Procedures that Construct Lists

The procedures in this section take values (including Lists) as input, and produce a new List as output
...

Since we are producing a List as output, the result for the base case is also usually null
...

Example 5
...

For the base case, applying any procedure to every element of the empty list
produces the empty list
...

The ﬁrst element is the result of applying the mapping procedure to the ﬁrst
element of the input List
...

Here is a procedure that constructs a List that contains the square of every element of the input List:

87

Chapter 5
...
The procedure list-map takes a procedure as its ﬁrst input and a List as
its second input
...
4
(deﬁne (list-map f p)
(if (null? p) null
(cons (f (car p))
(list-map f (cdr p)))))
We can use list-map to deﬁne square-all:
(deﬁne (square-all p) (list-map square p))
Exercise 5
...
Deﬁne a procedure list-increment that takes as input a List of
numbers, and produces as output a List containing each element in the input
List incremented by one
...

Exercise 5
...
Use list-map and list-sum to deﬁne list-length:
(deﬁne (list-length p) (list-sum (list-map

p)))

Example 5
...
For example, (listﬁlter-negative (list 1 −3 −4 5 −2 0)) evaluates to (1 5 0)
...
If we ﬁlter the
negative numbers from the empty list, the result is an empty list
...

In the recursive case, we need to determine whether or not the ﬁrst element
should be included in the output
...
If it should not be included, we skip the ﬁrst element
and the result is the result of ﬁltering the remaining elements in the List
...
It behaves like this one when passed a procedure
and a single List as inputs, but can also work on more than one List input at a time
...
4
...
5
(deﬁne (list-ﬁlter test p)
(if (null? p) null
(if (test (car p))
(cons (car p) (list-ﬁlter test (cdr p)))
(list-ﬁlter test (cdr p)))))
Using the list-ﬁlter procedure, we can deﬁne list-ﬁlter-negative as:
(deﬁne (list-ﬁlter-negative p) (list-ﬁlter (lambda (x) (>= x 0)) p))
We could also deﬁne the list-ﬁlter procedure using the list-accumulate procedure from Section 5
...
1:
(deﬁne (list-ﬁlter test p)
(list-accumulate
(lambda (el rest) (if (test el) (cons el rest) rest))
null
p))
Exercise 5
...
Deﬁne a procedure list-ﬁlter-even that takes as input a List of
numbers and produces as output a List consisting of all the even elements of
the input List
...
20
...
As output, it produces a List that is a copy of the input List
with all of the elements for which the test procedure evaluates to true removed
...

Exercise 5
...
[ ] Deﬁne a procedure list-unique-elements that takes as input
a List and produces as output a List containing the unique elements of the input
List
...

Example 5
...
6 For the base case, when the ﬁrst List is empty, the result of appending
the lists should just be the second List
...

(deﬁne (list-append p q)
(if (null? p) q
(cons (car p) (list-append (cdr p) q))))

5 Scheme provides a built-in function

ﬁlter that behaves like our list-ﬁlter procedure
...
The built-in append takes any number of
Lists as inputs, and appends them all into one List
...
Data
Example 5
...
7 For example, (listreverse (list 1 2 3)) evaluates to the List (3 2 1)
...
The reverse of the empty list is the empty
list
...

The tricky part is putting the ﬁrst element at the end, since cons only puts elements at the beginning of a List
...
To make this
work, we need to turn the element at the front of the List into a List containing
just that element
...

(deﬁne (list-reverse p)
(if (null? p) null
(list-append (list-reverse (cdr p)) (list (car p)))))
Exercise 5
...
Deﬁne the list-reverse procedure using list-accumulate
...
8: Intsto
For our ﬁnal example, we deﬁne the intsto procedure that constructs a List containing the whole numbers between 1 and the input parameter value
...

This example combines ideas from the previous chapter on creating recursive
deﬁnitions for problems involving numbers, and from this chapter on lists
...
Instead, we use the input value 0 as the base case
...
For higher values, the output is the result of
putting the input value at the end of the List of numbers up to the input value
minus one
...
Hence, it produces the List of numbers
in descending order: (revintsto 5) evaluates to (5 4 3 2 1)
...
2:
(deﬁne intsto (fcompose list-reverse revintsto))
Alternatively, we could use list-append to put the high number directly at the
end of the List
...

7 The built-in procedure reverse

does this
...
5
...
We consider the problem of estimating the running-times of different
procedures in Part II
...
23
...

5
...
In
deﬁning procedures that operate on Lists of Lists, we often use more than one
recursive call when we need to go inside the inner Lists
...
9: Summing Nested Lists
Consider the problem of summing all the numbers in a List of Lists
...
We can
deﬁne nested-list-sum using list-sum on each List
...
But, what if the input can
contain arbitrarily deeply nested Lists?
To handle this, we need to recursively sum the inner Lists
...
If it is a List, we should add the value of
the sum of all elements in the List to the result for the rest of the List
...
So, our procedure involves two recursive calls: one for the ﬁrst element
in the List when it is a List, and the other for the rest of the List
...
10: Flattening Lists
Another way to compute the deep list sum would be to ﬁrst ﬂatten the List, and
then use the list-sum procedure
...
We can deﬁne list-ﬂatten by using list-append to
append all the inner Lists together
...
Data
(deﬁne (list-ﬂatten p)
(if (null? p) null
(list-append (car p) (list-ﬂatten (cdr p)))))

This ﬂattens a List of Lists into a single List
...
24
...
It should take two parameters, a mapping
procedure, and a List (that may contain deeply nested Lists as elements), and
output a List with the same structure as the input List with each value mapped
using the mapping procedure
...
25
...

Exploration 5
...
The numbers in Pascal’s Triangle are the coefﬁcients in a binomial expansion
...
For example, ( x + y)2 = x2 + 2xy + y2 , so the coefﬁcients are 1 2 1,
matching the third row in the triangle; from the ﬁfth row, ( x + y)4 = x4 + 4x3 y +
6x2 y2 + 4xy3 + y4
...
5) — the kth number on the
nth row of the triangle gives the number of ways to choose k elements from a set
of size n
...
6
...

The goal of this exploration is to deﬁne a procedure, pascals-triangle to produce
Pascal’s Triangle
...
For example, (pascals-triangle 0) should produce
((1)) (a list containing one element which is a list containing the number 1), and
(pascals-triangle 4) should produce ((1) (1 1) (1 2 1) (1 3 3 1) (1 4 6 4 1))
...

a
...
It
takes a List of numbers as input, and as output produces a List with one more
element than the input list
...
Every other number in the output List is the
sum of two numbers in the input List
...
For example, (expandrow (list 1)) evaluates to (1 1); (expand-row (list 1 1)) evaluates to (1 2 1); and
(expand-row (list 1 4 6 4 1)) evaluates to (1 5 10 10 5 1)
...
It also needs to deal with the ﬁrst element specially
...
Deﬁne a procedure pascals-triangle-row that takes one input, n, and outputs
the nth row of Pascal’s Triangle
...

c
...

5
...
Our goal is to hide unnecessary details about how
data is represented so we can focus on the important aspects of what the data
means and what we need to do with it to solve our problem
...

The datatypes we have seen so far are not very abstract
...
A good data abstraction
is abstract enough to be used without worrying about details like which cell of
the Pair contains which datum and how to access the different elements of a List
...

The rest of this section is an extended example that illustrates how to solve problems by ﬁrst identifying the objects we need to model the problem, and then implementing data abstractions that represent those objects
...
Data

ate data abstractions are designed and implemented, the solution to the problem often follows readily
...

Exploration 5
...
The
standard puzzle is a one-player game played on a triangular board with ﬁfteen
holes with pegs in all of the holes except one
...

A peg may jump over an adjacent peg only when there is a free hole on the other
side of the peg
...
The game ends when there are no
possible moves
...
If
more than one peg remains, the player loses (“Leave four or more’n you’re just
plain ‘eg-no-ra-moose’
...
1
...

The blue peg can jump the red peg as shown, removing the red peg
...

Our goal is to develop a program that ﬁnds a winning solution to the pegboard
game from any winnable starting position
...
Brute force solutions only work
on small-size problems
...
8
The ﬁrst thing to think about to solve a complex problem is what datatypes we
need
...
For the pegboard game, we need to model the board with its
pegs
...
The important thing about a datatype is what you can do with it
...
In
the physical pegboard game, the board holds the pegs
...

For this, we need a way of identifying board positions
...

This means it is not known whether or not any solution exists that is substantially better than the
brute force solution, but it would be extraordinarily surprising (and of momentous signiﬁcance!) to
ﬁnd one
...
6
...
Finally, we use these datatypes to deﬁne a
procedure that ﬁnds a winning solution
...
We identify the board positions using row and column numbers:
(1 1)
(2 1) (2 2)
(3 1) (3 2) (3 3)
(4 1) (4 2) (4 3) (4 4)
(5 1) (5 2) (5 3) (5 4) (5 5)
A position has a row and a column, so we could just use a Pair to represent a
position
...

Our Position datatype should provide at least these operations:
make-position: Number × Number → Position
Creates a Position representing the row and column given by the input
numbers
...

position-get-column: Position → Number
Outputs the column number of the input Position
...
A more defensive implementation
of the Position datatype uses a tagged list
...
All operations check the
tag is correct before proceeding
...
Symbols are a quote (’)
followed by a sequence of characters
...

We deﬁne the tagged list datatype, tlist, using the list-get-element procedure
from Example 5
...
5
...
Instead of printing as a side effect, format produces
a String
...
" (list 1 2 3) 123) evaluates to the
String "list: (1 2 3) number: 123
...
Data

This is an example of defensive programming
...

Using the tagged list, we deﬁne the Position datatype as:
(deﬁne (make-position row col) (make-tlist ’Position (list row col)))
(deﬁne (position-get-row posn) (tlist-get-element ’Position posn 1))
(deﬁne (position-get-column posn) (tlist-get-element ’Position posn 2))
Here are some example interactions with our Position datatype:

> (deﬁne pos (make-position 2 1))
> pos
(Position 2 1)
> (get-position-row pos)
2

> (get-position-row (list 1 2))
Bad tag: 1 (expected Position)

Error since input is not a Position
...
A move involves three positions: where the jumping peg starts, the position of the peg that is jumped and removed, and the landing position
...
A better
option is to observe that once any two of the positions are known, the third position is determined
...
Hence,
we could represent a jump using just the starting and landing positions
...
This is also enough to determine the jumped and landing positions
...
To do it, we
ﬁrst design a Direction datatype for representing the possible move directions
...

We implement the Direction datatype using a tagged list similarly to how we
deﬁned Position:
(deﬁne (make-direction right down)
(make-tlist ’Direction (list right down)))
(deﬁne (direction-get-horizontal dir) (tlist-get-element ’Direction dir 1))
(deﬁne (direction-get-vertical dir) (tlist-get-element ’Direction dir 2))
The Move datatype is deﬁned using the starting position and the jump direction:
(deﬁne (make-move start direction)
(make-tlist ’Move (list start direction)))
(deﬁne (move-get-start move) (tlist-get-element ’Move move 1))
(deﬁne (move-get-direction move) (tlist-get-element ’Move move 2))
We also deﬁne procedures for getting the jumped and landing positions of a
move
...
So, it will be useful to deﬁne a procedure that
takes a Position and a Direction as input, and outputs a Position that is one step
in the input Direction from the input Position
...
6
...

(deﬁne (move-get-jumped move)
(direction-step (move-get-start move) (move-get-direction move)))
(deﬁne (move-get-landing move)
(direction-step (move-get-jumped move) (move-get-direction move)))
Board
...
It keeps
track of which holes in the board contain pegs, and provides operations that
model adding and removing pegs from the board:
make-board: Number → Board
Outputs a board full of pegs with the input number of rows
...
)
board-rows: Board → Number
Outputs the number of rows in the input board
...

board-is-winning?: Board → Boolean
Outputs true if the Board represents a winning position (exactly one
peg); otherwise, false
...

board-add-peg : Board × Position → Board
Output a Board containing all the pegs of the input Board and one additional peg at the input Position
...

board-remove-peg : Board × Position → Board
Outputs a Board containing all the pegs of the input Board except for
the peg at the input Position
...

The procedures for adding and removing pegs change the state of the board to
reﬂect moves in the game, but nothing we have seen so far, however, provides a
means for changing the state of an existing object
...
These procedures take a
Board and Position as inputs, and produce as output a Board
...
One possibility is
to keep a List of the Positions of the pegs on the board
...
Allowing state to change breaks
the substitution model of evaluation
...
Data

97

keep a List of the Positions of the empty holes on the board
...
The elements in each of the Lists are Booleans representing whether or
not there is a peg at that position
...
As
long as the procedures for implementing the Board are updated the work with
the new representation, all the code that uses the board abstraction should continue to work correctly without any changes
...
So, make-board evaluates to a List of Lists,
where each element of the List contains the row number of elements and all the
inner elements are true (the initial board is completely full of pegs)
...
The output is a List of length n where each element has the value
val
...
As usual, a recursive problem solving strategy works well: the simplest
board is a board with zero rows (represented as the empty list); for each larger
board, we add a row with the right number of elements
...
This is similar to the problem we
faced with intsto, and a similar solution using append-list works here:
(deﬁne (make-board rows)
(if (= rows 0) null
(list-append (make-board (− rows 1))
(list (make-list-of-constants rows true)))))
Evaluating (make-board 3) produces ((true) (true true) (true true true))
...

(deﬁne (board-rows board) (length board))
The board-valid-position? indicates if a Position is on the board
...

(deﬁne (board-valid-position? board pos)
(and (>= (position-get-row pos) 1) (>= (position-get-column pos) 1)
(<= (position-get-row pos) (board-rows board))
(<= (position-get-column pos) (position-get-row pos))))
We need a way to check if a Board represents a winning solution (that is, contains
only one peg)
...
Our board representation used true to represent a peg
...
Then, we use sum-list to count the

98

5
...
Data Abstraction

number of pegs
...

(deﬁne (board-number-of-pegs board)
(list-sum
(list-map (lambda (peg ) (if peg 1 0)) (list-ﬂatten board))))
A board is a winning board if it contains exactly one peg:
(deﬁne (board-is-winning? board)
(= (board-number-of-pegs board) 1))
The board-contains-peg? procedure takes a Board and a Position as input, and
outputs a Boolean indicating whether or not that Position contains a peg
...
The list-get-element procedure (from Example 5
...
Since our board is represented as a List of Lists, we need to use it
twice: ﬁrst to get the row, and then to select the column within that row:
(deﬁne (board-contains-peg? board pos)
(list-get-element (list-get-element board (position-get-row pos))
(position-get-column pos)))
Deﬁning procedures for adding and removing pegs from the board is more complicated
...
For
that row, we need to replace the corresponding value
...

First we consider the subproblem of replacing a peg in a row
...
We can deﬁne row-replace-peg recursively: the base case is when the modiﬁed peg is at the
beginning of the row (the column number is 1); in the recursive case, we copy
the ﬁrst element in the List, and replace the peg in the rest of the list
...
Since true values represent holes with pegs, a true value indicates that we are adding a peg and false
means we are removing a peg
...

(deﬁne (board-replace-peg board row col val)
(if (= row 1)
(cons (row-replace-peg (car board) col val) (cdr board))
(cons (car board) (board-replace-peg (cdr board) (− row 1) col val))))
Both board-add-peg and board-remove-peg can be deﬁned simply using board-

Chapter 5
...
They ﬁrst check if the operation is valid (adding is valid only if the
selected position does not contain a peg, removing is valid only if the selected
position contains a peg), and then use board-replace-peg to produce the modiﬁed board:
(deﬁne (board-add-peg board pos)
(if (board-contains-peg? board pos)
(error (format "Board already contains peg at position: ˜a" pos))
(board-replace-peg board (position-get-row pos)
(position-get-column pos) true)))
(deﬁne (board-remove-peg board pos)
(if (not (board-contains-peg? board pos))
(error (format "Board does not contain peg at position: ˜a" pos))
(board-replace-peg board (position-get-row pos)
(position-get-column pos) false)))
We can now deﬁne a procedure that models making a move on a board
...
Moving the peg is equivalent to removing
the peg from the starting position and adding a peg to the landing position, so
the procedures we deﬁned for adding and removing pegs can be composed to
model making a move
...
Now that we can model the board and simulate making
jumps, we are ready to develop the solution
...
So, we need a procedure that takes a Board as
its input and outputs a List of all valid moves on the board
...

First, we generate all conceivable moves by creating moves starting from each
position on the board and moving in all possible move directions
...

We can generate a list of all row numbers using the intsto procedure (from Example 5
...
To get a list of all the positions, we need to produce a list of the
positions for each row
...
6
...
The
list-ﬂatten procedure (from Example 5
...

(deﬁne (all-positions board)
(list-ﬂatten (all-positions-helper board)))
For each Position, we ﬁnd all possible moves starting from that position
...

(deﬁne all-directions
(list
(make-direction −1 0) (make-direction 1 0) ; left, right
(make-direction −1 −1) (make-direction 0 −1) ; up-left, up-right
(make-direction 0 1) (make-direction 1 1))) ; down-left, down-right
For each position on the board, we create a list of possible moves starting at that
position and moving in each possible move directions
...

(deﬁne (all-conceivable-moves board)
(list-ﬂatten
(list-map
(lambda (pos) (list-map (lambda (dir) (make-move pos dir))
all-directions))
(all-positions board))))
The output produced by all-conceivable-moves includes moves that ﬂy off the
board
...
A legal move
must start at a position that contains a peg, jump over a position that contains a
peg, and land in an empty hole
...
Data

101

(deﬁne (all-legal-moves board)
(list-ﬁlter
(lambda (move)
(and
(board-contains-peg? board (move-get-start move))
(board-contains-peg? board (move-get-jumped move))
(not (board-contains-peg? board (move-get-landing move)))))
(all-board-moves board)))
Winning the Game
...
If there is a winning sequence of
moves, we can ﬁnd it by trying all possible moves on the current board
...
If the original board has a winning sequence
of moves, at least one of the new boards has a winning sequence of moves
...

(deﬁne (solve-pegboard board)
(if (board-is-winning? board)
null ; no moves needed to reach winning position
(try-moves board (all-legal-moves board))))
If there is a sequence of moves that wins the game starting from the input Board,
solve-pegboard outputs a List of Moves representing a winning sequence
...
If
there is no sequence of moves to win from the input board, solve-pegboard outputs false
...
It takes a Board and a List of Moves
as inputs
...

The base case is when there are no moves to try
...
We output false to mean this attempt did
not lead to a winning board
...
If it leads to a
winning position, try-moves should output the List of Moves that starts with the
ﬁrst move and is followed by the rest of the moves needed to solve the board
resulting from taking the ﬁrst move (that is, the result of solve-pegboard applied
to the Board resulting from taking the ﬁrst move)
...

(deﬁne (try-moves board moves)
(if (null? moves)
false ; didn’t ﬁnd a winner
(if (solve-pegboard (board-execute-move board (car moves)))
(cons (car moves)
(solve-pegboard (board-execute-move board (car moves))))
(try-moves board (cdr moves)))))
Evaluating (solve-pegboard (make-board 5)) produces false since there is no way
to win starting from a completely full board
...
7
...
; 8 moves elided
(Move (Position 5 1) (Direction 1 1)))
a
...
Only the procedures with names starting with board- should need to change when the Board
representation is changed
...

b
...
Deﬁne a more general pegboard puzzle
solver that works for a board of any shape
...
[ ] The described implementation is very inefﬁcient
...
For example, all-possible-moves evaluates to the same
value every time it is applied to a board with the same number of rows
...
See
how much faster you can make the pegboard solver
...
7

Summary of Part I

To conclude Part I, we revisit the three main themes introduced in Section 1
...

Recursive deﬁnitions
...
Recursive grammars
provide a compact way to deﬁne a language; recursive procedure deﬁnitions
enable us to solve problems by optimistically assuming a smaller problem instance can be solved and using that solution to solve the problem; recursive data
structures such as the list type allow us to deﬁne and manipulate complex data
built from simple components
...
For
grammars, the base case provides a way to stop the recursive replacements by
produce a terminal (or empty output) directly; for procedures, the base case
provides a direct solution to a small problem instance; for data structures, the
base case provides a small instance of the data type (e
...
, null)
...

universal
programming
language

Universality
...
This subset is a universal programming
language: it is powerful enough to describe all possible computations
...
We have also seen the
universality of code and data
...

Abstraction
...
Procedural abstraction deﬁnes a procedure; by using inputs, a short procedure deﬁnition can
abstract inﬁnitely many different information processes
...
Data

103

the details of how data is represented by providing procedures that abstractly
create and manipulate that data
...
We need to break problems down into smaller parts that can be
solved separately
...
Most of the work in solving the problem is deﬁning
the right datatypes; once we have the datatypes we need to model the problem
well, we are usually well along the path to a solution
...
In Part II, we examine the costs of executing procedures
...
7
...

Gottfried Wilhelm von Leibniz, 1685

The ﬁrst ﬁve chapters focused on ways to use language to describe procedures
...
As a very rough approximation, a typical
laptop gives an individual computing power comparable to having every living
human on the planet working for you without ever making a mistake or needing
a break
...
Computers are different from
other machines in two key ways:
1
...

2
...
The simple computer model
introduced in this chapter can perform all possible computations
...
Section 6
...
Section 6
...

We provide only a very shallow introduction to how machines can implement
computations
...
The following chapters use this to reason about the costs of
various procedures
...

106

6
...
1
...
These machines took physical things as
inputs, performed physical actions on those things, and produced some physical output
...

The ﬁrst big leap toward computing machines was the development of machines
whose purpose is abstract rather than physical
...
The output of the machine is valuable because it can be interpreted as information, not
for its direct physical effect
...
The base ten
number system used by most human cultures reﬂects using our ten ﬁngers for
counting
...

Shepherds used stones to represent numbers, making the cognitive leap of using
a physical stone to represent some quantity of sheep
...

Suan Pan

Pascaline
David Monniaux

More complex societies required more counting and more advanced calculating
...

Many cultures developed forms of abaci, including the ancient Mesopotamians
and Romans
...
The
Chinese suan pan (“calculating plate”) is an abacus with a beam subdividing
the rods, typically with two beads above the bar (each representing 5), and ﬁve
beads below the beam (each representing 1)
...

All of these machines require humans to move parts to perform calculations
...
The ﬁrst automatic
calculating machine to be widely demonstrated was the Pascaline, built by then
nineteen-year old French mathematician Blaise Pascal (also responsible for Pascal’s triangle from Exploration 5
...
The Pascaline had ﬁve wheels, each representing one digit of a number, linked by gears to perform addition with carries
...

Over the following centuries, more sophisticated mechanical calculating machines were developed but these machines could still only perform one operation at a time
...
For example, many cultures including the
Maya and Basque adopted base twenty systems counting both ﬁngers and toes
...

Chapter 6
...

The big breakthrough was the conceptual leap of programmability
...

The ﬁrst programmable computing machine was envisioned (but never successfully built) in the 1830s by Charles Babbage
...
In the 1800s, calculations were
done by looking up values in large books of mathematical and astronomical tables
...
The
calculations were especially important for astronomical navigation, and when
the values were incorrect a ship would miscalculate its position at sea (sometimes with tragic consequences)
...
Starting in 1822, he designed a steam-powered machine known
as the Difference Engine to compute polynomials needed for astronomical calculations using Newton’s method of successive differences (a generalization of
Heron’s method from Exploration 4
...
The Difference Engine was never fully
completed
...

This new machine, the Analytical Engine, designed between 1833 and 1844, was
the ﬁrst general-purpose computer envisioned
...
One breakthrough in Babbage’s design was to feed the machine’s outputs back into its inputs
...

The Analytical Engine was programmed using punch cards, based on the cards
that were used by Jacquard looms
...
The Analytical Engine supported conditional jumps where the jump would be taken depending on the
state of a lever in the machine (this is essentially a simple form of the if expression)
...

Babbage’s
grumblings
...

Richard Sheepshanks,
Letter to the Board of
Visitors of the
Greenwich Royal
Observatory, 1854

Analytical Engine
Science Museum, London

In 1842, Charles Babbage visited Italy and described the Analytical Engine to
Luigi Menabrea, an Italian engineer, military ofﬁcer, and mathematician who
would later become Prime Minister of Italy
...
Ada Augusta Byron King (also known as Ada,
Countess of Lovelace) translated the article into English
...
The
notes included a program to compute Bernoulli numbers, the ﬁrst detailed program for the Analytical Engine
...

This was the ﬁrst computer program ever described, and Ada is recognized as
the ﬁrst computer programmer
...
Babbage, if you
put into the
machine wrong
ﬁgures, will the
right answers come
out?” I am not able
rightly to
apprehend the kind
of confusion of ideas
that could provoke
such a question
...
2
...
It is unclear whether the main reason for the failure to build a working
Analytical Engine was due to limitations of the mechanical components available at the time, or due to Babbage’s inability to work with his engineer collaborator or to secure continued funding
...
Advances in electronics enabled more reliable and faster components than the mechanical components used by Babbage, and the desperation
brought on by World War II spurred the funding and efforts that led to working
general-purpose computing machines
...
In Babbage’s Analytical Engine, the program is a stack of cards and the data are numbers stored in the machine
...

The idea of treating the program as just another kind of data the machine can
process was developed in theory by Alan Turing in the 1930s (Section 6
...

This computer (and all general-purpose computers in use today) stores the program itself in the machine’s memory
...
This power to change its own program is
what makes stored-program computers so versatile
...
1
...
How many bits
could the store of Babbage’s Analytical Engine hold?

6
...
We use Boolean logic, in which there are two possible values: true
(often denoted as 1), and false (often denoted as 0)
...
Boolean logic is named for George Boole, a
self-taught British mathematician who published An investigation into the Laws
of Thought, on Which are founded the Mathematical Theories of Logic and Probabilities in 1854
...
Boole made logic a formal language to which the tools of mathematics
could be applied
...
Modern computers use electrons to compute
because they are small (more than a billion billion billion (1031 ) electrons ﬁt
within the volume of a grain of sand), fast (approaching the speed of light), and
cheap (more than a billion billion (1022 ) electrons come out of a power outlet for
less than a cent)
...
The basic notions of mechanical computation don’t depend on the medium we use to compute, only on our ability to use it to represent
values and to perform simple logical operations
...
Machines

6
...
1

Implementing Logic

To implement logic using a machine, we need physical ways of representing the
two possible values
...
If the value of an input is true, we pour a bottle
of wine in the input nozzle; for false inputs we do nothing
...

And
...
The
output is true if both of the inputs are true; otherwise the output is false
...

A different way to deﬁne a function is by using a table to show the corresponding
output value for each possible pair of input values
...
For functions in Boolean logic, there are only two
possible values for each input (true and false) so it is feasible to list the outputs
for all possible inputs
...
If there is one input, the
table needs two entries, showing the output value for each possible input
...
The truth table for the logical and
function is:
A
false
true
false
true

B
false
false
true
true

(and A B)
false
false
false
true

We design a machine that implements the function described by the truth table: if both inputs are true (represented by full bottles of wine in our machine),
the output should be true; if either input is false, the output should be false (an
empty bottle)
...
1
...
The output nozzle is placed at a height corresponding to one bottle of
wine in the collection basin, so the output bottle will ﬁll (representing true), only
if both inputs are true
...
1 would probably not work very well in practice
...
What should a 4 full bottle of wine represent? What
about a bottle that is half full?
2 Scheme provides a special form and that performs the same function as the logical and function
...

truth table

110

6
...
Mechanizing Logic

Figure 6
...
Computing and with wine
...
Although there are many different quantities of wine that could be in a bottle, regardless of the actual quantity the value
is interpreted as only one of two possible values: true or false
...
This means an inﬁnitely large set of possible values are abstracted as
meaning true, so it doesn’t matter which of the values above half full it is
...
It is much easier to design computing systems around discrete values than around continuous values;
by mapping a range of possible continuous values to just two discrete values, we
give up a lot of information but gain in simplicity and reliability
...

Or
...
2(a)
...
The output of the not function is the opposite of the value
of its input:
A
false
true

(not A)
true
false

3 Scheme provides a special form or that implements the logical or function, similarly to the and
special form
...

111

Chapter 6
...
To implement the not function, we need the notion of a source current
and a clock
...
The
clock ticks periodically, on each operation
...
When the clock ticks, a bottle of wine is sent through the source
current, and the output is produced
...
2(b) shows one way to implement
the not function
...

(b) Computing not with wine
...
2
...

(b) The not machine uses a clock
...
If the input is true, the
ﬂoat is lifted, blocking the source opening; if the input i false, the ﬂoat rests on the bottom of
the basin
...
If the ﬂoat is up (because of the true
input), the opening is blocked, and the output is empty (false)
...
(This design assumes wine coming from the source does not leak under the
ﬂoat, which might be hard to build in a real system
...
2
...

We start by making a three-input conjunction function
...
One way to make the three-input
and3 is to follow the same idea as the two-input and where all three inputs pour
into the same basin, but make the basin with the output nozzle above the two
bottle level
...
2
...
2
...
3
...
We could write this as (and (and A B) C)
...
3
...

Composing logical functions also allows us to build new logical functions
...
So, we
could compute (xor A B) as (and (or A B) (not (and A B)))
...

We can compose any pair of functions where the outputs for the ﬁrst function
are consistent with the input for the second function
...
Machines

All Boolean logic functions can be implemented using just the nand function
...

For example, we can implement not using nand where the one input to the not
function is used for both inputs to the nand function:
(not A) ≡ (nand A A)
Now that we have shown how to implement not using nand, it is easy to see how
to implement and using nand:
(and A B) ≡ (not (nand A B))
Implementing or is a bit trickier
...
But, A nand B is true if both inputs are false, and false if both inputs are
true
...
We omit the details here, but leave some of the other
functions as exercises
...
Trillions of nand gates are produced in
silicon every day
...
2
...

Exercise 6
...
What is the meaning of composing not with itself? For example,
(not (not A))
...
4
...

Exercise 6
...
[ ] Our deﬁnition of (not A) as (nand A A) assumes there is a
way to produce two copies of a given input
...
It should take one input, and produce two outputs,
both with the same value as the input
...
)
Exercise 6
...
[ ] The digital abstraction works ﬁne as long as actual values stay
close to the value they represent
...
For example, if
3
the inputs to the and3 function in Figure 6
...
When combined with the third
2
input, the second basin will contain 1 1 bottles, so only 1 will spill into the output
4
4
bottle
...
The solution to this problem is to use an ampliﬁer to restore values to

114

6
...
Mechanizing Logic

their full representations
...
If that input
represents true (any value that is half full or more), the ampliﬁer should output
true, but with a strong, full bottle representation
...

6
...
3

Arithmetic

Not only is the nand function complete for Boolean logical functions, it is also
enough to implement all discrete arithmetic functions
...

There are four possible pairs of inputs:
A
0
0
1
1

+
+
+
+

B
0
1
0
1

r1
0
0
0
1

=
=
=
=

r0
0
1
1
0

We can compute each of the two output bits as a logical function of the two input
bits
...

Adding larger numbers requires more logical functions
...
If the result in one place is more than one digit, the
additional tens are carried to the next digit
...

115

Chapter 6
...

• Repeat for each digit k from 0 to n:
1
...

2
...

3
...

This is perhaps the ﬁrst interesting algorithm most people learn: if followed correctly, it is guaranteed to produce the correct result, and to always ﬁnish, for any
two input numbers
...
But, the numbers added are one-digit numbers (and ck is 0 or 1)
...
We can memorize the
100 possibilities for adding two digits (or write them down in a table), and easily
add one as necessary for the carry
...

We can use the same algorithm to sum binary numbers, except it is simpler since
there are only two binary digits
...
If the carry bit is 1, the result bit should ﬂip
...

The carry bit is 1 if the sum of the input bits and previous carry bit is greater than
1
...

We can propagate the equations through the steps to ﬁnd a logical equation for
each result bit in terms of just the input bits
...
3
...
But,
for any number of digits, we can always ﬁnd functions for computing the result
bits using only logical functions on the input bits
...

We can also implement multiplication, subtraction, and division using only nand
functions
...

Exercise 6
...
Adding logically
...
What is the logical formula for r3 ?
b
...
[ ] Is it possible to compute r4 with fewer logical functions?
Exercise 6
...
Show how to compute the result bits for binary multiplication of
two 2-bit inputs using only logical functions
...
9
...

6
...
And, we can perform any discrete arithmetic function
using only Boolean functions
...
We would like to be able to make the inputs to the machine describe the logical functions that it should perform, rather than having to build
a new machine for each desired function
...
Instead, we consider
programmable computing machines abstractly
...
Accept input
...
Execute a mechanical procedure
...
Produce output
...

Modeling input
...

For our model, we want to keep things as simple as possible, though
...
We
can represent any discrete input with a sequence of bits
...
Input from a pointing device like a mouse could
be continuous, but we can always identify some minimum detected movement

Chapter 6
...
Richer input devices like a camera or microphone can also produce discrete output by discretizing the input using a process similar to the image storage in Chapter 1
...

For real input devices, the time an event occurs is often crucial
...
How can we model inputs where time matters
using just our simple sequence of bits?
One way would be to divide time into discrete quanta and encode the input as
zero or one events in each quanta
...
The timestamps are just numbers (e
...
, the number of milliseconds since the start time), so can be written down just as sequences of bits
...
The input must be ﬁnite, since our model computer needs all
the input before it starts processing
...
g
...

In practice, though, this isn’t usually a big problem since we can make the input
ﬁnite by limiting the time the server is running in the model
...

Modeling output
...

We don’t attempt to model the physical impact of computer outputs; that would
be far too complicated, but it is also one step beyond modeling the computation itself
...
The
information in a picture is the same whether it is presented as a sequence of bits
or an image projected on a screen, its just less pleasant to look at as a sequence
of bits
...

Modeling processing
...

One thing our model computer needs is a way to keep track of what it is doing
...
In Babbage’s Analytical Engine, this
was called the store, and divided into a thousand variables, each of which could
store a ﬁfty decimal digit number
...
Each word
was 15 bits (plus one bit for error correction)
...

The virtual
shopping spree was
a ﬁrst for the
President who has a
reputation for being
“technologically
challenged
...
”
BusinessWeek,
22 December 1999

118

6
...
Modeling Computing

For our model machine, we don’t want to have arbitrary limits on the amount of
working storage
...

Like the input and output tapes, it is divided into squares, and each square can
contain one symbol
...
We can, however, imagine continuing to add more memory to
a real computer as needed until we have enough to solve a given problem, and
adding more if we need to solve a larger problem
...

We can simplify the model by using a single tape for all three
...
As processing
is done, the input is read and the tape is used as the working tape
...

We also need a way for our model machine to interface with the tape
...
On each processing
step, the tape head can read the symbol in the current square, write a symbol in
the current square, and move one square either left or right
...
In our
model, this means controlling what the tape head does: at each step, it needs to
decide what to write on the tape, and whether to move left or right, or to ﬁnish
the execution
...
We
don’t want to have to model anything as complex as multiplication in our model
machine, however
...
To carry out a
complex operation as a composition of simple operations, we need a way to
keep track of enough state to know what to do next
...
Unlike the tape, it is
limited to a ﬁnite number
...

We also need rules to control what the tape head does
...
The input for a rule is the symbol in the current tape square and the
current state of the machine; the output of each rule is three things: the symbol
to write on the current tape square, the direction for the tape head to move (left,
right, or halt), and the new machine state
...
For each machine state, we need a rule for each
possible symbol on the tape
...
3
...
Turing’s model is depicted in Figure 6
...

An inﬁnite tape divided into squares is used as the input, working storage, and
output
...
The tape head

119

Chapter 6
...
4
...

keeps track of its internal state, and follows rules matching the current state and
current tape square to determine what to do next
...
Turing developed this model in 1936, before anything resembling a modern computer existed
...
He devised the inﬁnite tape to model the two-dimensional
graph paper students use to perform arithmetic
...

Turing’s model is equivalent to the model we described earlier, but instead of
using only bits as the symbols on the tape, Turing’s model uses members of any
ﬁnite set of symbols, known as the alphabet of the tape
...
That means, every computation that
could be done with a Turing Machine using any alphabet set, could also be done
by some Turing Machine using only the binary digits
...
As we
saw in Chapter 1, we can map each of the alphabet symbols to a ﬁnite sequence
of binary digits
...
For example, suppose our original machine uses 16
alphabet symbols, and we map each symbol to a 4-bit sequence
...
These four states would read the 1, 0, 1, 1 from the tape
...
To follow the rule, we also need to use four states to write the bit sequence
corresponding to the original write symbol on the tape
...
Hence, we may need 12 states for each
transition rule of the original machine, but can simulate everything it does using
only two symbols
...
3
...
This means every
algorithm can be implemented by some Turing Machine
...

Any physical machine has a limited amount of memory
...
Nevertheless, the simplicity
and robustness of the Turing Machine model make it a useful way to think about
computing even if we cannot build a truly universal computing machine
...
Despite being invented
before anything resembling a modern computer existed, nearly every computing machine ever imagined or built can be modeled well using Turing’s simple
model
...
Any step on any computer that operates using
standard physics and be simulated with a ﬁnite number of steps on a Turing
Machine
...
Hence, if we can reason about the number of steps required for a Turing Machine to solve a given problem, then we can
make strong and general claims about the number of steps it would take any
standard computer to solve the problem
...

Example 6
...
For example, in a Scheme expression, every opening left
parenthesis must have a corresponding closing right parenthesis
...
Our goal is to design a Turing
Machine that takes as input a string of parentheses (with a # at the beginning
and end to mark the endpoints) and produces as output a 1 on the tape if the
input string is well-balanced, and a 0 otherwise
...

Our strategy is to ﬁnd matching pairs of parentheses and cross them out by writing an X on the tape in place of the parenthesis
...
If not, the input was not well-balanced, and the machine writes
a 0 as its output and halts
...
The plan for the machine is to move the tape head to the
right (without changing the input) until a closing parenthesis is found
...
This matches the closing parenthesis, so it is
replaced with an X
...
If the end of the tape (marked with a #) is found, check the tape has
no remaining open parenthesis
...
Machines

ues to the right until it ﬁnds a closing parenthesis (this is the start state); LookForOpen, in which the machine continues to the left until it ﬁnds the balancing
open parenthesis; and CheckTape, in which the machine checks there are no unbalanced open parentheses on the tape starting from the right end of the tape
and moving towards the left end
...
5
...

Keep looking
...

End of tape
...

Found open
...

Reached beginning
...

Unbalanced open
...

Finished checking
...
5
...

Another way to depict a Turing Machine is to show the states and rules graphically
...
For each rule, we draw an edge on the
graph between the starting state and the next state, and label the edge with the
read and write tape symbols (separated by a /), and move direction
...
6 shows the same Turing Machine as a state graph
...
If there
is no outgoing edge for the current read symbol for the current state in the state
graph, execution terminates with an error
...
6
...

122

6
...
Modeling Computing

Exercise 6
...
Follow the rules to simulate the checking parentheses Turing
Machine on each input (assume the beginning and end of the input are marked
with a #):
a
...
()
c
...
(()(()))()
e
...
11
...
The input is of the form an−1
...
b1 b0 (with # markers at both ends) where each ak and bk is either 0 or 1
...

Proﬁle: Alan Turing

Alan Turing
Image from Bletchley Park Ltd
...
He developed the model to solve a famous
problem posed by David Hilbert in 1928
...
To solve the
problem, Turing ﬁrst needed a formal model of an algorithm
...
With
the model, Turing was able to show that there are some problems that cannot
be solved by any algorithm
...

After publishing his solution to the Entscheidungsproblem in 1936, Turing went
to Princeton and studied with Alonzo Church (inventor of the Lambda calculus, on which Scheme is based)
...
Turing
was instrumental in breaking the Enigma code which was used by the Nazi’s
to communicate with ﬁeld units and submarines
...
The machines used logical operations to search
the possible rotor settings on the Enigma to ﬁnd the settings that were most
likely to have generated an intercepted encrypted message
...
The Allies used the
knowledge gained from them to avoid Nazi submarines and gain a tremendous
tactical advantage
...
Among other things, he worked on designing
general-purpose computing machines and published a paper (Intelligent Machinery) speculating on the ability of computers to exhibit intelligence
...
Machines

123

would be able to pass the test within 50 years (that is, by the year 2000)
...

In 1952, Turing’s house was broken into, and Turing reported the crime to the
police
...
Turing did not attempt to hide his homosexuality, and was convicted and given a choice between serving time in prison
and taking hormone treatments
...
In 1954, at the age of 41, Turing was found dead in
an apparent suicide, with a cynide-laced partially-eaten apple next to him
...
In September
2009, instigated by an on-line petition, British Prime Minister Gordon Brown
issued an apology for how the British government treated Alan Turing
...
4

Summary

The power of computers comes from their programmability
...
The Turing Machine model
provides a simple, abstract, model of a computing machine
...

As the ﬁrst computer programmer, Ada deserves the last word:
By the word operation, we mean any process which alters the mutual relation of two or more things, be this relation of what kind it may
...

In abstract mathematics, of course operations alter those particular relations which are involved in the considerations of number and space, and
the results of operations are those peculiar results which correspond to the
nature of the subjects of operation
...

The operating mechanism can even be thrown into action independently
of any object to operate upon (although of course no result could then be
developed)
...
Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might
compose elaborate and scientiﬁc pieces of music of any degree of complexity or extent
...
4
...

Alan Perlis
I told my dad that someday I’d have a computer that I could write programs on
...
I said, “Well, then I’m going to live in an apartment
...
Predicting the cost of executing a procedure has practical value (for
example, we can estimate how much computing power is needed to solve a particular problem or decide between two possible implementations), but also provides deep insights into the nature of procedures and problems
...
Other measures of cost include
the amount of memory needed and the amount of energy consumed
...

7
...
If we are
primarily concerned with time, we could just use a stopwatch to measure the
evaluation time
...
1 Evaluating (time Expression) produces the value of the input expression, but also prints out the time required to evaluate the expression (shown
in our examples using slanted font)
...
CPU
is an abbreviation for “central processing unit”, the computer’s main processor
...
Since
other processes may be running on the computer while this expression
is evaluated, the real time may be longer than the CPU time, which only
counts the time the processor was working on evaluating this expression
...
If it were evaluated normally, there would be
no way to time how long it takes to evaluate, since it would have already been evaluated before time
is applied
...
1
...
Garbage collection is used to reclaim memory that is
storing data that will never be used again
...
The real time is 152 seconds,
meaning this evaluation took just over two and a half minutes
...

Here are two more examples:

> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 531 real time: 531 gc time: 62
1
> (time (car (list-append (intsto 1000) (intsto 100))))
cpu time: 609 real time: 609 gc time: 0
1
The two expressions evaluated are identical, but the reported time varies
...

Many properties unrelated to our expression (such as where things happen to
be stored in memory) impact the actual time needed for any particular evaluation
...

There’s no sense in
being precise when
you don’t even know
what you’re talking
about
...
If we try an evaluation and it has not ﬁnished after an
hour, say, we have no idea if the actual time to ﬁnish the evaluation is sixty-one
minutes or a quintillion years
...
The techniques we develop allow us to predict the time an evaluation
needs without waiting for it to execute
...
We would like to understand how the evaluation time scales with the
size of the inputs so we can understand which inputs the procedure can sensibly
be applied to, and can choose the best procedure to use for different situations
...

Exercise 7
...
Suppose you are deﬁning a procedure that needs to append two
lists, one short list, short and one very long list, long , but the order of elements
in the resulting list does not matter
...
)

127

Chapter 7
...
1: Multiplying Like Rabbits

Filius Bonacci was an Italian monk and mathematician in the 12th century
...

It also included the problem for which Fibonacci numbers are named:2
A pair of newly-born male and female rabbits are put in a ﬁeld
...

Assume rabbits never die and that each female rabbit produces one new
pair (one male, one female) every month from her second month on
...

The sequence produced is known as the Fibonacci sequence:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377,
...
Fibonacci numbers occur frequently in nature, such as the arrangement of ﬂorets in the sunﬂower (34 spirals in one direction and 55 in the
other) or the number of petals in common plants (typically 1, 2, 3, 5, 8, 13, 21, or
34), hence the rarity of the four-leaf clover
...
The sequence was already known to Indian mathematicians with whom Bonacci studied
...
1
...

The ﬁbo procedure is deﬁned in a way that guarantees it eventually completes
when applied to a non-negative whole number: each recursive call reduces the
input by 1 or 2, so both recursive calls get closer to the base case
...
To understand why the evaluation of
(ﬁbo 60) did not ﬁnish in our interpreter, we need to consider how much work is
required to evaluate the expression
...
To evaluate (ﬁbo 59), it
needs to evaluate (ﬁbo 58) again and also evaluate (ﬁbo 57)
...
So,
there is one evaluation of (ﬁbo 60), one evaluation of (ﬁbo 59), two evaluations
of (ﬁbo 58), and three evaluations of (ﬁbo 57)
...
1
...
Hence, the number of 1 values must be the value of the ﬁnal result, which just sums all these numbers
...
The number of evaluations of applications of ﬁbo needed to evaluate (ﬁbo 60) is the 61st Fibonacci number —
2,504,730,781,961 — over two and a half trillion applications of ﬁbo!
(ﬁbo 5)
(ﬁbo 4)
(ﬁbo 3)

(ﬁbo 3)

(ﬁbo 2)

(ﬁbo 2)

(ﬁbo 1)

1

(ﬁbo 2)

(ﬁbo 1)

1

1

1

1

Figure 7
...
Evaluation of ﬁbo procedure
...
It involves a tremendous amount of
duplicated work: for the (ﬁbo 60) example, there are two evaluations of (ﬁbo 58)
and over a trillion evaluations of (ﬁbo 1) and (ﬁbo 2)
...
This is more like the way a human would determine the numbers
in the Fibonacci sequence: we ﬁnd the next number by adding the previous two

Chapter 7
...

The fast-ﬁbo procedure computes the nth Fibonacci number, but avoids the duplicate effort by computing the results building up from the ﬁrst two Fibonacci
numbers, instead of working backwards
...
The deﬁnition is still
recursive, but unlike the original deﬁnition the problem is broken down differently
...
In the case of Fibonacci, the fast-ﬁbo procedure
builds up from the two base cases until reaching the desired answer
...

The helper procedure, ﬁbo-iter (short for iteration), takes three parameters: a
is the value of the previous-previous Fibonacci number, b is the value of the
previous Fibonacci number, and left is the number of iterations needed before reaching the target
...
Each
recursive call to ﬁbo-iter reduces the value passed in as left by one, and advances
the values of a and b to the next numbers in the Fibonacci sequence
...
The number of applications of
ﬁbo-iter needed to evaluate (fast-ﬁbo 60) is now only 59
...
This allows us to compute the
expected number of rabbits in 5 years is 1548008755920 (over 1
...

7
...
The important question
in understanding the resources required to evaluate a procedure application is
how the required resources scale with the size of the input
...
For large inputs, the
ﬁrst Fibonacci procedure never ﬁnishes, but the fast Fibonacci procedure ﬁnishes effectively instantly
...
This result
suggests that in about 10 years the mass of all the rabbits produced from the initial pair will exceed
the mass of the Earth, which, although scary, seems unlikely!

dynamic
programming

130

7
...
Orders of Growth

the important properties of how resources required grow with input size
...

Θ( f ) (theta)
The set of functions that grow as fast as f grows
...

Charles Babbage, 1851

Figure 7
...
Next, we deﬁne each
function and provide some examples
...
3 illustrates how to analyze the
time required to evaluate applications of procedures using these notations
...
Capital
thus checks its own
accumulation:
knowledge thus
accelerates its own
advance
...

These functions capture the asymptotic behavior of functions, that is, how they
behave as the inputs get arbitrarily large
...

f

Functions that
grow slower than f

Functions that grow
as fast as f

Figure 7
...
Visualization of the sets O( f ), Ω( f ), and Θ( f )
...
2
...
The O function takes
as input a function, and produces as output the set of all functions that grow no
faster than the input function
...
In Figure 7
...

To deﬁne the meaning of O precisely, we need to consider what it means for a
function to grow
...
First, we consider a few examples; then
we provide a formal deﬁnition of O
...
Cost

f (n) = n + 12 and g(n) = n − 7
No matter what n value we use, the value of f (n) is greater than the value of
g(n)
...
What matters is how
the difference between g(n) and f (n) changes as the input values increase
...
Thus, the growth rates of f and g are identical and
n − 7 is in the set O(n + 12), and n + 12 is in the set O(n − 7)
...
This difference increases as the
input value n increases, but it increases by the same amount as n increases
...
The value of 2n is always within
n
a constant multiple of 3n, so they grow asymptotically at the same rate
...
x
f (n) = n and g(n) = n2
The difference between g(n) and f (n) is n2 − n = n(n − 1)
...
The value of n − 1 increases as n increases,
so g grows faster than f
...
The function n is in O(n2 ) since n grows slower than n2 grows
...
The value of Fibonacci (n + 2)
is more than double the value of Fibonacci (n) since
Fibonacci (n + 2) = Fibonacci (n + 1) + Fibonacci (n)
and Fibonacci (n + 1) > Fibonacci (n)
...
414 (since increasing by one twice
more than doubles the value)
...
618, also known as the “golden ratio”
...
)
This is much faster than the growth rate of n, which increases by one when
we increase n by one
...

Some of the example functions are plotted in Figure 7
...
The O notation reveals
the asymptotic behavior of functions
...
In the ﬁrst graph, the
r n2

100

r

80

40
20
0

5000

r

60

Fibo (n)

r

r
r
r ` ` `
`r `
`r `r
2

6

4

`

Fibo (n)
6000

`

`

4000
3000

3n

2000
1000
0

8

10

n

r 2
` ` ` ` ` ` ` r
` `
`r `r r r `r `r `r r r `r r r `r `r r `r r r ` ` n
4

8

12
n

Figure 7
...
Orders of Growth
...
2
...
In the second graph, the values of Fibonacci (n) for input values up to
20 are so large that the other functions appear as nearly ﬂat lines on the graph
...
The function g is a member of the set O( f ) if and only if there
exist positive constants c and n0 such that, for all values n ≥ n0 ,
g ( n ) ≤ c f ( n )
...
To show g is not in O( f ), we need to explain how, for
any choices of c and n0 , we can ﬁnd values of n that are greater than n0 such that
g(n) ≤ c f (n) does not hold
...
1: O Examples
We now show the claimed properties are true using the formal deﬁnition
...
Then, we need to show n − 7 ≤ 1(n + 12) for all
values n ≥ 1
...

n + 12 is in O(n − 7)
Choose c = 2 and n0 = 26
...
The equation simpliﬁes to n + 12 ≤ 2n − 14, which simpliﬁes
to 26 ≤ n
...

2n is in O(3n)
Choose c = 1 and n0 = 1
...

3n is in O(2n)
Choose c = 2 and n0 = 1
...

n is in O(n2 )
Choose c = 1 and n0 = 1
...

n2 is not in O(n)
We need to show that no matter what values are chosen for c and n0 , there
are values of n ≥ n0 such that the inequality n2 ≤ cn does not hold
...

n is in O(Fibonacci (n))
Choose c = 1 and n0 = 3
...

Fibonacci (n) is not in O(n − 2)
No matter what values are chosen for c and n0 , there are values of n ≥ n0
such that Fibonacci (n) > c(n)
...
So, no matter what
value is chosen for c, we can choose n = c
...
For n > 12, we know Fibonacci (n) > n2
...

133

Chapter 7
...
For the given c values, we can always use a higher n0 value than the
selected value
...
Hence, our proofs work equally well with higher values for n0 than we
selected
...
The key is just to pick any appropriate values for c and n0 , and show the
inequality holds for all values n ≥ n0
...
The key to these proofs
is that the value of n that invalidates the inequality is selected after the values of
c and n0 are chosen
...
The ﬁrst player picks c and n0 , and the second player picks n
...

Exercise 7
...
For each of the g functions below, answer whether or not g is in the
set O(n)
...
If g is in O(n) you should identify
values of c and n0 that can be selected to make the necessary inequality hold
...

a
...
g(n) =
...
g(n) = 150n +
d
...
5

e
...
3
...
For all positive integers m, f (m) ≤ g(m)
...
For some positive integer m, f (m) < g(m)
...
For some positive integer m0 , and all positive integers m > m0 ,
f ( m ) < g ( m )
...
2
...

So, a function g is in Ω( f ) if g grows as fast as f or faster
...
In Figure 7
...

The formal deﬁnition of Ω( f ) is nearly identical to the deﬁnition of O( f ): the
only difference is the ≤ comparison is changed to ≥
...
The function g is a member of the set Ω( f ) if and only if
there exist positive constants c and n0 such that, for all values n ≥ n0 ,
g ( n ) ≥ c f ( n )
...
2
...
2: Ω Examples
We repeat selected examples from the previous section with Ω instead of O
...
To show g is not in Ω( f ), we need to
explain how, for any choices of c and n0 , we can ﬁnd a choice for n ≥ n0 such
that g(n) < c f (n)
...
Then, we need to show n − 7 ≥ 2 (n + 12) for
n
all values n ≥ 26
...

2n is in Ω(3n)
Choose c = 1 and n0 = 1
...

n is not in Ω(n2 )
Whatever values are chosen for c and n0 , we can choose n ≥ n0 such that
n ≥ cn2 does not hold
...
Then, the right side of the inequality cn2 will be greater than
n, and the needed inequality n ≥ cn2 does not hold
...
The value of Fibonacci (n) more than
doubles every time n is increased by 2 (see Section 7
...
1), but the value
of c(n) only increases by 2c
...

Exercise 7
...
Repeat Exercise 7
...

Exercise 7
...
For each part, identify a function g that satisﬁes the property
...
g is in O(n2 ) but not in Ω(n2 )
...
g is not in O(n2 ) but is in Ω(n2 )
...
g is in both O(n2 ) and Ω(n2 )
...
2
...

It is the intersection of the sets O( f ) and Ω( f )
...
In Figure 7
...

An alternate deﬁnition combines the inequalities for O and Ω:
Deﬁnition of Θ( f )
...

135

Chapter 7
...
If g(n) ∈
Θ( f (n)) then g and f grow at the same rate,
Example 7
...

n − 7 is in Θ(n + 12)
Since n − 7 is in O(n + 12) and n − 7 is in Ω(n + 12) we know n − 7 is in Θ(n +
12)
...
We can also show this using the
1
deﬁnition of Θ( f ): choose c1 = 1, c2 = 2 , and n0 = 38
...
Choose c1 = 1, c2 = 1 , and n0 = 1
...
Intuitively, n grows slower than n2 since increasing n by
one always increases the value of the ﬁrst function, n, by one, but increases
the value of n2 by 2n + 1, a value that increases as n increases
...

n − 2 is not in Θ(Fibonacci (n + 1)): n − 2 is not in Ω(n)
...

Properties of O, Ω, and Θ
...
For example, we saw n − 7 is in
Θ(n + 12) and n + 12 is in Θ(n − 7)
...

More generally, if we could prove g is in Θ( an + k) where a is a positive constant
and k is any constant, then g is also in Θ(n)
...

We prove Θ( an + k ) ≡ Θ(n) using the deﬁnition of Θ
...

Θ(n) ⊆ Θ( an + k): For any function g, if g is in Θ(n) then g is in Θ( an + k )
...
To show g is also in Θ( an + k) we ﬁnd d1 , d2 , and m0 such that
d1 ( an + k ) ≥ g(n) ≥ d2 ( an + k ) for all n ≥ m0
...
Ignoring the constants for
now, we can pick d1 = ca1 and d2 = ca2
...
As for the constants, as n increases they become insigniﬁcant
...

Hence, as n grows, an becomes greater than k
...

Since g is in Θ( an + k) there exist positive constants c1 , c2 , and n0 such

136

7
...
Analyzing Procedures
that c1 ( an + k) ≥ g(n) ≥ c2 ( an + k)
...
To show g is also in Θ(n), we ﬁnd d1 , d2 , and m0 such that
d1 n ≥ g(n) ≥ d2 n for all n ≥ m0
...
As before, the constants become
inconsequential as n increases
...

This result can be generalized to any polynomial
...
+ ak nk ) is equivalent to Θ(nk )
...

Exercise 7
...
Repeat Exercise 7
...

Exercise 7
...
Show that Θ(n2 − n) is equivalent to Θ(n2 )
...
8
...
1 )? Either prove they are identical,
or prove they are different
...
9
...

7
...
Instead, we can
consider the essential properties of how the running time of the procedures increases with the size of the input
...
To understand the growth rate of a procedure’s running time, we need a function that
maps the size of the inputs to the procedure to the amount of time it takes to
evaluate the application
...
In Section 7
...
3 we consider
which input of a given size should be used to reason about the cost of applying
a procedure
...
4 provides examples of procedures with different growth
rates
...

7
...
1

Input Size

Procedure inputs may be many different types: Numbers, Lists of Numbers,
Lists of Lists, Procedures, etc
...

We use the Turing machine to model a computer, so the way to measure the size
of the input is the number of characters needed to write the input on the tape
...
The number of different symbols in the tape

Chapter 7
...
Within the O, Ω, and Θ operators, a constant factor does not matter (e
...
, Θ(n) ≡ Θ(17n + 523))
...

With two symbols the input may be 8 times as long as it is with a 256-symbol alphabet, but the constant factor does not matter inside the asymptotic operator
...
To ﬁgure out the input size
of a given type, we need to think about how many symbols it would require to
write down inputs of that type
...
There are only two Boolean values: true and false
...

Numbers
...
Using the binary number
system (that is, 2 tape symbols), we can write it using log2 n bits
...
We can see this from
the argument above — changing the number of symbols in the input alphabet
changes the input length by a constant factor which has no impact within the
asymptotic operators
...
If the input is a List, the size of the input is related to the number of
elements in the list
...
Hence, the size of
an input that is a list of n elements is cn for some constant c
...
If
List elements can vary in size, then we need to account for that in the input size
...
Then, there are n2
total elements and the input size is in Θ(n2 )
...
3
...

To estimate the running time of an evaluation, we use the number of steps required to perform the evaluation
...
For any particular processor, both the time it takes to perform a step and the amount of work that can be
done in one step varies
...
Instead, what we care about is how the running
time changes as the input size increases
...

The clearest and simplest deﬁnition of a step is to use one Turing Machine step
...

Thomas Paine

7
...
Analyzing Procedures

it can read the symbol in the current square, write a symbol into that square,
transition its internal state number, and move one square to the left or right
...

Instead, we usually reason directly from a Scheme procedure (or any precise description of a procedure) using larger steps
...
One possibility is to count the number of times an evaluation rule is used in an evaluation of an application of the procedure
...

Hence, it is reasonable to assume all the evaluation rules to take constant time
...
For example, the evaluation rule for application expressions includes
evaluating every subexpression
...
In cases where the bigger steps are unclear, we can always return to our precise deﬁnition of a step as one step of a Turing Machine
...
3
...

For example, consider this procedure that takes a List as input and outputs the
ﬁrst positive number in the list:
(deﬁne (list-ﬁrst-pos p)
(if (null? p) (error "No positive element found")
(if (> (car p) 0) (car p) (list-ﬁrst-pos (cdr p)))))
If the ﬁrst element in the input list is positive, evaluating the application of listﬁrst-pos requires very little work
...
On the other hand, if none of the
elements are positive, the procedure needs to test each element in the list until
it reaches the end of the list (where the base case reports an error)
...
For a given size, the
worst case input is the input for which evaluating the procedure takes the most
work
...
Without knowing something about the possible inputs to
the procedure, it is safest to be pessimistic about the input and not assume any
properties that are not known (such as that the ﬁrst number in the list is positive
for the ﬁrst-pos example)
...
Since most procedures
can take inﬁnitely many inputs, this requires understanding the distribution of
possible inputs to determine an “average” input
...
If we use the worst-case running time for the helper procedure, we will
grossly overestimate the running time of the main procedure
...
Cost

139

we know how the main procedure uses the helper procedure, we can more precisely estimate the actual running time by considering the actual inputs
...
4
...

7
...
Symbolically, we can think of this function as:
Max-Steps Proc : Number → Number
where Proc is the name of the procedure we are analyzing
...

Because of all the issues with counting steps exactly, and the uncertainty about
how much work can be done in one step on a particular machine, we cannot
usually determine the exact function for Max-Steps Proc
...
Inside the O, Ω, and Θ operators, the actual time needed
for each step does not matter since the constant factors are hidden by the operator; what matters is how the number of steps required grows as the size of the
input grows
...
The Θ operator provides the
most information
...
Hence, our goal is to characterize the
running time of a procedure using the set of functions deﬁned by Θ( f ) of some
function f
...
The
growth classes described are important classes that are commonly encountered
when analyzing procedures, but these are only examples of growth classes
...

7
...
1

No Growth: Constant Time

If the running time of a procedure does not increase when the size of the input
increases, the procedure must be able to produce its output by looking at only a
constant number of symbols in the input
...

Their running time is in O(1) — it does not grow at all
...
Since there is no way to grow
slower than not growing at all, O(1) and Θ(1) are equivalent
...
4
...
A constant time procedure must be able to produce its output by examining only a ﬁxed-size part of the input
...
No matter how long the input
is, a constant time procedure can look at no more than some ﬁxed number of
squares on the tape, so cannot even read the whole input
...
When
car is applied to a non-empty list, it evaluates to the ﬁrst element of that list
...
So, the running time of car is in O(1)
...
None of these procedures need to examine
more than the ﬁrst pair of the list
...
4
...
If the input size is n, the running time is in Θ(n)
...

An example of a procedure that has linear growth is the elementary school addition algorithm from Section 6
...
3
...
The number of steps required
grows linearly with the size of the numbers (recall from Section 7
...
1 that the size
of a number is the number of input symbols needed to represent the number)
...
A procedure
that does something that takes constant time with every element in the input
List, has running time that grows linearly with the size of the input since adding
one element to the list increases the number of steps by a constant amount
...

Example 7
...
6):
(deﬁne (list-append p q)
(if (null? p) q (cons (car p) (list-append (cdr p) q))))
Since list-append takes two inputs, we need to be careful about how we refer
to the input size
...
So, our
goal is to deﬁne a function Max-Steps list-append (n p , nq ) that captures how the
maximum number of steps required to evaluate an application of list-append
scales with the size of its input
...
It
would be rather shocking, however, for an implementation to implement car in a way such that
its running time that is not in O(1)
...
2
...

141

Chapter 7
...
The predicate expression applies the null? procedure with is constant time since the effort required to determine if a list is null does not depend
on the length of the list
...

Next, we consider the alternate expression
...
Hence, the running time of the alternate expression is the time
required to evaluate the recursive application plus the time required to evaluate
everything else in the expression
...

So, we can deﬁned the total running time recursively as:
Max-Steps list-append (n p , nq ) = C + Max-Steps list-append (n p − 1, nq )
where C is some constant that reﬂects the time for all the operations besides the
recursive call
...

This does not yet provide a useful characterization of the running time of listappend though, since it is a circular deﬁnition
...
The base case for the running time deﬁnition is the same
as the base case for the procedure: when the input is null
...

To better characterize the running time of list-append, we want a closed form
solution
...
+ C + C0 where
there are n − 1 of the C terms in the sum
...
We do not know what the values of C and C2 are, but
within the asymptotic notations the constant values do not matter
...

Thus, the running time of list-append is in Θ(n p ) where n p is the number of
elements in the ﬁrst input
...
Instead, to analyze the
running time of a recursive procedure it is enough to determine the amount of
work involved in each recursive call (excluding the recursive application itself )
and multiply this by the number of recursive calls
...
Each call involves only constant-time procedures (other
than the recursive application), so the amount of work involved in each call is
constant
...
Equivalently, the running time
for the list-append procedure scales linearly with the length of the ﬁrst input list
...
5: Length
Consider the list-length procedure from Example 5
...
4
...
If the input has n elements, there will be n + 1 total applications
of list-length to evaluate (one for each element, and one for the null)
...

To determine the running time, we need to determine how much work is involved in each application
...
To evaluate the if expression, the predicate expression, (null? p), must be evaluated ﬁrst
...
4
...
The
consequent expression is the primitive expression, 0, which can be evaluated in
constant time
...
There are n + 1 total applications of list-length to evaluate, the
total running time is n + 1 times the work required for each application (other
than the recursive application itself )
...
The
cdr procedure is constant time
...

Cost of Addition
...
Following the elementary school addition algorithm
(from Section 6
...
3), we know we can add any two numbers by walking down
the digits
...

The number of digits to add is the maximum number of digits in the two input
numbers
...
In the worst
case, we need to look at all the digits in both numbers
...

But, in the list-length procedure the + is used in a very limited way: one of the
inputs is always 1
...
Recall the addition algorithm: we start at the rightmost
(least signiﬁcant) digit, add that digit, and continue with the carry
...
In the worst case, adding one requires
changing every digit in the other input
...
In
the best case (when the last digit is below 9), adding one requires only examining
and changing one digit
...
We assume the numbers are represented
in binary, so instead of decimal digits we are counting bits (this is both simpler,
and closer to how numbers are actually represented in the computer)
...
When the last bit is not a 0, we need to examine the second least signiﬁcant bit (the second bit from the right): if it is a 0 we are done; if it is a 1, we need
to continue
...
Half the time we
also need to examine the second least signiﬁcant bit
...
This con-

143

Chapter 7
...
Thus, the expected number of bits we need
to examine is,
1
1
1
1
1+
1+
1+
1 + (1 +
...
Simplifying the equation, we get:
1+

1 1 1
1
1
+ + +
+
...
So, on average, the
number of bits to examine to add 1 is constant: it does not depend on the length
of the input number
...

Adding any constant C to a number n is equivalent to adding one C times
...

Excluding the recursive application, the list-length application involves applications of two constant time procedures: cdr and adding one using +
...

There are n + 1 total applications of list-length to evaluate total, so the total running time is c(n + 1) where c is the amount of time needed for each application
...

Example 7
...
3:
(deﬁne (list-get-element p n)
(if (= n 1)
(car p)
(list-get-element (cdr p) (− n 1))))
The procedure takes two inputs, a List and a Number selecting the element of
the list to get
...
We can use variables to represent the size of each input, for example
s p and sn for the size of p and n respectively
...

The procedure body is an if expression
...
The worst case running time of the = procedure is
linear in the size of the input: it potentially needs to look at all bits in the input
numbers to determine if they are equal
...
To compare
a number of any size to 1, it is enough to look at a few bits
...
If it is a 1, we need
to examine a few other bits of the input number to determine if its value is different from 1 (the exact number of bits depends on the details of how numbers
are represented)
...

144

7
...
Growth Rates

If the predicate is true, the base case applies the car procedure, which has constant running time
...
The − procedure is similar to +: for arbitrary inputs, its worst case running time is linear
in the input size, but when one of the inputs is a constant the running time is
constant
...
13 asks for a more detailed analysis of the running time of
subtraction)
...

The number of recursive calls is determined by the value of n and the number
of elements in the list p
...
Each recursive call reduces the value passed in as n by 1, so
the number of recursive calls scales linearly with n (the actual number is n − 1
since the base case is when n equals 1)
...
If the value passed in as n exceeds the number of elements in
p, the procedure will produce an error when it attempts to evaluate (cdr p) for
the empty list
...
Hence, the running time of list-get-element does not grow with
the length of the input passed as n; after the value of n exceeds the number of
elements in p it does not matter how much bigger it gets, the running time does
not continue to increase
...
Equivalently, the running time of list-get-element is in
Θ(s p ) where s p is the number of elements in the input list
...
10
...
4
...
Assume the procedure input
has constant running time
...
11
...
2):
(deﬁne (list-sum p) (if (null? p) 0 (+ (car p) (list-sum (cdr p)))))
What assumptions are needed about the elements in the list for the running time
to be linear in the number if elements in the input list?
Exercise 7
...
For the decimal six-digit odometer (shown in the picture on
page 142), we measure the amount of work to add one as the total number of
wheel digit turns required
...

a
...
What are the best case inputs?
c
...

d
...
Most level machines used a
three-digit odometer to tally votes
...

145

Chapter 7
...
13
...
Develop a more convincing argument why this is true by analyzing the worst case
and average case inputs for −
...
14
...
Test experimentally if the DrRacket
+ procedure actually satisﬁes this property
...

7
...
3

Quadratic Growth

If the running time of a procedure scales as the square of the size of the input,
the procedure’s running time grows quadratically
...
The running time is in Θ(n2 )
where n is the size of the input
...
For
example, we can compare every element in a list of length n with every other
element using n(n − 1) comparisons
...
7)
...
7: Reverse
Consider the list-reverse procedure deﬁned in Section 5
...
2:
(deﬁne (list-reverse p)
(if (null? p) null (list-append (list-reverse (cdr p)) (list (car p)))))
To determine the running time of list-reverse, we need to know how many recursive calls there are and how much work is involved in each recursive call
...
Hence, applying list-reverse to a input list with n elements involves n recursive calls
...
The ﬁrst input to list-append is the output of the recursive
call
...
4, the running time of list-append is in Θ(n p )
where n p is the number of elements in its ﬁrst input
...
For the
ﬁrst call, (cdr p) is the parameter, with length n − 1; for the second call, there will
be n − 2 elements; and so forth, until the ﬁnal call where (cdr p) has 0 elements
...
+ 1 + 0
...
Within the
2
asymptotic operators the constant factor of 1 does not matter, so the average
2
running time for each recursive application is in Θ(n)
...
4
...

Thus, the running time is quadratic in the size of the input list
...
8: Multiplication
Consider the problem of multiplying two numbers
...
The intermediate results will be n rows,
each containing n digits
...
, n digits in the 10n−1 s place,
...
Each digit addition requires
constant work, so the total work for all the digit additions is in Θ(n2 )
...

This is not the fastest known algorithm for multiplying two numbers, although it
was the best algorithm known until 1960
...
Since log2 3 < 1
...
In 2007,
¨
Martin Furer discovered an even faster algorithm for multiplication
...

Exercise 7
...
[ ] Analyze the running time of the elementary school long division algorithm
...
16
...
Strive for your procedure to have running time in Θ(n) where n is the
total number of digits in the input numbers
...
17
...

5 Martin Furer, Faster Integer Multiplication,
¨

ACM Symposium on Theory of Computing, 2007
...
Cost

7
...
4

147

Exponential Growth

If the running time of a procedure scales as a power of the size of the input,
the procedure’s running time grows exponentially
...
The
growth rate of a function whose output is multiplied by w when the input size,
n, increases by one is wn
...

For a surprisingly large number of interesting problems, the best known algorithm has exponential running time
...
2, solving
generalized versions of most other games such as Suduko and Minesweeper,
and ﬁnding the factors of a number
...

Example 7
...
The ﬁnd-factor procedure takes one number as input and outputs the lowest factor of that number (other than 1):
(deﬁne (ﬁnd-factor n)
(deﬁne (ﬁnd-factor-helper v)
(if (= (modulo n v) 0) v (ﬁnd-factor-helper (+ 1 v))))
(ﬁnd-factor-helper 2))
The ﬁnd-factor-helper procedure takes two inputs, the number to factor and the
current guess
...
The magnitude
of n is exponential in its size, so the number of recursive calls is in Θ(2b ) where
b is the number of bits in the input
...

The actual work for each recursive call is not constant, though, since it involves
an application of modulo
...
Hence,
it output is 0 if n is divisible by v
...
This means the running time of ﬁnd-factor is in Ω(2b ):
it grows at least as fast as 2b
...
This techniques can improve the running time by constant factors, but there is no known factoring algorithm that runs in faster than
6 In fact, it computing the remainder requires performing division, which is quadratic in the size
of the input
...
4
...
The security of the widely used RSA encryption algorithm depends on factoring being hard
...
7
Example 7
...
For example, the power set
of {1, 2, 3} is {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
...

Here is a procedure that takes a list as input, and produces as output the power
set of the elements of the list:
(deﬁne (list-powerset s)
(if (null? s) (list null)
(list-append (list-map (lambda (t) (cons (car s) t))
(list-powerset (cdr s)))
(list-powerset (cdr s)))))
The list-powerset procedure produces a List of Lists
...
In the recursive case, we can produce the power set by appending the list
of all the subsets that include the ﬁrst element, with the list of all the subsets that
do not include the ﬁrst element
...

An application of list-powerset involves applying list-append, and two recursive
applications of (list-powerset (cdr s))
...
The number of applications of list-powerset is
2n where n is the length of the input list
...
The predicate applies the constanttime procedure, null?
...
The alternate expression is an application of list-append
...
4, we know the running time of list-append is Θ(n p ) where n p is the number of elements in its ﬁrst input
...
The length of
the list output by list-map is the same as the length of its input, so we need to
determine the length of (list-powerset (cdr s))
...
The length of the input list
to map is the number of elements in the power set of a size ns − 1 set: 2ns −1
...
Since we are trying to determine
the total running time, we can do this by thinking about the total length of all the
input lists to list-map over all of the list-powerset
...
+ 21 + 20 , which is equal to 2n − 1
...

8 Observant readers will note that it is not really necessary to perform this evaluation twice since
we could do it once and reuse the result
...

Chapter 7
...

The analysis of the list-append applications is similar
...

Other than the applications of list-map and list-append, the rest of each listpowerset application requires constant time
...
The total running time for list-powerset is the sum
of the running times for the list-powerset applications, in Θ(2n ); the list-map
applications, in Θ(2n ); and the list-append applications, in Θ(2n )
...

In this case, we know there can be no faster than exponential procedure that
solves the same problem, since the size of the output is exponential in the size
of the input
...
The size of the powerset is 2n where n is the size of the
input set
...

7
...
5

Faster than Exponential Growth

We have already seen an example of a procedure that grows faster than exponentially in the size of the input: the ﬁbo procedure at the beginning of this
chapter! Evaluating an application of ﬁbo involves Θ(φn ) recursive applications
where n is the magnitude of the input parameter
...
Hence, the running time of the ﬁbo procedure is
b
in Θ(φ2 ) where b is the size of the input
...

7
...
6

Non-terminating Procedures

All of the procedures so far in the section are algorithms: they may be slow, but
they are guaranteed to eventually ﬁnish if one can wait long enough
...
For example,
(deﬁne (run-forever) (run-forever))
deﬁnes a procedure that never ﬁnishes
...
The running time of this procedure is effectively
inﬁnite since it never ﬁnishes
...
5

Summary

Because the speed of computers varies and the exact time required for a particular application depends on many details, the most important property to understand is how the work required scales with the size of the input
...

Procedures that can produce an output only touching a ﬁxed amount have constant running times
...
5
...
Procedures whose running time quadruples when the input size doubles have quadratic
(in Θ(n2 )) running times
...
Procedures
with exponential running time can only be evaluated for small inputs
...
For large enough
inputs, a procedure with running time in Θ(n) is always faster than a procedure
with running time in Θ(n2 )
...
Without knowing the constants that are hidden by the
asymptotic operators, there is no way to accurately predict the actual running
time on a given input
...
18
...
2):
(deﬁne (list-sum p)
(if (null? p)
0

(+ (car p) (list-sum (cdr p)))))
You may assume all of the elements in the list have values below some constant
(but explain why this assumption is useful in your analysis)
...
19
...
1):
(deﬁne (factorial n) (if (= n 0) 1 (∗ n (factorial (− n 1)))))
Be careful to describe the running time in terms of the size (not the magnitude)
of the input
...
20
...
8)
...
[ ] Analyze the asymptotic running time of this intsto procedure:
(deﬁne (revintsto n)
(if (= n 0)
null
(cons n (revintsto (− n 1)))))
(deﬁne (intsto n) (list-reverse (revintsto n)))
b
...
Which version is better?
d
...
Cost

151

Exercise 7
...
Analyze the running time of the board-replace-peg procedure
(from Exploration 5
...
22
...
5:
(deﬁne (deep-list-ﬂatten p)
(if (null? p) null
(list-append (if (list? (car p))
(deep-list-ﬂatten (car p))
(list (car p)))
(deep-list-ﬂatten (cdr p)))))
Exercise 7
...
[ ] Find and correct at least one error in the Orders of Growth
section of the Wikipedia page on Analysis of Algorithms (http://en
...
org/
wiki/Analysis of algorithms)
...
Hopefully it will soon become a
[
] challenge, and perhaps, eventually will become impossible!

152

7
...
Summary

8

Sorting and Searching
If you keep proving stuff that others have done, getting conﬁdence, increasing the
complexities of your solutions—for the fun of it—then one day you’ll turn
around and discover that nobody actually did that one!
And that’s the way to become a computer scientist
...
These examples involve some quite challenging problems and incorporate many of the ideas we
have seen up to this point in the book
...
1

Sorting

The sorting problem takes two inputs: a list of elements and a comparison procedure
...
For example, if we sort a list of numbers
using < as the comparison procedure, the output is the list of numbers sorted
in order from least to greatest
...
Try to develop a sorting procedure yourself before continuing further
...
For example, take a shufﬂed deck of cards and arrange them in sorted order by ranks
...

Next, we present and analyze three different sorting procedures
...
1
...
The best element is an element for which the comparison procedure
evaluates to true when applied to that element and every other element
...
This element belongs at the front of the output list
...
This means it has the property that for any inputs a, b, and c, if (cf a b) and (cf b c) are both true, the result
of (cf a c) must be true
...
1
...
If the comparison function does not have this
property, there may be no way to arrange the elements in a single sorted list
...

Once we can ﬁnd the best element in a given list, we can sort the whole list by
repeatedly ﬁnding the best element of the remaining elements until no more
elements remain
...

Finding the Best
...
Hence, we deﬁne list-ﬁnd-best recursively
...
When the input list has only one element, that element is the best
element
...

To pick the better element from two elements, we deﬁne the pick-better procedure that takes three inputs: a comparison function and two values
...
For most of our examples, we use the < procedure as the comparison function
...
But, if the maximum value of a number
in the input list is limited, then we can consider < a constant time procedure
since all of the inputs passed to it are below some ﬁxed size
...
An application of list-ﬁnd-best involves n − 1 recursive applications since each one passes
in (cdr p) as the new p operand and the base case stops when the list has one
element left
...
So, the total running time for list-ﬁndbest is in Θ(n); it scales linearly with the length of the input list
...
To implement best ﬁrst sorting, we need to produce a
list that contains all the elements of the original list except for the best element,
which will be placed at the front of the output list
...

Chapter 8
...
The
equal? procedure behaves identically to = when both inputs are Numbers, but
also works sensibly on many other datatypes including Booleans, Characters,
Pairs, Lists, and Strings
...

The worst case running time for list-delete occurs when no element in the list
matches the value of el (in the best case, the ﬁrst element matches and the running time does not depend on the length of the input list at all)
...
There can be up to n recursive applications of list-delete
...
Hence, the total running time for list-delete is in Θ(n) where n is the
length of the input list
...

delete:

We deﬁne list-sort-best-ﬁrst using list-ﬁnd-best and list-

(deﬁne (list-sort-best-ﬁrst cf p)
(if (null? p) null
(cons (list-ﬁnd-best cf p)
(list-sort-best-ﬁrst cf (list-delete p (list-ﬁnd-best cf p))))))
The running time of the list-sort-best-ﬁrst procedure grows quadratically with
the length of the input list
...
There are n recursive applications since each application of list-delete
produces an output list that is one element shorter than its input list
...

Each of these applications has running time in Θ(m) where m is the length of
the input list to list-ﬁnd-best and list-delete (we use m here to avoid confusion
with n, the length of the ﬁrst list passed into list-sort-best-ﬁrst)
...
Hence, the average length
of the input lists to list-ﬁnd-best and list-delete is approximately n
...

There are three applications (two of list-ﬁnd-best and one of list-delete) for each
application of list-sort-best-ﬁrst, so the total running time for each application
is in Θ(3n), which is equivalent to Θ(n)
...
This means doubling the
length of the input list quadruples the expected running time, so we predict that
sorting a list of 2000 elements to take approximately four times as long as sorting

156

8
...
Sorting

a list of 1000 elements
...
Each application of the list-sort-best-ﬁrst procedure involves
two evaluations of (list-ﬁnd-best cf p), a procedure with running time in Θ(n)
where n is the length of the input list
...
We could just evaluate (list-ﬁnd-best cf p) once and reuse the
result
...

Scheme provides the let expression special form to avoid this type of duplicate
work more elegantly
...
To evaluate a let expression, evaluate each binding in order
...
Then, the value of the let expression is the value of the body
expression evaluated with the names in the expression that match
binding names substituted with their bound values
...

The let expression
(let ((Name1 Expression1 ) (Name2 Expression2 )
· · · (Namek Expressionk ))
Expressionbody )
is equivalent to the application expression:
((lambda (Name1 Name2
...
Expressionk )
The advantage of the let expression syntax is it puts the expressions next to the
names to which they are bound
...
Sorting and Searching

157

(deﬁne (list-sort-best-ﬁrst-let cf p)
(if (null? p) null
(let ((best (list-ﬁnd-best cf p)))
(cons best (list-sort-best-ﬁrst-let cf (list-delete p best))))))
This runs faster than list-sort-best-ﬁrst since it avoids the duplicate evaluations,
but the asymptotic asymptotic running time is still in Θ(n2 ): there are n recursive applications of list-sort-best-ﬁrst-let and each application involves linear
time applications of list-ﬁnd-best and list-delete
...

Exercise 8
...
What is the best case input for list-sort-best-ﬁrst? What is its
asymptotic running time on the best case input?
Exercise 8
...
Use the time special form (Section 7
...
Do the results
match the expected running times based on the Θ(n2 ) asymptotic running time?
You may ﬁnd it helpful to deﬁne a procedure that constructs a list containing
n random elements
...
Be careful
in your time measurements that you do not include the time required to generate the input list
...
3
...
4
...

Exercise 8
...
[ ] Deﬁne and analyze a list-sort-worst-last procedure that sorts
by ﬁnding the worst element ﬁrst and putting it at the end of the list
...
1
...
For every output element, we are searching the whole remaining list to ﬁnd the best element, but
do nothing of value with all the comparisons that were done to ﬁnd the best
element
...

Insertion sort works by putting the ﬁrst element in the list in the right place in
the list that results from sorting the rest of the elements
...
The input List must be sorted according
to the comparison function
...

158

8
...
Sorting

(deﬁne (list-insert-one cf el p) ; requires: p is sorted by cf
(if (null? p) (list el)
(if (cf el (car p)) (cons el p)
(cons (car p) (list-insert-one cf el (cdr p))))))
The running time for list-insert-one is in Θ(n) where n is the number of elements
in the input list
...
Each application
involves constant work so the overall running time of list-insert-one is in Θ(n)
...
The lengths of the input lists in the recursive applications are
n − 1, n − 2,
...
Each application involves an application of list-insert-one
which has linear running time
...
There are n applications of list-insert-one, so the
total running time is in Θ(n2 )
...
5
...

Analyze the best case running time
...

Exercise 8
...
Both the list-sort-best-ﬁrst-sort and list-sort-insert procedures
have asymptotic running times in Θ(n2 )
...
For the questions below, use both analytical
and empirical analysis to provide a convincing answer
...
How do the actual running times of list-sort-best-ﬁrst-sort and list-sort-insert
on typical inputs compare?
b
...
For sorting a long list of n random elements, how long does each procedure
take? (See Exercise 8
...
)

8
...
3

Quicker Sorting

Although insertion sort is typically faster than best-ﬁrst sort, its running time is
still scales quadratically with the length of the list
...
Yet computers routinely need to
sort lists containing many millions of elements (for example, consider processing credit card transactions or analyzing the data collected by a super collider)
...
Sorting and Searching

159

The problem with our insertion sort is that it divides the work unevenly into
inserting one element and sorting the rest of the list
...
Any sorting procedure that works by considering one element at a time
and putting it in the sorted position as is done by list-sort-ﬁnd-best and listsort-insert has a running time in Ω(n2 )
...

To do better, we need to either reduce the number of recursive applications
needed (this would mean each recursive call results in more than one element
being sorted), or reduce the time required for each application
...
We partition the elements in the list so that all elements in the ﬁrst part are less than
(according to the comparison function) all elements in the second part
...
This
approach does not produce a better-than-quadratic time sorting procedure because of the inefﬁciency of accessing list elements; however, it leads to insights
for producing a quicker sorting procedure
...
The worst case input is when the value of end is the
length of the input list, which means there will be n recursive applications, each
involving a constant amount of work
...

The list-insert-one-split procedure inserts an element in sorted order by ﬁrst
splitting the list in halves and then recursively inserting the new element in the
appropriate half of the list:

160

8
...
Sorting

(deﬁne (list-insert-one-split cf el p) ; requires: p is sorted by cf
(if (null? p) (list el)
(if (null? (cdr p))
(if (cf el (car p)) (cons el p) (list (car p) el))
(let ((front (list-ﬁrst-half p)) (back (list-second-half p)))
(if (cf el (car back))
(list-append (list-insert-one-split cf el front) back)
(list-append front (list-insert-one-split cf el back)))))))
In addition to the normal base case when the input list is null, we need a special
case when the input list has one element
...

In the recursive case, we use the list-ﬁrst-half and list-second-half procedures
to split the input list and bind the results of the ﬁrst and second halves to the
front and back variables so we do not need to evaluate these expressions more
than once
...
Hence, only one comparison
is needed to determine which of the sublists contains the new element: if the
element is before the ﬁrst element in back it is in the ﬁrst half, and we produce
the result by appending the result of inserting the element in the front half with
the back half unchanged; otherwise, it is in the second half, so we produce the
result by appending the front half unchanged with the result of inserting the
element in the back half
...
We use
n to denote the number of elements in the input list
...
The
reason for this is that instead of using (cdr p) in the recursive call, list-insert-onesplit passes in either the front or back value which is the result of (ﬁrst-half p) or
(second-half p) respectively
...
With each recursive application,
2
the size of the input list is halved
...
This means the number of recursive
calls is logarithmic in the size of the input
...
In computing, we most commonly encounter logarithms with base 2
...
Changing the base of a logarithm from k to b changes the value by the constant factor (see Section 7
...
1), so
inside the asymptotic operators a constant base of a logarithm does not matter
...

Each list-insert-one-split application applies list-append to a ﬁrst parameter that
is either the front half of the list or the result of inserting the element in the front

161

Chapter 8
...
In either case, the length of the list is approximately n
...

So, the time required for each list-insert-one-split application is in Θ(n) where
n is the length of the input list to list-insert-one-split
...
, 1, since the length of the list halves with each call
...
Hence, the total running time for the list-append applications in
2
n
each application of list-insert-one-split is in Θ(log2 n × log n ) = Θ(n)
...

2
Hence, the total running time for list-insert-one-split is in Θ(n)
...
Since list-sort2
insert-split involves Θ(n) applications of list-insert-one-split with average input list length of n , the total running time for list-sort-insert-split is in Θ(n2 )
...

The problem with our list-insert-one-split procedure is that the list-ﬁrst-half
and list-second-half procedures have to cdr down the whole list to get to the
middle of the list, and the list-append procedure needs to walk through the entire input list to put the new element in the list
...
To use the splitting
strategy effectively, we need is a way to get to the middle of the list quickly
...
To do better, we need to
2
change the way we represent our data
...
1
...

8
...
4

Binary Trees

The data structure we will use is known as a sorted binary tree
...
The left and right
sides of the tree are themselves trees
...

sorted binary tree

162

8
...
Sorting

Whereas we deﬁned a List (in Chapter 5) as:
A List is either (1) null or (2) a Pair whose second cell is a List
...

Symbolically:
Tree ::⇒ null
Tree ::⇒ (make-tree Tree Element Tree)
The make-tree procedure can be deﬁned using cons to package the three inputs
into a tree:
(deﬁne (make-tree left element right)
(cons element (cons left right)))
We deﬁne selector procedures for extracting the parts of a non-null tree:
(deﬁne (tree-element tree) (car tree))
(deﬁne (tree-left tree) (car (cdr tree)))
(deﬁne (tree-right tree) (cdr (cdr tree)))
The tree-left and tree-right procedures are constant time procedures that evaluate to the left or right subtrees respectively of a tree
...
All elements
in the left subtree of a tree are less than (according to the comparison function)
the value of the root element of the tree; all elements in the right subtree of a tree
are greater than or equal to the value of the root element of the tree (the result
of comparing them with the root element is false)
...
The null subtrees are not shown
...
Although there
are six elements in the tree, we can reach any element from the top by following
at most two branches
...

depth

The depth of a tree is the largest number of steps needed to reach any node in

163

Chapter 8
...
The example tree has depth 2, since we can reach
every node starting from the root of the tree in two or fewer steps
...
One way to see this is from this recursive
deﬁnition for the maximum number of nodes in a tree:
TreeNodes (d) =

1
TreeNodes (d − 1) + 2 × TreeLeaves (d − 1)

:
:

d=0
d>0

A tree of depth zero has one node
...
The maximum number of leaves in a
tree of depth d is 2d since each level doubles the number of leaves
...

The value of TreeNodes (d − 1) is 2d−1 + 2d−2 +
...
Adding 2d and
2d − 1 gives 2d+1 − 1 as the maximum number of nodes in a tree of depth d
...

A tree is well-balanced if the left and right subtrees of all nodes in the contain
nearly the same number of elements
...
For example, tree-left is analogous to list-ﬁrst-half and make-tree is
analogous to list-append
...
Otherwise, the procedure compares the
element to insert with the element at the top node of the tree
...
The result is a tree
where the left tree is the result of inserting this element in the old left subtree,
and the element and right subtree are the same as they were in the original tree
...

In addition to the recursive call, tree-insert-one only applies constant time procedures
...
Hence, the
running time to insert an element in a well-balanced tree using tree-insert-one
is in Θ(log n)
...
1
...
It inserts each element of the list in turn
into the sorted tree:
(deﬁne (list-to-sorted-tree cf p)
(if (null? p) null
(tree-insert-one cf (car p) (list-to-sorted-tree cf (cdr p)))))
Assuming well-balanced trees as above (we revisit this assumption later), the
expected running time of list-to-sorted-tree is in Θ(n log n) where n is the size of
the input list
...
Each application
involves an application of tree-insert-one (as well as only constant time procedures), so the expected running time of each application is in Θ(log n)
...

To use our list-to-sorted-tree procedure to perform sorting we need to extract
a list of the elements in the tree in the correct order
...
Starting from the top node, all
elements in its left subtree should appear before the top element, and all the elements in its right subtree should follow it
...
For each application, the body applies list-append where the ﬁrst parameter is the elements
extracted from the left subtree
...

Hence, the total size of all the appended lists is at most n, and the running time
for all the list-append applications is in Θ(n)
...

Putting things together, we deﬁne list-sort-tree:
(deﬁne (list-sort-tree cf p)
(tree-extract-elements (list-to-sorted-tree cf p)))
The total running time for list-sort-tree is the running time of the list-to-sortedtree application plus the running time of the tree-extract-elements application
...
Sorting and Searching

165

ments in its input list (which is the result of the list-to-sorted tree application, a
list containing n elements where n is the number of elements in p)
...
This is substantially better than the
previous sorting algorithms which had running times in Θ(n2 ) since logarithms
grow far slower than their input
...

There is no general sorting procedure that has expected running time better
than Θ(n log n), so there is no algorithm that is asymptotically faster than listsort-tree (in fact, it can be proven that no asymptotically faster sorting procedure
exists)
...

Unbalanced Trees
...

If the input list is in random order, this assumption is likely to be valid: each
element we insert is equally likely to go into the left or right half, so the halves
contain approximately the same number of elements all the way down the tree
...

For example, suppose the input list is already in sorted order
...

For the previous example, this produces the unbalanced tree shown in Figure 8
...

This tree contains the same six elements as the earlier example, but because it is
not well-balanced the number of branches that must be traversed to reach the
deepest element is 5 instead of 2
...

In these pathological situations, the tree effectively becomes a list
...
1
...

166

8
...
Sorting

ber of recursive applications of tree-insert-one needed to insert a new element
will not be in Θ(log n), but rather will be in Θ(n)
...
The listsort-tree-insert procedure has expected running time in Θ(n log n) for randomly
distributed inputs, but has worst case running time in Θ(n2 )
...
7
...
Analyze the running time
of your procedure
...
8
...
The running time of your procedure
should not grow faster than linearly with the number of nodes in the tree
...
9
...

The depth of the output tree should be no higher than log2 n + 1 where n is the
number of elements in the input tree
...

My boss and tutor,
Pat Shackleton, was
very pleased with
my completed
program
...

He bet me sixpence
that I had not
...

Sir Tony Hoare, The
Emperor’s Old Clothes,
1980 Turing Award
Lecture
...
)

8
...
5

Quicksort

Although building and extracting elements from trees allows us to sort with expected time in Θ(n log n), the constant time required to build all those trees and
extract the elements from the ﬁnal tree is high
...

Instead, we keep the two sides of the tree as separate lists, and sort them recursively
...

The values in the ﬁrst half of the list are all less than the values in the second half
of the list, so the lists can be sorted separately
...
5) to divide the input list into sublists containing elements below and above the comparison element, and then recursively sorts those sublists:
(deﬁne (list-quicksort cf p)
(if (null? p) null
(list-append
(list-quicksort cf
(list-ﬁlter (lambda (el) (cf el (car p))) (cdr p)))
(cons (car p)
(list-quicksort cf
(list-ﬁlter (lambda (el) (not (cf el (car p)))) (cdr p)))))))
This is the famous quicksort algorithm that was invented by Sir C
...
R
...
He
was there to study probability theory, but also got a job working on a project to
translate Russian into English
...
Since the dictionary was stored on a magnetic tape which could
be read in order faster than if it was necessary to jump around, the translation

Chapter 8
...

Hoare invented the quicksort algorithm for this purpose and it remains the most
widely used sorting algorithm
...
In the expected
cases, each recursive call halves the size of the input list (since if the list is randomly arranged we expect about half of the list elements are below the value of
the ﬁrst element), so there are approximately log n expected recursive calls
...
At each call depth, the total length of the
inputs to all the calls to list-ﬁlter is n since the original list is subdivided into 2d
sublists, which together include all of the elements in the original list
...
As with list-sort-tree-insert, if the input list is not randomly
rearranged it is possible that all elements end up in the same partition
...

Exercise 8
...
Estimate the time it would take to sort a list of one million elements using list-quicksort
...
11
...
Experimentally compare their actual running times
...
12
...

Exercise 8
...
[ ] Instead of using binary trees, we could use ternary trees
...
Each node has three subtrees: left, containing elements before the left element; middle, containing elements between the left and right
elements; and right, containing elements after the right element
...
2

Searching

In a broad sense, nearly all problems can be thought of as search problems
...
For example, to solve the pegboard puzzle (Exploration 5
...
For most interesting problems, however, the search space is
far too large to search through all possible solutions
...
First, we consider
the simple problem of ﬁnding an element in a list that satisﬁes some property
...
Finally, we consider
the more speciﬁc problem of efﬁciently searching for documents (such as web

There are two ways
of constructing a
software design: one
way is to make it so
simple that there
are obviously no
deﬁciencies, and the
other way is to
make it so
complicated that
there are no obvious
deﬁciencies
...
It
demands the same
skill, devotion,
insight, and even
inspiration as the
discovery of the
simple physical
laws which underlie
the complex
phenomena of
nature
...
2
...

8
...
1

Unstructured Search

Finding an item that satisﬁes an arbitrary property in unstructured data requires
testing each element in turn until one that satisﬁes the property is found
...

The list-search procedure takes as input a matching function and a list, and outputs the ﬁrst element in the list that satisﬁes the matching function or false if
there is no satisfying element:1
(deﬁne (list-search ef p)
(if (null? p) false ; Not found
(if (ef (car p)) (car p) (list-search ef (cdr p)))))
For example,
(list-search (lambda (el) (= 12 el)) (intsto 10)) ⇒ false
(list-search (lambda (el) (= 12 el)) (intsto 15)) ⇒ 12
(list-search (lambda (el) (> el 12)) (intsto 15)) ⇒ 13
Assuming the matching function has constant running time, the worst case running time of list-search is linear in the size of the input list
...
If the input list has length n, there
are n recursive calls to list-search, each of which involves only constant time
procedures
...
In the worst case, we always need to test
every element in the input list before concluding that there is no element that
satisﬁes the matching function
...
2
...
Suppose the input data
is a sorted binary tree, as introduced in Section 8
...
4
...
Instead of eliminating just one element with each application of the matching function as was the case with list-search, with a sorted
binary tree a single application of the comparison function is enough to exclude
approximately half the elements
...
The ﬁrst procedure determines when a satisfying element has been
found (we call this the ef procedure, suggesting equality)
...
Since cf is used
to traverse the tree, the input tree must be sorted by cf
...
An
alternative would be to produce an error if no satisfying element is found, but this is more awkward
when list-search is used by other procedures
...
Sorting and Searching

169

(deﬁne (binary-tree-search ef cf tree) ; requires: tree is sorted by cf
(if (null? tree) false
(if (ef (tree-element tree)) (tree-element tree)
(if (cf (tree-element tree))
(binary-tree-search ef cf (tree-left tree))
(binary-tree-search ef cf (tree-right tree))))))
For example, we can search for a number in a sorted binary tree using = as the
equality function and < as the comparison function:
(deﬁne (binary-tree-number-search tree target)
(binary-tree-search (lambda (el) (= target el))
(lambda (el) (< target el))
tree))
To analyze the running time of binary-tree-search, we need to determine the
number of recursive calls
...
If not, all the elements could be in the right branch,
for example, and binary-tree-search becomes like list-search in the pathological
case
...

Hence, the number of calls needed to reach a null tree is in Θ(log n) where n is
the number of elements in the input tree
...

Assuming the procedures passed as ef and cf have constant running time, the
work for each call is constant except for the recursive call
...
This is a huge improvement over linear searching: with linear
search, doubling the number of elements in the input doubles the search time;
with binary search, doubling the input size only increases the search time by a
constant
...
2
...
What if we want to search a collection of documents, such as ﬁnding all web pages that contain a given word? The web visible to search engines
contains billions of web pages most of which contain hundreds or thousands
of words
...
One way to do this is to build an index that provides a mapping from words to the documents that contain them
...

Once the index is built, the work required to perform one search is just the time
it takes to look up the target word in the index
...

170

8
...
Searching

Strings
...
A String is similar to a List, but specialized for representing sequences
of characters
...
For example, "abcd" evaluates to a String containing four characters
...

stringOutputs true if the ﬁrst input String is lexicographically before the second
input String, otherwise false
...

list->string : List → String
Outputs a String containing the characters in the input List
...

Building the index
...
Each location is a
Pair consisting of a document identiﬁer (for web documents, this is the Uniform
Resource Locator (URL) that is the address of the web page represented as a
string) and a Number identifying the position within the document where the
word appears (we label positions as the number of characters in the document
before this location)
...
The ﬁrst step is to deﬁne a procedure that takes
as input a string representing an entire document, and produces a list of (word

...
We
deﬁne a word as a sequence of alphabetic characters; non-alphabetic characters
including spaces, numbers, and punctuation marks separate words and are not
included in the index
...
The inner procedure, text-to-word-positions-iter, takes three inputs: a list of the characters in
the document, a list of the characters in the current word, and a number representing the position in the string where the current word starts; it outputs the list
of (word
...
The value passed in as w can be null, meaning there
is no current word
...

A word starts when the ﬁrst alphabetic character is found, and continues until
either the ﬁrst non-alphabetic character or the end of the document
...

Chapter 8
...
To enable
fast searching, we store the index in a binary tree sorted by the target word
...

The index is represented as a sorted binary tree where each element is a pair of a
word and a list of the positions where that word appears
...
Otherwise, a new entry is added to the index for the word with
a list of positions containing the position as its only element
...
position) pairs in a list into the index, we use insert-intoindex to add each pair, passing the resulting index into the next recursive call:
(deﬁne (insert-all-wps index wps)
(if (null? wps) index
(insert-all-wps (insert-into-index index (car wps)) (cdr wps))))
To add all the words in a document to the index we use text-to-word-positions
to obtain the list of word-position pairs
...
2
...
Then, we use insert-allwps to add all the word-position pairs in this document to the index
...

(deﬁne (index-document url text)
(insert-all-wps
null
(list-map (lambda (wp) (cons (car wp) (cons url (cdr wp))))
(text-to-word-positions text))))
We leave analyzing the running time of index-document as an exercise
...
Once the index is built, we can use it to answer any number of search
queries without needing to reconstruct the index
...
Our goal is to produce an index for a set of documents, not
just a single document
...
We use this repeatedly to create an index of any number of documents
...
If a word occurs in both documents, the
word should appear in the merged index with a position list that includes all the
positions in both indexes
...

(deﬁne (merge-indexes d1 d2)
(deﬁne (merge-elements p1 p2)
(if (null? p1) p2
(if (null? p2) p1
(if (string=? (car (car p1)) (car (car p2)))
(cons (cons (car (car p1))
(list-append (cdr (car p1)) (cdr (car p2))))
(merge-elements (cdr p1) (cdr p2)))
(if (string(cons (car p1) (merge-elements (cdr p1) p2))
(cons (car p2) (merge-elements p1 (cdr p2))))))))
(list-to-sorted-tree
(lambda (e1 e2) (string(merge-elements (tree-extract-elements d1)
(tree-extract-elements d2)))))))
To merge the indexes, we ﬁrst use tree-extract-elements to convert the tree representations to lists
...

Since the lists are sorted by the target word, we can perform the merge efﬁciently
...
If they are different, we
use string ...

Obtaining documents
...
Sorting and Searching

173

documents to index
...
To read documents from the web, we use library procedures provided by DrRacket
...
ss" "net"))
...
A Uniform Resource Locator (URL) is a standard way to identify a document on the network
...

The full grammar for URLs is quite complex (see Exercise 2
...
Domain
Path
/ Name OptPath

An example of a URL is http://www
...
gov/index
...
The http indicates
the HyperText Transfer Protocol, which prescribes how the web client (browser)
and server communicate with each other
...
whitehouse
...
html (which is the default page for most web
servers)
...
The read-char
procedure takes as input a port, and outputs the ﬁrst character in that port
...
We can use
read-char repeatedly to read each character in the web page of the port
...
The procedure eof-object? evaluates to
true when applied to this marker, and false for all other inputs
...

(deﬁne (web-get url)
(list->string (read-all-chars (get-pure-port (string->url url)))))
2 We use Name to represent sequences of characters in the domain and path names, although the
actual rules for valid names for each of these are different
...
2
...
It recurses through the list of pages, indexing each
document, and merging that index with the result of indexing the rest of the
pages in the list
...
For example, here
we use Jeremy Hylton’s collection of the complete works of William Shakespeare
(http://shakespeare
...
edu) to deﬁne shakespeare-index as an index of the words
used in all of Shakespeare’s plays
...
mit
...
html"))
;; List of plays following the site’s naming conventions
...
It contains
22949 distinct words and over 1
...
Much of the time
spent building the index is in constructing new lists and trees for every change,
which can be avoided by using the mutable data types we cover in the next chapter
...
Once the
documents have been indexed, we can use the index to quickly perform any
search
...
Using an index, searching for pages that use a given word is easy
and efﬁcient
...
position)
(lambda (el) (stringindex))
As analyzed in the previous section, the expected running time of binary-treesearch is in Θ(log n) where n is the number of nodes in the input tree
...
The number of
nodes in the index is the number of distinct words in the indexed documents
...

See Exercise 8
...

Chapter 8
...
Note that the number and size of
the documents does not matter! This is why a search engine such as Google
can respond to a query quickly even though its index contains many billions of
documents
...
Our analysis of binary-tree-search assumes the
equality and comparison functions are constant time procedures
...
As used here, one of the inputs
is the target word
...
Thus, the overall running time of
search-in-index is in Θ(w log d) where w is the length of word and d is the number of words in the index
...
Thus, the overall running time is in Θ(log d)
...
mit
...
html"
...
mit
...
html"
...
mit
...
html"
...
A useful web search engine needs at least two more capabilities:
a way to automate the process of ﬁnding documents to index, and a way to rank
the documents that contain the target word by the likelihood they are useful
...

Histogram
...
The
index-histogram procedure produces a list of the words in an index sorted by
how frequently they appear:
(deﬁne (index-histogram index)
(list-quicksort
(lambda (e1 e2) (> (cdr e1) (cdr e2)))
(list-map (lambda (el) (cons (car el) (length (cdr el))))
(tree-extract-elements index))))
The expression,
(list-ﬁlter (lambda (entry) (> string-length (car entry) 5))
(index-histogram shakespeare-index))
evaluates to a list of Shakespeare’s favorite 6-letter and longer words along with
the number of times they appear in the corpus (the ﬁrst two entries are from
their use in the page formatting):

The mathematics
and the
metaphysics, Fall to
them as you ﬁnd
your stomach serves
you; No proﬁt grows
where is no pleasure
ta’en: In brief, sir,
study what you
most affect
...
2
...
63345) ("speech"
...
1557) ("father"
...
1061)
("master"
...
826) ("mistress"
...
("brother"
...
("daughter"
...
("mother"
...
("mustardseed"
...
("excrement"
...
("zwaggered"
...
14
...

Analyze the running time of your procedure
...
15
...

Exercise 8
...
[ ] Analyze the running time required to build the index
...
Analyze the running time of the text-to-word-positions procedure
...
Be careful to clearly state all assumptions on
which your analysis relies
...
Analyze the running time of the insert-into-index procedure
...
Analyze the running time of the index-document procedure
...
Analyze the running time of the merge-indexes procedure
...
Analyze the overall running time of the index-pages procedure
...

Exercise 8
...
[ ] The search-in-index procedure does not actually have the expected running time in Θ(log w) (where w is the number of distinct words in
the index) for the Shakespeare index because of the way it is built using mergeindexes
...
Explain why the input to list-to-sorted-tree in the mergeindexes procedure leads to a binary tree where the running time for searching is
in Θ(w)
...

Exercise 8
...
[ ] The site http://www
...
com provides an interesting
way to view political speeches by looking at how the frequency of the use of
different words changes over time
...
You could use your program to analyze
how Shakespeare’s word use is different in tragedies and comedies or to compare
Shakespeare’s vocabulary to Jefferson’s
...
Sorting and Searching
Exploration 8
...

For our Shakespeare index example, we manually found a list of interesting documents to index
...
For this, we need a web crawler
...
Typical web crawlers start with a set
of seed URLs, and then ﬁnd more documents to index by following the links on
those pages
...
To develop a web crawler,
we need a way to extract the links on a given web page, and to manage the set of
pages to index
...
[ ] Deﬁne a procedure extract-links that takes as input a string representing the text of a web page and outputs a list of all the pages linked to from
this page
...
An anchor tag has the form:4
...
)
b
...
As output, it produces a pair consisting of an index (that is
the result of adding all words from the page at the input URL to the input
index) and a list of URLs representing all pages linked to by the crawled page
...
[ ] Deﬁne a procedure crawl-web that takes as input a list of seed URLs and
a Number indicating the maximum depth of the crawl
...

For a web search engine to be useful, we don’t want to just get a list of all the
pages that contain some target word, we want the list to be sorted according to
which of those pages are most likely to be interesting
...
Many
factors are used to rank pages including an analysis of the text on the page itself,
whether the target word is part of a title, and how recently the page was updated
...
If many pages link to a given page, it is more likely that the given page is
useful
...

The ranking system used by Google is based on this formula:
R(u) =

∑

v∈ Lu

R(v)
L(v)

4 Not all links match this structure exactly, so this may miss some of the links on a page
...
In
addition, the rank
of a document is
calculated from a
constant
representing the
probability that a
browser through the
database will
randomly jump to
the document
...

United States Patent
#6,285,999, September
2001
...
3
...
The value R(u) gives a measure of the ranking of the page identiﬁed
by u, where higher values indicate more valuable pages
...

relaxation

One way to approximate equations like this one is to use relaxation
...
To estimate the page ranks for a set of web pages, we
initially assume every page has rank 1 and evaluate R(u) for all the pages (using
the value of 1 as the rank for every other page)
...
A relaxation keeps repeating until the values change
by less than some threshold amount, but there is no guarantee how quickly this
will happen
...

d
...

The linking structure can be represented as a List where each element of the
List is a pair of a URL and a list of all the URLs that include a link to that URL
...

e
...

f
...

g
...
001% of all web searches to your site
...
3

Summary

The focus of Part II has been on predicting properties of procedures, in particular how their running time scales with the size of their input
...
4: recursive
deﬁnitions, universality, and abstraction
...

Actual machines use the digital abstraction to allow a continuous range of voltages to represent just two values
...

In Part III, we will see many more recursive deﬁnitions, and extend the notion of
recursive deﬁnitions to the language interpreter itself
...

9

Mutation
Faced with the choice between changing one’s mind and proving that there is no
need to do so, almost everyone gets busy on the proof
...
This enabled very simple evaluation
rules for names, as well as allowing the substitution model of evaluation
...

This chapter introduces special forms known as mutators that allow programs
to change the value in a given place
...
It does, however, make it possible to express certain computations more efﬁciently and clearly than could be done without it
...

9
...
The exclamation point at the end of set! follows a naming
convention to indicate that an operation may mutate state
...
It assigns a value to a variable
...
To evaluate an assignment, evaluate the
expression, and replace the value associated with the name with the value
of the expression
...

Assignments do not produce output values, but are used for their side effects
...

Here is an example use of set!:

assignment

180

9
...
Assignment

> (deﬁne num 200)
> num
200

> (set! num 150)
> (set! num 1120)
> num
1120

Begin expression
...
A begin expression is a special form that evaluates a sequence of expressions in order and evaluates to the value of the last
expression
...
To evaluate a begin expression,
(begin Expression1 Expression2
...
The value of the
begin expression is the value of the last subexpression, Expressionk
...

The begin expression must be a special form
...

The deﬁnition syntax for procedures includes a hidden begin expression
...
1
...

(let ((Name1 Expression1 ) (Name2 Expression2 )
· · · (Namek Expressionk ))
MoreExpressions Expression)
is equivalent to the application expression:
((lambda (Name1 Name2
...
Expressionk )

Chapter 9
...
2

181

Impact of Mutation

Introducing assignment presents many complications for our programming model
...
6
...
All the procedures we can deﬁne without
using mutation behave almost like mathematical functions—every time they are
applied to the same inputs they produce the same output
...

Example 9
...
Because
of the hidden begin expression in the deﬁnition, the (set! counter (+ counter 1))
is always evaluated ﬁrst, followed by counter which is the last expression in the
begin expression so its value is the value of the procedure
...

The substitution model of evaluation doesn’t make any sense for this evaluation:
the value of counter changes during the course of the evaluation
...

Mutation also means some expressions have undetermined values
...
The evaluation rule
for the application expression does not specify the order in which the operand
subexpressions are evaluated
...
If the second subexpression, counter,
is evaluated before the third subexpression, (update-counter!), the value of the
expression is 1 the ﬁrst time it is evaluated, and 3 the second time it is evaluated
...
With this ordering,
the value of the expression is 2 the ﬁrst time it is evaluated, and 4 the second
time it is evaluated
...
5
...

182

9
...
1

9
...
Impact of Mutation

Names, Places, Frames, and Environments

Because assignments can change the value associated with a name, the order in
which expressions are evaluated now matters
...

place

frame
environment

Since the value associated with a name can now change, instead of associating a
value directly with a name we use a name as a way to identify a place
...
With mutation, we can
change the value in a place; this changes the value associated with the place’s
name
...

An environment is a pair consisting of a frame and a pointer to a parent environment
...
The global environment exists when the interpreter starts,
and is maintained for the lifetime of the interpreter
...
Names deﬁned in the interactions
buffer are placed in the global environment
...
Figure 9
...

Every environment has a parent environment except for the global environment
...
Hence,
if we start with any environment, and continue to follow its parent pointers we
always eventually reach the global environment
...
An environment captures the current state of the interpreter
...

Global Environment

counter
update-counter!
x
Environment A

x

7

0

·
3

environment: parameters: ()
body: (begin (set! counter (+ counter 1))

counter)
Environment B

y

88
Figure 9
...
Sample environments
...
Each name has an associated place
that contains the value associated with that name
...
The value associated with set-counter! is the procedure we deﬁned in Example 9
...

A procedure is characterized by its parameters, body code, and a pointer to the environment in
which it will be evaluated
...
Mutation

9
...
2

183

Evaluation Rules with State

Introducing mutation requires us to revise the evaluation rule for names, the
deﬁnition rule, and the application rule for constructed procedures
...

Names
...
To evaluate a name expression,
search the evaluation environment’s frame for a place with a name that
matches the name in the expression
...
Otherwise, the value of
the name expression is the result of evaluating the name expression in
the parent environment
...

For example, to evaluate the value of the name expression x in Environment B
in Figure 9
...

Since there is no place named x in that frame, we follow the parent pointer to
Environment A, and evaluate the value of the name expression in Environment
A
...

The value of the same expression in the Global Environment is 3 since that is the
value in the place named x in the Global Environment’s frame
...
Since no y place exists, evaluation continues by
evaluating the expression in the parent environment, which is the Global Environment
...

Deﬁnition
...
A deﬁnition creates a new place with the deﬁnition’s name in the frame associated with the evaluation environment
...
If there is already a place with the name in the current frame, the deﬁnition replaces
the old place with a new place and value
...
The meaning is different, though, since an assignment
ﬁnds the place associated with the name and puts a new value in that place
...
Hence, (deﬁne Name Expression) has a different meaning from (set! Name Expression) when there is no place named Name in the
current execution environment
...

Application
...
Instead of using substitution, the

184

9
...
Impact of Mutation

new application rule creates a new environment with a frame containing places
named for the parameters
...
To apply a constructed procedure:
1
...

2
...
Evaluate each
operand expression in the environment or the application and initialize the value in each place to the value of the corresponding
operand expression
...
Evaluate the body of the procedure in the newly created environment
...

Consider evaluating the application expression (bigger 3 4) where bigger is the
procedure from Example 3
...

Evaluating an application of bigger involves following the Stateful Application
Rule 2
...
Since bigger was deﬁned in the global
environment, its environment pointer points to the global environment
...

Next, create places in the new environment’s frame named for the procedure
parameters, a and b
...
The value in the place associated with b is 4
...
Figure 9
...
The values of a and b are found in the application environment
...
2
...

The new application rule becomes more interesting when we consider procedures that create new procedures
...
Mutation

The environment that results from evaluating (deﬁne inc (make-adder 1)) is
shown in Figure 9
...
The name inc has a value that is the procedure resulting
from the application of (make-adder 1)
...

Global Environment

make-adder
inc

Application Env 1

v

1

·
·
env: params: (v)
body: (lambda (n) (+ n v))

env: params: (n)
body: (+ n v)
Figure 9
...
Environment after evaluating (deﬁne inc (make-adder 1))
...
Since the body is a lambda expression, it evaluates to a procedure
...

Next, consider evaluating (inc 149)
...
4 illustrates the environment for
evaluating the body of the inc procedure
...
We
evaluate the body of the procedure, (+ n v), in that environment
...
The value of v is not found there, so
evaluation continues by looking in the parent environment
...

Exercise 9
...
Devise a Scheme expression that could have four possible values,
depending on the order in which application subexpressions are evaluated
...
2
...
3
...
4
...

Exercise 9
...
Draw the environment that results after evaluating the following
expressions, and explain what the value of the ﬁnal expression is
...
)

> (deﬁne (make-applier proc) (lambda (x) (proc x))
> (deﬁne p (make-applier (lambda (x) (let ((x 2)) x))))
> (p 4)

9
...
This means that once
a Pair is created, the values in its cells cannot be changed
...
A MutablePair is constructed using
mcons, which is similar to cons but produces a MutablePair
...
A MutablePair is a distinct datatype
from a Pair; it is an error to apply car to a MutablePair, or to apply mcar to an
immutable Pair
...

set-mcdr!: MutablePair × Value → Void
Replaces the value in the second cell of the MutablePair with the value of
the second input
...
In most Scheme implementations
and the standard deﬁnition of Scheme, a standard cons pair is mutable
...
So, the designers of DrRacket decided for Version
4
...

Chapter 9
...

Here are some interactions using a MutablePair:

> (deﬁne pair (mcons 1 2))
> (set-mcar! pair 3)
> pair
(3
...
4)
The set-mcdr! procedure allows us to create a pair where the second cell of the
pair is itself: (set-mcdr! pair pair)
...
5
...
5
...

the output
...

We can also create objects that combine mutable and immutable Pairs
...
Since the outer Pair is immutable,
we cannot change the objects in its cells
...
We can, however, change the values in the cells of the mutable pair in its ﬁrst cell
...

Mutable Lists
...
A MutableList is either null or a MutablePair whose second cell contains a MutableList
...
To use it, evaluate the following
expression: (require racket/mpair)
...
This library deﬁnes the mlist procedure
that is similar to the list procedure, but produces a MutableList instead of an
immutable List
...
6
...
6
...

set-mcdr! procedures to change the values in the cells
...

188

9
...
Imperative Programming

Many of the list procedures from Chapter 5 can be directly translated to work on
mutable lists
...
4, though, we need to be careful when using mcdr to
recurse through a MutableList since structures created with MutablePairs can
include circular pointers
...
4
...
5?
Exercise 9
...
[ ] Deﬁne a mpair-circular? procedure that takes a MutablePair as
its input and outputs true when the input contains a cycle and false otherwise
...
4
imperative
programming

Imperative Programming

Mutation enables a style of programming known as imperative programming
...

The main operation in function programming is application
...
With imperative programming, the
primary operation is assignment (performed by set!, set-mcar!, and set-mcdr! in
Scheme; but typically by an assignment operator, often := or =, in languages designed for imperative programming such as Pascal, Algol60, Java, and Python)
...
The following
subsection introduces some imperative control structures
...
4
...
4
...
When our goal is only
to change some elements in an existing list, this wastes memory constructing a
new list and may require more running time than a procedure that modiﬁes the
input list instead
...
4
...

Example 9
...
4) produces a new list that is the result
of applying the same procedure to every element in the input list
...
Mutation

189

Whereas the functional list-map procedure uses cons to build up the output list,
the imperative mlist-map! procedure uses set-car! to mutate the input list’s elements:
(deﬁne (mlist-map! f p)
(if (null? p) (void)
(begin (set-mcar! p (f (mcar p)))
(mlist-map! f (mcdr p)))))
The base case uses (void) to evaluate to no value
...

Assuming the procedure passed as f has constant running time, the running
time of the mlist-map! procedure is in Θ(n) where n is the number of elements
in the input list
...
This is asymptotically the same as the list-map procedure, but we would expect the actual running time to be faster since there is no
need to construct a new list
...
The list-map procedure allocates n new cons cells, so it requires memory in Θ(n) where n is the number of
elements in the input list
...

Example 9
...
In Example 5
...
We deﬁne mlist-ﬁlter! using set-mcdr! to skip over elements that
should not be included in the ﬁltered list:
(deﬁne (mlist-ﬁlter! test p)
(if (null? p) null
(begin (set-mcdr! p (mlist-ﬁlter! test (mcdr p)))
(if (test (mcar p)) p (mcdr p)))))
Assuming the test procedure has constant running time, the running time of the
mlist-ﬁlter! procedure is linear in the length of the input list
...

Unlike mlist-map!, mlist-ﬁlter! outputs a value
...
Consider this example:

> (deﬁne a (mlist 1 2 3 1 4))
> (mlist-ﬁlter! (lambda (x) (> x 1)) a)
{2 3 4}

190

9
...
Imperative Programming

>a
{1 2 3 4}

The value of a still includes the initial 1
...

To avoid this, mlist-ﬁlter! should be used with set! to assign the variable to the
resulting mutable list:
(set! a (mlist-ﬁlter! (lambda (x) (> x 1)) a))
Example 9
...
An
imperative version of this procedure instead mutates the ﬁrst list to contain the
elements of both lists
...
3
Like list-append, the running time of the mlist-append! procedure is in Θ(n)
where n is the number of elements in the ﬁrst input list
...
The memory use of mlist-append! is
constant: it does not create any new cons cells to append the lists
...
Adding mutation makes it possible to deﬁne many procedures more
efﬁciently and compactly, but introduces many new potential pitfalls in producing reliable programs
...

aliasing

One challenge introduced by mutation is aliasing
...
This was true before mutation also, but didn’t matter since the value of an object never changed
...

For example,

> (deﬁne m1 (mlist 1 2 3))
> (deﬁne m2 (mlist 4 5 6))
> (mlist-append! m1 m2)
> (set! m1 (mlist-ﬁlter! (lambda (el) (= (modulo el 2) 0)) m1))
3 The

mappend! library procedure in DrRacket takes a different approach: when the ﬁrst input
is null it produces the value of the second list as output in this case
...

Chapter 9
...
But, the value of m2 has still changed! It changed because after
evaluating (mlist-append! m1 m2) the m1 object shares cells with m2
...

The built-in procedure eq? takes as input any two objects and outputs a Boolean
...
For example, (eq?
3 3) evaluates to true but (eq? (mcons 1 2) (mcons 1 2)) evaluates to false
...

For the earlier mlist-append! example, (eq? m1 m2) evaluates to false since m1
and m2 do not refer to the same object
...
Evaluating
(set-mcar! m2 3) changes the value of both m1 and m2 since the modiﬁed cell is
common to both structures
...
6
...

Exercise 9
...
[ ] Deﬁne a procedure mlist-truncate! that takes as input a MutableList and modiﬁes the list by removing the last element in the list
...

Exercise 9
...
[ ] Deﬁne a procedure mlist-make-circular! that takes as input a
MutableList and modiﬁes the list to be a circular list containing all the elements
in the original list
...
5
...
9
...
Is it possible to implement a mlist-reverse! procedure that is asymptotically faster than the list-reverse procedure from Example 5
...
10
...

9
...
2

Imperative Control Structures

The imperative style of programming makes progress by using assignments to
manipulate state
...
With functional programming, this is done using recursive deﬁnitions
...
With imperative programming, we
can make progress by changing state repeatedly without needing to pass in different operands
...

Your Federal Income
Tax, IRS Publication
17, 2009
...
4
...
A while
loop has a test condition and a body
...
If it
evaluates to true, the while loop body is executed
...
The while loop continues to execute until the test condition
evaluates to false
...
Even though
the test and body procedures take no parameters, they need to be procedures instead of expressions, since every iteration of the loop should re-evaluate the test
and body expressions of the passed procedures
...
In each iteration, the
body procedure is applied, updating the values of a and b to the next Fibonacci
numbers
...
The reason for this
is the previous assignment expression has already changed the value of b, by
adding a to it
...
The fact that the value of a variable
can change depending on when it is used often makes imperative programming
trickier than functional programming
...
For example, here is a version of the procedure in the Python programming language:
def ﬁbonacci (n):
a=1
b=1

Chapter 9
...

The most interesting statement is the double assignment: a, b = b, a + b
...
Without the double assignment operator, it would
be necessary to store the old value of b in a new variable so it can be assigned to
a after updating b to the new value
...
11
...

Exercise 9
...
Another common imperative programming structure is a repeatuntil loop
...
The procedure should evaluate the body procedure
repeatedly, until the test procedure evaluates to a true value
...
13
...
2
...
Start by deﬁning a mutable binary tree abstraction,
and then use this and the MutableList data type to implement an imperativestyle insert-into-index! procedure that mutates the input index by adding a new
word-position pair to it
...
Analyze the impact of your
changes on the running time of indexing a collection of documents
...
5

Summary

Adding the ability to change the value associated with a name complicates our
evaluation rules, but enables simpler and more efﬁcient solutions to many problems
...

Once we add assignment to our language, the order in which things happen
affects the value of some expressions
...

repeat-until

194

9
...
Summary

The problem with mutation is that it makes it much tougher to reason about
the meaning of an expression
...
This helps
manage some of the complexity resulting from mutation by limiting the places
where data may be accessed and modiﬁed
...
You can talk about it on a
blackboard until you are blue in the face, and people would say, “Oh, yes, but why do you need
that?”
...
To this day I
can still remember people only realizing when they saw a real demo, say, “Hey, it talks back
...
”
Fernando Corbat´ (who worked on Whirlwind in the 1950s)
o
Charles Babbage Institute interview, 1989

So far, we have seen two main approaches for solving problems:
Functional programming
Break a problem into a group of simpler procedures that can be composed
to solve the problem (introduced in Chapter 4)
...

All computational problems involve both data and procedures
...
Any data-focused design must involve some procedures to perform
computations using that data
...
By packaging procedures and data together, object-oriented programming overcomes a weakness of both previous approaches: the data
and the procedures that manipulate it are separate
...
1 We build an object system ourselves, taking advantage
of the stateful evaluation rules
...
In Chapter 11, we see how Python provides language support for objectoriented programming
...
Section 10
...
The MzScheme language does provide additional constructs for supporting objects, but we do not cover them in this book
...
1
...
Section 10
...

10
...
1:
(deﬁne (update-counter!) (set! counter (+ counter 1)) counter)
Every time an application of update-counter! is evaluated, we expect to obtain a
value one larger than the previous application
...
Hence, we can only
have one counter: there is only one counter place in the global environment
...
For each new
counter, we would need a new variable and a new procedure
...
1
...
Then we could create as many counters as we want, each with
its own counter variable to manipulate
...
2
...

The make-counter procedure creates a counter object that packages the count
variable with the procedure that increases its value:
(deﬁne (make-counter)
((lambda (count)
(lambda () (set! count (+ 1 count)) count))
0))

encapsulation

Each application of make-counter produces a new object that is a procedure
with its own associated count variable
...

The count place is encapsulated within the counter object
...

An equivalent make-counter deﬁnition uses a let expression to make the initialization of the count variable clearer:
(deﬁne (make-counter)
(let ((count 0))
(lambda () (set! count (+ 1 count)) count)))

197

Chapter 10
...
1 depicts the environment after creating two counter objects and applying one of them
...
1
...
1
...
To produce more useful objects, we need a way to combine state
with multiple behaviors
...
We do this by adding a message parameter to the procedure
produced by make-counter:
(deﬁne (make-counter)
(let ((count 0))
(lambda (message)
(if (eq? message ’get-count) count
(if (eq? message ’reset!) (set! count 0)
(if (eq? message ’next!) (set! count (+ 1 count))
(error "Unrecognized message")))))))
Like the earlier make-counter, this procedure produces a procedure with an environment containing a frame with a place named count
...

The message parameter is a Symbol
...
Two Symbols are equal, as determined
by the eq? procedure, if their sequences of characters are identical
...
This makes

198

10
...
Packaging Procedures and State

symbols a more efﬁcient way of selecting object behaviors than Strings, and a
more memorable way to select behaviors than using Numbers
...
For objects with many behaviors, the nested if expressions can get quite cumbersome
...
The conditional expression (cond)
has no value
...
The value of such a conditional expression is the value of the
if expression:
(if Expression p1 Expressionc1 (cond Rest))
This evaluation rule is recursive since the transformed expression still includes a
conditional expression, but uses the empty conditional with no value as its base
case
...
When used as the predicate in the last conditional clause
it means the same thing as true
...

Sending messages
...

Chapter 10
...
So, (ask counter ’next!) is equivalent to (counter ’next!), but looks more like passing a message to an object than
applying a procedure
...

Messages with parameters
...
For example, we may want to support a message adjust!
that increases the counter value by an input value
...
The procedures
for reset!, next!, and get-count take no parameters; the procedure for adjust!
takes one parameter
...

So far, all the procedures we have deﬁned take a ﬁxed number of operands
...
NameRest ) Expression)
The name following the dot is bound to all the remaining operands combined
into a list
...
If there are only n operand expressions, the value bound to NameRest is null
...

To apply the procedure we use the built-in apply procedure which takes two
inputs, a Procedure and a List
...

(deﬁne (ask object message
...
g
...

10
...
3

Object Terminology

An object is an entity that packages state and procedures
...
The

instance variables

200

10
...
Inheritance

instance variables are stored in places that are part of the application environment for the object
...
An object produced by (make-counter) deﬁnes a single instance variable, count
...
Methods may provide information about the state of an object (we call these observers) or modify
the state of an object (we call these mutators)
...

invoke

An object is manipulated using the object’s methods
...
This is analogous to applying a procedure
...
Classes are similar to data types
...
We also need procedures for creating new objects, such as
the make-counter procedure above
...
By convention,
we call the constructor for a class make- where is the name of
the class
...

constructors

Exercise 10
...
Modify the make-counter deﬁnition to add a previous! method
that decrements the counter value by one
...
2
...

set-increment!: Number → Void
Sets the increment amount for this counter to the input value
...

get-count: Void → Number
Outputs the current value of the counter
...
2

Inheritance

Objects are particularly well-suited to programs that involve modeling real or
imaginary worlds such as graphical user interfaces (modeling windows, ﬁles,

201

Chapter 10
...

Objects in the real world (or most simulated worlds) are complex
...
It might include
many different kinds of objects including places (which are stationary and may
contain other objects), things, and people
...
All objects in our game have a name
and a location; some objects also have methods for talking and moving
...
It would also make it hard to add a new behavior
to all of the objects in the game without modifying many different procedures
...
For example, a student is a kind of person
...
To implement a student class,
we want to reuse methods from the person class without needing to duplicate
them in the student implementation
...
The
reused class is known as the superclass, so person is the superclass of student
...
2
Figure 10
...

The arrows point from subclasses to their superclass
...
For example,
person is a subclass of movable-object, but a superclass of student and professor
...
2
...

Our goal is to be able to reuse superclass methods in subclasses
...
This can
continue up the superclass chain
...

2 Some object systems (such as the one provided by the C++ programming language) allow a class

to have more than one superclass
...
If a class has two superclasses and
both deﬁne methods with the same name, it may be ambiguous which of the methods is used when
it is invoked on an object of the subclass
...

subclass
superclass

202

10
...
Inheritance

Hence, if the sim-object class deﬁnes a get-name method, when the get-name
method is invoked on a student object, the implementation of get-name in the
sim-object class will be used (as long as neither person nor movable-object deﬁnes its own get-name method)
...
Inheritance is a powerful way to obtain
many different objects with a small amount of code
...
2
...

The make-sub-object procedure does this
...
If the method is deﬁned by
the subclass, the result will be the subclass method
...

(deﬁne (make-sub-object super subproc)
(lambda (message)
(let ((method (subproc message)))
(if method method (super message)))))
When an object produced by (make-sub-object obj proc) is applied to a message, it ﬁrst applies the subclass dispatch procedure to the message to ﬁnd an
appropriate method if one is deﬁned
...

References to self
...
Otherwise, the object will
lose its special behaviors as it is moves up the superclasses
...
To support this,
we modify the ask procedure to pass in the object parameter to the method:
(deﬁne (ask object message
...
So, the counter constructor is deﬁned as:
(deﬁne (make-counter)
(let ((count 0))
(lambda (message)
(cond
((eq? message ’get-count) (lambda (self ) count))
((eq? message ’reset!) (lambda (self ) (set! count 0)))
((eq? message ’next!) (lambda (self ) (set! count (+ 1 count))))
(else (error "Unrecognized message"))))))
Subclassing counter
...
Objects

203

we need to also provide a set-count! method for setting the value of the counter
to an arbitrary value
...
With inheritance, we can deﬁne
make-adjustable-counter as a subclass of make-counter without repeating any
code:
(deﬁne (make-adjustable-counter)
(make-sub-object
(make-counter)
(lambda (message)
(cond
((eq? message ’adjust!)
(lambda (self val)
(ask self ’set-count! (+ (ask self ’get-count) val))))
(else false)))))
We use make-sub-object to create an object that inherits the behaviors from one
class, and extends those behaviors by deﬁning new methods in the subclass implementation
...
It cannot use (set! count (+ count val)) directly, though, since the
count variable is deﬁned in the application environment of its superclass object
and is not visible within adjustable-counter
...

Suppose we create an adjustable-counter object:
(deﬁne acounter (make-adjustable-counter))
Consider what happens when (ask acounter ’adjust! 3) is evaluated
...
2
...
The body of ask evaluates (object message) to ﬁnd the method associated with the input message, in this case ’adjust!
...
)
The result of applying subproc to message is the adjust! procedure deﬁned by
make-adjustable-counter:
(lambda (self val)
(ask self ’set-count! (+ (ask self ’get-count) val)))
Since this is not false, the predicate of the if expression is non-false and the value
of the consequent expression, method, is the result of the procedure application
...
The object is the acounter object, and the args is the list of the extra
parameters, in this case (3)
...
The body of
the adjust! method uses ask to invoke the set-count! method on the self object
...
In this case, the subclass implementation provides no set-count!
method so the result of (subproc message) in the application of the subclass object is false
...
This
evaluates to the method associated with the set-count! message in the superclass
...

We can deﬁne new classes by deﬁning subclasses of previously deﬁned classes
...
If the message to a adjustable-counter object is not previous!,
the method from its superclass, adjustable-counter is used
...
Since the
subclass implementation does not provide an adjust! method, this results in the
superclass method being applied
...
2
...
When a subclass replaces a method deﬁned
by its superclass, then the subclass method overrides the superclass method
...

For example, we can deﬁne a subclass of reversible-counter that is not allowed
to have negative counter values
...
Objects

205

instead of setting the counter to the new value, it produces an error message and
maintains the counter at zero
...

(deﬁne (make-positive-counter)
(make-subobject
(make-reversible-counter)
(lambda (message)
(cond
((eq? message ’set-count!)
(lambda (self val) (if (< val 0) (error "Negative count")

...
? When the value to set the count to is not
negative, what should happen is the count is set as it would be by the superclass set-count! method
...
There is also no way to invoke the superclass’ set-count! method
since it has been overridden by positive-counter
...
We can do this by adding a get-super method to the object produced by
make-sub-object:
(deﬁne (make-sub-object super subproc)
(lambda (message)
(if (eq? message ’get-super)
(lambda (self ) super)
(let ((method (subproc message)))
(if method method (super message))))))
Thus, when an object produced by make-sub-object is passed the get-super message it returns a method that produces the super object
...

With the get-super method we can deﬁne the set-count! method for positivecounter, replacing the
...
3 shows the subclasses that inherit from counter and the methods they
deﬁne or override
...
2
...
3
...

For the ﬁrst ask application, the next! method is invoked on a positive-counter
object
...
The reversible-counter implementation also does not deﬁne a next! method, so the message is passed up to its
superclass, adjustable-counter
...
The counter class deﬁnes
a next! method, so that method is used
...
Since the positive-counter
class does not deﬁne a previous! method, the message is sent to the superclass
...
Its implementation involves an invocation of the adjust! method: (ask self ’adjust! −1)
...
Hence, the adjust! method is found from the positive-counter class implementation
...
Hence, the second invocation of previous! produces
the “Negative count” error and does not adjust the count to −1
...
The method used for an invocation
depends on the self object
...
It depends on the actual self object: if it is a positive-counter object, the adjust! method deﬁned by positivecounter is used; if it is a reversible-counter object, the adjust! method deﬁned by
adjustable-counter class (the superclass of reversible-counter) is used
...
It enables us to
use the same code to produce many different behaviors by overriding methods
in subclasses
...
For example, we cannot make any claims about what the
previous! method in reversible-counter actually does without knowing what the
adjust! method does in all subclasses of reversible-counter
...
Objects

207

The value of encapsulation and inheritance increases as programs get more complex
...

Exercise 10
...
Deﬁne a countdown class that simulates a rocket launch countdown: it starts at some initial value, and counts down to zero, at which point the
rocket is launched
...
4
...
2 as a subclass of counter
...
5
...
For example, (make-parameterizable-counter 0
...
1 after one invocation of the next!
method
...
3

Object-Oriented Programming

Object-oriented programming is a style of programming where programs are
broken down into objects that can be combined to solve a problem or model a
simulated world
...

During World War II, the US Navy began to consider the possibility of building a
airplane simulator for training pilots and aiding aircraft designers
...
The Navy wanted a simulator that could be used for multiple
airplanes and could accurately model the aerodynamics of different airplanes
...
The initial plans
called for an analog computer which would need to be manually reconﬁgured
to change the aerodynamics model to a different airplane
...

Before Whirlwind, all digital computers operated as batch processors where a
programmer creates a program (typically described using a stack of punch cards)
and submits it to the computer
...
A
ﬂight simulator, though, requires direct interaction between a human user and
the computer
...
It
was the ﬁrst interactive programmable digital computer
...
3
...
Jay Forrester invented a much faster memory based
known as magnetic-core memory
...

The interactiveness of the Whirlwind computer opened up many new possibilities for computing
...
The successor to this
machine became the TX-2, and Ken Olsen went on to found Digital Equipment
Corporation (DEC) which pioneered the widespread use of moderately priced
computers in science and industry
...

Ivan Sutherland, then a graduate student at MIT, had an opportunity to use the
TX-2 machine
...
Sketchpad allowed users to draw
and manipulate objects on the screen using a light pen
...
These general functions give
the Sketchpad system the ability to operate on a wide range of problems
...
Each of the general functions implemented in the Sketchpad
system abstracts, in some sense, some common property of pictures independent of the speciﬁc subject matter of the pictures themselves
...
In what has become known as “the mother of all demos”, Engelbart and
his colleagues demonstrated a networked, graphical, interactive computing system to the general public for the ﬁrst time in 1968
...

Sketchpad also inﬂuenced Alan Kay in developing object-oriented programming
...
Simula was designed as a language for implementing simulations
...

In 1966, Alan Kay entered graduate school at the University of Utah, where Ivan
Sutherland was then a professor
...
On it was a pile of tapes and listings,
and a note: “This is the Algol for the 1108
...
Please make it
work
...
The documentation was incomprehensible
...
Objects

209

which in fact it was
...
Finally, another graduate student and I unrolled the program listing 80 feet down the hall and
crawled over it yelling discoveries to each other
...
A few days later, that provided the clue
...
There were descriptions that acted like masters and they could create instances, each of
which was an independent entity
...
For the ﬁrst time
I thought of the whole as the entire computer and wondered why anyone
would want to divide it up into weaker things called data structures and
procedures
...
Why not thousands of them, each simulating a useful structure?
Alan Kay went on to design the language Smalltalk, which became the ﬁrst widely
used object-oriented language
...

In Smalltalk, everything is an object, and all computation is done by sending
messages to objects
...
Here is Smalltalk code for implementing a
counter object:
class name counter
instance variable names count
new count <− 0
next count <− count + 1
current ˆ count
The new method is a constructor analogous to make-counter
...

Nearly all widely-used languages today provide built-in support for some form
of object-oriented programming
...
count = 0
def rest(self): self
...
count = self
...
count
The constructor is named init
...

10
...
By packaging state and procedures together, we can encapsulate state in
ways that enable more elegant and robust programs
...
The best
way to predict the
future is to invent it
...
4
...
Programming using objects and
inheritance enables a style of problem solving known as object-oriented programming in which we solve problems by modeling problem instances using
objects
...

11

Interpreters

“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to
mean - nothing more nor less
...
”
Lewis Carroll, Through the Looking Glass
The tools we use have a profound (and devious!) inﬂuence on our
thinking habits, and, therefore, on our thinking abilities
...
Different languages encourage different ways of thinking and lead to different thoughts
...
We can solve a problem by
designing a language in which it is easy to express a solution and implementing
an interpreter for that language
...
As input, it takes a speciﬁcation of a program in
some language
...
Implementing an interpreter further blurs the line between data and programs, that
we ﬁrst crossed in Chapter 3 by passing procedures as parameters and returning new procedures as results
...
The interpreter determines the meaning of the program
...
Implement a parser that takes as input a string representation of a program in the target language and produces a structural parse of the input
program
...
Section 11
...

2
...
The evaluator should implement
the target language’s evaluation rules
...
3 describes our evaluator
...
1 The Charme
language is very simple, yet is powerful enough to express all computations (that
is, it is a universal programming language)
...
Because the computer on which “Schemer” was implemented only allowed six-letter ﬁle
names, its name was shortened to “Scheme”
...
Depending on the programmer’s state of mind, the language
name can be pronounced either “charm” or “char me”
...
1
...
The full grammar and evaluation rules
for Charme are given in Section 11
...
The evaluator implements those evaluation rules
...
4 illustrates how changing the evaluation rules of our interpreter opens
up new ways of programming
...
1

Python

We could implement a Charme interpreter using Scheme or any other universal programming language, but implement it using the programming language
Python
...
2 Python is freely available from http://www
...
org
...
The ﬁrst reason is pedagogical: it is instructive to learn new languages
...
This is true for natural languages,
but also true for programming languages
...
All of the major
concepts we have covered so far apply to Python nearly identically to how they
apply to Scheme, but seeing them in the context of a different language should
make it clearer what the fundamental concepts are and what are artifacts of a
particular programming language
...
These include built-in support
for objects and imperative control structures
...

The grammar for Python is quite different from the Scheme grammar, so Python
programs look very different from Scheme programs
...
This chapter does
not describe the entire Python language, but introduces the grammar rules and
evaluation rules for the most important Python constructs as we use them to
implement the Charme interpreter
...
Both languages can
express all mechanical computations
...
Conversely, every Python program has an equivalent Scheme program
...
Since we can implement
an interpreter for a Scheme-like language in Python, we know we can express
every computation that can be expressed by a program in that language with an
equivalent Python program: the Charme interpreter with the Charme program
as its input
...

We introduce Python using one of the procedures in our inter-

2 The name Python

alludes to Monty Python’s Flying Circus
...
Interpreters

213

preter implementation
...
The ﬁrst part is the tokenizer
...

A token is an indivisible syntactic unit
...
Tokens are separated by whitespace (spaces,
tabs, and newlines)
...

The tokenize procedure below takes as input a string s in the Charme target language, and produces as output a list of the tokens in s
...

def tokenize(s): #
# starts a comment until the end of the line
current = ' ' #
initialize current to the empty string (two single quotes)
tokens = [] #
initialize tokens to the empty list
for c in s: #
for each character, c, in the string s
if c
...
append(current) #
add it to the list
current = '' #
reset current token to empty string
elif c in '()': #
otherwise, if c is a parenthesis
if len(current) > 0: #
end the current token
tokens
...
append(c) #
add the parenthesis to the token list
else: #
otherwise (it is an alphanumeric)
current = current + c #
add the character to the current token
# end of the for loop
reached the end of s
if len(current) > 0: #
if there is a current token
tokens
...
1
...
Unlike expressions, a statement
has no value
...
It is more imperative than that used with Scheme: instead
of composing expressions in ways that pass the result of one expression as an
operand to the next expression, Python procedures consist mostly of statements,
each of which alters the state in some way towards reaching the goal state
...

Deﬁning a procedure in Python is similar to deﬁning a procedure in Scheme,
except the syntax is different:

tokenizer

token

214

11
...
Python

ProcedureDeﬁnition ::⇒
Parameters
::⇒
Parameters
::⇒
SomeParameters
::⇒
SomeParameters
::⇒
Block
Block
Statements
MoreStatements
MoreStatements

::⇒
::⇒
::⇒
::⇒
::⇒

def Name ( Parameters ) : Block
SomeParameters
Name
Name , SomeParameters
Statement
indented(Statements)
Statement MoreStatements
Statement MoreStatements

Unlike in Scheme, whitespace (such as new lines) has meaning in Python
...
Indentation within a line also matters
...
The Python interpreter reports an error if the indentation does not match the logical structure of the code
...
We use indented(elements) to indicate that the elements are indented
...
The statements are all indented one level inside the block’s
indentation
...

The evaluation rule for a procedure deﬁnition is similar to the rule for evaluating
a procedure deﬁnition in Scheme
...
The procedure deﬁnition,
def Name (Parameters): Block
deﬁnes Name as a procedure that takes as inputs the Parameters and
has the body expression Block
...
, deﬁnes a procedure named tokenize
that takes a single parameter, s
...
The body of the procedure uses several different types of Python
statements
...
For example, the assignment statement, tokens = [] assigns the value [] (the empty list)
to the name tokens
...
Anything that can hold a value (such as an element of
a list) can be the target of an assignment
...
Interpreters

The evaluation rule for an assignment statement is similar to Scheme’s evaluation rule for assignments: the meaning of x = e in Python is similar to the meaning of (set! x e) in Scheme, except that in Python the target Name need not exist
before the assignment
...

Python Evaluation Rule: Assignment
...
If no such place exists, create a new
place with that name
...
Python supports many different
kinds of expressions for performing arithmetic and comparisons
...
This deﬁnes an order of precedence for parsing expressions
...
In Scheme, the expressions (+
3 (∗ 4 5)) and (∗ (+ 3 4) 5) are clearly different and the parentheses group the
subexpressions
...

Supporting precedence makes the Python grammar rules more complex since
they must deal with * and + differently, but it makes the meaning of Python expressions match our familiar mathematical interpretation, without needing to
clutter expressions with parentheses
...
This makes the
multiplication operator have higher precedence than the addition operator
...
The replacement rules that happen ﬁrst have lower precedence,
since their components must be built from the remaining pieces
...
1
...
For
example, (3 + 4) * 5 is parsed as the PrimaryExpression, (3 + 4), times 5, so evaluates to 35; without the parentheses, 3 + 4 * 5 is parsed as 3 plus the MultExpression, 4 * 5, so evaluates to 23
...
Numbers in Python are
similar (but not identical) to numbers in Scheme
...
The evaluation rule for a name in Python is similar to the stateful rule for evaluating a
name in Scheme3
...
1
...

a
...
3 > 2 + 2
c
...
(3 * 6 >= 15) == True
Exercise 11
...
Do comparison expressions have higher or lower precedence
than addition expressions? Explain why using the grammar rules
...
1
...
We describe three of the most useful
data types here: lists, strings, and dictionaries
...
Python provides a list datatype similar to lists in Scheme, except instead
of building lists from simpler parts (that is, using cons pairs in Scheme), the
Python list type is a built-in datatype
...
3
...
For example, [] denotes an
empty list and [1, 2] denotes a list containing two elements
...

Elements can be selected from a list using the list subscript expression:
PrimaryExpression ::⇒ SubscriptExpression
SubscriptExpression ::⇒ PrimaryExpression [ Expression ]
A subscript expression evaluates to the element indexed by value of the inner
expression from the list
...
1 of the Python Reference
Manual), however, which we do not go into here
...
Interpreters

217

The expression p[0] in Python is analogous to (car p) in Scheme
...
The reason for this is that Python stores lists
internally differently from how Scheme stores as chains of pairs
...

A subscript expression can also select a range of elements from the list:
SubscriptExpression ::⇒ PrimaryExpression [ Bound Low : Bound High ]
Bound
::⇒ Expression |
Subscript expressions with ranges evaluate to a list containing the elements between the low bound and the high bound
...
If the high bound is missing, the high bound
is the end of the list
...

Python lists are mutable (the value of a list can change after it is created)
...
append(current) to append an element to the tokens list
...

Strings
...
As in Scheme, a String is a sequence of characters
...
1
...
Once a string
is created its value cannot change
...

Strings can be enclosed in single quotes (e
...
, 'hello'), double quotes (e
...
, ''hello''),
and triple-double quotes (e
...
, '' '' ''hello'' '' ''; a string inside triple quotes can span
multiple lines)
...
The input, s, is a string object
...
In tokenize, we
use current = current + c to update the value of current to include a new character
...
Instead, appending a character to a string involves creating a
new string object
...
A dictionary is a lookup-table where values are associated with
keys
...
We did not use the dictionary type
in tokenize, but it is very useful for implementing frames in the evaluator
...
The empty dictionary is {}
...
For example,
birthyear = {}
birthyear['Euclid'] = '300BC'
birthyear['Ada'] = 1815
birthyear['Alan Turing'] = 1912
birthyear['Alan Kay'] = 1940
deﬁnes birthyear as a dictionary containing four entries
...

We can obtain the value associated with a key in the dictionary using a subscript expression
...
We can
replace the value associated with a key using the same syntax as adding a keyvalue pair to the dictionary
...

The dictionary type also provides a method has key that takes one input and
produces a Boolean indicating if the dictionary object contains the input value
as a key
...
has key('John Backus') ⇒ False
birthyear
...
Interpreters

scale as the size of the dictionary increases
...
The number is used to index into a structure similar
to a Python list (so it has constant time to retrieve any element)
...

11
...
3

Applications and Invocations

The grammar rules for expressions that apply procedures are:
PrimaryExpression ::⇒
CallExpression
::⇒
ArgumentList
::⇒
ArgumentList
::⇒
SomeArguments ::⇒
SomeArguments ::⇒

CallExpression
PrimaryExpression ( ArgumentList )
SomeArguments
Expression
Expression , SomeArguments

In Python, nearly every data value (including lists and strings) is an object
...
To invoke a
method we use the same rules, but the PrimaryExpression of the CallExpression
speciﬁes an object and method:
PrimaryExpression ::⇒ AttributeReference
AttributeReference ::⇒ PrimaryExpression
...

The tokenize procedure includes ﬁve method applications, four of which are
tokens
...
The object reference is tokens, the list of tokens in the
input
...

The other method invocation is c
...
The isspace method for the string datatype returns true
if the input string is non-empty and all characters in the string are whitespace
(either spaces, tabs, or newlines)
...
It is a procedure, not a method; the input
object is passed in as a parameter
...

11
...
4

Control Statements

Python provides control statements for making decisions, looping, and for returning from a procedure
...
Python’s if statement is similar to the conditional expression in
Scheme:

220

11
...
Python

Statement ::⇒
IfStatement ::⇒
Elifs
::⇒
Elifs
::⇒
OptElse
::⇒
OptElse
::⇒

IfStatement
if ExpressionPredicate : Block Elifs OptElse
elif ExpressionPredicate : Block Elifs
else : Block

Unlike in Scheme, there is no need to have an alternate clause since the Python
if statement does not need to produce a value
...
First, evaluate the ExpressionPredicate
...
Otherwise, each of the elif
predicates is evaluated in order
...
If none
of the elif predicates evaluates to a true value, the else Block is evaluated
if there is one
...
isspace():
...

else: current = current + c
The ﬁrst if predicate tests if the current character is a space
...
The consequent Block is itself an IfStatement:
if len(current) > 0:
tokens
...
This IfStatement has no elif or else clauses, so if the predicate is false, there is nothing
to do
...
A for statement provides a way of iterating through a set of
values, carrying out a body block for each value
...
First evaluate the Expression which must
produce a value that is a collection
...

Other than the ﬁrst two initializations, and the ﬁnal two statements, the bulk
of the tokenize procedure is contained in a for statement
...
The string s is the input string, a collection of

Chapter 11
...
So, the loop will repeat once for each character in s, and the value of
c is each character in the input string (represented as a singleton string), in turn
...
In Scheme, the body of a procedure is an expression and the
value of that expression is the result of evaluating an application of the procedure
...

Statements have no value, so there is no obvious way to decide what the result
of a procedure application should be
...

The grammar for the return statement is:
Statement
::⇒ ReturnStatement
ReturnStatement ::⇒ return Expression
A return statement ﬁnishes execution of a procedure, returning the value of the
Expression to the caller as the result
...
It returns the value of the tokens list to the caller
...
2

Parser

The parser takes as input a Charme program string, and produces as output a
nested list that encodes the structure of the input program
...

The next step is to take the list of tokens and produce a data structure that encodes the structure of the input program
...
But, unlike the list returned by tokenize which is a ﬂat list containing
the tokens in order, the list returned by parse is a structured list that may have
lists (and lists of lists, etc
...

Charme’s syntax is very simple, so the parser can be implemented by just breaking an expression into its components using the parentheses and whitespace
...
For example, if the input string is
(deﬁne square (lambda (x) (∗ x x)))
the output of tokenizer is the list:
['(', 'deﬁne', 'square', '(', 'lambda', '(', 'x', ')', '(', '*', 'x', 'x', ')', ')', ')']
The parser structures the tokens according to the program structure, producing
a parse tree that encodes the structure of the input program
...
For the example, the resulting parse tree is:
['deﬁne',
'square',
[ 'lambda',
['x'],
['*', 'x', 'x'] ] ]

222

11
...
Parser

Here is the deﬁnition of parse:
def parse(s):
def parse tokens(tokens, inner):
res = []
while len(tokens) > 0:
current = tokens
...
append (parse tokens(tokens, True))
elif current == ')':
if inner: return res
else:
error('Unmatched close paren: ' + s)
return None
else:
res
...
The output is a list of the
parenthesized expressions in the input
...

recursive descent

The parse procedure implements a recursive descent parser
...

The parse tokens procedure takes two inputs: tokens, a list of tokens (that results
from the tokenize procedure); and inner, a Boolean that indicates whether the
parser is inside a parenthesized expression
...
All of the
recursive calls result from encountering a '(', so the value passed as inner is True
for all the recursive calls
...
Then, the while statement iterates as long as the token
list contains at least one element
...
Interpreters

223

The ﬁrst statement of the while statement block assigns tokens
...

The pop method of the list takes a parameter that selects an element from the
list
...
The pop method also mutates
the list object by removing the selected element
...
pop(0) returns the
ﬁrst element of the tokens list and removes that element from the list
...
pop(0) expression
is evaluated the number of elements in the token list is reduced by one
...
The result is a list of tokens, which is appended to the result
...
If it is inside an expression (that is,
an open parenthesis has been encountered with no matching close parenthesis
yet), the close parenthesis closes the inner expression, and the result is returned
...

The else clause deals with all other tokens by appending them to the list
...
This would mean there was an open parenthesis without a
corresponding close, so an error is reported
...

11
...
The evaluator implements the evaluation rules
for the target language
...
We next consider
each evaluation rule in turn
...
3
...
If the expression is a number, it is a string of digits
...
3
...
isdigit()

Here, we use the built-in function isinstance to check if expr is of type str
...
If it evaluates to a true value, the right
operand is evaluated, and the value of the and expression is the value of its right
operand
...
isdigit() in the right
operand, since it is only evaluated if the left operand evaluated to a true value,
which means expr is a string
...
We deﬁne the procedure is primitive procedure using callable, a procedure that returns true only
for callable objects such as procedures and methods:
def is primitive procedure(expr):
return callable(expr)
The evaluation rule for a primitive is identical to the Scheme rule:
Charme Evaluation Rule 1: Primitives
...

We need to implement the pre-deﬁned values in our Charme interpreter
...
The int(s) constructor takes a string as its input and
outputs the corresponding integer:
def eval primitive(expr):
if is number(expr): return int(expr)
else: return expr
The else clause means that all other primitives (in Charme, this is only primitive procedures and Boolean constants) self-evaluate: the value of evaluating a
primitive is itself
...
For example, here is the primitive plus procedure
that is associated with the + primitive procedure:
def primitive plus (operands):
if (len(operands) == 0): return 0
else: return operands[0] + primitive plus (operands[1:])
The input is a list of operands
...
For numbers, the values are Python integers, so we
can use the Python + operator to add them
...
Interpreters

225

evaluate to 0 when there are no operands, and otherwise to recursively add all
of the operand values
...
3
...
The grammar rule for an if expression is:
IfExpression ::⇒ (if ExpressionPredicate
ExpressionConsequent
ExpressionAlternate )
The expression object representing an if expression should be a list containing
three elements, with the ﬁrst element matching the keyword if
...

The is special form procedure takes an expression and a keyword and outputs
a Boolean
...
3
...
We recognize an if expression by the if token at the beginning of the expression:
def is if(expr):
return is special form(expr, 'if')
The evaluation rule for an if expression is:4
Charme Evaluation Rule 5: If
...

This procedure implements the if evaluation rule:
def eval if(expr,env):
if meval(expr[1], env) != False: return meval(expr[2],env)
else: return meval(expr[3],env)

11
...
3

Deﬁnitions and Names

To evaluate deﬁnitions and names we need to represent environments
...

We use a Python class to represent an environment
...
In Scheme, we needed
to use a message-accepting procedure to do this
...
We deﬁne the Environment class for representing an environment
...

The dictionary datatype provides a convenient way to implement a frame
...
It initializes the frame of the new
environment to the empty dictionary using self
...

The add variable method either deﬁnes a new variable or updates the value associated with a variable
...

The lookup variable method ﬁrst checks if the frame associated with this environment has a key associated with the input name
...
Otherwise, if the environment has a parent, the value associated with the name is the
value of looking up the variable in the parent environment
...
The else clause
4 We number the Charme evaluation rules using the numbers we used for the analogous Scheme
evaluation rules, but present them in a different order
...
Interpreters

227

addresses the situation where the name is not found and there is no parent environment (since we have already reached the global environment) by reporting
an evaluation error indicating an undeﬁned name
...
parent = parent
self
...
frame[name] = value
def lookup variable(self, name):
if self
...
has key(name): return self
...
parent): return self
...
lookup variable(name)
else: eval error('Undeﬁned name: %s' % (name))
Using the Environment class, the evaluation rules for deﬁnitions and name expressions are straightforward
...
add variable(name, value)
def is name(expr): return isinstance(expr, str)
def eval name(expr, env):
return env
...
3
...
Hence, to deﬁne the
evaluation rule for lambda expressions we need to deﬁne a class for representing
user-deﬁned procedures
...
params = params
self
...
env = env
def getParams(self): return self
...
body
def getEnvironment(self): return self
...
3
...
3
...
To perform an application,
we need to evaluate all the subexpressions of the application expression, and
then apply the result of evaluating the ﬁrst subexpression to the values of the
other subexpressions
...
The ﬁrst parameter to map is a procedure constructed using a lambda expression (similar in meaning, but not in syntax, to
Scheme’s lambda expression); the second parameter is the list of subexpressions
...
If the procedure is a
primitive, it “just does it”: it applies the primitive procedure to its operands
...
To apply a constructed procedure:
1
...

2
...
Evaluate each
operand expression in the environment or the application and initialize the value in each place to the value of the corresponding
operand expression
...
Evaluate the body of the procedure in the newly created environment
...

The mapply procedure implements the application rules for primitive and constructed procedures:
def mapply(proc, operands):
if (is primitive procedure(proc)): return proc(operands)
elif isinstance(proc, Procedure):
params = proc
...
getEnvironment())
if len(params) != len(operands):
eval error ('Parameter length mismatch: %s given operands %s'
% (str(proc), str(operands)))
for i in range(0, len(params)):
newenv
...
getBody(), newenv)
else: eval error('Application of non−procedure: %s' % (proc))

Chapter 11
...
3
...
The evaluation loop reads a string from the user using the Python built-in procedure
raw input
...
Then, it uses a for loop to iterate through the expressions
...

To initialize the global environment, we create an environment with no parent
and place variables in it corresponding to the primitives in Charme
...
add variable('true', True)
genv
...
add variable('+', primitive plus)
genv
...
add variable('*', primitive times)
genv
...
add variable('<', primitive lessthan)
while True:
inv = raw input('Charme> ')
if inv == 'quit': break
for expr in parse(inv):
print str(meval(expr, genv))
Here are some sample interactions with our Charme interpreter:
evalLoop()
Charme> (+ 2 2)
4
Charme> (deﬁne ﬁbo
(lambda (n)
(if (= n 1) 1
(if (= n 2) 1
(+ (ﬁbo (− n 1)) (ﬁbo (− n 2)))))))
None

Charme> (ﬁbo 10)
55

11
...
This enables a new problem-solving strategy: if
the solution to a problem cannot be expressed easily in an existing language,
deﬁne and implement an interpreter for a new language in which the problem
can be solved more easily
...
LazyCharme
changes the application evaluation rule so that operand expressions are not
evaluated until their values are needed
...
Lazy
evaluation enables many procedures which would otherwise be awkward to express to be deﬁned concisely
...
4
...

11
...
1
eager evaluation

Much of my work
has come from
being lazy
...

Larry Wall,
Programming Perl

Lazy Interpreter

The Charme interpreter as well as the standard Scheme language evaluate application expressions eagerly: all operand subexpressions are evaluated whether or
not their values are needed
...
Eager evaluation
means that any expression that does not always evaluate all of its subexpressions must be a special form
...

With lazy evaluation, an expression is evaluated only when its value is needed
...
Instead of evaluating all operand expressions, lazy evaluation delays
evaluation of an operand expression until the value of the parameter is needed
...
By delaying evaluation of operand expressions until their value is needed, we can enable programs to deﬁne procedures
that conditionally evaluate their operands like the if special form
...
To apply a constructed procedure:
1
...

2
...
Put a thunk in
that place, which is an object that can be used later to evaluate
the value of the corresponding operand expression if and when
its value is needed
...
Evaluate the body of the procedure in the newly created environment
...

The rule is identical to the Stateful Application Rule except for the bolded part
of step 2
...
We start by deﬁning a Python class for representing
thunks and then modify the interpreter to support lazy evaluation
...
A thunk keeps track of an expression whose evaluation is delayed until it is needed
...
Thus, a thunk is in one of two possible states: unevaluated and
evaluated
...
expr = expr

Chapter 11
...
env = env
self
...
evaluated:
self
...
expr, self
...
evaluated = True
return self
...
Since
the value of the expression may be needed when the evaluator is evaluating an
expression in some other environment, it also keeps track of the environment in
which the thunk expression should be evaluated in the env instance variable
...
Initially this value is False
...

The value method uses force eval (deﬁned later) to obtain the evaluated value of
the thunk expression and stores that result in value
...
To implement lazy evaluation, we change the evaluator so there are two different evaluation procedures: meval is the standard evaluation procedure (which leaves thunks in their unevaluated state), and force eval
is the evaluation procedure that forces thunks to be evaluated to values
...

In the meval procedure, a thunk evaluates to itself
...
If
the result is a thunk, it uses the Thunk
...
That method uses force eval to ﬁnd the value of the thunk
expression, so any thunks inside the expression will be recursively evaluated
...
value()
else: return val
Next, we change the application rule to perform delayed evaluation and change
a few other places in the interpreter to use force eval instead of meval to obtain
the actual values when they are needed
...
4
...
Hitherto we
have continued to
be as energetic as we
were before there
were machines; in
this we have been
foolish, but there is
no reason to go on
being foolish
forever
...

Hence, eval application uses force eval to obtain the value of the ﬁrst subexpression, but makes Thunk objects for the operand expressions
...
Hence, the deﬁnition for mapply
forces evaluation of the operands to a primitive procedure:
def mapply(proc, operands):
def dethunk(expr):
if is thunk(expr): return expr
...
# same as in Charme interpreter

Bertrand Russell, In
Praise of Idleness, 1932

To evaluate an if expression, it is necessary to know the actual value of the predicate expressions
...

The ﬁnal change to the interpreter is to force evaluation when the result is displayed to the user in the evalLoop procedure by replacing the call to meval with
force eval
...
4
...
For example, with lazy evaluation we can deﬁne a procedure
that behaves like the if expression special form
...
Interpreters

233

With eager evaluation, this would not work since all operands would be evaluated; with lazy evaluation, only the operand that corresponds to the appropriate
consequent or alternate expression is evaluated
...
This is possible since only those values of the apparently inﬁnite data
structure that are used need to be created
...
This means, we can deﬁne an inﬁnite list:
(deﬁne ints-from (lambda (n) (cons n (ints-from (+ n 1)))))
With eager evaluation, (ints-from 1) would never ﬁnish evaluating; it has no
base case for stopping the recursive applications
...
Hence, (ints-from 1) terminates and produces a seemingly
inﬁnite list, but only the evaluations that are needed are performed:
LazyCharme> (car (ints-from 1))
1
LazyCharme> (car (cdr (cdr (cdr (ints-from 1)))))
4

Some evaluations fail to terminate even with lazy evaluation
...
Every time an application of list-length is evaluated, it applies cdr to the input list, which causes
ints-from to evaluate another cons, increasing the length of the list by one
...

Lists with delayed evaluation can be used in useful programs
...
Using lazy evaluation, we can deﬁne a list
that is the inﬁnitely long Fibonacci sequence:5
(deﬁne ﬁbo-gen (lambda (a b) (cons a (ﬁbo-gen b (+ a b)))))
(deﬁne ﬁbos (ﬁbo-gen 0 1))
5 This example is based on Abelson and Sussman, Structure and Interpretation of Computer
Programs, Section 3
...
2, which also presents several other examples of interesting programs constructed using delayed evaluation
...
4
...

Another strategy for deﬁning the Fibonacci sequence is to ﬁrst deﬁne a procedure that merges two (possibly inﬁnite) lists, and then deﬁne the Fibonacci sequence recursively
...

(deﬁne merge-lists
(lambda (lst1 lst2 proc)
(if (null? lst1) null
(if (null? lst2) null
(cons (proc (car lst1) (car lst2))
(merge-lists (cdr lst1) (cdr lst2) proc))))))
We can deﬁne the Fibonacci sequence as the combination of two sequences,
starting with the 0 and 1 base cases, combined using addition where the second
sequence is offset by one position:
(deﬁne ﬁbos (cons 0 (cons 1 (merge-lists ﬁbos (cdr ﬁbos) +))))
The sequence is deﬁned to start with 0 and 1 as the ﬁrst two elements
...
This deﬁnition relies heavily on lazy evaluation; otherwise, the evaluation of (merge-lists ﬁbos (cdr ﬁbos) +) would never terminate: the input lists are
effectively inﬁnite
...
3
...

Exercise 11
...
Describe the inﬁnite list deﬁned by each of the following deﬁnitions
...
)
a
...
(deﬁne t (cons 1 (merge-lists t (merge-lists t t +) +)))
c
...
(deﬁne doubles (merge-lists (ints-from 1) twos ∗))
Exercise 11
...
[ ] A simple procedure known as the Sieve of Eratosthenes for
ﬁnding prime numbers was created by Eratosthenes, an ancient Greek mathematician and astronomer
...
Then, it repeats the following two steps
forever:
Eratosthenes

1
...

2
...

To carry out the procedure in practice, of course, the initial list of numbers must
be ﬁnite, otherwise it would take forever to cross off all the multiples of 2
...
Interpreters

with delayed evaluation, we can implement the Sieve procedure on an effectively inﬁnite list
...
You may ﬁnd the
list-ﬁlter and merge-lists procedures useful, but will probably ﬁnd it necessary
to deﬁne some additional procedures
...
5

Summary

Languages are tools for thinking, as well as means to express executable programs
...

To implement a language, we need to implement a parser that carries out the
grammar rules and an evaluator that implements the evaluation rules
...
Changing the evaluation rules changes what programs mean, and enables new approaches to solving problems
...
5
...
This conviction of the
solvability of every mathematical problem is a powerful incentive to the worker
...
Seek its solution
...

David Hilbert, 1900

In this chapter we consider the question of what problems can and cannot be
solved by mechanical computation
...

computability

Section 12
...
2 introduces the Halting Problem, a problem that cannot be solved by any algorithm
...
3 sketches Alan Turing’s proof that the Halting Problem is
noncomputable
...
4 discusses how to show other problems are noncomputable
...
1

Mechanizing Reasoning

Humans have been attempting to mechanize reasoning for thousands of years
...

Euclid went beyond Aristotle by developing a formal axiomatic system
...
The goal of an axiomatic system is to codify knowledge in some
domain
...

Euclid started with ﬁve axioms (more commonly known as postulates); an example axiom is: A straight line segment can be drawn joining any two points
...
An example of a common notion is: The whole is
greater than the part
...
g
...
1
...
A proposition is a statement that is stated precisely enough
to be either true or false
...

proof

A proof of a proposition in an axiomatic system is a sequence of steps that ends
with the proposition
...
Most of Euclid’s proofs are constructive: propositions state that a
thing with a particular property exists, and proofs show steps for constructing
something with the stated property
...

consistent

A consistent axiomatic system is one that can never derive contradictory statements by starting from the axioms and following the inference rules
...
If the system cannot generate any contradictory pairs of statements it is
consistent
...
This means if a given proposition is
true, some proof for that proposition can be found in the system
...
This means, for any proposition P, a complete axiomatic system would be able to derive either P or not P
...
An ideal axiomatic system would be
complete and consistent: it would derive all true statements and no false statements
...
Euclid’s system is consistent but not complete for the set of propositions about geometry
...

Figure 12
...
The one on the left one incomplete:
there are some propositions that can be stated in the system that are true for
which no valid proof exists in the system
...
Once a single contradictory proposition can be proven the system becomes completely useless
...
Hence, only consistent systems are interesting and we
focus on whether it is possible for them to also be complete
...
Towards the end of the 19th century, many mathematicians
sought to systematize mathematics by developing a consistent axiomatic system that is complete for some area of mathematics
...

239

Chapter 12
...

All propositions
True
propositions
Provable
propositions

Inconsistent system: there is a valid
proof of a proposition that is false
...
1
...

Bertrand Russell discovered a problem with Frege’s system, which is now known
as Russell’s paradox
...
For example, the set of all prime numbers
does not contain itself as a member, so it is a member of R
...
This set
contains all sets, since a set is not a prime number, so it must contain itself
...
Hence, R cannot be a member of itself, and the statement that R
is a member of R must be false
...
This is a contradiction, so the statement that R
is not a member of R must be false
...
Symbolically, we summarize the
paradox: for any set s, s ∈ R if and only if s ∈ s
...

/
Whitehead and Russell attempted to resolve this paradox by constructing their
system to make it impossible to deﬁne the set R
...
Each set has an associated type, and a set cannot contain members of its
own type
...

• A type-n set can only contain sets of type n − 1 and below
...
Since R is a type k set, it cannot contain itself, since it
cannot contain any type k sets
...
1
...

Whitehead and Russell attempted to derive all true mathematical statements
about numbers and sets starting from a set of axioms and formal inference rules
...

For example, consider this paradox named for the Cretan philosopher Epimenides
who was purported to have said “All Cretans are liars”
...
Another version is the self-referential sentence: this statement is
false
...
If the statement is false, then it is a true statement (also a contradiction)
...

12
...
1

G¨ del’s Incompleteness Theorem
o

Kurt G¨ del was born in Brno (then in Austria-Hungary, now in the Czech Reo
public) in 1906
...
More generally, G¨ del showed that
o
no powerful axiomatic system could be both complete and consistent: no matter what the axiomatic system is, if it is powerful enough to express a notion of
proof, it must also be the case that there exist statements that can be expressed
in the system but cannot be proven either true or false within the system
...
The statement G¨ del found:
GPM : Statement GPM does not have any proof in the system
of Principia Mathematica
...
It makes
no sense for GPM to be either true or false:
Statement GPM is provable in the system
...
The system is inconsistent: it can be used to prove a
statement that is not true
...

Since GPM cannot be proven in the system, GPM is a true statement
...

The proof generalizes to any axiomatic system, powerful enough to express a
corresponding statement G:
G: Statement G does not have any proof in the system
...

To express G formally, we need to consider what it means for a statement to not
have any proof in the system
...
, TN
...
Computability

241

true so far
...
To be a proof of G,
TN must contain G
...

To express statement G an axiomatic system needs to be powerful enough to
express the notion that a valid proof does not exist
...

That is, in order for an axiomatic system to be complete and consistent, it must
be so weak that it is not possible to express this statement has no proof in the
system
...
2

The Halting Problem

G¨ del established that no interesting and consistent axiomatic system is capao
ble of proving all true statements in the system
...

procedure: A speciﬁcation of a series of actions
...

A procedure solves a problem if that procedure produces a correct output for
every possible input
...
So,
the question can be stated as: are there problems for which no procedure exists
that produces the correct output for every possible problem instance in a ﬁnite
amount of time?
A problem is computable if there exists an algorithm that solves the problem
...
If no such
algorithm exists, the problem is noncomputable
...
The way to show that
uncomputable problems exist is to ﬁnd one, similarly to the way G¨ del showed
o
unprovable true statements exist by ﬁnding an unprovable true statement
...

Output: If evaluating the input program would ever ﬁnish, output True
...

1 The terms decidable and undecidable are sometimes used to mean the same things as computable and noncomputable
...
Of course, Turing did not deﬁne the problem using a Python program since Python had
not yet been invented when Turing proved the Halting Problem was noncomputable in 1936
...

computable

noncomputable

242

12
...
The Halting Problem

Suppose we had a procedure halts that solves the Halting Problem
...

For example, halts('(+ 2 3)') should evaluate to True, halts('while True: pass') should
evaluate to False (the Python pass statement does nothing, but is needed to
make the while loop syntactically correct), and
halts(''''''
def ﬁbo(n):
if n == 1 or n == 2: return 1
else: return ﬁbo(n−1) + ﬁbo(n−2)
ﬁbo(60)
'''''')
should evaluate to True
...

The problem is knowing when to give up and output False
...

This argument is not sufﬁcient to prove that halts is noncomputable
...
To show
that halts is noncomputable, we need to show that it is impossible to implement
a halts procedure that would produce the correct output for all inputs in a ﬁnite
amount of time
...
We assume unbounded integers even though every actual
computer has a limit on the largest number it can represent
...

Knowing whether or not the program halts would settle an open problem known
as Goldbach’s Conjecture: every even integer greater than 2 can be written as the
sum of two primes
...
Euler reﬁned it and believed it to be true, but
couldn’t prove it
...
We could use a halts algorithm like this to resolve many other open
problems
...

Proving Noncomputability
...
One way to prove
non-existence of an X, is to show that if an X exists it leads to a contradiction
...
Computability

243

We prove that the existence of a halts algorithm leads to a contradiction, so no
halts algorithm exists
...
Consider this procedure:
def paradox():
if halts('paradox()'): while True: pass
The body of the paradox procedure is an if expression
...

The predicate expression cannot sensibly evaluate to either True or False:
halts(‘paradox()’) ⇒ True
If the predicate expression evaluates to True, the consequent block is evaluated producing a never-ending loop
...
But, this
means the result of halts('paradox()') was incorrect
...
It is empty, so evaluation terminates
...

Either result for halts(`paradox()') leads to a contradiction! The only sensible
thing halts could do for this input is to not produce a value
...
Any procedure we deﬁne to
implement halts must sometimes either produce the wrong result or fail to produce a result at all (that is, run forever without producing a result)
...

There is one important hole in our proof: we argued that because paradox does
not make sense, something in the deﬁnition of paradox must not exist and identiﬁed halts as the component that does not exist
...

This seems reasonable enough—they are built-in to Python so they seem to exist
...
Although we have been using
these and they seems to always work ﬁne, we have no formal model in which
to argue that evaluating True always terminates or that an if expression means
exactly what the evaluation rules say it does
...
All we have shown is that no Python
procedure exists that solves halts
...
In fact, we will see that no more powerful programming
language exists
...
This is why Alan Turing developed a model of computation
...
3

12
...
Universality

Universality

Recall the Turing Machine model from Chapter 6: a Turing Machine consists
of an inﬁnite tape divided into discrete square into which symbols from a ﬁxed
alphabet can be written, and a tape head that moves along the tape
...
The machine can
keep track of a ﬁnite number of possible states, and determines which action to
take based on a set of transition rules that specify the output symbol and head
action for a given current state and read symbol
...
Recall this was 1936, so the model
for mechanical computation was not what a mechanical computer can do, but
what a human computer can do
...

We can enumerate all possible Turing Machines
...
A Turing Machine is completely described by its alphabet, states and transition rules
...
We can map each state and alphabet symbol to a number,
and use this encoding to write down a unique number for every possible Turing
Machine
...
Most positive integers do not correspond to valid
Turing Machines, but if we go through all the numbers we will eventually reach
every possible Turing Machine
...
The number of Turing Machines is less than the number of real numbers
...
2
...
Any attempt to
map the real numbers to the integers must fail to include all the real numbers
...

Universal Turing
Machine

The next step is to deﬁne the machine depicted in Figure 12
...
A Universal Turing Machine is a machine that takes as input a number that identiﬁes a Turing
Machine and simulates the speciﬁed Turing Machine running on initially empty
input tape
...
In his proof,
Turing describes the transition rules for such a machine
...
One can imagine doing this by using
the tape to keep track of the state of the simulated machine
...
Computability

N

Universal
Turing
Machine

Result of running TM N
on an empty input tape

Figure 12
...
Universal Turing Machine
...
This is the rule for the current state of the simulated machine
on the current input symbol of the simulated machine
...
Thus, there is a single Turing Machine that can simulate every
Turing Machine
...
Using the universal machine and a diagonalization argument similar to
the one above for the real numbers, Turing reached a similar contradiction for a
problem analogous to the Halting Problem for Python programs but for Turing
Machines instead
...
There is some program that can
be written in that language to perform every possible computation
...
To simulate a Universal Turing Machine, we need
some way to keep track of the state of the tape (for example, the list datatypes
in Scheme or Python would be adequate), a way to keep track of the internal
machine state (a number can do this), and a way to execute the transition rules
(we could deﬁne a procedure that does this using an if expression to make decisions about which transition rule to follow for each step), and a way to keep
going (we can do this in Scheme with recursive applications)
...

12
...
It is enough to provide a convincing argument that such a procedure exists; ﬁnding the actual procedure is not necessary (but often helps to make the
argument more convincing)
...
Since there are an inﬁnite number of possible
procedures, we cannot just list all possible procedures and show why each one
does not solve the problem
...

The core of our argument is based on knowing the Halting Problem is noncomputable
...
4
...
That is, no algorithm
exists that can solve P since if such an algorithm exists it could be used to also
solve the Halting Problem which we already know is impossible
...
The proof technique where we show that a solution for some
problem P can be used to solve a different problem Q is known as a reduction
...
This means that problem Q is no harder than problem P, since a solution to
problem Q leads to a solution to problem P
...
1: Prints-Three Problem
Consider the problem of determining if an application of a procedure would
ever print 3:
Prints-Three
Input: A string representing a Python program
...

We show the Prints-Three Problem is noncomputable by showing that it is as
hard as the Halting Problem, which we already know is noncomputable
...

Then, we could deﬁne halts as:
def halts(p):
return printsThree(p + '; print(3)')
The printsThree application would evaluate to True if evaluating the Python program speciﬁed by p would halt since that means the print(3) statement appended to p would be evaluated
...
As long as the program speciﬁed by p would never print 3, the application of printsThree should evaluate to
False
...

The one wrinkle is that the speciﬁed input program might print 3 itself
...
One way to do this would be to
replace all occurrences of print (or any other built-in procedure that prints) in
the string with a new procedure, dontprint that behaves like print but doesn’t
actually print out anything
...
Then, we could use printsThree to deﬁne halts:
def halts(p): return printsThree(replacePrints(p) + '; print(3)')
We know that the Halting Problem is noncomputable, so this means the PrintsThree Problem must also be noncomputable
...
1: Virus Detection
The Halting Problem and Prints-Three Problem are noncomputable, but do seem
to be obviously important problems
...
Computability

247

does not answer that question
...
This example considers a problem for which it would be very useful to have
a solution for it one existed
...
A virus spreads by copying its
own code into the code of other programs, so when those programs are executed
the virus will execute
...
A typical virus also includes a malicious payload so when it executes
in addition to infecting other programs it also performs some damaging (corrupting data ﬁles) or annoying (popping up messages) behavior
...

Output: If the expression contains a virus (a code fragment that will infect
other ﬁles) output True
...

We demonstrate the Is-Virus Problem is noncomputable using a similar strategy to the one we used for the Prints-Three Problem: we show how to deﬁne a
halts algorithm given a hypothetical isVirus algorithm
...

Assume infectFiles is a procedure that infects ﬁles, so the result of evaluating
isVirus('infectFiles()') is True
...
If it does, p could infect a ﬁle and never terminate, and halts would
produce the wrong output
...

A rough deﬁnition of ﬁle-infecting behavior would be to consider any write to
an executable ﬁle to be an infection
...
For example, we could do this by creating
a new temporary directory and prepend that path to all ﬁle names
...

def halts(p): isVirus(sandBox(p) + '; infectFiles()')
Since we know there is no algorithm that solves the Halting Problem, this proves
that there is no algorithm that solves the Is-Virus problem
...

Virus scanners detect known viruses by scanning ﬁles for strings that match signatures in a database of known viruses
...

248

12
...
Proving Non-Computability

Sophisticated virus scanners employ more advanced techniques to attempt to
detect complex viruses such as metamorphic viruses that alter their own code
as they propagate to avoid detection
...

I am rather puzzled
why you draw this
distinction between
proof ﬁnders and
proof checkers
...
One can
easily think up
suitable restrictions
on the idea of proof
which will make
this converse true
and which agree
well with our ideas
of what a proof
should be like
...

Alan Turing, letter to
Max Newman, 1940

Exercise 12
...
Is the Launches-Missiles Problem described below computable?
Provide a convincing argument supporting your answer
...

Output: If an application of the procedure would lead to the missiles being launched, outputs True
...

You may assume that the only thing that causes the missiles to be launched is
an application of the launchMissiles procedure
...
2
...

Same-Result
Input: Speciﬁcations of two procedures, P and Q
...
If an application of P does not terminate,
and an application of Q also does not terminate, outputs True
...

Exercise 12
...
Is the Check-Proof Problem described below computable? Provide a convincing argument supporting your answer
...

Output: Outputs True if the proof is a valid proof of the theorem in the
system, or False if it is not a valid proof
...
4
...

Find-Finite-Proof
Input: A speciﬁcation of an axiomatic system, a statement (the theorem),
and a maximum number of steps (max-steps)
...
Otherwise, outputs False
...
Computability

Exercise 12
...
[ ] Is the Find-Proof Problem described below computable? Provide a convincing argument why it is or why it is not computable
...

Output: If there is a proof in the axiomatic system of the theorem, outputs
True
...

Exploration 12
...

Output: A number representing that maximum number of steps a Turing
Machine with n states and a two-symbol tape alphabet can run starting
on an empty tape before halting
...

For example, if the Busy Beaver input n is 1, the output should be 1
...
If the transition rule
for a 0 input moves left, then it will reach another 0 square and continue forever
without halting; similarly it if moves right
...
The machine in Figure 12
...
One way to support this claim would be to try simulating all possible twostate Turing Machines
...
3
...

Busy Beaver numbers increase extremely quickly
...
The
value for a ﬁve-state machine is not yet known, but the best machine found to
date runs for 47,176,870 steps! For six states, the best known result, discovered
in 2007 by Terry Ligocki and Shawn Ligocki, is over 2879 decimal digits long
...
4
...
Suppose we had an algorithm, bb(n), that takes the number
of states as input and outputs the corresponding Busy Beaver
...

Output: If executing the input Turing Machine starting with a blank tape
would ever ﬁnish, output True
...

The TM Halting Problem is different from the Halting Problem as we deﬁned
it earlier, so ﬁrst we need to show that the TM Halting Problem is noncomputable by showing it could be used to solve the Python Halting Problem
...
Once way to do this would be to write
a Universal Turing Machine simulator in Python, and then create a Python program that ﬁrst creates a tape containing the input Turing Machine description,
and then calls the Universal Turing Machine simulator on that input
...

Next, we show that an algorithm that solves the Busy Beaver Problem could be
used to solve the TM Halting Problem
...
So,
haltsTM simulates up to bb(n) steps of the input machine m where n is the number of states in m
...
This means there is no algorithm that can solve the
Busy Beaver Problem
...
6
...
3 runs for 6 steps
before halting
...
Computability

Exercise 12
...
Prove the Beaver Bound problem described below is also noncomputable:
Beaver-Bound
Input: A positive integer, n
...

A valid solution to the Beaver-Bound problem can produce any result for n as
long as it is greater than the Busy Beaver value for n
...
8
...

12
...
The Halting Problem is the most famous example:
it is impossible to deﬁne a mechanical procedure that always terminates and
correctly determines if the computation speciﬁed by its input would terminate
...

Noncomputable problems frequently arise in practice
...

Just because a problem is noncomputable does not mean we cannot produce
useful programs that address the problem
...
They produce the correct results on many inputs, but on some inputs must either fail to produce any result
or produce an incorrect result

Title: Get Informed
Description: These notes are there for University students starting from 1st year beginners to 4th year exparts

Buy These Notes Preview

Notesale: Turn your study into money

Already a Member? >

Search for notes by fellow students, in your own course and all over the country.

My Basket

Document Preview