Title: Hacking The Art of Exploitation
Description: If you are starting to learn hacking, this ebook might be helpful.
International Best-Seller!
The fundamental techniques of serious hacking
- Program computers using C, assembly language, and shell scripts
- Corrupt system memory to run arbitrary code using buffer overflows and format strings
- Crack encrypted wireless traffic using the FMS attack, and speed up brute-force attacks using a password probability matrix
Hackers are always pushing the boundaries, investigating the unknown, and evolving their art
...
Combine this knowledge with
the included Linux environment, and all you need is
your own creativity
...
He speaks at computer security conferences and trains security
teams around the world
...
- Inspect processor registers and system memory with a debugger to gain a real understanding of what is happening
LiveCD provides a complete Linux programming and debugging environment
THE FINEST IN GEEK ENTERTAINMENT™
www
...
com
“I LAY FLAT
...
Printed on recycled paper
2nd Edition
- Redirect network traffic, conceal open ports, and hijack TCP connections
$49
...
95 cdn)
shelve in : computer security/network security
the art of exploitation
The included LiveCD provides a complete Linux
programming and debugging environment—all
without modifying your current operating system
...
Get your hands dirty
debugging code, overflowing buffers, hijacking
network communications, bypassing protections,
exploiting cryptographic weaknesses, and perhaps
even inventing new exploits
...
To share the art
and science of hacking in a way that is accessible
to everyone, Hacking: The Art of Exploitation, 2nd
Edition introduces the fundamentals of C programming from a hacker’s perspective
...
Many people call themselves
hackers, but few have the strong technical foundation needed to really push the envelope
...
Finally a book that does not
just show how to use the exploits but how to develop them
...
”
—SECURITY FORUMS
“I recommend this book for the programming section alone
...
It is written by someone who knows of what
he speaks, with usable code, tools and examples
...
”
—COMPUTER POWER USER (CPU) MAGAZINE
“This is an excellent book
...
”
—ABOUT
...
Copyright © 2008 by Jon Erickson
...
No part of this work may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior
written permission of the copyright owner and the publisher
...
directly:
No Starch Press, Inc
...
863
...
863
...
com; www
...
com
Library of Congress Cataloging-in-Publication Data
Erickson, Jon, 1977-
Hacking : the art of exploitation / Jon Erickson
...
p
...
ISBN-13: 978-1-59327-144-2
ISBN-10: 1-59327-144-1
1
...
2
...
3
...
QA76
...
A25E75 2008
005
...
Title
...
Other product and
company names mentioned herein may be the trademarks of their respective owners
...
The information in this book is distributed on an “As Is” basis, without warranty
...
shall have any liability to any
person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the
information contained in it
...
Acknowledgments
...
0x200 Programming
...
0x400 Networking
...
0x600 Countermeasures
...
0x800 Conclusion
...
CONTENTS IN DETAIL

PREFACE
ACKNOWLEDGMENTS
0x100 INTRODUCTION

0x200 PROGRAMMING
  0x210 What Is Programming?
  Control Structures
    0x232 While/Until Loops
  More Fundamental Programming Concepts
    0x242 Arithmetic Operators
    0x244 Functions
    0x251 The Bigger Picture
    0x253 Assembly Language
    0x261 Strings
    0x263 Pointers
    0x265 Typecasting
    0x267 Variable Scoping
    0x271 Memory Segments in C
    0x273 Error-Checked malloc()
    0x281 File Access
    0x283 User IDs
    0x285 Function Pointers
    0x287 A Game of Chance

0x300 EXPLOITATION
  Buffer Overflows
  Experimenting with BASH
  Overflows in Other Segments
    0x342 Overflowing Function Pointers
    0x351 Format Parameters
    0x353 Reading from Arbitrary Memory Addresses
    0x355 Direct Parameter Access
    0x357 Detours with ...
    0x358 Another notesearch Vulnerability

0x400 NETWORKING
  OSI Model
    0x421 Socket Functions
    0x423 Network Byte Order
    0x425 A Simple Server Example
    0x427 A Tinyweb Server
    0x431 Data-Link Layer
    0x433 Transport Layer
    0x441 Raw Socket Sniffer
    0x443 Decoding the Layers
  Denial of Service
    0x452 The Ping of Death
    0x454 Ping Flooding
    0x456 Distributed DoS Flooding
    0x461 RST Hijacking
  Port Scanning
    0x472 FIN, X-mas, and Null Scans
    0x474 Idle Scanning
  Reach Out and Hack Someone
    0x482 Almost Only Counts with Hand Grenades

0x500 SHELLCODE
  Assembly vs ...
    0x511 Linux System Calls in Assembly
    0x521 Assembly Instructions Using the Stack
    0x523 Removing Null Bytes
    0x531 A Matter of Privilege
  Port-Binding Shellcode
    0x542 Branching Control Structures

0x600 COUNTERMEASURES
  Countermeasures That Detect
    0x621 Crash Course in Signals
  Tools of the Trade
  Log Files
  Overlooking the Obvious
    0x652 Putting Things Back Together Again
  Advanced Camouflage
    0x662 Logless Exploitation
    0x671 Socket Reuse
    0x681 String Encoding
  Buffer Restrictions
  Hardening Countermeasures
    0x6b1 ret2libc
  Randomized Stack Space
    0x6c2 Bouncing Off linux-gate
    0x6c4 A First Attempt

0x700
  Information Theory
    0x712 One-Time Pads
    0x714 Computational Security
    0x721 Asymptotic Notation
    0x731 Lov Grover’s Quantum Search Algorithm
    0x741 RSA
  Hybrid Ciphers
    0x752 Differing SSH Protocol Host Fingerprints
  Password Cracking
    0x762 Exhaustive Brute-Force Attacks
    0x764 Password Probability Matrix
  ... 11b Encryption
    0x772 RC4 Stream Cipher
    0x781 Offline Brute-Force Attacks
    0x783 IV-Based Decryption Dictionary Tables
    0x785 Fluhrer, Mantin, and Shamir Attack

0x800 CONCLUSION
  References

INDEX
...
Understanding hacking techniques
is often difficult, since it requires both breadth and
depth of knowledge
...
This
second edition of Hacking: The Art of Exploitation makes the world of hacking
more accessible by providing the complete picture—from programming to
machine code to exploitation
...
This CD
contains all the source code in the book and provides a development and
exploitation environment you can use to follow along with the book’s
examples and experiment along the way
...
Also, I would like to thank my friends Seth Benson and Aaron Adams
for proofreading and editing, Jack Matheson for helping me with assembly,
Dr
...
0x100
INTRODUCTION
The idea of hacking may conjure stylized images of
electronic vandalism, espionage, dyed hair, and body
piercings
...
Granted, there are people out
there who use hacking techniques to break the law, but hacking isn’t really
about that
...
The essence of hacking is finding unintended or overlooked uses for the
laws and properties of a given situation and then applying them in new and
inventive ways to solve a problem—whatever it may be
...
Each number must be
used once and only once, and you may define the order of
operations; for example, 3 * (4 + 6) + 1 = 31 is valid, however
incorrect, since it doesn’t total 24
...
Like the solution to this problem (shown on the last page of
this book), hacked solutions follow the rules of the system, but they use those
rules in counterintuitive ways
...
Since the infancy of computers, hackers have been creatively solving
problems
...
The club’s members used this
equipment to rig up a complex system that allowed multiple operators to control different parts of the track by dialing in to the appropriate sections
...
The group moved on
to programming on punch cards and ticker tape for early computers like the
IBM 704 and the TX-0
...
A new program that could achieve the same result
as an existing one but used fewer punch cards was considered better, even
though it did the same thing
...
Being able to reduce the number of punch cards needed for a program
showed an artistic mastery over the computer
...
Early hackers proved that technical problems can have artistic solutions, and they thereby transformed programming from a mere engineering
task into an art form
...
The few
who got it formed an informal subculture that remained intensely focused
on learning and mastering their art
...
Such obstructions included authority figures, the bureaucracy of
college classes, and discrimination
...
This drive to continually learn and explore transcended
even the conventional boundaries drawn by discrimination, evident in the
MIT model railroad club’s acceptance of 12-year-old Peter Deutsch when
he demonstrated his knowledge of the TX-0 and his desire to learn
...
The original hackers found splendor and elegance in the conventionally
dry sciences of math and electronics
...
Their desire
to dissect and understand wasn’t intended to demystify artistic endeavors; it
was simply a way to achieve a greater appreciation of them
...
This is not a new cultural trend; the
Pythagoreans in ancient Greece had a similar ethic and subculture, despite
not owning computers
...
That thirst for knowledge and its beneficial byproducts would continue on through history, from the Pythagoreans to Ada
Lovelace to Alan Turing to the hackers of the MIT model railroad club
...
How does one distinguish between the good hackers who bring us the
wonders of technological advancement and the evil hackers who steal our
credit card numbers? The term cracker was coined to distinguish evil hackers
from the good ones
...
Hackers stayed true to the
Hacker Ethic, while crackers were only interested in breaking the law and
making a quick buck
...
Cracker was meant to be the
catch-all label for anyone doing anything unscrupulous with a computer—
pirating software, defacing websites, and worst of all, not understanding what
they were doing
...
The term’s lack of popularity might be due to its confusing etymology—
cracker originally described those who crack software copyrights and reverse
engineer copy-protection schemes
...
Few technology journalists feel compelled to use terms that most of their
readers are unfamiliar with
...
Similarly, the term script kiddie is sometimes used
to refer to crackers, but it just doesn’t have the same zing as the shadowy
hacker
...
The current laws restricting cryptography and cryptographic research
further blur the line between hackers and crackers
...
This paper responded to a challenge issued by the Secure Digital Music
Initiative (SDMI) in the SDMI Public Challenge, which encouraged the
public to attempt to break these watermarking schemes
...
The Digital Millennium Copyright Act (DMCA) of 1998 makes it illegal to
discuss or provide technology that might be used to bypass industry consumer controls
...
He had written software to circumvent
overly simplistic encryption in Adobe software and presented his findings at a
hacker convention in the United States
...
Under the law, the complexity of the
industry consumer controls doesn’t matter—it would be technically illegal to
reverse engineer or even discuss Pig Latin if it were used as an industry consumer control
...
The sciences of nuclear physics and biochemistry can be used to kill,
yet they also provide us with significant scientific advancement and modern
medicine
...
Even if we wanted to, we couldn’t suppress
the knowledge of how to convert matter into energy or stop the continued
technological progress of society
...
Hackers will
constantly be pushing the limits of knowledge and acceptable behavior,
forcing us to explore further and further
...
Just as the speedy gazelle adapted from being chased by the cheetah,
and the cheetah became even faster from chasing the gazelle, the competition between hackers provides computer users with better and stronger
security, as well as more complex and sophisticated attack techniques
...
The defending hackers create IDSs
to add to their arsenal, while the attacking hackers develop IDS-evasion
techniques, which are eventually compensated for in bigger and better IDS
products
...
The intent of this book is to teach you about the true spirit of hacking
...
Included with this book is
a bootable LiveCD containing all the source code used herein as well as a
preconfigured Linux environment
...
The only requirement is an x86 processor, which is used by all
Microsoft Windows machines and the newer Macintosh computers—just
insert the CD and reboot
...
This way, you will gain a hands-on understanding and appreciation for hacking
that may inspire you to improve upon existing techniques or even to invent
new ones
...
0x200
PROGRAMMING
Hacker is a term for both those who write code and
those who exploit it
...
Since an understanding
of programming helps those who exploit, and an understanding of exploitation helps those who program, many
hackers do both
...
Hacking is really just the act of finding a clever and counterintuitive
solution to a problem
...
Programming hacks are
similar in that they also use the rules of the computer in new and inventive
ways, but the final goal is efficiency or smaller source code, not necessarily a
security compromise
...
The few solutions that remain
are small, efficient, and neat
...
Hackers on both sides of programming
appreciate both the beauty of elegant code and the ingenuity of clever hacks
...
Because of the
tremendous exponential growth of computational power and memory,
spending an extra five hours to create a slightly faster and more memory-efficient piece of code just doesn’t make business sense when dealing with
modern computers that have gigahertz of processing cycles and gigabytes of
memory
...
When the
bottom line is money, spending time on clever hacks for optimization just
doesn’t make sense
...
These are the people who get
excited about programming and really appreciate the beauty of an elegant
piece of code or the ingenuity of a clever hack
...
0x210
What Is Programming?
Programming is a very natural and intuitive concept
...
Programs are
everywhere, and even the technophobes of the world use programs every day
...
A typical program for driving directions might look something
like this:
Start out down Main Street headed east
...
If the street is blocked because of construction, turn
right there at 15th Street, turn left on Pine Street, and then turn right on
16th Street
...
Continue on 16th Street, and turn left onto Destination Road
...
The address is 743 Destination Road
...
Granted, they’re not eloquent,
but each instruction is clear and easy to understand, at least for someone
who reads English
...
To instruct a computer to do something, the instructions
must be written in its language
...
To write a program in machine language for an
Intel x86 processor, you would have to figure out the value associated with
each instruction, how each instruction interacts, and myriad low-level details
...
What’s needed to overcome the complication of writing machine language
is a translator
...
Assembly language is less cryptic than machine language, since it uses names
for the different instructions and variables, instead of just using numbers
...
The instruction names
are very esoteric, and the language is architecture specific
...
Any program written using assembly language for one processor’s
architecture will not work on another processor’s architecture
...
In addition, in order to write an effective program in assembly
language, you must still know many low-level details of the processor architecture you are writing for
...
A compiler converts a high-level language into machine language
...
This means that if a program is written in a high-level language, the program only needs to be written once; the same piece of
program code can be compiled into machine language for various specific
architectures
...
A program written in a high-level language is much more readable and
English-like than assembly language or machine language, but it still must
follow very strict rules about how the instructions are worded, or the compiler won’t be able to understand it
...
Pseudo-code is simply English arranged with a general structure
similar to a high-level language
...
Pseudo-code isn’t well defined; in fact, most people write pseudo-code
slightly differently
...
Pseudo-code makes for an excellent introduction to common universal programming concepts
...
This is fine for very simple programs, but most
programs, like the driving directions example, aren’t that simple
...
These
statements are known as control structures, and they change the flow of the
program’s execution from a simple sequential order to a more complex and
more useful flow
...
If it is, a special set of instructions needs to address that situation
...
These types of special cases
can be accounted for in a program with one of the most natural control
structures: the if-then-else structure
...
The if-then-else pseudo-code structure of the preceding driving directions might look something like this:
Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
{
  Turn right on 16th Street;
}
Each instruction is on its own line, and the various sets of conditional
instructions are grouped between curly braces and indented for readability
...
Of course, other languages require the then keyword in their syntax—
BASIC, Fortran, and even Pascal, for example
...
Once a programmer understands the concepts
these languages are trying to convey, learning the various syntactical variations is fairly trivial
...
Another common rule of C-like syntax is when a set of instructions
bounded by curly braces consists of just one instruction, the curly braces are
optional
...
The driving directions from
before can be rewritten following this rule to produce an equivalent piece of
pseudo-code:
Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;
This rule about sets of instructions holds true for all of the control
structures mentioned in this book, and the rule itself can be described in
pseudo-code
...
There are variations of if-then-else, such as select/case statements,
but the logic is still basically the same: If this happens do these things, otherwise
do these other things (which could consist of even more if-then statements)
...
A programmer will often want to execute a set of
instructions more than once
...
A while loop says to execute the following set of
instructions in a loop while a condition is true
...
The amount of food the mouse finds each
time could range from a tiny crumb to an entire loaf of bread
...
Another variation on the while loop is an until loop, a syntax that is
available in the programming language Perl (C doesn’t use this syntax)
...
The
same mouse program using an until loop would be:
Until (you are not hungry)
{
  Find some food;
  Eat the food;
}
Logically, any until-like statement can be converted into a while loop
...
This can easily be changed into a
standard while loop by simply inverting the condition
...
This is generally used when
a programmer wants to loop for a certain number of iterations
...
The same statement can be written as such:
Set the counter to 0;
While (the counter is less than 5)
{
  Drive straight for 1 mile;
  Add 1 to the counter;
}
The C-like pseudo-code syntax of a for loop makes this even more
apparent:
For (i=0; i<5; i++)
  Drive straight for 1 mile;
In this case, the counter is called i, and the for statement is broken up
into three sections, separated by semicolons
...
The second section is like
a while statement using the counter: While the counter meets this condition,
keep looping
...
In this case, i++ is a shorthand
way of saying, Add 1 to the counter called i
...
These concepts are used in many programming languages, with
a few syntactical differences
...
By the end, the pseudo-code should look very similar to C code
...
A variable can
simply be thought of as an object that holds data that can be changed—
hence the name
...
Returning to the driving example, the speed of the car would
be a variable, while the color of the car would be a constant
...
This is because a C program will eventually be compiled into an executable program
...
Ultimately, all variables
are stored in memory somewhere, and their declarations allow the compiler
to organize this memory more efficiently
...
In C, each variable is given a type that describes the information that is
meant to be stored in that variable
...
Variables are declared simply by using these keywords before
listing the variables, as you can see below
...
14), and z is expected to hold a character value, like A
or w
...
int a = 13, b;
float k;
char z = 'A';
k = 3
...
14, z will contain the character w,
and b will contain the value 18, since 13 plus 5 equals 18
...
0x242
Arithmetic Operators
The statement b = a + 7 is an example of a very simple arithmetic operator
...
The first four operations should look familiar
...
If a is 13, then 13 divided by 5 equals 2, with a remainder of 3, which
means that a % 5 = 3
...
Floating-point variables must be used to retain the
more correct answer of 2
...
Operation          Symbol   Example
Addition           +        b = a + 5
Subtraction        -        b = a - 5
Multiplication     *        b = a * 5
Division           /        b = a / 5
Modulo reduction   %        b = a % 5
To get a program to use these concepts, you must speak its language
...
One of these was mentioned earlier and is used commonly in for loops
...
i = i - 1    i-- or --i    Subtract 1 from the variable
...
This is where the difference between i++ and ++i becomes apparent
...
The following example will help clarify
...
Quite often in programs, variables need to be modified in place
...
This
happens commonly enough that shorthand also exists for it
...
i = i - 12    i-=12    Subtract some value from the variable
...
i = i / 12    i/=12    Divide some value from the variable
...
These conditional statements are based on some
sort of comparison
...
Condition                  Symbol   Example
Less than                  <        (a < b)
Greater than               >        (a > b)
Less than or equal to      <=       (a <= b)
Greater than or equal to   >=       (a >= b)
Equal to                   ==       (a == b)
Not equal to               !=       (a != b)
Most of these operators are self-explanatory; however, notice that the
shorthand for equal to uses double equal signs
...
The statement a = 7 means "put the value 7 in the variable a," while a == 7 means
"check to see whether the variable a is equal to 7"
...
) Also, notice that an
exclamation point generally means not
...
!(a < b) is equivalent to (a >= b)
These comparison operators can also be chained together using shorthand for OR and AND
...
Similarly,
the example statement consisting of two smaller comparisons joined with
AND logic will fire true if a is less than b AND a is not less than c
...
Many things can be boiled down to variables, comparison operators, and
control structures
...
Naturally, 1
means true and 0 means false
...
C doesn’t really have a dedicated Boolean type (C99 later added _Bool), so any nonzero value is
considered true, and an expression is considered false if it evaluates to 0
...
Checking to see whether the variable hungry
is equal to 1 will return 1 if hungry equals 1 and 0 if hungry equals 0
...
While (hungry)
{
  Find some food;
  Eat the food;
}
A smarter mouse program with more inputs demonstrates how comparison operators can be combined with variables
...
Just remember that any nonzero value is considered true, and the value of 0
is considered false
...
These instructions can be grouped into a smaller subprogram called a function
...
For example, the action of turning a car actually
consists of many smaller instructions: Turn on the appropriate blinker, slow
down, check for oncoming traffic, turn the steering wheel in the appropriate
direction, and so on
...
You can pass variables as arguments
to a function in order to modify the way the function operates
...
Function Turn(variable_direction)
{
  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel back to the original position;
  Turn off the variable_direction blinker;
}
This function describes all the instructions needed to make a turn
...
When the function is called, the instructions found within it are
executed with the arguments passed to it; afterward, execution returns to
where it was in the program, after the function call
...
By default in C, functions can return a value to a caller
...
Imagine a
function that calculates the factorial of a number—naturally, it returns the
result
...
This format
looks very similar to variable declaration
...
The return statement
at the end of the function passes back the contents of the variable x and ends
the function
...
int a=5, b;
b = factorial(a);
At the end of this short program, the variable b will contain 120, since
the factorial function will be called with the argument of 5 and will return 120
...
This can be done by simply writing the entire function before using it
later in the program or by using function prototypes
...
The actual
function can be located near the end of the program, but it can be used anywhere else, since the compiler already knows about it
...
There’s no need to actually define any variable names in the prototype, since
this is done in the actual function
...
If a function doesn’t have any value to return, it should be declared as void,
as is the case with the turn() function I used as an example earlier
...
Every turn in the directions has both a direction and a street
name
...
This complicates the function
of turning, since the proper street must be located before the turn can be
made
...
void turn(variable_direction, target_street_name)
{
  Look for a street sign;
  current_intersection_name = read street sign name;
  while(current_intersection_name != target_street_name)
  {
    Look for another street sign;
    current_intersection_name = read street sign name;
  }
  Activate the variable_direction blinker;
  Slow down;
  Check for oncoming traffic;
  while(there is oncoming traffic)
  {
    Stop;
    Watch for oncoming traffic;
  }
  Turn the steering wheel to the variable_direction;
  while(turn is not complete)
  {
    if(speed < 5 mph)
      Accelerate;
  }
  Turn the steering wheel right back to the original position;
  Turn off the variable_direction blinker;
}
This function includes a section that searches for the proper intersection
by looking for street signs, reading the name on each street sign, and storing
that name in a variable called current_intersection_name
...
The pseudo-code driving
instructions can now be changed to use this turning function
...
Since pseudo-code doesn’t actually have to work,
full functions don’t need to be written out—simply jotting down Do some
complex stuff here will suffice
...
Most of the real usefulness of C comes from collections of
existing functions called libraries
...
C compilers exist for just about every operating system and processor
architecture out there, but for this book, Linux and an x86-based processor
will be used exclusively
...
Since hacking is really about experimenting, it’s
probably best if you have a C compiler to follow along with
...
Just put the CD in the drive and reboot
your computer
...
From this Linux environment you can follow
along with the book and experiment on your own
...
The firstprog
...
firstprog
...
#include <stdio.h>
int main()
{
  int i;
  for(i=0; i < 10; i++)   // Loop 10 times
  {
    puts("Hello, world!\n");
  }
  return 0;               // Tell OS the program exited without errors
}
...
Any text following two forward slashes (//) is a comment, which is
ignored by the compiler
...
This header file is added to the program when it is compiled
...
h, and it defines several constants and function prototypes for corresponding functions in the standard I/O library
...
This function prototype (along with many others) is included in the stdio
...
A lot of the power of C comes from its extensibility and libraries
...
You may have even noticed that there’s a set of curly braces that
can be eliminated
...
The GNU Compiler Collection (GCC) is a free C compiler that translates C
into machine language that a processor can understand
...
out by default
...
c
ls -l a
...
out
...
out
The Bigger Picture
Okay, this has all been stuff you would learn in an elementary programming
class—basic, but essential
...
Don’t get me wrong, being fluent in C is very useful
and is enough to make you a decent programmer, but it’s only a piece of the
bigger picture
...
Hackers get their edge from knowing how all
the pieces interact within this bigger picture
...
The code can’t actually do anything until it’s compiled into an executable
binary file
...
The binary a
...
Compilers are designed to translate the language of C code into machine
language for a variety of processor architectures
...
There are also Sparc processor
architectures (used in Sun Workstations) and the PowerPC processor architecture (used in pre-Intel Macs)
...
As long as the compiled program works, the average programmer is
only concerned with source code
...
With a better
understanding of how the CPU operates, a hacker can manipulate the programs that run on it
...
But what does
this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries
...
reader@hacking:~/booksrc $ objdump -D a
...
08048374 <main>:
 8048374:  55                     push   %ebp
 8048375:  89 e5                  mov    %esp,%ebp
 8048377:  83 ec 08               sub    $0x8,%esp
 804837a:  83 e4 f0               and    $0xfffffff0,%esp
 804837d:  b8 00 00 00 00         mov    $0x0,%eax
 8048382:  29 c4                  sub    %eax,%esp
 8048384:  c7 45 fc 00 00 00 00   movl   $0x0,0xfffffffc(%ebp)
 804838b:  83 7d fc 09            cmpl   $0x9,0xfffffffc(%ebp)
 804838f:  7e 02                  jle    8048393
 8048391:  eb 13                  jmp    80483a6
 8048393:  c7 04 24 84 84 04 08   movl   $0x8048484,(%esp)
 804839a:  e8 01 ff ff ff         call   80482a0
 804839f:  8d 45 fc               lea    0xfffffffc(%ebp),%eax
 80483a2:  ff 00                  incl   (%eax)
 80483a4:  eb e5                  jmp    804838b
 80483a6:  c9                     leave
 80483a7:  c3                     ret
 80483a8:  90                     nop
 80483a9:  90                     nop
 80483aa:  90                     nop
reader@hacking:~/booksrc $
The objdump program will spit out far too many lines of output to
sensibly examine, so the output is piped into grep with the command-line
option to only display 20 lines after the regular expression main
...
The
numbering system you are most familiar with uses a base-10 system, since at
10 you need to add an extra symbol
...
This is a convenient notation since a byte contains 8 bits, each
of which can be either true or false
...
The hexadecimal numbers—starting with 0x8048374 on the far left—are
memory addresses
...
Memory is just a
collection of bytes of temporary storage space that are numbered with
addresses
...
Each
byte of memory can be accessed by its address, and in this case the CPU
accesses this part of memory to retrieve the machine language instructions
that make up the compiled program
...
The 32-bit processors
have 2^32 (or 4,294,967,296) possible addresses, while the 64-bit ones have 2^64
(1
...
The 64-bit processors can run in
32-bit compatibility mode, which allows them to run 32-bit code quickly
...
Of course, these hexadecimal values
are only representations of the bytes of binary 1s and 0s the CPU can understand
...
isn’t very useful to anything other than the processor, the machine code is
displayed as hexadecimal bytes and each instruction is put on its own line,
like splitting a paragraph into sentences
...
The instructions on
the far right are in assembly language
...
The instruction ret is far easier to remember and make sense of than 0xc3 or
11000011
...
This means that since every processor architecture has
different machine language instructions, each also has a different form of
assembly language
...
Exactly how
these machine language instructions are represented is simply a matter of
convention and preference
...
The assembly shown in the output on page 21
is AT&T syntax, as just about all of Linux’s disassembly tools use this syntax by
default
...
The same
code can be shown in Intel syntax by providing an additional command-line
option, -M intel, to objdump, as shown in the output below
...
out | grep -A20 main
...
Regardless of the assembly language representation, the commands a processor understands are quite simple
...
These operations move memory
around, perform some sort of basic math, or interrupt the processor to get it
to do something else
...
But in the same way millions of books have been written using a relatively
small alphabet of letters, an infinite number of possible programs can be
created using a relatively small collection of machine instructions
...
Most
of the instructions use these registers to read or write data, so understanding
the registers of a processor is essential to understanding the instructions
...
0x252
The x86 Processor
The 8086 CPU was the first x86 processor
...
If you remember people talking
about 386 and 486 processors in the ’80s and ’90s, this is what they were
referring to
...
I could just talk abstractly about these registers now, but
I think it’s always better to see things for yourself
...
Debuggers are used by programmers to step through compiled programs, examine program memory, and
view processor registers
...
Similar to a microscope, a debugger allows
a hacker to observe the microscopic world of machine code—but a debugger is
far more powerful than this metaphor allows
...
P rog ra m min g
23
Below, GDB is used to show the state of the processor registers right before
the program starts
...
/a
...
so
...
(gdb) break main
Breakpoint 1 at 0x804837a
(gdb) run
Starting program: /home/reader/booksrc/a
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $
A breakpoint is set on the main() function so execution will stop right
before our code is executed
...
The first four registers (EAX, ECX, EDX, and EBX) are known as general-purpose registers
...
They are used for a variety of purposes, but they mainly
act as temporary variables for the CPU when it is executing machine
instructions
...
These stand for Stack Pointer, Base Pointer, Source Index, and Destination Index,
respectively
...
These registers
are fairly important to program execution and memory management; we will
discuss them more later
...
There are load and store instructions
that use these registers, but for the most part, these registers can be thought
of as just simple general-purpose registers
...
Like a child pointing his finger
at each word as he reads, the processor reads each instruction using the EIP
register as its finger
...
Currently, it points to a memory address at 0x804838a
...
The actual memory is
split into several different segments, which will be discussed later, and these
registers keep track of that
...
0x253
Assembly Language
Since we are using Intel syntax assembly language for this book, our tools
must be configured to use this syntax
...
You can configure this setting to run every time GDB starts up by
putting the command in the file
...
reader@hacking:~/booksrc $ gdb -q
(gdb) set dis intel
(gdb) quit
reader@hacking:~/booksrc $ echo "set dis intel" > ~/
...
gdbinit
set dis intel
reader@hacking:~/booksrc $
Now that GDB is configured to use Intel syntax, let’s begin understanding
it
...
The operations are usually intuitive mnemonics: The mov
operation will move a value from the source to the destination, sub will
subtract, inc will increment, and so forth
...
8048375:   89 e5       mov    ebp,esp
8048377:   83 ec 08    sub    esp,0x8
There are also operations that are used to control the flow of execution
...
The example below first compares a 4-byte
value located at EBP minus 4 with the number 9
...
If that value is less than or equal to 9, execution jumps to the
instruction at 0x8048393
...
If the value isn’t less than or equal to 9, execution will jump to 0x80483a6
...
The -g flag can be used by the GCC compiler to include extra debugging
information, which will give GDB access to the source code
...
c
reader@hacking:~/booksrc $ ls -l a
...
out
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/libthread_db
...
1"
...
h>
2
3       int main()
4       {
5         int i;
6         for(i=0; i < 10; i++)
7         {
8           printf("Hello, world!\n");
9         }
10      }
(gdb) disassemble main
Dump of assembler code for function main():
0x08048384    push   ebp
0x08048385    mov    ebp,esp
0x08048387    sub    esp,0x8
0x0804838a    and    esp,0xfffffff0
0x0804838d    mov    eax,0x0
0x08048392    sub    esp,eax
0x08048394    mov    DWORD PTR [ebp-4],0x0
0x0804839b    cmp    DWORD PTR [ebp-4],0x9
0x0804839f    jle    0x80483a3
0x080483a1    jmp    0x80483b6
0x080483a3    mov    DWORD PTR [esp],0x80484d4
0x080483aa    call   0x80482a8 <_init+56>
0x080483af    lea    eax,[ebp-4]
0x080483b2    inc    DWORD PTR [eax]
0x080483b4    jmp    0x804839b
0x080483b6    leave
0x080483b7    ret
End of assembler dump
...
c, line 6
...
out
Breakpoint 1, main() at firstprog
...
Then a breakpoint is set at the start of main(), and the program is
run
...
Since the breakpoint has been set at the
start of the main() function, the program hits the breakpoint and pauses
before actually executing any instructions in main()
...
Notice that EIP contains a memory address that points to an instruction in
the main() function’s disassembly (shown in bold)
...
Part of the reason variables need to be declared in C is to aid
the construction of this section of code
...
We’ll talk
more about the function prologue later, but for now we can take a cue from
GDB and skip it
...
Examining memory is a critical
skill for any hacker
...
In both magic and hacking, if you were to look in just the right
spot, the trick would be obvious
...
But with a debugger like GDB, every aspect
of a program’s execution can be deterministically examined, paused, stepped
through, and repeated as often as needed
...
The examine command in GDB can be used to look at a certain address
of memory in a variety of ways
...
The display format also uses a single-letter shorthand, which is optionally
preceded by a count of how many items to examine
...
x
Display in hexadecimal
...
t
Display in binary
...
In the following example, the current address of the EIP
register is used
...
(gdb) i r eip
eip            0x8048384        0x8048384
(gdb) x/o 0x8048384
0x8048384      077042707
(gdb) x/x $eip
0x8048384      0x00fc45c7
(gdb) x/u $eip
0x8048384      16532935
(gdb) x/t $eip
0x8048384      00000000111111000100010111000111
(gdb)
The memory the EIP register is pointing to can be examined by using the
address stored in EIP
...
The value 077042707 in
octal is the same as 0x00fc45c7 in hexadecimal, which is the same as 16532935 in
base-10 decimal, which in turn is the same as 00000000111111000100010111000111
in binary
...
(gdb) x/2x $eip
0x8048384    0x00fc45c7    0x83000000
(gdb) x/12x $eip
0x8048384    0x00fc45c7    0x83000000    0x7e09fc7d    0xc713eb02
0x8048394    0x84842404    0x01e80804    0x8dffffff    0x00fffc45
0x80483a4    0xc3c9e5eb    0x90909090    0x90909090    0x5de58955
(gdb)
The default size of a single unit is a four-byte unit called a word
...
The valid size letters are as follows:
b    A single byte
h    A halfword, which is two bytes in size
w    A word, which is four bytes in size
g    A giant, which is eight bytes in size
This is slightly confusing, because sometimes the term word also refers to
2-byte values
...
In this
book, words and DWORDs both refer to 4-byte values
...
The following GDB output shows
memory displayed in various sizes
...
The first examine command shows the first eight bytes, and naturally, the
examine commands that use bigger units display more data in total
...
This same byte-reversal effect can be seen
when a full four-byte word is shown as 0x00fc45c7, but when the first four bytes
are shown byte by byte, they are in the order of 0xc7, 0x45, 0xfc, and 0x00
...
For example,
if four bytes are to be interpreted as a single value, the bytes must be used
in reverse order
...
Revisiting these
values displayed both as hexadecimal and unsigned decimals might help
clear up any confusion
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $ bc -ql
199*(256^3) + 69*(256^2) + 252*(256^1) + 0*(256^0)
3343252480
0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
16532935
quit
reader@hacking:~/booksrc $
The first four bytes are shown both in hexadecimal and standard unsigned
decimal notation
...
The byte order of a given architecture is an
important detail to be aware of
...
In addition to converting byte order, GDB can do other conversions with
the examine command
...
The examine
command also accepts the format letter i, short for instruction, to display the
memory as disassembled assembly language instructions
...
/a
...
so
...
(gdb) break main
Breakpoint 1 at 0x8048384: file firstprog
...
(gdb) run
Starting program: /home/reader/booksrc/a
...
c:6
6         for(i=0; i < 10; i++)
(gdb) i r $eip
eip            0x8048384        0x8048384
(gdb) x/i $eip
0x8048384    mov    DWORD PTR [ebp-4],0x0
(gdb) x/3i $eip
0x8048384    mov    DWORD PTR [ebp-4],0x0
0x804838b    cmp    DWORD PTR [ebp-4],0x9
0x804838f    jle    0x8048393
(gdb) x/7xb $eip
0x8048384    0xc7    0x45    0xfc    0x00    0x00    0x00    0x00
(gdb) x/i $eip
0x8048384    mov    DWORD PTR [ebp-4],0x0
(gdb)
In the output above, the a
...
Since the EIP register is pointing to memory that actually contains machine language instructions, they disassemble quite nicely
...
8048384:   c7 45 fc 00 00 00 00    mov    DWORD PTR [ebp-4],0x0
This assembly instruction will move the value of 0 into memory located
at the address stored in the EBP register, minus 4
...
Basically, this command will zero out the
variable i for the for loop
...
The memory at this location can be
examined several different ways
...
The examine command can examine this memory address
directly or by doing the math on the fly
...
This variable named $1 can be used later to quickly re-access
a particular location in memory
...
Let’s execute the current instruction using the command nexti, which is
short for next instruction
...
(gdb) nexti
0x0804838b      6         for(i=0; i < 10; i++)
(gdb) x/4xb $1
0xbffff804:     0x00    0x00    0x00    0x00
(gdb) x/dw $1
0xbffff804:     0
(gdb) i r eip
eip            0x804838b        0x804838b
(gdb) x/i $eip
0x804838b    cmp    DWORD PTR [ebp-4],0x9
(gdb)
As predicted, the previous command zeroes out the 4 bytes found at EBP
minus 4, which is memory set aside for the C variable i
...
The next few instructions actually make more sense to
talk about in a group
...
The next instruction,
jle, stands for jump if less than or equal to
...
In this case the
instruction says to jump to the address 0x8048393 if the value stored in memory
for the C variable i is less than or equal to the value 9
...
This will cause the EIP to jump to the address 0x80483a6
...
The first address of 0x8048393 (shown in
bold) is simply the instruction found after the fixed jump instruction, and
the second address of 0x80483a6 (shown in italics) is located at the end of the
function
...
(gdb) nexti
0x0804838f      6         for(i=0; i < 10; i++)
(gdb) x/i $eip
0x804838f    jle    0x8048393
(gdb) nexti
8           printf("Hello, world!\n");
(gdb) i r eip
eip            0x8048393        0x8048393
(gdb) x/2i $eip
0x8048393    mov    DWORD PTR [esp],0x8048484
0x804839a    call   0x80482a0
(gdb)
As expected, the previous two instructions let the program execution
flow down to 0x8048393, which brings us to the next two instructions
...
But what is ESP
pointing to?
(gdb) i r esp
esp            0xbffff800       0xbffff800
(gdb)
Currently, ESP points to the memory address 0xbffff800, so when the mov
instruction is executed, the address 0x8048484 is written there
...
(gdb) x/2xw 0x8048484
0x8048484:      0x6c6c6548      0x6f57206f
(gdb) x/6xb 0x8048484
0x8048484:      0x48    0x65    0x6c    0x6c    0x6f    0x20
(gdb) x/6ub 0x8048484
0x8048484:      72      101     108     108     111     32
(gdb)
A trained eye might notice something about the memory here, in particular the range of the bytes
...
These bytes fall within
the printable ASCII range
...
The bytes 0x48, 0x65, 0x6c, and 0x6f all correspond to letters in the alphabet on
the ASCII table shown below
...
ASCII Table
Oct   Dec   Hex   Char          Oct   Dec   Hex   Char
------------------------------------------------------------
000   0     00    NUL '\0'      100   64    40    @
001   1     01    SOH           101   65    41    A
002   2     02    STX           102   66    42    B
003   3     03    ETX           103   67    43    C
004   4     04    EOT           104   68    44    D
005   5     05    ENQ           105   69    45    E
006   6     06    ACK           106   70    46    F
007   7     07    BEL '\a'      107   71    47    G
010   8     08    BS '\b'       110   72    48    H
011   9     09    HT '\t'       111   73    49    I
012   10    0A    LF '\n'       112   74    4A    J
013   11    0B    VT '\v'       113   75    4B    K
014   12    0C    FF '\f'       114   76    4C    L
015   13    0D    CR '\r'       115   77    4D    M
016   14    0E    SO            116   78    4E    N
017   15    0F    SI            117   79    4F    O
020   16    10    DLE           120   80    50    P
021   17    11    DC1           121   81    51    Q
022   18    12    DC2
023   19    13    DC3
024   20    14    DC4
025   21    15    NAK
026   22    16    SYN
027   23    17    ETB
030   24    18    CAN
031   25    19    EM
032   26    1A    SUB
033   27    1B    ESC
034   28    1C    FS
035   29    1D    GS
036   30    1E    RS
037   31    1F    US
040   32    20    SPACE
041   33    21    !
042   34    22    "
043   35    23    #
044   36    24    $
045   37    25    %
046   38    26    &
047   39    27    '
050   40    28    (
051   41    29    )
052   42    2A    *
053   43    2B    +
054   44    2C    ,
055   45    2D
056   46    2E
057   47    2F
060   48    30
061   49    31
062   50    32
063   51    33
064   52    34
065   53    35
066   54    36
067   55    37
070   56    38
071   57    39
072   58    3A
073   59    3B
074   60    3C
075   61    3D
076   62    3E
077   63    3F
...
The c format letter can be used to automatically
look up a byte on the ASCII table, and the s format letter will display an
entire string of character data
...
This string is the argument for the printf() function, which indicates that moving the address of this string
(0x8048484) to the address stored in ESP has something to do with this function
...
(gdb) x/2i $eip
0x8048393    mov    DWORD PTR [esp],0x8048484
0x804839a    call   0x80482a0
(gdb) x/xw $esp
0xbffff800:     0xb8000ce0
(gdb) nexti
0x0804839a      8           printf("Hello, world!\n");
(gdb) x/xw $esp
0xbffff800:     0x08048484
(gdb)
The next instruction actually calls the printf() function, which prints the
data string
...
(gdb) x/i $eip
0x804839a    call   0x80482a0
(gdb) nexti
Hello, world!
6         for(i=0; i < 10; i++)
(gdb)
Continuing to use GDB to debug, let’s examine the next two instructions
...
(gdb) x/2i $eip
0x804839f    lea    eax,[ebp-4]
0x80483a2    inc    DWORD PTR [eax]
(gdb)
These two instructions basically just increment the variable i by 1
...
The execution of this
instruction is shown below
...
The execution of this instruction is also
shown below
...
This behavior corresponds to a portion of C
code in which the variable i is incremented in the for loop
...
(gdb) x/i $eip
0x80483a4    jmp    0x804838b
(gdb)
When this instruction is executed, it will send the program back to the
instruction at address 0x804838b
...
Looking at the full disassembly again, you should be able to tell which
parts of the C code have been compiled into which machine instructions
...
(gdb) list
1
#include
The program execution will jump back to the compare instruction, continue to execute the
printf() call, and increment the counter variable until it finally equals 10
...
0x260
Back to Basics
Now that the idea of programming is less abstract, there are a few other
important concepts to know about C
...
In the same way
that knowing a little about Latin can greatly improve one’s understanding of
the English language, knowledge of low-level programming concepts can
assist the comprehension of higher-level ones
...
0x261
Strings
The value "Hello, world!\n" passed to the printf() function in the previous
program is a string—technically, a character array
...
A 20-character array is simply 20
adjacent characters located in memory
...
The char_array
...
char_array
...
h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}
The GCC compiler can also be given the -o switch to define the output
file to compile to
...
reader@hacking:~/booksrc $ gcc -o char_array char_array
...
/char_array
Hello, world!
reader@hacking:~/booksrc $
In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one
...
Also notice that the last character is a 0
...
) The character array was defined, so 20 bytes
are allocated for it, but only 15 of these bytes are actually used (14 characters plus the terminating null byte)
...
The remaining extra bytes are
just garbage and will be ignored
...
Since setting each character in a character array is painstaking and
strings are used fairly often, a set of standard functions was created for string
manipulation
...
The order of the function’s arguments is similar to Intel assembly syntax:
destination first and then source
...
c program can be rewritten
using strcpy() to accomplish the same thing using the string library
...
h since
it uses a string function
...
c
#include
h>
int main() {
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Let’s take a look at this program with GDB
...
The debugger will pause the program at
each breakpoint, giving us a chance to examine registers and memory
...
reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2
...
/char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
#include
c, line 6
...
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending
...
c, line 8
...
At each
breakpoint, we’re going to look at EIP and the instructions it points to
...
(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2
...
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) i r eip
eip            0xb7f076f4       0xb7f076f4
(gdb) x/5i $eip
0xb7f076f4    esi,DWORD PTR [ebp+8]
0xb7f076f7    eax,DWORD PTR [ebp+12]
0xb7f076fa    ecx,esi
0xb7f076fc    ecx,eax
0xb7f076fe    edx,eax
(gdb) continue
Continuing
...
c:8
8         printf(str_a);
(gdb) i r eip
eip            0x80483d7        0x80483d7
(gdb) x/5i $eip
0x80483d7    lea    eax,[ebp-40]
0x80483da    mov    DWORD PTR [esp],eax
0x80483dd    call   0x80482d4
0x80483e2    leave
0x80483e3    ret
(gdb)
The address in EIP at the middle breakpoint is different because the
code for the strcpy() function comes from a loaded library
...
I’d like to
point out that EIP is able to travel from the main code to the strcpy() code
and back again
...
The stack lets EIP return through long
chains of function calls
...
In the output below, the stack backtrace is shown at each breakpoint
...
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4:
Function "strcpy" not defined
...
c:7
7
strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2
...
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
#1 0x080483d7 in main () at char_array2
...
Breakpoint 3, main () at char_array2
...
c:8
(gdb)
At the middle breakpoint, the backtrace of the stack shows its record of
the strcpy() call
...
This is due to an exploit protection
method that is turned on by default in the Linux kernel since 2
...
11
...
0x262
Signed, Unsigned, Long, and Short
By default, numerical values in C are signed, which means they can be both
negative and positive
...
Since it’s all just memory in the end, all numerical values must be stored
in binary, and unsigned values make the most sense in binary
...
A 32-bit signed integer is still just 32 bits, which means it can
only be in one of 2^32 possible bit combinations
...
Essentially, one of
the bits is a flag marking the value positive or negative
...
Two’s complement represents negative numbers in a form suited for binary adders—when a negative value in
two’s complement is added to a positive number of the same magnitude, the
result will be 0
...
It sounds strange, but it works and
allows negative numbers to be added in combination with positive numbers
using simple binary adders
...
For simplicity’s sake, 8-bit numbers are used in this example
...
Then all the
bits are flipped, and 1 is added to result in the two’s complement representation for negative 73, 10110111
...
The program pcalc shows the value 256
because it’s not aware that we’re only dealing with 8-bit values
...
This example might shed some
light on how two’s complement works its magic
...
An unsigned integer would be declared
with unsigned int
...
The actual sizes will vary
depending on the architecture the code is compiled for
...
This works like a function that takes a data type as its input and returns
the size of a variable declared with that data type for the target architecture
...
c program explores the sizes of various data types, using
the sizeof() function
...
c
#include
It uses something called a format specifier to display the value returned from
the sizeof() function calls
...
reader@hacking:~/booksrc $ gcc datatype_sizes
...
/a
...
A float is also four bytes, while a char only needs
a single byte
...
0x263
Pointers
The EIP register is a pointer that “points” to the current instruction during a
program’s execution by containing its memory address
...
Since the physical memory cannot actually be moved, the
information in it must be copied
...
This is also expensive from a memory standpoint, since space for
the new destination copy must be saved or allocated before the source can be
copied
...
Instead of copying a large
block of memory, it is much simpler to pass around the address of the beginning of that block of memory
...
Since memory on the x86 architecture uses 32-bit addressing, pointers are
also 32 bits in size (4 bytes)
...
Instead of defining a variable of that type, a pointer is
defined as something that points to data of that type
...
c program
is an example of a pointer being used with the char data type, which is only
1 byte in size
...
c
#include
h>
int main() {
   char str_a[20];   // A 20-element character array
   char *pointer;    // A pointer, meant for a character array
   char *pointer2;   // And yet another one

   strcpy(str_a, "Hello, world!\n");
   pointer = str_a;  // Set the first pointer to the start of the array
...
// Print it
...
// Print again
...
When the character array is referenced like this,
it is actually a pointer itself
...
The second pointer is set to the
first pointer’s address plus two, and then some things are printed (shown in
the output below)
...
c
reader@hacking:~/booksrc $
...
The program is recompiled, and a
breakpoint is set on the tenth line of the source code
...
reader@hacking:~/booksrc $ gcc -g -o pointer pointer
...
/pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
#include
(gdb)
11         printf(pointer);
12
13         pointer2 = pointer + 2; // Set the second one 2 bytes further in
...
15         strcpy(pointer2, "y you guys!\n"); // Copy into that spot
...
17      }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer
...
(gdb) run
Starting program: /home/reader/booksrc/pointer
Breakpoint 1, main () at pointer
...
Remember that
the string itself isn’t stored in the pointer variable—only the memory address
0xbffff7e0 is stored there
...
The address-of operator is a unary operator,
which simply means it operates on a single argument
...
When it’s used, the address
of that variable is returned, instead of the variable itself
...
(gdb) x/xw &pointer
0xbffff7dc:     0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)
When the address-of operator is used, the pointer variable is shown to
be located at the address 0xbffff7dc in memory, and it contains the address
0xbffff7e0
...
The addressof
...
This line is shown in bold below
...
c
#include
reader@hacking:~/booksrc $ gcc -g addressof
...
/a
...
so
...
(gdb) list
1
#include
8
}
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof
...
(gdb) run
Starting program: /home/reader/booksrc/a
...
c:8
8
}
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)
As usual, a breakpoint is set and the program is executed in the
debugger
...
The first
print command shows the value of int_var, and the second shows its address
using the address-of operator
...
An additional unary operator called the dereference operator exists for use
with pointers
...
It takes the form of an
asterisk in front of the variable name, similar to the declaration of a pointer
...
Used in
GDB, it can retrieve the integer value int_ptr points to
...
c code (shown in addressof2
...
The added printf() functions use format
parameters, which I’ll explain in the next section
...
addressof2
...
h>
int main() {
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005
int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5
reader@hacking:~/booksrc $
When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator
moves forward in the direction the pointer is pointing
...
This
function can also use format strings to print variables in many different formats
...
The way the printf() function has been used in the
previous programs, the "Hello, world!\n" string technically is the format string;
however, it is devoid of special escape sequences
...
Each format parameter
begins with a percent sign (%) and uses a single-character shorthand very
similar to formatting characters used by GDB’s examine command
...
There are also some format parameters that expect
pointers, such as the following
...
The %n
format parameter is unique in that it actually writes data
...
For now, our focus will just be the format parameters used for displaying
data
...
c program shows some examples of different format
parameters
...
c
#include
The final printf()
call uses the argument &A, which will provide the address of the variable A
...
reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings
...
/fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $
The first two calls to printf() demonstrate the printing of variables A and B,
using different format parameters
...
The
%d format parameter allows for negative values, while %u does not, since it is
expecting unsigned values
...
This is because A is a negative number stored in two’s
complement, and the format parameter is trying to print it as if it were an
unsigned value
...
The third line in the example, labeled [field width on B], shows the use
of the field-width option in a format parameter
...
However,
this is not a maximum field width—if the value to be outputted is greater
than the field width, the field width will be exceeded
...
When 10 is used as the field width,
5 bytes of blank space are outputted before the output data
...
When 08 is used, for example, the output is 00031337
...
Remember that the variable string is actually a pointer containing
the address of the string, which works out wonderfully, since the %s format
parameter expects its data to be passed by reference
...
This value is displayed as eight
hexadecimal digits, padded by zeros
...
Minimum field widths can be set by putting a
number right after the percent sign, and if the field width begins with 0, it
will be padded with zeros
...
So far, so good
...
One key difference is that the scanf() function expects all
of its arguments to be pointers, so the arguments must actually be variable
addresses—not the variables themselves
...
The input
...
input
...
h>
#include
c, the scanf() function is used to set the count variable
...
reader@hacking:~/booksrc $ gcc -o input input
...
/input
Repeat how many times? 3
0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $
...
In addition, the ability to output the values of variables allows for debugging in
the program, without the use of a debugger
...
0x265
Typecasting
Typecasting is simply a way to temporarily change a variable’s data type, despite
how it was originally defined
...
The syntax for typecasting is
as follows:
(typecast_data_type) variable
This can be used when dealing with integers and floating-point variables,
as typecasting
...
typecasting
...
h>
int main() {
   int a, b;
   float c, d;

   a = 13;
   b = 5;

   c = a / b;                  // Divide using integers
   d = (float) a / (float) b;
...
printf("[integers]\t a = %d\t b = %d\n", a, b);
printf("[floats]\t c = %f\t d = %f\n", c, d);
}
The results of compiling and executing typecasting
...
reader@hacking:~/booksrc $ gcc typecasting
...
/a
...
000000
d = 2
...
However, if these integer variables are typecast into floats, they will
be treated as such
...
6
...
Even though a pointer is just a memory address,
the C compiler still demands a data type for every pointer
...
An integer pointer should only
point to integer data, while a character pointer should only point to character data
...
An integer is four bytes
in size, while a character only takes up a single byte
...
c program will demonstrate and explain these concepts further
...
This is shorthand meant
for displaying pointers and is basically equivalent to 0x%08x
...
c
#include
printf("[integer pointer] points to %p, which contains the integer %d\n",
int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer
...
Two pointers are also defined,
one with the integer data type and one with the character data type, and they
are set to point at the start of the corresponding data arrays
...
In the loops, when the integer and character values
are actually printed with the %d and %c format parameters, notice that the
corresponding printf() arguments must dereference the pointer variables
...
reader@hacking:~/booksrc
reader@hacking:~/booksrc
[integer pointer] points
[integer pointer] points
[integer pointer] points
[integer pointer] points
[integer pointer] points
[char pointer] points to
[char pointer] points to
[char pointer] points to
[char pointer] points to
[char pointer] points to
reader@hacking:~/booksrc
$ gcc pointer_types
...
/a
...
Since a char is only 1 byte, the pointer to the next char
would naturally also be 1 byte over
...
In pointer_types2
...
The major changes to the code
are marked in bold
...
c
#include
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
The output below shows the warnings spewed forth from the compiler
...
c
pointer_types2
...
c:12: warning: assignment from incompatible pointer type
pointer_types2
...
But the compiler
and perhaps the programmer are the only ones that care about a pointer’s
type
...
reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $
Even though the int_pointer points to character data that only contains
5 bytes of data, it is still typed as an integer
...
Similarly, the char_pointer’s
address is only incremented by 1 each time, stepping through the 20 bytes of
integer data (five 4-byte integers), one byte at a time
...
The 4-byte value of 0x00000001 is actually stored
in memory as 0x01, 0x00, 0x00, 0x00
...
Since the pointer type determines the
size of the data it points to, it’s important that the type is correct
...
c below, typecasting is just a way to change the type of a
variable on the fly
...
c
#include <stdio.h>
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = (char *) ((int *) char_pointer + 1);
}
}
In this code, when the pointers are initially set, the data is typecast into
the pointer’s data type
...
To fix that, when 1 is added to the pointers, they must first be typecast into the correct data type so the address is incremented by the correct
amount
...
It doesn’t look too pretty, but it works
...
c
$
...
out
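[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $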
Naturally, it is far easier just to use the correct data type for pointers
in the first place; however, sometimes a generic, typeless pointer is desired
...
Experimenting with void pointers quickly reveals a few things about typeless
pointers
...
In order to retrieve the value stored in the pointer’s memory address, the
compiler must first know what type of data it is
...
These are fairly intuitive
limitations, which means that a void pointer’s main purpose is to simply hold
a memory address
...
c program can be modified to use a single void
pointer by typecasting it to the proper type each time it’s used
...
This also means a void pointer must always
be typecast when dereferencing it, however
...
c, which uses a void pointer
...
c
#include <stdio.h>
printf("[char pointer] points to %p, which contains the char '%c'\n",
void_pointer, *((char *) void_pointer));
void_pointer = (void *) ((char *) void_pointer + 1);
}
void_pointer = (void *) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
c are as
follows
...
c
$
...
out
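[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $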
The compilation and output of this pointer_types4
...
c
...
Since the type is taken care of by the typecasts, the void pointer is truly
nothing more than a memory address
...
In pointer_types5
...
pointer_types5
...
h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
unsigned int hacky_nonpointer;
hacky_nonpointer = (unsigned int) char_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
hacky_nonpointer, *((int *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(int);
}
}
This is rather hacky, but since this integer value is typecast into the
proper pointer types when it is assigned and dereferenced, the end result is
the same
...
reader@hacking:~/booksrc $ gcc pointer_types5
...
/a
...
In the end, after the
program has been compiled, the variables are nothing more than memory
addresses
...
0x266 Command-Line Arguments
Many nongraphical programs receive input in the form of command-line
arguments
...
This tends
to be more efficient and is a useful input method
...
The integer will contain the number of arguments, and
the array of strings will contain each of those arguments
...
c
program and its execution should explain things
...
c
#include
c
reader@hacking:~/booksrc $
...
/commandline
reader@hacking:~/booksrc $
...
/commandline
argument #1    this
argument #2    is
argument #3    a
argument #4    test
reader@hacking:~/booksrc $
The zeroth argument is always the name of the executing binary, and
the rest of the argument array (often called an argument vector) contains the
remaining arguments as strings
...
Regardless of this, the argument is passed in
as a string; however, there are standard conversion functions
...
The most common of these functions is atoi(),
which is short for ASCII to integer
...
Observe its usage
in convert
...
convert
...
h>
void usage(char *program_name) {
printf("Usage: %s
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
if(argc < 3)
// If fewer than 3 arguments are used,
usage(argv[0]); // display usage message and exit
...
printf("Repeating %d times
...
}
The results of compiling and executing convert
...
reader@hacking:~/booksrc $ gcc convert
...
/a
...
/a
...
/a
...
0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $
In the preceding code, an if statement makes sure that three arguments
are used before these strings are accessed
...
In C it’s important to check for these types of conditions and handle them in program logic
...
The convert2
...
convert2
...
h>
void usage(char *program_name) {
printf("Usage: %s
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
//
//
if(argc < 3)
// If fewer than 3 arguments are used,
usage(argv[0]); // display usage message and exit
...
printf("Repeating %d times
...
}
The results of compiling and executing convert2
...
reader@hacking:~/booksrc
reader@hacking:~/booksrc
Segmentation fault (core
reader@hacking:~/booksrc
$ gcc convert2
...
/a
...
This results in the program crashing due to a segmentation fault
...
When the program attempts to access an address
that is out of bounds, it will crash and die in what’s called a segmentation fault
...
reader@hacking:~/booksrc $ gcc -g convert2
...
/a
...
so
...
(gdb) run test
Starting program: /home/reader/booksrc/a
...
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2
...
(gdb) run test
The program being debugged has been started already
...
out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2
...
Program received signal SIGSEGV, Segmentation fault
...
so
...
out"
(gdb) x/s 0xbffff9ce
0xbffff9ce:      "test"
(gdb) x/s 0x00000000
0x0:
(gdb) quit
The program is running
...
The where command will
sometimes show a useful backtrace of the stack; however, in this case, the
stack was too badly mangled in the crash
...
Since the argument vector is a pointer to a list of strings, it is actually a
pointer to a list of pointers
...
The first one is the zeroth argument,
the second is the test argument, and the third is zero, which is out of bounds
...
0x267 Variable Scoping
Another interesting concept regarding memory in C is variable scoping or
context—in particular, the contexts of variables within functions
...
In fact, multiple calls to the same function all have their own contexts
...
c
...
c
#include
reader@hacking:~/booksrc $ gcc scope
...
/a
...
Notice that within the main() function, the variable i is 3, even after calling
func1() where the variable i is 5
...
The best
way to think of this is that each function call has its own version of the
variable i
...
Variables are global if they are defined at the beginning
of the code, outside of any functions
...
c example code shown
below, the variable j is declared globally and set to 42
...
scope2
...
h>
int j = 42; // j is a global variable
...
printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}
void func2() {
int i = 7;
printf("\t\t[in func2] i = %d, j = %d\n", i, j);
printf("\t\t[in func2] setting j = 1337\n");
j = 1337; // Writing to j
func3();
printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}
void func1() {
int i = 5;
printf("\t[in func1] i = %d, j = %d\n", i, j);
func2();
printf("\t[back in func1] i = %d, j = %d\n", i, j);
}
int main() {
int i = 3;
printf("[in main] i = %d, j = %d\n", i, j);
func1();
printf("[back in main] i = %d, j = %d\n", i, j);
}
The results of compiling and executing scope2
...
reader@hacking:~/booksrc $ gcc scope2
...
/a
...
In this case, the compiler prefers to use the local variable
...
The global variable j is just
stored in memory, and every function is able to access that memory
...
Printing the memory addresses of these
variables will give a clearer picture of what's going on
...
c example
code below, the variable addresses are printed using the unary address-of
operator
...
c
#include
void func3() {
int i = 11, j = 999; // Here, j is a local variable of func3()
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
[in func1] i @ 0xbffff814 = 5
[in func1] j @ 0x08049988 = 42
[in func2] i @ 0xbffff7f4 = 7
[in func2] j @ 0x08049988 = 42
[in func2] setting j = 1337
[in func3] i @ 0xbffff7d4 = 11
[in func3] j @ 0xbffff7d0 = 999
[back in func2] i @ 0xbffff7f4 = 7
[back in func2] j @ 0x08049988 = 1337
[back in func1] i @ 0xbffff814 = 5
[back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $
In this output, it is obvious that the variable j used by func3() is different
from the j used by the other functions
...
Also, notice that the variable i is actually a different memory address for each
function
...
Then the backtrace command shows the record of each function call
on the stack
...
c
reader@hacking:~/booksrc $ gdb -q
...
out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h>
2
3       int j = 42; // j is a global variable
...
7               printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
8               printf("\t\t\t[in func3] j @ 0x%08x = %d\n", &j, j);
9       }
10
(gdb) break 7
Breakpoint 1 at 0x8048388: file scope3
...
(gdb) run
Starting program: /home/reader/booksrc/a
...
c:7
7               printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i);
(gdb) bt
#0 func3 () at scope3
...
c:17
#2 0x0804849f in func1 () at scope3
...
c:35
(gdb)
The backtrace also shows the nested function calls by looking at records
kept on the stack
...
Each line in the backtrace corresponds to a stack frame
...
The local
variables contained in each stack frame can be shown in GDB by adding the
word full to the backtrace command
...
c:7
i = 11
j = 999
#1 0x0804841d in func2 () at scope3
...
c:26
i = 5
#3 0x0804852b in main () at scope3
...
The global version of the variable j is used in the other
function’s contexts
...
Similar to global
variables, a static variable remains intact between function calls; however, static
variables are also akin to local variables since they remain local within a particular function context
...
The code in static
...
static
...
h>
void function() { // An example function, with its own context
int var = 5;
static int static_var = 5; // Static variable initialization
printf("\t[in function] var = %d\n", var);
printf("\t[in function] static_var = %d\n", static_var);
var++;
// Add one to var
...
}
int main() { // The main function, with its own context
int i;
static int static_var = 1337; // Another static, in a different context
for(i=0; i < 5; i++) { // Loop 5 times
...
}
}
The aptly named static_var is defined as a static variable in two places:
within the context of main() and within the context of function()
...
The function simply prints the values of the two variables in its context and then adds 1 to both of them
...
reader@hacking:~/booksrc $ gcc static
...
/a
...
This is because static variables retain their values, but also because
they are only initialized once
...
Once again, printing the addresses of these variables with the unary
address-of operator will provide greater visibility into what’s
really going on
...
c for an example
...
c
#include
static_var++; // Add 1 to static_var
...
}
}
The results of compiling and executing static2
...
reader@hacking:~/booksrc $ gcc static2
...
/a
...
You may have noticed that the local variables all have very
high addresses, like 0xbffff814, while the global and static variables all have
very low memory addresses, like 0x0804968c and 0x8049688
...
Read on for your answers
...
Each segment represents a special portion of memory that is
set aside for a certain purpose
...
This is where
the assembled machine language instructions of the program are located
...
As a program
executes, the EIP is set to the first instruction in the text segment
...
Reads the instruction that EIP is pointing to
2
...
Executes the instruction that was read in step 1
4
...
The processor doesn’t
care about the change, because it’s expecting the execution to be nonlinear
anyway
...
Write permission is disabled in the text segment, as it is not used to store
variables, only code
...
Another advantage of this segment being read-only is that it
can be shared among different copies of the program, allowing multiple
executions of the program at the same time without any problems
...
The data and bss segments are used to store global and static program
variables
...
Although
these segments are writable, they also have a fixed size
...
Both global and static variables are able to persist
because they are stored in their own memory segments
...
Blocks of memory in this segment can be allocated and used for
whatever the programmer might need
...
All of the memory within the heap is managed by allocator and deallocator
algorithms, which respectively reserve a region of memory in the heap for
use and remove reservations to allow that portion of memory to be reused
for later reservations
...
This means a programmer using the heap
allocation functions can reserve and free memory on the fly
...
The stack segment also has variable size and is used as a temporary scratch
pad to store local function variables and context during function calls
...
When a program calls a function,
that function will have its own set of passed variables, and the function’s code
will be at a different memory location in the text (or code) segment
...
All of this information is stored together on the stack in what is
collectively called a stack frame
...
In general computer science terms, a stack is an abstract data structure
that is used frequently
...
Think of it
as putting beads on a piece of string that has a knot on one end—you can’t
get the first bead off until you have removed all the other beads
...
As the name implies, the stack segment of memory is, in fact, a stack data
structure, which contains stack frames
...
Since this is very dynamic behavior, it
makes sense that the stack is also not of a fixed size
...
The FILO nature of a stack might seem odd, but since the stack is used
to store context, it’s very useful
...
The EBP register—sometimes
called the frame pointer (FP) or local base (LB) pointer—is used to reference local
function variables in the current stack frame
...
The SFP is used to restore EBP to its previous value, and the
return address is used to restore EIP to the next instruction found after the
function call
...
The following stack_example
...
stack_example
...
The local variables for the function
include a single character called flag and a 10-character buffer called buffer
...
After
compiling the program, its inner workings can be examined with GDB
...
The main() function starts at 0x08048357 and test_function()
starts at 0x08048344
...
These instructions are collectively called
the procedure prologue or function prologue
...
Sometimes
the function prologue will handle some stack alignment as well
...
reader@hacking:~/booksrc $ gcc -g stack_example
...
/a
...
so
...
(gdb) disass main
Dump of assembler code for function main():
0x08048357 push   ebp
0x08048358 mov    ebp,esp
0x0804835a sub    esp,0x18
0x0804835d and    esp,0xfffffff0
0x08048360 mov    eax,0x0
0x08048365 sub    esp,eax
0x08048367 mov    DWORD PTR [esp+12],0x4
0x0804836f mov    DWORD PTR [esp+8],0x3
0x08048377 mov    DWORD PTR [esp+4],0x2
0x0804837f mov    DWORD PTR [esp],0x1
0x08048386 call   0x8048344
0x0804838b leave
0x0804838c ret
End of assembler dump
(gdb) disass test_function()
Dump of assembler code for function test_function:
0x08048344 push   ebp
0x08048345 mov    ebp,esp
0x08048347 sub    esp,0x28
0x0804834a mov    DWORD PTR [ebp-12],0x7a69
0x08048351 mov    BYTE PTR [ebp-40],0x41
0x08048355 leave
0x08048356 ret
End of assembler dump
(gdb)
When the program is run, the main() function is called, which simply calls
test_function()
...
When test_function() is called, the function arguments are pushed onto the
stack in reverse order (since it’s FILO)
...
These values correspond to the variables d, c, b, and a in the
function
...
(gdb) disass main
Dump of assembler code for function main:
0x08048357 push   ebp
0x08048358 mov    ebp,esp
0x0804835a sub    esp,0x18
0x0804835d and    esp,0xfffffff0
0x08048360 mov    eax,0x0
0x08048365 sub    esp,eax
0x08048367 mov    DWORD PTR [esp+12],0x4
0x0804836f mov    DWORD PTR [esp+8],0x3
0x08048377 mov    DWORD PTR [esp+4],0x2
0x0804837f mov    DWORD PTR [esp],0x1
0x08048386 call   0x8048344
0x0804838b leave
0x0804838c ret
End of assembler dump
(gdb)
Next, when the assembly call instruction is executed, the return
address is pushed onto the stack and the execution flow jumps to the start of
test_function() at 0x08048344
...
In this case, the
return address would point to the leave instruction in main() at 0x0804838b
...
In this step, the current
value of EBP is pushed to the stack
...
The current value of ESP is then copied into EBP to set the new frame pointer
...
Memory is saved for these variables by subtracting from
ESP
...
In the
following output, a breakpoint is set in main() before the call to test_function()
and also at the beginning of test_function()
...
When the program is
run, execution stops at the breakpoint, where the registers ESP (stack pointer),
EBP (frame pointer), and EIP (instruction pointer) are examined
...
c, line 10
...
c, line 5
...
out
Breakpoint 1, main () at stack_example
...
This means the bottom of this new stack frame is at the current
value of ESP, 0xbffff7f0
...
The
output below shows similar information at the second breakpoint
...
(gdb) cont
Continuing
...
c:5
5               flag = 31337;
(gdb) i r esp ebp eip
esp            0xbffff7c0       0xbffff7c0
ebp            0xbffff7e8       0xbffff7e8
eip            0x804834a        0x804834a
(gdb) disass test_function
Dump of assembler code for function test_function:
0x08048344 push   ebp
0x08048345 mov    ebp,esp
0x08048347 sub    esp,0x28
0x0804834a mov    DWORD PTR [ebp-12],0x7a69
0x08048351 mov    BYTE PTR [ebp-40],0x41
0x08048355 leave
0x08048356 ret
End of assembler dump
...
The four arguments to
the function can be seen at the bottom of the stack frame, with the return
address found directly on top
...
The rest of
the memory is saved for the local stack variables: flag and buffer
...
Memory for the flag variable is shown at and memory for the
buffer variable is shown at
...
After the execution finishes, the entire stack frame is popped off of the
stack, and the EIP is set to the return address so the program can continue
execution
...
As each function ends, its
stack frame is popped off of the stack so execution can be returned to the
previous function
...
The various segments of memory are arranged in the order they
were presented, from the lower memory addresses to the higher memory
addresses
...
Some texts have this reversed, which can be very confusing; so for this
book, smaller memory addresses are always shown at the top
...
Since the heap and the stack are both dynamic, they both grow
...
This minimizes wasted space, allowing the stack to be larger if the
...
[Figure: memory segment layout, with low addresses at the top and high addresses at the bottom. The text (code) segment is at the top and the stack segment is at the bottom. The heap grows down toward higher memory addresses; the stack grows up toward lower memory addresses.]
0x271 Memory Segments in C
In C, as in other compiled languages, the compiled code goes into the text
segment, while the variables reside in the remaining segments
...
Variables that are defined outside of any functions are considered
to be global
...
If static or global variables are initialized with data, they are stored in the data memory segment; otherwise, these
variables are put in the bss memory segment
...
Usually, pointers are used to reference memory on the heap
...
Since the stack can contain many different stack frames, stack
variables can maintain uniqueness within different functional contexts
...
c program will help explain these concepts in C
...
c
#include
int stack_var; // Notice this variable has the same name as the one in main()
...
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These variables are in the bss segment
...
printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These variables are in the stack segment
...
The global and static variables are declared as described
earlier, and initialized counterparts are also declared
...
The heap variable is actually declared as an integer pointer, which
will point to memory allocated on the heap memory segment
...
Since the newly allocated
memory could be of any data type, the malloc() function returns a void
pointer, which needs to be typecast into an integer pointer
...
c
reader@hacking:~/booksrc $
...
out
global_initialized_var is at address 0x080497ec
static_initialized_var is at address 0x080497f0
static_var is at address 0x080497f8
global_var is at address 0x080497fc
heap_var is at address 0x0804a008
stack_var is at address 0xbffff834
the function's stack_var is at address 0xbffff814
reader@hacking:~/booksrc $
The first two initialized variables have the lowest memory addresses,
since they are located in the data memory segment
...
These memory addresses are slightly larger than the previous
variables’ addresses, since the bss segment is located below the data segment
(that is, at higher addresses, because lower addresses are drawn at the top)
...
The heap variable is stored in space allocated on the heap segment,
which is located just below the bss segment
...
Finally,
the last two stack_vars have very large memory addresses, since they are located
in the stack segment
...
This allows both memory segments to be dynamic without wasting space in
memory
...
The second stack_var in function() has its
own unique context, so that variable is stored within a different stack frame
in the stack segment
...
Since the stack grows back up toward the heap segment
with each new stack frame, the memory address for the second stack_var
(0xbffff814 ) is smaller than the address for the first stack_var (0xbffff834 )
found within main()’s context
...
However, using the heap requires a bit more effort
...
This function accepts a size as its only argument and reserves that
much space in the heap segment, returning the address to the start of this
memory as a void pointer
...
The corresponding deallocation function is free()
...
These relatively simple functions are demonstrated
in heap_example
...
heap_example
...
h>
#include
h>
int main(int argc, char *argv[]) {
char *char_ptr; // A char pointer
int *int_ptr;
// An integer pointer
int mem_size;
if (argc < 2)
// If there aren't command-line arguments,
mem_size = 50; // use 50 as the default value
...
\n");
exit(-1);
}
strcpy(char_ptr, "This is memory is located on the heap
...
\n");
exit(-1);
}
*int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing
...
\n");
free(char_ptr); // Freeing heap memory
printf("\t[+] allocating another 15 bytes for char_ptr\n");
char_ptr = (char *) malloc(15); // Allocating more heap memory
if(char_ptr == NULL) { // Error checking, in case malloc() fails
fprintf(stderr, "Error: could not allocate heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
Then it uses the malloc() and
free() functions to allocate and deallocate memory on the heap
...
Since malloc() doesn’t know what type of memory it’s
allocating, it returns a void pointer to the newly allocated heap memory,
which must be typecast into the appropriate type
...
If the allocation fails and the pointer is NULL, fprintf() is used to
print an error message to standard error and the program exits
...
This function will be
explained more later, but for now, it’s just used as a way to properly display
an error
...
reader@hacking:~/booksrc $ gcc -o heap_example heap_example
...
/heap_example
[+] allocating 50 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...
[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a050) --> 'new memory'
[-] freeing int_ptr's heap memory
...
reader@hacking:~/booksrc $
In the preceding output, notice that each block of memory has an incrementally higher memory address in the heap
...
The heap allocation functions control this
behavior, which can be explored by changing the size of the initial memory
allocation
...
/heap_example 100
[+] allocating 100 bytes of memory on the heap for char_ptr
char_ptr (0x804a008) --> 'This is memory is located on the heap
...
[+] allocating another 15 bytes for char_ptr
char_ptr (0x804a008) --> 'new memory'
[-] freeing int_ptr's heap memory
...
reader@hacking:~/booksrc $
If a larger block of memory is allocated and then deallocated, the final
15-byte allocation will occur in that freed memory space, instead
...
Often, simple
informative printf() statements and a little experimentation can reveal many
things about the underlying system
...
c, there were several error checks for the malloc() calls
...
But with multiple malloc() calls, this error-checking code needs to appear in multiple places
...
Since all the error-checking code is basically the same for every malloc() call, this is a perfect
place to use a function instead of repeating the same instructions in multiple
places
...
c for an example
...
c
#include
h>
#include
else
mem_size = atoi(argv[1]);
printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size);
char_ptr = (char *) errorchecked_malloc(mem_size); // Allocating heap memory
strcpy(char_ptr, "This is memory is located on the heap
...
printf("int_ptr (%p) --> %d\n", int_ptr, *int_ptr);
printf("\t[-] freeing char_ptr's heap memory
...
\n");
free(int_ptr); // Freeing heap memory
printf("\t[-] freeing char_ptr's heap memory
...
\n");
exit(-1);
}
return ptr;
}
The errorchecked_heap
...
c code, except the heap memory allocation and
error checking has been gathered into a single function
...
This lets
the compiler know that there will be a function called errorchecked_malloc() that
expects a single, unsigned integer argument and returns a void pointer
...
The function itself is quite simple; it just accepts the size in bytes to
allocate and attempts to allocate that much memory using malloc()
...
This way, the custom errorchecked_malloc() function can be used in place of
a normal malloc(), eliminating the need for repetitious error checking afterward
...
0x280 Building on Basics
Once you understand the basic concepts of C programming, the rest is pretty
easy
...
In fact,
if the functions were removed from any of the preceding programs, all that
would remain are very basic statements
...
File descriptors use a set of low-level I/O functions, and filestreams are
a higher-level form of buffered I/O that is built on the lower-level functions
...
In this book, the focus will be on the low-level
I/O functions that use file descriptors
...
Because this
number is unique among the other books in a bookstore, the cashier can
scan the number at checkout and use it to reference information about this
book in the store’s database
...
Four common functions that use file descriptors
are open(), close(), read(), and write()
...
The open() function opens a file for reading and/or writing
and returns a file descriptor
...
The file descriptor is passed as an
argument to the other functions like a pointer to the opened file
...
The read() and
write() functions’ arguments are the file descriptor, a pointer to the data to
read or write, and the number of bytes to read or write from that location
...
These flags and
their usage will be explained in depth later, but for now let’s take a look at a
simple note-taking program that uses file descriptors—simplenote
...
This
program accepts a note as a command-line argument and then adds it to the
end of the file /tmp/notes
...
Other functions are used to display a usage message and to handle fatal errors
...
simplenote
...
h>
h>
h>
void usage(char *prog_name, char *filename) {
printf("Usage: %s <data to add to %s>\n", prog_name, filename);
exit(0);
}
void fatal(char *);
// A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper
int main(int argc, char *argv[]) {
int fd; // file descriptor
char *buffer, *datafile;
buffer = (char *) ec_malloc(100);
datafile = (char *) ec_malloc(20);
strcpy(datafile, "/tmp/notes");
if(argc < 2)
// If there aren't command-line arguments,
usage(argv[0], datafile); // display usage message and exit
...
printf("[DEBUG] buffer
@ %p: \'%s\'\n", buffer, buffer);
printf("[DEBUG] datafile @ %p: \'%s\'\n", datafile, datafile);
strncat(buffer, "\n", 1); // Add a newline on the end
...
\n");
free(buffer);
free(datafile);
}
// A function to display an error message and then exit
void fatal(char *message) {
char error_message[100];
strcpy(error_message, "[!!] Fatal Error ");
strncat(error_message, message, 83);
perror(error_message);
exit(-1);
}
// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
void *ptr;
ptr = malloc(size);
if(ptr == NULL)
fatal("in ec_malloc() on memory allocation");
return ptr;
}
Besides the strange-looking flags used in the open() function, most of this
code should be readable
...
The strlen() function accepts a string and returns its
length
...
The perror() function is short for print error and is
used in fatal() to print an additional error message (if it exists) before exiting
...
c
reader@hacking:~/booksrc $
...
/simplenote
reader@hacking:~/booksrc $
...
reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
reader@hacking:~/booksrc $
...
reader@hacking:~/booksrc $ cat /tmp/notes
this is a test note
great, it works
reader@hacking:~/booksrc $
The output of the program’s execution is pretty self-explanatory, but
there are some things about the source code that need further explanation
...
h and sys/stat
...
The first set of flags is found in fcntl
...
The access mode must use exactly one of
the following three flags:
O_RDONLY   Open file for read-only access
O_WRONLY   ...
O_RDWR     Open file for both read and write access
...
A few of the more common and useful of these flags are
as follows:
O_APPEND   Write data at the end of the file
O_TRUNC    ...
O_CREAT    Create the file if it doesn't exist
...
When two bits enter an OR gate, the result is 1 if either the first bit or the
second bit is 1
...
Full 32-bit values can use these bitwise operators to
perform logic operations on each corresponding bit
...
c and the program output demonstrate these bitwise operations
...
c
#include <stdio.h>
bit_b = (i & 1);   // Get the first bit
...
bit_b = (i & 1);   // Get the first bit
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
bitwise OR operator |
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1
bitwise AND operator &
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
reader@hacking:~/booksrc $
The flags used for the open() function have values that correspond to
single bits
...
The fcntl_flags
...
h and how they combine with each other
...
c
#include <stdio.h>
#include <fcntl.h>
void display_flags(char *, unsigned int);
void binary_print(unsigned int);
int main(int argc, char *argv[]) {
display_flags("O_RDONLY\t\t", O_RDONLY);
display_flags("O_WRONLY\t\t", O_WRONLY);
display_flags("O_RDWR\t\t\t", O_RDWR);
printf("\n");
display_flags("O_APPEND\t\t", O_APPEND);
display_flags("O_TRUNC\t\t\t", O_TRUNC);
display_flags("O_CREAT\t\t\t", O_CREAT);
printf("\n");
display_flags("O_WRONLY|O_APPEND|O_CREAT", O_WRONLY|O_APPEND|O_CREAT);
}
void display_flags(char *label, unsigned int value) {
printf("%s\t: %d\t:", label, value);
binary_print(value);
printf("\n");
}
void binary_print(unsigned int value) {
unsigned int mask = 0xff000000; // Start with a mask for the highest byte
...
unsigned int byte, byte_iterator, bit_iterator;
for(byte_iterator=0; byte_iterator < 4; byte_iterator++) {
byte = (value & mask) / shift; // Isolate each byte
...
if(byte & 0x80)   // If the highest bit in the byte isn't 0,
printf("1");      // print a 1
...
byte *= 2;        // Move all the bits to the left by 1
...
shift /= 256;     // Move the bits in shift right by 8
...
c are as follows
...
c
reader@hacking:~/booksrc $
...
out
O_RDONLY                  : 0    : 00000000 00000000 00000000 00000000
O_WRONLY                  : 1    : 00000000 00000000 00000000 00000001
O_RDWR                    : 2    : 00000000 00000000 00000000 00000010

O_APPEND                  : 1024 : 00000000 00000000 00000100 00000000
O_TRUNC                   : 512  : 00000000 00000000 00000010 00000000
O_CREAT                   : 64   : 00000000 00000000 00000000 01000000

O_WRONLY|O_APPEND|O_CREAT : 1089 : 00000000 00000000 00000100 01000001
$
Using bit flags in combination with bitwise logic is an efficient and commonly used technique
...
In fcntl_flags
...
This technique only works
when all the bits are unique, though
...
This argument uses bit flags defined in sys/stat
...
S_IRUSR   Give the file read permission for the user (owner)
...
S_IXUSR   Give the file execute permission for the user (owner)
...
S_IWGRP   Give the file write permission for the group
...
S_IROTH   Give the file read permission for other (anyone)
...
S_IXOTH   Give the file execute permission for other (anyone)
...
If they don’t make sense, here’s a crash course in
Unix file permissions
...
These values can be displayed using
ls -l and are shown below in the following output
...
c
For the /etc/passwd file, the owner is root and the group is also root
...
Read, write, and execute permissions can be turned on and off for three
different fields: user, group, and other
...
These fields are also displayed in the front of the
ls -l output
...
The next three
characters display the group permissions, and the last three characters are
for the other permissions
...
Each permission corresponds to a bit flag; read is 4 (100 in binary), write is 2 (010 in binary), and
execute is 1 (001 in binary)
...
These values can be added together to define permissions for
user, group, and other using the chmod command
...
c
ls -l simplenote
...
c
chmod ugo-wx simplenote
...
c
1826 2007-09-07 02:51 simplenote
...
c
ls -l simplenote
...
c
The first command (chmod 731) gives read, write, and execute permissions to
the user, since the first number is 7 (4 + 2 + 1), write and execute permissions
to group, since the second number is 3 (2 + 1), and only execute permission to other, since the third number is 1
...
In the next chmod command, the argument ugo-wx
means "subtract write and execute permissions from user, group, and other"
...
In the simplenote program, the open() function uses S_IRUSR|S_IWUSR for
its additional permission argument, which means the /tmp/notes file should
only have user read and write permission when it is created
...
This user ID can
be displayed using the id command
...
The su command can be used to switch to a different user, and if this command is run as root, it can be done without a password
...
On the LiveCD, sudo has been configured so it can be executed without a password, for simplicity’s sake
...
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ id
uid=501(jose) gid=501(jose) groups=501(jose)
jose@hacking:/home/reader/booksrc $
As the user jose, the simplenote program will run as jose if it is executed,
but it won’t have access to the /tmp/notes file
...
jose@hacking:/home/reader/booksrc $ ls -l /tmp/notes
-rw------- 1 reader reader 36 2007-09-07 05:20 /tmp/notes
jose@hacking:/home/reader/booksrc $
...
For example, the /etc/passwd file contains account
information for every user on the system, including each user’s default login
shell
...
This program needs to be able to make changes to the /etc/passwd file, but
only on the line that pertains to the current user’s account
...
This is an additional file permission bit that can be set using chmod
...
reader@hacking:~/booksrc $ which chsh
/usr/bin/chsh
reader@hacking:~/booksrc $ ls -l /usr/bin/chsh /etc/passwd
-rw-r--r-- 1 root root 1424 2007-09-06 21:05 /etc/passwd
-rwsr-xr-x 1 root root 23920 2006-12-19 20:35 /usr/bin/chsh
reader@hacking:~/booksrc $
The chsh program has the setuid flag set, which is indicated by an s in the
ls output above
...
The /etc/passwd file that chsh writes to is also owned by root and only allows
the owner to write to it
...
This
means that a running program has both a real user ID and an effective user
ID
...
c
...
c
#include
c are as follows
...
c
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 reader reader 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwxr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
c, both user IDs are shown to be 999 when
uid_demo is executed, since 999 is the user ID for reader
...
The program can still be executed, since it has execute
permission for other, and it shows that both user IDs remain 999, since
that’s still the ID of the user
...
/uid_demo
chmod: changing permissions of `
...
/uid_demo
reader@hacking:~/booksrc $ ls -l uid_demo
-rwsr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo
reader@hacking:~/booksrc $
...
The chmod u+s command turns on the setuid permission, which can be seen in the following ls -l output
...
This is how the chsh program is able to allow
any user to change his or her login shell stored in /etc/passwd
...
The next program will be a modification of the simplenote program; it will
also record the user ID of each note’s original author
...
The ec_malloc() and fatal() functions have been useful in many of our
programs
...
hacking
...
h, the functions can just be included
...
If the filename
is surrounded by quotes, the compiler looks in the current directory
...
h is in the same directory as a program, it can be included
with that program by typing #include "hacking
...
The changed lines for the new notetaker program (notetaker
...
notetaker
...
h>
h>
h>
"hacking
...
strcpy(buffer, argv[1]);   // Copy into buffer
...
// Writing data
if(write(fd, &userid, 4) == -1) // Write user ID before note data
...
if(write(fd, buffer, strlen(buffer)) == -1) // Write note
...
// Closing file
if(close(fd) == -1)
fatal("in main() while closing file");
printf("Note has been saved
...
The getuid() function is used to
get the real user ID, which is written to the datafile on the line before the note’s
line is written
...
reader@hacking:~/booksrc $ gcc -o notetaker notetaker
...
/notetaker
reader@hacking:~/booksrc $ sudo chmod u+s
...
/notetaker
-rwsr-xr-x 1 root root 9015 2007-09-07 05:48
...
/notetaker "this is a test of multiuser notes"
[DEBUG] buffer   @ 0x804a008: 'this is a test of multiuser notes'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
Now when the program
is executed, the program runs as the root user, so the file /var/notes is also
owned by root when it is created
...
this is a t|
|est of multiuser|
| notes
...
Because of little-endian architecture, the 4 bytes of the integer 999 appear
reversed in hexadecimal (shown in bold above)
...
The notesearch
...
Additionally, an
optional command-line argument can be supplied for a search string
...
notesearch
...
h>
h>
h"
#define FILENAME "/var/notes"
int print_notes(int, int, char *);   // Note printing function
int find_user_note(int, int);        // ...
int search_note(char *, char *);     // Search for keyword function
void fatal(char *);
...
userid = getuid();
fd = open(FILENAME, O_RDONLY); // Open the file for read-only access
...
int print_notes(int fd, int uid, char *searchstring) {
int note_length;
char byte=0, note_buffer[100];
note_length = find_user_note(fd, uid);
if(note_length == -1)          // If end of file reached,
return 0;                      //   return 0
...
note_buffer[note_length] = 0;  // Terminate the string
...
return 1;
}
// A function to find the next note for a given userID;
// returns -1 if the end of the file is reached;
// otherwise, it returns the length of the found note
...
if(read(fd, &note_uid, 4) != 4) // Read the uid data
...
if(read(fd, &byte, 1) != 1) // Read the newline separator
...
if(read(fd, &byte, 1) != 1) // Read a single byte
...
length++;
}
}
lseek(fd, length * -1, SEEK_CUR); // Rewind file reading by length bytes
...
int search_note(char *note, char *keyword) {
int i, keyword_length, match=0;
keyword_length = strlen(keyword);
if(keyword_length == 0)  // If there is no search string,
   return 1;             // always "match"
...
if(note[i] == keyword[match])  // If byte matches keyword,
   match++;                    // get ready to check the next byte;
else {                         //   otherwise,
   if(note[i] == keyword[0])   // if that byte matches first keyword byte,
      match = 1;               // start the match count at 1
...
}
if(match == keyword_length)  // If there is a full match,
   return 1;                 // return matched
...
}
Most of this code should make sense, but there are some new concepts
...
Also, the
function lseek() is used to rewind the read position in the file
...
Since this turns out to be a negative number, the position is moved backward
by length bytes
...
c
sudo chown root:root
...
/notesearch
...
But this is just a single user; what happens if a different user uses
the notetaker and notesearch programs?
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $
...
jose@hacking:/home/reader/booksrc $
...
This
means that value is added to all notes written with notetaker, and only notes
with a matching user ID will be displayed by the notesearch program
...
/notetaker "This is another note for the reader user"
[DEBUG] buffer @ 0x804a008: 'This is another note for the reader user'
[DEBUG] datafile @ 0x804a070: '/var/notes'
[DEBUG] file descriptor is 3
Note has been saved
...
/notesearch
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
[DEBUG] found a 41 byte note for user id 999
This is another note for the reader user
-------[ end of note data ]-------
reader@hacking:~/booksrc $
Similarly, all notes for the user reader have the user ID 999 attached to
them
...
This is very similar to how the /etc/passwd file stores
user information for all users, yet programs like chsh and passwd allow any user
to change his own shell or password
...
In C, structs are variables that can contain many other variables
...
A simple example will suffice for now
...
h
...
struct tm {
   int tm_sec;     /* seconds */
   int tm_min;     /* minutes */
   int tm_hour;    /* hours */
   int tm_mday;    /* day of the month */
   int tm_mon;     /* month */
   int tm_year;    /* year */
   int tm_wday;    /* day of the week */
   int tm_yday;    /* day in the year */
   int tm_isdst;   /* daylight saving time */
};
After this struct is defined, struct tm becomes a usable variable type, which
can be used to declare variables and pointers with the data type of the tm struct
...
c program demonstrates this
...
h is included,
the tm struct is defined, which is later used to declare the current_time and
time_ptr variables
...
c
#include <stdio.h>
#include <time.h>
int main() {
long int seconds_since_epoch;
struct tm current_time, *time_ptr;
int hour, minute, second, day, month, year;
seconds_since_epoch = time(0); // Pass time a null pointer as argument
...
localtime_r(&seconds_since_epoch, time_ptr);
// Three different ways to access struct elements:
hour = current_time
...
Time on Unix systems is kept relative to this rather arbitrary point in
time, which is also known as the epoch
...
The pointer time_ptr has already been set to the address
of current_time, an empty tm struct
...
The elements of structs can be accessed in
three different ways; the first two are the proper ways to access struct elements,
and the third is a hacked solution
...
Therefore, current_time
...
Pointers to structs are often used,
since it is much more efficient to pass a four-byte pointer than an entire data
structure
...
When using a struct pointer like time_ptr, struct elements can be
similarly accessed by the struct element’s name, but using a series of characters that looks like an arrow pointing right
...
The
seconds could be accessed via either of these proper methods, using the
tm_sec element of the tm struct, but a third method is used
...
c
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311588
Current time is: 04:19:48
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311600
Current time is: 04:20:00
reader@hacking:~/booksrc $
The program works as expected, but how are the seconds being accessed
in the tm struct? Remember that in the end, it’s all just memory
...
In the line second = *((int *) time_ptr), the variable time_ptr
is typecast from a tm struct pointer to an integer pointer
...
Since
the address to the tm struct also points to the first element of this struct, this
will retrieve the integer value for tm_sec in the struct
...
c code (time_example2
...
This shows that the elements of tm struct are right next to each
other in memory
...
time_example2
...
h>
#include