Printed on recycled paper in the United States of America 11 10 09 08 07 1 2 3 4 5 6 7 8 9 ISBN-10: 1-59327-144-1 ISBN-13: 978-1-59327-144-2 Publisher:
For information on book distributors or translations, please contact No Starch Press, Inc
...
555 De Haro Street, Suite 250, San Francisco, CA 94107 phone: 415
...
9900; fax: 415
...
9950; info@nostarch
...
nostarch
...
-- 2nd ed
...
cm
...
Computer security
...
Computer hackers
...
Computer networks--Security measures
...
Title
...
9
...
8--dc22 2007042910
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc
...
Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark
...
While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc
...
ACKNOWLEDGMENTS I would like to thank Bill Pollock and everyone else at No Starch Press for making this book a possibility and allowing me to have so much creative control in the process
...
Seidel for keeping me interested in the science of computer science, my parents for buying that first Commodore VIC-20, and the hacker community for the innovation and creativity that produced the techniques explained in this book
...
Understanding hacking techniques is often difficult, since it requires both breadth and depth of knowledge
...
This second edition of Hacking: The Art of Exploitation makes the world of hacking more accessible by providing the complete picture—from programming to machine code to exploitation
...
This CD contains all the source code in the book and provides a development and exploitation environment you can use to follow along with the book's examples and experiment along the way
...
INTRODUCTION The idea of hacking may conjure stylized images of electronic vandalism, espionage, dyed hair, and body piercings
...
Granted, there are people out there who use hacking techniques to break the law, but hacking isn't really about that
...
The essence of hacking is finding unintended or overlooked uses for the laws and properties of a given situation and then applying them in new and inventive ways to solve a problem—whatever it may be
...
Each number must be used once and only once, and you may define the order of operations; for example, 3 * (4 + 6) + 1 = 31 is valid, however incorrect, since it doesn't total 24
...
Like the solution to this problem (shown on the last page of this book), hacked solutions follow the rules of the system, but they use those rules in counterintuitive ways
...
Since the infancy of computers, hackers have been creatively solving problems
...
The club's members used this equipment to rig up a complex system that allowed multiple operators to control different parts of the track by dialing in to the appropriate sections
...
The group moved on to programming on punch cards and ticker tape for early computers like the IBM 704 and the TX-0
...
A new program that could achieve the same result as an existing one but used fewer punch cards was considered better, even though it did the same thing
...
Being able to reduce the number of punch cards needed for a program showed an artistic mastery over the computer
...
Early hackers proved that technical problems can have artistic solutions, and they thereby transformed programming from a mere engineering task into an art form
...
The few who got it formed an informal subculture that remained intensely focused on learning and mastering their art
...
Such obstructions included authority figures, the bureaucracy of college classes, and discrimination
...
This drive to continually learn and explore transcended even the conventional boundaries drawn by discrimination, evident in the MIT model railroad club's acceptance of 12-year-old Peter Deutsch when he demonstrated his knowledge of the TX-0 and his desire to learn
...
The original hackers found splendor and elegance in the conventionally dry sciences of math and electronics
...
Their desire to dissect and understand wasn't intended to demystify artistic endeavors; it was simply a way to achieve a greater appreciation of them
...
This is not a new cultural trend; the Pythagoreans in ancient Greece had a similar ethic and subculture, despite not owning computers
...
That thirst for knowledge and its beneficial byproducts would continue on through history, from the Pythagoreans to Ada Lovelace to Alan Turing to the hackers of the MIT model railroad club
...
How does one distinguish between the good hackers who bring us the wonders of technological advancement and the evil hackers who steal our credit card numbers? The term cracker was coined to distinguish evil hackers from the good ones
...
Hackers stayed true to the Hacker Ethic, while crackers were only interested in breaking the law and making a quick buck
...
Cracker was meant to be the catch-all label for anyone doing anything unscrupulous with a computer: pirating software, defacing websites, and worst of all, not understanding what they were doing
...
The term's lack of popularity might be due to its confusing etymology: cracker originally described those who crack software copyrights and reverse engineer copy-protection schemes
...
Few technology journalists feel compelled to use terms that most of their readers are unfamiliar with
...
Similarly, the term script kiddie is sometimes used to refer to crackers, but it just doesn't have the same zing as the shadowy hacker
...
The current laws restricting cryptography and cryptographic research further blur the line between hackers and crackers
...
This paper responded to a challenge issued by the Secure Digital Music Initiative (SDMI) in the SDMI Public Challenge, which encouraged the public to attempt to break these watermarking schemes
...
The Digital Millennium Copyright Act (DMCA) of 1998 makes it illegal to discuss or provide technology that might be used to bypass industry consumer controls
...
He had written software to circumvent overly simplistic encryption in Adobe software and presented his findings at a hacker convention in the United States
...
Under the law, the complexity of the industry consumer controls doesn't matter—it would be technically illegal to reverse engineer or even discuss Pig Latin if it were used as an industry consumer control
...
The sciences of nuclear physics and biochemistry can be used to kill, yet they also provide us with significant scientific advancement and modern medicine
...
Even if we wanted to, we couldn't suppress the knowledge of how to convert matter into energy or stop the continued technological progress of society
...
Hackers will constantly be pushing the limits of knowledge and acceptable behavior, forcing us to explore further and further
...
Just as the speedy gazelle adapted from being chased by the cheetah, and the cheetah became even faster from chasing the gazelle, the competition between hackers provides computer users with better and stronger security, as well as more complex and sophisticated attack techniques
...
The defending hackers create IDSs to add to their arsenal, while the attacking hackers develop IDS-evasion techniques, which are eventually compensated for in bigger and better IDS products
...
The intent of this book is to teach you about the true spirit of hacking
...
Included with this book is a bootable LiveCD containing all the source code used herein as well as a preconfigured Linux environment
...
The only requirement is an x86 processor, which is used by all Microsoft Windows machines and the newer Macintosh computers—just insert the CD and reboot
...
This way, you will gain a hands-on understanding and appreciation for hacking that may inspire you to improve upon existing techniques or even to invent new ones
...
Chapter 0x200
...
Even though these two groups of hackers have different end goals, both groups use similar problem-solving techniques
...
There are interesting hacks found in both the techniques used to write elegant code and the techniques used to exploit programs
...
The hacks found in program exploits usually use the rules of the computer to bypass security in ways never intended
...
There are actually an infinite number of programs that can be written to accomplish any given task, but most of these solutions are unnecessarily large, complex, and sloppy
...
Programs that have these qualities are said to have elegance, and the clever and inventive solutions that tend to lead to this efficiency are called hacks
...
In the business world, more importance is placed on churning out functional code than on achieving clever hacks and elegance
...
While time and memory optimizations go without notice by all but the most sophisticated of users, a new feature is marketable
...
True appreciation of programming elegance is left for the hackers: computer hobbyists whose end goal isn't to make a profit but to squeeze every possible bit of functionality out of their old Commodore 64s, exploit writers who need to write tiny and amazing pieces of code to slip through narrow security cracks, and anyone else who appreciates the pursuit and the challenge of finding the best possible solution
...
Since an understanding of programming is a prerequisite to understanding how programs can be exploited, programming is a natural starting point
...
A program is nothing more than a series of statements written in a specific language
...
Driving directions, cooking recipes, football plays, and DNA are all types of programs
...
Continue on Main Street until you see a church on your right
...
Otherwise, you can just continue and make a right on 16th Street
...
Drive straight down Destination Road for 5 miles, and then you'll see the house on the right
...
Anyone who knows English can understand and follow these driving directions, since they're written in English
...
But a computer doesn't natively understand English; it only understands machine language
...
However, machine language is arcane and difficult to work with—it consists of raw bits and bytes, and it differs from architecture to architecture
...
Programming like this is painstaking and cumbersome, and it is certainly not intuitive
...
An assembler is one form of machine-language translator—it is a program that translates assembly language into machine-readable code
...
However, assembly language is still far from intuitive
...
Just as machine language for Intel x86 processors is different from machine language for Sparc processors, x86 assembly language is different from Sparc assembly language
...
If a program is written in x86 assembly language, it must be rewritten to run on Sparc architecture
...
These problems can be mitigated by yet another form of translator called a compiler
...
High-level languages are much more intuitive than assembly language and can be converted into many different types of machine language for different processor architectures
...
C, C++, and Fortran are all examples of high-level languages
...
Pseudo-code Programmers have yet another form of programming language called pseudocode
...
It isn't understood by compilers, assemblers, or any computers, but it is a useful way for a programmer to arrange instructions
...
It's sort of the nebulous missing link between English and high-level programming languages like C
...
Control Structures Without control structures, a program would just be a series of instructions executed in sequential order
...
The driving directions included statements like, Continue on Main Street until you see a church on your right and If the street is blocked because of construction…
...
If-Then-Else In the case of our driving directions, Main Street could be under construction
...
Otherwise, the original set of instructions should be followed
...
In general, it looks something like this:

If (condition) then
{
  Set of instructions to execute if the condition is met;
}
Else
{
  Set of instructions to execute if the condition is not met;
}
For this book, a C-like pseudo-code will be used, so every instruction will end with a semicolon, and the sets of instructions will be grouped with curly braces and indentation
...
In C and many other programming languages, the then keyword is implied and therefore left out, so it has also been omitted in the preceding pseudo-code
...
These types of syntactical differences in programming languages are only skin deep; the underlying structure is still the same
...
Since C will be used in the later sections, the pseudo-code used in this book will follow a C-like syntax, but remember that pseudo-code can take on many forms
...
For the sake of readability, it's still a good idea to indent these instructions, but it's not syntactically necessary
...
If (there is only one instruction in a set of instructions)
  The use of curly braces to group the instructions is optional;
Else
{
  The use of curly braces is necessary;
  Since there must be a logical way to group these instructions;
}
Even the description of a syntax itself can be thought of as a simple program
...
While/Until Loops Another elementary programming concept is the while control structure, which is a type of loop
...
A program can accomplish this task through looping, but it requires a set of conditions that tells it when to stop looping, lest it continue into infinity
...
A simple program for a hungry mouse could look something like this:

While (you are hungry)
{
  Find some food;
  Eat the food;
}
The set of two instructions following the while statement will be repeated while the mouse is still hungry
...
Similarly, the number of times the set of instructions in the while statement is executed changes depending on how much food the mouse finds
...
An until loop is simply a while loop with the conditional statement inverted
...
The driving directions from before contained the statement Continue on Main Street until you see a church on your right
...
While (there is not a church on the right)
  Drive down Main Street;
For Loops Another looping control structure is the for loop
...
The driving direction Drive straight down Destination Road for 5 miles could be converted to a for loop that looks something like this:

For (5 iterations)
  Drive straight for 1 mile;
In reality, a for loop is just a while loop with a counter
...
The first section declares the counter and sets it to its initial value, in this case 0
...
The third and final section describes what action should be taken on the counter during each iteration
...
Using all of the control structures, the driving directions from What Is Programming? can be converted into a C-like pseudo-code that looks something like this:

Begin going East on Main Street;
While (there is not a church on the right)
  Drive down Main Street;
If (street is blocked)
{
  Turn right on 15th Street;
  Turn left on Pine Street;
  Turn right on 16th Street;
}
Else
  Turn right on 16th Street;
Turn left on Destination Road;
For (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;
More Fundamental Programming Concepts In the following sections, more universal programming concepts will be introduced
...
As I introduce these concepts, I will integrate them into pseudo-code examples using C-like syntax
...
Variables The counter used in the for loop is actually a type of variable
...
There are also variables that don't change, which are aptly called constants
...
In pseudo-code, variables are simple abstract concepts, but in C (and in many other languages), variables must be declared and given a type before they can be used
...
Like a cooking recipe that lists all the required ingredients before giving the instructions, variable declarations allow you to make preparations before getting into the meat of the program
...
In the end though, despite all of the variable type declarations, everything is all just memory
...
Some of the most common types are int (integer values), float (decimal floating-point values), and char (single character values)
...
int a, b; float k; char z;
The variables a and b are now defined as integers, k can accept floating point values (such as 3
...
Variables can be assigned values when they are declared or anytime afterward, using the = operator
...
14; z = 'w'; b = a + 5;
After the preceding instructions are executed, the variable a will contain the value of 13, k will contain the number 3
...
Variables are simply a way to remember values; however, with C, you must first declare each variable's type
...
In C, the following symbols are used for various arithmetic operations
...
Modulo reduction may seem like a new concept, but it's really just taking the remainder after division
...
Also, since the variables a and b are integers, the statement b = a / 5 will result in the value of 2 being stored in b, since that's the integer portion of it
...
6
...
The C language also provides several forms of shorthand for these arithmetic operations
...
Full Expression    Shorthand      Explanation
i = i + 1          i++ or ++i     Add 1 to the variable
...
These shorthand expressions can be combined with other arithmetic operations to produce more complex expressions
...
The first expression means Increment the value of i by 1 after evaluating the arithmetic operation, while the second expression means Increment the value of i by 1 before evaluating the arithmetic operation
...
int a, b; a = 5; b = a++ * 6;
At the end of this set of instructions, b will contain 30 and a will contain 6, since the shorthand of b = a++ * 6; is equivalent to the following statements: b = a * 6; a = a + 1;
However, if the instruction b = ++a * 6; is used, the order of the addition to a changes, resulting in the following equivalent instructions: a = a + 1; b = a * 6;
Since the order has changed, in this case b will contain 36, and a will still contain 6
...
For example, you might need to add an arbitrary value like 12 to a variable, and store the result right back in that variable (for example, i = i + 12)
...
Full Expression    Shorthand    Explanation
i = i + 12         i+=12        Add some value to the variable
...
i = i * 12         i*=12        Multiply some value by the variable
...
Comparison Operators Variables are frequently used in the conditional statements of the previously explained control structures
...
In C, these comparison operators use a shorthand syntax that is fairly common across many programming languages
...
This is an important distinction, since the double equal sign is used to test equivalence, while the single equal sign is used to assign a value to a variable
...
(Some programming languages like Pascal actually use := for variable assignment to eliminate visual confusion
...
This symbol can be used by itself to invert any expression
...
Logic    Symbol    Example
OR       ||        ((a < b) || (a < c))
AND      &&        ((a < b) && !(a < c))
The example statement consisting of the two smaller conditions joined with OR logic will fire true if a is less than b, OR if a is less than c
...
These statements should be grouped with parentheses and can contain many different variations
...
Returning to the example of the mouse searching for food, hunger can be translated into a Boolean true/false variable
...
While (hungry == 1)
{
  Find some food;
  Eat the food;
}
Here's another shorthand used by programmers and hackers quite often
...
In fact, the comparison operators will actually return a value of 1 if the comparison is true and a value of 0 if it is false
...
Since the program only uses these two cases, the comparison operator can be dropped altogether
...
While ((hungry) && !(cat_present))
{
  Find some food;
  If(!(food_is_on_a_mousetrap))
    Eat the food;
}
This example assumes there are also variables that describe the presence of a cat and the location of the food, with a value of 1 for true and 0 for false
...
Functions Sometimes there will be a set of instructions the programmer knows he will need several times
...
In other languages, functions are known as subroutines or procedures
...
The driving directions from the beginning of this chapter require quite a few turns; however, listing every little instruction for every turn would be tedious (and less readable)
...
In this case, the function is passed the direction of the turn
...
When a program that knows about this function needs to turn, it can just call this function
...
Either left or right can be passed into this function, which causes the function to turn in that direction
...
For those familiar with functions in mathematics, this makes perfect sense
...
In C, functions aren't labeled with a "function" keyword; instead, they are declared by the data type of the variable they are returning
...
If a function is meant to return an integer (perhaps a function that calculates the factorial of some number x), the function could look like this:

int factorial(int x)
{
  int i, result = 1;
  for(i=1; i <= x; i++)  // Multiply every value from 1 to x.
    result *= i;
  return result;
}
This function is declared as an integer because it multiplies every value from 1 to x and returns the result, which is an integer
...
This factorial function can then be used like an integer variable in the main part of any program that knows about it
...
Also in C, the compiler must "know" about functions before it can use them
...
A function prototype is simply a way to tell the compiler to expect a function with this name, this return data type, and these data types as its functional arguments
...
An example of a function prototype for the factorial() function would look something like this: int factorial(int);
Usually, function prototypes are located near the beginning of a program
...
The only thing the compiler cares about is the function's name, its return data type, and the data types of its functional arguments
...
However, the turn() function doesn't yet capture all the functionality that our driving directions need
...
This means that a turning function should have two variables: the direction to turn and the street to turn on to
...
A more complete turning function using proper C-like syntax is listed below in pseudo-code
...
It will continue to look for and read street signs until the target street is found; at that point, the remaining turning instructions will be executed
...
Begin going East on Main Street;
while (there is not a church on the right)
  Drive down Main Street;
if (street is blocked)
{
  Turn(right, 15th Street);
  Turn(left, Pine Street);
  Turn(right, 16th Street);
}
else
  Turn(right, 16th Street);
Turn(left, Destination Road);
for (i=0; i<5; i++)
  Drive straight for 1 mile;
Stop at 743 Destination Road;
Functions aren't commonly used in pseudo-code, since pseudo-code is mostly used as a way for programmers to sketch out program concepts before writing compilable code
...
But in a programming language like C, functions are used heavily
...
Getting Your Hands Dirty Now that the syntax of C feels more familiar and some fundamental programming concepts have been explained, actually programming in C isn't that big of a step
...
Linux is a free operating system that everyone has access to, and x86-based processors are the most popular consumer-grade processors on the planet
...
Included with this book is a Live CD you can use to follow along if your computer has an x86 processor
...
It will boot into a Linux environment without modifying your existing operating system
...
Let's get right to it
...
c program is a simple piece of C code that will print "Hello, world!" 10 times
...
c #include
...
{ puts("Hello, world!\n"); // put the string to the output
...
}
The main execution of a C program begins in the aptly named main() function
...
The first line may be confusing, but it's just C syntax that tells the compiler to include headers for a standard input/output (I/O) library named stdio
...
It is located at /usr/include/stdio
...
Since the main() function uses the printf() function from the standard I/O library, a function prototype is needed for printf() before it can be used
...
h header file
...
The rest of the code should make sense and look a lot like the pseudo-code from before
...
It should be fairly obvious what this program will do, but let's compile it using GCC and run it just to make sure
...
The outputted translation is an executable binary file, which is called a
...
Does the compiled program do what you thought it would? reader@hacking:~/booksrc $ gcc firstprog
...
out -rwxr-xr-x 1 reader reader 6621 2007-09-06 22:16 a
...
/a
...
Most introductory programming classes just teach how to read and write C
...
Most programmers learn the language from the top down and never see the big picture
...
To see the bigger picture in the realm of programming, simply realize that C code is meant to be compiled
...
Thinking of C-source as a program is a common misconception that is exploited by hackers every day
...
out's instructions are written in machine language, an elementary language the CPU can understand
...
In this case, the processor is in a family that uses the x86 architecture
...
Each architecture has a different machine language, so the compiler acts as a middle ground—translating C code into machine language for the target architecture
...
But a hacker realizes that the compiled program is what actually gets executed out in the real world
...
We have seen the source code for our first program and compiled it into an executable binary for the x86 architecture
...
Let's start by looking at the machine code the main() function was translated into
...
out | grep -A20 main
...
Each byte is represented in hexadecimal notation, which is a base-16 numbering system
...
Hexadecimal uses 0 through 9 to represent 0 through 9, but it also uses A through F to represent the values 10 through 15
...
This means a byte has 256 (2^8) possible values, so each byte can be described with 2 hexadecimal digits
...
The bits of the machine language instructions must be put somewhere, and this somewhere is called memory
...
Like a row of houses on a local street, each with its own address, memory can be thought of as a row of bytes, each with its own memory address
...
Older Intel x86 processors use a 32-bit addressing scheme, while newer ones use a 64-bit one
...
84467441 x 10^19) possible addresses
...
The hexadecimal bytes in the middle of the listing above are the machine language instructions for the x86 processor
...
But since 0101010110001001111001011000001111101100111100001 … isn't very useful to anything other than the processor, the machine code is displayed as hexadecimal bytes and each instruction is put on its own line, like splitting a paragraph into sentences
...
The instructions on the far right are in assembly language
...
The instruction ret is far easier to remember and make sense of than 0xc3 or 11000011
...
This means that since every processor architecture has different machine language instructions, each also has a different form of assembly language
...
Exactly how these machine language instructions are represented is simply a matter of convention and preference
...
The assembly shown in the output on The Bigger Picture is AT&T syntax, as just about all of Linux's disassembly tools use this syntax by default
...
The same code can be shown in Intel syntax by providing an additional command-line option, -M intel, to objdump, as shown in the output below
...
out | grep -A20 main
...
Regardless of the assembly language representation, the commands a processor understands are quite simple
...
These operations move memory around, perform some sort of basic math, or interrupt the processor to get it to do something else
...
But in the same way millions of books have been written using a relatively small alphabet of letters, an infinite number of possible programs can be created using a relatively small collection of machine instructions
...
Most of the instructions use these registers to read or write data, so understanding the registers of a processor is essential to understanding the instructions
...
The x86 Processor The 8086 CPU was the first x86 processor
...
If you remember people talking about 386 and 486 processors in the '80s and '90s, this is what they were referring to
...
I could just talk abstractly about these registers now, but I think it's always better to see things for yourself
...
Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers
...
Similar to a microscope, a debugger allows a hacker to observe the microscopic world of machine code—but a debugger is far more powerful than this metaphor allows
...
Below, GDB is used to show the state of the processor registers right before the program starts
...
/a
...
so
...
(gdb) break main
Breakpoint 1 at 0x804837a (gdb) run Starting program: /home/reader/booksrc/a
...
Exit anyway? (y or n) y reader@hacking:~/booksrc $
A breakpoint is set on the main() function so execution will stop right before our code is executed
...
The first four registers (EAX, ECX, EDX, and EBX) are known as general purpose registers
...
They are used for a variety of purposes, but they mainly act as temporary variables for the CPU when it is executing machine instructions
...
These stand for Stack Pointer, Base Pointer, Source Index, and Destination Index, respectively
...
These registers are fairly important to program execution and memory management; we will discuss them more later
...
There are load and store instructions that use these registers, but for the most part, these registers can be thought of as just simple general-purpose registers
...
Like a child pointing his finger at each word as he reads, the processor reads each instruction using the EIP register as its finger
...
Currently, it points to a memory address at 0x804838a
...
The actual memory is split into several different segments, which will be discussed later, and these registers keep track of that
...
Assembly Language Since we are using Intel syntax assembly language for this book, our tools must be configured to use this syntax
...
You can configure this setting to run every time GDB starts up by putting the command in the file
...
Now that GDB is configured to use Intel syntax, let's begin understanding it
...
The operations are usually intuitive mnemonics: The mov operation will move a value from the source to the destination, sub will subtract, inc will increment, and so forth
...
There are also operations that are used to control the flow of execution
...
The example below first compares a 4-byte value located at EBP minus 4 with the number 9
...
If that value is less than or equal to 9, execution jumps to the instruction at 0x8048393
...
If the value isn't less than or equal to 9, execution will jump to 0x80483a6
...
The -g flag can be used by the GCC compiler to include extra debugging information, which will give GDB access to the source code
...
c reader@hacking:~/booksrc $ ls -l a
...
out reader@hacking:~/booksrc $ gdb -q
...
out Using host libthread_db library "/lib/libthread_db
...
1"
...
h>
2
3       int main()
4       {
5         int i;
6         for(i=0; i < 10; i++)
7         {
8           printf("Hello, world!\n");
9         }
10      }
(gdb) disassemble main
Dump of assembler code for function main():
0x08048384 : push ebp
0x08048385 : mov ebp,esp
0x08048387 : sub esp,0x8
0x0804838a : and esp,0xfffffff0
0x0804838d : mov eax,0x0
0x08048392 : sub esp,eax
0x08048394 : mov DWORD PTR [ebp-4],0x0
0x0804839b : cmp DWORD PTR [ebp-4],0x9
0x0804839f : jle 0x80483a3
0x080483a1 : jmp 0x80483b6
0x080483a3 : mov DWORD PTR [esp],0x80484d4
0x080483aa : call 0x80482a8 <_init+56>
0x080483af : lea eax,[ebp-4]
0x080483b2 : inc DWORD PTR [eax]
0x080483b4 : jmp 0x804839b
0x080483b6 : leave
0x080483b7 : ret
End of assembler dump
...
c, line 6
...
out Breakpoint 1, main() at firstprog
...
Then a breakpoint is set at the start of main(), and the program is run
...
Since the breakpoint has been set at the start of the main() function, the program hits the breakpoint and pauses before actually executing any instructions in main()
...
Notice that EIP contains a memory address that points to an instruction in the
main() function's disassembly (shown in bold)
...
Part of the reason variables need to be declared in C is to aid the construction of this section of code
...
We'll talk more about the function prologue later, but for now we can take a cue from GDB and skip it
...
Examining memory is a critical skill for any hacker
...
In both magic and hacking, if you were to look in just the right spot, the trick would be obvious
...
But with a debugger like GDB, every aspect of a program's execution can be deterministically examined, paused, stepped through, and repeated as often as needed
...
The examine command in GDB can be used to look at a certain address of memory in a variety of ways
...
The display format also uses a single-letter shorthand, which is optionally preceded by a count of how many items to examine
...
x Display in hexadecimal
...
t Display in binary
...
In the following example, the current address of the EIP register is used
...
The memory the EIP register is pointing to can be examined by using the address stored in EIP
...
The value 077042707 in octal is the same as 0x00fc45c7 in hexadecimal, which is the same as 16532935 in base-10 decimal, which in turn is the same as 00000000111111000100010111000111 in binary
...
The default size of a single unit is a four-byte unit called a word
...
The valid size letters are as follows:
b A single byte
h A halfword, which is two bytes in size
w A word, which is four bytes in size
g A giant, which is eight bytes in size
This is slightly confusing, because sometimes the term word also refers to 2-byte values
...
In this book, words and DWORDs both refer to 4-byte values
...
The following GDB output shows memory displayed in various sizes
...
The first examine command shows the first eight bytes, and naturally, the examine commands that use bigger units display more data in total
...
This same byte-reversal effect can be seen when a full four-byte word is shown as 0x00fc45c7, but when the first four bytes are shown byte by byte, they are in the order of 0xc7, 0x45, 0xfc, and 0x00
...
For example, if four bytes are to be interpreted as a single value, the bytes must be used in reverse order
...
Revisiting these values displayed both as hexadecimal and unsigned decimals might help clear up any confusion
...
Exit anyway? (y or n) y
reader@hacking:~/booksrc $ bc -ql
199*(256^3) + 69*(256^2) + 252*(256^1) + 0*(256^0)
3343252480
0*(256^3) + 252*(256^2) + 69*(256^1) + 199*(256^0)
16532935
quit
reader@hacking:~/booksrc $
The first four bytes are shown both in hexadecimal and standard unsigned decimal notation
...
The byte order of a given architecture is an important detail to be aware of
...
In addition to converting byte order, GDB can do other conversions with the examine command
...
The examine command also accepts the format letter i, short for instruction, to display the memory as disassembled assembly language instructions
...
/a
...
so
...
(gdb) break main
Breakpoint 1 at 0x8048384: file firstprog
...
In the output above, the a
...
Since the EIP register is pointing to memory that actually contains machine language instructions, they disassemble quite nicely
...
This assembly instruction will move the value of 0 into memory located at the address stored in the EBP register, minus 4
...
Basically, this command will zero out the variable i for the for loop
...
The memory at this location can be examined several different ways
...
The examine command can examine this memory address directly or by doing the math on the fly
...
This variable named $1 can be used later to quickly re-access a particular location in memory
...
Let's execute the current instruction using the command nexti, which is short for next instruction
...
As predicted, the previous command zeroes out the 4 bytes found at EBP minus 4, which is memory set aside for the C variable i
...
The next few instructions actually make more sense to talk about in a group
...
The next instruction, jle stands for jump if less than or equal to
...
In this case the instruction says to jump to the address 0x8048393 if the value stored in memory for the C variable i is less than or equal to the value 9
...
This will cause the EIP to jump to the address 0x80483a6
...
The first address of 0x8048393 (shown in bold) is simply the instruction found after the fixed jump instruction, and the second address of 0x80483a6 (shown in italics) is located at the end of the function
...
A trained eye might notice something about the memory here, in particular the range of the bytes
...
These bytes fall within the printable ASCII range
...
The bytes 0x48, 0x65, 0x6c, and 0x6f all correspond to letters in the alphabet on the ASCII table shown below
...
ASCII Table
Oct   Dec   Hex   Char          Oct   Dec   Hex   Char
------------------------------------------------------------
000   0     00    NUL '\0'      100   64    40    @
001   1     01    SOH           101   65    41    A
002   2     02    STX           102   66    42    B
003   3     03    ETX           103   67    43    C
004   4     04    EOT           104   68    44    D
005   5     05    ENQ           105   69    45    E
006   6     06    ACK           106   70    46    F
007   7     07    BEL '\a'      107   71    47    G
010   8     08    BS '\b'       110   72    48    H
011   9     09    HT '\t'       111   73    49    I
012   10    0A    LF '\n'       112   74    4A    J
013   11    0B    VT '\v'       113   75    4B    K
014   12    0C    FF '\f'       114   76    4C    L
015   13    0D    CR '\r'       115   77    4D    M
016   14    0E    SO            116   78    4E    N
017   15    0F    SI            117   79    4F    O
020   16    10    DLE           120   80    50    P
021   17    11    DC1           121   81    51    Q
022   18    12    DC2           122   82    52    R
023   19    13    DC3           123   83    53    S
024   20    14    DC4           124   84    54    T
025   21    15    NAK           125   85    55    U
026   22    16    SYN           126   86    56    V
027   23    17    ETB           127   87    57    W
030   24    18    CAN           130   88    58    X
031   25    19    EM            131   89    59    Y
032   26    1A    SUB           132   90    5A    Z
033   27    1B    ESC           133   91    5B    [
034   28    1C    FS            134   92    5C    \ '\\'
035   29    1D    GS            135   93    5D    ]
036   30    1E    RS            136   94    5E    ^
037   31    1F    US            137   95    5F    _
040   32    20    SPACE         140   96    60    `
041   33    21    !             141   97    61    a
042   34    22    "             142   98    62    b
043   35    23    #             143   99    63    c
044   36    24    $             144   100   64    d
045   37    25    %             145   101   65    e
046   38    26    &             146   102   66    f
047   39    27    '             147   103   67    g
050   40    28    (             150   104   68    h
051   41    29    )             151   105   69    i
052   42    2A    *             152   106   6A    j
053   43    2B    +             153   107   6B    k
054   44    2C    ,             154   108   6C    l
055   45    2D    -             155   109   6D    m
056   46    2E
...
The c format letter can be used to automatically look up a byte on the ASCII table, and the s format letter will display an entire string of character data
...
This string is the argument for the printf() function, which indicates that moving the address of this string to the address stored in ESP (0x8048484) has something to do with this function
...
These two instructions basically just increment the variable i by 1
...
The execution of this instruction is shown below
...
The execution of this instruction is also shown below
...
This behavior corresponds to a portion of C code in which the variable i is incremented in the for loop
...
(gdb) x/i $eip
0x80483a4 : jmp 0x804838b
(gdb)
When this instruction is executed, it will send the program back to the instruction at address 0x804838b
...
Looking at the full disassembly again, you should be able to tell which parts of the C code have been compiled into which machine instructions
...
(gdb) list 1 #include
...
The program execution will jump
back to the compare instruction, continue to execute the printf() call, and increment the counter variable until it finally equals 10
...
Back to Basics
Now that the idea of programming is less abstract, there are a few other important concepts to know about C
...
In the same way that knowing a little about Latin can greatly improve one's understanding of the English language, knowledge of low-level programming concepts can assist the comprehension of higher-level ones
...
Strings
The value "Hello, world!\n" passed to the printf() function in the previous program is a string (technically, a character array)
...
A 20-character array is simply 20 adjacent characters located in memory
...
The char_array
...
In the preceding program, a 20-element character array is defined as str_a, and each element of the array is written to, one by one
...
Also notice that the last character is a 0
...
) The character array was defined, so 20 bytes are allocated for it, but only 15 of these bytes are actually used: the 14 characters of "Hello, world!\n" plus the terminating null byte
...
The remaining extra bytes are just garbage and will be ignored
...
Since setting each character in a character array is painstaking and strings are used fairly often, a set of standard functions was created for string manipulation
...
The order of the function's arguments is similar to Intel assembly syntax: destination first and then source
...
c program can be rewritten using strcpy() to accomplish the same thing using the string library
...
h since it uses a string function
...
c #include
...
h>
int main() {
   char str_a[20];
   strcpy(str_a, "Hello, world!\n");
   printf(str_a);
}
Let's take a look at this program with GDB
...
The debugger will pause the program at each breakpoint, giving us a chance to examine registers and memory
...
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending
...
c, line 8
...
At each breakpoint, we're going to look at EIP and the instructions it points to
...
(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2
...
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) i r eip
eip 0xb7f076f4 0xb7f076f4
The address in EIP at the middle breakpoint is different because the code for the
strcpy() function comes from a loaded library
...
I'd like to point out that EIP is able to travel from the main code to the strcpy() code and back again
...
The stack lets EIP return through long chains of function calls
...
In the output below, the stack backtrace is shown at each breakpoint
...
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4: Function "strcpy" not defined
...
c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2
...
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc
...
6
#1 0x080483d7 in main () at char_array2
...
Breakpoint 3, main () at char_array2
...
c:8 (gdb)
At the middle breakpoint, the backtrace of the stack shows its record of the strcpy() call
...
This is due to an exploit protection method that is turned on by default in the Linux kernel since 2
...
11
...
Signed, Unsigned, Long, and Short
By default, numerical values in C are signed, which means they can be both negative and positive
...
Since it's all just memory in the end, all numerical values must be stored in binary, and unsigned values make the most sense in binary
...
A 32-bit signed integer is still just 32 bits, which means it can only be in one of 2^32 possible bit combinations
...
Essentially, one of the bits is a flag marking the value positive
or negative
...
Two's complement represents negative numbers in a form suited for binary adders: when a negative value in two's complement is added to a positive number of the same magnitude, the result will be 0
...
It sounds strange, but it works and allows negative numbers to be added in combination with positive numbers using simple binary adders
...
For simplicity's sake, 8-bit numbers are used in this example
...
Then all the bits are flipped, and 1 is added to result in the two's complement representation for negative 73, 10110111
...
The program pcalc shows the value 256 because it's not aware that we're only dealing with 8-bit values
...
This example might shed some light on how two's complement works its magic
...
An unsigned integer would be declared with unsigned int
...
The actual sizes will vary depending on the architecture the code is compiled for
...
This works like a function that takes a data type as its input and returns the size of a variable declared with that data type for the target architecture
...
c program explores the sizes of various data types, using the sizeof() function
...
c #include
...
It uses something called a format specifier to display the value returned from the sizeof() function calls
...
reader@hacking:~/booksrc $ gcc datatype_sizes
...
/a
...
A float is also four bytes, while a char only needs a single byte
...
Pointers
The EIP register is a pointer that "points" to the current instruction during a program's execution by containing its memory address
...
Since the physical memory cannot actually be moved, the information in it must be copied
...
This is also expensive from a memory standpoint, since space for the new destination copy must be saved or allocated before the source can be copied
...
Instead of copying a large block of memory, it is much simpler to pass around the address of the beginning of that block of memory
...
Since memory on the x86 architecture uses 32-bit addressing, pointers are also 32 bits in size (4 bytes)
...
Instead of defining a variable of that type, a pointer is defined as something that points to data of that type
...
c program is an example of a pointer being used with the char data type, which is only 1 byte in size
...
c #include
...
h>
int main() {
   char str_a[20]; // A 20-element character array
   char *pointer; // A pointer, meant for a character array
   char *pointer2; // And yet another one
   strcpy(str_a, "Hello, world!\n");
   pointer = str_a; // Set the first pointer to the start of the array
...
printf(pointer2); // Print it
...
printf(pointer); // Print again
...
When the character array is referenced like this, it is actually a pointer itself
...
The second pointer is set to the first pointer's address plus two, and then some things are printed (shown in the output below)
...
c reader@hacking:~/booksrc $
...
The program is recompiled, and a breakpoint is set on the tenth line of the source code
...
(gdb) run
Starting program: /home/reader/booksrc/pointer
Breakpoint 1, main () at pointer
...
Remember that the string itself isn't stored in the pointer variable—only the memory address 0xbffff7e0 is stored there
...
The address-of operator is a unary operator, which simply means it operates on a single argument
...
When it's used, the address of that variable is returned, instead of the variable itself
...
When the address-of operator is used, the pointer variable is shown to be located at the address 0xbffff7dc in memory, and it contains the address 0xbffff7e0
...
The addressof
...
This line is shown in bold below
...
c #include
...
reader@hacking:~/booksrc $ gcc -g addressof
...
/a
...
so
...
As usual, a breakpoint is set and the program is executed in the debugger
...
The first print command shows the value of int_var, and the second shows its address using the address-of operator
...
An additional unary operator called the dereference operator exists for use with pointers
...
It takes the form of an asterisk in front of the variable name, similar to the declaration of a pointer
...
Used in GDB, it can retrieve the integer value int_ptr points to
...
c code (shown in addressof2
...
The added printf() functions use format parameters, which I'll explain in the next section
...
addressof2
...
h>
int main() {
   int int_var = 5;
   int *int_ptr;
   int_ptr = &int_var; // Put the address of int_var into int_ptr
...
c are as follows
...
c reader@hacking:~/booksrc $
...
out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005
int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5
reader@hacking:~/booksrc $
When the unary operators are used with pointers, the address-of operator can be thought of as moving backward, while the dereference operator moves forward in the direction the pointer is pointing
...
This function can also use format strings to print variables in many different formats
...
The way the printf() function has been used in the previous programs, the "Hello, world!\n" string technically is the format string; however, it is devoid of special escape sequences
...
Each format parameter begins with a percent sign (%) and uses a single-character shorthand very similar to formatting characters used by GDB's examine command
...
There are also some format parameters that expect pointers, such as the following
...
The %n format parameter is unique in that it actually writes data
...
For now, our focus will just be the format parameters used for displaying data
...
c program shows some examples of different format parameters
...
c #include
...
The final printf() call uses the argument &A, which will provide the address of the variable A
...
The first two calls to printf() demonstrate the printing of variables A and B, using different format parameters
...
The %d format parameter allows for negative values, while %u does not, since it is expecting unsigned values
...
This is because A is a negative number stored in two's complement, and the format parameter is trying to print it as if it were an unsigned value
...
The third line in the example, labeled [field width on B], shows the use of the field-width option in a format parameter
...
However, this is not a maximum field width—if the value to be outputted is greater than the field width, the field width will be exceeded
...
When 10 is used as the field width, 5 bytes of blank space are outputted before the output data
...
When 08 is used, for example, the output is 00031337
...
Remember that the variable string is actually a pointer containing the address of the string, which works out wonderfully, since the %s format parameter expects its data to be passed by reference
...
This value is displayed as eight hexadecimal digits, padded by zeros
...
Minimum field widths can be set by putting a number right after the percent sign, and if the field width begins with 0, it will be padded with zeros
...
So far, so good
...
One key difference is that the scanf() function expects all of its arguments to be pointers, so the arguments must actually be variable addresses, not the variables themselves
...
The input
...
input
...
h> #include
...
c, the scanf() function is used to set the count variable
...
reader@hacking:~/booksrc $ gcc -o input input
...
/input
Repeat how many times? 3
0 - Hello, world!
1 - Hello, world!
2 - Hello, world!
reader@hacking:~/booksrc $
...
In addition, the ability to output the values of variables allows for debugging in the program, without the use of a debugger
...
Typecasting
Typecasting is simply a way to temporarily change a variable's data type, despite how it was originally defined
...
The syntax for typecasting is as follows: (typecast_data_type) variable
This can be used when dealing with integers and floating-point variables, as typecasting
...
typecasting
...
h>
int main() {
   int a, b;
   float c, d;
   a = 13;
   b = 5;
   c = a / b; // Divide using integers
...
   printf("[integers]\t a = %d\t b = %d\n", a, b);
   printf("[floats]\t c = %f\t d = %f\n", c, d);
}
The results of compiling and executing typecasting
...
However, if these integer variables are typecast into floats, they will be treated as such
...
6
...
Even though a pointer is just a memory address, the C compiler still demands a data type for every pointer
...
An integer pointer should only point to integer data, while a character pointer should only point to character data
...
An integer is four bytes in size, while a character only takes up a single byte
...
c program will demonstrate and explain these concepts further
...
This is shorthand meant for displaying pointers and is basically equivalent to 0x%08x
...
c #include
...
      printf("[integer pointer] points to %p, which contains the integer %d\n", int_pointer, *int_pointer);
      int_pointer = int_pointer + 1;
   }
   for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer
...
Two pointers are also defined, one with the integer data type and one with the character data type, and they are set to point at the start of the corresponding data arrays
...
In the loops, when the integer and character values are actually printed with the %d and %c format parameters, notice that the corresponding printf() arguments must dereference the pointer variables
...
reader@hacking:~/booksrc $ gcc pointer_types
...
/a
...
Since a char is only 1 byte, the pointer to the next char would naturally also be 1 byte over
...
In pointer_types2
...
The major changes to the code are marked in bold
...
c #include
...
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
      printf("[char pointer] points to %p, which contains the integer %d\n", char_pointer, *char_pointer);
      char_pointer = char_pointer + 1;
   }
}
The output below shows the warnings spewed forth from the compiler
...
c pointer_types2
...
c:12: warning: assignment from incompatible pointer type pointer_types2
...
But the compiler and perhaps the programmer are the only ones that care about a pointer's type
...
reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $
Even though the int_pointer points to character data that only contains 5 bytes of data, it is still typed as an integer
...
Similarly, the char_pointer's address is only incremented by 1 each time, stepping through the 20 bytes of integer data (five 4-byte integers), one byte at a time
...
The 4-byte value of 0x00000001 is actually stored in memory as 0x01, 0x00, 0x00, 0x00
...
Since the pointer type determines the size of the data
it points to, it's important that the type is correct
...
c below, typecasting is just a way to change the type of a variable on the fly
...
c #include
...
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
      printf("[char pointer] points to %p, which contains the integer %d\n", char_pointer, *char_pointer);
      char_pointer = (char *) ((int *) char_pointer + 1);
   }
}
In this code, when the pointers are initially set, the data is typecast into the pointer's data type
...
To fix that, when 1 is added to the pointers, they must first be typecast into the correct data type so the address is incremented by the correct amount
...
It doesn't look too pretty, but it works
...
c reader@hacking:~/booksrc $
...
out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
Naturally, it is far easier just to use the correct data type for pointers in the first
place; however, sometimes a generic, typeless pointer is desired
...
Experimenting with void pointers quickly reveals a few things about typeless pointers
...
In order to retrieve the value stored in the pointer's memory address, the compiler must first know what type of data it is
...
These are fairly intuitive limitations, which means that a void pointer's main purpose is to simply hold a memory address
...
c program can be modified to use a single void pointer by typecasting it to the proper type each time it's used
...
This also means a void pointer must always be typecast when dereferencing it, however
...
c, which uses a void pointer
...
c #include
...
      printf("[char pointer] points to %p, which contains the char '%c'\n", void_pointer, *((char *) void_pointer));
      void_pointer = (void *) ((char *) void_pointer + 1);
   }
   void_pointer = (void *) int_array;
   for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer
...
c are as follows
...
c reader@hacking:~/booksrc $
...
out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The compilation and output of this pointer_types4
...
c
...
Since the type is taken care of by the typecasts, the void pointer is truly nothing more than a memory address
...
In pointer_types5
...
pointer_types5
...
h>
int main() {
   int i;
   char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
   int int_array[5] = {1, 2, 3, 4, 5};
   unsigned int hacky_nonpointer;
   hacky_nonpointer = (unsigned int) char_array;
   for(i=0; i < 5; i++) { // Iterate through the char array with hacky_nonpointer
...
      printf("[hacky_nonpointer] points to %p, which contains the integer %d\n", hacky_nonpointer, *((int *) hacky_nonpointer));
      hacky_nonpointer = hacky_nonpointer + sizeof(int);
   }
}
This is rather hacky, but since this integer value is typecast into the proper pointer types when it is assigned and dereferenced, the end result is the same
...
reader@hacking:~/booksrc $ gcc pointer_types5
...
/a
...
In the end, after the program has been compiled, the variables are nothing more than memory addresses
...
Command-Line Arguments
Many nongraphical programs receive input in the form of command-line arguments
...
This tends to be more efficient and is a useful input method
...
The integer will contain the number of arguments, and the array of strings will contain each of those arguments
...
c program and its execution should explain things
...
c #include
...
c reader@hacking:~/booksrc $
...
/commandline reader@hacking:~/booksrc $
...
/commandline
argument #1 - this
argument #2 - is
argument #3 - a
argument #4 - test
reader@hacking:~/booksrc $
The zeroth argument is always the name of the executing binary, and the rest of
the argument array (often called an argument vector) contains the remaining arguments as strings
...
Regardless of this, the argument is passed in as a string; however, there are standard conversion functions
...
The most common of these functions is atoi(), which is short for ASCII to integer
...
Observe its usage in convert
...
convert
...
h>
void usage(char *program_name) {
   printf("Usage: %s <# of times to repeat>\n", program_name);
   exit(1);
}
int main(int argc, char *argv[]) {
   int i, count;
   if(argc < 3) // If fewer than 3 arguments are used,
      usage(argv[0]); // display usage message and exit
...
printf("Repeating %d times
...
}
The results of compiling and executing convert
...
In the preceding code, an if statement makes sure that three arguments are used before these strings are accessed
...
In C it's important to check for these types of conditions and handle them in program logic
...
The convert2
...
convert2
...
h>
void usage(char *program_name) {
   printf("Usage: %s <# of times to repeat>\n", program_name);
   exit(1);
}
int main(int argc, char *argv[]) {
   int i, count;
   // if(argc < 3) // If fewer than 3 arguments are used,
   // usage(argv[0]); // display usage message and exit
...
printf("Repeating %d times
...
}
The results of compiling and executing convert2
...
reader@hacking:~/booksrc $ gcc convert2
...
/a
...
This results in the program crashing due to a segmentation fault
...
When the program attempts to access an address that is out of bounds, it will crash and die in what's called a segmentation fault
...
reader@hacking:~/booksrc $ gcc -g convert2
...
/a
...
so
...
(gdb) run test Starting program: /home/reader/booksrc/a
...
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc
...
6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2
...
(gdb) run test The program being debugged has been started already
...
out test Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2
...
Program received signal SIGSEGV, Segmentation fault
...
so
...
out"
(gdb) x/s 0xbffff9ce
0xbffff9ce: "test"
(gdb) x/s 0x00000000
0x0:
(gdb) quit
The program is running
...
The where command will sometimes show a useful backtrace of the stack; however, in this case, the stack was too badly mangled in the crash
...
Since the argument vector is a pointer to a list of strings, it is actually a pointer to a list of pointers
...
The first one is the zeroth argument, the second is the test argument, and the third is zero, which is out of bounds
...
Variable Scoping
Another interesting concept regarding memory in C is variable scoping or context, in particular the contexts of variables within functions
...
In fact, multiple calls to the same function all have their own contexts
...
c
...
c #include
...
reader@hacking:~/booksrc $ gcc scope
...
/a
...
Notice that within the main() function, the variable i is 3, even after calling func1() where the variable i is 5
...
The best way to think of this is that each function call has its own version of the variable i
...
Variables are global if they are defined at the beginning of the code, outside of any functions
...
c example code shown below, the variable j is declared globally and set to 42
...
scope2
...
h>
int j = 42; // j is a global variable
...
   printf("\t\t\t[in func3] i = %d, j = %d\n", i, j);
}
void func2() {
   int i = 7;
   printf("\t\t[in func2] i = %d, j = %d\n", i, j);
   printf("\t\t[in func2] setting j = 1337\n");
   j = 1337; // Writing to j
   func3();
   printf("\t\t[back in func2] i = %d, j = %d\n", i, j);
}
void func1() {
   int i = 5;
   printf("\t[in func1] i = %d, j = %d\n", i, j);
   func2();
   printf("\t[back in func1] i = %d, j = %d\n", i, j);
}
int main() {
   int i = 3;
   printf("[in main] i = %d, j = %d\n", i, j);
   func1();
   printf("[back in main] i = %d, j = %d\n", i, j);
}
The results of compiling and executing scope2
...
reader@hacking:~/booksrc $ gcc scope2
...
/a
...
In this case, the compiler prefers to use the local variable
...
The global variable j is just stored in memory, and every function is able to access that memory
...
Printing the memory addresses of these variables will give a clearer picture of what's going on
...
c example code below, the variable addresses are printed using the unary address-of operator
...
c #include
...
void func3() {
   int i = 11, j = 999; // Here, j is a local variable of func3()
...
c are as follows
...
c reader@hacking:~/booksrc $
...
out
[in main] i @ 0xbffff834 = 3
[in main] j @ 0x08049988 = 42
[in func1] i @ 0xbffff814 = 5
[in func1] j @ 0x08049988 = 42
[in func2] i @ 0xbffff7f4 = 7
[in func2] j @ 0x08049988 = 42
[in func2] setting j = 1337
[in func3] i @ 0xbffff7d4 = 11
[in func3] j @ 0xbffff7d0 = 999
[back in func2] i @ 0xbffff7f4 = 7
[back in func2] j @ 0x08049988 = 1337
[back in func1] i @ 0xbffff814 = 5
[back in func1] j @ 0x08049988 = 1337
[back in main] i @ 0xbffff834 = 3
[back in main] j @ 0x08049988 = 1337
reader@hacking:~/booksrc $
In this output, it is obvious that the variable j used by func3() is different than the j used by the other functions
...
Also, notice that the variable i is actually at a different memory address for each function
...
Then the backtrace command shows the record of each function call on the stack
...
c reader@hacking:~/booksrc $ gdb -q
...
out Using host libthread_db library "/lib/tls/i686/cmov/libthread_db
...
1"
...
h> 2 3 int j = 42; // j is a global variable
...
(gdb) run Starting program: /home/reader/booksrc/a
...
c:7 7 printf("\t\t\t[in func3] i @ 0x%08x = %d\n", &i, i); (gdb) bt #0 func3 () at scope3
...
c:17 #2 0x0804849f in func1 () at scope3
...
c:35 (gdb)
The backtrace also shows the nested function calls by looking at records kept on the stack
...
Each line in the backtrace corresponds to a stack frame
...
The local variables contained in each stack frame can be shown in GDB by adding the word full to the backtrace command
...
c:7 i = 11 j = 999 #1 0x0804841d in func2 () at scope3
...
c:26 i = 5 #3 0x0804852b in main () at scope3
...
The global version of the variable j is used in the other functions' contexts
...
Similar to global variables, a static variable remains intact between function calls; however, static variables are also akin to local variables since they remain local within a particular function context
...
The code in static
...
static
...
h> void function() { // An example function, with its own context int var = 5; static int static_var = 5; // Static variable initialization printf("\t[in function] var = %d\n", var); printf("\t[in function] static_var = %d\n", static_var); var++; // Add one to var
...
} int main() { // The main function, with its own context int i; static int static_var = 1337; // Another static, in a different context for(i=0; i < 5; i++) { // Loop 5 times
...
} }
The aptly named static_var is defined as a static variable in two places: within the context of main() and within the context of function()
...
The function simply prints the values of the two variables in its context and then adds 1 to both of them
...
reader@hacking:~/booksrc $ gcc static
...
/a
...
This is because static variables retain their values, but also because they are only initialized once
...
Once again, printing the addresses of these variables with the unary address-of operator will provide greater visibility into what's really going on
...
c for an example
...
c #include
...
static_var++; // Add 1 to static_var
...
} }
The results of compiling and executing static2
...
reader@hacking:~/booksrc $ gcc static2
...
/a
...
You may have noticed that the local variables all have very high memory addresses, like 0xbffff814, while the global and static variables all have very low memory addresses, like 0x0804968c and 0x08049688
...
Read on for your answers
...
Each segment represents a special portion of memory that is set aside for a certain purpose
...
This is where the assembled machine language instructions of the program are located
...
As a program executes, the EIP is set to the first instruction in the text segment
...
1. Reads the instruction that EIP is pointing to
...
3. Executes the instruction that was read in step 1
...
The processor doesn't care about the change, because it's expecting the execution to be nonlinear anyway
...
Write permission is disabled in the text segment, as it is not used to store variables, only code
...
Another advantage of this segment being read-only is that it can be shared among different copies of the program, allowing multiple executions of the program at the same time without any problems
...
The data and bss segments are used to store global and static program variables
...
Although these segments are writable, they also have a fixed size
...
Both global and static variables are able to persist because they are stored in their own memory segments
...
Blocks of memory in this segment can be allocated and used for whatever the programmer might need
...
All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of memory in the heap for use and remove reservations to allow that portion of memory to be reused for later reservations
...
This means a programmer using the heap allocation functions can reserve and free memory on the fly
...
The stack segment also has variable size and is used as a temporary scratch pad to store local function variables and context during function calls
...
When a program calls a function, that function will have its own set of passed variables, and the function's code will be at a different memory location in the text (or code) segment
...
All of this information is stored together on the stack in what is collectively called a stack frame
...
In general computer science terms, a stack is an abstract data structure that is used frequently
...
Think of it as putting beads on a piece of string that has a knot on one end—you can't get the first bead off until you have removed all the other beads
...
As the name implies, the stack segment of memory is, in fact, a stack data structure, which contains stack frames
...
Since this is very dynamic behavior, it makes sense that the stack is also not of a fixed size
...
The FILO nature of a stack might seem odd, but since the stack is used to store context, it's very useful
...
The EBP register, sometimes called the frame pointer (FP) or local base (LB) pointer, is used to reference local function variables in the current stack frame
...
The SFP is used to restore EBP to its previous value, and the return address is used to restore EIP to the next instruction found after the function call
...
The following stack_example
...
Memory Segmentation stack_example
...
The local variables for the function include a single character called flag and a 10-character buffer called buffer
...
After compiling the program, its inner workings can be examined with GDB
...
The main() function starts at 0x08048357 and test_function() starts at 0x08048344
...
These instructions are collectively called the procedure prologue or function prologue
...
Sometimes the function prologue will handle some stack alignment as well
...
reader@hacking:~/booksrc $ gcc -g stack_example
...
/a
...
so
...
(gdb) disass main
Dump of assembler code for function main():
0x08048357 <main+0>:    push   ebp
0x08048358 <main+1>:    mov    ebp,esp
0x0804835a <main+3>:    sub    esp,0x18
0x0804835d <main+6>:    and    esp,0xfffffff0
0x08048360 <main+9>:    mov    eax,0x0
0x08048365 <main+14>:   sub    esp,eax
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
0x08048386 <main+47>:   call   0x8048344 <test_function>
0x0804838b <main+52>:   leave
0x0804838c <main+53>:   ret
End of assembler dump.
(gdb) disass test_function()
Dump of assembler code for function test_function:
0x08048344 <test_function+0>:   push   ebp
0x08048345 <test_function+1>:   mov    ebp,esp
0x08048347 <test_function+3>:   sub    esp,0x28
0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>:  leave
0x08048356 <test_function+18>:  ret
End of assembler dump.
(gdb)
When the program is run, the main() function is called, which simply calls test_function()
...
When test_function() is called, the function arguments are pushed onto the stack in reverse order (since it's FILO)
...
These values correspond to the variables d, c, b, and a in the function
...
(gdb) disass main
Dump of assembler code for function main:
0x08048357 <main+0>:    push   ebp
0x08048358 <main+1>:    mov    ebp,esp
0x0804835a <main+3>:    sub    esp,0x18
0x0804835d <main+6>:    and    esp,0xfffffff0
0x08048360 <main+9>:    mov    eax,0x0
0x08048365 <main+14>:   sub    esp,eax
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
0x08048386 <main+47>:   call   0x8048344 <test_function>
0x0804838b <main+52>:   leave
0x0804838c <main+53>:   ret
End of assembler dump.
(gdb)
Next, when the assembly call instruction is executed, the return address is pushed onto the stack and the execution flow jumps to the start of test_function() at 0x08048344
...
In this case, the return address would point to the leave instruction in main() at 0x0804838b
...
In this step, the current value of EBP is pushed to the stack
...
The current value of ESP is then copied into EBP to set the new frame pointer
...
Memory is saved for these variables by subtracting from ESP
...
We can watch the stack frame construction on the stack using GDB
...
GDB will put the first breakpoint before the function arguments are pushed to the stack, and the second breakpoint after test_function()'s procedure prologue
...
(gdb) list main
4
5          flag = 31337;
6          buffer[0] = 'A';
7       }
8
9       int main() {
10         test_function(1, 2, 3, 4);
11      }
(gdb) break 10
Breakpoint 1 at 0x8048367: file stack_example
...
(gdb) break test_function Breakpoint 2 at 0x804834a: file stack_example
...
This breakpoint is right before the stack frame for the test_function() call is created
...
The next breakpoint is right after the procedure prologue for test_function(), so continuing will build the stack frame
...
The local variables (flag and buffer) are referenced relative to the frame pointer (EBP)
...
Breakpoint 2, test_function (a=1, b=2, c=3, d=4) at stack_example
...
The stack frame is shown on the stack at the end
...
Above that is the saved frame pointer of 0xbffff808, which is what EBP was in the previous stack frame
...
Calculating their addresses relative to EBP shows their exact locations in the stack frame
...
The extra space in the stack frame is just padding
...
If another function were called within the function, another stack frame would be pushed onto the stack, and so on
...
This behavior is the reason this segment of memory is organized in a FILO data structure
...
Since most people are familiar with seeing numbered lists that count downward, the smaller memory addresses are shown at the top
...
Most debuggers also display memory in this style, with the smaller memory addresses at the top and the higher ones at the bottom
...
This minimizes wasted space, allowing the stack to be larger if the heap is small and vice versa
...
Memory Segments in C
In C, as in other compiled languages, the compiled code goes into the text segment, while the variables reside in the remaining segments
...
Variables that are defined outside of any functions are considered to be global
...
If static or global variables are initialized with data, they are stored in the data memory segment; otherwise, these variables are put in the bss memory segment
...
Usually, pointers are used to reference memory on the heap
...
Since the stack can contain many different stack frames, stack variables can maintain uniqueness within different functional contexts
...
c program will help explain these concepts in C
...
c #include
...
int stack_var; // Notice this variable has the same name as the one in main()
...
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var); printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var); // These variables are in the bss segment
...
printf("heap_var is at address 0x%08x\n\n", heap_var_ptr); // These variables are in the stack segment
...
The global and static variables are declared as described earlier, and initialized counterparts are also declared
...
The heap variable is actually declared as an integer pointer, which will point to memory allocated on the heap memory segment
...
Since the newly allocated memory could be of any data type, the malloc() function returns a void pointer, which needs to be typecast into an integer pointer
...
c reader@hacking:~/booksrc $
...
out
global_initialized_var is at address 0x080497ec
static_initialized_var is at address 0x080497f0

static_var is at address 0x080497f8
global_var is at address 0x080497fc

heap_var is at address 0x0804a008

stack_var is at address 0xbffff834
the function's stack_var is at address 0xbffff814
reader@hacking:~/booksrc $
The first two initialized variables have the lowest memory addresses, since they are located in the data memory segment
...
These memory addresses are slightly larger than the previous variables' addresses, since the bss segment is located below the data segment
...
The heap variable is stored in space allocated on the heap segment, which is located just below the bss segment
...
Finally, the last two stack_vars have very large memory addresses, since they are located in the stack segment
...
This allows both memory segments to be dynamic without wasting space in memory
...
The second stack_var in function() has its own unique context, so that variable is stored within a different stack frame in the stack segment
...
Since the stack grows back up toward the heap segment with each new stack frame, the memory address for the second stack_var (0xbffff814) is smaller than the address for the first stack_var (0xbffff834) found within main()'s context
...
However, using the heap requires a bit more effort
...
This function accepts a size as its only argument and reserves that much space in the heap segment, returning the address to the start of this memory as a void pointer
...
The corresponding deallocation function is free()
...
These relatively simple functions are demonstrated in heap_example
...
heap_example
...
h>
#include
...
h> int main(int argc, char *argv[]) { char *char_ptr; // A char pointer int *int_ptr; // An integer pointer int mem_size; if (argc < 2) // If there aren't command-line arguments, mem_size = 50; // use 50 as the default value
...
\n"); exit(-1); } strcpy(char_ptr, "This is memory is located on the heap
...
\n"); exit(-1); } *int_ptr = 31337; // Put the value of 31337 where int_ptr is pointing
...
\n"); free(char_ptr); // Freeing heap memory printf("\t[+] allocating another 15 bytes for char_ptr\n"); char_ptr = (char *) malloc(15); // Allocating more heap memory if(char_ptr == NULL) { // Error checking, in case malloc() fails fprintf(stderr, "Error: could not allocate heap memory
...
\n"); free(int_ptr); // Freeing heap memory printf("\t[-] freeing char_ptr's heap memory
...
Then it uses the malloc() and free() functions to allocate and deallocate memory on the heap
...
Since malloc() doesn't know what type of memory it's allocating, it returns a void pointer to the newly allocated heap memory, which must be typecast into the appropriate type
...
If the allocation fails and the pointer is NULL, fprintf() is used to print an error message to standard error and the program exits
...
This function will be explained more later, but for now, it's just used as a way to properly display an error
...
reader@hacking:~/booksrc $ gcc -o heap_example heap_example
...
/heap_example [+] allocating 50 bytes of memory on the heap for char_ptr char_ptr (0x804a008) --> 'This is memory is located on the heap
...
[+] allocating another 15 bytes for char_ptr char_ptr (0x804a050) --> 'new memory' [-] freeing int_ptr's heap memory
...
reader@hacking:~/booksrc $
In the preceding output, notice that each block of memory has an incrementally higher memory address in the heap
...
The heap allocation functions control this behavior, which can be explored by changing the size of the initial memory allocation
...
/heap_example 100 [+] allocating 100 bytes of memory on the heap for char_ptr char_ptr (0x804a008) --> 'This is memory is located on the heap
...
[+] allocating another 15 bytes for char_ptr char_ptr (0x804a008) --> 'new memory' [-] freeing int_ptr's heap memory
...
reader@hacking:~/booksrc $
If a larger block of memory is allocated and then deallocated, the final 15-byte allocation will occur in that freed memory space, instead
...
Often, simple informative printf() statements and a little experimentation can reveal many things about the underlying system
...
c, there were several error checks for the malloc() calls
...
But with multiple malloc() calls, this error-checking code needs to appear in multiple places
...
Since all the error-checking code is basically the same for every malloc() call, this is a perfect place to use a function instead of repeating the same instructions in multiple places
...
c for an example
...
c #include
...
h> #include
...
else mem_size = atoi(argv[1]); printf("\t[+] allocating %d bytes of memory on the heap for char_ptr\n", mem_size); char_ptr = (char *) errorchecked_malloc(mem_size); // Allocating heap memory strcpy(char_ptr, "This is memory is located on the heap
...
The errorchecked_heap
...
c code, except the heap memory allocation and error checking has been gathered into a single function
...
This lets the compiler know that there will be a function called errorchecked_malloc() that expects a single, unsigned integer argument and returns a void pointer
...
The function itself is quite simple; it just accepts the size in bytes to allocate and attempts to allocate that much memory using malloc()
...
This way, the custom errorchecked_malloc() function can be used in place of a normal malloc(), eliminating the need for repetitious error checking afterward
...
Building on Basics
Once you understand the basic concepts of C programming, the rest is pretty easy
...
In fact, if the functions were removed from any of the preceding programs, all that would remain are very basic statements
...
File descriptors use a set of low-level I/O functions, and filestreams are a higher-level form of buffered I/O that is built on the lower-level functions
...
In this book, the focus will be on the low-level I/O functions that use file descriptors
...
Because this number is unique among the other books in a bookstore, the cashier can scan the number at checkout and use it to reference information about this book in the store's database
...
Four common functions that use file descriptors are open(), close(), read(), and write()
...
The open() function opens a file for reading and/or writing and returns a file descriptor
...
The file descriptor is passed as an argument to the other functions like a pointer to the opened file
...
The read() and write() functions' arguments are the file descriptor, a pointer to the data to read or write, and the number of bytes to read or write from that location
...
These flags and their usage will be explained in depth later, but for now let's take a look at a simple note-taking program that uses file descriptors: simplenote
...
This program accepts a note as a command-line argument and then adds it to the end of the file /tmp/notes
...
Other functions are used to display a usage message and to handle fatal errors
...
simplenote
...
h> #include
...
h> #include
...
h>

void usage(char *prog_name, char *filename) {
   printf("Usage: %s <data to add to %s>\n", prog_name, filename);
   exit(0);
}

void fatal(char *);            // A function for fatal errors
void *ec_malloc(unsigned int); // An error-checked malloc() wrapper

int main(int argc, char *argv[]) {
   int fd; // file descriptor
   char *buffer, *datafile;

   buffer = (char *) ec_malloc(100);
   datafile = (char *) ec_malloc(20);
   strcpy(datafile, "/tmp/notes");

   if(argc < 2)                 // If there aren't command-line arguments,
      usage(argv[0], datafile); // display usage message and exit
...
printf("[DEBUG] buffer @ %p: \'%s\'\n", buffer, buffer); printf("[DEBUG] data file @ %p: \'%s\'\n", datafile, datafile); strncat(buffer, "\n", 1); // Add a newline on the end
...
\n");
   free(buffer);
   free(datafile);
}

// A function to display an error message and then exit
void fatal(char *message) {
   char error_message[100];

   strcpy(error_message, "[!!] Fatal Error ");
   strncat(error_message, message, 82); // 17-byte prefix + 82 bytes + null terminator = 100
   perror(error_message);
   exit(-1);
}

// An error-checked malloc() wrapper function
void *ec_malloc(unsigned int size) {
   void *ptr;
   ptr = malloc(size);
   if(ptr == NULL)
      fatal("in ec_malloc() on memory allocation");
   return ptr;
}
Besides the strange-looking flags used in the open() function, most of this code should be readable
...
The strlen() function accepts a string and returns its length
...
The perror() function is short for print error and is used in fatal() to print an additional error message (if it exists) before exiting
...
c reader@hacking:~/booksrc $
...
/simplenote reader@hacking:~/booksrc $
...
reader@hacking:~/booksrc $ cat /tmp/notes this is a test note reader@hacking:~/booksrc $
...
reader@hacking:~/booksrc $ cat /tmp/notes this is a test note great, it works reader@hacking:~/booksrc $
The output of the program's execution is pretty self-explanatory, but there are some things about the source code that need further explanation
...
h and sys/stat
...
The first set of flags is found in fcntl
...
The access mode must be exactly one of the following three flags: O_RDONLY Open file for read-only access
...
O_RDWR Open file for both read and write access
...
A few of the more common and useful of these flags are as follows: O_APPEND Write data at the end of the file
...
O_CREAT Create the file if it doesn't exist
...
When two bits enter an OR gate, the result is 1 if either the first bit or the second bit is 1
...
Full 32-bit values can use these bitwise operators to perform logic operations on each corresponding bit
...
c and the program output demonstrate these bitwise operations
...
c #include
...
bit_b = (i & 1); // Get the first bit
...
bit_b = (i & 1); // Get the first bit
...
c are as follows
...
c reader@hacking:~/booksrc $
...
out
bitwise OR operator |
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1

bitwise AND operator &
0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1
reader@hacking:~/booksrc $
The flags used for the open() function have values that correspond to single bits
...
The fcntl_flags
...
h and how they combine with each other
...
c #include
...
h> void display_flags(char *, unsigned int); void binary_print(unsigned int); int main(int argc, char *argv[]) { display_flags("O_RDONLY\t\t", O_RDONLY); display_flags("O_WRONLY\t\t", O_WRONLY); display_flags("O_RDWR\t\t\t", O_RDWR); printf("\n"); display_flags("O_APPEND\t\t", O_APPEND); display_flags("O_TRUNC\t\t\t", O_TRUNC); display_flags("O_CREAT\t\t\t", O_CREAT);
printf("\n"); display_flags("O_WRONLY|O_APPEND|O_CREAT", O_WRONLY|O_APPEND|O_CREAT); } void display_flags(char *label, unsigned int value) { printf("%s\t: %d\t:", label, value); binary_print(value); printf("\n"); } void binary_print(unsigned int value) { unsigned int mask = 0xff000000; // Start with a mask for the highest byte
...
if(byte & 0x80) // If the highest bit in the byte isn't 0, printf("1"); // print a 1
...
byte *= 2; // Move all the bits to the left by 1
...
shift /= 256; // Move the bits in shift right by 8
...
c are as follows
...
c reader@hacking:~/booksrc $
...
out
O_RDONLY                  : 0    : 00000000 00000000 00000000 00000000
O_WRONLY                  : 1    : 00000000 00000000 00000000 00000001
O_RDWR                    : 2    : 00000000 00000000 00000000 00000010

O_APPEND                  : 1024 : 00000000 00000000 00000100 00000000
O_TRUNC                   : 512  : 00000000 00000000 00000010 00000000
O_CREAT                   : 64   : 00000000 00000000 00000000 01000000

O_WRONLY|O_APPEND|O_CREAT : 1089 : 00000000 00000000 00000100 01000001
$
Using bit flags in combination with bitwise logic is an efficient and commonly used technique
...
In fcntl_flags
...
This technique only works when all the bits are unique, though
...
This argument uses bit flags defined in sys/stat
...
S_IRUSR Give the file read permission for the user (owner)
...
S_IXUSR Give the file execute permission for the user (owner)
...
S_IWGRP Give the file write permission for the group
...
S_IROTH Give the file read permission for other (anyone)
...
S_IXOTH Give the file execute permission for other (anyone)
...
If they don't make sense, here's a crash course in Unix file permissions
...
These values can be displayed using ls -l and are shown below in the following output
...
c reader@hacking:~/booksrc $
For the /etc/passwd file, the owner is root and the group is also root
...
Read, write, and execute permissions can be turned on and off for three different fields: user, group, and other
...
These fields are also displayed in the front of the ls -l output
...
The next three characters display the group permissions, and the last three characters are for the other permissions
...
Each permission corresponds to a bit flag; read is 4 (100 in binary), write is 2 (010 in binary), and execute is 1 (001 in binary)
...
These values can be added together to define permissions for user, group, and other using the chmod command
...
c reader@hacking:~/booksrc $ ls -l simplenote
...
c reader@hacking:~/booksrc $ chmod ugo-wx simplenote
...
c -r-------- 1 reader reader 1826 2007-09-07 02:51 simplenote
...
c reader@hacking:~/booksrc $ ls -l simplenote
...
c
reader@hacking:~/booksrc $
The first command (chmod 731) gives read, write, and execute permissions to the user, since the first number is 7 (4 + 2 + 1), write and execute permissions to group, since the second number is 3 (2 + 1), and only execute permission to other, since the third number is 1
...
In the next chmod command, the argument ugo-wx means Subtract write and execute permissions from user, group, and other
...
In the simplenote program, the open() function uses S_IRUSR|S_IWUSR for its additional permission argument, which means the /tmp/notes file should only have user read and write permission when it is created
...
This user ID can be displayed using the id command
...
The su command can be used to switch to a different user, and if this command is run as root, it can be done without a password
...
On the LiveCD, sudo has been configured so it can be executed without a password, for simplicity's sake
...
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $ id
uid=501(jose) gid=501(jose) groups=501(jose)
jose@hacking:/home/reader/booksrc $
As the user jose, the simplenote program will run as jose if it is executed, but it won't have access to the /tmp/notes file
...
jose@hacking:/home/reader/booksrc $ ls -l /tmp/notes -rw------- 1 reader reader 36 2007-09-07 05:20 /tmp/notes jose@hacking:/home/reader/booksrc $
...
For example, the /etc/passwd file contains account information for every user on the system, including each user's default login shell
...
This program needs to be able to make changes to the /etc/passwd file, but only on the line that pertains to the current user's account
...
This is an additional file permission bit that can be set using chmod
...
The chsh program has the setuid flag set, which is indicated by an s in the ls output above
...
The /etc/passwd file that chsh writes to is also owned by root and only allows the owner to write to it
...
This means that a running program has both a real user ID and an effective user ID
...
c
...
c #include
...
c are as follows
...
c reader@hacking:~/booksrc $ ls -l uid_demo -rwxr-xr-x 1 reader reader 6825 2007-09-07 05:32 uid_demo reader@hacking:~/booksrc $
...
/uid_demo reader@hacking:~/booksrc $ ls -l uid_demo -rwxr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo reader@hacking:~/booksrc $
...
c, both user IDs are shown to be 999 when uid_demo is executed, since 999 is the user ID for reader
...
The program can still be executed, since it has execute permission for other, and it shows that both user IDs remain 999, since that's still the ID of the user
...
/uid_demo chmod: changing permissions of `
...
/uid_demo reader@hacking:~/booksrc $ ls -l uid_demo -rwsr-xr-x 1 root root 6825 2007-09-07 05:32 uid_demo reader@hacking:~/booksrc $
...
The chmod u+s command turns on the setuid permission, which can be seen in the following ls -l output
...
This is how the chsh program is able to allow any user to change his or her login shell stored in /etc/passwd
...
The next program will be a modification of the simplenote program; it will also record the user ID of each note's original author
...
The ec_malloc() and fatal() functions have been useful in many of our programs
...
hacking
...
h, the functions can just be included
...
If the filename is surrounded by quotes, the compiler looks in the current directory
...
h is in the same directory as a program, it can be included with that program by typing #include "hacking
...
The changed lines for the new notetaker program (notetaker
...
// Closing file if(close(fd) == -1) fatal("in main() while closing file"); printf("Note has been saved
...
The getuid() function is used to get the real user ID, which is written to the datafile on the line before the note's line is written
...
reader@hacking:~/booksrc $ gcc -o notetaker notetaker
...
/notetaker reader@hacking:~/booksrc $ sudo chmod u+s
...
/notetaker -rwsr-xr-x 1 root root 9015 2007-09-07 05:48
...
/notetaker "this is a test of multiuser notes" [DEBUG] buffer @ 0x804a008: 'this is a test of multiuser notes' [DEBUG] datafile @ 0x804a070: '/var/notes' [DEBUG] file descriptor is 3 Note has been saved
...
Now when the program is executed, the program runs as the root user, so the file /var/notes is also owned by root when it is created
...
this is a t| 00000010 65 73 74 20 6f 66 20 6d 75 6c 74 69 75 73 65 72 |est of multiuser| 00000020 20 6e 6f 74 65 73 0a | notes
...
Because of little-endian architecture, the 4 bytes of the integer 999 appear reversed in hexadecimal (shown in bold above)
...
The notesearch
...
Additionally, an optional command-line argument can be supplied for a search string
...
int search_note(char *, char *); // Search for keyword function
...
userid = getuid(); fd = open(FILENAME, O_RDONLY); // Open the file for read-only access
...
int print_notes(int fd, int uid, char *searchstring) { int note_length; char byte=0, note_buffer[100]; note_length = find_user_note(fd, uid); if(note_length == -1) // If end of file reached, return 0; // return 0
...
note_buffer[note_length] = 0; // Terminate the string
...
return 1; } // A function to find the next note for a given userID; // returns -1 if the end of the file is reached; // otherwise, it returns the length of the found note
...
if(read(fd, &note_uid, 4) != 4) // Read the uid data
...
int search_note(char *note, char *keyword) { int i, keyword_length, match=0; keyword_length = strlen(keyword); if(keyword_length == 0) // If there is no search string, return 1; // always "match"
...
if(note[i] == keyword[match]) // If byte matches keyword, match++; // get ready to check the next byte; else { // otherwise, if(note[i] == keyword[0]) // if that byte matches first keyword byte, match = 1; // start the match count at 1
...
} if(match == keyword_length) // If there is a full match, return 1; // return matched
...
}
Most of this code should make sense, but there are some new concepts
...
Also, the function lseek() is used to rewind the read position in the file
...
Since this turns out to be a negative number, the position is moved backward by length bytes
...
c reader@hacking:~/booksrc $ sudo chown root:root
...
/notesearch reader@hacking:~/booksrc $
...
But this is just a single user; what happens if a different user uses the notetaker and notesearch programs?
reader@hacking:~/booksrc $ sudo su jose
jose@hacking:/home/reader/booksrc $
...
jose@hacking:/home/reader/booksrc $
...
This means that value is added to all notes written with notetaker, and only notes with a matching user ID will be displayed by the notesearch program
...
/notetaker "This is another note for the reader user" [DEBUG] buffer @ 0x804a008: 'This is another note for the reader user' [DEBUG] datafile @ 0x804a070: '/var/notes' [DEBUG] file descriptor is 3 Note has been saved
...
/notesearch
[DEBUG] found a 34 byte note for user id 999
this is a test of multiuser notes
[DEBUG] found a 41 byte note for user id 999
This is another note for the reader user
-------[ end of note data ]-------
reader@hacking:~/booksrc $
Similarly, all notes for the user reader have the user ID 999 attached to them
...
This is very similar to how the /etc/passwd file stores user information for all users, yet programs like chsh and passwd allow any user to change his own shell or password
...
In C, structs are variables that can contain many other variables
...
A simple example will suffice for now
...
h
...
struct tm {
   int tm_sec;   /* seconds */
   int tm_min;   /* minutes */
   int tm_hour;  /* hours */
   int tm_mday;  /* day of the month */
   int tm_mon;   /* month */
   int tm_year;  /* year */
   int tm_wday;  /* day of the week */
   int tm_yday;  /* day in the year */
   int tm_isdst; /* daylight saving time */
};
After this struct is defined, struct tm becomes a usable variable type, which can be used to declare variables and pointers with the data type of the tm struct
...
c program demonstrates this
...
h is included, the tm struct is defined, which is later used to declare the current_time and time_ptr variables
...
c #include
...
h>
int main() {
   long int seconds_since_epoch;
   struct tm current_time, *time_ptr;
   int hour, minute, second, day, month, year;
   seconds_since_epoch = time(0); // Pass time a null pointer as argument
...
   localtime_r(&seconds_since_epoch, time_ptr);
   // Three different ways to access struct elements:
   hour = current_time
...
Time on Unix systems is kept relative to this rather arbitrary point in time, which is also known as the epoch
...
The pointer time_ptr has already been set to the address of current_time, an empty tm struct
...
The elements of structs can be accessed in three different ways; the first two are the proper ways to access struct elements, and the third is a hacked solution
...
Therefore, current_time
...
Pointers to structs are often used, since it is much more efficient to pass a four-byte pointer (the pointer size on this 32-bit system) than an entire data structure
...
When using a struct pointer like time_ptr, struct elements can be similarly accessed by the struct element's name, but using a series of characters that looks like an arrow pointing right (->)
...
The seconds could be accessed via either of these proper methods, using the tm_sec element of the tm struct, but a third method is used
...
c reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311588
Current time is: 04:19:48
reader@hacking:~/booksrc $
...
out
time() - seconds since epoch: 1189311600
Current time is: 04:20:00
reader@hacking:~/booksrc $
The program works as expected, but how are the seconds being accessed in the tm struct? Remember that in the end, it's all just memory
...
In the line second = *((int *) time_ptr), the variable time_ptr is typecast from a tm struct pointer to an integer pointer
...
Since the address of the tm struct also points to the first element of this struct, this will retrieve the integer value for tm_sec in the struct
...
c code (time_example2
...
This shows that the elements of the tm struct are right next to each other in memory
...