By Robert Portvliet.
This is the second blog post in a four part series. In the first post, we reviewed the structure of a simple C program. In this installment, we will cover disassembling this program, and reviewing the Assembly code generated by the compiler, GCC.
First, let’s once again post our source code, just for reference purposes:
#include <stdio.h>
#include <string.h> /* declares strlen() and strcpy() */

void func(char *ptr)
{
    char buf[10];
    printf("copy %d bytes of data to buf\n", strlen(ptr));
    strcpy(buf, ptr);
}

int main(int argc, char **argv)
{
    printf("Passing user input to func()\n");
    func(argv[1]);
    return 0;
}
Next, let’s compile it using GCC. I’m going to include the “-fno-stack-protector” switch, to avoid adding a stack canary. This will simplify things, and allow us to walk through a simple stack-based buffer overflow in the third blog post of our series.
Our command line is:
gcc -o basic basic.c -fno-stack-protector
This command takes the basic.c source code as input and outputs the compiled program ‘basic’. We built it; now let’s immediately take it apart :)
main()
To disassemble our program, we’ll use objdump.
objdump -M intel -d basic | grep -A 15 main.:
Here, we’re disassembling the ‘basic’ program, specifying Intel syntax, and piping the output to grep, where we want the 15 lines that follow the match for “main.:”.
That gives us the following:
080484bc <main>:
80484bc: 55 push ebp
80484bd: 89 e5 mov ebp,esp
80484bf: 83 e4 f0 and esp,0xfffffff0
80484c2: 83 ec 10 sub esp,0x10
80484c5: c7 04 24 ce 85 04 08 mov DWORD PTR [esp],0x80485ce
80484cc: e8 e7 fe ff ff call 80483b8 <puts@plt>
80484d1: 8b 45 0c mov eax,DWORD PTR [ebp+0xc]
80484d4: 83 c0 04 add eax,0x4
80484d7: 8b 00 mov eax,DWORD PTR [eax]
80484d9: 89 04 24 mov DWORD PTR [esp],eax
80484dc: e8 a3 ff ff ff call 8048484 <func>
80484e1: b8 00 00 00 00 mov eax,0x0
80484e6: c9 leave
80484e7: c3 ret
So, let’s look at what’s going on here.
1. 80484bc: 55 push ebp
2. 80484bd: 89 e5 mov ebp,esp
These first two lines are the function prologue. The first pushes EBP, the base pointer, onto the stack. The second line copies ESP, the existing stack pointer from the previous stack frame, into EBP.
3. 80484bf: 83 e4 f0 and esp,0xfffffff0
The next line aligns the stack to a 16-byte boundary. This is another instruction added by the compiler.
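The masking trick is easy to check by hand. Below is a minimal sketch in C, using a hypothetical helper named align_down_16() and made-up stack pointer values, of what ‘and esp,0xfffffff0’ does to a number: it clears the low four bits, which rounds the value down to the nearest multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring `and esp,0xfffffff0`:
 * clearing the low four bits rounds the value down to a multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & 0xfffffff0u;
    /* Sanity check: the result is 16-byte aligned and never above the input. */
    assert(aligned % 16 == 0 && aligned <= esp);
    return aligned;
}
```

For example, a stack pointer of 0xbffff4a8 would become 0xbffff4a0, while one that is already aligned is left unchanged. Because the stack grows towards lower addresses, rounding down never steps on data that is already there.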
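4. 80484c2: 83 ec 10 sub esp,0x10
The fourth line, “sub esp,0x10”, allocates 16 bytes of space on the stack. Remember that the stack grows towards lower memory addresses, so allocations will use ‘sub’. Note that main() has no local variables of its own (‘buf’ lives in func()), so this space is used for the arguments main() hands to the functions it calls, as we’ll see in the next few lines.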
5. 80484c5: c7 04 24 ce 85 04 08 mov DWORD PTR [esp],0x80485ce
The fifth line moves the memory address 0x80485ce into the location pointed to by ESP. It denotes this as being a DWORD, or double word, which is 4 bytes (32 bits). In this case, 0x80485ce is the address of the string "Passing user input to func()\n", which is being placed here as the argument for the puts() function.
Note: Whenever you see brackets around something in assembly, such as we see with ESP here, it refers to the value (the actual data) at the memory address being pointed to (in this case by ESP). This is called dereferencing a pointer.
We’re going to see this pattern of something being written to the memory address pointed to by ESP right before a function is called a few more times before we’re done. This is the equivalent of pushing a value onto the stack so the function can work with it.
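To make that equivalence concrete, here is a small sketch that models the stack as an array of 4-byte slots. The toy_stack type and its helper functions are invented for this illustration and are not part of the program we are disassembling. It shows that ‘push’ and the ‘sub esp’ plus ‘mov [esp]’ pair leave the stack in the same state.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy model: esp is an index into mem[], and the stack grows towards index 0. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
} toy_stack;

/* push v: make room for one slot, then store v in it. */
void toy_push(toy_stack *s, uint32_t v)
{
    assert(s->esp > 0);
    s->esp -= 1;
    s->mem[s->esp] = v;
}

/* sub esp,N: reserve N slots up front without writing anything. */
void toy_reserve(toy_stack *s, uint32_t words)
{
    assert(words <= s->esp);
    s->esp -= words;
}

/* mov DWORD PTR [esp],v: overwrite the slot ESP points to; ESP does not move. */
void toy_store_top(toy_stack *s, uint32_t v)
{
    assert(s->esp < STACK_WORDS);
    s->mem[s->esp] = v;
}
```

Starting from two empty toy stacks, toy_push(&a, 0x80485ce) and toy_reserve(&b, 1) followed by toy_store_top(&b, 0x80485ce) end with the same ESP and the same value on top. The difference is that a later toy_store_top() reuses the slot instead of growing the stack, which is why main() can write to [esp] once for puts() and again for func() after a single ‘sub esp,0x10’.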
6. 80484cc: e8 e7 fe ff ff call 80483b8 <puts@plt>
The sixth line makes a call to puts(). You may be wondering why this is, since it was not in our source code. The answer is that it is a compiler (GCC) optimization. If we were to specify -fno-builtin-printf when we compiled the program, we would see printf() being called here instead.
Anyway, back to puts(). Three things happen here. First, ‘call’ puts the address of the next instruction (80484d1) on the stack so the program can return to it after puts() is done executing. Then it jumps to puts(), which finally prints the string "Passing user input to func()\n" to stdout.
7. 80484d1: 8b 45 0c mov eax,DWORD PTR [ebp+0xc]
8. 80484d4: 83 c0 04 add eax,0x4
Line seven moves the value at ebp+0xc (12 bytes ‘down’ the stack from EBP) into EAX, then line eight adds 4 to it. The best way to figure out what that is would be to look at how the stack frame is laid out: EBP points at the saved EBP, EBP+4 holds the return address, EBP+8 holds argc, and EBP+12 holds argv, the pointer to the array of arguments that we are passing in from the command line. However, recall from our first blog post that the first element in the argv array, argv[0], is always the program itself, so we want the second element in the array, argv[1]. Since each element is a 4-byte pointer, adding 4 to EAX leaves it holding the address of argv[1].
Note: It’s helpful to remember that on a 32-bit system each pointer, and each slot on the stack, is 4 bytes (32 bits), or one DWORD (double word). So, each stack slot that we are covering here equates to 4 bytes.
9. 80484d7: 8b 00 mov eax,DWORD PTR [eax]
10. 80484d9: 89 04 24 mov DWORD PTR [esp],eax
11. 80484dc: e8 a3 ff ff ff call 8048484 <func>
The next three lines are best looked at together. Line nine dereferences the pointer in EAX (grabs the value at the memory address that EAX points to) and stores that value in the EAX register itself. This is argv[1], the pointer to our command line argument. Then, line ten moves that value from EAX into the location pointed to by ESP. Lines nine and ten are setting up the argument for func(), and then on line eleven we call func(). This will once again place the address of the next instruction (80484e1) on the stack, execute func(), and then return to that address, 80484e1, when func() is done. That leaves only lines 12-14, which are basically clean up.
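The way main() fetches argv[1] before calling func() can be mirrored step by step in C. This is only a sketch: load_argv1() is a hypothetical helper, and the local variable is named eax purely to line up with the register in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that follows the same steps main() uses to fetch argv[1]. */
char *load_argv1(char **argv)
{
    assert(argv != NULL);
    char *eax = (char *)argv;   /* mov eax,[ebp+0xc]: EAX = argv             */
    eax += sizeof(char *);      /* add eax,0x4: step over argv[0] (4 bytes   */
                                /* on 32-bit x86, hence the 0x4)             */
    return *(char **)eax;       /* mov eax,[eax]: dereference to get argv[1] */
}
```

The returned pointer is the same one that plain C would give for argv[1], and it is the value that gets written to [esp] as the argument for func().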
12. 80484e1: b8 00 00 00 00 mov eax,0x0
13. 80484e6: c9 leave
14. 80484e7: c3 ret
In the first, it zeros out the EAX register, which holds the function’s return value (our ‘return 0;’). In the next, it invokes ‘leave’, which is basically a shortcut for the function epilogue, and equates to the following:
mov esp, ebp
pop ebp
Here we collapse our stack frame by moving EBP into ESP, and then pop the saved EBP off the stack. Finally, ‘ret’ pops the return address off the stack and returns to it, handing control back to the caller.
func()
Now, let’s take a look at the func() function. Run the following to disassemble basic and grep for the next 20 lines following “func.:”
objdump -M intel -d basic | grep -A20 func.:
08048484 <func>:
8048484: 55 push ebp
8048485: 89 e5 mov ebp,esp
8048487: 83 ec 28 sub esp,0x28
804848a: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
804848d: 89 04 24 mov DWORD PTR [esp],eax
8048490: e8 f3 fe ff ff call 8048388 <strlen@plt>
8048495: 89 c2 mov edx,eax
8048497: b8 b0 85 04 08 mov eax,0x80485b0
804849c: 89 54 24 04 mov DWORD PTR [esp+0x4],edx
80484a0: 89 04 24 mov DWORD PTR [esp],eax
80484a3: e8 00 ff ff ff call 80483a8 <printf@plt>
80484a8: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
80484ab: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
80484af: 8d 45 ee lea eax,[ebp-0x12]
80484b2: 89 04 24 mov DWORD PTR [esp],eax
80484b5: e8 de fe ff ff call 8048398 <strcpy@plt>
80484ba: c9 leave
80484bb: c3 ret
Ok, let’s get started...
1. 8048484: 55 push ebp
2. 8048485: 89 e5 mov ebp,esp
Once again, the first two lines are the function prologue.
3. 8048487: 83 ec 28 sub esp,0x28
Next, 0x28 (40) bytes of space is allocated on the stack. This is where ‘buf’ lives, along with room for the arguments of the functions that func() calls.
4. 804848a: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
Then, the value at EBP+0x08 is loaded into the EAX register. To figure out what that is, let’s refer back to our stack diagram. At EBP+8 (8 bytes ‘down’ the stack from EBP) is the ‘ptr’ pointer variable. So, it is loading the value of ‘ptr’ (brackets mean ‘actual data at location in memory’, remember?) into the EAX register.
Note: The value in ‘ptr’ is argv[1], the address of our command line argument. When main() called func() it passed argv[1] as a parameter.
func(argv[1]);
5. 804848d: 89 04 24 mov DWORD PTR [esp],eax
Here we see it setting up to do something with this data again, as line 5 copies the contents of EAX (argv[1]) into the memory location pointed to by ESP. I smell a function call coming...
6. 8048490: e8 f3 fe ff ff call 8048388 <strlen@plt>
Yup, there it is. So, here we are calling strlen() with ‘ptr’ as an argument. As mentioned in our first blog post, strlen() returns the length of a string. It iterates through the string until it hits a null byte. So, look at the source code below. We’ve just done the strlen(ptr) part. Now, I’m guessing the value we get out of strlen() is headed for printf() to fill in the %d, don’t you?
printf("copy %d bytes of data to buf\n", strlen(ptr));
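For reference, the ‘iterate until a null byte’ behaviour can be sketched in a few lines of C. This is not the C library’s actual strlen(), which is usually heavily optimized; count_until_nul() is a made-up name for an illustration of the same idea.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for strlen(): count bytes until the terminating '\0'. */
size_t count_until_nul(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;   /* the terminator itself is not counted */
}
```

Running our program with the argument AAAA would therefore give a length of 4. Note that nothing here looks at how large the destination buffer is; the count depends only on where the null byte sits in the input.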
Ok, on to the next few lines...
7. 8048495: 89 c2 mov edx,eax
EAX is generally used to contain the output of a function, so the output of strlen() (the length of the string that ‘ptr’ points to) is now in EAX. It’s likely moving it into EDX for the moment, so EAX can be used for something else.
Note: EDX: “The data register is an extension to the accumulator (EAX). It is most useful for storing data related to the accumulator's current calculation.”
8. 8048497: b8 b0 85 04 08 mov eax,0x80485b0
Now that we’ve freed up EAX, we can move the memory address 0x80485b0 into it.
9. 804849c: 89 54 24 04 mov DWORD PTR [esp+0x4],edx
10. 80484a0: 89 04 24 mov DWORD PTR [esp],eax
Now, we’re moving the value in EDX (the output of strlen()) into ESP+4. That’s 4 bytes, or one stack slot, ‘down’ the stack from ESP. Then we move the contents of EAX (the memory address 0x80485b0) into the location pointed to by ESP.
So, two things here: what is now pointed to by ESP is the address of the string "copy %d bytes of data to buf\n", and what is in ESP+4 is the value going into %d. Do you sense a function call being set up here? ;)
11. 80484a3: e8 00 ff ff ff call 80483a8 <printf@plt>
Yup! Here we call printf() and print "copy %d bytes of data to buf\n" to stdout, with the %d being the length (as determined by strlen()) of the argument we provided to the program at runtime.
Now that’s done. We have one more thing to do:
strcpy(buf, ptr);
12. 80484a8: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
Line 12 moves the value at EBP+8, which is ‘ptr’, into EAX.
13. 80484ab: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
Line 13 then moves it from EAX into the location pointed to by ESP+4.
14. 80484af: 8d 45 ee lea eax,[ebp-0x12]
Line 14 gives us a new instruction, LEA. LEA stands for ‘load effective address’, and it does just that. In this case, it loads the memory address EBP-0x12 itself (not the contents stored there) into EAX. That address is 0x12 (18) bytes ‘up’ the stack from EBP, and it is where ‘buf’ begins.
15. 80484b2: 89 04 24 mov DWORD PTR [esp],eax
Then we move the contents of EAX into the address pointed to by ESP. Again, this is the equivalent of pushing it onto the stack, and means we are setting up for another function...
16. 80484b5: e8 de fe ff ff call 8048398 <strcpy@plt>
Here we call strcpy() and copy the string that ‘ptr’ points to into ‘buf’.
17. 80484ba: c9 leave
18. 80484bb: c3 ret
The last two lines we are already familiar with from reviewing main(). The first instruction, leave, is basically a shortcut for the function epilogue, and equates to the following:
mov esp, ebp
pop ebp
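As a rough illustration of those two steps, the sketch below models the prologue and ‘leave’ on a toy stack of 4-byte slots. The toy_cpu type, the helper names and the starting values are all assumptions made for this example; real registers hold addresses, not array indexes.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model: esp and ebp are indexes into mem[]; the stack grows towards index 0. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
} toy_cpu;

void cpu_push(toy_cpu *cpu, uint32_t v)
{
    assert(cpu->esp > 0);
    cpu->mem[--cpu->esp] = v;
}

uint32_t cpu_pop(toy_cpu *cpu)
{
    assert(cpu->esp < STACK_WORDS);
    return cpu->mem[cpu->esp++];
}

/* push ebp; mov ebp,esp; sub esp,N (N counted in slots here) */
void cpu_prologue(toy_cpu *cpu, uint32_t local_words)
{
    cpu_push(cpu, cpu->ebp);
    cpu->ebp = cpu->esp;
    assert(local_words <= cpu->esp);
    cpu->esp -= local_words;
}

/* leave: mov esp,ebp; pop ebp */
void cpu_leave(toy_cpu *cpu)
{
    cpu->esp = cpu->ebp;        /* discard everything the function allocated */
    cpu->ebp = cpu_pop(cpu);    /* restore the caller's base pointer */
}
```

If a return address is pushed first and cpu_prologue() then reserves ten slots (0x28 bytes), cpu_leave() puts EBP back to the caller’s value and leaves ESP pointing at the return address slot again.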
Finally, ‘ret’ pops the return address off the stack and returns to it, which takes us back to main().
So, that wraps up part two of the series. In part 3 we’re going to cover dynamic analysis with GDB. Hope you enjoyed :)