4. Buffer Overflow
Memory Layout
■ Stack
▪ Stack frames of executing functions
■ Heap
▪ Memory blocks dynamically allocated by using malloc() or new
■ Shared library
▪ Functions that you didn’t write directly
■ Data
▪ Global variables of your program
■ Code (a.k.a. Text)
▪ Instructions of the functions that you wrote
(Figure: the regions stacked from High Addr to Low Addr: Stack, Heap, Shared library, Data, Code)
Memory Layout Example
int i_arr[65536];
char *str = "Hello world";
int count = 0;
(Figure: where these globals are placed in the memory layout, with High Addr (Stack) at the top)
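A quick way to see this layout on a real system is to take one address from each region. The sketch below assumes Linux on x86-64 with a typical GCC/glibc toolchain; code_marker and sample_regions are hypothetical names. ASLR changes the exact addresses on every run, so only the relative order (stack above globals, globals above code) is meaningful.

```c
#include <stdint.h>
#include <stdlib.h>

/* Globals from the example above: they live in the Data region. */
int i_arr[65536];
char *str = "Hello world";
int count = 0;

/* Hypothetical helper; its machine code lives in the Code (Text) region. */
int code_marker(void) { return 0; }

/* Records one address per region. The caller must free(*heap_block). */
void sample_regions(uintptr_t *stack, uintptr_t *heap, uintptr_t *data,
                    uintptr_t *code, void **heap_block) {
    int local = 0;              /* local variable: current stack frame */
    void *p = malloc(16);       /* dynamically allocated block: Heap */
    *stack = (uintptr_t)&local;
    *heap = (uintptr_t)p;
    *data = (uintptr_t)&count;
    *code = (uintptr_t)&code_marker;
    *heap_block = p;
}
```

Printing the four values (e.g. with printf("%#lx\n", (unsigned long)value)) typically shows the stack address near the top of the user address space and the code address lowest.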
Topics
■ Memory layout of a program
■ Basic concept of buffer overflow
▪ Stack memory corruption and control hijack
▪ Exploitation with shellcode
■ The first round of the war between attacker and defender
▪ (Mitigation) Stack canary, NX
▪ (Bypassing) Memory disclosure
Buffer Overflow (BOF)
■ C has no automatic check on array indices and boundaries
▪ Also, some functions (like gets) don’t check the input length
▪ This makes it possible to write past the end of an array (buffer): overflow!
▪ Such a write can corrupt other data in memory
■ What kind of critical data can be corrupted?
▪ The return address saved in the stack frame is a good example
▪ Corrupting the saved return address lets an attacker manipulate the program counter (a.k.a. control hijack)
(Figure: char buf[12], from start to end, with a write running past its end)
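The effect of an unchecked write can be modeled without undefined behavior by treating a small byte array as "memory". In this hypothetical model, char buf[12] occupies the first 12 bytes and some other 4-byte variable sits right after it; unchecked_copy copies the way gets()/strcpy() do, with no check against the buffer size, and also writes the terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Toy memory model (hypothetical layout): char buf[12] occupies
   mem[0..11] and another 4-byte variable sits right after it. */
enum { BUF_SIZE = 12, MEM_SIZE = 16 };

/* Copies src into the buffer like gets()/strcpy(): no check against
   BUF_SIZE, terminating NUL included. The model itself stops at
   MEM_SIZE, so this sketch never leaves its own array. */
void unchecked_copy(unsigned char mem[MEM_SIZE], const char *src) {
    size_t i = 0;
    for (; src[i] != '\0' && i < MEM_SIZE; i++)
        mem[i] = (unsigned char)src[i];
    if (i < MEM_SIZE)
        mem[i] = '\0';
}

/* Returns 1 if the bytes after the buffer differ from the snapshot. */
int neighbour_corrupted(const unsigned char mem[MEM_SIZE],
                        const unsigned char snapshot[MEM_SIZE]) {
    return memcmp(mem + BUF_SIZE, snapshot + BUF_SIZE,
                  MEM_SIZE - BUF_SIZE) != 0;
}
```

With 11 characters, the characters plus NUL exactly fill buf[12]. A 12th character already pushes the NUL into the neighbouring variable, a classic off-by-one overflow.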
Classic Buffer Overflow
■ Overflow of a buffer in the stack memory
▪ Called a stack-based buffer overflow, sometimes stack smashing
▪ Not to be confused with "stack overflow", which means running out of stack space (e.g., through unbounded recursion)
■ Often caused by using unsafe string-handling functions
▪ gets(), scanf("%s", ...), strcpy(), strcat()...
■ Became famous because hackers could easily exploit them and infect machines
▪ Ex) The Morris Worm (the first Internet worm) of 1988 exploited, among other flaws, a stack-based buffer overflow vulnerability in the fingerd server
#include <stdio.h>

int main(void) {
    int arr[32];
    int idx;
    scanf("%d", &idx);
    arr[idx] = 1; // Error: idx is never checked against 0..31
    return 0;
}
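For the scanf/arr[idx] example above, the fix is to validate the index against the array bounds before the store. A minimal sketch with a hypothetical helper name. The index stays an int because it comes from scanf("%d", ...) and may be negative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stores val into arr[idx] only when 0 <= idx < len.
   idx comes from untrusted input, so it may be negative or too large;
   in that case nothing is written and false is returned. */
bool checked_store(int *arr, size_t len, int idx, int val) {
    if (idx < 0 || (size_t)idx >= len)
        return false;
    arr[idx] = val;
    return true;
}
```

In the example program, arr[idx] = 1 would become checked_store(arr, 32, idx, 1), with the caller deciding how to report a rejected index.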
Example Program with BOF
void echo(void) {
    char buf[8];
    gets(buf);
    puts(buf);
}

int main(void) {
    echo();
    return 0;
}

jschoi@ubuntu:~$ ./bof
Hello
Hello
jschoi@ubuntu:~$ ./bof
0123456789ABCDE
0123456789ABCDE
jschoi@ubuntu:~$ ./bof
0123456789ABCDEF
0123456789ABCDEF
Segmentation fault
■ Starts to crash from this point (we will see why)
Assembly Code of the Example
void echo(void) {
    char buf[8];
    gets(buf);
    puts(buf);
}
(gdb) disassemble echo
0x401136: sub $0x18,%rsp
0x40113a: lea 0x8(%rsp),%rdi
0x40113f: mov $0x0,%eax
0x401144: call 0x401040 <gets@plt>
0x401149: lea 0x8(%rsp),%rdi
0x40114e: call 0x401030 <puts@plt>
0x401153: add $0x18,%rsp
0x401157: ret

int main(void) {
    echo();
    return 0;
}
(gdb) disassemble main
0x401158: sub $0x8,%rsp
0x40115c: call 0x401136 <echo>
0x401161: mov $0x0,%eax
0x401166: add $0x8,%rsp
0x40116a: ret
Stack Frame Layout
■ Let’s take a closer look at the stack frame of echo()
▪ sub $0x18,%rsp reserves 0x18 bytes, and lea 0x8(%rsp),%rdi passes %rsp+8 (the address of buf) to gets()
▪ call 0x401136 <echo> in main() pushed the return address 0x401161 right above this frame
High Address
    Stack frame of main()
    Return addr (0x401161)    %rsp+0x18
    (Unused)                  %rsp+0x10
    char buf[8]               %rsp+8
    (Unused)                  %rsp
Low Address
What happens in the stack frame?
■ When gets(buf) is called...
▪ Each character in the input string is stored into char buf[8], starting from the lower address
(Figure: the echo() frame from the previous slide: Return addr 0x401161, (Unused), char buf[8], (Unused), above the stack frame of main())
Example Input #1
jschoi@ubuntu:~$ ./bof
Hello
Hello
■ "Hello" plus the terminating NUL written by gets() takes 6 bytes, which fit inside char buf[8]; the return address 0x401161 is untouched
▪ (ASCII Encoding) 'H': 0x48, 'e': 0x65, 'l': 0x6C, 'o': 0x6F
Example Input #2
jschoi@ubuntu:~$ ./bof
0123456789ABCDE
0123456789ABCDE
■ 15 characters plus the terminating NUL take 16 bytes: they fill char buf[8] and the 8 unused bytes above it, but stop just below the return address, so echo() still returns normally
▪ (ASCII Encoding) '0': 0x30, 'A': 0x41
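Example Input #3: Crash
jschoi@ubuntu:~$ ./bof
0123456789ABCDEF
0123456789ABCDEF
Segmentation fault
■ 16 characters fill the same 16 bytes, so the terminating NUL lands on the lowest byte of the saved return address (x86-64 is little-endian)
▪ The return address changes from 0x401161 to 0x401100
▪ At the end of echo(), add $0x18,%rsp and ret load this corrupted value into %rip, so execution does not resume in main() and the program crashes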
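The "(we will see why)" from the example program can be checked with a small model of echo()'s frame. This is a hypothetical simulation, not the real stack: it uses only the offsets from the disassembly (buf at %rsp+0x8, return address at %rsp+0x18) and emulates gets() copying the input plus a terminating NUL with no length check.

```c
#include <stdint.h>
#include <string.h>

/* Model of echo()'s frame after "sub $0x18,%rsp" (offsets from %rsp):
   buf at %rsp+0x8, saved return address at %rsp+0x18.
   The array covers %rsp .. %rsp+0x1f. */
enum { BUF_OFF = 0x8, RET_OFF = 0x18, FRAME_BYTES = 0x20 };

/* Emulates gets(buf): copies every input byte plus a trailing NUL,
   with no length check (the model clamps at FRAME_BYTES). */
void model_gets(uint8_t frame[FRAME_BYTES], const char *input) {
    size_t n = strlen(input);
    size_t i;
    for (i = 0; i < n && BUF_OFF + i < FRAME_BYTES; i++)
        frame[BUF_OFF + i] = (uint8_t)input[i];
    if (BUF_OFF + i < FRAME_BYTES)
        frame[BUF_OFF + i] = 0;
}

/* Stores a return address little-endian, as the call instruction does. */
void push_ret(uint8_t frame[FRAME_BYTES], uint64_t addr) {
    for (int k = 0; k < 8; k++)
        frame[RET_OFF + k] = (uint8_t)(addr >> (8 * k));
}

/* Reads the saved return address the way ret would: 8 bytes, little-endian. */
uint64_t saved_ret(const uint8_t frame[FRAME_BYTES]) {
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | frame[RET_OFF + k];
    return v;
}
```

With 15 characters the NUL lands at offset 0x17 and the return address stays 0x401161. With 16 it lands at offset 0x18, the lowest byte of the return address, which becomes 0x401100.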
Example Input #4: Control Hijack
■ If the overflowing input instead places a chosen address in the return-address slot, ret jumps to that address rather than back to main()
▪ The attacker, not the program, now decides where %rip points
We’ve seen that hackers can manipulate the program counter (control hijack) by corrupting the stack
Code Execution
■ Inject malicious code into memory (e.g., into buf[])
▪ And overwrite the return address with the address of buf
▪ Upon return, the contents of buf will be executed as instructions
void echo(void) {
    char buf[8];
    gets(buf);
    puts(buf);
}
jschoi@ubuntu:~$ ./bof
j0YX45P... (omitted)
j0YX45P... (omitted)
Segmentation fault
(Figure: stack bytes after the overflow, starting at buf: 6a 30 59 58 34 35 50 … 60 60 60 00 00 00 00 00; the last 8 bytes replace the return address, so on ret %rip points into buf (execute))
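Assembling such an input is plain byte arithmetic. In this hypothetical helper, the distance from buf to the saved return address (0x18 - 0x8 = 16 bytes) comes from the disassembly, the target 0x606060 is the address encoded by the figure's last 8 bytes (60 60 60 00 00 00 00 00, little-endian), and the filler byte is an arbitrary choice. Only the visible 7-byte prefix of the shellcode is used here; the sketch refuses code that does not fit in front of the return address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds an input for the vulnerable echo(): code bytes at the start of
   buf, filler up to the saved return address, then buf_addr as 8
   little-endian bytes. Returns the payload length, or 0 if the code
   does not fit in front of the return address or out is too small. */
size_t build_payload(uint8_t *out, size_t out_cap,
                     const uint8_t *code, size_t code_len,
                     uint64_t buf_addr) {
    const size_t ret_off = 0x18 - 0x8;   /* 16 bytes from buf to return addr */
    const uint8_t filler = 0x41;         /* hypothetical padding byte ('A') */
    if (code_len > ret_off || out_cap < ret_off + 8)
        return 0;
    memcpy(out, code, code_len);
    memset(out + code_len, filler, ret_off - code_len);
    for (size_t k = 0; k < 8; k++)
        out[ret_off + k] = (uint8_t)(buf_addr >> (8 * k));
    return ret_off + 8;
}
```

gets() stops only at a newline, so the zero bytes of the address are copied into the stack as well; a payload for gets() just has to avoid the byte 0x0a.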
How can a hacker inject "code"?
■ The program reads a string (data) as input, so how can a hacker inject "code" into the program memory?
▪ In fact, there is nothing special that the hacker has to do
■ Recall that machine code is just a sequence of bytes
▪ Just like any other data (e.g., integers, strings)
■ On the previous page, "j0YX45P..." was used as input
▪ The ASCII code of this string is: 6A 30 59 58 34 35 50
▪ The same bytes can also be decoded as the x86-64 instructions below
0: 6a 30    push $0x30
2: 59       pop %rcx
3: 58       pop %rax
4: 34 35    xor $0x35,%al
6: 50       push %rax
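The "a string is just bytes" point can be checked directly by dumping a string's bytes in hex. hex_bytes is a hypothetical helper; for "j0YX45P" it produces exactly the ASCII codes listed above.

```c
#include <stdio.h>
#include <string.h>

/* Writes the bytes of s as space-separated uppercase hex ("6A 30 ...").
   out needs room for 3 * strlen(s) + 1 bytes. */
void hex_bytes(const char *s, char *out) {
    size_t n = strlen(s);
    size_t j = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        j += (size_t)sprintf(out + j, "%s%02X", i ? " " : "",
                             (unsigned)(unsigned char)s[i]);
}
```

Feeding the same bytes to a disassembler instead of a hex printer gives the push/pop/xor listing above; the bytes themselves do not change.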
Shellcode
■ On the previous page, I said "inject malicious code"
▪ But what kind of malicious code?
■ Once executed, this code spawns a shell
▪ Given a shell, the hacker can run any command from then on!
▪ Such malicious code is called shellcode
▪ Roughly speaking, it is execve("/bin/sh") written in assembly instructions
# Shellcode Example
xor %rdx, %rdx
mov $0x6873..., %rbx
...
mov $0x3b, %al
syscall
Run:
jschoi@ubuntu:~$ ./bof
... (omitted)
... (omitted)
$ ls; rm -rf *
(Shell is spawned)
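For comparison, here is roughly what such shellcode does, written in C (spawn_shell and pack_le8 are hypothetical names). In the assembly, xor %rdx,%rdx sets the third execve argument (envp) to NULL, and mov $0x3b,%al loads 59, the execve system call number on x86-64 Linux. Injected code cannot rely on a string literal's address, so a common trick is to build the path in a register. Packing "/bin//sh" (the doubled slash keeps it exactly 8 bytes and the kernel treats "//" like "/") gives a value starting with 0x6873. This is an illustration only; the constant elided on the slide is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* What the shellcode does, expressed in C: replace the current process
   with /bin/sh. execve() returns only if it fails. */
void spawn_shell(void) {
    char *argv[] = { "/bin/sh", NULL };
    execve("/bin/sh", argv, NULL);
}

/* Packs 8 ASCII bytes into a 64-bit value, little-endian, which is how
   shellcode can hold a path string in a register before pushing it. */
uint64_t pack_le8(const char s[8]) {
    uint64_t v = 0;
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | (uint8_t)s[k];
    return v;
}
```
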
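Defense against BOF
■ How can we protect a program from BOF, then?
■ Solution 1: Remove the buffer overflow itself
▪ Ex) Replace gets() with fgets(), scanf("%7s", ...), etc.
▪ The limit must leave room for the terminating NUL: scanf("%8s", ...) can store 8 characters plus a NUL, i.e. 9 bytes, into char buf[8]
#include <stdio.h>

void safe_echo(void) {
    char buf[8];
    if (fgets(buf, sizeof buf, stdin) != NULL) // at most 7 chars + NUL
        puts(buf);
}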
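Both bounded alternatives can be wrapped so they are easy to test. A minimal sketch with hypothetical helper names, reading from any FILE* instead of stdin: the %7s width and the fgets size of 8 each leave exactly one byte for the NUL in char buf[8].

```c
#include <stdio.h>
#include <string.h>

/* Reads one whitespace-delimited word into buf[8]. The width is 7,
   not 8, because %s always appends a terminating NUL.
   Returns 1 on success, 0 on EOF or error. */
int read_word(FILE *in, char buf[8]) {
    return fscanf(in, "%7s", buf) == 1;
}

/* fgets variant: reads at most 7 characters of a line into buf[8]
   (fgets reserves the last byte for the NUL) and drops a trailing
   newline if one was read. Returns 1 on success, 0 on EOF or error. */
int read_line(FILE *in, char buf[8]) {
    if (fgets(buf, 8, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Characters beyond the limit are not lost into the stack; they stay in the stream and are returned by the next read, which the caller has to handle (for example by discarding the rest of the line).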
Mitigation: Stack Canary
■ Place randomized bytes, called a canary*, between the buffer and the return address
▪ The canary is set up right after entering a function
▪ Before the function returns, check whether it has been changed (corrupted)
(Figure, assuming buf is at 0x606060: before the overflow the return address holds 61 11 40 00 00 00 00 00 (0x401161); the buffer overflow writes 6a 30 59 58 34 35 50 … 60 60 60 00 00 00 00 00, which also overwrites the canary in between: "Corrupted!")
*Canary: a bird that miners brought into mines to detect gas leaks
Credit: Icons are from Flaticon (by Freepik)
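The canary protocol can be modeled with a byte array standing in for the frame. This is a hypothetical simulation: the offsets (buf at 0, canary at 8, return address at 16) follow the picture above rather than any specific compiler's layout, and a fixed test value replaces the random canary that real code loads from %fs:0x28.

```c
#include <stdint.h>
#include <string.h>

/* Toy frame: [0,8) buf, [8,16) canary, [16,24) saved return address. */
enum { C_BUF = 0, C_CANARY = 8, C_RET = 16, C_FRAME = 24 };

/* Prologue: store the canary next to buf. */
void set_canary(uint8_t frame[C_FRAME], uint64_t canary) {
    memcpy(frame + C_CANARY, &canary, sizeof canary);
}

/* Unchecked write into buf, like gets(); clamped to the model frame. */
void overflow_write(uint8_t frame[C_FRAME], const uint8_t *data, size_t n) {
    if (n > C_FRAME - C_BUF)
        n = C_FRAME - C_BUF;
    memcpy(frame + C_BUF, data, n);
}

/* Epilogue check: 1 means "canary intact, safe to return";
   0 is where real code would call __stack_chk_fail(). */
int canary_ok(const uint8_t frame[C_FRAME], uint64_t canary) {
    uint64_t now;
    memcpy(&now, frame + C_CANARY, sizeof now);
    return now == canary;
}
```

Any overflow that reaches the return address must first pass through the canary bytes, so it is detected unless the attacker writes the exact canary value back.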
Assembly Code for Stack Canary
■ Nowadays, compilers emit code like the following
(gdb) disassemble echo
0x401146: sub $0x18,%rsp
0x40114a: mov %fs:0x28,%rax        # canary setup
0x401153: mov %rax,0x8(%rsp)       # canary setup
0x401158: xor %eax,%eax
0x40115a: mov %rsp,%rdi
0x40115d: call 0x401050 <gets@plt>
0x401162: mov %rsp,%rdi
0x401165: call 0x401030 <puts@plt>
0x40116a: mov 0x8(%rsp),%rax       # canary check
0x40116f: sub %fs:0x28,%rax        # canary check
0x401178: jne 0x40117f <echo+57>   # canary check
0x40117a: add $0x18,%rsp
0x40117e: ret
0x40117f: call 0x401040 <__stack_chk_fail@plt>
▪ If the canary has changed, jne jumps to __stack_chk_fail, which aborts the program instead of executing ret
(Figure: stack frame of echo from High Address to Low Address: Return address, Unused, canary at %rsp+8, char buf[ ] at %rsp)
Memory Disclosure
■ Exploiting a vulnerability to disclose information stored in memory
■ Again, misuse of arrays is the most common source of vulnerabilities that allow memory disclosure
▪ A buffer overflow that reads data past the end of an array
▪ Of course, BOF is not the only source of memory disclosure
■ Various kinds of information can be disclosed
▪ Private user data, secret keys in cryptography, etc.
▪ In this slide, let’s focus on disclosing the stack canary value
• If the stack canary value is known, a hacker can overwrite the return address while writing the original canary value back in place, so the check passes as if nothing had happened
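The canary bypass can be traced in a toy frame like the one used for the canary check (buf at offset 0, canary at 8, return address at 16; these offsets are assumptions of the model, and leak and build_bypass are hypothetical names). An over-read that trusts an attacker-chosen length returns the canary bytes, and the next overflow writes them back unchanged while replacing only the return address.

```c
#include <stdint.h>
#include <string.h>

/* Toy frame: [0,8) buf, [8,16) canary, [16,24) saved return address. */
enum { D_BUF = 0, D_CANARY = 8, D_RET = 16, D_FRAME = 24 };

/* Over-read: like write(1, buf, len) with a len chosen by the attacker.
   Copies len bytes starting at buf (clamped to the model frame). */
size_t leak(const uint8_t frame[D_FRAME], size_t len, uint8_t *out) {
    if (len > D_FRAME - D_BUF)
        len = D_FRAME - D_BUF;
    memcpy(out, frame + D_BUF, len);
    return len;
}

/* Overflow payload: filler for buf, the leaked canary unchanged,
   then new_ret as 8 little-endian bytes in the return-address slot. */
void build_bypass(uint8_t payload[D_FRAME], const uint8_t leaked_canary[8],
                  uint64_t new_ret) {
    memset(payload + D_BUF, 0x41, D_CANARY - D_BUF);
    memcpy(payload + D_CANARY, leaked_canary, 8);
    for (int k = 0; k < 8; k++)
        payload[D_RET + k] = (uint8_t)(new_ret >> (8 * k));
}
```
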
Memory Disclosure Example
■ In the example, write(1, buf, len) prints out len bytes of data stored in buf
▪ Unlike printf("%s", buf), it does not stop at a NUL character
■ The famous Heartbleed vulnerability was also caused by a similar mistake of trusting user input
▪ Review Chapter 1. Overview
(Figure: stack section holding char buf[32]; bytes 6a 30 59 58 34 35 50 … 60 60 60 00 00 00 00 00, labeled "(execute?)")
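The difference between write() and printf("%s", ...) is easy to observe. This sketch sends a buffer containing a NUL byte through a pipe with write() and, separately, formats it with snprintf("%s") as a stand-in for printf; both helper names are hypothetical. This matters for canary leaks: on glibc the canary's lowest byte is typically 0x00, so a %s-style read stops at the canary, while write() copies exactly len bytes and keeps going.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sends len bytes of buf through a pipe with write() and reads them back
   into out. write() copies exactly len bytes, NUL bytes included.
   Returns the number of bytes read back, or -1 on error. */
ssize_t bytes_via_write(const char *buf, size_t len, char *out, size_t cap) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    ssize_t w = write(fds[1], buf, len);
    close(fds[1]);
    ssize_t r = (w < 0) ? -1 : read(fds[0], out, cap);
    close(fds[0]);
    return r;
}

/* The %s conversion stops at the first NUL byte. */
size_t bytes_via_percent_s(const char *buf, char *out, size_t cap) {
    return (size_t)snprintf(out, cap, "%s", buf);
}
```
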
Side-Note: Access Control & SUID
Access Control
■ Intuitively, access control is about what kind of permission should be given to each user of a system
▪ There are formal models for this, but let’s keep it simple here
▪ The Linux file system is a good example:
• Any user can execute cat, but only its owner, root, can modify it
• Only the owner, jason, has read and write permission on secret.txt
/home/jason $ ls -l /usr/bin/cat
-rwxr-xr-x 1 root root 35280 /usr/bin/cat
/home/jason $ ls -l secret.txt
-rw------- 1 jason jason 16 secret.txt
Setuid Bit (SUID)*
■ Have you ever wondered how the passwd command works?
▪ This command must update the /etc/shadow file
▪ /etc/shadow is writable only by root, of course
▪ Then how can you update your password as a non-root user?
■ The setuid bit is the mechanism that enables this
▪ When you execute /usr/bin/passwd, it temporarily runs with the privilege of the file owner (root in this case), as indicated by the s in -rwsr-xr-x
/home/jason $ ls -l /etc/shadow
-rw-r----- 1 root shadow 828 /etc/shadow
/home/jason $ ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 59976 /usr/bin/passwd
*https://www.oreilly.com/library/view/secure-programming-cookbook/0596003943/ch01s03.html
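A program can check for the setuid bit with stat(), and a running program can compare its real and effective user IDs. Both helper names below are hypothetical. In a setuid-root program started by jason, getuid() still returns jason's UID while geteuid() returns 0, and file access such as opening /etc/shadow is checked against the effective UID.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path has the setuid bit, 0 if not, -1 if stat() fails. */
int has_setuid(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & S_ISUID) != 0;
}

/* Nonzero when the process runs with someone else's privileges,
   e.g. inside a setuid program (effective UID != real UID). */
int running_with_elevated_privilege(void) {
    return geteuid() != getuid();
}
```

On a typical Linux system, has_setuid("/usr/bin/passwd") would return 1, matching the s in the listing above.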
What if a SUID program has BOF?
■ The expected behavior of /usr/bin/passwd is fixed
▪ It must read your new password twice, check that both entries match, and then update the /etc/shadow file
■ But if /usr/bin/passwd has a BOF, a hacker can exploit it and make the program do other things
▪ Run the code that the hacker (not the developer) wants
▪ Ex) The hacker can even make it run execve("/bin/bash"...)
▪ … what happens then?