
Good Slides To Understand System Call

The document discusses system call implementation. It begins by explaining that user processes interact with the kernel via system calls, which are typically invoked through a trap instruction. It describes how arguments are passed in registers rather than on the stack, because the trap switches the CPU to a separate kernel-mode stack. The rest of the document shows how Linux implements system calls, including dispatching to system call service routines through a system call table indexed by the system call ID passed in a register.

Uploaded by

Mahesh

SYSTEM CALL IMPLEMENTATION
CS124 – Operating Systems
Fall 2018-2019, Lecture 14

User Processes and System Calls


• Previously stated that user applications interact with the
kernel via system calls
• Typically invoked via a trap instruction
• An intentional software-generated exception

• The kernel registers a handler for a specific trap


• int $0x80 for Linux system calls
• int $0x2e for Windows system calls
• int $0x30 for Pintos system calls

• Can’t easily pass arguments to system calls on the stack


• Trap instruction causes the CPU to switch operating modes (i.e.
from user mode to kernel mode)
• Different operating modes have different stacks

User Processes and System Calls (2)


• Typically, arguments to system calls passed in registers,
and the return-value(s) come back in registers
• One of the arguments is an integer indicating which
system call to invoke
• e.g. on Linux and Windows, %eax is set to the ID of the operation to perform
• e.g. on UNIX systems, sys/syscall.h specifies these numbers
• Note: UNIX syscall IDs are not uniform across different UNIXes
• Obvious constraint: system-call arguments can’t be wider
than the registers
• Several possible approaches:
• Can split larger arguments across multiple registers
• Can store larger arguments in a struct, then pass a pointer to the
struct as an argument
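
The first approach above (splitting a larger argument across multiple registers) can be sketched in C. This is only an illustration assuming 32-bit registers; struct reg_pair, split_arg64() and join_arg64() are hypothetical names, not part of any real kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit "registers" carrying one 64-bit argument. */
struct reg_pair {
    uint32_t lo;   /* low 32 bits, passed in one register        */
    uint32_t hi;   /* high 32 bits, passed in a second register  */
};

/* User side: split a 64-bit argument into two register-sized halves. */
static struct reg_pair split_arg64(uint64_t value)
{
    struct reg_pair p;
    p.lo = (uint32_t)(value & 0xffffffffu);
    p.hi = (uint32_t)(value >> 32);
    return p;
}

/* Kernel side: rebuild the original 64-bit value from the two halves. */
static uint64_t join_arg64(struct reg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

A 64-bit file offset is a typical candidate for this treatment on a 32-bit system; the struct-pointer approach avoids using up two registers but requires the kernel to read user memory.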

User Processes and System Calls (3)


• The operating system frequently exposes system calls via
a standard library
• e.g. UNIX syscalls are exposed via the C standard library (libc)
• e.g. Windows syscalls are exposed via the (largely undocumented)
Native API (ntdll.dll)
• The library serves as an intermediary between apps and
the operating system
• Some functions are direct wrappers for system calls
• e.g. ssize_t read(int fd, void *buf, size_t nbyte)
• Implementation stores arguments from stack into registers, invokes
the system call entry-point (e.g. int $0x80), and returns result
• Others utilize system call wrappers internally
• e.g. malloc() is mainly implemented in user space, but uses
system calls to increase the process’ heap size
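
To make the wrapper idea concrete, here is a small sketch assuming Linux with glibc, where the generic syscall() function takes a system call number from sys/syscall.h followed by the arguments. The names raw_getpid() and raw_write() are hypothetical; they do by number what the libc wrappers getpid() and write() do by name. The actual entry instruction depends on the platform and is hidden inside libc.

```c
#define _GNU_SOURCE          /* exposes syscall() on glibc */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid, SYS_write */
#include <sys/types.h>
#include <unistd.h>          /* getpid(), write(), syscall() */

/* Ask for the process ID by syscall number, bypassing the getpid() wrapper. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Write bytes by syscall number; takes the same arguments as write(). */
static ssize_t raw_write(int fd, const void *buf, size_t len)
{
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}
```

Calling raw_getpid() should give the same answer as getpid(), since both end up in the same service routine.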

Review: Interrupt Mechanics


• Previously discussed how interrupts and traps are
handled on IA32 (see lecture 9 for details)
• User process has its own stack
• Executing the trap causes the CPU to switch to the
kernel-mode stack associated with the process
• Since system calls change from user mode to kernel
mode, IA32 saves pointer to previous stack on new stack
• Next, CPU saves the user process’ execution state:
cs, eip and eflags
[Figure: User Process Stack (current contents of user process stack) beside Kernel Thread Stack; the trap pushes Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS and Caller’s EIP onto the kernel thread stack]

Review: Interrupt Mechanics (2)


• Operating system has a stub for every possible interrupt
• Some interrupts push an error code onto the stack; if not,
the OS stub will push a dummy value for consistency
• Next, stub pushes the interrupt number onto the stack
• Finally, stub records all register state onto kernel stack
• Now the ISR can run without disrupting the interrupted code
[Figure: User Process Stack (current contents of user process stack) beside Kernel Thread Stack, which now holds Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., and Register State of Interrupted Program]

System Call Mechanics


• The OS exposes the user program’s CPU and register
state as arguments to the ISR
• Typically exposed to ISR as a struct with a field for each register
• System call handler needs to receive arguments from the
user program
• Can easily access these values on the kernel stack
• Syscall handler must also return a status result in eax
• Can modify user program’s eax on the kernel stack
• When kernel returns to the user program, its context is restored
• Program sees new value of eax
[Figure: User Process Stack (current contents of user process stack) beside Kernel Thread Stack holding Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., and Register State of Interrupted Program]
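
The mechanism above can be mimicked in plain C. This is a toy model, not kernel code: struct saved_regs stands in for whatever register struct a real kernel hands to its ISR, and toy_syscall_handler() is a hypothetical handler that reads its arguments from the saved registers and returns a result by overwriting the saved eax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saved-register frame; real kernels define their own layout. */
struct saved_regs {
    uint32_t ebx, ecx, edx;   /* first three syscall arguments          */
    uint32_t eax;             /* syscall ID on entry, result on return  */
};

/* Toy handler: adds its first two arguments and "returns" the sum by
   overwriting the saved copy of eax. When the saved state is restored,
   the user program would see this value in eax. */
static void toy_syscall_handler(struct saved_regs *regs)
{
    uint32_t a = regs->ebx;
    uint32_t b = regs->ecx;
    regs->eax = a + b;
}
```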

System Call Mechanics (2)


• The ID of the system call is used to dispatch to a function
that implements the system call
• Called a system call service routine
• System call service routines are usually named after their
user-mode entry points
• e.g. sys_write() implements write()
• e.g. sys_fork() implements fork()
• (Aside: these service routines are sometimes called within the
kernel implementation to implement more complex operations)
• A system call table holds an array of function pointers to
all system call service routines
• The syscall ID is used to index into this table when making the call

System Call Mechanics (3)


• Need to check the system call ID to ensure it’s valid…
• If it’s invalid, return ENOSYS “Function not implemented” error

• Can easily check that the ID is below the max syscall ID


• If a specific syscall ID below the max is not supported,
simply register a service routine that returns ENOSYS
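
The table lookup and both ENOSYS cases can be sketched as an ordinary C program. Everything here is a user-space toy under stated assumptions: the three-argument signature, the table contents, and the names toy_call_table and toy_dispatch() are made up for illustration. sys_ni_syscall borrows the name Linux uses for its "not implemented" routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Every toy service routine has the same signature. */
typedef long (*syscall_fn)(long a1, long a2, long a3);

static long sys_add(long a, long b, long c) { (void)c; return a + b; }
static long sys_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

/* Placeholder registered for IDs below the max that are not supported. */
static long sys_ni_syscall(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return -ENOSYS;
}

/* The system call table: the syscall ID is the array index. */
static const syscall_fn toy_call_table[] = {
    sys_add,          /* ID 0 */
    sys_ni_syscall,   /* ID 1: hole in the table */
    sys_neg,          /* ID 2 */
};
#define TOY_NR_SYSCALLS (sizeof toy_call_table / sizeof toy_call_table[0])

/* Validate the ID, then dispatch through the table. */
static long toy_dispatch(unsigned long id, long a1, long a2, long a3)
{
    if (id >= TOY_NR_SYSCALLS)
        return -ENOSYS;               /* ID is past the end of the table */
    return toy_call_table[id](a1, a2, a3);
}
```

Treating the ID as unsigned means a single comparison also rejects negative values.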
Example: Linux System Calls
• Snippet [paraphrased] of Linux system_call() handler:

    ...                         # Save registers onto stack

    # Make sure it's a valid syscall ID
    cmpl $(NR_syscalls), %eax
    jb nobadsys

    # Return-value of syscall() will be in eax
    # as usual, so set value of eax stored on
    # kernel stack to ENOSYS to indicate error
    movl $(-ENOSYS), 24(%esp)
    jmp ret_from_sys_call
nobadsys:
    ...
Example: Linux System Calls (2)
• Linux system_call() handler, continued:

    ...
nobadsys:
    # Dispatch to the function in the system-call
    # table corresponding to the specified ID
    # (On IA32, pointers are 4 bytes, so use
    # ID*4 as the address within the table)
    call *sys_call_table(, %eax, 4)

    # Store return-value from routine into
    # location of eax on the kernel stack
    movl %eax, 24(%esp)
    jmp ret_from_sys_call
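
On Linux, a failed system call comes back to user space as a small negative number in eax (the -ENOSYS stored by the handler is one such value), and a library wrapper typically converts it into a return value of -1 plus errno. A minimal sketch of that conversion, assuming error values lie in the range -1 through -4095; toy_wrap_result() and TOY_MAX_ERRNO are hypothetical names, not libc's.

```c
#include <assert.h>
#include <errno.h>

/* Assumed upper bound on error numbers returned by the kernel. */
#define TOY_MAX_ERRNO 4095

/* Convert a raw syscall result into the usual C library convention:
   on error, set errno and return -1; otherwise pass the value through. */
static long toy_wrap_result(long raw)
{
    if (raw < 0 && raw >= -TOY_MAX_ERRNO) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```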

Example: Linux System Calls (3)


• Different syscalls require different numbers of arguments
• e.g. getpid() and fork() require no arguments
• e.g. mmap() requires up to six arguments
• System-call arguments are passed from the user process
in specific registers
• ebx is first argument, ecx is second argument, etc.
• Syscall service routines are written in C, and they expect
their args on the kernel stack
• Linux system_call() handler pushes all of the process’
registers onto the kernel stack in a specific order
• Specifically, the reverse order that registers are
used to pass arguments to system calls
[Figure: Kernel Thread Stack, listed top to bottom: ebp = arg6, edi = arg5, esi = arg4, edx = arg3, ecx = arg2, ebx = arg1]

Example: Linux System Calls (4)


• Arguments to syscall service routines are pushed in
reverse order, following the cdecl calling convention
• Under cdecl, if a function is passed more arguments than
it expects, the extra arguments are ignored
• Allows system_call() to dispatch to all the different service
routines, regardless of the number of arguments they take
• e.g. int sys_write(int fd, char *buf, int size)
• Service routine for write(int fd, char *buf, int size)
• When system_call() dispatches to sys_write(),
sys_write() sees only the expected arguments
• Extra arguments are simply ignored by sys_write()
[Figure: Kernel Thread Stack, listed top to bottom: …, ebp = arg6, edi = arg5, esi = arg4, edx = arg3 = size, ecx = arg2 = buf, ebx = arg1 = fd, return address, sys_write() frame, …]

System Calls: Security Holes?


• It goes without saying that the system call service routine
must carefully check all arguments to the system call…

• Are there potential security holes in accepting pointers as


arguments to system calls?
• Example: ssize_t read(int fd, void *buf,
size_t nbytes)
• Reads bytes from a file descriptor into a buffer

• Caller specifies:
• The file-descriptor to read
• A pointer to the buffer to store the data in
• A number of bytes to read

System Calls: Security Holes?!


• Example: ssize_t read(int fd, void *buf,
size_t nbytes)
• Generally the pointers are expected to be in user space…
• What if the user-mode program specifies an address in
the kernel’s address space?
• As long as the user-mode program doesn’t access this address, it
won’t cause a general protection fault…
• But, the kernel is allowed to write to this address!
• If kernel naïvely accepts the address from the user-mode program,
it could overwrite critical data
• Example: target critical kernel data structures
• Program opens file containing the data it wants to insert into kernel
• Program passes that file descriptor and address of kernel struct…

System Calls: Security Holes


• Very important to verify all addresses that come from
user-mode programs:
• Addresses must be in userspace!
• If an address is in kernel space, it’s an access violation
• Fast way to verify addresses:
• Make sure the address is below the kernel / user address
boundary! (e.g. 0xc0000000 in Linux/Pintos)
[Figure: process address space, listed top to bottom. Kernel Space (above 0xc0000000): process-specific data structures, kernel stack, mapping to physical memory, kernel code and global data. User Space: user stack (%esp), memory mapped region for shared libraries (0x40000000), run-time heap via malloc (brk), uninitialized data (.bss), initialized data (.data), program text (.text) starting at 0x08048000, forbidden region down to 0]
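
The fast boundary check can be sketched as follows, assuming 32-bit addresses and the 0xc0000000 boundary from the slide. The names USER_TOP, is_user_addr() and is_user_range() are hypothetical. The range version is an addition not stated in the slides: for a buffer, every byte (not just the first) has to lie below the boundary, and the comparison is written so that addr + size cannot wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel / user boundary from the slide (an assumption for this sketch). */
#define USER_TOP 0xc0000000u

/* True if a single address lies in user space. */
static bool is_user_addr(uint32_t addr)
{
    return addr < USER_TOP;
}

/* True if the whole buffer [addr, addr + size) lies below the boundary.
   Comparing size against the remaining room avoids overflow in addr + size. */
static bool is_user_range(uint32_t addr, uint32_t size)
{
    return is_user_addr(addr) && size <= USER_TOP - addr;
}
```

Passing this check only shows the address is on the user side of the boundary; the memory behind it may still be unmapped or read-only.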

System Calls and Page Faults


• Addresses below the kernel / user boundary could still
be invalid…
• e.g. pass a pointer to unallocated memory to a read()
system call
• e.g. pass a pointer to read-only memory to a read()
system call (the kernel must write into the buffer)
• OS will see a page fault or a general protection fault
within the kernel
• Problem: this isn’t always an error!
• Many OSes don’t allocate virtual memory pages until
they are actually accessed
• Private copy-on-write pages are marked read-only; first
write attempt causes the page to be copied for the
writing process
[Figure: process address space diagram, with Kernel Space above 0xc0000000 and User Space below (user stack, shared libraries, heap, .bss, .data, .text)]

System Calls and Page Faults (2)


• Aside:
• In the Pintos system-call lab, virtual memory management isn’t
completed yet, so a page fault does mean an invalid address ☺
• The OS may see memory faults within the kernel:
• Sometimes these are valid scenarios
• Sometimes it’s an invalid pointer passed to a syscall ☹
• Sometimes it is a kernel bug ☹☹
• Assume there is a way to identify valid scenarios…
• (We will examine that question in a few weeks)
• How do we distinguish between the remaining two cases?

System Calls and Page Faults (3)


• How to distinguish between:
• Faults caused by invalid addresses passed to system calls
• Faults caused by kernel bugs

• Linux has a very interesting solution to this problem

• How much kernel code actually interacts with user space?


• (Remember, CPU state of user processes is saved onto the kernel
stack, which is in kernel space)

System Calls and Page Faults (4)


• The amount of kernel code that interacts with user space
is actually very small…
• Linux kernel keeps an exception table, which records the
addresses of all instructions that touch user space
• In the fault handler, consult the exception table:
• If the faulting instruction is in the exception table, then the user
program passed the kernel a bad pointer
• Otherwise, it’s a kernel bug ☹
• Aside: if it’s a kernel bug, Linux performs a kernel oops
• Print out suitable info for a kernel developer to debug the error, and
log it to the system log
• Then terminate the process!
• Keeps kernel bugs from bringing down the entire system…
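
The lookup in the fault handler can be sketched as a search over a sorted table of instruction addresses. This is a user-space toy with made-up addresses; the toy_ names are hypothetical. The fixup field (where to resume after a bad user pointer) is an addition not described in the slides, included because an entry that only records the faulting address gives the handler nowhere to go next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per kernel instruction that is allowed to touch user memory. */
struct toy_extable_entry {
    uint32_t insn;    /* address of the instruction              */
    uint32_t fixup;   /* assumed: where to resume if it faults   */
};

/* Made-up addresses, kept sorted by insn so binary search works. */
static const struct toy_extable_entry toy_extable[] = {
    { 0xc0101000u, 0xc0109000u },
    { 0xc0102040u, 0xc0109010u },
    { 0xc0103f00u, 0xc0109020u },
};
#define TOY_EXTABLE_LEN (sizeof toy_extable / sizeof toy_extable[0])

/* Binary search for the faulting instruction; NULL if it is not listed. */
static const struct toy_extable_entry *toy_search_extable(uint32_t fault_eip)
{
    size_t lo = 0, hi = TOY_EXTABLE_LEN;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (toy_extable[mid].insn == fault_eip)
            return &toy_extable[mid];
        if (toy_extable[mid].insn < fault_eip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

enum toy_fault_kind { TOY_BAD_USER_POINTER, TOY_KERNEL_BUG };

/* Listed instruction: the user passed a bad pointer. Otherwise: kernel bug. */
static enum toy_fault_kind toy_classify_fault(uint32_t fault_eip)
{
    return toy_search_extable(fault_eip) != NULL ? TOY_BAD_USER_POINTER
                                                 : TOY_KERNEL_BUG;
}
```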

Example Kernel Oops



Pintos System Calls


• Pintos doesn’t follow the Linux syscall mechanism
• Syscall arguments are on the user stack, not in the registers
• This complicates the syscall mechanism, but only slightly
• Pictorially:
[Figure: User Process Stack holds the current contents of the user process stack, followed by the arguments to the system call (Sys-Call Args). Kernel Thread Stack holds struct intr_frame (threads/interrupt.h): Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program. A pointer to the intr_frame is passed to the system-call function (userprog/syscall.c)]

Pintos System Calls (2)


• intr_frame struct exposes process machine context
• Note that topmost values on stack appear at bottom
of the structure…
• Recall: C structure members assigned increasing offsets
• Last struct members have the highest addresses
• This struct makes it easy to access the user process’
stack contents
• e.g. retrieve esp member, cast to uint32_t*, then
access user stack like an array

struct intr_frame {
    // Pushed by intr_entry (intr-stubs.S).
    // The interrupted task's saved registers.
    uint32_t edi;            // Saved EDI
    uint32_t esi;            // Saved ESI
    uint32_t ebp;            // Saved EBP
    uint32_t esp_dummy;      // Not used
    uint32_t ebx;            // Saved EBX
    ...

    // Pushed by intrNN_stub (intr-stubs.S).
    uint32_t vec_no;         // Interrupt vector no.

    // Sometimes pushed by CPU; otherwise for
    // consistency, 0 is pushed (intrNN_stub).
    uint32_t error_code;

    // Pushed by the CPU. These are the
    // interrupted task's saved registers.
    void (*eip) (void);      // Next instruction
    uint16_t cs, :16;        // Code segment
    uint32_t eflags;         // Saved CPU flags
    void *esp;               // Saved stack ptr
    uint16_t ss, :16;        // Stack segment
};

Pintos System Calls (3)


• Pintos system-call arguments are pushed on the user
process stack
• Arguments themselves are pushed in reverse order
• Finally, system-call number is pushed
• Caller’s esp points to the system-call number
• Use syscall no. to determine how many args are required
• Finally, read in the args themselves
• Accessing user-space, so need to do this carefully
[Figure: User Process Stack, listed top to bottom: current contents of user process stack, then the arguments to the system call: Arg N, …, Arg 1, Syscall Number. Kernel Thread Stack: Caller’s SS, Caller’s ESP, Caller’s EFLAGS, Caller’s CS, Caller’s EIP, Error Code, Interrupt No., Register State of Interrupted Program]
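
Reading the system-call number and arguments off the user stack might look like the sketch below. It is a stand-alone model, not Pintos code: struct toy_intr_frame keeps only the esp member, and toy_syscall_number() and toy_syscall_arg() are hypothetical helpers. A real handler would first verify that each address it reads is a valid user address, which this sketch deliberately omits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the one intr_frame member used here. */
struct toy_intr_frame {
    void *esp;   /* saved user stack pointer */
};

/* The caller's esp points at the system-call number. */
static uint32_t toy_syscall_number(const struct toy_intr_frame *f)
{
    const uint32_t *sp = (const uint32_t *)f->esp;
    return sp[0];
}

/* Argument n (1-based) sits n 32-bit slots above the number, because
   the arguments were pushed in reverse order before the number. */
static uint32_t toy_syscall_arg(const struct toy_intr_frame *f, unsigned n)
{
    const uint32_t *sp = (const uint32_t *)f->esp;
    return sp[n];
}
```

For example, with a simulated stack holding { 9, 1, 0x1000, 5 } and esp pointing at its first element, the number is 9 and the three arguments are 1, 0x1000 and 5.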

Next Topics!
• Next three lectures cover two fun topics!

• How signal handling works (1 lecture)

• Kernel allocators: how memory allocations are managed
within the kernel (2 lectures)
