Windows Kernel Programming, Second Edition
Pavel Yosifovich
This book is for sale at http://leanpub.com/windowskernelprogrammingsecondedition
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process.
Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many
iterations to get reader feedback, pivot until you have the right book and build traction once you do.
Introduction
  Who Should Read This Book
  What You Should Know to Use This Book
  Book Contents
  Sample Code
Book Contents
Here is a quick rundown of the chapters in the book:
• Chapter 1: Windows Internals Overview - provides an overview of the internal workings of the Windows OS at a high level, enough to grasp the fundamentals without being bogged down by too many details.
• Chapter 2: Getting Started with Kernel Development - describes the tools and procedures
needed to set up a development environment for developing kernel drivers. A very simple driver
is created to make sure all the tools and procedures are working correctly.
• Chapter 3: Kernel Programming Basics - looks at the fundamentals of writing drivers,
including basic kernel APIs, handling of common programming tasks involving strings, linked
lists, dynamic memory allocations, and more.
• Chapter 4: Driver from Start to Finish - shows how to build a complete driver that performs
some useful functionality, along with a client application to drive it.
• Chapter 5: Debugging and Tracing - shows how to use WinDbg to debug user-mode and
especially kernel-mode code. It also looks at tracing driver code.
• Chapter 6: Kernel Mechanisms - looks at various kernel mechanisms that a driver developer must be familiar with, such as IRQLs, BSODs, and synchronization.
• Chapter 7: The I/O Request Packet - discusses the details of handling IRPs, accessing user-mode buffers safely, and other aspects of handling I/O requests, which is the main work of a typical driver.
• Chapter 8: Advanced Programming Techniques (Part 1) - discusses various kernel programming techniques, including thread management, memory management, and using system calls.
• Chapter 9: Process and Thread Notifications - shows how drivers can be notified when
processes and threads are created or destroyed.
• Chapter 10: Object and Registry Notifications - shows how drivers can be notified when
handles are opened to certain types of objects. The chapter also shows how to be notified when
Registry operations are invoked.
• Chapter 11: Advanced Programming Techniques (Part 2) - shows more techniques useful for
driver writers, such as using timers and trees.
• Chapter 12: File System Mini-Filters - discusses the support provided by Windows and the Filter Manager to handle file system notifications.
• Chapter 13: The Windows Filtering Platform - shows how to use the Windows Filtering Platform (WFP) to intercept network operations.
• Chapter 14: Introduction to KMDF - introduces the basics of the Kernel Mode Driver
Framework.
• Chapter 15: Miscellaneous Topics - discusses other topics of interest, such as generic filter
drivers, and hooking drivers.
• Appendix: The Kernel Template Library - summarizes the usage of a set of classes, developed specifically for this book, that support many generic aspects of kernel development.
If you are new to Windows kernel development, you should read chapters 1 to 7 in order. Chapter
8 contains some advanced material you may want to go back to after you have built a few simple
drivers. Chapters 9 onward describe specialized techniques, and in theory at least, can be read in any
order.
Sample Code
All the sample code from the book is freely available in the book’s GitHub repository at https://github.com/zodiacon/windowskernelprogrammingbook2e. Updates to the code samples will be pushed to this repository. It’s recommended that the reader clone the repository to their local machine, so it’s easy to experiment with the code directly.
All code samples have been compiled with Visual Studio 2019. It’s possible to compile most code samples with earlier versions of Visual Studio if desired. There might be a few features of the latest C++ standards that are not supported in earlier versions, but these should be easy to fix.
Happy reading!
Pavel Yosifovich
March 2023
Chapter 1: Windows Internals Overview
This chapter describes the most important concepts in the internal workings of Windows. Some of the topics will be described in greater detail later in the book, where they are closely related to the topic at hand. Make sure you understand the concepts in this chapter, as these form the foundations upon which any driver, and even user-mode low-level code, is built.
In this chapter:
• Processes
• Virtual Memory
• Threads
• System Services
• System Architecture
• Handles and Objects
Processes
A process is a containment and management object that represents a running instance of a program.
The term “process runs”, which is used fairly often, is inaccurate. Processes don’t run – processes
manage. Threads are the ones that execute code and technically run. From a high-level perspective, a
process owns the following:
• An executable program, which contains the initial code and data used to execute code within
the process. This is true for most processes, but some special ones don’t have an executable
image (created directly by the kernel).
• A private virtual address space, used for allocating memory for whatever purposes the code
within the process needs it.
• An access token (called primary token), which is an object that stores the security context of
the process, used by threads executing in the process (unless a thread assumes a different token
by using impersonation).
• A private handle table to executive objects, such as events, semaphores, and files.
• One or more threads of execution. A normal user-mode process is created with one thread
(executing the classic main/WinMain function). A user mode process without threads is mostly
useless, and under normal circumstances will be destroyed by the kernel.
A process is uniquely identified by its Process ID, which remains unique as long as the kernel process
object exists. Once it’s destroyed, the same ID may be reused for new processes. It’s important to
realize that the executable file itself is not a unique identifier of a process. For example, there may be
five instances of notepad.exe running at the same time. Each of these Notepad instances has its own
address space, threads, handle table, process ID, etc. All those five processes are using the same image
file (notepad.exe) as their initial code and data. Figure 1-2 shows a screenshot of Task Manager’s
Details tab showing five instances of Notepad.exe, each with its own attributes.
Virtual Memory
Every process has its own virtual, private, linear address space. This address space starts out empty (or
close to empty, since the executable image and NtDll.Dll are the first to be mapped, followed by more
subsystem DLLs). Once execution of the main (first) thread begins, memory is likely to be allocated,
more DLLs loaded, etc. This address space is private, which means other processes cannot access it
directly. The address space range starts at zero (technically the first and last 64KB of the address space
cannot be committed), and goes all the way to a maximum which depends on the process “bitness”
(32 or 64 bit) and the operating system “bitness” as follows:
• For 32-bit processes on 32-bit Windows systems, the process address space size is 2 GB by
default.
• For 32-bit processes on 32-bit Windows systems that use the increase user virtual address space
setting, it can be configured to have up to 3GB of address space per process. To get the extended
address space, the executable from which the process was created must have been marked with
the LARGEADDRESSAWARE linker flag in its PE header. If it was not, it would still be limited to 2
GB.
• For 64-bit processes (on a 64-bit Windows system, naturally), the address space size is 8 TB
(Windows 8 and earlier) or 128 TB (Windows 8.1 and later).
• For 32-bit processes on a 64-bit Windows system, the address space size is 4 GB if the executable
image has the LARGEADDRESSAWARE flag in its PE header. Otherwise, the size remains at 2 GB.
The requirement of the LARGEADDRESSAWARE flag stems from the fact that a 2 GB address
range requires 31 bits only, leaving the most significant bit (MSB) free for application use.
Specifying this flag indicates that the program is not using bit 31 for anything and so
having that bit set (which would happen for addresses larger than 2 GB) is not an issue.
Each process has its own address space, which makes any process address relative, rather than absolute. For example, when trying to determine what lies at address 0x20000, the address itself is not enough; the process to which this address relates must be specified.
The memory itself is called virtual, which means there is an indirect relationship between an address
and the exact location where it’s found in physical memory (RAM). A buffer within a process may
be mapped to physical memory, or it may temporarily reside in a file (such as a page file). The term
virtual refers to the fact that from an execution perspective, there is no need to know if the memory
about to be accessed is in RAM or not; if the memory is indeed mapped to RAM, the CPU will perform
the virtual-to-physical translation before accessing the data. If the memory is not resident (specified
by a flag in the translation table entry), the CPU will raise a page fault exception that causes the
memory manager’s page fault handler to fetch the data from the appropriate file (if indeed it’s a valid
page fault), copy it to RAM, make the required changes in the page table entries that map the buffer,
and instruct the CPU to try again. Figure 1-3 shows this conceptual mapping from virtual to physical
memory for two processes.
The unit of memory management is called a page. Every attribute related to memory is always at a
page’s granularity, such as its protection or state. The size of a page is determined by CPU type (and
on some processors, may be configurable), and in any case, the memory manager must follow suit.
Normal (sometimes called small) page size is 4 KB on all Windows-supported architectures.
Apart from the normal (small) page size, Windows also supports large pages. The size of a large page is
2 MB (x86/x64/ARM64) or 4 MB (ARM). This is based on using the Page Directory Entry (PDE) to map
the large page without using a page table. This results in quicker translation, but most importantly
better use of the Translation Lookaside Buffer (TLB) – a cache of recently translated pages maintained
by the CPU. In the case of a large page, a single TLB entry maps significantly more memory than a
small page.
The downside of large pages is the need to have the memory contiguous in RAM, which
can fail if memory is tight or very fragmented. Also, large pages are always non-pageable
and can only use read/write protection.
Huge pages (1 GB in size) are supported on Windows 10 and Server 2016 and later. These
are used automatically with large pages if an allocation is at least 1 GB in size, and that
size can be located as contiguous in RAM.
Page States
Each page in virtual memory can be in one of three states:
• Free – the page is not allocated in any way; there is nothing there. Any attempt to access that
page would cause an access violation exception. Most pages in a newly created process are free.
• Committed – the reverse of free; an allocated page that can be accessed successfully (assuming
non-conflicting protection attributes; for example, writing to a read-only page causes an access
violation). Committed pages are mapped to RAM or to a file (such as a page file).
• Reserved – the page is not committed, but the address range is reserved for possible future
commitment. From the CPU’s perspective, it’s the same as Free – any access attempt raises
an access violation exception. However, new allocation attempts using the VirtualAlloc function (or NtAllocateVirtualMemory, the related native API) that do not specify a specific address will not allocate from the reserved region. A classic example of using reserved memory
to maintain contiguous virtual address space while conserving committed memory usage is
described later in this chapter in the section “Thread Stacks”.
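As a brief user-mode illustration of the reserve-then-commit pattern just described (the sizes here are arbitrary examples), the following sketch reserves a large range and commits only its first part:

#include <windows.h>

int main() {
    // reserve 1 MB of contiguous address space; no committed memory is charged yet
    void* base = VirtualAlloc(nullptr, 1 << 20, MEM_RESERVE, PAGE_READWRITE);
    if (!base)
        return 1;

    // later, commit just the first 64 KB when it's actually needed
    void* p = VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
    if (p) {
        // the committed pages are now accessible; the rest of the range stays reserved
        *(int*)p = 42;
    }

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}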
System Memory
The lower part of the address space is for user-mode processes use. While a particular thread is
executing, its associated process address space is visible from address zero to the upper limit as
described in the previous section. The operating system, however, must also reside somewhere – and
that somewhere is the upper address range that’s supported on the system, as follows:
• On 32-bit systems running without the increase user virtual address space setting, the operating
system resides in the upper 2 GB of virtual address space, from address 0x80000000 to
0xFFFFFFFF.
• On 32-bit systems configured with the increase user virtual address space setting, the operating
system resides in the address space left. For example, if the system is configured with 3 GB
user address space per process (the maximum), the OS takes the upper 1 GB (from address
0xC0000000 to 0xFFFFFFFF). The component that suffers most from this address space
reduction is the file system cache.
• On 64-bit systems running Windows 8, Server 2012 and earlier, the OS takes the upper 8 TB of
virtual address space.
• On 64-bit systems running Windows 8.1, Server 2012 R2 and later, the OS takes the upper 128
TB of virtual address space.
Figure 1-4 shows the virtual memory layout for the two “extreme” cases: 32-bit process on a 32-bit
system (left) and a 64-bit process on a 64-bit system (right).
System space is not process-relative – after all, it’s the same system, the same kernel, the same drivers
that service every process on the system (the exception is some system memory that is on a per-session
basis but is not important for this discussion). It follows that any address in system space is absolute
rather than relative, since it “looks” the same from every process context. Of course, actual access
from user mode into system space results in an access violation exception.
System space is where the kernel itself, the Hardware Abstraction Layer (HAL), and kernel drivers
reside once loaded. Thus, kernel drivers are automatically protected from direct user mode access. It
also means they have a potentially system-wide impact. For example, if a kernel driver leaks memory,
that memory will not be freed even after the driver unloads. User-mode processes, on the other
hand, can never leak anything beyond their lifetime. The kernel is responsible for closing and freeing
everything private to a dead process (all handles are closed and all private memory is freed).
Threads
The actual entities that execute code are threads. A thread is contained within a process, using the
resources exposed by the process to do work (such as virtual memory and handles to kernel objects).
The most important details a thread owns are the following:
Figure 1-5 shows the state diagram for these states. The numbers in parentheses indicate the state numbers, as can be viewed by tools such as Performance Monitor. Note that the Ready state has a
sibling state called Deferred Ready, which is similar, and exists to minimize internal locking.
Thread Stacks
Each thread has a stack it uses while executing, to store local variables, parameters passed to functions (in some cases), and return addresses prior to making function calls. A thread has at least one stack residing in system (kernel) space, and it’s pretty small (the default is 12 KB on 32-bit systems and 24 KB on 64-bit systems). A user-mode thread has a second stack in its process’ user-space address range, which is considerably larger (by default it can grow up to 1 MB). An example
with three user-mode threads and their stacks is shown in figure 1-6. In the figure, threads 1 and 2
are in process A, and thread 3 is in process B.
The kernel stack always resides in RAM while the thread is in the Running or Ready states. The reason
for this is subtle and will be discussed later in this chapter. The user-mode stack, on the other hand,
may be paged out, just like any other user-mode memory.
The user-mode stack is handled differently than the kernel-mode stack in terms of its size. It starts
out with a certain amount of committed memory (could be as small as a single page), where the
next page is committed with a PAGE_GUARD attribute. The rest of the stack address space memory is
reserved, thus not wasting memory. The idea is to grow the stack only if the thread’s code actually needs more stack space. When the thread touches the guard page, a guard-page exception is raised. The memory manager then removes the guard protection from that page, commits the next page, and marks it with the PAGE_GUARD attribute. This way, the stack grows as
needed, avoiding the entire stack memory being committed upfront. Figure 1-7 shows this layout.
Technically, Windows uses 3 guard pages rather than one in most cases.
• The executable image has a stack commit and reserved values in its Portable Executable (PE)
header. These are taken as defaults if a thread does not specify alternative values. These are
always used for the first thread in the process.
• When a thread is created with CreateThread (or similar functions), the caller can specify
its required stack size, either the upfront committed size or the reserved size (but not both),
depending on a flag provided to the function; specifying zero uses the defaults set in the PE
header.
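As a hedged user-mode sketch of the second bullet (ThreadFunc is a hypothetical thread routine), a thread can be created with a 1 MB reserved stack; the STACK_SIZE_PARAM_IS_A_RESERVATION flag makes the size argument the reserve size rather than the initial commit size:

#include <windows.h>

DWORD WINAPI ThreadFunc(PVOID param) {
    UNREFERENCED_PARAMETER(param);
    return 0;
}

int main() {
    HANDLE hThread = CreateThread(nullptr, 1 << 20, ThreadFunc, nullptr,
        STACK_SIZE_PARAM_IS_A_RESERVATION, nullptr);
    if (hThread) {
        WaitForSingleObject(hThread, INFINITE);
        CloseHandle(hThread);
    }
    return 0;
}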
issues a special CPU instruction (syscall on x64 or sysenter on x86) that makes the actual transition
to kernel mode while jumping to a predefined routine called the system service dispatcher.
The system service dispatcher, in turn, uses the value in that EAX register as an index into a System
Service Dispatch Table (SSDT). Using this table, the code jumps to the system service (system call) itself.
For our Notepad example, the SSDT entry would point to the NtCreateFile function, implemented
by the kernel’s I/O manager. Notice the function has the same name as the one in NTDLL.dll, and has
the same parameters as well. On the kernel side is the real implementation. Once the system service
is complete, the thread returns to user mode to execute the instruction following sysenter/syscall.
This sequence of calls is depicted in figure 1-8.
• User processes
These are normal processes based on image files, executing on the system, such as instances of
Notepad.exe, cmd.exe, explorer.exe, and so on.
• Subsystem DLLs
Subsystem DLLs are dynamic link libraries (DLLs) that implement the API of a subsystem. A
subsystem is a particular view of the capabilities exposed by the kernel. Technically, starting
from Windows 8.1, there is only a single subsystem – the Windows Subsystem. The subsystem
DLLs include well-known files, such as kernel32.dll, user32.dll, gdi32.dll, advapi32.dll, combase.dll, and many others. These include mostly the officially documented API of Windows.
• NTDLL.DLL
A system-wide DLL, implementing the Windows native API. This is the lowest layer of code
which is still in user mode. Its most important role is to make the transition to kernel mode
for system call invocation. NTDLL also implements the Heap Manager, the Image Loader and
some part of the user mode thread pool.
• Service Processes
Service processes are normal Windows processes that communicate with the Service Control
Manager (SCM, implemented in services.exe) and allow some control over their lifetime. The
SCM can start, stop, pause, resume and send other messages to services. Services typically
execute under one of the special Windows accounts – local system, network service or local
service.
• Executive
The Executive is the upper layer of NtOskrnl.exe (the “kernel”). It hosts most of the code that is
in kernel mode. It includes mostly the various “managers”: Object Manager, Memory Manager,
I/O Manager, Plug & Play Manager, Power Manager, Configuration Manager, etc. It’s by far
larger than the lower Kernel layer.
• Kernel
The Kernel layer implements the most fundamental and time-sensitive parts of kernel-mode OS
code. This includes thread scheduling, interrupt and exception dispatching, and implementation
of various kernel primitives such as mutexes and semaphores. Some of the kernel code is written
in CPU-specific machine language for efficiency and for getting direct access to CPU-specific
details.
• Device Drivers
Device drivers are loadable kernel modules. Their code executes in kernel mode and so has the
full power of the kernel. This book is dedicated to writing certain types of kernel drivers.
• Win32k.sys
This is the kernel-mode component of the Windows subsystem. Essentially, it’s a kernel
module (driver) that handles the user interface part of Windows and the classic Graphics
Device Interface (GDI) APIs. This means that all windowing operations (CreateWindowEx,
GetMessage, PostMessage, etc.) are handled by this component. The rest of the system has little to no knowledge of the UI.
• Hardware Abstraction Layer (HAL)
The HAL is a software abstraction layer over the hardware closest to the CPU. It allows device
drivers to use APIs that do not require detailed and specific knowledge of things like Interrupt Controllers or DMA controllers. Naturally, this layer is mostly useful for device drivers written
to handle hardware devices.
• System Processes
System processes is an umbrella term used to describe processes that are typically “just there”, doing their thing, and normally these processes are not communicated with directly. They are
important nonetheless, and some in fact, critical to the system’s well-being. Terminating some
of them is fatal and causes a system crash. Some of the system processes are native processes,
meaning they use the native API only (the API implemented by NTDLL). Example system
processes include Smss.exe, Lsass.exe, Winlogon.exe, and Services.exe.
• Subsystem Process
The Windows subsystem process, running the image Csrss.exe, can be viewed as a helper to the
kernel for managing processes running under the Windows subsystem. It is a critical process,
meaning if killed, the system would crash. There is one Csrss.exe instance per session, so on a
standard system two instances would exist – one for session 0 and one for the logged-on user
session (typically 1). Although Csrss.exe is the “manager” of the Windows subsystem (the only
one left these days), its importance goes beyond just this role.
• Hyper-V Hypervisor
The Hyper-V hypervisor exists on Windows 10 and Server 2016 (and later) systems if they
support Virtualization Based Security (VBS). VBS provides an extra layer of security, where
the normal OS is a virtual machine controlled by Hyper-V. Two distinct Virtual Trust Levels
(VTLs) are defined, where VTL 0 consists of the normal user-mode/kernel-mode we know of,
and VTL 1 contains the secure kernel and Isolated User Mode (IUM). VBS is beyond the scope
of this book. For more information, check out the Windows Internals book and/or the Microsoft
documentation.
Windows 10 version 1607 introduced the Windows Subsystem for Linux (WSL). Although
this may look like yet another subsystem, like the old POSIX and OS/2 subsystems
supported by Windows, it is not like that at all. The old subsystems were able to execute
POSIX and OS/2 apps if these were compiled using a Windows compiler to use the PE
format and Windows system calls. WSL, on the other hand, has no such requirement.
Existing executables from Linux (stored in ELF format) can be run as-is on Windows,
without any recompilation.
To make something like this work, a new process type was created – the Pico process
together with a Pico provider. Briefly, a Pico process is an empty address space (minimal
process) that is used for WSL processes, where every system call (Linux system call) must
be intercepted and translated to the Windows system call(s) equivalent using that Pico
provider (a device driver). There is a true Linux (the user-mode part) installed on the
Windows machine.
The above description is for WSL version 1. Starting with Windows 10 version 2004,
Windows supports a new version of WSL known as WSL 2. WSL 2 is not based on pico
processes anymore. Instead, it’s based on a hybrid virtual machine technology that allows
installing a full Linux system (including the Linux kernel), but still see and share the
Windows machine’s resources, such as the file system. WSL 2 is faster than WSL 1 and
solves some edge cases that didn’t work well in WSL 1, thanks to the real Linux kernel
handling Linux system calls.
function returns a handle to the object. A return value of zero means an invalid handle (and a function
call failure). The OpenMutex function, on the other hand, tries to open a handle to a named mutex. If
the mutex with that name does not exist, the function fails and returns null (0).
Kernel (and driver) code can use either a handle or a direct pointer to an object. The choice is usually
based on the API the code wants to call. In some cases, a handle given by user mode to the driver
must be turned into a pointer with the ObReferenceObjectByHandle function. We’ll discuss these
details in a later chapter.
Most functions return null (zero) on failure, but some do not. Most notably, the
CreateFile function returns INVALID_HANDLE_VALUE (-1) if it fails.
Handle values are multiples of 4, where the first valid handle is 4; Zero is never a valid handle value.
Kernel-mode code can use handles when creating/opening objects, but they can also use direct
pointers to kernel objects. This is typically done when a certain API demands it. Kernel code can
get a pointer to an object given a valid handle using the ObReferenceObjectByHandle function. If successful, the reference count on the object is incremented, so there is no danger of the kernel code being left with a dangling pointer if the user-mode client holding the handle decides to close it. The object is safe to access, regardless of any handle holders, until the kernel code calls ObDereferenceObject, which decrements the reference count; if the kernel code misses this call, that’s a resource leak that will only be resolved at the next system boot.
All objects are reference counted. The object manager maintains a handle count and total reference
count for objects. Once an object is no longer needed, its client should close the handle (if a handle
was used to access the object) or dereference the object (if a kernel client used a pointer). From that
point on, the code should consider its handle/pointer to be invalid. The Object Manager will destroy
the object if its reference count reaches zero.
Each object points to an object type, which holds information on the type itself, meaning there is a
single type object for each type of object. These are also exposed as exported global kernel variables,
some of which are defined in the kernel headers and are needed in certain cases, as we’ll see in later
chapters.
Object Names
Some types of objects can have names. These names can be used to open objects by name with a
suitable Open function. Note that not all objects have names; for example, processes and threads
don’t have names – they have IDs. That’s why the OpenProcess and OpenThread functions require
a process/thread identifier (a number) rather than a string-based name. Another somewhat weird case
of an object that does not have a name is a file. The file name is not the object’s name – these are
different concepts.
Threads appear to have a name (starting from Windows 10), that can be set with the
user-mode API SetThreadDescription. This is not, however, a true name, but rather a
friendly name/description most useful in debugging, as Visual Studio shows a thread’s
description, if there is any.
From user-mode code, calling a Create function with a name creates the object with that name if an
object with that name does not exist, but if it exists it just opens the existing object. In the latter case,
calling GetLastError returns ERROR_ALREADY_EXISTS, indicating this is not a new object, and the
returned handle is yet another handle to an existing object.
The name provided to a Create function is not actually the final name of the object. It’s prepended with
\Sessions\x\BaseNamedObjects\ where x is the session ID of the caller. If the session is zero, the name
is prepended with \BaseNamedObjects\. If the caller happens to be running in an AppContainer (typi-
cally a Universal Windows Platform process), then the prepended string is more complex and consists
of the unique AppContainer SID: \Sessions\x\AppContainerNamedObjects\{AppContainerSID}.
All the above means that object names are session-relative (and in the case of AppContainer
– package relative). If an object must be shared across sessions it can be created in session 0 by
prepending the object name with Global\; for example, creating a mutex with the CreateMutex
function named Global\MyMutex will create it under \BaseNamedObjects. Note that AppContainers
do not have the power to use session 0 object namespace.
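A short user-mode sketch of the above (the mutex name is an arbitrary example): creating a mutex in the session-0 namespace with the Global\ prefix, and checking whether it already existed:

#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE hMutex = CreateMutexW(nullptr, FALSE, L"Global\\MyMutex");
    if (!hMutex)
        return 1;   // creation/open failed (e.g. access denied)

    if (GetLastError() == ERROR_ALREADY_EXISTS)
        printf("Opened a handle to an existing mutex\n");
    else
        printf("Created a new mutex\n");

    CloseHandle(hMutex);
    return 0;
}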
This hierarchy can be viewed with the Sysinternals WinObj tool (run elevated) as shown in figure
1-10.
The view shown in figure 1-10 is the object manager namespace, comprising a hierarchy of named
objects. This entire structure is held in memory and manipulated by the Object Manager (part of the
Executive) as required. Note that unnamed objects are not part of this structure, meaning the objects
seen in WinObj do not comprise all the existing objects, but rather all the objects that were created
with a name.
Every process has a private handle table to kernel objects (whether named or not), which can be
viewed with the Process Explorer and/or Handles Sysinternals tools. A screenshot of Process Explorer
showing handles in some process is shown in figure 1-11. The default columns shown in the handles
view are the object type and name only. However, there are other columns available, as shown in
figure 1-11.
By default, Process Explorer shows only handles to objects that have names (according to Process Explorer’s definition of a name, discussed shortly). To view all handles in a process, select Show
Unnamed Handles and Mappings from Process Explorer’s View menu.
The various columns in the handle view provide more information for each handle. The handle value and the object type are self-explanatory. The name column is tricky. It shows true object names for Mutexes (Mutants), Semaphores, Events, Sections, ALPC Ports, Jobs, Timers, Directories (object manager Directories, not file system directories), and other, less-used object types. Yet others are shown with a name that has a different meaning than a true object name:
• For Process and Thread objects, the name is shown as their unique ID.
• For File objects, it shows the file name (or device name) pointed to by the file object. It’s not
the same as an object’s name, as there is no way to get a handle to a file object given the file
name - only a new file object may be created that accesses the same underlying file or device
(assuming sharing settings for the original file object allow it).
• (Registry) Key objects names are shown with the path to the registry key. This is not a name,
for the same reasoning as for file objects.
• Token object names are shown with the user name stored in the token.
//
// now kill it with some arbitrary exit code
//
BOOL success = TerminateProcess(hProcess, 1);
//
// close the handle
//
CloseHandle(hProcess);
The Decoded Access column provides a textual description of the access mask (for some object types),
making it easier to identify the exact access allowed for a particular handle.
Double-clicking a handle entry (or right-clicking and selecting Properties) shows some of the object’s
properties. Figure 1-12 shows a screenshot of an example event object properties.
Notice that the dialog shown in figure 1-12 is for the object’s properties, rather than the handle’s. In
other words, looking at an object’s properties from any handle that points to the same object shows
the same information.
The properties in figure 1-12 include the object’s name (if any), its type, a short description, its address
in kernel memory, the number of open handles, and some specific object information, such as the state
and type of the event object shown. Note that the References shown do not indicate the actual number
of outstanding references to the object (it did prior to Windows 8.1). A proper way to see the actual
reference count for the object is to use the kernel debugger’s !trueref command, as shown here:
We’ll take a closer look at the attributes of objects and the kernel debugger in later chapters.
In the next chapter, we’ll start writing a very simple driver to show and use many of the tools we’ll
need later in this book.
Chapter 2: Getting Started with Kernel Development
This chapter deals with the fundamentals needed to get up and running with kernel driver development. During the course of this chapter, you’ll install the necessary tools and write a very basic driver
that can be loaded and unloaded.
In this chapter:
• Visual Studio 2019 with the latest updates. Make sure the C++ workload is selected during
installation. Note that any SKU will do, including the free Community edition.
• Windows 11 SDK (generally, the latest is recommended). Make sure at least the Debugging
Tools for Windows item is selected during installation.
• Windows 11 Driver Kit (WDK) - it supports building drivers for Windows 7 and later versions
of Windows. Make sure the wizard installs the project templates for Visual Studio at the end of
the installation.
• The Sysinternals tools, which are invaluable in any “internals” work, can be downloaded for
free from http://www.sysinternals.com. Click on Sysinternals Suite on the left of that web page
and download the Sysinternals Suite zip file. Unzip to any folder, and the tools are ready to go.
The SDK and WDK versions must match. Follow the guidelines in the WDK download
page to load the corresponding SDK with the WDK.
A quick way to make sure the WDK templates are installed correctly is to open Visual
Studio and select New Project and look for driver projects, such as “Empty WDM Driver”.
Figure 2-2: New WDM Driver Project in Visual Studio 2019 with the Classic Project Dialog extension
Once the project is created, the Solution Explorer shows a single file within the Driver Files filter -
Sample.inf. You won’t need this file in this example, so simply delete it (right-click and select Remove
or press the Del key).
Now it’s time to add a source file. Right-click the Source Files node in Solution Explorer and select
Add / New Item… from the File menu. Select a C++ source file and name it Sample.cpp. Click OK to
create it.
NTSTATUS DriverEntry(
_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath);
The _In_ annotations are part of the Source (Code) Annotation Language (SAL). These annotations
are transparent to the compiler, but provide metadata useful for human readers and static analysis
tools. I may remove these annotations in code samples to make it easier to read, but you should use
SAL annotations whenever possible.
A minimal DriverEntry routine could just return a successful status, like so:
NTSTATUS DriverEntry(
_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath) {
return STATUS_SUCCESS;
}
This code would not yet compile. First, you’ll need to include a header that has the required definitions
for the types present in DriverEntry. Here’s one possibility:
#include <ntddk.h>
Now the code has a better chance of compiling, but would still fail. One reason is that by default, the
compiler is set to treat warnings as errors, and the function does not make use of its given arguments.
Removing treat warnings as errors from the compiler’s options is not recommended, as some warnings
may be errors in disguise. These warnings can be resolved by removing the argument names entirely
(or commenting them out), which is fine for C++ files. There is another, more “classic” way to solve
this, which is to use the UNREFERENCED_PARAMETER macro:
NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
UNREFERENCED_PARAMETER(DriverObject);
UNREFERENCED_PARAMETER(RegistryPath);
return STATUS_SUCCESS;
}
As it turns out, this macro actually references the argument given just by writing its value as is, and
this shuts the compiler up, making the argument technically “referenced”.
Building the project now compiles fine, but causes a linker error. The DriverEntry function must
have C-linkage, which is not the default in C++ compilation. Here’s the final version of a successful
build of the driver consisting of a DriverEntry function only:
#include <ntddk.h>

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(DriverObject);
    UNREFERENCED_PARAMETER(RegistryPath);
    return STATUS_SUCCESS;
}
At some point, the driver may be unloaded. At that time, anything done in the DriverEntry function
must be undone. Failure to do so creates a leak, which the kernel will not clean up until the next
reboot. Drivers can have an Unload routine that is automatically called before the driver is unloaded
from memory. Its pointer must be set using the DriverUnload member of the driver object:
DriverObject->DriverUnload = SampleUnload;
The unload routine accepts the driver object (the same one passed to DriverEntry) and returns
void. As our sample driver has done nothing in terms of resource allocation in DriverEntry, there
is nothing to do in the Unload routine, so we can leave it empty for now:
#include <ntddk.h>

void SampleUnload(PDRIVER_OBJECT DriverObject) {
    UNREFERENCED_PARAMETER(DriverObject);
}

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);

    DriverObject->DriverUnload = SampleUnload;
    return STATUS_SUCCESS;
}
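To register the driver, use the sc.exe tool with the create option. A typical invocation (the binPath here is a hypothetical output path for the compiled Sample.sys) looks like this:

sc create sample type= kernel binPath= C:\dev\Sample\x64\Debug\Sample.sys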
Note there is no space between type and the equal sign, and there is a space between the equal sign
and kernel; same goes for the second part.
If all goes well, the output should indicate success. To test the installation, you can open the registry ed-
itor (regedit.exe) and look for the driver details at HKLM\System\CurrentControlSet\Services\Sample.
Figure 2-3 shows a screenshot of the registry editor after the previous command.
To load the driver, we can use the Sc.exe tool again, this time with the start option, which uses the
StartService API to load the driver (the same API used to load services). However, on 64-bit systems drivers must be signed, and so normally the following command would fail:
sc start sample
Since it’s inconvenient to sign a driver during development (maybe even not possible if you don’t
have a proper certificate), a better option is to put the system into test signing mode. In this mode,
unsigned drivers can be loaded without a hitch.
With an elevated command window, test signing can be turned on like so:
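bcdedit /set testsigning on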
Unfortunately, this command requires a reboot to take effect. Once rebooted, the previous start
command should succeed.
If you are testing on a Windows 10 (or later) system with Secure Boot enabled, changing the
test signing mode will fail. This is one of the settings protected by Secure Boot (local kernel
debugging is also protected by Secure Boot). If you can’t disable Secure Boot through the BIOS settings, because of IT policy or some other reason, your best option is to test on a virtual
machine.
There is yet another setting that you may need to specify if you intend to test the driver on a pre-Windows 10 machine, when using Visual Studio 2019 (or earlier). In this case, you have to set the
target OS version in the project properties dialog, as shown in figure 2-4. Notice that I have selected all
configurations and all platforms, so that when switching configurations (Debug/Release) or platforms
(x86/x64/ARM/ARM64), the setting is maintained.
Once test signing mode is on, and the driver is loaded, this is the output you should see:
With Visual Studio 2022, you can only build drivers for Windows 10 and later.
This means everything is well, and the driver is loaded. To confirm, we can open Process Explorer and
find the Sample.Sys driver image file. Figure 2-5 shows the details of the sample driver image loaded
into system space.
At this point, we can unload the driver using the following command:
sc stop sample
Behind the scenes, sc.exe calls the ControlService API with the SERVICE_CONTROL_STOP value.
Unloading the driver causes the Unload routine to be called, which at this time does nothing. You
can verify the driver is indeed unloaded by looking at Process Explorer again; the driver image entry
should not be there anymore.
Simple Tracing
How can we know for sure that the DriverEntry and Unload routines actually executed? Let’s add
basic tracing to these functions. Drivers can use the DbgPrint function to output printf-style text
that can be viewed using the kernel debugger, or some other tool.
Here are updated versions of DriverEntry and the Unload routine that use DbgPrint to trace the fact that their code executed:

void SampleUnload(PDRIVER_OBJECT DriverObject) {
    UNREFERENCED_PARAMETER(DriverObject);
    DbgPrint("Sample driver Unload called\n");
}

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);

    DbgPrint("Sample driver initialized successfully\n");
    DriverObject->DriverUnload = SampleUnload;
    return STATUS_SUCCESS;
}
A more typical approach is to have these outputs in Debug builds only. This is because DbgPrint has some overhead that you may want to avoid in Release builds. KdPrint is a macro that is only
compiled in Debug builds and calls the underlying DbgPrint kernel API. Here is a revised version
that uses KdPrint:
void SampleUnload(PDRIVER_OBJECT DriverObject) {
    UNREFERENCED_PARAMETER(DriverObject);
    KdPrint(("Sample driver Unload called\n"));
}

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);

    KdPrint(("Sample driver initialized successfully\n"));
    DriverObject->DriverUnload = SampleUnload;
    return STATUS_SUCCESS;
}
Notice the double parenthesis when using KdPrint. This is required because KdPrint is a macro, but
apparently accepts any number of arguments, a-la printf. Since macros cannot receive a variable
number of parameters, a compiler trick is used to call the DbgPrint function that does accept a variable
number of parameters.
With these statements in place, we would like to load the driver again and see these messages. We’ll use a kernel debugger in chapter 5, but for now we’ll use a useful Sysinternals tool named DebugView.
Before running DebugView, you’ll need to make some preparations. First, starting with Windows
Vista, DbgPrint output is not actually generated unless a certain value is in the registry. You’ll
have to add a key named Debug Print Filter under HKLM\SYSTEM\CurrentControlSet\Control\Session
Manager (the key typically does not exist). Within this new key, add a DWORD value named DEFAULT
(not the default value that exists in any key) and set its value to 8 (technically, any value with bit 3
set will do). Figure 2-6 shows the setting in RegEdit. Unfortunately, you’ll have to restart the system
for this setting to take effect.
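If you prefer the command line, a roughly equivalent way to apply the same setting (from an elevated prompt; a reboot is still required) is:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter" /v DEFAULT /t REG_DWORD /d 8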
Once this setting has been applied, run DebugView (DbgView.exe) elevated. In the Options menu,
make sure Capture Kernel is selected (or press Ctrl+K). You can safely deselect Capture Win32 and
Capture Global Win32, so that user-mode output from various processes does not clutter the display.
DebugView is able to show kernel debug output even without the Registry value shown
in figure 2-6 if you select Enable Verbose Kernel Output from its Capture menu. However,
it seems this option does not work on Windows 11, and the Registry setting is necessary.
Build the driver, if you haven’t already. Now you can load the driver again from an elevated command
window (sc start sample). You should see output in DebugView as shown in figure 2-7. If you
unload the driver, you’ll see another message appearing because the Unload routine was called. (The third output line is from another driver and has nothing to do with our sample driver.)
Add code to the sample DriverEntry to output the Windows OS version: major, minor,
and build number. Use the RtlGetVersion function to retrieve the information. Check
the results with DebugView.
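One possible sketch of such an addition (shown here only as a hint; error handling is kept minimal):

RTL_OSVERSIONINFOW info = { sizeof(info) };
NTSTATUS status = RtlGetVersion(&info);
if (NT_SUCCESS(status)) {
    DbgPrint("Windows version: %u.%u.%u\n",
        info.dwMajorVersion, info.dwMinorVersion, info.dwBuildNumber);
}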
Summary
We’ve seen the tools you need to have for kernel development and wrote a very minimalistic driver
to prove the basic tools work. In the next chapter, we’ll look at the fundamental building blocks of
kernel APIs, concepts, and fundamental structures.
Chapter 3: Kernel Programming Basics
In this chapter, we’ll dig deeper into kernel APIs, structures, and definitions. We’ll also examine some
of the mechanisms that invoke code in a driver. Finally, we’ll put all that knowledge together to create
our first functional driver and client application.
In this chapter:
Table 3-1: Differences between user mode and kernel mode development
Unhandled Exceptions
Exceptions occurring in user-mode that are not caught by the program cause the process to terminate
prematurely. Kernel-mode code, on the other hand, being implicitly trusted, cannot recover from an
unhandled exception. Such an exception causes the system to crash with the infamous Blue screen
of death (BSOD) (newer versions of Windows have more diverse colors for the crash screen). The
BSOD may first appear to be a form of punishment, but it’s essentially a protection mechanism. The rationale behind it is that allowing the code to continue execution could cause irreversible damage to Windows (such as deleting important files or corrupting the registry) that may cause the system to fail to boot. It’s better, then, to stop everything immediately to prevent potential damage. We’ll discuss
the BSOD in more detail in chapter 6.
All this leads to at least one conclusion: kernel code must be meticulously programmed, and no detail, such as error checking, should be skipped.
Termination
When a process terminates, for whatever reason - either normally, because of an unhandled exception,
or terminated by external code - it never leaks anything: all private memory is freed, and all handles
are closed. Of course, premature handle closing may cause some loss of data, such as a file handle
being closed before flushing some data to disk - but there are no resource leaks beyond the lifetime
of the process; this is guaranteed by the kernel.
Kernel drivers, on the other hand, don’t provide such a guarantee. If a driver unloads while still holding
onto allocated memory or open kernel handles - these resources will not be freed automatically, only
released at the next system boot.
Why is that? Can’t the kernel keep track of a driver’s allocations and resource usage so these can be
freed automatically when the driver unloads?
Theoretically, this would have been possible to achieve (although currently the kernel does not track
such resource usage). The real issue is that it would be too dangerous for the kernel to attempt such
cleanup. The kernel has no way of knowing whether the driver leaked those resources for a reason;
for example, the driver could allocate some buffer and then pass it to another driver, with which
it cooperates. That second driver may use the memory buffer and free it eventually. If the kernel
attempted to free the buffer when the first driver unloads, the second driver would cause an access
violation when accessing that now-freed buffer, causing a system crash.
This emphasizes the responsibility of a kernel driver to properly clean up allocated resources; no one
else will do it.
IRQL
Interrupt Request Level (IRQL) is an important kernel concept that will be further discussed in chapter
6. Suffice it to say at this point that normally a processor’s IRQL is zero, and in particular it’s always
zero when user-mode code is executing. In kernel mode, it’s still zero most of the time - but not all the
time. Some restrictions on code execution exist at IRQL 2 and higher, which means the driver writer
must be careful to use only allowed APIs at that high IRQL. The effects of higher than zero IRQLs are
discussed in chapter 6.
C++ Usage
In user mode programming, C++ has been used for many years, and it works well when combined
with user-mode Windows APIs. With kernel code, Microsoft started officially supporting C++ with
Visual Studio 2012 and WDK 8. C++ is not mandatory, of course, but it has some important benefits
related to resource cleanup, with a C++ idiom called Resource Acquisition Is Initialization (RAII).
We’ll use this RAII idiom quite a bit to make sure we don’t leak resources.
C++ as a language is almost fully supported for kernel code. But there is no C++ runtime in the kernel,
and so some C++ features just cannot be used:
• The new and delete operators are not supported and will fail to compile. This is because their normal operation is to allocate from a user-mode heap, which is irrelevant within the kernel. The kernel API has “replacement” functions that are more closely modeled after the C functions malloc and free. We’ll discuss these functions later in this chapter. It is possible, however, to overload the new and delete operators, similarly to what is sometimes done in user mode, and invoke the kernel allocation and free functions in the implementation (a brief sketch appears after this list). We’ll see how to do that later in this chapter as well.
• Global variables that have non-default constructors: these constructors will not be called - there is no C/C++ runtime to call them. These situations must be avoided, but there are some workarounds:
– Avoid any code in the constructor and instead create some Init function to be called
explicitly from driver code (e.g. from DriverEntry).
– Allocate a pointer only as a global (or static) variable, and create the actual instance
dynamically. The compiler will generate the correct code to invoke the constructor. This
works assuming the new and delete operators have been overloaded, as described later
in this chapter.
• The C++ exception handling keywords (try, catch, throw) do not compile. This is because
the C++ exception handling mechanism requires its own runtime, which is not present in the
kernel. Exception handling can only be done using Structured Exception Handling (SEH) - a
kernel mechanism to handle exceptions. We’ll take a detailed look at SEH in chapter 6.
• The standard C++ libraries are not available in the kernel. Although most are template-based,
these do not compile, because they may depend on user-mode libraries and semantics. That
said, C++ templates as a language feature work just fine. One good usage of templates is to create kernel-mode library types, based on similar types from the user-mode standard C++ library, such as std::vector<> and std::wstring.
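As a hedged sketch of the operator overloading mentioned in the first bullet above (the tag value is an arbitrary example, and this is not the book's own template library):

#include <ntddk.h>

#define DRIVER_TAG 'lpmS'   // arbitrary example tag

// placement-style new that selects the pool at the call site
void* __cdecl operator new(size_t size, POOL_TYPE pool) {
    return ExAllocatePoolWithTag(pool, size, DRIVER_TAG);
}

void __cdecl operator delete(void* p, size_t) {
    if (p)
        ExFreePool(p);
}

// usage with a hypothetical type:
//   auto data = new (PagedPool) MyData;
//   ...
//   delete data;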
The code examples in this book make some use of C++. The features mostly used in the code examples
are:
Any C++ standard can be used for kernel development. The Visual Studio setting for new projects is
to use C++ 14. However, you can change the C++ compiler standard to any other setting, including
C++ 20 (the latest standard as of this writing). Some features we’ll use later will depend on C++ 17 at
least.
Strictly speaking, kernel drivers can be written in pure C without any issues. If you prefer to go that
route, use files with a C extension rather than CPP. This will automatically invoke the C compiler for
these files.
Debugging kernel code must be done with another machine, where the actual driver is executing.
This is because hitting a breakpoint in kernel-mode freezes the entire machine, not just a particular
process. The developer’s machine hosts the debugger itself, while the second machine (again, usually
a virtual machine) executes the driver code. These two machines must be connected through some
mechanism so data can flow between the host (where the debugger is running) and the target. We’ll
look at kernel debugging in more detail in chapter 5.
If you take a look at the exported functions list from NtOsKrnl.exe, you’ll find many functions that
are not documented in the Windows Driver Kit; this is just a fact of a kernel developer’s life - not
everything is documented.
One set of functions bears discussion at this point - the Zw prefixed functions. These functions mirror
native APIs available as gateways from NtDll.Dll with the actual implementation provided by the
Executive. When an Nt function is called from user mode, such as NtCreateFile, it reaches the
Executive at the actual NtCreateFile implementation. At this point, NtCreateFile might do various
checks based on the fact that the original caller is from user mode. This caller information is stored
on a thread-by-thread basis, in the undocumented PreviousMode member in the KTHREAD structure
for each thread.
You can query the previous processor mode by calling the documented ExGetPreviousMode API.
On the other hand, if a kernel driver needs to call a system service, it should not be subjected to the
same checks and constraints imposed on user-mode callers. This is where the Zw functions come into
play. Calling a Zw function sets the previous caller mode to KernelMode (0) and then invokes the
native function. For example, calling ZwCreateFile sets the previous caller to KernelMode and then
calls NtCreateFile, causing NtCreateFile to bypass some security and buffer checks that would
otherwise be performed. The bottom line is that kernel drivers should call the Zw functions unless
there is a compelling reason to do otherwise.
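As a hedged sketch (the file name, access, and flags are arbitrary examples; the call must be made at IRQL PASSIVE_LEVEL), opening a file from a driver with ZwCreateFile looks roughly like this:

UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\??\\C:\\Temp\\log.txt");
OBJECT_ATTRIBUTES attr;
InitializeObjectAttributes(&attr, &name,
    OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, nullptr, nullptr);

IO_STATUS_BLOCK ioStatus;
HANDLE hFile;
NTSTATUS status = ZwCreateFile(&hFile, GENERIC_WRITE | SYNCHRONIZE, &attr, &ioStatus,
    nullptr,                       // no initial allocation size
    FILE_ATTRIBUTE_NORMAL, 0,      // normal attributes, no sharing
    FILE_OVERWRITE_IF,             // create or overwrite
    FILE_SYNCHRONOUS_IO_NONALERT,  // synchronous I/O
    nullptr, 0);                   // no extended attributes
if (NT_SUCCESS(status)) {
    // ... use the handle with ZwWriteFile, etc. ...
    ZwClose(hFile);
}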
Most code paths don’t care about the exact nature of the error, and so testing the most significant bit
is enough to find out whether an error occurred. This can be done with the NT_SUCCESS macro. Here
is an example that tests for failure and logs an error if that is the case:
NTSTATUS DoWork() {
NTSTATUS status = CallSomeKernelFunction();
if(!NT_SUCCESS(status)) {
KdPrint((L"Error occurred: 0x%08X\n", status));
return status;
}
return STATUS_SUCCESS;
}
In some cases, NTSTATUS values are returned from functions that eventually bubble up to user mode.
In these cases, the STATUS_xxx value is translated to some ERROR_yyy value that is available to user-
mode through the GetLastError function. Note that these are not the same numbers; for one, error
codes in user-mode have positive values (zero is still success). Second, the mapping is not one-to-one.
In any case, this is not generally a concern for a kernel driver.
Internal kernel driver functions also typically return NTSTATUS to indicate their success/failure status.
This is usually convenient, as these functions make calls to kernel APIs and so can propagate any
error by simply returning the same status they got back from the particular API. This also implies
that the “real” return values from driver functions are typically returned through pointers or references provided as arguments to the function.
Return NTSTATUS from your own functions. It makes error reporting easier and more consistent.
Strings
The kernel API uses strings in many scenarios as needed. In some cases, these strings are simple
Unicode pointers (wchar_t* or one of its typedefs, such as WCHAR*), but most functions dealing
with strings expect a structure of type UNICODE_STRING.
The term Unicode as used in this book is roughly equivalent to UTF-16, which means 2 bytes per
character. This is how strings are stored internally within kernel components. Unicode in general is a
set of standards related to character encoding. You can find more information at https://unicode.org.
The UNICODE_STRING structure represents a string with its length and maximum length known. Here
is a simplified definition of the structure:
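typedef struct _UNICODE_STRING {
    USHORT Length;          // in bytes, not characters
    USHORT MaximumLength;   // in bytes
    PWCH   Buffer;
} UNICODE_STRING, *PUNICODE_STRING;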
The Length member is in bytes (not characters) and does not include a Unicode-NULL terminator, if
one exists (a NULL terminator is not mandatory). The MaximumLength member is the number of bytes
the string can grow to without requiring a memory reallocation.
Manipulating UNICODE_STRING structures is typically done with a set of Rtl functions that deal
specifically with strings. Table 3-3 lists some of the common Rtl functions for string manipulation.
Function - Description
RtlInitUnicodeString - Initializes a UNICODE_STRING based on an existing C-string pointer. It sets Buffer, then calculates the Length and sets MaximumLength to the same value. Note that this function does not allocate any memory - it just initializes the internal members.
RtlCopyUnicodeString - Copies one UNICODE_STRING to another. The destination string pointer (Buffer) must be allocated before the copy and MaximumLength set appropriately.
RtlCompareUnicodeString - Compares two UNICODE_STRINGs (equal, less, greater), specifying whether to do a case sensitive or insensitive comparison.
RtlEqualUnicodeString - Compares two UNICODE_STRINGs for equality, with case sensitivity specification.
RtlAppendUnicodeStringToString - Appends one UNICODE_STRING to another.
RtlAppendUnicodeToString - Appends a C-style (NULL-terminated) string to a UNICODE_STRING.
In addition to the above functions, there are functions that work on C-string pointers. Moreover, some
of the well-known string functions from the C Runtime Library are implemented within the kernel
as well for convenience: wcscpy_s, wcscat_s, wcslen, wcschr, strcpy, strcpy_s and
others.
The wcs-prefixed functions work with C Unicode (wide) strings, while the str-prefixed functions
work with C ANSI strings. The _s suffix in some functions indicates a safe function, which
requires an additional argument specifying the maximum length of the destination, so the function
does not transfer more data than that size.
Never use the non-safe functions. You can include <dontuse.h> to get errors for deprecated
functions if you do use these in code.
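For example, the following snippet builds a string in a caller-provided buffer and compares it to
another string using the functions from Table 3-3 (the names and buffer size here are illustrative only):

WCHAR buffer[64] = { 0 };
UNICODE_STRING str;
str.Buffer = buffer;
str.Length = 0;
str.MaximumLength = sizeof(buffer);
RtlAppendUnicodeToString(&str, L"\\Device\\");         // append C-style strings
RtlAppendUnicodeToString(&str, L"MyDevice");

UNICODE_STRING other;
RtlInitUnicodeString(&other, L"\\device\\mydevice");   // no allocation - points to the literal
if (RtlEqualUnicodeString(&str, &other, TRUE)) {       // TRUE = case-insensitive comparison
	// the strings are considered equal
}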
Clearly, the non-paged pool is a “better” memory pool as it can never incur a page fault. We’ll see
later in this book that some cases require allocating from non-paged pool. Drivers should use this pool
sparingly, only when necessary. In all other cases, drivers should use the paged pool. The POOL_TYPE
enumeration represents the pool types. This enumeration includes many “types” of pools, but only
three should be used by drivers: PagedPool, NonPagedPool, and NonPagedPoolNx (non-paged pool
without execute permissions).
Table 3-4 summarizes the most common functions used for working with the kernel memory pools.
Function - Description
ExAllocatePool - Allocate memory from one of the pools with a default tag. This function is considered obsolete. The next function in this table should be used instead.
ExAllocatePoolWithTag - Allocate memory from one of the pools with the specified tag.
ExAllocatePoolZero - Same as ExAllocatePoolWithTag, but zeroes out the memory block.
ExAllocatePoolWithQuotaTag - Allocate memory from one of the pools with the specified tag and charge the current process quota for the allocation.
ExFreePool - Free an allocation. The function knows from which pool the allocation was made.
ExAllocatePool calls ExAllocatePoolWithTag using the tag 'enoN' (the word "none" in
reverse). Older Windows versions used ' mdW' (WDM in reverse). You should avoid
this function and use ExAllocatePoolWithTag instead.
ExAllocatePoolZero is implemented inline in wdm.h by calling
ExAllocatePoolWithTag and adding the POOL_ZERO_ALLOCATION (=1024) flag to
the pool type.
Other memory management functions are covered in chapter 8, “Advanced Programming Tech-
niques”.
The tag argument allows “tagging” an allocation with a 4-byte value. Typically this value is comprised
of up to 4 ASCII characters logically identifying the driver, or some part of the driver. These tags can
be used to help identify memory leaks - if any allocations tagged with the driver’s tag remain after
the driver is unloaded. These pool allocations (with their tags) can be viewed with the Poolmon WDK
tool, or my own PoolMonXv2 tool (downloadable from http://www.github.com/zodiacon/AllTools).
Figure 3-1 shows a screenshot of PoolMonXv2.
You must use tags comprised of printable ASCII characters. Otherwise, running the driver
under the control of the Driver Verifier (described in chapter 11) would lead to Driver
Verifier complaining.
The following code example shows memory allocation and string copying to save the registry path
passed to DriverEntry, and freeing that string in the Unload routine:
UNICODE_STRING g_RegistryPath;

// in DriverEntry: allocate a buffer and copy the provided registry path
// (DRIVER_TAG is a driver-defined pool tag)
g_RegistryPath.Buffer = (WCHAR*)ExAllocatePoolWithTag(PagedPool,
	RegistryPath->Length, DRIVER_TAG);
if (g_RegistryPath.Buffer == nullptr) {
	KdPrint(("Failed to allocate memory\n"));
	return STATUS_INSUFFICIENT_RESOURCES;
}
g_RegistryPath.MaximumLength = RegistryPath->Length;
RtlCopyUnicodeString(&g_RegistryPath, (PCUNICODE_STRING)RegistryPath);

// in the Unload routine: free the copied string
void SampleUnload(PDRIVER_OBJECT DriverObject) {
	UNREFERENCED_PARAMETER(DriverObject);
	ExFreePool(g_RegistryPath.Buffer);
	KdPrint(("Sample driver Unload called\n"));
}
Linked Lists
The kernel uses circular doubly linked lists in many of its internal data structures. For example, all
processes on the system are managed by EPROCESS structures, connected in a circular doubly linked
list, where its head is stored in the kernel variable PsActiveProcessHead.
All these lists are built in the same way, centered around the LIST_ENTRY structure defined like so:
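typedef struct _LIST_ENTRY {
	struct _LIST_ENTRY *Flink;    // forward link (next entry)
	struct _LIST_ENTRY *Blink;    // backward link (previous entry)
} LIST_ENTRY, *PLIST_ENTRY;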
Figure 3-2 depicts an example of such a list containing a head and three instances.
One such structure is embedded inside the real structure of interest. For example, in the EPROCESS
structure, the member ActiveProcessLinks is of type LIST_ENTRY, pointing to the next and previous
LIST_ENTRY objects of other EPROCESS structures. The head of a list is stored separately; in the case
of the process, that’s PsActiveProcessHead.
The pointer to the actual structure of interest, given the address of the LIST_ENTRY it contains, can
be obtained with the CONTAINING_RECORD macro.
For example, suppose you want to manage a list of structures of type MyDataItem defined like so:
struct MyDataItem {
// some data members
LIST_ENTRY Link;
// more data members
};
When working with these linked lists, we have a head for the list, stored in a variable. This means that
natural traversal is done by using the Flink member of the list to point to the next LIST_ENTRY in
the list. Given a pointer to the LIST_ENTRY, what we’re really after is the MyDataItem that contains
this list entry member. This is where the CONTAINING_RECORD comes in:
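// given a LIST_ENTRY pointer obtained while traversing the list, recover the containing
// MyDataItem (the helper function name here is just for illustration)
MyDataItem* GetItem(LIST_ENTRY* pEntry) {
	return CONTAINING_RECORD(pEntry, MyDataItem, Link);
}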
The macro does the proper offset calculation and does the casting to the actual data type (MyDataItem
in the example).
Table 3-5 shows the common functions for working with these linked lists. All operations take constant
time.
Function - Description
InitializeListHead - Initializes a list head to make an empty list: the forward and back pointers point to the list head itself.
InsertHeadList - Insert an item to the head of the list.
InsertTailList - Insert an item to the tail of the list.
IsListEmpty - Check if the list is empty.
RemoveHeadList - Remove the item at the head of the list.
RemoveTailList - Remove the item at the tail of the list.
RemoveEntryList - Remove a specific item from the list.
ExInterlockedInsertHeadList - Insert an item at the head of the list atomically by using the specified spinlock.
ExInterlockedInsertTailList - Insert an item at the tail of the list atomically by using the specified spinlock.
ExInterlockedRemoveHeadList - Remove an item from the head of the list atomically by using the specified spinlock.
The last three functions in Table 3-5 perform the operation atomically using a synchronization
primitive called a spin lock. Spin locks are discussed in chapter 6.
Initially, the MajorFunction array is initialized by the kernel to point to a kernel internal routine,
IopInvalidDeviceRequest, which returns a failure status to the caller, indicating the operation is
not supported. This means the driver, in its DriverEntry routine, only needs to initialize the actual
operations it supports, leaving all the other entries at their default values.
For example, our Sample driver at this point does not support any dispatch routines, which means
there is no way to communicate with the driver. A driver must at least support the IRP_MJ_CREATE
and IRP_MJ_CLOSE operations, to allow opening a handle to one of the device objects for the driver.
We’ll put these ideas into practice in the next chapter.
Object Attributes
One of the common structures that shows up in many kernel APIs is OBJECT_ATTRIBUTES, defined
like so:
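typedef struct _OBJECT_ATTRIBUTES {
	ULONG Length;
	HANDLE RootDirectory;
	PUNICODE_STRING ObjectName;
	ULONG Attributes;
	PVOID SecurityDescriptor;         // PSECURITY_DESCRIPTOR
	PVOID SecurityQualityOfService;   // PSECURITY_QUALITY_OF_SERVICE
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;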
The structure is typically initialized with the InitializeObjectAttributes macro, which allows
specifying all the structure members except Length (set automatically by the macro) and
SecurityQualityOfService, which is not normally needed. As a first example of an API that accepts
an OBJECT_ATTRIBUTES pointer, here is the definition of ZwOpenProcess:
NTSTATUS ZwOpenProcess (
_Out_ PHANDLE ProcessHandle,
_In_ ACCESS_MASK DesiredAccess,
_In_ POBJECT_ATTRIBUTES ObjectAttributes,
_In_opt_ PCLIENT_ID ClientId);
It uses yet another common structure, CLIENT_ID that holds a process and/or a thread ID:
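typedef struct _CLIENT_ID {
	HANDLE UniqueProcess;
	HANDLE UniqueThread;
} CLIENT_ID, *PCLIENT_ID;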
To open a process, we need to specify the process ID in the UniqueProcess member. Note that
although the type of UniqueProcess is HANDLE, it is the unique ID of the process. The reason for
the HANDLE type is that process and thread IDs are generated from a private handle table. This also
explains why process and thread IDs are always multiples of four (just like normal handles), and why
they don't overlap.
With these details at hand, here is a process opening function:
NTSTATUS
OpenProcess(ACCESS_MASK accessMask, ULONG pid, PHANDLE phProcess) {
CLIENT_ID cid;
cid.UniqueProcess = ULongToHandle(pid);
cid.UniqueThread = nullptr;
OBJECT_ATTRIBUTES procAttributes =
RTL_CONSTANT_OBJECT_ATTRIBUTES(nullptr, OBJ_KERNEL_HANDLE);
return ZwOpenProcess(phProcess, accessMask, &procAttributes, &cid);
}
The ULongToHandle function performs the required casts so that the compiler is happy (HANDLE is
64-bit on a 64-bit system, but ULONG is always 32-bit). The only OBJECT_ATTRIBUTES member used
in the above code is the Attributes flags.
The second example is a function that opens a handle to a file for read access, by using the ZwOpenFile
API, defined like so:
NTSTATUS ZwOpenFile(
_Out_ PHANDLE FileHandle,
_In_ ACCESS_MASK DesiredAccess,
_In_ POBJECT_ATTRIBUTES ObjectAttributes,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
_In_ ULONG ShareAccess,
_In_ ULONG OpenOptions);
A full discussion of the parameters to ZwOpenFile is reserved for chapter 11, but one thing is
obvious: the file name itself is specified using the OBJECT_ATTRIBUTES structure - there is no separate
parameter for that. Here is the full function opening a handle to a file for read access:
UNICODE_STRING name;
RtlInitUnicodeString(&name, path);   // 'path' - the file's full native path (the enclosing function's parameter)
OBJECT_ATTRIBUTES fileAttributes;
InitializeObjectAttributes(&fileAttributes, &name,
	OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, nullptr, nullptr);
IO_STATUS_BLOCK ioStatus;
return ZwOpenFile(phFile, FILE_GENERIC_READ, &fileAttributes, &ioStatus,
	FILE_SHARE_READ, 0);             // 'phFile' - the returned handle (output parameter)
Device Objects
Although a driver object may look like a good candidate for clients to talk to, this is not the case.
The actual communication endpoints for clients are device objects. Device objects are instances of the
semi-documented DEVICE_OBJECT structure. Without device objects, there is no one to talk to. This
means that at least one device object should be created by the driver and given a name, so that it may
be contacted by clients.
The CreateFile function (and its variants) accepts a first argument which is called “file name” in
the documentation, but really this should point to a device object’s name, where an actual file system
file is just one particular case. The name CreateFile is somewhat misleading - the word “file” here
means “file object”. Opening a handle to a file or device creates an instance of the kernel structure
FILE_OBJECT, another semi-documented structure.
More precisely, CreateFile accepts a symbolic link, a kernel object that knows how to point to
another kernel object. (You can think of a symbolic link as similar in principle to a file system shortcut.)
All the symbolic links that can be used from the user mode CreateFile or CreateFile2 calls are
located in the Object Manager directory named ??. You can see the contents of this directory with the
Sysinternals WinObj tool. Figure 3-3 shows this directory (named Global?? in WinObj).
Some of the names seem familiar, such as C:, Aux, Con, and others. Indeed, these are valid “file names”
for CreateFile calls. Other entries look like long cryptic strings, and these in fact are generated by
the I/O system for hardware-based drivers that call the IoRegisterDeviceInterface API. These
types of symbolic links are not useful for the purpose of this book.
Most of the symbolic links in the \?? directory point to an internal device name under the \Device
directory. The names in this directory are not directly accessible by user-mode callers. But they can
be accessed by kernel callers using the IoGetDeviceObjectPointer API.
A canonical example is the driver for Process Explorer. When Process Explorer is launched with
administrator rights, it installs a driver. This driver gives Process Explorer powers beyond those that
can be obtained by user-mode callers, even if running elevated. For example, Process Explorer in its
Threads dialog for a process can show the complete call stack of a thread, including functions in
kernel mode. This type of information is not possible to obtain from user mode; its driver provides
the missing information.
The driver installed by Process Explorer creates a single device object so that Process Explorer is able to
open a handle to that device and make requests. This means that the device object must be named, and
must have a symbolic link in the ?? directory; and it’s there, called PROCEXP152, probably indicating
driver version 15.2 (at the time of writing). Figure 3-4 shows this symbolic link in WinObj.
Notice the symbolic link for Process Explorer’s device points to \Device\PROCEXP152, which is the
internal name only accessible to kernel callers (and the native APIs NtOpenFile and NtCreateFile,
as shown in the next section). The actual CreateFile call made by Process Explorer (or any other
client) based on the symbolic link must be prepended with \\.\. This is necessary so that the I/O
manager’s parser will not assume the string “PROCEXP152” refers to a file with no extension in the
current directory. Here is how Process Explorer would open a handle to its device object (note the
double backslashes because of the backslash being an escape character in C/C++):
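HANDLE hDevice = CreateFile(L"\\\\.\\PROCEXP152",
	GENERIC_WRITE | GENERIC_READ,   // the exact access mask Process Explorer requests may differ
	0, nullptr, OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE) {
	// handle the error (GetLastError has the details)
}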
With C++ 11 and later, you can write strings without escaping the backslash character. The
device path in the above code can be written like so: LR"(\\.\PROCEXP152)". L indicates
Unicode (as always), while anything between R"( and )" is not escaped.
You can try the above code yourself. If Process Explorer has run elevated at least once on the
system since boot, its driver should be running (you can verify with the tool itself), and the call
to CreateFile will succeed if the client is running elevated.
A driver creates a device object using the IoCreateDevice function. This function allocates and
initializes a device object structure and returns its pointer to the caller. The device object instance is
stored in the DeviceObject member of the DRIVER_OBJECT structure. If more than one device object
is created, they form a singly linked list, where the member NextDevice of the DEVICE_OBJECT points
to the next device object. Note that the device objects are inserted at the head of the list, so the first
device object created is stored last; its NextDevice points to NULL. These relationships are depicted
in figure 3-5.
NTSTATUS NTAPI NtOpenFile(
OUT PHANDLE FileHandle,
IN ACCESS_MASK DesiredAccess,
IN POBJECT_ATTRIBUTES ObjectAttributes,
OUT PIO_STATUS_BLOCK IoStatusBlock,
IN ULONG ShareAccess,
IN ULONG OpenOptions);
Notice the similarity to the ZwOpenFile we used in an earlier section - this is the same function
prototype, just invoked here from user mode, eventually to land at NtOpenFile within the I/O
manager. The function requires usage of an OBJECT_ATTRIBUTES structure, described earlier in this
chapter.
The above prototype uses old macros such as IN, OUT and others. These have been replaced by SAL
annotations. Unfortunately, some header files have not yet been converted to SAL.
To demonstrate using NtOpenFile from user mode, we’ll create an application to play a single sound.
Normally, the Beep Windows user-mode API provides such a service:
BOOL Beep(
_In_ DWORD dwFreq,
_In_ DWORD dwDuration);
The function accepts the frequency to play (in Hertz), and the duration to play, in milliseconds. The
function is synchronous, meaning it does not return until the duration has elapsed.
The Beep API works by calling a device named \Device\Beep (you can find it in WinObj), but the beep
device driver does not create a symbolic link for it. However, we can open a handle to the beep device
using NtOpenFile. Then, to play a sound, we can use the DeviceIoControl function with the correct
parameters. Although it's not too difficult to reverse engineer the beep driver's workings, fortunately
we don’t have to. The SDK provides the <ntddbeep.h> file with the required definitions, including the
device name itself.
We’ll start by creating a C++ Console application in Visual Studio. Before we get to the main function,
we need some #includes:
#include <Windows.h>
#include <winternl.h>
#include <stdio.h>
#include <ntddbeep.h>
<winternl.h> provides the definition for NtOpenFile (and related data structures), while <ntddbeep.h>
provides the beep-specific definitions.
Since we will be using NtOpenFile, we must also link against NtDll.Dll, which we can do by adding
a #pragma to the source code, or by adding the library to the linker settings in the project's properties.
Let's go with the former, as it's easier, and is not tied to the project's properties:
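#pragma comment(lib, "ntdll")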
Without the above linkage, the linker would issue an “unresolved external” error.
Now we can start writing main, where we accept optional command line arguments indicating the
frequency and duration to play:
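The beginning of main might look something like the following (the default values are arbitrary);
the rest of the function is shown in the next snippet:

int main(int argc, const char* argv[]) {
	DWORD freq = argc > 1 ? atoi(argv[1]) : 800;        // frequency in Hertz
	DWORD duration = argc > 2 ? atoi(argv[2]) : 1000;   // duration in milliseconds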
HANDLE hFile;
OBJECT_ATTRIBUTES attr;
UNICODE_STRING name;
RtlInitUnicodeString(&name, DD_BEEP_DEVICE_NAME_U);   // expands to L"\\Device\\Beep"
InitializeObjectAttributes(&attr, &name, OBJ_CASE_INSENSITIVE,
	nullptr, nullptr);
IO_STATUS_BLOCK ioStatus;
NTSTATUS status = ::NtOpenFile(&hFile, GENERIC_WRITE, &attr, &ioStatus, 0, 0);
if (NT_SUCCESS(status)) {
BEEP_SET_PARAMETERS params;
params.Frequency = freq;
params.Duration = duration;
DWORD bytes;
//
// play the sound
//
printf("Playing freq: %u, duration: %u\n", freq, duration);
::DeviceIoControl(hFile, IOCTL_BEEP_SET, &params, sizeof(params),
nullptr, 0, &bytes, nullptr);
//
// the sound starts playing and the call returns immediately
// Wait so that the app doesn't close
//
::Sleep(duration);
::CloseHandle(hFile);
}
Write an application that plays an array of sounds by leveraging the above code.
Summary
In this chapter, we looked at some of the fundamental kernel data structures, concepts, and APIs. In
the next chapter, we’ll build a complete driver, and a client application, expanding on the information
presented thus far.
Chapter 4: Driver from Start to Finish
In this chapter, we’ll use many of the concepts we learned in previous chapters and build a simple,
yet complete, driver, and an associated client application, while filling in some of the missing details
from previous chapters. We’ll deploy the driver and use its capabilities - perform some operation in
kernel mode that is difficult, or impossible to do, in user mode.
In this chapter:
• Introduction
• Driver Initialization
• Client Code
• The Create and Close Dispatch Routines
• The Write Dispatch Routine
• Installing and Testing
Introduction
The problem we’ll solve with a simple kernel driver is the inflexibility of setting thread priorities using
the Windows API. In user mode, a thread’s priority is determined by a combination of its process
Priority Class with an offset on a per thread basis, that has a limited number of levels.
Changing a process priority class (shown in the Base priority column in Task Manager) can be
achieved with the SetPriorityClass function that accepts a process handle and one of the six
supported priority classes. Each priority class corresponds to a priority level, which is the default
priority for threads created in that process. A particular thread’s priority can be changed with the
SetThreadPriority function, accepting a thread handle and one of several constants corresponding
to offsets around the base priority class. Table 4-1 shows the available thread priorities based on the
process priority class and the thread’s priority offset.
Table 4-1: Legal values for thread priorities with the Windows APIs
The values acceptable to SetThreadPriority specify the offset. Five levels correspond to the offsets
-2 to +2: THREAD_PRIORITY_LOWEST (-2), THREAD_PRIORITY_BELOW_NORMAL (-1), THREAD_PRIORITY_-
NORMAL (0), THREAD_PRIORITY_ABOVE_NORMAL (+1), THREAD_PRIORITY_HIGHEST (+2). The remaining
two levels, called Saturation levels, set the priority to the two extremes supported by that priority class:
THREAD_PRIORITY_IDLE (-Sat) and THREAD_PRIORITY_TIME_CRITICAL (+Sat).
The following code example changes the current thread’s priority to 11:
SetPriorityClass(GetCurrentProcess(),
ABOVE_NORMAL_PRIORITY_CLASS); // process base=10
SetThreadPriority(GetCurrentThread(),
THREAD_PRIORITY_ABOVE_NORMAL); // +1 offset for thread
The Real-time priority class does not imply Windows is a real-time OS; Windows does
not provide some of the timing guarantees normally provided by true real-time operating
systems. Also, since Real-time priorities are very high and compete with many kernel
threads doing important work, such a process must be running with administrator
privileges; otherwise, attempting to set the priority class to Real-time causes the value
to be set to High.
There are other differences between the real-time priorities and the lower priority classes.
Consult the Windows Internals book for more information.
Table 4-1 shows the problem we will address quite clearly. Only a small set of priorities are available
to set directly. We would like to create a driver that would circumvent these limitations and allow
setting a thread’s priority to any number, regardless of its process priority class.
Driver Initialization
We’ll start building the driver in the same way we did in chapter 2. Create a new WDM Empty Project
named Booster (or another name of your choosing) and delete the INF file created by the wizard. Next,
add a new source file to the project, called Booster.cpp (or any other name you prefer). Add the basic
#include for the main WDK header and an almost empty DriverEntry:
#include <ntddk.h>
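extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
	UNREFERENCED_PARAMETER(DriverObject);
	UNREFERENCED_PARAMETER(RegistryPath);
	return STATUS_SUCCESS;
}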
In this driver's DriverEntry, we'll set an Unload routine, set up the dispatch routines we support,
create a device object, and create a symbolic link to it. Once all these operations are performed, the
driver is ready to take requests.
The first step is to add an Unload routine and point to it from the driver object. Here is the new
DriverEntry with the Unload routine:
// prototypes
void BoosterUnload(PDRIVER_OBJECT DriverObject);

// DriverEntry
extern "C" NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
	UNREFERENCED_PARAMETER(RegistryPath);
	DriverObject->DriverUnload = BoosterUnload;
	return STATUS_SUCCESS;
}
We’ll add code to the Unload routine as needed when we do actual work in DriverEntry that needs
to be undone.
Next, we need to set up the dispatch routines that we want to support. Practically all drivers must
support IRP_MJ_CREATE and IRP_MJ_CLOSE, otherwise there would be no way to open a handle to
any device for this driver. So we add the following to DriverEntry:
DriverObject->MajorFunction[IRP_MJ_CREATE] = BoosterCreateClose;
DriverObject->MajorFunction[IRP_MJ_CLOSE] = BoosterCreateClose;
We’re pointing the Create and Close major functions to the same routine. This is because, as we’ll
see shortly, they will do the same thing: simply approve the request. In more complex cases, these
could be separate functions, where in the Create case the driver can (for instance) check to see who
the caller is and only let approved callers succeed with opening a handle.
All major functions have the same prototype (they are part of an array of function pointers), so we
have to add a prototype for BoosterCreateClose. The prototype for these functions is as follows:
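NTSTATUS BoosterCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp);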
The function must return NTSTATUS, and accepts a pointer to a device object and a pointer to an I/O
Request Packet (IRP). An IRP is the primary object where the request information is stored, for all
types of requests. We’ll dig deeper into an IRP in chapter 7, but we’ll look at the basics later in this
chapter, since we require it to complete our driver.
The client will send its requests to the driver by calling the WriteFile API, whose prototype is shown
here:

BOOL WriteFile(
_In_ HANDLE hFile,
_In_reads_bytes_opt_(nNumberOfBytesToWrite) LPCVOID lpBuffer,
_In_ DWORD nNumberOfBytesToWrite,
_Out_opt_ LPDWORD lpNumberOfBytesWritten,
_Inout_opt_ LPOVERLAPPED lpOverlapped);
Our driver has to declare that it handles write operations by assigning a function pointer
to the IRP_MJ_WRITE index of the MajorFunction array in the driver object:
DriverObject->MajorFunction[IRP_MJ_WRITE] = BoosterWrite;
BoosterWrite must have the same prototype as all major function code handlers:
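NTSTATUS BoosterWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp);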
struct ThreadData {
ULONG ThreadId;
int Priority;
};
We need the thread’s unique ID and the target priority. Thread IDs are 32-bit unsigned integers, so
we select ULONG as the type. The priority should be a number between 1 and 31, so a simple 32-bit
integer will do.
We cannot normally use DWORD - a common type defined in user mode headers - because it’s not
defined in kernel mode headers. ULONG, on the other hand, is defined in both. It would be easy
enough to define it ourselves, but ULONG is the same anyway.
The next step in DriverEntry is to create the device object, using the IoCreateDevice function,
declared like so:

NTSTATUS IoCreateDevice(
_In_ PDRIVER_OBJECT DriverObject,
_In_ ULONG DeviceExtensionSize,
_In_opt_ PUNICODE_STRING DeviceName,
_In_ DEVICE_TYPE DeviceType,
_In_ ULONG DeviceCharacteristics,
_In_ BOOLEAN Exclusive,
_Outptr_ PDEVICE_OBJECT *DeviceObject);
• DriverObject - the driver object to which this device object belongs. This should be simply
the driver object passed to the DriverEntry function.
• DeviceExtensionSize - extra bytes that would be allocated in addition to sizeof(DEVICE_-
OBJECT). Useful for associating some data structure with a device. It’s less useful for software
drivers creating just a single device object, since the state needed for the device can simply be
managed by global variables.
• DeviceName - the internal device name, typically created under the \Device directory in the
Object Manager's namespace.
• DeviceType - relevant to some types of hardware-based drivers. For software drivers, the value
FILE_DEVICE_UNKNOWN should be used.
• DeviceCharacteristics - a set of flags, relevant for some specific drivers. Software drivers
specify zero or FILE_DEVICE_SECURE_OPEN if they support a true namespace (rarely needed
by software drivers). More information on device security is presented in chapter 8.
• Exclusive - should more than one file object be allowed to open the same device? Most drivers
should specify FALSE, but in some cases TRUE is more appropriate; it forces a single client at a
time for the device.
• DeviceObject - the returned pointer, passed as an address of a pointer. If successful, IoCreateDevice
allocates the structure from non-paged pool and stores the resulting pointer inside the
dereferenced argument.
Before calling IoCreateDevice we must create a UNICODE_STRING to hold the internal device name:
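UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\Booster");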
The device name could be anything but should be in the \Device object manager directory. There are
two ways to initialize a UNICODE_STRING with a constant string. The first is using RtlInitUnicodeString,
which works just fine. But RtlInitUnicodeString must count the number of characters in the string
to initialize the Length and MaximumLength appropriately. Not a big deal in this case, but there is
a quicker way - using the RTL_CONSTANT_STRING macro, which calculates the length of the string
statically (at compile time), meaning it can only work correctly with literal strings.
Now we are ready to call the IoCreateDevice function:
PDEVICE_OBJECT DeviceObject;
NTSTATUS status = IoCreateDevice(
DriverObject, // our driver object
0, // no need for extra bytes
&devName, // the device name
FILE_DEVICE_UNKNOWN, // device type
0, // characteristics flags
FALSE, // not exclusive
&DeviceObject); // the resulting pointer
if (!NT_SUCCESS(status)) {
KdPrint(("Failed to create device object (0x%08X)\n", status));
return status;
}
If all goes well, we now have a pointer to our device object. The next step is to make this device object
accessible to user-mode callers by providing a symbolic link. Creating a symbolic link involves calling
IoCreateSymbolicLink:
NTSTATUS IoCreateSymbolicLink(
_In_ PUNICODE_STRING SymbolicLinkName,
_In_ PUNICODE_STRING DeviceName);
The following lines create a symbolic link and connect it to our device object:
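UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\Booster");
status = IoCreateSymbolicLink(&symLink, &devName);
if (!NT_SUCCESS(status)) {
	KdPrint(("Failed to create symbolic link (0x%08X)\n", status));
	IoDeleteDevice(DeviceObject);   // undo the device object creation
	return status;
}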
IoCreateSymbolicLink does the work by accepting the symbolic link name and the target of the link.
Note that if the creation fails, we must undo everything done so far - in this case just the fact the
device object was created - by calling IoDeleteDevice. More generally, if DriverEntry returns any
failure status, the Unload routine is not called. If we had more initialization steps to do, we would
have to remember to undo everything until that point in case of failure. We’ll see a more elegant way
of handling this in chapter 6.
Once we have the symbolic link and the device object set up, DriverEntry can return success,
indicating the driver is now ready to accept requests.
Before we move on, we must not forget the Unload routine. Assuming DriverEntry completed
successfully, the Unload routine must undo whatever was done in DriverEntry. In our case, there
are two things to undo: device object creation and symbolic link creation. We’ll undo them in reverse
order:
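void BoosterUnload(PDRIVER_OBJECT DriverObject) {
	UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\Booster");
	IoDeleteSymbolicLink(&symLink);
	IoDeleteDevice(DriverObject->DeviceObject);
}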
Notice the device object pointer is extracted from the driver object, as it’s the only argument we get
in the Unload routine. It’s certainly possible to store the device object pointer in a global variable and
access it here directly, but there is no need. Global variables usage should be kept to a minimum.
Client Code
At this point, it’s worth writing the user-mode client code. Everything we need for the client has
already been defined.
Add a new C++ Console Application project to the solution named Boost (or some other name of your
choosing). The Visual Studio wizard should create a single source file with some “hello world” type
of code. You can safely delete all the contents of the file.
First, we add the required #includes to the Boost.cpp file:
#include <windows.h>
#include <stdio.h>
#include "..\Booster\BoosterCommon.h"
Note that we include the common header file created by the driver to be shared with the client.
Change the main function to accept command line arguments. We’ll accept a thread ID and a priority
using command line arguments and request the driver to change the priority of the thread to the given
value.
//
// extract from command line
//
int tid = atoi(argv[1]);
int priority = atoi(argv[2]);
Next, we need to open a handle to our device. The “file name” to CreateFile should be the symbolic
link prepended with “\\.\”. The entire call should look like this:
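HANDLE hDevice = CreateFile(L"\\\\.\\Booster", GENERIC_WRITE, 0,
	nullptr, OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE)
	return Error("Failed to open device");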
The Error function simply prints some text with the last Windows API error:
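int Error(const char* message) {
	// the exact message format is up to you
	printf("%s (error=%lu)\n", message, GetLastError());
	return 1;
}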
The CreateFile call should reach the driver in its IRP_MJ_CREATE dispatch routine. If the driver is
not loaded at this time - meaning there is no device object and no symbolic link - we’ll get error
number 2 (file not found).
Now that we have a valid handle to our device, it’s time to set up the call to Write. First, we need to
create a ThreadData structure and fill in the details:
ThreadData data;
data.ThreadId = tid;
data.Priority = priority;
Now we’re ready to call WriteFile and close the device handle afterwards:
DWORD returned;
BOOL success = WriteFile(hDevice,
&data, sizeof(data), // buffer and length
&returned, nullptr);
if (!success)
return Error("Priority change failed!");
CloseHandle(hDevice);
The call to WriteFile reaches the driver by invoking the IRP_MJ_WRITE major function routine.
At this point, the client code is complete. All that remains is to implement the dispatch routines we
declared on the driver side.
The Create and Close Dispatch Routines
Here is the BoosterCreateClose routine, which handles both IRP_MJ_CREATE and IRP_MJ_CLOSE:

NTSTATUS BoosterCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
	UNREFERENCED_PARAMETER(DeviceObject);

	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return STATUS_SUCCESS;
}
Every dispatch routine accepts the target device object and an I/O Request Packet (IRP). We don’t
care much about the device object, since we only have one, so it must be the one we created in
DriverEntry. The IRP, on the other hand, is extremely important. We'll dig deeper into IRPs in chapter
7, but we need to take a quick look at IRPs now.
An IRP is a semi-documented structure that represents a request, typically coming from one of the
managers in the Executive: the I/O Manager, the Plug & Play Manager, or the Power Manager. With
a simple software driver, that would most likely be the I/O Manager. Regardless of the creator of the
IRP, the driver’s purpose is to handle the IRP, which means looking at the details of the request and
doing what needs to be done to complete it.
Every request to the driver always arrives wrapped in an IRP, whether that’s a Create, Close, Read,
Write, or any other IRP. By looking at the IRP’s members, we can figure out the type and details of the
request (technically, the dispatch routine itself was pointed to based on the request type, so in most
cases you already know the request type). It’s worth mentioning that an IRP never arrives alone; it’s
accompanied by one or more structures of type IO_STACK_LOCATION. In simple cases like our driver,
there is a single IO_STACK_LOCATION. In more complex cases where there are filter drivers above or
below us, multiple IO_STACK_LOCATION instances exist, one for each layer in the device stack. (We’ll
discuss this more thoroughly in chapter 7). Simply put, some of the information we need is in the base
IRP structure, and some is in the IO_STACK_LOCATION for our “layer” in the device stack.
In the case of Create and Close, we don’t need to look into any members. We just need to set the
completion status of the IRP in its IoStatus member (of type IO_STATUS_BLOCK), which has two
members:
• Status (NTSTATUS) - indicating the status this request should complete with.
• Information (ULONG_PTR) - a polymorphic member, meaning different things in different
request types. In the case of Create and Close, a zero value is just fine.
To complete the IRP, we call IoCompleteRequest. This function has a lot to do, but basically it
propagates the IRP back to its creator (typically the I/O Manager), and that manager notifies the
client that the operation has completed and frees the IRP. The second argument is a temporary priority
boost value that a driver can provide to its client. In most cases for a software driver, a value of zero
is fine (IO_NO_INCREMENT is defined as zero). This is especially true since the request is completed
synchronously, so there is no reason for the caller to get a priority boost. More information on this function
is provided in chapter 7.
The last thing to do is return the same status as the one put into the IRP. This may seem like a useless
duplication, but it is necessary (the reason will be clearer in a later chapter).
You may be tempted to write the last line of BoosterCreateClose as return Irp->IoStatus.Status;
so that the returned value is always the same as the one stored in the IRP. This code is buggy,
however, and will cause a BSOD in most cases. The reason is that after IoCompleteRequest is
invoked, the IRP pointer should be considered "poison", as it's more likely than not that it has already
been deallocated by the I/O manager.
The key to getting the information for any IRP is to look inside the IO_STACK_LOCATION asso-
ciated with the current device layer. Calling IoGetCurrentIrpStackLocation returns a pointer
to the correct IO_STACK_LOCATION. In our case, there is just one IO_STACK_LOCATION, but in
the general case there could be more (in fact, a filter may be above our device), so calling
IoGetCurrentIrpStackLocation is the right thing to do.
The main ingredient in an IO_STACK_LOCATION is a monstrous union identified with the member
named Parameters, which holds a set of structures, one for each type of IRP. In the case of IRP_MJ_-
WRITE, the structure to look at is Parameters.Write.
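Here is the overall shape of the BoosterWrite handler (a sketch - the snippets that follow fill in the
body of the do/while block, using the same variable names):

NTSTATUS BoosterWrite(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
	UNREFERENCED_PARAMETER(DeviceObject);
	NTSTATUS status = STATUS_SUCCESS;
	ULONG_PTR information = 0;
	PIO_STACK_LOCATION irpSp = IoGetCurrentIrpStackLocation(Irp);
	do {
		// validate the buffer, look up the thread, and set its priority (shown below)
	} while (false);
	Irp->IoStatus.Status = status;
	Irp->IoStatus.Information = information;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return status;
}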
Now we can check the buffer size to make sure it’s at least the size we expect:
do {
if (irpSp->Parameters.Write.Length < sizeof(ThreadData)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
The do keyword opens a simple do/while(false) block that allows using the break keyword to bail
out early in case of an error. We’ll discuss this technique in greater detail in chapter 7.
Next, we need to grab the user buffer's pointer, and check if the priority value is in the legal range (1
to 31). We also check if the pointer itself is NULL, as it's possible for the client to pass a NULL pointer for
the buffer, but the length may be greater than zero. The buffer's address is provided in the UserBuffer
member of the IRP:
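auto data = static_cast<ThreadData*>(Irp->UserBuffer);
if (data == nullptr || data->Priority < 1 || data->Priority > 31) {
	status = STATUS_INVALID_PARAMETER;
	break;
}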
UserBuffer is typed as a void pointer, so we need to cast it to the expected type. Then we check the
priority value, and if not in range change the status to STATUS_INVALID_PARAMETER and break out of
the “loop”.
Notice the order of checks: the pointer is compared to NULL first, and only if non-NULL,
the next check takes place. If data is NULL, however, no further checks are made. This
behavior is guaranteed by the C/C++ standard, known as short circuit evaluation.
The use of static_cast asks the compiler to check if the cast makes sense. Technically,
the C++ compiler allows casting a void pointer to any other pointer, so it doesn’t look
that useful in this case, and perhaps a C-style cast would be simpler to write. Still, it’s a
good habit to have, as it can catch some errors at compile time (rather than nasty bugs at
runtime).
We’re getting closer to our goal. The API we would like to use is KeSetPriorityThread, prototyped
as follows:
KPRIORITY KeSetPriorityThread(
_Inout_ PKTHREAD Thread,
_In_ KPRIORITY Priority);
The KPRIORITY type is just a 32-bit signed integer. The thread itself is identified by a pointer to a KTHREAD
object. KTHREAD is one part of the way the kernel manages threads. It’s completely undocumented,
but we need the pointer value anyway. We have the thread ID from the client, and need to somehow
get a hold of a pointer to the real thread object in kernel space. The function that can look up a thread
by its ID is aptly named PsLookupThreadByThreadId. To get its definition, we need to add another
#include:
#include <ntifs.h>
You must add this #include before <ntddk.h>, otherwise you’ll get compilation errors. In
fact, you can remove <ntddk.h> entirely, as it’s included by <ntifs.h>.
NTSTATUS PsLookupThreadByThreadId(
_In_ HANDLE ThreadId,
_Outptr_ PETHREAD *Thread);
Again, we see that a thread ID is required, but its type is HANDLE - but it is the ID that we need
nonetheless. The resulting pointer is typed as PETHREAD or pointer to ETHREAD. ETHREAD is completely
opaque. Regardless, we seem to have a problem since KeSetPriorityThread accepts a PKTHREAD
rather than PETHREAD. It turns out these are the same, because the first member of an ETHREAD is a
KTHREAD (the member is named Tcb). We’ll prove all this in the next chapter when we use the kernel
debugger. Here is the beginning of the definition of ETHREAD:
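struct _ETHREAD {
	KTHREAD Tcb;     // the KTHREAD is the first member (offset 0)
	// ... many more opaque, undocumented members follow
};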
The bottom line is we can safely switch PKTHREAD for PETHREAD or vice versa when needed without
a hitch.
Now we can turn our thread ID into a pointer:
PETHREAD thread;
status = PsLookupThreadByThreadId(ULongToHandle(data->ThreadId),
&thread);
if (!NT_SUCCESS(status))
break;
The call to PsLookupThreadByThreadId can fail, the main reason being that the thread ID does not
reference any thread in the system. If the call fails, we simply break and let the resulting NTSTATUS
propagate out of the “loop”.
We are finally ready to change the thread’s priority. But wait - what if after the last call succeeds, the
thread is terminated, just before we set its new priority? Rest assured, this cannot happen. Technically,
the thread can terminate (from an execution perspective) at that point, but that will not make our
pointer a dangling one. This is because the lookup function, if successful, increments the reference
count on the kernel thread object, so it cannot die until we explicitly decrement the reference count.
Here is the call to make the priority change:
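auto oldPriority = KeSetPriorityThread((PKTHREAD)thread, data->Priority);
KdPrint(("Old thread priority: %d\n", oldPriority));   // the message text is illustrative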
We get back the old priority, which we output with KdPrint for debugging purposes. All that’s left
to do now is decrement the thread object’s reference; otherwise, we have a leak on our hands (the
thread object will never die), which will only be resolved in the next system boot. The function that
accomplishes this feat is ObDereferenceObject:
ObDereferenceObject(thread);
We should also report to the client that we used the buffer provided. This is where the information
variable is used:
information = sizeof(ThreadData);
We’ll write that value to the IRP before completing it. This is the value returned as the second to
last argument from the client's WriteFile call. All that's left to do is to close the while "loop" and
complete the IRP with whatever status we happen to have at this time.
//
// complete the IRP with the status we got at this point
//
Irp->IoStatus.Status = status;
Irp->IoStatus.Information = information;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return status;
}
And we're done! For reference, here is the closing part of the IRP_MJ_WRITE handler:
ObDereferenceObject(thread);
	information = sizeof(ThreadData);
} while (false);
Irp->IoStatus.Status = status;
Irp->IoStatus.Information = information;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return status;
}
Installing and Testing
Copy the resulting Booster.sys to the target machine (this can also be the development
machine). On the target machine, open an elevated command window and install the
driver using the sc.exe tool as we did back in chapter 2:
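sc create booster type= kernel binPath= c:\Test\Booster.sys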
Make sure binPath includes the full path of the resulting SYS file. The name of the driver (booster)
in the example is the name of the created Registry key, and so must be unique. It doesn’t have to be
related to the SYS file name.
Now we can load the driver:
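sc start booster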
If all is well, the driver would have started successfully. To make sure, we can open WinObj and look
for our device name and symbolic link. Figure 4-1 shows the symbolic link in WinObj.
Now we can finally run the client executable. Figure 4-2 shows a thread of a cmd.exe process selected
in Process Explorer as an example for which we want to set the priority to a new value.
Run the client with the thread ID and the desired priority (replace the thread ID as needed):
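boost 5816 25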
If you get an error trying to run the executable (usually it’s a Debug build), you may need
to set the runtime library to a static one instead of a DLL. Go to Project properties in Visual
Studio for the client application, C++ node, Code Generation, Runtime Library, and select
Multithreaded Debug. Alternatively, you can compile the client in Release build, and that
should run without any changes.
You should also run DbgView and see the output when a successful priority change occurs.
Summary
We’ve seen how to build a simple, yet complete, driver, from start to finish. We created a user-mode
client to communicate with the driver. In the next chapter, we’ll tackle debugging, which is something
we’re bound to do when writing drivers that may not behave as we expect.
Chapter 5: Debugging and Tracing
Just like with any software, kernel drivers tend to have bugs. Debugging drivers, as opposed to user-
mode debugging, is more challenging. Driver debugging is essentially debugging an entire machine,
not just a specific process. This requires a somewhat different mindset. This chapter discusses user-
mode and kernel-mode debugging using the WinDbg debugger.
In this chapter, we'll begin by looking at the debuggers included in the Debugging Tools for Windows
package:
• Cdb and Ntsd are user-mode, console-based debuggers. This means they can be attached to
processes, just like any other user-mode debugger. Both have console UI - type in a command,
get a response, and repeat. The only difference between the two is that if launched from a
console window, Cdb uses the same console, whereas Ntsd always opens a new console window.
They are otherwise identical.
• Kd is a kernel debugger with a console user interface. It can attach to the local kernel (Local
Kernel Debugging, described in the next section), or to another machine for a full kernel
debugging experience.
• WinDbg is the only debugger with a graphical user interface. It can be used for user-mode
debugging or kernel debugging, depending on the selection performed with its menus or the
command line arguments passed to it when launched.
A relatively recent alternative to the classic WinDbg is WinDbg Preview, available through the
Microsoft Store. This is a remake of the classic debugger with a much better user interface. It can be
installed on Windows 10 version 1607 or later. From a functionality standpoint, it’s similar to the
classic WinDbg. But it is somewhat easier to use because of the modern, convenient UI, and in fact
has also solved some bugs that still plague the classic debugger. All the commands we’ll see in this
chapter work equally well with either debugger.
Although these debuggers may seem different from one another, the user-mode debuggers are
essentially the same, as are the kernel debuggers. They are all based around a single debugger engine
implemented as a DLL (DbgEng.Dll). The various debuggers are able to use extension DLLs, which
provide much of the power of the debuggers by adding new commands.
The Debugger Engine is documented to a large extent in the Debugging tools for Windows
documentation, which makes it possible to write new debuggers (or other tools) that utilize the
debugger engine.
Other tools that are part of the package include the following (partial list):
• Gflags.exe - the Global Flags tool that allows setting some kernel flags and image flags.
• ADPlus.exe - generate a dump file for a process crash or hang.
• Kill.exe - a simple tool to terminate process(es) based on process ID, name, or pattern.
• Dumpchk.exe - tool to do some general checking of dump files.
• TList.exe - lists running processes on the system with various options.
• Umdh.exe - analyzes heap allocations in user-mode processes.
• UsbView.exe - displays a hierarchical view of USB devices and hubs.
Introduction to WinDbg
This section describes the fundamentals of WinDbg, but bear in mind everything is essentially the
same for the console debuggers, with the exception of the GUI windows.
WinDbg is built around commands. The user enters a command, and the debugger responds with text
describing the results of the command. With the GUI, some of these results are depicted in dedicated
windows, such as locals, stack, threads, etc.
WinDbg supports three types of commands:
• Intrinsic commands - these commands are built-in into the debugger (part of the debugger
engine), and they operate on the target being debugged.
• Meta commands - these commands start with a period (.) and they operate on the debugging
environment, rather than directly on the target being debugged.
• Extension commands (sometimes called bang commands) - these commands start with an
exclamation point (!), providing much of the power of the debugger. All extension commands
are implemented in external DLLs. By default, the debugger loads a set of predefined extension
DLLs, but more can be loaded from the debugger directory or another directory with the .load
meta command.
Writing extension DLLs is possible and is fully documented in the debugger docs. In fact, many
such DLLs have been created and can be loaded from their respective source. These DLLs provide
new commands that enhance the debugging experience, often targeting specific scenarios.
• Launch Notepad.
• Launch WinDbg (either the Preview or the classic one. The following screenshots use the
Preview).
• Select File / Attach To Process and locate the Notepad process in the list (see figure 5-1). Then
click Attach. You should see output similar to figure 5-2.
The Command window is the main window of interest - it should always be open. This is the one
showing the various responses of commands. Typically, most of the time in a debugging session is
spent interacting with this window.
• The first command we'll use is ~, which shows information about all threads in the debugged
process:
0:003> ~
0 Id: 874c.18068 Suspend: 1 Teb: 00000001`2229d000 Unfrozen
1 Id: 874c.46ac Suspend: 1 Teb: 00000001`222a5000 Unfrozen
2 Id: 874c.152cc Suspend: 1 Teb: 00000001`222a7000 Unfrozen
. 3 Id: 874c.bb08 Suspend: 1 Teb: 00000001`222ab000 Unfrozen
The exact number of threads you’ll see may be different than shown here.
One thing that is very important is the existence of proper symbols. Microsoft provides a public symbol
server, which allows locating symbols for most modules produced by Microsoft. This is essential
in any low-level debugging. To use it, set the _NT_SYMBOL_PATH environment variable to a value like
the following:
SRV*c:\Symbols*http://msdl.microsoft.com/download/symbols
The middle part (between asterisks) is a local path for caching symbols on your local machine; you
can select any path you like (including a network share, if sharing with a team is desired). Once this
environment variable is set, next invocations of the debugger will find symbols automatically and
load them from the Microsoft symbol server as needed.
The debuggers in the Debugging Tools for Windows are not the only tools that look for this
environment variable. Sysinternals tools (e.g. Process Explorer, Process Monitor), Visual
Studio, and others look for the same variable as well. You set it once, and get its benefit
in multiple tools.
• To make sure you have proper symbols, enter the lm (loaded modules) command:
0:003> lm
start end module name
00007ff7`53820000 00007ff7`53863000 notepad (deferred)
00007ffb`afbe0000 00007ffb`afca6000 efswrt (deferred)
...
00007ffc`1db00000 00007ffc`1dba8000 shcore (deferred)
00007ffc`1dbb0000 00007ffc`1dc74000 OLEAUT32 (deferred)
00007ffc`1dc80000 00007ffc`1dd22000 clbcatq (deferred)
00007ffc`1dd30000 00007ffc`1de57000 COMDLG32 (deferred)
00007ffc`1de60000 00007ffc`1f350000 SHELL32 (deferred)
00007ffc`1f500000 00007ffc`1f622000 RPCRT4 (deferred)
00007ffc`1f630000 00007ffc`1f6e3000 KERNEL32 (pdb symbols) c:\symbols\ker\
nel32.pdb\3B92DED9912D874A2BD08735BC0199A31\kernel32.pdb
00007ffc`1f700000 00007ffc`1f729000 GDI32 (deferred)
00007ffc`1f790000 00007ffc`1f7e2000 SHLWAPI (deferred)
00007ffc`1f8d0000 00007ffc`1f96e000 sechost (deferred)
00007ffc`1f970000 00007ffc`1fc9c000 combase (deferred)
00007ffc`1fca0000 00007ffc`1fd3e000 msvcrt (deferred)
00007ffc`1fe50000 00007ffc`1fef3000 ADVAPI32 (deferred)
00007ffc`20380000 00007ffc`203ae000 IMM32 (deferred)
00007ffc`203e0000 00007ffc`205cd000 ntdll (pdb symbols) c:\symbols\ntd\
ll.pdb\E7EEB80BFAA91532B88FF026DC6B9F341\ntdll.pdb
The list of modules shows all modules (DLLs and the EXE) loaded into the debugged process at this
time. You can see the start and end virtual addresses into which each module is loaded. Following the
module name you can see the symbol status of this module (in parenthesis). Possible values include:
• deferred - the symbols for this module were not needed in this debugging session so far, and
so are not loaded at this time. The symbols will be loaded when needed (for example, if a call
stack contains a function from that module). This is the default value.
• pdb symbols - proper public symbols have been loaded. The local path of the PDB file is
displayed.
• private pdb symbols - private symbols are available. This would be the case for your own
modules, compiled with Visual Studio. For Microsoft modules, this is very rare (at the time
of writing, combase.dll is provided with private symbols). With private symbols, you have
information about local variables and private types.
• export symbols - only exported symbols are available for this DLL. This typically means there
are no symbols for this module, but the debugger is able to use the exported symbols. It's better
than no symbols at all, but could be confusing, as the debugger will use the closest export it can
find, but the real function is most likely different.
• no symbols - the debugger attempted to locate symbols for this module, but found nothing, not
even exported symbols (some modules have no exported symbols at all, as is the case for most
executable and driver files).
You can force loading of a module’s symbols using the following command:
.reload /f modulename.dll
This will provide definitive evidence of whether symbols are available for this module.
Symbol paths can also be configured in the debugger’s settings dialog.
Open the File / Settings menu and locate Debugging Settings. You can then add more paths for symbol
searching. This is useful if debugging your own code, so you would like the debugger to search your
directories where relevant PDB files may be found (see figure 5-3).
Make sure you have symbols configured correctly before you proceed. To diagnose any issues, you
can enter the !sym noisy command that logs detailed information for symbol load attempts.
Back to the thread list - notice that one of the threads has a dot in front of its data. This is the current
thread as far as the debugger is concerned. This means that any command issued that involves a
thread, where the thread is not explicitly specified, will work on that thread. This “current thread” is
also shown in the prompt - the number to the right of the colon is the current thread index (3 in this
example).
Enter the k command, that shows the stack trace for the current thread:
0:003> k
# Child-SP RetAddr Call Site
00 00000001`224ffbd8 00007ffc`204aef5b ntdll!DbgBreakPoint
01 00000001`224ffbe0 00007ffc`1f647974 ntdll!DbgUiRemoteBreakin+0x4b
02 00000001`224ffc10 00007ffc`2044a271 KERNEL32!BaseThreadInitThunk+0x14
03 00000001`224ffc40 00000000`00000000 ntdll!RtlUserThreadStart+0x21
How can you tell that you don’t have proper symbols except using the lm command?
If you see very large offsets from the beginning of a function, this is probably not the
real function name - it’s just the closest one the debugger knows about. “Large offsets” is
obviously a relative term, but a good rule of thumb is that a 4-hex digit offset is almost
always wrong.
You can see the list of calls made on this thread (user-mode only, of course). The top of the call stack
in the above output is the function DbgBreakPoint located in the module ntdll.dll. The general
format of addresses with symbols is modulename!functionname+offset. The offset is optional and
could be zero if it’s exactly the start of this function. Also notice the module name is without an
extension.
In the output above, DbgBreakpoint was called by DbgUiRemoteBreakIn, which was called by
BaseThreadInitThunk, and so on.
This thread, by the way, was injected by the debugger in order to break into the target forcefully.
To switch to a different thread, use the following command: ~ns, where n is the thread index. Let's
switch to thread 0 and then display its call stack:
0:003> ~0s
win32u!NtUserGetMessage+0x14:
00007ffc`1c4b1164 c3 ret
0:000> k
# Child-SP RetAddr Call Site
00 00000001`2247f998 00007ffc`1d802fbd win32u!NtUserGetMessage+0x14
01 00000001`2247f9a0 00007ff7`5382449f USER32!GetMessageW+0x2d
02 00000001`2247fa00 00007ff7`5383ae07 notepad!WinMain+0x267
03 00000001`2247fb00 00007ffc`1f647974 notepad!__mainCRTStartup+0x19f
04 00000001`2247fbc0 00007ffc`2044a271 KERNEL32!BaseThreadInitThunk+0x14
05 00000001`2247fbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
This is Notepad’s main (first) thread. The top of the stack shows the thread waiting for UI messages
(win32u!NtUserGetMessage). The thread is actually waiting in kernel mode, but this is invisible from
a user-mode debugger’s view.
An alternative way to show the call stack of another thread without switching to it, is to use the tilde
and thread number before the actual command. The following output is for thread 1’s stack:
0:000> ~1k
# Child-SP RetAddr Call Site
00 00000001`2267f4c8 00007ffc`204301f4 ntdll!NtWaitForWorkViaWorkerFactory+0x14
01 00000001`2267f4d0 00007ffc`1f647974 ntdll!TppWorkerThread+0x274
02 00000001`2267f7c0 00007ffc`2044a271 KERNEL32!BaseThreadInitThunk+0x14
03 00000001`2267f7f0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
The above call stack is very common, and indicates a thread that is part of the thread
pool. TppWorkerThread is the thread entry point for thread pool threads (Tpp is short for
“Thread Pool Private”).
Notice the dot has moved to thread 0 (current thread), revealing a hash sign (#) on thread 3. The thread
marked with a hash (#) is the one that caused the last breakpoint (which in this case was our initial
debugger attach).
The basic information for a thread provided by the ~ command is shown in figure 5-4.
Most numbers reported by WinDbg are hexadecimal by default. To convert a value to decimal, you
can use the ? (evaluate expression) command.
Type the following to get the decimal process ID (you can then compare to the reported PID in Task
Manager):
0:000> ? 874c
Evaluate expression: 34636 = 00000000`0000874c
You can express decimal numbers with the 0n prefix, so you can get the inverse result as well:
0:000> ? 0n34636
Evaluate expression: 34636 = 00000000`0000874c
The 0y prefix can be used in WinDbg to specify binary values. For example, using 0y1100
is the same as 0n12 as is 0xc. You can use the ? command to see the converted values.
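For example, evaluating a binary literal with the ? command should produce output along these lines:
0:000> ? 0y1100
Evaluate expression: 12 = 00000000`0000000c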
You can examine the TEB of a thread by using the !teb command. Using !teb without an address
shows the TEB of the current thread:
0:000> !teb
TEB at 000000012229d000
ExceptionList: 0000000000000000
StackBase: 0000000122480000
StackLimit: 000000012246f000
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 000000012229d000
EnvironmentPointer: 0000000000000000
ClientId: 000000000000874c . 0000000000018068
RpcHandle: 0000000000000000
Tls Storage: 000001c93676c940
PEB Address: 000000012229c000
LastErrorValue: 0
LastStatusValue: 8000001a
Count Owned Locks: 0
HardErrorMode: 0
0:000> !teb 00000001`222a5000
TEB at 00000001222a5000
ExceptionList: 0000000000000000
StackBase: 0000000122680000
StackLimit: 000000012266f000
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 00000001222a5000
EnvironmentPointer: 0000000000000000
ClientId: 000000000000874c . 00000000000046ac
RpcHandle: 0000000000000000
Tls Storage: 000001c936764260
PEB Address: 000000012229c000
LastErrorValue: 0
LastStatusValue: c0000034
Count Owned Locks: 0
HardErrorMode: 0
Some data shown by the !teb command is relatively known or easy to guess:
• StackBase and StackLimit - user-mode current stack base and stack limit for the thread.
• ClientId - process and thread IDs.
• LastErrorValue - last Win32 error code (GetLastError).
• TlsStorage - Thread Local Storage (TLS) array for this thread (full explanation of TLS is beyond
the scope of this book).
• PEB Address - address of the Process Environment Block (PEB), viewable with the !peb
command.
• LastStatusValue - last NTSTATUS value returned from a system call.
The !teb command (and similar commands) shows parts of the real data structure behind the
scenes, in this case _TEB. You can always look at the real structure using the dt (display type)
command:
0:000> dt ntdll!_teb
+0x000 NtTib : _NT_TIB
+0x038 EnvironmentPointer : Ptr64 Void
+0x040 ClientId : _CLIENT_ID
+0x050 ActiveRpcHandle : Ptr64 Void
+0x058 ThreadLocalStoragePointer : Ptr64 Void
+0x060 ProcessEnvironmentBlock : Ptr64 _PEB
...
+0x1808 LockCount : Uint4B
+0x180c WowTebOffset : Int4B
+0x1810 ResourceRetValue : Ptr64 Void
+0x1818 ReservedForWdf : Ptr64 Void
+0x1820 ReservedForCrt : Uint8B
+0x1828 EffectiveContainerId : _GUID
Notice that WinDbg is not case sensitive when it comes to symbols. Also, notice the structure name
starting with an underscore; this is the way most structures are defined in Windows (user-mode and
kernel-mode). Using the typedef name (without the underscore) may or may not work, so always
using the underscore is recommended.
How do you know which module defines a structure you wish to view? If the structure
is documented, the module would be listed in the docs for the structure. You can also
try specifying the structure without the module name, forcing the debugger to search for
it. Generally, you “know” where the structure is defined with experience and sometimes
context.
If you attach an address to the previous command, you can get the actual values of data members:
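For example, you could use the TEB address shown by the !teb command earlier:
0:000> dt ntdll!_teb 000000012229d000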
Each member is shown with its offset from the beginning of the structure, its name, and its value.
Simple values are shown directly, while structure values (such as NtTib above) are shown with a
hyperlink. Clicking this hyperlink provides the details of the structure.
Click on the NtTib member above to show the details of this data member:
The debugger uses the newer dx command to view data. See the section “Advanced Debugging with
WinDbg” later in this chapter for more on the dx command.
If you don’t see hyperlinks, you may be using a very old WinDbg, where Debugger Markup
Language (DML) is not on by default. You can turn it on with the .prefer_dml 1 command.
Now let’s turn our attention to breakpoints. Let’s set a breakpoint when a file is opened by notepad.
• Type the following command to set a breakpoint in the CreateFile API function:
0:000> bp kernel32!createfilew
Notice the function name is in fact CreateFileW, as there is no function called CreateFile. In code,
this is a macro that expands to CreateFileW (wide, Unicode version) or CreateFileA (ASCII or Ansi
version) based on a compilation constant named UNICODE. WinDbg responds with nothing. This is a
good thing.
The reason there are two sets of functions for most APIs where strings are involved is a
historical one. In any case, Visual Studio projects define the UNICODE constant by default,
so Unicode is the norm. This is a good thing - most of the A functions convert their input
to Unicode and call the W function.
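Conceptually, the Windows headers define the macro along these lines (simplified):

#ifdef UNICODE
#define CreateFile CreateFileW
#else
#define CreateFile CreateFileA
#endif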
You can list the existing breakpoints with the bl command:
0:000> bl
0 e Disable Clear 00007ffc`1f652300 0001 (0001) 0:**** KERNEL32!CreateFileW
You can see the breakpoint index (0), whether it’s enabled or disabled (e=enabled, d=disabled), and
you get DML hyperlinks to disable (bd command) and delete (bc command) the breakpoint.
Now let Notepad continue execution until the breakpoint hits: type the g command, press the Go
button on the toolbar, or hit F5.
You’ll see the debugger showing Busy in the prompt and the command area shows Debuggee is
running, meaning you cannot enter commands until the next break.
Notepad should now be alive. Go to its File menu and select Open…. The debugger should spew details
of module loads and then break:
Breakpoint 0 hit
KERNEL32!CreateFileW:
00007ffc`1f652300 ff25aa670500 jmp qword ptr [KERNEL32!_imp_CreateFileW \
(00007ffc`1f6a8ab0)] ds:00007ffc`1f6a8ab0={KERNELBASE!CreateFileW (00007ffc`1c7\
5e260)}
• We have hit the breakpoint! Notice the thread in which it occurred. Let’s see what the call
stack looks like (it may take a while to show if the debugger needs to download symbols from
Microsoft’s symbol server):
0:002> k
# Child-SP RetAddr Call Site
00 00000001`226fab08 00007ffc`061c8368 KERNEL32!CreateFileW
01 00000001`226fab10 00007ffc`061c5d4d mscoreei!RuntimeDesc::VerifyMainRuntimeM\
odule+0x2c
02 00000001`226fab60 00007ffc`061c6068 mscoreei!FindRuntimesInInstallRoot+0x2fb
03 00000001`226fb3e0 00007ffc`061cb748 mscoreei!GetOrCreateSxSProcessInfo+0x94
04 00000001`226fb460 00007ffc`061cb62b mscoreei!CLRMetaHostPolicyImpl::GetReque\
stedRuntimeHelper+0xfc
05 00000001`226fb740 00007ffc`061ed4e6 mscoreei!CLRMetaHostPolicyImpl::GetReque\
stedRuntime+0x120
...
21 00000001`226fede0 00007ffc`1df025b2 SHELL32!CFSIconOverlayManager::LoadNonlo\
adedOverlayIdentifiers+0xaa
22 00000001`226ff320 00007ffc`1df022af SHELL32!EnableExternalOverlayIdentifiers\
+0x46
23 00000001`226ff350 00007ffc`1def434e SHELL32!CFSIconOverlayManager::RefreshOv\
erlayImages+0xff
24 00000001`226ff390 00007ffc`1cf250a3 SHELL32!SHELL32_GetIconOverlayManager+0x\
6e
25 00000001`226ff3c0 00007ffc`1ceb2726 windows_storage!CFSFolder::_GetOverlayIn\
fo+0x12b
26 00000001`226ff470 00007ffc`1cf3108b windows_storage!CAutoDestItemsFolder::Ge\
tOverlayIndex+0xb6
27 00000001`226ff4f0 00007ffc`1cf30f87 windows_storage!CRegFolder::_GetOverlayI\
nfo+0xbf
28 00000001`226ff5c0 00007ffb`df8fc4d1 windows_storage!CRegFolder::GetOverlayIn\
dex+0x47
29 00000001`226ff5f0 00007ffb`df91f095 explorerframe!CNscOverlayTask::_Extract+\
0x51
2a 00000001`226ff640 00007ffb`df8f70c2 explorerframe!CNscOverlayTask::InternalR\
esumeRT+0x45
Your call stack may be different, as it depends on the Windows version, and any extensions that
may be loaded and used by the open file dialog box.
What can we do at this point? You may wonder what file is being opened. We can get that information
based on the calling convention of the CreateFileW function. Since this is a 64-bit process (and the
processor is Intel/AMD), the calling convention states that the first integer/pointer arguments are
passed in the RCX, RDX, R8, and R9 registers (in this order). Since the file name in CreateFileW is
the first argument, the relevant register is RCX.
You can get more information on calling conventions in the Debugger documentation (or in several
web resources).
Display the value of the RCX register with the r command (you’ll get a different value):
0:002> r rcx
rcx=00000001226fabf8
We can view the memory pointed by RCX with various d (display) family of commands. Here is the
db command, interpreting the data as bytes.
0:002> db 00000001226fabf8
00000001`226fabf8 43 00 3a 00 5c 00 57 00-69 00 6e 00 64 00 6f 00 C.:.\.W.i.n\
.d.o.
00000001`226fac08 77 00 73 00 5c 00 4d 00-69 00 63 00 72 00 6f 00 w.s.\.M.i.c\
.r.o.
00000001`226fac18 73 00 6f 00 66 00 74 00-2e 00 4e 00 45 00 54 00 s.o.f.t...N\
.E.T.
00000001`226fac28 5c 00 46 00 72 00 61 00-6d 00 65 00 77 00 6f 00 \.F.r.a.m.e\
.w.o.
00000001`226fac38 72 00 6b 00 36 00 34 00-5c 00 5c 00 76 00 32 00 r.k.6.4.\.\\
.v.2.
00000001`226fac48 2e 00 30 00 2e 00 35 00-30 00 37 00 32 00 37 00 ..0...5.0.7\
.2.7.
00000001`226fac58 5c 00 63 00 6c 00 72 00-2e 00 64 00 6c 00 6c 00 \.c.l.r...d\
.l.l.
00000001`226fac68 00 00 76 1c fc 7f 00 00-00 00 00 00 00 00 00 00 ..v........\
.....
The db command shows the memory in bytes, and ASCII characters on the right. It’s pretty clear what
the file name is, but because the string is Unicode, it’s not very convenient to see.
Use the du command to view the Unicode string more conveniently:
0:002> du 00000001226fabf8
00000001`226fabf8 "C:\Windows\Microsoft.NET\Framewo"
00000001`226fac38 "rk64\\v2.0.50727\clr.dll"
You can use a register value directly by prefixing its name with @:
0:002> du @rcx
00000001`226fabf8 "C:\Windows\Microsoft.NET\Framewo"
00000001`226fac38 "rk64\\v2.0.50727\clr.dll"
Similarly, you can view the value of the second argument by looking at the rdx register.
Now let’s set another breakpoint in the native API that is called by CreateFileW - NtCreateFile:
0:002> bp ntdll!ntcreatefile
0:002> bl
0 e Disable Clear 00007ffc`1f652300 0001 (0001) 0:**** KERNEL32!CreateFil\
eW
1 e Disable Clear 00007ffc`20480120 0001 (0001) 0:**** ntdll!NtCreateFile
Notice the native API never uses W or A - it always works with Unicode strings (in fact it expects
UNICODE_STRING structures, as we’ve seen already).
Continue execution with the g command. The debugger should break:
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ffc`20480120 4c8bd1 mov r10,rcx
0:002> k
# Child-SP RetAddr Call Site
00 00000001`226fa938 00007ffc`1c75e5d6 ntdll!NtCreateFile
01 00000001`226fa940 00007ffc`1c75e2c6 KERNELBASE!CreateFileInternal+0x2f6
02 00000001`226faab0 00007ffc`061c8368 KERNELBASE!CreateFileW+0x66
03 00000001`226fab10 00007ffc`061c5d4d mscoreei!RuntimeDesc::VerifyMainRuntimeM\
odule+0x2c
04 00000001`226fab60 00007ffc`061c6068 mscoreei!FindRuntimesInInstallRoot+0x2fb
05 00000001`226fb3e0 00007ffc`061cb748 mscoreei!GetOrCreateSxSProcessInfo+0x94
...
List the next 8 instructions that are about to be executed with the u (unassemble or disassemble)
command:
0:002> u
ntdll!NtCreateFile:
00007ffc`20480120 4c8bd1 mov r10,rcx
00007ffc`20480123 b855000000 mov eax,55h
00007ffc`20480128 f604250803fe7f01 test byte ptr [SharedUserData+0x308 (0000\
0000`7ffe0308)],1
00007ffc`20480130 7503 jne ntdll!NtCreateFile+0x15 (00007ffc`204\
80135)
00007ffc`20480132 0f05 syscall
00007ffc`20480134 c3 ret
00007ffc`20480135 cd2e int 2Eh
00007ffc`20480137 c3 ret
Notice the value 0x55 is copied to the EAX register. This is the system service number for
NtCreateFile, as described in chapter 1. The syscall instruction shown is the one causing the
transition to kernel-mode, and then executing the NtCreateFile system service itself.
You can step over the next instruction with the p command (step - hit F10 as an alternative). You can
step into a function (in case of assembly, this is the call instruction) with the t command (trace - hit
F11 as an alternative):
0:002> p
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ffc`20480120 4c8bd1 mov r10,rcx
0:002> p
ntdll!NtCreateFile+0x3:
00007ffc`20480123 b855000000 mov eax,55h
0:002> p
ntdll!NtCreateFile+0x8:
00007ffc`20480128 f604250803fe7f01 test byte ptr [SharedUserData+0x308 (0000\
0000`7ffe0308)],1 ds:00000000`7ffe0308=00
0:002> p
ntdll!NtCreateFile+0x10:
00007ffc`20480130 7503 jne ntdll!NtCreateFile+0x15 (00007ffc`204\
80135) [br=0]
0:002> p
ntdll!NtCreateFile+0x12:
00007ffc`20480132 0f05 syscall
Stepping inside a syscall is not possible from a user-mode debugger. When we step over it, the
system call executes in the kernel, and we get back its result.
0:002> p
ntdll!NtCreateFile+0x14:
00007ffc`20480134 c3 ret
The return value of functions in x64 calling convention is stored in EAX or RAX. For system calls, it’s
an NTSTATUS, so EAX contains the returned status:
0:002> r eax
eax=c0000034
Zero means success, and a negative value (in two’s complement, most significant bit is set) means an
error. We can get a textual description of the error with the !error command:
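For example, the status just returned in eax (c0000034) can be decoded like so (the exact text may vary slightly between debugger versions, but it names the status):

0:002> !error c0000034
Error code: (NtStatus) 0xc0000034 (3221225524) - Object Name not found.

The next two commands disable all existing breakpoints (bd *) and let Notepad run freely again: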
0:002> bd *
0:002> g
Since we have no breakpoints at this time, we can force a break by clicking the Break button on the
toolbar, or hitting Ctrl+Break on the keyboard:
Notice the thread number in the prompt. Show all current threads:
0:022> ~
0 Id: 874c.18068 Suspend: 1 Teb: 00000001`2229d000 Unfrozen
1 Id: 874c.46ac Suspend: 1 Teb: 00000001`222a5000 Unfrozen
2 Id: 874c.152cc Suspend: 1 Teb: 00000001`222a7000 Unfrozen
3 Id: 874c.f7ec Suspend: 1 Teb: 00000001`222ad000 Unfrozen
4 Id: 874c.145b4 Suspend: 1 Teb: 00000001`222af000 Unfrozen
...
18 Id: 874c.f0c4 Suspend: 1 Teb: 00000001`222d1000 Unfrozen
19 Id: 874c.17414 Suspend: 1 Teb: 00000001`222d3000 Unfrozen
20 Id: 874c.c878 Suspend: 1 Teb: 00000001`222d5000 Unfrozen
21 Id: 874c.d8c0 Suspend: 1 Teb: 00000001`222d7000 Unfrozen
. 22 Id: 874c.16a54 Suspend: 1 Teb: 00000001`222e1000 Unfrozen
23 Id: 874c.10838 Suspend: 1 Teb: 00000001`222db000 Unfrozen
24 Id: 874c.10cf0 Suspend: 1 Teb: 00000001`222dd000 Unfrozen
Lots of threads, right? These were created by the common open dialog, so not the direct fault of
Notepad.
Continue exploring the debugger in any way you want!
Find out the system service numbers for NtWriteFile and NtReadFile.
When you close Notepad, the debugger breaks in one final time as the process terminates, and the
call stack shows the exit path:
ntdll!NtTerminateProcess+0x14:
00007ffc`2047fc14 c3 ret
0:000> k
# Child-SP RetAddr Call Site
00 00000001`2247f6a8 00007ffc`20446dd8 ntdll!NtTerminateProcess+0x14
01 00000001`2247f6b0 00007ffc`1f64d62a ntdll!RtlExitUserProcess+0xb8
02 00000001`2247f6e0 00007ffc`061cee58 KERNEL32!ExitProcessImplementation+0xa
03 00000001`2247f710 00007ffc`0644719e mscoreei!RuntimeDesc::ShutdownAllActiveR\
untimes+0x287
04 00000001`2247fa00 00007ffc`1fcda291 mscoree!ShellShim_CorExitProcess+0x11e
05 00000001`2247fa30 00007ffc`1fcda2ad msvcrt!_crtCorExitProcess+0x4d
06 00000001`2247fa60 00007ffc`1fcda925 msvcrt!_crtExitProcess+0xd
07 00000001`2247fa90 00007ff7`5383ae1e msvcrt!doexit+0x171
08 00000001`2247fb00 00007ffc`1f647974 notepad!__mainCRTStartup+0x1b6
09 00000001`2247fbc0 00007ffc`2044a271 KERNEL32!BaseThreadInitThunk+0x14
0a 00000001`2247fbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
You can use the q command to quit the debugger. If the process is still alive, it will be terminated. An
alternative is to use the .detach command to disconnect from the target without killing it.
Kernel Debugging
User-mode debugging involves the debugger attaching to a process, setting breakpoints that cause
the process’ threads to become suspended, and so on. Kernel-mode debugging, on the other hand,
involves controlling the entire machine with the debugger. This means that if a breakpoint is set and
then hit, the entire machine is frozen. Clearly, this cannot be achieved with a single machine. In full
kernel debugging, two machines are involved: a host (where the debugger runs) and a target (being
debugged). The target can, however, be a virtual machine hosted on the same machine (host) where
the debugger executes. Figure 5-5 shows a host and target connected via some connection medium.
Before we get into full kernel debugging, we’ll take a look at its simpler cousin - local kernel debugging.
With local kernel debugging (LKD), the debugger examines the kernel of the same machine it is running
on, while that system keeps running normally. This means there
is no way to set up breakpoints, so you're always looking at the current state of the system.
It also means that things change, even while commands are being executed, so some information may
be stale or unreliable. With full kernel debugging, commands can only be entered while the target
system is in a breakpoint, so system state is unchanged.
To configure LKD, enter the following in an elevated command prompt and then restart the system:
bcdedit /debug on
Local Kernel Debugging is not available when Secure Boot is enabled (Windows 10, Server 2016, and later).
To activate LKD you'll have to disable Secure Boot in the machine's BIOS/UEFI settings. If, for
whatever reason, this is not possible, there is an alternative using the Sysinternals LiveKd
tool. Copy LiveKd.exe to the Debugging Tools for Windows main directory. Then launch
WinDbg using LiveKd with the following command: livekd -w. The experience is not quite
the same, as data may become stale because of the way LiveKd works, and you may need
to exit the debugger and relaunch it from time to time.
After the system is restarted, launch WinDbg elevated (the 64-bit one, if you are on a 64-bit system).
Select the menu File / Attach To Kernel (WinDbg preview) or File / Kernel Debug… (classic WinDbg).
Select the Local tab and click OK. You should see output similar to the following:
Connected to Windows 10 22000 x64 target at (Wed Sep 29 10:57:30.682 2021 (UTC \
+ 3:00)), ptr64 TRUE
Note the prompt displays lkd. This indicates Local Kernel Debugging is active.
lkd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS ffffd104936c8040
SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 006d5000 ObjectTable: ffffa58d3cc44d00 HandleCount: 3909.
Image: System
PROCESS ffffd104936e2080
SessionId: none Cid: 0058 Peb: 00000000 ParentCid: 0004
DirBase: 0182c000 ObjectTable: ffffa58d3cc4ea40 HandleCount: 0.
Image: Secure System
PROCESS ffffd1049370a080
SessionId: none Cid: 0090 Peb: 00000000 ParentCid: 0004
DirBase: 011b6000 ObjectTable: ffffa58d3cc65a80 HandleCount: 0.
Image: Registry
PROCESS ffffd10497dd0080
SessionId: none Cid: 024c Peb: bc6c2ba000 ParentCid: 0004
DirBase: 10be4b000 ObjectTable: ffffa58d3d49ddc0 HandleCount: 60.
Image: smss.exe
...
• The address attached to the PROCESS text is the EPROCESS address of the process (in kernel
space, of course).
• SessionId - the session the process is running under.
• Cid - (client ID) the unique process ID.
• Peb - the address of the Process Environment Block (PEB). This address is in user space, naturally.
• ParentCid - (parent client ID) the process ID of the parent process. Note that it’s possible the
parent process no longer exists, so this ID may belong to some process created after the parent
process terminated.
• DirBase - physical address of the Master Page Directory for this process, used as the basis for
virtual to physical address translation. On x64, this is known as Page Map Level 4, and on x86
it’s Page Directory Pointer Table (PDPT).
• ObjectTable - pointer to the private handle table for the process.
The !process command accepts at least two arguments. The first indicates the process of interest
using its EPROCESS address or the unique Process ID, where zero means “all or any process”. The
second argument is the level of detail to display (a bit mask), where zero means the least amount of
detail. A third argument can be added to search for a particular executable. Here are a few examples:
List all processes running explorer.exe:
lkd> !process 0 0 explorer.exe
PROCESS ffffd104a14e2080
SessionId: 1 Cid: 2548 Peb: 005c1000 ParentCid: 0314
DirBase: 140fe9000 ObjectTable: ffffa58d46a99500 HandleCount: 2613.
Image: explorer.exe
List more information for a specific process by specifying its address and a higher level of detail:
lkd> !process ffffd104a14e2080 1
...
CommitCharge 678
Job ffffd104a05ed380
As can be seen from the above output, more information on the process is displayed. Some of this
information is hyperlinked, allowing easy further examination. For example, the job this process is
part of (if any) is a hyperlink, executing the !job command if clicked.
Click on the Job address hyperlink:
A Job is a kernel object that manages one or more processes, for which it can apply various
limits and get accounting information. A discussion of jobs is beyond the scope of this book.
More information can be found in the Windows Internals 7th edition, part 1 and Windows
10 System Programming, Part 1 books.
As usual, a command such as !job hides some information available in the real data structure. In this
case, the type is EJOB. Use the command dt nt!_ejob with the job address to see all the details.
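For example, with the job address shown above:
lkd> dt nt!_EJOB ffffd104a05ed380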
The PEB of a process can be viewed as well by clicking its hyperlink. This is similar to the !peb
command used in user mode, but the twist here is that the correct process context must be set first,
as the address is in user space. Click the Peb hyperlink. You should see something like this:
The correct process context is set with the .process meta command, and then the PEB is displayed.
This is a general technique you need to use to show memory that is in user space - always make sure
the debugger is set to the correct process context.
Execute the !process command again, but with the second bit set for the details:
lkd> !process ffffd104a14e2080 2
Detail level 2 shows a summary of the threads in the process along with the object(s) they are waiting
on (if any).
You can use other detail values (4, 8), or combine them, such as 3 (1 or 2).
Repeat the !process command again, but this time with no detail level. More information is shown
for the process (the default in this case is full details):
lkd> !process ffffd104a14e2080
The command lists all threads within the process. Each thread is represented by its ETHREAD address
attached to the text "THREAD". The call stack is listed as well - the module prefix "nt" represents the
kernel.
One of the reasons to use "nt" instead of explicitly stating the kernel's module name is that these
names differ between 64-bit and 32-bit systems (ntoskrnl.exe on 64 bit, and ntkrnlpa.exe on 32 bit);
and it's a lot shorter.
User-mode symbols are not loaded by default, so thread stacks that span to user mode show just
numeric addresses. You can load user symbols explicitly with .reload /user after setting the process
context to the process of interest with the .process command:
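For example, using the explorer.exe process address from the earlier output (your addresses will differ):

lkd> .process /p ffffd104a14e2080
lkd> .reload /user
lkd> !process ffffd104a14e2080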
PROCESS ffffd104a14e2080
SessionId: 1 Cid: 2548 Peb: 005c1000 ParentCid: 0314
DirBase: 140fe9000 ObjectTable: ffffa58d46a99500 HandleCount: 2633.
Image: explorer.exe
DeviceMap ffffa58d41354230
Owning Process ffffd1049e118080 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 3921033 Ticks: 7089 (0:00:01:50.765)
Context Switch Count 16410 IdealProcessor: 5
UserTime 00:00:00.265
KernelTime 00:00:00.234
Win32 Start Address ntdll!TppWorkerThread (0x00007ffb37d96830)
Stack Init ffffbe88b5fc7630 Current ffffbe88b5fc6d20
Base ffffbe88b5fc8000 Limit ffffbe88b5fc1000 Call 0000000000000000
Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
ffffbe88`b5fc6d60 fffff802`07c5dc17 nt!KiSwapContext+0x76
ffffbe88`b5fc6ea0 fffff802`07c5fac9 nt!KiSwapThread+0x3a7
ffffbe88`b5fc6f80 fffff802`07c62526 nt!KiCommitThreadWait+0x159
ffffbe88`b5fc7020 fffff802`07c61f38 nt!KeRemoveQueueEx+0x2b6
ffffbe88`b5fc70d0 fffff802`07c6479c nt!IoRemoveIoCompletion+0x98
ffffbe88`b5fc71f0 fffff802`07e25075 nt!NtWaitForWorkViaWorkerFactory+0x\
39c
ffffbe88`b5fc7430 00007ffb`37e26e84 nt!KiSystemServiceCopyEnd+0x25 (Tra\
pFrame @ ffffbe88`b5fc74a0)
00000000`03def858 00007ffb`37d96b0f ntdll!NtWaitForWorkViaWorkerFactory\
+0x14
00000000`03def860 00007ffb`367a54e0 ntdll!TppWorkerThread+0x2df
00000000`03defb50 00007ffb`37d8485b KERNEL32!BaseThreadInitThunk+0x10
00000000`03defb80 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
...
Notice the thread above has issued several IRPs as well. We’ll discuss this in greater detail in chapter
7.
A thread’s information can be viewed separately with the !thread command and the address of the
thread. Check the debugger documentation for the description of the various pieces of information
displayed by this command.
Other generally useful/interesting commands in kernel-mode debugging include:
• !pcr - display the Processor Control Region (PCR) for a processor specified as an additional index
(processor 0 is displayed by default if no index is specified).
• !vm - display memory statistics for the system and processes.
• !running - displays information on threads running on all processors on the system.
We’ll look at more specific commands useful for debugging drivers in subsequent chapters.
Just like with Local Kernel Debugging, the target machine cannot have Secure Boot enabled. With full
kernel debugging, there is no workaround.
The target VM must be configured for kernel debugging, similar to local kernel debugging, but with
the added connection media set to a virtual serial port on that machine.
One way to do the configuration is using bcdedit in an elevated command window:
bcdedit /debug on
bcdedit /dbgsettings serial debugport:1 baudrate:115200
Change the debug port number according to the actual virtual serial port number (typically 1).
The VM must be restarted for these configurations to take effect. Before you do that, we can map the
serial port to a named pipe. Here is the procedure for Hyper-V virtual machines:
If the Hyper-V VM is Generation 1 (older), there is a simple UI in the VM's settings to do the
configuration. Use the Add Hardware option to add a serial port if none are defined. Then
configure the serial port to be mapped to a named pipe of your choosing. Figure 5-6 shows this dialog.
Figure 5-6: Mapping serial port to named pipe for Hyper-V Gen-1 VM
For Generation 2 VMs, no UI is currently available. To configure this, make sure the VM is shut down,
and open an elevated PowerShell window.
Type the following to set a serial port mapped to a named pipe:
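Assuming the VM is named myvmname (matching the Get-VMComPort check below) and the pipe is named debug (matching the debugger output shown later), the cmdlet would be along these lines:

Set-VMComPort -VMName myvmname -Number 1 -Path \\.\pipe\debug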
Change the VM name appropriately and the COM port number as set inside the VM earlier with
bcdedit. Make sure the pipe path is unique.
You can verify the settings are as expected with Get-VMComPort:
PS C:\>Get-VMComPort myvmname
The kernel debugger must be properly configured to connect with the VM on the same serial port
mapped to the same named pipe exposed on the host.
Launch the kernel debugger elevated, and select File / Attach To Kernel. Navigate to the COM tab. Fill
in the correct details as they were set on the target. Figure 5-7 shows what these settings look like.
Click OK. The debugger should attach to the target. If it does not, click the Break toolbar button. Here
is some typical output:
Opened \\.\pipe\debug
Waiting to reconnect...
Connected to Windows 10 18362 x64 target at (Sun Apr 21 11:28:11.300 2019 (UTC \
+ 3:00)), ptr64 TRUE
Kernel Debugger connection established. (Initial Breakpoint requested)
Note the prompt has an index and the word kd. The index is the current processor that induced the
break. At this point, the target VM is completely frozen. You can now debug normally, bearing in
mind anytime you break somewhere, the entire machine is frozen.
On the target machine, in an elevated command window, configure network debugging using the
following format with bcdedit:

bcdedit /dbgsettings net hostip:<hostip> port:<port> [key:<key>]

The hostip must be the IP address of the host, accessible from the target. port can be any available port
on the host, but the documentation recommends using ports 50000 and up. The key is optional - if you
don't specify it, the command generates a random key. For example:

bcdedit /dbgsettings net hostip:10.100.102.53 port:51111

The alternative is to provide your own key for simplicity, which must be in the format a.b.c.d. This is
acceptable from a security standpoint when working with local virtual machines:

bcdedit /dbgsettings net hostip:10.100.102.53 port:51111 key:1.2.3.4
You can always display the current debug configuration with /dbgsettings alone:
bcdedit /dbgsettings
key 1.2.3.4
debugtype NET
hostip 10.100.102.53
port 51111
dhcp Yes
The operation completed successfully.
On the host machine, launch the debugger and select the File / Attach to Kernel option (or File / Kernel
Debug… in the classic WinDbg). Navigate to the NET tab, and enter the information corresponding
to your settings (figure 5-8).
You may need to click the Break button (possibly multiple times) to establish a connection. More
information and troubleshooting tips can be found at https://docs.microsoft.com/en-us/windows-
hardware/drivers/debugger/setting-up-a-network-debugging-connection.
Once connected, let's set a breakpoint in the DriverEntry function of the Booster driver from chapter 4.
Since the driver is not loaded yet, use the bu (unresolved breakpoint) command:
0: kd> bu booster!driverentry
0: kd> bl
0 e Disable Clear u 0001 (0001) (booster!driverentry)
The breakpoint is unresolved at this point, since our module (driver) is not yet loaded. The debugger
will re-evaluate the breakpoint any time a new module is loaded.
Issue the g command to let the target continue execution, and load the driver with sc start booster
(assuming the driver’s name is booster). If all goes well, the breakpoint should hit, and the source file
should open automatically, showing the following output in the command window:
0: kd> g
Breakpoint 0 hit
Booster!DriverEntry:
fffff802`13da11c0 4889542410 mov qword ptr [rsp+10h],rdx
The index on the left of the colon is the CPU index running the code when the breakpoint
hit (CPU 0 in the above output).
Figure 5-9 shows a screenshot of WinDbg Preview source window automatically opening and the
correct line marked. The Locals window is also shown as expected.
At this point, you can step over source lines, look at variables in the Locals window, and even add
expressions to the Watch window. You can also change values using the Locals window, just like you
can when debugging user-mode code.
Display the call stack with the k command:
0: kd> k
# Child-SP RetAddr Call Site
00 ffffbe88`b3f4f138 fffff802`13da5020 Booster!DriverEntry [D:\Dev\windowsk\
ernelprogrammingbook2e\Chapter04\Booster\Booster.cpp @ 9]
01 ffffbe88`b3f4f140 fffff802`081cafc0 Booster!GsDriverEntry+0x20 [minkerne\
l\tools\gs_support\kmode\gs_support.c @ 128]
02 ffffbe88`b3f4f170 fffff802`080858e2 nt!PnpCallDriverEntry+0x4c
03 ffffbe88`b3f4f1d0 fffff802`081aeab7 nt!IopLoadDriver+0x8ba
04 ffffbe88`b3f4f380 fffff802`07c48aaf nt!IopLoadUnloadDriver+0x57
05 ffffbe88`b3f4f3c0 fffff802`07d5b615 nt!ExpWorkerThread+0x14f
06 ffffbe88`b3f4f5b0 fffff802`07e16c24 nt!PspSystemThreadStartup+0x55
07 ffffbe88`b3f4f600 00000000`00000000 nt!KiStartSystemThread+0x34
If breakpoints fail to hit, it may be a symbols issue. Execute the .reload command and
see if the issues are resolved. Setting breakpoints in user space is also possible, but first
execute .reload /user to force the debugger to load user-mode symbols.
It may be the case that a breakpoint should hit only when a specific process is the one executing the
code. This can be done by adding the /p switch to a breakpoint. In the following example, a breakpoint
is set only if the process is a specific explorer.exe:
PROCESS ffffd104a14e2080
SessionId: 1 Cid: 2548 Peb: 005c1000 ParentCid: 0314
DirBase: 140fe9000 ObjectTable: ffffa58d46a99500 HandleCount: 4524.
Image: explorer.exe
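The breakpoint command takes the target process' EPROCESS address with the /p switch. For example, using an address obtained from !process (the breakpoint location is the driver's BoosterWrite function, used in the next example; your address will differ):

0: kd> bp /p ffffd104a14e2080 Booster!BoosterWrite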
Let’s set a normal breakpoint somewhere in the BoosterWrite function, by hitting F9 on the line in
source view, as shown in figure 5-10 (the earlier conditional breakpoint is shown as well).
Listing the breakpoints reflects the new breakpoint, with the offset calculated by the debugger:
0: kd> bl
0 e Disable Clear fffff802`13da11c0 [D:\Dev\Chapter04\Booster\Booster.cpp @\
9] 0001 (0001) Booster!DriverEntry
1 e Disable Clear fffff802`13da1090 [D:\Dev\Chapter04\Booster\Booster.cpp @\
61] 0001 (0001) Booster!BoosterWrite
Match process data ffffd104`9e118080
2 e Disable Clear fffff802`13da10af [D:\Dev\Chapter04\Booster\Booster.cpp @\
65] 0001 (0001) Booster!BoosterWrite+0x1f
Enter the g command to release the target, and then run the boost application with some thread ID
and priority:
Breakpoint 2 hit
Booster!BoosterWrite+0x1f:
fffff802`13da10af 488b4c2468 mov rcx,qword ptr [rsp+68h]
You can continue debugging normally, looking at local variables, stepping over/into functions, etc.
Finally, if you would like to disconnect from the target, enter the .detach command. If it does not
resume the target, click the Stop Debugging toolbar button (you may need to click it multiple times).
Asserts
Just like in user mode, asserts can be used to verify that certain assumptions are correct. An invalid
assumption means something is very wrong, so it's best to stop. The WDK headers provide the
NT_ASSERT macro for this purpose.
NT_ASSERT accepts something that can be converted to a Boolean value. If the result is non-zero (true),
execution continues. Otherwise, the assertion has failed, and the system takes one of the following
actions:
• If a kernel debugger is attached, the debugger breaks in at the failed assertion, so the problem can be investigated.
• If no kernel debugger is attached, the system bugchecks (crashes); the resulting dump file points to the failed assertion.
Here is a simple assert usage added to the DriverEntry function in the Booster driver from chapter
4:
DriverObject->MajorFunction[IRP_MJ_CREATE] = BoosterCreateClose;
DriverObject->MajorFunction[IRP_MJ_CLOSE] = BoosterCreateClose;
DriverObject->MajorFunction[IRP_MJ_WRITE] = BoosterWrite;
PDEVICE_OBJECT DeviceObject;
NTSTATUS status = IoCreateDevice(
DriverObject, // our driver object
0, // no need for extra bytes
&devName, // the device name
FILE_DEVICE_UNKNOWN, // device type
0, // characteristics flags
FALSE, // not exclusive
&DeviceObject); // the resulting pointer
if (!NT_SUCCESS(status)) {
KdPrint(("Failed to create device object (0x%08X)\n", status));
return status;
}
NT_ASSERT(DeviceObject);
NT_ASSERT(NT_SUCCESS(status));
return STATUS_SUCCESS;
}
The first assert makes sure the device object pointer is non-NULL:
NT_ASSERT(DeviceObject);
The second makes sure the status at the end of DriverEntry is a successful one:
NT_ASSERT(NT_SUCCESS(status));
NT_ASSERT only compiles its expression in Debug builds, which makes using asserts practically free
from a performance standpoint, as these will not be part of the final released driver. This also means
you need to be careful that the expression inside NT_ASSERT has no side effects. For example, the
following code is wrong:
NT_ASSERT(NT_SUCCESS(IoCreateSymbolicLink(...)));
This is because the call to IoCreateSymbolicLink will disappear completely in Release build. The
correct way to assert would be something like the following:
status = IoCreateSymbolicLink(...);
NT_ASSERT(NT_SUCCESS(status));
Asserts are useful and should be used liberally because they only have an effect in Debug builds.
Extended DbgPrint
We’ve seen usage of the DbgPrint function (and the KdPrint macro) to generate output that can be
viewed with the kernel debugger or a comparable tool, such as DebugView. This works, and is simple
to use, but has some significant downsides:
• All the output is generated - there is no easy way to filter output to show just some output
(such as errors and warnings only). This is partially mitigated with the extended DbgPrintEx
function described in the next paragraph.
• DbgPrint(Ex) is a relatively slow function, which is why it’s mostly used with KdPrint so
that the overhead is removed in Release builds. But output in Release builds could be very
important. Some bugs may only happen in Release builds, where good output could be useful
for diagnosing issues.
• There is no semantic meaning associated with DbgPrint - it’s just text. There is no way to add
values with property name or type information.
• There is no built-in way to save the output to a file rather than just seeing it in the debugger. If
using DebugView, it does allow saving its output to a file.
The output from DbgPrint(Ex) is limited to 512 bytes. Any remaining bytes are lost.
The DbgPrintEx function (and the associated KdPrintEx macro) were added to provide some filtering
support for DbgPrint output:
ULONG DbgPrintEx (
_In_ ULONG ComponentId,
_In_ ULONG Level,
_In_z_ _Printf_format_string_ PCSTR Format,
...); // any number of args
A list of component IDs is present in the <dpfilter.h> header (common to user and kernel mode),
currently containing 155 valid values (0 to 154). Most values are used by the kernel and Microsoft
drivers, except for a handful that are meant to be used by third-party drivers, such as
DPFLTR_IHVDRIVER_ID (used later in this section). The Level argument is interpreted based on its value:
• 0 to 31 - the level is a single bit formed by the expression 1 << Level. For example, if Level
is 5, then the value is 32.
• Anything greater than 31 - the value is used as is.
Several common levels are defined in the same header:
#define DPFLTR_ERROR_LEVEL 0
#define DPFLTR_WARNING_LEVEL 1
#define DPFLTR_TRACE_LEVEL 2
#define DPFLTR_INFO_LEVEL 3
You can define more (or different) values as needed. The final result of whether the output will make
its way to its destination depends on the component ID, the bit mask formed by the Level argument,
and on a global mask read from the Debug Print Filter Registry key at system startup. Since the Debug
Print Filter key does not exist by default, there is a default value for all component IDs, which is zero.
This means that the actual level value is 1 (1 << 0). The output will go through if either of the following
conditions is true (value is the value specified by the Level argument to DbgPrintEx):
• If value & (Debug print Filter value for that component) is non-zero, the output
goes through. With the default, it’s (value & 1) != 0.
• If the result of the value ANDed with the Level of the ComponentId is non-zero, the output
goes through.
There are a few ways to set or change the filter mask of a component:
• Using the Debug Print Filter key under HKLM\System\CCS\Control\Session Manager. DWORD
values can be specified where their name is the macro name of a component ID without
the prefix or suffix. For example, for DPFLTR_IHVVIDEO_ID, you would set the name to
“IHVVIDEO”.
• If a kernel debugger is connected, the level of a component can be changed during debugging.
For example, the following command changes the level of DPFLTR_IHVVIDEO_ID to 0x1ff:
ed Kd_IHVVIDEO_Mask 0x1ff
The Debug Print Filter value can also be changed with the kernel debugger by using the
global kernel variable Kd_WIN2000_Mask.
• The last option is to make the change through the NtSetDebugFilterState native API. It’s
undocumented, but it may be useful in practice. The Dbgkflt tool, available in the Tools
folder in the book’s samples repositpry, makes use of this API (and its query counterpart,
NtQueryDebugFilterState), so that changes can be made even if a kernel debugger is not
attached.
If NtSetDebugFilterState is called from user mode, the caller must have the Debug privilege in
its token. Since administrators have this privilege by default (but not non-admin users), you must
run dbgkflt from an elevated command window for the change to succeed.
Using Dbgkflt
Running Dbgkflt with no arguments shows its usage.
To query the effective level of a given component, add the component name (without the prefix or
suffix). For example:
dbgkflt default
This returns the effective bits for the DPFLTR_DEFAULT_ID component. To change the value to
something else, specify the value you want. It’s always ORed with 0x80000000 so that the bits
you specify are directly used, rather than interpreting numbers lower than 32 as (1 << number).
For example, the following sets the first 4 bits for the DEFAULT component:
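Based on the usage just described, that would be a command along the lines of:

dbgkflt default 0xf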
DbgPrint is just a shortcut that calls DbgPrintEx with the DPFLTR_DEFAULT_ID component like so
(this is conceptual and will not compile):
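A sketch of that conceptual forwarding (again, not real, compilable code - the ellipsis cannot be forwarded this way in C):

ULONG DbgPrint(PCSTR Format, ...) {
    return DbgPrintEx(DPFLTR_DEFAULT_ID, DPFLTR_INFO_LEVEL, Format, ...);
}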
This explains why the DWORD named DEFAULT with a value of 8 (1 << DPFLTR_INFO_LEVEL) is the
value to write in the Registry to get DbgPrint output to go through.
Given the above details, a driver can use DbgPrintEx (or the KdPrintEx macro) to specify different
levels so that output can be filtered as needed. Each call, however, may be somewhat verbose. For
example:
DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,
"Booster: DriverEntry called. Registry Path: %wZ\n", RegistryPath);
Obviously, we might prefer a simpler function that always uses DPFLTR_IHVDRIVER_ID (the one that
should be used for generic third-party drivers), like so:
Log(DPFLTR_INFO_LEVEL,
"Booster: DriverEntry called. Registry Path: %wZ\n", RegistryPath);
We can go even further by defining specific functions that use a log level implicitly:
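For instance, a call such as the following (mirroring the earlier DriverEntry example):

LogInfo("Booster: DriverEntry called. Registry Path: %wZ\n", RegistryPath);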
Here is an example where we define several bits to be used by creating an enumeration (there is no
necessity to use the ones defined here):
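One possible shape for such an enumeration (a sketch; the names match the calls used later in this section):

enum class LogLevel {
    Error = 0,
    Warning,
    Information,
    Verbose
};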
Each value is associated with a small number (below 32), so that the values are interpreted as powers
of two by DbgPrintEx. Now we can define functions like the following:
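For instance, declarations along these lines (a sketch, not necessarily the exact Booster2 code):

void Log(LogLevel level, PCSTR format, ...);
void LogError(PCSTR format, ...);
void LogInfo(PCSTR format, ...);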
and so on. Log is the most generic function, while the others use a predefined log level. Here is the
implementation of the first two functions:
#include <stdarg.h>
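// A sketch of what these two implementations might look like (the actual
// Booster2 sources may differ in details):
void Log(LogLevel level, PCSTR format, ...) {
    va_list args;
    va_start(args, format);
    vDbgPrintEx(DPFLTR_IHVDRIVER_ID, static_cast<ULONG>(level), format, args);
    va_end(args);
}

void LogError(PCSTR format, ...) {
    va_list args;
    va_start(args, format);
    vDbgPrintEx(DPFLTR_IHVDRIVER_ID, static_cast<ULONG>(LogLevel::Error), format, args);
    va_end(args);
}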
The use of static_cast in the above code is required in C++, as scoped enums don’t
automatically convert to integers. You can use a C-style cast instead, if you prefer. If you’re
using pure C, change the scoped enum to a standard enum (remove the class keyword).
The return value from the various DbgPrint variants is typed as a ULONG, but is in fact a
standard NTSTATUS.
The implementation uses the classic C variable arguments ellipsis (...) and implements these as
you would in standard C. The implementation calls vDbgPrintEx that accepts a va_list, which is
necessary for this to work correctly.
It’s possible to create something more elaborate using the C++ variadic template feature.
This is left as an exercise to the interested (and enthusiastic) reader.
The above code can be found in the Booster2 project, part of the samples for this chapter. As part of
that project, here are a few examples where these functions are used:
// in DriverEntry
Log(LogLevel::Information, "Booster2: DriverEntry called. Registry Path: %wZ\n"\
,
RegistryPath);
// unload routine
LogInfo("Booster2: unload called\n");
Here is the declaration of vDbgPrintEx:
ULONG vDbgPrintEx(
_In_ ULONG ComponentId,
_In_ ULONG Level,
_In_z_ PCCH Format,
_In_ va_list arglist);
It’s identical to DbgPrintEx, except its last argument is an already constructed va_list. A wrapper
macro exists as well - vKdPrintEx (compiled in Debug builds only).
Lastly, there is yet another extended function for printing - vDbgPrintExWithPrefix:
ULONG vDbgPrintExWithPrefix (
_In_z_ PCCH Prefix,
_In_ ULONG ComponentId,
_In_ ULONG Level,
_In_z_ PCCH Format,
_In_ va_list arglist);
It adds a prefix (first parameter) to the output. This is useful to distinguish our driver's output from that
of other drivers using the same functions. It also allows easy filtering in tools such as DebugView. For
example, the code snippets shown earlier use an explicit prefix ("Booster2: ") at the start of the format string.
We can define one as a macro, and use it as the first word in any output like so:
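For example (the macro name here is illustrative):

#define DRIVER_PREFIX "Booster2: "

LogInfo(DRIVER_PREFIX "unload called\n");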
This works, but it could be nicer by adding the prefix in every call automatically, by calling
vDbgPrintExWithPrefix instead of vDbgPrintEx in the Log implementations. For example:
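A sketch of Log reworked this way (using the same illustrative macro):

void Log(LogLevel level, PCSTR format, ...) {
    va_list args;
    va_start(args, format);
    vDbgPrintExWithPrefix(DRIVER_PREFIX, DPFLTR_IHVDRIVER_ID,
        static_cast<ULONG>(level), format, args);
    va_end(args);
}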
Trace Logging
Using DbgPrint and its variants is convenient enough but, as discussed earlier, has some significant
downsides. Trace logging is a powerful alternative (or complement) that uses Event Tracing for Windows
(ETW) for logging purposes; the output can be captured live or saved to a log file. ETW has the additional
benefits of being performant (it can be used to log thousands of events per second without any noticeable
delay), and of carrying semantic information not available with the simple strings generated by the
DbgPrint functions.
Trace logging can be used in exactly the same way in user mode as well.
ETW is beyond the scope of this book. You can find more information in the official
documentation or in my book “Windows 10 System Programming, Part 2”.
To get started with trace logging, an ETW provider has to be defined. Contrary to "classic" ETW, no
provider manifest registration is necessary, as trace logging ensures the event metadata is part of the logged
information, and as such is self-contained.
A provider must have a unique GUID. You can generate one with the Create GUID tool available
with Visual Studio (Tools menu). Figure 5-11 shows a screenshot of the tool with the second radio
button selected, as it’s the closest to the format we need. Click the Copy button to copy that text to
the clipboard.
Paste the text into the main source file of the driver and change the pasted macro to
TRACELOGGING_DEFINE_PROVIDER so it looks like this:
// {B2723AD5-1678-446D-A577-8599D3E85ECB}
TRACELOGGING_DEFINE_PROVIDER(g_Provider, "Booster", \
(0xb2723ad5, 0x1678, 0x446d, 0xa5, 0x77, 0x85, 0x99, 0xd3, 0xe8, 0x5e, 0xcb\
));
g_Provider is a global variable created to represent the ETW provider, where “Booster” is set as its
friendly name.
You will need to add the following #includes (these are common with user-mode):
#include <TraceLoggingProvider.h>
#include <evntrace.h>
The provider must be registered before use, typically in DriverEntry:
TraceLoggingRegister(g_Provider);
Similarly, the provider should be deregistered in the unload routine like so:
TraceLoggingUnregister(g_Provider);
The logging is done with the TraceLoggingWrite macro that is provided a variable number of
arguments using another set of macros that provide convenient usage for typed properties. Here is an
example of a logging call in DriverEntry:
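Such a call might look like the following (the event name and property names here are illustrative):

TraceLoggingWrite(g_Provider, "DriverEntry",
    TraceLoggingLevel(TRACE_LEVEL_INFORMATION),
    TraceLoggingValue("Driver loading", "Message"),
    TraceLoggingUnicodeString(RegistryPath, "RegistryPath"));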
Notice the usage of the TraceLoggingValue macro - it’s the most generic and uses the type inferred by
the first argument (the value). Many other type-safe macros exist, such as the TraceLoggingUnicodeString
macro above that ensures its first argument is indeed a UNICODE_STRING.
Here is another example - if symbolic link creation fails:
TraceLoggingWrite(g_Provider, "Error",
TraceLoggingLevel(TRACE_LEVEL_ERROR),
TraceLoggingValue("Symbolic link creation failed", "Message"),
TraceLoggingNTStatus(status, "Status", "Returned status"));
You can use any “properties” you want. Try to provide the most important details for the event.
A couple more examples can be found in the Booster project, part of the samples for this chapter.
One tool that can record and display the events is TraceView, available in the Tools directory of the WDK.
Launch TraceView, and select the File / Create New Log Session menu to create a new session. This opens
up the dialog shown in figure 5-13.
in figure 5-13.
TraceView provides several methods of locating providers. We can add multiple providers to the same
session to get information from other components in the system. For now, we’ll add our provider by
using the Manually Entered Control GUID option, and type in our GUID (figure 5-14):
Click OK. A dialog will pop up asking the source for decoding information. Use the default Auto
option, as trace logging does not require any outside source. You’ll see the single provider in the
Create New Log Session dialog. Click the Next button. The last step of the wizard allows you to select
where the output should go to: a real-time session (shown with TraceView), a file, or both (figure 5-15).
Click Finish. Now you can load/use the driver normally. You should see the output generated in the
main TraceView window (figure 5-16).
You can see the various properties shown in the Message column. When logging to a file, you can
open the file later with TraceView and see what was logged.
There are other ways to use TraceView, and other tools to record and view ETW information. You
could also write your own tools to parse an ETW log, as the events carry semantic information and
are not just free-form text.
Summary
In this chapter, we looked at the basics of debugging with WinDbg, as well as tracing activities within
the driver. Debugging is an essential skill to develop, as software of all kinds, including kernel drivers,
may have bugs.
In the next chapter, we’ll delve into some kernel mechanisms we need to get acquainted with, as these
come up frequently while developing and debugging drivers.
Chapter 6: Kernel Mechanisms
This chapter discusses various mechanisms the Windows kernel provides. Some of these are directly
useful for driver writers. Others are mechanisms that a driver developer needs to understand as it
helps with debugging and general understanding of activities in the system.
In this chapter:
interrupts coming in with an IRQL of 5 or less cannot interrupt this processor. If, on the other hand,
the IRQL of the new interrupt is above 5, the CPU will save its state again, raise IRQL to the new
level, execute the second ISR associated with the second interrupt and when completed, will drop
back to IRQL 5, restore its state and continue executing the original ISR. Essentially, raising IRQL
blocks code with equal or lower IRQL temporarily. The basic sequence of events when an interrupt
occurs is depicted in figure 6-1. Figure 6-2 shows what interrupt nesting looks like.
An important fact for the depicted scenarios in figures 6-1 and 6-2 is that execution of all ISRs is done
by the same thread - which got interrupted in the first place. Windows does not have a special thread
to handle interrupts; they are handled by whatever thread was running at that time on the interrupted
processor. As we’ll soon discover, context switching is not possible when the IRQL of the processor is
2 or higher, so there is no way another thread can sneak in while these ISRs execute.
The interrupted thread does not get its quantum reduced because of these “interruptions”. It’s not
its fault, so to speak.
When user-mode code is executing, the IRQL is always zero. This is one reason why the term IRQL
is not mentioned in any user-mode documentation - it’s always zero and cannot be changed. Most
kernel-mode code runs with IRQL zero as well. It’s possible, however, in kernel mode, to raise the
IRQL on the current processor.
The important IRQLs are described below:
• PASSIVE_LEVEL (0) - this is the "normal" IRQL for a CPU. User-mode code always
runs at this level. Thread scheduling works normally, as described in chapter 1.
• APC_LEVEL (1) - used for special kernel APCs (Asynchronous Procedure Calls will be discussed
later in this chapter). Thread scheduling works normally.
• DISPATCH_LEVEL (2) - this is where things change radically. The scheduler cannot wake up on
this CPU. Paged memory access is not allowed - such access causes a system crash. Since the
scheduler cannot interfere, waiting on kernel objects is not allowed (causes a system crash if
used).
• Device IRQL - a range of levels used for hardware interrupts (3 to 11 on x64/ARM/ARM64, 3
to 26 on x86). All rules from IRQL 2 apply here as well.
• Highest level (HIGH_LEVEL) - this is the highest IRQL, masking all interrupts. Used by some
APIs dealing with linked list manipulation. The actual values are 15 (x64/ARM/ARM64) and 31
(x86).
When a processor’s IRQL is raised to 2 or higher (for whatever reason), certain restrictions apply on
the executing code:
• Accessing memory not in physical memory is fatal and causes a system crash. This means
accessing data from non-paged pool is always safe, whereas accessing data from paged pool or
from user-supplied buffers is not safe and should be avoided.
• Waiting on any kernel object (e.g. mutex or event) causes a system crash, unless the wait timeout
is zero, which is still allowed. (We'll discuss dispatcher objects and waiting later in this chapter,
in the Thread Synchronization section.)
These restrictions are due to the fact that the scheduler “runs” at IRQL 2; so if a processor’s IRQL is
already 2 or higher, the scheduler cannot wake up on that processor, so context switches (replacing the
running thread with another on this CPU) cannot occur. Only higher level interrupts can temporarily
divert code into an associated ISR, but it’s still the same thread - no context switch can occur; the
thread’s context is saved, the ISR executes and the thread’s state resumes.
The current IRQL of a processor can be viewed while debugging with the !irql command.
An optional CPU number can be specified, which shows the IRQL of that CPU.
You can view the registered interrupts on a system using the !idt debugger command.
Raising and lowering the IRQL of the current processor is done with the KeRaiseIrql and KeLowerIrql
functions:

KIRQL oldIrql;
KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
NT_ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
// ... do work at DISPATCH_LEVEL ...
KeLowerIrql(oldIrql);
If you raise the IRQL, make sure you lower it in the same function. It's too dangerous to return
from a function with a higher IRQL than the one it was entered with. Also, make sure KeRaiseIrql
actually raises the IRQL and KeLowerIrql actually lowers it; otherwise, a system crash
will follow.
As we’ve seen in chapter 4, completing a request is done by calling IoCompleteRequest. The problem
is that the documentation states this function can only be called at IRQL <= DISPATCH_LEVEL (2). This
means the ISR cannot call IoCompleteRequest or it will crash the system. So what is the ISR to do?
You may wonder why there is such a restriction. One of the reasons has to do with the
work done by IoCompleteRequest. We'll discuss this in more detail in the next chapter,
but the bottom line is that this function is relatively expensive. If the call were allowed,
the ISR would take substantially longer to execute, and since it
executes at a high IRQL, it would mask off other interrupts for a longer period of time.
The mechanism that allows the ISR to call IoCompleteRequest (and other functions with similar
limitations) as soon as possible is using a Deferred Procedure Call (DPC). A DPC is an object
encapsulating a function that is to be called at IRQL DISPATCH_LEVEL. At this IRQL, calling
IoCompleteRequest is permitted.
You may wonder why the ISR does not simply lower the current IRQL to DISPATCH_LEVEL,
call IoCompleteRequest, and then raise the IRQL back to its original value. This can cause
a deadlock. We'll discuss the reason for that later in this chapter, in the section Spin Locks.
The driver which registered the ISR prepares a DPC in advance, by allocating a KDPC structure from
non-paged pool and initializing it with a callback function using KeInitializeDpc. Then, when the
ISR is called, just before exiting the function, the ISR requests the DPC to execute as soon as possible by
queuing it using KeInsertQueueDpc. When the DPC function executes, it calls IoCompleteRequest.
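A minimal sketch of that sequence, with illustrative names (the KDPC object must reside in nonpaged memory; it is a global here for simplicity):

KDPC g_Dpc;

void MyDpcRoutine(KDPC*, PVOID /*context*/, PVOID, PVOID) {
    // runs at IRQL DISPATCH_LEVEL - calling IoCompleteRequest is allowed here
}

void PrepareDpc() {
    // done once, in advance (for example, when the interrupt is connected)
    KeInitializeDpc(&g_Dpc, MyDpcRoutine, nullptr);
}

void OnInterruptTail() {
    // at the end of the ISR: queue the DPC, so it runs before the
    // processor's IRQL drops back to zero
    KeInsertQueueDpc(&g_Dpc, nullptr, nullptr);
}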
So the DPC serves as a compromise - it’s running at IRQL DISPATCH_LEVEL, meaning no scheduling
can occur, no paged memory access, etc. but it’s not high enough to prevent hardware interrupts from
coming in and being served on the same processor.
Every processor on the system has its own queue of DPCs. By default, KeInsertQueueDpc queues
the DPC to the current processor’s DPC queue. When the ISR returns, before the IRQL can drop
back to zero, a check is made to see whether DPCs exist in the processor’s queue. If there are, the
processor drops to IRQL DISPATCH_LEVEL (2) and then processes the DPCs in the queue in a First In
First Out (FIFO) manner, calling the respective functions, until the queue is empty. Only then can the
processor’s IRQL drop to zero, and resume executing the original code that was disturbed at the time
the interrupt arrived.
DPCs can be customized in some ways. Check out the docs for the functions
KeSetImportanceDpc and KeSetTargetProcessorDpc.
Figure 6-6 augments figure 6-5 with the DPC routine execution.
A DPC is more powerful than a zero-IRQL-based callback, since it is guaranteed to execute before any
user-mode code (and most kernel-mode code). One common use is attaching a DPC to a kernel timer
(KTIMER), so that a callback runs at DISPATCH_LEVEL when the timer expires. Here is a minimal sketch
(with assumed helper names):
KTIMER Timer;
KDPC TimerDpc;

void OnTimerDpc(KDPC* Dpc, PVOID Context, PVOID Arg1, PVOID Arg2) {
    NT_ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);   // DPC routines run at IRQL 2
}

void StartTimer(ULONG msec) {
    KeInitializeTimer(&Timer);
    KeInitializeDpc(&TimerDpc, OnTimerDpc, nullptr);
    LARGE_INTEGER interval;
    interval.QuadPart = -10000LL * msec;    // relative time, in 100-nsec units
    KeSetTimer(&Timer, interval, &TimerDpc);
}
• User mode APCs - these execute in user mode at IRQL PASSIVE_LEVEL only when the thread
goes into alertable state. This is typically accomplished by calling an API such as SleepEx,
WaitForSingleObjectEx, WaitForMultipleObjectsEx and similar APIs. The last argument
to these functions can be set to TRUE to put the thread in an alertable state. In this state, it examines
its APC queue, and if it's not empty, the APCs execute until the queue is empty.
• Normal kernel-mode APCs - these execute in kernel mode at IRQL PASSIVE_LEVEL and preempt
user-mode code (and user-mode APCs).
• Special kernel APCs - these execute in kernel mode at IRQL APC_LEVEL (1) and preempt user-
mode code, normal kernel APCs, and user-mode APCs. These APCs are used by the I/O manager
to complete I/O operations as will be discussed in the next chapter.
The APC API is undocumented in kernel mode (but has been reverse engineered enough to allow
usage if desired).
User-mode code can use (user-mode) APCs by calling certain APIs. For example, calling
ReadFileEx or WriteFileEx starts an asynchronous I/O operation. When the operation
completes, a user-mode APC is attached to the calling thread. This APC will execute
when the thread enters an alertable state as described earlier. Another useful function
in user mode to explicitly generate an APC is QueueUserAPC. Check out the Windows
API documentation for more information.
An exception is synchronous - it is caused by executing a specific instruction - whereas
an interrupt is asynchronous and can arrive at any time. Examples of exceptions include division by
zero, breakpoint, page fault, stack overflow and invalid instruction.
If an exception occurs, the kernel catches this and allows code to handle the exception, if possible.
This mechanism is called Structured Exception Handling (SEH) and is available for user-mode code
as well as kernel-mode code.
The kernel exception handlers are called based on the Interrupt Dispatch Table (IDT), the same one
holding mapping between interrupt vectors and ISRs. Using a kernel debugger, the !idt command
shows all these mappings. The low numbered interrupt vectors are in fact exception handlers. Here’s
a sample output from this command:
lkd> !idt
(truncated)
Note the function names - most are very descriptive. These entries are connected to Intel/AMD (in
this example) faults. Some common examples include Divide Error (0), Breakpoint (3), Invalid
Opcode (6), and Page Fault (14).
Some other exceptions are raised by the kernel as a result of a previous CPU fault. For example, if a
page fault is raised, the Memory Manager’s page fault handler will try to locate the page that is not
resident in RAM. If the page happens not to exist at all, the Memory Manager will raise an Access
Violation exception.
Once an exception is raised, the kernel searches the function where the exception occurred for a
handler (except for some exceptions which it handles transparently, such as Breakpoint (3)). If not
found, it will search up the call stack, until such handler is found. If the call stack is exhausted, the
system will crash.
How can a driver handle these types of exceptions? Microsoft added four keywords to the C language
to allow developers to handle such exceptions, as well as have code execute no matter what. Table 6-1
shows the added keywords with a brief description.
Keyword Description
__try Starts a block of code where exceptions may occur.
__except Indicates if an exception is handled, and provides the handling code if it is.
__finally Unrelated to exceptions directly. Provides code that is guaranteed to execute no matter what -
whether the __try block is exited normally, with a return statement, or because of an
exception.
__leave Provides an optimized mechanism to jump to the __finally block from somewhere within a
__try block.
The valid combinations of keywords are __try/__except and __try/__finally. However, these can
be combined by using nesting to any level.
These same keywords work in user mode as well, in much the same way.
Using __try/__except
In chapter 4, we implemented a driver that accesses a user-mode buffer to get data needed for the
driver’s operation. We used a direct pointer to the user’s buffer. However, this is not guaranteed to be
safe. For example, the user-mode code (say from another thread) could free the buffer, just before the
driver accesses it. In such a case, the driver would cause a system crash, essentially because of a user’s
error (or malicious intent). Since user data should never be trusted, such access should be wrapped in
a __try/__except block to make sure a bad buffer does not crash the driver.
Here is the important part of a revised IRP_MJ_WRITE handler using an exception handler:
do {
    if (irpSp->Parameters.Write.Length < sizeof(ThreadData)) {
        status = STATUS_BUFFER_TOO_SMALL;
        break;
    }

    auto data = (ThreadData*)Irp->UserBuffer;
    if (data == nullptr) {
        status = STATUS_INVALID_PARAMETER;
        break;
    }

    __try {
        if (data->Priority < 1 || data->Priority > 31) {
            status = STATUS_INVALID_PARAMETER;
            break;
        }

        PETHREAD Thread;
        status = PsLookupThreadByThreadId(
            ULongToHandle(data->ThreadId), &Thread);
        if (!NT_SUCCESS(status))
            break;

        KeSetPriorityThread((PKTHREAD)Thread, data->Priority);
        ObDereferenceObject(Thread);
        KdPrint(("Thread Priority change for %d to %d succeeded!\n",
            data->ThreadId, data->Priority));
        break;
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        // probably something wrong with the buffer
        status = STATUS_ACCESS_VIOLATION;
    }
} while (false);
Does all this mean that the driver can catch any and all exceptions? If so, the driver will never cause
a system crash. Fortunately (or unfortunately, depending on your perspective), this is not the case.
Access violation, for example, is something that can only be caught if the violated address is in user
space. If it's in kernel space, it cannot be caught and will still cause a system crash. This makes sense,
since something bad has happened and the kernel will not let the driver get away with it. User-mode
addresses, on the other hand, are not under the control of the driver, so such exceptions can be caught
and handled.
The SEH mechanism can also be used by drivers (and user-mode code) to raise custom exceptions.
The kernel provides the generic function ExRaiseStatus to raise any exception and some specific
functions like ExRaiseAccessViolation:
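Their declarations look roughly like this (simplified):

VOID ExRaiseStatus (_In_ NTSTATUS Status);
VOID ExRaiseAccessViolation ();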
A driver can also crash the system explicitly if it concludes that something really bad is going on, such
as data being corrupted from underneath the driver. The kernel provides the KeBugCheckEx function for this
purpose:
VOID KeBugCheckEx(
_In_ ULONG BugCheckCode,
_In_ ULONG_PTR BugCheckParameter1,
_In_ ULONG_PTR BugCheckParameter2,
_In_ ULONG_PTR BugCheckParameter3,
_In_ ULONG_PTR BugCheckParameter4);
KeBugCheckEx is the normal kernel function that generates a crash. BugCheckCode is the crash code
to be reported, and the other 4 numbers can provide more details about the crash. If the bugcheck
code is one of those documented by Microsoft, the meaning of the other 4 numbers must be provided
as documented. (See the next section System Crash for more details).
Using __try/__finally
Using a block of __try and __finally is not directly related to exceptions. This is about making sure
some piece of code executes no matter what - whether the code exits cleanly or mid-way because of
an exception. This is similar in concept to the finally keyword popular in some high level languages
(e.g. Java, C#). Here is a simple example to show the problem:
void foo() {
    void* p = ExAllocatePoolWithTag(PagedPool, 1024, DRIVER_TAG);
    if (p == nullptr)
        return;

    // do something with p

    ExFreePool(p);
}
The above code seems harmless enough. However, there are several issues with it:
• If an exception is thrown between the allocation and the release, a handler in the caller will be
searched, but the memory will not be freed.
• If a return statement is used in some conditional between the allocation and release, the buffer
will not be freed. This requires the code to be careful to make sure all exit points from the
function pass through the code freeing the buffer.
The second bullet can be implemented with careful coding, but is a burden best avoided. The first
bullet cannot be handled with standard coding techniques. This is where __try/__finally come in.
Using this combination, we can make sure the buffer is freed no matter what happens in the __try
block:
void foo() {
    void* p = ExAllocatePoolWithTag(PagedPool, 1024, DRIVER_TAG);
    if (p == nullptr)
        return;

    __try {
        // do something with p
    }
    __finally {
        // called no matter what
        ExFreePool(p);
    }
}
With the above code in place, even if return statements appear within the __try body, the __finally
code will be called before actually returning from the function. If some exception occurs, the
__finally block runs first, before the kernel searches up the call stack for possible handlers.
__try/__finally is useful not just with memory allocations, but also with other resources, where
some acquisition and release need to take place. One common example is when synchronizing threads
accessing some shared data. Here is an example of acquiring and releasing a fast mutex (fast mutex
and other synchronization primitives are described later in this chapter):
FAST_MUTEX MyMutex;

void foo() {
    ExAcquireFastMutex(&MyMutex);
    __try {
        // do work while the fast mutex is held
    }
    __finally {
        ExReleaseFastMutex(&MyMutex);
    }
}
template<typename T = void>
struct kunique_ptr {
    explicit kunique_ptr(T* p = nullptr) : _p(p) {}

    ~kunique_ptr() {
        if (_p)
            ExFreePool(_p);
    }

    T* operator->() const {
        return _p;
    }

private:
    T* _p;
};
The class uses templates to allow working easily with any type of data. An example usage follows:
struct MyData {
    ULONG Data1;
    HANDLE Data2;
};

void foo() {
    // take charge of the allocation
    kunique_ptr<MyData> data((MyData*)ExAllocatePool(PagedPool, sizeof(MyData)));

    // use the pointer
    data->Data1 = 10;

    // when the object goes out of scope, the destructor frees the buffer
}
If you don’t normally use C++ as your primary programming language, you may find the above code
confusing. You can continue working with __try/__finally, but I recommend getting acquainted
with this type of code. In any case, even if you struggle with the implementation of kunique_ptr
above, you can still use it without needing to understand every little detail.
The kunique_ptr type presented above is a bare minimum. You should also remove the copy
constructor and copy assignment, and allow move construction and assignment (C++ 11 and later, for
ownership transfer). Here is a more complete implementation:
template<typename T = void>
struct kunique_ptr {
    explicit kunique_ptr(T* p = nullptr) : _p(p) {}

    // remove copy constructor and copy assignment
    kunique_ptr(const kunique_ptr&) = delete;
    kunique_ptr& operator=(const kunique_ptr&) = delete;

    // allow move construction and assignment (ownership transfer)
    kunique_ptr(kunique_ptr&& other) : _p(other._p) {
        other._p = nullptr;
    }
    kunique_ptr& operator=(kunique_ptr&& other) {
        if (&other != this) {
            Release();
            _p = other._p;
            other._p = nullptr;
        }
        return *this;
    }
    ~kunique_ptr() {
        Release();
    }
    T* operator->() const {
        return _p;
    }
    void Release() {
        if (_p)
            ExFreePool(_p);
    }
private:
    T* _p;
};
We’ll build other RAII wrappers for synchronization primitives later in this chapter.
Using C++ RAII wrappers has one missing piece - if an exception occurs, the destructor
will not be called, so a leak of some sort occurs. The reason this does not work (as it does in
user-mode), is the lack of a C++ runtime and the current inability of the compiler to set up
elaborate code with __try/__finally to mimic this effect. Even so, it’s still very useful,
as in many cases exceptions are not expected, and even if they are, no handler exists in
the driver for that and the system should probably crash anyway.
System Crash
As we already know, if an unhandled exception occurs in kernel mode, the system crashes, typically
with the “Blue Screen of Death” (BSOD) showing its face (on Windows 8+, that’s literally a face -
saddy or frowny - the inverse of smiley). In this section, we’ll discuss what happens when the system
crashes and how to deal with it.
The system crash has many names, all meaning the same thing - “Blue screen of Death”, “System
failure”, “Bugcheck”, “Stop error”. The BSOD is not some punishment, as may seem at first, but a
protection mechanism. If kernel code, which is supposed to be trusted, did something bad, stopping
everything is probably the safest approach, as letting the code continue to run might
result in an unbootable system if important files or Registry data get corrupted.
Recent versions of Windows 10 have some alternate colors for when the system crashes. Green is
used for Insider Preview builds, and I have actually encountered a pink one as well (for power-related errors).
If the crashed system is connected to a kernel debugger, the debugger will break. This allows
examining the state of the system before other actions take place.
The system can be configured to perform some operations if the system crashes. This can be done
with the System Properties UI on the Advanced tab. Clicking Settings… at the Startup and Recovery
section brings the Startup and Recovery dialog where the System Failure section shows the available
options. Figure 6-7 shows these two dialogs.
If the system crashes, an event entry can be written to the event log. It’s checked by default, and there
is no good reason to change it. The system is configured to automatically restart; this has been the
default since Windows 2000.
The most important setting is the generation of a dump file. The dump file captures the system state
at the time of the crash, so it can later be analyzed by loading the dump file into the debugger. The
type of the dump file is important since it determines what information will be present in the dump.
The dump is not written to the target file at crash time, but instead written to the first page file.
Only when the system restarts, the kernel notices there is dump information in the page file, and it
copies the data to the target file. The reason has to do with the fact that at system crash time it may
be too dangerous to write something to a new file (or overwrite an existing file); the I/O system may
not be stable enough. The best bet is to write the data to a page file, which is already open anyway.
The downside is that the page file must be large enough to contain the dump, otherwise the dump file
will not be generated.
The dump type determines what data would be written and hints at the page file size that may be
required. Here are the options:
• Small memory dump (256 KB on Windows 8 and later, 64 KB on older systems) - a very minimal
dump, containing basic system information and information on the thread that caused the crash.
Usually this is too little to determine what happened in all but the most trivial cases. The upside
is that the file is small, so it can be easily moved.
• Kernel memory dump - this is the default on Windows 7 and earlier versions. This setting
captures all kernel memory but no user memory. This is usually good enough, since a system
crash can only be caused by kernel code misbehaving. It’s extremely unlikely that user-mode
had anything to do with it.
• Complete memory dump - this provides a dump of all physical memory, user memory and
kernel memory. This is the most complete information available. The downside is the size of
the dump, which could be gigantic depending on the size of RAM (the total size of the final file).
The obvious optimization is not to include unused pages, but Complete Memory Dump does not
do that.
• Automatic memory dump (Windows 8+) - this is the default on Windows 8 and later. This is
the same as kernel memory dump, but the kernel resizes the page file on boot to a size that
guarantees with high probability that the page file size would be large enough to contain a
kernel dump. This is only done if the page file size is specified as “System managed” (the default).
• Active memory dump (Windows 10+) - this is similar to a complete memory dump, with two
exceptions. First, unused pages are not written. Second, if the crashed system is hosting guest
virtual machines, the memory they were using at the time is not captured (as it’s unlikely these
have anything to do with the host crashing). These optimizations help in reducing the dump
file size.
The debugger suggests running !analyze -v and it’s the most common thing to do at the start of dump
analysis. Notice the call stack is at KeBugCheckEx, which is the function generating the bugcheck.
The default logic behind !analyze -v performs basic analysis on the thread that caused the crash
and shows a few pieces of information related to the crash dump code:
2: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: ffffd907b0dc7660, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80375261530, address which referenced memory
Debugging Details:
------------------
(truncated)
DUMP_TYPE: 1
BUGCHECK_P1: ffffd907b0dc7660
BUGCHECK_P2: 2
BUGCHECK_P3: 0
BUGCHECK_P4: fffff80375261530
CURRENT_IRQL: 2
FAULTING_IP:
myfault+1530
fffff803`75261530 8b03 mov eax,dword ptr [rbx]
(truncated)
Resetting default scope
STACK_TEXT:
fffff988`53b0f6a8 fffff803`70c8a469 : 00000000`0000000a ffffd907`b0dc7660 00000\
000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffff988`53b0f6b0 fffff803`70c867a5 : ffff8788`e4604080 ffffff4c`c66c7010 00000\
000`00000003 00000000`00000880 : nt!KiBugCheckDispatch+0x69
fffff988`53b0f7f0 fffff803`75261530 : ffffff4c`c66c7000 00000000`00000000 fffff\
988`53b0f9e0 00000000`00000000 : nt!KiPageFault+0x465
fffff988`53b0f980 fffff803`75261e2d : fffff988`00000000 00000000`00000000 ffff8\
788`ec7cf520 00000000`00000000 : myfault+0x1530
fffff988`53b0f9b0 fffff803`75261f88 : ffffff4c`c66c7010 00000000`000000f0 00000\
000`00000001 ffffff30`21ea80aa : myfault+0x1e2d
fffff988`53b0fb00 fffff803`70ae3da9 : ffff8788`e6d8e400 00000000`00000001 00000\
000`83360018 00000000`00000001 : myfault+0x1f88
fffff988`53b0fb40 fffff803`710d1dd5 : fffff988`53b0fec0 ffff8788`e6d8e400 00000\
000`00000001 ffff8788`ecdb6690 : nt!IofCallDriver+0x59
fffff988`53b0fb80 fffff803`710d172a : ffff8788`00000000 00000000`83360018 00000\
000`00000000 fffff988`53b0fec0 : nt!IopSynchronousServiceTail+0x1a5
fffff988`53b0fc20 fffff803`710d1146 : 00000054`344feb28 00000000`00000000 00000\
000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x5ca
fffff988`53b0fd60 fffff803`70c89e95 : ffff8788`e4604080 fffff988`53b0fec0 00000\
054`344feb28 fffff988`569fd630 : nt!NtDeviceIoControlFile+0x56
fffff988`53b0fdd0 00007ff8`ba39c147 : 00000000`00000000 00000000`00000000 00000\
000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
00000054`344feb48 00000000`00000000 : 00000000`00000000 00000000`00000000 00000\
000`00000000 00000000`00000000 : 0x00007ff8`ba39c147
(truncated)
FOLLOWUP_IP:
myfault+1530
fffff803`75261530 8b03 mov eax,dword ptr [rbx]
FAULT_INSTR_CODE: 8d48038b
SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: myfault+1530
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: myfault
IMAGE_NAME: myfault.sys
(truncated)
Every crash dump code can have up to 4 numbers that provide more information about the crash. In
this case, we can see the code is DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xd1) and the next four numbers
named Arg1 through Arg4 mean (in order): memory referenced, the IRQL at the time of the call, read
vs. write operation and the accessing address.
The command clearly recognizes myfault.sys as the faulting module (driver). That’s because this is
an easy crash - the culprit is on the call stack as can be seen in the STACK TEXT section above (you
can also simply use the k command to see it again).
The !analyze -v command is extensible and it’s possible to add more analysis to that
command using an extension DLL. You may be able to find such extensions on the web.
Consult the debugger API documentation for more information on how to add your own
analysis code to this command.
More complex crash dumps may show calls from the kernel only on the call stack of the offending
thread. Before you conclude that you found a bug in the Windows kernel, consider this more likely
scenario: a driver did something that was not fatal in itself, such as a buffer overflow - writing data
beyond its allocated buffer - but the memory following that buffer was allocated by some other driver
or the kernel, so nothing bad happened at that time. Some time later, the kernel accessed that memory,
got bad data, and caused a system crash. But the faulting driver is nowhere to be found on any call
stack; this is much harder to diagnose.
One way to help diagnose such issues is using Driver Verifier. We’ll look at the basics of
Driver Verifier in chapter 12.
Once you get the crash dump code, it’s helpful to look in the debugger documentation at
the topic “Bugcheck Code Reference”, where common bugcheck codes are explained more
fully with typical causes and ideas on what to investigate next.
• The prompt indicates the current processor. Switching processors can be done with the
command ~Ns where N is the CPU index (it looks like switching threads in user mode).
• The !running command can be used to list the threads that were running on all processors at
the time of the crash. Adding -t as an option shows the call stack for each thread. Here is an
example with the above crash dump:
2: kd> !running -t
(truncated)
03 fffff988`5683ed00 fffff803`70ae3da9 FLTMGR!FltpDispatch+0xb6
04 fffff988`5683ed60 fffff803`710cfe4d nt!IofCallDriver+0x59
05 fffff988`5683eda0 fffff803`710de470 nt!IopDeleteFile+0x12d
06 fffff988`5683ee20 fffff803`70aea9d4 nt!ObpRemoveObjectRoutine+0x80
07 fffff988`5683ee80 fffff803`723391f5 nt!ObfDereferenceObject+0xa4
08 fffff988`5683eec0 fffff803`72218ca7 Ntfs!NtfsDeleteInternalAttributeStream+0\
x111
09 fffff988`5683ef00 fffff803`722ff7cf Ntfs!NtfsDecrementCleanupCounts+0x147
0a fffff988`5683ef40 fffff803`722fe87d Ntfs!NtfsCommonCleanup+0xadf
0b fffff988`5683f390 fffff803`70ae3da9 Ntfs!NtfsFsdCleanup+0x1ad
0c fffff988`5683f6e0 fffff803`702bb5de nt!IofCallDriver+0x59
0d fffff988`5683f720 fffff803`702b9f16 FLTMGR!FltpLegacyProcessingAfterPreCallb\
acksCompleted+0x15e
0e fffff988`5683f7a0 fffff803`70ae3da9 FLTMGR!FltpDispatch+0xb6
0f fffff988`5683f800 fffff803`710ccc38 nt!IofCallDriver+0x59
10 fffff988`5683f840 fffff803`710d4bf8 nt!IopCloseFile+0x188
11 fffff988`5683f8d0 fffff803`710d9f3e nt!ObCloseHandleTableEntry+0x278
12 fffff988`5683fa10 fffff803`70c89e95 nt!NtClose+0xde
13 fffff988`5683fa80 00007ff8`ba39c247 nt!KiSystemServiceCopyEnd+0x25
14 000000b5`aacf9df8 00000000`00000000 0x00007ff8`ba39c247
The command gives a pretty good idea of what was going on at the time of the crash.
• The !stacks command lists all thread stacks for all threads by default. A more useful variant is
a search string that lists only threads where a module or function containing this string appears.
This allows locating driver’s code throughout the system (because it may not have been running
at the time of the crash, but it’s on some thread’s call stack). Here’s an example for the above
dump:
2: kd> !stacks
Proc.Thread .Thread Ticks ThreadState Blocker
[fffff803710459c0 Idle]
0.000000 fffff80371048400 0000003 RUNNING nt!KiIdleLoop+0x15e
0.000000 ffffb000c17b1140 0000ed9 RUNNING hal!HalProcessorIdle+0xf
0.000000 ffffb000c1955140 0000b6e RUNNING nt!KiIdleLoop+0x15e
0.000000 ffffb000c1c91140 000012b RUNNING nt!KiIdleLoop+0x15e
[ffff8788d6a81300 System]
4.000018 ffff8788d6b8a080 0005483 Blocked nt!PopFxEmergencyWorker+0x3e
4.00001c ffff8788d6bc5140 0000982 Blocked nt!ExpWorkQueueManagerThread+0x\
127
4.000020 ffff8788d6bc9140 000085a Blocked nt!KeRemovePriQueue+0x25c
(truncated)
The address next to each line is the thread’s ETHREAD address that can be fed to the !thread command.
System Hang
A system crash is the most common type of dump that is typically investigated. However, there is
yet another type of dump that you may need to work with: a hung system. A hung system is a non-
responsive or near non-responsive system. Things seem to be halted or deadlocked in some way - the
system does not crash, so the first issue to deal with is how to get a dump of the system.
A dump file is a capture of some system state; it does not have to be related to a crash or any other bad
state. There are tools (including the kernel debugger) that can generate a dump file at any time.
If the system is still responsive to some extent, the Sysinternals NotMyFault tool can force a system
crash and so force a dump file to be generated (this is in fact the way the dump in the previous section
was generated). Figure 6-8 shows a screenshot of NotMyFault. Selecting the first (default) option and
clicking Crash immediately crashes the system and will generate a dump file (if configured to do so).
NotMyFault uses a driver, myfault.sys, that is actually responsible for the crash.
NotMyFault has 32 and 64-bit versions (the latter's file name ends with “64”). Remember to
use the correct one for the system at hand, otherwise its driver will fail to load.
If the system is completely unresponsive, and you can attach a kernel debugger (the target was
configured for debugging), then debug normally or generate a dump file using the .dump command.
If the system is unresponsive and a kernel debugger cannot be attached, it’s possible to generate a crash
manually if configured in the Registry beforehand (this assumes the hang was somehow expected).
When a certain key combination is detected, the keyboard driver will generate a crash. Consult this
link¹ to get the full details. The crash code in this case is 0xe2 (MANUALLY_INITIATED_CRASH).
¹https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/forcing-a-system-crash-from-the-keyboard
Thread Synchronization
Threads sometimes need to coordinate work. A canonical example is a driver using a linked list to
gather data items. The driver can be invoked by multiple clients, coming from many threads in one or
more processes. This means manipulating the linked list must be done atomically, so it’s not corrupted.
If multiple threads access the same memory where at least one is a writer (making changes), this is
referred to as a data race. If a data race occurs, all bets are off and anything can happen. Typically,
within a driver, a system crash occurs sooner or later; data corruption is practically guaranteed.
In such a scenario, it’s essential that while one thread manipulates the linked list, all other threads
back off the linked list, and wait in some way for the first thread to finish its work. Only then can another
thread (just one) manipulate the list. This is an example of thread synchronization.
The kernel provides several primitives that help in accomplishing proper synchronization to protect
data from concurrent access. The following sections discuss the various primitives and techniques for
thread synchronization.
Interlocked Operations
The Interlocked set of functions provide convenient operations that are performed atomically by
utilizing the hardware, which means no software objects are involved. If using these functions gets
the job done, then they should be used, as these are as efficient as they can possibly be.
Technically, the Interlocked family of functions are compiler intrinsics, as they are
instructions to the processor, disguised as functions.
A simple example is incrementing an integer by one. Generally, this is not an atomic operation. If two
(or more) threads try to perform this at the same time on the same memory location, it’s possible (and
likely) some of the increments will be lost. Figure 6-9 shows a simple scenario where incrementing a
value by 1 done from two threads ends up with result of 1 instead of 2.
The example in figure 6-9 is extremely simplistic. With real CPUs there are other effects
to consider, especially caching, which makes the shown scenario even more likely. CPU
caching, store buffers, and other aspects of modern CPUs are non-trivial topics, well
beyond the scope of this book.
Table 6-2 lists some of the Interlocked functions available for drivers use.
Function                                      Description
InterlockedIncrement / InterlockedIncrement16 / InterlockedIncrement64
                                              Atomically increment a 32/16/64-bit integer by one.
InterlockedDecrement / 16 / 64                Atomically decrement a 32/16/64-bit integer by one.
InterlockedAdd / InterlockedAdd64             Atomically add one 32/64-bit integer to a variable.
InterlockedExchange / 8 / 16 / 64             Atomically exchange two 32/8/16/64-bit values.
InterlockedCompareExchange / 64 / 128         Atomically compare a variable with a value. If equal,
                                              exchange with the provided value and return TRUE;
                                              otherwise, place the current value in the variable
                                              and return FALSE.
The functions in table 6-2 are also available in user mode, as these are not really functions,
but rather CPU intrinsics - special instructions to the CPU.
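As a minimal sketch (the counter name is made up for illustration), incrementing a counter shared by multiple threads can be done safely like so:

LONG g_ItemCount = 0;   // shared counter

void OnItemAdded() {
    InterlockedIncrement(&g_ItemCount);     // atomic at the hardware level - no lock object needed
}

void OnItemRemoved() {
    InterlockedDecrement(&g_ItemCount);
}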
Dispatcher Objects
The kernel provides a set of primitives known as Dispatcher Objects, also called Waitable Objects.
These objects have a state, either signaled or non-signaled, where the meaning of signaled and non-
signaled depends on the type of object. They are called “waitable” because a thread can wait on such
objects until they become signaled. While waiting, the thread does not consume CPU cycles as it’s in
a Waiting state.
The primary functions used for waiting are KeWaitForSingleObject and KeWaitForMultipleObjects.
Their prototypes (with simplified SAL annotations for clarity) are shown below:
NTSTATUS KeWaitForSingleObject (
_In_ PVOID Object,
_In_ KWAIT_REASON WaitReason,
_In_ KPROCESSOR_MODE WaitMode,
_In_ BOOLEAN Alertable,
_In_opt_ PLARGE_INTEGER Timeout);
NTSTATUS KeWaitForMultipleObjects (
_In_ ULONG Count,
_In_reads_(Count) PVOID Object[],
_In_ WAIT_TYPE WaitType,
_In_ KWAIT_REASON WaitReason,
_In_ KPROCESSOR_MODE WaitMode,
_In_ BOOLEAN Alertable,
_In_opt_ PLARGE_INTEGER Timeout,
_Out_opt_ PKWAIT_BLOCK WaitBlockArray);
• Object - specifies the object to wait for. Note these functions work with objects, not handles. If
you have a handle (maybe provided by user mode), call ObReferenceObjectByHandle to get
the pointer to the object.
• WaitReason - specifies the wait reason. The list of wait reasons is pretty long, but drivers should
typically set it to Executive, unless it’s waiting because of a user request, and if so specify
UserRequest.
• WaitMode - can be UserMode or KernelMode. Most drivers should specify KernelMode.
• Alertable - indicates if the thread should be in an alertable state during the wait. Alertable state
allows delivery of user-mode Asynchronous Procedure Calls (APCs). User-mode APCs can be
delivered if wait mode is UserMode. Most drivers should specify FALSE.
• Timeout - specifies the time to wait. If NULL is specified, the wait is indefinite - as long as it
takes for the object to become signaled. The units of this argument are 100-nanosecond intervals, where
a negative number means a relative wait, while a positive number is an absolute wait measured from
January 1, 1601 at midnight.
• Count - the number of objects to wait on.
• Object[] - an array of object pointers to wait on.
• WaitType - specifies whether to wait for all objects to become signaled at once (WaitAll) or just
one object (WaitAny).
• WaitBlockArray - an array of structures used internally to manage the wait operation. It’s
optional if the number of objects is <= THREAD_WAIT_OBJECTS (currently 3) - the kernel will
use the built-in array present in each thread. If the number of objects is higher, the driver must
allocate the correct size of structures from non-paged memory, and deallocate them after the
wait is over.
The common return values from the wait functions include:

• STATUS_SUCCESS - the wait is satisfied because the object state has become signaled.
• STATUS_TIMEOUT - the wait is satisfied because the timeout has elapsed.
Note that all of these return values pass the NT_SUCCESS macro (it returns true even for STATUS_TIMEOUT).
There are some fine details associated with the wait functions, especially if wait mode is
UserMode and the wait is alertable. Check the WDK docs for the details.
Table 6-3 lists some of the common dispatcher objects and the meaning of signaled and non-signaled
for these objects.
All the object types from table 6-3 are also exported to user mode. The primary waiting
functions in user mode are WaitForSingleObject and WaitForMultipleObjects.
The following sections discuss some of the common object types useful for synchronization in drivers.
Some other objects that are not dispatcher objects, but support waiting as well, will also be discussed.
Mutex
The mutex is the classic object for the canonical problem of allowing only one thread among many to
access a shared resource at any one time.
Mutex is sometimes referred to as Mutant (its original name). These are the same thing.
A mutex is signaled when it’s free. Once a thread calls a wait function and the wait is satisfied, the
mutex becomes non-signaled and the thread becomes the owner of the mutex. Ownership is critical
for a mutex. It means the following:
• If a thread is the owner of a mutex, it’s the only one that can release the mutex.
• A mutex can be acquired more than once by the same thread. The second attempt succeeds
automatically since the thread is the current owner of the mutex. This also means the thread
needs to release the mutex the same number of times it was acquired; only then the mutex
becomes free (signaled) again.
Using a mutex requires allocating a KMUTEX structure from non-paged memory. The mutex API
contains the following functions working on that KMUTEX:
• KeInitializeMutex (or KeInitializeMutant) is called once to initialize the mutex.
• One of the waiting functions is called to acquire the mutex, passing the address of the allocated KMUTEX structure.
• KeReleaseMutex is called when a thread that is the owner of the mutex wants to release it.
Here are the definitions of the APIs that can initialize a mutex:
VOID KeInitializeMutex (
_Out_ PKMUTEX Mutex,
_In_ ULONG Level);
VOID KeInitializeMutant ( // defined in ntifs.h
_Out_ PKMUTANT Mutant,
_In_ BOOLEAN InitialOwner);
The Level parameter in KeInitializeMutex is not used, so zero is as good a value as any. KeInitializeMutant
allows specifying if the current thread should be the initial owner of the mutex. KeInitializeMutex
initializes the mutex to be unowned.
Releasing the mutex is done with KeReleaseMutex:
LONG KeReleaseMutex (
_Inout_ PKMUTEX Mutex,
_In_ BOOLEAN Wait);
The returned value is the previous state of the mutex object (including recursive ownership count),
and should mostly be ignored (although it may sometimes be useful for debugging purposes). The
Wait parameter indicates whether the next API call is going to be one of the wait functions. This is
used as a hint to the kernel that can optimize slightly if the thread is about to enter a wait state.
Given the above functions, here is an example using a mutex to access some shared data so that only
a single thread does so at a time:
KMUTEX MyMutex;
LIST_ENTRY DataHead;
void Init() {
KeInitializeMutex(&MyMutex, 0);
}
void DoWork() {
    // wait for the mutex to be available
    KeWaitForSingleObject(&MyMutex, Executive, KernelMode, FALSE, nullptr);

    // access the shared data (DataHead)

    KeReleaseMutex(&MyMutex, FALSE);
}
It’s important to release the mutex no matter what, so it’s better to use __try / __finally to make
sure it’s executed however the __try block is exited:
void DoWork() {
    // wait for the mutex to be available
    KeWaitForSingleObject(&MyMutex, Executive, KernelMode, FALSE, nullptr);
    __try {
        // access the shared data (DataHead)
    }
    __finally {
        // once done, release the mutex
        KeReleaseMutex(&MyMutex, FALSE);
    }
}
Figure 6-10 shows two threads attempting to acquire the mutex at roughly the same time, as they
want to access the same data. One thread succeeds in acquiring the mutex, the other has to wait until
the mutex is released by the owner before it can acquire it.
Since using __try/__finally is a bit awkward, we can use C++ to create a RAII wrapper for waits.
This could also be used for other synchronization primitives.
First, we’ll create a mutex wrapper that provides functions named Lock and Unlock:
struct Mutex {
    void Init() {
        KeInitializeMutex(&_mutex, 0);
    }

    void Lock() {
        KeWaitForSingleObject(&_mutex, Executive, KernelMode, FALSE, nullptr);
    }

    void Unlock() {
        KeReleaseMutex(&_mutex, FALSE);
    }

private:
    KMUTEX _mutex;
};
Then we can create a generic RAII wrapper for waiting for any type that has a Lock and Unlock
functions:
template<typename TLock>
struct Locker {
    explicit Locker(TLock& lock) : _lock(lock) {
        lock.Lock();
    }

    ~Locker() {
        _lock.Unlock();
    }

private:
    TLock& _lock;
};
With these definitions in place, we can replace the code using the mutex with the following:
Mutex MyMutex;

void Init() {
    MyMutex.Init();
}

void DoWork() {
    Locker<Mutex> locker(MyMutex);
    // access the shared data while the mutex is held
}
Since locking should be done for the shortest time possible, you can use an artificial C/C++
scope containing Locker and the code to execute while the mutex is owned, to acquire
the mutex as late as possible and release it as soon as possible.
With C++ 17 and later, Locker can be used without specifying the type like so:
Locker locker(MyMutex);
Since Visual Studio currently uses C++ 14 as its default language standard, you’ll have to
change that in the project properties under the General node / C++ Language Standard.
We’ll use the same Locker type with other synchronization primitives in subsequent sections.
Abandoned Mutex
A thread that acquires a mutex becomes the mutex owner. The owner thread is the only one that
can release the mutex. What happens to the mutex if the owner thread dies for whatever reason?
The mutex then becomes an abandoned mutex. The kernel explicitly releases the mutex (as no thread
can do it) to prevent a deadlock, so another thread would be able to acquire that mutex normally.
However, the returned value from the next successful wait call is STATUS_ABANDONED rather than
STATUS_SUCCESS. A driver should log such an occurrence, as it frequently indicates a bug.
Mutexes support a few miscellaneous functions that may be useful at times, mostly for debugging
purposes. KeReadStateMutex returns the current state (recursive count) of the mutex, where 0 means
“unowned”:
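Its declaration is roughly as follows:

LONG KeReadStateMutex (_In_ PRKMUTEX Mutex);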
Just remember that after the call returns, the result may no longer be correct as the mutex state may
have changed because some other thread has acquired or released the mutex before the code gets to
examine the result. The benefit of this function is in debugging scenarios only.
You can get the current mutex owner with a call to KeQueryOwnerMutant (defined in <ntifs.h>) as a
CLIENT_ID data structure, containing the thread and process IDs:
VOID KeQueryOwnerMutant (
_In_ PKMUTANT Mutant,
_Out_ PCLIENT_ID ClientId);
Just like with KeReadStateMutex, the returned information may be stale if other threads are doing
work with that mutex.
Fast Mutex
A fast mutex is an alternative to the classic mutex, providing better performance. It’s not a dispatcher
object, and so has its own API for acquiring and releasing it. A fast mutex has the following
characteristics compared with a regular mutex:

• A fast mutex cannot be acquired recursively. Doing so causes a deadlock.
• When a fast mutex is acquired, the CPU's IRQL is raised to APC_LEVEL (1), which blocks delivery of
all APCs to the acquiring thread.
• A fast mutex can only be waited on indefinitely - there is no way to specify a timeout.
Because of the first two bullets above, the fast mutex is slightly faster than a regular mutex. In fact,
most drivers requiring a mutex use a fast mutex unless there is a compelling reason to use a regular
mutex.
Don’t use I/O operations while holding on to a fast mutex. I/O completions are delivered
with a special kernel APC, but those are blocked while holding a fast mutex, creating a
deadlock.
A fast mutex is initialized by allocating a FAST_MUTEX structure from non-paged memory and calling
ExInitializeFastMutex. Acquiring the mutex is done with ExAcquireFastMutex or ExAcquireFastMutexUnsafe
(if the current IRQL happens to be APC_LEVEL already). Releasing a fast mutex is accomplished with
ExReleaseFastMutex or ExReleaseFastMutexUnsafe.
Semaphore
The primary goal of a semaphore is to limit something, such as the length of a queue. The semaphore
is initialized with its maximum and initial count (typically set to the maximum value) by calling
KeInitializeSemaphore. While its internal count is greater than zero, the semaphore is signaled. A
thread that calls KeWaitForSingleObject has its wait satisfied, and the semaphore count drops by
one. This continues until the count reaches zero, at which point the semaphore becomes non-signaled.
Semaphores use the KSEMAPHORE structure to hold their state, which must be allocated from non-paged
memory. Here is the definition of KeInitializeSemaphore:
VOID KeInitializeSemaphore (
_Out_ PRKSEMAPHORE Semaphore,
_In_ LONG Count, // starting count
_In_ LONG Limit); // maximum count
As an example, imagine a queue of work items managed by the driver. Some threads want to add items
to the queue. Each such thread calls KeWaitForSingleObject to obtain one “count” of the semaphore.
As long as the count is greater than zero, the thread continues and adds an item to the queue, increasing
its length, and the semaphore "loses" a count. Some other threads are tasked with processing work items
from the queue. Once a thread removes an item from the queue, it calls KeReleaseSemaphore, which
increments the count of the semaphore, moving it to the signaled state again, allowing potentially
another thread to make progress and add a new item to the queue.
KeReleaseSemaphore is defined like so:
LONG KeReleaseSemaphore (
_Inout_ PRKSEMAPHORE Semaphore,
_In_ KPRIORITY Increment,
_In_ LONG Adjustment,
_In_ BOOLEAN Wait);
The Increment parameter indicates the priority boost to apply to a thread whose wait on the
semaphore is satisfied. The details of how this boost works are described in the next chapter. Most
drivers should provide the value 1 (that’s the default used by the kernel when a semaphore is released
by the user mode ReleaseSemaphore API). Adjustment is the value to add to the semaphore’s current
count. It’s typically one, but can be a higher value if that makes sense. The last parameter (Wait)
indicates whether a wait operation (KeWaitForSingleObject or KeWaitForMultipleObjects)
immediately follows (see the information bar in the mutex discussion above). The function returns
the old count of the semaphore.
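Here is a minimal sketch of the scenario described above. The names and queue handling are assumptions for illustration only; note that the linked list itself still needs its own synchronization (for example, a spin lock or mutex):

KSEMAPHORE QueueSem;
LIST_ENTRY QueueHead;

void InitQueue() {
    InitializeListHead(&QueueHead);
    KeInitializeSemaphore(&QueueSem, 10, 10);   // initial and maximum count of 10 (free slots)
}

void AddWorkItem(PLIST_ENTRY item) {
    // wait until there is room in the queue (count > 0)
    KeWaitForSingleObject(&QueueSem, Executive, KernelMode, FALSE, nullptr);
    InsertTailList(&QueueHead, item);
}

void ProcessWorkItem() {
    PLIST_ENTRY item = RemoveHeadList(&QueueHead);
    // ... process the item ...
    // a slot freed up - return one count to the semaphore
    KeReleaseSemaphore(&QueueSem, 1, 1, FALSE);
}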
Is a semaphore with a maximum count of one equivalent to a mutex? At first, it seems so,
but this is not the case. A semaphore lacks ownership, meaning one thread can acquire the
semaphore, while another can release it. This is a strength, not a weakness, as described
in the above example. A Semaphore’s purpose is very different from that of a mutex.
You can read the current count of the semaphore by calling KeReadStateSemaphore:
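Its declaration looks roughly like this:

LONG KeReadStateSemaphore (_In_ PRKSEMAPHORE Semaphore);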
Event
An event encapsulates a boolean flag - either true (signaled) or false (non-signaled). The primary
purpose of an event is to signal something has happened, to provide flow synchronization. For example,
if some condition becomes true, an event can be set, and a bunch of threads can be released from
waiting and continue working on some data that perhaps is now ready for processing.
There are two types of events, the type being specified at event initialization time:
• Notification event (manual reset) - when this event is set, it releases any number of waiting
threads, and the event state remains set (signaled) until explicitly reset.
• Synchronization event (auto reset) - when this event is set, it releases at most one thread (no
matter how many are waiting for the event), and once released the event goes back to the reset
(non-signaled) state automatically.
An event is created by allocating a KEVENT structure from non-paged memory and then calling
KeInitializeEvent to initialize it, specifying the event type (NotificationEvent or SynchronizationEvent)
and the initial event state (signaled or non-signaled):
VOID KeInitializeEvent (
_Out_ PRKEVENT Event,
_In_ EVENT_TYPE Type, // NotificationEvent or SynchronizationEvent
_In_ BOOLEAN State); // initial state (signaled=TRUE)
Waiting for an event is done normally with the KeWaitXxx functions. Calling KeSetEvent sets the
event to the signaled state, while calling KeResetEvent or KeClearEvent resets it (non-signaled state)
(the latter function being a bit quicker as it does not return the previous state of the event):
LONG KeSetEvent (
_Inout_ PRKEVENT Event,
_In_ KPRIORITY Increment,
_In_ BOOLEAN Wait);
VOID KeClearEvent (_Inout_ PRKEVENT Event);
LONG KeResetEvent (_Inout_ PRKEVENT Event);
Just like with a semaphore, setting an event allows providing a priority boost to the next successful
wait on the event.
Finally, the current state of an event (signaled or non-signaled) can be read with KeReadStateEvent:
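Its declaration is roughly as follows:

LONG KeReadStateEvent (_In_ PRKEVENT Event);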
Named Events
Event objects can be named (as can mutexes and semaphores). This can be used as an easy way
of sharing an event object with other drivers or with user-mode clients. One way of creating or
opening a named event by name is with the helper functions IoCreateSynchronizationEvent and
IoCreateNotificationEvent APIs:
PKEVENT IoCreateSynchronizationEvent(
_In_ PUNICODE_STRING EventName,
_Out_ PHANDLE EventHandle);
PKEVENT IoCreateNotificationEvent(
_In_ PUNICODE_STRING EventName,
_Out_ PHANDLE EventHandle);
These APIs create the named event object if it does not exist and set its state to signaled, or
obtain another handle to the named event if it does exist. The name itself is provided as a normal
UNICODE_STRING and must be a full path in the Object Manager’s namespace, as can be observed in
the Sysinternals WinObj tool.
These APIs return two values: the pointer to the event object (direct returned value) and an open
handle in the EventHandle parameter. The returned handle is a kernel handle, to be used by the
driver only. The functions return NULL on failure.
You can use the previously described events API to manipulate the returned event by address.
Don’t forget to close the returned handle (ZwClose) to prevent a leak. Alternatively, you can call
ObReferenceObject on the returned pointer to make sure it’s not prematurely destroyed and close
the handle immediately. In that case, call ObDereferenceObject when you’re done with the event.
One use of the IoCreateNotificationEvent API is to gain access to a bunch of named event objects
the kernel provides in the \KernelObjects directory. These events provide various notifications for
memory related status, that may be useful for kernel drivers.
Figure 6-11 shows the named events in WinObj. Note that the lower symbolic links are actually
events, as these are internally implemented as Dynamic Symbolic Links (see more details at https:
//scorpiosoftware.net/2021/04/30/dynamic-symbolic-links/).
All the events shown in figure 6-11 are Notification events. Table 6-5 lists these events with their
meaning.
Name                          Description
HighMemoryCondition           The system has lots of free physical memory
LowMemoryCondition            The system is low on physical memory
HighPagedPoolCondition        The system has lots of free paged pool memory
LowPagedPoolCondition         The system is low on paged pool memory
HighNonPagedPoolCondition     The system has lots of free non-paged pool memory
LowNonPagedPoolCondition      The system is low on non-paged pool memory
HighCommitCondition           The system has lots of free memory in RAM and paging file(s)
LowCommitCondition            The system is low on RAM and paging file(s)
MaximumCommitCondition        The system is almost out of memory, and no further increase in page file
                              size is possible
Drivers can use these events as hints to either allocate more memory or free memory as required. The
following example shows how to obtain one of these events and wait for it on some thread (error
handling omitted):

UNICODE_STRING name;
RtlInitUnicodeString(&name, L"\\KernelObjects\\LowCommitCondition");
HANDLE hEvent;
auto event = IoCreateNotificationEvent(&name, &hEvent);
KeWaitForSingleObject(event, Executive, KernelMode, FALSE, nullptr);
// the low-commit condition is signaled - react accordingly, then close the handle
ZwClose(hEvent);
Write a driver that waits on all these named events and uses DbgPrint to indicate a
signaled event with its description.
Executive Resource
The classic synchronization problem of accessing a shared resource by multiple threads was dealt
with by using a mutex or fast mutex. This works, but mutexes are pessimistic, meaning they allow a
single thread to access a shared resource. That may be unfortunate in cases where multiple threads
access a shared resource by reading only.
In cases where it’s possible to distinguish data changes (writes) vs. just looking at the data (reading)
- there is a possible optimization. A thread that requires access to the shared resource can declare
its intentions - read or write. If it declares read, other threads declaring read can do so concurrently,
improving performance. This is especially useful if the shared data changes infrequently, i.e. there are
considerably more reads than writes.
Mutexes by their very nature are pessimistic locks, since they enforce a single thread at a time
execution. This makes them always work at the expense of possible performance gains with
concurrency.
The kernel provides yet another synchronization primitive that is geared towards this scenario, known
as single writer, multiple readers. This object is the Executive Resource, another special object which
is not a dispatcher object.
Initializing an executive resource is done by allocating an ERESOURCE structure from non-paged
pool and calling ExInitializeResourceLite. Once initialized, threads can acquire either the
exclusive lock (for writes) using ExAcquireResourceExclusiveLite or the shared lock by calling
ExAcquireResourceSharedLite. Once the work is done, a thread releases the executive resource with
ExReleaseResourceLite (regardless of whether it was acquired exclusive or shared).
The requirement for using the acquire and release functions is that normal kernel APCs must be
disabled. This can be done with KeEnterCriticalRegion just before the acquire call, and then
KeLeaveCriticalRegion just after the release call. The following code snippet demonstrates that:
ERESOURCE resource;

void WriteData() {
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&resource, TRUE); // wait until acquired
    // write to the shared data
    ExReleaseResourceLite(&resource);
    KeLeaveCriticalRegion();
}
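For completeness, here is a sketch of the shared (reader) side, using the same resource variable:

void ReadData() {
    KeEnterCriticalRegion();
    ExAcquireResourceSharedLite(&resource, TRUE);   // multiple readers can hold this concurrently
    // read the shared data
    ExReleaseResourceLite(&resource);
    KeLeaveCriticalRegion();
}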
Since these calls are so common when working with executive resources, there are functions that
perform both operations with a single call:
void WriteData() {
    ExEnterCriticalRegionAndAcquireResourceExclusive(&resource);
    // write to the shared data
    ExReleaseResourceAndLeaveCriticalRegion(&resource);
}
When the executive resource is no longer needed, it must be deleted with ExDeleteResourceLite:

NTSTATUS ExDeleteResourceLite(
    _Inout_ PERESOURCE Resource);
You can query the number of waiting threads for exclusive and shared access of a resource with the
functions ExGetExclusiveWaiterCount and ExGetSharedWaiterCount, respectively.
There are other functions for working with executive resources for some specialized cases. Consult
the WDK documentation for more information.
The simple case is where the system has a single CPU. In this case, when accessing the shared resource,
the low IRQL function just needs to raise IRQL to DISPATCH_LEVEL and then access the resource.
During that time a DPC cannot interfere with this code since the CPU’s IRQL is already 2. Once the
code is done with the shared resource, it can lower the IRQL back to zero, allowing the DPC to execute.
This prevents execution of these routines at the same time. Figure 6-12 shows this setup.
In standard systems, where there is more than one CPU, this synchronization method is not enough,
because the IRQL is a per-CPU property, not a system-wide one. If one CPU's IRQL is raised to 2
and a DPC needs to execute, it can execute on another CPU whose IRQL may be zero. In this case, it's
possible that both functions execute at the same time, accessing the shared data, causing a data race.
How can we solve that? We need something like a mutex, but that can synchronize between processors
- not threads. That’s because when the CPU’s IRQL is 2 or higher, the thread itself loses meaning
because the scheduler cannot do work on that CPU. This kind of object exists - the Spin Lock.
Acquiring a spin lock is always a two-step process: first, raise the IRQL to the proper level, which is
the highest level of any function trying to synchronize access to a shared resource. In the previous
example, this associated IRQL is 2. Second, acquire the spin lock. These two steps are combined by
using the appropriate API. This process is depicted in figure 6-14.
Acquiring and releasing a spin lock is done using an API that performs the two steps outlined in figure
6-12. Table 6-4 shows the relevant APIs and the associated IRQL for the spin locks they operate on.
These include KeAcquireSpinLock / KeReleaseSpinLock for standard spin locks operating at IRQL
DISPATCH_LEVEL (2), KeAcquireInterruptSpinLock / KeReleaseInterruptSpinLock for spin locks associated
with an interrupt object (Device IRQL), and a set of three functions for manipulating LIST_ENTRY-based
linked lists (ExInterlockedInsertHeadList, ExInterlockedInsertTailList and ExInterlockedRemoveHeadList).
The list functions use the provided spin lock and raise IRQL to HIGH_LEVEL; because of the high IRQL,
these routines can be used at any IRQL, since raising IRQL is always a safe operation.
If you acquire a spin lock, be sure to release it in the same function. Otherwise, you’re
risking a deadlock or a system crash.
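Here is a minimal sketch of protecting driver data with a DISPATCH_LEVEL spin lock (the variable and function names are made up for illustration):

KSPIN_LOCK MyLock;          // must reside in non-paged memory
LIST_ENTRY ItemsHead;

void InitData() {
    KeInitializeSpinLock(&MyLock);
    InitializeListHead(&ItemsHead);
}

void AddItem(PLIST_ENTRY item) {
    KIRQL oldIrql;
    KeAcquireSpinLock(&MyLock, &oldIrql);   // raises IRQL to DISPATCH_LEVEL
    InsertTailList(&ItemsHead, item);
    KeReleaseSpinLock(&MyLock, oldIrql);    // restores the previous IRQL
}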
Where do spin locks come from? The scenario described here requires the driver to allocate
its own spin lock to protect concurrent access to its own data from high-IRQL functions.
Some spin locks exist as part of other objects, such as the KINTERRUPT object used by
hardware-based drivers that handle interrupts. Another example is a system-wide spin
lock known as the Cancel spin lock, which is acquired by the kernel before calling a
cancellation routine registered by a driver. This is the only case where a driver releases a
spin lock it has not acquired explicitly.
If several CPUs try to acquire the same spin lock at the same time, which CPU gets
the spin lock first? Normally, there is no order - the CPU with fastest electrons wins
:). The kernel does provide an alternative, called Queued spin locks that serve CPUs
on a FIFO basis. These only work with IRQL DISPATCH_LEVEL. The relevant APIs are
KeAcquireInStackQueuedSpinLock and KeReleaseInStackQueuedSpinLock. Check
the WDK documentation for more details.
Write a C++ wrapper for a DISPATCH_LEVEL spin lock that works with the Locker RAII
class defined earlier in this chapter.
Queued spin locks have the following characteristics compared to normal spin locks:

• Queued spin locks always raise to IRQL DISPATCH_LEVEL (2). This means they cannot be used
for synchronizing with an ISR, for example.
• There is a queue of CPUs waiting to acquire the spin lock, on a FIFO basis. This is more efficient
when high contention is expected. Normal spin locks provide no guarantee as to the order of
acquisition when multiple CPUs attempt to acquire a spin lock.
A queued spin lock is initialized just like a normal spin lock (KeInitializeSpinLock). Acquiring
and releasing a queued spin lock is achieved with different APIs:
void KeAcquireInStackQueuedSpinLock (
_Inout_ PKSPIN_LOCK SpinLock,
_Out_ PKLOCK_QUEUE_HANDLE LockHandle);
void KeReleaseInStackQueuedSpinLock (
_In_ PKLOCK_QUEUE_HANDLE LockHandle);
In addition to the spin lock, the caller provides an opaque KLOCK_QUEUE_HANDLE structure that is filled in by
KeAcquireInStackQueuedSpinLock. The same structure must be passed to KeReleaseInStackQueuedSpinLock.
Just like with normal dispatch-level spin locks, shortcuts exist if the caller is already at IRQL
DISPATCH_LEVEL. KeAcquireInStackQueuedSpinLockAtDpcLevel acquires the spin lock with no
IRQL changes, while KeReleaseInStackQueuedSpinLockFromDpcLevel releases it.
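A minimal usage sketch (the lock and function names are assumptions for illustration):

KSPIN_LOCK QLock;   // initialized with KeInitializeSpinLock

void DoProtectedWork() {
    KLOCK_QUEUE_HANDLE handle;
    KeAcquireInStackQueuedSpinLock(&QLock, &handle);
    // access the shared data at IRQL DISPATCH_LEVEL
    KeReleaseInStackQueuedSpinLock(&handle);
}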
Work Items
Sometimes there is a need to run a piece of code on a different thread than the executing one. One
way to do that is to create a thread explicitly and task it with running the code. The kernel provides
functions that allow a driver to create a separate thread of execution: PsCreateSystemThread and
IoCreateSystemThread (available in Windows 8+). These functions are appropriate if the driver
needs to run code in the background for a long time. However, for time-bound operations, it’s better
to use a kernel-provided thread pool that will execute your code on some system worker thread.
Work item is the term used to describe a function queued to the system thread pool. A driver can
allocate and initialize a work item, pointing to the function the driver wishes to execute, and then the
work item can be queued to the pool. This may seem very similar to a DPC, the primary difference
being work items always execute at IRQL PASSIVE_LEVEL (0). Thus, work items can be used by IRQL
2 code (such as DPCs) to perform operations not normally allowed at IRQL 2 (such as I/O operations).
Creating and initializing a work item can be done in one of two ways:
• Allocate and initialize the work item with IoAllocateWorkItem. The function returns a
pointer to the opaque IO_WORKITEM. When finished with the work item it must be freed with
IoFreeWorkItem.
• Allocate an IO_WORKITEM structure dynamically with size provided by IoSizeofWorkItem.
Then call IoInitializeWorkItem. When finished with the work item, call IoUninitializeWorkItem.
These functions accept a device object, which ensures the driver is not unloaded while there is a work
item queued or executing.
There is another set of APIs for work items, all starting with Ex, such as ExQueueWorkItem.
These functions do not associate the work item with anything in the driver, so it’s possible
for the driver to be unloaded while a work item is still executing. These APIs are marked
as deprecated - always prefer using the Io functions.
Queuing a work item is done with IoQueueWorkItem:
void IoQueueWorkItem(
_Inout_ PIO_WORKITEM IoWorkItem, // the work item
_In_ PIO_WORKITEM_ROUTINE WorkerRoutine, // the function to be called
_In_ WORK_QUEUE_TYPE QueueType, // queue type
_In_opt_ PVOID Context); // driver-defined value
The callback function the driver needs to provide has the following prototype:
IO_WORKITEM_ROUTINE WorkItem;
void WorkItem(
_In_ PDEVICE_OBJECT DeviceObject,
_In_opt_ PVOID Context);
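Here is a minimal usage sketch (the names DoWork and QueueSomeWork are illustrative, not from the WDK). One
common pattern is to pass the work item pointer as the context, so the callback can free it when it’s done:

void DoWork(PDEVICE_OBJECT DeviceObject, PVOID Context) {
    UNREFERENCED_PARAMETER(DeviceObject);
    // ... perform the work at IRQL PASSIVE_LEVEL ...
    IoFreeWorkItem((PIO_WORKITEM)Context);   // the work item is no longer in use at this point
}

void QueueSomeWork(PDEVICE_OBJECT deviceObject) {
    PIO_WORKITEM workItem = IoAllocateWorkItem(deviceObject);
    if (workItem)
        IoQueueWorkItem(workItem, DoWork, DelayedWorkQueue, workItem);
}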
The system thread pool has several queues (at least logically), based on the thread priorities that serve
these work items. There are several levels defined:
The documentation indicates DelayedWorkQueue must be used, but in reality, any other supported
level can be used.
There is another function that can be used to queue a work item: IoQueueWorkItemEx.
This function uses a different callback that has an added parameter which is the work
item itself. This is useful if the work item function needs to free the work item before it
exits.
Summary
In this chapter, we looked at various kernel mechanisms driver developers should be aware of and
use. In the next chapter, we’ll take a closer look at I/O Request Packets (IRPs).
Chapter 7: The I/O Request Packet
After a typical driver completes its initialization in DriverEntry, its primary job is to handle requests.
These requests are packaged as the semi-documented I/O Request Packet (IRP) structure. In this
chapter, we’ll take a deeper look at IRPs and how a driver handles common IRP types.
In This chapter:
• Introduction to IRPs
• Device Nodes
• IRP and I/O Stack Location
• Dispatch Routines
• Accessing User Buffers
• Putting it All Together: The Zero Driver
Introduction to IRPs
An IRP is a structure that is allocated from non-paged pool typically by one of the “managers” in
the Executive (I/O Manager, Plug & Play Manager, Power Manager), but can also be allocated by the
driver, perhaps for passing a request to another driver. Whichever entity allocates the IRP is also
responsible for freeing it.
An IRP is never allocated alone. It’s always accompanied by one or more I/O Stack Location structures
(IO_STACK_LOCATION). In fact, when an IRP is allocated, the caller must specify how many I/O stack
locations need to be allocated with the IRP. These I/O stack locations follow the IRP directly in
memory. The number of I/O stack locations is the number of device objects in the device stack. We’ll
discuss device stacks in the next section. When a driver receives an IRP, it gets a pointer to the IRP
structure itself, knowing it’s followed by a set of I/O stack locations, one of which is for the driver’s
use. To get the correct I/O stack location, a driver calls IoGetCurrentIrpStackLocation (actually a
macro). Figure 7-1 shows a conceptual view of an IRP and its associated I/O stack locations.
The parameters of the request are somehow “split” between the main IRP structure and the current
IO_STACK_LOCATION.
Device Nodes
The I/O system in Windows is device-centric, rather than driver-centric. This has several implications:
• Device objects can be named, and handles to device objects can be opened. The CreateFile
function accepts a symbolic link that leads to a device object. CreateFile cannot accept a
driver’s name as an argument.
• Windows supports device layering - one device can be layered on top of another. Any request
destined for a lower device will reach the uppermost device first. This layering is common for
hardware-based devices, but it works with any device type.
Figure 7-2 shows an example of several layers of devices, “stacked” one on top of the other. This
set of devices is known as a device stack, sometimes referred to as device node (although the term
device node is often used with hardware device stacks). Figure 7-2 shows six layers, or six devices.
Each of these devices is represented by a DEVICE_OBJECT structure created by calling the standard
IoCreateDevice function.
The different device objects that comprise the device node (devnode) layers are labeled according to
their role in the devnode. These roles are relevant in a hardware-based devnode.
All the device objects in figure 7-2 are just DEVICE_OBJECT structures, each created by a different
driver that is in charge of that layer. More generically, this kind of device node does not have to be
related to hardware-based device drivers.
Here is a quick rundown of the meaning of the labels present in figure 7-2:
• PDO (Physical Device Object) - Despite the name, there is nothing “physical” about it. This
device object is created by a bus driver - the driver that is in charge of the particular bus (e.g.
PCI, USB, etc.). This device object represents the fact that there is some device in that slot on
that bus.
• FDO (Functional Device Object) - This device object is created by the “real” driver; that is, the
driver typically provided by the hardware’s vendor that understands the details of the device
intimately.
• FiDO (Filter Device Object) - These are optional filter devices created by filter drivers.
The Plug & Play (P&P) manager, in this case, is responsible for loading the appropriate drivers, starting
from the bottom. As an example, suppose the devnode in figure 7-2 represents a set of drivers that
manage a PCI network card. The sequence of events leading to the creation of this devnode can be
summarized as follows:
1. The PCI bus driver (pci.sys) recognizes the fact that there is something in that particular slot.
It creates a PDO (IoCreateDevice) to represent this fact. The bus driver has no idea whether
this is a network card, video card, or something else; it only knows there is something there and
can extract basic information from its controller, such as the Vendor ID and Device ID of the
device.
2. The PCI bus driver notifies the P&P manager that it has changes on its bus (by calling IoInvalidateDeviceRelations
with the BusRelations enumeration value).
3. The P&P manager requests a list of PDOs managed by the bus driver. It receives back a list of
PDOs, in which this new PDO is included.
4. Now the P&P manager’s job is to find and load the proper driver that should manage this new
PDO. It issues a query to the bus driver to request the full hardware device ID.
5. With this hardware ID in hand, the P&P manager looks in the Registry at HKLM\System\
CurrentControlSet\Enum\PCI\(HardwareID). If the driver has been loaded before, it will be
registered there, and the P&P manager will load it. Figure 7-3 shows an example hardware
ID in the registry (NVIDIA display driver).
6. The driver loads and creates the FDO (another call to IoCreateDevice), but adds an additional
call to IoAttachDeviceToDeviceStack, thus attaching itself over the previous layer (typically
the PDO).
We’ll see how to write filter drivers that take advantage of IoAttachDeviceToDeviceStack in
chapter 13.
The value Service in figure 7-3 indirectly points to the actual driver at
HKLM\System\CurrentControlSet\Services\{ServiceName} where all drivers must be
registered.
The filter device objects are loaded as well, if they are registered correctly in the Registry. Lower filters
(below the FDO) load in order, from the bottom. Each filter driver loaded creates its own device object
and attaches it on top of the previous layer. Upper filters work the same way but are loaded after the
FDO. All this means that with operational P&P devnodes, there are at least two layers - PDO and FDO,
but there could be more if filters are involved. We’ll look at basic filter development for hardware-
based drivers in chapter 13.
Full discussion of Plug & Play and the exact way this kind of devnode is built is beyond the scope of
this book. The previous description is incomplete and glosses over some details, but it should give
you the basic idea. Every devnode is built from the bottom up, regardless of whether it is related to
hardware or not.
Lower filters are searched in two locations: the hardware ID key shown in figure 7-3 and in the corre-
sponding class based on the ClassGuid value listed under HKLM\System\CurrentControlSet\Control\Class.
The value name itself is LowerFilters and is a multi-string value holding service names, pointing
to the same Services key. Upper filters are searched in a similar manner, but the value name is
UpperFilters. Figure 7-4 shows the registry settings for the DiskDrive class, which has a lower filter
and an upper filter.
IRP Flow
Figure 7-2 shows an example devnode, whether related to hardware or not. An IRP is created by one
of the managers in the Executive - for most of our drivers that is the I/O Manager.
The manager creates an IRP with its associated IO_STACK_LOCATIONs - six in the example in figure
7-2. The manager initializes the main IRP structure and the first I/O stack location only. Then it passes
the IRP’s pointer to the uppermost layer.
A driver receives the IRP in its appropriate dispatch routine. For example, if this is a Read IRP, then
the driver will be called through the IRP_MJ_READ index of the MajorFunction array in its driver object.
At this point, a driver has several options when dealing with the IRP:
• Pass the request down - if the driver’s device is not the last device in the devnode, the driver
can pass the request along if it’s not interesting for the driver. This is typically done by a filter
driver that receives a request that it’s not interested in, and in order not to hurt the functionality
of the device (since the request is actually destined for a lower-layer device), the driver can pass
it down. This must be done with two calls:
– Call IoSkipCurrentIrpStackLocation to make sure the next device in line is going to
see the same information given to this device - it should see the same I/O stack location.
– Call IoCallDriver passing the lower device object (which the driver received at the time
it called IoAttachDeviceToDeviceStack) and the IRP.
Before passing the request down, the driver must prepare the next I/O stack location with proper in-
formation. Since the I/O manager only initializes the first I/O stack location, it’s the responsibility of
each driver to initialize the next one. One way to do that is to call IoCopyIrpStackLocationToNext
before calling IoCallDriver. This works, but is a bit wasteful if the driver just wants the lower
layer to see the same information. Calling IoSkipCurrentIrpStackLocation is an optimization
which decrements the current I/O stack location pointer inside the IRP, which is later incremented
by IoCallDriver, so the next layer sees the same IO_STACK_LOCATION this driver has seen. This
decrement/increment dance is more efficient than making an actual copy.
• Handle the IRP fully - the driver receiving the IRP can just handle the IRP without propagating it
down by eventually calling IoCompleteRequest. Any lower devices will never see the request.
• Do a combination of the above options - the driver can examine the IRP, do something (such
as log the request), and then pass it down. Or it can make some changes to the next I/O stack
location, and then pass the request down.
• Pass the request down (with or without changes) and be notified when the request completes
by a lower layer device - Any layer (except the lowest one) can set up an I/O completion routine
by calling IoSetCompletionRoutine before passing the request down. When one of the lower
layers completes the request, the driver’s completion routine will be called.
• Start some asynchronous IRP handling - the driver may want to handle the request, but if
the request is lengthy (typical of a hardware driver, but also could be the case for a software
driver), the driver may mark the IRP as pending by calling IoMarkIrpPending and return a
STATUS_PENDING from its dispatch routine. Eventually, it will have to complete the IRP.
Once some layer calls IoCompleteRequest, the IRP turns around and starts “bubbling up” towards
the originator of the IRP (typically one of the I/O System Managers). If completion routines have been
registered, they will be invoked in reverse order of registration.
In most drivers in this book, layering will not be considered, since the driver is most likely the single
device in its devnode. The driver will handle the request then and there or handle it asynchronously;
it will not pass it down, as there is no device underneath.
We’ll discuss other aspects of IRP handling in filter drivers, including completion routines, in chapter
13.
• IoStatus - contains the Status (NTSTATUS) of the IRP and an Information field. The
Information field is a polymorphic one, typed as ULONG_PTR (32 or 64-bit integer), but its
meaning depends on the type of IRP. For example, for Read and Write IRPs, its meaning is the
number of bytes transferred in the operation.
• UserBuffer - contains the raw buffer pointer to the user’s buffer for relevant IRPs. Read and
Write IRPs, for instance, store the user’s buffer pointer in this field. In DeviceIoControl IRPs,
this points to the output buffer provided in the request.
• UserEvent - this is a pointer to an event object (KEVENT) that was provided by a client if the call
is asynchronous and such an event was supplied. From user mode, this event can be provided
(with a HANDLE) in the OVERLAPPED structure that is mandatory for invoking I/O operations
asynchronously.
• AssociatedIrp - this union holds three members, only one (at most) of which is valid:
* SystemBuffer - the most often used member. This points to a system-allocated non-paged pool
buffer used for Buffered I/O operations. See the section “Buffered I/O” later in this chapter for the
details.
* MasterIrp - A pointer to a “master” IRP, if this IRP is an associated IRP. This idea is supported
by the I/O manager, where one IRP is a “master” that may have several “associated” IRPs. Once all
the associated IRPs complete, the master IRP is completed automatically. MasterIrp is valid for an
associated IRP - it points to the master IRP.
* IrpCount - for the master IRP itself, this field indicates the number of associated IRPs associated
with this master IRP.
Usage of master and associated IRPs is pretty rare. We will not be using this mechanism in this
book.
• CancelRoutine - a pointer to a cancel routine that is invoked (if not NULL) if the driver is asked
to cancel the IRP, such as with the user-mode functions CancelIo and CancelIoEx. Software
drivers rarely need cancellation routines, so we will not be using those in most examples.
• MdlAddress - points to an optional Memory Descriptor List (MDL). An MDL is a kernel data
structure that knows how to describe a buffer in RAM. MdlAddress is used primarily with
Direct I/O (see the section “Direct I/O” later in this chapter).
Every IRP is accompanied by one or more IO_STACK_LOCATIONs. Figure 7-6 shows the important
fields in an IO_STACK_LOCATION.
• MajorFunction - this is the major function of the IRP (IRP_MJ_CREATE, IRP_MJ_READ, etc.).
This field is sometimes useful if the driver points more than one major function code to the
same handling routine. In that routine, the driver may want to distinguish between the major
function codes using this field.
• MinorFunction - some IRP types have minor functions. These are IRP_MJ_PNP, IRP_MJ_POWER
and IRP_MJ_SYSTEM_CONTROL (WMI). Typical code for these handlers has a switch statement
based on the MinorFunction. We will not be using these types of IRPs in this book, except
in the case of filter drivers for hardware-based devices, which we’ll examine in some detail in
chapter 13.
• FileObject - the FILE_OBJECT associated with this IRP. Not needed in most cases, but is
available for dispatch routines that need it.
• DeviceObject - the device object associated with this IRP. Dispatch routines receive a pointer
to this, so typically accessing this field is not required.
• CompletionRoutine - the completion routine that is set for the previous (upper) layer (set with
IoSetCompletionRoutine), if any.
• Context - the argument to pass to the completion routine (if any).
• Parameters - this monster union contains multiple structures, each valid for a particular
operation. For example, in a Read (IRP_MJ_READ) operation, the Parameters.Read structure
field should be used to get more information about the Read operation.
The current I/O stack location obtained with IoGetCurrentIrpStackLocation hosts most of the
parameters of the request in the Parameters union. It’s up to the driver to access the correct structure,
as we’ve already seen in chapter 4 and will see again in this and subsequent chapters.
lkd> !irpfind
Unable to get offset of nt!_MI_VISIBLE_STATE.SpecialPool
Unable to get value of nt!_MI_VISIBLE_STATE.SessionSpecialPool
Scanning large pool allocation table for tag 0x3f707249 (Irp?) (ffffbf0a87610000 : ffffbf0a87910000)
Faced with a specific IRP, the command !irp examines the IRP, providing a nice overview of its data.
As always, the dt command can be used with the nt!_IRP type to look at the entire IRP structure.
Here’s an example of one IRP viewed with !irp:
(truncated)
\FileSystem\Ntfs
Args: 00004000 00000051 00000000 00000000
[IRP_MJ_DIRECTORY_CONTROL(c), N/A(2)]
0 0 ffffbf0a60e83dc0 ffffbf0a7f52f790 00000000-00000000
\FileSystem\FltMgr
Args: 00004000 00000051 00000000 00000000
The !irp command lists the I/O stack locations and the information stored in them. The current I/O
stack location is marked with a > symbol (see the IRP_MJ_DIRECTORY_CONTROL line above).
The details for each IO_STACK_LOCATION are as follows (in order):
• first line:
– Major function code (e.g. IRP_MJ_DEVICE_CONTROL).
– Minor function code.
• second line:
– Flags (mostly unimportant)
– Control flags
– Device object pointer
– File object pointer
– Completion routine (if any)
– Completion context (for the completion routine)
– Success, Error, Cancel indicate the IRP completion cases where the completion routine
would be invoked
– “pending” if the IRP was marked as pending (SL_PENDING_RETURNED flag is set in the
Control flags)
• Driver name for that layer
• “Args” line:
– The value of Parameters.Others.Argument1 in the I/O stack location. Essentially the
first pointer-size member in the Parameters union.
– The value of Parameters.Others.Argument2 in the I/O stack location (the second
pointer-size member in the Parameters union)
– Device I/O control code (if IRP_MJ_DEVICE_CONTROL or IRP_MJ_INTERNAL_DEVICE_CONTROL).
It’s shown as a DML link that invokes the !ioctldecode command to decode
the control code (more on device I/O control codes later in this chapter). For other major
function codes, shows the third pointer-size member (Parameters.Others.Argument3)
– The fourth pointer-size member (Parameters.Others.Argument4)
The !irp command accepts an optional details argument. The default is zero, which provides the
output described above (considered a summary). Specifying 1 provides additional information in a
concrete form. Here is an example for an IRP targeted towards the console driver (you can locate
those easily by looking for cmd.exe processes):
Additionally, specifying detail value of 4 shows Driver Verifier information related to the IRP (if the
driver handling this IRP is under the verifier’s microscope). Driver Verifier will be discussed in chapter
13.
Dispatch Routines
In chapter 4, we have seen an important aspect of DriverEntry - setting up dispatch routines. These
are the functions connected with major function codes. The MajorFunction field in DRIVER_OBJECT
is an array of function pointers indexed by the major function code.
All dispatch routines have the same prototype, repeated here for convenience using the DRIVER_DISPATCH
typedef from the WDK (somewhat simplified for clarity):
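NTSTATUS DriverDispatch(
    _In_ PDEVICE_OBJECT DeviceObject,
    _Inout_ PIRP Irp);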
The relevant dispatch routine (based on the major function code) is the first routine in a driver that
sees the request. Normally, it’s called in the requesting thread context, i.e. the thread that called the
relevant API (e.g. ReadFile) in IRQL PASSIVE_LEVEL (0). However, it’s possible that a filter driver
sitting on top of this device sent the request down in a different context - it may be some other thread
unrelated to the original requestor and even in higher IRQL, such as DISPATCH_LEVEL (2). Robust
drivers need to be ready to deal with this kind of situation, even though for software drivers this
“inconvenient” context is rare. We’ll discuss the way to properly deal with this situation in the section
“Accessing User Buffers”, later in this chapter.
The first thing a typical dispatch routine does is check for errors. For example, read and write
operations contain buffers - do these buffers have appropriate size? For DeviceIoControl, there is a
control code in addition to potentially two buffers. The driver needs to make sure the control code is
something it recognizes. If any error is identified, the IRP is typically completed immediately with an
appropriate status.
If all checks turn up ok, then the driver can deal with performing the requested operation.
Here is the list of the most common dispatch routines for a software driver:
• IRP_MJ_CREATE - corresponds to a CreateFile call from user mode or ZwCreateFile in kernel
mode. This major function is essentially mandatory, otherwise no client will be able to open a
handle to a device controlled by this driver. Most drivers just complete the IRP with a success
status.
• IRP_MJ_CLOSE - the opposite of IRP_MJ_CREATE. Called by CloseHandle from user mode or
ZwClose from kernel mode when the last handle to the file object is about to be closed. Most
drivers just complete the request successfully, but if something meaningful was done in IRP_MJ_CREATE,
this is where it should be undone.
• IRP_MJ_READ - corresponds to a read operation, typically invoked from user mode by ReadFile
or kernel mode with ZwReadFile.
• IRP_MJ_WRITE - corresponds to a write operation, typically invoked from user mode by
WriteFile or kernel mode with ZwWriteFile.
• IRP_MJ_DEVICE_CONTROL - corresponds to the DeviceIoControl call from user mode or
ZwDeviceIoControlFile from kernel mode (there are other APIs in the kernel that can
generate IRP_MJ_DEVICE_CONTROL IRPs).
• IRP_MJ_INTERNAL_DEVICE_CONTROL - similar to IRP_MJ_DEVICE_CONTROL, but only available
to kernel callers.
Completing a Request
Once a driver decides to handle an IRP (meaning it’s not passing down to another driver), it must
eventually complete it. Otherwise, we have a leak on our hands - the requesting thread cannot really
terminate and by extension, its containing process will linger on as well, resulting in a “zombie
process”.
Completing a request means calling IoCompleteRequest after setting the request status and extra
information. If the completion is done in the dispatch routine itself (a common case for software
drivers), the routine must return the same status that was placed in the IRP.
The following code snippet shows how to complete a request in a dispatch routine:
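A minimal sketch of that pattern (assuming a successful operation that transfers no data):

Irp->IoStatus.Status = STATUS_SUCCESS;
Irp->IoStatus.Information = 0;    // exact meaning depends on the IRP type
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return STATUS_SUCCESS;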
Since the dispatch routine must return the same status as was placed in the IRP, it’s
tempting to write the last statement like so: return Irp->IoStatus.Status; This,
however, will likely result in a system crash. Can you guess why?
After the IRP is completed, touching any of its members is a bad idea. The IRP has probably
already been freed and you’re touching deallocated memory. It can actually be worse,
since another IRP may have been allocated in its place (this is common), and so the code
may return the status of some random IRP.
The Information field should be zero in case of an error (a failure status). Its exact meaning for a
successful operation depends on the type of IRP.
The IoCompleteRequest API accepts two arguments: the IRP itself and an optional value to
temporarily boost the original thread’s priority (the thread that initiated the request in the first place).
In most cases, for software drivers, the thread in question is the executing thread, so a thread boost
is inappropriate. The value IO_NO_INCREMENT is defined as zero, so no increment in the above code
snippet.
However, the driver may choose to give the thread a boost, regardless of whether it’s the calling thread
or not. In this case, the thread’s priority jumps with the given boost, and then it’s allowed to execute
one quantum with that new priority before the priority decreases by one, it can then get another
quantum with the reduced priority, and so on, until its priority returns to its original level. Figure 7-7
illustrates this scenario.
The thread’s priority after the boost can never go above 15. If it’s supposed to, it will be
15. If the original thread’s priority is above 15 already, boosting has no effect.
• IRQL of the calling CPU is 2 (or higher), meaning no page fault handling can occur.
• The thread calling the driver may be some arbitrary thread, and not the original requestor. This
means that the buffer pointer(s) provided are meaningless, since the wrong process address
space is accessible.
Using exception handling in such a case will not work as expected, because we’ll be accessing some
memory location that is essentially invalid in this random process context. Even if the access succeeds
(because that memory happens to be allocated in this random process and is resident in RAM), you’ll
be accessing random memory, and certainly not the original buffer provided to the client.
All this means that there must be some good way to access the original user’s buffer in such an
inconvenient context. In fact, there are two such ways provided by the I/O manager for this purpose,
called Buffered I/O and Direct I/O. In the next two sections, we’ll see what each of these schemes
mean and how to use them.
Some data structures are always safe to access, since they are allocated from non-paged
pool (and are in system space). Common examples are device objects (created with
IoCreateDevice) and IRPs.
Buffered I/O
Buffered I/O is the simplest of the two ways. To get support for Buffered I/O for Read and Write
operations, a flag must be set on the device object like so:
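DeviceObject->Flags |= DO_BUFFERED_IO;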
DeviceObject is the allocated pointer from a previous call to IoCreateDevice (or IoCreateDeviceSecure).
For IRP_MJ_DEVICE_CONTROL buffers, see the section “User Buffers for IRP_MJ_DEVICE_CONTROL”
later in this chapter.
Here are the steps taken by the I/O Manager and the driver when a read or write request arrives:
1. The I/O Manager allocates a buffer from non-paged pool with the same size as the user’s buffer.
It stores the pointer to this new buffer in the AssociatedIrp->SystemBuffer member of the
IRP. (The buffer size can be found in the current I/O stack location’s Parameters.Read.Length
or Parameters.Write.Length.)
2. For a write request, the I/O Manager copies the user’s buffer to the system buffer.
3. Only now the driver’s dispatch routine is called. The driver can use the system buffer pointer
directly without any checks, because the buffer is in system space (its address is absolute - the
same from any process context), and in any IRQL, because the buffer is allocated from non-
paged pool, so it cannot be paged out.
4. Once the driver completes the IRP (IoCompleteRequest), the I/O manager (for read requests)
copies the system buffer back to the user’s buffer (the size of the copy is determined by the
IoStatus.Information field in the IRP set by the driver).
5. Finally, the I/O Manager frees the system buffer.
You may be wondering how does the I/O Manager copy back the system buffer to the
original user’s buffer from IoCompleteRequest. This function can be called from any
thread, in IRQL <= 2. The way it’s done is by queuing a special kernel APC to the thread
that requested the operation. Once this thread is scheduled for execution, the first thing
it does is run this APC which performs the actual copying. The requesting thread is
obviously in the correct process context, and the IRQL is 1, so page faults can be handled
normally.
Figures 7-8a to 7-8e illustrate the steps taken with Buffered I/O.
Figure 7-8d: Buffered I/O: on IRP completion, I/O manager copies buffer back (for read)
Figure 7-8e: Buffered I/O: final state - I/O manager frees system buffer
• Easy to use - just specify the flag in the device object, and everything else is taken care of by
the I/O Manager.
Direct I/O
The purpose of Direct I/O is to allow access to a user’s buffer at any IRQL and from any thread, but
without any copying involved.
For read and write requests, selecting Direct I/O is done with a different flag of the device object:
DeviceObject->Flags |= DO_DIRECT_IO;
As with Buffered I/O, this selection only affects read and write requests. For DeviceIoControl see
the next section.
Here are the steps involved in handling Direct I/O:
1. The I/O Manager first makes sure the user’s buffer is valid and then pages it into physical
memory (if it wasn’t already there).
2. It then locks the buffer in memory, so it cannot be paged out until further notice. This solves
one of the issues with buffer access - page faults cannot happen, so accessing the buffer in any
IRQL is safe.
3. The I/O Manager builds a Memory Descriptor List (MDL), a data structure that describes a
buffer in physical memory. The address of this data structure is stored in the MdlAddress field
of the IRP.
4. At this point, the driver gets the call to its dispatch routine. The user’s buffer, although locked
in RAM, cannot be accessed from an arbitrary thread just yet. When the driver requires access
to the buffer, it must call a function that maps the same user buffer to a system address,
which by definition is valid in any process context. So essentially, we get two mappings to
the same memory buffer. One is from the original address (valid only in the context of the
requestor process) and the other in system space, which is always valid. The API to call is
MmGetSystemAddressForMdlSafe, passing the MDL built by the I/O Manager. The return
value is the system address.
5. Once the driver completes the request, the I/O Manager removes the second mapping (to system
space), frees the MDL, and unlocks the user’s buffer, so it can be paged normally just like any
other user-mode memory.
The MDL is actually a list of MDL structures, each one describing a piece of the buffer that
is contiguous in physical memory. Remember that a buffer that is contiguous in virtual memory is
not necessarily contiguous in physical memory (the smallest piece is a page size). In most cases, we
don’t need to care about this detail. One case where this matters is in Direct Memory Access (DMA)
operations. Fortunately, this is in the realm of hardware-based drivers.
Figures 7-9a to 7-9f illustrate the steps taken with Direct I/O.
Figure 7-9b: Direct I/O: I/O manager faults buffer’s pages to RAM and locks them
Figure 7-9c: Direct I/O: the MDL describing the buffer is stored in the IRP
Figure 7-9d: Direct I/O: the driver double-maps the buffer to a system address
Figure 7-9e: Direct I/O: the driver accesses the buffer using the system address
Figure 7-9f: Direct I/O: when the IRP is completed, the I/O manager frees the mapping, the MDL and unlocks the buffer
Notice there is no copying at all. The driver just reads/writes to the user’s buffer directly, using the
system address.
Locking the user’s buffer is done with the MmProbeAndLockPages API, fully documented
in the WDK. Unlocking is done with MmUnlockPages, also documented. This means a
driver can use these routines outside the narrow context of Direct I/O.
PVOID MmGetSystemAddressForMdlSafe (
_Inout_ PMDL Mdl,
_In_ ULONG Priority);
The function is implemented inline within the wdm.h header by calling the more generic
MmMapLockedPagesSpecifyCache function:
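A paraphrased sketch of that inline implementation (the exact wdm.h code may differ slightly between WDK versions):

PVOID MmGetSystemAddressForMdlSafe(PMDL Mdl, ULONG Priority) {
    if (Mdl->MdlFlags & (MDL_MAPPED_TO_SYSTEM_VA | MDL_SOURCE_IS_NONPAGED_POOL)) {
        // already mapped to system space (or came from non-paged pool) - reuse the existing mapping
        return Mdl->MappedSystemVa;
    }
    return MmMapLockedPagesSpecifyCache(Mdl, KernelMode, MmCached, NULL, FALSE, Priority);
}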
You may be wondering why the I/O manager doesn’t call MmGetSystemAddressForMdlSafe
automatically, which would be simple enough to do. This is an optimization: the driver may
not need to call this function at all, for example if there is an error in the request, so the mapping doesn’t
have to occur at all.
Drivers that set neither of the flags DO_BUFFERED_IO nor DO_DIRECT_IO in the device object
flags implicitly use Neither I/O, which simply means the driver doesn’t get any special help from the
I/O manager, and it’s up to the driver to deal with the user’s buffer.
BOOL DeviceIoControl(
HANDLE hDevice, // handle to device or file
DWORD dwIoControlCode, // IOCTL code (see <winioctl.h>)
PVOID lpInBuffer, // input buffer
DWORD nInBufferSize, // size of input buffer
PVOID lpOutBuffer, // output buffer
DWORD nOutBufferSize, // size of output buffer
PDWORD lpdwBytesReturned, // # of bytes actually returned
LPOVERLAPPED lpOverlapped); // for async. operation
There are three important parameters here: the I/O control code, and optional two buffers designated
“input” and “output”. As it turns out, the way these buffers are accessed depends on the control
code, which is very convenient, because different requests may have different requirements related
to accessing the user’s buffer(s).
The control code defined by a driver must be built with the CTL_CODE macro, defined in the WDK
and user-mode headers, defined like so:
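#define CTL_CODE(DeviceType, Function, Method, Access) \
    (((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method))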
The first parameter, DeviceType can be one of a set of constants defined by Microsoft for various
known device types (such as FILE_DEVICE_DISK and FILE_DEVICE_KEYBOARD). For custom devices
(like the ones we are writing), it can be any value, but the documentation states that the minimum
value for custom codes should be 0x8000.
The second parameter, Function, is a running index that should be different between multiple control
codes defined by the same driver. If all other components of the macro are the same (which is possible), at least
the Function would be a differentiating factor. Similarly to the device type, the official documentation
states that custom devices should use values starting from 0x800.
The third parameter (Method) is the key to selecting the buffering method for accessing the input and
output buffers provided with DeviceIoControl. Here are the options:
• METHOD_NEITHER - this value means no help is required of the I/O manager, so the driver is left
dealing with the buffers on its own. This could be useful, for instance, if the particular code
does not require any buffer - the control code itself is all the information needed - it’s best to
let the I/O manager know that it does not need to do any additional work.
– In this case, the pointer to the user’s input buffer is stored in the current I/O stack loca-
tion’s Parameters.DeviceIoControl.Type3InputBuffer field, and the output buffer is
stored in the IRP’s UserBuffer field.
• METHOD_BUFFERED - this value indicates Buffered I/O for both the input and output buffer. When
the request starts, the I/O manager allocates the system buffer from non-paged pool with the
size that is the maximum of the lengths of the input and output buffers. It then copies the input
buffer to the system buffer. Only now the IRP_MJ_DEVICE_CONTROL dispatch routine is invoked.
When the request completes, the I/O manager copies the number of bytes indicated with the
IoStatus.Information field in the IRP to the user’s output buffer.
The last bullet indicates that the output buffer can also be treated as input by using
METHOD_IN_DIRECT.
Finally, the Access parameter to the macro indicates the direction of data flow. FILE_WRITE_ACCESS
means from the client to the driver, FILE_READ_ACCESS means the opposite, and FILE_ANY_ACCESS
means bi-directional access (both the input and output buffers are used). You should always use
FILE_ANY_ACCESS. Besides simplifying the building of the control code, it guarantees that if, later on,
once the driver is already deployed, you want to use the other buffer as well, you won’t need to change the
Access parameter, and so won’t disturb existing clients that would not know about the control code
change.
If a control code is built with METHOD_NEITHER, the I/O manager does nothing to help with
accessing the buffer(s). The values for the input and output buffer pointers provided by
the client are copied as-is to the IRP. No checking is done by the I/O manager to make sure
these pointers point to valid memory. A driver should not use these pointers as memory
pointers, but they can be used as two arbitrary values propagating to the driver that may
mean something.
The driver will use Direct I/O so as not to incur the overhead of copies, as the buffers provided by the
client can potentially be very large.
We’ll start the project by creating an “Empty WDM Project” in Visual Studio and name it Zero.
Then we’ll delete the created INF file, resulting in an empty project, just like in previous examples.
Many user mode projects created by Visual Studio already use precompiled headers.
Kernel-mode projects provided by the WDK templates currently don’t use precompiled
headers. Since we’re starting with an empty project, we have to set up precompiled headers
manually anyway.
• Add a new header file to the project and call it pch.h. This file will serve as the precompiled
header. Add all rarely-changing #includes here:
// pch.h
#pragma once
#include <ntddk.h>
• Add a source file named pch.cpp and put a single #include in it: the precompiled header itself:
#include "pch.h"
• Now comes the tricky part. Letting the compiler know that pch.h is the precompiled header
and pch.cpp is the one creating it. Open project properties, select All Configurations and All
Platforms so you won’t need to configure every configuration/platform separately, navigate to
C/C++ / Precompiled Headers and set Precompiled Header to Use and the file name to “pch.h”
(see figure 7-10). Click OK to close the dialog box.
• The pch.cpp file should be set as the creator of the precompiled header. Right click this file
in Solution Explorer, and select Properties. Navigate to C/C++ / Precompiled Headers and set
Precompiled Header to Create (see figure 7-11). Click OK to accept the setting.
From this point on, every C/CPP file in the project must #include "pch.h" as the first thing in the
file. Without this include, the project will not compile.
Make sure there is nothing before this #include "pch.h" in a source file. Anything before
this line does not get compiled at all!
// DriverEntry
DriverObject->DriverUnload = ZeroUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = ZeroCreateClose;
DriverObject->MajorFunction[IRP_MJ_READ] = ZeroRead;
DriverObject->MajorFunction[IRP_MJ_WRITE] = ZeroWrite;
Now we need to create the device object and symbolic link and handle errors in a more general and
robust way. The trick we’ll use is a do / while(false) block, which is not really a loop, but it allows
getting out of the block with a simple break statement in case something goes wrong:
do {
status = IoCreateDevice(DriverObject, 0, &devName, FILE_DEVICE_UNKNOWN,
0, FALSE, &DeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to create device (0x%08X)\n", status));
break;
}
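// (reconstruction sketch - the symbolic link name and KdPrint message are assumptions)
DeviceObject->Flags |= DO_DIRECT_IO;   // use Direct I/O for read/write operations

UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\Zero");
status = IoCreateSymbolicLink(&symLink, &devName);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to create symbolic link (0x%08X)\n", status));
break;
}
} while (false);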
if (!NT_SUCCESS(status)) {
if (DeviceObject)
IoDeleteDevice(DeviceObject);
}
return status;
The pattern is simple: if an error occurs in any call, just break out of the “loop”. Outside the loop, check
the status, and if it’s a failure, undo any operations done so far. With this scheme in hand, it’s easy to
add more initializations (which we’ll need in more complex drivers), while keeping the cleanup code
localized and appearing just once.
It’s possible to use goto statements instead of the do / while(false) approach, but as the great
Dijkstra wrote, “goto considered harmful”, so I tend to avoid it if I can.
Notice we’re also initializing the device to use Direct I/O for our read and write operations.
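To avoid repeating the completion boilerplate, it’s convenient to have a small helper that completes an IRP. Here is a sketch (the name CompleteRequest and its default arguments are assumptions based on the description that follows):

NTSTATUS CompleteRequest(PIRP Irp, NTSTATUS status = STATUS_SUCCESS, ULONG_PTR info = 0) {
    Irp->IoStatus.Status = status;
    Irp->IoStatus.Information = info;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return status;
}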
Notice the default values for the status and information. The Create/Close dispatch routine implemen-
tation becomes almost trivial:
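NTSTATUS ZeroCreateClose(PDEVICE_OBJECT, PIRP Irp) {
    return CompleteRequest(Irp);    // succeed with the default status and information
}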
Note that the length of the user’s buffer is provided through the Parameters.Read member inside
the current I/O stack location.
We have configured Direct I/O, so we need to map the locked buffer to system space using
MmGetSystemAddressForMdlSafe:
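Here is a sketch of how ZeroRead might begin (the error status values chosen here are assumptions; CompleteRequest is the helper sketched above):

NTSTATUS ZeroRead(PDEVICE_OBJECT, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto len = stack->Parameters.Read.Length;
    if (len == 0)
        return CompleteRequest(Irp, STATUS_INVALID_BUFFER_SIZE);

    auto buffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, NormalPagePriority);
    if (buffer == nullptr)
        return CompleteRequest(Irp, STATUS_INSUFFICIENT_RESOURCES);
    // zeroing the buffer and completing the request follow, as described next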
The functionality we need to implement is to zero out the given buffer. We can use a simple memset
call to fill the buffer with zeros and then complete the request:
memset(buffer, 0, len);
If you prefer a more “fancy” function to zero out memory, call RtlZeroMemory. It’s a macro, defined
in terms of memset.
It’s important to set the Information field to the length of the buffer. This indicates to the client the
number of bytes transferred in the operation (returned in the second to last parameter to ReadFile).
This is all we need for the read operation.
The write operation is even simpler: it just completes the request, reporting the full buffer length as
“written”. Note that we don’t even bother calling MmGetSystemAddressForMdlSafe, as we don’t need to access
the actual buffer. This is also the reason this call is not made beforehand by the I/O manager: the
driver may not even need it, or perhaps need it in certain conditions only; so the I/O manager prepares
everything (the MDL) and lets the driver decide when and if to map the buffer.
Test Application
We’ll add a new console application project to the solution to test the read and write operations.
Here is some simple code to test these operations:
int main() {
HANDLE hDevice = CreateFile(L"\\\\.\\Zero", GENERIC_READ | GENERIC_WRITE,
0, nullptr, OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE) {
return Error("Failed to open device");
}
// test read
BYTE buffer[64];
DWORD bytes;
BOOL ok = ReadFile(hDevice, buffer, sizeof(buffer), &bytes, nullptr);
if (!ok)
return Error("failed to read");
if (bytes != sizeof(buffer))
printf("Wrong number of bytes\n");
// test write
BYTE buffer2[1024]; // contains junk
ok = WriteFile(hDevice, buffer2, sizeof(buffer2), &bytes, nullptr);
if (!ok)
return Error("failed to write");
if (bytes != sizeof(buffer2))
printf("Wrong byte count\n");
CloseHandle(hDevice);
}
Read/Write Statistics
Let’s add some more functionality to the Zero driver. We may want to count the total bytes
read/written throughout the lifetime of the driver. A user-mode client should be able to read these
statistics, and perhaps even zero them out.
We’ll start by defining two global variables to keep track of the total number of bytes read/written (in
Zero.cpp):
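long long g_TotalRead;      // total bytes read since the driver loaded
long long g_TotalWritten;   // total bytes written since the driver loaded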
You could certainly put these in a structure for easier maintenance and extension. The long long
C++ type is a signed 64-bit value. You can add unsigned if you wish, or use a typedef such as LONG64
or ULONG64, which would mean the same thing. Since these are global variables, they are zeroed out
by default.
We’ll create a new file that contains information common to user-mode clients and the driver, called
ZeroCommon.h. This is where we define the control codes we support, as well as data structures to
be shared with user-mode.
First, we’ll add two control codes: one for getting the stats and another for clearing them:
#define IOCTL_ZERO_GET_STATS \
CTL_CODE(DEVICE_ZERO, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_ZERO_CLEAR_STATS \
CTL_CODE(DEVICE_ZERO, 0x801, METHOD_NEITHER, FILE_ANY_ACCESS)
DEVICE_ZERO is defined as some number starting from 0x8000, as the documentation recommends. The
function numbers start at 0x800 and are incremented with each control code. METHOD_BUFFERED is
used for getting the stats, as the size of the returned data is small (2 x 8 bytes). Clearing the stats
requires no buffers, so METHOD_NEITHER is selected.
Next, we’ll add a structure that can be used by clients (and the driver) for storing the stats:
struct ZeroStats {
long long TotalRead;
long long TotalWritten;
};
Back in DriverEntry, we register a dispatch routine for device control requests:
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = ZeroDeviceControl;
The details for IRP_MJ_DEVICE_CONTROL are located in the current I/O stack location in the
Parameters.DeviceIoControl structure. The status is initialized to an error in case the control code
provided is unsupported. len keeps track of the number of valid bytes returned in the output buffer.
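A sketch of how the dispatch routine might start (the exact initial status value is an assumption):

NTSTATUS ZeroDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto& dic = stack->Parameters.DeviceIoControl;
    auto status = STATUS_INVALID_DEVICE_REQUEST;   // assume an unsupported control code
    ULONG_PTR len = 0;                             // valid bytes in the output buffer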
Implementing the IOCTL_ZERO_GET_STATS is done in the usual way. First, check for errors. If all goes
well, the stats are written to the output buffer:
switch (dic.IoControlCode) {
case IOCTL_ZERO_GET_STATS:
{ // artificial scope so the compiler does not complain
// about defining variables skipped by a case
if (dic.OutputBufferLength < sizeof(ZeroStats)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
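// (sketch of the remainder of this case - copy the stats to the system buffer)
auto stats = (ZeroStats*)Irp->AssociatedIrp.SystemBuffer;
stats->TotalRead = g_TotalRead;
stats->TotalWritten = g_TotalWritten;
len = sizeof(ZeroStats);
status = STATUS_SUCCESS;
break;
}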
Once out of the switch, the IRP would be completed. Here is the stats clearing Ioctl handling:
case IOCTL_ZERO_CLEAR_STATS:
g_TotalRead = g_TotalWritten = 0;
status = STATUS_SUCCESS;
break;
}
All that’s left to do is complete the IRP with whatever the status and length values are:
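In code, that final step could look like this:

Irp->IoStatus.Status = status;
Irp->IoStatus.Information = len;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return status;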
The stats have to be updated when data is read/written. It must be done in a thread safe way, as
multiple clients may bombard the driver with read/write requests. Here is the updated ZeroWrite
function:
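A sketch of what it could look like (using InterlockedAdd64 to update the total, and the CompleteRequest helper from before):

NTSTATUS ZeroWrite(PDEVICE_OBJECT, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto len = stack->Parameters.Write.Length;
    // update the statistics atomically, as multiple clients may write concurrently
    InterlockedAdd64((volatile LONG64*)&g_TotalWritten, len);
    return CompleteRequest(Irp, STATUS_SUCCESS, len);   // report the full length as "written"
}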
Astute readers may question the safety of the Ioctl implementations. For example, is reading the
total number of bytes read/written with no multithreaded protection (while read/write
operations are possibly in progress) a correct operation, or is it a data race? Technically, it’s a data race, as the
driver might be updating the stats globals while some client is reading the values, which could
result in torn reads. One way to resolve that is to dispense with the interlocked instructions and
use a mutex or a fast mutex to protect access to these variables. Alternatively, there are functions
to deal with this scenario, such as ReadAcquire64. Their implementation is CPU dependent. For
x86/x64, they are actually normal reads, as the processor provides safety against such torn reads. On
ARM CPUs, this requires a memory barrier to be inserted (memory barriers are beyond the scope
of this book).
Save the number of bytes read/written to the Registry before the driver unloads. Read it
back when the driver loads.
Replace the Interlocked instructions with a fast mutex to protect access to the stats.
ZeroStats stats;
if (!DeviceIoControl(hDevice, IOCTL_ZERO_GET_STATS,
nullptr, 0, &stats, sizeof(stats), &bytes, nullptr))
return Error("failed in DeviceIoControl");
Summary
In this chapter, we learned how to handle IRPs, which drivers deal with all the time. Armed with
this knowledge, we can start leveraging more kernel functionality, starting with process and thread
callbacks in chapter 9. Before getting to that, however, there are more techniques and kernel APIs that
may be useful for a driver developer, described in the next chapter.
Chapter 8: Advanced Programming Techniques (Part 1)
In this chapter we’ll examine various techniques of varying degrees of usefulness to driver developers.
In this chapter:
NTSTATUS PsCreateSystemThread(
_Out_ PHANDLE ThreadHandle,
_In_ ULONG DesiredAccess,
_In_opt_ POBJECT_ATTRIBUTES ObjectAttributes,
_In_opt_ HANDLE ProcessHandle,
_Out_opt_ PCLIENT_ID ClientId,
_In_ PKSTART_ROUTINE StartRoutine,
_In_opt_ PVOID StartContext);
Both functions have the same set of parameters except the additional first parameter to IoCreateSystemThread.
The latter function takes an additional reference on the object passed in (which must be a device
object or a driver object), so the driver is not unloaded prematurely while the thread is alive.
IoCreateSystemThread is only available for Windows 8 and later systems. Here is a description
of the other parameters:
• ThreadHandle is the address of a handle to the created thread if successful. The driver must
use ZwClose to close the handle at some point.
• DesiredAccess is the access mask requested. Drivers should simply use THREAD_ALL_ACCESS
to get all possible access with the resulting handle.
• ObjectAttributes is the standard OBJECT_ATTRIBUTES structure. Most members have no
meaning for a thread. The most common attribute to request for the returned handle is
OBJ_KERNEL_HANDLE, but it’s not needed if the thread is to be created in the System process -
just pass NULL, which will always return a kernel handle.
• ProcessHandle is a handle to the process where this thread should be created. Drivers should
pass NULL to indicate the thread should be part of the System process so it’s not tied to any
specific process’ lifetime.
• ClientId is an optional output structure, providing the process and thread ID of the newly
created thread. In most cases, this information is not needed, and NULL can be specified.
• StartRoutine is the function to execute in a separate thread of execution. This function must
have the following prototype:
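VOID ThreadStart(
    _In_ PVOID StartContext);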
The StartContext value is provided by the last parameter to Ps/IoCreateSystemThread. This could
be anything (or NULL) that would give the new thread data to work with.
The function indicated by StartRoutine will start execution on a separate thread. It’s executed with
the IRQL being PASSIVE_LEVEL (0) in a critical region (where normal kernel APCs are disabled).
For PsCreateSystemThread, exiting the thread function is not enough to terminate the thread. An
explicit call to PsTerminateSystemThread is required to properly manage the thread’s lifetime:
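NTSTATUS PsTerminateSystemThread(
    _In_ NTSTATUS ExitStatus);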
The exit status is the exit code of the thread, which can be retrieved with PsGetThreadExitStatus
if desired.
Memory Management
We have looked at the most common functions for dynamic memory allocation in chapter 3. The
most useful is ExAllocatePoolWithTag, which we have used multiple times in previous chapters.
There are other functions for dynamic memory allocation you might find useful. Then, we’ll examine
lookaside lists, which allow more efficient memory management when fixed-size chunks are needed.
Pool Allocations
In addition to ExAllocatePoolWithTag, the Executive provides an extended version that indicates
the importance of an allocation, taken into account in low memory conditions:
PVOID ExAllocatePoolWithTagPriority (
_In_ POOL_TYPE PoolType,
_In_ SIZE_T NumberOfBytes,
_In_ ULONG Tag,
_In_ EX_POOL_PRIORITY Priority);
The priority-related values (LowPoolPriority, NormalPoolPriority, and HighPoolPriority) indicate how
important it is for the allocation to succeed if system memory is low. In any case, the driver should be
prepared to handle a failure.
The “special pool” values tell the Executive to make the allocation at the end of a page (“Overrun”
values) or the beginning of a page (“Underrun” values), so it’s easier to catch buffer overflows or underflows.
These values should only be used while tracking memory corruptions, as each allocation costs at least
one page.
Starting with Windows 10 version 1909 (and Windows 11), two new pool allocation functions are
supported. The first is ExAllocatePool2 declared like so:
PVOID ExAllocatePool2 (
_In_ POOL_FLAGS Flags,
_In_ SIZE_T NumberOfBytes,
_In_ ULONG Tag);
Where the POOL_FLAGS enumeration consists of a combination of values shown in table 8-1:
The Must recognize? column indicates whether failure to recognize or satisfy the flag causes the
function to fail.
The second allocation function, ExAllocatePool3, is extensible, so new functions of this sort are
unlikely to pop up in the future:
PVOID ExAllocatePool3 (
_In_ POOL_FLAGS Flags,
_In_ SIZE_T NumberOfBytes,
_In_ ULONG Tag,
_In_reads_opt_(ExtendedParametersCount)
PCPOOL_EXTENDED_PARAMETER ExtendedParameters,
_In_ ULONG ExtendedParametersCount);
This function allows customization with an array of “parameters”, where the supported parameter
types may be extended in future kernel versions. The currently available parameters are defined with
the POOL_EXTENDED_PARAMETER_TYPE enumeration:
union {
ULONG64 Reserved2;
PVOID Reserved3;
EX_POOL_PRIORITY Priority;
POOL_EXTENDED_PARAMS_SECURE_POOL* SecurePoolParams;
POOL_NODE_REQUIREMENT PreferredNode; // ULONG
};
} POOL_EXTENDED_PARAMETER, *PPOOL_EXTENDED_PARAMETER;
The Type member indicates which of the union members is valid for this parameter
(POOL_EXTENDED_PARAMETER_TYPE). Optional indicates whether the parameter is optional or required. An
optional parameter that fails to be satisfied does not cause ExAllocatePool3 to fail. Based on
Type, the correct member in the union must be set. Currently, these parameters are available:
The following example shows using ExAllocatePool3 to achieve the same effect as
ExAllocatePoolWithTagPriority for non-paged memory:
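A sketch (DRIVER_TAG is assumed to be a pool tag defined by the driver):

POOL_EXTENDED_PARAMETER param{};
param.Type = PoolExtendedParameterPriority;
param.Priority = HighPoolPriority;
void* p = ExAllocatePool3(POOL_FLAG_NON_PAGED, 256, DRIVER_TAG, &param, 1);
if (p) {
    // use the memory...
    ExFreePoolWithTag(p, DRIVER_TAG);
}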
Secure Pools
Secure pools introduced in Windows 10 version 1909 allow kernel callers to have a memory pool
that cannot be accessed by other kernel components. This kind of protection is internally achieved by
the Hyper-V hypervisor, leveraging its power to protect memory access even from the kernel, as the
memory is part of Virtual Trust Level (VTL) 1 (the secure world). Currently, secure pools are not fully
documented, but here are the basic steps to use a secure pool.
Secure pools are only available if Virtualization Based Security (VBS) is active (meaning
Hyper-V exists and creates the two worlds - normal and secure). Discussion of VBS is
beyond the scope of this book. Consult information online (or the Windows Internals
books) for more on VBS.
A secure pool can be created with ExCreatePool, returning a handle to the pool:
NTSTATUS ExCreatePool (
_In_ ULONG Flags,
_In_ ULONG_PTR Tag,
_In_opt_ POOL_CREATE_EXTENDED_PARAMS* Params,
_Out_ HANDLE* PoolHandle);
Buffer points to existing data to be initially stored in the new allocation. Cookie is used for validation,
by calling ExSecurePoolValidate. Freeing memory from a secure pool must be done with a new
function, ExFreePool2:
VOID ExFreePool2 (
_Pre_notnull_ PVOID P,
_In_ ULONG Tag,
_In_reads_opt_(ExtendedParametersCount)
PCPOOL_EXTENDED_PARAMETER ExtendedParameters,
_In_ ULONG ExtendedParametersCount);
NTSTATUS ExSecurePoolUpdate (
_In_ HANDLE SecurePoolHandle,
_In_ ULONG Tag,
_In_ PVOID Allocation,
_In_ ULONG_PTR Cookie,
_In_ SIZE_T Offset,
_In_ SIZE_T Size,
_In_ PVOID Buffer);
• new causes a constructor to be invoked, and delete causes the destructor to be invoked.
• new accepts a type for which memory must be allocated, rather than specifying a number of
bytes.
Fortunately, C++ allows overloading the new and delete operators, either globally or for specific types.
new can be overloaded with extra parameters that are needed for kernel allocations - at least the pool
type must be specified. The first argument to any overloaded new is the number of bytes to allocate,
and any extra parameters are added after that. These are specified with parentheses when the operator
is actually used. The compiler inserts a call to the appropriate constructor, if one exists.
Here is a basic implementation of an overloaded new operator that calls ExAllocatePoolWithTag:
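A minimal sketch (a production version might prefer ExAllocatePool2 on newer systems):

void* __cdecl operator new(size_t size, POOL_TYPE pool, ULONG tag) {
    return ExAllocatePoolWithTag(pool, size, tag);
}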
The __cdecl modifier indicates this overload uses the C calling convention (rather than __stdcall). It
only matters in x86 builds, but should still be specified as shown.
Here is an example usage, assuming an object of type MyData needs to be allocated from paged pool:
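For example (DRIVER_TAG is assumed to be defined by the driver):

auto data = new (PagedPool, DRIVER_TAG) MyData;
if (data == nullptr) {
    // handle the allocation failure
}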
The size parameter is never specified explicitly as the compiler inserts the correct size (which is
essentially sizeof(MyData) in the above example). All other parameters must be specified. We can
make the overload simpler to use if we default the tag to a macro such as DRIVER_TAG, expected to
exist:
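For example, the same overload with the tag defaulted (a sketch):

void* __cdecl operator new(size_t size, POOL_TYPE pool, ULONG tag = DRIVER_TAG) {
    return ExAllocatePoolWithTag(pool, size, tag);
}

// usage - the tag defaults to DRIVER_TAG
auto data = new (PagedPool) MyData;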
In the above examples, the default constructor is invoked, but it’s perfectly valid to invoke any other
constructor that exists for the type. For example:
struct MyData {
MyData(ULONG someValue);
// details not shown
};
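Then an instance can be created like so (the argument value is arbitrary):

auto data = new (PagedPool) MyData(100);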
We can easily extend the overloading idea to other overloads, such as one that wraps ExAllocatePoolWithTagPriority:
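A sketch of such an overload:

void* __cdecl operator new(size_t size, POOL_TYPE pool,
    EX_POOL_PRIORITY priority, ULONG tag = DRIVER_TAG) {
    return ExAllocatePoolWithTagPriority(pool, size, tag, priority);
}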
Another common case is where you already have an allocated block of memory to store some object
(perhaps allocated by a function out of your control), but you still want to initialize the object by
invoking a constructor. Another new overload can be used for this purpose, known as placement new,
since it does not allocate anything, but the compiler still adds a call to a constructor. Here is how to
define a placement new operator overload:
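A minimal sketch:

void* __cdecl operator new(size_t size, void* address) {
    UNREFERENCED_PARAMETER(size);
    return address;
}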
void* SomeFunctionAllocatingObject();   // assumed to return a buffer large enough for MyData

auto p = SomeFunctionAllocatingObject();
auto data = new (p) MyData;             // no allocation - just a constructor invocation
Finally, an overload for delete is required so the memory can be freed at some point, calling the
destructor if it exists. Here is how to overload the delete operator:
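A sketch matching the new overloads above:

void __cdecl operator delete(void* p, size_t) {
    ExFreePool(p);
}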
The extra size parameter is not used in practice (zero is always the value provided), but the compiler
requires it.
Remember that you cannot have global objects that have default constructors that do
something, since there is no runtime to invoke them. The compiler will report a warning
if you try. A way around it (of sorts) is to declare the global variable as a pointer, and then
use an overloaded new to allocate and invoke a constructor in DriverEntry. Of course,
you must remember to call delete in the driver’s unload routine.
Another variant of the delete operator the compiler might insist on if you set the compiler
conformance to C++17 or newer is the following:
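For example (a minimal sketch):

void __cdecl operator delete(void* p, size_t, std::align_val_t) {
    ExFreePool(p);
}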
You can look up the meaning of std::align_val_t in a C++ reference, but it does not
matter for our purposes.
Lookaside Lists
The dynamic memory allocation functions discussed so far (the ExAllocatePool* family of APIs)
are generic in nature, and can accommodate allocations of any size. Internally, managing the pool is
non-trivial: various lists are needed to manage allocations and deallocations of different sizes. This
management aspect of the pools is not free.
One fairly common case that leaves room for optimizations is when fixed-sized allocations are needed.
When such an allocation is freed, it’s possible to not really free it, but just mark it as available. The next
allocation request can be satisfied by the existing block, which is much faster to do than allocating a
fresh block. This is exactly the purpose of lookaside lists.
There are two sets of APIs for working with lookaside lists: the original one, available since Windows
2000, and a newer one, available since Windows Vista. I’ll describe both, as they are quite similar. The
classic API initializes a paged lookaside list with ExInitializePagedLookasideList:
VOID ExInitializePagedLookasideList (
_Out_ PPAGED_LOOKASIDE_LIST Lookaside,
_In_opt_ PALLOCATE_FUNCTION Allocate,
_In_opt_ PFREE_FUNCTION Free,
_In_ ULONG Flags,
_In_ SIZE_T Size,
_In_ ULONG Tag,
_In_ USHORT Depth);
The non-paged variant is practically the same, with the function name being ExInitializeNPagedLookasideList.
The first parameter is the resulting initialized structure. Although the structure layout is described in
wdm.h (with a macro named GENERAL_LOOKASIDE_LAYOUT to accommodate multiple uses that can’t
be shared in other ways using the C language), you should treat this structure as opaque.
The Allocate parameter is an optional allocation function that is called by the lookaside implementa-
tion when a new allocation is required. If specified, the allocation function must have the following
prototype:
PVOID AllocationFunction (
_In_ POOL_TYPE PoolType,
_In_ SIZE_T NumberOfBytes,
_In_ ULONG Tag);
The allocation function receives the same parameters as ExAllocatePoolWithTag. In fact, if the
allocation function is not specified, this is the call made by the lookaside list manager. If you don’t
require any other code, just specify NULL. A custom allocation function could be useful for debugging
purposes, for example. Another possibility is to call ExAllocatePoolWithTagPriority instead of
ExAllocatePoolWithTag, if that makes sense for your driver.
If you do provide an allocation function, you might need to provide a de-allocation function in the
Free parameter. If not specified, the lookaside list manager calls ExFreePool. Here is the expected
prototype for this function:
VOID FreeFunction (
_In_ __drv_freesMem(Mem) PVOID Buffer);
Once a lookaside list is initialized, you can request a memory block (of the size specified in the
initialization function, of course) by calling ExAllocateFromPagedLookasideList:
PVOID ExAllocateFromPagedLookasideList (
_Inout_ PPAGED_LOOKASIDE_LIST Lookaside)
Nothing could be simpler - no special parameters are required, since everything else is already known.
The corresponding function for a non-paged pool lookaside list is ExAllocateFromNPagedLookasideList.
The opposite function used to free an allocation (or return it to the cache) is ExFreeToPagedLookasideList:
VOID ExFreeToPagedLookasideList (
_Inout_ PPAGED_LOOKASIDE_LIST Lookaside,
_In_ __drv_freesMem(Mem) PVOID Entry)
The only value required is the pointer to free (or return to the cache). As you can probably guess, the
non-paged pool variant is ExFreeToNPagedLookasideList.
Finally, when the lookaside list is no longer needed, it must be freed by calling ExDeletePagedLookasideList:
VOID ExDeletePagedLookasideList (
_Inout_ PPAGED_LOOKASIDE_LIST Lookaside);
One nice benefit of lookaside lists is that you don’t have to return all allocations to the list by
repeatedly calling ExFreeToPagedLookasideList before calling ExDeletePagedLookasideList;
the latter is enough, and will free all allocated blocks automatically. ExDeleteNPagedLookasideList
is the corresponding non-paged variant.
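For example, a minimal sketch using the classic API (the item type, sizes, and names are illustrative):

struct MyItem {
    LIST_ENTRY Link;
    ULONG Value;
};

PAGED_LOOKASIDE_LIST g_ItemsLookaside;

void InitItemsCache() {
    ExInitializePagedLookasideList(&g_ItemsLookaside, nullptr, nullptr, 0,
        sizeof(MyItem), DRIVER_TAG, 0);
}

void UseItem() {
    auto item = (MyItem*)ExAllocateFromPagedLookasideList(&g_ItemsLookaside);
    if (item) {
        item->Value = 1;
        ExFreeToPagedLookasideList(&g_ItemsLookaside, item);   // back to the cache
    }
}

void DeleteItemsCache() {
    ExDeletePagedLookasideList(&g_ItemsLookaside);  // frees any cached blocks as well
}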
Write a C++ class wrapper for lookaside lists using the above APIs.
The newer API initializes a lookaside list with ExInitializeLookasideListEx:
NTSTATUS ExInitializeLookasideListEx (
_Out_ PLOOKASIDE_LIST_EX Lookaside,
_In_opt_ PALLOCATE_FUNCTION_EX Allocate,
_In_opt_ PFREE_FUNCTION_EX Free,
_In_ POOL_TYPE PoolType,
_In_ ULONG Flags,
_In_ SIZE_T Size,
_In_ ULONG Tag,
_In_ USHORT Depth);
PLOOKASIDE_LIST_EX is the opaque data structure to initialize, which must be allocated from non-
paged memory, regardless of whether the lookaside list is to manage paged or non-paged memory.
The allocation and free functions are optional, just as they are with the classic API. These are their
prototypes:
PVOID AllocationFunction (
_In_ POOL_TYPE PoolType,
_In_ SIZE_T NumberOfBytes,
_In_ ULONG Tag,
_Inout_ PLOOKASIDE_LIST_EX Lookaside);
VOID FreeFunction (
_In_ __drv_freesMem(Mem) PVOID Buffer,
_Inout_ PLOOKASIDE_LIST_EX Lookaside);
Notice the lookaside list itself is a parameter. This could be used to access driver data that is part
of a larger structure containing the lookaside list. For example, suppose the driver has the following
structure:
struct MyData {
ULONG SomeData;
LIST_ENTRY SomeHead;
LOOKASIDE_LIST_EX Lookaside;
};
The driver creates an instance of that structure (maybe globally, or on a per-client basis). Let’s assume
it’s created dynamically for every client creating a file object to talk to a device the driver manages:
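A sketch of such a per-client creation (note the non-paged allocation, since LOOKASIDE_LIST_EX must
reside in non-paged memory; the per-allocation size is illustrative):

auto data = (MyData*)ExAllocatePool2(POOL_FLAG_NON_PAGED, sizeof(MyData), DRIVER_TAG);
if (data) {
    InitializeListHead(&data->SomeHead);
    auto status = ExInitializeLookasideListEx(&data->Lookaside, nullptr, nullptr,
        PagedPool, 0, 256, DRIVER_TAG, 0);
    if (!NT_SUCCESS(status)) {
        ExFreePool(data);
        data = nullptr;
    }
}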
In the allocation and free functions, we can get a pointer to our MyData object that contains whatever
lookaside list is being used at the time:
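For example, in a custom allocation function (a sketch; the free function can do the same):

PVOID MyAllocate(POOL_TYPE poolType, SIZE_T size, ULONG tag,
    PLOOKASIDE_LIST_EX lookaside) {
    auto data = CONTAINING_RECORD(lookaside, MyData, Lookaside);
    // data->SomeData (and the rest of MyData) is accessible here
    return ExAllocatePoolWithTag(poolType, size, tag);
}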
This technique is useful if you have multiple lookaside lists, where each one can have its own
“context” data. Obviously, if you just have one such list stored globally, you can just access whatever
global variables you need.
Continuing with ExInitializeLookasideListEx - PoolType is the pool type to use; this is where the
driver selects where allocations should be made from. Size, Tag and Depth have the same meaning as
they do in the classic API.
The Flags parameter can be zero, or one of the following:
Once the lookaside list is initialized, allocation and deallocation are done with the following APIs:
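These are ExAllocateFromLookasideListEx and ExFreeToLookasideListEx:

PVOID ExAllocateFromLookasideListEx (
    _Inout_ PLOOKASIDE_LIST_EX Lookaside);

VOID ExFreeToLookasideListEx (
    _Inout_ PLOOKASIDE_LIST_EX Lookaside,
    _In_ PVOID Entry);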
Of course, the terms “allocation” and “deallocation” are in the context of a lookaside list, meaning
allocations could be reused, and deallocations might return the block to the cache.
Finally, a lookaside list must be deleted with ExDeleteLookasideListEx:
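Its declaration is simple:

VOID ExDeleteLookasideListEx (
    _Inout_ PLOOKASIDE_LIST_EX Lookaside);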
One driver can send a request to another by building an IRP and passing it to the target device object.
For device I/O control requests, the I/O manager provides IoBuildDeviceIoControlRequest for constructing such an IRP:
PIRP IoBuildDeviceIoControlRequest(
_In_ ULONG IoControlCode,
_In_ PDEVICE_OBJECT DeviceObject,
_In_opt_ PVOID InputBuffer,
_In_ ULONG InputBufferLength,
_Out_opt_ PVOID OutputBuffer,
_In_ ULONG OutputBufferLength,
_In_ BOOLEAN InternalDeviceIoControl,
_In_opt_ PKEVENT Event,
_Out_ PIO_STATUS_BLOCK IoStatusBlock);
The API returns a proper IRP pointer on success, including filling in the first IO_STACK_LOCATION, or
NULL on failure. Some of the parameters to IoBuildDeviceIoControlRequest mirror those a user-mode
caller would pass to DeviceIoControl; the others deserve a closer look:
• DeviceObject is the target device of this request. It’s needed so the API can allocate the correct
number of IO_STACK_LOCATION structures that accompany any IRP.
• InternalDeviceControl indicates whether the IRP should set its major function to IRP_-
MJ_INTERNAL_DEVICE_CONTROL (TRUE) or IRP_MJ_DEVICE_CONTROL (FALSE). This obviously
depends on the target device’s expectations.
• Event is an optional pointer to an event object that gets signaled when the IRP is completed
by the target device (or some other device the target may send the IRP to). An event is needed
if the IRP is sent for synchronous processing, so that the caller can wait on the event if the
operation has not yet completed. We’ll see a complete example in the next section.
• IoStatusBlock returns the final status of the IRP (status and information), so the caller can
examine it if it so wishes.
The call to IoBuildDeviceIoControlRequest just builds the IRP - it is not sent anywhere at this
point. To actually send the IRP to a device, call the generic IoCallDriver API:
NTSTATUS IoCallDriver(
_In_ PDEVICE_OBJECT DeviceObject,
_Inout_ PIRP Irp);
IoCallDriver advances the current I/O stack location to the next, and then invokes the target driver’s
major function dispatch routine. It returns whatever is returned from that dispatch routine. Here is a
very simplified implementation:
NTSTATUS IoCallDriver(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    IoSetNextIrpStackLocation(Irp);   // advance to the next I/O stack location
    auto irpSp = IoGetCurrentIrpStackLocation(Irp);
    return (DeviceObject->DriverObject->MajorFunction[irpSp->MajorFunction])
        (DeviceObject, Irp);
}
The main question remaining is how do we get a pointer to a device object in the first place? One way
is by calling IoGetDeviceObjectPointer:
NTSTATUS IoGetDeviceObjectPointer(
_In_ PUNICODE_STRING ObjectName,
_In_ ACCESS_MASK DesiredAccess,
_Out_ PFILE_OBJECT *FileObject,
_Out_ PDEVICE_OBJECT *DeviceObject);
The ObjectName parameter is the fully-qualified name of the device object in the Object Manager’s
namespace (as can be viewed with the WinObj tool from Sysinternals). Desired access is usually
FILE_READ_DATA, FILE_WRITE_DATA or FILE_ALL_ACCESS. Two values are returned on success: the
device object pointer (in DeviceObject) and an open file object pointing to the device object (in
FileObject).
The file object is not usually needed, but it should be kept around as a means of keeping the device
object referenced. When you’re done with the device object, call ObDereferenceObject on the file
object pointer to decrement the device object’s reference count indirectly. Alternatively, you can
increment the device object’s reference count (ObReferenceObject) and then decrement the file
object’s reference count so you don’t have to keep it around.
The next section demonstrates usage of these APIs.
It’s possible to come up with a user-mode solution that would do essentially the same thing, but this
can only be easily done in the context of a single process. A driver, on the other hand, can accept
calls from multiple processes, having a “global” ordering of playback. In any case, the point is to
demonstrate driver programming techniques, rather than managing a sound playing scenario.
We’ll start by creating an empty WDM driver, as we’ve done in previous chapters, named KMelody.
Then we’ll add a file named MelodyPublic.h to serve as the common data to the driver and a user-mode
client. This is where we define what a note looks like and an I/O control code for communication:
// MelodyPublic.h
#pragma once

// device type and client-visible symbolic link name
// (the exact values below are illustrative)
#define MELODY_DEVICE 0x8003
#define MELODY_SYMLINK L"\\\\.\\KMelody"

struct Note {
    ULONG Frequency;
    ULONG Duration;
    ULONG Delay{ 0 };
    ULONG Repeat{ 1 };
};

#define IOCTL_MELODY_PLAY \
    CTL_CODE(MELODY_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
A note consists of a frequency (in Hertz) and duration to play. To make it a bit more interesting, a
delay and repeat count are added. If Repeat is greater than one, the sound is played Repeat times,
with a delay of Delay between repeats. Duration and Delay are provided in milliseconds.
The architecture we’ll go for in the driver is to have a thread created when the first client opens a
handle to our device, and that thread will perform the playback based on a queue of notes the driver
manages. The thread will be shut down when the driver unloads.
It may seem asymmetric at this point - why not create the thread when the driver loads? As we
shall see shortly, there is a little “snag” that we have to deal with that prevents creating the thread
when the driver loads.
Let’s start with DriverEntry. It needs to create a device object and a symbolic link. Here is the full
function:
PlaybackState* g_State;

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
    //
    // allocate the global playback state with the overloaded new operator
    // (non-paged, since the object contains dispatcher objects)
    //
    g_State = new (NonPagedPoolNx) PlaybackState;
    if (g_State == nullptr)
        return STATUS_INSUFFICIENT_RESOURCES;

    auto status = STATUS_SUCCESS;
    PDEVICE_OBJECT DeviceObject = nullptr;

    do {
        UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Device\\KMelody");
        status = IoCreateDevice(DriverObject, 0, &name, FILE_DEVICE_UNKNOWN,
            0, FALSE, &DeviceObject);
        if (!NT_SUCCESS(status))
            break;

        // symbolic link matching the client's MELODY_SYMLINK
        UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\KMelody");
        status = IoCreateSymbolicLink(&symLink, &name);
    } while (false);

    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Error (0x%08X)\n", status));
        delete g_State;
        if (DeviceObject)
            IoDeleteDevice(DeviceObject);
        return status;
    }

    DriverObject->DriverUnload = MelodyUnload;
    DriverObject->MajorFunction[IRP_MJ_CREATE] =
        DriverObject->MajorFunction[IRP_MJ_CLOSE] = MelodyCreateClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = MelodyDeviceControl;

    return status;
}
Most of the code should be familiar by now. The only new code is the creation of an object of type
PlaybackState. The new C++ operator is overloaded as described earlier in this chapter. If allocating
a PlaybackState instance fails, DriverEntry returns STATUS_INSUFFICIENT_RESOURCES, reporting
a failure to the kernel.
The PlaybackState class is going to manage the list of notes to play and most other functionality
specific to the driver. Here is its declaration (in PlaybackState.h):
struct PlaybackState {
    PlaybackState();
    ~PlaybackState();

    NTSTATUS AddNotes(const Note* notes, ULONG count);
    NTSTATUS Start(PVOID IoObject);
    void Stop();

private:
    static void PlayMelody(PVOID context);
    void PlayMelody();

    LIST_ENTRY m_head;
    FastMutex m_lock;
    PAGED_LOOKASIDE_LIST m_lookaside;
    KSEMAPHORE m_counter;
    KEVENT m_stopEvent;
    HANDLE m_hThread{ nullptr };
};
m_head is the head of the linked list holding the notes to play. Since multiple threads can access
this list, it must be protected with a synchronization object. In this case, we’ll go with a fast mutex.
FastMutex is a wrapper class similar to the one we saw in chapter 6, with the added twist that it’s
initialized in its constructor rather than a separate Init method. This is convenient, and possible,
because PlaybackState is allocated dynamically, causing its constructor to be invoked, along with
constructors for data members (if any).
The note objects will be allocated from a lookaside list (m_lookaside), as each note has a fixed size,
and there is a strong likelihood of many notes coming and going. m_stopEvent is an event object that
will be used as a way to signal our playback thread to terminate. m_hThread is the playback thread
handle. Finally, m_counter is a semaphore that is going to be used in a somewhat counter-intuitive
way, its internal count indicating the number of notes in the queue.
As you can see, the event and semaphore don’t have wrapper classes, so we need to initialize them in
the PlaybackState constructor. Here is the constructor in full (in PlaybackState.cpp) with an addition
of a type that is going to hold a single node:
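The FullNote type extends Note with a LIST_ENTRY so it can be linked into the driver’s list (a minimal
definition, consistent with its use below):

struct FullNote : Note {
    LIST_ENTRY Link;
};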
PlaybackState::PlaybackState() {
InitializeListHead(&m_head);
KeInitializeSemaphore(&m_counter, 0, 1000);
KeInitializeEvent(&m_stopEvent, SynchronizationEvent, FALSE);
ExInitializePagedLookasideList(&m_lookaside, nullptr, nullptr, 0,
sizeof(FullNote), DRIVER_TAG, 0);
}
Before the driver finally unloads, the PlaybackState object is going to be destroyed, invoking its
destructor:
PlaybackState::~PlaybackState() {
Stop();
ExDeletePagedLookasideList(&m_lookaside);
}
The call to Stop signals the playback thread to terminate as we’ll see shortly. The only other thing
left to do in terms of cleanup is to free the lookaside list.
The unload routine for the driver is similar to ones we’ve seen before with the addition of freeing the
PlaybackState object:
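A sketch of such an unload routine (assuming the symbolic link name used in DriverEntry):

void MelodyUnload(PDRIVER_OBJECT DriverObject) {
    delete g_State;

    UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\KMelody");
    IoDeleteSymbolicLink(&symLink);
    IoDeleteDevice(DriverObject->DeviceObject);
}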
The IRP_MJ_DEVICE_CONTROL handler is where notes provided by a client need to be added to the
queue of notes to play. The implementation is pretty straightforward because the heavy lifting is
performed by the PlaybackState::AddNotes method. Here is MelodyDeviceControl that validates
the client’s data and then invokes AddNotes:
NTSTATUS MelodyDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
auto irpSp = IoGetCurrentIrpStackLocation(Irp);
auto const& dic = irpSp->Parameters.DeviceIoControl;
auto status = STATUS_INVALID_DEVICE_REQUEST;
ULONG_PTR info = 0;
switch (dic.IoControlCode) {
case IOCTL_MELODY_PLAY:
if (dic.InputBufferLength == 0 ||
dic.InputBufferLength % sizeof(Note) != 0) {
status = STATUS_INVALID_BUFFER_SIZE;
break;
}
auto data = (Note*)Irp->AssociatedIrp.SystemBuffer;
if (data == nullptr) {
status = STATUS_INVALID_PARAMETER;
break;
}
status = g_State->AddNotes(data,
dic.InputBufferLength / sizeof(Note));
if (!NT_SUCCESS(status))
break;
info = dic.InputBufferLength;
break;
}
return CompleteRequest(Irp, status, info);
}
CompleteRequest is a helper that we’ve seen before that completes the IRP with the given status and
information:
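A minimal version, consistent with how it’s used in this chapter (the default arguments are an
assumption):

NTSTATUS CompleteRequest(PIRP Irp, NTSTATUS status = STATUS_SUCCESS,
    ULONG_PTR info = 0) {
    Irp->IoStatus.Status = status;
    Irp->IoStatus.Information = info;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return status;
}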
PlaybackState::AddNotes needs to iterate over the provided notes. Here is the beginning of the
function:
For each note, it needs to allocate a FullNote structure from the lookaside list:
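A sketch of that opening (the signature is an assumption, consistent with the call made from
MelodyDeviceControl):

NTSTATUS PlaybackState::AddNotes(const Note* notes, ULONG count) {
    for (ULONG i = 0; i < count; i++) {
        auto fullNote = (FullNote*)ExAllocateFromPagedLookasideList(&m_lookaside);
        if (fullNote == nullptr)
            return STATUS_INSUFFICIENT_RESOURCES;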
If successful, the note data is copied to the FullNote and is added to the linked list under the protection
of the fast mutex:
//
// copy the data from the Note structure
//
memcpy(fullNote, &notes[i], sizeof(Note));
//
// insert into the linked list
//
Locker locker(m_lock);
InsertTailList(&m_head, &fullNote->Link);
}
Locker<T> is the same type we looked at in chapter 6. The notes are inserted at the back of the list
with InsertTailList. This is where we must provide a pointer to a LIST_ENTRY object, which is why
FullNote objects are used instead of just Note. Finally, when the loop is completed, the semaphore
must be incremented by the number of notes to indicate there are count more notes to play:
//
// make the semaphore signaled (if it wasn't already) to
// indicate there are new note(s) to play
//
KeReleaseSemaphore(&m_counter, 2, count, FALSE);
KdPrint((DRIVER_PREFIX "Semaphore count: %u\n",
KeReadStateSemaphore(&m_counter)));
The value 2 used in KeReleaseSemaphore is the temporary priority boost a driver can provide to a
thread that is released because of the semaphore becoming signaled (the same thing happens with the
second parameter to IoCompleteRequest). I’ve chosen the value 2 arbitrarily. The value 0 (IO_NO_INCREMENT) is fine as well.
For debugging purposes, it may be useful to read the semaphore’s count with KeReadStateSemaphore
as was done in the above code. Here is the full function (without the comments):
NTSTATUS PlaybackState::AddNotes(const Note* notes, ULONG count) {
    for (ULONG i = 0; i < count; i++) {
        auto fullNote = (FullNote*)ExAllocateFromPagedLookasideList(&m_lookaside);
        if (fullNote == nullptr)
            return STATUS_INSUFFICIENT_RESOURCES;

        memcpy(fullNote, &notes[i], sizeof(Note));

        Locker locker(m_lock);
        InsertTailList(&m_head, &fullNote->Link);
    }
    KeReleaseSemaphore(&m_counter, 2, count, FALSE);
    KdPrint((DRIVER_PREFIX "Semaphore count: %u\n",
        KeReadStateSemaphore(&m_counter)));
    return STATUS_SUCCESS;
}
The next part to look at is handling IRP_MJ_CREATE and IRP_MJ_CLOSE. In earlier chapters, we just
completed these IRPs successfully and that was it. This time, we need to create the playback thread
when the first client opens a handle to our device. The initialization in DriverEntry points both
indices to the same function, but the code is slightly different between the two. We could separate
them to different functions, but if the difference is not great we might decide to handle both within
the same function.
For IRP_MJ_CLOSE, there is nothing to do but complete the IRP successfully. For IRP_MJ_CREATE, we
want to start the playback thread the first time the dispatch routine is invoked. Here is the code:
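A sketch consistent with the description below (CompleteRequest as defined earlier):

NTSTATUS MelodyCreateClose(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    auto status = STATUS_SUCCESS;
    auto irpSp = IoGetCurrentIrpStackLocation(Irp);
    if (irpSp->MajorFunction == IRP_MJ_CREATE) {
        //
        // make sure the playback thread is running
        //
        status = g_State->Start(DeviceObject);
    }
    return CompleteRequest(Irp, status);
}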
The I/O stack location contains the IRP major function code we can use to make the distinction as
required here. In the Create case, we call PlaybackState::Start with the device object pointer that
would be used to keep the driver object alive as long as the thread is running. Let’s see what that
method looks like.
NTSTATUS PlaybackState::Start(PVOID IoObject) {
    Locker locker(m_lock);
    if (m_hThread)
        return STATUS_SUCCESS;

    return IoCreateSystemThread(
        IoObject,               // Driver or device object
        &m_hThread,             // resulting handle
        THREAD_ALL_ACCESS,      // access mask
        nullptr,                // no object attributes required
        NtCurrentProcess(),     // create in the current process
        nullptr,                // returned client ID
        PlayMelody,             // thread function
        this);                  // passed to thread function
}
Acquiring the fast mutex ensures that a second thread is not created (as m_hThread would al-
ready be non-NULL). The thread is created with IoCreateSystemThread, which is preferred over
PsCreateSystemThread because it ensures that the driver is not unloaded while the thread is
executing (this does require Windows 8 or later).
The passed-in I/O object is the device object provided by the IRP_MJ_CREATE handler. The most
common way of creating a thread by a driver is to run it in the context of the System process, as it
normally should not be tied to a user-mode process. Our case, however, is more complicated because
we intend to use the Beep driver to play the notes. The Beep driver needs to be able to handle multiple
users (that might be connected to the same system), each one playing their own sounds. This is why
when asked to play a note, the Beep driver plays in the context of the caller’s session. If we create
the thread in the System process, which is always part of session zero, we will not hear any sound,
because session 0 is not an interactive user session.
This means we need to create our thread in the context of some process running under the caller’s
session - Using the caller’s process directly (NtCurrentProcess) is the simplest way to get it working.
You may frown at this, and rightly so, because the first process calling the driver to play something is
going to have to host that thread for the lifetime of the driver. This has an unintended side effect: the
process will not die. Even if it may seem to terminate, it will still show up in Task Manager with our
thread being the single thread still keeping the process alive. We’ll find a more elegant solution later
in this chapter.
Yet another consequence of this arrangement is that we only handle one session - the first one where
one of its processes happens to call the driver. We’ll fix that as well later on.
The thread created starts running the PlayMelody function - a static function in the PlaybackState
class. Callbacks must be global or static functions (because they are directly C function pointers), but
in this case we would like to access the members of this instance of PlaybackState. The common
trick is to pass the this pointer as the thread argument, and the callback simply invokes an instance
method using this pointer:
// static function
void PlaybackState::PlayMelody(PVOID context) {
((PlaybackState*)context)->PlayMelody();
}
Now the instance method PlaybackState::PlayMelody has full access to the object’s members.
There is another way to invoke the instance method without going through the inter-
mediate static by using C++ lambda functions, as non-capturing lambdas are directly
convertible to C function pointers:
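For example, a sketch of the same IoCreateSystemThread call using a lambda:

return IoCreateSystemThread(IoObject, &m_hThread, THREAD_ALL_ACCESS,
    nullptr, NtCurrentProcess(), nullptr,
    [](PVOID context) {
        ((PlaybackState*)context)->PlayMelody();
    }, this);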
The first order of business in the new thread is to obtain a pointer to the Beep device using IoGetDeviceObjectPointer:
#include <ntddbeep.h>
void PlaybackState::PlayMelody() {
PDEVICE_OBJECT beepDevice;
UNICODE_STRING beepDeviceName = RTL_CONSTANT_STRING(DD_BEEP_DEVICE_NAME_U);
PFILE_OBJECT beepFileObject;
auto status = IoGetDeviceObjectPointer(&beepDeviceName, GENERIC_WRITE,
&beepFileObject, &beepDevice);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "Failed to locate beep device (0x%X)\n",
status));
return;
}
The Beep device name is \Device\Beep as we’ve seen in chapter 2. Conveniently, the provided header
ntddbeep.h declares everything we need in order to work with the device, such as the
DD_BEEP_DEVICE_NAME_U macro that defines the Unicode name.
At this point, the thread should loop around while it has notes to play and has not been instructed
to terminate. This is where the semaphore and the event come in. The thread must wait until one of
them is signaled. If it’s the event, it should break out of the loop. If it’s the semaphore, it means the
semaphore’s count is greater than zero, which in turn means the list of notes is not empty:
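The two objects go into an array; note the order, with the stop event at index 1, matching the
STATUS_WAIT_1 check below:

PVOID objects[] = { &m_counter, &m_stopEvent };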
for (;;) {
status = KeWaitForMultipleObjects(2, objects, WaitAny, Executive,
KernelMode, FALSE, nullptr, nullptr);
if (status == STATUS_WAIT_1) {
KdPrint((DRIVER_PREFIX "Stop event signaled. Exiting thread...\n"));
break;
}
The required function call is KeWaitForMultipleObjects with the event and semaphore. They
are put in an array, since this is the requirement for KeWaitForMultipleObjects. If the returned
status is STATUS_WAIT_1 (which is the same as STATUS_WAIT_0 + 1), meaning index number 1 is the
signaled object, the loop is exited with a break instruction.
Now we need to extract the next note to play:
PLIST_ENTRY link;
{
Locker locker(m_lock);
link = RemoveHeadList(&m_head);
NT_ASSERT(link != &m_head);
}
auto note = CONTAINING_RECORD(link, FullNote, Link);
KdPrint((DRIVER_PREFIX "Playing note Freq: %u Dur: %u Rep: %u Delay: %u\n",
note->Frequency, note->Duration, note->Repeat, note->Delay));
We remove the head item from the list, doing so under the fast mutex’s protection. The assert
ensures we are in a consistent state - remember that removing an item from an empty list returns the
pointer to its head.
The actual FullNote pointer is retrieved with the help of the CONTAINING_RECORD macro, which converts
the LIST_ENTRY pointer we received from RemoveHeadList to the containing FullNote that we are
actually interested in.
The next step is to handle the note. If the note’s frequency is zero, let’s consider that as a “silence time”
whose length is provided by Duration:
if (note->Frequency == 0) {
//
// just do a delay
//
NT_ASSERT(note->Duration > 0);
LARGE_INTEGER interval;
interval.QuadPart = -10000LL * note->Duration;
KeDelayExecutionThread(KernelMode, FALSE, &interval);
}
KeDelayExecutionThread is the rough equivalent of the Sleep/SleepEx APIs from user-mode. Here
is its declaration:
NTSTATUS KeDelayExecutionThread (
_In_ KPROCESSOR_MODE WaitMode,
_In_ BOOLEAN Alertable,
_In_ PLARGE_INTEGER Interval);
We’ve seen all these parameters as part of the wait functions. The most common invocation is with
KernelMode and FALSE for WaitMode and Alertable, respectively. The interval is the most important
parameter, where negative values mean relative wait in 100nsec units. Converting from milliseconds
means multiplying by -10000, which is what you see in the above code.
If the frequency in the note is not zero, then we need to call the Beep driver with proper IRP. We
already know that we need the IOCTL_BEEP_SET control code (defined in ntddbeep.h) and the BEEP_-
SET_PARAMETERS structure. All we need to do is build an IRP with the correct information using
IoBuildDeviceIoControlRequest, and send it to the beep device with IoCallDriver:
else {
params.Duration = note->Duration;
params.Frequency = note->Frequency;
int count = max(1, note->Repeat);
KEVENT doneEvent;
KeInitializeEvent(&doneEvent, NotificationEvent, FALSE);
We loop around based on the Repeat member (which is usually 1). Then the IRP_MJ_DEVICE_CONTROL
IRP is built with IoBuildDeviceIoControlRequest, supplying the frequency to play and the
duration. Then, IoCallDriver is invoked with the Beep device pointer we obtained earlier, and the
IRP. Unfortunately (or fortunately, depending on your perspective), the Beep driver just starts the
operation, but does not wait for it to finish. It might (and in fact, always does) return STATUS_PENDING
from the IoCallDriver call, which means the operation is not yet complete (the actual playing has
not yet begun). Since we don’t have anything else to do until then, the doneEvent event provided to
IoBuildDeviceIoControlRequest is signaled automatically by the I/O manager when the operation
completes - so we wait on the event.
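Here is a sketch of that loop (the exact structure is an approximation of the full listing that follows;
IOCTL_BEEP_SET and BEEP_SET_PARAMETERS come from ntddbeep.h):

for (int i = 0; i < count; i++) {
    IO_STATUS_BLOCK ioStatus;
    auto irp = IoBuildDeviceIoControlRequest(IOCTL_BEEP_SET, beepDevice,
        &params, sizeof(params), nullptr, 0, FALSE, &doneEvent, &ioStatus);
    if (irp == nullptr)
        break;

    status = IoCallDriver(beepDevice, irp);
    if (status == STATUS_PENDING)
        KeWaitForSingleObject(&doneEvent, Executive, KernelMode, FALSE, nullptr);

    // delay for the note's duration (and between repeats) - shown next
}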
Now that the sound is playing, we have to wait for the duration of that note with KeDelayExecutionThread:
LARGE_INTEGER delay;
delay.QuadPart = -10000LL * note->Duration;
KeDelayExecutionThread(KernelMode, FALSE, &delay);
Finally, if Repeat is greater than one, then we might need to wait between plays of the same note:
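For example (a sketch; the check and the fields used follow the Note definition):

if (i < count - 1 && note->Delay != 0) {
    LARGE_INTEGER pause;
    pause.QuadPart = -10000LL * note->Delay;
    KeDelayExecutionThread(KernelMode, FALSE, &pause);
}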
At this point, the note data can be freed (or just returned to the lookaside list) and the code loops back
to wait for the availability of the next note:
ExFreeToPagedLookasideList(&m_lookaside, note);
}
The loop continues until the thread is instructed to stop by signaling m_stopEvent, at which point
it breaks from the infinite loop and cleans up by dereferencing the file object obtained from
IoGetDeviceObjectPointer:
ObDereferenceObject(beepFileObject);
}
Here is the entire thread function for convenience (comments and KdPrint removed):
void PlaybackState::PlayMelody() {
    PDEVICE_OBJECT beepDevice;
    UNICODE_STRING beepDeviceName = RTL_CONSTANT_STRING(DD_BEEP_DEVICE_NAME_U);
    PFILE_OBJECT beepFileObject;
    auto status = IoGetDeviceObjectPointer(&beepDeviceName, GENERIC_WRITE,
        &beepFileObject, &beepDevice);
    if (!NT_SUCCESS(status)) {
        return;
    }

    PVOID objects[] = { &m_counter, &m_stopEvent };
    BEEP_SET_PARAMETERS params;

    for (;;) {
        status = KeWaitForMultipleObjects(2, objects, WaitAny, Executive,
            KernelMode, FALSE, nullptr, nullptr);
        if (status == STATUS_WAIT_1) {
            break;
        }

        PLIST_ENTRY link;
        {
            Locker locker(m_lock);
            link = RemoveHeadList(&m_head);
            NT_ASSERT(link != &m_head);
        }
        auto note = CONTAINING_RECORD(link, FullNote, Link);
        if (note->Frequency == 0) {
            NT_ASSERT(note->Duration > 0);
            LARGE_INTEGER interval;
            interval.QuadPart = -10000LL * note->Duration;
            KeDelayExecutionThread(KernelMode, FALSE, &interval);
        }
        else {
            params.Duration = note->Duration;
            params.Frequency = note->Frequency;
            int count = max(1, note->Repeat);
            KEVENT doneEvent;
            KeInitializeEvent(&doneEvent, SynchronizationEvent, FALSE);
            LARGE_INTEGER delay;
            delay.QuadPart = -10000LL * note->Duration;

            for (int i = 0; i < count; i++) {
                IO_STATUS_BLOCK ioStatus;
                auto irp = IoBuildDeviceIoControlRequest(IOCTL_BEEP_SET,
                    beepDevice, &params, sizeof(params), nullptr, 0,
                    FALSE, &doneEvent, &ioStatus);
                if (irp == nullptr)
                    break;

                status = IoCallDriver(beepDevice, irp);
                if (status == STATUS_PENDING)
                    KeWaitForSingleObject(&doneEvent, Executive,
                        KernelMode, FALSE, nullptr);

                KeDelayExecutionThread(KernelMode, FALSE, &delay);

                if (i < count - 1 && note->Delay != 0) {
                    LARGE_INTEGER pause;
                    pause.QuadPart = -10000LL * note->Delay;
                    KeDelayExecutionThread(KernelMode, FALSE, &pause);
                }
            }
        }
        ExFreeToPagedLookasideList(&m_lookaside, note);
    }
    ObDereferenceObject(beepFileObject);
}
The last piece of the puzzle is the PlaybackState::Stop method that signals the thread to exit:
void PlaybackState::Stop() {
if (m_hThread) {
//
// signal the thread to stop
//
KeSetEvent(&m_stopEvent, 2, FALSE);
//
// wait for the thread to exit
//
PVOID thread;
auto status = ObReferenceObjectByHandle(m_hThread, SYNCHRONIZE,
*PsThreadType, KernelMode, &thread, nullptr);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "ObReferenceObjectByHandle error (0x%X)\n",
status));
}
else {
KeWaitForSingleObject(thread, Executive, KernelMode, FALSE, nullptr);
ObDereferenceObject(thread);
}
ZwClose(m_hThread);
m_hThread = nullptr;
}
}
If the thread exists (m_hThread is non-NULL), then we set the event (KeSetEvent). Then we
wait for the thread to actually terminate. This is technically unnecessary because the thread was
created with IoCreateSystemThread, so there is no danger the driver is unloaded prematurely.
Still, it’s worthwhile showing how to get the pointer to the thread object given a handle (since
KeWaitForSingleObject requires an object). It’s important to remember to call ObDereferenceObject
once we don’t need the pointer anymore, or the thread object will remain alive forever (keeping its
process and other resources alive as well).
Client Code
Here are some examples for invoking the driver (error handling omitted):
#include <Windows.h>
#include <stdio.h>
#include "..\KMelody\MelodyPublic.h"
int main() {
HANDLE hDevice = CreateFile(MELODY_SYMLINK, GENERIC_WRITE, 0,
nullptr, OPEN_EXISTING, 0, nullptr);
Note notes[10];
for (int i = 0; i < _countof(notes); i++) {
notes[i].Frequency = 400 + i * 30;
notes[i].Duration = 500;
}
DWORD bytes;
DeviceIoControl(hDevice, IOCTL_MELODY_PLAY, notes, sizeof(notes),
nullptr, 0, &bytes, nullptr);
CloseHandle(hDevice);
return 0;
}
I recommend you build the driver and the client and test them. The project names are KMelody and
Melody in the solution for this chapter. Build your own music!
NtQuerySystemInformation is declared in the user-mode header Winternl.h like so:
NTSTATUS NtQuerySystemInformation (
IN SYSTEM_INFORMATION_CLASS SystemInformationClass,
OUT PVOID SystemInformation,
IN ULONG SystemInformationLength,
OUT PULONG ReturnLength OPTIONAL);
The macros IN and OUT expand to nothing. These were used in the old days before SAL was invented
to provide some semantics for developers. For some reason, Winternl.h uses these macros rather than
the modern SAL annotations.
We can copy this definition and tweak it a bit by turning it into its Zw variant, more suitable for
kernel callers. The SYSTEM_INFORMATION_CLASS enumeration and associated data structures are the
real data we’re after. Some values are provided in user-mode and/or kernel-mode headers. Most of
the values have been “reverse engineered” and can be found in open source projects, such as Process
Hacker². Although these APIs might not be officially documented, they are unlikely to change as
Microsoft’s own tools depend on many of them.
If the API in question only exists in certain Windows versions, it’s possible to query dynamically for
the existence of a kernel API with MmGetSystemRoutineAddress:
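For example, a sketch (the function pointer type here is hypothetical and must match the declaration
being used):

using PZwQuerySystemInformation = NTSTATUS (NTAPI*)(
    SYSTEM_INFORMATION_CLASS, PVOID, ULONG, PULONG);

UNICODE_STRING routineName = RTL_CONSTANT_STRING(L"ZwQuerySystemInformation");
auto pZwQSI = (PZwQuerySystemInformation)MmGetSystemRoutineAddress(&routineName);
if (pZwQSI == nullptr) {
    // the routine is not available on this system
}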
Another commonly-needed native API is NtQueryInformationProcess, declared like so:
NTSTATUS NTAPI NtQueryInformationProcess (
IN HANDLE ProcessHandle,
IN PROCESSINFOCLASS ProcessInformationClass,
OUT PVOID ProcessInformation,
IN ULONG ProcessInformationLength,
OUT PULONG ReturnLength OPTIONAL);
Curiously enough, the kernel-mode headers provide many of the PROCESSINFOCLASS enumeration
values, along with their associated data structures, but not the definition of this system call itself.
Here is a partial set of values for PROCESSINFOCLASS:
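For example, a few of the values (numeric values follow the standard definition):

typedef enum _PROCESSINFOCLASS {
    ProcessBasicInformation = 0,
    ProcessDebugPort = 7,
    ProcessWow64Information = 26,
    ProcessImageFileName = 27,
    ProcessBreakOnTermination = 29,
    // many more values exist
} PROCESSINFOCLASS;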
A more complete list is available in ntddk.h. A full list is available within the Process
Hacker project.
The following example shows how to query the current process image file name. ProcessImageFileName
seems to be the way to go, and it expects a UNICODE_STRING as the buffer:
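A sketch of such a query (assuming a ZwQueryInformationProcess declaration analogous to the Nt one
above, and DRIVER_TAG defined by the driver):

ULONG size = sizeof(UNICODE_STRING) + 512;  // room for the string characters
auto name = (UNICODE_STRING*)ExAllocatePool2(POOL_FLAG_PAGED, size, DRIVER_TAG);
if (name) {
    auto status = ZwQueryInformationProcess(NtCurrentProcess(),
        ProcessImageFileName, name, size, nullptr);
    if (NT_SUCCESS(status))
        KdPrint(("Image: %wZ\n", name));
    ExFreePool(name);
}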
²https://github.com/processhacker/phnt
#include <ntddk.h>
SIZE_T QuotaPagedPoolUsage;
PVOID Reserved6;
SIZE_T QuotaNonPagedPoolUsage;
SIZE_T PagefileUsage;
SIZE_T PeakPagefileUsage;
SIZE_T PrivatePageCount;
LARGE_INTEGER Reserved7[6];
} SYSTEM_PROCESS_INFORMATION, * PSYSTEM_PROCESS_INFORMATION;
Notice there are lots of “reserved” members in SYSTEM_PROCESS_INFORMATION. We’ll manage with
what we get, but you can find the full data structure in the Process Hacker project.
EnumProcesses starts by querying the number of bytes needed by calling ZwQuerySystemInformation
with a null buffer and zero size, getting the last parameter as the required size:
void EnumProcesses() {
ULONG size = 0;
ZwQuerySystemInformation(SystemProcessInformation, nullptr, 0, &size);
size += 1 << 12; // 4KB, just to make sure the next call succeeds
We want to allocate some more in case new processes are created between this call and the next “real”
call. We could write the code in a more robust way, with a loop that keeps querying until the size is
large enough, but the above approach is good enough for most purposes.
Next, we allocate the required buffer and make the call again, this time with the real buffer:
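For example (the pool type and tag are the driver's choice):

auto buffer = (UCHAR*)ExAllocatePoolWithTag(PagedPool, size, DRIVER_TAG);
if (buffer == nullptr)
return;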
if (NT_SUCCESS(ZwQuerySystemInformation(SystemProcessInformation,
buffer, size, nullptr))) {
If the call succeeds, we can start iterating. The returned pointer is to the first process, where
the next process is located NextEntryOffset bytes from this offset. The enumeration ends when
NextEntryOffset is zero:
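A sketch of that iteration (member names come from the full structure definition):

int count = 0;
auto p = (SYSTEM_PROCESS_INFORMATION*)buffer;
for (;;) {
    count++;
    KdPrint(("PID: %u Threads: %u Image: %wZ\n",
        HandleToULong(p->UniqueProcessId), p->NumberOfThreads, &p->ImageName));
    if (p->NextEntryOffset == 0)
        break;
    p = (SYSTEM_PROCESS_INFORMATION*)((UCHAR*)p + p->NextEntryOffset);
}
KdPrint(("Total processes: %d\n", count));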
We output some of the details provided in the SYSTEM_PROCESS_INFORMATION structure and count
the number of processes while we’re at it. The only thing left to do in this simple example is to clean
up:
}
ExFreePool(buffer);
}
EnumProcesses();
return STATUS_UNSUCCESSFUL;
}
Given this knowledge, we can make the KMelody driver a bit better by creating our thread in a
Csrss.exe process for the current session, instead of the first client process that comes in. This is better,
since Csrss always exists, and is in fact a critical process - one that if killed for whatever reason, causes
the system to crash.
Killing Csrss is not easy, since it’s a protected process starting with Windows 8.1, but kernel code
can certainly do that.
1. Modify the KMelody driver to create the thread in a Csrss process for the current
session. Search for Csrss with ZwQuerySystemInformation and create the thread
in that process.
2. Add support for multiple sessions, where there is one playback thread per
session. Hint: call ZwQueryInformationProcess with ProcessSessionId to
find out the session a process is part of. Manage a list of PlaybackState ob-
jects, one for each session. You can also use the undocumented (but exported)
PsGetCurrentProcessSessionId API.
Summary
In this chapter, we were introduced to some programming techniques that are useful in many types
of drivers. We’re not done with these techniques - there will be more in chapter 11. But for now, we
can begin using some kernel-provided notifications, starting with Process and Thread notifications in
the next chapter.
Chapter 9: Process and Thread
Notifications
One of the powerful mechanisms available for kernel drivers is the ability to be notified when certain
important events occur. In this chapter, we’ll look into some of these events, namely process creation
and destruction, thread creation and destruction, and image loads.
In this chapter:
• Process Notifications
• Implementing Process Notifications
• Providing Data to User Mode
• Thread Notifications
• Image Load Notifications
• Remote Thread Detection
Process Notifications
Whenever a process is created or destroyed, interested drivers can be notified by the kernel of that fact.
This allows drivers to keep track of processes, possibly associating some data with these processes. At
the very minimum, these allow drivers to monitor process creation/destruction in real-time. By “real-
time” I mean that the notifications are sent “in-line”, as part of process creation; the driver cannot
miss any processes that may be created and destroyed quickly.
For process creation, drivers also have the power to stop the process from being fully created, returning
an error to the caller that initiated process creation. This kind of power can only be directly achieved
in kernel mode.
Windows provides other mechanisms for being notified when processes are created or destroyed. For
example, using Event Tracing for Windows (ETW), such notifications can be received by a user-mode
process (running with elevated privileges). However, there is no way to prevent a process from being
created. Furthermore, ETW has an inherent notification delay of about 1-3 seconds (it uses internal
buffers for performance reasons), so a short-lived process may exit before the creation notification
arrives. Opening a handle to the created process at that time would no longer be possible.
Drivers register (and unregister) process notification callbacks with PsSetCreateProcessNotifyRoutineEx:
NTSTATUS PsSetCreateProcessNotifyRoutineEx (
_In_ PCREATE_PROCESS_NOTIFY_ROUTINE_EX NotifyRoutine,
_In_ BOOLEAN Remove);
The first argument is the driver’s callback routine, having the following prototype:
void ProcessNotifyCallback(
_Inout_ PEPROCESS Process,
_In_ HANDLE ProcessId,
_Inout_opt_ PPS_CREATE_NOTIFY_INFO CreateInfo);
• Process - the process object of the newly created process, or the process being destroyed.
• Process Id - the unique process ID of the process. Although it’s declared with type HANDLE, it’s
in fact an ID.
• CreateInfo - a structure that contains detailed information on the process being created. If the
process is being destroyed, this argument is NULL.
For process creation, the driver’s callback routine is executed by the creating thread (running as part
of the creating process). For process exit, the callback is executed by the last thread to exit the process.
In both cases, the callback is called inside a critical region (where normal kernel APCs are disabled).
Starting with Windows 10 version 1607, there is another function for process notifications: PsSetCreateProcessNotifyRoutineEx2.
This “extended” function sets up a callback similar to the previous one, but the callback is also
invoked for Pico processes. Pico processes are those used to host Linux processes for the Windows
Subsystem for Linux (WSL) version 1. If a driver is interested in such processes, it must register with
the extended function.
The data structure provided for process creation is defined like so:
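It is the PS_CREATE_NOTIFY_INFO structure, declared in ntddk.h:

typedef struct _PS_CREATE_NOTIFY_INFO {
    SIZE_T Size;
    union {
        ULONG Flags;
        struct {
            ULONG FileOpenNameAvailable : 1;
            ULONG IsSubsystemProcess : 1;
            ULONG Reserved : 30;
        };
    };
    HANDLE ParentProcessId;
    CLIENT_ID CreatingThreadId;
    struct _FILE_OBJECT *FileObject;
    PCUNICODE_STRING ImageFileName;
    PCUNICODE_STRING CommandLine;
    NTSTATUS CreationStatus;
} PS_CREATE_NOTIFY_INFO, *PPS_CREATE_NOTIFY_INFO;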
enum class ItemType : short {
    None,
    ProcessCreate,
    ProcessExit
};

struct ItemHeader {
    ItemType Type;
    USHORT Size;
    LARGE_INTEGER Time;
};
The ItemType enum defined above uses the C++ 11 scoped enum feature, where enum
values have a scope (ItemType in this case). These enums can also have a non-int size -
short in the example. If you’re using C, you can use classic enums, or even #defines if
you prefer.
The ItemHeader structure holds information common to all event types: the type of the event, the
time of the event (expressed as a 64-bit integer), and the size of the payload. The size is important, as
each event has its own information. If we later wish to pack an array of these events and (say) provide
them to a user-mode client, the client needs to know where each event ends and the next one begins.
Once we have this common header, we can derive other data structures for particular events. Let’s
start with the simplest - process exit:
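A sketch, consistent with the fields used later in this chapter:

struct ProcessExitInfo : ItemHeader {
    ULONG ProcessId;
    ULONG ExitCode;
};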
For a process exit event, there is just one interesting piece of information (besides the header and the
process ID) - the exit status (code) of the process. This is normally the value returned from a user-mode
main function.
If you’re using C, then inheritance is not available to you. However, you can simulate it
by having the first member be of type ItemHeader and then adding the specific members;
the memory layout is the same.
struct ProcessExitInfo {
ItemHeader Header;
ULONG ProcessId;
};
The type used for a process ID is ULONG - process IDs (and thread IDs) cannot be larger than 32-bit.
HANDLE is not a good idea, as user mode may be confused by it. Also, HANDLE has a different size in
a 32-bit process as opposed to a 64-bit process, so it’s best to avoid “bitness”-affected members. If
you’re familiar with user-mode programming, DWORD is a common typedef for a 32-bit unsigned
integer. It’s not used here because DWORD is not defined in the WDK headers. Although it’s pretty
easy to define it explicitly, it’s simpler just to use ULONG, which means the same thing and is defined
in user-mode and kernel-mode headers.
Since we need to store every such structure as part of a linked list, each data structure must contain
a LIST_ENTRY instance that points to the next and previous items. Since these LIST_ENTRY objects
should not be exposed to user-mode, we will define extended structures containing these entries in a
different file, that is not shared with user-mode.
There are several ways to define a “bigger” structure to hold the LIST_ENTRY. One way is to create
templated type that has a LIST_ENTRY at the beginning (or end) like so:
template<typename T>
struct FullItem {
LIST_ENTRY Entry;
T Data;
};
A templated class is used to avoid creating a multitude of types, one for each specific event type. For
example, we could create the following structure specifically for a process exit event:
struct FullProcessExitInfo {
LIST_ENTRY Entry;
ProcessExitInfo Data;
};
We could even inherit from LIST_ENTRY and then just add the ProcessExitInfo structure. But
this is not elegant, as our data has nothing to do with LIST_ENTRY, so inheriting from it is artificial
and should be avoided.
The FullItem<T> type saves the hassle of creating these individual types.
If you’re using C, then templates are not available, and you must use the above structure
approach. I’m not going to mention C again in this chapter - there is always a workaround
that can be used if you have to.
Another way to accomplish something similar, without using templates is by using a union to hold
on to all the possible variants. For example:
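A sketch of such a union (each event type becomes a member):

union ItemData {
    ProcessExitInfo ProcessExit;
    ProcessCreateInfo ProcessCreate;
    // more event types can be added here
};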
Then we just extend the list of data members in the union. The full item would be just a simple
extension:
struct FullItem {
LIST_ENTRY Entry;
ItemData Data;
};
The rest of the code uses the first option (with the template). The reader is encouraged to try the
second option.
The head of our linked list must be stored somewhere. We’ll create a data structure that will hold all
the global state of the driver, instead of creating separate global variables. Here is the definition of
our structure (in Globals.h in the sample code for this chapter):
#include "FastMutex.h"
struct Globals {
void Init(ULONG maxItems);
bool AddItem(LIST_ENTRY* entry);
LIST_ENTRY* RemoveItem();
void AddHeadItem(LIST_ENTRY* entry);
private:
LIST_ENTRY m_ItemsHead;
ULONG m_Count;
ULONG m_MaxCount;
FastMutex m_Lock;
};
m_MaxCount holds the maximum number of elements in the linked list. This will be used to prevent
the list from growing arbitrarily large if a client does not request data for a while. m_Count holds the
current number of items in the list. The list itself is initialized with the normal InitializeListHead
API. Finally, the fast mutex is initialized by invoking its own Init method as implemented in chapter
6.
// in SysMon.cpp
Globals g_State;
extern "C"
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
g_State.Init(10000);    // maximum number of items to keep (the exact limit is arbitrary)

PDEVICE_OBJECT DeviceObject = nullptr;
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\sysmon");
bool symLinkCreated = false;
auto status = STATUS_SUCCESS;
do {
UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\sysmon");
status = IoCreateDevice(DriverObject, 0, &devName,
FILE_DEVICE_UNKNOWN, 0, TRUE, &DeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to create device (0x%08X)\n",
status));
break;
}
DeviceObject->Flags |= DO_DIRECT_IO;

status = IoCreateSymbolicLink(&symLink, &devName);
if (!NT_SUCCESS(status))
break;
symLinkCreated = true;

//
// register for process notifications
// (OnProcessNotify is the name assumed here for the callback shown later)
//
status = PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, FALSE);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to register process callback (0x%08X)\n",
status));
break;
}
} while (false);
if (!NT_SUCCESS(status)) {
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
return status;
}
DriverObject->DriverUnload = SysMonUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = SysMonCreateClose;
DriverObject->MajorFunction[IRP_MJ_READ] = SysMonRead;
return status;
}
The device object’s flags are adjusted to use Direct I/O for read/write operations (DO_DIRECT_IO).
The device is created as exclusive, so that only a single client can exist to the device. This makes
sense, otherwise multiple clients might be getting data from the device, which would mean each client
getting parts of the data. In this case, I decided to prevent that by creating the device as exclusive (TRUE
value in the second to last argument). We’ll use the read dispatch routine to return event information
to a client.
The create and close dispatch routines are handled in the simplest possible way - just completing them
successfully, with the help of CompleteRequest we have encountered before:
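Something like the following minimal sketch (CompleteRequest defaults to STATUS_SUCCESS):

NTSTATUS SysMonCreateClose(PDEVICE_OBJECT, PIRP Irp) {
    return CompleteRequest(Irp);
}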
For process exit we have just the process ID we need to save, along with the header data common to
all events. First, we need to allocate storage for the full item representing this event:
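This is the same allocation that appears in the full listing later in this section:

auto info = (FullItem<ProcessExitInfo>*)ExAllocatePoolWithTag(
    PagedPool, sizeof(FullItem<ProcessExitInfo>), DRIVER_TAG);
if (info == nullptr) {
    KdPrint((DRIVER_PREFIX "failed allocation\n"));
    return;
}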
If the allocation fails, there is really nothing the driver can do, so it just returns from the callback.
Now it’s time to fill the generic information: time, item type and size, all of which are easy to get:
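A sketch of that code (item is a reference into the allocated FullItem):

auto& item = info->Data;
KeQuerySystemTimePrecise(&item.Time);
item.Type = ItemType::ProcessExit;
item.Size = sizeof(ProcessExitInfo);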
First, we dig into the data item itself (bypassing the LIST_ENTRY) with the item variable. Next, we fill
the header information: The item type is well-known, since we are in the branch handling a process
exit notification; the time can be obtained with KeQuerySystemTimePrecise that returns the current
system time (UTC, not local time) as a 64-bit integer counting from January 1, 1601 at midnight
Universal Time. Finally, the item size is constant and is the size of the user-facing data structure (not
the size of the FullItem<ProcessExitInfo>).
Notice the item variable is a reference to the data; without the reference (&), a copy would
have been created, which is not what we want.
The specific data for a process exit event consists of the process ID and the exit code. The process ID
is provided directly by the callback itself. The only thing to do is call HandleToULong so the correct
cast is used to turn a HANDLE value into an unsigned 32-bit integer. The exit code is not given directly,
but it’s easy to retrieve with PsGetProcessExitStatus:
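For example (ExitCode is the member name used in the ProcessExitInfo sketch above):

item.ProcessId = HandleToULong(ProcessId);
item.ExitCode = (ULONG)PsGetProcessExitStatus(Process);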
All that’s left to do now is add the new item to the end of our linked list. For this purpose, we’ll define
and implement a function named AddItem in the Globals class:
bool Globals::AddItem(LIST_ENTRY* entry) {
    Locker locker(m_Lock);
    if (m_Count == m_MaxCount) {
        //
        // too many items - remove the oldest one to make room
        //
        auto head = RemoveHeadList(&m_ItemsHead);
        ExFreePool(CONTAINING_RECORD(head, FullItem<ItemHeader>, Entry));
        m_Count--;
    }
    InsertTailList(&m_ItemsHead, entry);
    m_Count++;
    return true;
}
AddItem uses the Locker<T> we saw in earlier chapters to acquire the fast mutex (and release it when
the variable goes out of scope) before manipulating the linked list. Remember to set the C++ standard
to C++ 17 at least in the project’s properties so that Locker can be used without explicitly specifying
the type it works on (the compiler makes the inference).
We’ll add new items to the tail of the list. If the number of items in the list is at its maximum, the
function removes the first item (from the head) and frees it with ExFreePool, decrementing the item
count.
This is not the only way to handle the case where the number of items is too large. Feel free to use
other ways. A more “precise” way might be tracking the number of bytes used, rather than number
of items, because each item is different in size.
With AddItem implemented, we can call it from our process notify routine:
g_State.AddItem(&info->Entry);
Implement the limit by reading from the registry in DriverEntry. Hint: you can use APIs
such as ZwOpenKey or IoOpenDeviceRegistryKey and then ZwQueryValueKey. We’ll
look at these APIs more closely in chapter 11.
We choose to store the process ID, the parent process ID, and the command line. A structure with a
fixed-size command-line buffer can work, and is fairly easy to deal with because its size is known in advance.
The potential issue here is with the command line. Declaring the command line with constant size
is simple, but not ideal. If the command line is longer than allocated, the driver would have to trim
it, possibly hiding important information. If the command line is shorter than the defined limit, the
structure is wasting memory.
Storing a UNICODE_STRING for the command line would not work. First, UNICODE_STRING is not normally defined in user mode headers. Secondly (and much
worse), the internal pointer to the actual characters normally would point to system space, inaccessible
to user-mode. Thirdly, how would that string be eventually freed?
Here is another option, which we’ll use in our driver:
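A sketch of the structure (field names match their use in the callback listing below):

struct ProcessCreateInfo : ItemHeader {
    ULONG ProcessId;
    ULONG ParentProcessId;
    ULONG CreatingProcessId;
    ULONG CreatingThreadId;
    ULONG CommandLineLength;
    WCHAR CommandLine[1];   // command line characters follow the structure
};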
We’ll store the command line length and copy the actual characters at the end of the structure, starting
from CommandLine. The array size is specified as 1 just to make it easier to work with in the code. The
actual number of characters is provided by CommandLineLength.
Given this declaration, we can begin implementation for process creation (CreateInfo is non-NULL):
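A sketch of that beginning (based on the size calculation described below):

ULONG commandLineSize = 0;
if (CreateInfo->CommandLine)
    commandLineSize = CreateInfo->CommandLine->Length;

auto info = (FullItem<ProcessCreateInfo>*)ExAllocatePoolWithTag(PagedPool,
    sizeof(FullItem<ProcessCreateInfo>) + commandLineSize, DRIVER_TAG);
if (info == nullptr) {
    KdPrint((DRIVER_PREFIX "failed allocation\n"));
    return;
}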
The total size for an allocation is based on the command line length (if any). Now it’s time to fill in
the fixed-size details:
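These lines also appear in the full listing at the end of this section:

auto& item = info->Data;
KeQuerySystemTimePrecise(&item.Time);
item.Type = ItemType::ProcessCreate;
item.Size = sizeof(ProcessCreateInfo) + commandLineSize;
item.ProcessId = HandleToULong(ProcessId);
item.ParentProcessId = HandleToULong(CreateInfo->ParentProcessId);
item.CreatingProcessId = HandleToULong(
    CreateInfo->CreatingThreadId.UniqueProcess);
item.CreatingThreadId = HandleToULong(
    CreateInfo->CreatingThreadId.UniqueThread);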
The item size must be calculated to include the command line length.
Next, we need to copy the command line to the address where CommandLine begins, and set the correct
command line length:
if (commandLineSize > 0) {
memcpy(item.CommandLine, CreateInfo->CommandLine->Buffer, commandLineSize);
item.CommandLineLength = commandLineSize / sizeof(WCHAR); // len in WCHARs
}
else {
item.CommandLineLength = 0;
}
g_State.AddItem(&info->Entry);
The command line length is stored in characters, rather than bytes. This is not mandatory, of course,
but would probably be easier to use by user mode code. Notice the command line is not NULL
terminated - it’s up to the client not read too many characters. As an alternative, we can make the
string null terminated to simplify client code. In fact, if we do that, the command line length is not
even needed.
Make the command line NULL-terminated and remove the command line length.
Astute readers may notice that the calculated data length is actually one character longer
than needed, perfect for adding a NULL-terminator. Why? sizeof(ProcessCreateInfo)
includes one character of the command line.
For easier reference, here is the complete process notify callback implementation:
KeQuerySystemTimePrecise(&item.Time);
item.Type = ItemType::ProcessCreate;
item.Size = sizeof(ProcessCreateInfo) + commandLineSize;
item.ProcessId = HandleToULong(ProcessId);
item.ParentProcessId = HandleToULong(CreateInfo->ParentProcessId);
item.CreatingProcessId = HandleToULong(
CreateInfo->CreatingThreadId.UniqueProcess);
item.CreatingThreadId = HandleToULong(
CreateInfo->CreatingThreadId.UniqueThread);
if (commandLineSize > 0) {
memcpy(item.CommandLine, CreateInfo->CommandLine->Buffer,
commandLineSize);
item.CommandLineLength = commandLineSize / sizeof(WCHAR);
}
else {
item.CommandLineLength = 0;
}
g_State.AddItem(&info->Entry);
}
else {
auto info = (FullItem<ProcessExitInfo>*)ExAllocatePoolWithTag(
PagedPool, sizeof(FullItem<ProcessExitInfo>), DRIVER_TAG);
if (info == nullptr) {
KdPrint((DRIVER_PREFIX "failed allocation\n"));
return;
}
g_State.AddItem(&info->Entry);
}
}
Now we need to access our linked list and pull items from its head. We’ll add this support to the
Globals class by implementing a method that removes an item from the head and returns it. If the list
is empty, it returns NULL:
LIST_ENTRY* Globals::RemoveItem() {
Locker locker(m_Lock);
auto item = RemoveHeadList(&m_ItemsHead);
if (item == &m_ItemsHead)
return nullptr;
m_Count--;
return item;
}
If the linked list is empty, RemoveHeadList returns the head itself. It’s also possible to use IsListEmpty
to make that determination. Lastly, we can check if m_Count is zero - all these are equivalent. If there
is an item, it’s returned as a LIST_ENTRY pointer.
Back to the Read dispatch routine - we can now loop around, getting an item out, copying its data to
the user-mode buffer, until the list is empty or the buffer is full:
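Here is a sketch of the routine’s opening (consistent with the Direct I/O configuration from DriverEntry):

NTSTATUS SysMonRead(PDEVICE_OBJECT, PIRP Irp) {
    auto irpSp = IoGetCurrentIrpStackLocation(Irp);
    auto len = irpSp->Parameters.Read.Length;
    auto status = STATUS_SUCCESS;
    ULONG bytes = 0;
    NT_ASSERT(Irp->MdlAddress);     // we use Direct I/O

    auto buffer = (PUCHAR)MmGetSystemAddressForMdlSafe(Irp->MdlAddress,
        NormalPagePriority);
    if (buffer == nullptr) {
        status = STATUS_INSUFFICIENT_RESOURCES;
    }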
else {
while (true) {
auto entry = g_State.RemoveItem();
if (entry == nullptr)
break;
//
// get pointer to the actual data item
//
auto info = CONTAINING_RECORD(entry, FullItem<ItemHeader>, Entry);
auto size = info->Data.Size;
if (len < size) {
//
// user's buffer too small, insert item back
//
g_State.AddHeadItem(entry);
break;
}
memcpy(buffer, &info->Data, size);
len -= size;
buffer += size;
bytes += size;
ExFreePool(info);
}
}
return CompleteRequest(Irp, status, bytes);
Globals::RemoveItem is called to retrieve the head item (if any). Then we have to check if the
remaining bytes in the user’s buffer are enough to contain the data of this item. If not, we have to
push the item back to the head of the queue, accomplished with another method in the Globals class:
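A minimal sketch of such a method, assuming the same lock and count members used by RemoveItem:

void Globals::AddHeadItem(LIST_ENTRY* entry) {
    Locker locker(m_Lock);
    InsertHeadList(&m_ItemsHead, entry);
    m_Count++;
}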
If there is enough room in the buffer, a simple memcpy is used to copy the actual data (everything except the LIST_ENTRY) to the user's buffer. Finally, the variables are adjusted based on the size of this item and the loop repeats.
Once out of the loop, the only thing remaining is to complete the request with whatever status and
information (bytes) have been accumulated thus far.
We need to take a look at the unload routine as well. If there are items in the linked list, they must be
freed explicitly; otherwise, we have a leak on our hands:
LIST_ENTRY* entry;
while ((entry = g_State.RemoveItem()) != nullptr)
ExFreePool(CONTAINING_RECORD(entry, FullItem<ItemHeader>, Entry));
The linked list items are freed by repeatedly removing items from the list and calling ExFreePool on
each item.
#include <Windows.h>
#include <stdio.h>
#include <memory>
#include <string>
#include "..\SysMon\SysMonPublic.h"
int main() {
auto hFile = CreateFile(L"\\\\.\\SysMon", GENERIC_READ, 0,
nullptr, OPEN_EXISTING, 0, nullptr);
if (hFile == INVALID_HANDLE_VALUE)
return Error("Failed to open file");
while (true) {
DWORD bytes = 0;
// error handling omitted
ReadFile(hFile, buffer.get(), size, &bytes, nullptr);
if (bytes)
DisplayInfo(buffer.get(), bytes);
The DisplayInfo function must make sense of the buffer it’s given. Since all events start with a
common header, the function distinguishes the various events based on the ItemType. After the event
has been dealt with, the Size field in the header indicates where the next event starts:
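// (assumed opening; the buffer is walked one event at a time using the header's Size field)
void DisplayInfo(BYTE* buffer, DWORD size) {
    while (size > 0) {
        auto header = (ItemHeader*)buffer;
        switch (header->Type) {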
case ItemType::ProcessCreate:
{
DisplayTime(header->Time);
auto info = (ProcessCreateInfo*)buffer;
std::wstring commandline(info->CommandLine,
info->CommandLineLength);
printf("Process %u Created. Command line: %ws\n",
info->ProcessId, commandline.c_str());
break;
}
}
buffer += header->Size;
size -= header->Size;
}
To extract the command line properly, the code uses the C++ wstring class constructor that can build
a string based on a pointer and the string length. The DisplayTime helper function formats the time
in a human-readable way:
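// (assumed signature and local used by the lines below)
void DisplayTime(const LARGE_INTEGER& time) {
    FILETIME local;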
//
// convert to local time first (KeQuerySystemTime(Precise) returns UTC)
//
FileTimeToLocalFileTime((FILETIME*)&time, &local);
SYSTEMTIME st;
FileTimeToSystemTime(&local, &st);
printf("%02d:%02d:%02d.%03d: ",
st.wHour, st.wMinute, st.wSecond, st.wMilliseconds);
}
SYSTEMTIME is a convenient structure to work with, as it contains all ingredients of a date and time.
In the above code, only the time is displayed, but the date components are present as well.
That’s all we need to begin testing the driver and the client.
The driver can be installed and started as done in earlier chapters, similar to the following:
sc start sysmon
16:18:53.836: Thread 12456 Created in process 10720
16:18:58.159: Process 10404 Exited (Code: 1)
16:19:02.033: Process 6216 Exited (Code: 0)
16:19:28.163: Process 9360 Exited (Code: 0)
Thread Notifications
The kernel provides thread creation and destruction callbacks, similarly to process callbacks. The API
to use for registration is PsSetCreateThreadNotifyRoutine and for unregistering there is another
API, PsRemoveCreateThreadNotifyRoutine:
NTSTATUS PsSetCreateThreadNotifyRoutine(
_In_ PCREATE_THREAD_NOTIFY_ROUTINE NotifyRoutine);
NTSTATUS PsRemoveCreateThreadNotifyRoutine (
_In_ PCREATE_THREAD_NOTIFY_ROUTINE NotifyRoutine);
The arguments provided to the callback routine are the process ID, thread ID and whether the thread
is being created or destroyed:
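The callback has the following prototype:

typedef void (*PCREATE_THREAD_NOTIFY_ROUTINE)(
    _In_ HANDLE ProcessId,
    _In_ HANDLE ThreadId,
    _In_ BOOLEAN Create);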
If a thread is created, the callback is executed by the creator thread; if the thread exits, the callback
executes on that thread.
We’ll extend the existing SysMon driver to receive thread notifications as well as process notifications.
First, we’ll add enum values for thread events and a structure representing the information, all in the
SysMonCommon.h header file:
struct ThreadCreateInfo : ItemHeader {
    ULONG ThreadId;
    ULONG ProcessId;
};
It’s convenient to have ThreadExitInfo inherit from ThreadCreateInfo, as they share the thread
and process IDs. It’s certainly not mandatory, but it makes the thread notification callback a bit simpler
to write.
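Under that scheme, the exit structure only adds the exit code (a sketch consistent with the callback code below):

struct ThreadExitInfo : ThreadCreateInfo {
    ULONG ExitCode;
};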
Now we can add the proper registration to DriverEntry, right after registering for process notifications:
status = PsSetCreateThreadNotifyRoutine(OnThreadNotify);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to set thread callbacks (0x%08X)\n",
status));
break;
}
// in SysMonUnload
PsRemoveCreateThreadNotifyRoutine(OnThreadNotify);
The callback routine itself is simpler than the process notification callback, since the event structures
have fixed sizes. Here is the thread callback routine in its entirety:
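// (assumed opening; the larger ThreadExitInfo is allocated so it can serve both event kinds)
void OnThreadNotify(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create) {
    auto info = (FullItem<ThreadExitInfo>*)ExAllocatePoolWithTag(
        PagedPool, sizeof(FullItem<ThreadExitInfo>), DRIVER_TAG);
    if (info == nullptr) {
        KdPrint((DRIVER_PREFIX "failed allocation\n"));
        return;
    }
    auto& item = info->Data;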
KeQuerySystemTimePrecise(&item.Time);
item.Size = Create ? sizeof(ThreadCreateInfo) : sizeof(ThreadExitInfo);
item.Type = Create ? ItemType::ThreadCreate : ItemType::ThreadExit;
item.ProcessId = HandleToULong(ProcessId);
item.ThreadId = HandleToULong(ThreadId);
if (!Create) {
PETHREAD thread;
if (NT_SUCCESS(PsLookupThreadByThreadId(ThreadId, &thread))) {
item.ExitCode = PsGetThreadExitStatus(thread);
ObDereferenceObject(thread);
}
}
g_State.AddItem(&info->Entry);
}
Most of this code should look pretty familiar. The slightly complex part is retrieving the thread exit code.
PsGetThreadExitStatus can be used for that, but that API requires a thread object pointer rather
than an ID. PsLookupThreadByThreadId is used to obtain the thread object that is passed to
PsGetThreadExitStatus. It’s important to remember to call ObDereferenceObject on the thread
object or else it will linger in memory until the next system restart.
To complete the implementation, we’ll add code to the client that knows how to display thread creation
and destruction (in the switch block inside DisplayInfo):
case ItemType::ThreadCreate:
{
DisplayTime(header->Time);
auto info = (ThreadCreateInfo*)buffer;
printf("Thread %u Created in process %u\n",
info->ThreadId, info->ProcessId);
break;
}
case ItemType::ThreadExit:
{
DisplayTime(header->Time);
auto info = (ThreadExitInfo*)buffer;
printf("Thread %u Exited from process %u (Code: %u)\n",
info->ThreadId, info->ProcessId, info->ExitCode);
break;
}
Here is some sample output given the updated driver and client:
Add client code that displays the process image name for thread create and exit.
NTSTATUS PsSetCreateThreadNotifyRoutineEx(
_In_ PSCREATETHREADNOTIFYTYPE NotifyType,
_In_ PVOID NotifyInformation); // PCREATE_THREAD_NOTIFY_ROUTINE
Using PsCreateThreadNotifyNonSystem indicates the callback for new threads should execute on
the newly created thread, rather than the creator.
NTSTATUS PsSetLoadImageNotifyRoutine(
_In_ PLOAD_IMAGE_NOTIFY_ROUTINE NotifyRoutine);
NTSTATUS PsRemoveLoadImageNotifyRoutine(
_In_ PLOAD_IMAGE_NOTIFY_ROUTINE NotifyRoutine);
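The callback itself has the following prototype:

typedef VOID (*PLOAD_IMAGE_NOTIFY_ROUTINE)(
    _In_opt_ PUNICODE_STRING FullImageName,
    _In_ HANDLE ProcessId,      // PID into which the image is being mapped
    _In_ PIMAGE_INFO ImageInfo);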
The FullImageName argument is somewhat tricky. As indicated by the SAL annotation, it's optional and can be NULL. Even if it's not NULL, it doesn't always produce the correct image file name before Windows 10. The reasons for that are rooted deep in the kernel, its I/O system, and the file system cache. In most cases, this works fine, and the format of the path is the internal NT format, starting with something like “\Device\HarddiskVolumeX\…” rather than “c:\…”. Translation can be done in a few ways; we'll see one way when we look at the client code.
The ProcessId argument is the process ID into which the image is loaded. For drivers (kernel modules),
this value is zero.
The ImageInfo argument contains additional information on the image, declared as follows:
#define IMAGE_ADDRESSING_MODE_32BIT 3
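typedef struct _IMAGE_INFO {
    union {
        ULONG Properties;
        struct {
            ULONG ImageAddressingMode  : 8;  // Code addressing mode
            ULONG SystemModeImage      : 1;  // System mode image
            ULONG ImageMappedToAllPids : 1;  // Image mapped into all processes
            ULONG ExtendedInfoPresent  : 1;  // IMAGE_INFO_EX available
            ULONG MachineTypeMismatch  : 1;  // Architecture type mismatch
            ULONG ImageSignatureLevel  : 4;  // Signature level
            ULONG ImageSignatureType   : 3;  // Signature type
            ULONG ImagePartialMap      : 1;  // Nonzero if this is a partial map
            ULONG Reserved             : 12;
        };
    };
    PVOID ImageBase;
    ULONG ImageSelector;
    SIZE_T ImageSize;
    ULONG ImageSectionNumber;
} IMAGE_INFO, *PIMAGE_INFO;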
• SystemModeImage - this flag is set for a kernel image, and unset for a user mode image.
• ImageSignatureLevel - signing level for Protected Processes Light (PPL) (Windows 8.1 and later). See the SE_SIGNING_LEVEL_ constants in the WDK.
• ImageSignatureType - signature type for PPL (Windows 8.1 and later). See the SE_IMAGE_SIGNATURE_TYPE enumeration in the WDK.
• ImageBase - the virtual address into which the image is loaded.
• ImageSize - the size of the image.
• ExtendedInfoPresent - if this flag is set, then IMAGE_INFO is part of a larger structure, IMAGE_INFO_EX, shown here:
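typedef struct _IMAGE_INFO_EX {
    SIZE_T              Size;
    IMAGE_INFO          ImageInfo;
    struct _FILE_OBJECT *FileObject;
} IMAGE_INFO_EX, *PIMAGE_INFO_EX;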
To access this larger structure, a driver can use the CONTAINING_RECORD macro like so:
if (ImageInfo->ExtendedInfoPresent) {
auto exinfo = CONTAINING_RECORD(ImageInfo, IMAGE_INFO_EX, ImageInfo);
// access FileObject
}
The extended structure adds just one meaningful member - the file object used to open the image. This may be useful for retrieving the file name on pre-Windows 10 machines, as we'll soon see.
As with the process and thread notifications, we’ll add the needed code to register in DriverEntry
and the code to unregister in the Unload routine. Here is the full DriverEntry function (with KdPrint
calls removed for brevity):
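// (assumed opening, consistent with the variables referenced in the cleanup code below)
extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
    auto status = STATUS_SUCCESS;
    PDEVICE_OBJECT DeviceObject = nullptr;
    UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\sysmon");
    bool symLinkCreated = false, processCallbacks = false, threadCallbacks = false;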
do {
UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\sysmon");
status = IoCreateDevice(DriverObject, 0, &devName,
FILE_DEVICE_UNKNOWN, 0, TRUE, &DeviceObject);
if (!NT_SUCCESS(status)) {
break;
}
DeviceObject->Flags |= DO_DIRECT_IO;
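        // (assumed, as in the earlier version of this driver)
        status = IoCreateSymbolicLink(&symLink, &devName);
        if (!NT_SUCCESS(status))
            break;
        symLinkCreated = true;

        status = PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, FALSE);
        if (!NT_SUCCESS(status))
            break;
        processCallbacks = true;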
status = PsSetCreateThreadNotifyRoutine(OnThreadNotify);
if (!NT_SUCCESS(status)) {
break;
}
threadCallbacks = true;
status = PsSetLoadImageNotifyRoutine(OnImageLoadNotify);
if (!NT_SUCCESS(status)) {
break;
}
} while (false);
if (!NT_SUCCESS(status)) {
if (threadCallbacks)
PsRemoveCreateThreadNotifyRoutine(OnThreadNotify);
if (processCallbacks)
PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, TRUE);
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
return status;
}
g_State.Init(10000);
DriverObject->DriverUnload = SysMonUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = SysMonCreateClose;
DriverObject->MajorFunction[IRP_MJ_READ] = SysMonRead;
return status;
}
As before, we need a structure to contain the information we can get from image load:
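A sketch of such a structure (field names chosen to match the client code shown later):

const int MaxImageFileLength = 300;     // assumed maximum path length to store

struct ImageLoadInfo : ItemHeader {
    ULONG ProcessId;
    ULONG64 LoadAddress;
    ULONG64 ImageSize;
    WCHAR ImageFileName[MaxImageFileLength];
};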
For variety, ImageLoadInfo uses a fixed size array to store the path to the image file. The interested
reader should change that to use a scheme similar to process create notifications.
The image load notification starts by not storing information on kernel images:
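// (assumed opening) kernel images are reported with a NULL process ID - skip them
void OnImageLoadNotify(PUNICODE_STRING FullImageName, HANDLE ProcessId, PIMAGE_INFO ImageInfo) {
    if (ProcessId == nullptr) {
        // system (kernel) image, ignore
        return;
    }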
This is not necessary, of course. You can remove the above check so that kernel images are reported
as well. Next, we allocate the data structure and fill in the usual information:
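    // (assumed, following the same pattern as the other notifications)
    auto info = (FullItem<ImageLoadInfo>*)ExAllocatePoolWithTag(
        PagedPool, sizeof(FullItem<ImageLoadInfo>), DRIVER_TAG);
    if (info == nullptr)
        return;

    auto& item = info->Data;
    KeQuerySystemTimePrecise(&item.Time);
    item.Type = ItemType::ImageLoad;
    item.Size = sizeof(item);
    item.ProcessId = HandleToULong(ProcessId);
    item.LoadAddress = (ULONG64)ImageInfo->ImageBase;
    item.ImageSize = ImageInfo->ImageSize;
    item.ImageFileName[0] = 0;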
The interesting part is the image path. The simplest option is to examine FullImageName, and if non-
NULL, just grab its contents. But since this information might be missing or not 100% reliable, we can
try something else first, and fall back on FullImageName if all else fails.
The secret is to use FltGetFileNameInformationUnsafe - a variant of FltGetFileNameInformation that is used with File System Mini-filters, as we'll see in chapter 12. The “Unsafe” version can be called in non-file-system contexts, as is the case here. A full discussion of FltGetFileNameInformation is saved for chapter 12. For now, let's just use it if the file object is available:
FltGetFileNameInformationUnsafe requires the file object that can be obtained from the extended
IMAGE_INFO_EX structure. wcscpy_s ensures we don’t copy more characters than are available in the
buffer. FltReleaseFileNameInformation must be called to free the PFLT_FILE_NAME_INFORMATION
object allocated by FltGetFileNameInformationUnsafe.
To gain access to these functions, add an #include for <FltKernel.h> and add FltMgr.lib to the Linker Input / Additional Dependencies line.
Finally, if this method does not produce a result, we fall back to using the provided image path:
g_State.AddItem(&info->Entry);
Here is the full image load notification code for easier reference (KdPrint removed):
if (ImageInfo->ExtendedInfoPresent) {
auto exinfo = CONTAINING_RECORD(ImageInfo, IMAGE_INFO_EX, ImageInfo);
PFLT_FILE_NAME_INFORMATION nameInfo;
if (NT_SUCCESS(FltGetFileNameInformationUnsafe(
exinfo->FileObject, nullptr,
FLT_FILE_NAME_NORMALIZED | FLT_FILE_NAME_QUERY_DEFAULT,
&nameInfo))) {
wcscpy_s(item.ImageFileName, nameInfo->Name.Buffer);
FltReleaseFileNameInformation(nameInfo);
}
}
if (item.ImageFileName[0] == 0 && FullImageName) {
wcscpy_s(item.ImageFileName, FullImageName->Buffer);
}
g_State.AddItem(&info->Entry);
}
Notice the device name targets for C: and D: in figure 9-3. A file like c:\temp\mydll.dll will be reported as \Device\HarddiskVolume3\temp\mydll.dll. It would be nice if the display would show the common mappings instead of the NT device name.
One way of getting these mappings is by calling QueryDosDevice, which retrieves the target of a symbolic link stored in the “??” Object Manager directory. We are already familiar with these symbolic links, as they are valid names to pass to the CreateFile API.
Based on QueryDosDevice, we can loop over all existing drive letters and store the targets. Then, we can look up every device name and find its drive letter (symbolic link). Here is a function to do that. If we can't find a match, we'll just return the original string:
#include <unordered_map>

std::wstring GetDosNameFromNTName(PCWSTR path) {
    // build (once) a map from NT device names (\Device\HarddiskVolumeX) to drive letters
    static std::unordered_map<std::wstring, std::wstring> map;
    if (map.empty()) {
        auto drives = ::GetLogicalDrives();
        int c = 0;
        WCHAR root[] = L"X:";
        WCHAR target[128];
        while (drives) {
            if (drives & 1) {
                root[0] = WCHAR(L'A' + c);
                if (::QueryDosDevice(root, target, _countof(target)))
                    map.insert({ target, root });
            }
            drives >>= 1;
            c++;
        }
    }
    auto pos = wcschr(path + 1, L'\\');
    if (pos == nullptr)
        return path;

    pos = wcschr(pos + 1, L'\\');   // end of the device name
    if (pos == nullptr)
        return path;

    auto it = map.find(std::wstring(path, pos - path));
    if (it != map.end())
        return it->second + pos;

    return path;
}
I will let the interested reader figure out how this code works. In any case, since user-mode is not the
focus of this book, you can just use the function as is, as we’ll do in our client.
Here is the part in DisplayInfo that handles image load notifications (within the switch):
case ItemType::ImageLoad:
{
DisplayTime(header->Time);
auto info = (ImageLoadInfo*)buffer;
printf("Image loaded into process %u at address 0x%llX (%ws)\n",
info->ProcessId, info->LoadAddress,
GetDosNameFromNTName(info->ImageFileName).c_str());
break;
}
Here is some example output when running the full driver and client:
Remote Thread Detection

The rest of this chapter uses these callbacks to build a driver (the KDetector project) that detects remote thread creation. The core of the driver consists of the process and thread notification callbacks. The most important is the thread creation callback, where the driver's job is to determine whether a created thread is a remote one or not. We must keep an eye on new processes as well, because the first thread in a new process is technically remote, but we need to ignore it.
The data maintained by the driver and later provided to the client contains the following (DetectorPublic.h):
struct RemoteThread {
LARGE_INTEGER Time;
ULONG CreatorProcessId;
ULONG CreatorThreadId;
ULONG ProcessId;
ULONG ThreadId;
};
Here is the data we’ll store as part of the driver (in KDetector.h):
struct RemoteThreadItem {
LIST_ENTRY Link;
RemoteThread Remote;
};
ULONG NewProcesses[MaxProcesses];
ULONG NewProcessesCount;
ExecutiveResource ProcessesLock;
LIST_ENTRY RemoteThreadsHead;
FastMutex RemoteThreadsLock;
LookasideList<RemoteThreadItem> Lookaside;
There are a few class wrappers for kernel APIs we haven’t seen yet. FastMutex is the same we used in
the SysMon driver. ExecutiveResource is a wrapper for an ERESOURCE structure and APIs we looked
at in chapter 6. Here is its declaration and definition:
// ExecutiveResource.h
struct ExecutiveResource {
void Init();
void Delete();
void Lock();
void Unlock();
void LockShared();
void UnlockShared();
private:
ERESOURCE m_res;
bool m_CritRegion;
};
// ExecutiveResource.cpp
void ExecutiveResource::Init() {
ExInitializeResourceLite(&m_res);
}
void ExecutiveResource::Delete() {
ExDeleteResourceLite(&m_res);
}
void ExecutiveResource::Lock() {
m_CritRegion = KeAreApcsDisabled();
if(m_CritRegion)
ExAcquireResourceExclusiveLite(&m_res, TRUE);
else
ExEnterCriticalRegionAndAcquireResourceExclusive(&m_res);
}
void ExecutiveResource::Unlock() {
if (m_CritRegion)
ExReleaseResourceLite(&m_res);
else
ExReleaseResourceAndLeaveCriticalRegion(&m_res);
}
void ExecutiveResource::LockShared() {
m_CritRegion = KeAreApcsDisabled();
if (m_CritRegion)
ExAcquireResourceSharedLite(&m_res, TRUE);
else
ExEnterCriticalRegionAndAcquireResourceShared(&m_res);
}
void ExecutiveResource::UnlockShared() {
Unlock();
}
A similar API, KeAreAllApcsDisabled returns true if all APCs are disabled (essentially
whether the thread is in a guarded region).
• An Executive Resource is used to protect the NewProcesses array from concurrent write access.
The idea is that more reads than writes are expected for this data. In any case, I wanted to show
a possible wrapper for an Executive Resource.
• The class presents an interface that can work with the Locker<TLock> type we have been
using for exclusive access. For shared access, the LockShared and UnlockShared methods are
provided. To use them conveniently, a companion class to Locker<> can be written to acquire
the lock in a shared manner. Here is its definition (in Locker.h as well):
template<typename TLock>
struct SharedLocker {
SharedLocker(TLock& lock) : m_lock(lock) {
lock.LockShared();
}
~SharedLocker() {
m_lock.UnlockShared();
}
private:
TLock& m_lock;
};
LookasideList<T> is a wrapper for the lookaside lists we met in chapter 8. It uses the newer Ex API, which makes it easier to select the required pool type. Here is its definition (in LookasideList.h):
template<typename T>
struct LookasideList {
NTSTATUS Init(POOL_TYPE pool, ULONG tag) {
return ExInitializeLookasideListEx(&m_lookaside, nullptr, nullptr,
pool, 0, sizeof(T), tag, 0);
}
void Delete() {
ExDeleteLookasideListEx(&m_lookaside);
}
T* Alloc() {
return (T*)ExAllocateFromLookasideListEx(&m_lookaside);
}
void Free(T* p) {
ExFreeToLookasideListEx(&m_lookaside, p);
}
private:
LOOKASIDE_LIST_EX m_lookaside;
};
Going back to the data members for this driver. The purpose of the NewProcesses array is to keep track
of new processes before their first thread is created. Once the first thread is created, and identified as
such, the array will drop the process in question, because from that point on, any new thread created
in that process from another process is a remote thread for sure. We’ll see all that in the callbacks
implementations.
The driver uses a simple array rather than a linked list, because I don't expect a lot of processes with no threads to exist for more than a tiny fraction of a second, so a fixed-size array should be good enough. However, you can change that to a linked list to make this bulletproof.
When a new process is created, it should be added to the NewProcesses array since the process has
zero threads at that moment:
if (CreateInfo) {
if (!AddNewProcess(ProcessId)) {
KdPrint((DRIVER_PREFIX "New process created, no room to store\n"));
}
else {
AddNewProcess locates an empty “slot” in the array and puts the process ID in it:
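// a sketch (assumed) of AddNewProcess
bool AddNewProcess(HANDLE pid) {
    Locker locker(ProcessesLock);
    if (NewProcessesCount == MaxProcesses)
        return false;

    for (int i = 0; i < MaxProcesses; i++) {
        if (NewProcesses[i] == 0) {
            NewProcesses[i] = HandleToULong(pid);
            NewProcessesCount++;
            return true;
        }
    }
    return false;
}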
Add process names to the data maintained by the driver for each remote thread.

A remote thread is one where the creator (the caller) is a different process than the one in which the new thread is created. We also have to remove some false positives:
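// a sketch of the check (PsInitialSystemProcess is discussed below)
bool remote = PsGetCurrentProcessId() != ProcessId
    && PsGetCurrentProcess() != PsInitialSystemProcess
    && ProcessId != PsGetProcessId(PsInitialSystemProcess);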
The second and third checks make sure the source process or target process is not the System process.
The reasons for the System process to exist in these cases are interesting to investigate, but are out
of scope for this book - we’ll just remove these false positives. The question is how to identify the
System process. All versions of Windows from XP have the same PID for the System process: 4. We
could use that number because it’s unlikely to change in the future, but there is another way, which
is foolproof and also allows me to introduce something new.
The kernel exports a global variable, PsInitialSystemProcess, which always points to the System
process’ EPROCESS structure. This pointer can be used just like any other opaque process pointer.
If the thread is indeed remote, we must check if it’s the first thread in the process, and if so, discard
this as a remote thread:
if (remote) {
//
// really remote if it's not a new process
//
bool found = FindProcess(ProcessId);
If the process is found, then it’s the first thread in the process and we should remove the process from
the new processes array so that subsequent remote threads (if any) can be identified as such:
if (found) {
//
// first thread in process, remove process from new processes array
//
RemoveProcess(ProcessId);
}
RemoveProcess searches for the PID and removes it from the array by zeroing it out:
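// a sketch (assumed) of RemoveProcess
bool RemoveProcess(HANDLE pid) {
    auto id = HandleToULong(pid);
    Locker locker(ProcessesLock);
    for (int i = 0; i < MaxProcesses; i++) {
        if (NewProcesses[i] == id) {
            NewProcesses[i] = 0;
            NewProcessesCount--;
            return true;
        }
    }
    return false;
}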
If the process isn’t found, then it’s not new and we have a real remote thread on our hands:
else {
//
// really a remote thread
//
auto item = Lookaside.Alloc();
auto& data = item->Remote;
KeQuerySystemTimePrecise(&data.Time);
data.CreatorProcessId = HandleToULong(PsGetCurrentProcessId());
data.CreatorThreadId = HandleToULong(PsGetCurrentThreadId());
data.ProcessId = HandleToULong(ProcessId);
data.ThreadId = HandleToULong(ThreadId);
KdPrint((DRIVER_PREFIX
"Remote thread detected. (PID: %u, TID: %u) -> (PID: %u, TID: %u)\n",
data.CreatorProcessId, data.CreatorThreadId,
data.ProcessId, data.ThreadId));
Locker locker(RemoteThreadsLock);
// TODO: check the list is not too big
InsertTailList(&RemoteThreadsHead, &item->Link);
}
Getting the data to a user mode client can be done in the same way as we did for the SysMon driver:
//
// if remaining buffer size is too small, break
//
if (len < sizeof(RemoteThread))
break;
Because there is just one type of “event” and it has a fixed size, the code is simpler than in the SysMon
case.
The full driver code is in the KDetector project in the solution for this chapter.
int main() {
    HANDLE hDevice = CreateFile(L"\\\\.\\kdetector", GENERIC_READ, 0,
        nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice == INVALID_HANDLE_VALUE)
        return Error("Error opening device");

    RemoteThread rt[50];    // (assumed) buffer for a batch of events
    for (;;) {
        DWORD bytes;
        if (!ReadFile(hDevice, rt, sizeof(rt), &bytes, nullptr))
            return Error("Failed to read data");

        for (DWORD i = 0; i < bytes / sizeof(RemoteThread); i++) {
            auto& item = rt[i];
            // (assumed display; the actual client also prints the event time)
            printf("Remote Thread from PID: %u TID: %u -> PID: %u TID: %u\n",
                item.CreatorProcessId, item.CreatorThreadId,
                item.ProcessId, item.ThreadId);
        }
        Sleep(1000);    // (assumed) polling interval
    }
    CloseHandle(hDevice);
    return 0;
}
One legitimate example of remote thread creation is when a debugger wishes to forcefully break into a target process. Here is one way to do that:
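A user-mode sketch (the helper name and the use of DebugBreak as the thread start routine are assumptions):

#include <Windows.h>

// kernel32.dll is mapped at the same address in all processes, so the local
// address of DebugBreak is valid in the target process as well
bool BreakIntoProcess(DWORD pid) {
    HANDLE hProcess = ::OpenProcess(
        PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
        PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE,
        FALSE, pid);
    if (!hProcess)
        return false;

    auto start = (LPTHREAD_START_ROUTINE)::GetProcAddress(
        ::GetModuleHandle(L"kernel32"), "DebugBreak");
    HANDLE hThread = ::CreateRemoteThread(hProcess, nullptr, 0, start,
        nullptr, 0, nullptr);
    if (hThread)
        ::CloseHandle(hThread);
    ::CloseHandle(hProcess);
    return hThread != nullptr;
}

If no debugger is attached to the target, the unhandled breakpoint terminates it, so this is only useful while a debugger is attached.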
Here are some examples of output when the detector client is running:
13:08:15.280: Remote Thread from PID: 7392 TID: 4788 -> PID: 8336 TID: 9384
13:08:58.660: Remote Thread from PID: 7392 TID: 13092 -> PID: 8336 TID: 13288
13:10:52.313: Remote Thread from PID: 7392 TID: 13092 -> PID: 8336 TID: 12676
13:11:25.207: Remote Thread from PID: 15268 TID: 7564 -> PID: 1844 TID: 6688
13:11:25.209: Remote Thread from PID: 15268 TID: 15152 -> PID: 1844 TID: 7928
You might find some remote thread entries surprising (run Process Explorer for a while, for example).
Summary
In this chapter we looked at some of the callback mechanisms provided by the kernel: process, thread
and image loads. In the next chapter, we’ll continue with more callback mechanisms - opening handles
to certain object types, and Registry notifications.
Chapter 10: Object and Registry
Notifications
The kernel provides more ways to intercept certain operations. First, we’ll examine object notifications,
where obtaining handles to some types of objects can be intercepted. Then, we’ll look at Registry
operations interception.
In this chapter:
• Object Notifications
• The Process Protector Driver
• Registry Notifications
• Extending the SysMon Driver
• Exercises
Object Notifications
The kernel provides a mechanism to notify interested drivers when attempts are made to open or duplicate a handle to certain object types. The officially supported object types are process, thread, and (starting with Windows 10) desktop as well.
Desktop Objects
A desktop is a kernel object contained in a Window Station, yet another kernel object, which is in
itself part of a Session. A desktop contains windows, menus, and hooks. The hooks referred to here
are user-mode hooks available with the SetWindowsHookEx API.
Normally, when a user logs in, two desktops are created. A desktop named “Winlogon” is created
by Winlogon.exe. This is the desktop that you see when pressing the Secure Attention Sequence
key combination (SAS, normally Ctrl+Alt+Del). The second desktop is named “default” and is the
normal desktop we are familiar with, where normal windows are shown and used. Switching to
another desktop is done with the SwitchDesktop API. For some more details, read this blog post.
https://scorpiosoftware.net/2019/02/17/windows-10-desktops-vs-sysinternals-desktops/
NTSTATUS ObRegisterCallbacks (
_In_ POB_CALLBACK_REGISTRATION CallbackRegistration,
_Outptr_ PVOID *RegistrationHandle);
The concept of Altitude is also used for registry filtering (see “Registry Notifications” later in this
chapter) and file system mini-filters (see chapter 12).
Finally, RegistrationContext is a driver defined value that is passed as-is to the callback routine(s).
The OB_OPERATION_REGISTRATION structure(s) is where the driver sets up its callbacks and indicates which object types and operations are of interest. It's defined like so:
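typedef struct _OB_OPERATION_REGISTRATION {
    POBJECT_TYPE                *ObjectType;
    OB_OPERATION                Operations;
    POB_PRE_OPERATION_CALLBACK  PreOperation;
    POB_POST_OPERATION_CALLBACK PostOperation;
} OB_OPERATION_REGISTRATION, *POB_OPERATION_REGISTRATION;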
ObjectType is a pointer to the object type for this instance registration - process, thread or desk-
top. These pointers are exported as global kernel variables: PsProcessType, PsThreadType, and
ExDesktopObjectType, respectively.
The Operations field must specify one or two flags (OB_OPERATION), selecting create/open (OB_OPERATION_HANDLE_CREATE) and/or duplicate (OB_OPERATION_HANDLE_DUPLICATE).
OB_OPERATION_HANDLE_CREATE refers to calls to user mode functions such as CreateProcess,
OpenProcess, CreateThread, OpenThread, CreateDesktop, OpenDesktop and similar functions for
these object types. OB_OPERATION_HANDLE_DUPLICATE refers to handle duplication for these objects
(such as using the DuplicateHandle user-mode API).
The APIs intercepted are not user-mode only; kernel APIs are intercepted as well (the callbacks
parameters do indicate if the handle being created/duplicated is a kernel handle). Kernel APIs such
as ZwOpenProcess, PsCreateSystemThread, and ZwDuplicateObject are examples of affected
functions.
For each of these operations, one or two callbacks can be registered: a pre-operation callback (the PreOperation field) and/or a post-operation callback (PostOperation).
Pre-Operation Callback
The pre-operation callback is invoked before the actual create/open/duplicate operation completes, giving the driver a chance to make changes to the operation's result. The pre-operation callback receives an OB_PRE_OPERATION_INFORMATION structure, defined as shown here:
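typedef struct _OB_PRE_OPERATION_INFORMATION {
    OB_OPERATION Operation;
    union {
        ULONG Flags;
        struct {
            ULONG KernelHandle : 1;
            ULONG Reserved : 31;
        };
    };
    PVOID Object;
    POBJECT_TYPE ObjectType;
    PVOID CallContext;
    POB_PRE_OPERATION_PARAMETERS Parameters;
} OB_PRE_OPERATION_INFORMATION, *POB_PRE_OPERATION_INFORMATION;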
The driver should inspect the appropriate field based on the operation. For Create operations, the
driver receives the following information:
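typedef struct _OB_PRE_CREATE_HANDLE_INFORMATION {
    ACCESS_MASK DesiredAccess;
    ACCESS_MASK OriginalDesiredAccess;
} OB_PRE_CREATE_HANDLE_INFORMATION, *POB_PRE_CREATE_HANDLE_INFORMATION;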
The OriginalDesiredAccess is the access mask specified by the caller. Consider this user-mode code to
open a handle to an existing process:
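// (illustrative) pid is the ID of some process of interest
HANDLE hProcess = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION,
    FALSE, pid);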
In this example, the client tries to obtain a handle to a process with the specified access mask, indicating its “intentions” towards that process. The driver's pre-operation callback receives this value in the OriginalDesiredAccess field. This value is also copied to DesiredAccess. Normally, the kernel will determine, based on the client's security context and the process' security descriptor, whether the client can be granted the access it desires.
The driver can, based on its own logic, modify DesiredAccess for example by removing some of the
access requested by the client:
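// (a sketch matching the description that follows)
Info->Parameters->CreateHandleInformation.DesiredAccess &= ~PROCESS_VM_READ;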
The above code snippet removes the PROCESS_VM_READ access mask before letting the operation
continue normally. If it eventually succeeds, the client will get back a valid handle, but only with
the PROCESS_QUERY_INFORMATION access mask.
You can find the complete list of process, thread and desktop access masks in the MSDN
documentation.
You cannot add new access mask bits that were not requested by the client.
For duplicate operations, the information provided to the driver is the following:
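typedef struct _OB_PRE_DUPLICATE_HANDLE_INFORMATION {
    ACCESS_MASK DesiredAccess;
    ACCESS_MASK OriginalDesiredAccess;
    PVOID SourceProcess;
    PVOID TargetProcess;
} OB_PRE_DUPLICATE_HANDLE_INFORMATION, *POB_PRE_DUPLICATE_HANDLE_INFORMATION;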
The DesiredAccess field can be modified as before. The extra information provided is the source
process (from which a handle is being duplicated) and the target process (the process the new handle
will be duplicated into). This allows the driver to query various properties of these processes before
making a decision on how to modify (if at all) the desired access mask.
Notice that although both structures in the union are different, the first two members are
the same, so they have the same layout in memory. This is useful for handling create and
duplicate operations with the same code.
Post-Operation Callback
Post-operation callbacks are invoked after the operation completes. At this point, the driver cannot
make any modifications, it can only look at the results. The post-operation callback receives the
following structure:
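typedef struct _OB_POST_OPERATION_INFORMATION {
    OB_OPERATION Operation;
    union {
        ULONG Flags;
        struct {
            ULONG KernelHandle : 1;
            ULONG Reserved : 31;
        };
    };
    PVOID Object;
    POBJECT_TYPE ObjectType;
    PVOID CallContext;
    NTSTATUS ReturnStatus;
    POB_POST_OPERATION_PARAMETERS Parameters;
} OB_POST_OPERATION_INFORMATION, *POB_POST_OPERATION_INFORMATION;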
This looks similar to the pre-operation callback information, except for the following:
• The final status of the operation is returned in ReturnStatus. If successful, it means the client
will get back a valid handle (possibly with a reduced access mask).
• The Parameters union provided has just one piece of information: the access mask granted to
the client (assuming the status is successful).
The Process Protector Driver
#define PROCESS_TERMINATE 1
struct Globals {
ULONG PidsCount; // currently protected process count
ULONG Pids[MaxPids]; // protected PIDs
ExecutiveResource Lock;
PVOID RegHandle;
void Init() {
Lock.Init();
}
void Term() {
Lock.Delete();
}
};
Notice that we must define PROCESS_TERMINATE explicitly, since it’s not defined in the
WDK headers (only PROCESS_ALL_ACCESS is defined). It’s fairly easy to get its definition
from user mode headers or documentation.
The ExecutiveResource type is the same used in chapter 9. It’s important to use an Executive
Resource here and not a (fast) mutex because we anticipate many more “reads” (checks if a process is
under the driver’s termination protection) than “writes” (adding or removing processes), so there is a
clear advantage to an Executive Resource in this case. The main file (Protector.cpp) declares a global
variable of type Globals named g_Data, calls Init in DriverEntry, and calls Term in the Unload
routine, as we’ll see shortly.
extern "C"
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
g_Data.Init();
OB_OPERATION_REGISTRATION operation = {
PsProcessType, // object type
OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE,
OnPreOpenProcess, nullptr // pre, post
};
OB_CALLBACK_REGISTRATION reg = {
OB_FLT_REGISTRATION_VERSION,
1, // operation count
RTL_CONSTANT_STRING(L"12345.6171"), // altitude
nullptr, // context
&operation // single operation
};
The registration is for process objects only, with a pre-callback provided. This callback should remove
the PROCESS_TERMINATE access mask from the desired access requested by the client.
Now we’re ready to do perform all standard initializatio, including objack callback registration:
do {
status = ObRegisterCallbacks(&reg, &g_Data.RegHandle);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to register callbacks (0x%08X)\n",
status));
break;
}
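    // device object and symbolic link creation (as in earlier drivers) also belong in this block
} while (false);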
if (!NT_SUCCESS(status)) {
if (g_Data.RegHandle)
ObUnRegisterCallbacks(g_Data.RegHandle);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
return status;
}
DriverObject->DriverUnload = ProtectUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = ProtectCreateClose;
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = ProtectDeviceControl;
return status;
}
#define IOCTL_PROTECT_ADD_PID \
CTL_CODE(KPROTECT_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_PROTECT_REMOVE_PID \
CTL_CODE(KPROTECT_DEVICE, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_PROTECT_REMOVE_ALL \
CTL_CODE(KPROTECT_DEVICE, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)
Before implementing the I/O Control codes, we should write functions to add processes, remove
processes, and find whether a specific PID is under the driver’s protection. Here is the function to
add an array of process IDs:
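// (assumed signature, matching the body below)
ULONG AddProcesses(const ULONG* pids, ULONG count) {
    ULONG added = 0, current = 0;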
Locker locker(g_Data.Lock);
for (int i = 0; i < MaxPids && added < count; i++) {
if (g_Data.Pids[i] == 0) {
g_Data.Pids[i] = pids[current++];
added++;
}
}
g_Data.PidsCount += added;
return added;
}
The function acquires the Executive Resource exclusively, as it is going to change the PIDs array. The loop body looks for an “empty” slot (where the PID is zero). If it finds one, it stores the current PID there and moves on to the next one. Finally, AddProcesses returns the number of added PIDs.
The function does not check if the PID was already added. It doesn’t cause any particular issues,
but it might be nice to check for duplication, at the expense of a higher running time.
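The companion RemoveProcesses function begins similarly (an assumed opening for the body below):

ULONG RemoveProcesses(const ULONG* pids, ULONG count) {
    ULONG removed = 0;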
Locker locker(g_Data.Lock);
for (int i = 0; i < MaxPids && removed < count; i++) {
auto pid = g_Data.Pids[i];
if(pid) {
for (ULONG c = 0; c < count; c++) {
if (pid == pids[c]) {
g_Data.Pids[i] = 0;
removed++;
break;
}
}
}
}
g_Data.PidsCount -= removed;
return removed;
}
This function does the reverse - when it finds a non-zero PID, it searches the list of PIDs to remove for that PID, and if found, removes it by zeroing the entry in the array.
Lastly, FindProcess searches for a PID in the array:
This is a function we expect to be called many more times than AddProcesses or RemoveProcesses
- it should be called any time clients call OpenProcess or DuplicateHandle with a process handle to
duplicate. Any number of threads can be making such calls at any time. This is why it’s important to
make the function as efficient as possible.
The function does not change the PIDs array, which is why it can acquire the Executive Resource in shared mode (and thus improve concurrency). Then the PID is searched in the array, returning its
index if found, or -1 if it can’t be found. Failing to find the PID should be the common case since the
driver is likely to protect a small number of processes. This is why the number of non-zero PIDs is
counted, and if it reaches the number of PIDs protected (g_Data.PidsCount), the loop can be exited
early before the entire MaxPids elements are traversed.
Now we’re ready to implement the IRP_MJ_DEVICE_CONTROL dispatch routine. We’ll start normally,
by preparing the information we need:
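// (assumed opening, consistent with the variables used in the switch that follows)
NTSTATUS ProtectDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
    auto irpSp = IoGetCurrentIrpStackLocation(Irp);
    auto& dic = irpSp->Parameters.DeviceIoControl;
    auto inputLen = dic.InputBufferLength;
    auto status = STATUS_INVALID_DEVICE_REQUEST;
    ULONG_PTR info = 0;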
The add and remove PIDs I/O control codes accept the same input - an array of ULONG values representing one or more PIDs. We can share their implementation like so:
switch (dic.IoControlCode) {
case IOCTL_PROTECT_ADD_PID:
case IOCTL_PROTECT_REMOVE_PID:
{
if (inputLen == 0 || inputLen % sizeof(ULONG) != 0) {
status = STATUS_INVALID_BUFFER_SIZE;
break;
}
auto pids = (ULONG*)Irp->AssociatedIrp.SystemBuffer;
if (pids == nullptr) {
status = STATUS_INVALID_PARAMETER;
break;
}
ULONG count = inputLen / sizeof(ULONG);
auto added = dic.IoControlCode == IOCTL_PROTECT_ADD_PID
? AddProcesses(pids, count) : RemoveProcesses(pids, count);
status = added == count ? STATUS_SUCCESS : STATUS_NOT_ALL_ASSIGNED;
info = added * sizeof(ULONG);
break;
}
First we have the usual checks for a proper buffer size and the system buffer being non-NULL. Then, it's just a matter of calling AddProcesses or RemoveProcesses as needed. The final status is set to STATUS_SUCCESS if all the provided PIDs are added or removed. Otherwise, STATUS_NOT_ALL_ASSIGNED is set as the error value. This status is normally returned when trying to enable privileges in a token; it's hijacked here as a convenience (or more likely laziness on my part).
Removing all processes is fairly simple, done directly in the case itself:
case IOCTL_PROTECT_REMOVE_ALL:
{
    Locker locker(g_Data.Lock);
RtlZeroMemory(g_Data.Pids, sizeof(g_Data.Pids));
g_Data.PidsCount = 0;
status = STATUS_SUCCESS;
break;
}
Removing all PIDs is just clearing the PIDs array and resetting the count of protected processes to
zero.
Finally, CompleteRequest is used to complete the IRP with the current status and information, the
same helper function we used in chapter 9.
The Pre-Callback
The most important part of the driver is removing the PROCESS_TERMINATE access mask for PIDs that
are currently being protected:
OB_PREOP_CALLBACK_STATUS
OnPreOpenProcess(PVOID, POB_PRE_OPERATION_INFORMATION Info) {
if(Info->KernelHandle)
return OB_PREOP_SUCCESS;
auto pid = HandleToULong(PsGetProcessId((PEPROCESS)Info->Object));
SharedLocker locker(g_Data.Lock);
if (FindProcess(pid)) {
// found in list, remove terminate access
Info->Parameters->CreateHandleInformation.DesiredAccess &=
~PROCESS_TERMINATE;
}
return OB_PREOP_SUCCESS;
}
If the handle is a kernel handle, we let the operation continue normally, since we don’t want to stop
kernel code from working properly.
Now we need the process ID for which a handle is being opened. The data provided in the callback is the object pointer (a PEPROCESS in this case). Fortunately, getting the PID is simple with the PsGetProcessId API, which accepts a PEPROCESS and returns its ID.
The last part is checking whether we’re actually protecting this particular process or not, so we call
FindProcess under the protection of the lock. If found, we remove the PROCESS_TERMINATE access
mask.
std::vector<DWORD> pids;
BOOL success = FALSE;
DWORD bytes;
switch (option) {
case Options::Add:
pids = ParsePids(argv + 2, argc - 2);
success = ::DeviceIoControl(hFile, IOCTL_PROTECT_ADD_PID,
pids.data(), static_cast<DWORD>(pids.size()) * sizeof(DWORD),
nullptr, 0, &bytes, nullptr);
break;
case Options::Remove:
pids = ParsePids(argv + 2, argc - 2);
success = ::DeviceIoControl(hFile, IOCTL_PROTECT_REMOVE_PID,
pids.data(), static_cast<DWORD>(pids.size()) * sizeof(DWORD),
nullptr, 0, &bytes, nullptr);
break;
case Options::Clear:
success = ::DeviceIoControl(hFile, IOCTL_PROTECT_REMOVE_ALL,
nullptr, 0, nullptr, 0, &bytes, nullptr);
break;
if (!success)
return Error("Failed in DeviceIoControl");
printf("Operation succeeded.\n");
::CloseHandle(hFile);
return 0;
}
The ParsePids helper function parses process IDs and returns them as a std::vector<DWORD> that is
easy to pass as an array by using the data() method on std::vector<T>:
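// a possible implementation (assumed)
std::vector<DWORD> ParsePids(const wchar_t* const args[], int count) {
    std::vector<DWORD> pids;
    for (int i = 0; i < count; i++)
        pids.push_back(::_wtoi(args[i]));
    return pids;
}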
Finally, the Error function is the same we used in previous projects, while PrintUsage just displays
simple usage information.
The driver is installed in the usual way, and then started:
sc start protect
Let’s test it by launching a process (Notepad.exe) as an example, protecting it, and then trying to kill
it with Task Manager. Figure 10-1 shows the notepad instance running.
Clicking End task in Task Manager pops up an error, shown in Figure 10-2.
We can remove the protection and try again. This time the process is terminated as expected.
In the case of notepad, even with protection, clicking the window close button or selecting
File/Exit from the menu would terminate the process. This is because it’s being done internally by
calling ExitProcess which does not involve any handles being opened. This means the protection
mechanism we devised here is good for processes without any user interface.
Add a control code that allows querying the currently protected processes.
Registry Notifications
Somewhat similar to object notifications, the Configuration Manager (the part in the Executive that
manages the Registry) can be used to register for notifications when Registry keys or values are
accessed.
Before we look at Registry callbacks, some background on the Registry itself might be helpful.
Registry Overview
The Registry is a fairly well-known artifact in Windows; it's a hierarchical database, used to store
system-wide and user-related information. Most of the data in the Registry is persisted in files, but
some is generated dynamically and not persisted (volatile).
The typical tool used to examine the Registry is RegEdit, part of Windows. Figure 10-3 shows the hives displayed when running RegEdit. The documented user-mode APIs use this layout of the Registry in order to access keys.
HKEY hKey;
DWORD error = RegOpenKeyEx(HKEY_LOCAL_MACHINE,
L"SOFTWARE\\Microsoft\\DirectX", 0, KEY_READ, &hKey);
if (ERROR_SUCCESS == error) {
WCHAR version[64];
ULONG count = sizeof(version);
error = RegQueryValueEx(hKey, L"Version", nullptr, nullptr,
(BYTE*)version, &count);
if (ERROR_SUCCESS == error) {
printf("DirectX version: %ws\n", version);
}
RegCloseKey(hKey);
}
More details about the user-mode Registry API can be found in chapter 15 of my book “Windows
10 System Programming, part 2”.
If you run this little piece of code, and examine the key handle returned from RegOpenKeyEx in Process
Explorer, you’ll see something like figure 10-5. The key “name” seems to be what we have used.
However, if you double-click the handle to show the object’s (key) properties, you’ll see something
similar to figure 10-6.
Notice the key name in the title bar. We can confirm the name by copying the real object address and
feeding it to a kernel debugger using the !object command:
The “real” key name starts with “REGISTRY”, which is in fact a named kernel object stored at the root
of the Object Manager’s namespace (figure 10-7).
Clearly, the names used to access keys from documented Windows APIs go through some “translation”, changing HKEY_LOCAL_MACHINE to REGISTRY\MACHINE. To see the entire picture, showing the “real” Registry, you can use my RegExp tool, downloadable from my Github repo (figure 10-8). It shows both the Registry as observed by user-mode APIs (upper part) and the real Registry (lower part), as used internally within the kernel.
All key names received or handled by the Registry notifications described next always use the real key names.
NTSTATUS CmRegisterCallbackEx (
_In_ PEX_CALLBACK_FUNCTION Function,
_In_ PCUNICODE_STRING Altitude,
_In_ PVOID Driver, // PDRIVER_OBJECT
_In_opt_ PVOID Context,
_Out_ PLARGE_INTEGER Cookie,
_Reserved_ PVOID Reserved);
Function is the callback itself, which we’ll look at in a moment. Altitude is the driver’s callback altitude,
which essentially has the same meaning as it has with object callbacks. The Driver argument should
be the driver object provided to DriverEntry. Context is a driver-defined value passed as-is to the
callback. Finally, Cookie is the result of the registration if successful. This cookie should be passed to
CmUnregisterCallback to unregister.
It’s a bit annoying that all the various registration APIs are inconsistent with respect to reg-
istration/unregistration: CmRegisterCallbackEx returns a LARGE_INTEGER as representing the
registration; ObRegisterCallbacks returns a PVOID; process and thread registration functions
return nothing (internally use the address of the callback itself to identify the registration). Finally,
process and thread unregistration is done with asymmetric APIs; Oh well.
NTSTATUS RegistryCallback (
_In_ PVOID CallbackContext,
_In_opt_ PVOID Argument1,
_In_opt_ PVOID Argument2);
The callback is called at IRQL PASSIVE_LEVEL (0) by the thread performing the operation.
Table 10-2 shows some values from the REG_NOTIFY_CLASS enumeration and the corresponding
structure passed in as Argument2.
Handling Pre-Notifications
The callback is called for pre-operations before these are carried out by the Configuration Manager.
At that point, the driver has the following options:
• Returning STATUS_SUCCESS from the callback instructs the Configuration Manager to continue
processing the operation normally (including calling other drivers that have registered for
notifications).
• Return some failure status from the callback. In this case, the Configuration Manager returns
to the caller with that status, and the post-operation will not be invoked.
• Handle the request in some way, and then return STATUS_CALLBACK_BYPASS from the callback.
The Configuration Manager returns success to the caller and does not invoke the post-operation.
The driver must take care to set proper values in the REG_xxx_KEY_INFORMATION structure
provided in the callback.
Handling Post-Operations
After the operation is completed, and assuming the driver did not prevent the post-operation from
occurring, the callback is invoked after the Configuration Manager performs the requested operation.
The structure provided for many post operations is shown here:
• Look at the operation result and do something benign (log it, for instance).
• Modify the return status by setting a new status value in the ReturnStatus field of the post-operation structure, and return STATUS_CALLBACK_BYPASS from the callback. The Configuration Manager returns this new status to the caller.
• Modify the output parameters in the REG_xxx_KEY_INFORMATION structure and return STATUS_SUCCESS. The Configuration Manager returns this new data to the caller.
Key names, value names and values could be large, so it’s best not to use fixed-size arrays (although
that would be much simpler), but store offsets to the names and value. Each name will be NULL-
terminated, which avoids the need to store lengths of strings (as we did in the command line case in
chapter 9). The data itself could be arbitrarily large, so we’ll have to decide on a maximum length to
copy as part of the notification.
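A sketch of the event structure (field names follow those used by the driver and client code below):

struct RegistrySetValueInfo : ItemHeader {
    ULONG ProcessId;
    ULONG ThreadId;
    ULONG DataType;           // REG_xxx
    ULONG DataSize;           // true size of the written data
    ULONG ProvidedDataSize;   // number of data bytes copied into this event
    USHORT KeyNameOffset;     // offsets are from the start of this structure
    USHORT ValueNameOffset;
    USHORT DataOffset;
};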
DataType is one of the REG_xxx type constants, such as REG_SZ, REG_DWORD, REG_BINARY, etc. These
values are the same as used with user-mode APIs.
Next, we’ll add a new event type for this notification:
It’s possible to subdivide Registry notifications further by defining a Registry item type and then
define specific items for different Registry operations. In this example, we just add one specific
Registry operation, but you may want to take the more generic approach if multiple Registry
operations are of interest.
In DriverEntry, we need to add registry callback registration as part of the do/while(false) block.
The returned cookie representing the registration is stored in a global variable:
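A sketch of that registration (the callback name and the altitude string are assumptions):

UNICODE_STRING altitude = RTL_CONSTANT_STRING(L"7657.124");
status = CmRegisterCallbackEx(OnRegistryNotify, &altitude,
    DriverObject, nullptr, &g_RegCookie, nullptr);
if (!NT_SUCCESS(status))
    break;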
It would have been better to encapsulate all state in the Globals structure and provide methods for initializing and uninitializing all the callbacks within this class. This is left as an exercise to the reader.
CmUnRegisterCallback(g_RegCookie);
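The callback itself examines the notification class; an assumed opening for the skeleton below:

NTSTATUS OnRegistryNotify(PVOID context, PVOID arg1, PVOID arg2) {
    UNREFERENCED_PARAMETER(context);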
switch ((REG_NOTIFY_CLASS)(ULONG_PTR)arg1) {
case RegNtPostSetValueKey:
//...
}
return STATUS_SUCCESS;
}
In this driver we don’t care about any other operation, so after the switch we simply return a
successful status. Note that we examine the post-operation, since only the result is interesting for
this driver. Next, inside the case we care about, we cast the second argument to the post-operation
data and check if the operation succeeded:
If the operation is not successful, we bail out. This is just an arbitrary decision for this driver; a
different driver might be interested in these failed attempts.
Next, we need to check if the key in question is under HKEY_LOCAL_MACHINE, which as we’ve
seen is in actuality \REGISTRY\MACHINE.
The key path is not stored in the post-structure, and not even stored in the pre-structure directly. Instead, the Registry key object itself is provided as part of the post-information structure. We then need to extract the key name with CmCallbackGetKeyObjectIDEx (Windows 8+) or CmCallbackGetKeyObjectID (earlier versions), and see if it starts with \REGISTRY\MACHINE\.
These APIs are declared as follows:
NTSTATUS CmCallbackGetKeyObjectID (
_In_ PLARGE_INTEGER Cookie,
_In_ PVOID Object,
_Out_opt_ PULONG_PTR ObjectID,
_Outptr_opt_ PCUNICODE_STRING *ObjectName);
NTSTATUS CmCallbackGetKeyObjectIDEx (
_In_ PLARGE_INTEGER Cookie,
_In_ PVOID Object,
_Out_opt_ PULONG_PTR ObjectID,
_Outptr_opt_ PCUNICODE_STRING *ObjectName,
_In_ ULONG Flags); // must be zero
Cookie identifies the registration cookie returned from CmRegisterCallbackEx, identifying the driver. Object is the Registry key whose name we need. ObjectID is an optional returned value
that provides the unique identifier of the key in question. Finally, ObjectName is a pointer to a UNICODE_STRING pointer returned with the full key name itself.
The two APIs are identical from a parameter perspective, as the Flags argument to CmCallbackGetKeyObjectIDEx
must be zero. There are differences in implementation, however:
First, the returned key name from CmCallbackGetKeyObjectID is valid until the last handle
of the key is closed. With CmCallbackGetKeyObjectIDEx, the name must be freed by calling
CmCallbackReleaseKeyObjectIDEx:
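NTSTATUS CmCallbackReleaseKeyObjectIDEx(
    _In_ PCUNICODE_STRING ObjectName);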
Second, if the name of the Registry key is changed after it's been obtained with CmCallbackGetKeyObjectID, subsequent calls to CmCallbackGetKeyObjectID will return the old, stale, name. In contrast, CmCallbackGetKeyObjectIDEx always returns the current key name.
Here is the call to obtain the key name and check if it's part of HKLM:
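// (assumed) prefix referenced by the comparison in the listing below
static const WCHAR machine[] = L"\\REGISTRY\\MACHINE\\";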
If the condition holds, then we need to capture the information of the operation into our notification
structure and add it to the queue. The needed information (data type, value name, actual value, etc.)
is provided with the pre-information structure that is luckily available as part of the post-information
structure we receive directly.
Calculating the correct size to allocate is more involved than previous cases, as we have several
variable-length strings to deal with. We can start with the base data structure size and then add
the sizes (in bytes) of the strings (not forgetting to leave room for a terminating NULL):
The driver stores the data itself, and since it’s unbounded in theory, we decide to store no more than
256 bytes. We will still report the true size of the data - the data itself may be truncated.
Now comes the real work of making the allocation and filling all the details. First, the fixed-size data,
including the header:
data.DataOffset = offset;
memcpy((PUCHAR)&data + offset, preInfo->Data, valueSize);
Using wcsncpy_s to copy the strings is a good choice in this case, since it appends NULL at the end of
strings (if there is enough space, and we made sure of that).
Finally, if CmCallbackGetKeyObjectIDEx succeeds, the resulting key name must be explicitly freed:
CmCallbackReleaseKeyObjectIDEx(name);
switch ((REG_NOTIFY_CLASS)(ULONG_PTR)arg1) {
case RegNtPostSetValueKey:
auto args = (REG_POST_OPERATION_INFORMATION*)arg2;
if (!NT_SUCCESS(args->Status))
break;
PCUNICODE_STRING name;
if (NT_SUCCESS(CmCallbackGetKeyObjectIDEx(
&g_RegCookie, args->Object, nullptr, &name, 0))) {
//
// look for HKLM subkeys
//
if (wcsncmp(name->Buffer, machine, ARRAYSIZE(machine) - 1) == 0) {
auto preInfo = (REG_SET_VALUE_KEY_INFORMATION*)args->PreInformation;
USHORT size = sizeof(RegistrySetValueInfo);
USHORT keyNameLen = name->Length + sizeof(WCHAR);
USHORT valueNameLen = preInfo->ValueName->Length + sizeof(WCHAR);
//
// restrict copied data to 256 bytes
//
USHORT valueSize = (USHORT)min(256, preInfo->DataSize);
size += keyNameLen + valueNameLen + valueSize;
auto info = (FullItem<RegistrySetValueInfo>*)
ExAllocatePoolWithTag(PagedPool,
size + sizeof(LIST_ENTRY), DRIVER_TAG);
if (info) {
auto& data = info->Data;
KeQuerySystemTimePrecise(&data.Time);
data.Type = ItemType::RegistrySetValue;
data.Size = size;
data.DataType = preInfo->Type;
data.ProcessId = HandleToULong(PsGetCurrentProcessId());
data.ThreadId = HandleToUlong(PsGetCurrentThreadId());
data.ProvidedDataSize = valueSize;
data.DataSize = preInfo->DataSize;
//
// first offset starts at the end of the structure
//
USHORT offset = sizeof(data);
data.KeyNameOffset = offset;
wcsncpy_s((PWSTR)((PUCHAR)&data + offset),
keyNameLen / sizeof(WCHAR), name->Buffer,
name->Length / sizeof(WCHAR));
offset += keyNameLen;
data.ValueNameOffset = offset;
wcsncpy_s((PWSTR)((PUCHAR)&data + offset),
valueNameLen / sizeof(WCHAR), preInfo->ValueName->Buffer,
preInfo->ValueName->Length / sizeof(WCHAR));
offset += valueNameLen;
data.DataOffset = offset;
memcpy((PUCHAR)&data + offset, preInfo->Data, valueSize);
g_State.AddItem(&info->Entry);
}
else {
KdPrint((DRIVER_PREFIX
"Failed to allocate memory for registry set value\n"));
}
}
CmCallbackReleaseKeyObjectIDEx(name);
}
break;
}
return STATUS_SUCCESS;
}
case ItemType::RegistrySetValue:
{
DisplayTime(header->Time);
auto info = (RegistrySetValueInfo*)buffer;
printf("Registry write PID=%u, TID=%u: %ws\\%ws type: %d size: %d data: ",
info->ProcessId, info->ThreadId,
(PCWSTR)((PBYTE)info + info->KeyNameOffset),
(PCWSTR)((PBYTE)info + info->ValueNameOffset),
info->DataType, info->DataSize);
DisplayRegistryValue(info);
break;
}
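DisplayRegistryValue shows the value according to its type; an assumed opening for the fragment below:

void DisplayRegistryValue(RegistrySetValueInfo* info) {
    auto data = (PBYTE)info + info->DataOffset;
    switch (info->DataType) {
    case REG_DWORD:
        printf("%u (0x%X)\n", *(DWORD*)data, *(DWORD*)data);
        break;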
case REG_SZ:
case REG_EXPAND_SZ:
printf("%ws\n", (PCWSTR)data);
break;
default:
DisplayBinary(data, info->ProvidedDataSize);
break;
}
}
DisplayBinary is a simple helper function that shows binary data as a series of hex values shown
here for completeness:
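// a simple sketch
void DisplayBinary(const BYTE* buffer, DWORD size) {
    for (DWORD i = 0; i < size; i++)
        printf("%02X ", buffer[i]);
    printf("\n");
}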
Enhance SysMon by adding I/O control codes to enable/disable certain notification types
(processes, threads, image loads, Registry).
Performance Considerations
The Registry callback is invoked for every registry operation; there is no a priori way to request filtering of certain operations only. This means the callback needs to be as quick as possible, since the caller is waiting. Also, there may be more than one driver in the chain of callbacks.
Some Registry operations, especially read operations, happen in large quantities, so it's better for a driver to avoid processing read operations, if possible. If it must process read operations, it should at least limit its processing to certain keys of interest, such as anything under HKLM\System\CurrentControlSet (just an example). If processing can be done asynchronously, a work item could be used.
Write and create operations are used much less often, so in these cases the driver can do more if
needed.
Miscellaneous Notes
• The documentation provides some warnings when dealing with Registry notifications, worth
repeating here.
Certain Registry operations are lightly-documented because they are not very useful. Modifying the following operations should be avoided as it's difficult and error-prone: NtRestoreKey, NtSaveKey, NtSaveKeyEx, NtLoadKeyEx, NtUnloadKey2, NtUnloadKeyEx, NtReplaceKey, NtRenameKey, NtSetInformationKey.
The Object member should not be passed to general kernel routines (such as ObReferenceObjectByPointer).
However, for the first two cases, the object can still be used within the callback by calling
Configuration Manager functions (e.g. CmCallbackGetKeyObjectIDEx).
1. Implement a driver that protects a Registry key from modifications. A client can
send the driver registry keys to protect or unprotect.
2. Implement a driver that redirects Registry write operations coming from selected
processes (configured by a client application) to their own private key if they access
HKEY_LOCAL_MACHINE. If the app is writing data, it goes to its private store. If
it’s reading data, first check the private store, and if no such value is found, go to
the real Registry key.
Summary
In this chapter, we looked at two callback mechanisms supported by the kernel - obtaining handles to
certain object types, and Registry access. In the next chapter, we’ll look at more techniques that may
be useful for a driver developer.
Chapter 11: Advanced Programming
Techniques (Part 2)
In this chapter we’ll continue to examine techniques of various degrees of usefulness to driver
developers.
In this chapter:
• Timers
• Generic Tables
• Hash Tables
• Singly Linked Lists
• Callback Objects
Timers
We have briefly seen an example that uses a kernel timer in chapter 6. In this section, we’ll cover kernel
timers in more detail, as well as high-resolution timers, which have been introduced in Windows 8.1.
Kernel Timers
A kernel timer is represented by the KTIMER structure that must be allocated from non-paged memory.
The timer can be set to one shot or periodic. The interval itself can be relative or absolute, making
it quite flexible. A kernel timer is a dispatcher object, which means it can be waited upon with
KeWaitForSingleObject and similar APIs. Once a KTIMER is allocated, it must be initialized by
calling KeInitializeTimer or KeInitializeTimerEx:
VOID KeInitializeTimerEx (
_Out_ PKTIMER Timer,
_In_ TIMER_TYPE Type);
There are two kinds of timers (similar to the two kinds of event kernel object types): a NotificationTimer,
which releases any number of waiting threads and remains in the signaled state, and a SynchronizationTimer,
which goes back to the non-signaled state automatically after releasing a single thread. KeInitializeTimer
is a shortcut that initializes a notification timer.
Once the timer is initialized, its interval can be set with KeSetTimer (one shot) or KeSetTimerEx
(periodic):
BOOLEAN KeSetTimer (
_Inout_ PKTIMER Timer,
_In_ LARGE_INTEGER DueTime,
_In_opt_ PKDPC Dpc);
BOOLEAN KeSetTimerEx (
_Inout_ PKTIMER Timer,
_In_ LARGE_INTEGER DueTime,
_In_ LONG Period,
_In_opt_ PKDPC Dpc);
Both functions set the timer interval based on a LARGE_INTEGER structure, that is set to a negative
number for a relative count, and a positive number for an absolute count from January 1, 1601, at
midnight GMT. The number (whether positive or negative) is specified as 100nsec units. For example,
1msec equals 10000 x 100nsec units. Here is how to specify a relative interval of 10 milliseconds:
LARGE_INTEGER interval;
interval.QuadPart = -10 * 10000; // 10 msec
The Period argument in KeSetTimerEx indicates the period the timer should count repeatedly from
its first signaling. Curiously enough, it’s specified in milliseconds. Finally, a DPC object can be
specified as an alternative to waiting. If one is provided, it will be inserted in a CPU’s DPC queue and
run just like any other DPC.
Both functions return TRUE if the timer is already in the system’s timer queue. If it was there before
the call, it’s implicitly cancelled and set to the new specified time. With KeSetTimer, once the timer
expires, it won’t restart unless another call to KeSetTimer(Ex) is made. Regardless, a timer can be
cancelled by calling KeCancelTimer:
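Its prototype is simple:
BOOLEAN KeCancelTimer (
    _Inout_ PKTIMER Timer);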
KeCancelTimer returns TRUE if the timer was in the system’s timer queue - which is always TRUE for
a periodic timer.
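As an illustration, here is one way to set up a periodic timer whose expirations are handled by a DPC (a
sketch - the names and the one-second period are arbitrary):
KTIMER g_Timer;
KDPC g_TimerDpc;

void TimerDpcRoutine(PKDPC Dpc, PVOID Context, PVOID SystemArgument1, PVOID SystemArgument2) {
	UNREFERENCED_PARAMETER(Dpc);
	UNREFERENCED_PARAMETER(Context);
	UNREFERENCED_PARAMETER(SystemArgument1);
	UNREFERENCED_PARAMETER(SystemArgument2);
	// runs at IRQL DISPATCH_LEVEL every time the timer expires
}

void StartPeriodicTimer() {
	KeInitializeTimerEx(&g_Timer, SynchronizationTimer);
	KeInitializeDpc(&g_TimerDpc, TimerDpcRoutine, nullptr);

	LARGE_INTEGER dueTime;
	dueTime.QuadPart = -10000 * 1000LL;                  // first expiration in 1 second (relative)
	KeSetTimerEx(&g_Timer, dueTime, 1000, &g_TimerDpc);  // then every 1000 msec
}

void StopPeriodicTimer() {
	KeCancelTimer(&g_Timer);
}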
Another available API to set a timer’s interval is KeSetCoalescableTimer:
BOOLEAN KeSetCoalescableTimer (
_Inout_ PKTIMER Timer,
_In_ LARGE_INTEGER DueTime,
_In_ ULONG Period,
_In_ ULONG TolerableDelay,
_In_opt_ PKDPC Dpc);
Most parameters are the same as in KeSetTimerEx, except for the additional TolerableDelay. This
parameter allows the caller to specify a "tolerance" interval in milliseconds, indicating that it's OK to
program the timer to expire after the provided DueTime by no more than the tolerable delay.
The period (if non-zero) may likewise deviate from the requested value by up to the tolerance. The point of
a coalescable timer is to allow the system to save energy by not waking up too often to signal timers.
Close-enough timers are "coalesced" by the system, so that a single wakeup can signal multiple timers if
their tolerance allows it.
Finally, you can query a timer’s signaled state by calling KeReadStateTimer (may be useful for
debugging purposes):
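Its prototype:
BOOLEAN KeReadStateTimer (
    _In_ PKTIMER Timer);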
Timer Resolution
It may seem from the KeSetTimer(Ex) APIs that the timer’s resolution can be really high, as the units
are very small. For example, it seems you can set a timer to expire after 1 microsecond by specifying
the value -10 for DueTime. This does not work as expected, however.
There is a default timer resolution, which is typically 15.625 milliseconds in today’s systems. This
is the default (and maximum) resolution, that is also used by the kernel’s scheduler. This resolution
can be changed, however. A quick way to determine the clock’s resolution is to run the Sysinternals
ClockRes.exe command line tool. Here is an example run:
C:\>clockres
The current timer interval is the active one, and is (more often than not) lower than the default.
This is because user mode processes can change the clock’s resolution to get better timing in
wait operations, sleep calls, and timers. For example, the timeBeginPeriod or timeSetEvent user
mode multimedia APIs allow setting up a timer with up to 1 millisecond resolution (both call the
NtSetTimerResolution native API). This causes the clock’s resolution to be reprogrammed to cater
for the client process. The system keeps track of processes that request resolution changes, and so has
to make sure the clock is using the highest resolution (lowest interval) requested by any process.
A kernel driver can specify its own request for a resolution value by calling ExSetTimerResolution:
ULONG ExSetTimerResolution (
_In_ ULONG DesiredTime,
_In_ BOOLEAN SetResolution);
The DesiredTime is in 100-nanosecond (nsec) units. If SetResolution is TRUE, the system adjusts
the resolution to the closest value it can support, and returns the actual set value. If SetResolution
is FALSE, the system decrements an internal counter (incremented for each ExSetTimerResolution
call with TRUE), and if zero is reached, resets the resolution to its initial value. Of course, this will not
occur as long as there are user mode processes that requested a higher resolution than the default.
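For example, a driver could request a 1 msec resolution and later release the request when it's no longer
needed (a minimal sketch):
// ask for 1 msec resolution (10000 x 100 nsec); the return value is the
// resolution actually granted by the system
ULONG actual = ExSetTimerResolution(10000, TRUE);
KdPrint(("Granted resolution: %u (100 nsec units)\n", actual));

// ... later, when the higher resolution is no longer needed ...
// (DesiredTime is ignored when SetResolution is FALSE)
ExSetTimerResolution(0, FALSE);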
With Windows 8 and later, you can also query the current resolution without making any changes
with ExQueryTimerResolution:
void ExQueryTimerResolution (
_Out_ PULONG MaximumTime,
_Out_ PULONG MinimumTime,
_Out_ PULONG CurrentTime);
The returned values are in 100-nsec units. Converted to milliseconds, these numbers are the same
ones displayed by ClockRes.
The KeQueryTimeIncrement function returns the same value as the maximum timer
resolution.
High-Resolution Timers
Starting with Windows 8.1, the kernel provides support for another type of timer - high-resolution
timers, which can be used instead of the "standard" timers. These newer timers offer the following
benefits over standard timers:
• There is no need to set the timer resolution explicitly - it will be set as required based on the
provided interval (and revert automatically as well).
• High resolution timers never expire earlier than their set time.
• There is no need to set up an explicit DPC to be used as callback - the callback is specified
directly as part of setting the timer. The system will invoke the callback at IRQL DISPATCH_LEVEL (2).
A high-resolution timer is allocated with ExAllocateTimer:
PEX_TIMER ExAllocateTimer (
_In_opt_ PEXT_CALLBACK Callback,
_In_opt_ PVOID CallbackContext,
_In_ ULONG Attributes);
The callback (if provided) has the following prototype:
VOID EXT_CALLBACK (
_In_ PEX_TIMER Timer,
_In_opt_ PVOID Context);
The CallbackContext parameter to ExAllocateTimer is passed as-is to the callback function, along
with the timer object itself. Attributes can be zero, or a combination of EX_TIMER_HIGH_RESOLUTION
(the timer should be a high-resolution timer) and EX_TIMER_NOTIFICATION (the timer is a notification
timer rather than a synchronization timer).
ExAllocateTimer returns an opaque pointer to the allocated timer object that must be eventually
freed with ExDeleteTimer (shown later).
The next step is to set the timer interval and start it by calling ExSetTimer:
BOOLEAN ExSetTimer (
_In_ PEX_TIMER Timer,
_In_ LONGLONG DueTime,
_In_ LONGLONG Period,
_In_opt_ PEXT_SET_PARAMETERS Parameters);
High-resolution timers only work with relative time, meaning DueTime must be a negative value
(in the usual 100 nsec units). The optional Period parameter is the period for a periodic timer. It’s
specified in the same 100 nsec units (contrary to a standard timer where the period is specified in
milliseconds). Finally, Parameters can be NULL or a pointer to EXT_SET_PARAMETERS:
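For reference, the structure is declared in wdm.h roughly like so:
typedef struct _EXT_SET_PARAMETERS_V0 {
	ULONG Version;
	ULONG Reserved;
	LONGLONG NoWakeTolerance;
} EXT_SET_PARAMETERS, *PEXT_SET_PARAMETERS;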
The only parameter of interest is NoWakeTolerance, which indicates the timer's maximum tolerance for
waking a processor. If the value is set to EX_TIMER_UNLIMITED_TOLERANCE, the timer never wakes a
processor that is in a low-power state. The structure must be initialized with ExInitializeSetTimerParameters,
which sets the Version member to the correct value, and Reserved and NoWakeTolerance to zero. Here is
a typical way of working with EXT_SET_PARAMETERS if desired:
EXT_SET_PARAMETERS params;
ExInitializeSetTimerParameters(&params);
params.NoWakeTolerance = -5000; // 0.5 msec
ExSetTimer(timer, -15000, 0, &params); // 1.5 msec interval
ExSetTimer cancels any previous timer that may have been active and sets the new values. If the
timer was active, the function returns TRUE. Otherwise, it returns FALSE.
As with standard timers, it’s possible to cancel a high-resolution timer with ExCancelTimer:
BOOLEAN ExCancelTimer (
_Inout_ PEX_TIMER Timer,
_In_opt_ PEXT_CANCEL_PARAMETERS Parameters);
The function returns TRUE if the timer was actually cancelled, or FALSE if the timer was inactive -
nothing to cancel. Parameters must be NULL.
Finally, a timer object must be deleted with ExDeleteTimer:
BOOLEAN ExDeleteTimer (
_In_ PEX_TIMER Timer,
_In_ BOOLEAN Cancel,
_In_ BOOLEAN Wait,
_In_opt_ PEXT_DELETE_PARAMETERS Parameters);
Cancel indicates whether to cancel the timer (if active). If Cancel is set to TRUE, then Wait can be
set to TRUE as well to wait until the timer has been cancelled. If Wait is set to TRUE, so must Cancel.
Similar to ExSetTimer, an optional EXT_DELETE_PARAMETERS structure can be provided, that includes
an optional callback to be invoked when the timer is finally deleted. ExDeleteTimer returns TRUE if
Cancel is TRUE and the timer was cancelled.
You can find examples for using standard and high-resolution timers in the Timers project, part of
the source code for this chapter. The example driver has a few I/O control codes to set up a standard
timer and a high-resolution timer. Here is an excerpt for creating a high-resolution timer:
// in TimersPublic.h
struct PeriodicTimer {
ULONG Interval;
ULONG Period;
};
// in DriverEntry
// g_HiRes is PEX_TIMER
//...
case IOCTL_TIMERS_SET_HIRES:
//check buffer... and then
auto data = (PeriodicTimer*)Irp->AssociatedIrp.SystemBuffer;
ExSetTimer(g_HiRes, -10000LL * data->Interval,
10000LL * data->Period, nullptr);
status = STATUS_SUCCESS;
break;
//...
The TimersTest user-mode application can be used to test the timers. Here is the entire code:
#include <Windows.h>
#include <stdio.h>
#include "..\Timers\TimersPublic.h"
DWORD bytes;
if (argc < 2 || _stricmp(argv[1], "query") == 0) {
TimerResolution res;
if (DeviceIoControl(hDevice, IOCTL_TIMERS_GET_RESOLUTION, nullptr,
0, &res, sizeof(res), &bytes, nullptr)) {
printf("Timer resolution (100nsec): Max: %u Min: %u "
"Current: %u Inc: %u\n",
res.Maximum, res.Minimum, res.Current, res.Increment);
float factor = 10000.0f;
printf("Timer resolution (msec): Max: %.3f Min: %.3f "
"Current: %.3f Inc: %.3f\n",
res.Maximum / factor, res.Minimum / factor,
res.Current / factor, res.Increment / factor);
}
}
else if (_stricmp(argv[1], "set") == 0 && argc > 2) {
int arg = 2;
bool hires = false;
if (_stricmp(argv[2], "hires") == 0) {
hires = true;
arg++;
}
PeriodicTimer data{};
if (argc > arg) {
data.Interval = atoi(argv[arg]);
arg++;
if (argc > arg) {
data.Period = atoi(argv[arg]);
}
if (!DeviceIoControl(hDevice,
hires ? IOCTL_TIMERS_SET_HIRES : IOCTL_TIMERS_SET_PERIODIC,
&data, sizeof(data), nullptr, 0, &bytes, nullptr))
printf("Error setting timer (%u)\n", GetLastError());
}
}
else if (_stricmp(argv[1], "stop") == 0) {
DeviceIoControl(hDevice, IOCTL_TIMERS_STOP,
nullptr, 0, nullptr, 0, &bytes, nullptr);
}
else {
printf("Unknown option.\n");
}
CloseHandle(hDevice);
return 0;
}
I/O Timer
There is yet another type of timer that can be used by a driver, known as an I/O Timer. This timer exists
for every device object (just one per device). When started, it runs a callback at IRQL DISPATCH_LEVEL
every second. There is no way to further customize it. It can be used as a “watchdog” of some sort,
when high resolution is not required.
The first step in using an I/O timer is to initialize it:
NTSTATUS IoInitializeTimer(
_In_ PDEVICE_OBJECT DeviceObject,
_In_ PIO_TIMER_ROUTINE TimerRoutine,
_In_opt_ PVOID Context);
Notice the device object parameter - this is how the I/O timer is identified. TimerRoutine has the
following prototype:
VOID IO_TIMER_ROUTINE (
_In_ struct _DEVICE_OBJECT *DeviceObject,
_In_opt_ PVOID Context);
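The timer does not start running just by being initialized. To start and stop it, call IoStartTimer and
IoStopTimer, both taking the device object as their only parameter:
VOID IoStartTimer (
    _In_ PDEVICE_OBJECT DeviceObject);

VOID IoStopTimer (
    _In_ PDEVICE_OBJECT DeviceObject);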
Generic Tables
The term “generic tables” is used by the kernel API to refer to two binary tree implementations
available to device driver writers (and the kernel itself). The first type is a Splay Tree implementation,
referred to as simply Generic Tables. The second implementation is using AVL trees, referred to as
AVL tables.
Splay trees are binary search trees where frequently used items move closer to the root and thus
are faster to access. On the downside, the tree is not self-balancing in the sense that it can have
any depth. AVL trees (named after Georgy Adelson-Velsky and Evgenii Landis) are self-balancing
binary search trees, keeping their depth logarithmic in the number of items (base 2). They
are similar to red-black trees, but are generally faster for lookups. You can find more information online.
Both implementations have an almost identical API. We’ll start with Splay trees, and then look at the
differences compared to AVL trees.
Splay Trees
The most common functions related to generic tables are shown in table 11-1.
Function Description
RtlInitializeGenericTable Initialize a new generic table
RtlInsertElementGenericTable Insert a new item into the table
RtlLookupElementGenericTable Lookup an item by key (logarithmic)
RtlNumberGenericTableElements Return the number of items in the table
RtlGetElementGenericTable Return an item by index
RtlDeleteElementGenericTable Delete an item from the table
RtlEnumerateGenericTable Enumerate the items in the table
It’s important to note that the tables API provide no inherent synchronization. It’s the job of the
driver to make sure thread/CPU safety exists. You can use any appropriate synchronization primitive
we looked at, such as a (fast) mutex, Executive Resource, or spin lock.
The first step when using a generic table is to initialize it by calling RtlInitializeGenericTable:
VOID RtlInitializeGenericTable (
_Out_ PRTL_GENERIC_TABLE Table,
_In_ PRTL_GENERIC_COMPARE_ROUTINE CompareRoutine,
_In_ PRTL_GENERIC_ALLOCATE_ROUTINE AllocateRoutine,
_In_ PRTL_GENERIC_FREE_ROUTINE FreeRoutine,
_In_opt_ PVOID TableContext);
The CompareRoutine has the following prototype:
RTL_GENERIC_COMPARE_RESULTS CompareFunction (
_In_ struct _RTL_GENERIC_TABLE *Table,
_In_ PVOID FirstStruct,
_In_ PVOID SecondStruct);
The returned value is a simple enumeration. The provided arguments should be cast to the actual data
stored in the table and compared using some key present in that data. The returned value must be
consistent - using the key for comparison in a consistent way - otherwise the table APIs cannot work
as expected.
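The enumeration itself is defined in the WDK headers like so:
typedef enum _RTL_GENERIC_COMPARE_RESULTS {
	GenericLessThan,
	GenericGreaterThan,
	GenericEqual
} RTL_GENERIC_COMPARE_RESULTS;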
The AllocateRoutine and FreeRoutine are needed to implement the method of allocating and
freeing memory for the nodes managed by the table. These include the data item itself the driver
wishes to store and any other metadata required by the table implementation. Here are the prototypes:
PVOID AllocateFunction (
_In_ struct _RTL_GENERIC_TABLE *Table,
_In_ CLONG ByteSize);
VOID FreeFunction (
_In_ struct _RTL_GENERIC_TABLE *Table,
_In_ PVOID Buffer);
The byte size provided to the allocation function is properly calculated to include any metadata
required by the tables API. As we’ll soon see, the insert API specifies the driver’s data size and
automatically adds the required overhead before calling the allocation function.
As for the implementation itself - you can use any memory APIs discussed, such as ExAllocatePoolWithTag,
ExAllocatePool2, or even lookaside lists. You can use the paged pool or non-paged pool, as needed.
The deallocation function must free the allocation appropriately.
Finally, the TableContext parameter allows adding some context pointer that may be useful for the
driver. It can be retrieved by accessing the TableContext member of RTL_GENERIC_TABLE. It’s also
possible to allocate a structure that starts with a RTL_GENERIC_TABLE member, and add driver-specific
members, so that access is possible by casting to the larger structure.
Once the table is initialized, items can be inserted (based on a key) by calling RtlInsertElementGenericTable:
PVOID RtlInsertElementGenericTable (
_In_ PRTL_GENERIC_TABLE Table,
_In_reads_bytes_(BufferSize) PVOID Buffer,
_In_ CLONG BufferSize,
_Out_opt_ PBOOLEAN NewElement);
The provided Buffer should be the data to be placed in the table, which should contain the key to
be used for comparison. The function calls the compare function to figure out if the element already
exists in the table. If it does, its address is returned and no insertion takes place. If it doesn’t exist,
it’s inserted by copying the provided buffer to the “real” buffer allocated (by calling the registered
allocation routine). BufferSize should specify the number of bytes in the data structure to copy. The
returned pointer in this case is the address of the stored object within the table.
For example, suppose the driver wants to keep some data on a per-process basis, keyed by the process
ID. The data structure could look something like the following (full example is shown in the next
section):
struct ProcessData {
ULONG Id; // serves as the key
// data to be tracked per process...
};
There is no need to store the returned pointer - the driver can get it later by performing a lookup.
Notice that the provided data is on the stack - it doesn’t matter, as it’s copied to the dynamically-
allocated buffer anyway.
The final optional parameter to RtlInsertElementGenericTable (NewElement) returns if a new
item was inserted (TRUE) or the item was already in the table (FALSE).
Retrieving an item based on the key is accomplished with RtlLookupElementGenericTable:
PVOID RtlLookupElementGenericTable (
_In_ PRTL_GENERIC_TABLE Table,
_In_ PVOID Buffer);
The provided Buffer should be the key data that will be used by the called compare routine. It
doesn’t have to include a full blown item if the key members are first in the data structure. In the
previous example, providing a simple ULONG is enough, as it’s the first member of ProcessData.
RtlLookupElementGenericTable returns the pointer to the data within the table, or NULL if the
item cannot be located.
The table API provides an additional way to retrieve items - by index:
PVOID RtlGetElementGenericTable(
_In_ PRTL_GENERIC_TABLE Table,
_In_ ULONG Index);
This is sometimes useful for enumeration purposes, although the order is not generally predictable.
You can get the number of items in the table with the simple RtlNumberGenericTableElements. To
get a predictable enumeration (ordered by key), you can call RtlEnumerateGenericTable:
PVOID RtlEnumerateGenericTable (
_In_ PRTL_GENERIC_TABLE Table,
_In_ BOOLEAN Restart);
Set Restart to TRUE when initializing enumeration, and iterate until the returned pointer is NULL.
Here is an example:
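A minimal sketch, assuming an RTL_GENERIC_TABLE named table that stores ProcessData items (as in the
example later in this chapter):
for (PVOID p = RtlEnumerateGenericTable(&table, TRUE);     // TRUE - restart enumeration
	p != nullptr;
	p = RtlEnumerateGenericTable(&table, FALSE)) {         // FALSE - continue
	auto data = (ProcessData*)p;
	KdPrint(("PID: %u\n", data->Id));      // items are returned ordered by key
}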
RtlEnumerateGenericTable flattens the tree into a linked list and provides the items as required. A
similar API, RtlEnumerateGenericTableWithoutSplaying, does not perturb the splay links.
Finally, to delete an item from the table, call RtlDeleteElementGenericTable:
BOOLEAN RtlDeleteElementGenericTable (
_In_ PRTL_GENERIC_TABLE Table,
_In_ PVOID Buffer);
The function returns TRUE if the item was found and was deleted, FALSE otherwise. You must be
careful to delete all items from the table before the driver unloads, or the memory used by remaining
items will leak. You can use the following loop to delete all items properly:
PVOID element;
while ((element = RtlGetElementGenericTable(&table, 0)) != nullptr) {
RtlDeleteElementGenericTable(&table, element);
}
Write a RAII wrapper for generic tables. Use C++ templates if you can.
#define IOCTL_TABLES_GET_PROCESS_COUNT \
CTL_CODE(TABLES_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_TABLES_GET_PROCESS_BY_ID \
CTL_CODE(TABLES_DEVICE, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_TABLES_GET_PROCESS_BY_INDEX \
CTL_CODE(TABLES_DEVICE, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_TABLES_DELETE_ALL \
CTL_CODE(TABLES_DEVICE, 0x803, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_TABLES_START \
CTL_CODE(TABLES_DEVICE, 0x804, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_TABLES_STOP \
CTL_CODE(TABLES_DEVICE, 0x805, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_TABLES_GET_ALL \
CTL_CODE(TABLES_DEVICE, 0x806, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
struct ProcessData {
ULONG Id;
LONG64 RegistrySetValueOperations;
LONG64 RegistryCreateKeyOperations;
LONG64 RegistryRenameOperations;
LONG64 RegistryDeleteOperations;
};
Every time a process performs one of these operations, the relevant counter is incremented. A generic
table is used to quickly look up the process making a Registry operation, based on the process ID.
The process generic table and other data is stored in the following structure (in Tables.h):
struct Globals {
void Init();
RTL_GENERIC_TABLE ProcessTable;
FastMutex Lock;
LARGE_INTEGER RegCookie;
};
A global instance is created in Tables.cpp. Init is used to initialize the fast mutex (a RAII wrapper
similar to the one we saw in chapter 6) and the table itself:
Globals g_Globals;
void Globals::Init() {
Lock.Init();
RtlInitializeGenericTable(&ProcessTable,
CompareProcesses, AllocateProcess, FreeProcess, nullptr);
}
RTL_GENERIC_COMPARE_RESULTS
CompareProcesses(PRTL_GENERIC_TABLE, PVOID first, PVOID second) {
auto p1 = (ProcessData*)first;
auto p2 = (ProcessData*)second;
if (p1->Id == p2->Id)
return GenericEqual;
return p1->Id < p2->Id ? GenericLessThan : GenericGreaterThan;
}
Allocation and deallocation are performed in a straightforward manner with ExAllocatePool2 and
ExFreePool:
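A minimal sketch of these two routines (the pool type and tag here are illustrative choices, not
necessarily the ones used by the sample):
PVOID AllocateProcess(PRTL_GENERIC_TABLE, CLONG bytes) {
	return ExAllocatePool2(POOL_FLAG_PAGED | POOL_FLAG_UNINITIALIZED, bytes, 'lbaT');
}

VOID FreeProcess(PRTL_GENERIC_TABLE, PVOID buffer) {
	ExFreePool(buffer);
}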
POOL_FLAG_UNINITIALIZED is used to skip zeroing out the structure, as the table API will copy the
provided data anyway.
DriverEntry is fairly standard, with two additions. One is a Registry notification callback for tracking
Registry operations. The other is a process notification callback, so that when a process exits, the stats
kept for the process are removed from the generic table. This is partly because process IDs may be
reused, which would cause multiple processes that happen to get the same ID to be tracked with the
same data structure.
If you want to track all processes without losing stats, it's possible to use a
combination of the process ID and its creation time as a unique key. Another option for
a unique key is a process key available with PsGetProcessStartKey (from Windows 10
version 1703). Another idea would be to push dead processes to a separate list.
extern "C"
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
NTSTATUS status;
PDEVICE_OBJECT devObj = nullptr;
UNICODE_STRING link = RTL_CONSTANT_STRING(L"\\??\\Tables");
bool symLinkCreated = false, procRegistered = false;
do {
UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Device\\Tables");
status = IoCreateDevice(DriverObject, 0, &name, FILE_DEVICE_UNKNOWN,
0, FALSE, &devObj);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX
"Failed in IoCreateDevice (0x%X)\n", status));
break;
}
//
// set process notification routine
//
status = PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, FALSE);
if (!NT_SUCCESS(status))
break;
procRegistered = true;
//
// Registry notifications
//
UNICODE_STRING altitude = RTL_CONSTANT_STRING(L"123456.789");
status = CmRegisterCallbackEx(OnRegistryNotify,
&altitude, DriverObject, nullptr,
&g_Globals.RegCookie, nullptr);
} while (false);
if (!NT_SUCCESS(status)) {
if (procRegistered)
PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, TRUE);
if (symLinkCreated)
IoDeleteSymbolicLink(&link);
if (devObj)
IoDeleteDevice(devObj);
return status;
}
DriverObject->DriverUnload = TablesUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = TablesCreateClose;
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = TablesDeviceControl;
return status;
}
The Registry notification callback first tests for the interesting operations:
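The beginning of the callback might look something like the following sketch (the structure of the actual
sample may differ slightly):
NTSTATUS OnRegistryNotify(PVOID context, PVOID arg1, PVOID arg2) {
	UNREFERENCED_PARAMETER(context);
	UNREFERENCED_PARAMETER(arg2);

	auto type = (REG_NOTIFY_CLASS)(ULONG_PTR)arg1;
	switch (type) {
		case RegNtPostSetValueKey:
		case RegNtPostCreateKey:
		case RegNtPostCreateKeyEx:
		case RegNtPostRenameKey:
		case RegNtPostDeleteKey:
		case RegNtPostDeleteValueKey:
			break;                      // operations we track - handled below

		default:
			return STATUS_SUCCESS;      // anything else is not interesting
	}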
At this point it’s time to look for the current process in the generic table. If it’s not there, then an entry
needs to be created:
PVOID buffer;
auto pid = HandleToULong(PsGetCurrentProcessId());
{
Locker locker(g_Globals.Lock);
buffer = RtlLookupElementGenericTable(&g_Globals.ProcessTable, &pid);
if (buffer == nullptr) {
//
// process does not exist, create a new entry
//
ProcessData data{};
data.Id = pid;
buffer = RtlInsertElementGenericTable(&g_Globals.ProcessTable,
&data, sizeof(data), nullptr);
if (buffer) {
KdPrint((DRIVER_PREFIX
"Added process %u from Registry callback\n", pid));
}
}
}
The Locker class is the same one we used in chapter 6 - acquiring the lock (fast mutex in
this case) in the constructor and releasing in the destructor. Once the fast mutex is acquired,
RtlLookupElementGenericTable is called to look for the process ID. If not found (NULL returned),
RtlInsertElementGenericTable is called to insert a new item. Technically, it’s possible to just call
RtlInsertElementGenericTable without doing a lookup first, as it would return the existing pointer
if the item to insert already exists. Note that data is zeroed out before the ID is set, so that copying the
data to the table would start all counters at zero.
if (buffer) {
auto data = (ProcessData*)buffer;
switch (type) {
case RegNtPostSetValueKey:
InterlockedIncrement64(&data->RegistrySetValueOperations);
break;
case RegNtPostCreateKey:
case RegNtPostCreateKeyEx:
InterlockedIncrement64(&data->RegistryCreateKeyOperations);
break;
case RegNtPostRenameKey:
InterlockedIncrement64(&data->RegistryRenameOperations);
break;
case RegNtPostDeleteKey:
case RegNtPostDeleteValueKey:
InterlockedIncrement64(&data->RegistryDeleteOperations);
break;
}
}
The process notify callback should remove a dead process data structure:
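A sketch of this callback (the actual sample may differ slightly):
void OnProcessNotify(PEPROCESS Process, HANDLE ProcessId, PPS_CREATE_NOTIFY_INFO CreateInfo) {
	UNREFERENCED_PARAMETER(Process);
	if (CreateInfo)
		return;     // process creation - nothing to do

	auto pid = HandleToULong(ProcessId);
	Locker locker(g_Globals.Lock);
	RtlDeleteElementGenericTable(&g_Globals.ProcessTable, &pid);
}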
Deleting could fail if the driver started after the process in question was already running. Note that
there is no need to create a new item when a process is created - if the process never performs any of the
tracked Registry operations, no item should be added; this keeps the table small as an optimization.
The IRP_MJ_DEVICE_CONTROL handler handles all client requests. It starts with the “usual” code:
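A sketch of that opening code, using the variable names that appear in the snippets below:
NTSTATUS TablesDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
	auto irpSp = IoGetCurrentIrpStackLocation(Irp);
	auto const& dic = irpSp->Parameters.DeviceIoControl;
	auto status = STATUS_INVALID_DEVICE_REQUEST;
	ULONG_PTR len = 0;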
switch (dic.IoControlCode) {
After the switch, the IRP is completed with the status and len:
The CompleteRequest helper function is the same as used in chapter 8 (and others), completing the
IRP with whatever status and information provided.
Here is the case for getting the number of elements (processes) being tracked:
case IOCTL_TABLES_GET_PROCESS_COUNT:
{
if (dic.OutputBufferLength < sizeof(ULONG)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
Locker locker(g_Globals.Lock);
*(ULONG*)Irp->AssociatedIrp.SystemBuffer =
RtlNumberGenericTableElements(&g_Globals.ProcessTable);
len = sizeof(ULONG);
status = STATUS_SUCCESS;
}
break;
The NULL check for the system buffer is missing in the above snippet.
case IOCTL_TABLES_GET_PROCESS_BY_ID:
{
if (dic.OutputBufferLength < sizeof(ProcessData) ||
dic.InputBufferLength < sizeof(ULONG)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
ULONG pid = *(ULONG*)Irp->AssociatedIrp.SystemBuffer;
Locker locker(g_Globals.Lock);
auto data = (ProcessData*)RtlLookupElementGenericTable(
&g_Globals.ProcessTable, &pid);
if (data == nullptr) {
//
// invalid or non-tracked PID
//
status = STATUS_INVALID_CID;
break;
}
memcpy(Irp->AssociatedIrp.SystemBuffer, data, len = sizeof(ProcessData));
status = STATUS_SUCCESS;
}
break;
Getting all process information is a bit tricky, as we need to make sure not to overflow the user’s
buffer:
case IOCTL_TABLES_GET_ALL:
{
if (dic.OutputBufferLength < sizeof(ProcessData)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
Locker locker(g_Globals.Lock);
auto count = RtlNumberGenericTableElements(&g_Globals.ProcessTable);
if (count == 0) {
status = STATUS_NO_DATA_DETECTED;
break;
}
NT_ASSERT(Irp->MdlAddress);
count = min(count, dic.OutputBufferLength / sizeof(ProcessData));
auto buffer = (ProcessData*)MmGetSystemAddressForMdlSafe(
Irp->MdlAddress, NormalPagePriority);
if (buffer == nullptr) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
for (ULONG i = 0; i < count; i++) {
auto data = (ProcessData*)RtlGetElementGenericTable(
&g_Globals.ProcessTable, i);
NT_ASSERT(data);
memcpy(buffer, data, sizeof(ProcessData));
buffer++;
}
len = count * sizeof(ProcessData);
status = STATUS_SUCCESS;
}
break;
Here is where RtlGetElementGenericTable comes in handy. The code fills the user's buffer with as
many ProcessData structures as would fit, or with all of them if everything fits.
To delete all items (IOCTL_TABLES_DELETE_ALL), which is also needed in the Unload routine,
DeleteAllProcesses is called:
void DeleteAllProcesses() {
Locker locker(g_Globals.Lock);
//
// deallocate all objects still stored in the table
//
PVOID p;
auto t = &g_Globals.ProcessTable;
while ((p = RtlGetElementGenericTable(t, 0)) != nullptr) {
RtlDeleteElementGenericTable(t, p);
}
}
if (argc > 1) {
if (_stricmp(argv[1], "help") == 0)
return PrintUsage();
if (_stricmp(argv[1], "delete") == 0)
cmd = Command::DeleteAll;
else if (_stricmp(argv[1], "count") == 0)
cmd = Command::GetProcessCount;
else if (_stricmp(argv[1], "start") == 0)
cmd = Command::Start;
else if (_stricmp(argv[1], "getall") == 0)
cmd = Command::GetAllProcesses;
else if (_stricmp(argv[1], "stop") == 0)
cmd = Command::Stop;
else if (_stricmp(argv[1], "get") == 0) {
if (argc > 2) {
pid = atoi(argv[2]);
cmd = Command::GetProcessById;
}
else {
printf("Missing process ID\n");
return 1;
}
}
else if (_stricmp(argv[1], "geti") == 0) {
if (argc > 2) {
pid = atoi(argv[2]);
cmd = Command::GetProcessByIndex;
}
else {
printf("Missing index\n");
return 1;
}
}
else
cmd = Command::Error;
}
if (cmd == Command::Error) {
printf("Command error.\n");
return PrintUsage();
}
auto hDevice = CreateFile(L"\\\\.\\Tables",
GENERIC_READ | GENERIC_WRITE, 0, nullptr,
OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE) {
printf("Error opening device (%u)\n", GetLastError());
return 1;
}
DWORD bytes;
BOOL success = FALSE;
switch (cmd) {
case Command::GetProcessCount:
{
DWORD count;
success = DeviceIoControl(hDevice,
IOCTL_TABLES_GET_PROCESS_COUNT, nullptr, 0,
&count, sizeof(count), &bytes, nullptr);
if (success) {
printf("Process count: %u\n", count);
}
break;
}
case Command::GetAllProcesses:
{
DWORD count = 0;
success = DeviceIoControl(hDevice,
IOCTL_TABLES_GET_PROCESS_COUNT, nullptr, 0,
&count, sizeof(count), &bytes, nullptr);
if (count) {
count += 10; // in case more processes created
auto data = std::make_unique<ProcessData[]>(count);
success = DeviceIoControl(hDevice,
IOCTL_TABLES_GET_ALL, nullptr, 0,
data.get(), count * sizeof(ProcessData), &bytes, nullptr);
if (success) {
count = bytes / sizeof(ProcessData);
printf("Returned %u processes\n", count);
for (DWORD i = 0; i < count; i++)
DisplayProcessData(data[i]);
}
}
break;
}
case Command::DeleteAll:
success = DeviceIoControl(hDevice, IOCTL_TABLES_DELETE_ALL,
nullptr, 0, nullptr, 0, &bytes, nullptr);
if (success)
printf("Deleted successfully.\n");
break;
case Command::GetProcessById:
case Command::GetProcessByIndex:
{
ProcessData data;
success = DeviceIoControl(hDevice,
cmd == Command::GetProcessById ?
IOCTL_TABLES_GET_PROCESS_BY_ID :
IOCTL_TABLES_GET_PROCESS_BY_INDEX,
&pid, sizeof(pid), &data, sizeof(data), &bytes, nullptr);
if (success) {
DisplayProcessData(data);
}
break;
}
}
if (!success) {
printf("Error (%u)\n", GetLastError());
}
CloseHandle(hDevice);
return 0;
}
1. Add support for system-wide statistics for the implemented operations. Add control
codes to retrieve them from user mode.
2. Save deleted processes stats in a list (so they don’t get lost once a process is
terminated), and provide this list to the client if requested.
3. Implement the start and stop control codes to allow pausing and resuming counting
operations.
AVL Trees
The API for using AVL trees is virtually identical to the splay trees API with the addition of the suffix
“Avl” to function names, such as RtlInitializeGenericTableAvl. In the AVL tree case, a different
structure, RTL_AVL_TABLE, is used to manage the tree.
You may want to experiment with both implementations and decide, based on performance measurements
for your scenario, which implementation is better. Fortunately, the kernel
headers provide a simple way to switch to AVL trees without changing any code by defining the
macro RTL_USE_AVL_TABLES before including <ntddk.h>:
#define RTL_USE_AVL_TABLES
#include <ntddk.h>
That’s it! All calls to the Splay trees functions are redirected (the functions become macros) to the
AVL tree implementation.
Hash Tables
The Splay trees and AVL trees discussed are implemented as binary search trees. Another common
way to perform quick lookups is by using hash tables. Hash tables are built around a hash function
that, if properly implemented, distributes keys evenly across the table's buckets - no greater/less-than
comparison is required.
The WDK does not document any hash functions, but the kernel API does include a hash table
implementation. The functions are declared in <ntddk.h>, but are undocumented. As such,
they are not described in this book. Feel free to investigate their usage, starting with the function
RtlInitHashTableContext.
Singly Linked Lists
The kernel also provides a singly-linked list based on the SINGLE_LIST_ENTRY structure, which contains
nothing but a Next pointer - this is as simple as a linked list can possibly get. Just as with doubly-linked
lists, one of these is defined as the header of the list (Next is initialized to NULL), and the same
structure is embedded in a larger structure where the real data is. For example:
struct MyData {
ULONGLONG Time;
ULONG ProcessId;
SINGLE_LIST_ENTRY Link;
ULONG ExitCode;
};
Since it’s a singly-linked list, you can only add a new head and remove the current head (both
implemented inline within ntdef.h):
VOID PushEntryList(
_Inout_ PSINGLE_LIST_ENTRY ListHead,
_Inout_ __drv_aliasesMem PSINGLE_LIST_ENTRY Entry);
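Its counterpart, which removes and returns the current head (or NULL if the list is empty), is PopEntryList:
PSINGLE_LIST_ENTRY PopEntryList(
    _Inout_ PSINGLE_LIST_ENTRY ListHead);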
Just like doubly-linked lists, the CONTAINING_RECORD macro can be used to get to the “real” data given
the pointer to SINGLE_LIST_ENTRY, the full structure type, and the name of the SINGLE_LIST_ENTRY
member within the larger structure.
The aforementioned functions are not thread/CPU safe, so they must be properly protected if needed.
That said, APIs are provided for thread/CPU-safe pushing and popping, protected by a spin lock:
PSINGLE_LIST_ENTRY ExInterlockedPopEntryList (
_Inout_ PSINGLE_LIST_ENTRY ListHead,
_Inout_ _Requires_lock_not_held_(*_Curr_) PKSPIN_LOCK Lock);
PSINGLE_LIST_ENTRY ExInterlockedPushEntryList (
_Inout_ PSINGLE_LIST_ENTRY ListHead,
_Inout_ __drv_aliasesMem PSINGLE_LIST_ENTRY ListEntry,
_Inout_ _Requires_lock_not_held_(*_Curr_) PKSPIN_LOCK Lock);
The spin lock is acquired at IRQL HIGH_LEVEL, which makes it easy to use from any IRQL.
The kernel also provides a lock-free ("interlocked") singly-linked list implementation, based on a list
head of type SLIST_HEADER (initialized with ExInitializeSListHead). To add an item, use an SLIST_ENTRY
object (usually part of a bigger structure) by passing it to the ExInterlockedPushEntrySList macro:
PSLIST_ENTRY ExInterlockedPushEntrySList (
_Inout_ PSLIST_HEADER ListHead,
_Inout_ __drv_aliasesMem PSLIST_ENTRY ListEntry,
_Inout_opt_ _Requires_lock_not_held_(*_Curr_) PKSPIN_LOCK Lock);
The spin lock should be passed as NULL, as this macro expands to calling ExpInterlockedPushEntrySList:
PSLIST_ENTRY ExpInterlockedPushEntrySList (
_Inout_ PSLIST_HEADER ListHead,
_Inout_ __drv_aliasesMem PSLIST_ENTRY ListEntry);
As you can see, the spin lock is not used at all. It’s not quite clear why the macro accepts a spin lock,
but the documentation hints that this is only useful with doubly-linked lists, so the macro prototype
is probably for consistency only.
Similarly, popping an item (from the head only) is available with ExInterlockedPopEntrySList:
PSLIST_ENTRY ExInterlockedPopEntrySList (
_Inout_ PSLIST_HEADER ListHead,
_Inout_opt_ _Requires_lock_not_held_(*_Curr_) PKSPIN_LOCK Lock);
The function removes the first entry from the list and returns it (or NULL if the list is empty). A related
function, ExInterlockedFlushSList, atomically replaces the head with NULL (making the list empty) and
returns the previous head. It's the responsibility of the driver to iterate through the returned list and
free any items that were dynamically allocated.
Finally, you can call ExQueryDepthSList to get the number of items in the list:
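Its prototype:
USHORT ExQueryDepthSList (
    _In_ PSLIST_HEADER SListHead);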
Callback Objects
The kernel defines a Callback object type that can be used to provide notifications, while maintaining
a higher level of abstraction, where the callback object hides the callback(s) that should be invoked.
There are quite a few callback objects used on a normal system, which can be viewed with Sysinternals
WinObj tool (figure 11-1).
There are three existing (and documented) callback objects that drivers can use (all in the \Callback
object manager directory): SetSystemTime (used by the sample driver shown later in this section),
PowerState, and ProcessorAdd.
Working with an existing callback object and creating a new one are essentially the same. The first
step is to create the callback object with ExCreateCallback, giving it a name with the provided
OBJECT_ATTRIBUTES:
NTSTATUS ExCreateCallback (
_Outptr_ PCALLBACK_OBJECT *CallbackObject,
_In_ POBJECT_ATTRIBUTES ObjectAttributes,
_In_ BOOLEAN Create,
_In_ BOOLEAN AllowMultipleCallbacks);
The OBJECT_ATTRIBUTES structure must be initialized with a name, and optionally other attributes,
the most common being OBJ_CASE_INSENSITIVE. Set Create to TRUE to create a new callback object
if such does not exist. If a new callback object is created, AllowMultipleCallbacks specifies whether
multiple callbacks are allowed. If Create is FALSE or the object exists, this parameter is ignored. The
returned object’s (CallbackObject) reference count is incremented.
With a callback object in hand, an interested client can register a callback function with ExRegisterCallback:
PVOID ExRegisterCallback (
_Inout_ PCALLBACK_OBJECT CallbackObject,
_In_ PCALLBACK_FUNCTION CallbackFunction,
_In_opt_ PVOID CallbackContext);
VOID CallbackFunction (
_In_opt_ PVOID CallbackContext,
_In_opt_ PVOID Argument1,
_In_opt_ PVOID Argument2);
CallbackContext is whatever was passed in to ExRegisterCallback, and the two arguments are
provided by whoever is invoking the callbacks - these can be anything, as determined by the invoker.
When using existing callback objects, that’s all there is to it. If you are controlling the callback object,
then you can invoke the callbacks that are currently registered with ExNotifyCallback:
VOID ExNotifyCallback (
_In_ PVOID CallbackObject,
_In_opt_ PVOID Argument1,
_In_opt_ PVOID Argument2);
Finally, to unregister your callback (if you’re a client), call ExUnregisterCallback, passing the
registration cookie:
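Its prototype:
VOID ExUnregisterCallback (
    _Inout_ PVOID CbRegistration);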
You must also decrement the reference count of the callback object with ObDereferenceObject,
otherwise the callback object will leak. You can do that for the existing callback objects as soon as
you don’t need them.
The Callbacks driver demonstrates using a callback object with the SetSystemTime documented
callback. Here is the entire driver:
PVOID g_RegCookie;
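// (The callback function itself and the beginning of DriverEntry are not part of this excerpt.
// A minimal sketch of the missing part follows - the callback object name comes from the text
// above; everything else is an assumption.)
void SystemTimeChanged(PVOID context, PVOID arg1, PVOID arg2);

extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING) {
	UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Callback\\SetSystemTime");
	OBJECT_ATTRIBUTES attr;
	InitializeObjectAttributes(&attr, &name,
		OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, nullptr, nullptr);

	PCALLBACK_OBJECT callback;
	auto status = ExCreateCallback(&callback, &attr, FALSE, TRUE);
	if (!NT_SUCCESS(status))
		return status;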
//
// register our callback
//
g_RegCookie = ExRegisterCallback(callback, SystemTimeChanged, nullptr);
if (g_RegCookie == nullptr) {
ObDereferenceObject(callback);
KdPrint(("Failed to register callback\n"));
return STATUS_UNSUCCESSFUL;
}
//
// callback object no longer needed
//
ObDereferenceObject(callback);
DriverObject->DriverUnload = OnUnload;
return STATUS_SUCCESS;
}
void OnUnload(PDRIVER_OBJECT) {
ExUnregisterCallback(g_RegCookie);
}
In this chapter we’ve looked at some potentially useful techniques a driver might want to use. In the
next chapter, we’ll turn our attention to file system mini-filters.
Chapter 12: File System Mini-Filters
File systems are targets for I/O operations to access files and other devices implemented as file systems
(such as named pipes and mailslots). Windows supports several file systems, most notably NTFS, its
native file system. File system filtering is the mechanism by which drivers can intercept calls destined
to file systems. This is useful for many types of software, such as anti-viruses, backups, encryption,
redirection, and more.
For a long time, Windows has supported a filtering model known as file system filters, now
referred to as legacy file system filters. A newer model called file system mini-filters was developed
to replace the legacy filter mechanism. Mini-filters are easier to write in many respects, and are the
preferred way to develop file system filtering drivers. In this chapter we’ll cover the basics of file
system mini-filters.
This is a long chapter, so you may want to consume it in chunks. The example drivers get more
complex as the chapter progresses.
In this chapter:
• Introduction
• Loading and Unloading
• Initialization
• Installation
• Processing I/O Operations
• File Names
• The Delete Protector Driver
• The Directory Hiding Driver
• Contexts
• Initiating I/O Requests
• The File Backup Driver
• User Mode Communication
• Debugging
• Exercises
Introduction
Legacy file system filters are notoriously difficult to write. The driver writer has to take care of
an assortment of little details, many of them boilerplate, complicating development. Legacy filters
cannot be unloaded while the system is running which means the system had to be restarted to load
an updated version of the driver. With the mini-filter model, drivers can be loaded and unloaded
dynamically, thus streamlining the development workflow considerably.
Internally, a legacy filter provided by Windows called the Filter Manager is tasked with managing
mini-filters. A typical filter layering is shown in figure 12-1.
Each mini-filter has its own Altitude, which determines its relative position in the device stack. The
filter manager is the one receiving the IRPs just like any other legacy filter and then calls upon the
mini-filters it’s managing, in descending order of altitude.
In some unusual cases, there may be another legacy filter in the hierarchy, that may cause a mini-filter
“split”, where some are higher in altitude than the legacy filter and some lower. In such a case, more
than one instance of the filter manager will load, each managing its own mini-filters. Every such filter
manager instance is referred to as a Frame. Figure 12-2 shows such an example with two frames.
Loading a mini-filter driver is equivalent to loading a standard software driver. Unloading, however,
is not.
Unloading a mini-filter is accomplished with the FilterUnload API in user mode, or FltUnloadFilter
in kernel mode. This operation requires the same privilege as for loads, but is not guaranteed to
succeed, because the mini-filter's filter unload callback (discussed later) is called, which can fail the
request so that the driver remains loaded.
Although using APIs to load and unload filters has its uses, during development it’s usually easier
to use a built-in tool that can accomplish that (and more) called fltmc.exe (residing in the System32
directory). Invoking it (from an elevated command window) without arguments lists the currently
loaded mini-filters. Here is the output from a Windows 11 machine:
C:\WINDOWS\system32>fltmc
For each filter, the output shows the driver’s name, the number of instances each filter has currently
running (each instance is attached to a volume), its altitude and the filter manager frame it’s part of.
You may be wondering why there are drivers with different numbers of instances. The short answer
is that it’s up to the driver to decide whether to attach to a given volume or not (we’ll look at this in
more detail later in this chapter).
Loading a driver with fltmc.exe is done with the load option, like so:
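For example (sample here stands for the mini-filter's registered service name):
C:\>fltmc load sample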
fltmc includes other options. Type fltmc -? to get the full list. For example, you can get the details
of all instances for each driver using fltmc instances. Similarly, you can get a list of all volumes
mounted on a system with fltmc volumes. We’ll see later in this chapter how this information is
conveyed to the driver.
File system drivers and filters are created in the FileSystem directory of the Object Manager namespace.
Figure 12-3 shows this directory in WinObj.
Initialization
A file system mini-filter driver has a DriverEntry routine, just like any other driver. The driver
must register itself as a mini-filter with the filter manager, specifying various settings, such as
what operations it wishes to intercept. The driver sets up appropriate structures and then calls
FltRegisterFilter to register. If successful, the driver can do further initializations as needed and
finally call FltStartFiltering to actually start filtering operations.
Note that the driver does not need to set up dispatch routines on its own (IRP_MJ_READ, IRP_MJ_WRITE,
etc.). This is because the driver is not directly in the I/O path; the filter manager is.
FltRegisterFilter has the following prototype:
NTSTATUS FltRegisterFilter (
_In_ PDRIVER_OBJECT Driver,
_In_ const FLT_REGISTRATION *Registration,
_Outptr_ PFLT_FILTER *RetFilter);
The required FLT_REGISTRATION structure provides all the necessary information for registration. It’s
defined like so:
typedef struct _FLT_REGISTRATION {
USHORT Size;
USHORT Version;
FLT_REGISTRATION_FLAGS Flags;
const FLT_CONTEXT_REGISTRATION *ContextRegistration;
const FLT_OPERATION_REGISTRATION *OperationRegistration;
PFLT_FILTER_UNLOAD_CALLBACK FilterUnloadCallback;
PFLT_INSTANCE_SETUP_CALLBACK InstanceSetupCallback;
PFLT_INSTANCE_QUERY_TEARDOWN_CALLBACK InstanceQueryTeardownCallback;
PFLT_INSTANCE_TEARDOWN_CALLBACK InstanceTeardownStartCallback;
PFLT_INSTANCE_TEARDOWN_CALLBACK InstanceTeardownCompleteCallback;
PFLT_GENERATE_FILE_NAME GenerateFileNameCallback;
PFLT_NORMALIZE_NAME_COMPONENT NormalizeNameComponentCallback;
PFLT_NORMALIZE_CONTEXT_CLEANUP NormalizeContextCleanupCallback;
PFLT_TRANSACTION_NOTIFICATION_CALLBACK TransactionNotificationCallback;
PFLT_NORMALIZE_NAME_COMPONENT_EX NormalizeNameComponentExCallback;
#if FLT_MGR_WIN8
PFLT_SECTION_CONFLICT_NOTIFICATION_CALLBACK SectionNotificationCallback;
#endif
} FLT_REGISTRATION, *PFLT_REGISTRATION;
There is a lot of information encapsulated in this structure. The most important fields are described
below:
• Size must be set to the size of the structure, which may depend on the target Windows version
(set in the project’s properties). Drivers typically just specify sizeof(FLT_REGISTRATION).
• Version is also based on the target Windows version. Drivers use FLT_REGISTRATION_VERSION.
• Flags can be zero or a combination of the following values:
– FLTFL_REGISTRATION_DO_NOT_SUPPORT_SERVICE_STOP - the driver does not support a
stop request, regardless of other settings.
– FLTFL_REGISTRATION_SUPPORT_NPFS_MSFS - the driver is aware of named pipes and
mailslots and wishes to filter requests to these file systems as well (see the sidebar “Pipes
and Mailslots” for more information).
– FLTFL_REGISTRATION_SUPPORT_DAX_VOLUME (Windows 10 version 1607 and later) - the
driver will support attaching to a Direct Access Volume (DAX), if such a volume is
available (see the sidebar “Direct Access Volume”).
• InstanceTeardownStartCallback - an optional callback invoked when instance teardown begins, while
pending I/O operations may not yet have completed. Specifying NULL for this callback does not prevent
instance teardown (prevention can be achieved with the previous query teardown callback).
• InstanceTeardownCompleteCallback - an optional callback invoked after all the pending I/O
operations complete or are canceled.
The rest of the callback fields are all optional and seldom used. These are beyond the scope of this
book.
The operation itself is identified by a major function code, many of which are the same as the ones we
met in previous chapters: IRP_MJ_CREATE, IRP_MJ_READ, IRP_MJ_WRITE and so on. However, there
are other operations identified with a major function that do not have a real major function dispatch
routine. This abstraction provided by the filter manager helps to isolate the mini-filter from knowing
the exact source of the operation - it could be a real IRP or it could be another operation that is
abstracted as an IRP. Furthermore, file systems support another mechanism for receiving requests,
known as Fast I/O. Fast I/O is used for synchronous I/O with cached files. Fast I/O requests transfer
data between user buffers and the system cache directly, bypassing the file system and storage driver
stack, thus avoiding unnecessary overhead. The NTFS file system driver, as a canonical example,
supports Fast I/O.
This information can be viewed with a kernel debugger by using the !drvobj command as shown
here for the NTFS file system driver:
Dispatch routines:
[00] IRP_MJ_CREATE fffff8026b49bae0 Ntfs!NtfsFsdCreate
[01] IRP_MJ_CREATE_NAMED_PIPE fffff80269141d40 nt!IopInvalidDeviceRequest
[02] IRP_MJ_CLOSE fffff8026b49d730 Ntfs!NtfsFsdClose
[03] IRP_MJ_READ fffff8026b3b3f80 Ntfs!NtfsFsdRead
...
[19] IRP_MJ_QUERY_QUOTA fffff8026b49c700 Ntfs!NtfsFsdDispatchWait
[1a] IRP_MJ_SET_QUOTA fffff8026b49c700 Ntfs!NtfsFsdDispatchWait
[1b] IRP_MJ_PNP fffff8026b5143e0 Ntfs!NtfsFsdPnp
!devstack ffffad8c22448050 :
!DevObj !DrvObj !DevExt ObjectName
ffffad8c4adcba70 \FileSystem\FltMgr ffffad8c4adcbbc0
> ffffad8c22448050 \FileSystem\Ntfs ffffad8c224481a0
(truncated)
The filter manager abstracts I/O operations, regardless of whether they are IRP-based or fast I/O based.
Mini-filters can intercept any such request. If the driver is not interested in fast I/O, for example, it can
query the actual request type provided by the filter manager with the FLT_IS_FASTIO_OPERATION
and/or FLT_IS_IRP_OPERATION macros.
Table 12-1 lists some of the common major functions for file system mini-filters with a brief description
for each.
The next two fields are the pre and post operation callbacks, where at least one must be non-NULL
(otherwise, why have that entry in the first place?). Here is an example of initializing an array of
FLT_OPERATION_REGISTRATION structures (for an imaginary driver called “Sample”):
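A sketch of such an array (the callback function names are placeholders for the imaginary driver):
const FLT_OPERATION_REGISTRATION Callbacks[] = {
	{ IRP_MJ_CREATE, 0, SamplePreCreate, SamplePostCreate },
	{ IRP_MJ_WRITE, FLTFL_OPERATION_REGISTRATION_SKIP_PAGING_IO, SamplePreWrite, nullptr },
	{ IRP_MJ_SET_INFORMATION, 0, SamplePreSetInformation, nullptr },
	{ IRP_MJ_OPERATION_END }
};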
With this array at hand, registration for a driver that does not require any contexts could be done
with the following code:
PFLT_FILTER Filter;
NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
NTSTATUS status;
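	// (a sketch of how the registration might continue - the FLT_REGISTRATION initialization
	// is abbreviated, unspecified members default to NULL, and SampleUnload is a placeholder
	// for the driver's unload callback)
	FLT_REGISTRATION reg = {
		sizeof(FLT_REGISTRATION),
		FLT_REGISTRATION_VERSION,
		0,              // flags
		nullptr,        // no context registration
		Callbacks,      // operation callbacks
		SampleUnload,   // filter unload callback
	};
	status = FltRegisterFilter(DriverObject, &reg, &Filter);
	if (!NT_SUCCESS(status))
		return status;

	status = FltStartFiltering(Filter);
	if (!NT_SUCCESS(status))
		FltUnregisterFilter(Filter);

	return status;
}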
The Altitude
As we’ve seen already, file system mini-filters must have an altitude, indicating their relative “position”
within the file system filters hierarchy. Contrary to the altitude we’ve already encountered with object
and registry callbacks, a mini-filter’s altitude value may be potentially significant.
First, the value of the altitude is not provided as part of mini-filter’s registration, but is read from
the registry. When the driver is installed, its altitude is written in the proper location in the registry.
Figure 12-4 shows the registry entry for the built-in Fileinfo mini-filter driver; the Altitude is clearly
visible, and is the same value shown earlier with the fltmc.exe tool.
Here is an example that should clarify why altitude matters. Suppose there is a mini-filter at altitude
10000 whose job is to encrypt data when written, and decrypt when read. Now suppose another mini-
filter whose job is to check data for malicious activity is at altitude 9000. This layout is depicted in
Figure 12-5.
The encryption driver encrypts incoming data to be written, which is then passed on to the anti-virus
driver. The anti-virus driver now has a problem, as it sees the encrypted data with no viable way of
decrypting it (and even if it could, that would be wasteful). In such a case, the anti-virus driver must
have an altitude higher than the encryption driver. How can such a driver guarantee this is in fact the
case?
To rectify this (and other similar) situations, Microsoft has defined ranges of altitudes for drivers
based on their requirements (and ultimately, their role). In order to obtain a proper altitude, the driver
publisher must send an altitude allocation request to Microsoft and ask that an altitude be allocated
for that driver based on its intended target. Check out this link³ for the complete list of altitude ranges.
In fact, the link shows all drivers that Microsoft has allocated an altitude for, with the file name, the
altitude and the publishing company.
For testing purposes, you can choose any appropriate altitude without going through
Microsoft, but you should obtain an official altitude for production use.
Table 12-2 shows the list of groups and the altitude range for each group.
³https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/allocated-altitudes
Installation
Figure 12-4 shows that there are additional Registry entries that must be set, beyond what is possible
with the standard CreateService installation API we’ve been using up until now (indirectly with
the sc.exe tool). One way to install a file system mini-filter driver is to use an INF file. This approach
was used in the first edition of the book, because at the time there was a driver project template for
file system mini-filters provided with the WDK that used an INF file. Curiously enough, that template
went away in recent WDKs without any explanation. Although it’s possible to use an existing project
from the first edition of the book as a basis for a driver that uses an INF file for installation, I will
show another way that does not require an INF file at all.
If you want to see how to use an INF file to install a file system mini-filter, please see chapter 10 in
the first edition of the book. Using an INF file is perfectly fine.
The alternative approach we’ll use is to write the required Registry values directly as part of
DriverEntry prior to calling FltRegisterFilter. The next driver example in this chapter, Del-
Protect, that will be discussed in an upcoming section, uses this technique. Here is the code (error
handling omitted):
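(The first part - opening the driver's service key, given by the RegistryPath parameter of DriverEntry,
and creating the Instances subkey under it - is sketched below; the actual DelProtect code may differ in
details.)
HANDLE hKey = nullptr, hSubKey = nullptr;
OBJECT_ATTRIBUTES keyAttr;
InitializeObjectAttributes(&keyAttr, RegistryPath, OBJ_KERNEL_HANDLE, nullptr, nullptr);
auto status = ZwOpenKey(&hKey, KEY_WRITE, &keyAttr);
//
// create the "Instances" key under the driver's service key
//
UNICODE_STRING instancesName = RTL_CONSTANT_STRING(L"Instances");
OBJECT_ATTRIBUTES subKeyAttr;
InitializeObjectAttributes(&subKeyAttr, &instancesName, OBJ_KERNEL_HANDLE, hKey, nullptr);
status = ZwCreateKey(&hSubKey, KEY_WRITE, &subKeyAttr, 0, nullptr, 0, nullptr);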
//
// set "DefaultInstance" value. Any name is fine.
//
UNICODE_STRING valueName = RTL_CONSTANT_STRING(L"DefaultInstance");
WCHAR name[] = L"DelProtectDefaultInstance";
status = ZwSetValueKey(hSubKey, &valueName, 0, REG_SZ, name, sizeof(name));
//
// create "instance" key under "Instances"
//
UNICODE_STRING instKeyName;
RtlInitUnicodeString(&instKeyName, name);
HANDLE hInstKey;
InitializeObjectAttributes(&subKeyAttr, &instKeyName, OBJ_KERNEL_HANDLE,
hSubKey, nullptr);
status = ZwCreateKey(&hInstKey, KEY_WRITE, &subKeyAttr, 0, nullptr, 0, nullptr);
//
// write out altitude
//
WCHAR altitude[] = L"425342";
UNICODE_STRING altitudeName = RTL_CONSTANT_STRING(L"Altitude");
status = ZwSetValueKey(hInstKey, &altitudeName, 0, REG_SZ,
altitude, sizeof(altitude));
//
// write out flags
//
UNICODE_STRING flagsName = RTL_CONSTANT_STRING(L"Flags");
ULONG flags = 0;
status = ZwSetValueKey(hInstKey, &flagsName, 0, REG_DWORD,
&flags, sizeof(flags));
ZwClose(hInstKey);
The Flags value in the Registry indicates what types of volume attachments the driver is interested in.
It can have one of the following values:
The last missing piece is the need to link with the Filter Manager API, implemented in FltMgr.lib. It
must be added to the Linker input libraries as shown in figure 12-6.
Make sure you select “All Platforms” and “All Configurations”. You cannot add
FltMgr.lib in source code with a #pragma comment(lib, "fltmgr") as you would in user
mode. I don’t know why the linker does not accept this option.
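With the library linked, installation can still be done with sc.exe (for example, assuming the driver binary was copied to c:\Test on the target machine):

c:\>sc create delprotect type= filesys binPath= c:\Test\DelProtect.sys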
Notice the “type= filesys” instead of “type= kernel” we used in previous chapters. This writes a value
of 2 in the Type value in the Registry, rather than 1. Does that really matter? As far as I can tell - it
doesn’t, but still, it’s best to write the expected value.
FLT_PREOP_CALLBACK_STATUS SomePreOperation (
_Inout_ PFLT_CALLBACK_DATA Data,
_In_ PCFLT_RELATED_OBJECTS FltObjects,
_Outptr_ PVOID *CompletionContext);
First, let’s look at the possible return values from a pre-operation, typed as the FLT_PREOP_CALLBACK_-
STATUS enumeration. Here are the common return values to use:
• FLT_PREOP_COMPLETE indicates the driver is completing the operation. The filter manager does
not call the post-operation callback (if registered) and does not forward the request to lower-
layer mini-filters.
• FLT_PREOP_SUCCESS_NO_CALLBACK indicates the pre-operation is done with the request and
lets it continue flowing to the next filter. The driver does not want its post-operation callback
to be called for this operation.
• FLT_PREOP_SUCCESS_WITH_CALLBACK indicates the driver allows the filter manager to propa-
gate the request to lower-layer filters, but it wants its post-operation callback invoked for this
operation.
• FLT_PREOP_PENDING indicates the driver is pending the operation. The filter manager does not
continue processing the request until the driver calls FltCompletePendedPreOperation to let
the filter manager know it can continue processing this request.
• FLT_PREOP_SYNCHRONIZE is similar to FLT_PREOP_SUCCESS_WITH_CALLBACK, but the driver
asks the filter manager to invoke its post-callback on the same thread at IRQL <= APC_LEVEL
(normally the post-operation callback can be invoked at IRQL <= DISPATCH_LEVEL by an
arbitrary thread).
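To make these return values concrete, here is a minimal sketch of a pre-operation callback (using the SomePreOperation prototype above) that requests a post callback only for write operations and lets everything else through untouched:

FLT_PREOP_CALLBACK_STATUS SomePreOperation(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Outptr_ PVOID* CompletionContext) {
    UNREFERENCED_PARAMETER(FltObjects);
    *CompletionContext = nullptr;

    //
    // ask for a post callback for write operations only
    //
    if (Data->Iopb->MajorFunction == IRP_MJ_WRITE)
        return FLT_PREOP_SUCCESS_WITH_CALLBACK;

    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}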
The Data argument provides all the information related to the I/O operation itself, as a FLT_-
CALLBACK_DATA structure defined like so:
typedef struct _FLT_CALLBACK_DATA {
    FLT_CALLBACK_DATA_FLAGS Flags;
    PETHREAD CONST Thread;
    PFLT_IO_PARAMETER_BLOCK CONST Iopb;
    IO_STATUS_BLOCK IoStatus;
    struct _FLT_TAG_DATA_BUFFER *TagData;
    union {
        struct {
            LIST_ENTRY QueueLinks;
            PVOID QueueContext[2];
        };
        PVOID FilterContext[4];
    };
    KPROCESSOR_MODE RequestorMode;
} FLT_CALLBACK_DATA, *PFLT_CALLBACK_DATA;
This structure is also provided in the post-callback. Here is a rundown of the important members of
this structure:
• Flags may contain zero or a combination of flags, some of which are listed below:
	• FLTFL_CALLBACK_DATA_DIRTY indicates the driver has made changes to the structure and then
	called FltSetCallbackDataDirty. Every member of the structure can be modified except Thread
	and RequestorMode.
	• FLTFL_CALLBACK_DATA_FAST_IO_OPERATION indicates this is a fast I/O operation.
	• FLTFL_CALLBACK_DATA_IRP_OPERATION indicates this is an IRP-based operation.
	• FLTFL_CALLBACK_DATA_GENERATED_IO indicates this is an operation generated by another mini-filter.
	• FLTFL_CALLBACK_DATA_POST_OPERATION indicates this is a post-operation, rather than a pre-operation.
• Iopb is itself a structure of type FLT_IO_PARAMETER_BLOCK, holding the detailed I/O parameters,
defined like so:

typedef struct _FLT_IO_PARAMETER_BLOCK {
    ULONG IrpFlags;
UCHAR MajorFunction;
UCHAR MinorFunction;
UCHAR OperationFlags;
UCHAR Reserved;
PFILE_OBJECT TargetFileObject;
PFLT_INSTANCE TargetInstance;
FLT_PARAMETERS Parameters;
} FLT_IO_PARAMETER_BLOCK, *PFLT_IO_PARAMETER_BLOCK;
• TargetFileObject is the file object that is the target of this operation; it’s useful to have when
invoking some APIs.
• Parameters is a monstrous union providing the actual data for the specific operation (similar
in concept to the Parameters member of an IO_STACK_LOCATION). The driver looks at the proper
structure within this union to get to the information it needs. We’ll look at some of these
structures once we look at specific operation types, later in this chapter.
The second argument to the pre-callback is another structure of type FLT_RELATED_OBJECTS. This
structure mostly contains opaque handles to the current filter, instance and volume, which are useful
in some APIs. Here is the complete definition of this structure:
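typedef struct _FLT_RELATED_OBJECTS {
    USHORT CONST Size;
    USHORT CONST TransactionContext;
    PFLT_FILTER CONST Filter;
    PFLT_VOLUME CONST Volume;
    PFLT_INSTANCE CONST Instance;
    PFILE_OBJECT CONST FileObject;
    PKTRANSACTION CONST Transaction;
} FLT_RELATED_OBJECTS, *PFLT_RELATED_OBJECTS;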
The FileObject field is the same one accessed through the I/O parameter block’s TargetFileObject
field.
The last argument to the pre-callback is a context value that can be set by the driver. If set, this value
is propagated to the post-callback routine for the same request (the default value is NULL).
FLT_POSTOP_CALLBACK_STATUS SomePostOperation (
_Inout_ PFLT_CALLBACK_DATA Data,
_In_ PCFLT_RELATED_OBJECTS FltObjects,
_In_opt_ PVOID CompletionContext,
_In_ FLT_POST_OPERATION_FLAGS Flags);
The post-operation function is called at IRQL <= DISPATCH_LEVEL in an arbitrary thread context,
unless the pre-callback routine returned FLT_PREOP_SYNCHRONIZE, in which case the filter manager
guarantees the post-callback is invoked at IRQL < DISPATCH_LEVEL on the same thread that executed
the pre-callback.
In the former case, the driver cannot perform certain types of operations because the IRQL is too high -
for example, accessing paged memory or calling APIs that are only allowed at IRQL < DISPATCH_LEVEL.
If the driver needs to do any of the above, it must somehow defer its execution to another routine
called at IRQL < DISPATCH_LEVEL. This can be done in one of two ways: by calling
FltDoCompletionProcessingWhenSafe, which runs a driver-supplied callback immediately if the current
IRQL allows it and posts it to a worker thread otherwise, or by queuing a work item with
FltQueueDeferredIoWorkItem, which runs at IRQL PASSIVE_LEVEL.
In any case, using one of these deferring mechanisms is not allowed if the flags argument
is set to FLTFL_POST_OPERATION_DRAINING, which means the post-callback is part of
volume detaching. In this case, the post callback is called at IRQL < DISPATCH_LEVEL.
There are many little details here; check out the WDK documentation for more. We’ll
use some of the above mechanisms later in this chapter.
File Names
In some mini-filter callbacks, the name of the file being accessed is needed. At first, this seems like
an easy detail to find: the FILE_OBJECT structure has a FileName member, which should be exactly
what is needed.
Unfortunately, things are not that simple. Files may be opened with a full path or a relative one;
rename operations on the same file may be occurring at the same time; some file name information
is cached. For these and other internal reasons, the FileName field in the file object is not to be trusted.
In fact, it’s only guaranteed to be valid in an IRP_MJ_CREATE pre-operation callback, and even there
it’s not necessarily in the format the driver needs.
To offset these issues, the filter manager provides the FltGetFileNameInformation API that can
return the correct file name when needed. This function is prototyped as follows:
NTSTATUS FltGetFileNameInformation (
_In_ PFLT_CALLBACK_DATA CallbackData,
_In_ FLT_FILE_NAME_OPTIONS NameOptions,
_Outptr_ PFLT_FILE_NAME_INFORMATION *FileNameInformation);
The CallbackData parameter is the one provided by the filter manager in any callback. The
NameOptions parameter is a set of flags that specify (among other things) the requested file format.
Typical value used by most drivers is FLT_FILE_NAME_NORMALIZED (full path name) ORed with
FLT_FILE_NAME_QUERY_DEFAULT (locate the name in a cache, otherwise query the file system).
The result from the call is provided by the last parameter, FileNameInformation. The result is an
allocated structure that needs to be properly freed by calling FltReleaseFileNameInformation.
The FLT_FILE_NAME_INFORMATION structure is defined like so:
typedef struct _FLT_FILE_NAME_INFORMATION {
    USHORT Size;
    FLT_FILE_NAME_PARSED_FLAGS NamesParsed;
    FLT_FILE_NAME_FORMAT Format;
    UNICODE_STRING Name;
UNICODE_STRING Volume;
UNICODE_STRING Share;
UNICODE_STRING Extension;
UNICODE_STRING Stream;
UNICODE_STRING FinalComponent;
UNICODE_STRING ParentDir;
} FLT_FILE_NAME_INFORMATION, *PFLT_FILE_NAME_INFORMATION;
The main ingredients are the several UNICODE_STRING structures that should hold the various
components of a file name. Initially, only the Name field is initialized to the full file name (depending
on the flags used to query the file name information, “full” may be a partial name). If the request
specified the flag FLT_FILE_NAME_NORMALIZED, then Name points to the full path name, in device
form. Device form means that a file such as c:\mydir1\mydir2\myfile.txt is stored with the internal device
name to which “C:” maps, such as \Device\HarddiskVolume3\mydir1\mydir2\myfile.txt. This makes the driver’s
job a bit more complicated if it somehow depends on paths provided by user mode (more on that
later).
The driver should never modify this structure, because the filter manager sometimes
caches it for use with other drivers.
Since only the full name is provided by default (Name field), it’s often necessary to split the
full path to its constituents. Fortunately, the filter manager provides such a service with the
FltParseFileNameInformation API. This one takes the FLT_FILE_NAME_INFORMATION object and
fills in the other UNICODE_STRING fields in the structure.
Note that FltParseFileNameInformation does not allocate anything. It just sets each UNICODE_-
STRING’s Buffer and Length to point to the correct parts in the full Name field. This means there is
no “unparse” function and it’s not needed.
In scenarios where a simple C string is available for a full path, the simpler (and weaker)
function FltParseFileName can be used for getting easy access to the file extension,
stream and final component. It can also be used outside the scope of file system mini-
filters.
The share string is empty for local files (Length is zero). ParentDir is set to the directory only; in our
example that would be \mydir1\mydir2\ (note the trailing backslash). The extension is just that, the file
extension. In our example this is txt.
The FinalComponent field stores the file name and stream name (if not using the default stream). For
our example, it would be myfile.txt.
The Stream component bears some explanation. Some file systems (most notably NTFS) provide the
ability to have multiple data “streams” in a single file. Essentially, this means several files can be
stored in a single “physical” file. In NTFS, for instance, what we typically think of as a file’s data is
in fact one of its streams named “$DATA”, which is considered the default stream. But it’s possible to
create/open another stream that is stored in the same file, so to speak. Tools such as Windows Explorer
do not look for these streams, and the sizes of any alternate streams are not shown or returned by
standard APIs such as GetFileSize. A stream name is specified after the file name, separated from it
by a colon. For example, the file name “myfile.txt:mystream” points to an alternate stream named
“mystream” within the file “myfile.txt”. Alternate streams can be created with the command
interpreter, as shown next.
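For instance, echo can write directly to a named stream (a minimal example; the text written here is arbitrary):

C:\Temp>echo Hello, stream! > hello.txt:mystream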
C:\Temp>dir hello.txt
Volume in drive C is OS
Volume Serial Number is 1707-9837
Directory of C:\Temp
Notice the zero size of the file. Is the data really in there? Trying to use the type command fails:
C:\Temp>type hello.txt:mystream
The filename, directory name, or volume label syntax is incorrect.
The type command interpreter does not recognize stream names. We can use the SysInternals tool
Streams.exe to list the names and sizes of alternate streams in files; running it on our hello.txt file
shows the mystream alternate stream and its size, but not its content. To view (and optionally export
to another file) the stream’s data, we can use a tool called NtfsStreams, available in my Github AllTools
repository. Figure 12-7 shows NtfsStreams opening the hello.txt file from the previous example. We can
clearly see the stream’s size and data.
The “$DATA” shown is the stream type, where $DATA is the normal data stream (there are other
predefined stream types). Custom stream types are specifically used in reparse points (beyond the
scope of this book).
Of course, alternate streams can be created programmatically by passing the stream name at the end
of the filename, after a colon, to the CreateFile API. Here is a minimal sketch of such code (error
handling omitted; the file and stream names are arbitrary):
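HANDLE hFile = CreateFile(L"c:\\Temp\\hello.txt:mystream", GENERIC_WRITE, 0,
    nullptr, OPEN_ALWAYS, 0, nullptr);
char text[] = "Hello, from a stream!";
DWORD written;
WriteFile(hFile, text, sizeof(text) - 1, &written, nullptr);
CloseHandle(hFile);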
Streams can also be deleted normally with DeleteFile, and can be enumerated (this is what
streams.exe and ntfsstreams.exe do) with FindFirstStreamW and FindNextStreamW.
Since getting, parsing, and releasing file name information is a common chore, a small RAII wrapper
(used later in this chapter) is convenient. First, a strongly-typed enumeration wrapping the relevant
FLT_FILE_NAME_OPTIONS flags:

enum class FileNameOptions {
    Normalized = FLT_FILE_NAME_NORMALIZED,
    QueryDefault = FLT_FILE_NAME_QUERY_DEFAULT,
QueryCacheOnly = FLT_FILE_NAME_QUERY_CACHE_ONLY,
QueryFileSystemOnly = FLT_FILE_NAME_QUERY_FILESYSTEM_ONLY,
RequestFromCurrentProvider = FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER,
DoNotCache = FLT_FILE_NAME_DO_NOT_CACHE,
AllowQueryOnReparse = FLT_FILE_NAME_ALLOW_QUERY_ON_REPARSE
};
DEFINE_ENUM_FLAG_OPERATORS(FileNameOptions);
struct FilterFileNameInformation {
    FilterFileNameInformation(PFLT_CALLBACK_DATA data,
        FileNameOptions options =
            FileNameOptions::QueryDefault | FileNameOptions::Normalized);
    ~FilterFileNameInformation();

    operator bool() const {
        return _info != nullptr;
    }

    PFLT_FILE_NAME_INFORMATION operator->() {
        return _info;
    }

    NTSTATUS Parse();

private:
    PFLT_FILE_NAME_INFORMATION _info;
};
FilterFileNameInformation::FilterFileNameInformation(
PFLT_CALLBACK_DATA data, FileNameOptions options) {
auto status = FltGetFileNameInformation(data,
(FLT_FILE_NAME_OPTIONS)options, &_info);
if (!NT_SUCCESS(status))
_info = nullptr;
}
FilterFileNameInformation::~FilterFileNameInformation() {
if (_info)
FltReleaseFileNameInformation(_info);
}
NTSTATUS FilterFileNameInformation::Parse() {
return FltParseFileNameInformation(_info);
}
Using the wrapper, a callback can retrieve and parse the file name like so:

FilterFileNameInformation nameInfo(Data);
if(nameInfo) { // operator bool()
if(NT_SUCCESS(nameInfo.Parse())) {
KdPrint(("Final component: %wZ\n", &nameInfo->FinalComponent));
}
}
The FILE_DELETE_ON_CLOSE option can be set from user mode by passing FILE_FLAG_DELETE_ON_CLOSE to
CreateFile (in the second-to-last argument). The higher-level DeleteFile function uses the same flag
behind the scenes.
For our driver, we want to support both options to cover all our bases. The driver will protect files
with client-defined extensions against deletion. A client can request to set a list of extensions, which
means we also need a “standard” device object (as we created many times before), sometimes referred
to as a Control Device Object (CDO).
We’ll start by adding a Driver.h file to contain private driver data. This file looks like the following:
#pragma once
#include "ExecutiveResource.h"
struct FilterState {
PFLT_FILTER Filter;
UNICODE_STRING Extentions;
ExecutiveResource Lock;
PDRIVER_OBJECT DriverObject;
};
The Filter member will hold the mini-filter registration handle. Extensions will hold the list of
extensions we must protect from deletion - the format of that will be described later. Finally, any
changes to the extensions list require synchronization, so an Executive Resource is used (with a
RAII wrapper that we saw in chapter 6). Since most of the time the extension list is read (rather than
written), an Executive Resource is the best synchronization primitive to use.
Why do we need a driver object pointer stored in FilterState? This will become clear when we
implement the driver’s unload functionality.
Given the above declaration, we can create a global instance of the FilterState structure, initialize
it, and proceed to create the CDO and a symbolic link. Here is the complete DriverEntry (in the file
named Driver.cpp), with some KdPrint omitted for brevity:
FilterState g_State;
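//
// DriverEntry begins with a do/while(false) block: it calls InitMiniFilter,
// creates the control device object (devObj) and the symbolic link, breaking
// out on any failure
//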
break;
symLinkCreated = true;
status = FltStartFiltering(g_State.Filter);
if (!NT_SUCCESS(status))
break;
} while (false);
if (!NT_SUCCESS(status)) {
g_State.Lock.Delete();
if(g_State.Filter)
FltUnregisterFilter(g_State.Filter);
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (devObj)
IoDeleteDevice(devObj);
return status;
}
g_State.DriverObject = DriverObject;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = OnCreateClose;
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = OnDeviceControl;
return status;
}
The InitMiniFilter call is used to register the mini-filter. It’s implemented in the MiniFilter.cpp file,
to make the driver pieces more “manageable” - not everything is in the same file. If the mini-filter is
initialized successfully (and all other initializations succeed as well), the call to FltStartFiltering
starts the mini-filter action.
Let’s examine the initialization in InitMiniFilter. The first step is to initialize the “extensions” we
protect. For demonstration and testing purposes we’ll initialize it to a “PDF” extension. This is an
arbitrary choice, but it allows easy testing of the driver even before we implement the client-facing
functionality that allows changing the extensions being protected:
NTSTATUS
InitMiniFilter(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
WCHAR ext[] = L"PDF;";
g_State.Extentions.Buffer = (PWSTR)ExAllocatePool2(POOL_FLAG_PAGED,
sizeof(ext), DRIVER_TAG);
if (g_State.Extentions.Buffer == nullptr)
return STATUS_NO_MEMORY;
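//
// copy the initial extension list into the allocated buffer
// (the exact length values used here are one reasonable choice)
//
memcpy(g_State.Extentions.Buffer, ext, sizeof(ext));
g_State.Extentions.MaximumLength = (USHORT)sizeof(ext);
g_State.Extentions.Length = (USHORT)(sizeof(ext) - sizeof(WCHAR)); // exclude the NULL terminator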
The string is allocated dynamically for consistency: if a client modifies the extensions later, the driver
will free the existing string and then allocate a new one. To make it easier to work with multiple
protected extensions, I decided to keep a single string in memory with the extensions stored in
uppercase and separated by semicolons. For example, the string “PDF;DOCX;” indicates protecting
PDF and DOCX files from deletion.
The next piece of code writes the correct Registry entries, so that FltRegisterFilter has a chance
of success. The code is shown in the section “Installation”, earlier in this chapter, so I will not repeat
it here. After the Registry values are written the filter can be registered. We have to prepare an array
of callback structures based on what we need to support - namely IRP_MJ_CREATE (check for files
opened with the “delete-on-close” flag), and IRP_MJ_SET_INFORMATION (if a file is deleted explicitly):
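A sketch of that array, using the callback names implemented later in this section (forward declarations assumed):

const FLT_OPERATION_REGISTRATION callbacks[] = {
    { IRP_MJ_CREATE, 0, DelProtectPreCreate, nullptr },
    { IRP_MJ_SET_INFORMATION, 0, DelProtectPreSetInformation, nullptr },
    { IRP_MJ_OPERATION_END }
};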
We need pre-operations only, as our purpose is to prevent delete operations. Post-operations don’t
make sense, as the “deed is already done” at that point. Now the main registration structure and the
registration itself:
};
status = FltRegisterFilter(DriverObject, &reg, &g_State.Filter);
The DelProtectInstanceSetup callback is where the mini-filter decides (for each volume) to
attach or to skip. In this example, let’s decide to attach to NTFS volumes only:
NTSTATUS
DelProtectInstanceSetup(
PCFLT_RELATED_OBJECTS FltObjects, FLT_INSTANCE_SETUP_FLAGS Flags,
DEVICE_TYPE VolumeDeviceType, FLT_FILESYSTEM_TYPE VolumeFilesystemType) {
UNREFERENCED_PARAMETER(FltObjects);
UNREFERENCED_PARAMETER(Flags);
UNREFERENCED_PARAMETER(VolumeDeviceType);

    //
    // attach to NTFS volumes only
    //
    return VolumeFilesystemType == FLT_FSTYPE_NTFS ?
        STATUS_SUCCESS : STATUS_FLT_DO_NOT_ATTACH;
}
STATUS_FLT_DO_NOT_ATTACH indicates the filter does not wish to attach to this volume, while
STATUS_SUCCESS indicates that it does. Using the file system type is one way to make a decision,
while the VolumeDeviceType is another. Consult the docs for the details.
The mini-filter unload callback is where the mini-filter is unregistered. The driver should not add a
normal unload routine by setting the DriverUnload member of the DRIVER_OBJECT. The reason is
that the filter manager takes control of this callback. If you set it after FltRegisterFilter, some
cleanup won’t happen. If you set it before, it would simply be overridden by FltRegisterFilter. In
summary, this is where our cleanup is done:
NTSTATUS DelProtectUnload(FLT_FILTER_UNLOAD_FLAGS Flags) {  // name as set in FLT_REGISTRATION (assumed)
    UNREFERENCED_PARAMETER(Flags);

    FltUnregisterFilter(g_State.Filter);
g_State.Lock.Delete();
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\DelProtect");
IoDeleteSymbolicLink(&symLink);
IoDeleteDevice(g_State.DriverObject->DeviceObject);
return STATUS_SUCCESS;
}
The remaining instance-related callbacks simply return STATUS_SUCCESS, but can be customized if
desired.
Handling Pre-Create
The pre-create callback’s job is to look for files opened with the “delete-on-close” flag. The
function itself has the same prototype as all pre-operation callbacks. It starts by not blocking kernel
callers:
FLT_PREOP_CALLBACK_STATUS
DelProtectPreCreate(PFLT_CALLBACK_DATA Data,
PCFLT_RELATED_OBJECTS FltObjects, PVOID*) {
UNREFERENCED_PARAMETER(FltObjects);
if (Data->RequestorMode == KernelMode)
return FLT_PREOP_SUCCESS_NO_CALLBACK;
Allowing kernel callers to move forward regardless is not mandatory of course, but in most cases we
don’t want to prevent kernel code from doing work that may be necessary.
Next we need to check if the flag FILE_DELETE_ON_CLOSE exists in the creation request. The structure
to look at is the Create field under the Parameters inside Iopb, as follows:

const auto& params = Data->Iopb->Parameters.Create;
The above params variable references the Create structure defined like so:
struct {
PIO_SECURITY_CONTEXT SecurityContext;
//
// The low 24 bits contains CreateOptions flag values.
// The high 8 bits contains the CreateDisposition values.
//
ULONG Options;
Generally, for any I/O operation, the documentation must be consulted to understand what’s available
and how to use it. In our case, the Options field is a combination of flags documented under the
FltCreateFile function (which we’ll use later in this chapter in an unrelated context). The code
checks to see if this flag exists, and if so, it means a delete operation is being initiated.
If the file is opened for deletion, we need to examine the file name and check if its extension is one
that we protect. If true, we have to fail the request. Here is the code:
auto status = FLT_PREOP_SUCCESS_NO_CALLBACK;
if (params.Options & FILE_DELETE_ON_CLOSE) {
    //
    // a delete operation is about to start - grab the file name directly
    // from the file object (allowed in a pre-create callback only)
    //
    auto filename = &Data->Iopb->TargetFileObject->FileName;
    if (!IsDeleteAllowed(filename)) {
Data->IoStatus.Status = STATUS_ACCESS_DENIED;
status = FLT_PREOP_COMPLETE;
KdPrint(("(Pre Create) Prevent deletion of %wZ\n", filename));
}
}
return status;
}
The file name can be obtained by directly examining the file object - this is only allowed for a pre-
create operation callback, which is exactly the callback we’re in. In all other cases, FltGetFileNameInformation
is the way to go.
IsDeleteAllowed is a private driver function to extract the extension and compare it to the list of
extensions the driver maintains:
//
// search for the prefix
//
return wcsstr(g_State.Extentions.Buffer, uext) == nullptr;
}
return true;
}
The function starts by calling FltParseFileName to extract the extension. You may be thinking that
getting to the extension should be fairly easy by calling something like wcsrchr, looking for a dot.
However, if the file has a custom NTFS stream name, then finding the end of the extension would
require looking for a colon - not too complex, but why bother when there exists an API that does the
heavy lifting? Here is the prototype of FltParseFileName:
NTSTATUS FltParseFileName (
_In_ PCUNICODE_STRING FileName,
_Inout_opt_ PUNICODE_STRING Extension,
_Inout_opt_ PUNICODE_STRING Stream,
_Inout_opt_ PUNICODE_STRING FinalComponent);
The input is a UNICODE_STRING, with 3 outputs, all of them optional. This API does not allocate
anything - it simply points the UNICODE_STRING objects to the FileName. We just need the extension,
so the other arguments can be set to NULL.
The rest of the code does some juggling to convert the extension to uppercase (RtlUpcaseUnicodeString)
so that wcsstr can be used to search for the extension in the Extensions member we maintain inside
the FilterState structure. If the extension is not found (wcsstr returns NULL), the function returns
true to indicate file deletion is allowed.
Handling Pre-Set-Information
The pre-callback for IRP_MJ_SET_INFORMATION starts the same way, by letting kernel callers through:

FLT_PREOP_CALLBACK_STATUS DelProtectPreSetInformation(
    PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects, PVOID*) {
UNREFERENCED_PARAMETER(FltObjects);
if (Data->RequestorMode == KernelMode)
return FLT_PREOP_SUCCESS_NO_CALLBACK;
Since IRP_MJ_SET_INFORMATION is the way to do several types of operations, we need to check if this
is in fact a delete operation. The driver must first access the proper structure in the FLT_PARAMETERS
union, declared like so:
struct {
ULONG Length;
FILE_INFORMATION_CLASS POINTER_ALIGNMENT FileInformationClass;
PFILE_OBJECT ParentOfTarget;
union {
struct {
BOOLEAN ReplaceIfExists;
BOOLEAN AdvanceOnly;
};
ULONG ClusterCount;
HANDLE DeleteHandle;
};
PVOID InfoBuffer;
} SetFileInformation;
FileInformationClass indicates which type of operation this instance represents, so we need
to check whether this is a delete operation.
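A minimal sketch of that check (params here is shorthand for Data->Iopb->Parameters.SetFileInformation; FileDispositionInformationEx is the extended variant available on newer Windows versions):

auto& params = Data->Iopb->Parameters.SetFileInformation;
if (params.FileInformationClass != FileDispositionInformation &&
    params.FileInformationClass != FileDispositionInformationEx) {
    //
    // not a delete operation - don't care
    //
    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}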
The next step is to check the file extension that is about to be deleted. Since this is not a pre-
create callback, we must use FltGetFileNameInformation to get the file name, and then call
IsDeleteAllowed as before:
PFLT_FILE_NAME_INFORMATION fi;
//
// using FLT_FILE_NAME_NORMALIZED is important here for parsing purposes
//
if (NT_SUCCESS(FltGetFileNameInformation(
Data, FLT_FILE_NAME_QUERY_DEFAULT | FLT_FILE_NAME_NORMALIZED, &fi))) {
if (!IsDeleteAllowed(&fi->Name)) {
Data->IoStatus.Status = STATUS_ACCESS_DENIED;
KdPrint(("(Pre Set Information) Prevent deletion of %wZ\n",
&fi->Name));
status = FLT_PREOP_COMPLETE;
}
FltReleaseFileNameInformation(fi);
}
Now we can test the complete driver - we’ll find that files of the selected extensions cannot be deleted.
Here is an example command sequence once the driver is installed and PDF files are supposed to be
protected:
c:\temp\>dir
10/19/2022 01:13 PM <DIR> .
05/28/2022 01:09 PM <DIR> Test
10/19/2022 10:41 AM 5 hello1.pdf
10/19/2022 10:41 AM 5 hello2.txt
10/19/2022 10:41 AM 5 hello3.txt
C:\Temp>del hello2.txt
C:\Temp>del hello1.pdf
Access is denied.
DelProtect Configuration
Now that we have the basic driver working, we can add support for custom extensions. The driver
can define a control code to be shared with user mode clients, defined in DelProtectPublic.h:
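A sketch of such a definition (the device type and function number are arbitrary choices, but the driver and its client must agree on them):

#define IOCTL_DELPROTECT_SET_EXTENSIONS \
    CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)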
The driver’s IRP_MJ_DEVICE_CONTROL handler doesn’t have anything we haven’t seen before. Here is its
code:
NTSTATUS OnDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
    auto irpSp = IoGetCurrentIrpStackLocation(Irp);
    auto& dic = irpSp->Parameters.DeviceIoControl;
    auto status = STATUS_INVALID_DEVICE_REQUEST;
    ULONG_PTR len = 0;

    switch (dic.IoControlCode) {
case IOCTL_DELPROTECT_SET_EXTENSIONS:
auto ext = (WCHAR*)Irp->AssociatedIrp.SystemBuffer;
auto inputLen = dic.InputBufferLength;
if (ext == nullptr ||
inputLen < sizeof(WCHAR) * 2 ||
ext[inputLen / sizeof(WCHAR) - 1] != 0) {
status = STATUS_INVALID_PARAMETER;
break;
}
if (g_State.Extentions.MaximumLength <
inputLen - sizeof(WCHAR)) {
//
// allocate a new buffer to hold the extensions
//
auto buffer = ExAllocatePool2(POOL_FLAG_PAGED,
inputLen, DRIVER_TAG);
if (buffer == nullptr) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
g_State.Extentions.MaximumLength = (USHORT)inputLen;
//
// free the old buffer
//
ExFreePool(g_State.Extentions.Buffer);
g_State.Extentions.Buffer = (PWSTR)buffer;
}
UNICODE_STRING ustr;
RtlInitUnicodeString(&ustr, ext);
//
// make sure the extensions are uppercase
//
RtlUpcaseUnicodeString(&ustr, &ustr, FALSE);
memcpy(g_State.Extentions.Buffer, ext, len = inputLen);
g_State.Extentions.Length = (USHORT)inputLen;
status = STATUS_SUCCESS;
break;
}
return CompleteRequest(Irp, status, len);
}
Internally, there are only two ways to delete a file - IRP_MJ_CREATE with the FILE_DELETE_ON_CLOSE
flag and IRP_MJ_SET_INFORMATION with FileDispositionInformation. Clearly, in the list of methods
used by the DelTest client (described next), method (2) corresponds to the first option and method (3)
corresponds to the second option. The only mystery left is DeleteFile - how does it delete a file?
From the driver’s perspective it does not matter at all, since it must map to one of the two options the
driver handles.
We’ll create a console application project named DelTest, for which the usage text should be something
like this:
c:\book>deltest
Usage: deltest.exe <method> <filename>
Method: 1=DeleteFile, 2=delete on close, 3=SetFileInformation.
Let’s examine the user mode code for each of these methods (assuming filename is a variable pointing
to the file name provided in the command line).
Using DeleteFile is trivial:
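// method 1: DeleteFile marks the file for deletion
DeleteFile(filename);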
Opening the file with the delete-on-close flag can be achieved with the following:
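// method 2: open with the delete-on-close flag;
// the file is deleted when the last handle to it is closed
HANDLE hFile = CreateFile(filename, DELETE, 0, nullptr,
    OPEN_EXISTING, FILE_FLAG_DELETE_ON_CLOSE, nullptr);
CloseHandle(hFile);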
When the handle is closed, the file should be deleted (if the driver does not prevent it!)
Lastly, using SetFileInformationByHandle:
FILE_DISPOSITION_INFO info;
info.DeleteFile = TRUE;
HANDLE hFile = CreateFile(filename, DELETE, 0, nullptr,
OPEN_EXISTING, 0, nullptr);
BOOL success = SetFileInformationByHandle(hFile, FileDispositionInfo,
&info, sizeof(info));
CloseHandle(hFile);
Managing Directories
For the purpose of this driver, we’ll hold on to a list of directories which should be hidden. This list
can be implemented in several ways, such as the linked lists we have used in previous drivers. To make
it more interesting, we’ll use a dynamic array of string objects, both of which are part of the Kernel
Template Library (KTL), described in Appendix A, and available as part of the book’s downloads. The
idea is to build a reusable library, containing many of the expected types and functions as are available
in the user-mode standard C++ library. The KTL is not nearly as broad as the C++ STL, and it’s not
supposed to be. What it should be, is convenient reusable code for use in driver projects.
To start off, we’ll create an Empty WDM Driver project, as before, named KHide. The driver’s state is
going to be stored in the following structure declared in MiniFilter.h:
#include <ktl.h>
struct FilterState {
FilterState();
~FilterState();
PFLT_FILTER Filter;
Vector<WString<PoolType::NonPaged>, PoolType::NonPaged> Files;
ExecutiveResource Lock;
PDRIVER_OBJECT DriverObject;
};
The ktl.h header contains all the #includes of the other headers that are also part of the KTL. The FilterState
structure has a default constructor and a destructor, which means we cannot create a global variable
of that type and expect the constructor to be called (it won’t). Instead, we’ll use dynamic allocation
to create an instance, which will force calling the constructor. The KTL has overloads for the new and
delete operators.
The members include an Executive Resource (a RAII wrapper over the corresponding kernel object),
a mini-filter handle, and a Vector of WStrings. A WString is a null-terminated, Unicode string,
automatically managed, with a convenient API. The Vector class is a templated type for holding
a dynamic array of any type, used with a WString here. Both types require specifying the pool type to use
internally, provided with the PoolType enumeration, which wraps the POOL_FLAGS values normally
used with ExAllocatePool2:
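The exact definition is in the KTL headers (Appendix A); a plausible shape, assuming it simply mirrors the pool flag values, is:

enum class PoolType : ULONG64 {
    Paged = POOL_FLAG_PAGED,
    NonPaged = POOL_FLAG_NON_PAGED,
};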
The constructor of FilterState should initialize the Executive Resource, while the destructor should
delete it:
FilterState::FilterState() {
Lock.Init();
Filter = nullptr;
}
FilterState::~FilterState() {
Lock.Delete();
}
The Vector will initialize itself in its default constructor (to an empty vector).
The DriverEntry function should be mostly familiar, using the same kind of code as the DelProtect
driver for initializing the file system mini-filter, and creating a CDO to allow managing directories to
hide. Here is the complete implementation (with some KdPrint calls removed):
status = FltStartFiltering(g_State->Filter);
if (!NT_SUCCESS(status))
break;
} while (false);
if (!NT_SUCCESS(status)) {
if (g_State->Filter)
FltUnregisterFilter(g_State->Filter);
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (devObj)
IoDeleteDevice(devObj);
if (g_State)
delete g_State;
return status;
}
g_State->DriverObject = DriverObject;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = OnCreateClose;
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = OnDeviceControl;
//
// for testing purposes
//
#if DBG
g_State->Files.Add(L"c:\\Temp");
#endif
return status;
}
The last line before the return statement adds an example directory (c:\temp) to make it easier to test
the driver without the need to add a client, implement IRP_MJ_DEVICE_CONTROL, etc.
Initializing and registering the driver as a mini-filter is very similar to the DelProtect driver. The
operation we’re concerned with is IRP_MJ_DIRECTORY_CONTROL, which is called when directory
information is required by a client. Here is the registration code (in MiniFilter.cpp):
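A sketch of that registration array (the pre-callback is the one implemented next):

const FLT_OPERATION_REGISTRATION callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, OnPreDirectoryControl, nullptr },
    { IRP_MJ_OPERATION_END }
};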
This driver only requires a single operation to intercept, and a single pre-callback for the first phase
of the implementation.
FLT_PREOP_CALLBACK_STATUS
OnPreDirectoryControl(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects, PVOID*) {
if (Data->RequestorMode == KernelMode ||
Data->Iopb->MinorFunction != IRP_MN_QUERY_DIRECTORY)
return FLT_PREOP_SUCCESS_NO_CALLBACK;
We expect the client to provide directory names from the usual user-mode vantage point using drive
letters (often referred to as DOS paths), such as c:\temp. The kernel, however, provides names which
are in device form (e.g. \Device\HarddiskVolume4\Temp). We can convert the user-provided paths to
device form before storing them in the vector, or convert the device form path received from the filter
manager to a DOS path. We’ll take the latter approach in this driver (for versatility).
The term “DOS path” is a historic one, because of the “drive-colon” format used originally in DOS
(Disk Operating System).
There are a few ways to convert the device path to a DOS path. Probably the simplest
option is the API IoQueryFileDosDeviceName:
NTSTATUS IoQueryFileDosDeviceName(
_In_ PFILE_OBJECT FileObject,
_Out_ POBJECT_NAME_INFORMATION *ObjectNameInformation);
It requires a FILE_OBJECT and returns a POBJECT_NAME_INFORMATION, filling it with the name. The
latter structure is just a glorified UNICODE_STRING:
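// declared in the WDK headers
typedef struct _OBJECT_NAME_INFORMATION {
    UNICODE_STRING Name;
} OBJECT_NAME_INFORMATION, *POBJECT_NAME_INFORMATION;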
POBJECT_NAME_INFORMATION nameInfo;
if (!NT_SUCCESS(IoQueryFileDosDeviceName(FltObjects->FileObject, &nameInfo)))
return FLT_PREOP_SUCCESS_NO_CALLBACK;
Now we can acquire the Executive Resource in shared mode (just reading data), and compare the
directory to any one in the list. If found, we can fail the request:
UNICODE_STRING path;
auto status = FLT_PREOP_SUCCESS_WITH_CALLBACK;
{
SharedLocker locker(g_State->Lock);
for (auto& name : g_State->Files) {
name.GetUnicodeString(path);
if (RtlEqualUnicodeString(&path, &nameInfo->Name, TRUE)) {
//
// found directory. fail request
//
Data->IoStatus.Status = STATUS_NOT_FOUND;
Data->IoStatus.Information = 0;
status = FLT_PREOP_COMPLETE;
break;
}
}
}
ExFreePool(nameInfo);
return status;
}
The SharedLocker class is a RAII wrapper around acquiring/releasing a shared lock of an Executive
Resource. The Vector class is used here with the range-based for feature of C++ 11 (and later).
This works because Vector implements begin and end methods (see Appendix A for more
information). A UNICODE_STRING is initialized to prepare calling RtlEqualUnicodeString which
allows comparing two UNICODE_STRING objects for equality, optionally without case sensitivity (TRUE
in the last argument), which is what we want. If a match is found, we set the final status of the IRP to
STATUS_NOT_FOUND (technically any failure status would work), and change the final return value
from the function to FLT_PREOP_COMPLETE, preventing any further propagation to lower-layered
filters.
The driver is installed in the normal way:
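For example, assuming the driver binary was copied to c:\Test on the target machine:

c:\>sc create khide type= filesys binPath= c:\Test\KHide.sys
c:\>sc start khide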
Now trying to navigate to a hidden directory (e.g. c:\temp) works, but the directory is always reported
empty:
C:\Temp>dir
Volume in drive C has no label.
Volume Serial Number is E041-5DB0
Directory of C:\Temp
Intercepting the query in the pre-callback and synthesizing the results ourselves is possible, but
difficult. A better option is to let the file system driver “do its thing”, and then tweak the
returned result before letting it bubble up to the client.
We’ll use the second approach. To that end, we need to respond to IRP_MJ_DIRECTORY_CONTROL after
it has been processed by the I/O stack. This means we need a post callback. The driver’s mini-filter
callback registration structure changes to the following:
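A sketch of the updated array, now with a post callback as well:

const FLT_OPERATION_REGISTRATION callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, OnPreDirectoryControl, OnPostDirectoryControl },
    { IRP_MJ_OPERATION_END }
};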
The post-callback does the heavy lifting. The idea is to look for a parent directory that contains the
directory we wish to hide, and if this is the case - remove our directory name from the list somehow
before it returns to the caller.
Let’s start, as before, by letting kernel callers have their way without interference:
FLT_POSTOP_CALLBACK_STATUS
OnPostDirectoryControl(PFLT_CALLBACK_DATA Data,
PCFLT_RELATED_OBJECTS FltObjects,
PVOID, FLT_POST_OPERATION_FLAGS flags) {
UNREFERENCED_PARAMETER(FltObjects);
if (Data->RequestorMode == KernelMode ||
Data->Iopb->MinorFunction != IRP_MN_QUERY_DIRECTORY ||
(flags & FLTFL_POST_OPERATION_DRAINING))
return FLT_POSTOP_FINISHED_PROCESSING;
If the caller is from kernel mode, or the request is not “query directory” (IRP_MN_QUERY_DIRECTORY),
we let the request continue normally. The last check is an optimization that looks at the Flags
argument, where the value FLTFL_POST_OPERATION_DRAINING indicates the mini-filter instance is
being detached, so no point in doing anything.
The information we get with IRP_MJ_DIRECTORY_CONTROL and IRP_MN_QUERY_DIRECTORY in the
FLT_PARAMETERS union looks like the following:
struct {
ULONG Length;
PUNICODE_STRING FileName;
FILE_INFORMATION_CLASS FileInformationClass;
ULONG POINTER_ALIGNMENT FileIndex;
PVOID DirectoryBuffer;
PMDL MdlAddress;
} QueryDirectory;
Table 12-4: Query directory file information class values and data
All the above data structures are similar in spirit, but not identical. Let’s take one example:
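The FILE_DIRECTORY_INFORMATION structure, declared in <ntifs.h>, is representative:

typedef struct _FILE_DIRECTORY_INFORMATION {
    ULONG NextEntryOffset;
    ULONG FileIndex;
    LARGE_INTEGER CreationTime;
    LARGE_INTEGER LastAccessTime;
    LARGE_INTEGER LastWriteTime;
    LARGE_INTEGER ChangeTime;
    LARGE_INTEGER EndOfFile;
    LARGE_INTEGER AllocationSize;
    ULONG FileAttributes;
    ULONG FileNameLength;
    WCHAR FileName[1];
} FILE_DIRECTORY_INFORMATION, *PFILE_DIRECTORY_INFORMATION;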
All the structures listed in table 12-4 start with a NextEntryOffset member that points to the next
same-kind structure. Its value must be added to the current pointer to this structure. The last instance
has the NextEntryOffset set to zero, indicating there are no more instances. This idea is depicted in
figure 12-8.
The interesting part of the specific structure is the FileName member. This has the file or directory
name for which some information is required or provided. This is not a full path - rather, it’s just
the final name relative to the immediate parent directory. For example, if a query directory is sent
to a directory named c:\Dir1\Dir2, the FileName members would hold names like file1.txt, mydir
(directory), and so on.
All the details above mean that in order to hide a directory from a listing, we first need to check
if the parent directory being queried is a parent of any of the directories we are supposed to hide.
Then we need to traverse the structure layout as described, looking for the directory name (its final
component). if we find it, we can hide the directory by pointing the previous NextEntryOffset to
the next one, skipping this one structure we want to “hide”. This is depicted in figure 12-9.
The example above is using FILE_DIRECTORY_INFORMATION, but we have to contend with the other
7 possible structures. The problem is that the FileName member is not located at the same offset in
these structures! How can we deal with that in a sensible way?
Fortunately, in recent WDK versions, the <ntifs.h> header (where these structures are defined, and is
included by FltKernel.h) provides several convenience macros that provide the offsets to key (common)
members in these structures, namely NextEntryOffset (which is always zero in current versions),
FileName, and FileNameLength. These macros initialize a structure named FILE_INFORMATION_-
DEFINITION to hold these offsets along with the corresponding FileInformationClass:
// from ntifs.h
#define FileDirectoryInformationDefinition { \
FileDirectoryInformation, \
FIELD_OFFSET(FILE_DIRECTORY_INFORMATION, NextEntryOffset), \
FIELD_OFFSET(FILE_DIRECTORY_INFORMATION, FileName), \
FIELD_OFFSET(FILE_DIRECTORY_INFORMATION, FileNameLength) \
}
Astute readers may notice a bug here. I didn’t at first, as I assumed the definitions from WDK headers
are correct. Can you spot the error?
I reported the bug, but I’m not sure if and when it will be fixed. It may very well be the case that the
header you’re using is already fixed. Please be aware that the next code snippets assume the error
exists, and swap the usage of FileNameLengthOffset and FileNameOffset.
Back to the QueryDirectory structure. The Length member is the total length of the data pointed
to by DirectoryBuffer. It’s not usually needed, but can serve as a sanity check. The MdlAddress
member provides an optional MDL that points to where DirectoryBuffer does. The docs indicate
that the MDL should be used if provided (by calling MmGetSystemAddressForMdlSafe). The
DirectoryBuffer address, by the way, points to user-mode memory when the query request is
coming from user mode (such as from Explorer.exe).
Now that we have all the pieces for the plan, we can go ahead and implement the rest of the post-
IRP_MJ_DIRECTORY_CONTROL callback.
We’ll continue by setting up an array of the expected structures and information classes, using the
provided macros such as FileDirectoryInformationDefinition:
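A sketch of such an array; FileDirectoryInformationDefinition is shown above, and ntifs.h provides analogous macros for the other classes (the exact set of macro names below is an assumption - check the header in your WDK):

const FILE_INFORMATION_DEFINITION defs[] = {
    FileDirectoryInformationDefinition,
    FileFullDirectoryInformationDefinition,
    FileBothDirectoryInformationDefinition,
    FileIdFullDirectoryInformationDefinition,
    FileIdBothDirectoryInformationDefinition,
    FileNamesInformationDefinition,
};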
Each item in the array is a FILE_INFORMATION_DEFINITION instance holding the correct offsets to
locate NextEntryOffset, FileName, and FileNameLength in each corresponding structure.
Now we need to search and locate the actual information class handed to us:
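A sketch of that search (params here refers to Data->Iopb->Parameters.DirectoryControl.QueryDirectory, and the information-class member of FILE_INFORMATION_DEFINITION is assumed to be named Class):

auto& params = Data->Iopb->Parameters.DirectoryControl.QueryDirectory;
const FILE_INFORMATION_DEFINITION* actual = nullptr;
for (auto const& def : defs) {
    if (def.Class == params.FileInformationClass) {
        actual = &def;
        break;
    }
}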
if (actual == nullptr) {
KdPrint((DRIVER_PREFIX "Uninteresting info class (%u)\n",
params.FileInformationClass));
return FLT_POSTOP_FINISHED_PROCESSING;
}
The loop above might seem weird, but C++ 11 and later allow using range-based for for iterating
through fixed-sized arrays, as is the case here with defs. If that feels awkward, feel free to change
to a classic for loop with an index.
The actual pointer now points to the correct FILE_INFORMATION_DEFINITION that we need to use.
Next, we need to grab the DOS path of the queried directory, and start comparing it to our list of
directory parents:
SharedLocker locker(g_State->Lock);
for (auto& name : g_State->Files) {
//
// look for a backslash so we can remove the final component
//
auto bs = wcsrchr(name, L'\\');
if (bs == nullptr)
continue;
UNICODE_STRING copy;
copy.Buffer = name.Data(); // C-pointer to the characters
copy.Length = USHORT(bs - name + 1) * sizeof(WCHAR);
//
// copy now points to the parent directory
// by making its Length shorter
//
if (copy.Length == sizeof(WCHAR) * 2) // Drive+colon only (e.g. C:)
copy.Length += sizeof(WCHAR); // add the backslash
To clarify the above code, suppose the DOS directory is c:\Dir1\Dir2. This means some client is asking
about the contents in this directory. If one of the directories to hide is c:\Dir1\Dir2\Dir3 (stored in one
of the strings in our vector), we have to compare with its parent, which in this case should succeed.
The parent matches the directory queried, which means we have to iterate through the results, locate
the final component in the list (Dir3 in the above example), and “hide” the directory by changing the
NextEntryOffset as described earlier. Here goes:
ULONG nextOffset = 0;
PUCHAR prev = nullptr;
auto str = bs + 1; // the final component beyond the backslash
do {
//
// due to a current bug in the definition of FILE_INFORMATION_DEFINITION
// the file name and length offsets are switched in the definitions
// of the macros that initialize FILE_INFORMATION_DEFINITION
//
auto filename = (PCWSTR)(base + actual->FileNameLengthOffset);
auto filenameLen = *(PULONG)(base + actual->FileNameOffset);
//
// notify the Filter Manager
//
FltSetCallbackDataDirty(Data);
}
else {
//
// Hide the directory!
//
*(PULONG)(prev + actual->NextEntryOffset) += nextOffset;
}
break;
}
prev = base;
base += nextOffset;
} while (nextOffset != 0);
break;
• We have to keep track of the previous pointer, so that we can manipulate it from the current
node we’re traversing. This is the role of the prev local variable.
• prev is defined as PUCHAR (pointer to unsigned character - a byte) to make sure adding any
offset is interpreted as bytes. Remember, adding a number to a pointer advances the pointer
by the number times the size of the item being pointed to. Same reasoning applies to the base
variable.
• If the directory we need to hide happens to be the first, we need to change the DirectoryBuffer
member itself (move it to the second item), and that requires notifying the filter manager by
calling FltSetCallbackDataDirty. It can’t really happen in this example, as the first item
returned is always the “.” (dot) directory, referring to the current directory, but it’s good to
know about this practice that may be needed in other cases.
All that’s left to do is free the DOS path and return FLT_POSTOP_FINISHED_PROCESSING from the
callback.
The full code of the callback is presented here for convenience (with some of the earlier comments
removed):
FLT_POSTOP_CALLBACK_STATUS
OnPostDirectoryControl(PFLT_CALLBACK_DATA Data,
PCFLT_RELATED_OBJECTS FltObjects, PVOID,
FLT_POST_OPERATION_FLAGS flags) {
UNREFERENCED_PARAMETER(FltObjects);
if (Data->RequestorMode == KernelMode ||
Data->Iopb->MinorFunction != IRP_MN_QUERY_DIRECTORY ||
(flags & FLTFL_POST_OPERATION_DRAINING))
return FLT_POSTOP_FINISHED_PROCESSING;
if (actual == nullptr) {
return FLT_POSTOP_FINISHED_PROCESSING;
}
auto& params = Data->Iopb->Parameters.DirectoryControl.QueryDirectory;
auto base = params.MdlAddress ?
    (PUCHAR)MmGetSystemAddressForMdlSafe(params.MdlAddress, NormalPagePriority) :
    nullptr;
if (!base)
    base = (PUCHAR)params.DirectoryBuffer;
if (base == nullptr) {
return FLT_POSTOP_FINISHED_PROCESSING;
}
SharedLocker locker(g_State->Lock);
for (auto& name : g_State->Files) {
//
// look for a backslash so we can remove the final component
//
auto bs = wcsrchr(name, L'\\');
if (bs == nullptr)
continue;
UNICODE_STRING copy;
copy.Buffer = name.Data();
copy.Length = USHORT(bs - name + 1) * sizeof(WCHAR);
//
// copy now points to the parent directory
// by making its Length shorter
//
if (copy.Length == sizeof(WCHAR) * 2) // Drive+colon only
copy.Length += sizeof(WCHAR); // add the backslash
do {
//
// due to a current bug in the definition of FILE_INFORMATION_DEFINITION
// the file name and length offsets are switched in the definitions
// of the macros that initialize FILE_INFORMATION_DEFINITION
//
auto filename = (PCWSTR)(base +
actual->FileNameLengthOffset);
auto filenameLen = *(PULONG)(base +
actual->FileNameOffset);
Here is a directory listing when c:\temp is supposed to be hidden (before and after):
C:\>dir
Volume in drive C has no label.
Volume Serial Number is E041-5DB0
Directory of C:\
C:\>dir
Volume in drive C has no label.
Volume Serial Number is E041-5DB0
Directory of C:\
You can still navigate to the Temp directory with cd temp, but any dir inside would be empty. If you
want to prevent that, you can handle the pre-callback for IRP_MJ_CREATE and fail access to any of
the managed directories. I’ll leave that as an exercise for the reader.
Contexts
In some scenarios it is desirable to attach some data to file system entities such as volumes and files.
The filter manager provides this capability through contexts. A context is a data structure provided
by the mini-filter driver that can be set and retrieved for any file system object. These contexts are
connected to the objects they are set on, for as long as these objects are alive.
To use contexts, the driver must declare beforehand what contexts it may require and for what type of
objects. This is done as part of the registration structure FLT_REGISTRATION: its ContextRegistration
field points to an array of FLT_CONTEXT_REGISTRATION structures, each describing one type of context
the driver may use. The important fields of FLT_CONTEXT_REGISTRATION are these:
• ContextType identifies the object type this context would be attached to. The FLT_CONTEXT_-
TYPE is typedefed as USHORT and can have one of the following values:
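// from FltKernel.h
#define FLT_VOLUME_CONTEXT        0x0001
#define FLT_INSTANCE_CONTEXT      0x0002
#define FLT_FILE_CONTEXT          0x0004
#define FLT_STREAM_CONTEXT        0x0008
#define FLT_STREAMHANDLE_CONTEXT  0x0010
#define FLT_TRANSACTION_CONTEXT   0x0020
#define FLT_SECTION_CONTEXT       0x0040    // Windows 8 and later
#define FLT_CONTEXT_END           0xffff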
As can be seen from the above definitions, a context can be attached to a volume, filter instance, file,
stream, stream handle, transaction and section (on Windows 8 and later). The last value is a sentinel
for indicating this is the end of the list of context definitions. The aside “Context Types” contains
more information on the various context types.
Context Types
The filter manager supports several types of contexts:
• Volume contexts are attached to volumes, such as a disk partition (C:, D:, etc.).
• Instance contexts are attached to filter instances. A mini-filter can have several instances
running, each attached to a different volume.
• File contexts can be attached to files in general (and not a specific file stream).
• Stream contexts can be attached to file streams, supported by some file systems, such as
NTFS. File systems that support a single stream per file (such as FAT) treat stream contexts
as file contexts.
• Stream handle contexts can be attached to a stream on a per-FILE_OBJECT basis.
• Transaction contexts can be attached to a transaction that is in progress. Specifically, the
NTFS file system supports transactions, and so a context can be attached to a running
transaction.
• Section contexts can be attached to section (file mapping) objects created with the function
FltCreateSectionForDataScan (beyond the scope of this chapter).
Not all types of contexts are supported on all file systems. The filter manager provides APIs to
query this dynamically if desired (for some context types), such as FltSupportsFileContexts,
FltSupportsFileContextsEx and FltSupportsStreamContexts.
Context size can be fixed or variable. If fixed size is desired, it’s specified in the Size field of
FLT_CONTEXT_REGISTRATION. For a variable sized context, a driver specifies the special value
FLT_VARIABLE_SIZED_CONTEXTS (-1). Using fixed-size contexts is more efficient, because the filter
manager can use lookaside lists for managing allocations and deallocations.
The pool tag is specified with the PoolTag field of FLT_CONTEXT_REGISTRATION. This is the tag the
filter manager will use when actually allocating the context. The next two fields are optional callbacks
where the driver provides the allocation and deallocation functions. If these are non-NULL, then the
PoolTag and Size fields are meaningless and not used.
Here is an example of building an array of context registration structures:
struct MyContext {
//...
};
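A matching registration array could look like this (the pool tag here is an arbitrary choice):

const FLT_CONTEXT_REGISTRATION ContextReg[] = {
    { FLT_FILE_CONTEXT, 0, nullptr, sizeof(MyContext), 'xtcM' },
    { FLT_CONTEXT_END }
};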
Managing Contexts
To actually use a context, a driver first needs to allocate it by calling FltAllocateContext, defined
like so:
NTSTATUS FltAllocateContext (
_In_ PFLT_FILTER Filter,
_In_ FLT_CONTEXT_TYPE ContextType,
_In_ SIZE_T ContextSize,
_In_ POOL_TYPE PoolType,
_Outptr_ PFLT_CONTEXT *ReturnedContext);
The Filter parameter is the filter’s opaque pointer returned by FltRegisterFilter but also available
in the FLT_RELATED_OBJECTS structure provided to all callbacks. ContextType is one of the supported
context macros shown earlier, such as FLT_FILE_CONTEXT. ContextSize is the requested context size
in bytes (must be greater than zero). PoolType can be PagedPool or NonPagedPool, depending on
what IRQL the driver is planning to access the context (for volume contexts, NonPagedPool must be
specified). Finally, the ReturnedContext parameter stores the returned allocated context; PFLT_CONTEXT is
typedefed as PVOID.
Once the context has been allocated, the driver can store in that data buffer anything it wishes. Then
it must attach the context to an object (this is the reason to create the context in the first place) using
one of several functions named FltSetXxxContext where “Xxx” is one of File, Instance, Volume,
Stream, StreamHandle, or Transaction. The only exception is a section context which is set with
FltCreateSectionForDataScan. Each of the FltSetXxxContext functions has the same generic
makeup, shown here for the File case:
NTSTATUS FltSetFileContext (
_In_ PFLT_INSTANCE Instance,
_In_ PFILE_OBJECT FileObject,
_In_ FLT_SET_CONTEXT_OPERATION Operation,
_In_ PFLT_CONTEXT NewContext,
_Outptr_ PFLT_CONTEXT *OldContext);
The function accepts the required parameters for the context at hand. In this file case it’s the instance
(actually needed in any set context function) and the file object representing the file that should
carry this context. The Operation parameter can be either FLT_SET_CONTEXT_REPLACE_IF_EXISTS
or FLT_SET_CONTEXT_KEEP_IF_EXISTS, which are pretty self-explanatory.
NewContext is the context to set, and OldContext is an optional parameter that can be used to retrieve
the previous context with the operation set to FLT_SET_CONTEXT_REPLACE_IF_EXISTS.
Contexts are reference counted. Allocating a context (FltAllocateContext) and setting a context
increment its reference count. The opposite function is FltReleaseContext that must be called a
matching number of times to make sure the context is not leaked. Although there is a context delete
function (FltDeleteContext), it’s usually not needed as the filter manager will tear down the context
once the file system object holding it is destroyed.
You must pay careful attention to context management, otherwise you may find that the
driver cannot be unloaded because a positive reference counted context is still alive, and
the file system object it’s attached to has not yet been deleted (such as a file or volume).
Clearly, this suggests a RAII context handling class could be useful.
The typical scenario would be to allocate a context, fill it, set it on the relevant object and then call
FltReleaseContext once, keeping a reference count of one for the context. We will see a practical
use of contexts in the “File Backup Driver” section later in this chapter.
Once a context has been set on an object, other callbacks may wish to get a hold of that context. A set
of “get” functions provide access to the relevant context, all named in the form FltGetXxxContext,
where “Xxx” is one of File, Instance, Volume, Stream, StreamHandle, Transaction or Section.
The “get” functions increment the context’s reference count and so calling FltReleaseContext is
necessary once working with the context is completed.
Sometimes a mini-filter needs to initiate its own I/O, such as opening a file and reading or writing
it. The filter manager provides FltCreateFile for this purpose:

NTSTATUS FltCreateFile (
_In_ PFLT_FILTER Filter,
_In_opt_ PFLT_INSTANCE Instance,
_Out_ PHANDLE FileHandle,
_In_ ACCESS_MASK DesiredAccess,
_In_ POBJECT_ATTRIBUTES ObjectAttributes,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
_In_opt_ PLARGE_INTEGER AllocationSize,
_In_ ULONG FileAttributes,
_In_ ULONG ShareAccess,
_In_ ULONG CreateDisposition,
_In_ ULONG CreateOptions,
_In_reads_bytes_opt_(EaLength) PVOID EaBuffer,
_In_ ULONG EaLength,
_In_ ULONG Flags);
Wow, that’s quite a mouthful - this function has many, many options. Fortunately, they are not difficult
to understand, but they must be set just right, or the call will fail with some weird status.
As can be seen from the declaration, the first argument is the filter opaque address, used as the base
layer for I/O operations through the resulting file handle. The main return value is the FileHandle
to the open file if successful. We won’t go over all the various parameters (refer to the WDK
documentation), but we will use this function in the next section.
The extended function FltCreateFileEx has an additional output parameter which is the FILE_-
OBJECT pointer created by the function. FltCreateFileEx2 has an additional input parameter of
type IO_DRIVER_CREATE_CONTEXT used to specify additional information to the file system (refer
to the WDK documentation for more information).
With the returned handle, the driver can call the standard I/O APIs such as ZwReadFile, ZwWriteFile,
etc. The operation will still target lower layers only. Alternatively, the driver can use the returned
FILE_OBJECT from FltCreateFileEx or FltCreateFileEx2 with functions such as FltReadFile
and FltWriteFile (the latter functions require the file object rather than a handle). The Flt functions
are preferable, not just for consistency, but also because they are slightly faster, as they receive the
file object directly, rather than having to look up the file object from a handle.
Once the operation is done, FltClose must be called on the returned handle. If a file object was
returned as well, its reference count must be decremented with ObDereferenceObject to prevent a
leak.
The File Backup Driver
The next example driver backs up a file’s data (to an alternate NTFS stream) before the file is modified.
The driver’s initialization is going to be simpler this time, as we will not implement any CDO, and just
use hard-coded rules to decide which files should be backed up. Adding the required flexibility is saved
as an exercise for the reader.
Here is DriverEntry:
PFLT_FILTER g_Filter;
extern "C"
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)\
{
auto status = InitMiniFilter(DriverObject, RegistryPath);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "Failed to init mini-filter (0x%X)\n", status));
return status;
}
status = FltStartFiltering(g_Filter);
if (!NT_SUCCESS(status)) {
FltUnregisterFilter(g_Filter);
}
return status;
}
We just register the filter by calling InitMiniFilter (to be described momentarily), and call
FltStartFiltering to get things going.
Registering the filter is mostly similar to earlier drivers, except that we’ll need some context to be kept
for files that we are going to back up. This means registration needs information about the context
objects we plan to use. Here is the context structure we’ll use:
struct FileContext {
Mutex Lock;
LARGE_INTEGER BackupTime;
BOOLEAN Written;
};
We’ll see the usage of this structure when we implement the callbacks. Registration is performed
within InitMiniFilter after the standard Registry entries have been written:
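A sketch of the registration pieces follows; the callback function names and the pool tag are assumptions here, but the shape mirrors the DelProtect driver:

const FLT_CONTEXT_REGISTRATION context[] = {
    { FLT_FILE_CONTEXT, 0, nullptr, sizeof(FileContext), 'cbkF' },
    { FLT_CONTEXT_END }
};

const FLT_OPERATION_REGISTRATION callbacks[] = {
    { IRP_MJ_CREATE, 0, nullptr, OnPostCreate },
    { IRP_MJ_WRITE, FLTFL_OPERATION_REGISTRATION_SKIP_PAGING_IO, OnPreWrite, nullptr },
    { IRP_MJ_CLEANUP, 0, nullptr, OnPostCleanup },
    { IRP_MJ_OPERATION_END }
};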
As far as contexts go, we'll need a context attached to certain files, so FLT_FILE_CONTEXT is the type
of context required. As for callbacks, we need to intercept IRP_MJ_CREATE after a file object has been
created to see whether it's an interesting file. IRP_MJ_WRITE is required, so we can back up the contents
of the file right before they are modified. The IRP_MJ_CLEANUP operation will be used to clean
up our context objects.
Since we’ll be using alternate streams, only NTFS can be used, as it’s the only standard file system in
Windows to support alternate file streams. This means the driver should not attach to a volume not
using NTFS. We used similar code in earlier drivers to attach to NTFS volumes only:
NTSTATUS BackupInstanceSetup(
    PCFLT_RELATED_OBJECTS FltObjects, FLT_INSTANCE_SETUP_FLAGS Flags,
    DEVICE_TYPE VolumeDeviceType, FLT_FILESYSTEM_TYPE VolumeFilesystemType) {
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(Flags);
    UNREFERENCED_PARAMETER(VolumeDeviceType);

    //
    // attach to NTFS volumes only
    //
    return VolumeFilesystemType == FLT_FSTYPE_NTFS ?
        STATUS_SUCCESS : STATUS_FLT_DO_NOT_ATTACH;
}
Now we can turn to the post-create callback. First, if the filter is being drained (detached), there is nothing to do:
FLT_POSTOP_CALLBACK_STATUS OnPostCreate(
PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
PVOID, FLT_POST_OPERATION_FLAGS Flags) {
if (Flags & FLTFL_POST_OPERATION_DRAINING)
return FLT_POSTOP_FINISHED_PROCESSING;
Next, let’s extract the parameters of the create operation, and check if the file in question is a directory:
FltIsDirectory is a simple function provided by the filter manager that returns TRUE in the last
boolean argument if the file object in question refers to a directory.
We are only interested in files opened for write access, not from kernel mode, and not new files (since
new files do not require backup). Also, directories are not interesting:
if (dir
|| Data->RequestorMode == KernelMode
|| (params.SecurityContext->DesiredAccess & FILE_WRITE_DATA) == 0
|| Data->IoStatus.Status != STATUS_SUCCESS
|| Data->IoStatus.Information == FILE_CREATED) {
//
// kernel caller, not write access or a new file - skip
//
return FLT_POSTOP_FINISHED_PROCESSING;
}
The IO_STATUS_BLOCK.Information in a post-create callback returns how the file was created/opened
(if the operation is successful). In the case of a new file being created, we don't care, as
there is nothing to back up.
Check out the documentation for FLT_PARAMETERS for IRP_MJ_CREATE to get more information on
the details shown above.
These kinds of checks are important, as they remove a lot of possible overhead for the driver. The
driver should always strive to do as little as possible to reduce its performance impact.
Now that we have a file we care about, we need to prepare a context object to be attached to the file.
This context will be needed later when we process the pre-write callback. First, we’ll extract the name
of the file. The driver needs to call the standard FltGetFileNameInformation. To make it a little
easier and less error-prone, we’ll use the RAII wrapper from the KTL.
Why don’t we just create a backup for the file right here and now? The file was opened for write
access, but there is no guarantee the client will actually write to the file; so we’ll wait until we get
a pre-write callback to perform the backup.
FilterFileNameInformation fileNameInfo(Data);
if (!fileNameInfo) {
return FLT_POSTOP_FINISHED_PROCESSING;
}
In this driver, we’ll backup files that have certain extensions - as mentioned already these will be
hard coded to simplify the coding that has little to do with file system mini-filters. We’ll call a helper
function to determine if we should care about this file:
Chapter 12: File System Mini-Filters 447
if (!ShouldBackupFile(fileNameInfo))
    return FLT_POSTOP_FINISHED_PROCESSING;
ShouldBackupFile compares the file's extension (case-insensitively) against a hard-coded list:
bool ShouldBackupFile(FilterFileNameInformation& nameInfo) {
    //
    // hard coded list of extensions
    //
    static PCWSTR extensions[] = {
        L"txt", L"docx", L"doc", L"jpg", L"png"
    };

    auto& ext = nameInfo->Extension;
    for (auto e : extensions) {
        //
        // compare the extension (without the dot) case-insensitively
        //
        if (ext.Length == wcslen(e) * sizeof(WCHAR) &&
            _wcsnicmp(ext.Buffer, e, ext.Length / sizeof(WCHAR)) == 0)
            return true;
    }
    return false;
}
Back in the post-create callback, we also skip create operations that target a named (alternate) stream - only the default data stream is backed up:
if (fileNameInfo->Stream.Length > 0)
    return FLT_POSTOP_FINISHED_PROCESSING;
Finally, we are ready to allocate our file context and initialize it. Allocation requires a call to
FltAllocateContext, specifying the context type and other details:
FileContext* context;
auto status = FltAllocateContext(FltObjects->Filter,
FLT_FILE_CONTEXT, sizeof(FileContext), PagedPool,
(PFLT_CONTEXT*)&context);
if (!NT_SUCCESS(status)) {
KdPrint(("Failed to allocate file context (0x%08X)\n", status));
return FLT_POSTOP_FINISHED_PROCESSING;
}
FltAllocateContext allocates a context with the required size and returns a pointer to the allocated
memory. PFLT_CONTEXT is just a void* - we can cast it to whatever type we need. The returned
context memory is not zeroed out, so all members must be initialized properly.
Now we can initialize the context and set it on the file object:
context->Written = FALSE;
context->Lock.Init();
context->BackupTime.QuadPart = 0;
//
// set file context
//
status = FltSetFileContext(FltObjects->Instance,
FltObjects->FileObject,
FLT_SET_CONTEXT_REPLACE_IF_EXISTS,
context, nullptr);
Why do we need this context in the first place? A typical client opens a file for write access and then
calls WriteFile potentially multiple times. Before the first call to WriteFile the driver should back
up the existing content of the file. This is why we need the boolean Written field - to make sure we
make the backup just once before the first write attempt. This flag starts as FALSE and will turn TRUE
after the first write operation. This turn of events is depicted in Figure 12-10.
Figure 12-10: Client and driver operations for common write sequence
Why do we need a mutex? We need some synchronization in an unlikely, but possible, case where
more than one thread within the client process writes to the same file at roughly the same time. In such
a case, we need to make sure we make a single backup of the data, otherwise our backup may become
corrupted. In all examples thus far where we needed such synchronization, we used a fast mutex, but
here we're using a standard mutex. Why? The driver will perform I/O operations while holding the
mutex. I/O operations can only be performed at IRQL PASSIVE_LEVEL (0). An acquired fast mutex
raises IRQL to APC_LEVEL (1), which would cause a deadlock if I/O APIs are used.
The deadlock occurs because I/O operations are completed by sending a special kernel APC to the
original thread. If that thread is holding a fast mutex (running at IRQL APC_LEVEL=1), it will never run
the APC (all APCs are blocked while the IRQL is APC_LEVEL), thus a deadlock.
The Mutex class is the same one shown in chapter 6 (part of the KTL as well). The BackupTime member
is zeroed out and will be modified when we back up the file. In the current version of the driver, this
information is not used, but it could be written to another stream in the file as some sort of “metadata”.
Finally, FltReleaseContext must be called, which if all is well, sets the internal reference count of
the context to 1 (+1 for allocate, +1 for set, -1 for release):
FltReleaseContext(context);
return FLT_POSTOP_FINISHED_PROCESSING;
}
FLT_PREOP_CALLBACK_STATUS
OnPreWrite(PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects, PVOID*) {
    //
    // get the file context if it exists
    //
    FileContext* context;
    auto status = FltGetFileContext(FltObjects->Instance,
        FltObjects->FileObject, (PFLT_CONTEXT*)&context);
    if (!NT_SUCCESS(status) || context == nullptr) {
        //
        // no context, continue normally
        //
        return FLT_PREOP_SUCCESS_NO_CALLBACK;
    }
Once we have a context, we need to make a copy of the file data just once, before the first write
operation. First, we acquire the mutex and check the Written flag from the context. If it's FALSE, then
a backup was not created yet and we call a helper function to make the backup:
do {
    Locker locker(context->Lock);
    if (context->Written) {
        //
        // already written, nothing to do
        //
        break;
    }

    FilterFileNameInformation name(Data);
    if (!name)
        break;

    //
    // make the backup and mark the context as written
    //
    status = BackupFile(&name->Name, FltObjects);
    if (!NT_SUCCESS(status)) {
        KdPrint((DRIVER_PREFIX "Failed to back up file (0x%X)\n", status));
        break;
    }
    context->Written = TRUE;
} while (false);

FltReleaseContext(context);

//
// don't prevent the write regardless of any error
//
return FLT_PREOP_SUCCESS_NO_CALLBACK;
}
Locker<> is the usual RAII type that acquires a synchronization object in its constructor and releases it in
the destructor.
The BackupFile helper function is the key to making all this work. One might think that making a
file copy is just an API call away; unfortunately, it's not. There is no "CopyFile" function in the kernel.
The CopyFile user mode API is a non-trivial function that does quite a bit of work to make the copy
work. Part of it is reading bytes from the source file and writing them to the destination file. But that's not
enough in the general case. First, there may be multiple streams to copy (in the case of NTFS). Second,
there is the question of the security descriptor of the original file, which also needs to be copied in
certain cases (refer to the documentation for CopyFile to get all the details).
The bottom line is that there is no single CopyFile we can use, and we'll have to create our own
file copy operation. Fortunately, we just need to copy a single file stream - the default stream - to
another stream inside the same physical file, serving as our backup stream. Here is the start of our BackupFile
function:
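A sketch of that opening (it mirrors the start of BackupFileWithSection shown later in the chapter, with declarations for the handles, file objects, and buffer used by the rest of the function):

NTSTATUS BackupFile(PUNICODE_STRING path, PCFLT_RELATED_OBJECTS FltObjects) {
    HANDLE hSourceFile = nullptr, hTargetFile = nullptr;
    PFILE_OBJECT sourceFile = nullptr, targetFile = nullptr;
    PVOID buffer = nullptr;
    IO_STATUS_BLOCK ioStatus;

    LARGE_INTEGER fileSize;
    auto status = FsRtlGetFileSize(FltObjects->FileObject, &fileSize);
    if (!NT_SUCCESS(status) || fileSize.QuadPart == 0)
        return status;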
FsRtlGetFileSize is a simple API that returns the size of a file (default NTFS stream).
This API is recommended whenever the file size is needed given a FILE_OBJECT pointer. The
alternative would be calling ZwQueryInformationFile or FltQueryInformationFile to obtain the
file size (it has many other types of information it can retrieve). The Zw variant is less desirable as it
requires a file handle and in some cases can cause a deadlock.
The route we’ll take is to open two handles - one (source) handle pointing to the original file (with
the default stream to back up) and the other (target) handle to the backup stream. Then, we’ll read
from the source and write to the target. This is conceptually simple, but as is often the case in kernel
programming, the devil is in the details.
Now we’re ready to open the source file with FltCreateFileEx. It’s important not to use ZwCreateFile,
so that the I/O requests are sent to the driver below this driver and not to the top of the file system
driver stack:
do {
OBJECT_ATTRIBUTES sourceFileAttr;
InitializeObjectAttributes(&sourceFileAttr, path,
OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, nullptr);
status = FltCreateFileEx(
FltObjects->Filter, // filter object
FltObjects->Instance, // filter instance
&hSourceFile, // resulting handle
&sourceFile, // resulting file object
GENERIC_READ | SYNCHRONIZE, // access mask
&sourceFileAttr, // object attributes
&ioStatus, // resulting status
nullptr, FILE_ATTRIBUTE_NORMAL, // allocation size, file attributes
FILE_SHARE_READ | FILE_SHARE_WRITE, // share flags
FILE_OPEN, // create disposition
FILE_SYNCHRONOUS_IO_NONALERT | FILE_SEQUENTIAL_ONLY, // sync I/O
nullptr, 0, // extended attributes, EA length
IO_IGNORE_SHARE_ACCESS_CHECK); // flags
if (!NT_SUCCESS(status))
break;
Before calling FltCreateFileEx, just like other APIs requiring a name, an OBJECT_ATTRIBUTES
structure must be initialized properly with the file name provided to BackupFile. This is the default
file stream that is about to change by a write operation and that’s why we’re making the backup. The
important arguments in the call are:
• filter and instance objects, which provide the necessary information for the call to go to the
next lower layer filter (or the file system) rather than go to the top of the file system stack.
• the returned handle, in hSourceFile.
• the returned FILE_OBJECT, to be used with FltReadFile.
• the access mask set to GENERIC_READ and SYNCHRONIZE.
• the create disposition, in this case indicating the file must exist (FILE_OPEN).
• the create options are set to FILE_SYNCHRONOUS_IO_NONALERT indicating synchronous oper-
ations through the resulting file handle. The SYNCHRONIZE access mask flag is required for
synchronous operations to work.
• the flag IO_IGNORE_SHARE_ACCESS_CHECK is important, because the file in question was
already opened by the client that most likely opened it with no sharing allowed. So we ask
the file system to ignore share access checks for this call.
Read the documentation of FltCreateFileEx to gain a better understanding of all the various
options this function provides.
Next we need to open or create the backup stream within the same file. We’ll name the backup stream
“:backup” and use another call to FltCreateFileEx to get a handle and file object to the target file:
//
// open target file
//
UNICODE_STRING targetFileName;
const WCHAR backupStream[] = L":backup";
targetFileName.MaximumLength = path->Length + sizeof(backupStream);
targetFileName.Buffer = (WCHAR*)ExAllocatePool2(POOL_FLAG_PAGED,
targetFileName.MaximumLength, DRIVER_TAG);
if (targetFileName.Buffer == nullptr) {
status = STATUS_NO_MEMORY;
break;
}
RtlCopyUnicodeString(&targetFileName, path);
RtlAppendUnicodeToString(&targetFileName, backupStream);
OBJECT_ATTRIBUTES targetFileAttr;
InitializeObjectAttributes(&targetFileAttr, &targetFileName,
OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, nullptr);
status = FltCreateFileEx(
FltObjects->Filter, // filter object
FltObjects->Instance, // filter instance
&hTargetFile, // resulting handle
&targetFile, // resulting file object
GENERIC_WRITE | SYNCHRONIZE, // access mask
&targetFileAttr, // object attributes
&ioStatus, // resulting status
nullptr, FILE_ATTRIBUTE_NORMAL,
0, // share flags
FILE_OVERWRITE_IF, // create disposition
FILE_SYNCHRONOUS_IO_NONALERT | FILE_SEQUENTIAL_ONLY,
nullptr, 0, // extended attributes, EA length
0); // flags
ExFreePool(targetFileName.Buffer);
if (!NT_SUCCESS(status)) {
//
// could fail if a restore operation is in progress
//
    break;
}
The file name is built by concatenating the base file name and the backup stream name. It is opened for
write access (GENERIC_WRITE) and overwrites any data that may be present (FILE_OVERWRITE_IF).
With these file objects in hand, we can start reading from the source and writing to the target. A simple
approach would be to allocate a buffer the size of the file, and do the work with a single read and a
single write. This could be problematic, however, if the file is very large, possibly causing the memory
allocation to fail.
There is also the risk of creating a backup for a very large file, possibly consuming lots
of disk space. For this kind of driver, backup should probably be avoided when a file is
too large (configurable in the Registry for instance) or avoid backup if the remaining disk
space would be below a certain threshold (again could be configurable). This is left as an
exercise for the reader.
A better option would be to allocate a relatively small buffer and just loop around until all the file's
chunks have been copied. This is the approach we'll use. First, allocate a buffer:
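One possible allocation, using 1 MB chunks (an arbitrary size; DRIVER_TAG is the driver's pool tag used earlier):

    ULONG size = 1 << 20;   // 1 MB chunks
    buffer = ExAllocatePool2(POOL_FLAG_PAGED, size, DRIVER_TAG);
    if (buffer == nullptr) {
        status = STATUS_NO_MEMORY;
        break;
    }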
ULONG bytes;
auto saveSize = fileSize;
while (fileSize.QuadPart > 0) {
status = FltReadFile(
FltObjects->Instance,
sourceFile, // source file object
nullptr, // byte offset
(ULONG)min((LONGLONG)size, fileSize.QuadPart), // # of bytes
buffer,
0, // flags
&bytes, // bytes read
nullptr, nullptr); // no callback
if (!NT_SUCCESS(status))
break;
    //
    // write the chunk to the backup stream
    //
    ULONG written;
    status = FltWriteFile(
        FltObjects->Instance,
        targetFile,          // target file object
        nullptr,             // byte offset (synchronous I/O)
        bytes,               // # of bytes to write
        buffer,
        0,                   // flags
        &written,            // bytes written
        nullptr, nullptr);   // no callback
    if (!NT_SUCCESS(status))
        break;
//
// update byte count remaining
//
fileSize.QuadPart -= bytes;
}
The loop keeps going as long as there are bytes to transfer. We start with the file size and then
decrement it for every chunk transferred. The functions that do the actual work are FltReadFile
and FltWriteFile. We could have used ZwReadFile and ZwWriteFile (we have handles), but this
is slightly less efficient. Notice the offsets are set to NULL, because we're using synchronous I/O, where
the file objects track a file pointer automatically.
When all is done, there is one last thing to do. Since we may be overwriting a previous backup (that
may have been larger than this one), we must set the end of file pointer to the current offset:
FILE_END_OF_FILE_INFORMATION info;
info.EndOfFile = saveSize;
status = FltSetInformationFile(FltObjects->Instance,
targetFile, &info, sizeof(info), FileEndOfFileInformation);
} while (false);
if (buffer)
ExFreePool(buffer);
if (hSourceFile)
FltClose(hSourceFile);
if (hTargetFile)
FltClose(hTargetFile);
if (sourceFile)
ObDereferenceObject(sourceFile);
if (targetFile)
ObDereferenceObject(targetFile);
return status;
}
The post-cleanup callback removes the context (if any) from the file object:
FLT_POSTOP_CALLBACK_STATUS
OnPostCleanup(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
    PVOID, FLT_POST_OPERATION_FLAGS Flags) {
    UNREFERENCED_PARAMETER(Flags);
    UNREFERENCED_PARAMETER(Data);

    FileContext* context;
    auto status = FltGetFileContext(FltObjects->Instance,
        FltObjects->FileObject, (PFLT_CONTEXT*)&context);
    if (NT_SUCCESS(status)) {
        FltReleaseContext(context);
        FltDeleteContext(context);
    }
    return FLT_POSTOP_FINISHED_PROCESSING;
}
C:\Demos>type c:\Temp\hello.txt
goodbye, world!
C:\Demos>streams -d c:\Temp\hello.txt
:backup:$DATA (15 bytes)
68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0D 0A hello, world!..
The Streams tool uses the FindFirstStreamW and FindNextStreamW APIs to iterate over the
streams within a file. Check out the source code for more information.
Restoring Backups
How can we restore a backup? We need to copy the “:backup” stream contents over the “normal” file
contents. Unfortunately, the CopyFile API cannot do this, as it does not accept alternate streams.
Let’s write a utility to do the work.
We’ll create a new console application project named Restore. We’ll add the following #includes to
the Restore.cpp file:
#include <Windows.h>
#include <stdio.h>
#include <string>
The main function should accept the file name as a command line argument:
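A minimal sketch of that entry point (using wmain so the file name arrives as a wide string, matching its use below):

int wmain(int argc, const wchar_t* argv[]) {
    if (argc < 2) {
        printf("Usage: Restore <filename>\n");
        return 0;
    }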
Next, we’ll open two files, one pointing to the “:backup” stream and the other to the “normal” file.
Then, we’ll copy in chunks, similarly to the driver’s BackupFile code - but in user mode. The Error
function just prints the provided text and whatever is returned from GetLastError:
std::wstring stream(argv[1]);
stream += L":backup";
LARGE_INTEGER size;
if (!GetFileSizeEx(hSource, &size))
return Error("Failed to get file size");
DWORD bytes;
while (size.QuadPart > 0) {
if (!ReadFile(hSource, buffer,
(DWORD)(min((LONGLONG)bufferSize, size.QuadPart)),
&bytes, nullptr))
return Error("Failed to read data");
Extend the driver to store an additional stream in the file with the backup time and date.
An alternative implementation of BackupFile uses a section (memory-mapped file) object instead of an explicitly allocated buffer. Here is the start of BackupFileWithSection:
NTSTATUS
BackupFileWithSection(PUNICODE_STRING path, PCFLT_RELATED_OBJECTS FltObjects) {
LARGE_INTEGER fileSize;
auto status = FsRtlGetFileSize(FltObjects->FileObject, &fileSize);
if (!NT_SUCCESS(status) || fileSize.QuadPart == 0)
        return status;

    HANDLE hSourceFile = nullptr, hTargetFile = nullptr, hSection = nullptr;
    PFILE_OBJECT sourceFile = nullptr, targetFile = nullptr;
    IO_STATUS_BLOCK ioStatus;

    do {
OBJECT_ATTRIBUTES sourceFileAttr;
InitializeObjectAttributes(&sourceFileAttr, path,
OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, nullptr);
status = FltCreateFileEx(
FltObjects->Filter,
FltObjects->Instance,
&hSourceFile,
&sourceFile,
GENERIC_READ | SYNCHRONIZE,
&sourceFileAttr,
&ioStatus,
nullptr, FILE_ATTRIBUTE_NORMAL,
FILE_SHARE_READ | FILE_SHARE_WRITE,
FILE_OPEN,
FILE_SYNCHRONOUS_IO_NONALERT | FILE_SEQUENTIAL_ONLY,
nullptr, 0,
IO_IGNORE_SHARE_ACCESS_CHECK);
if (!NT_SUCCESS(status))
break;
UNICODE_STRING targetFileName;
const WCHAR backupStream[] = L":backup";
targetFileName.MaximumLength = path->Length + sizeof(backupStream);
targetFileName.Buffer = (WCHAR*)ExAllocatePool2(POOL_FLAG_PAGED,
targetFileName.MaximumLength, DRIVER_TAG);
if (targetFileName.Buffer == nullptr) {
status = STATUS_NO_MEMORY;
break;
}
RtlCopyUnicodeString(&targetFileName, path);
RtlAppendUnicodeToString(&targetFileName, backupStream);
OBJECT_ATTRIBUTES targetFileAttr;
InitializeObjectAttributes(&targetFileAttr, &targetFileName,
OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, nullptr);
status = FltCreateFileEx(
FltObjects->Filter,
FltObjects->Instance,
&hTargetFile,
&targetFile,
GENERIC_WRITE | SYNCHRONIZE,
&targetFileAttr,
&ioStatus,
nullptr, FILE_ATTRIBUTE_NORMAL,
0,
FILE_OVERWRITE_IF,
FILE_SYNCHRONOUS_IO_NONALERT | FILE_SEQUENTIAL_ONLY,
nullptr, 0, 0);
ExFreePool(targetFileName.Buffer);
if (!NT_SUCCESS(status)) {
break;
}
Now comes the new stuff. We’ll create a section object pointing to the source file:
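A sketch of that call, assuming ZwCreateSection is used (its last argument is the backing file handle, matching the description below):

        status = ZwCreateSection(&hSection, SECTION_MAP_READ, nullptr,
            nullptr, PAGE_READONLY, SEC_COMMIT, hSourceFile);
        if (!NT_SUCCESS(status))
            break;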
The section is created for read access, pointing to the source file (last argument). The loop needs to
map a view of the file into memory, one chunk at a time (we'll go with 1 MB chunks as before), and then write the
data based on the mapped pointer:
LARGE_INTEGER offset{};
auto saveSize = fileSize;
PVOID buffer = nullptr;
SIZE_T size = 1 << 20;
while (fileSize.QuadPart > 0) {
buffer = nullptr;
SIZE_T bytes = min((LONGLONG)size, fileSize.QuadPart);
        status = ZwMapViewOfSection(hSection, NtCurrentProcess(), &buffer,
            0, 0, &offset, &bytes, ViewUnmap, 0, PAGE_READONLY);
        if (!NT_SUCCESS(status)) {
            KdPrint((DRIVER_PREFIX "Failed in ZwMapViewOfSection (0x%X)\n", status));
break;
}
ULONG written;
status = FltWriteFile(
FltObjects->Instance,
targetFile, nullptr,
(ULONG)bytes, buffer,
0, &written,
nullptr, nullptr);
ZwUnmapViewOfSection(NtCurrentProcess(), buffer);
if (!NT_SUCCESS(status))
break;
//
// update count and offset
//
fileSize.QuadPart -= written;
offset.QuadPart += written;
}
FILE_END_OF_FILE_INFORMATION info;
info.EndOfFile = saveSize;
status = FltSetInformationFile(FltObjects->Instance,
targetFile, &info, sizeof(info), FileEndOfFileInformation);
} while(false);
ZwMapViewOfSection performs the mapping, returning the pointer to the mapped memory in buffer.
Notice there is no buffer allocation anywhere - the data is just read directly through the mapping.
Finally, we have to clean up, which is the same as the original code with the addition of the section
handle:
if (hSection)
ZwClose(hSection);
if (hSourceFile)
FltClose(hSourceFile);
if (hTargetFile)
FltClose(hTargetFile);
if (sourceFile)
ObDereferenceObject(sourceFile);
if (targetFile)
ObDereferenceObject(targetFile);
return status;
}
One drawback of the device object and IRP-based communication used in earlier chapters is that the user mode client must initiate the communication. If the driver has something
to send to a user mode client (or clients), it cannot do so directly. It must store the data and wait for the
client to ask for it.
The filter manager provides an alternative mechanism for bi-directional communication between a
file system mini-filter and user mode clients, where any side can send information to the other and
even wait for a reply.
The mini-filter creates a filter communication port object by calling FltCreateCommunicationPort
to create such a port and register callbacks for client connection and messages. The user mode client
connects to the port by calling FilterConnectCommunicationPort, receiving a handle to the port.
A mini-filter sends a message to its user mode client(s) with FltSendMessage. Conversely, a user
mode client calls FilterGetMessage to wait until a message arrives, or calls FilterSendMessage
to send a message to the driver. If the driver is expecting a reply, a user mode client calls
FilterReplyMessage with the reply.
NTSTATUS FltCreateCommunicationPort (
_In_ PFLT_FILTER Filter,
_Outptr_ PFLT_PORT *ServerPort,
_In_ POBJECT_ATTRIBUTES ObjectAttributes,
_In_opt_ PVOID ServerPortCookie,
_In_ PFLT_CONNECT_NOTIFY ConnectNotifyCallback,
_In_ PFLT_DISCONNECT_NOTIFY DisconnectNotifyCallback,
_In_opt_ PFLT_MESSAGE_NOTIFY MessageNotifyCallback,
_In_ LONG MaxConnections);
• MaxConnections indicates the maximum number of clients that can connect to the port. It must
be greater than zero.
PSECURITY_DESCRIPTOR sd;
status = FltBuildDefaultSecurityDescriptor(&sd, FLT_PORT_ALL_ACCESS);
The security descriptor is necessary, because otherwise no user-mode client would be able to open a handle
successfully, as the default security for the port is too restrictive. The object attributes can then be initialized:
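A sketch of that initialization (the port name here is just an example):

UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\MyPort");     // example port name
OBJECT_ATTRIBUTES attr;
InitializeObjectAttributes(&attr, &name,
    OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, sd);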
The name of the port is in the object manager namespace, viewable with WinObj after port creation.
The flags must include OBJ_KERNEL_HANDLE, otherwise the call fails. Notice the last argument being
the security descriptor defined earlier. Now the driver is ready to call FltCreateCommunicationPort,
typically done after the driver calls FltRegisterFilter (because the returned opaque filter object
is needed for the call), but before FltStartFiltering so the port can be ready when actual filtering
starts:
PFLT_PORT ServerPort;
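The call might look like this (the callback names and the single-client limit are just examples; the Backup driver later in this chapter uses this exact pattern):

status = FltCreateCommunicationPort(g_Filter, &ServerPort, &attr, nullptr,
    PortConnectNotify, PortDisconnectNotify, PortMessageNotify,
    1);     // at most one client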
HRESULT FilterConnectCommunicationPort (
_In_ LPCWSTR lpPortName,
_In_ DWORD dwOptions,
_In_reads_bytes_opt_(wSizeOfContext) LPCVOID lpContext,
_In_ WORD wSizeOfContext,
_In_opt_ LPSECURITY_ATTRIBUTES lpSecurityAttributes,
_Outptr_ HANDLE *hPort);
• lpPortName is the port name (such as “\MyPort”). Note that with the default security descriptor
created by the driver, only admin level processes are able to connect.
• dwOptions is usually zero, but can be FLT_PORT_FLAG_SYNC_HANDLE (Windows 8.1 and later),
indicating the returned handle should work synchronously only. It's not clear why this is
needed, since the default usage is synchronous anyway.
• lpContext and wSizeOfContext support a way to send a buffer to the driver at connection time.
This could be used as a means of authentication, for example, where some password or token
is sent to the driver and the driver will fail requests to connect that don’t adhere to some
predefined authentication mechanism. In a production driver this is generally a good idea, so
that unknown clients could not “hijack” the communication port from legitimate clients.
• lpSecurityAttributes is the usual user mode SECURITY_ATTRIBUTES, typically set to NULL.
• hPort is the output handle used later by the client to send and receive messages.
This call invokes the driver’s client connection notify callback, declared as follows:
NTSTATUS PortConnectNotify(
_In_ PFLT_PORT ClientPort,
_In_opt_ PVOID ServerPortCookie,
_In_reads_bytes_opt_(SizeOfContext) PVOID ConnectionContext,
_In_ ULONG SizeOfContext,
_Outptr_result_maybenull_ PVOID *ConnectionPortCookie);
ClientPort is a unique handle to the client's port, which the driver must keep around and use whenever
it needs to communicate with that client. ServerPortCookie is the same one the driver specified in
FltCreateCommunicationPort. The ConnectionContext and SizeOfContext parameters contain the
optional buffer sent by the client. Finally, ConnectionPortCookie is an optional value the driver can
return as representing this client; it's passed in the client disconnect and message notification routines.
If the driver agrees to accept the client’s connection it returns STATUS_SUCCESS. Otherwise, the client
will receive a failure HRESULT back at FilterConnectCommunicationPort.
Once the call to FilterConnectCommunicationPort succeeds, the client can start communicating
with the driver, and vice-versa.
NTSTATUS
FLTAPI
FltSendMessage (
_In_ PFLT_FILTER Filter,
_In_ PFLT_PORT *ClientPort,
_In_ PVOID SenderBuffer,
_In_ ULONG SenderBufferLength,
_Out_ PVOID ReplyBuffer,
_Inout_opt_ PULONG ReplyLength,
_In_opt_ PLARGE_INTEGER Timeout);
The first two parameters should be familiar by now. The driver can send any buffer described by
SenderBuffer with length SenderBufferLength. Typically, the driver will define some structure in a
common header file the client can include as well, so that it can correctly interpret the received buffer.
Optionally, the driver may expect a reply, and if so, the ReplyBuffer parameter should be non-NULL,
with the maximum reply length stored in ReplyLength. Finally, Timeout indicates how long the driver
is willing to wait for the message to reach the client (and wait for a reply, if one is expected). The timeout
uses the usual kernel convention: a negative value is a relative interval in 100-nanosecond units, a positive
value is an absolute time, and NULL means wait indefinitely.
The driver should be careful not to specify NULL from within a callback, because it means that if the
client is currently not listening, the thread blocks until it does, which may never happen. It’s better to
specify some limited value. Even better, if a reply is not needed right away, a work item can be used
to send the message and wait for longer if needed (refer to chapter 6 for more information on work
items, although the filter manager has its own work item APIs).
From the client's perspective, it can wait for a message from the driver with FilterGetMessage,
specifying the port handle received when connecting, a buffer and size for the incoming message,
and an OVERLAPPED structure that can be used to make the call asynchronous (non-blocking). The
received buffer always has a header of type FILTER_MESSAGE_HEADER, followed by the actual data
sent by the driver. FILTER_MESSAGE_HEADER is defined like so:
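typedef struct _FILTER_MESSAGE_HEADER {
    ULONG ReplyLength;
    ULONGLONG MessageId;
} FILTER_MESSAGE_HEADER, *PFILTER_MESSAGE_HEADER;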
If a reply is expected, ReplyLength indicates how many bytes at most are expected. The MessageId field
allows distinguishing between messages, which the client should use if it calls FilterReplyMessage.
A client can initiate its own message with FilterSendMessage which eventually lands in the driver’s
callback registered in FltCreateCommunicationPort. FilterSendMessage can specify a buffer
comprising the message to send and an optional buffer for a reply that may be expected from the
mini-filter.
See the documentation for FilterSendMessage and FilterReplyMessage for the complete
details.
PFLT_PORT g_Port;
PFLT_PORT g_ClientPort;
g_Port is the driver’s server port and g_ClientPort is the client port once connected (we will allow
a single client only).
We’ll have to modify DriverEntry to create the communication port as described in the previous
section. Here is the revised DriverEntry:
do {
    UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\BackupPort");
    PSECURITY_DESCRIPTOR sd;
    status = FltBuildDefaultSecurityDescriptor(&sd, FLT_PORT_ALL_ACCESS);
    if (!NT_SUCCESS(status))
        break;

    OBJECT_ATTRIBUTES attr;
    InitializeObjectAttributes(&attr, &name,
        OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE, nullptr, sd);

    status = FltCreateCommunicationPort(g_Filter, &g_Port, &attr,
        nullptr, PortConnectNotify, PortDisconnectNotify,
        PortMessageNotify, 1);
    FltFreeSecurityDescriptor(sd);
    if (!NT_SUCCESS(status))
        break;

    status = FltStartFiltering(g_Filter);
} while (false);
if (!NT_SUCCESS(status)) {
FltUnregisterFilter(g_Filter);
}
return status;
}
The driver only allows a single client to connect to the port (the 1 passed as the last argument to
FltCreateCommunicationPort), which is quite common when a mini-filter works in tandem with a user mode service.
The PortConnectNotify callback is called when a client attempts to connect. Our driver simply stores
the client’s port and returns success:
NTSTATUS PortConnectNotify(
PFLT_PORT ClientPort, PVOID ServerPortCookie,
PVOID ConnectionContext, ULONG SizeOfContext,
PVOID* ConnectionPortCookie) {
UNREFERENCED_PARAMETER(ServerPortCookie);
UNREFERENCED_PARAMETER(ConnectionContext);
UNREFERENCED_PARAMETER(SizeOfContext);
UNREFERENCED_PARAMETER(ConnectionPortCookie);
g_ClientPort = ClientPort;
return STATUS_SUCCESS;
}
When the client disconnects, the PortDisconnectNotify callback is invoked. It’s important to close
the client port at that time, otherwise the mini-filter will never be unloaded:
void PortDisconnectNotify(PVOID ConnectionCookie) {
    UNREFERENCED_PARAMETER(ConnectionCookie);

    FltCloseClientPort(g_Filter, &g_ClientPort);
    g_ClientPort = nullptr;
}
In this driver we don't expect any messages from the client - the driver is the only one sending
messages - so the PortMessageNotify callback has an empty implementation.
Now we need to actually send a message when a file has been backed up successfully. For this
purpose, we’ll define a message structure common to the driver and the client in its own header
file, BackupCommon.h:
struct FileBackupPortMessage {
USHORT FileNameLength;
WCHAR FileName[1];
};
The message contains the file name length and the file name itself. The message does not have a
fixed size and depends on the file name length. In the pre-write callback after a file was backed up
successfully we need to allocate and initialize a buffer to send:
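A sketch of that code, under a few assumptions: name is the FilterFileNameInformation obtained earlier in the pre-write callback, and FileNameLength is treated as a character count:

if (g_ClientPort) {
    auto nameLen = name->Name.Length;
    ULONG len = (ULONG)(sizeof(FileBackupPortMessage) + nameLen);
    auto msg = (FileBackupPortMessage*)ExAllocatePool2(
        POOL_FLAG_PAGED, len, DRIVER_TAG);
    if (msg) {
        msg->FileNameLength = nameLen / sizeof(WCHAR);
        RtlCopyMemory(msg->FileName, name->Name.Buffer, nameLen);

        LARGE_INTEGER timeout;
        timeout.QuadPart = -10000 * 100;    // 100 msec (relative)
        FltSendMessage(g_Filter, &g_ClientPort, msg, len,
            nullptr, nullptr, &timeout);
        ExFreePool(msg);
    }
}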
First we check if any client is connected, and if so, we allocate a buffer with the proper size to include
the file name. Then we copy the name into the buffer (RtlCopyMemory is the same as memcpy) before sending it
on its way with a limited timeout.
Finally, in the filter’s unload routine we must close the filter communication port:
FltCloseCommunicationPort(g_Port);
FltUnregisterFilter(g_Filter);
return STATUS_SUCCESS;
}
#include <Windows.h>
#include <fltUser.h>
#include <stdio.h>
#include <string>
#include "..\KBackup2\BackupCommon.h"
fltuser.h is the user mode header where the FilterXxx functions are declared (they are not part of
windows.h). In the cpp file we must add the import library for these functions:
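#pragma comment(lib, "fltlib")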
Alternatively, this library can be added in the project's properties in the Linker node, under
Input. Putting this in the source file is easier and more robust, since changes to the project
properties will not affect the setting. Without this library, "unresolved external" linker
errors will show up.
int main() {
HANDLE hPort;
auto hr = FilterConnectCommunicationPort(L"\\BackupPort",
0, nullptr, 0, nullptr, &hPort);
if (FAILED(hr)) {
printf("Error connecting to port (HR=0x%08X)\n", hr);
return 1;
}
Now we can allocate a buffer for incoming messages and loop around forever waiting for messages.
Once a message is received, we’ll send it for handling:
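BYTE buffer[4096];      // 4 KB is plenty for a file name message
auto message = (FILTER_MESSAGE_HEADER*)buffer;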
for (;;) {
hr = FilterGetMessage(hPort, message, sizeof(buffer), nullptr);
if (FAILED(hr)) {
printf("Error receiving message (0x%08X)\n", hr);
break;
}
HandleMessage(buffer + sizeof(FILTER_MESSAGE_HEADER));
}
CloseHandle(hPort);
return 0;
}
The buffer here is allocated statically because the message just includes a file name, so a 4KB buffer
should be more than enough. Once a message is received, we pass the message body to a helper
function, HandleMessage, being careful to skip the always-present header.
All that’s left now is to do something with the data:
We build the string based on the pointer and length (fortunately, the C++ standard wstring class has
such a convenient constructor). This is important because the string is not necessarily NULL-terminated
(although we could have zeroed out the buffer before each message receipt, thus making sure zeros
are present at the end of the string).
The client application must be running elevated for the port open to succeed.
Debugging
Debugging a file system mini-filter is no different than debugging any other kernel driver. However, the
Debugging Tools for Windows package has a special extension DLL, fltkd.dll, with specific commands
to help with mini-filters. This DLL is not one of the default loaded extension DLLs, so the commands
must be used with their “full name” that includes the fltkd prefix and the command. Alternatively,
the DLL can be loaded explicitly with the .load command and then the commands can be directly
used.
Table 12-3 shows some of the commands from fltkd with a brief description.
Command Description
!help shows the command list with brief descriptions
!filters shows information on all loaded mini-filters
!filter shows information for the specified filter address
!instance shows information for the specified instance address
!volumes shows all volume objects
!volume shows detailed information on the specified volume address
!portlist shows the server ports for the specified filter
!port shows information on the specified client port
FLT_FILTER: ffff8b8f633e8c80
Client Port List : Mutex (ffff8b8f633e8ed8) List [ffff8b8f5949b7a0-f\
fff8b8f5949b7a0] mCount=1
FLT_PORT_OBJECT: ffff8b8f5949b7a0
FilterLink : [ffff8b8f633e8f10-ffff8b8f633e8f10]
ServerPort : ffff8b8f5b195200
Cookie : 0000000000000000
Lock : (ffff8b8f5949b7c8)
MsgQ : (ffff8b8f5949b800) NumEntries=1 Enabled
MessageId : 0x0000000000000000
DisconnectEvent : (ffff8b8f5949b8d8)
Disconnected : FALSE
2: kd> !volumes
Exercises
1. Write a file system mini-filter that prevents file deletion by processes running a certain image
name (e.g. "cmd.exe").
2. Extend the file system mini-filter from the previous item so that instead of deleting files, it moves
them to the Recycle Bin.
3. Extend the file backup driver with the ability to choose the directories where backups will be
created.
4. Extend the File Backup driver to include multiple backups, limited by some rule, such as file
size, date or maximum number of backup copies.
5. Modify the File Backup driver to back up only the changed data instead of the entire file.
6. Come up with your own ideas for a file system mini-filter driver!
Summary
This chapter was all about file system mini-filters - powerful drivers capable of intercepting any and
all file system activity. Mini-filters are a big topic, and this chapter should get you started on this
interesting and powerful journey. You can find more information in the WDK documentation, and
the WDK samples on Github.
In the next chapter, we’ll switch gears to look at the Windows Filtering Platform (WFP), used for
network filtering.
Chapter 13: The Windows Filtering Platform
The Windows Filtering Platform (WFP) provides flexible ways to control network filtering. It exposes
user-mode and kernel-mode APIs that interact with several layers of the networking stack. Some
configuration and control is available directly from user-mode, without requiring any kernel-mode
code (although it does require administrator-level access). WFP replaces older network filtering
technologies, such as Transport Driver Interface (TDI) filters and some types of NDIS filters.
If examining (and even modifying) network packets is required, or blocking is needed based on
some logic, a kernel-mode Callout driver can be written, which is what we'll be concerned with in
this chapter. We'll begin with an overview of the main pieces of WFP, look at some user-mode code
examples for configuring filters, and then dive into building a simple Callout driver that uses some
logic to block access to the network.
This chapter is an introduction to WFP, as full treatment would probably require its own book.
In this chapter:
• WFP Overview
• The WFP API
• User Mode Examples
• Callout Drivers
• Demo: Callout Driver
• Demo: User-Mode Client
• Summary
WFP Overview
WFP is comprised of user-mode and kernel-mode components. A very high-level architecture is
depicted in figure 13-1.
In user-mode, the WFP manager is the Base Filtering Engine (BFE), which is a service implemented
by bfe.dll and hosted in a standard svchost.exe instance. It implements the WFP user-mode API,
essentially managing the platform, talking to its kernel counterpart when needed. We’ll examine
some of these APIs in the next section.
User-mode applications, services and other components can utilize this user-mode management API
to examine WFP objects state, and make changes, such as adding or deleting filters. A classic example
of such “user” is the Windows Firewall, which is normally controllable by leveraging the Microsoft
Management Console (MMC) that is provided for this purpose (see figure 13-2), but using these APIs
from other applications is just as effective.
The kernel-mode filter engine exposes various logical layers, where filters (and callouts) can be
attached. Layers represent locations in the network processing of one or more packets. The TCP/IP
driver makes calls to the WFP kernel engine so that it can decide which filters (if any) should be
“invoked”.
For filters, this means checking the conditions set by the filter against the current request. If the
conditions are satisfied, the filter’s action is applied. Common actions include blocking a request
from being further processed, allowing the request to continue without further processing in this
layer, continuing to the next filter in this layer (if any), and invoking a callout driver. Callouts can
perform any kind of processing, such as examining and even modifying packet data.
The relationship between layers, filters, and callouts is depicted in figure 13-3.
As you can see in figure 13-3, each layer can have zero or more filters, and zero or more callouts. The
number and meaning of the layers is fixed and provided out of the box by Windows. On most systems,
there are about 100 layers. Many of the layers come in pairs, where one is for IPv4 and the other
(identical in purpose) is for IPv6.
The WFP Explorer tool I created provides some insight into what makes up WFP. Running the tool
and selecting View/Layers from the menu (or clicking the Layers tool bar button) shows a view of all
existing layers (figure 13-4).
You can download the WFP Explorer tool from its Github repository
Each layer is uniquely identified by a GUID. Its Layer ID is used internally by the kernel engine as an
identifier rather than the GUID, as it’s smaller and so is faster (layer IDs are 16-bit only). Most layers
have fields that can be used by filters to set conditions for invoking their actions. Double-clicking a
layer shows its properties. Figure 13-5 shows the general properties of an example layer. Notice it has
382 filters and 2 callouts attached to it. Clicking the Fields tab shows the fields available in this layer,
that can be used by filters to set conditions (figure 13-6).
The meaning of the various layers, and the meaning of the fields for the layers are all documented in
the official WFP documentation.
The currently existing filters can be viewed in WFP Explorer by selecting Filters from the View menu
(figure 13-7). Layers cannot be added or removed, but filters can. Management code (user or kernel)
can add and/or remove filters dynamically while the system is running. Figure 13-7 shows that on the
system the tool is running on there are currently 2978 filters.
Each filter is uniquely identified by a GUID, and just like layers has a “shorter” id (64-bit) that is used
by the kernel engine to more quickly compare filter IDs when needed. Since multiple filters can be
assigned to the same layer, some kind of ordering must be used when assessing filters. This is where
the filter’s weight comes into play. A weight is a 64-bit value that is used to sort filters by priority. As
you can see in figure 13-7, there are two weight properties - weight and effective weight. Weight is
what is specified when adding the filter, but effective weight is the actual one used. There are three
possible values to set for weight:
• A value between 0 and 15 is interpreted by WFP as a weight index, which simply means that
the effective weight is going to start with 4 bits set to the specified weight value, with WFP generating
the other 60 bits. For example, if the weight is set to 5, then the effective weight is going to be
between 0x5000000000000000 and 0x5FFFFFFFFFFFFFFF.
• An empty value tells WFP to generate an effective weight somewhere in the 64-bit range.
• A value above 15 is taken as is to become the effective weight.
What is an "empty" value? The weight is not really a number, but a FWP_VALUE - a type that can
hold all sorts of values, including no value at all (empty).
Double-clicking a filter in WFP Explorer shows its general properties, as shown in figure 13-8.
The Conditions tab shows the conditions this filter is configured with (figure 13-9). When all the
conditions are met, the action of the filter is going to fire.
The list of fields used by a filter must be a subset of the fields exposed by the layer this filter is attached
to. There are six conditions shown in figure 13-9 out of the possible 39 fields supported by this layer
("ALE Receive/Accept v4 Layer"). As you can see, there is a lot of flexibility in specifying conditions
for fields - this is evident in the matching enumeration, FWP_MATCH_TYPE:
typedef enum FWP_MATCH_TYPE_ {
    FWP_MATCH_EQUAL = 0,
    FWP_MATCH_GREATER,
    //... (less-than, range, flag and case-insensitive matches)
    FWP_MATCH_PREFIX,
    FWP_MATCH_NOT_PREFIX,
    FWP_MATCH_TYPE_MAX
} FWP_MATCH_TYPE;
A filter can have zero conditions, which means it’s always activated.
At this point, we have enough information to get acquainted with the WFP API.
The user-mode WFP APIs never set the last error, and always return the error value
directly. Zero (ERROR_SUCCESS) means success, while other (positive) values mean failure.
Do not call GetLastError when using WFP - just look at the returned value.
WFP functions and structures use a versioning scheme, where function and structure names end
with a digit, indicating version. For example, FWPM_LAYER0 is the first version of a structure
describing a layer. At the time of writing, this was the only structure for describing a layer. As
a counter example, there are several versions of the function beginning with FwpmNetEventEnum:
FwpmNetEventEnum0 (for Vista+), FwpmNetEventEnum1 (Windows 7+), FwpmNetEventEnum2 (Win-
dows 8+), FwpmNetEventEnum3 (Windows 10+), FwpmNetEventEnum4 (Windows 10 RS4+), and
FwpmNetEventEnum5 (Windows 10 RS5+). This is an extreme example, but there are others with fewer
"versions". You can use any version that matches the target platform. To make it easier to work with
these APIs and structures, a macro is defined with the base name that expands to the maximum
supported version based on the target compilation platform. Here are the declarations behind the macro
FwpmNetEventEnum:
DWORD FwpmNetEventEnum0(
_In_ HANDLE engineHandle,
_In_ HANDLE enumHandle,
_In_ UINT32 numEntriesRequested,
_Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT0*** entries,
_Out_ UINT32* numEntriesReturned);
#if (NTDDI_VERSION >= NTDDI_WIN7)
DWORD FwpmNetEventEnum1(
_In_ HANDLE engineHandle,
_In_ HANDLE enumHandle,
    _In_ UINT32 numEntriesRequested,
    _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT1*** entries,
    _Out_ UINT32* numEntriesReturned);
#endif
You can see that the differences in the functions relate to the structures returned as part of these APIs
(FWPM_NET_EVENTx). It’s recommended you use the macros, and only turn to specific versions if there
is a compelling reason to do so.
The WFP APIs adhere to strict naming conventions that make it easier to use. All management
functions start with Fwpm (Filtering Windows Platform Management), and all management structures
start with FWPM. The function names themselves use the pattern <Prefix><Object Type><Operation>,
such as FwpmFilterAdd and FwpmEngineOpen.
It’s curious that the prefixes used for functions, structures, and enums start with FWP rather than
the (perhaps) expected WFP. I couldn’t find a compelling reason for this.
WFP header files start with fwp and end with u for user-mode or k for kernel-mode. For example,
fwpmu.h holds the management functions for user-mode callers, whereas fwpmk.h is the header for
kernel callers. Two common files, fwptypes.h and fwpmtypes.h are used by both user-mode and
kernel-mode headers. They are included by the “main” header files.
User-Mode Examples
Before making any calls to specific APIs, a handle to the WFP engine must be opened with
FwpmEngineOpen:
DWORD FwpmEngineOpen0(
_In_opt_ const wchar_t* serverName, // must be NULL
_In_ UINT32 authnService, // RPC_C_AUTHN_DEFAULT
_In_opt_ SEC_WINNT_AUTH_IDENTITY_W* authIdentity,
_In_opt_ const FWPM_SESSION0* session,
_Out_ HANDLE* engineHandle);
Most of the arguments have good defaults when NULL is specified. The returned handle must be used
with subsequent APIs. Once it’s no longer needed, it must be closed:
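DWORD FwpmEngineClose0(
    _Inout_ HANDLE engineHandle);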
Enumerating Objects
What can we do with an engine handle? One thing provided with the management API is enumeration.
These are the APIs used by WFP Explorer to enumerate layers, filters, sessions, and other object types
in WFP. The following example displays some details for all the filters in the system (error handling
omitted for brevity, the project wfpfilters has the full source code):
#include <Windows.h>
#include <fwpmu.h>
#include <stdio.h>
#include <string>
int main() {
//
// open a handle to the WFP engine
//
HANDLE hEngine;
FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT, nullptr, nullptr, &hEngine);
//
// create an enumeration handle
//
HANDLE hEnum;
FwpmFilterCreateEnumHandle(hEngine, nullptr, &hEnum);
UINT32 count;
FWPM_FILTER** filters;
//
// enumerate filters
//
FwpmFilterEnum(hEngine, hEnum,
8192, // maximum entries,
&filters, // returned result
&count); // how many actually returned
//
// close enumeration handle
//
FwpmFilterDestroyEnumHandle(hEngine, hEnum);
//
// close engine handle
//
FwpmEngineClose(hEngine);
return 0;
}
The enumeration pattern repeats itself with all other WFP object types (layers, callouts, sessions, etc.).
Adding Filters
Let’s see if we can add a filter to perform some useful function. Suppose we want to prevent network
access from some process. We can add a filter at an appropriate layer to make it happen. Adding a
filter is a matter of filling in a FWPM_FILTER structure and calling FwpmFilterAdd:
DWORD FwpmFilterAdd0(
_In_ HANDLE engineHandle,
_In_ const FWPM_FILTER0* filter,
_In_opt_ PSECURITY_DESCRIPTOR sd,
_Out_opt_ UINT64* id);
The weird-looking comments are generated by the Microsoft Interface Definition Lan-
guage (MIDL) compiler when generating the header file from an IDL file. Although IDL
is most commonly used by Component Object Model (COM) to define interfaces and types,
WFP uses IDL to define its APIs, even though no COM interfaces are used; just plain C
functions. The original IDL files are provided with the SDK, and they are worth checking
out, since they may contain developer comments that are not “transferred” to the resulting
header files.
Some members in FWPM_FILTER are necessary - layerKey to indicate the layer to attach this filter to,
any conditions needed to trigger the filter (numFilterConditions and the filterCondition array),
and the action to take if the filter is triggered (the action field).
Let’s create some code that prevents the Windows Calculator from accessing the network. You may
be wondering why would calculator require network access? No, it’s not contacting Google to ask for
the result of 2+2. It’s using the Internet for accessing current exchange rates (figure 13-10).
Clicking the Update Rates button causes Calculator to consult the Internet for the updated exchange
rate. We’ll add a filter that prevents this.
We’ll start as usual by opening handle to the WFP engine as was done in the previous example. Next,
we need to fill the FWPM_FILTER structure. First, a nice display name:
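A sketch of that initialization (the display name text is arbitrary):

FWPM_FILTER filter = {};    // zero all fields
WCHAR filterName[] = L"Block Calculator network access";
filter.displayData.name = filterName;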
The name has no functional part - it just allows easy identification when enumerating filters. Now
we need to select the layer. We’ll also specify the action:
filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V4;
filter.action.type = FWP_ACTION_BLOCK;
There are several layers that could be used for blocking access, with the above layer being good enough
to get the job done. Full description of the provided layers, their purpose and when they are used is
provided as part of the WFP documentation.
The last part to initialize is the conditions to use. Without conditions, the filter is always going to be
invoked, which would block all network access (or at least some of it, depending on the filter's effective weight).
In our case, we only care about the application - we don't care about ports or protocols. The layer
we selected has several fields, one of which is called ALE App ID (ALE stands for Application Layer
Enforcement) - see figure 13-11.
This field can be used to identify an executable. To get that ID, we can use FwpmGetAppIdFromFileName.
Here is the code for Calculator’s executable:
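A sketch of that call (the path is a placeholder - use the actual path of the Calculator executable on your machine):

FWP_BYTE_BLOB* appId;
FwpmGetAppIdFromFileName(
    LR"(C:\Program Files\WindowsApps\...\CalculatorApp.exe)",   // placeholder path
    &appId);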
The code uses the path to the Calculator executable on my system - you should change that as needed
because Calculator’s version might be different.
A quick way to get the executable path is to run Calculator, open Process Explorer, open
the resulting process properties, and copy the path from the Image tab.
The R"( and closing parenthesis in the above snippet disables the “escaping” property of
backslashes, making it easier to write file paths (C++ 14 feature).
The return value from FwpmGetAppIdFromFileName is a BLOB that needs to be freed eventually with
FwpmFreeMemory.
Now we’re ready to specify the one and only condition:
FWPM_FILTER_CONDITION cond;
cond.fieldKey = FWPM_CONDITION_ALE_APP_ID; // field
cond.matchType = FWP_MATCH_EQUAL;
cond.conditionValue.type = FWP_BYTE_BLOB_TYPE;
cond.conditionValue.byteBlob = appId;
filter.filterCondition = &cond;
filter.numFilterConditions = 1;
Those familiar with COM may recognize this approach as similar to a VARIANT.
The last step is to add the filter, and repeat the exercise for IPv6, as we don’t know how Calculator
connects to the currency exchange server (we can find out, but it would be simpler and more robust
to just block IPv6 as well):
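The calls might look like this (FWPM_LAYER_ALE_AUTH_CONNECT_V6 is the IPv6 counterpart of the layer used above):

FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);

//
// repeat for the IPv6 layer
//
filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V6;
FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);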
We didn’t specify any GUID for the filter. This causes WFP to generate a GUID. We didn’t specify
weight, either. WFP will generate them.
FwpmFreeMemory((void**)&appId);
FwpmEngineClose(hEngine);
Running this code (elevated) and then trying to refresh the currency exchange rates in Calculator
should fail (figure 13-12). Note that there is no need to restart Calculator - the effect is immediate.
We can locate the filters added with WFP Explorer (figure 13-13):
Double-clicking one of the filters and selecting the Conditions tab shows the only condition where
the App ID is revealed to be the full path of the executable in device form (figure 13-14). Of course,
you should not take any dependency on this format, as it may change in the future.
You can right-click the filters and delete them using WFP Explorer. The FwpmFilterDeleteByKey API
is used behind the scenes. This will restore Calculator’s exchange rate update functionality.
Callout Drivers
The existing WFP layers provide lots of flexibility when creating filters, thanks to the many fields
and comparison options available. In many scenarios, you could get away with using the user mode
API to add filters to get the functionality you need without resorting to writing a kernel driver. That
said, some scenarios require more flexibility than can be provided by the built-in layers and callouts
alone. Here are some examples that would require a callout driver:
• Checking some conditions that are not provided by fields in a required layer.
• Examining actual packet data, optionally modifying it.
• Pending an operation until a decision can be made whether to let the operation continue or not.
In the rest of this chapter, we’ll look at some examples of callout drivers (as this is a kernel
programming book).
Using a callout involves three steps: registering the callout with the filter engine (FwpsCalloutRegister),
adding the callout object to the system (FwpmCalloutAdd), and adding a filter whose action invokes the
callout. The first step can only be done in a kernel driver, as this is where the callout specifies its callbacks,
to be invoked by the WFP kernel engine when appropriate. The other two steps can be done from
user-mode or kernel-mode, where usually user-mode makes more sense, as it provides flexibility of
use, without the need to "disturb" the driver.
Technically, step 2 can be performed before step 1. If the callout is not registered, it will be treated
as a “blocking” callout, meaning it will block whatever operation it’s attached to.
Callout Registration
Registering a callout involves calling FwpsCalloutRegister with a callout description:
NTSTATUS FwpsCalloutRegister(
_Inout_ void* deviceObject,
_In_ const FWPS_CALLOUT* callout,
_Out_opt_ UINT32* calloutId);
The function requires a device object, created normally with IoCreateDevice, as we have seen many
times before. This is used to associate the callout with the device, so that the driver does not unload
prematurely if any code is still executed by one of the callout’s callbacks.
The important part of FwpsCalloutRegister is the callout structure provided:
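Here is FWPS_CALLOUT0 (later versions keep the same members, with updated callback types):

typedef struct FWPS_CALLOUT0_ {
    GUID calloutKey;
    UINT32 flags;
    FWPS_CALLOUT_CLASSIFY_FN0 classifyFn;
    FWPS_CALLOUT_NOTIFY_FN0 notifyFn;
    FWPS_CALLOUT_FLOW_DELETE_NOTIFY_FN0 flowDeleteFn;
} FWPS_CALLOUT0;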
calloutKey is a GUID used to identify the callout. This GUID should be generated once, typically
with the Create GUID tool available as part of the Visual Studio Tools menu (figure 13-15). The same
GUID must be used when adding the callout to a layer, and when using it as part of a filter action (as
we’ll soon see).
flags can be zero, or a combination of flags. The list of flags has grown, indicated by the
version of FwpsCalloutRegister called, with the associated FWPS_CALLOUT structure version. At
the time of writing, FwpsCalloutRegister0 to FwpsCalloutRegister3 exist, with corresponding
FWPS_CALLOUT0 to FWPS_CALLOUT3. The data members are essentially the same (just “versioning”
changes), and the flags list extended. Here are a couple of notable flags (read the docs for the full list):
• FWP_CALLOUT_FLAG_ALLOW_OFFLOAD indicates the callout can do its work even when network pro-
cessing is offloaded to a capable network interface card (NIC). If this flag is not specified, off-
loading will be disabled for any processing path involving filters that use this callout. Normally,
you should set this flag.
• FWP_CALLOUT_FLAG_ENABLE_COMMIT_ADD_NOTIFY indicates the callout is able to receive no-
tifications about objects and filters added inside a transaction. Once the transaction commits
successfully, its callbacks will be invoked.
The last three members of FWPS_CALLOUT are callbacks that are invoked when appropriate. The most
important one is classifyFn, which is the one used to “classify” in some way the request, and decide
how processing should proceed. Here is the callback’s prototype:
void ClassifyFunction(
const FWPS_INCOMING_VALUES* inFixedValues,
const FWPS_INCOMING_METADATA_VALUES* inMetaValues,
void* layerData,
const void* classifyContext,
const FWPS_FILTER* filter,
UINT64 flowContext,
FWPS_CLASSIFY_OUT* classifyOut);
It’s quite a callback, having a multitude of parameters, some of which point to their own structures.
inFixedValues are the values set for the fields of the layer this callout is part of, wrapped in a
FWPS_INCOMING_VALUES structure:
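Dropping the version suffixes, the structure looks roughly like this (each incoming value wraps a FWP_VALUE):

typedef struct FWPS_INCOMING_VALUES_ {
    UINT16 layerId;                     // the layer this callout is attached to
    UINT32 valueCount;                  // number of entries in incomingValue
    FWPS_INCOMING_VALUE* incomingValue; // array of values, one per layer field
} FWPS_INCOMING_VALUES;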
The number of values (valueCount) is the same as the number of fields in the layer, and they are
provided in order. WFP Explorer makes it easier to see the order thanks to the provided index in a
layer’s properties (see figure 13-16 with an example layer).
The field indices are also available in a set of enumerations, each describing one of the
layers with the field indices provided in the correct order. For example, here is a subset from the same
layer as shown in figure 13-16:
typedef enum FWPS_FIELDS_ALE_AUTH_CONNECT_V4_ {
//... (earlier fields, such as the application ID and addresses, omitted)
FWPS_FIELD_ALE_AUTH_CONNECT_V4_FLAGS,
FWPS_FIELD_ALE_AUTH_CONNECT_V4_INTERFACE_TYPE,
FWPS_FIELD_ALE_AUTH_CONNECT_V4_TUNNEL_TYPE,
FWPS_FIELD_ALE_AUTH_CONNECT_V4_INTERFACE_INDEX,
FWPS_FIELD_ALE_AUTH_CONNECT_V4_SUB_INTERFACE_INDEX,
FWPS_FIELD_ALE_AUTH_CONNECT_V4_IP_ARRIVAL_INTERFACE,
//...
FWPS_FIELD_ALE_AUTH_CONNECT_V4_MAX
} FWPS_FIELDS_ALE_AUTH_CONNECT_V4;
Next up is inMetaValues, pointing to a structure providing some general details of the operation
(comments shown are from the header itself):
typedef struct FWPS_INCOMING_METADATA_VALUES_ {
// Bitwise combination of FWPS_METADATA_FIELD_* values.
UINT32 currentMetadataValues;
//... (many members omitted, such as processId, processPath and token)
NDIS_SWITCH_PORT_ID vSwitchSourcePortId;
NDIS_SWITCH_NIC_INDEX vSwitchSourceNicIndex;
NDIS_SWITCH_PORT_ID vSwitchDestinationPortId;
HANDLE vSwitchPacketContext;
PVOID subProcessTag;
// Reserved for system use.
UINT64 reserved1;
} FWPS_INCOMING_METADATA_VALUES;
I'll mention a few useful members. First, currentMetadataValues indicates which other members
contain valid data - it's a combination of FWPS_METADATA_FIELD_* flags, and the driver must test
the relevant flag before reading a member. For example, processId (the ID of the process on whose
behalf the operation is performed) is only valid if FWPS_METADATA_FIELD_PROCESS_ID is set; we'll
use it later in this chapter.
The next parameter to the classify function is layerData, providing the actual data that makes sense
for this layer. Some layers don’t have any associated data, so this pointer may be NULL. For a “Stream”
layer (such as FWPS_LAYER_STREAM_V4), the pointer is to a FWPS_STREAM_CALLOUT_IO_PACKET
structure. In all other cases, it points to a NET_BUFFER_LIST, which is the standard way of describing
a network buffer. Clearly, there is a lot more to look into, some of which we’ll do later in this chapter.
The next parameter, classifyContext, is an internal pointer used by the WFP infrastructure. It may
be NULL for some layers. If not-NULL, it can be used to “pend” an operation - hold on to it, until a
decision can be made later, outside of the context of the callback. This is beyond the scope of this
chapter.
The next parameter, filter is the filter pointer used to invoke this callback. It’s essentially the one
used by a client code to set up this callout as an action target. Usually, filters are added from user
mode with FwpmFilterAdd, but they can be added by kernel code in exactly the same way. Here is
its generic definition (taking the version out of the equation):
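Roughly (member types are shown without their version suffixes; see fwpsk.h for the exact versioned definition):

typedef struct FWPS_FILTER_ {
    UINT64 filterId;
    FWP_VALUE weight;
    UINT16 subLayerWeight;
    UINT16 flags;
    UINT32 numFilterConditions;
    FWPS_FILTER_CONDITION* filterCondition;
    FWPS_ACTION action;
    UINT64 context;
    FWPM_PROVIDER_CONTEXT* providerContext;
} FWPS_FILTER;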
Most members were set explicitly by whoever called FwpmFilterAdd. Consult the docs for the missing
pieces.
Note that the "runtime" structures used by the kernel WFP engine (starting with Fwps) are
not the same ones used by the management functions (common to user mode and kernel
mode, starting with Fwpm). For example, filterId in the above structure is a 64-bit value,
instead of the GUID that is used to identify a filter with the management functions.
The next parameter, flowContext, represents a context associated with the data flow, if any. Some
layers don't support data flows, in which case this parameter can be ignored.
Finally, the last parameter to the classify callback, classifyOut, is a pointer to a structure where the
result of the classification should be provided directly (unless the operation is pended). This is where
the final “decision” of the callout is to be stored:
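Roughly (version suffix dropped):

typedef struct FWPS_CLASSIFY_OUT_ {
    FWP_ACTION_TYPE actionType;
    UINT64 outContext;
    UINT64 filterId;
    UINT32 rights;
    UINT32 flags;
    UINT32 reserved;
} FWPS_CLASSIFY_OUT;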
The most important member is actionType, where the driver decides the suggested fate of the
operation. The possible values include:
• FWP_ACTION_PERMIT - permit the operation.
• FWP_ACTION_BLOCK - block the operation.
• FWP_ACTION_CONTINUE - make no decision, and let the next filter decide.
• FWP_ACTION_NONE and FWP_ACTION_NONE_NO_MATCH - take no action (the latter also asks the
engine to treat the filter as if it did not match).
Writing to actionType is "controlled" by the rights member. If it has the value FWPS_RIGHT_-
ACTION_WRITE, then the driver is allowed to set an action in actionType. If not, the driver is permitted
to set an action of FWP_ACTION_BLOCK only, to override an earlier filter's action. Technically, a callout
can always write an action value, but it should follow the rules. A driver setting the action to block
or permit should remove the FWPS_RIGHT_ACTION_WRITE flag from rights, so that subsequent filters
are less likely to "interfere" with the callout's decision.
The only remaining member of FWPS_CLASSIFY_OUT yet to be discussed is flags, which can be set
to a combination of values with the most useful being the following (see the docs for the other two
possible flags):
• FWPS_CLASSIFY_OUT_FLAG_ABSORB - the data is silently dropped. This is typical for cases where
the original packet is absorbed, to be replaced by a different one. The driver sets this value in
such cases. We’ll see an example using this flag later in this chapter.
Back to FwpsCalloutRegister and the FWPS_CALLOUT structure - the next member, notifyFn, is
another callback the driver has to provide. It’s called when filters that are using this callout are added
or removed:
NTSTATUS NotifyCallback(
_In_ FWPS_CALLOUT_NOTIFY_TYPE notifyType,
_In_ const GUID* filterKey,
_Inout_ FWPS_FILTER* filter);
The third callback, flowDeleteFn, is invoked when a data flow the callout has associated context with
is terminated; it's only needed if the callout uses flow contexts, which ours does not.
The callout ID is an optional return value from FwpsCalloutRegister. The driver can store it for
later use, such as for unregistering purposes. Using the GUID of the callout is just as good.
The Driver
We start by creating a new WDM Empty Driver project as usual, named ProcessNetFilter. The INF
file is deleted, as it’s not needed. We’ll keep the interesting state of the driver in a global class named
Globals that will take care of all the WFP functionality (in Globals.h):
#include "Vector.h"
#include "SpinLock.h"
class Globals {
public:
Globals();
static Globals& Get();
Globals(Globals const&) = delete;
Globals& operator=(Globals const&) = delete;
~Globals();

NTSTATUS RegisterCallouts(PDEVICE_OBJECT devObj);
NTSTATUS AddProcess(ULONG pid);
NTSTATUS DeleteProcess(ULONG pid);
NTSTATUS ClearProcesses();

NTSTATUS DoCalloutNotify(
_In_ FWPS_CALLOUT_NOTIFY_TYPE notifyType,
_In_ const GUID* filterKey,
_Inout_ FWPS_FILTER* filter);
void DoCalloutClassify(
_In_ const FWPS_INCOMING_VALUES* inFixedValues,
_In_ const FWPS_INCOMING_METADATA_VALUES* inMetaValues,
_Inout_opt_ void* layerData,
_In_opt_ const void* classifyContext,
_In_ const FWPS_FILTER* filter,
_In_ UINT64 flowContext,
_Inout_ FWPS_CLASSIFY_OUT* classifyOut);
private:
Vector<ULONG, PoolType::NonPaged> m_Processes;
mutable SpinLock m_ProcessesLock;
inline static Globals* s_Globals;
};
The Vector.h header implements a simple resizable array, which will not be described in this chapter.
The m_Processes member is declared as such a vector of process IDs (ULONG), allocated from non-
paged pool (the PoolType::NonPaged enumeration value, defined in Memory.h). More information on
the Vector<> class and other parts of the KTL can be found in the appendix.
A Globals pointer named g_Data is defined in Main.cpp. It's dynamically allocated with the new
operator (overloaded) to allow the constructor to run, and by the same token, it's deleted with the
delete operator in the unload routine, so that the destructor runs:
Globals::Globals() {
s_Globals = this;
m_ProcessesLock.Init();
}
Globals& Globals::Get() {
return *s_Globals;
}
Let’s now turn our attention to the DriverEntry function. The driver creates a normal named device
object and a symbolic link in order to allow sending I/O controls for blocking and permitting network
access for process IDs. Most of the code should be very familiar at this point:
//
// (earlier in DriverEntry: g_Data allocation, device object and
//  symbolic link creation - standard code)
//
status = g_Data->RegisterCallouts(devObj);
if (!NT_SUCCESS(status))
break;
} while (false);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "DriverEntry failed (0x%X)\n", status));
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
IoDeleteDevice(devObj);
return status;
}
DriverObject->DriverUnload = ProcNetFilterUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = ProcNetFilterCreateClose;
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = ProcNetFilterDeviceCon\
trol;
return STATUS_SUCCESS;
}
The unfamiliar code is the call to Globals::RegisterCallouts. Registering callouts requires calling
FwpsCalloutRegister for each callout. Why would we need multiple callouts? When adding callouts
(later), a callout is added at a specific layer. If the "same" callout behavior is required in several layers,
different callouts (with different GUIDs) must be added separately. Since we're interested in blocking
network traffic for TCP and UDP, for IPv4 and IPv6, we require four callouts, even though the
callouts' callbacks will be the same:
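The registration code itself is not reproduced here. A minimal sketch of what Globals::RegisterCallouts might look like, assuming the free functions OnCalloutNotify and OnCalloutClassify (described later) are the registered callbacks:

NTSTATUS Globals::RegisterCallouts(PDEVICE_OBJECT devObj) {
    const GUID* guids[] = {
        &GUID_CALLOUT_PROCESS_BLOCK_V4,
        &GUID_CALLOUT_PROCESS_BLOCK_V6,
        &GUID_CALLOUT_PROCESS_BLOCK_UDP_V4,
        &GUID_CALLOUT_PROCESS_BLOCK_UDP_V6,
    };
    for (auto& guid : guids) {
        FWPS_CALLOUT callout{};
        callout.calloutKey = *guid;
        callout.classifyFn = OnCalloutClassify;
        callout.notifyFn = OnCalloutNotify;
        auto status = FwpsCalloutRegister(devObj, &callout, nullptr);
        if (!NT_SUCCESS(status))
            return status;
    }
    return STATUS_SUCCESS;
}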
The GUIDs of these callouts are defined in the ProcNetFilterPublic.h header, shared with user mode,
as it’s more flexible to let user mode add callouts as needed.
The unload routine deletes the g_Data object, invoking the destructor, and then deletes the symbolic
link and device object:
Globals::~Globals() {
const GUID* guids[] = {
&GUID_CALLOUT_PROCESS_BLOCK_V4,
&GUID_CALLOUT_PROCESS_BLOCK_V6,
&GUID_CALLOUT_PROCESS_BLOCK_UDP_V4,
&GUID_CALLOUT_PROCESS_BLOCK_UDP_V6,
};
for(auto& guid : guids)
FwpsCalloutUnregisterByKey(guid);
}
The destructor reverses the callout registration by unregistering the four callouts.
Managing Processes
Managing the process IDs that require blocking is done by manipulating the Vector<>. Some functions
in the Globals class are tasked with this work:
NTSTATUS Globals::AddProcess(ULONG pid) {
//
// make sure the process actually exists
//
PEPROCESS process;
auto status = PsLookupProcessByProcessId(UlongToHandle(pid), &process);
if (!NT_SUCCESS(status))
return status;

{
Locker locker(m_ProcessesLock);
//
// don't add if it's already there
//
if(!m_Processes.Contains(pid))
m_Processes.Add(pid);
}
ObDereferenceObject(process);
return STATUS_SUCCESS;
}
NTSTATUS Globals::ClearProcesses() {
Locker locker(m_ProcessesLock);
m_Processes.Clear();
return STATUS_SUCCESS;
}
m_ProcessesLock is of type SpinLock - a spin lock wrapper we’ve used in previous chapters.
Locker<> is a generic locker we’ve used as well.
A nice touch in the AddProcess implementation is checking that the process actually exists by calling
PsLookupProcessByProcessId.
Adding, removing and clearing the processes vector is done through I/O control codes. These are
defined in ProcNetFilterPublic.h alongside the callout GUIDs:
#include <initguid.h>
// {5027C277-201A-4AAF-B8EC-95C05E857059}
DEFINE_GUID(GUID_CALLOUT_PROCESS_BLOCK_V4, 0x5027c277, 0x201a, 0x4aaf, 0xb8, 0x\
ec, 0x95, 0xc0, 0x5e, 0x85, 0x70, 0x59);
// {CF51FD24-566F-4C6D-9BC9-8013E9875E7E}
DEFINE_GUID(GUID_CALLOUT_PROCESS_BLOCK_V6, 0xcf51fd24, 0x566f, 0x4c6d, 0x9b, 0x\
c9, 0x80, 0x13, 0xe9, 0x87, 0x5e, 0x7e);
// {200E35C6-7182-4F9C-97DF-34028A225BEC}
DEFINE_GUID(GUID_CALLOUT_PROCESS_BLOCK_UDP_V4, 0x200e35c6, 0x7182, 0x4f9c, 0x97\
, 0xdf, 0x34, 0x02, 0x8a, 0x22, 0x5b, 0xec);
// {C8AF8E6D-1D0C-4547-A2A1-7593C3396BAF}
DEFINE_GUID(GUID_CALLOUT_PROCESS_BLOCK_UDP_V6, 0xc8af8e6d, 0x1d0c, 0x4547, 0xa2\
, 0xa1, 0x75, 0x93, 0xc3, 0x39, 0x6b, 0xaf);
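The control code definitions themselves are not shown above. They would look something like the following; note that the device type and function numbers here are placeholders chosen for illustration, and the actual values in the project may differ:

#define PNF_DEVICE 0x8011

#define IOCTL_PNF_BLOCK_PROCESS \
    CTL_CODE(PNF_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_PNF_PERMIT_PROCESS \
    CTL_CODE(PNF_DEVICE, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_PNF_CLEAR \
    CTL_CODE(PNF_DEVICE, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS)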
The device control dispatch routine handles these control codes. Its prologue retrieves the
DeviceIoControl parameters (into a variable named dic) and initializes the status and information
values, as in previous drivers:

NTSTATUS ProcNetFilterDeviceControl(PDEVICE_OBJECT, PIRP Irp) {
auto irpSp = IoGetCurrentIrpStackLocation(Irp);
auto& dic = irpSp->Parameters.DeviceIoControl;
auto status = STATUS_INVALID_DEVICE_REQUEST;
ULONG_PTR info = 0;

switch (dic.IoControlCode) {
case IOCTL_PNF_CLEAR:
status = g_Data->ClearProcesses();
break;
case IOCTL_PNF_BLOCK_PROCESS:
case IOCTL_PNF_PERMIT_PROCESS:
if (dic.InputBufferLength < sizeof(ULONG)) {
status = STATUS_BUFFER_TOO_SMALL;
break;
}
auto pid = *(ULONG*)Irp->AssociatedIrp.SystemBuffer;
status = dic.IoControlCode == IOCTL_PNF_BLOCK_PROCESS ?
g_Data->AddProcess(pid) : g_Data->DeleteProcess(pid);
if (NT_SUCCESS(status))
info = sizeof(ULONG);
break;
}

return CompleteRequest(Irp, status, info);
}
The above code should be familiar by now. CompleteRequest is a helper function we used before
that simply completes an IRP given an optional status and information:
NTSTATUS CompleteRequest(
PIRP Irp, NTSTATUS status = STATUS_SUCCESS, ULONG_PTR info = 0);
Callout Callbacks
The most interesting part of the driver is obviously the WFP-related code. The callout registration
done previously points the notify and classify callbacks to OnCalloutNotify and OnCalloutClassify,
respectively.
These two functions simply delegate their work to instance members of the Globals class, so that it's
easier to access the class members:
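A sketch of what these two forwarding functions might look like (the exact listing isn't reproduced here; the parameter lists match the callback prototypes shown earlier):

NTSTATUS OnCalloutNotify(FWPS_CALLOUT_NOTIFY_TYPE notifyType,
    const GUID* filterKey, FWPS_FILTER* filter) {
    return Globals::Get().DoCalloutNotify(notifyType, filterKey, filter);
}

void OnCalloutClassify(const FWPS_INCOMING_VALUES* inFixedValues,
    const FWPS_INCOMING_METADATA_VALUES* inMetaValues, void* layerData,
    const void* classifyContext, const FWPS_FILTER* filter,
    UINT64 flowContext, FWPS_CLASSIFY_OUT* classifyOut) {
    Globals::Get().DoCalloutClassify(inFixedValues, inMetaValues, layerData,
        classifyContext, filter, flowContext, classifyOut);
}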
The real work is done in the member functions named DoCalloutNotify and DoCalloutClassify.
The notify callback is mostly uninteresting, but must be implemented. The code just outputs the fact
that a filter has been added or removed, with its GUID if available:

NTSTATUS Globals::DoCalloutNotify(FWPS_CALLOUT_NOTIFY_TYPE notifyType,
    const GUID* filterKey, FWPS_FILTER* filter) {
    UNREFERENCED_PARAMETER(filter);

    UNICODE_STRING sguid{};
    if (filterKey)
        RtlStringFromGUID(*filterKey, &sguid);

    if (notifyType == FWPS_CALLOUT_NOTIFY_ADD_FILTER) {
KdPrint((DRIVER_PREFIX "Filter added: %wZ\n", sguid));
}
else if (notifyType == FWPS_CALLOUT_NOTIFY_DELETE_FILTER) {
KdPrint((DRIVER_PREFIX "Filter deleted: %wZ\n", sguid));
}
if (filterKey)
RtlFreeUnicodeString(&sguid);
return STATUS_SUCCESS;
}
In most cases, the fact that filters (that use one of the driver's callouts) are added or removed is not
important. Still, it may be useful in certain cases. For example, the driver can keep track of how many
filters are currently using its callouts, for logging or other purposes.
The above code uses the RtlStringFromGUID API provided by the kernel to convert a GUID into a
UNICODE_STRING. Memory is allocated by the routine, so RtlFreeUnicodeString must be called to
free the string. Note that in some cases the GUID of the filter is not provided, so care must be taken
not to pass a NULL GUID pointer to RtlStringFromGUID, as it would crash the system.
The most important callback is the classify one. Its job is to determine if the request should be blocked.
First, we need to check if a process ID is available as part of the “metadata” fields:
//
// search for the PID (if available)
//
if ((inMetaValues->currentMetadataValues & FWPS_METADATA_FIELD_PROCESS_ID)
== 0) return;
Now that we know a process ID is available, we’ll check if it’s on our list of PIDs to block:
bool block;
{
Locker locker(m_ProcessesLock);
block = m_Processes.Contains((ULONG)inMetaValues->processId);
}
The spin lock is acquired for the minimum possible interval, as multiple classify callbacks may be
running at the same time. A spin lock is used (and not a fast mutex), because the classify callback is
invoked at IRQL DISPATCH_LEVEL (2).
If we need to block, we set the action to “block” and tell downstream filters not to change the outcome:
if(block) {
//
// block
//
classifyOut->actionType = FWP_ACTION_BLOCK;
//
// prevent other filters from overriding the block
//
classifyOut->rights &= ~FWPS_RIGHT_ACTION_WRITE;
Removing the FWPS_RIGHT_ACTION_WRITE bit in the rights member is critical - otherwise next
callouts in the chain might change the action to “permit”. It’s OK to change a “permit” action to “block”
- but not vice-versa. Here is the full classify callout implementation for easier reference (comments
removed):
void Globals::DoCalloutClassify(
    const FWPS_INCOMING_VALUES* inFixedValues,
    const FWPS_INCOMING_METADATA_VALUES* inMetaValues,
    void* layerData, const void* classifyContext,
    const FWPS_FILTER* filter, UINT64 flowContext,
    FWPS_CLASSIFY_OUT* classifyOut) {
    if ((inMetaValues->currentMetadataValues & FWPS_METADATA_FIELD_PROCESS_ID)
        == 0) return;

    bool block;
    {
        Locker locker(m_ProcessesLock);
        block = m_Processes.Contains((ULONG)inMetaValues->processId);
    }
    if (block) {
        classifyOut->actionType = FWP_ACTION_BLOCK;
        classifyOut->rights &= ~FWPS_RIGHT_ACTION_WRITE;
        KdPrint((DRIVER_PREFIX "Blocking PID %u\n",
            (ULONG)inMetaValues->processId));
    }
}
In order for the driver to link successfully, the fwpkclnt.lib import library must be added to the Linker’s
Input tab (see figure 13-17).
You may try to add the import through a pragma like so: #pragma comment(lib, "fwpkclnt").
This does not have the desired effect; for some reason, this pragma only seems to work in
user-mode projects.
For completeness, the driver should keep track of process destruction, and remove a
destroyed process from the list of blocked processes (if listed). Add code to accomplish
that.
Next, the client application (BlockProcess.cpp) adds a new WFP provider to the system, to make it
easier to identify callouts and filters that "belong" to it. Providers don't play an active role in WFP,
but they are useful for identifying different "sources" of filters or callouts:
The feature where an initialization is permitted before the test, with a semicolon in between
(as seen above), is available from C++ 17. It also works with a switch statement. It's useful
for keeping the variable (error in the above code) constrained to the scope of the if statement
(and an else statement, if one exists).
Defining a provider requires generating a GUID to uniquely identify the provider. Here is the GUID
defined at the top of the BlockProcess.cpp file:
// {7672D055-03C0-43F1-9E31-0392850BD07F}
DEFINE_GUID(WFP_PROVIDER_CHAPTER13,
0x7672d055, 0x3c0, 0x43f1, 0x9e, 0x31, 0x3, 0x92, 0x85, 0xb, 0xd0, 0x7f);
Registering a provider must be done (as with most operations) against the WFP engine:
DWORD RegisterProvider() {
HANDLE hEngine;
DWORD error = FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT,
nullptr, nullptr, &hEngine);
if (error)
return error;
Working with the WFP engine requires opening and closing it, code that is easily repeated. This
project just repeats the code, but a good idea would be to create a wrapper class for the WFP engine.
You can find one such example in the source code of WFP Explorer.
Next, we can check if the provider has already been registered. If so, no further action is needed.
Otherwise, we go ahead and register it:
FWPM_PROVIDER* provider;
error = FwpmProviderGetByKey(hEngine, &WFP_PROVIDER_CHAPTER13, &provider);
if (error != ERROR_SUCCESS) {
    //
    // not registered yet - add it
    //
    FWPM_PROVIDER reg{};
    WCHAR name[] = L"WKP2 Chapter 13";
    reg.displayData.name = name;
    reg.providerKey = WFP_PROVIDER_CHAPTER13;
    reg.flags = FWPM_PROVIDER_FLAG_PERSISTENT;
    error = FwpmProviderAdd(hEngine, &reg, nullptr);
}
else {
    FwpmFreeMemory((void**)&provider);
}
FwpmEngineClose(hEngine);
return error;
}
Back to main. The next item on the agenda is opening a handle to the device (using CreateFile with
the driver's symbolic link); without the driver loaded and running, there are no registered callouts.
With the device handle in hand, the callouts are added:
if (!AddCallouts()) {
printf("Error adding callouts\n");
return 1;
}
Adding the callouts allows them to be used in filters. If no filter is referencing the callouts, they are
essentially useless.
AddCallouts opens a handle to the engine, and looks for one of the callouts. If it’s already added,
there is nothing else to do:
bool AddCallouts() {
HANDLE hEngine;
DWORD error = FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT,
nullptr, nullptr, &hEngine);
if (error)
return false;
do {
if (FWPM_CALLOUT* callout; FwpmCalloutGetByKey(hEngine,
&GUID_CALLOUT_PROCESS_BLOCK_V4, &callout) == ERROR_SUCCESS) {
FwpmFreeMemory((void**)&callout);
break;
}
const struct {
const GUID* guid;
const GUID* layer;
} callouts[] = {
{ &GUID_CALLOUT_PROCESS_BLOCK_V4, &FWPM_LAYER_ALE_AUTH_CONNECT_V4 },
{ &GUID_CALLOUT_PROCESS_BLOCK_V6, &FWPM_LAYER_ALE_AUTH_CONNECT_V6 },
{ &GUID_CALLOUT_PROCESS_BLOCK_UDP_V4, &FWPM_LAYER_ALE_RESOURCE_ASSIGNME\
NT_V4 },
{ &GUID_CALLOUT_PROCESS_BLOCK_UDP_V6, &FWPM_LAYER_ALE_RESOURCE_ASSIGNME\
NT_V6 },
};
for (auto& co : callouts) {
    FWPM_CALLOUT callout{};
    callout.calloutKey = *co.guid;
    callout.applicableLayer = *co.layer;
    WCHAR name[] = L"Block PID callout";
    callout.displayData.name = name;
    callout.providerKey = (GUID*)&WFP_PROVIDER_CHAPTER13;
    error = FwpmCalloutAdd(hEngine, &callout, nullptr, nullptr);
    if (error)
        break;
}
Each callout is added to the appropriate layer. For block/permit operations, the FWPM_LAYER_-
ALE_AUTH_CONNECT_V4/6 layers are the ones to use for TCP, and FWPM_LAYER_ALE_RESOURCE_-
ASSIGNMENT_V4/6 for UDP. The WFP documentation lists all the available layers with their meaning.
For each callout, a display name is mandatory, and so are the callout's unique key (GUID) and the
applicable layer. Only filters added to the same layer can use these callouts. The provider is set as well,
for easy identification.
Performing multiple operations against the engine can be done within the scope of a transaction
that adheres to the classic "ACID" properties (atomicity, consistency, isolation, and durability),
meaning that either all operations within the transaction succeed or none do. FwpmTransactionBegin
initiates a transaction and FwpmTransactionCommit commits it. FwpmTransactionAbort is available
if aborting the transaction is desired. If the engine is closed prematurely, any in-flight transaction is aborted.
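For example, here is a minimal sketch of wrapping two filter additions in a single transaction; hEngine is an already-opened engine handle, and the two filter objects are assumed to be initialized elsewhere:

DWORD AddTwoFiltersAtomically(HANDLE hEngine,
    const FWPM_FILTER* f1, const FWPM_FILTER* f2) {
    DWORD error = FwpmTransactionBegin(hEngine, 0);
    if (error != ERROR_SUCCESS)
        return error;

    error = FwpmFilterAdd(hEngine, f1, nullptr, nullptr);
    if (error == ERROR_SUCCESS)
        error = FwpmFilterAdd(hEngine, f2, nullptr, nullptr);

    if (error == ERROR_SUCCESS)
        error = FwpmTransactionCommit(hEngine);
    else
        FwpmTransactionAbort(hEngine);
    return error;
}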
Finally, the do/while block is exited and the engine is properly closed:

} while (false);

FwpmEngineClose(hEngine);
return error == ERROR_SUCCESS;
}
Back to main. The next thing to do is examine the command line arguments, and forward to the
correct function for handling:
// (dispatching of the "block", "permit" and "clear" commands to the
//  BlockProcess, PermitProcess and ClearAll helpers is not shown)
else {
printf("Unknown or bad command.\n");
return 1;
}
if (success)
printf("Operation completed successfully.\n");
else
printf("Error occurred: %u\n", GetLastError());
CloseHandle(hDevice);
return 0;
}
Let’s examine each in turn, starting with BlockProcess. Its purpose is to add a PID to the list of
blocked processes. First, it needs to add filters to the four layers if these have not been added before:
We need to add the filters just once, since they can serve any number of process IDs. This means it will
be easier to give these four filters known GUIDs that we can then reference as needed. The following
is set up at the top of BlockProcess.cpp:
// {C5C2DEC4-C0CD-4187-9BE9-C749ED53549D}
DEFINE_GUID(GUID_FILTER_V4, 0xc5c2dec4, 0xc0cd, 0x4187, 0x9b, 0xe9, 0xc7, 0x49,\
0xed, 0x53, 0x54, 0x9d);
// {9E99EFD3-8E9E-496B-8F6D-63A69D2E84A7}
DEFINE_GUID(GUID_FILTER_V6, 0x9e99efd3, 0x8e9e, 0x496b, 0x8f, 0x6d, 0x63, 0xa6,\
0x9d, 0x2e, 0x84, 0xa7);
// {EE870CB6-7D26-4580-A8F4-8CA7783A98F9}
DEFINE_GUID(GUID_FILTER_UDP_V4, 0xee870cb6, 0x7d26, 0x4580, 0xa8, 0xf4, 0x8c, 0\
xa7, 0x78, 0x3a, 0x98, 0xf9);
// {C8EB1629-B3C7-4A37-95F5-1DA3495EC8F5}
DEFINE_GUID(GUID_FILTER_UDP_V6, 0xc8eb1629, 0xb3c7, 0x4a37, 0x95, 0xf5, 0x1d, 0\
xa3, 0x49, 0x5e, 0xc8, 0xf5);
The alternative would be to let WFP assign GUIDs to added filters, but that would mean locating
them would be more difficult, as it would require enumerating all filters and looking at the callout
GUID they point to (if any), and/or identifying the provider.
The first step in AddFilters is checking if one was added before, and aborting if so:
bool AddFilters() {
HANDLE hEngine;
DWORD error = FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT,
nullptr, nullptr, &hEngine);
if (error)
return false;
do {
if (FWPM_FILTER* filter; FwpmFilterGetByKey(hEngine,
&GUID_FILTER_V4, &filter) == ERROR_SUCCESS) {
FwpmFreeMemory((void**)&filter);
break;
}
To add the filters, we open a transaction and call FwpmFilterAdd to add the four filters with their
associated layers:
for (auto& fi : filters) {   // filters: array of { filter GUID, layer, callout }
    FWPM_FILTER filter{};
    filter.filterKey = *fi.filter;
    WCHAR name[] = L"Block PID filter";
    filter.displayData.name = name;
    filter.providerKey = (GUID*)&WFP_PROVIDER_CHAPTER13;
    filter.weight.type = FWP_UINT8;
    filter.weight.uint8 = 8;
    filter.layerKey = *fi.layer;
    filter.action.type = FWP_ACTION_CALLOUT_UNKNOWN;
    filter.action.calloutKey = *fi.callout;
    FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);
}
error = FwpmTransactionCommit(hEngine);
} while (false);
For every filter, we set its unique key (filterKey member), a display name, our provider, a weight of
8 (“medium” weight), the layer GUID, and the action. The action bears some explanation.
The action has two parts - the type, and an optional callout key. The valid values for the type
member when adding filters are:
• FWP_ACTION_BLOCK - block traffic matching the filter's conditions.
• FWP_ACTION_PERMIT - permit traffic matching the filter's conditions.
• FWP_ACTION_CALLOUT_TERMINATING - invoke a callout that always returns a block or permit decision.
• FWP_ACTION_CALLOUT_INSPECTION - invoke a callout that never returns a block or permit decision (inspection only).
• FWP_ACTION_CALLOUT_UNKNOWN - invoke a callout that may or may not return a block or permit decision.
FWP_ACTION_BLOCK and FWP_ACTION_PERMIT only make sense if conditions are applied to the filter.
Otherwise, they will categorically deny or permit everything. We’ve seen an example of using FWP_-
ACTION_BLOCK with a condition that involves an application ID at the beginning of this chapter to
block a “calculator” application from accessing the network.
In our case, we use a callout, and since we only block if needed (and do nothing otherwise), the
FWP_ACTION_CALLOUT_UNKNOWN value is the safest to use.
After the filters are added (if needed), BlockProcess sends the request to the driver. Here is the full
function:
bool BlockProcess(HANDLE hDevice, ULONG pid) {
    if (!AddFilters())
        return false;

    DWORD ret;
    return DeviceIoControl(hDevice, IOCTL_PNF_BLOCK_PROCESS, &pid, sizeof(pid),
        nullptr, 0, &ret, nullptr);
}
Similarly, PermitProcess removes a PID from the list of blocked processes by contacting the driver:
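The code isn't reproduced here; a sketch, assuming it mirrors BlockProcess (receiving the device handle and PID), might be:

bool PermitProcess(HANDLE hDevice, ULONG pid) {
    DWORD ret;
    return DeviceIoControl(hDevice, IOCTL_PNF_PERMIT_PROCESS, &pid, sizeof(pid),
        nullptr, 0, &ret, nullptr);
}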
Finally, ClearAll deletes all filters and callouts, since they might not be needed anymore, and then
tells the driver to clear its list of blocked processes:
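Again, a sketch under the same assumptions:

bool ClearAll(HANDLE hDevice) {
    if (!DeleteFilters() || !DeleteCallouts())
        return false;

    DWORD ret;
    return DeviceIoControl(hDevice, IOCTL_PNF_CLEAR, nullptr, 0,
        nullptr, 0, &ret, nullptr);
}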
DeleteFilters and DeleteCallouts open a handle to the WFP engine and call the appropriate API
to delete a filter/callout by key:
bool DeleteFilters() {
HANDLE hEngine;
DWORD error = FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT,
nullptr, nullptr, &hEngine);
if (error)
return false;
FwpmFilterDeleteByKey(hEngine, &GUID_FILTER_V4);
FwpmFilterDeleteByKey(hEngine, &GUID_FILTER_V6);
FwpmFilterDeleteByKey(hEngine, &GUID_FILTER_UDP_V4);
FwpmFilterDeleteByKey(hEngine, &GUID_FILTER_UDP_V6);
FwpmEngineClose(hEngine);
return true;
}
bool DeleteCallouts() {
HANDLE hEngine;
DWORD error = FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT,
nullptr, nullptr, &hEngine);
if (error)
return false;
FwpmCalloutDeleteByKey(hEngine, &GUID_CALLOUT_PROCESS_BLOCK_V4);
FwpmCalloutDeleteByKey(hEngine, &GUID_CALLOUT_PROCESS_BLOCK_V6);
FwpmCalloutDeleteByKey(hEngine, &GUID_CALLOUT_PROCESS_BLOCK_UDP_V4);
FwpmCalloutDeleteByKey(hEngine, &GUID_CALLOUT_PROCESS_BLOCK_UDP_V6);
FwpmEngineClose(hEngine);
return true;
}
Testing
The driver is installed in the usual way using the sc.exe tool (running elevated), and then started:
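For example (the service name and binary path are whatever you chose when copying the driver to the target machine):

sc create pnfilter type= kernel binPath= C:\Test\ProcessNetFilter.sys
sc start pnfilter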
As an example, I ran calculator, but this time issued a block command based on its process id:
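For example (substituting Calculator's actual process ID):

blockprocess block <pid>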
And verified that calculator is unable to update its currency exchange rates. Opening WFP Explorer
and examining the callouts view shows the four added callouts (figure 13-18).
Similarly, we expect four filters to be added using these callouts (filters view in WFP Explorer, see
figure 13-19).
You can now use the permit option (blockprocess permit <pid>) to remove a process from being
blocked, or remove all filters, callouts, and blocked processes at once:

blockprocess clear
Debugging
The WFP Explorer tool proved to be very useful in debugging. It makes it easy to verify that the
correct callouts and filters are being added. Of course, you can write your own tools that are more
specific to the task at hand. The WFP management API is fairly intuitive to use and is documented
well enough. You may find the source code of WFP Explorer (https://github.com/zodiacon/WFPExplorer)
useful for your own work with the management API.
Summary
WFP is a powerful platform that provides lots of flexibility in filtering network requests. In this chapter,
we scratched the surface of WFP, but obviously there is a lot more, such as pending network operations,
examining actual packets, and even modifying packets. All these will have to wait for another book.
Chapter 14: Introduction to KMDF
The Kernel Mode Driver Framework (KMDF) was first available with Windows Vista, and later ported
to Windows XP and even Windows 2000. Its purpose is to provide a higher level of abstraction over
WDM for the purpose of building drivers for hardware devices.
Up until now, we have used WDM only for writing drivers. This is perfectly acceptable since our
drivers were not dealing with hardware devices. Using KMDF to write non-hardware drivers has
marginal benefits, and at least one disadvantage, as it adds a dependency to the driver with potentially
little value.
In this chapter, we’ll examine the fundamentals of KMDF, and see how we can create the Booster
driver from chapter 4 using KMDF. We will get some advantages when using KMDF, such as seeing
(and managing) our device in Device Manager.
In this chapter:
• Introduction to WDF
• Introduction to KMDF
• Object Creation
• The Booster KMDF Driver
• The INF File
• The User-Mode Client
• Installing and Testing
• Registering a Device Class
• Summary
Introduction to WDF
The Windows Driver Model (WDM) we have been using throughout this book was released with
Windows 2000 and Windows 98 (“Consumer Windows”) as a way to write source-compatible drivers
for these two platforms. Windows NT 4 and Windows 95 had different driver models making it more
difficult for hardware vendors to release drivers, as two separate drivers had to be written that did
not share any code.
With WDM, many types of kernel drivers for hardware devices could be written with a shared source
code, being compiled separately on Windows 2000 and Windows 98. This mostly worked well and
made it easier for hardware vendors to build kernel drivers for their hardware. The same process
applied to subsequent operating systems, namely Windows XP and Windows ME.
Obviously, today the Consumer Windows line of operating systems is no more, so the source code
compatibility provided by WDM is no longer a true advantage. With time, some deficiencies of WDM
became apparent. The most important one was the lack of built-in support for handling Plug & Play and
Power Management IRPs properly. Most WDM drivers would copy such code from existing Microsoft
samples that were close to what they needed, adjusting the code to the specifics of their hardware. In
some cases, this "boilerplate" Plug & Play and Power code makes up about 50% of the entire driver's
source code.
Microsoft realized that WDM is too low-level for hardware-based drivers, so it came up with the
Windows Driver Frameworks (WDF), formerly known as the Windows Driver Foundation, as a solution to
these issues. WDF's first version was released in 2006, coinciding with the release of Windows Vista.
WDF consists of two parts:
• KMDF - the replacement of WDM; it’s a library that layers on top of WDM; WDM is still the
fundamental kernel driver model in Windows.
• UMDF - the User Mode Driver Framework, which allows certain types of drivers to be written
in user mode.
UMDF is not in the scope of this book, as it's about writing drivers in user mode, contrary to what this
book is about. See the sidebar for more on UMDF.
UMDF
UMDF allows writing drivers for relatively slow hardware devices, such as USB, in user-mode.
Writing drivers in user mode has several advantages:
• No system crash can ever happen, meaning the robustness of the system is maintained.
• Testing and debugging is easier, and can be done on the same machine.
A UMDF driver is a normal user-mode DLL, hosted by a system-provided host process (WUDFHost.exe).
If the DLL causes an exception to occur, the host process may crash, but the system remains intact.
The driver can then be reloaded into a new host instance.
UMDF has two fundamental versions:
• Versions 1.x are based on the Component Object Model (COM), requiring the driver to
implement various interfaces, while also getting implemented framework interfaces.
• Versions 2.x, supported from Windows 8.1 only, use the same APIs as KMDF, so that moving
between KMDF and UMDF (in both directions) is much easier.
Does using UMDF imply that it’s possible to access kernel APIs from user-mode? No. The UMDF
APIs communicate with a Reflector driver that sits in kernel mode, provided by Microsoft, which
is the “go to guy” of the UMDF driver, for performing operations in kernel-mode; the fundamental
rules cannot be broken.
UMDF is suitable for slow devices but is not good enough for devices that require handling of
interrupts or other high-performance requirements, such as devices for PCI Express. Such drivers
must be written as kernel mode drivers.
Introduction to KMDF
KMDF is a library, a layer on top of WDM. Every KMDF driver starts its life as a WDM driver.
“Transforming” the driver into KMDF happens when a KMDF driver object is created in DriverEntry.
Some of the benefits of KMDF include:
• Boilerplate Plug & Play and Power Management implemented within the framework.
• Consistent object model based on properties, methods and events (Callbacks).
• APIs have consistent naming conventions.
• Object hierarchy support and lifetime management using reference counting.
• Major versions of the framework can run side-by-side.
The KMDF header file to include is wdf.h, which should follow <ntddk.h> or <ntifs.h>, as it depends
on their definitions.
KMDF Objects
Objects are the basis of KMDF. Although the APIs are C-based, their management and naming
is object based. Example objects include driver, devices, queues and requests. Some object types
correspond directly to their underlying WDM object (such as devices and requests), but others are
new, providing a higher level of abstraction over some functionality. Each object is accessed via its
API, while the object itself is provided as a “handle”, rather than a true pointer to a structure.
KMDF is implemented with C++, so each “handle” does correspond to a C++ object.
Objects have properties, methods, and events, with the following conventions:
• Properties - replace direct field access. Function names include Get or Set as part of the name
(for properties that cannot fail), or Assign/Retrieve for properties that may fail. The format
for property APIs is Wdf<ObjectType>Set/Get/Assign/Retrieve<Desc>.
• Methods - perform operations on objects. Naturally, these can have return values. Methods
have the format Wdf<ObjectType><Operation>.
• Events - can be registered by the driver, providing a callback to handle some scenario. Event
names have the format Evt<ObjectType><Event>
Figure 14-1 shows the KMDF object hierarchy with the “handle” names for the various object types
supported. We’ll use some of these object types later on, when we write a KMDF-equivalent driver to
the Booster driver from chapter 4.
KMDF objects are reference counted. Normally, the driver writer does not have to explicitly manage
that lifetime, as a parent object will “release” its child objects when the parent is destroyed. Since all
objects are somewhere in a hierarchy, manual referencing or dereferencing is not needed. There are
cases, however, where a driver may wish to extend the lifetime of an object. For example, the driver
might wish to log some information related to a KMDF object asynchronously using a work item. For
this purpose, KMDF provides two generic lifetime-management APIs: WdfObjectReference and
WdfObjectDereference, which increment and decrement an object's reference count, respectively.
Every KMDF object supports two events related to its lifetime: EvtObjectCleanup and EvtObjectDestroy.
The EvtObjectDestroy callback is invoked just before the object is destroyed - its reference count
is zero. EvtObjectCleanup is raised earlier, when the object is in the process of being deleted, but
there still might be outstanding references to it. The object should release any references it holds to
other objects. The primary use case of this event is to break circular references, which is the primary
concern in any reference-counting system.
Here are the main object types we'll use in this chapter:
• WDFDRIVER - represents the driver. It's a wrapper over the WDM DRIVER_OBJECT object
provided in DriverEntry. Creating a WDFDRIVER "transforms" the driver into a KMDF driver.
• WDFDEVICE - represents a device (logical or physical). It’s a wrapper around a WDM DEVICE_-
OBJECT.
• WDFQUEUE - represents a queue of requests. There is no WDM equivalent to this object type, as
its purpose is to allow handling IRPs in a driver-selected way. It supports three types of queues:
sequential, parallel, and manual.
• WDFREQUEST - represents a request. It’s a wrapper over a WDM IRP.
Object Creation
KMDF makes it relatively easy to create objects, as it follows a consistent pattern for all object types.
Here is the WdfDriverCreate API as an example:
NTSTATUS WdfDriverCreate(
_In_ PDRIVER_OBJECT DriverObject,
_In_ PCUNICODE_STRING RegistryPath,
_In_opt_ PWDF_OBJECT_ATTRIBUTES DriverAttributes,
_In_ PWDF_DRIVER_CONFIG DriverConfig,
_Out_opt_ WDFDRIVER* Driver);
The function starts with mandatory parameters (DriverObject and RegistryPath in this case),
followed by two data structures. The first (WDF_OBJECT_ATTRIBUTES) is generic and appears in every
“Create” KMDF API. The second is a specific structure for further customization (WDF_DRIVER_CONFIG
in this case).
The generic structure pointer can be NULL, providing “default” behavior. Here is its declaration (with
the source-provided comments intact):
typedef struct _WDF_OBJECT_ATTRIBUTES {
//
// Size in bytes of this structure
//
ULONG Size;
//
// Function to call when the object is deleted (cleanup)
//
PFN_WDF_OBJECT_CONTEXT_CLEANUP EvtCleanupCallback;
//
// Function to call when the object's memory is destroyed (last reference released)
//
PFN_WDF_OBJECT_CONTEXT_DESTROY EvtDestroyCallback;
//
// Execution level constraints for Object
//
WDF_EXECUTION_LEVEL ExecutionLevel;
//
// Synchronization level constraint for Object
//
WDF_SYNCHRONIZATION_SCOPE SynchronizationScope;
//
// Optional Parent Object
//
WDFOBJECT ParentObject;
//
// Overrides the size of the context allocated as specified by
// ContextTypeInfo->ContextSize
//
size_t ContextSizeOverride;
//
// Pointer to the type information to be associated with the object
//
PCWDF_OBJECT_CONTEXT_TYPE_INFO ContextTypeInfo;
} WDF_OBJECT_ATTRIBUTES, *PWDF_OBJECT_ATTRIBUTES;
Most of the members are self-explanatory, but not all. When using it, it’s recommended to start with
a sensible instance - this is what the WDF_OBJECT_ATTRIBUTES_INIT inline function is for:
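Here is approximately its definition (from wdfobject.h):

VOID FORCEINLINE WDF_OBJECT_ATTRIBUTES_INIT(PWDF_OBJECT_ATTRIBUTES Attributes) {
    RtlZeroMemory(Attributes, sizeof(WDF_OBJECT_ATTRIBUTES));
    Attributes->Size = sizeof(WDF_OBJECT_ATTRIBUTES);
    Attributes->ExecutionLevel = WdfExecutionLevelInheritFromParent;
    Attributes->SynchronizationScope = WdfSynchronizationScopeInheritFromParent;
}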
Its source is provided directly - it sets the Size member to the sizeof the structure, zeroes out every-
thing, and then sets two members to specific values: ExecutionLevel to WdfExecutionLevelInheritFromParent
and SynchronizationScope to WdfSynchronizationScopeInheritFromParent, both of which can
be considered “default”. These enumerations define the value zero to be invalid.
Using WDF_OBJECT_ATTRIBUTES_INIT is not needed if nothing needs to change - passing WDF_NO_-
OBJECT_ATTRIBUTES (defined as NULL) for this pointer to the creation function is sufficient. Notice
the EvtCleanupCallback and EvtDestroyCallback discussed earlier; this is where you would set
these if needed.
Back to WdfDriverCreate - the second structure is a more specific one, where there is always a
helper macro to initialize it - WDF_DRIVER_CONFIG_INIT in this case. It takes the "config" structure
and any required parameters. After initialization, you can change other members in the structure. The
WDF_DRIVER_CONFIG structure is defined like so:
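Approximately (from wdfdriver.h):

typedef struct _WDF_DRIVER_CONFIG {
    ULONG Size;
    PFN_WDF_DRIVER_DEVICE_ADD EvtDriverDeviceAdd;
    PFN_WDF_DRIVER_UNLOAD EvtDriverUnload;
    ULONG DriverInitFlags;
    ULONG DriverPoolTag;
} WDF_DRIVER_CONFIG, *PWDF_DRIVER_CONFIG;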
The fact that only the driver’s “add device” handler is required in WDF_DRIVER_CONFIG_INIT
(discussed later) indicates that the other members have sensible defaults.
The final parameter to a creation function is the resulting object handle. In the case of WdfDriverCreate
it’s a WDFDRIVER* where the result should land. This last parameter is optional - specifying NULL, or
more elegantly WDF_NO_HANDLE indicates the caller is not interested in the resulting handle. This is
typical for cases where the handle can later be retrieved independently. We’ll see both cases later on.
Once the “pattern” of creation functions is understood, it makes it relatively easy to use any creation
function. Here is another example, to solidify the pattern:
NTSTATUS WdfIoQueueCreate(
_In_ WDFDEVICE Device,
_In_ PWDF_IO_QUEUE_CONFIG Config,
_In_opt_ PWDF_OBJECT_ATTRIBUTES QueueAttributes,
_Out_opt_ WDFQUEUE* Queue);
WdfIoQueueCreate is used for creating queues - we'll see a concrete example later. It has the ingredi-
ents discussed before: required parameters (Device), a specific structure (WDF_IO_QUEUE_CONFIG) ini-
tialized with WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE or WDF_IO_QUEUE_CONFIG_INIT (some
extra flexibility here), then the generic object attributes structure (QueueAttributes), with the
final parameter being the returned queue handle. The two structures seem to be in reverse order
compared to WdfDriverCreate - there is no good reason that I could find for this discrepancy.
Context Memory
When creating a device object in WDM, a device extension size can be specified with the second
argument to IoCreateDevice. If the value is non-zero, the kernel will allocate the additional bytes
at the end of the DEVICE_OBJECT structure and point the DeviceExtension member to the beginning
of that block.
KMDF extends this idea by allowing any KMDF object to be associated with a driver-specific memory
block (context). This makes it easy to track any required state along with an associated object. The first
step in allocating some context memory associated with a KMDF object is to define the structure of
that extra memory. For example:
struct MyDeviceContext {
// members
};
Then, use a macro provided by KMDF that conveniently creates a function for accessing the memory:
WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(MyDeviceContext, DeviceGetContext)
The macro creates the function DeviceGetContext that can be used to retrieve a pointer to the
data (MyDeviceContext) after allocation. To make the actual allocation, the context size must be
specified within the generic WDF_OBJECT_ATTRIBUTES structure. A convenient macro can initialize
such structure before creating the actual object. Here is an example assuming a device object:
WDF_OBJECT_ATTRIBUTES devAttr;
WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&devAttr, MyDeviceContext);
status = WdfDeviceCreate(&DeviceInit, &devAttr, &device);
if(NT_SUCCESS(status)) {
MyDeviceContext* context = DeviceGetContext(device);
// use context
}
The Booster KMDF Driver
We'll create a new project for the driver, this time based on the "Kernel Mode Driver, Empty (KMDF)"
project template, named Booster. The project created is not truly empty - it has an INF file present.
In the WDM case, we used to delete it. This time we'll keep it, as it's required to get some of the
niceties, such as getting our device listed in Device Manager. Technically, we could have done that
with WDM as well.
We’ll examine the INF file later. For now, let’s proceed with the main parts of the driver’s code.
Driver Initialization
We’ll add a standard C++ file to the project named Booster.cpp, and write the standard DriverEntry
prototype. The first thing to do in a KMDF driver is “transform” it to such from WDM. This is done
by creating the root KMDF driver object, wrapping the WDM-provided DRIVER_OBJECT:
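A minimal sketch of what this DriverEntry might look like (the callback name BoosterDeviceAdd is taken from the discussion below; the exact listing isn't reproduced here):

extern "C" NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    WDF_DRIVER_CONFIG config;
    WDF_DRIVER_CONFIG_INIT(&config, BoosterDeviceAdd);

    return WdfDriverCreate(DriverObject, RegistryPath,
        WDF_NO_OBJECT_ATTRIBUTES, &config, WDF_NO_HANDLE);
}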
WdfDriverCreate accepts the driver object and Registry path passed to DriverEntry, along with the
"config" and "attributes" structures, as discussed in the creation "pattern" earlier in this chapter.
Compared to a WDM driver, DriverEntry seems lacking - two crucial pieces are missing: device
creation and symbolic link creation. Instead, WDF_DRIVER_CONFIG_INIT is used to initialize the
“config” structure with callback function named BoosterDeviceAdd. This callback is called every
time a device of this driver is “detected” in the system.
No Unload routine has been set up, as unloading is handled by KMDF automatically.
Since our driver is not handling any hardware device, true Plug & Play cannot detect it. Instead, the
INF file indicates (as we’ll see later when we take a closer look at it) that whenever the driver is loaded,
it should be treated as if its first (and only) device is “discovered”, and so the AddDevice callback must
be invoked (BoosterAddDevice in this case). This is where we’ll create the device object and symbolic
link.
The AddDevice callback is where all the magic happens. We need to do three things in that callback:
create the device object, register a device interface (which gives us the symbolic link), and create a
queue to handle requests.
Let's see what each item entails. First, creating a device object:
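A sketch of the beginning of the callback, assuming it's named BoosterDeviceAdd (as passed to WDF_DRIVER_CONFIG_INIT above):

NTSTATUS BoosterDeviceAdd(WDFDRIVER Driver, PWDFDEVICE_INIT DeviceInit) {
    UNREFERENCED_PARAMETER(Driver);

    //
    // create the device object (no name needed, as explained below)
    //
    WDFDEVICE device;
    auto status = WdfDeviceCreate(&DeviceInit, WDF_NO_OBJECT_ATTRIBUTES, &device);
    if (!NT_SUCCESS(status))
        return status;

    //
    // device interface and queue creation follow (discussed next)
    //
    return status;
}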
The AddDevice callback receives the driver object handle and a helper structure, WDFDEVICE_INIT,
which is not publicly defined, but there are APIs to manage its contents. In our case, we don't need
to do anything with it. WdfDeviceCreate accepts a pointer to the pointer, which means it can replace
it with a new object. The returned WDFDEVICE is a handle to the newly created device object.
What is missing compared to WDM? We used to pass a device name to IoCreateDevice, but no such
name is provided in the above call. The reason will become clear with the next initialization - the
symbolic link.
With the drivers we've written so far (not including filters), we provided a device name and an
explicit symbolic link. In the hardware space (what KMDF was built for), that is unlikely to be a good
idea. For example, suppose we're writing a driver for a printer device. What should the device name
be? What should the symbolic link name be? "Printer1"? "MyPrinter"?
Using arbitrary strings has several drawbacks:
• The name might clash with a name chosen by another driver - there is no enforcement of uniqueness.
• Clients have to know the name in advance and hard-code it.
• There is no standard way for a client to discover whether such a device even exists on the system.
All the above issues are mostly applicable to hardware-based devices. Our Booster device is going
to be a singleton in the system (no other Booster devices can be connected), so perhaps these
concerns are irrelevant. But we will treat our Booster device similarly to a hardware device in this
sense, to show the flexibility we get if we adhere to that model.
What is that model? How can we solve the above issues? The I/O system provides the idea of Device
Interfaces. A device interface is identified with a GUID, but from a conceptual perspective it's best to
think of these just like interfaces in object-oriented code.
An interface is an abstraction that defines some kind of expected behavior, where multiple imple-
mentations of that behavior are possible. The way to solve the above issues is to register the device
as "implementing" one (or more) interfaces. In the case of printers, and many other "standard" devices,
Microsoft has already defined those device interfaces with well-known (and documented) GUIDs.
A printer driver can say “register my device as a printer”. If a driver is for a multifunction device, like
a printer/scanner/fax set of devices which are part of the same hardware, then such a driver needs
to register itself as “implementing” three interfaces - printer, scanner and fax. Each such registration
creates a unique, repeatable, symbolic link, which is what we need.
With KMDF, the call to make (for each supported interface) is to WdfDeviceCreateDeviceInterface:
NTSTATUS WdfDeviceCreateDeviceInterface(
_In_ WDFDEVICE Device,
_In_ CONST GUID* InterfaceClassGUID,
_In_opt_ PCUNICODE_STRING ReferenceString);
The API requires the device object and the GUID of the interface to register. The optional
ReferenceString, if provided, is appended to the symbolic link name that the I/O system generates,
which is useful when a driver exposes multiple "instances" of the same interface; we don't need it.
The driver itself has little use for the symbolic link - it's the client that needs it. How can a client get
the symbolic link? It will have to use certain user-mode APIs to "locate" a device that implements the
Booster "interface". We'll see those later when we write a user-mode client.
Since our Booster device is unique, there is no predefined device interface we can use. Instead, we’ll
generate a GUID and consider that the Booster’s device interface. Think of it as “what does it mean
to be a booster device?”. We’ll add that GUID to the header file shared with user-mode clients, as it’s
needed in order to locate the device.
We’ll add a BoosterCommon header file to the project, which has the same pieces as the one from
earlier versions - the supported control code and the ThreadData structure. Additionally, it will have
our generated GUID:
#include <initguid.h>

#define BOOSTER_DEVICE 0x8000

#define IOCTL_BOOSTER_SET_PRIORITY \
CTL_CODE(BOOSTER_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
struct ThreadData {
ULONG ThreadId;
int Priority;
};
// {49BDF7E8-8AD1-4852-9FB6-833279A1545F}
DEFINE_GUID(GUID_Booster, 0x49bdf7e8, 0x8ad1, 0x4852, \
0x9f, 0xb6, 0x83, 0x32, 0x79, 0xa1, 0x54, 0x5f);
The GUID was generated with the Create GUID tool shown in figure 13-15.
Back to BoosterAddDevice - here is the call to WdfDeviceCreateDeviceInterface:
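Roughly (continuing inside the AddDevice callback, using the device handle created above):

status = WdfDeviceCreateDeviceInterface(device, &GUID_Booster, nullptr);
if (!NT_SUCCESS(status))
    return status;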
The next step in the AddDevice callback is to create a request queue. A queue is an abstraction provided
by KMDF for handling requests (IRPs). When a request comes in, such as IRP_MJ_CREATE, IRP_MJ_-
READ or IRP_MJ_WRITE, KMDF takes control of the request. Internally, there are three "packages" used
by KMDF for request processing:
• I/O Package - handles “standard” requests like Create, Read and Device I/O Control
• P&P/Power package - handles IRP_MJ_PNP (Plug & Play) and IRP_MJ_POWER (Power Manage-
ment) requests
• WMI package - handles Windows Management Instrumentation (WMI) requests
Figure 14-2 shows the way these packages are logically connected internally and to request queues.
Since the booster device is not Plug & Play, and doesn't support WMI, we only need to concern
ourselves with "standard" requests. At least one queue is required to handle such requests. Three
possible queue dispatch types are provided: sequential (one request delivered at a time), parallel
(requests delivered as they arrive), and manual (the driver pulls requests out of the queue itself).
Since the booster driver holds no state, there is no particular limit to the number of requests that
can be handled concurrently - a parallel queue is the way to go. If there was some state, we could
use a sequential queue, which would make it easier to handle requests without manually adding
synchronization, at the possible expense of lower performance, since requests would be handled in a
classic First-In-First-Out (FIFO) manner.
To create a queue, we need to initialize its configuration, which mostly means which requests should
be handled by that queue, and then call WdfIoQueueCreate:
WDF_IO_QUEUE_CONFIG config;
WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&config, WdfIoQueueDispatchParallel);
config.EvtIoDeviceControl = BoosterDeviceControl;
WDFQUEUE queue;
status = WdfIoQueueCreate(device, &config, WDF_NO_OBJECT_ATTRIBUTES, &queue);
For reference, the WDF_IO_QUEUE_CONFIG structure is defined roughly like so:

typedef struct _WDF_IO_QUEUE_CONFIG {
ULONG Size;
WDF_IO_QUEUE_DISPATCH_TYPE DispatchType;
WDF_TRI_STATE PowerManaged;
//... (the EvtIoDefault, EvtIoRead, EvtIoWrite, EvtIoDeviceControl,
// EvtIoInternalDeviceControl and EvtIoStop callbacks are omitted here)
PFN_WDF_IO_QUEUE_IO_RESUME EvtIoResume;
PFN_WDF_IO_QUEUE_IO_CANCELED_ON_QUEUE EvtIoCanceledOnQueue;
union {
struct {
ULONG NumberOfPresentedRequests;
} Parallel;
} Settings;
WDFDRIVER Driver;
} WDF_IO_QUEUE_CONFIG, *PWDF_IO_QUEUE_CONFIG;
You can see the EvtIo* callbacks for various requests and notifications, including EvtIoDefault
which is a “catch all” handler for other requests not specified elsewhere.
You may be wondering about the IRP_MJ_CREATE and IRP_MJ_CLOSE handlers. These are handled
automatically by the framework (in addition to IRP_MJ_CLEANUP): by default, Create, Close, and
Cleanup requests are simply completed successfully.
Customizing handlers for Create, Close, and Cleanup is possible with event
callbacks that can be applied on the DeviceInit structure using a WDFFILEOBJECT
object. See the documentation for WDF_FILEOBJECT_CONFIG_INIT and
WdfDeviceInitSetFileObjectConfig.
WDF_IO_QUEUE_CONFIG config;
WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&config, WdfIoQueueDispatchParallel);
config.EvtIoDeviceControl = BoosterDeviceControl;
WDFQUEUE queue;
status = WdfIoQueueCreate(device, &config, WDF_NO_OBJECT_ATTRIBUTES,
&queue);
return status;
}
The device I/O control callback has the following prototype (EVT_WDF_IO_QUEUE_IO_DEVICE_CONTROL
is the function type; ours is named BoosterDeviceControl):

VOID EVT_WDF_IO_QUEUE_IO_DEVICE_CONTROL(
_In_ WDFQUEUE Queue,
_In_ WDFREQUEST Request,
_In_ size_t OutputBufferLength,
_In_ size_t InputBufferLength,
_In_ ULONG IoControlCode);
As you can see, the function already provides most of what we need in order to process the request.
There is no need to dig into the I/O stack location, as in WDM. The needed information is handed to
us on a silver platter, so to speak.
We’ll start the implementation by examining the given control code:
switch (IoControlCode) {
case IOCTL_BOOSTER_SET_PRIORITY:
You may be wondering why the code uses UNREFERENCED_PARAMETER on the input buffer length.
Shouldn’t we be checking that as part of processing? As it turns out, even that is not strictly necessary
for our case. Here are the next lines of code:
ThreadData* data;
status = WdfRequestRetrieveInputBuffer(Request, sizeof(ThreadData),
(PVOID*)&data, nullptr);
if (!NT_SUCCESS(status))
break;
WdfRequestRetrieveInputBuffer accepts the request object, the minimum size of the input buffer
(sizeof(ThreadData)), the resulting pointer, and an optional variable to receive the actual input
buffer size. If the buffer is too small, WdfRequestRetrieveInputBuffer returns an appropriate status.
All we need to do is bail out if we get a failure status.
The next part of the handler is identical to the WDM case. This is what makes this driver unique:
PKTHREAD thread;
status = PsLookupThreadByThreadId(UlongToHandle(data->ThreadId), &thread);
if (!NT_SUCCESS(status))
break;
KeSetPriorityThread(thread, data->Priority);
ObDereferenceObject(thread);
info = sizeof(ThreadData);
break;
All that’s left to do is complete the request, for which KMDF has a bunch of APIs with different
completion details, such as the Information and the priority boost. For Booster, the following is
what is needed:
}
WdfRequestCompleteWithInformation(Request, status, info);
Here is the full device I/O control handler for easy reference:
void BoosterDeviceControl(WDFQUEUE Queue, WDFREQUEST Request,
    size_t OutputBufferLength, size_t InputBufferLength, ULONG IoControlCode) {
    UNREFERENCED_PARAMETER(Queue);
    UNREFERENCED_PARAMETER(OutputBufferLength);
    UNREFERENCED_PARAMETER(InputBufferLength);

    auto status = STATUS_INVALID_DEVICE_REQUEST;
    ULONG_PTR info = 0;

    switch (IoControlCode) {
case IOCTL_BOOSTER_SET_PRIORITY:
ThreadData* data;
status = WdfRequestRetrieveInputBuffer(Request, sizeof(ThreadData),
(PVOID*)&data, nullptr);
if (!NT_SUCCESS(status))
break;
PKTHREAD thread;
status = PsLookupThreadByThreadId(
UlongToHandle(data->ThreadId), &thread);
if (!NT_SUCCESS(status))
break;
KeSetPriorityThread(thread, data->Priority);
ObDereferenceObject(thread);
info = sizeof(ThreadData);
break;
}
WdfRequestCompleteWithInformation(Request, status, info);
}
The INF File
INF files use the age-old INI file format: section names in square brackets, with key=value entries
under each section. It seems more appropriate to use a naturally hierarchical format, such as XML or
JSON, but when INF was invented, neither XML nor JSON existed. I would have expected Microsoft
to adopt XML or JSON at some point, but this hasn't happened at the time of this writing, and it's
unlikely to happen in the future.
The Version section is mandatory in an INF file. The following is generated by the WDK project wizard
(for the empty KMDF project type) for the Booster project, also showing the provided comments
(anything after a semicolon is considered a comment until the end of the line):
[Version]
Signature="$WINDOWS NT$"
Class=System ; TODO: specify appropriate Class
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318} ; TODO: specify appropriate Cl\
assGuid
Provider=%ManufacturerName%
CatalogFile=Booster.cat
DriverVer= ; TODO: set DriverVer in stampinf property pages
PnpLockdown=1
The Signature directive must be set to the magic string "$Windows NT$". The reason for this name is
historical, and not important for this discussion.
The Class and ClassGuid directives are mandatory and specify the class (type or group) to which this
driver belongs. The generated INF contains an example class, System, which is a predefined class
defined by Microsoft long ago, with its associated GUID.
The “TODO” comments indicate that we should probably change that to an “appropriate” class. What
is appropriate here? If the devices the driver manages are one of the predefined types (such as printer,
disk, display, etc.), then that one should be used. These predefined device classes are listed in the WDK
docs. For the Booster driver, it’s more appropriate to generate our own Booster “category” (class),
by generating another GUID. For the current driver, we’ll stick with the default System class. We’ll
generate our own later in this chapter.
The Class is mostly useful for hardware-based drivers, as some functionality can be specified based
on the driver’s class, such as loading certain filters. The list of all classes and their properties can be
found in the Registry under HKLM\System\CurrentControlSet\Control\Class. Each class is uniquely
identified by a GUID; the string name is just a human-readable helper. Figure 14-3 shows the System
class entry in the Registry.
Back to the Version section in the INF - the Provider directive is the name of the driver publisher.
It doesn’t mean much in practical terms, but might appear in some UI, so should be something
meaningful. The value set by the WDK template is %ManufacturerName%. Anything within percent
symbols is treated like a “macro” - to be replaced by the actual value specified in another section
called Strings. Here is part of this section (traditionally the last section in the file):
[Strings]
SPSVCINST_ASSOCSERVICE= 0x00000002
ManufacturerName="Pavel Yosifovich"
DiskName = "Booster Installation Disk"
Booster.DeviceDesc = "Booster Device"
Booster.SVCDESC = "Booster Service"
As you can see, I have replaced the ManufacturerName with my name, and removed the original
“TODO” set by the project template.
[Manufacturer]
%ManufacturerName%=Standard,NT$ARCH$
[Standard.NT$ARCH$]
%Booster.DeviceDesc%=Booster_Device, Root\Booster ; TODO: edit hw-id
The Manufacturer section is mandatory, where the device installation sections must be listed.
Typically there is just one, but technically an INF can install drivers for multiple devices. The string
“Standard” forms a name for a section augmented with “NT$ARCH$” where “$ARCH$” is expanded
to the platform name, such as “AMD64”. This makes it easy to add sections that target specific
architectures, if desired.
The pointed-to section, “Standard.NT$ARCH$” has directives pointing to specific device installation
instructions (just one in this case). The left part (“%Booster.DeviceDesc%”) is shown in case the Plug
& Play manager needs to show some User Interface with a description of the device, but otherwise
is not important. The value after the equals sign is comprised of at least two parts. The first is a
section name (“Booster_Device” in this case), where installation instructions continue. The second is
the unique device ID for this device. The format is generally Enumerator\ID, where Enumerator is
a type of bus in the hardware case (e.g. PCI), or a virtual bus, as it is in our case - the Root bus can
be used to force a device to load always, which is what we want since the Booster device is not a
hardware one.
The “TODO” comment indicates this can be changed if desired. We’ll keep the default since it’s
basically what we need.
Device Installation
The base name “Booster_Device” is used in multiple sections, all working towards installing the driver
with the correct settings. Here are the relevant sections:
[Booster_Device.NT]
CopyFiles=Drivers_Dir
[Drivers_Dir]
Booster.sys
Booster_Device.NT (applies for any architecture) has a CopyFiles directive, pointing to “Drivers_Dir”
section that lists the files to copy (Booster.sys only in this case).
The Booster_Device.NT.Services section serves the same purpose as the CreateService API (or the sc.exe tool
we have been using). You can see the service information listed, including DisplayName, ServiceType,
StartType, and ErrorControl. ServiceBinary sets the ImagePath value in the Registry, pointing to
“%12%\Booster.sys”. This weird “%12%” value represents the %SystemRoot%\System32\Drivers direc-
tory. Table 14-1 shows some common directory names encoded with a number enclosed by percent
symbols.
Number  Directory
01      The directory from which the INF file is being installed
10      The Windows directory (same as %SystemRoot%)
11      The System directory (%SystemRoot%\System32)
12      The Drivers directory (%SystemRoot%\System32\Drivers)
17      The INF directory (%SystemRoot%\Inf)
20      The Fonts directory
24      Root directory of the system disk (e.g. C:\)
-1      Absolute path
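For reference, the service-related sections described above typically look like this in the template-generated INF (a sketch based on the standard KMDF template; the exact section and value names may vary slightly):
[Booster_Device.NT.Services]
AddService = Booster,%SPSVCINST_ASSOCSERVICE%,Booster_Service_Inst
[Booster_Service_Inst]
DisplayName   = %Booster.SVCDESC%
ServiceType   = 1   ; SERVICE_KERNEL_DRIVER
StartType     = 3   ; SERVICE_DEMAND_START
ErrorControl  = 1   ; SERVICE_ERROR_NORMAL
ServiceBinary = %12%\Booster.sys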
There are a few more sections starting with “Booster_Device”, listed under a comment that reads
“CoInstaller Installation”. A co-installer is a generic name for any additional installation that may be
required besides the driver-specific files. In this case, it’s properly installing KMDF. These sections
are boilerplate, and there is no need to touch them.
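The test client (Boost) is almost identical to the classic clients used earlier in the book. Here is a sketch of how its main function might begin (the exact checks and messages are assumptions), leading into the listing that follows:
// a sketch; assumes a Unicode build so CreateFile accepts the wide symbolic link
int main(int argc, const char* argv[]) {
if (argc < 3) {
printf("Usage: boost <threadid> <priority>\n");
return 0;
}
auto symLink = FindBoosterDevice();
if (symLink.empty()) {
printf("Booster device not found\n");
return 1;
}
HANDLE hDevice = CreateFile(symLink.c_str(), GENERIC_WRITE, 0, nullptr,
OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE) {
printf("Error: %u\n", GetLastError());
return 1;
}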
ThreadData data;
data.ThreadId = atoi(argv[1]);
data.Priority = atoi(argv[2]);
DWORD bytes;
if (DeviceIoControl(hDevice, IOCTL_BOOSTER_SET_PRIORITY,
&data, sizeof(data), nullptr, 0, &bytes, nullptr))
printf("Success!\n");
else
printf("Error: %u\n", GetLastError());
CloseHandle(hDevice);
return 0;
}
As you can see from the above code, the only change compared to a “classic” client is the way the
symbolic link is obtained, by calling a helper function, FindBoosterDevice. If such a device is found,
its symbolic link is returned as a std::wstring, which is just handed over to CreateFile like always.
Clearly, that function is the mystery.
We’ll start by adding the required includes:
#include <Windows.h>
#include <string>
#include <stdio.h>
#include "..\Booster\BoosterCommon.h"
#include <SetupAPI.h>
All the above includes should be familiar, except <setupapi.h>. This is where we’ll find the API used
to search for devices based on some criteria. Next, we’ll need to add its import library, since it’s
implemented in a separate DLL that is not referenced by default:
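// one way to reference the import library directly from source
#pragma comment(lib, "setupapi")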
Now we can start implementing the FindBoosterDevice function. Remember that the driver has
registered itself with the GUID_Booster device interface, which is also provided in the common header
file. We need to search for devices that “implement” that device interface:
std::wstring FindBoosterDevice() {
HDEVINFO hDevInfo = SetupDiGetClassDevs(&GUID_Booster, nullptr, nullptr,
DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
if (!hDevInfo)
return L"";
The SetupDiGetClassDevs API opens a handle to a “device information set” based on the supplied
arguments. Here we specify GUID_Booster to home in on this GUID only, and tell the API to search
for existing devices only (DIGCF_PRESENT, without which the search would be extended to devices
that are installed but not currently loaded). The second flag (DIGCF_DEVICEINTERFACE) indicates
the API should interpret GUID_Booster as a device interface, rather than a device class (which we'll
see later). Our device class is System, so looking for that would return too many results.
The next step is to enumerate the resulting list of devices (if any), where we expect to find one
device or no device at all (if the Booster driver has not been loaded). The enumeration is done with
SetupDiEnumDeviceInfo like so:
std::wstring result;
do {
SP_DEVINFO_DATA data{ sizeof(data) };
if (!SetupDiEnumDeviceInfo(hDevInfo, 0, &data))
break;
The zero indicates the first item in the device set. We could enumerate more by incrementing the index
until the call fails. Assuming a single Booster device is installed, zero is all we need. Once successful,
we can proceed to locate the symbolic link from the first (and only) device interface supported for the
Booster device:
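// a sketch of the call implied here; idata is used by the next snippet
SP_DEVICE_INTERFACE_DATA idata{ sizeof(idata) };
if (!SetupDiEnumDeviceInterfaces(hDevInfo, &data, &GUID_Booster, 0, &idata))
break;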
This retrieves the first device interface. Now we need the symbolic link:
BYTE buffer[1024];
auto detail = (PSP_DEVICE_INTERFACE_DETAIL_DATA)buffer;
detail->cbSize = sizeof(*detail);
if (SetupDiGetDeviceInterfaceDetail(hDevInfo, &idata, detail,
sizeof(buffer), nullptr, &data))
result = detail->DevicePath;
} while (false);
SetupDiDestroyDeviceInfoList(hDevInfo);
return result;
}
Here is the complete FindBoosterDevice function:
std::wstring FindBoosterDevice() {
HDEVINFO hDevInfo = SetupDiGetClassDevs(&GUID_Booster, nullptr, nullptr,
DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
if (!hDevInfo)
return L"";
std::wstring result;
do {
SP_DEVINFO_DATA data{ sizeof(data) };
if (!SetupDiEnumDeviceInfo(hDevInfo, 0, &data))
break;
SP_DEVICE_INTERFACE_DATA idata{ sizeof(idata) };
if (!SetupDiEnumDeviceInterfaces(hDevInfo, &data, &GUID_Booster, 0, &idata))
break;
BYTE buffer[1024];
auto detail = (PSP_DEVICE_INTERFACE_DETAIL_DATA)buffer;
detail->cbSize = sizeof(*detail);
if (SetupDiGetDeviceInterfaceDetail(hDevInfo, &idata, detail,
sizeof(buffer), nullptr, &data))
result = detail->DevicePath;
} while (false);
SetupDiDestroyDeviceInfoList(hDevInfo);
return result;
}
That’s it. This is all we need to get the symbolic link dynamically, based on the device interface we’re
after (GUID_Booster).
Once these files are copied to some directory on the target system, we need to use a tool called
devcon.exe, provided with the WDK, to actually perform the installation. You can find it
in a directory like c:\Program Files (x86)\Windows Kits\10\Tools\10.0.25300.0\x64. Open an elevated
command window, navigate to the above path, and run the following:
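Assuming the INF is named Booster.inf, the command looks like this:
devcon install booster.inf root\booster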
The above command assumes that the driver files were copied to c:\Demo. The last argument must be
the hardware ID specified earlier in the INF file. The reason it's required is that there could be multiple
device IDs. I would have expected DevCon to select whatever is in the INF file if there is just
one. Currently, it's not doing that. When installing, you'll get the following dialog popping up (figure
14-4).
The dialog color is based on whether the driver about to be installed is signed or not. In our case, it’s
unsigned (the system is in test-signing mode), so the color is bright red as a warning. Click the “Install
this driver software anyway” option to proceed.
If you try to right-click the INF file and select Install, it won’t work. This only works with
a certain name for the install section (DefaultInstall), which is not the name given by the
KMDF project template.
Once the driver is installed, you can open Device Manager and expand the System Devices node -
remember the driver was listed in the System device class. The Booster name should appear (figure
14-5).
Right-clicking the Booster node, selecting Properties, and navigating to the Details tab shows
various properties of the device. Select Hardware IDs from the drop-down combobox and you’ll see
the familiar root\booster name (figure 14-6).
You can browse to the System32\Drivers directory where you’ll find Booster.sys. You can also look in
the Registry at the Booster key under the standard Services key.
What about the device name and symbolic link? Let’s take a look using a local kernel debugger.
DriverStartIo: 00000000
DriverUnload: fffff8003f661760 Booster
AddDevice: fffff8001f292050 Wdf01000!FxDriver::AddDevice
Dispatch routines:
[00] IRP_MJ_CREATE fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[01] IRP_MJ_CREATE_NAMED_PIPE fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[02] IRP_MJ_CLOSE fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[03] IRP_MJ_READ fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[04] IRP_MJ_WRITE fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[05] IRP_MJ_QUERY_INFORMATION fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[06] IRP_MJ_SET_INFORMATION fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
...
[16] IRP_MJ_POWER fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[17] IRP_MJ_SYSTEM_CONTROL fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[18] IRP_MJ_DEVICE_CHANGE fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[19] IRP_MJ_QUERY_QUOTA fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[1a] IRP_MJ_SET_QUOTA fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
[1b] IRP_MJ_PNP fffff8001f257ac0 Wdf01000!FxDevice::DispatchWithL\
ock
!devstack ffffd28733edede0 :
!DevObj !DrvObj !DevExt ObjectName
> ffffd28733edede0 \Driver\Booster ffffd28745ec8fb0
ffffd28743ef5b10 \Driver\PnpManager ffffd28743ef5c60 0000010a
!DevNode ffffd2872e9d3050 :
DeviceInst is "ROOT\SYSTEM\0001"
ServiceName is "Booster"
You may need to enter .reload to force loading of the KMDF symbols so that proper symbols are displayed.
A few things are worth noticing in this output - for example, all the dispatch routines point to KMDF's FxDevice::DispatchWithLock, and the AddDevice routine is KMDF's FxDriver::AddDevice.
How do we see the symbolic link without running the client? We could examine the symbolic links
directory using WinObj and look for a target of \Device\0000010a (figure 14-7).
Note the name of the symbolic link: “ROOT#SYSTEM#0001#”. It consists of a “local” name (the same
one shown as DeviceInst in the debugger output above where backslashes are replaced by pound
signs), and then the GUID_Booster GUID in string form.
We can run the user-mode client normally:
boost.exe 7752 20
If you're curious as to how the driver is installed via the INF file, you can look at the DevCon source
code, which is provided as part of the WDK samples on GitHub. A couple of other tools you might
find useful are Device Explorer, an enhanced version of Device Manager (you can find it in one
of my GitHub repos), and the InstDrv tool - a command-line tool that can install a driver based
on an INF file without the need to specify a hardware ID - it just installs the first one. Its source
code is part of the Device Explorer solution.
Creating our own device class requires a ClassInstall32 section in the INF, whose AddReg directive adds the class name to the Registry, for example:
[DevClass_AddReg]
HKR,,,,MyDeviceClassName ; change as needed
HKR,,SilentInstall,,1
And of course the Version section has to use the new class and newly generated class GUID.
I have created another project in the same solution named Booster2 that has the same code, but with
the needed changes in the INF so that we get a new device class.
Here are the changes in the INF:
[Version]
Signature="$WINDOWS NT$"
Class=BoosterDevice
ClassGuid={AE4151AF-8C29-41C3-BB16-0B3115733333}
Provider=%ManufacturerName%
CatalogFile=Booster2.cat
DriverVer= ; TODO: set DriverVer in stampinf property pages
PnpLockdown=1
[ClassInstall32]
AddReg=DevClass_AddReg
[DevClass_AddReg]
HKR,,,,BoosterDevice
HKR,,SilentInstall,,1
The GUID in ClassGuid was generated with the Create GUID tool. Once the generated driver is
installed, the new device in Device Manager is shown similar to figure 14-8.
Does the client need to change? Not necessarily. It's still valid to look for the GUID_Booster device
interface. However, it also makes sense to add the device class GUID to the BoosterCommon.h header
file. Then the search function has the flexibility to search by device class instead of (or in addition to)
the device interface.
Summary
KMDF provides a higher level of abstraction over WDM. Its real power is clearly visible when writing
drivers for hardware-based devices, but as we have seen in this chapter it has some niceties we can
take advantage of to simplify coding.
Chapter 15: Miscellaneous Topics
In this last chapter of the book, we’ll take a look at various topics that didn’t fit well in previous
chapters.
In this chapter:
• Driver Signing
• Driver Verifier
• Filter Drivers
• Device Monitor
• Driver Hooking
• Kernel Libraries
Driver Signing
Kernel drivers are the only official mechanism to get code into the Windows kernel. As such, kernel
drivers can cause a system crash or another form of system instability. The Windows kernel does
not have any distinction between “more important” drivers and “less important” drivers. Microsoft
naturally would like Windows to be stable, with no system crashes or instabilities. Starting with
Windows Vista, on 64-bit systems, Microsoft requires drivers to be signed with a proper certificate
acquired from a certificate authority (CA). Without signing, the driver will not load.
Does a signed driver guarantee quality? Does it guarantee the system will not crash? No. It only
guarantees the driver files have not changed since leaving the publisher of the driver and that the
publisher itself is authentic. It’s not a silver bullet against driver bugs, but it does give some sense of
confidence in the driver.
For a hardware-based driver, Microsoft requires it to pass the Windows Hardware Quality Labs
(WHQL) tests, which rigorously exercise driver stability and functionality. If the driver passes
these tests, it receives a Microsoft stamp of quality, which the driver publisher can advertise as a sign
of quality and trust. Another consequence of passing WHQL is making the driver available through
Windows Update, which is important for some publishers.
Starting with Windows 10 version 1607 (“Anniversary update”), for systems that were freshly installed
(not upgraded from an earlier version) with secure boot on - Microsoft requires drivers to be signed
by Microsoft as well as by the publisher. This is true for all types of drivers, not just related to
hardware. Microsoft provides a web portal where drivers can be uploaded (must already be signed by
the publisher), tested in some ways by Microsoft, and finally signed by Microsoft and returned to
the publisher. It may take some time for Microsoft to return the signed driver the first time the driver
is uploaded, but later iterations are fairly fast (several hours).
The driver that needs to be uploaded includes the binaries only. The source code is not required.
Figure 15-1 shows an example driver image file from Nvidia that is signed by both Nvidia and
Microsoft on a Windows 10 19H1 system.
The first step in driver signing is obtaining a proper certificate from a certificate authority (such
as Verisign, Globalsign, Digicert, Symantec, and others) for at least kernel code signing. The CA
will validate the identity of the requesting company, and if all is well, will issue a certificate. The
downloaded certificate can be installed in the machine’s certificate store. Since the certificate must be
kept secret and not leak, it is typically installed on a dedicated build machine and the driver signing
process is done as part of the build process.
The actual signing operation is done with the SignTool.exe tool, part of the Windows SDK. You can
use Visual Studio to sign a driver if the certificate is installed in a certificate store on the local machine.
Figure 15-2 shows the signing properties in Visual Studio.
Visual Studio provides two types of signing: Test sign and production sign. With test signing, a test
certificate (a locally-generated certificate that is not trusted globally) is typically used. This allows
testing the driver on systems configured with test signing enabled, as we’ve done throughout this
book. Production signing is about using a real certificate to sign the driver for production use.
Test certificates can be generated at will using Visual Studio when selecting a certificate, as shown in
Figure 15-3.
Figure 15-4 shows an example of production signing a release build of a driver in Visual Studio. Note
that the digest algorithm should be SHA256 rather than the older, less secure, SHA1.
Dealing with the various procedures for registering and signing drivers is beyond the scope of this
book. Things got more complicated in recent years due to new Microsoft rules and procedures. Consult
the official documentation available here⁴.
⁴https://docs.microsoft.com/en-us/windows-hardware/drivers/install/kernel-mode-code-signing-policy--windows-vista-and-later-
Driver Verifier
Driver Verifier is a built-in tool that has been part of Windows since Windows 2000. Its purpose is to help
identify driver bugs and bad coding practices. For example, suppose your driver causes a BSOD in
some way, but the driver’s code is not on any call stacks in the crash dump file. This typically means
that your driver did something which was not fatal at the time, such as writing beyond one of its
allocated buffers, where that memory was unfortunately allocated to another driver or the kernel.
At that point, there is no crash. However, sometime later that driver or the kernel will use that
overflowed data and most likely cause a system crash. There is no easy way to associate the crash
with the offending driver. The driver verifier offers an option to allocate memory for the driver in
its own “special” pool, where pages at higher and lower addresses are inaccessible, and so will cause
an immediate crash upon a buffer overflow or underflow, making it easy to identify the problematic
driver.
Driver verifier has a GUI and a command line interface, and can work with any driver - it does not
require any source code. The easiest way to start with the verifier is to open it by typing verifier in the
Run dialog or searching for verifier when clicking the Start button. Either way, the verifier presents
its initial user interface shown in Figure 15-5.
There are two things that need to be selected: the type of checks to do by the verifier, and the drivers
that should be checked. The first page of the wizard is about the checks themselves. The options
available on this page are as follows:
• Create standard settings selects a predefined set of checks to be performed. We’ll see the
complete list of available checks in the second page, each with a flag of Standard or Additional.
All those marked Standard are selected by this option automatically.
• Create custom settings allows fine grained selection of checks by listing all the available checks,
shown in Figure 15-6.
• Delete existing settings deletes all existing verifier settings.
• Display existing settings shows the currently configured checks and the drivers to which these
checks apply.
• Display information about the currently verified drivers shows the collected information for the
drivers running under the verifier in an earlier session.
Selecting Create custom settings shows the available list of verifier settings, a list that has grown
considerably since the early days of Driver Verifier. The Standard flag indicates that a setting is
part of the Standard settings that can be selected in the first page of the wizard. Once the settings have
been selected, the Verifier shows the next step for selecting the drivers to execute with these settings,
shown in Figure 15-7.
• Automatically select unsigned drivers is mostly relevant for 32 bit systems as 64 bit systems
must have signed drivers (unless in test signing mode). Clicking Next will list such drivers.
Most systems would not have any.
• Automatically select drivers built for older versions of Windows is a legacy setting for NT 4
hardware-based drivers. It is mostly uninteresting for modern systems.
• Automatically select all drivers installed on the computer is a catch-all option that selects all
drivers. This could theoretically be useful if you are presented with a system that crashes, but no
one has any clue as to the offending driver. However, this setting is not recommended, as it slows
down the machine (verifier has its costs), because verifier intercepts various operations (based
on the previous settings) and typically causes more memory to be used. So it’s better in such a
scenario to select the first (say) 15 drivers, see if the verifier catches the bad driver, and if not
select the next 15 drivers, and so on.
• Select driver names from a list is the best option to use, where Verifier presents a list of drivers
currently executing on the system, as shown in Figure 15-8. If the driver in question is not
currently running, clicking Add currently not loaded driver(s) to the list… allows navigating to
the relevant SYS file(s).
Finally, clicking Finish makes the settings permanent until revoked, and the system typically
needs to be restarted so that verifier can initialize itself and hook drivers, especially if these are
currently executing.
Let's try this (in a virtual machine) with the Sysinternals NotMyFault tool and its Buffer overflow option, first without Driver Verifier. It may take several clicks on Crash to actually crash the system.
Figure 15-10 shows the result on a Windows 7 VM after some clicks on Crash and several seconds
passing by. Note the BSOD code (BAD_POOL_HEADER). A good guess would be the buffer overflow
wrote over some of the metadata of a pool allocation.
Loading the resulting dump file and looking at the call stack shows this:
1: kd> k
# Child-SP RetAddr Call Site
00 fffff880`054be828 fffff800`029e4263 nt!KeBugCheckEx
01 fffff880`054be830 fffff800`02bd969f nt!ExFreePoolWithTag+0x1023
02 fffff880`054be920 fffff800`02b0669b nt!ObpAllocateObject+0x12f
03 fffff880`054be990 fffff800`02c2f012 nt!ObCreateObject+0xdb
04 fffff880`054bea00 fffff800`02b1a7b2 nt!PspAllocateThread+0x1b2
05 fffff880`054bec20 fffff800`02b20d95 nt!PspCreateThread+0x1d2
06 fffff880`054beea0 fffff800`028aaad3 nt!NtCreateThreadEx+0x25d
07 fffff880`054bf5f0 fffff800`028a02b0 nt!KiSystemServiceCopyEnd+0x13
08 fffff880`054bf7f8 fffff800`02b29a60 nt!KiServiceLinkage
Clearly, MyFault.sys is nowhere to be found. !analyze -v, by the way, is no wiser and concludes that
the module nt is the culprit.
Now let's try the same experiment with Driver Verifier. Choose standard settings and navigate to
the System32\Drivers directory to locate MyFault.sys (if it's not currently running). Restart the system, run
NotMyFault again, select Buffer overflow and click Crash. You will notice that the system crashes
immediately, with a BSOD similar to the one shown in Figure 15-11.
Figure 15-11: NotMyFault BSOD on Windows 7 with Buffer overflow and Verifier active
The BSOD itself is immediately telling. The dump file confirms it with the following call stack:
0: kd> k
# Child-SP RetAddr Call Site
00 fffff880`0651c378 fffff800`029ba462 nt!KeBugCheckEx
01 fffff880`0651c380 fffff800`028ecb96 nt!MmAccessFault+0x2322
02 fffff880`0651c4d0 fffff880`045f1c07 nt!KiPageFault+0x356
03 fffff880`0651c660 fffff880`045f1f88 myfault+0x1c07
04 fffff880`0651c7b0 fffff800`02d63d56 myfault+0x1f88
05 fffff880`0651c7f0 fffff800`02b43c7a nt!IovCallDriver+0x566
06 fffff880`0651c850 fffff800`02d06eb1 nt!IopSynchronousServiceTail+0xfa
07 fffff880`0651c8c0 fffff800`02b98296 nt!IopXxxControlFile+0xc51
08 fffff880`0651ca00 fffff800`028eead3 nt!NtDeviceIoControlFile+0x56
09 fffff880`0651ca70 00000000`777e98fa nt!KiSystemServiceCopyEnd+0x13
Filter Drivers
The Windows driver model is device-centric as we’ve seen already in chapter 7. Devices can be layered
on top of each other, resulting in the highest layer device getting first crack at an incoming IRP. This
same model is used for file system drivers, which we leveraged in chapter 12 with the help of the
Filter Manager, which is specialized for file system filters. However, the filtering model is generic and
can be utilized for other types of devices. In this section we’ll take a closer look at the general model
of device filtering, which we’ll be able to apply to a broad range of devices, some of which are related
to hardware devices while others are not.
The kernel API provides several functions that allow one device to be layered on top of another device.
The simplest is probably IoAttachDevice which accepts a device object to attach and a target named
device object to attach to. Here is its prototype:
NTSTATUS IoAttachDevice (
_In_ PDEVICE_OBJECT SourceDevice,
_In_ PUNICODE_STRING TargetDevice,
_Out_ PDEVICE_OBJECT *AttachedDevice);
The output of the function (besides the status) is another device object to which the SourceDevice was
actually attached. This is required since attaching to a named device which is not at the top of its
device stack succeeds, but the source device is actually attached on top of the topmost device, which
may be another filter. It’s important, therefore, to get the real device that the source device attached
itself to, as that device should be the target of requests if the driver wishes to propagate them down
the device stack. This is illustrated in Figure 15-12.
Unfortunately, attaching to a device object requires some more work. As discussed in chapter 7, a
device can ask the I/O manager to help with accessing a user’s buffer with Buffered I/O or Direct
I/O (for IRP_MJ_READ and IRP_MJ_WRITE requests) by setting the appropriate flags in the Flags
member of the DEVICE_OBJECT. In a layering scenario there are multiple devices, so which device is
the one that determines how the I/O manager should help with I/O buffers? It turns out it’s always
the topmost device. This means that our new filter device should copy the value of DO_BUFFERED_IO
and DO_DIRECT_IO flags from the device it actually layered on top of. The default for a device just
created with IoCreateDevice has neither of these flags set, so if the new device fails to copy these
bits, it will most likely cause the target device to malfunction or even crash, as it would not expect
its selected buffering method to be ignored.
There are a few other settings that need to be copied from the attached device to make sure the new
filter looks the same to the I/O system. We’ll see these settings later when we build a complete example
of a filter.
What is this device name that IoAttachDevice requires? This is a named device object within the
Object Manager's namespace, viewable with the WinObj tool we've used before. Most of the named
device objects are located in the \Device\ directory, but some are located elsewhere. For example,
if we were to attach a filter device object to Process Explorer’s device object, the name would be
\Device\ProcExp152 (the name is case insensitive).
Other functions for attaching to another device object include IoAttachDeviceToDeviceStack
and IoAttachDeviceToDeviceStackSafe, both accepting another device object to attach to rather
than a name of a device. These functions are mostly useful when building filters registered for
hardware-based device drivers, where the target device object is provided as part of device node
building (partially described in chapter 7 as well). Both return the actual layered device object, just as
IoAttachDevice does. The Safe function returns a proper NTSTATUS, while the former returns NULL
on failure. Other than that, these functions are identical.
Generally, kernel code can obtain a pointer to a named device object with IoGetDeviceObjectPointer
that returns a device object and a file object open for that device based on a device name. Here is the
prototype:
NTSTATUS IoGetDeviceObjectPointer (
_In_ PUNICODE_STRING ObjectName,
_In_ ACCESS_MASK DesiredAccess,
_Out_ PFILE_OBJECT *FileObject,
_Out_ PDEVICE_OBJECT *DeviceObject);
The desired access is typically FILE_READ_DATA or any other that is valid for file objects. The returned
file object’s reference is incremented, so the driver needs to be careful to decrement that reference
eventually (ObDereferenceObject) so the file object does not leak. The returned device object can
be used as an argument to IoAttachDeviceToDeviceStack(Safe).
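In the filter's DriverEntry, this usually means pointing every major function code at one generic dispatch routine. A minimal sketch, using the HandleFilterFunction name referenced in the next paragraph:
// point every major function at the same generic dispatch routine
for (auto& func : DriverObject->MajorFunction)
func = HandleFilterFunction;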
The above code snippet sets all major function codes to point to the same function. The HandleFilterFunction
function must, at the very least, call the lower layered driver using the device object obtained from
one of the “attach” functions. Of course, being a filter, the driver will want to do additional work
or different work for requests it’s interested in, but all the requests it does not care about must be
forwarded to the lower layer device, or else that device will not function properly.
This “forward and forget” operation is very common in filters. Let’s see how to implement this
functionality. The actual call that transfers an IRP to another device is IoCallDriver. However,
before calling it the current driver must prepare the next I/O stack location for the lower driver’s use.
Remember that initially, the I/O manager only initializes the first I/O stack location. it’s up to every
layer to initialize the next I/O stack location before using IoCallDriver to pass the IRP down the
device stack.
The driver can call IoGetNextIrpStackLocation to get a pointer to the next layer's IO_STACK_LOCATION
and go ahead and initialize it. In most cases, however, the driver just wants to present
to the lower layer the same information it received itself. One function that can help with that is
IoCopyCurrentIrpStackLocationToNext, which is pretty self-explanatory. This function, however,
does not just blindly copy the I/O stack location like so:
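// a naive copy - NOT what IoCopyCurrentIrpStackLocationToNext does:
*IoGetNextIrpStackLocation(Irp) = *IoGetCurrentIrpStackLocation(Irp);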
Why? The reason is subtle, and has to do with the completion routine. Recall from chapter 7
that a driver can set up a completion routine to be notified once an IRP is completed by a lower
layer driver (IoSetCompletionRoutine/Ex). The completion pointer (and a driver-defined context
argument) are stored in the next I/O stack location, and that’s why a blind copy would duplicate
the higher-level completion routine (if any), which is not what we want. This is exactly what
IoCopyCurrentIrpStackLocationToNext avoids.
But there is actually a better way if the driver does not need a completion routine and just wants
to use “forward and forget”, without paying the price of copying the I/O stack location data. This is
accomplished by skipping the I/O stack location in such a way so that the next lower layer driver sees
the same I/O stack location as this one:
IoSkipCurrentIrpStackLocation(Irp);
status = IoCallDriver(LowerDeviceObject, Irp);
IoSkipCurrentIrpStackLocation simply decrements the internal IRP’s I/O stack location’s pointer,
and IoCallDriver increments it, essentially making the lower driver see the same I/O stack location
as this layer, without any copying going on; this is the preferred way of propagating the IRP down if
the driver does not wish to make changes to the request and it does not require a completion routine.
Attaching Filters
When does a driver call one of the attach functions? The ideal time is when the underlying device
(the attach target) is being created; that is, the device node is in the process of being built. This is
common in filters for hardware-based device drivers, where filters can be registered in the named
values UpperFilters and LowerFilters we saw in chapter 7. For these filters, the proper location for
actually creating the new device object and attaching it to an existing device stack is in a callback set
with the AddDevice member accessible from the driver object like so:
DriverObject->DriverExtension->AddDevice = FilterAddDevice;
We’ve briefly discussed that in chapter 14 when looking at driver initialization with KMDF.
This AddDevice callback is invoked when a new hardware device belonging to the driver has been
identified by the Plug & Play system. This routine has the following prototype:
NTSTATUS AddDeviceRoutine (
_In_ PDRIVER_OBJECT DriverObject,
_In_ PDEVICE_OBJECT PhysicalDeviceObject);
The I/O system provides the driver with the device object at the bottom of the device stack
(PhysicalDeviceObject or PDO) to be used in a call to IoAttachDeviceToDeviceStack(Safe). This
PDO is one reason why DriverEntry is not a suitable location to make an attach call - at this point
the PDO is not yet provided. Furthermore, a second device of the same type may be added into the
system (such as a second USB camera), in which case DriverEntry is not going to be called at all;
only the AddDevice routine will.
Here is an example for implementing an AddDevice routine for a filter driver (error handling omitted):
struct DeviceExtension {
PDEVICE_OBJECT LowerDeviceObject;
};
NTSTATUS FilterAddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT PhysicalDeviceObject) {
PDEVICE_OBJECT DeviceObject;
auto status = IoCreateDevice(DriverObject, sizeof(DeviceExtension), nullptr,
FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
if (!NT_SUCCESS(status))
return status;
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
ext->LowerDeviceObject = IoAttachDeviceToDeviceStack(DeviceObject, PhysicalDeviceObject);
//
// copy some info from the attached device
//
DeviceObject->DeviceType = ext->LowerDeviceObject->DeviceType;
DeviceObject->Flags |= ext->LowerDeviceObject->Flags &
(DO_BUFFERED_IO | DO_DIRECT_IO);
//
// important for hardware-based devices
//
DeviceObject->Flags |= DO_POWER_PAGABLE;
DeviceObject->Flags &= ~DO_DEVICE_INITIALIZING;
return status;
}
• The device object is created without a name. A name is not needed, because the target device
is named and is the real target for IRPs, so no need to provide our own name. The filter is going
to be invoked regardless.
• In the IoCreateDevice call we specify a non-zero size for the second argument, asking the
I/O manager to allocate an extra buffer (sizeof(DeviceExtension)) along with the actual
DEVICE_OBJECT. Up until now we used global variables to manage state for a device because we
had just one. However, a filter driver may create multiple device objects and attach to multiple
device stacks, making it harder to correlate device objects with some state. The device extension
mechanism makes it easy to get to a device-specific state given the device object itself. In the
above code we capture the lower device object as our state, but this structure can be extended
to include more information as needed.
• We copy some information from the lower device object, so that our filter appears to the I/O
system as the target device itself. Specifically, we copy the device type and the buffering method
flags. Copying the buffering method flags is critical, as the buffering method is determined by
the uppermost device - our filter as it may turn out.
• Finally, we remove the DO_DEVICE_INITIALIZING flag (set by the I/O system initially) to
indicate to the Plug & Play manager that the device is ready for work. The DO_POWER_PAGABLE
flag indicates Power IRPs should arrive at IRQL < DISPATCH_LEVEL, and is in fact mandatory.
Given the above code, here is a “forward and forget” implementation that uses the lower device as
described in the previous section:
NTSTATUS HandleFilterFunction(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
IoSkipCurrentIrpStackLocation(Irp);
return IoCallDriver(ext->LowerDeviceObject, Irp);
}
When attaching to a named device object at an arbitrary time (rather than during device node creation), the target device already exists, it's already working, and at some point it gets a filter. The driver must
make sure this slight “interruption” does not have any adverse effect on the target device. Most of the
operations shown in the previous sections are relevant here as well, such as copying some flags from
the lower device. However, some extra care must be taken to make sure the target device’s operations
are not disrupted.
Using IoAttachDevice, the following code creates a device object and attaches it over another named
device object (error handling omitted):
//
// use hard-coded name for illustration purposes
//
UNICODE_STRING targetName = RTL_CONSTANT_STRING(L"\\Device\\SomeDeviceName");
PDEVICE_OBJECT DeviceObject;
auto status = IoCreateDevice(DriverObject, 0, nullptr,
FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
PDEVICE_OBJECT LowerDeviceObject;
status = IoAttachDevice(DeviceObject, &targetName, &LowerDeviceObject);
//
// copy information
//
DeviceObject->Flags |= LowerDeviceObject->Flags &
(DO_BUFFERED_IO | DO_DIRECT_IO);
Astute readers may notice that the above code has an inherent race condition. Can you spot it?
This is essentially the same code used in the AddDevice callback in the previous section. But in that
code there was no race condition. This is because the target device was not yet active - the device node
was being built, device by device, from the bottom to the top. The device was not yet in a position to
receive requests.
Contrast that with the above code - the target device is working and could be very busy, when
suddenly a filter appears. The I/O system makes sure there is no issue while performing the actual
attach operation, but once the call to IoAttachDevice returns (and in fact even before that), requests
continue to come in. Suppose that a read operation comes in just after IoAttachDevice returns but
before the buffering method flags are set - the I/O manager will see the flags as zero (neither Buffered nor Direct I/O)
since it only looks at the topmost device, which is now our filter! So if the target device uses Direct
I/O (for example), the I/O manager will not lock the user’s buffer, will not create an MDL, etc. This
could lead to a system crash if the target driver always assumes that Irp->MdlAddress (for example)
is non-NULL.
The window of opportunity for failure is very small, but it’s better to play it safe.
How can we solve this race condition? We must prepare our new device object fully before actually
attaching. We can do that by calling IoGetDeviceObjectPointer to get the target device object,
copy the required information to our own device (at this time still not attached), and only then call
IoAttachDeviceToDeviceStack(Safe). We’ll see a complete example later in this chapter.
Filter Cleanup
Once a filter is attached, it must be detached at some point. Calling IoDetachDevice with the lower
device object pointer performs this operation. Notice the lower device object is the argument, not the
filter’s own device object. Finally, IoDeleteDevice for the filter’s device object should be called, just
as we did in all our drivers so far.
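Assuming the lower device object was saved in the device extension as before, the cleanup boils down to something like this:
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
IoDetachDevice(ext->LowerDeviceObject);  // detach from the device stack
IoDeleteDevice(DeviceObject);            // delete the filter's own device object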
The question is when should this cleanup code be called? If the driver is unloaded explicitly, then the
normal unload routine should perform these cleanup operations. However, some complication arises
in filters for hardware-based drivers. These drivers may need to unload because of a Plug & Play
event, such as a user yanking a device out of the system. A hardware-based driver receives an
IRP_MJ_PNP request with a minor function of IRP_MN_REMOVE_DEVICE, indicating the hardware itself is gone,
so the entire device node is no longer needed and will be torn down. It's the responsibility of the driver
to handle this PnP request properly, detach from the device node and delete the device.
This means that for hardware-based filters, a simple “forward and forget” for IRP_MJ_PNP will not
suffice. Special treatment is needed for IRP_MN_REMOVE_DEVICE. The standard way to synchronize in-flight IRPs with device removal is the remove lock (IO_REMOVE_LOCK), used as follows:
1. The driver allocates the structure as part of a device extension or a global variable and initializes
it once with IoInitializeRemoveLock.
2. For every IRP, the driver acquires the remove lock with IoAcquireRemoveLock before passing it
down to a lower device. If the call fails (STATUS_DELETE_PENDING), it means a remove operation
is in progress and the driver should return immediately.
3. Once a lower driver is done with the IRP, release the remove lock (IoReleaseRemoveLock).
4. When handling IRP_MN_REMOVE_DEVICE call IoReleaseRemoveLockAndWait before detach-
ing and deleting the device. The call returns once all other IRPs are no longer being processed.
With these steps in mind, the generic dispatch passing requests down must be changed as follows
(assuming the remove lock was already initialized):
struct DeviceExtension {
IO_REMOVE_LOCK RemoveLock;
PDEVICE_OBJECT LowerDeviceObject;
};
NTSTATUS HandleFilterFunction(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
//
// second argument is unused in release builds of Windows
//
auto status = IoAcquireRemoveLock(&ext->RemoveLock, Irp);
if(!NT_SUCCESS(status)) { // STATUS_DELETE_PENDING
Irp->IoStatus.Status = status;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return status;
}
IoSkipCurrentIrpStackLocation(Irp);
status = IoCallDriver(ext->LowerDeviceObject, Irp);
IoReleaseRemoveLock(&ext->RemoveLock, Irp);
return status;
}
The IRP_MJ_PNP handler must be modified to use the remove lock properly:
auto stack = IoGetCurrentIrpStackLocation(Irp);
auto minor = stack->MinorFunction; // capture before passing the IRP down
IoSkipCurrentIrpStackLocation(Irp);
auto status = IoCallDriver(ext->LowerDeviceObject, Irp);
if (minor == IRP_MN_REMOVE_DEVICE) {
// wait if needed
IoReleaseRemoveLockAndWait(&ext->RemoveLock, Irp);
IoDetachDevice(ext->LowerDeviceObject);
IoDeleteDevice(fido); // fido is the filter's own device object
}
else {
IoReleaseRemoveLock(&ext->RemoveLock, Irp);
}
return status;
}
Device Monitor
With the information presented thus far it is possible to build a generic driver that can attach to device
objects as filters to other devices. This allows for intercepting requests to (almost) any device we’re
interested in. A companion user-mode client will allow adding and removing devices to filter.
We’ll create a new Empty WDM driver project named KDevMon as we’ve done numerous times.
The driver should be able to attach to multiple devices, and on top of that expose its own Control
Device Object (CDO) to handle user-mode client configuration requests. The CDO will be created in
DriverEntry as usual, but attachments will be managed separately, controlled by requests from a
user-mode client.
To manage all the devices currently being filtered, we’ll create a helper class named DevMonManager.
Its primary purpose is to add and remove devices to filter. Each device will be represented by the
following structure:
struct MonitoredDevice {
UNICODE_STRING DeviceName;
PDEVICE_OBJECT DeviceObject;
PDEVICE_OBJECT LowerDeviceObject;
};
For each device, we need to keep the filter device object (the one created by this driver), the lower
device object to which it’s attached and the device name. The name will be needed for detach purposes.
The DevMonManager class holds a fixed array of MonitoredDevice structures, a fast mutex to protect
the array and some helper functions. Here are the main ingredients in DevMonManager:
class DevMonManager {
public:
void Init(PDRIVER_OBJECT DriverObject);
NTSTATUS AddDevice(PCWSTR name);
int FindDevice(PCWSTR name);
bool RemoveDevice(PCWSTR name);
void RemoveAllDevices();
MonitoredDevice& GetDevice(int index);
PDEVICE_OBJECT CDO;
private:
bool RemoveDevice(int index);
private:
MonitoredDevice Devices[MaxMonitoredDevices];
int MonitoredDeviceCount;
FastMutex Lock;
PDRIVER_OBJECT DriverObject;
};
Let's examine the AddDevice method in parts.
First, we have to acquire the mutex in case more than one add/remove/find operation is taking place
at the same time. Next, we can make some quick checks to see if all our array slots are taken and that
the device in question is not already being filtered:
NTSTATUS DevMonManager::AddDevice(PCWSTR name) {
Locker locker(Lock);
if (MonitoredDeviceCount == MaxMonitoredDevices)
return STATUS_BUFFER_TOO_SMALL;
if (FindDevice(name) >= 0)
return STATUS_SUCCESS;
Now it’s time to look for a free array index where we can store information on the new filter being
created:
A free slot is indicated by a NULL device object pointer inside the MonitoredDevice structure. Next,
we’ll try and get a pointer to the device object that we wish to filter with IoGetDeviceObjectPointer:
UNICODE_STRING targetName;
RtlInitUnicodeString(&targetName, name);
PFILE_OBJECT FileObject;
PDEVICE_OBJECT LowerDeviceObject = nullptr;
auto status = IoGetDeviceObjectPointer(&targetName, FILE_READ_DATA,
&FileObject, &LowerDeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint(("Failed to get device object pointer (%ws) (0x%8X)\n",
name, status));
return status;
}
The result of IoGetDeviceObjectPointer is in fact the topmost device object, which is not
necessarily the device object we were targeting. This is fine, since any attach operation will actually
attach to the top of the device stack anyway. The function can fail, of course, most likely because a
device with that specific name does not exist.
The next step is to create the new filter device object and initialize it, partly based on the device object
pointer we just acquired. At the same time, we need to fill the MonitoredDevice structure with the
proper data. For each created device we want to have a device extension that stores the lower device
object, so we can get to it easily at IRP handling time. For this, we define a device extension structure
called simply DeviceExtension that can hold this pointer (in the DevMonManager.h file):
struct DeviceExtension {
PDEVICE_OBJECT LowerDeviceObject;
};
do {
status = IoCreateDevice(DriverObject, sizeof(DeviceExtension), nullptr,
FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
if (!NT_SUCCESS(status))
break;
IoCreateDevice is called with the size of the device extension to be allocated in addition to the
DEVICE_OBJECT structure itself. The device extension is stored in the DeviceExtension field in
the DEVICE_OBJECT, so it’s always available when needed. Figure 15-13 shows the effect of calling
IoCreateDevice.
Now we can continue with device initialization and the MonitoredDevice structure:
//
// allocate buffer to copy device name
//
buffer = (WCHAR*)ExAllocatePool2(POOL_FLAG_PAGED, targetName.Length,
DRIVER_TAG);
if (!buffer) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
Devices[i].DeviceName.Buffer = buffer;
Devices[i].DeviceName.MaximumLength = targetName.Length;
RtlCopyUnicodeString(&Devices[i].DeviceName, &targetName);
Devices[i].DeviceObject = DeviceObject;
At this point the new device object is ready, all that’s left is to attach it and finish some more
initializations:
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
status = IoAttachDeviceToDeviceStackSafe(
DeviceObject, // filter device object
LowerDeviceObject, // target device object
&ext->LowerDeviceObject); // result
if (!NT_SUCCESS(status))
break;
Devices[i].LowerDeviceObject = ext->LowerDeviceObject;
//
// hardware based devices require this
//
DeviceObject->Flags &= ~DO_DEVICE_INITIALIZING;
DeviceObject->Flags |= DO_POWER_PAGABLE;
MonitoredDeviceCount++;
} while (false);
The device is attached, with the resulting pointer saved immediately to the device extension. This is
important, as the process of attaching itself generates at least two IRPs - IRP_MJ_CREATE and
IRP_MJ_CLEANUP - and so the driver must be prepared to handle these. As we shall soon see, this handling
requires the lower device object to be available in the device extension.
All that’s left now is to clean up:
if (!NT_SUCCESS(status)) {
if (buffer)
ExFreePool(buffer);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
Devices[i].DeviceObject = nullptr;
}
if (LowerDeviceObject) {
// dereference - not needed anymore
ObDereferenceObject(FileObject);
}
return status;
}
}
Here is the complete AddDevice function for reference:
NTSTATUS DevMonManager::AddDevice(PCWSTR name) {
Locker locker(Lock);
if (MonitoredDeviceCount == MaxMonitoredDevices)
return STATUS_BUFFER_TOO_SMALL;
if (FindDevice(name) >= 0)
return STATUS_SUCCESS;
PDEVICE_OBJECT DeviceObject = nullptr;
WCHAR* buffer = nullptr;
for (int i = 0; i < MaxMonitoredDevices; i++) {
if (Devices[i].DeviceObject)
continue;
UNICODE_STRING targetName;
RtlInitUnicodeString(&targetName, name);
PFILE_OBJECT FileObject;
PDEVICE_OBJECT LowerDeviceObject = nullptr;
auto status = IoGetDeviceObjectPointer(&targetName, FILE_READ_DATA,
&FileObject, &LowerDeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint(("Failed to get device object pointer (%ws) (0x%8X)\n",
name, status));
return status;
}
do {
status = IoCreateDevice(DriverObject, sizeof(DeviceExtension),
nullptr, FILE_DEVICE_UNKNOWN, 0, FALSE, &DeviceObject);
if (!NT_SUCCESS(status))
break;
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
//
// allocate buffer to copy device name
//
buffer = (WCHAR*)ExAllocatePool2(POOL_FLAG_PAGED,
targetName.Length, DRIVER_TAG);
if (!buffer) {
status = STATUS_INSUFFICIENT_RESOURCES;
break;
}
DeviceObject->DeviceType = LowerDeviceObject->DeviceType;
Devices[i].DeviceName.Buffer = buffer;
Devices[i].DeviceName.MaximumLength = targetName.Length;
RtlCopyUnicodeString(&Devices[i].DeviceName, &targetName);
Devices[i].DeviceObject = DeviceObject;
status = IoAttachDeviceToDeviceStackSafe(
DeviceObject, // filter device object
LowerDeviceObject, // target device object
&ext->LowerDeviceObject); // result
if (!NT_SUCCESS(status))
break;
Devices[i].LowerDeviceObject = ext->LowerDeviceObject;
// hardware based devices require this
DeviceObject->Flags &= ~DO_DEVICE_INITIALIZING;
DeviceObject->Flags |= DO_POWER_PAGABLE;
MonitoredDeviceCount++;
} while (false);
if (!NT_SUCCESS(status)) {
if (buffer)
ExFreePool(buffer);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
Devices[i].DeviceObject = nullptr;
}
if (LowerDeviceObject) {
// dereference - not needed anymore
ObDereferenceObject(FileObject);
}
return status;
}
// should never get here
NT_ASSERT(false);
return STATUS_UNSUCCESSFUL;
}
Removing a device by name locates it and delegates to the index-based overload:
bool DevMonManager::RemoveDevice(PCWSTR name) {
Locker locker(Lock);
int index = FindDevice(name);
if (index < 0)
return false;
return RemoveDevice(index);
}
The index-based overload does the actual cleanup:
bool DevMonManager::RemoveDevice(int index) {
auto& device = Devices[index];
if (device.DeviceObject == nullptr)
return false;
ExFreePool(device.DeviceName.Buffer);
IoDetachDevice(device.LowerDeviceObject);
IoDeleteDevice(device.DeviceObject);
device.DeviceObject = nullptr;
MonitoredDeviceCount--;
return true;
}
The important parts are detaching the device and deleting it. FindDevice is a simple helper to locate
a device by name in the array. It returns the index of the device in the array, or -1 if the device is not
found:
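// a minimal sketch; a case-insensitive comparison of the stored names is assumed
int DevMonManager::FindDevice(PCWSTR name) {
UNICODE_STRING uname;
RtlInitUnicodeString(&uname, name);
for (int i = 0; i < MaxMonitoredDevices; i++) {
auto& device = Devices[i];
if (device.DeviceObject &&
RtlEqualUnicodeString(&device.DeviceName, &uname, TRUE))
return i;
}
return -1;
}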
The only trick here is to make sure the fast mutex is acquired before calling this function.
DevMonManager g_Data;
Nothing new in this piece of code. Next we must initialize all dispatch routines so that all major
functions are supported:
for (auto& func : DriverObject->MajorFunction)
func = HandleFilterFunction;
// equivalent to:
// for (int i = 0; i < ARRAYSIZE(DriverObject->MajorFunction); i++)
//     DriverObject->MajorFunction[i] = HandleFilterFunction;
We’ve seen similar code earlier in this chapter. The above code uses a C++ reference to change all
major functions to point to HandleFilterFunction, which we’ll meet very soon. Finally, we need
to save the returned device object for convenience in the global g_Data (DevMonManager) object and
initialize it:
g_Data.CDO = DeviceObject;
g_Data.Init(DriverObject);
return status;
}
The Init method just initializes the fast mutex and saves the driver object pointer for later use with
IoCreateDevice (which we covered in the previous section).
We will not be using a remove lock in this driver to simplify the code. The reader is encouraged to
add support for a remove lock as described earlier in this chapter.
Before we dive into that generic dispatch routine, let’s take a closer look at the unload routine. When
the driver is unloaded, we need to delete the symbolic link and the CDO as usual, but we also must
detach from all currently active filters. Here is the code:
g_Data.RemoveAllDevices();
}
The key piece here is the call to DevMonManager::RemoveAllDevices. This function is fairly
straightforward, leaning on DevMonManager::RemoveDevice for the heavy lifting:
void DevMonManager::RemoveAllDevices() {
Locker locker(Lock);
for (int i = 0; i < MaxMonitoredDevices; i++)
RemoveDevice(i);
}
Handling Requests
The HandleFilterFunction dispatch routine is the most important piece of the puzzle. It will be
called for all major functions, targeted to one of the filter devices or the CDO. The routine must make
that distinction, and this is exactly why we saved the CDO pointer earlier. Our CDO supports create,
close and DeviceIoControl. Here is the initial code:
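// a sketch of how the routine begins; CompleteRequest is the helper from
// chapter 7 (assumed to default to STATUS_SUCCESS)
NTSTATUS HandleFilterFunction(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
if (DeviceObject == g_Data.CDO) {
switch (IoGetCurrentIrpStackLocation(Irp)->MajorFunction) {
case IRP_MJ_CREATE:
case IRP_MJ_CLOSE:
return CompleteRequest(Irp);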
case IRP_MJ_DEVICE_CONTROL:
return DevMonDeviceControl(DeviceObject, Irp);
}
return CompleteRequest(Irp, STATUS_INVALID_DEVICE_REQUEST);
}
If the target device is our CDO, we switch on the major function itself. For create and close we simply
complete the IRP successfully by calling a helper function we met in chapter 7:
Next, we’ll get the thread that issued the request by digging deep into the IRP and then get the thread
and process IDs of the caller:
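// a sketch; the IRP records the requesting thread in its Tail union
auto thread = Irp->Tail.Overlay.Thread;
HANDLE tid = nullptr, pid = nullptr;
if (thread) {
tid = PsGetThreadId(thread);
pid = PsGetThreadProcessId(thread);
}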
In most cases, the current thread is the same one that made the initial request, but it doesn’t have to
be - it’s possible that a higher-layer filter received the request, did not propagate it immediately for
whatever reason, and later propagated it from a different thread.
Now it’s time to output the thread and process IDs and the type of operation requested:
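// a sketch of the trace output (format roughly matching the DbgView output shown later)
auto ext = (DeviceExtension*)DeviceObject->DeviceExtension;
auto stack = IoGetCurrentIrpStackLocation(Irp);
KdPrint(("driver: %wZ: PID: %u, TID: %u, MJ=%d (%s)\n",
&ext->LowerDeviceObject->DriverObject->DriverName,
(ULONG)(ULONG_PTR)pid, (ULONG)(ULONG_PTR)tid,
stack->MajorFunction, MajorFunctionToString(stack->MajorFunction)));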
The MajorFunctionToString helper function just returns a string representation of a major function
code. For example, for IRP_MJ_READ it returns “IRP_MJ_READ”.
At this point the driver can further examine the request. If IRP_MJ_DEVICE_CONTROL was received, it
can look at the control code and the input buffer. If it’s IRP_MJ_WRITE, it can look at the user’s buffer,
and so on.
This driver can be extended to capture these requests and store them in some list (as we did in
chapters 8 and 9, for example), and then allow a user mode client to query for this information. This
is left as an exercise for the reader.
Finally, since we don’t want to hurt the operation of the target device, we’ll pass the request along
unchanged:
IoSkipCurrentIrpStackLocation(Irp);
return IoCallDriver(ext->LowerDeviceObject, Irp);
}
The DevMonDeviceControl function mentioned earlier is the driver's handler for IRP_MJ_DEVICE_CONTROL.
This is used to add or remove devices from filtering dynamically. The defined control codes
are as follows (in KDevMonCommon.h):
#define IOCTL_DEVMON_ADD_DEVICE \
CTL_CODE(DEVMON_DEVICE, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_REMOVE_DEVICE \
CTL_CODE(DEVMON_DEVICE, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_REMOVE_ALL \
CTL_CODE(DEVMON_DEVICE, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_START_MONITOR \
CTL_CODE(DEVMON_DEVICE, 0x803, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_STOP_MONITOR \
CTL_CODE(DEVMON_DEVICE, 0x804, METHOD_NEITHER, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_ADD_DRIVER \
CTL_CODE(DEVMON_DEVICE, 0x805, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DEVMON_REMOVE_DRIVER \
CTL_CODE(DEVMON_DEVICE, 0x806, METHOD_BUFFERED, FILE_ANY_ACCESS)
switch (code) {
case IOCTL_DEVMON_ADD_DEVICE:
case IOCTL_DEVMON_REMOVE_DEVICE:
{
auto buffer = (WCHAR*)Irp->AssociatedIrp.SystemBuffer;
auto len = stack->Parameters.DeviceIoControl.InputBufferLength;
if (buffer == nullptr || len < 2 || len > 512) {
status = STATUS_INVALID_BUFFER_SIZE;
break;
}
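//
// a sketch of the rest of this case (status mapping assumed): make sure the
// name is NUL-terminated, then add or remove the device by name
//
buffer[len / sizeof(WCHAR) - 1] = L'\0';
if (code == IOCTL_DEVMON_ADD_DEVICE)
status = g_Data.AddDevice(buffer);
else
status = g_Data.RemoveDevice(buffer) ? STATUS_SUCCESS : STATUS_NOT_FOUND;
break;
}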
case IOCTL_DEVMON_REMOVE_ALL:
{
g_Data.RemoveAllDevices();
status = STATUS_SUCCESS;
break;
}
}
Here is the main function of the user mode client (very little error handling):
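// a sketch of the opening of main; the symbolic link name and the Usage/Error
// helpers are assumptions based on the rest of the listing
int wmain(int argc, const wchar_t* argv[]) {
if (argc < 2)
return Usage();
auto cmd = argv[1];
HANDLE hDevice = CreateFile(L"\\\\.\\KDevMon", GENERIC_READ | GENERIC_WRITE,
0, nullptr, OPEN_EXISTING, 0, nullptr);
if (hDevice == INVALID_HANDLE_VALUE)
return Error("Failed to open device");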
DWORD bytes;
if (_wcsicmp(cmd, L"add") == 0) {
if (!DeviceIoControl(hDevice, IOCTL_DEVMON_ADD_DEVICE, argv[2],
DWORD(::wcslen(argv[2]) + 1) * sizeof(WCHAR), nullptr, 0,
&bytes, nullptr))
return Error("Failed in add device");
printf("Add device %ws successful.\n", argv[2]);
return 0;
}
else if (_wcsicmp(cmd, L"remove") == 0) {
if (!DeviceIoControl(hDevice, IOCTL_DEVMON_REMOVE_DEVICE, argv[2],
DWORD(::wcslen(argv[2]) + 1) * sizeof(WCHAR), nullptr, 0,
&bytes, nullptr))
return Error("Failed in remove device");
printf("Remove device %ws successful.\n", argv[2]);
return 0;
}
else if (_wcsicmp(cmd, L"clear") == 0) {
if (!DeviceIoControl(hDevice, IOCTL_DEVMON_REMOVE_ALL,
nullptr, 0, nullptr, 0, &bytes, nullptr))
return Error("Failed in remove all devices");
printf("Removed all devices successful.\n");
}
else {
printf("Unknown command.\n");
return Usage();
}
return 0;
}
sc start devmon
As a first example, we’ll launch Process Explorer (must be running elevated so its driver can be
installed if needed), and filter requests coming to it:
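Assuming the compiled client is named devmon.exe, the command might look like this:
devmon add \device\procexp152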
Remember that WinObj shows a device named ProcExp152 in the Device directory of the object
manager namespace. We can launch DbgView from SysInternals elevated, and configure it to log
kernel output. Here is some example output:
It should be no surprise to find out the process ID of Process Explorer on that machine is 5432 (and
it has a thread with ID 8820). Clearly, Process Explorer sends requests to its driver on a regular basis,
and it's always IRP_MJ_DEVICE_CONTROL.
The devices that we can filter can be viewed with WinObj, mostly in the Device directory, shown in
Figure 15-14.
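We can also attach to the keyboard class device, using the same assumed client executable (the device name is the one shown by WinObj):
devmon add \device\keyboardclass0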
Now press some keys. You’ll see that for every key pressed you get a line of output. Here is some of
it:
What is this process 612? This is an instance of Csrss.exe running in the user’s session. One of Csrss’
duties is to get data from input devices. Notice it’s a read operation, which means some response
buffer is expected from the keyboard class driver. But how can we get it? We’ll get to that in the next
section.
You can try out other devices. Some may fail to attach (typically those that are open for exclusive
access), and some are not suited for this kind of filtering, especially file system drivers.
Here is an example with the Multiple UNC Provider device (MUP):
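With the same assumed client executable:
devmon add \device\mup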
Navigate to some network folder and you’ll see lots of activity similar to what you see here:
056 11:47:25 driver: \FileSystem\FltMgr: PID: 7212, TID: 8272, MJ=2 (IRP_MJ_CLOSE)
057 11:47:25 driver: \FileSystem\FltMgr: PID: 7212, TID: 8272, MJ=5 (IRP_MJ_QUERY_INFORMATION)
...
094 11:47:25 driver: \FileSystem\FltMgr: PID: 6164, TID: 6620, MJ=0 (IRP_MJ_CREATE)
095 11:47:25 driver: \FileSystem\FltMgr: PID: 7212, TID: 7288, MJ=0 (IRP_MJ_CREATE)
096 11:47:25 driver: \FileSystem\FltMgr: PID: 6164, TID: 6620, MJ=5 (IRP_MJ_QUERY_INFORMATION)
097 11:47:25 driver: \FileSystem\FltMgr: PID: 6164, TID: 6620, MJ=18 (IRP_MJ_CLEANUP)
098 11:47:25 driver: \FileSystem\FltMgr: PID: 7212, TID: 7288, MJ=5 (IRP_MJ_QUERY_INFORMATION)
099 11:47:25 driver: \FileSystem\FltMgr: PID: 6164, TID: 6620, MJ=2 (IRP_MJ_CLOSE)
100 11:47:25 driver: \FileSystem\FltMgr: PID: 7212, TID: 7288, MJ=12 (IRP_MJ_DIRECTORY_CONTROL)
...
Notice that the layering is on top of the Filter Manager we met in chapter 12. Also notice that multiple
processes are involved (both are Explorer.exe instances). The MUP device represents a volume for the remote
file system. This type of device is best filtered with a file system mini-filter.
Results of Requests
The generic dispatch handler we have for the DevMon driver only sees requests coming in. These can
be examined, but an interesting question remains - how can we get the results of the request? Some
driver down the device stack is going to call IoCompleteRequest. If the driver is interested in the
results, it must set up an I/O completion routine.
As discussed in chapter 7, completion routines are invoked in reverse order of registration when
IoCompleteRequest is called. Each layer in the device stack (except the lowest one) can set up a
completion routine to be called as part of request completion. At this time, the driver can inspect the
IRP’s status, examine output buffers, etc.
Setting up a completion routine is done with IoSetCompletionRoutine or (better)
IoSetCompletionRoutineEx. Here is the latter’s prototype:
NTSTATUS IoSetCompletionRoutineEx (
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_ PIRP Irp,
    _In_ PIO_COMPLETION_ROUTINE CompletionRoutine,
    _In_opt_ PVOID Context, // driver defined
    _In_ BOOLEAN InvokeOnSuccess,
    _In_ BOOLEAN InvokeOnError,
    _In_ BOOLEAN InvokeOnCancel);
Most of the parameters are pretty self-explanatory. The last three parameters indicate for which IRP
completion status to invoke the completion routine:
• If InvokeOnSuccess is TRUE, the completion routine is called if the IRP's status passes the NT_SUCCESS macro.
• If InvokeOnError is TRUE, the completion routine is called if the IRP's status fails the NT_SUCCESS macro.
• If InvokeOnCancel is TRUE, the completion routine is called if the IRP's status is STATUS_CANCELLED, which means the request has been canceled.
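For example, a filter driver such as DevMon could register a completion routine just before forwarding an IRP to the lower device. The following is a minimal sketch; the DEVICE_EXTENSION layout and the OnRequestComplete name are assumptions for illustration, not the book's actual code:

struct DEVICE_EXTENSION {
    PDEVICE_OBJECT LowerDeviceObject;   // saved when attaching to the device stack
};

// completion routine - its prototype is shown below
NTSTATUS OnRequestComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context);

NTSTATUS ForwardWithCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    auto ext = (DEVICE_EXTENSION*)DeviceObject->DeviceExtension;

    // give the lower driver its own copy of our stack location
    IoCopyCurrentIrpStackLocationToNext(Irp);

    // ask to be called back on success, error and cancel
    NTSTATUS status = IoSetCompletionRoutineEx(DeviceObject, Irp,
        OnRequestComplete, nullptr, TRUE, TRUE, TRUE);
    if (!NT_SUCCESS(status)) {
        // registration failed (allocation failure) - forward without a completion
        // routine; the copied stack location is still valid
    }
    return IoCallDriver(ext->LowerDeviceObject, Irp);
}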
The completion routine itself has the following prototype:
NTSTATUS CompletionRoutine (
    _In_ PDEVICE_OBJECT DeviceObject,
    _In_ PIRP Irp,
    _In_opt_ PVOID Context);
The completion routine is called by an arbitrary thread (the one that called IoCompleteRequest), at IRQL
up to DISPATCH_LEVEL (2). Since the caller's IRQL cannot be known in advance, all the rules for IRQL 2 must be followed.
What can the completion routine do? It can examine the IRP’s status and buffers, and can call
IoGetCurrentIrpStackLocation to get more information from the IO_STACK_LOCATION. It must
not call IoCompleteRequest, because this already happened (this is the reason we are in the
completion routine in the first place).
What about the return status? There are actually only two options here: STATUS_MORE_PROCESSING_REQUIRED
and everything else. Returning that special status tells the I/O manager to stop propagating
the IRP up the device stack and “undo” the fact the IRP was completed. The driver claims ownership
of the IRP and must eventually call IoCompleteRequest again (this is not an error). This option is
mostly for hardware-based drivers and will not be discussed further in this book.
Any other status returned from the completion routine continues the propagation of the IRP up the device
stack, possibly invoking completion routines registered by upper-layer drivers. In this case, the driver must
mark the IRP as pending if a lower driver marked it as such:
if (Irp->PendingReturned)
    IoMarkIrpPending(Irp); // sets SL_PENDING_RETURNED in irpStackLoc->Control
This is necessary because of what the I/O manager does after each completion routine returns. Roughly, the logic looks like this (a simplified pseudo-code sketch, not the actual I/O manager source):
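// Simplified pseudo-code of the I/O manager's completion processing (illustration
// only). It shows why the IoMarkIrpPending idiom above is needed: when a completion
// routine is registered, the I/O manager does not propagate the pending bit itself.
for (each IO_STACK_LOCATION, from the lowest driver upward) {
    Irp->PendingReturned = (stackLoc->Control & SL_PENDING_RETURNED) != 0;
    if (a completion routine is registered and its invoke conditions match) {
        status = CompletionRoutine(DeviceObject, Irp, Context);
        if (status == STATUS_MORE_PROCESSING_REQUIRED)
            return;     // the driver now owns the IRP
    }
    else if (Irp->PendingReturned) {
        IoMarkIrpPending(Irp);  // propagate the pending bit to the next-higher location
    }
}
// at the top of the stack, Irp->PendingReturned determines whether completion
// (event signaling / APC delivery) is performed asynchronously in the requesting thread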
The exact reasons for all these intricacies are beyond the scope of this book. The best
source of information on these topics is Walter Oney’s excellent book, “Programming
the Windows Driver Model”, second edition (MS Press, 2003). Although the book is old
(covering Windows XP), and it’s about hardware device drivers only, it’s still quite relevant
and has some great information.
Driver Hooking
Using the filter drivers described in this chapter and in chapter 12 provides a lot of power to a driver
developer: the ability to intercept requests to almost any device. In this section, I'd like to mention
another technique that, although not "official", may be quite useful in certain cases.
This driver hooking technique is based on the idea of replacing the dispatch routine pointers of a running
driver. This automatically provides "filtering" for all devices managed by that driver. The hooking
driver saves the old function pointers and then replaces the major function array in the driver object
with its own functions. Now any request coming to a device under the control of the hooked driver will
invoke the hooking driver's dispatch routines. There are no extra device objects or any attaching going
on here.
Some drivers are protected by PatchGuard against these kinds of hooks. A canonical
example is the NTFS file system driver, which - on Windows 8 and later - cannot be hooked
in this way. If it is, the system will crash within a few minutes.
PatchGuard (officially called Kernel Patch Protection) is a kernel mechanism that hashes
various data structures considered important, and if any change is detected, crashes
the system. A classic example is the System Service Dispatch Table (SSDT), which
points to system services (system calls).
Drivers have names and thus are part of the Object Manager's namespace, residing in the Driver
directory, shown with WinObj in Figure 15-15 (WinObj must run elevated to see the contents of the Driver
directory).
To hook a driver, we need to locate the driver object pointer (DRIVER_OBJECT), and to do that we can
use an undocumented, but exported, function that can locate any object given its name:
NTSTATUS ObReferenceObjectByName (
    _In_ PUNICODE_STRING ObjectPath,
    _In_ ULONG Attributes,
    _In_opt_ PACCESS_STATE PassedAccessState,
    _In_opt_ ACCESS_MASK DesiredAccess,
    _In_ POBJECT_TYPE ObjectType,
    _In_ KPROCESSOR_MODE AccessMode,
    _Inout_opt_ PVOID ParseContext,
    _Out_ PVOID *Object);
For example, here is how to obtain a pointer to the keyboard class driver object:
UNICODE_STRING name;
RtlInitUnicodeString(&name, L"\\driver\\kbdclass");
PDRIVER_OBJECT driver;
auto status = ObReferenceObjectByName(&name, OBJ_CASE_INSENSITIVE,
    nullptr, 0, *IoDriverObjectType, KernelMode,
    nullptr, (PVOID*)&driver);
if (NT_SUCCESS(status)) {
    // manipulate driver
    ObDereferenceObject(driver); // eventually
}
The hooking driver can now replace the major function pointers, the unload routine, the add-device
routine, and so on. Any such replacement should save the previous function pointer, both for unhooking
later and for forwarding requests to the real driver. Since each replacement must be atomic, it's best to
perform the exchange with InterlockedExchangePointer.
The following snippet sketches the technique for a single major function. The names used here (such as g_OriginalDeviceControl and HookedDeviceControl) are illustrative and not taken from the book's sample code:
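PDRIVER_DISPATCH g_OriginalDeviceControl;

NTSTATUS HookedDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    // inspect/log the request here, then forward it to the original routine
    return g_OriginalDeviceControl(DeviceObject, Irp);
}

void HookDriver(PDRIVER_OBJECT driver) {
    // atomically swap the dispatch pointer, remembering the original for
    // forwarding and for unhooking later
    g_OriginalDeviceControl = (PDRIVER_DISPATCH)InterlockedExchangePointer(
        (PVOID*)&driver->MajorFunction[IRP_MJ_DEVICE_CONTROL],
        (PVOID)HookedDeviceControl);
}

void UnhookDriver(PDRIVER_OBJECT driver) {
    InterlockedExchangePointer(
        (PVOID*)&driver->MajorFunction[IRP_MJ_DEVICE_CONTROL],
        (PVOID)g_OriginalDeviceControl);
}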
A fairly complete example of this hooking technique can be found in my DriverMon project on
GitHub at https://github.com/zodiacon/DriverMon.
Implement a driver that hooks other drivers using this technique. Create a user-mode
client that can hook a specified driver on the command line.
Kernel Libraries
In the course of writing the drivers in this book, we developed some classes and helper functions that can
be used in multiple drivers. It makes sense to package them in a single library that we can reference,
instead of copying source files from project to project.
The project templates provided with the WDK don’t explicitly provide a static library for drivers, but
it’s fairly easy to make one. The way to do this is to create a normal driver project (based on WDM
Empty Driver for example), and then just change the project type to a static library as shown in Figure
15-16.
A driver project that wants to link to this library just needs to add a reference with Visual Studio
by right-clicking the References node in Solution Explorer, choosing Add Reference… and checking
the library project. Figure 15-17 shows the references node of an example driver after adding the
reference.
Summary
Kernel programming is a vast topic, some parts of which we covered in this book. Obviously, there
is more. Most kernel driver topics are documented in the WDK, and if you followed the book you
should have a much easier time reading that documentation.
Appendix: The Kernel Template Library
The KTL is a work in progress. Interested readers are welcome to contribute by providing pull
requests and raising issues. The KTL can be found at
https://github.com/zodiacon/windowskernelprogrammingbook2e/tree/master/ktl
Standard Library
The std.h file adds support for move semantics with a std::move function that behaves like its user-
mode counterpart. This allows adding move semantics to kernel types.
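For reference, std::move can be implemented in a handful of lines even in kernel mode. The following sketch shows the standard definition; the actual code in std.h may differ in details:

namespace std {
    // remove_reference strips the reference from a type
    template<typename T> struct remove_reference       { using type = T; };
    template<typename T> struct remove_reference<T&>   { using type = T; };
    template<typename T> struct remove_reference<T&&>  { using type = T; };

    // move casts its argument to an rvalue reference so that move
    // constructors and move assignment operators can be selected
    template<typename T>
    constexpr typename remove_reference<T>::type&& move(T&& value) noexcept {
        return static_cast<typename remove_reference<T>::type&&>(value);
    }
}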
Synchronization
Several wrappers are provided to deal with thread and processor synchronization. All have an Init
method, as well as Lock and Unlock.
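As an illustration of the pattern (the exact KTL class names and members may differ), a wrapper over a fast mutex and a generic RAII guard could look like this:

// wrapper over a fast mutex exposing the common Init/Lock/Unlock interface
struct FastMutex {
    void Init()   { ExInitializeFastMutex(&_mutex); }
    void Lock()   { ExAcquireFastMutex(&_mutex); }
    void Unlock() { ExReleaseFastMutex(&_mutex); }
private:
    FAST_MUTEX _mutex;
};

// generic RAII guard usable with any type that exposes Lock/Unlock
template<typename TLock>
struct Locker {
    explicit Locker(TLock& lock) : _lock(lock) { _lock.Lock(); }
    ~Locker() { _lock.Unlock(); }
private:
    TLock& _lock;
};

A typical use is declaring Locker<FastMutex> locker(m_Lock); at the top of a scope that needs the lock, so Unlock is guaranteed on every return path.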
Memory
The new and delete operators are overloaded, with an enumeration that makes it less likely to get the
pool flags wrong (memory.h and Memory.cpp).
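A hedged sketch of how such overloads might look (the enumeration and signatures here are illustrative; the real definitions are in the KTL's memory.h and Memory.cpp):

enum class PoolType : ULONG64 {
    Paged    = POOL_FLAG_PAGED,
    NonPaged = POOL_FLAG_NON_PAGED,
};

void* __cdecl operator new(size_t size, PoolType pool, ULONG tag) {
    // the scoped enum prevents passing arbitrary integers as pool flags
    return ExAllocatePool2(static_cast<POOL_FLAGS>(pool), size, tag);
}

void __cdecl operator delete(void* p, size_t) {
    if (p)
        ExFreePool(p);
}

A driver can then write, for example, new (PoolType::NonPaged, DRIVER_TAG) MyData, where DRIVER_TAG is whatever pool tag the driver uses.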
The LookasideList template class is a wrapper around lookaside lists (either paged or non-paged).
See LookasideList.h.
Strings
The BasicString<> template class provides support for a variable-length string, either UTF-16 or
ANSI, based on template arguments.
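The following is a purely illustrative sketch of the idea, not the KTL's actual declaration (see the strings header in the repository for the real one): the character type is a template argument, so the same code serves UTF-16 and ANSI strings.

template<typename TChar>
class BasicString {
public:
    bool Init(const TChar* s, size_t chars, POOL_FLAGS pool = POOL_FLAG_PAGED) {
        _buffer = (TChar*)ExAllocatePool2(pool, (chars + 1) * sizeof(TChar), 'rtSk');
        if (_buffer == nullptr)
            return false;
        memcpy(_buffer, s, chars * sizeof(TChar));
        _buffer[chars] = 0;     // keep the string null-terminated
        _length = chars;
        return true;
    }
    ~BasicString() {
        if (_buffer)
            ExFreePool(_buffer);
    }
private:
    TChar* _buffer{ nullptr };
    size_t _length{ 0 };
};

using WString = BasicString<WCHAR>;  // UTF-16
using AString = BasicString<char>;   // ANSI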
Containers
The Vector<> template class abstracts a dynamic array of objects that are trivially constructible and
copyable, i.e., objects that don't manage dynamic memory internally. Examples are integers and plain structures.