Forming Faster Firmware Fuzzers

Lukas Seidel (1), Dominik Maier (2), and Marius Muench (3)

(1) Qwiet AI
(2) TU Berlin
(3) VU Amsterdam and University of Birmingham

Abstract

A recent trend for assessing the security of an embedded system’s firmware is rehosting, the art of running the firmware in a virtualized environment, rather than on the original hardware platform. One significant use case for firmware rehosting is fuzzing to dynamically uncover security vulnerabilities. However, state-of-the-art implementations suffer from high emulator-induced overhead, leading to less-than-optimal execution speeds. Instead of emulation, we propose near-native rehosting: running embedded firmware as a Linux userspace process on a high-performance system that shares the instruction set family with the targeted device. We implement this approach with SAFIREFUZZ, a throughput-optimized rehosting and fuzzing framework for ARM Cortex-M firmware. SAFIREFUZZ takes monolithic binary-only firmware images and uses high-level emulation (HLE) and dynamic binary rewriting to run them on far more powerful hardware with low overhead. By replicating experiments of HALucinator, the state-of-the-art HLE-based rehosting system for binary firmware, we show that SAFIREFUZZ can provide a 690x throughput increase on average during 24-hour fuzzing campaigns while covering up to 30% more basic blocks.

1 Introduction

Embedded systems have become ubiquitous. These special-purpose computing devices are employed in all areas of everyday life, such as automotive systems, networking equipment, healthcare machines, smart-home devices, and more. Since they become ever more connected, protecting their confidentiality, integrity and availability becomes increasingly important.

In contrast to traditional computers, embedded devices often do not run a full-fledged operating system, but a monolithic software stack that handles every aspect of the system: memory management, interrupts, processing of user data, and hardware interactions. This so-called device firmware often exposes a lot of complex functionality, such as drivers and custom parsers, to external sources that may be attacker-controlled, for example via wireless data over the air, or even from TCP packets. Given the importance of embedded systems and the extended attack surface, testing their firmware’s security is essential.

A common approach for security analysis is fuzzing, an automated process in which a fuzzer feeds semi-random, potentially malformed input to the target, to find buggy corner cases. Due to its effectiveness, the field has become an increasingly popular research topic [41]. The recent field of rehosting commonly enables fuzzing of an embedded device’s firmware by creating virtual execution environments [24, 47]. To automatically generate inputs that are accepted by the target, modern fuzzers use feedback about the target’s last execution to choose which inputs to mutate further. As the inputs gradually cover a larger amount of the target’s code for each execution, throughput naturally decreases. Since every testcase spends more time in the target, the fuzzer can execute fewer testcases per second, which slows down further exploration of the state space. Hence, one angle to further increase fuzzing efficacy is to vastly increase execution speed. Testing the program against the same inputs in less time gives the fuzzer more time to explore the target further.

In this paper, we propose SAFIREFUZZ, a new and performant rehosting and fuzzing approach for embedded binary-only ARM firmware. Instead of rehosting target firmware using a general-purpose emulator, as done by prior work (e.g., [20, 33, 35, 46, 50]), we deploy a technique which we term near-native rehosting. The core insight behind our approach is that powerful server and desktop ARM computing devices provide execution modes and instruction sets sufficiently similar to the ones present on embedded systems. Based on this observation, we create a dynamic binary rewriting engine to run firmware directly on a more powerful system, while inserting fuzzing instrumentation on-the-fly. To deal with hardware interactions, we follow the High-Level Emulation (HLE) approach proposed by HALucinator [20] and replace hardware interactions with HAL-based hooks.

As we will show, our approach significantly improves execution speed when compared with recent rehosting frameworks, leading to improved fuzzing efficacy and the discovery of previously undetected bugs. In particular, we evaluate SAFIREFUZZ against HALucinator [20], the state-of-the-art HLE rehosting approach, and Fuzzware, a recent peripheral-modeling-based rehosting approach. Our evaluation shows that SAFIREFUZZ can provide an average speedup of 690x when compared to HALucinator and up to 147x compared to Fuzzware, resulting in the discovery of up to 30% additional basic blocks during 24-hour fuzzing runs.

In summary, we make the following contributions:

• We propose SAFIREFUZZ: a high-performance near-native rehosting framework for interactive execution of embedded ARM firmware.

• We demonstrate its applicability for highly efficient fuzzing of ARMv7-M binary firmware images. For this, we tightly integrate in-process fuzzing with dynamic binary rewriting techniques, in conjunction with Hardware Abstraction Layer (HAL) function hooking.

• We evaluate SAFIREFUZZ by implementing fuzzing harnesses for 12 firmware samples from the HALucinator test suite and compare its performance against recent rehosting approaches. We also rehost two new samples from scratch. Our evaluation shows that near-native rehosting outperforms rehosting approaches built on top of general-purpose emulators.

2 Background

2.1 Embedded Systems & Firmware

Embedded systems are characterized by their use of firmware, which is responsible for driving the device’s hardware and offering higher-level functionalities at the same time. For interacting with the hardware, the firmware typically uses one of the following channels.

Memory-Mapped Input/Output (MMIO) assigns a range in the physical memory to each peripheral. Each of these ranges is divided into MMIO registers. Accessing these registers allows the firmware to directly interact with its peripherals, for instance, to read data from an external source or to turn on an LED. Port-Mapped Input/Output (PMIO) behaves similarly to MMIO, with the difference that specialized instructions enable the interaction via IO ports. Direct Memory Access (DMA) allows firmware to bypass the CPU when transferring data between peripherals and main memory. Instead of blocking the CPU for the whole duration of the slow memory transfer, the CPU only needs to initiate the process by interacting with a specialized DMA controller peripheral. Peripherals indicate the occurrence of specific events (e.g., the arrival of new data) via Interrupts. Based on the configuration of the device’s interrupt controller, the firmware then resumes execution at a designated Interrupt Service Routine (ISR).

2.2 ARM Cortex-A/M

ARM is one of the most popular instruction set architecture families for embedded systems [12]. In particular, the 32-bit ARMv7-M and ARMv7-A variants are widely used, due to their low cost and energy efficiency. ARMv7-A, targeting more complex embedded systems, features two different execution modes: In ARM mode, the processor executes instructions with a fixed size of four bytes at a four-byte alignment. In Thumb mode, instructions consist of either two or four bytes, and the resulting two-byte alignment allows for much denser packing of code [3], which is favorable for resource-constrained embedded systems. These modes are switchable on the fly via the otherwise unused least significant bit of branch targets, where a 1 indicates execution in Thumb mode and a 0 execution in ARM mode at the target location. ARMv7-M, on the other hand, specifically targets microcontrollers and only implements the Thumb-2 instruction set.

Beyond that, the more recent instruction set families ARMv8-A and ARMv9-A added the AArch64 extension, which provides a 64-bit instruction set. While CPUs implementing these families usually target mobile and desktop devices, the extension provides support for executing in 32-bit mode on lower exception levels (i.e., EL0 and EL1) while using a 64-bit OS or hypervisor.

2.3 Fuzzing

Fuzzing is a popular approach for automatic vulnerability discovery, able to uncover a multitude of different vulnerabilities. These include, but are not limited to, memory corruption bugs, such as buffer overflows, double frees and use-after-frees, and logic bugs, such as integer overflows, infinite loops and even race conditions. In coverage-guided fuzzing, the fuzzer uses execution feedback to determine interesting inputs. For this, the fuzzer adds instrumentation to the target program, either at compile-time, if source code is available, or later on a binary level. This instrumentation reports coverage information back to the fuzzing engine, e.g., by tracking executed branches in a bit map. When an input generates unique coverage, it is added to the fuzzing corpus to be subsequently mutated for the generation of new test cases.

3 Motivation

Rehosting, the automated creation of virtual execution environments of embedded firmware, combines multiple approaches to overcome the challenges associated with emulating the (potentially unknown) peripherals of an embedded system [24]. While early work relied on hardware-in-the-loop emulation to offload unknown accesses to a physical device [35, 37, 48], hardware-less rehosting became the de-facto standard to enable coverage-guided fuzzing of embedded systems [20, 25, 31, 33, 40, 45, 46, 51].
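As a brief aside on the coverage feedback described in Section 2.3, the following is a minimal sketch of an AFL-style edge bitmap. It is not SAFIREFUZZ's actual data structure: `CoverageMap`, `has_new_coverage`, the map size and the edge hash are all assumptions chosen for illustration.

```rust
/// Number of map entries; a power of two so that the XOR below stays in range.
/// The value is an illustrative assumption, not taken from the paper.
const MAP_SIZE: usize = 1 << 16;

/// Hypothetical per-run edge coverage map.
pub struct CoverageMap {
    hits: Vec<u8>,
    prev: usize,
}

impl CoverageMap {
    pub fn new() -> Self {
        CoverageMap { hits: vec![0; MAP_SIZE], prev: 0 }
    }

    /// Record the edge from the previously executed block to `block_id`.
    pub fn visit(&mut self, block_id: usize) {
        let cur = block_id % MAP_SIZE;
        let idx = cur ^ self.prev;
        self.hits[idx] = self.hits[idx].saturating_add(1);
        // Shifting makes the edges A->B and B->A hash to different slots.
        self.prev = cur >> 1;
    }
}

/// Returns true if `run` hit an edge that `seen` has not recorded yet,
/// and marks all edges of `run` as seen.
pub fn has_new_coverage(seen: &mut [bool], run: &CoverageMap) -> bool {
    let mut new = false;
    for (s, &h) in seen.iter_mut().zip(run.hits.iter()) {
        if h > 0 && !*s {
            *s = true;
            new = true;
        }
    }
    new
}
```

A fuzzer built on such a map would keep an input in its corpus only when `has_new_coverage` returns true for the run that input produced.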
Rehosting System      Approach               Emulator   Binary
PRETENDER [31]        Pattern-based          QEMU       y
HALucinator [20]      HAL-based              Unicorn    y
PartEmu [32]          Pattern-based          QEMU       y
P2IM [25]             Pattern-based          QEMU       y
Frankenstein [45]     HAL-based              QEMU       y
Para-Rehosting [38]   HAL-based              N/A        n
JetSet [34]           Symbolic Execution     QEMU       y
uEmu [51]             Symbolic Execution     S2E        y
FirmWire [33]         Pattern-based          Panda      y
FUZZWARE [46]         Symbolic Execution     Unicorn    y
SEmu [52]             Specification-guided   QEMU       y*
* Requires access to device documentation.

Table 1: Survey of recent firmware rehosting solutions.

Scharnowski et al. [46] classify the approaches to overcome unknown peripheral behavior deployed by hardware-less rehosting approaches in three categories: (1) high-level emulation, (2) pattern-based MMIO modeling, and (3) symbolic-execution-based approaches. The first category aims to eliminate hardware accesses of virtualized firmware by hooking library functions of the hardware-abstraction layer. The second approach uses heuristics to categorize MMIO registers and then uses pre-defined models to respond to accesses. The third approach aims to resolve values advancing firmware execution on-the-fly via symbolic execution. Additionally, recent work [52] proposes specification-guided emulation in which MMIO peripheral models are derived from datasheets and device documentation.

We survey recent rehosting systems in Table 1 and note that all approaches have been used to implement fuzzing campaigns for firmware. However, we also observe that, except for Para-Rehosting [38], which requires source code access, all surveyed rehosting systems rely on already existing emulators to virtualize the firmware. We hypothesize that this is a hindrance to fuzzing, as readily available emulators are general-purpose tools and were never designed with fuzzing in mind.

A pathway to faster firmware fuzzing. Closer inspection of Table 1 yields another detail: All surveyed emulation-based rehosting systems either extend QEMU directly or build on top of a QEMU-based emulator. This raises QEMU’s emulation approach to the de-facto standard for firmware rehosting. While QEMU offers huge extensibility and support for various ISAs, we believe that relying solely on its emulation capabilities leads to trade-offs for fuzzing efficacy. In particular, when inspecting the state of the art, we observe the following commonly accepted performance roadblocks:

[R1] Binary Lifting & Recompilation. QEMU lifts guest code to TinyCode, its internal intermediate representation, before applying instrumentation, and then Just-In-Time (JIT) compiles each block to the host architecture. While this results in support for various instruction sets, most rehosting work focuses solely on the ARM architecture. Hence, we argue that a high-throughput firmware fuzzer may consider alternative strategies for additional performance, such as direct binary translation or binary rewriting.

[R2] Expensive Dispatch of Memory Accesses. Monolithic firmware usually resides in a single flat address space. However, QEMU was developed for more complex systems deploying an MMU. To emulate this across all systems, its so-called SoftMMU dispatches memory accesses. This leads to significant performance overhead [17]. Being able to directly access the guest memory without indirection would greatly benefit guest execution speed, and hence, fuzzing.

[R3] Basic Block Caching & Chaining. One of the core performance optimizations of QEMU is the ability to cache already translated blocks and chain the execution of multiple blocks together. However, this optimization was not available in early adaptations of AFL-QEMU [11]. While it has been mainlined in AFL++ [26], we observe that various rehosting solutions were developed on top of legacy versions, and the resulting lack of adoption severely hinders fuzzing performance.

[R4] Lack of In-Process Fuzzing. Up to now, rehosting solutions run their fuzzing engine in a separate process. However, this leads to unnecessary kernel interaction and context switches compared to a solution that embeds the fuzzer in the same process.

We note that some roadblocks were partially addressed by prior work. For instance, FirmWire [33] deploys the basic block caching & chaining optimizations and Frankenstein [45] uses QEMU’s user mode, which eliminates the need for a SoftMMU. However, to the best of our knowledge, no prior work systematically tackled all roadblocks and explored the possibility of a highly performant firmware fuzzer.

4 Design

4.1 Overview

SAFIREFUZZ acts as a highly efficient rehosting and execution engine for firmware fuzzing by overcoming the roadblocks described in Section 3. At the core of our proposed approach stands a technique we term near-native rehosting. Instead of emulating the firmware through lifting and recompilation (R1), we exploit the fact that certain ARMv8-A cores provide userspace compatibility with the AArch32 and Thumb instruction set variants. Hence, we can directly execute large parts of the firmware code on powerful cores through binary instrumentation. As we mirror the memory layout of the embedded device in userspace, rewritten instructions do not need
additional logic to dispatch memory accesses, circumventing (R2). Additionally, our rewriting approach is optimized to cache already instrumented blocks, minimizing engine overhead (R3). Lastly, we embed the fuzzing logic in the same process space as the engine and the rewritten firmware to minimize the required interactions with the host operating system (R4).

In the following, we will describe SAFIREFUZZ’s core engine, our dynamic rewriting approach, as well as our solution to rehosting challenges.

4.2 Rehosting & Rewriting Engine

SAFIREFUZZ’s engine is responsible for executing the target firmware, handling rehosting aspects, rewriting instructions, and insertion of fuzzing instrumentation. For dealing with unknown hardware peripherals, we loosely follow the High-Level Emulation approach by Clements et al. [20], as hooking the corresponding HAL functions embeds easily into our rewriting approach. Hence, the engine uses a firmware-specific harness. This harness initializes memory ranges and registers HAL hooks before target execution is started by rewriting the first basic block at the specified entry point.

The engine is responsible for translating basic blocks on demand while adding instrumentation for fuzzing, transferring execution to registered hooks where required, and deploying interrupt approximation mechanisms. While all of these tasks are crucial, we note that as little time as possible should be spent inside the engine’s code or in the operating system. Hence, the engine only rewrites basic blocks when their immediate predecessor gets executed for the first time, and keeps a cache of already instrumented blocks. During the initial rewriting and execution of a block, multiple jumps to the engine may be required. After an emitted basic block has been executed completely for the first time, the engine eliminates all now-unnecessary jumps from this block back into the engine. This way, rewrites of successive blocks and branch resolution have to happen only once. The instrumentation overhead is constant for one engine run.

4.3 Basic Block Rewriting

The simplest approach to dynamic rewriting is the in-place replacement of instructions. While solutions following this approach exist to replace single instructions [22], we argue that this is not possible in the general case, and especially not for ARMv7-M, as replacement instructions may be larger than the original ones. On top, we need to insert additional instructions for instrumentation and hooking. This shifts the position of instructions relative to each other. As many instructions on ARM operate in a PC-relative fashion, jump targets cannot be guaranteed to be preserved. While calculating targets of dynamic branches and the required alterations are possible with control-flow recovery heuristics [44], inferring the register contents statically is non-trivial. Consequently, we instrument the binary dynamically at runtime. As no basic block is rewritten more than once, the approach’s one-time overhead is negligible. We describe the rewriting process in Algorithm 1 and further illustrate the rewriting process on a basic block level in Figure 1. One upside of our near-native rehosting approach is that a majority of all instructions require no rewriting at all, as there is no mismatch between the AArch32 execution state on the Cortex-A core we are utilizing and the Cortex-M the firmware was built for. Cortex-M processors make use of Thumb-2 instructions, of which most can be executed natively and without divergence on an ARMv8-A target platform with AArch32 mode support (see footnote 1). PC-modifying instructions and PC-relative memory accesses are the only two classes of instructions that require rewriting. This stands in contrast to the usually deployed processor-emulation-based rehosting techniques, where instructions for one architecture are executed on another and all instructions require translation.

Algorithm 1 Binary Rewriting in SAFIREFUZZ.
    for curr_addr in basic_block do
        if curr_addr in registered_hooks then
            insert(jump_to_hook)
            break
        end if
        insn = disas(curr_addr)
        if requires_rewriting(insn) then
            insert(patch(insn))
        else
            insert(insn)
        end if
        if is_delimiter(insn) then
            insert(branch_to(rewrite_successors))
            insert(branch_to(resolve_branch_target))
            break
        end if
    end for
    copy(∗execution_code_site, rewritten_basic_block)

Footnote 1: Exceptions are low-level system instructions like svc, swi, or mrc/mcr. However, due to our HAL-based rehosting approach, functions including them are never executed.

4.4 Function Hooking

While rewriting a new basic block, the engine checks whether a hook is registered for the current address. A user can supply such a function, written in a high-level programming language. The engine emits a jump to the user-supplied code that will then execute at block execution time. The hooking locations in SAFIREFUZZ are restricted to function hooks by design. As all (register) state we alter is shared between
Original Basic Block              Rewritten Basic Block      Rewritten Basic Block
                                                             after first Execution

0x10000: movs r0, #0              movs r0, #0                movs r0, #0
0x10002: movs r1, #0              movs r1, #0                movs r1, #0
0x10004: ldr r3, [pc, #0x30]      movt r3, #0x1              movt r3, #0x1
         (PC-relative: rewrite    movw r3, #0x34             movw r3, #0x33
         to load from absolute    ldr r3, [r3]               ldr r3, [r3]
         address)
0x10006: cmp r3, #1               cmp r3, #1                 cmp r3, #1
0x10008: beq #0x20e               push {r0-r12, lr}          b #12
                                  mov r0, #SUCC_0_ADDR       mov r0, #SUCC_0_ADDR
                                  blx rewrite_bb             blx rewrite_bb
                                  mov r0, #SUCC_1_ADDR       mov r0, #SUCC_1_ADDR
                                  blx rewrite_bb             blx rewrite_bb
                                  blx resolve_branch         blx resolve_branch
                                  pop {r0-r12, lr}           pop {r0-r12, lr}
                                  nop                        beq #RESOLVED_ADDR

Figure 1: Example of SAFIREFUZZ rewriting a basic block. The new block resides somewhere in the rewritten code site (cf. Fig. 2). Jumps into the engine are simplified and require setting up the branch target registers in reality. After the first slow-path execution, jumps back into the engine are skipped on subsequent hot-path executions. Context save and restore are also simplified and usually involve the preservation of processor condition flags.
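The loop of Algorithm 1 and the PC-relative load rewrite from Figure 1 can be tied together in a short sketch. This is not the engine's real code: the toy `Insn` and `Emitted` types stand in for Capstone disassembly and Keystone assembly, and all names are hypothetical. The only architectural fact used is that a Thumb literal load addresses `Align(PC, 4) + imm`, with PC reading as the instruction address plus 4, so the value computed here need not match the immediates printed in Figure 1.

```rust
use std::collections::HashSet;

/// Toy instruction model (assumption; a real engine would disassemble bytes).
#[derive(Debug, Clone, PartialEq)]
pub enum Insn {
    /// Instruction that can be copied verbatim.
    Plain(&'static str),
    /// `ldr rt, [pc, #imm]`
    LdrLiteral { rt: u8, imm: u32 },
    /// Branch to an original firmware address; delimits the basic block.
    Branch { target: u32 },
}

/// What the rewriter emits into the new block.
#[derive(Debug, Clone, PartialEq)]
pub enum Emitted {
    Copy(&'static str),
    JumpToHook(u32),
    /// Materialize `addr` in `rt`, then `ldr rt, [rt]`.
    AbsLoad { rt: u8, addr: u32 },
    RewriteSuccessors,
    ResolveBranch { target: u32 },
}

/// Absolute address read by a Thumb literal load located at `insn_addr`.
pub fn literal_address(insn_addr: u32, imm: u32) -> u32 {
    ((insn_addr + 4) & !3) + imm
}

/// Rewrite one basic block, following the structure of Algorithm 1.
pub fn rewrite_block(block: &[(u32, Insn)], hooks: &HashSet<u32>) -> Vec<Emitted> {
    let mut out = Vec::new();
    for (addr, insn) in block {
        // A registered hook replaces the rest of the block.
        if hooks.contains(addr) {
            out.push(Emitted::JumpToHook(*addr));
            break;
        }
        match insn {
            Insn::Plain(text) => out.push(Emitted::Copy(*text)),
            Insn::LdrLiteral { rt, imm } => out.push(Emitted::AbsLoad {
                rt: *rt,
                addr: literal_address(*addr, *imm),
            }),
            Insn::Branch { target } => {
                // Slow path: rewrite successors, then resolve the branch.
                out.push(Emitted::RewriteSuccessors);
                out.push(Emitted::ResolveBranch { target: *target });
                break;
            }
        }
    }
    out
}
```

Feeding the five instructions of the original block in Figure 1 through `rewrite_block` with an empty hook set yields six emitted items: four copied or patched instructions followed by the two slow-path calls.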

the executed firmware and user code, writing hooks on a per-instruction basis would be more complex when harnessing a new target. Also, we found that function hooks are in all cases sufficient to stub out hardware interactions for HLE rehosting. This restriction allows us to gain runtime performance, as we can make certain assumptions in our state save and restore routines, as long as the firmware respects common ARM calling conventions. Our approach’s introduced overhead is minimal: The context save and restore, including the jump into the hook, comprises five instructions. Such hooks, also called handlers, are designed to model the behavior of a part of the firmware. They replace calls to functions on the Hardware Abstraction Layer during execution and thereby mask peripheral accesses in the firmware. This procedure is the core of peripheral management in HLE-based systems and our near-native approach introduces no intrinsic drawbacks. Commonly implemented functionalities are simulation and handling of interrupts, accepting external data during fuzzing and making it available to the firmware, or replacing the memory allocator with a sanitizing one to increase the observability of security violations. As we model functions on the HAL in our approach, cross-firmware reuse is facilitated. Firmware images using the same system libraries or being developed for the same microcontroller often share common hardware abstractions. Which firmware functions are hooked with which handler is determined by the user within a harness (cf. Section 5.3). A firmware’s harness can be seen as a specification of domain knowledge in code. Harnesses are highly specific to a certain processor or microcontroller model and have to take care of handling a firmware’s characteristics, from setting an entry point to providing correctly mapped memory.

4.5 Interrupt Approximation

Many embedded devices rely on interrupts for signal delivery and the processing of asynchronous, external events. The SysTick timer, for instance, is present in many ARM microcontrollers. Monolithic Real-Time OSs (RTOS) use it to poll MMIO registers and schedule new tasks. Therefore, our rehosting solution requires a way of simulating interrupts in order to accurately execute such firmware.

For this purpose, we implement tick-based interrupts. Although it would be theoretically possible to translate interrupts and let the Cortex-A host system handle them in our near-native scenario, interrupts on embedded devices are commonly triggered by external peripherals and their ISRs will try to access them via MMIO, hence requiring a rewrite in any case. Previous rehosting work has shown that interrupt approximation is sufficient to model firmware behavior [25] and we see it as beneficial for fuzzing: our tick-based counter leads to higher determinism, thus resulting in reproducible and analyzable program traces. Improving on HALucinator, which uses a similar approach with basic-block-level counters to trigger a timer, our approach uses indirect call-level counters and manual clock-update hooks. We update timers and trigger interrupts at specific points in the execution. Hooking every single basic block would introduce unnecessary performance overhead.

5 Implementation

5.1 Engine Internals

We implemented SAFIREFUZZ using the Rust programming language. The engine core consists of 1481 source lines of code. Another 1716 make up the entirety of implemented
HAL handlers, the harnesses for the 14 evaluated targets cached. Finally, the address of the branch target is returned in
add up to 2360 lines. Disassembling is performed with Cap- r0. In the new basic block, we overwrite the register-specific
stone [4]. For the assembly of modified or new instructions to offset on the stack with the returned value, giving us control
be emitted, we use Keystone [7]. over the register content after restoring the execution con-
The engine handles all tasks to allow the execution of text. We provide additional details on how we handle further
firmware in a foreign environment. Hence, it performs many control-flow modifying instructions in Appendix A.2.
tasks an emulator has to take care of as well.
Context Switching. All jumps back into the engine require
Branch Resolution. When the engine rewrites a basic block a context save and restore. In case of a jump into user-defined
that closes with a static branch, it emits a call back into the hooks, the routine is minimal: The engine pushes r1 to r11
engine. On the first execution of the basic block, this jump into on the stack and pops them again upon the hook finishing.
the engine, also called slow path, gets replaced with a static Finally, the engine branches back to the Link Register (LR).
branch, resolving the original firmware address to the new This approach only induces negligible overhead. Furthermore,
location of the rewritten basic block. As such branch target it allows access to function parameters and the returning
calculations only need to be performed once, the removal of of values in a more natural way than in emulators such as
the jump into the engine reduces the overhead on forthcoming Unicorn. While in Unicorn a hook has to use API calls to
executions. Executing all these slow path functions exactly read and write register contents, our framework exposes the
once per basic block and skipping them afterward is a major natural Application Binary Interface (ABI). Before resolving
performance optimization. branch targets in the engine, a full context save is required.
If instructions directly modify the PC in a not statically- This includes all 13 general-purpose registers, the LR, as well
resolvable way, i.e., using register contents, the engine needs as condition flags in the Application Program Status Register
to resolve the target at runtime. These include BX, BLX and (APSR).
MOV instructions with PC as the target register. The routine
resolving the correct address of the target basic block in the
rewritten code range performs the following steps: If the
Memory Accesses. After branches, PC-relative memory ac-
target address resides in our rewritten code range, we assume
cesses are the second-largest class of instructions the engine
a tailcall. We then simply OR the address with 1 to make
needs to modify when rewriting basic blocks to another loca-
sure that we stay in Thumb mode and return this address.
tion. In ARMv7-M, small chunks of data are often co-located
Otherwise, a cache lookup is performed. The engine rewrites
with the basic blocks that load from them, using a PC-relative
the new basic block if it is not already cached. Subsequently,
LDR. As it is non-trivial to infer how large the data segment
the real address of the rewritten block is returned.
behind a basic block will be, copying the data during basic
The branch resolution function is also one of the anchors
block rewriting is non-trivial. We therefore statically resolve
for our engine’s interrupt approximation. Interrupts can be
registered as execute-every-nth-tick, where a tick is a BLX jump or the execution of a manually placed hook. Every time we resolve a dynamic jump, a branch counter is increased. If interrupts are globally enabled, and any interrupt handlers are registered, the engine triggers the interrupt if enough ticks have passed.

Jump tables are commonly compiled as loads of a fixed value from the data segment directly into the PC. When the engine encounters such LDR PC, [...] instructions upon rewriting, it emits a jump back into the engine. This handler function checks whether the specific load was already resolved and cached. In the case of the first invocation, the new basic block gets rewritten, and the memory location the load reads from is overwritten with the new address. On return from the engine, the original instruction is executed and reads the adjusted value, correctly adjusting the PC. ARMv7-M additionally features Table Branches, causing a PC-relative branch using an offset table. Again, execution is redirected back to the engine where it performs the table lookup and calculates the corresponding offset. With this information, the target basic block is instrumented and lifted, if it is not already

the address and replace the PC-relative load with an absolute one from the original code site. Figure 2 shows the engine's memory layout.

Caching. To minimize time spent inside the engine during hot-path execution, we cache various data points. Reassembling instructions with Capstone and Keystone is very expensive. Consequently, we cache and reuse blocks wherever possible. One example is wide branches (B.W) emitted when replacing branches referring to locations in the original code site with branches targeting rewritten blocks. As the machine code instruction encodes an offset and not an absolute address, it has a high potential for reuse. We insert the assembled bytes into a fixed-size array and perform the lookup by the required offset. Dynamic branching instructions such as BLX have to get resolved every time because register contents and hence the jump target might change. This requires a lookup, using the original address in the firmware to retrieve the equivalent basic block's location in the new code site. As our map key in both cases is a value in a linear address space with a well-defined upper bound, we can use a simple array as

the data structure. Here, the i-th element contains the address of the rewritten basic block for the original address i. This allows us to do lookups in one instruction. We store such a mapping not only for targets of BLX instructions but for every address for which the engine emitted a new block.

[Figure 2: diagram of the process address space. Labeled rehosting-specific regions: firmware image and data segment; firmware RAM incl. stack; rewritten code site; variable-size malloc arena for custom allocator; shown alongside regions (a), (b), and (c). Address markers: 0x0, code_len, 0x20000000, 0x20014000, 0x30000000, 0x30000000 + code_len*10, 0xff000000, 0xffffffff.]
Figure 2: Example Memory Layout of SAFIREFUZZ highlighting rehosting-specific regions. The rewritten code site contains the new basic blocks after processing through the engine. Firmware execution happens here. Regions in parentheses include usual process data: (a) contains the normal code segment, (b) contains shared libraries, (c) contains the program's stack.

Processor Cache Maintenance. As our engine not only emits new instructions once but also modifies already-emitted basic blocks, we have to deal with the ARMv7-A core's non-unified cache architecture. Such cores have separate, non-coherent caches for instruction and data accesses, potentially leading to problems and heavy inconsistencies when it comes to self-modifying code. While it is possible to overwrite instructions in memory at runtime, the processor might still execute old or invalid instructions due to missing coherence. After every instruction rewrite, i.e., overwriting an instruction in memory that was already executed at least once, we need to invalidate both caches for the corresponding memory range. The next fetch on these instructions will then cache-miss and the processor will correctly load the new version from memory. Cache flushes as well as cache misses are the exception. They are strictly necessary to guarantee the coherence of our rewriting approach, and the observed overhead during testing was negligible.

5.2 The Fuzzer

A core building block of our implementation is the fuzzer. LibAFL [27] provides the mutation backend. We chose its on-disk corpus to store the input queue as well as found objectives. The harness acts as the entry point to our engine and defines the firmware from the fuzzer's point of view. First, it retrieves the fuzzing input from LibAFL and then kicks off a single execution. As feedback mechanisms, we use a combination of a map observer, tracking the state of the coverage map, which is updated on every conditional branch, as well as the execution time and timeouts. For scheduling, we employ a strategy favoring small test cases. Mutations are performed following AFL's havoc approach, as implemented in LibAFL. The method involves bit flips, integer overwrites, block deletion and block duplication. Our framework also exposes the option to specify a token file the fuzzer will use to mutate new inputs. Tokens in AFL are domain-specific byte sequences, such as tags in HTML or XML, facilitating the fuzzer's job to generate meaningful input. LibAFL's Launcher component combines all these pieces and handles the launching and restarting of fuzzing processes.

Coverage Tracking. In order to enable coverage-guided fuzzing, the engine needs to track coverage and make it observable by the fuzzing backend. We implement non-colliding edge coverage tracking by setting values in a static bit map whose address and size are known at compile time. To track coverage, the engine inserts an additional basic block after every conditional and table branch. For every previously unseen edge, SAFIREFUZZ increases a global counter, acting as a unique identifier and index into the global bit map. The inserted instructions update the corresponding entry upon execution. This approach, with the new basic block consisting of seven instructions including only a single memory access, minimizes the introduced overhead while still providing the fuzzer with meaningful insights. We opted for a boolean coverage map as opposed to a hitcounts approach to further reduce overhead, as experiments have shown that plain edge coverage in many cases even outperforms AFL's default hitcount metric [28].

Parallelism. The LibAFL Launcher allows scaling to an arbitrary number of cores. Multiple fuzzing instances can be started automatically in parallel and information such as the coverage map is synchronized. Each core runs its own process with an individual engine instance without shared caches. This way we avoid executing expired or inconsistent views of the rewritten code, potentially resulting from one instance executing a basic block that is currently being rewritten by another, without expensive cache coherence attestation.

5.3 Harnessing

Harnesses in SAFIREFUZZ are supplied at compile time and are written in Rust, as is the rest of the framework. To allow
flexible configuration while maintaining usability, the engine exposes a set of interfaces a harness developer has to implement, i.e., functions the engine can call to handle various parts of the rehosting process. This includes but is not limited to a setup function, called once at engine initialization, e.g., to set up memory segments and copy the firmware image to the correct position, and a reset function handling memory restoration and resetting timers and the custom allocator.

6 Evaluation

In our evaluation, we set out to answer the following three research questions:

RQ1. How does SAFIREFUZZ compare to the state of the art in firmware fuzzing?
RQ2. What are the core performance gains and remaining roadblocks for SAFIREFUZZ?
RQ3. Can SAFIREFUZZ identify previously unknown or undetected vulnerabilities?

To answer (RQ1), we select 12 firmware targets from previous research on firmware security and rehosting [20, 25, 30, 31, 42, 51] and compare SAFIREFUZZ's fuzzing efficacy with HALucinator [6] and Fuzzware [46] in different configurations. Based on these results, we provide a detailed analysis of SAFIREFUZZ's performance (RQ2). Lastly, to answer (RQ3), we first discuss previously undetected bugs in the 12 firmware samples found by SAFIREFUZZ and then apply our approach to two new targets.

Unless otherwise specified, we executed all experiments on a HoneyComb LX2 ARM workstation running a 64-bit Ubuntu 18.04. This system features 16 ARM Cortex-A72 cores with a clock rate of up to 2 GHz, 32 GB DDR4 memory with a frequency of 3200 MT/s and a 128 GB M.2 SSD.

6.1 Experiment Setup

Target Selection. We selected targets based on their prevalence in prior research with a special focus on HALucinator [20], as this is the most recent binary HAL-based rehosting framework. All targets chosen for evaluation are compiled for Cortex-M cores. The specific SAFIREFUZZ harnesses follow the hooking and implementation of HALucinator in order to ensure semantically identical behavior when being presented with the same input. We provide an overview of the targets in Table 2 and detail hooked functions on the Hardware Abstraction Layer for four exemplary targets in Appendix A.3.

Firmware              HAL         # Hooked Functions
WYCINWYC              STM32       25
NXP HTTP              mcuxpresso  23
SAMR21 HTTP           SAMR21      23
6LoWPAN Receiver      Contiki     37
6LoWPAN Transmitter   Contiki     29
P2IM Drone            STM32       32
P2IM PLC              STM32       32
STM PLC               STM32       35
TCP Echo Client       STM32       31
TCP Echo Server       STM32       29
UDP Echo Client       STM32       31
UDP Echo Server       STM32       28

Table 2: Targets with their corresponding Hardware Abstraction Layer and the number of hooked functions necessary for successful rehosting.

Fuzzer Setup. We compare SAFIREFUZZ against HALucinator [20] and Fuzzware [46], the respective state-of-the-art HLE-based and peripheral-modeling-based fuzzers.

To compare with HALucinator, we use hal-fuzz, its fuzzing-oriented open source version [6]. This version is based on UnicornAFL [10] and uses legacy AFL [1] as fuzzing backend. To investigate the impact of a more modern fuzzer and increase comparability with SAFIREFUZZ, we swap in a libAFL-based backend as the mutation engine. We apply the same configuration parameters and mutation strategies as the ones used by SAFIREFUZZ and term this setup HALucinator-libAFL. For Fuzzware, we use its AFL++ [26] fuzzing backend with AFL++v3.14c. Unfortunately, we encountered non-trivial bugs when running Fuzzware on our ARM platform which severely limited fuzzing performance. After consulting with the Fuzzware authors, who confirmed that ARM hosts are not supported, we resorted to running Fuzzware on an x86 host. We used an Ubuntu 18.04 VM with 64 cores and 196 GB of RAM, hosted on an AMD EPYC 7662 server.

For each fuzzer and target, we conducted five 24-hour runs with each fuzzing process pinned to one designated core.

Seed Selection. We use the seeds provided by HALucinator for all setups except for Fuzzware, where we use the default seeds. This is due to the different input semantics for the fuzzers: In contrast to HAL-level abstractions, it is not possible to use known file formats as seed inputs for Fuzzware, as its inputs encode the interactions with the different MMIO registers.

Fidelity. Execution flow on the basic block level cannot be guaranteed to be 100% identical for the different frameworks due to their implementation differences. We ensured that the simulated systems exposed functionally equivalent behavior between HALucinator and SAFIREFUZZ by comparing mes-
sage logs and exit addresses.² All sample inputs provided by the hal-fuzz repository were executed in both HALucinator and SAFIREFUZZ and we asserted that the resulting logs matched. Additionally, we generated such message logs for various inputs from the fuzzing queue, selected at random, as well as for crashing inputs and made sure that the traces and exit addresses were equivalent.

Metrics. The metrics we use for comparison are (1) executions per second and (2) total coverage measured in basic blocks.³ For (1) we count the total executions for each fuzzing run and divide them by the 24-hour time budget. For (2), we replay the test cases in the respective tools, except for SAFIREFUZZ, where we replay found test cases in hal-fuzz for better comparability. As both Fuzzware and hal-fuzz are based on Unicorn, this allows us to collect translated blocks, which we further filter for actual basic blocks as defined by Ghidra's [5] SimpleBlockModel.

For Fuzzware, we replay the test cases a second time, while ignoring subtraces traversing HAL functions hooked by the other frameworks. This allows us to identify the number of basic blocks not executed by our HLE-based approach.

² We enabled debug prints at various places for all firmware targets, logging printable output to STDOUT and all interaction with HAL-I/O, e.g., packet contents when a simulated ethernet packet arrives. The produced logs provide a detailed trace of the input processing of the firmware under test.
³ Note that we deliberately refrain from reporting paths, as the metric is not well-defined and considered obsolete. In particular, the definition of unique input and the corresponding execution path differs widely across different fuzzers. Furthermore, the number of paths and the achieved code coverage do not necessarily correlate.

6.2 Comparison with the State of the Art

Table 3 shows an overview of the results of our experiments and Figure 3 visualizes reached coverage over time. In the following, we discuss SAFIREFUZZ's performance in terms of execution speed and reached basic blocks in comparison to HALucinator and Fuzzware on our experimental platform. For additional analysis of found crashes and a comparison to other rehosting tools, we refer to Appendix A.1.

Firmware            | SAFIREFUZZ              | HALucinator             | HALucinator-libAFL      | Fuzzware
                    |  exec/s  # basic blocks |  exec/s  # basic blocks |  exec/s  # basic blocks |  exec/s  # basic blocks (with / without HAL)
6LoWPAN Receiver    |   581.4  2840           |     1.2  2354           |     2.5  2724           |    73.6  1812 / 1618
6LoWPAN Transmitter |  1877.0  2563           |     1.8  2176           |     2.6  2307           |    66.4  2460 / 2101
NXP HTTP            |  5216.8  2341           |     4.8  1990           |     4.5  2209           |    22.5  447 / 337
SAMR21 HTTP         |  2894.6  1927           |     3.1  1581           |     1.7  1310           |  1018.4  52 / 26
P2IM PLC            |   772.1  238            |    19.5  243            |     6.3  247            |    24.5  637 / 453
P2IM Drone          |  7279.7  237            |     9.3  281            |     2.8  283            |     9.7  583 / 500
STM PLC             |  7193.8  748            |    10.8  654            |     2.0  776            |    15.5  732 / 381
WYCINWYC            |  3083.1  3263           |     9.4  1384           |    12.3  2795           |    41.0  3375 / 3166
TCP Echo Client     |  3401.3  2403           |     4.8  1679           |     4.0  2290           |    87.2  460 / 375
TCP Echo Server     |  2762.1  2177           |     5.0  1563           |     4.7  1710           |    88.4  459 / 229
UDP Echo Client     |  4485.3  1613           |     5.0  1188           |     4.7  1594           |    90.2  460 / 229
UDP Echo Server     |  4636.7  1450           |     5.9  1045           |     5.1  1485           |    85.1  460 / 229

Table 3: Results of fuzzing the targets over 24 hours. Reported numbers are median values from the five runs. For Fuzzware, we report reached basic blocks both with and without considering HAL functions.

SAFIREFUZZ vs. HALucinator. On all targets except the P2IM PLC firmware, our framework offers greatly increased performance compared to HALucinator. Conducting the Mann-Whitney U test on the execution speed and coverage metrics confirmed statistically significant divergence for p < 0.05 between SAFIREFUZZ and HALucinator for all targets, with the exception of coverage for the P2IM PLC target. We note that the fuzzing campaign against this target is inefficient for all frameworks: the target exhibits extremely easily triggerable crashes and, additionally, a significant part of all inputs leads to an infinite loop and, thus, a timeout.

We achieve an up to 1000x increase in raw throughput when running the frameworks in the same environment. When considering reached basic blocks over time, we observe that fuzzing with HALucinator offers higher consistency and more reliable results. However, even in a worst-case comparison our approach is able to offer improvements for most targets.

SAFIREFUZZ vs HALucinator-libAFL. When replacing legacy AFL with LibAFL in HALucinator, achieved coverage is greatly improved but still bested by our framework in nearly all cases during 24-hour runs. Performing the Mann-Whitney U test indicates statistical significance except for the coverage for P2IM PLC, STM PLC, WYCINWYC and the UDP Echo Client samples. In all of these cases, HALucinator-libAFL's overall achieved coverage is close to, or better than, SAFIREFUZZ's. As expected, the differences in execution speed of HALucinator-libAFL remain approximately the same compared to HALucinator. Reached coverage over time, on the other hand, often follows similar patterns as for SAFIREFUZZ, just significantly later in time. Given that both frameworks use the same fuzzing backend, this is unsurprising.
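The throughput gap to HALucinator discussed in this section can be recomputed directly from the median exec/s columns of Table 3. The following is a minimal sketch of that arithmetic, not part of the framework; the constant and function names are hypothetical and chosen only for illustration.

```rust
/// Median executions per second from Table 3:
/// (target, SAFIREFUZZ, HALucinator with legacy AFL).
const EXEC_PER_SEC: [(&str, f64, f64); 12] = [
    ("6LoWPAN Receiver", 581.4, 1.2),
    ("6LoWPAN Transmitter", 1877.0, 1.8),
    ("NXP HTTP", 5216.8, 4.8),
    ("SAMR21 HTTP", 2894.6, 3.1),
    ("P2IM PLC", 772.1, 19.5),
    ("P2IM Drone", 7279.7, 9.3),
    ("STM PLC", 7193.8, 10.8),
    ("WYCINWYC", 3083.1, 9.4),
    ("TCP Echo Client", 3401.3, 4.8),
    ("TCP Echo Server", 2762.1, 5.0),
    ("UDP Echo Client", 4485.3, 5.0),
    ("UDP Echo Server", 4636.7, 5.9),
];

/// Per-target throughput ratio SAFIREFUZZ / HALucinator.
fn speedups() -> Vec<(&'static str, f64)> {
    EXEC_PER_SEC
        .iter()
        .map(|&(name, ours, theirs)| (name, ours / theirs))
        .collect()
}

/// Arithmetic mean of the per-target ratios.
fn mean_speedup() -> f64 {
    let ratios = speedups();
    ratios.iter().map(|&(_, r)| r).sum::<f64>() / ratios.len() as f64
}

fn main() {
    for (name, ratio) in speedups() {
        println!("{name:<20} {ratio:8.1}x");
    }
    println!("mean speedup: {:.1}x", mean_speedup());
}
```

Under these inputs, the largest ratio slightly exceeds 1000x (NXP HTTP), the smallest is about 40x (P2IM PLC), and the arithmetic mean comes out at roughly 690x.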
Figure 3: Coverage over time for SAFIREFUZZ, HALucinator, and Fuzzware in different configurations. Shown are the median and 95% confidence intervals over five 24-hour runs for each target.

SAFIREFUZZ vs Fuzzware. SAFIREFUZZ outperforms Fuzzware in terms of execution speed and uncovers more basic blocks except for WYCINWYC and the P2IM Drone and PLC samples. On 7 of the targets, Fuzzware performed significantly worse than the other frameworks. We suspect that this is due to the differences in automation: we used Fuzzware's genconfig functionality for harness creation. As such, the auto-generated harnesses may encounter roadblocks early on in emulation even before reaching the main logic of the target. These roadblocks are circumvented in HALucinator and SAFIREFUZZ due to HLE-based hooking.

With the exception of the 6LoWPAN Receiver/Transmitter, STM PLC, and WYCINWYC for coverage, and SAMR21 for execution speed, the presented results show statistical significance under the Mann-Whitney U test. For the 6LoWPAN samples, the experiments with Fuzzware consisted of single runs which found a high number of uncovered basic blocks, even exceeding the amount found with SAFIREFUZZ. In the case of WYCINWYC, Fuzzware achieved similar coverage to SAFIREFUZZ during the latter part of the 24-hour runs. In terms of execution speed, Fuzzware barely reaches any blocks in the SAMR21 sample and, thus, terminates execution early, leading to very high throughput. However, SAFIREFUZZ provides similar throughput while discovering basic blocks and exercising large parts of the target.

SAFIREFUZZ vs Fuzzware (NoHal). When disregarding basic blocks located in HAL functionality, the reached coverage of Fuzzware decreases slightly. Unsurprisingly, other aspects of this experiment follow the same patterns as for Fuzzware without modifications.

However, one striking observation of this experiment is that even when hooking large parts of the HAL to provide
firmware functionality, only a couple of hundred basic blocks are actually cut off from the target. Consequently, HLE-based approaches are losing less potential insight into the target than one might expect. At the same time, in the cases Fuzzware generates competitive coverage, a comparable amount of basic blocks is only found after a significant amount of time. While this can be partially attributed to the different seeds, we speculate that Fuzzware also spends a significant amount of time blocked in HAL functionality before reaching the main logic of a target.

Firmware              Minimized Crashes
WYCINWYC              16
SAMR21 HTTP           2
6LoWPAN Receiver      93
6LoWPAN Transmitter   27
P2IM PLC              14
STM PLC               325
JPEG Decoder          2
STM32Sine             1

Table 4: Crashes found in targets under test. We minimized crashes with AFL's cmin.
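The observation that HLE-based hooking cuts off only a couple of hundred basic blocks can be checked against the two Fuzzware coverage figures in Table 3 (with and without HAL functions). The sketch below subtracts the two medians per target; it is an illustrative calculation only, and the names `FUZZWARE_BLOCKS` and `hal_only_blocks` are hypothetical.

```rust
/// Fuzzware's median reached basic blocks from Table 3:
/// (target, counting HAL functions, ignoring HAL functions).
const FUZZWARE_BLOCKS: [(&str, u32, u32); 12] = [
    ("6LoWPAN Receiver", 1812, 1618),
    ("6LoWPAN Transmitter", 2460, 2101),
    ("NXP HTTP", 447, 337),
    ("SAMR21 HTTP", 52, 26),
    ("P2IM PLC", 637, 453),
    ("P2IM Drone", 583, 500),
    ("STM PLC", 732, 381),
    ("WYCINWYC", 3375, 3166),
    ("TCP Echo Client", 460, 375),
    ("TCP Echo Server", 459, 229),
    ("UDP Echo Client", 460, 229),
    ("UDP Echo Server", 460, 229),
];

/// Basic blocks that were reached only inside hooked HAL functions,
/// i.e., the difference between the two medians for each target.
fn hal_only_blocks() -> Vec<(&'static str, u32)> {
    FUZZWARE_BLOCKS
        .iter()
        .map(|&(name, with_hal, without_hal)| (name, with_hal - without_hal))
        .collect()
}

fn main() {
    for (name, diff) in hal_only_blocks() {
        println!("{name:<20} {diff:>4} blocks inside hooked HAL functions");
    }
}
```

For every target in Table 3 the difference stays below 400 blocks, which is consistent with the claim above. Note that the difference of two medians is only an approximation of the per-run difference.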

6.3 Performance Analysis

Results of fuzzing the different firmware targets with our framework show a strong correlation between execution speed and found objectives (i.e., crashes and timeouts). For instance, when fuzzing the 6LoWPAN Receiver target, four runs achieved 559 executions per second with 18197 objectives on average. The outlier run only produced 3666 objectives and ran at 1536 executions per second. Similar results can be observed for all tested targets, although other factors, such as input length and validity correlating with path complexity, influence performance, too.

Throughput drastically decreases once the fuzzer finds its first crash. Every time our target process crashes or times out, the process has to be restarted and the engine has to perform all the heavy lifting again. Analysis and rewriting of basic blocks, especially (dis-)assembling instructions, are orders of magnitude more expensive than executing the firmware in the hot path, where most time is spent in the target and not in the engine. To estimate the performance penalty imposed by restarts, we created a microbenchmark using the WYCINWYC firmware. The first execution of this firmware on a valid XML input requires 1.06 seconds averaged over ten runs. Subsequent runs on the same input execute at 6100 resets per second, i.e., they require only 0.00016 seconds. While there are multiple ways of addressing the problem of expensive, recurring restarts (e.g., snapshotting), we do not consider it to be a major problem since it is unusual for real-life fuzz campaigns to contain hundreds of crashes.

Fuzzing experiments with our framework show a relatively high variety in explored paths and executed basic blocks as well as in objectives and hence number of executions. This is presumably due to non-determinism in the used fuzzer backend. The divergence decreases in targets with few or no crashes. The partially drastic increase in covered basic blocks compared to other frameworks can be explained by the increased amount of tested inputs over 24 hours. That the increase in coverage is not linear is expected, too, as uncovering linearly more new parts of a program requires exponentially more executions [14]. This makes new ways of enhancing fuzzing performance, such as proposed with SAFIREFUZZ, even more important.

6.4 Vulnerabilities

During our experiments, we collected the objectives found with SAFIREFUZZ. Additionally, we created harnesses for two new firmware samples. We report the minimized crashes in Table 4 and highlight noteworthy crashes in the following.

WYCINWYC. This firmware is intended as a benchmark for firmware fuzzing, assessing a fuzzer's capability to reach vulnerable code paths and, more importantly, to detect the fault by incorporating artificial vulnerabilities. It exposes five synthetically inserted memory corruptions within an XML parser, each corresponding to a different vulnerability type. SAFIREFUZZ found all five bug classes, demonstrating our framework's ability to detect various kinds of corruptions. Our drop-in allocator replacement allowed us to find double frees by keeping track of used and freed memory pointers. By making use of guard pages and relying on the host system's MMU, we can uncover segmentation faults and even heap overflows more reliably. For on-system fuzzing setups, identifying such memory corruptions is often more difficult, as many embedded devices do not have an MMU, and available emulators commonly rely on overhead-inducing SoftMMUs.

6LoWPAN Receiver/Transmitter. We re-discovered multiple vulnerabilities in Contiki-NG [23] originally found by HALucinator embedded in the 6LoWPAN Receiver target. Most notably, an out-of-bounds write in the data subsection that can be used for PC control, hence achieving Remote Code Execution (CVE-2019-8359). Other bugs include an integer overflow in the 6LoWPAN fragment processing, leading to a buffer overflow and, in turn, access to unmapped memory, crashing the firmware (CVE-2019-9183). Remarkably, in contrast to prior work, our approach also found a path triggering CVE-2019-8359 on the Transmitter. This demonstrates that SAFIREFUZZ is capable of finding bugs not detected by prior work that fuzzed the same firmware (HALucinator [20], Fuzzware [46] and uEmu [51]).

JPEG Decoder. To test whether SAFIREFUZZ can find bugs in additional firmware, we followed HALucinator's approach and compiled an example application. In particular, we target
an application using LibJPEG [8] to decode and visualize an image embedded on an SD card inserted in the device.

Our fuzzing campaign found two previously unknown vulnerabilities. The first one is a segmentation fault caused by a critical error routine that, instead of terminating the program after beginning to parse a corrupted input image, falls through silently. Subsequently, no checks are in place to avoid accessing and dereferencing pointers in uninitialized structs in memory. We traced back the second vulnerability to missing bounds checks in the color conversion function. Output buffers have hard-coded sizes but the faulty routine uses the decoded image's width to iterate over scanlines and write to the buffer, heavily exceeding the stack-located buffer's limits.

STM32 Sine. The second additional target we test is open-source firmware for electric motor inverters [9]. During our fuzzing experiment, we explore a substantial part of the terminal interface which parses and processes various commands to change hardware-internal parameters via CAN bus communication. SAFIREFUZZ finds a crash related to updating certain parameter enumerations in the CAN configuration. An interface ID is retrieved from memory and used as an offset into memory. Corrupting this value leads to arbitrary memory writes. However, at the time of writing, we could not confirm whether this crash is a true positive, as part of its root cause lies in the hardware configuration, which may be reported wrongly by our HAL hooks.

7 Discussion

Performance of Initial Run. Our approach is very fast during fuzzing, once all blocks have been translated dynamically. However, since most of the firmware's basic blocks are unknown during the first few executions, the fuzzer spends the majority of its time inside the engine, reassembling and caching. Early executions hence run orders of magnitude slower than subsequent executions. Since each restart resets SAFIREFUZZ's caches, a further performance improvement would be to not exit on timeouts, or to write the cache to shared memory or disk. When receiving a SIGALRM signal, instead of restarting the whole process, the engine could just report a timeout exit code to LibAFL and manually reset the execution state. Resuming on a crash, e.g., a SIGSEGV, in a similar fashion is not easily possible. The run may have tampered with and corrupted our in-process state, as firmware execution happens in the same process space as the engine. Thus, any undefined behavior could also influence the engine's state or code. To tackle this issue, the caches could be tracked outside the current process, similar to the implementation of qemuafl [26]. Moving to a fully static rewriting is another option that, however, won't necessarily benefit fuzzing performance: the reassembly time is merely moved to the beginning, and the resulting binary should not be faster overall, assuming we use the same techniques. On top, we lose valuable runtime information available during dynamic instrumentation.

Snapshotting. Since we target embedded firmware, we did not implement memory snapshots. The technique nowadays is increasingly applied in fuzzing. For it, a memory snapshot might be taken after a firmware's boot-up process, which would allow the engine to skip this part, and fast resets could be used instead of full restarts. For our use-case, boot-up routines of the firmware samples investigated in the course of this work consisted of only a couple of hundred instructions. Since this is a very small amount of code, we decided that the additional overhead of snapshotting and memory resets would simply not be worth it. However, it could be interesting to evaluate in the future, as it allows for stateful fuzzing [39] and could allow faster resets upon crash.

Manual Effort. While our tool imposes the same restrictions as other HLE-based rehosting engines when it comes to adapting a new target to the system, the barrier to entry decreases over time as the potential reusability of user-provided hooks in this ecosystem is enormous. Many common and popular embedded platforms share their HALs and hooks for them need to be implemented only once: In the case of firmware targeting STM32 boards, we implemented a total of 18 generic HAL functions, whereas no function was used in fewer than two targets, and typical targets use up to 10 of these functions. Moreover, due to the plethora of existing HLE-based systems, readily available HAL stubs already exist. In terms of peripheral management and function-hooking capabilities, our engine offers functionalities compatible with many other HLE-based systems. For our experiments, we ported multiple HAL-emulating hooks from HALucinator's Python implementation to Rust without much effort. The authors of HALucinator [20] also argue that, while this method requires some manual effort, this allows HLE-based approaches to handle firmware automated systems such as P2IM [25] cannot.

In this work, we mainly focused on improving execution speeds. In the future, reducing the required manual effort to adapt a new target to our fuzzer could be a worthwhile goal. Automatically identifying and hooking HAL functions would facilitate the analysis of a broader spectrum of firmware. Pairing our near-native rehosting approach for high performance with Scharnowski et al.'s MMIO-modeling technique [46] for increased generality seems extremely promising.

Hardware Platforms. Many emulation-specific techniques we employed for increased performance during fuzzing are adaptable to other domains or even CPU architectures. Supporting additional ISAs, such as RISC-V, is mostly an engineering effort by extending the dynamic rewriting part of the engine with new translation passes. Yet, one of the core concepts of our approach is fundamental to the ARM environment: We exploit the fact that on the one hand, a large portion of the world's embedded software offering large and interesting attack surface runs on MCUs with ARMv7-M cores, and on the other hand, high-power commodity ARMv8 CPUs are
widely available implementing the ARMv7-A instruction set as part of the AArch32 execution mode. Natively executing large portions of instructions of software compiled for very low-powered devices on vastly more powerful CPUs is to this extent unique to ARM. Running our framework on even more potent ARM cores is the logical next step. Development and testing were conducted on comparatively low-performance CPUs. Apple recently made powerful AArch64-based cores widely available and popular by introducing the M1 line [2]. However, in our experiments, we confirmed that 32-bit ARM support is not available in M1 chips and ARMv8 implementations generally seem to increasingly discontinue support. 32-bit execution is, however, still supported on a wide range of products, from cheap development boards (e.g., the Cortex-A72 cores embedded on a Raspberry Pi 4⁴) over high-end consumer products (e.g., the Cortex-X1 cores used in the 2022 Thinkpad X13s) to modern server-grade CPUs such as Ampere eMAG processors.

8 Related Work

Dynamic Binary Rewriting. Using dynamic binary rewriting to create a virtual execution environment for other software is a well-known concept. For instance, the original VMware Workstation implementation [15] provided virtualization capabilities for x86 systems via system-level x86-to-x86 translation and a trap-and-emulate approach for sensitive operations. Similarly, QEMU [13] allows the emulation of different hardware platforms via dynamic binary translation. However, none of these approaches is tailored toward enabling low-level firmware fuzzing. Minor trade-offs in performance are accepted by design, and hardware accesses from the guest may require complex emulation back-ends. In contrast, SAFIREFUZZ's near-native rehosting approach enables running of code targeting an embedded ISA variant on a more powerful host with a different ISA variant, as long as both variants belong to the same family (e.g., ARMv6-M and ARMv8-A).

Nonetheless, various frameworks explored binary rewriting for fuzzing, such as FRIDA's Stalker mode [29] or AFL++'s Qemu and Unicorn modes [26]. While these frameworks aim to provide optimized rewriting techniques to lower the runtime overhead, none of them considered the possibility of near-native rehosting. AFL++'s Qemu and Unicorn modes lift the binary code to TCG, QEMU's intermediate representation, before applying instrumentation, and Frida requires that the ISA of the fuzzed target matches the one of the host.

Static Binary Rewriting. Recently, different static rewriting approaches for fuzzing have been proposed. Frameworks like retrowrite [21], StochFuzz [49], or ZAFL [43] move large parts of the one-time rewriting cost to a static offline phase. As a result, rewriting during run time is kept to a minimum or eliminated completely. While these approaches reach competitive performance compared to fuzzing via source-based instrumentation, none of these frameworks enable firmware fuzzing at the time of writing. All of them focus on either the x86 or AArch64 ISA, and have strong assumptions on the layout of the target binary, such as a clear distinction between code and data sections or position-independent code. We note that these assumptions which enable efficient static rewriting are rarely applicable to binary firmware, which is why we adopted a dynamic rewriting approach for SAFIREFUZZ.

Rehosting. In recent years, rehosting [24, 47] enabled fuzzing for various types of embedded systems, ranging from Linux-based IoT devices [36, 50] over wireless chipsets [33, 40, 45] to deeply-embedded devices with monolithic firmware [16, 20, 25, 31, 38, 46, 51]. SAFIREFUZZ draws direct inspiration from these frameworks and prototypes, especially from HAL-based rehosting approaches such as HALucinator [20] and Para-Rehosting [38]. However, in comparison to SAFIREFUZZ, most prior rehosting approaches focus on the creation of emulation environments for target firmware, rather than investigating possibilities for highly-efficient fuzzing solutions.

The most notable exceptions are FirmAFL and Fuzzware. FirmAFL aims to improve fuzzing efficacy for Linux-based firmware, by fuzzing single applications with QEMU's user mode emulator while selectively using full-system emulation to provide additional runtime context when needed. Fuzzware, on the other hand, targets monolithic firmware and integrates the fuzzer into the peripheral-modeling process while using local dynamic symbolic execution to narrow down the possible input space. While both solutions provide additions to firmware fuzzing, they both rely on a QEMU-based emulation engine and, unlike SAFIREFUZZ, do not explore an alternative low-overhead binary rewriting approach.

⁴ During the development of SAFIREFUZZ, we confirmed that the frame-

Concurrent to our work, MetaEmu [18] and ICICLE [19] aim to advance the state-of-the-art by broadening the range of rehostable architectures. As opposed to SAFIREFUZZ's near-native rewriting, both frameworks make use of Ghidra's processor and instruction set definitions to automatically derive virtualized execution environments. MetaEmu can also simultaneously rehost and analyze multiple connected targets. Although they do focus on performance and implement multiple IR optimization passes, they did not benchmark their approach against real-world targets from previous work. Unlike SAFIREFUZZ, they only show that they slightly outperform Unicorn on a few micro benchmarks. While ICICLE puts the focus on fuzzing, their main contribution is effective architecture-agnostic instrumentation. In contrast to SAFIREFUZZ, their framework - based on just-in-time compiled P-Code and a SoftMMU - does not fundamentally rethink emu-
work runs on a Raspberry Pi 4 and achieves highly competitive performance. lation and achieves performance on par to Unicorn.

9 Conclusion

In this work, we investigated the possibility of improving recent approaches for binary firmware fuzzing. Our engine, termed SAFIREFUZZ, leverages HAL-level hooking and dynamic binary rewriting for rehosting low-level ARM Cortex-M firmware onto more powerful ARM Cortex-A systems. We evaluated SAFIREFUZZ by implementing fuzzing harnesses for the state-of-the-art firmware suite for HAL-based rehosting approaches. Our performance analysis shows that SAFIREFUZZ can provide a 690x throughput increase on average and a 30% improvement of basic block coverage compared to the state of the art over 24h fuzzing campaigns.

Overall, SAFIREFUZZ demonstrates that emulation efficiency is an important factor when designing rehosting systems with the ultimate goal to fuzz test embedded device firmware. We hope that the insights of our work will form the basis for faster firmware fuzzers in the future.

Availability

The source code of SAFIREFUZZ, as well as all evaluation harnesses and experiment code, is publicly available at: https://github.com/pr0me/SAFIREFUZZ.

Coordinated Disclosure

We disclosed the previously unknown vulnerabilities discussed in Section 6.4 to the maintainer of the STM32 Sine project and to ST's Product Security Incident Response Team (ST PSIRT).

Acknowledgements

This work was supported by the European Union's Horizon 2020 research and innovation programme under project TESTABLE, grant agreement No. 101019206, the Dutch Ministry of Economic Affairs and Climate through the AVR program (Memo project) and the Dutch Science Organization NWO through projects Theseus and NWA ORC Intersect.

We would like to thank the anonymous reviewers and the artifact evaluation committee for the feedback on our work. We further want to express our gratitude to Tobias Scharnowski for his help on Fuzzware and Chris Boyce for pointing out an error in the list of valid basic blocks for the P2IM PLC target.

References

[1] American Fuzzy Lop Fuzzer. https://github.com/google/AFL. Last Accessed: 07.02.2022.

[2] Apple M1 Chip. https://www.apple.com/newsroom/2020/11/apple-unleashes-m1/. Last Accessed: 21.02.2022.

[3] ARM7TDMI Technical Reference Manual: The Thumb instruction set. https://developer.arm.com/documentation/ddi0210/c/CACBCAAE. Last Accessed: 21.02.2022.

[4] Capstone: Disassembler Framework. https://www.capstone-engine.org/. Last Accessed: 17.02.2022.

[5] Ghidra: Reverse Engineering Suite. https://ghidra-sre.org/. Last Accessed: 7.02.2023.

[6] hal-fuzz Github Repository. https://github.com/ucsb-seclab/hal-fuzz. Last Accessed: 06.02.2022.

[7] Keystone: Assembler Framework. https://www.keystone-engine.org/. Last Accessed: 17.02.2022.

[8] LibJPEG Decoder Firmware. https://github.com/STMicroelectronics/STM32CubeF4/tree/master/Projects/STM324x9I_EVAL/Applications/LibJPEG/LibJPEG_Decoding. Last Accessed: 07.02.2023.

[9] OpenInverter Firmware: stm32-sine. https://github.com/jsphuebner/stm32-sine. Last Accessed: 07.02.2023.

[10] UnicornAFL: A Bridge between AFL++ and the Unicorn Emulator. https://github.com/AFLplusplus/unicornafl. Last Accessed: 10.02.2022.

[11] Improving afl's qemu mode performance. https://abiondo.me/2018/09/21/improving-afl-qemu-mode/, 2018. Last Accessed: 13.12.2022.

[12] Aspencore. Embedded systems market study, 2019.

[13] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual Technical Conference, 2005.

[14] Marcel Böhme and Brandon Falk. Fuzzing: On the exponential cost of vulnerability discovery. In ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, 2020.

[15] Edouard Bugnion, Scott Devine, Mendel Rosenblum, Jeremy Sugerman, and Edward Y Wang. Bringing virtualization to the x86 architecture with the original vmware workstation. ACM Transactions on Computer Systems (TOCS), 2012.

[16] Chen Cao, Le Guan, Jiang Ming, and Peng Liu. Device-agnostic firmware execution is possible: A concolic execution approach for peripheral emulation. In Annual Computer Security Applications Conference (ACSAC), 2020.

[17] Chao-Jui Chang, Jan-Jan Wu, Wei-Chung Hsu, Pangfeng Liu, and Pen-Chung Yew. Efficient memory virtualization for cross-isa system mode emulation. In ACM Conference on Virtual Execution Environments (VEE), 2014.

[18] Zitai Chen, Sam L Thomas, and Flavio D Garcia. Metaemu: An architecture agnostic rehosting framework for automotive firmware. In ACM Conference on Computer and Communications Security (CCS), 2022.

[19] Michael Chesser, Surya Nepal, and Damith C. Ranasinghe. Icicle: A re-designed emulator for grey-box firmware fuzzing. In International Symposium on Software Testing and Analysis (ISSTA), 2023.

[20] Abraham A Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. HALucinator: Firmware re-hosting through abstraction layer emulation. In USENIX Security Symposium, 2020.

[21] Sushant Dinesh, Nathan Burow, Dongyan Xu, and Mathias Payer. Retrowrite: Statically instrumenting cots binaries for fuzzing and sanitization. In IEEE Symposium on Security and Privacy (S&P), 2020.

[22] Gregory J. Duck, Xiang Gao, and Abhik Roychoudhury. Binary rewriting without control flow recovery. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2020.

[23] A. Dunkels, B. Gronvall, and T. Voigt. Contiki - a lightweight and flexible operating system for tiny networked sensors. In IEEE International Conference on Local Computer Networks, 2004.

[24] Andrew Fasano, Tiemoko Ballo, Marius Muench, Tim Leek, Alexander Bulekov, Brendan Dolan-Gavitt, Manuel Egele, Aurélien Francillon, Long Lu, Nick Gregory, et al. Sok: Enabling security analyses of embedded systems via rehosting. In ACM Symposium on Information, Computer and Communications Security (ASIACCS), 2021.

[25] Bo Feng, Alejandro Mera, and Long Lu. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In USENIX Security Symposium, 2020.

[26] Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. AFL++: Combining incremental steps of fuzzing research. In USENIX Workshop on Offensive Technologies (WOOT), 2020.

[27] Andrea Fioraldi, Dominik Christian Maier, Dongjia Zhang, and Davide Balzarotti. Libafl: A framework to build modular and reusable fuzzers. In ACM Conference on Computer and Communications Security (CCS), 2022.

[28] Andrea Fioraldi, Alessandro Mantovani, Dominik Maier, and Davide Balzarotti. Registered report: Dissecting american fuzzy lop - a fuzzbench evaluation. In 1st International Fuzzing Workshop (FUZZING), 2022.

[29] Frida. Stalker. https://frida.re/docs/stalker/, 2020. Last Accessed: 13.12.2022.

[30] Fabio Gritti, Fabio Pagani, Ilya Grishchenko, Lukas Dresel, Nilo Redini, Christopher Kruegel, and Giovanni Vigna. HEAPSTER: Analyzing the Security of Dynamic Allocators for Monolithic Firmware Images. In IEEE Symposium on Security and Privacy (S&P), 2022.

[31] Eric Gustafson, Marius Muench, Chad Spensky, Nilo Redini, Aravind Machiry, Yanick Fratantonio, Davide Balzarotti, Aurélien Francillon, Yung Ryn Choe, Christopher Kruegel, and Giovanni Vigna. Toward the analysis of embedded firmware through automated re-hosting. In Symposium on Recent Advances in Intrusion Detection (RAID), 2019.

[32] Lee Harrison, Hayawardh Vijayakumar, Rohan Padhye, Koushik Sen, and Michael Grace. PARTEMU: Enabling dynamic analysis of Real-World TrustZone software using emulation. In USENIX Security Symposium, 2020.

[33] Grant Hernandez, Marius Muench, Dominik Maier, Alyssa Milburn, Shinjo Park, Tobias Scharnowski, Tyler Tucker, Patrick Traynor, and Kevin R. B. Butler. FirmWire: Transparent Dynamic Analysis for Cellular Baseband Firmware. In Symposium on Network and Distributed System Security (NDSS), 2022.

[34] Evan Johnson, Maxwell Bland, YiFei Zhu, Joshua Mason, Stephen Checkoway, Stefan Savage, and Kirill Levchenko. Jetset: Targeted firmware rehosting for embedded systems. In USENIX Security Symposium, 2021.

[35] Markus Kammerstetter, Christian Platzer, and Wolfgang Kastner. Prospect: Peripheral proxying supported embedded code testing. In ACM Symposium on Information, Computer and Communications Security (ASIACCS), 2014.

[36] Mingeun Kim, Dongkwan Kim, Eunsoo Kim, Suryeon Kim, Yeongjin Jang, and Yongdae Kim. Firmae: Towards large-scale emulation of iot firmware for dynamic analysis. In Annual Computer Security Applications Conference (ACSAC), 2020.

[37] Karl Koscher, Tadayoshi Kohno, and David Molnar. SURROGATES: Enabling Near-Real-Time dynamic analyses of embedded systems. In USENIX Workshop on Offensive Technologies (WOOT), 2015.

[38] Wenqiang Li, Le Guan, Jingqiang Lin, Jiameng Shi, and Fengjun Li. From library portability to para-rehosting: Natively executing open-source microcontroller oss on commodity hardware. In Symposium on Network and Distributed System Security (NDSS), 2021.

[39] Dominik Maier, Otto Bittner, Marc Munier, and Julian Beier. Fitm: Binary-only coverage-guided fuzzing for stateful network protocols. In Workshop on Binary Analysis Research (BAR), 2022.

[40] Dominik Maier, Lukas Seidel, and Shinjo Park. Basesafe: Baseband sanitized fuzzing through emulation. In ACM Conference on Security and Privacy in Wireless and Mobile Networks (WiSec), 2020.

[41] Valentin J. M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo. The art, science, and engineering of fuzzing: A survey. IEEE Transactions on Software Engineering, 47, 2021.

[42] Marius Muench, Jan Stijohann, Frank Kargl, Aurélien Francillon, and Davide Balzarotti. What you corrupt is not what you crash: Challenges in fuzzing embedded devices. In Symposium on Network and Distributed System Security (NDSS), 2018.

[43] Stefan Nagy, Anh Nguyen-Tuong, Jason D. Hiser, Jack W. Davidson, and Matthew Hicks. Breaking through binaries: Compiler-quality instrumentation for better binary-only fuzzing. In USENIX Security Symposium, 2021.

[44] Tobias Pfeffer, Paula Herber, Lucas Druschke, and Sabine Glesner. Efficient and safe control flow recovery using a restricted intermediate language. In IEEE Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), 2018.

[45] Jan Ruge, Jiska Classen, Francesco Gringoli, and Matthias Hollick. Frankenstein: Advanced wireless fuzzing to exploit new bluetooth escalation targets. In USENIX Security Symposium, 2020.

[46] Tobias Scharnowski, Nils Bars, Moritz Schloegel, Eric Gustafson, Marius Muench, Giovanni Vigna, Christopher Kruegel, Thorsten Holz, and Ali Abbasi. Fuzzware: Using precise MMIO modeling for effective firmware fuzzing. In USENIX Security Symposium, 2022.

[47] Christopher Wright, William A Moeglein, Saurabh Bagchi, Milind Kulkarni, and Abraham A Clements. Challenges in firmware re-hosting, emulation, and analysis. ACM Computing Surveys (CSUR), 2021.

[48] Jonas Zaddach, Luca Bruno, Aurelien Francillon, Davide Balzarotti, et al. Avatar: A framework to support dynamic security analysis of embedded systems' firmwares. In Symposium on Network and Distributed System Security (NDSS), 2014.

[49] Zhuo Zhang, Wei You, Guanhong Tao, Yousra Aafer, Xuwei Liu, and Xiangyu Zhang. Stochfuzz: Sound and cost-effective fuzzing of stripped binaries by incremental and stochastic rewriting. In IEEE Symposium on Security and Privacy (S&P), 2021.

[50] Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and Limin Sun. FIRM-AFL: High-Throughput greybox fuzzing of IoT firmware via augmented process emulation. In USENIX Security Symposium, 2019.

[51] Wei Zhou, Le Guan, Peng Liu, and Yuqing Zhang. Automatic firmware emulation through invalidity-guided knowledge inference. In USENIX Security Symposium, 2021.

[52] Wei Zhou, Lan Zhang, Le Guan, Peng Liu, and Yuqing Zhang. What your firmware tells you is not how you should emulate it: A specification-guided approach for firmware emulation. In ACM Conference on Computer and Communications Security (CCS), 2022.

A Appendix

A.1 Comparison to other Papers

                    SAFIREFUZZ                  HALucinator - Paper          Para-Rehosting
Firmware            exec/s   Time  Crashes      exec/s   Time     Crashes    exec/s    Time     Crashes
WYCINWYC            3083.1   24h   16           17.92    24h      5          647.86    11h:43m  909
SAMR21 HTTP         2894.6   24h   2            22.92    19d:04h  273        902.95    12h:33m  219
NXP HTTP            5216.8   24h   0            154.5    14d:0h   0          1443.22   12h:39m  0
6LoWPAN RX          581.4    24h   93           18.84    1d:10h   3          –         –        –
6LoWPAN TX          1877.0   24h   27           15.3     1d:10h   0          –         –        –
P2IM Drone          7279.7   24h   0            11.8     9d:01h   0          –         –        –
P2IM PLC            772.1    24h   14           215      9d:01h   634        –         –        –
ST-PLC              7193.8   24h   325          3.73     1d:10h   27         2552.8    12h:15m  41
STM32 TCP Client    3401.3   24h   0            58.0     3d:08h   0          1092.4    12h:00m  58
STM32 TCP Server    2762.1   24h   0            56.7     3d:08h   0          1466.7    12h:00m  129
STM32 UDP Client    4636.7   24h   0            44.1     3d:08h   0          1245.0    12h:00m  65
STM32 UDP Server    3803.2   24h   0            66.7     3d:08h   0          902.3     12h:00m  16

Table 5: Throughput comparison with experiments reported in HALucinator [20] and Para-Rehosting [38]. For SAFIREFUZZ, we report values of the median run based on the number of executions. We minimized crashes with AFL's cmin for our own experiments; for the other numbers, it is not known whether or which minimization the authors applied.

In addition to our re-evaluation of different approaches, we also compare our approach to experiments described in other papers.

HALucinator. When compared to the numbers reported in HALucinator [20], gathered on a stronger CPU, we still achieve a speedup of more than 200x on average (cf. Table 5). Consequently, we explore the targets much faster. For WYCINWYC, the authors report that they found exactly five crashes in 612 paths. The divergence from our experiments, where HALucinator did not find a single crash, could be due to lower coverage, as we fuzzed the program on a weaker CPU and achieved fewer executions. During their 24-hour fuzzing campaign, the target was executed about 1.5 million times on a 12-core Xeon server. It is not stated whether the results were achieved using a single core or multiple cores per fuzz run.

Para-Rehosting. In the Para-Rehosting paper [38], the proposed framework executed WYCINWYC about 27 million times within 12 hours. The authors report 3166 paths and 909 crashes; no absolute number of covered basic blocks is given. The median number of crashes found with SAFIREFUZZ on WYCINWYC is 25,653 before minimization. While the speedup over Para-Rehosting is present but modest, the approach is not a direct competitor, as it requires compilable source code of the firmware, which is often unavailable in embedded security analysis. A comparison of execution speed can be found in Table 5.

P2IM. SAFIREFUZZ offers a substantial speed-up over the P2IM technique on the tested firmware: Feng et al. [25] report 32.7 and 17.2 executions per second for the PLC and Drone targets, respectively, whereas we observe 772 and 7279 in the median case.
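
The per-target speedup over the HALucinator paper numbers can be recomputed directly from the exec/s columns of Table 5. The following is a minimal sketch: the throughput values are copied from the table, while the function names and the choice of an arithmetic mean over per-target ratios are assumptions made for illustration, since the averaging method is not stated here.

```rust
/// (target, SAFIREFUZZ exec/s, HALucinator-paper exec/s), copied from Table 5.
const TABLE5: [(&str, f64, f64); 12] = [
    ("WYCINWYC", 3083.1, 17.92),
    ("SAMR21 HTTP", 2894.6, 22.92),
    ("NXP HTTP", 5216.8, 154.5),
    ("6LoWPAN RX", 581.4, 18.84),
    ("6LoWPAN TX", 1877.0, 15.3),
    ("P2IM Drone", 7279.7, 11.8),
    ("P2IM PLC", 772.1, 215.0),
    ("ST-PLC", 7193.8, 3.73),
    ("STM32 TCP Client", 3401.3, 58.0),
    ("STM32 TCP Server", 2762.1, 56.7),
    ("STM32 UDP Client", 4636.7, 44.1),
    ("STM32 UDP Server", 3803.2, 66.7),
];

/// Per-target speedup: SAFIREFUZZ throughput divided by the reported
/// HALucinator throughput for the same firmware.
fn speedups() -> Vec<(&'static str, f64)> {
    TABLE5
        .iter()
        .map(|&(name, ours, theirs)| (name, ours / theirs))
        .collect()
}

/// Arithmetic mean of the per-target speedups (assumed averaging method).
fn mean_speedup() -> f64 {
    let ratios = speedups();
    ratios.iter().map(|&(_, r)| r).sum::<f64>() / ratios.len() as f64
}
```

Under these assumptions, `mean_speedup()` comes out well above 200, in line with the statement in the HALucinator paragraph. The per-target ratios vary widely, from single digits on P2IM PLC to four digits on ST-PLC, so the mean is dominated by a few targets.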

A.2 Special Control-Flow modifying Functions

Sometimes it is necessary to call unmodified functions in the firmware from within user-defined hooks, e.g., when we write an interrupt handler that resolves a callback. In emulators such as Unicorn, such behavior is handled by modifying the Program Counter in the emulation state. Returning from the user-defined hook will then resume execution at the specified address. Such modifications are not as trivial in our engine, as there is no differentiation between firmware- and engine-PC. We handle this case by calling a dedicated naked function, since Rust itself exposes no tail-call functionality. Naked functions do not incorporate any Rust-intrinsic function prologues or epilogues after compilation but consist of a single, pure assembly block. An example of such a function can be seen in Listing 1. The target function expects two parameters; the third one is used to supply the address of the target function. Upon finishing, it returns directly to the user-defined hook and Rust can perform its normal clean-up, dropping variables and correctly adjusting the stack frame.

Another case that required special handling is the GCC-specific Thumb switch-case helper functions. They expose normal switch-case or table branch functionality, but they work by making assumptions about the Link Register (LR) and directly modifying it. Naturally, this cannot work in our rewritten code, where instructions are shifted by non-deducible amounts from their original addresses. As we expect this not to be the only time when a function might require knowledge of the LR, we introduced a saving mechanism. On every BLX, the address of the original callsite is stored at a globally known memory location, which can then be accessed by the callee. For the GCC switch-case functions, we simply hook and replace the original functions with naked functions, performing the same set of arithmetic modifications to calculate the new target address on the stored LR instead of using the register content.

    #[naked]
    unsafe extern "aapcs" fn _call_netif_input(
        _rx_pbuf_ptr: u32,
        _ethernet_netif_ptr: u32,
        _netif_input_cb_addr: u32,
    ) -> u32 {
        // Under AAPCS the three arguments arrive in r0, r1 and r2.
        // Jump to the callback address in r2 and leave LR untouched,
        // so the callee returns directly to the caller of this function.
        asm!("mov pc, r2", options(noreturn));
    }

Listing 1: Example of naked function used for tailcalls from within user hooks.

A.3 Functions hooked at HAL-level

                                            Used in
Function Name                 System        WYCINWYC  SAMR21 HTTP  6LoWPAN  P2IM PLC
malloc                        general       ✓         ✓            ✓        ✓
realloc                       general       ✓         ✓            ✗        ✗
free                          general       ✓         ✓            ✓        ✓
puts                          general       ✓         ✗            ✗        ✓
HAL_GetTick                   STM32
HAL_RTC_GetDate               STM32
HAL_RTC_GetTime               STM32         ✓         ✗            ✗        ✗
serial_putc                   STM32         ✓         ✗            ✗        ✗
serial_getc                   STM32         ✓         ✗            ✗        ✗
mbed::Stream::write           STM32         ✓         ✗            ✗        ✗
mbed::Stream::read            STM32         ✓         ✗            ✗        ✗
rtc_write                     STM32         ✓         ✗            ✗        ✗
rtc_read                      STM32         ✓         ✗            ✗        ✗
HAL_SYSTICK_Config            STM32         ✗         ✗            ✗        ✓
HAL_UART_Receive_IT           STM32         ✗         ✗            ✗        ✓
HAL_UART_Transmit             STM32         ✗         ✗            ✗        ✓
HAL_UART_IRQHandler           STM32         ✗         ✗            ✗        ✓
millis                        Arduino       ✗         ✗            ✗        ✓
HardwareSerial::read          Arduino       ✗         ✗            ✗        ✓
HardwareSerial::write         Arduino       ✗         ✗            ✗        ✓
HardwareSerial::available     Arduino
usart_write_wait              SAM R21
ethernetif_input              SAM R21       ✗         ✓            ✗        ✗
ksz8851snl_low_level_output   SAM R21       ✗         ✓            ✗        ✗
uip_tcpchksum                 RF233         ✗         ✗            ✓        ✗
uip_udpchksum                 RF233         ✗         ✗            ✓        ✗
rf233_on                      RF233         ✗         ✗            ✓        ✗
rf233_off                     RF233
i2c_master_read_packet_wait   RF233
trx_sram_read                 RF233         ✗         ✗            ✓        ✗
trx_frame_read                RF233         ✗         ✗            ✓        ✗
trx_frame_write               RF233         ✗         ✗            ✓        ✗
trx_reg_read                  RF233         ✗         ✗            ✓        ✗
trx_reg_write                 RF233         ✗         ✗            ✓        ✗
clock_init                    Contiki-OS    ✗         ✗            ✓        ✗
clock_time                    Contiki-OS    ✗         ✗            ✓        ✗
clock_seconds                 Contiki-OS    ✗         ✗            ✓        ✗

Table 6: Functions hooked at HAL-level for four exemplary targets.
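
The link-register saving mechanism described in Section A.2 can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the engine's implementation: `SAVED_LR`, `record_original_callsite` and `switch_target` are hypothetical names, and the helper semantics (an unsigned-byte jump table placed directly behind the call, with entries counted in halfwords relative to the return address) are an assumed reading of the GCC Thumb switch helpers. Mapping the resulting original-image address to its location in the rewritten code is outside the scope of this sketch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for the globally known slot that holds the
/// original (pre-rewriting) return address of the most recent BLX.
static SAVED_LR: AtomicU32 = AtomicU32::new(0);

/// What a rewritten call site would do before branching: remember where
/// the call would have returned to in the unmodified firmware image.
fn record_original_callsite(original_return_addr: u32) {
    SAVED_LR.store(original_return_addr, Ordering::Relaxed);
}

/// Model of an unsigned-byte Thumb switch helper (assumed semantics).
/// It works on the saved LR instead of the live register and returns
/// the branch target as an address in the original firmware image.
fn switch_target(image: &[u8], image_base: u32, case_index: u32) -> Option<u32> {
    let lr = SAVED_LR.load(Ordering::Relaxed);
    // Clear the Thumb bit to obtain the address of the jump table.
    let table_addr = lr & !1;
    // Translate the table entry's address into an offset into `image`.
    let offset = table_addr.checked_sub(image_base)?.checked_add(case_index)?;
    let entry = u32::from(*image.get(offset as usize)?);
    // Entries are halfword counts, so scale by two before adding to LR.
    Some(lr.wrapping_add(entry << 1))
}
```

For example, with a table `[2, 5, 9]` stored at offset 8 of an image based at `0x0800_0000` and a saved return address of `0x0800_0009`, case index 1 resolves to `0x0800_0013`. An index that falls outside the image yields `None` rather than reading out of bounds.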

B Artifact Appendix

B.1 Abstract

This artifact allows the replication of the experiments and results described in Section 6. We provide the following: (i) a stand-alone repository containing the full source code for our rehosting and fuzzing engine, ready to be compiled and used (https://github.com/pr0me/SAFIREFUZZ), (ii) a repository containing documentation, build and setup scripts for replicating our experiments, and a copy of the data we gathered during our evaluation (https://github.com/pr0me/safirefuzz-experiments).

The artifact has been validated on a HoneyComb LX2 ARM workstation containing 16 ARM Cortex-A72 cores with a clock rate of up to 2 GHz, 32 GB DDR4 memory with a frequency of 3200 MT/s and a 128 GB m.2 SSD running Ubuntu 18.04.05.

B.2 Description & Requirements

B.2.1 Security, Privacy, and Ethical Concerns

While running and evaluating SAFIREFUZZ does not require destructive steps, small changes to the host system that weaken its security guarantees are needed to run our system.

In particular, we require ASLR to be disabled, to increase determinism and avoid mapping of, e.g., linked libraries in segments we need otherwise and expect to be empty. Additionally, we enable allocating virtual memory down to address 0 by adjusting mmap_min_addr, as we need to place parts of the firmware image in low memory regions. Both can be configured by running the SAFIREFUZZ/prepare_sys.sh script. Those changes should be reverted after usage of our system, either by manually reverting the changes or by rebooting.

B.2.2 How to Access

We provide public access to our code and experiment setups and data through the following GitHub repositories at specific tags for artifact evaluation:

1. SAFIREFUZZ main repository: https://github.com/pr0me/SAFIREFUZZ/tree/post_ae
   DOI: https://zenodo.org/record/8223057
2. Artifact Evaluation data: https://github.com/pr0me/safirefuzz-experiments/tree/post_ae
   DOI: https://zenodo.org/record/8223055

The repositories contain detailed information on building, running, and reproducing our experiments.

B.2.3 Hardware Dependencies

SAFIREFUZZ rehosts low-level Cortex-M firmware onto more powerful Cortex-A cores. As such, a system containing a Cortex-A core with 32-bit support is required, which can for instance be found on a Raspberry Pi 4b featuring four Cortex-A72 cores. Installation instructions for Raspberry Pis can be found in our main repository. Additionally, due to interoperability issues, our Fuzzware-specific experiments were run on an x86-64 VM.

During artifact evaluation, we provided the reviewers with access to the same hardware we used during our evaluation: (M1) a HoneyComb ARM workstation, and (M2) an Ubuntu 18.04 x86-64 VM hosted on an AMD EPYC 7662 server.

B.2.4 Software Dependencies

1. Rust: Our artifact is implemented in the Rust programming language. Per the rust-toolchain file provided in the main repository [1], we pin the installation environment to compiler version rustc 1.62.0-nightly.

2. Cross-Compilation: A cross-compilation toolchain is required. On Ubuntu, the corresponding packages are gcc-arm-linux-gnueabihf and g++-arm-linux-gnueabihf. Install the armv7-unknown-linux-gnueabihf Rust target for the above-mentioned compiler version. Note that these steps are required even when building directly in an ARM environment such as the HoneyComb. While the processor can execute programs targeted for both ARMv7 and ARMv8 versions, if the OS is built for aarch64, cross-compilation is required as the artifact binary will execute in ARMv7's 32-bit mode.

3. External Dependencies: The main artifact requires the LibAFL and Keystone external dependencies that cannot be automatically fetched by Rust's package manager. We include the dependencies as git submodules, pinned to specific versions. The evaluated third-party frameworks introduce their own dependencies and can be set up as documented in the HALucinator⁵ and Fuzzware⁶ repositories.

4. Python: For multiple build and automation scripts provided with the AE experiment repository, we require a Python version > 3.9. Additionally, we require the following Python libraries for analyzing and plotting the results of our experiments: jupyter, numpy, matplotlib, seaborn, scipy, pandas.

⁵ https://github.com/ucsb-seclab/hal-fuzz
⁶ https://github.com/fuzzware-fuzzer/fuzzware

B.2.5 Benchmarks

To evaluate SAFIREFUZZ, we use a collection of 12+2 firmware samples: 12 samples from the original HALucinator evaluation, and 2 previously untested samples (JPEG

Decoder and STM32 Sine). We include all samples in the experiment repository under 00_firmware.

Using these samples, we evaluate our approach against the following fuzzing setups:

1. HALucinator. State-of-the-art high-level-emulation-based rehosting and fuzzing framework. We include the fuzzing-ready hal-fuzz version as a submodule in safirefuzz-experiments/01_fuzzing/hal-fuzz.

2. HALucinator - LibAFL. We replace HALucinator's legacy AFL forkserver with a LibAFL-based forkserver. This new version is identical in configuration to the forkserver backend we use in SAFIREFUZZ. We conduct this comparison to eliminate variables such as differences in mutation strategies. Details can be found in the safirefuzz-experiments repository under 01_fuzzing/forkserver_LibAFL.

3. Fuzzware. A recent peripheral-modeling-based rehosting approach. This is the only experiment we conducted in an x86-64 environment, as, even after consulting the authors, Fuzzware could not be brought to run in our default ARM environment. We provide usage information and link the necessary submodule under 01_fuzzing/fuzzware.

We include setup guides and detailed usage instructions for all evaluated frameworks under 01_fuzzing/README.md.

B.3 Set-up

B.3.1 Installation

If you are using the provided access to the experiment machines, all systems are already set up and the instructions below can be skipped. To manually install SAFIREFUZZ, the main artifact, please follow these steps:

1. Check out our experiments repository [2] and initialize the submodules recursively.
2. Inside the experiments repository:
   $> cd 01_fuzzing/SAFIREFUZZ
3. Install the Rust programming language.⁷
4. Install the cross-compilation toolchain and the cross-arch linkers, e.g., on Ubuntu by running:
   $> rustup target add armv7-unknown-linux-gnueabihf
   $> sudo apt install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf
5. Specify the correct linker by adding the following lines to your ~/.cargo/config:
   [target.armv7-unknown-linux-gnueabihf]
   linker = "arm-linux-gnueabihf-gcc"
6. Specify the target harness you want to execute / fuzz in src/engine.rs:
   use crate::harness::wycinwyc as harness;
7. Build with
   $> cargo build --release --target armv7-unknown-linux-gnueabihf
8. Run the prepare_sys.sh script as root.

⁷ https://www.rust-lang.org/tools/install

B.3.2 Basic Test

After installing and compiling the main artifact, you will find the safirefuzz binary under ./target/armv7-unknown-linux-gnueabihf/release/. Compilation is always specific to a single target or harness, so make sure to change the target (cf. Section B.3.1, step 6) and re-compile before trying to execute a new firmware image.

Start fuzzing a specific firmware image with a directory of seeds by running:

   ./safirefuzz -b 00_firmware/wycinwyc.bin -i 01_fuzzing/seeds/wycinwyc/ -c 1

When starting a fuzzing campaign, you should see LibAFL's status reports scrolling by. When running a test on the WYCINWYC target, you should see rapidly increasing numbers for corpus (around 400-600 after approx. 30 seconds), which counts the interesting inputs leading to unique new coverage, at roughly 7000 executions per second. You can find these inputs in the queue directory, while crashing inputs are stored in crashes.

To then execute a single input, run:

   ./safirefuzz -b 00_firmware/wycinwyc.bin -i 01_fuzzing/crashes/SOMECRASHID

We automated most of these steps with the safirefuzz_target.py script included in the experiments repository under 01_fuzzing. For instance, running $> ./safirefuzz_target.py nxp_http will automatically build SAFIREFUZZ for the correct target and start fuzzing. This script defaults to running on the third core (-c 2); change this if you are running multiple tests in parallel.

B.4 Evaluation Workflow

B.4.1 Major Claims

(C1): SAFIREFUZZ achieves statistically significantly more exec/s (ca. 690x on avg.) and coverage than HALucinator, except for coverage on the P2IM PLC and P2IM Drone targets. This is proven by experiment E1.

(C2): SAFIREFUZZ achieves more exec/s (ca. 1100x on avg.) and coverage than HALucinator-LibAFL, except for coverage on the UDP Echo Server, STM PLC, P2IM PLC and P2IM Drone targets. These results are statistically significant, except for coverage on the P2IM

PLC, STM PLC, WYCINWYC, and UDP Echo Client targets. This is proven by experiment E2.

(C3): SAFIREFUZZ achieves more exec/s (ca. 145x on avg.) and coverage than Fuzzware, except for coverage on P2IM PLC and P2IM Drone. The results are statistically significant except for coverage on the 6LoWPAN RX/TX, STM PLC, and WYCINWYC targets and for execution speed on the SAMR21 target. This is proven by experiment E3.

(C4): SAFIREFUZZ reliably re-discovers previously found bugs during fuzzing (E0). This includes vulnerabilities in the WYCINWYC and 6LoWPAN RX/TX targets as discussed in Section 6.4.

(C5): SAFIREFUZZ finds crashes in the previously untested firmware images JPEG Decoder and STM32 Sine. This can be replicated with experiment E4. We discuss the findings in Section 6.4 of our paper.

For C1, C2 and C3, we discuss our results in Section 6.2 in the main paper. Table 3 reports numbers gathered during our experiments and Figure 3 illustrates achieved coverage over the course of a 24-hour fuzzing campaign for all targets and frameworks.

B.4.2 Experiments

As a working SAFIREFUZZ installation is required for the subsequent steps, refer to Section B.3.1 of this Appendix and the README of our main repository [1] for instructions. The following steps assume you work in the pre-configured environments.

(E0): SAFIREFUZZ Baseline [20 human-minutes fuzzing set-up time + up to 5x12x24 compute-hours + 10 human-minutes coverage collection set-up time + 2-6 compute-hours queue replay time]: Use the ./safirefuzz_target.py script in 01_fuzzing of the experiments repository [2] to start a 24-hour fuzzing campaign for the specified target with SAFIREFUZZ.

(E1): HALucinator Comparison [20 human-minutes fuzzing set-up time + up to 5x12x24 compute-hours + 10 human-minutes coverage collection set-up time + 2-6 compute-hours queue replay time]: HALucinator is already set up; you can start fuzzing with this framework by executing the corresponding script in the hal-fuzz submodule. For further details, refer to the HALucinator section in 01_fuzzing/README.md.

(E2): HALucinator-LibAFL Comparison [… compute-hours + 10 human-minutes coverage collection set-up time + 2-6 compute-hours queue replay time]: For details on how to start a HALucinator-LibAFL fuzzing campaign, refer to the corresponding section in 01_fuzzing/README.md.

(E3): Fuzzware Comparison [15 human-minutes fuzzing set-up time + up to 5x12x24 compute-hours + 15 human-minutes coverage collection set-up time + 2-6 compute-hours queue replay time]: In order to set up and start fuzzing with Fuzzware, please refer to the detailed instructions provided in 01_fuzzing/fuzzware as part of our experiments repository.

(E4): Vulnerability discovery [15 human-minutes fuzzing set-up time + up to 24 compute-hours + 15 human-minutes replay & verification]: To compile and fuzz the previously untested targets, please refer to the README included in 03_case_studies inside the experiment repository.

Collecting Coverage. For all experiments except E3, coverage can be collected using the eval_bbs_halucinator.py script in 02_coverage_collection. For E3, use the scripts fuzzware_genstats_with_hal.sh and fuzzware_genstats_without_hal.sh. For detailed instructions, refer to the README provided within the directory.

Analyzing Results. We provide scripts to test whether the achieved coverage and execution speeds are statistically significant under 04_eval_data inside the experiment repository. Please use the bb_mann_whitney.ipynb and execs_mann_whitney.ipynb jupyter notebooks inside the coverage and executions directories. We further provide a gen_fig3.ipynb notebook to plot coverage data over time. To use these notebooks with data from your experiments, you will need to exchange the .data and .csv files in the corresponding subdirectories. Please refer to the README for more details.

Time & Resource Considerations. Due to the extent of the experiments carried out during evaluation, it may not be possible to run all experiments for all reviewers in the time frame allocated for artifact evaluation. Hence, we provide the raw data collected from our runs under 04_eval_data in our experiment repository. The raw data allows our claims to be reproduced without running the experiments, or by running them only partially.

B.5 Version

Based on the LaTeX template for Artifact Evaluation V20220926. Submission, reviewing and badging methodology followed for the evaluation of this artifact can be found at
(E2): HALucinator-LibAFL Comparison [20 human-
https://secartifacts.github.io/usenixsec2023/.
minutes fuzzing set-up time + up to 5x12x24
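
As an illustration of the significance check mentioned under "Analyzing Results", the following is a minimal sketch of a two-sided Mann-Whitney U test in pure Python. It is not the code of the bb_mann_whitney.ipynb or execs_mann_whitney.ipynb notebooks, whose contents are not shown here. The function names, the use of an exact enumeration instead of a library routine, the assumption of five runs per framework, and all sample values are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def u_statistic(xs, ys):
    """Mann-Whitney U for xs: number of (x, y) pairs with x > y, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def mann_whitney_exact(xs, ys):
    """Return (U, two-sided exact p-value) by enumerating all group relabelings.

    Only practical for small samples, e.g. a handful of runs per framework.
    """
    pooled = list(xs) + list(ys)
    n1, n2 = len(xs), len(ys)
    center = n1 * n2 / 2.0  # expected U under the null hypothesis
    observed = u_statistic(xs, ys)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the null expectation as observed.
        if abs(u_statistic(g1, g2) - center) >= abs(observed - center) - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical basic-block counts from five runs per framework (made-up numbers).
framework_a = [1510, 1498, 1523, 1507, 1515]
framework_b = [1320, 1345, 1301, 1338, 1329]

u, p = mann_whitney_exact(framework_a, framework_b)
print(f"U = {u}, p = {p:.4f}")
```

With five runs per group there are 252 possible relabelings, so two completely separated samples, as in the made-up data above, yield the smallest possible two-sided p-value of 2/252 (about 0.008). For larger samples a library implementation would be the more practical choice.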
