0% found this document useful (0 votes)
108 views

Response-Time Analysis of ROS2

Uploaded by

Svastits
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
108 views

Response-Time Analysis of ROS2

Uploaded by

Svastits
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Response-Time Analysis of ROS 2 Processing

Chains Under Reservation-Based Scheduling


Daniel Casini
Scuola Superiore Sant’Anna, Pisa, Italy
Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany
[email protected]
Tobias Blaß
Corporate Research, Robert Bosch GmbH, Renningen, Germany
Saarland Informatics Campus, Saarland University, Saarbrücken, Germany
[email protected]
Ingo Lütkebohle
Corporate Research, Robert Bosch GmbH, Renningen, Germany
[email protected]
Björn B. Brandenburg
Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany
[email protected]

Abstract
Bounding the end-to-end latency of processing chains in distributed real-time systems is a well-
studied problem, relevant in multiple industrial fields, such as automotive systems and robotics.
Nonetheless, to date, only little attention has been given to the study of the impact that specific
frameworks and implementation choices have on real-time performance. This paper proposes a
scheduling model and a response-time analysis for ROS 2 (specifically, version “Crystal Clemmys”
released in December 2018), a popular framework for the rapid prototyping, development, and
deployment of robotics applications with thousands of professional users around the world. The
purpose of this paper is threefold. Firstly, it is aimed at providing to robotic engineers a practical
analysis to bound the worst-case response times of their applications. Secondly, it shines a light on
current ROS 2 implementation choices from a real-time perspective. Finally, it presents a realistic
real-time scheduling model, which provides an opportunity for future impact on the robotics industry.

2012 ACM Subject Classification Software and its engineering → Real-time schedulability

Keywords and phrases ROS, real-time systems, response-time analysis, robotics, resource reservation

Digital Object Identifier 10.4230/LIPIcs.ECRTS.2019.6

Supplement Material ECRTS 2019 Artifact Evaluation approved artifact available at


https://dx.doi.org/10.4230/DARTS.5.1.5
The associated source code is available at https://github.com/boschresearch/ros2_response_
time_analysis.

1 Introduction
ROS, the Robot Operating System [43], is one of the most popular frameworks for designing
and developing Linux-based robots. Powering over 100 different robot designs, it is used
by tens of thousands of developers and researchers in both industry and academia [4, 22].
However, after over a decade of development and in the face of increasingly demanding
applications, it became clear to the ROS community that the framework is held back by
several long-standing shortcomings and architectural limitations that cannot be rectified in
a backwards-compatible manner. This motivated the development of ROS 2, a complete
refactoring of ROS that puts the successful concept onto a modernized and improved
© Daniel Casini, Tobias Blaß, Ingo Lütkebohle, and Björn B. Brandenburg ; Artifact
* Complete
licensed under Creative Commons License CC-BY en
t
ECRTS *

*
*
st

We
nsi

AE *
* Co

ll Docum

31st Euromicro Conference on Real-Time Systems (ECRTS 2019).


se
eu

Ev
e

nt

Editor: Sophie Quinton; Article No. 6; pp. 6:1–6:23


R
*

ed
* Easy to
alu
at e d
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
6:2 Response-Time Analysis of ROS 2 Processing Chains

foundation. Of particular interest to us, a major design goal of ROS 2 is to improve the
real-time capabilities of the framework, enabling the implementation of time-critical control
paths inside ROS [24].
Safely implementing such control paths requires predicting the end-to-end latency (or
response time) of time-critical processing chains. For instance, such a chain might cover
all steps from a sensor, via a controller, to the final actuator and span multiple software
components, multiple cores, and even multiple hosts. Although the end-to-end latency
problem is well-studied in the literature, existing work often assumes idealized scheduling
models that are not always directly applicable in real systems. Case in point, the ROS1
scheduling approach (as described in detail in Section 3) does not match any of the classic
results on bounding end-to-end latencies. ROS developers therefore have to resort to fully
prototyping and deploying a design to measure its timing properties, which severely limits
the degree to which the design space can be explored in practice.
While the scheduling model of operating systems like Linux has been studied extensively,
this is not the case for ROS. Even though it is a middleware layer and not a proper operating
system, its effects on an application’s runtime behavior are substantial, rivaling or even
exceeding that of the underlying OS. For example, ROS multiplexes independent message
handlers onto shared threads using custom scheduling policies. Consequently, applications
running on top of ROS are subject to the scheduling decisions of the underlying operating
system and the middleware layer, with complex and interdependent effects on timing.
An additional complication stems from one of the key strengths of ROS: its modular
structure. ROS emphasizes composing existing, battle-tested components instead of reimple-
menting common subsystems for each robot from scratch. While this greatly simplifies and
speeds up robot development, it also obfuscates the overall timing behavior. This problem is
further aggravated by ROS’ event-driven design style, which gives rise to data dependencies
and potentially long processing chains. As a result, it is extremely difficult for developers
to anticipate, or even just understand, the timing of processing chains that cross multiple,
loosely coupled components, most of which are developed by independent teams all around
the world. Realistically, automated end-to-end response-time analysis is thus required to
safely employ ROS in time-critical situations.
In this paper, we seek to lay the theoretical foundations for such an automated analysis
tool by exploring the temporal behavior of ROS 2 “Crystal Clemmys” (released in Decem-
ber 2018) [5]. We present and validate a model of ROS applications running on top of
a resource reservation scheduler such as SCHED_DEADLINE in Linux. Based on this model,
we develop an end-to-end response-time analysis for ROS processing chains that takes the
peculiarities and engineering constraints of the ROS environment into account. Finally, to
demonstrate the applicability of our analysis to practical ROS components, we evaluate our
approach on the popular move_base package [3], the core of the ROS navigation stack.

2 Background

This section introduces necessary background on the three pillars on which this paper rests.
First, the structure of ROS and its execution model are presented. Then, we review resource
reservations, an OS-level mechanism to isolate the resource consumption of processes, and
last we review the Compositional Performance Analysis approach for response-time analysis.

1
For brevity, we omit the version number and refer to ROS 2 as ROS in the remainder of the paper.
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:3

Language-specific rclcpp rclpy


ROS Application Client Libraries

Client Library rcl

Client libraries
Middleware Library rmw
ROS Middleware
DDS Intra-process API
RTI Connext DDS OpenSplice DDS
DDS Implementations
eProsima FastRTPS
Operating System Linux/Windows/OS X

Figure 1 Layered structure of ROS.

rclcpp rclpy rclnodejs RCLAda


2.1 ROS
eProsima
ROS places great emphasis RTI Connext OpenSplice
FastRTPS on modularity and composability. It therefore encourages strict
DDS DDS
separation between the logical structure of the application and the mapping of this structure
onto hosts, processors, and threads. While the former is defined by the package developer,
the latter is entirely up to the system integrator. This way, software modules can be
developed independently of the target platform without losing the ability to taylor them to
the deployment characteristics of a particular robot.
From a logical perspective, ROS applications are composed of nodes, the smallest self-
contained units of behavior. These nodes communicate using the publish-subscribe paradigm:
nodes publish messages on topics, which broadcast the message to all nodes that are
subscribed to the topic. Nodes react to incoming messages by activating callbacks to process
each message. Since these callbacks may publish messages themselves, complex behavior can
be implemented as a network of topics and callbacks. ROS also allows callbacks to invoke
remote procedure calls by means of the service mechanism using a continuation-passing style.
Specifically, a callback can initiate a non-blocking service request to a service callback and
specify a third client callback to be invoked once the response is available.
ROS seamlessly allows composing nodes written in different programming languages and
using different communication backends. The ROS implementation is therefore split into
multiple layers of abstraction, which are visualized in Figure 1. Each supported programming
language requires a client library that provides a language-specific API to the ROS application
model. The ROS project officially supports C++ and Python, with community-provided
support for numerous other languages. Below the surface, these libraries use a common
system model provided by the rcl library. This ensures consistent behavior between the
languages and reduces code duplication.
Despite this unified implementation, some parts of the ROS system are allowed to differ
between languages. In particular, client libraries have a lot of freedom in implementing the
execution model to allow the callback graph to be expressed in the most natural way in each
language. A language supporting coroutines, for example, might allow coroutines as event
handlers instead of callbacks. We therefore limit the focus of this paper to the C++ interface,
which we believe to be the most likely choice for time-critical components.
For inter-node communication, ROS uses the Data Distribution Service [39] (DDS), an
industry standard for data distribution in real-time systems. DDS specifies a network-
transparent publish-subscribe mechanism that can be adapted to the needs at hand using
a rich set of Quality-of-Service (QoS) policies. ROS works with different, independent
implementations of the DDS standard (currently, FastRTPS by eProsima [2], Connext by
RTI [6], and Vortex OpenSplice by Adlink [7]), each with a different API. The rcl client library
therefore accesses the DDS subsystem over the common rmw (ROS MiddleWare) interface,

ECRTS 2019
6:4 Response-Time Analysis of ROS 2 Processing Chains

which provides a DDS-agnostic API to the rcl layer. Each supported DDS implementation
requires a dedicated rmw implementation, which translates between the common rmw
interface and the vendor-specific DDS API.
To deploy a ROS application, the individual nodes have to be distributed to hosts and
then mapped onto operating system processes. ROS does not impose any restrictions on
this mapping. Processes implement the ROS execution model by running executors, which
receive messages from rcl and invoke the corresponding callbacks. ROS provides two built-in
executors: a sequential one that executes callbacks in a single thread, and a parallel one
that distributes the processing of pending callbacks across multiple threads. Moreover, ROS
supports arbitrarily complex setups of multiple, user-defined executors.
From a real-time perspective, it is important to note that executors implement custom
scheduling policies; we revisit this issue in Section 3 in detail. Furthermore, how each ROS
executor’s threads are scheduled by the OS has a major impact on the overall timing behavior
of the application. To ensure predictable scheduling, executor threads can be bound to a
reservation server, which is a pragmatic configuration approach to increasing predictability
that we advocate in this paper. Next, we briefly review key aspects of resource reservations.

2.2 Resource Reservations


An ideal mechanism to ensure predictable service for ROS threads is a resource reservation,
which is a classic OS-level abstraction that limits interference between processes by bounding
their resource consumption. Resource reservations are typically implemented by reservation
servers. In general, a reservation server ri is characterized by a budget Qi and a period Pi ,
and guarantees that its client threads receive Qi units of execution time in each period.
Many different reservation algorithms have been designed and developed over the last
30 years [12], with various different features and support for different scheduling algorithms
(e.g., fixed-priority, deadline-based scheduling, etc.). This paper does not focus on any
specific algorithm, but we assume the reservation server to comply with the periodic resource
model [58], namely, we require that: (i) each reservation has an implicit deadline, (ii)
whenever ri has workload, the reservation algorithm guarantees at least Qi units of service
every Pi time units, and (iii) there exists a bounded maximum service delay, i.e., a bounded
maximum release delay that a process running in a reservation can experience because of
budget exhaustion and delays due to other reservations. Under these assumptions, the
minimum amount of service provided by the reservation in an interval of length ∆ can be
expressed with a supply-bound function sbf(∆) [33, 58]. In the following, we assume that
such a supply bound function is known for each reservation and refer to [12, 15, 33, 58] for a
discussion of how to obtain them.
On Linux systems, resource reservations are available through the SCHED_DEADLINE [32]
scheduling class, which implements the Constant Bandwidth Server [8] reservation algorithm.
Moreover, as a special case of practical relevance, a thread running on a dedicated core at
the highest priority of the SCHED_FIFO scheduling class can be considered as running in a
reservation with the supply bound function sbf(∆) = ∆.
Next, to complete the overview of needed background, we review the Compositional
Performance Analysis approach, upon which we base our response-time analysis.

2.3 Compositional Performance Analysis


Compositional Performance Analysis (CPA) is an approach to analytically evaluate the
performance of heterogeneous and distributed systems [27]. CPA models systems as networks
of resources, and workloads as tasks with dependencies. Resources provide processing time,
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:5

which is consumed by tasks. Tasks with dependencies are organized as a direct acyclic
graph, and paths in the graphs are denoted as processing chains. Tasks sharing the same
resource are scheduled according to a resource-specific scheduling policy. The source task of
a chain is triggered according to an externally provided event arrival curve [27, 30, 62] η e (∆),
which defines an upper bound on the number of events that can arrive in any time window
[t, t + ∆). Event arrival curves are general enough to model both periodic and event-driven
(e.g., interrupt-driven) activations [47]. For example,
l whenma task Tx is periodically triggered,

its arrival curve can be expressed as ηxe (∆) = period(x) . Non-source tasks are triggered
according to derived activation curves. Activation curves are obtained from arrival curves
by accounting for release jitter, which reflects the activation delay due to predecessor tasks
and depends on their response times. However, response times also depend on the release
jitter, thus creating a cyclic dependency. To solve this problem, the analysis starts with
an initial jitter of zero, and then iteratively applies the response-time analysis and updates
all jitter bounds until convergence is achieved [42, 63]. Convergence is guaranteed (for
non-overloaded systems) by the monotonic dependency between response time and jitter (the
more jitter, the higher the response times, and vice versa). The basic CPA approach bounds
the end-to-end latency of a chain with the sum of the individual response-time bounds of each
task. Extensions have been subsequently designed to improve analysis precision, e.g., [54, 56].
Since ROS provides executors that dispatch callbacks in peculiar ways using custom
scheduling policies, the existing CPA literature and tooling is not a perfect match for ROS.
However, we liberally take inspiration from CPA to obtain a similarly flexible timing model
that reflects the idiosyncrasies of ROS, which we introduce next.

3 ROS Scheduling

As described in Section 2.1, the ROS execution model multiplexes all callbacks associated with
an executor onto one or more threads. The ROS C++ library provides its built-in executor
in two variants: a single-threaded and a multi-threaded one. In this initial study of the ROS
timing behavior, we focus exclusively on the simpler and more predictable single-threaded
executor. The following description is based on a careful study of the ROS source code and
documentation, and is to our knowledge the first comprehensive description of the scheduling
behavior of ROS. To validate our observations, we conclude this section with an experiment
that demonstrates and corroborates our findings on a concrete example.
The executor distinguishes four categories of callbacks: timers, which are triggered by
system-level timers, subscribers, which are triggered by new messages on a subscribed topic,
services, which are triggered by service requests, and clients, which are triggered by responses
to service requests. The executor is responsible for taking messages from the input queues of
the DDS layer (by interacting with the rcl layer) and executing the corresponding callback.
Since it executes callbacks to completion, it is a non-preemptive scheduler. However, unlike
most commonly studied schedulers, it does not always consider all ready tasks for execution.
Instead, it bases its decisions on the readySet, a cached copy of the set of ready non-timer
callbacks, which it updates in irregular, execution-dependent intervals. The algorithm is
depicted in Figure 2, in which we assume C to be the set of all callbacks assigned to the
executor, and C tmr , C sub , C srv , C clt to be the subsets of C consisting only of timers, subscribers,
services, and clients, respectively.
If the executor is idle, it updates its readySet. This is the only step in which the executor
interacts with the underlying communication layer (i.e., rmw, via rcl). It then looks for
a callback to execute by searching through the four callback categories (for efficiency, the

ECRTS 2019
6:6 Response-Time Analysis of ROS 2 Processing Chains

readySet ← {c ∈ (C \ C tmr ) | c ready in rcl}

cb ← highest-priority callback in s
execute next instance of cb
readySet ← readySet \ {cb}

s 6= ∅
s ← {t ∈ C tmr | t ready in rcl}

s=∅ s=∅
s 6= ∅
s ← C sub ∩ readySet

s=∅
s 6= ∅
s ← C srv ∩ readySet

s=∅
s 6= ∅
s ← C clt ∩ readySet

Figure 2 The executor scheduling algorithm.

executor blocks if there is nothing to do; this optimization has been omitted for clarity). It
first checks whether any timers have expired. Since these are not managed by the DDS layer,
this check is based on the current timer state and does not depend on the readySet. It then
searches the readySet for subscriptions, services, and clients (in this order). Evaluating the
queues in a fixed order has the intrinsic effect of assigning each queue a different priority
(i.e., the timer queue is examined first and hence has the highest priority, and the client
queue is examined last and has the lowest priority). When a queue is considered, callback
instances are examined in callback registration order, i.e., the order in which the callbacks
were registered with the executor. Consequently, the registration order represents a second
level of priorities. Overall, the pair (callback type, registration time) is a unique priority
for each callback.
Whenever a category has at least one ready callback, the highest-priority one is selected,
executed, and then removed from the readySet. Finally, when the readySet is empty and no
expired timers are left, the executor returns to the idle state and updates the readySet based
on a current snapshot of the communication layer. We refer to the updating of the readySet
as a polling point and the interval between two polling points as a processing window. The
n-th polling point is referred to as PP n , and the n-th processing window (ranging from PP n
to PP n+1 ) as PW n .
Compared to regular fixed-priority scheduling, this algorithm exhibits a few unusual
properties. First, messages arriving during a processing window are not considered until the
next polling point, which depends on all remaining callbacks. This leads to priority inversion,
as lower-priority callbacks may implicitly block higher-priority callbacks by prolonging the
current processing window.
Second, it relies on a ready set instead of the more usual ready list. This means that the
algorithm cannot know how many instances of any non-timer callback are ready. It therefore
processes at most one instance of any callback per processing window. This aggravates the
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:7

T1 T2 T3

SL
SM
SH
L
M
H
T3
T2
T1
T0
PP 0 PP 1
0 1 2 3 4 5 6 7 8 9
Time (seconds)

Figure 3 Gantt-Chart of the scheduler validation test. At times T1 and T3 , the timers trigger.
At time T2 , the second batch of service requests and messages is submitted.

priority inversion above, as a backlogged callback might have to wait for multiple processing
windows until it is even considered for scheduling. Effectively, this means that a non-timer
callback instance might be blocked by multiple instances of the same lower-priority callback.

The presented description of the ROS scheduler is based on manual code inspection. In a
system as complex as ROS, however, this is potentially error-prone, as there might be subtle
interactions that are easily overlooked yet change the behavior drastically. Thus, to validate
our model, we implemented a special-purpose ROS node that executes arbitrary-length
callbacks in a way that allows inferring the behavior of the ROS scheduler from the resulting
trace. Specifically, the node is controlled using three topics (H, M, and L), three services
(SH, SM, and SL), and a special-purpose topic to create timers. Note that the chosen names
assume that topics and services are prioritized in registration order; checking that topic H
actually has the highest priority is part of the model validation. In the following description,
time zero refers to the point in time when the first batch of validation callbacks arrives at the
node. The i-th timer is denoted as ti . For ease of visualization, all callbacks run for 500ms.

Our test first sets up two timers at 200ms (T0 ) and two timers at 2300ms (T3 ). It then
sends the message sequence <L M H SH SL L M H SH SL>, waits for 1.5 seconds (T2 ), and
then sends <SM SM H >. The result is visualized in Figure 3. Note that the polling points
are not determined by the test; rather, they are inferred from the resulting timing behavior.

One can clearly observe the scheduler executing only a single callback per ready event,
even if multiple messages have been queued up; this is especially apparent after the second
polling point. Furthermore SM is visibly skipped at time 4, even though it arrives earlier at
T2 (i.e., during the execution of t1 ). This proves the existence of polling points. The timers,
however, are clearly not subject to these polling points, since t2 and t3 arrive later than SM
but are still executed during the first processing window.

ECRTS 2019
𝛾5
𝛾2 𝛾6

𝑐1
𝛾3
6:8 Response-Time Analysis of ROS 2 Processing
𝛾4 Chains
𝛾1

𝑐1 𝑐2 𝑐6 𝑐7 𝑐11 𝑐12 𝛾 1 = (𝑐1 , 𝑐2 , 𝑐5 ) 𝛾 4 = (𝑐6 , 𝑐7 , 𝑐10 , 𝑐13 , 𝑐14 )


𝑐5 𝑐10 𝛾 2 = (𝑐3 , 𝑐4 , 𝑐5 ) 𝛾 5 = (𝑐8 , 𝑐9 , 𝑐10 , 𝑐11 , 𝑐12 )
𝑐3 𝑐4 𝑐8 𝑐9 𝑐13 𝑐14 𝛾 3 = (𝑐6 , 𝑐7 , 𝑐10 , 𝑐11 , 𝑐12 ) 𝛾 6 = (𝑐8 , 𝑐9 , 𝑐10 , 𝑐13 , 𝑐14 )

Figure 4 Example of a ROS graph. Circles represent callbacks, and edges represent communication
2
relations 𝛾among them. The corresponding
5 processing chains are also shown.
𝛾
𝛾6

4 System Model

In this section, we introduce a model of the timing-related aspects of a ROS system, its
callbacks, and their activation relations. Table 1 summarizes our notation.
We model a ROS system as a direct acyclic graph (DAG) D = {C, E} composed of a set
of callbacks C = {c1 , . . . , cn } and a set of directed edges E ⊆ C × C. We assume the graph
D to be fixed, i.e., callbacks can neither join nor leave the system at runtime. Recall from
Section 3 that C tmr , C sub , C clt , and C srv denote the subsets of all timer, subscriber, client,
and service callbacks, respectively.
Each callback ci ∈ C has a worst-case execution time ei , a unique priority πi , and releases
a potentially infinite sequence of instances. We assume a discrete-time model, that is, all
time parameters are integer multiples of some basic time unit (e.g., a processor cycle).
Depending on its type, a callback instance is activated when the DDS layer receives a
message or a timer expires. When an instance of a callback is activated, it is said to be pending,
and it remains pending until it completes. A callback instance is said to be ready when it
is pending but not executing. Each edge (ci , cj ) ∈ E encodes an activation relation from
callback ci to callback cj , meaning that during the execution of an instance of ci it activates
up to one instance of cj (e.g., by publishing a message to the topic to which cj is subscribed).
Each callback is associated with a set of predecessors pred(ci ) = {cj ∈ C : ∃ (cj , ci ) ∈ E}
and a set of successors succ(ci ) = {cj ∈ C : ∃ (ci , cj ) ∈ E}. A callback without predecessors
(respectively, successors) is said to be a source callback (respectively, sink callback).

Processing chains. The ROS graph D can have multiple source and sink callbacks. Each
source originates one or more callback chains γ x = (cs , . . . , ce ), i.e., directed paths in the
graph. The set of all chains of the graph from a source callback to any other callback is
denoted by chains(D) = {γ 1 , . . . , γ s }. Callbacks can be shared by multiple chains. An
example of a ROS graph with several chains is shown in Figure 4.

Activation model. As in CPA, each source callback cs is associated with a given external
event arrival curve ηse (∆), denoting the maximum number of instances of cs that can be
released in any interval of length ∆, while non-source callbacks are associated with a (derived)
activation curve. We assume w.l.o.g. that ηse (∆) > 0 for ∆ > 0.
As discussed in Section 2.1, non timer-callbacks are activated in a data-driven fashion.
Our model assumes that callbacks belong to a single timer, topic, or service. (This is not
a restriction, since two callbacks may execute the same code.) Consequently, a callback
may have multiple incoming edges only if it subscribes to a topic with multiple publishers
(similarly to what is referred to as OR-activation semantics in other work [27]). In this case,
all subscribers are triggered once for each message published to the topic. The derivation of
activation curves for callbacks with multiple incoming edges is discussed in Section 5.1.
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:9

Table 1 Summary of Notation.

Symbol Description Symbol Description


x
ci the i-th callback γ the x-th processing chain
rk the k-th reservation γ x,y the y-th subchain of γ x
Ck set of callbacks in reservation rk ηie external arrival curve of ci
δi,j propagation delay from ci to cj ηia derived activation curve of ci
A an offset into a busy window sbfk (∆) supply-bound function of rk
Ri∗ (A) least positive solution of ci ’s rbfi (∆) request-bound function of ci
P
response-time equation for offset A RBF (C, ∆) c ∈C
rbfi (∆)
i

Scheduling of executors. As discussed in Section 3, this paper adopts the built-in single-
threaded executor. To compel the OS to guarantee predictable service to ROS executors, we
assume each executor (i.e., each thread) to be assigned to a single reservation server, and
each reservation server to handle a single executor. Consequently, callbacks assigned to an
executor can equivalently be considered as assigned to the corresponding reservation server.
The system comprises a set of w reservations R = {r1 , . . . , rw }, and the set of all the
callbacks assigned to reservation rk is denoted Ck . Analogously, the sets of all timers,
subscriptions, clients and services allocated to reservation rk are denoted as Cktmr , Cksub , Ckclt ,
and Cksrv , respectively. The symbols lpk (ci ) and hpk (ci ) denote the set of callbacks in Ck with
lower and higher priority than πi , respectively. The reservations are partitioned onto a set of
m processors P = {p1 , . . . , pm }, i.e., each reservation is statically assigned to a processor.
The results presented in this paper rely on the availability of a supply-bound function
sbfk (∆) denoting the minimum service provided by a reservation rk in any interval of length
∆ (recall Section 2.2). Whenever multiple reservations are allocated to the same processor,
the resource provisioning described by the supply-bound function is guaranteed only if all
reservations are schedulable, i.e., they are always able to provide their complete budget
during each period [33]. In this paper, reservations are assumed to be schedulable. The
problem of guaranteeing reservations to meet their timing constraints is also referred to as
global schedulability in prior works (e.g., in the context of hierarchical scheduling [9]). Many
results are available to ensure global schedulability, e.g., [37].

Propagation delay. The propagation delay between the publication of a message by a


sending callback and the activation of an associated receiving callback can be significant.
Indeed, due to the inherently distributed topology of ROS systems, the message exchange
can involve the network, introducing additional latencies. To model such a delay, each pair of
reservations (rx , ry ) is characterized by a (DDS-dependent) worst-case communication delay
δi,j , denoting the maximum time experienced by a message sent from the DDS layer of a
sending callback ci allocated to rx until being received by the DDS layer of a receiving callback
cj allocated to ry , i.e., the maximum additional delay experienced by cj before being activated.
When rx = ry , δi,j is assumed to be negligible. This delay can be either analytically upper-
bounded for different types of networks (e.g., see [17, 19, 51]), or pragmatically measured,
depending on the requirements of the target application domain.

Event sources. With the exception of timers, all callback types provided by ROS implement
data-driven activation semantics. Consequently, all chains comprised solely of ROS callbacks
are initially triggered by a timer. Nonetheless, applications often have to react to external

ECRTS 2019
6:10 Response-Time Analysis of ROS 2 Processing Chains

events that are delivered asynchronously via interrupts (e.g., certain sensors, network packets
delivering inputs from supervisory controllers or human operators, etc.). To integrate such
events in our ROS model, we allow external threads to interact with ROS and model them
as pseudo-callbacks. Specifically, we name these threads event sources. An event source is a
regular OS-level thread that is sporadically activated, and interacts with ROS by publishing
to one or more topics, thus acting as an interface or ingress point for external events. As
we do in the case of executors, we assume each event source to be exclusively assigned to
a dedicated reservation. For notational convenience, we let C evt denote the set of all event
sources and refer to event sources as callbacks ci ∈ C evt .

5 Response-Time Analysis for Processing Chains

This section presents an analysis of the end-to-end delay (i.e., the maximum response time)
of a generic ROS processing chain. Our analysis is inspired by the CPA approach (described
in Section 2.3), whose event-propagation mechanism is a natural fit for the distributed and
message-based nature of ROS. A discussion of possible alternatives is postponed to Section 7.
As in CPA, a complex ROS graph is analyzed by computing individual response-time
upper bounds for each callback. End-to-end latencies can then be obtained by summing
the individual response times of the callbacks of each chain [27]. Unfortunately, none of the
existing instantiations of CPA can compute these per-callback response times, as they are
unaware of the peculiarities of the ROS scheduling mechanism (e.g., polling points). We
therefore present a ROS-specific response-time analysis for callbacks in Section 5.2.
Although this approach provides a safe and simple upper bound on end-to-end latencies,
the resulting bounds may be overly pessimistic if arrival bursts of interfering callbacks are
accounted for multiple times, once for each callback in the chain under analysis. This effect
is known in the literature as the “pay-burst-only-once” problem [30, 56]. To improve the
accuracy of our analysis, Section 5.4 presents a bound in which portions of chains, named
subchains, are analyzed in a holistic way. We define the y-th subchain γ x,y of γ x as a sequence
of consecutive callbacks ci ∈ γ x of the original chain that are allocated to a single reservation
rk , i.e., ci ∈ γ x,y ⇒ ci ∈ Ck . With this approach, arrival bursts are accounted for only
once per subchain. The CPA approach can then be applied on a per-subchain basis, by
propagating arrival curves and summing response-time bounds whenever a subchain crosses
a reservation boundary, or joins with another chain in a callback with multiple predecessors.

5.1 High-Level Overview


Figure 5 shows an example that illustrates how the proposed analysis can be used to upper-
bound the response time of a callback chain spanning multiple reservations. For clarity,
interfering callbacks have been omitted in the figure. Response-time bounds for the various
subchains of γ x (i.e., γ x,1 = (c1 , c2 ), γ x,2 = (c3 , c4 ), γ x,4 = (c6 , c7 ), and γ x,3 = (c5 )) can be
derived with the results that will be presented in Sections 5.2 and 5.4.
As discussed in Section 2.3, activation curves of non-source subchains must be derived
from their predecessors and depend on both the response time of previous subchains and
a
communication delays. In this example, ηx,2 (∆) = ηxe (∆ + Rx,1 + δ2,3 ), ηx,3 a a
(∆) = ηx,2 (∆ +
a e
Rx,2 + δ4,5 ), ηx,4 (∆) = ηx,3 (∆ + Rx,3 + δ5,6 ), where Rx,y is a response-time upper bound
for γ x,y . The response time of the chain shown in this example can then be computed
as the sum of the response times of the subchains and communications delays, i.e., as
Rx = Rx,1 + δ2,3 + Rx,2 + δ4,5 + Rx,3 + δ5,6 + Rx,4 .
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:11

η𝑒𝑥 (∆) η𝑎𝑥,2 (∆) η𝑎𝑥,3 (∆)


𝑐1 𝑐2 𝑐3 𝑐4 𝑐5
𝛾 𝑥,1 𝛾 𝑥,2 𝛾 𝑥,3
η𝑎𝑥,4 (∆)
𝑐7 𝑐6
𝑟1 𝑟2 𝑟3
𝛾 𝑥,4

Figure 5 A callback chain crossing multiple reservations. Activations of the first subchain are
bounded by an external event arrival curve, while subsequent subchains are characterized by an
activation curve that depends on response times and communication delays in the previous subchains.

Processing chains sharing one or more callbacks are also supported by the analysis
framework. To deal with this case, the jitter propagation approach is extended to callbacks
with multiple incoming edges, i.e., multiple predecessors [29]. As discussed in Section 4, a
callback with multiple incoming edges is triggered when it receives a message from any of its
predecessors. Consequently, the activation curve of a callback (i.e., the source callback of a
subchain, when the holistic approach of Section 5.4 is adopted) is derived from the activation
curves of the predecessors as follows:
X
ηia (∆) = ηja (∆ + Rj + δj,i ), (1)
cj ∈pred(ci )

where Rj is the response time of cj and δj,i is the propagation delay of messages from cj to
ci . The sum in Equation (1) follows since each incoming message spawns a callback instance.

5.2 Analysis for Individual Callbacks


To start, we recall some classic definitions from uniprocessor schedulability analysis. A
time t is a quiet time with respect to a callback ci if there is no pending instance of ci
that arrived prior to t. An interval [t1 , t2 ) is a busy period [59] for ci iff t1 and t2 are
quiet times of ci and there is no quiet time (w.r.t. ci ) in between t1 and t2 . The response
time Ri of a callback ci is defined as the maximum difference, over all possible instances,
between the finishing time and the release time of the specific instance. For each callback
ci , the request-bound function rbfi (∆) is defined as the maximum amount of (cumulative)
processor service required by callback instances released in an interval of length ∆, i.e.,
rbfi (∆) = ηia (∆) · ei [31]. Finally, we define the total request-bound function of a given set of
callbacks C ∗ as RBF (C ∗ , ∆) = ci ∈C ∗ rbfi (∆).
P

From a scheduling perspective, callbacks can be divided into three categories: event
sources, timers, and polling-point-based (pp-based) callbacks. For convenience, in addition
to the sets Ckevt and Cktmr containing respectively the event sources and timers allocated
to rk , we define also the set Ckpp = Ck \ (Ckevt ∪ Cktmr ) of pp-based callbacks allocated to
rk . Event sources are the easiest to analyze since, as described in Section 4, each event
source is exclusively allocated to a dedicated reservation. Building on the concept of the
supply-bound function sbfk (∆), i.e., the minimum amount of service provided by reservation
rk in an interval of length ∆, Lemma 1 provides a response-time bound for event sources.
I Lemma 1. If A ≥ 0 is the time at which the instance of an event source callback ci ∈ Ckevt
under analysis is released (relative to the beginning of the current busy period), and Ri∗ (A) is
the least positive value that satisfies
sbfk (A + Ri∗ (A)) = rbfi (A + 1), (2)
then Ri = max{Ri∗ (A) | A ≥ 0} is a response-time bound for ci .

ECRTS 2019
6:12 Response-Time Analysis of ROS 2 Processing Chains

Proof. By assumption (cf. Section 4), if an event source ci is allocated to a reservation


rk , no other callbacks are allocated to rk . Consequently, each callback instance can suffer
only self-interference from other instances of the same callback. The lemma follows since
the amount of service provided by rk in the interval [0, A + Ri∗ (A)) is lower-bounded by
sbfk (A + Ri∗ (A)) and the maximum amount of service required by instances of ci released in
the interval [0, A] is bounded by rbfi (A + 1). J

Lemma 1 is not directly applicable, as it requires checking an unbounded number of


possible release offsets A. To actually implement a response-time analysis, both an upper
bound on the length of the analysis interval and a reduction of the number of release offsets
that must be checked are needed; we revisit this issue in Section 5.3.
Next, we consider the response times of timers, which are proper callbacks and thus
dispatched by ROS executors. As described in Section 3, timer scheduling is not subject to
polling points. Nevertheless, since executors process callbacks non-preemptively, timers are
subject to lower-priority blocking. Lemma 2 bounds the blocking experienced by a timer
callback due to the lower-priority callbacks cj ∈ lpk (ci ).

I Lemma 2. A timer callback ci ∈ Ck is blocked for at most Bi = max{ej | cj ∈ lpk (ci )}


time units by lower-priority callbacks.

Proof. First note that callbacks allocated to any reservation ro 6= rk cannot block ci since
there is an independent executor in each reservation and, as explained in Section 3, timers
are not subject to polling points. An instance of a timer callback ci ∈ Ck can be released
at time t∗ + 1, where t∗ is the time at which a lower-priority callback cj ∈ lpk (ci ) started
executing. Due to non-preemptive scheduling, ci cannot start until cj completes, i.e., after
at most ej time units. The lemma follows. J

With Lemma 2 in place, Lemma 3 upper-bounds the response time of timer callbacks.

I Lemma 3. If A ≥ 0 is the time at which the instance under analysis of a timer callback
ci ∈ Cktmr is released (relative to the beginning of the current busy period), and Ri∗ (A) is the
least positive value that satisfies

sbfk (A + Ri∗ (A)) = rbfi (A + 1) + RBF (hpk (ci ), A + Ri∗ (A) − ei + 1) + Bi , (3)

then Ri = max{Ri∗ (A) | A ≥ 0} is a response-time bound for ci .

Proof. By Lemma 2, the blocking due to lower-priority callbacks experienced by ci is bounded


by Bi . Due to the priority assignment presented in Section 3, every callback with a priority
higher than a timer is itself a timer, i.e., cj ∈ hpk (ci ) ⇒ cj ∈ Cktmr . Due to non-preemptive
scheduling, as soon a callback instance starts executing it cannot be interfered with by any
other callback, i.e., higher-priority callbacks can interfere only in the interval [0, A+Ri∗ (A)−ei ].
The lemma follows by noting that: (i) the interference from higher-priority callbacks is
bounded by the total request-bound function, i.e., RBF (hpk (ci ), A + Ri∗ (A) − ei + 1), (ii) the
callback under analysis can suffer self-interference only from instances released in [0, A], and
(iii) the amount of service provided by rk in the interval [0, A + Ri∗ (A)) is lower-bounded
by sbfk (A + Ri∗ (A)), i.e., the callback under analysis completes no later than when the
guaranteed minimum service matches the maximum total demand. J

Again, we discuss how to use Lemma 3 in a practical response-time analysis in Section 5.3.
Next, we consider pp-based callbacks. Due to the unpredictable nature of dynamic polling
points, pp-based callbacks suffer additional blocking. Indeed, when an instance of a pp-based
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:13

timers
𝑒𝑖


0 𝐴 𝑃𝑃ℎ−1 𝑃𝑃ℎ 𝐴 + 𝑅𝑖∗ (𝐴) 𝑡

other instances of 𝑐𝑖 𝑋

workload subject to polling points (both higher and lower priority)

Figure 6 Time intervals in which other pp-based callbacks and timers can interfere with a
pp-based callback ci under analysis.

callback is released, it requires the completion of one or more processing windows before
being executed. A response-time bound for pp-based callbacks is provided by Lemma 4,
which is illustrated in Figure 6.

I Lemma 4. If A ≥ 0 is the time at which the instance of a pp-based callback ci ∈ Ckpp


under analysis is released (relative to the beginning of the current busy period), X ≥ 0 is the
difference between time A + Ri∗ (A) − ei and the last polling point before time A + Ri∗ (A) − ei
(see Figure 6), and Ri∗ (A) is the least positive value that satisfies

sbfk (A + Ri∗ (A)) = rbfi (A + 1) + RBF (Ckoth , A + Ri∗ (A) − ei − X + 1)


+ RBF (Cktmr , A + Ri∗ (A) − ei + 1), (4)

where Ckoth = Ck \ (Cktmr ∪ {ci }) is the set of the other non-timer callbacks allocated to rk ,
then Ri = max{Ri∗ (A) | A ≥ 0} is a response-time bound for ci .

Proof. Due to polling points, non-timer callbacks (both of higher and lower priority) can
delay the callback under analysis only with instances that have arrived by the last polling
point. Note that a polling point cannot occur while a callback is executing. Consequently,
the polling point at time A + Ri∗ (A) − ei − X is the last polling point before A + Ri∗ (A), and
pp-based callbacks can delay the callback instance under analysis only with instances released
in [0, A + Ri∗ (A) − ei − X) (note that a callback released exactly at a polling point PP n is
processed during PW n ). Due to the priority assignment discussed in Section 3, each timer
callback has higher priority than any pp-based callback. It follows that, due to non-preemptive
scheduling, all timer callbacks Cktmr can interfere with the pp-based callback under analysis
up to the time at which it starts executing, i.e., at any time in [0, A + Ri∗ (A) − ei ]. The lemma
then follows analogously to Lemma 3 by noting that: (i) the callback under analysis can
suffer self-interference only from instances released in [0, A], and (ii) the amount of service
provided by rk in the interval [0, A + Ri∗ (A)) is lower-bounded by sbfk (A + Ri∗ (A)). J

Lemma 4 upper-bounds the response time experienced by a pp-based callback. As for the
previous lemmas, we will discuss how to bound the space of possible times A in Section 5.3.
Moreover, Lemma 4 depends on the time distance X between A + Ri∗ (A) − ei and the
last polling point before A + Ri∗ (A), which is generally unknown during offline analysis.
Consequently, we need to determine the scenario (i.e., the value of X) that maximizes
the response time. Intuitively, this case occurs when the callback ci under analysis starts
executing just after the last polling point, i.e., lower-priority callbacks can interfere with ci
throughout the time from its release until it starts executing. In this case, X = 0. Lemma 5
proves that X = 0 indeed dominates all possible values of X.

ECRTS 2019
6:14 Response-Time Analysis of ROS 2 Processing Chains

I Lemma 5. The delay experienced by a pp-based callback ci ∈ Ck \ (Cktmr ∪ Ckevt ) due to


other pp-based callbacks is maximized when ci starts executing just after the last polling point:

max RBF (Ckoth , A + Ri∗ (A) − ei − X + 1) = max RBF (Ckoth , A + Ri∗ (A) − ei + 1), (5)
A≥0,X≥0 A≥0

where Ri∗ (A), A, and X are defined as in Lemma 5.

Proof. The lemma follows by noting that X ≥ 0 and that RBF (Ckoth , A+Ri∗ (A)−ei −X +1) is
a sum of monotonic non-decreasing functions; hence it is monotonic non-decreasing, too. J

By Lemma 5, it follows that the amount of interference generated by timer callbacks


ct ∈ Cktmr and non-timer callbacks cn ∈ Ckoth is the same in the worst case. Consequently, we
can merge the two sets, and rewrite Equation (4) in a simpler manner:

sbfk (A + Ri∗ (A)) = rbfi (A + 1) + RBF ({Ck \ ci }, A + Ri∗ (A) − ei + 1). (6)

Equation (6) highlights that the scheduling policy adopted by the built-in ROS executor
allows every other callback, independent of priority, to interfere with pp-based callbacks.
Consequently, polling points make the priority assignment ineffective for upper-bounding the
response time of pp-based callbacks. This confirms what we empirically observed during the
model validation (Section 3) from an analytical perspective. Note that timer callbacks are
not affected by polling points and their response-time bound is equivalent to non-preemptive
fixed-priority scheduling [17], in the context of a resource reservation (Lemma 3).

5.3 Bounding the Search Space


The lemmas presented in Section 5.2 require checking Equations (3), (4) and (5) for all
possible A ≥ 0, where A represents the relative release time (with respect to the beginning of
the current busy period) of the callback instance under analysis. To use the previous lemmas
in a practical response-time analysis, both a bound on the analysis interval and a reduction
of the search space size are required. Note that the analysis interval can be bounded by
the longest interval during which a reservation rk is busy serving higher-or-equal-priority
workload, i.e., the length of the longest busy period [59], which Lemma 6 bounds.

I Lemma 6. Let Ckevt , Cktmr , and Ckpp be the sets of all event source, timer, and pp-based
callbacks allocated to rk , respectively. If ci ∈ Ck is the callback under analysis, and L∗ is the
least positive value that satisfies

rbfi (L∗ )

 if ci ∈ Ckevt
∗ ∗ ∗
sbfk (L ) = RBF (hpk (ci ), L ) + Bi + rbfi (L ) if ci ∈ Cktmr (7)
 pp
RBF (C , L∗ )

if c ∈ C ,
k i k

then L∗ is an upper bound on the length of the longest busy period.

Proof. By contradiction, assume that there exist a busy period of ci with length L0 > L∗ .
Under this assumption, in the busy period corresponding to L0 either (i) there are more
callbacks delaying ci , or (ii) callbacks execute for more time or, (iii) callbacks arrive more
frequently than in the busy period corresponding to L∗ . By Lemmas 1, 3, and 4, Equation (7)
accounts for all callbacks that can delay ci . Moreover, by definition of the request-bound
function, Equation (7) is composed of a sum of products of worst-case execution times and
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:15

arrival curves. By definition, no callback can execute for more than its worst-case execution
time. Further, the activation curve η a (∆) defines an upper bound on the number of events
that can arrive in any time window [t, t + ∆), thus leading to a contradiction. J

With Lemma 6 restricting the search to a finite interval, Lemma 7 below reduces the number
of points contained in the search space. To this end, consider the response-time bounds
computed with Equations (3), (4) and (6): each can be expressed as an instance of a general
response-time equation sbfk (A + x) = rbfi (A + 1) + I(A + x) + B, where B is a constant and
the function I depends only on its argument. Equation (6), for example, can be written in
this form by substituting B = 0 and I(∆) = RBF ({Ck \ ci }, ∆ − ei + 1). For any A, we let
SOL(A) denote the set of all positive x that satisfy the general response-time equation.
I Lemma 7. For a callback ci ∈ Ck under analysis, let A−i = {A > 0 | rbfi (A + 1) = rbfi (A)}
denote the points where rbfi (A) stays constant. For any a ∈ A− ∗ ∗
i , Ri (a) 6= maxA≥0 Ri (A).

Proof. We prove that Ri∗ (a) is strictly less than its “neighbor” Ri∗ (a − 1) ∈ SOL(a − 1), and
hence necessarily also less than maxA≥0 Ri∗ (A). To this end, we establish that Ri∗ (a − 1) =
Ri∗ (a) + 1 (which is well-defined since 0 ∈/ A− ∗
i ) by showing that (i) Ri (a) + 1 ∈ SOL(a − 1)
∗ 0 0
and (ii) Ri (a) + 1 ≤ a for any a ∈ SOL(a − 1).
Step (i): By definition, Ri∗ (a) ∈ SOL(a), and thus sbfk (A + Ri∗ (A)) = rbfi (A + 1) + I(A +
Ri (A))+B. By adding 0 = 1−1 and using the fact that a ∈ A−

i and hence rbfi (a+1) = rbfi (a),
we equivalently obtain sbfk (a − 1 + Ri∗ (a) + 1) = rbfi (a − 1 + 1) + I(a − 1 + Ri∗ (a) + 1) + B.
This is the definition of SOL(a − 1) and hence proves Ri∗ (a) + 1 ∈ SOL(a − 1).
Step (ii): Consider any a0 ∈ SOL(a−1). Then sbfk ((a−1)+a0 ) = rbfk (a)+I((a−1)+a0 )+B.
Using again that rbfi (a+1) = rbfi (a), this is equivalent to sbfk (a+(a0 −1)) = rbfk (a+1)+I(a+
(a0 −1))+B, which matches the definition of SOL(a) and hence a0 −1 ∈ SOL(a). By definition,
Ri∗ (a) = min{x | x ∈ SOL(a)}, and thus we have a0 − 1 ≥ Ri∗ (a) ⇔ Ri∗ (a) + 1 ≤ a0 . J

Together, Lemmas 6 and 7 enable an efficient implementation of the response-time analysis


by restricting the required search space Ai (w.r.t. a callback ci ) to

Ai = {A | 0 ≤ A ≤ L∗ } \ A− ∗
i = {0 ≤ A ≤ L | rbfi (A + 1) 6= rbfi (A)} ∪ {0}.

To further reduce the effects of arrival bursts, we next provide a joint response-time bound
for a sequence of callbacks in a single reservation.

5.4 Analysis for Processing Chains


This section is provides an end-to-end analysis for linear subchains composed of multiple
callbacks, where each subchain does not cross reservation boundaries. To this end, we extend
the notion of request bound functions to subchains as rbf x,y (∆) = ηsa (∆) · ex,y , where cs is
the first callback of the subchain γ x,y , and ex,y = ci ∈γ x,y ei is the cumulative worst-case
P

execution time of the subchain. Consequently, RBF γ (Γk , ∆) = ∀γ x,y ∈Γk rbf x,y (∆), where
P

Γk is the set of subchains allocated to rk . Lemma 8 allows us to compute a response-time


bound for a subchain composed of multiple callbacks (if a subchain consists of only a single
callback, its response time can be computed with the results of Section 5.2).
I Lemma 8. If γ x,y = (cs , . . . , ce ) is a subchain composed of |γ x,y | ≥ 2 callbacks, Γk is the
set of subchains allocated to rk , and Rx,y is the least positive value that satisfies

sbfk (Rx,y ) = RBF γ (Γk , Rx,y − ee + 1), (8)

then Rx,y is a response-time bound for γ x,y .

ECRTS 2019
𝛾3
𝛾1 𝛾4

𝑐1
𝛾5
𝛾2 𝛾6
6:16 Response-Time Analysis of ROS 2 Processing Chains

P=80ms position local-planner global-planner legend


J =20 μs 𝑒1 = 200 μs mapg 𝑒5 = 10𝑚s
𝜋5 = 1
reservation
P=80ms laser scanner 𝜋1 = 1 goal
P=10s plan1 𝑒6 = 200𝑚s
J =20 μs J =0 callback
𝑒2 =200μs 𝑒3 =2ms 𝑒4 =18ms 𝜋6 = 2
P=80ms odometry 𝜋2 = 2 𝜋3 = 3 𝜋4 = 4 vel P=1s
pose mapl planl plan2 𝑒9 = 200𝑚s arrival curve
J =20 μs J =0 𝜋7 = 3

Figure 7 Callback graph of the event-driven move_base system.


plan2

Proof. Since |γ x,y | ≥ 2, the last callback in a subchain is a pp-based callback (timers and
𝑐2 𝑐6 𝑐7 𝑐11 𝑐12
𝑐1 4,𝛾𝛾2==5,(𝑐(𝑐1and
, 𝑐2 , 𝑐5 ) 𝛾 4 = (𝑐6 , 𝑐7 , 𝑐10 , 𝑐13 , 𝑐14 )
1
event source are necessarily source callbacks). By Lemmas Equation (6), every
𝑐5 𝑐10 3 , 𝑐4 , 𝑐5 ) 𝛾 5 = (𝑐8 , 𝑐9 , 𝑐10 , 𝑐11 , 𝑐12 )
other callback can interfere
𝑐3 𝑐8 𝑐9
𝑐4 with a pp-based 𝑐13
callback. 𝛾 = (𝑐each
3
𝑐14 Since 6 , 𝑐7 , 𝑐10instance
, 𝑐11 , 𝑐12 ) 𝛾 6=of(𝑐8a, 𝑐9 ,subchain
𝑐10 , 𝑐13 , 𝑐14 )

completes when the last callback of the chain terminates, it follows that the chain under
analysis can be interfered with by all other chains, regardless of callback priorities. The
lemma follows by noting that, due to non-preemptive scheduling, the subchain cannot be
interfered with during the execution of its last callback. J

Lemma 8 extends Lemma 4 for subchains. As observed in Section 5.2, also in this case
the presence of pp-based callbacks make the priority assignment ineffective for the purpose
of computing a tighter response-time bound. However, since arrival bursts of interfering
callbacks are accounted only for once per subchain, analyzing a subchain holistically still
overall improves the analysis accuracy for long subchains.

5.5 Analysis Summary


The results presented in this section allows analyzing ROS systems under reservation-based
scheduling. Specifically, Section 5.2 proposed a response-time analysis for single callbacks,
and Section 5.4 extended it to subchains allocated to a single reservation. As discussed
in Section 5.1, both approaches allow to compute a safe end-to-end latency for generic
processing chains by propagating arrival curves and summing individual response-time
bounds. Specifically, the effects of predecessor callbacks are accounted for as release jitter in
the activation curves of non-source callbacks. Such release jitter depends on the response
times of predecessors, but also response times depend on jitter in a circular manner. As in
the CPA approach, this problem can be solved by iteratively searching for a global fixed
point at which all jitter terms and response times are consistent.

6 Case Study

Our analysis seeks to enable ROS developers to easily and quickly try different designs
and explore various what-if scenarios. To evaluate the suitability of our approach for that
purpose, we analyzed a safety-critical processing chain in the popular move_base package,
the central part of the ROS navigation stack for wheeled robots, using sensor rates and
(observed) maximum execution times from a Bosch-internal case study. Since move_base has
not been ported to ROS 2 yet, we model the ROS 1 version as if it ran on a ROS 2 system.
The move_base package addresses the path planning problem: given a map of the
environment, first find a path to the goal location (global planning), and then control the
robot’s velocity to follow that path while avoiding obstacles (local planning). Both planners
base their decision on internal maps, which reflect the component’s knowledge of obstacles
and properties of the environment. As the robot moves through the environment, these maps
are continously updated based on the most recent sensor data.
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:17

The move_base callback graph is illustrated in Figure 7. The incoming sensor and position
data is normalized to absolute coordinates based on the robot’s pose and then integrated into
the respective maps. The local planner then updates its plan based on the new information.
From a timing perspective, the execution time of the global planner stands out. Unlike
the local planner, the global planner is difficult to predict and its execution time depends
heavily on the path-finding difficulty. The global planner’s map is also significantly larger
and often updated only partially, further reducing predictability. In the Bosch case study,
global planning times reached up to 200 ms, more than ten times the local planner’s runtime.
Fortunately, the global planning is not time-critical. At worst, computing the global plan too
late may cause the robot to take a detour for a few moments. We therefore separate the global
planning callbacks from the time-critical local planning callbacks using two reservations.
This not only isolates the more unpredictable components from the critical path, but also
helps to limit the effects of the ROS executor scheduling policy.
Internally, the move_base subsystem is completely time-driven, using ROS topics only to
communicate with other components. We configured the local planner to run in sync with
the fixed sensor rate of 12.5 Hz, and the global planner to run more rarely at 1 Hz. While this
setup makes for a quite predictable system, it is also very inflexible and makes it difficult to
ensure that components do not consume stale data. For comparison, we therefore modeled
two variants: the original, time-driven version, and an event-driven alternative design that
models the activation dependency explicitly using internal topics.
In our case study, we are interested in the end-to-end latency of the path from the
odometry input to the wheel command output (denoted “vel” for velocity). This latency
determines, for example, how long the robot takes to react to an obstacle suddenly appearing
in front of it, and hence is safety-relevant. In the event-driven setup, this is defined as the
worst-case response time from activation to completion of the chain. For the time-driven
setup, we compute the response time of the local planner and, if there is jitter on the sensor
inputs, add the activation period as worst-case sampling delay. Although generally speaking
the response time of the last chain element is not necessarily identical to the chain’s overall
latency, it happens to coincide here: by prioritizing the chain in decreasing order, and
triggering all tasks at the same time, no task can run before its predecessor completes. The
completion of the local planner thus implies the completion of the entire processing chain.
One of the most difficult steps in using reservation-based scheduling is dimensioning the
reservations correctly. When isolating the local from the global planner, one would like to
give the local planner just enough budget to complete in time, leaving as much execution
time as possible for the global planner. To this end, we prototyped our analysis in the
pyCPA framework [18]. Figure 8a shows the results for both the time-driven setup and the
event-driven setup. In case of the event-driven setup, we also included the analysis results
when disabling the whole-chain analysis described in Section 5.4. The graph shows the entire
range of local planner budgets in percent of the total core bandwidth. For simplicity, we
allocate the rest of the core to the global planner reservation. Since we do not analyze any
processing chains through the global planner’s reservation, though, the exact amount of
bandwidth dedicated to this reservation does not impact the reported response times.
The graph clearly shows a similar effect of budgeting on the time-driven and event-driven
system. However, due to the worst-case sampling delay, the time-driven system remains one
sampling period of 80 ms above the event-driven latencies. Clearly, the event-driven approach
is advisable in this setup, allowing the system to wait until the sensor results arrive instead
of commencing planning based on stale values. One can also observe the beneficial effects of
the whole-chain analysis; when disabled, the chain’s self-interference significantly inflates
the predicted response-time bounds. Since the analysis conservatively assumes that every
callback is blocked by every other callback, actual interference is overcounted four-fold.

ECRTS 2019
6:18 Response-Time Analysis of ROS 2 Processing Chains

500 700

End-to-End latency (ms)


End-to-end latency (ms) time-driven time-driven
400 event-driven 600 event-driven
event-driven (no chains) 500 event-driven (no chains)
300
400
200 300

200
100
100
0
40 45 50 55 60 65 0 20 40 60
Budget of the local reservation (percent) Jitter on the input sensors (ms)

(a) Effects of reservation dimensioning on the (b) Effects of input jitter on the critical path
critical path latency. latency, using a budget of 45%.

Figure 8 Experimental results.

Another important property of any control system is how it copes with input jitter.
While the previous experiment modeled the sensors as strictly following the 12.5 Hz schedule
with only 200 µs of jitter, Figure 8b shows the predicted end-to-end latency as input jitter
increases, using a local planning budget of 45%. Here, one clearly observes the main benefits
of purely time-driven systems; they are very robust to input jitter, mainly because they are
not influenced by bursts. For the event-driven system, one can observe a significant rise
roughly every 20 ms. These are the points at which one more event can arrive during the
execution of the processing chain. The event-driven system remains superior below 40 ms of
jitter (i.e., at most one event at the same time), but succumbs to self-interference at larger
jitters. Without a systematic analysis, such tradeoffs are extremely difficult to anticipate.
In conclusion, this case study highlights the benefits of automated response-time analysis.
Without implementing a single line of ROS code, we are able to reason about the worst-case
latencies of two quite different move_base designs, noting the advantages and disadvantages
in different scenarios. Having a fully-integrated and automatic version of this analysis
would clearly be a major aid to ROS developers, allowing response times to be treated
as a measurable design constraint instead of relying solely on intuition, trial-and-error, or
post-implementation experimentation. Extrapolating a bit further, it might even allow one
day to reason about the latency of external dependencies, enabling the safe and easy reuse of
well-tested components for time-critical purposes.

7 Related Work

The literature concerning the real-time aspects of ROS 2 processing chains is quite limited.
To the best of our knowledge, this is the first paper modeling a ROS system from a real-time
perspective and proposing a response-time analysis for ROS processing chains.
Most of the existing work on ROS targets ROS 1 systems, mainly conducting empirical
performance measurement and proposing possible improvements. For instance, Saito et al. [53]
proposed a priority-based message transmission algorithm for ROS 1 that allows publishers
to send data to multiple subscribers according to their priorities, and a mechanism for
synchronizing communications among multiple nodes running at different frequencies. Suzuki
et al. [61] presented a mechanism for coordinating CPU and GPU execution of ROS 1 nodes,
and an offline scheduling algorithm that assigns priorities to nodes according to their laxities.
Maruyama et al. [36] conducted an experimental study aimed at comparing the performance
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:19

of ROS 1 and a preliminary version of ROS 2 under different DDS implementations. Gutiérrez
et al. [25] performed a similar evaluation of ROS 2 “Ardent Apalone” under Linux with
the PREEMPT-RT patch. Concerning more general robotic systems, Lotz et al. [34, 35]
presented a meta-model for designing non-functional aspects of robotics systems such that
the resulting models can be analyzed with the SymTA/S timing analysis tool [27], which is
based on the CPA approach [29, 46, 48, 49, 65].
Concerning the analysis of processing chains in distributed systems, one of the first
proposals to verify end-to-end timing constraints is due to Fohler and Ramamritham [20],
who proposed an approach for obtaining a static schedule composed of tasks with precedence
constraints. In the context of non-static scheduling, prior work can be divided into two main
categories: those based on CPA [27] and those adopting an holistic approach [42, 63]. The
first method adopts arbitrary arrival curves [47] and analyzes chains crossing different nodes
of a distributed system individually (by means of a local component analysis), and propagates
the event model (i.e., activation curves) until convergence is achieved [50]. Different local
analyses have been designed over years. For instance, Schlatow and Ernst [54, 55] proposed
a local analysis for chains entirely contained in a single resource (e.g., a processing node)
where tasks along the chain can have arbitrary priorities, under preemptive scheduling. Other
authors [26, 28, 52, 56, 64] improved the analysis precision by accounting for correlations
among events in different components. Previously, Thiele et al. [62] proposed Real-Time
Calculus, an approach similar to CPA in which the service demand of the workload is modeled
with arrival curves, and service curves model the processing capacity of local components. As
in Network Calculus [30], arrival curves and services curve are combined together by means
of a max-plus algebra, thereby obtaining the timing behavior of the component. Concerning
the holistic approach, the seminal work is due to Tindell and Clark [63], who proposed
a schedulability analysis for transactions, i.e., sporadically triggered sequences of events,
scheduled under fixed-priority preemptive scheduling. Their analysis has been refined by
Palencia et al. considering offsets [42] and precedence relations [41]. More details about the
transactional task model can be found in a survey by Rahni et al. [44].
Only little attention has been given to date on how specific frameworks affect worst-case
response times. To the best of our knowledge, all of them target the OpenMP framework [40],
which is usually used for globally scheduled parallel tasks. For example, Serrano et al. [57]
distinguished between tied and untied sub-tasks in OpenMP, proposing a response-time
analysis for a parallel task composed of untied sub-tasks. While untied nodes have no
particular scheduling restrictions, tied sub-tasks are OpenMP-specific and consist of a
subgraph whose nodes must all execute on a single thread. Subsequently, Sun et al. [60]
proposed an improvement of the OpenMP scheduling policy. To the best of our knowledge,
the present paper is the first to systematically study the temporal behavior of ROS.

8 Limitations, Extensions, and Conclusions

This initial work on the timing analysis of ROS 2 processing chains can already handle practical
components (such as move_base), and provides a rich foundation for future developments.
Nonetheless, given the inevitable complexities associated with a mature, flexible, and widely
used framework, we had to elide certain infrequently used aspects of ROS. In the following,
we discuss these limitations and highlight promising direction for future extensions.
This paper considers the built-in single-threaded ROS executor. ROS also provides a
multi-threaded variant of that executor, and additionally allows the definition of arbitrary
special-purpose executors. Being able to easily integrate special-purpose schedulers tailored
to specific robot needs would allow for interesting domain-specific research in the future.

ECRTS 2019
6:20 Response-Time Analysis of ROS 2 Processing Chains

When using multiple executor threads in a shared process, concurrency problems arise.
ROS introduces mutually-exclusive callback groups to address this problem, and guarantees
that callbacks in the same group are never executed concurrently. Extending our analysis to
handle blocking relationships among callbacks remains future work.
This paper assumed the graph of the callbacks to be fixed. However, ROS allows nodes to
dynamically join and leave, as well as to subscribe to and unsubscribe from topics dynamically
at runtime, which is particularly useful for implementing different operating modes. This
problem is referred to as mode changes in the literature [38, 45]. Our analysis can be
applied to each mode in stable operation, but not does account for transient effects during
mode changes. The design of new analysis techniques accounting for mode changes (e.g.,
extending [10, 11, 13] to ROS systems) represents another relevant future direction.
We modeled the overhead of network delays and the underlying DDS implementation as a
single variable δi,j , which allows for a safe and simple accounting for network-related delays
in the overall response time by summing the communication delay every time the network is
crossed. An opportunity for future improvements would be to integrate network analysis
to eliminate the pessimism induced by the pay-burst-only-once problem when the network
is crossed multiple times. Furthermore, a detailed study of available DDS implementations
would allow for a more precise modeling of message processing overheads.
In addition to topics and services, ROS also provides a waitable callback type. This type
is intended to implement more complex communication primitives like the long-running and
high-level actions known from ROS 1 [1]. Since this mechanism was only introduced in the
latest release, there are no known users of this mechanism as of now. It will be necessary to
extend our analysis to these additional methods as and when they are adopted in ROS 2.
We assumed each callback to trigger an activation of all its successors at most once per
execution. As a future improvement, we would like to extend the proposed analysis to allow a
callback to trigger its successors only after having executed a predefined number of instances,
or to trigger multiple instances of each successors in a single execution [23].
Our analysis based on the CPA approach allows to simply and efficiently analyze a
real-world system, limiting the complexity by considering reservations individually. The
analysis accuracy can be further improved by considering correlations among activation
events in the chain, thus reducing the “pay-burst-only-once” problem also for chains spanning
multiple reservations. A possible research direction for future work consists in extending the
approaches presented by Fonseca et al. [21] and Casini et al. [14] (in the context of preemptive
and non-preemptive fixed-priority scheduling of parallel tasks, respectively), based on which
chains crossing multiple reservations could be modeled by means of self-suspending tasks [16].
In this way, arrival bursts can be considered only once per reservation, thus improving the
analysis precision for chains crossing the same reservation multiple times.
To conclude, we have presented the first comprehensive scheduling model of ROS 2
systems, based on a review of its source code and documentation. We derived a response-time
analysis for processing chains that takes the specific properties of the ROS framework into
account and applied to a realistic case study. While there remain ample opportunities for
future extensions, our contributions represent the first steps towards an automated analysis
tool that could allow ROS users without expert knowledge in real-time systems to quickly
and conveniently determine temporal safety and latency properties of their applications.

References
1 Action Lib. URL: http://wiki.ros.org/actionlib.
2 eProsima FastRTPS. URL: https://www.eprosima.com/index.php/products-all/
eprosima-fast-rtps.
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:21

3 The move_base package. URL: http://wiki.ros.org/move_base.


4 Robots using ROS. URL: http://robots.ros.org.
5 ROS 2 “Crystal Clemmys”. URL: http://www.ros.org/news/2018/12/ros-2-crystal-
clemmys-released.html.
6 RTI Connext DDS. URL: https://www.rti.com/products/connext-dds-professional.
7 Vortex OpenSplice. URL: http://www.prismtech.com/vortex/vortex-opensplice.
8 L. Abeni and G. Buttazzo. Integrating multimedia applications in hard real-time systems. In
Proceedings of the 19th IEEE Real-Time Systems Symposium (RTSS 1998), Madrid, Spain,
december 2-4, 1998.
9 A. Biondi, G. C. Buttazzo, and M. Bertogna. Schedulability Analysis of Hierarchical Real-Time
Systems under Shared Resources. IEEE Transactions on Computers, 65, May 2016.
10 A. Block and J. H. Anderson. Accuracy versus migration overhead in real-time multiprocessor
reweighting algorithms. In 12th International Conference on Parallel and Distributed Systems
- (ICPADS’06), Minneapolis, USA, july, 12-15, 2006.
11 A. Block, J. H. Anderson, and G. Bishop. Fine-grained task reweighting on multiprocessors.
In 11th IEEE International Conference on Embedded and Real-Time Computing Systems and
Applications (RTCSA’05), Hong Kong, China, july, 17-19, 2005.
12 G. C. Buttazzo. Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and
Applications, Third Edition. Springer, 2011.
13 D. Casini, A. Biondi, and G. C. Buttazzo. Handling Transients of Dynamic Real-Time
Workload Under EDF Scheduling. IEEE Transactions on Computers, 2018.
14 D. Casini, A. Biondi, G. Nelissen, and G. Buttazzo. Partitioned Fixed-Priority Scheduling of
Parallel Tasks Without Preemptions. In 2018 IEEE Real-Time Systems Symposium (RTSS),
December 2018.
15 Daniel Casini, Luca Abeni, Alessandro Biondi, Tommaso Cucinotta, and Giorgio Buttazzo.
Constant Bandwidth Servers with Constrained Deadlines. In Proceedings of the 25th Interna-
tional Conference on Real-Time Networks and Systems, RTNS ’17, 2017.
16 Jian-Jia Chen, Geoffrey Nelissen, Wen-Hung Huang, Maolin Yang, Björn Brandenburg, Kon-
stantinos Bletsas, Cong Liu, Pascal Richard, Frédéric Ridouard, Neil Audsley, Raj Rajkumar,
Dionisio de Niz, and Georg von der Brüggen. Many suspensions, many problems: a review of
self-suspending tasks in real-time systems. Real-Time Systems, September 2018.
17 Robert I. Davis, Alan Burns, Reinder J. Bril, and Johan J. Lukkien. Controller Area Network
(CAN) schedulability analysis: Refuted, revisited and revised. Real-Time Systems, 35(3):239–
272, April 2007.
18 Jonas Diemer, Philip Axer, and Rolf Ernst. Compositional performance analysis in python
with pyCPA. In In Proceedings of WATERS’12, 2012.
19 Jonas Diemer, Jonas Rox, and Rolf Ernst. Modeling of Ethernet AVB Networks for Worst-Case
Timing Analysis. IFAC Proceedings Volumes, 2012. 7th Vienna International Conference on
Mathematical Modelling.
20 G. Fohler and K. Ramamritham. Static scheduling of pipelined periodic tasks in distributed
real-time systems. In Proceedings 9th Euromicro Workshop on Real Time Systems, June 1997.
21 J. Fonseca, G. Nelissen, V. Nelis, and L. M. Pinho. Response time analysis of sporadic DAG
tasks under partitioned scheduling. In 2016 11th IEEE Symposium on Industrial Embedded
Systems (SIES), May 2016.
22 T. Foote. ROS community metrics report. URL: http://download.ros.org/downloads/
metrics/metrics-report-2018-07.pdf.
23 J. J. G. Garcia, J. C. P. Gutierrez, and M. G. Harbour. Schedulability analysis of distributed
hard real-time systems with multiple-event synchronization. In Proceedings 12th Euromicro
Conference on Real-Time Systems. Euromicro RTS 2000, 2000.
24 B. Gerkey. Why ROS 2.0? URL: http://design.ros2.org/articles/why_ros2.html.

ECRTS 2019
6:22 Response-Time Analysis of ROS 2 Processing Chains

25 C. Gutierrez, L. Juan, I. Uguarte, and V. Vilches. Towards a distributed and real-time


framework for robotsç Evaluation of ROS 2.0 communications for real-time robotics applications.
Techical report, Erle Robotics S.L., 2018.
26 R. Henia and R. Ernst. Context-aware scheduling analysis of distributed systems with
tree-shaped task-dependencies. In Design, Automation and Test in Europe, March 2005.
27 R. Henia, A. Hamann, M. Jersak, R. Racu, K. Richter, and R. Ernst. System level performance
analysis - the SymTA/S approach. IEEE Proceedings - Computers and Digital Techniques,
March 2005.
28 K. Huang, L. Thiele, T. Stefanov, and E. Deprettere. Performance Analysis of Multimedia
Applications using Correlated Streams. In 2007 Design, Automation Test in Europe Conference
Exhibition, 2007.
29 M. Jersak. Compositional Performance Analysis for Complex Embedded Applications.
30 Jean-Yves Le Boudec and Patrick Thiran. Network Calculus: A Theory of Deterministic
Queuing Systems for the Internet. Springer-Verlag, Berlin, Heidelberg, 2001.
31 J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: exact charac-
terization and average case behavior. In [1989] Proceedings. Real-Time Systems Symposium,
1989.
32 J. Lelli, C. Scordino, L. Abeni, and D. Faggioli. Deadline scheduling in the Linux kernel.
Software: Practice and Experience, 46(6):821–839, 2016.
33 G. Lipari and E. Bini. Resource partitioning among real-time applications. In 15th Euromicro
Conference on Real-Time Systems, 2003. Proceedings., pages 151–158, July 2003.
34 A. Lotz, A. Hamann, R. Lange, C. Heinzemann, J. Staschulat, V. Kesel, D. Stampfer, M. Lutz,
and C. Schlegel. Combining robotics component-based model-driven development with a
model-based performance analysis. In 2016 IEEE International Conference on Simulation,
Modeling, and Programming for Autonomous Robots (SIMPAR), December 2016.
35 Alex Lotz, Arne Hamann, Ingo Lütkebohle, Dennis Stampfer, Matthias Lutz, and Christian
Schlegel. Modeling Non-Functional Application Domain Constraints for Component-Based
Robotics Software Systems. In 6th International Workshop on Domain-Specific Languages
and Models for Robotic Systems (DSLRob’15), 2015.
36 Yuya Maruyama, Shinpei Kato, and Takuya Azumi. Exploring the performance of ROS2. In
Proceedings of the 13th International Conference on Embedded Software, page 5. ACM, 2016.
37 L. Marzario, G. Lipari, P. Balbastre, and A. Crespo. IRIS: a new reclaiming algorithm
for server-based real-time systems. In Proceedings. RTAS 2004. 10th IEEE Real-Time and
Embedded Technology and Applications Symposium, 2004., May 2004.
38 M. Negrean, M. Neukirchner, S. Stein, S. Schliecker, and R. Ernst. Bounding mode change
transition latencies for multi-mode real-time distributed applications. In ETFA2011, 2011.
39 Object Management Group. Data Distribution Service (DDS), 1.4 edition, 2015.
40 OpenMP. OpenMP Application Program Interface, Version 4.0., 2013.
41 J. C. Palencia and M. G. Harbour. Exploiting precedence relations in the schedulability analysis
of distributed real-time systems. In Proceedings 20th IEEE Real-Time Systems Symposium,
1999.
42 J. C. Palencia and M. Gonzalez Harbour. Schedulability analysis for tasks with static and
dynamic offsets. In Proceedings 19th IEEE Real-Time Systems Symposium, 1998.
43 Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob
Wheeler, and Andrew Y Ng. ROS: an open-source Robot Operating System. In ICRA workshop
on open source software, volume 3, page 5. Kobe, Japan, 2009.
44 Ahmed Rahni, Emmanuel Grolleau, Michaël Richard, and Pascal Richard. Feasibility Analysis
of Real-time Transactions. Real-Time Syst., 48(3), 2012.
45 J. Real and A. Crespo. Mode Change Protocols for Real-Time Systems: A Survey and a New
Proposal. Real-Time Systems, 26(2):161–197, March 2004.
46 K. Richter. Compositional Scheduling Analysis Using Standard Event Models: The SymTA/S
Approach.
D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg 6:23

47 K. Richter and R. Ernst. Event model interfaces for heterogeneous system analysis. In
Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition, pages
506–513, 2002.
48 K. Richter, M. Jersak, and R. Ernst. A Formal Approach to MpSoC Performance Verification.
Computer, 36(4), 2003.
49 K. Richter, R. Racu, and R. Ernst. Scheduling Analysis Integration for Heterogeneous
Multiprocessor SoC. In Proceedings of the 24th IEEE International Real-Time Systems
Symposium (RTSS), 2003.
50 K. Richter, D. Ziegenbein, M. Jersak, and R. Ernst. Model composition for scheduling analysis
in platform design. In Proceedings 2002 Design Automation Conference, June 2002.
51 Jonas Rox and Rolf Ernst. Formal Timing Analysis of Full Duplex Switched Based Ethernet
Network Architectures. In SAE Technical Paper, 2010.
52 Jonas Rox and Rolf Ernst. Compositional Performance Analysis with Improved Analysis
Techniques for Obtaining Viable End-to-end Latencies in Distributed Embedded Systems. Int.
J. Softw. Tools Technol. Transf., 15(3), 2013.
53 Yukihiro Saito, Takuya Azumi, Shinpei Kato, and Nobuhiko Nishio. Priority and synchroniza-
tion support for ROS. In 2016 IEEE 4th International Conference on Cyber-Physical Systems,
Networks, and Applications (CPSNA), pages 77–82. IEEE, 2016.
54 J. Schlatow and R. Ernst. Response-Time Analysis for Task Chains in Communicating Threads.
In 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS),
April 2016.
55 Johannes Schlatow and Rolf Ernst. Response-Time Analysis for Task Chains with Complex
Precedence and Blocking Relations. ACM Trans. Embed. Comput. Syst., 16(5s), September
2017.
56 Simon Schliecker and Rolf Ernst. A Recursive Approach to End-to-end Path Latency Com-
putation in Heterogeneous Multiprocessor Systems. In Proceedings of the 7th IEEE/ACM
International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS
’09, 2009.
57 M. A. Serrano, A. Melani, R. Vargas, A. Marongiu, M. Bertogna, and E. Quiñones. Timing
characterization of OpenMP4 tasking model. In 2015 International Conference on Compilers,
Architecture and Synthesis for Embedded Systems (CASES), 2015.
58 Insik Shin and Insup Lee. Periodic resource model for compositional real-time guarantees. In
24th IEEE Real-Time Systems Symposium, 2003.
59 Marco Spuri. Analysis of Deadline Scheduled Real-Time Systems. Technical report, RR-2772,
INRIA, 1996.
60 J. Sun, N. Guan, Y. Wang, Q. He, and W. Yi. Real-Time Scheduling and Analysis of
OpenMP Task Systems with Tied Tasks. In 2017 IEEE Real-Time Systems Symposium
(RTSS), December 2017.
61 Yuhei Suzuki, Takuya Azumi, Shinpei Kato, and Nobuhiko Nishio. Real-Time ROS extension on
transparent CPU/GPU coordination mechanism. In 2018 IEEE 21st International Symposium
on Real-Time Distributed Computing (ISORC), pages 184–192. IEEE, 2018.
62 L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for scheduling hard real-
time systems. In 2000 IEEE International Symposium on Circuits and Systems. Emerging
Technologies for the 21st Century. Proceedings, May 2000.
63 Ken Tindell and John Clark. Holistic Schedulability Analysis for Distributed Hard Real-time
Systems. Microprocess. Microprogram., 40(2-3):117–134, 2012.
64 Ernesto Wandeler and Lothar Thiele. Workload Correlations in Multi-processor Hard Real-time
Systems. J. Comput. Syst. Sci., 73(2), 2007.
65 D. Ziegenbein, M. Jersak, K. Richter, and R. Ernst. Breaking Down Complexity for Reliable
System-Level Timing Validation. In Ninth IEEE/DATC Electronic Design Processes Workshop
(EDP’02), 2002.

ECRTS 2019

You might also like