The ROS 2 C++ Executors

Hi,

I'm starting this topic to spread some awareness of the current status of ROS 2 executors in the rclcpp C++ client library and to discuss their future.
Let’s have a quick look at the available executor classes.

First we have probably the two most famous executors:

  • The SingleThreadedExecutor: this is the default executor for ROS 2 (and the first developed). It uses wait-sets and processes events in an arbitrary order.
  • The MultiThreadedExecutor: this is a multi-threaded version of the SingleThreadedExecutor.

The initial implementation of these executors had major performance issues.
So the ROS community spent a good amount of time trying to improve them.
This resulted in a lot of different executors, both open and closed source.
I suggest you look at the ROSCon 2021 “Executors Workshop”.

Out of all this work, two new executors got into rclcpp:

  • The StaticSingleThreadedExecutor: this executor was developed by Nobleo and then improved by iRobot. Its initial implementation was built around the idea that ROS 2 systems are static at steady-state, so you don’t need to rebuild the list of entities at every iteration. Despite the name, this executor is also fully usable for non-static systems.
  • The EventsExecutor: this executor was developed by iRobot. Its main difference with respect to all the other executors is that it doesn’t use a wait-set, but is instead based on an events queue. As a result, you pay no overhead for entities that are not receiving events, and events can be processed in their arrival order (a minimal usage sketch follows this list).
    It uses the same concept as the static executor to avoid rebuilding the list of entities.
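
For reference, here is a minimal usage sketch of the EventsExecutor. It assumes the experimental API as shipped in recent rclcpp releases; the exact header path and namespace may differ between distros, so double-check against your installed version:

```cpp
#include <rclcpp/rclcpp.hpp>
// Experimental header as of recent rclcpp releases; the exact path may vary per distro.
#include <rclcpp/experimental/executors/events_executor/events_executor.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("my_node");

  // Used exactly like rclcpp::executors::SingleThreadedExecutor.
  rclcpp::experimental::executors::EventsExecutor executor;
  executor.add_node(node);
  executor.spin();

  rclcpp::shutdown();
  return 0;
}
```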

As part of ROS 2 Jazzy, the rclcpp maintainers did a major rework of the executors, essentially generalizing to all executors the approach of not rebuilding the list of entities at every iteration (previously used only by the StaticSingleThreadedExecutor and the EventsExecutor).

Let’s have a look at the performance of the executors today, using ROS 2 Rolling and the iRobot performance framework.
I created a pseudo-random system with ~40 publishers and ~70 subscriptions spread across 8 executors, let them publish for 60 seconds, and got these numbers (which align with my expectations and previous results):

The conclusions I would like to draw out of this are:

  • The SingleThreadedExecutor is now more performant than the StaticSingleThreadedExecutor.
  • The StaticSingleThreadedExecutor should now be deprecated: all its improvements are now available in all the other executors; moreover, there are open issues and concerns that are specific to its implementation.
  • The EventsExecutor should already be the choice if you are looking for improved performance, and we should start the process to promote it as the default executor.
33 Likes

Thanks for sharing this.

Shall we consider changing the default executor in the tutorial to the EventsExecutor?

1 Like

Are there any notable downsides to the EventsExecutor’s current implementation that we should be aware of, or areas where it is not feature complete?

I’d love to make a new isolated events executor container for rclcpp_components so that this could be a runtime option for Nav2 users (or just move over to it if there are no notable downsides / issues).

1 Like

TL;DR:
Before making the EventsExecutor the default one, we need to fix a bug and provide an alternative, backward-compatible, mode.


The long version:

There’s currently an important bug that makes the events-executor not usable in combination with simulated time: TimersManager doesn't follow ROS time · Issue #2480 · ros2/rclcpp · GitHub
The client library working group is currently discussing solutions.

Besides the aforementioned bug, there are some concerns that the current implementation of the events-executor orders entity execution differently from the other executors.
This matters when an executor thread wakes up and more than one entity is ready for work.

The EventsExecutor always executes entities in the order in which they become ready, while wait-set based executors use a fixed order: they always execute timers first, then subscriptions, and so on; moreover, there is also an implicit ordering within each group of entities, based on the order in which entities were added to the executor or node.

Note that this also contributes to the difference in performance: wait-set based executors iterate over the list of entities and check which ones are ready. If you have 50 entities in an executor but only 1 is receiving messages, you still waste time looping over all of them every time (that’s why disabling builtin services/topics improves CPU usage with wait-set executors).
With the events-executor, on the other hand, you have a list of ready events, each with an associated entity, so whenever the executor has work to do, rather than iterating over multiple lists it just does look-ups in a map.

This difference also affects the situation where an entity has more than one item of work to do, such as a subscription that received 2 messages.
The wait-set based executors can’t process 2 items of work from the same entity within the same iteration.
So the SingleThreadedExecutor in this situation will 1) loop through everything and execute the first message, 2) try to go back to sleep, 3) wake up immediately, and 4) loop through everything again and execute the second message.
On the other hand, whenever the EventsExecutor wakes up, it will execute all the available events before going back to sleep.
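
To make the difference concrete, here is a self-contained toy model of the two dispatch strategies. It is deliberately schematic and does not reflect the actual rclcpp internals: the entity and event types are made up for illustration.

```cpp
#include <cstddef>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Toy "entity": anything that can have pending work (e.g. a subscription).
struct Entity
{
  std::string name;
  std::queue<std::string> pending;  // messages waiting to be executed
};

// Wait-set style: every wake-up scans all registered entities, even idle ones,
// and executes at most one item per entity per iteration.
void waitset_iteration(std::vector<Entity> & entities)
{
  for (Entity & e : entities) {       // O(total entities), even if only one is busy
    if (!e.pending.empty()) {
      std::cout << e.name << " executes " << e.pending.front() << "\n";
      e.pending.pop();                // remaining items wait for the next iteration
    }
  }
}

// Toy event produced by the middleware when an entity has work.
struct Event
{
  std::size_t entity_id;
  std::string payload;
};

// Events-queue style: only entities that actually produced an event are touched,
// in arrival order, and the whole queue is drained before sleeping again.
void events_iteration(
  std::queue<Event> & events,
  std::unordered_map<std::size_t, Entity> & entities)
{
  while (!events.empty()) {           // O(ready events)
    const Event ev = events.front();
    events.pop();
    std::cout << entities.at(ev.entity_id).name << " executes " << ev.payload << "\n";
  }
}

int main()
{
  // 50 entities, only one of them busy: the wait-set loop still scans all 50,
  // and needs two full iterations to process two messages from the same entity.
  std::vector<Entity> entities(50);
  for (std::size_t i = 0; i < entities.size(); ++i) {
    entities[i].name = "sub_" + std::to_string(i);
  }
  entities[7].pending.push("msg_1");
  entities[7].pending.push("msg_2");
  waitset_iteration(entities);        // executes msg_1 only
  waitset_iteration(entities);        // executes msg_2

  // The events-queue version handles both messages in a single wake-up.
  std::unordered_map<std::size_t, Entity> by_id;
  by_id[7] = Entity{"sub_7", {}};
  std::queue<Event> events;
  events.push(Event{7, "msg_1"});
  events.push(Event{7, "msg_2"});
  events_iteration(events, by_id);    // executes msg_1 then msg_2
  return 0;
}
```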

IMO the EventsExecutor approach, besides being way more performant, is also more correct, as it ensures that all events will be processed as soon as possible.
However, before making the EventsExecutor the default, it was requested that we provide an alternative implementation that ensures the same ordering behavior as the other executors.

The EventsExecutor design allows different behaviors to be supported through the definition of a custom EventQueue class.
The current implementation in rclcpp just uses a std::queue under the hood, so the idea is to write a new class that internally sorts events the way wait-set executors do.
This will obviously come at a performance cost (I expect performance to still be a lot better than the current default executor), but users can always switch to different queue implementations.
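
As a rough illustration of that idea (the class and method names here are hypothetical, not the actual rclcpp::experimental interface), such a queue could keep events sorted by entity kind and use arrival order only as a tie-breaker:

```cpp
#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical event description; the real executor event type differs.
enum class EntityKind { Timer = 0, Subscription = 1, Service = 2, Client = 3, Waitable = 4 };

struct Event
{
  EntityKind kind;
  const void * entity_id;  // which timer/subscription/... produced the event
  uint64_t sequence;       // arrival order, used as a tie-breaker
};

// Mimics the wait-set executors' ordering: timers first, then subscriptions,
// then services, etc.; arrival order only matters within the same kind.
struct WaitSetLikeOrder
{
  bool operator()(const Event & a, const Event & b) const
  {
    if (a.kind != b.kind) {
      return a.kind > b.kind;        // lower enum value == higher priority
    }
    return a.sequence > b.sequence;  // older events first within a kind
  }
};

class SortedEventsQueue
{
public:
  void enqueue(Event event)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    event.sequence = next_sequence_++;
    queue_.push(event);
  }

  bool dequeue(Event & event)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) {
      return false;
    }
    event = queue_.top();
    queue_.pop();
    return true;
  }

private:
  std::mutex mutex_;
  uint64_t next_sequence_ = 0;
  std::priority_queue<Event, std::vector<Event>, WaitSetLikeOrder> queue_;
};
```

Note that this only approximates the wait-set behavior: reproducing the implicit “order in which entities were added” would also require the queue to know the registration order.
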
P.S. There are already other queue implementations in the community:

5 Likes

Or add an argument to the existing one :wink:

1 Like

In short, I don’t have fundamental objections to working towards making the EventsExecutor the default.

However, I think at least 3 important things need to be done to it before we can even consider that:

  1. It needs to support sim_time, which it currently does not.
  2. It probably needs to have a multi-threaded version, though this may be negotiable.
  3. It needs to be run against all of the tests we currently have. Note that for most tests in the core, we use the default (SingleThreadedExecutor) by default, unless the test explicitly requests otherwise.

Once those 3 major things are done, I think the EventsExecutor could be promoted out of the experimental namespace, and then we could consider making it the default.

8 Likes

Why does the multi-threaded executor have such a high latency? This seems counter-intuitive to me if running on multiple CPU cores. Is there some major copy penalty?

Short answer:
The multi-threaded executor rebuilds the wait-set for almost every executed entity (sub, service, timer).

Long answer:
There is no concept of buffering ready events per callback group, and there is only one set of ready events at a time. So what happens is the following (a minimal setup that reproduces the scenario is sketched after the list):

  • A WaitSet with 2 callback groups is built and waited on
  • The WaitSet returns a WaitResult with multiple entities that are ready for execution
  • Thread one starts executing a ready entity and marks the first callback group as ‘in use’
  • The second thread comes by and requests something to execute. If the WaitResult only contains ready entries for the first callback group, nothing is ready for execution (the group is mutually exclusive).
  • The second thread builds a WaitSet containing only entities belonging to the non-blocked (second) callback group, and waits.
  • The first thread finishes executing the entity and marks the first callback group as ‘not in use’
  • The second thread is woken up, as a callback group changed.
  • The WaitResult is empty (the first callback group was not included).
  • The second thread rebuilds the WaitSet containing both callback groups. ← This creates the delay
  • Wait instantly returns, as events are ready in the first callback group
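
For anyone who wants to reproduce this, here is a minimal sketch of the scenario: two mutually exclusive callback groups on a 2-thread MultiThreadedExecutor (topic names and callbacks are just placeholders):

```cpp
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("two_groups_node");

  // Two mutually exclusive callback groups: only one callback per group runs at a time.
  auto group_a = node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
  auto group_b = node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);

  rclcpp::SubscriptionOptions options_a;
  options_a.callback_group = group_a;
  rclcpp::SubscriptionOptions options_b;
  options_b.callback_group = group_b;

  auto sub_a = node->create_subscription<std_msgs::msg::String>(
    "topic_a", 10,
    [](std_msgs::msg::String::ConstSharedPtr) {/* long-running work here */}, options_a);
  auto sub_b = node->create_subscription<std_msgs::msg::String>(
    "topic_b", 10,
    [](std_msgs::msg::String::ConstSharedPtr) {/* long-running work here */}, options_b);

  // When both groups have work pending at the same time, the wait-set rebuilds
  // described above show up as extra latency on the second callback group.
  rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(), 2);
  executor.add_node(node);
  executor.spin();

  rclcpp::shutdown();
  return 0;
}
```
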
3 Likes

Neat, looking forward to an rclpy version as well. There will be one, right? Right? :smiley:

Arguably it could use a performance boost on that end far more than rclcpp.

1 Like

The performance of rclpy is a topic we have also briefly discussed in some of the recent client library WG meetings.
There are definitely multiple issues there (both problems with large messages and problems with high-frequency small messages).
You can find a few more details here: Client Library WG May 24th Notes.

The EventsExecutor design is based on the RMW listener APIs (i.e. you can get a notification when an RMW entity receives a message), and these APIs could be exposed in Python quite easily (they are already available in rcl).
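
For context, this is roughly what the listener-style notification looks like from rclcpp today. I believe set_on_new_message_callback is the relevant hook (it forwards down to the rcl/rmw listener APIs), but treat the exact name and signature as something to double-check:

```cpp
#include <chrono>
#include <cstdio>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("listener_api_demo");

  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10, [](std_msgs::msg::String::ConstSharedPtr) {});

  // Instead of blocking on a wait-set, register a callback that fires as soon as the
  // middleware has new messages. This is the mechanism the EventsExecutor builds on:
  // in that executor the callback pushes an event onto the events queue.
  sub->set_on_new_message_callback(
    [](size_t number_of_messages) {
      printf("listener notified: %zu new message(s)\n", number_of_messages);
    });

  // No executor spinning here: the notifications above arrive anyway.
  rclcpp::sleep_for(std::chrono::seconds(10));
  rclcpp::shutdown();
  return 0;
}
```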

It would be great if someone in the community wanted to try that.

2 Likes

In the context of rclcpp executor development, is there any interest in wrapping or otherwise recreating the features of the rclc executor? The rclc executor contains lots of features, like user-defined ordering and trigger conditions for callbacks.

It seems like it should be possible to use the rclc executor with rclcpp by extracting all the relevant rcl handles, but I haven’t tried it personally. If anyone has and would be willing to share an example that would be helpful as well.
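
For what it’s worth, the rough shape I would expect is something like the untested sketch below: pull the raw rcl handles out of the rclcpp node, create the subscription at the rcl/rclc level (with C typesupport), and hand everything to an rclc executor. The rclc calls follow the micro-ROS examples, so please double-check the exact signatures against your rclc version before relying on this.

```cpp
#include <cstdio>

#include <rclcpp/rclcpp.hpp>

#include <rclc/rclc.h>
#include <rclc/executor.h>
#include <std_msgs/msg/string.h>  // C typesupport, since rclc dispatches C-style callbacks

void on_msg(const void * msgin)
{
  const std_msgs__msg__String * msg = static_cast<const std_msgs__msg__String *>(msgin);
  printf("received: %s\n", msg->data.data);
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("rclc_executor_demo");

  // Extract the raw rcl handles that rclc needs.
  rcl_node_t * rcl_node = node->get_node_base_interface()->get_rcl_node_handle();
  rcl_context_t * rcl_context =
    node->get_node_base_interface()->get_context()->get_rcl_context().get();

  // Create the subscription directly at the rcl/rclc level rather than reusing an
  // rclcpp::Subscription, so that rclc can take into a C message struct.
  rcl_subscription_t sub = rcl_get_zero_initialized_subscription();
  rclc_subscription_init_default(
    &sub, rcl_node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, String), "chatter");

  std_msgs__msg__String msg;
  std_msgs__msg__String__init(&msg);

  // rclc executor driving an entity that lives on the rclcpp node's rcl handles.
  rcl_allocator_t allocator = rcl_get_default_allocator();
  rclc_executor_t executor = rclc_executor_get_zero_initialized_executor();
  rclc_executor_init(&executor, rcl_context, 1, &allocator);
  rclc_executor_add_subscription(&executor, &sub, &msg, &on_msg, ON_NEW_DATA);

  rclc_executor_spin(&executor);  // rclc features (ordering, trigger conditions) apply here

  rclc_executor_fini(&executor);
  rcl_subscription_fini(&sub, rcl_node);
  std_msgs__msg__String__fini(&msg);
  rclcpp::shutdown();
  return 0;
}
```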

1 Like

I personally don’t know much about the rclc executor, but I’m interested in knowing more about it.
It could either be wrapped or re-implemented, depending on whichever is easier (side note: moving this into rcl (not rclc) as a common executor would be a very interesting project; I have no idea whether it’s doable).

I propose the following roadmap to take the EventsExecutor out of the experimental namespace:

P.S. Note that I also started a discussion dedicated to how the experimental namespace should be managed here: On the `rclcpp::experimental` namespace.

A further promotion to default executor can be discussed as a follow-up step.

@clalancette I think these 2 should be requirements to become the default, not to go out of experimental.

2 Likes