SingleThreadedExecutor creates a high CPU overhead in ROS 2

wjwwood · October 30, 2019, 1:31am

Hey guys, sorry for not responding here earlier, I have been following the discussions with a lot of interest. I really appreciate all the work you guys did to identify the issues with the current implementation.

I’m also planning on doing a first review of the proposed static executor that @MartinCornelis posted as soon as I can. We’ve had a bit of a backlog on features going into rclcpp, but hopefully we can make a lot of progress on executor-related changes during the F-turtle sprint.

I just wanted to briefly mention a few other changes we’ve been wanting to do with respect to the executor, which have sort of been preventing me from nailing down design documentation and recommending courses of action on threads like this one.

First, I really want to change the executor design so that you can use more than one executor per node. At the moment the association is “an executor may have zero to many nodes, and a node may have (be associated with) zero to one executors”. In the future I’d like to see it be callback groups which are the most granular thing that can be associated with an executor. I believe this was one of the possible ways to improve the design mentioned in https://github.com/ros2/rclcpp/issues/825.
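To make the association rules above concrete, here is a minimal toy model in plain Python. It is not the rclcpp API: `ToyExecutor`, `ToyNode`, and `ToyCallbackGroup` are hypothetical names, and the second half only illustrates the idea of attaching callback groups, rather than whole nodes, to executors.

```python
class ToyNode:
    def __init__(self, name):
        self.name = name
        self.executor = None  # current design: zero or one executor per node
        self.groups = []      # callback groups owned by this node


class ToyCallbackGroup:
    def __init__(self, node, name):
        self.node = node
        self.name = name
        self.executor = None  # proposed design: the association lives here
        node.groups.append(self)


class ToyExecutor:
    def __init__(self, name):
        self.name = name
        self.nodes = []   # current design: zero to many nodes
        self.groups = []  # proposed design: zero to many callback groups

    def add_node(self, node):
        # A node may be associated with at most one executor.
        if node.executor is not None:
            raise RuntimeError(f"{node.name} already belongs to {node.executor.name}")
        node.executor = self
        self.nodes.append(node)

    def add_callback_group(self, group):
        # Assumed rule for the sketch: one executor per callback group.
        if group.executor is not None:
            raise RuntimeError(f"{group.name} already belongs to {group.executor.name}")
        group.executor = self
        self.groups.append(group)


exec_a, exec_b = ToyExecutor("a"), ToyExecutor("b")

# Current design: the same node cannot be added to a second executor.
camera = ToyNode("camera")
exec_a.add_node(camera)
try:
    exec_b.add_node(camera)
    second_add_rejected = False
except RuntimeError:
    second_add_rejected = True

# Proposed design: one node, two callback groups, two executors.
planner = ToyNode("planner")
fast = ToyCallbackGroup(planner, "fast")
slow = ToyCallbackGroup(planner, "slow")
exec_a.add_callback_group(fast)
exec_b.add_callback_group(slow)
executors_serving_planner = {g.executor.name for g in planner.groups}
```

With callback groups as the unit of association, a single node can be served by more than one executor without changing the node itself.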

The other major change is that we’d like to create a “wait set”-like class in rclcpp (we have the wait set in rcl already), so that users may choose to avoid the executor pattern altogether and instead wait on items and decide how to handle them on their own. In this case, I think that callback groups and executors will not be used. I’m still thinking about all the implications and possible use cases (including mixed use of executors and wait sets). This isn’t directly affecting the discussions here, but it may have an impact, as “waitables” like timers and subscriptions may no longer have to be associated with a callback group or executor, whereas right now they must be in order to be used.
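As a rough illustration of the "wait on items and decide how to handle them" pattern, here is a toy wait set in plain Python. `ToyWaitSet` and the item names are hypothetical and are not the rcl or rclcpp interface; the sketch only shows that the caller, not an executor, owns the loop and the handling order.

```python
import threading


class ToyWaitSet:
    """Hypothetical stand-in for a wait set; not the rcl or rclcpp API."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = []  # items that became ready, in arrival order

    def trigger(self, item):
        """Mark an item (e.g. a timer or subscription) as ready."""
        with self._cond:
            self._ready.append(item)
            self._cond.notify_all()

    def wait(self, timeout=None):
        """Block until something is ready or the timeout expires.

        Returns the list of ready items and clears it; the list is
        empty if the timeout expired first.
        """
        with self._cond:
            self._cond.wait_for(lambda: bool(self._ready), timeout=timeout)
            ready, self._ready = self._ready, []
            return ready


wait_set = ToyWaitSet()
wait_set.trigger("subscription")
wait_set.trigger("timer")

# The user decides what to do with each ready item; here timers go first.
handled = sorted(wait_set.wait(timeout=1.0), key=lambda item: item != "timer")

# Nothing else is ready, so a second wait times out with an empty result.
nothing_ready = wait_set.wait(timeout=0.01)
```

Note that no callback group or executor appears anywhere in this sketch, which is the point of the pattern.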

Finally, there’s a lot of interface cleanup around the executor that I’d like to undertake, specifically to expose the scheduling logic (currently it’s very naive and hard-coded), and I’d also like to refactor the “memory strategy” class. It has a very important purpose (allowing you to control any incidental memory allocations), but its current design is pretty hard to understand.

I haven’t decided whether we should first try to integrate the suggested changes and/or tackle the performance problems described here and then make some of the changes I described above, or first make the architectural changes and then re-evaluate the feedback in this thread, or try to do them together somehow. Perhaps a compromise would be to do the architecture changes while also working with people in this thread to ensure proper tracing hooks and to catch obvious performance issues as we go, and then look at more changes we could make, e.g. a more static executor design and/or changes to rmw to provide more information from the middleware.

I hope this is something we can discuss in detail at ROSCon (for those who will be there) and at the real-time working group as well. We’ll do our best to summarize the discussions here too.

MartinCornelis · November 27, 2019, 9:53am

Hey @wjwwood,

Thank you very much for the kind words. We decided to pause our work on the PR for now specifically because of the points mentioned in your post.

The way we implemented the static executor atm gets the job done, but it would be even better if we could write an executor that captures multiple improvements at the same time. We are looking forward to the rclcpp changes planned for the Foxy release.

In the meantime we’ve separated the static_executor functionality from rclcpp and have written it as a separate library as requested by some users.
The Dashing and Eloquent versions can be found here:

To use the static_executor, please look at the README. By default the original executor will be used; you have to make changes to the package.xml, the CMakeLists.txt, and your source code to actually use the static_executor.

We hope this separate library version can help some people out, while we all wait for the even more awesome executor that is planned for Foxy!

alsora · March 24, 2020, 3:32pm

Hi,
with the next ROS 2 release approaching quickly, I would like to revive this discussion.

At iRobot we are currently investigating the performance of ROS 2 on single-core platforms, so improving the executor and its related data structures is crucial.

My colleague Mauro Passerino is currently working on improvements to the StaticExecutor proposed by @MartinCornelis, with the goal of having it merged in the Foxy release.

We ran multiple tests using our benchmark application and got the following results for a 10-node system:

SingleThreadedExecutor: 72% CPU usage
StaticSingleThreadedExecutor: 53% CPU usage
StaticSingleThreadedExecutor + our changes: 40% CPU usage
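For a sense of scale, the relative savings implied by these three figures can be worked out directly. The short sketch below uses only the numbers quoted above; the helper name `relative_reduction` is just for illustration.

```python
def relative_reduction(before, after):
    """Percentage by which CPU usage dropped going from `before` to `after`."""
    return (before - after) / before * 100.0


single_threaded = 72.0   # SingleThreadedExecutor
static_executor = 53.0   # StaticSingleThreadedExecutor
static_improved = 40.0   # StaticSingleThreadedExecutor + our changes

static_vs_single = relative_reduction(single_threaded, static_executor)
improved_vs_single = relative_reduction(single_threaded, static_improved)
improved_vs_static = relative_reduction(static_executor, static_improved)

print(f"static vs. single-threaded:   {static_vs_single:.1f}% less CPU")
print(f"improved vs. single-threaded: {improved_vs_single:.1f}% less CPU")
print(f"improved vs. static:          {improved_vs_static:.1f}% less CPU")
```

That works out to roughly 26% less CPU for the static executor and roughly 44% less with the additional changes, both relative to the SingleThreadedExecutor.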

You can find more details in the static executor PR.

These are already great improvements, however I think it would be very productive to have a discussion about what other steps can be taken both for the next release as well as for the future of ROS.

@wjwwood @ivanpauno @tomoyafujita @Ingo_Lutkebohle and anyone else in this thread, would you be interested in scheduling a meeting on these topics?

wjwwood · March 25, 2020, 3:56am

I’m currently trying to finish a pull request to kick off the changes to the executor design, and while doing it, I think I have decided to take the static executor PR first. I’m not 100% sure yet, but I’m leaning that way. Either way, I intend to make some progress on that PR this week.

As for having a meeting about what to do in the future, that’s fine, but for the next two weeks I will be very busy trying to get the already planned features into the Foxy release, with help from some others like @Ingo_Lutkebohle. So I don’t think there’s much time to add more items for this release, and having a lot of other meetings won’t help get them in (at least for me personally). I’d prefer to schedule this for a few weeks from now, but I’m happy to attend and contribute.

Dejan_Pangercic · April 1, 2020, 7:35am

@alsora I can invite you to the RTWG where the Executor topic is being regularly discussed: https://docs.google.com/document/d/1zBKwDUDeWvJNyCvjzYriaZQoZO2VYGWe1uxw5Xxn5cY/edit?usp=sharing

You can find the meeting coordinates in this calendar: https://index.ros.org/doc/ros2/Governance/#upcoming-ros-events.

alsora · April 2, 2020, 9:43am

Thank you! I will try to join the next meeting!

I saw the PR for the refactor of the executors from @wjwwood: https://github.com/ros2/rclcpp/pull/1047
We will probably make some comments there in the meantime.
