SingleThreadedExecutor creates a high CPU overhead in ROS 2

Hey guys, sorry for not responding here earlier, I have been following the discussions with a lot of interest. I really appreciate all the work you guys did to identify the issues with the current implementation.

I’m also planning on doing a first review of the proposed static executor that @MartinCornelis posted as soon as I can. We’ve had a bit of a backlog of features going into rclcpp, but hopefully we can make a lot of progress on executor-related changes during the F-turtle sprint.

I just wanted to briefly mention a few other changes we’ve been wanting to do with respect to the executor, which have sort of been preventing me from nailing down design documentation and recommending courses of action on threads like this one.


First, I really want to change the executor design so that you can use more than one executor per node. At the moment the association is “an executor may have zero to many nodes, and a node may be associated with zero or one executors”. In the future I’d like callback groups to be the most granular unit that can be associated with an executor. I believe this was one of the possible improvements to the design mentioned in https://github.com/ros2/rclcpp/issues/825.
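To make the idea concrete, here is a rough sketch of what callback-group-level association might look like. This is purely hypothetical: the `add_callback_group` method and its signature are assumptions for illustration, not an existing rclcpp API.

```cpp
// Hypothetical sketch only: assigning callback groups, rather than whole
// nodes, to executors. The add_callback_group() API shown here does not
// exist in rclcpp today.
#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("my_node");

  // Two callback groups on the same node.
  auto group_a = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive);
  auto group_b = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive);

  rclcpp::executors::SingleThreadedExecutor exec_a;
  rclcpp::executors::SingleThreadedExecutor exec_b;

  // Hypothetical: each executor services only one of the node's groups,
  // so a single node could be spread across more than one executor.
  exec_a.add_callback_group(group_a, node->get_node_base_interface());
  exec_b.add_callback_group(group_b, node->get_node_base_interface());

  // ... spin each executor in its own thread ...
  rclcpp::shutdown();
  return 0;
}
```

The point of the sketch is just the cardinality change: entities created in `group_a` and `group_b` could be serviced independently, even though they belong to the same node.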

The other major change is that we’d like to create a “wait set”-like class in rclcpp (we have the wait set in rcl already), so that users may choose to avoid the executor pattern altogether and instead wait on items and decide how to handle them on their own. In that case, I think callback groups and executors would not be used. I’m still thinking through all the implications and possible use cases (including mixed use of executors and wait sets). This isn’t directly affecting the discussions here, but it may have an impact, as “waitables” like timers and subscriptions may no longer have to be associated with a callback group or executor, whereas right now they must be in order to be used.
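For illustration, a user-level wait loop might look something like the sketch below. The `rclcpp::WaitSet` class, its methods, and the `WaitResultKind` name are all assumptions here (only the rcl-level wait set exists today); `Subscription::take` is likewise assumed for the sake of the example.

```cpp
// Hypothetical sketch: waiting on entities directly instead of handing
// them to an executor. The WaitSet class shown here is assumed, not a
// current rclcpp API.
#include <chrono>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("waiter");
  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10, [](std_msgs::msg::String::ConstSharedPtr) {
      // Unused: with a wait set the user takes messages explicitly below.
    });

  rclcpp::WaitSet wait_set;  // assumed class
  wait_set.add_subscription(sub);

  while (rclcpp::ok()) {
    // Block until something is ready (or time out).
    auto result = wait_set.wait(std::chrono::milliseconds(100));
    if (result.kind() == rclcpp::WaitResultKind::Ready) {
      std_msgs::msg::String msg;
      rclcpp::MessageInfo info;
      // The user decides when and how to take and handle the data,
      // with no executor scheduling involved.
      if (sub->take(msg, info)) {
        RCLCPP_INFO(node->get_logger(), "got: '%s'", msg.data.c_str());
      }
    }
  }
  rclcpp::shutdown();
  return 0;
}
```

The notable difference from the executor pattern is that no callback is ever dispatched for the user; scheduling and handling are entirely in the user’s loop.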

Finally, there’s a lot of interface cleanup around the executor that I’d like to undertake, specifically to expose the scheduling logic (currently it’s very naive and hard-coded), and I’d also like to refactor the “memory strategy” class. It has a very important purpose (allowing you to control any incidental memory allocations), but its current design is pretty hard to understand.


I haven’t decided on the order of operations yet. We could try to integrate the suggested changes and/or tackle the performance problems described here first, and then make some of the changes I described above; or make the architectural changes first and then re-evaluate the feedback in this thread; or try to do them together somehow. Perhaps a compromise would be to do the architecture changes while also working with people in this thread to ensure proper tracing hooks and to catch obvious performance issues as we go, and then look at more changes we could make, e.g. a more static executor design and/or changes to rmw to provide more information from the middleware.

I hope this is something we can discuss in detail at ROSCon (for those who will be there) and at the real-time working group as well. We’ll do our best to summarize the discussions here too.
