Multi threaded subscription callbacks

SecretaryBirds · October 6, 2017, 5:59pm

I’m hoping someone can solve a debate we’re having.

Does each subscription callback run in its own thread? Or do they round-robin in a single thread based on the data order received?

I guess we have assumed that a node with multiple subscriptions was spawning those callbacks in their own threads, which makes things performant. We’ve even been implementing some data protectors on shared data because of this assumption.

If this isn’t the case, then our callbacks must be fast enough, and our algorithms naïve enough to hide the fact that data is being processed sequentially.

wjwwood · October 6, 2017, 6:41pm

It depends.

In both C++ and Python we have this concept of Executors. A single threaded executor (the default if you’re using rclcpp::spin()) is merely round-robin in the blocking spin call (in the thread in which you call spin). There is also a multi-threaded executor which will create a set of threads which will be dispatched work in round-robin fashion.

Right this second there is no way to have a separate thread per subscription (which was possible in ROS 1 with “callback queues”), but we want to change that. Right now, the most granular you can get is one executor per node (and therefore one thread per node in the case of a single threaded executor).

Have a look at the single threaded executors; they just wait for something to do, do it, and then loop:

C++:

ros2/rclcpp/blob/022b2b1b807d8239b492e81a8fc156e36c60724d/rclcpp/src/rclcpp/executors/single_threaded_executor.cpp#L33-L34

          auto any_exec = get_next_executable();
          execute_any_executable(any_exec);

Python:

ros2/rclpy/blob/a6daedbde500a1dbd222c1f1851e1ae2aea1f2e1/rclpy/rclpy/executors.py#L382-L383

          handler, entity, node = next(self.wait_for_ready_callbacks(timeout_sec=timeout_sec))
          handler()
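
To make the "wait, do it, loop" shape concrete, here is a minimal pure-Python sketch. It is a toy model and not rclpy: `ToySingleThreadedExecutor`, `add_ready` and `spin_until_idle` are hypothetical names invented for illustration, and "waiting" is reduced to checking a queue. What it shows is that every callback runs in the thread that spins, in the order the work became ready.

```python
import threading
from collections import deque


class ToySingleThreadedExecutor:
    """Toy model, not rclpy: runs ready callbacks one at a time in the caller's thread."""

    def __init__(self):
        self._ready = deque()  # callbacks whose data has "arrived"

    def add_ready(self, callback):
        self._ready.append(callback)

    def spin_until_idle(self):
        # Take the next ready callback, run it, loop.
        while self._ready:
            handler = self._ready.popleft()
            handler()


log = []


def make_callback(name):
    def callback():
        # Record which "subscription" ran and on which thread.
        log.append((name, threading.get_ident()))
    return callback


executor = ToySingleThreadedExecutor()
for name in ["sub_a", "sub_b", "sub_a", "sub_c"]:
    executor.add_ready(make_callback(name))
executor.spin_until_idle()

order = [name for name, _ in log]
thread_ids = {tid for _, tid in log}
```

After the spin, `order` matches the arrival order and `thread_ids` contains only the spinning thread, so no locking is needed between these callbacks.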

The multi-threaded executors do something like: acquire a lock, wait for something to do, claim it, release lock, do it, loop. The C++ one will do this in each thread (each thread is identical), but the thread you call MultiThreadedExecutor.spin() on will just wait for the other threads to join (sort of wasted atm):

ros2/rclcpp/blob/022b2b1b807d8239b492e81a8fc156e36c60724d/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp#L70-L78

          executor::AnyExecutable::SharedPtr any_exec;
          {
            std::lock_guard<std::mutex> wait_lock(wait_mutex_);
            if (!rclcpp::utilities::ok() || !spinning.load()) {
              return;
            }
            any_exec = get_next_executable();
          }
          execute_any_executable(any_exec);

But the Python one works differently: it uses the thread in which spin() is called to wait for work, then dispatches that work to a “thread pool executor” (a Python concept, not ours; see the concurrent.futures section of the Python documentation) via the submit() method, and the pool is what actually executes the user’s callback:

ros2/rclpy/blob/a6daedbde500a1dbd222c1f1851e1ae2aea1f2e1/rclpy/rclpy/executors.py#L410-L411

          handler, entity, node = next(self.wait_for_ready_callbacks(timeout_sec=timeout_sec))
          self._executor.submit(handler)
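
The hand-off pattern can be tried without ROS at all. The sketch below is an illustration only: `ready_handlers` is a hypothetical stand-in for whatever the wait call would yield, and only the standard library `ThreadPoolExecutor.submit()` is real API. It shows that the "spin" thread never runs a callback itself, which is why shared data needs protecting.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

main_thread = threading.get_ident()
ran_on = []
ran_on_lock = threading.Lock()  # callbacks now run off the spin thread, so guard shared data


def handler():
    with ran_on_lock:
        ran_on.append(threading.get_ident())


# Hypothetical stand-in for the stream of ready callbacks.
ready_handlers = [handler for _ in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # The "spin" thread only hands work to the pool.
    futures = [pool.submit(h) for h in ready_handlers]
    for future in futures:
        future.result()  # re-raises any exception thrown inside a callback
```

All eight callbacks run, none of them on `main_thread`, and at most four distinct worker threads are used.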

You’ll notice there is nothing special about these executors, and you can create your own which let you have complete control over how many threads and how they are utilized.

As for each callback being spawned in its own thread, which would make things performant: that is not the case, and whether or not it would help depends on your definition of performant.

If you want to reduce overhead and latency, then creating threads for each callback would be very inefficient. For utilizing multicore systems as much as possible (another definition of performant), then you’ll want at least some threads, but still you’d ideally want to only have as few threads as possible and reuse them rather than create them frequently.
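
A rough way to see the difference between creating a thread per callback and reusing a few threads, sketched with the standard library only (nothing here is ROS API; `callback`, `N_CALLBACKS` and the thread names are made up for the example). It counts how many distinct threads end up running the same 20 callbacks under each strategy.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_CALLBACKS = 20
lock = threading.Lock()


def callback(seen):
    # Remember the name of the thread this callback ran on.
    with lock:
        seen.add(threading.current_thread().name)


# Strategy 1: a fresh thread for every callback.
seen_per_callback = set()
threads = [
    threading.Thread(target=callback, args=(seen_per_callback,), name=f"cb-{i}")
    for i in range(N_CALLBACKS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Strategy 2: a small pool whose threads are reused.
seen_pool = set()
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="pool") as pool:
    for _ in range(N_CALLBACKS):
        pool.submit(callback, seen_pool)
# Leaving the with-block waits for all submitted work to finish.
```

The first strategy pays for 20 thread start-ups; the second never uses more than two threads for the same work.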

Luckily we give you all the tools required to control threading: you can use a single thread or multiple threads, or even create your own executor and do whatever you want.

This is where the concept of “callback groups” comes in, I think. The idea of a callback group is that everything with a callback belongs to one (timers, subscriptions, service clients/servers) and the type of callback group determines how the executor will treat them. For instance, two callbacks (say, from two different timers) in a single “mutually exclusive” callback group will never be executed at the same time as one another by the executor. However, if you place them in separate mutually exclusive callback groups, then they could be executed at the same time as one another (and therefore data shared between them needs to be protected). There is also a “reentrant” callback group, which means that not only can a callback be called at the same time as other callbacks, it can be called multiple times concurrently.

The purpose of the callback groups is to let the user describe the synchronization coupling between different callbacks without tying that to the threading model. For instance, you can describe a callback as reentrant, but that does not mean it will be called concurrently, because if you use a single threaded executor it will not, but if you used a multithreaded executor it might (if for example there is more than one message to be processed simultaneously).
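
Here is a small stand-alone sketch of that decoupling. It is a toy model, not the rclpy classes: `ToyReentrantGroup`, `ToyMutuallyExclusiveGroup`, `ConcurrencyProbe` and `max_concurrency` are hypothetical names, and the exclusive group simply blocks a worker on a lock, which is a simplification of how a real executor would skip work that is not allowed to run yet. It measures how many callbacks were running at once for each combination of group type and thread count.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyReentrantGroup:
    """Toy model: never restricts execution."""

    def run(self, callback):
        callback()


class ToyMutuallyExclusiveGroup:
    """Toy model: at most one callback of this group runs at a time."""

    def __init__(self):
        self._lock = threading.Lock()

    def run(self, callback):
        # Simplification: block the worker until the group is free.
        with self._lock:
            callback()


class ConcurrencyProbe:
    """Tracks the highest number of threads inside `callback` at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0

    def callback(self):
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        time.sleep(0.05)  # pretend to do some work
        with self._lock:
            self._active -= 1


def max_concurrency(group, workers):
    probe = ConcurrencyProbe()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(group.run, probe.callback) for _ in range(4)]
        for future in futures:
            future.result()
    return probe.max_active


exclusive_multi = max_concurrency(ToyMutuallyExclusiveGroup(), workers=4)
reentrant_single = max_concurrency(ToyReentrantGroup(), workers=1)
reentrant_multi = max_concurrency(ToyReentrantGroup(), workers=4)
```

The exclusive group never exceeds one running callback even with four threads, and the reentrant group never exceeds one when there is only a single thread. Only the reentrant group on several threads can overlap, and whether it actually does on a given run is up to the scheduler.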

Like the executors, you can create your own callback groups to express any constraint you might have.

The Python ones are probably the easiest to understand:

ros2/rclpy/blob/a6daedbde500a1dbd222c1f1851e1ae2aea1f2e1/rclpy/rclpy/callback_groups.py#L75-L112

          class ReentrantCallbackGroup(CallbackGroup):
              """Allow callbacks to be executed in parallel without restriction."""
          
              def can_execute(self, entity):
                  return True
          
              def beginning_execution(self, entity):
                  return True
          
              def ending_execution(self, entity):
                  pass
          
          
          class MutuallyExclusiveCallbackGroup(CallbackGroup):
              """Allow only one callback to be executing at a time."""
          
              def __init__(self):
                  super().__init__()
                  self._active_entity = None
                  self._lock = Lock()

Sorry for the lack of documentation, but we’ve been trying to make sure this all makes sense before finishing it off and documenting it thoroughly. Any feedback on the pattern is welcome.

bbus · May 29, 2020, 9:43pm

I’m not super familiar with Python threading (or threading in general, frankly), but it seems that if ROS used the “process pool executor” instead of the “thread pool executor”, then callbacks could run out of sequence, because a process pool is not tied to the Python global interpreter lock.

I’ve tried to find more information on ROS’ use of threading on callbacks, and this discussion is the best I’ve been able to find. Perhaps I should ask this on answers. But I thought maybe this counts as a discussion into whether the linked file could use the “process pool executor” instead of the “thread pool executor”.

Do we have documentation on which one ROS1 uses? The best I could find was

and

which to me seem to contradict each other.
