I’m hoping someone can solve a debate we’re having.
Does each subscription callback run in its own thread? Or do they round-robin in a single thread based on the data order received?
I guess we had assumed that a node with multiple subscriptions spawns each of those callbacks in its own thread, which makes things performant. We’ve even been adding protection around shared data because of this assumption.
If this isn’t the case, then our callbacks must be fast enough, and our algorithms naïve enough, to hide the fact that data is being processed sequentially.
In both C++ and Python we have this concept of Executors. A single-threaded executor (the default if you’re using rclcpp::spin()) simply round-robins through the ready callbacks inside the blocking spin call (in the thread in which you call spin). There is also a multi-threaded executor, which creates a set of threads to which work is dispatched in round-robin fashion.
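For concreteness, here’s roughly how you pick between the two stock executors in rclpy (a minimal sketch; num_threads defaults to the machine’s CPU count if you don’t pass it):

```python
import rclpy
from rclpy.executors import MultiThreadedExecutor, SingleThreadedExecutor

rclpy.init()
node = rclpy.create_node('executor_demo')

# Single-threaded (this is what rclpy.spin(node) amounts to): all
# callbacks run sequentially in whichever thread calls spin().
single = SingleThreadedExecutor()
single.add_node(node)
# single.spin()  # would block here, round-robinning callbacks

# Multi-threaded: callbacks are dispatched to a pool of threads and
# may run concurrently.
single.remove_node(node)
multi = MultiThreadedExecutor(num_threads=4)
multi.add_node(node)
multi.spin()
```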
Right this second there is no way to have a separate thread per subscription (which was possible in ROS 1 with “callback queues”), but we want to change that. Right now, the most granular you can get is one executor per node (and therefore one thread per node in the case of a single-threaded executor).
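So the closest you can get today is one thread per node, by giving each node its own executor and spinning each executor in its own thread (a minimal rclpy sketch):

```python
import threading

import rclpy
from rclpy.executors import SingleThreadedExecutor

rclpy.init()
nodes = [rclpy.create_node('node_a'), rclpy.create_node('node_b')]

# One single-threaded executor per node, each spun in its own thread:
# one thread per node, but still not one thread per subscription.
for node in nodes:
    executor = SingleThreadedExecutor()
    executor.add_node(node)
    threading.Thread(target=executor.spin, daemon=True).start()
```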
Have a look at the single-threaded executors, in both the C++ and Python sources; they just wait for something to do, do it, and then loop.
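In spirit, the spin loop is something like this toy model (hypothetical names, not the actual rclcpp/rclpy code):

```python
import queue

# Toy single-threaded executor: callbacks are pulled off a work queue
# and run one at a time, in the same thread that called spin().
class ToySingleThreadedExecutor:
    def __init__(self):
        self._work = queue.Queue()

    def add_callback(self, fn):
        self._work.put(fn)

    def spin(self):
        while True:
            callback = self._work.get()  # wait for something to do
            if callback is None:         # sentinel: stop spinning
                break
            callback()                   # do it, then loop


executor = ToySingleThreadedExecutor()
executor.add_callback(lambda: print('callback 1'))
executor.add_callback(lambda: print('callback 2'))
executor.add_callback(None)
executor.spin()  # both callbacks run sequentially in this thread
```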
The multi-threaded executors do something like: acquire a lock, wait for something to do, claim it, release the lock, do it, loop. The C++ one does this in each thread (each thread is identical), but the thread you call MultiThreadedExecutor.spin() on will just wait for the other threads to join (sort of wasted at the moment).
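A toy model of that worker loop (again hypothetical, not the rclcpp source):

```python
import queue
import threading

# Toy multi-threaded executor: N identical worker threads take turns
# claiming work under a lock; the thread that calls spin() only waits
# for the workers to join.
class ToyMultiThreadedExecutor:
    def __init__(self, num_threads=4):
        self._work = queue.Queue()
        self._lock = threading.Lock()
        self._num_threads = num_threads

    def add_callback(self, fn):
        self._work.put(fn)

    def _worker(self):
        while True:
            with self._lock:             # acquire a lock
                item = self._work.get()  # wait for something to do, claim it
            if item is None:             # sentinel: put it back so the
                self._work.put(None)     # other workers also shut down
                break
            item()                       # lock is released; do it, then loop

    def spin(self):
        threads = [threading.Thread(target=self._worker)
                   for _ in range(self._num_threads)]
        for t in threads:
            t.start()
        for t in threads:  # spin() itself just waits for the joins
            t.join()


executor = ToyMultiThreadedExecutor(num_threads=2)
for i in range(4):
    executor.add_callback(lambda i=i: print('callback', i))
executor.add_callback(None)  # shut down
executor.spin()
```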
But the Python one works differently: it uses the thread in which spin() is called to wait for work, then dispatches that work via the submit() method to a “thread pool executor” (concurrent.futures.ThreadPoolExecutor, a Python standard-library concept, not ours), which actually executes the user’s callback.
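Sketched in the same toy style (the real rclpy code differs in the details):

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Toy version of the Python approach: the thread that calls spin()
# waits for work, then hands each callback to a standard-library
# ThreadPoolExecutor via submit(), whose pool threads actually run it.
class ToyPythonStyleExecutor:
    def __init__(self, num_threads=4):
        self._work = queue.Queue()
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def add_callback(self, fn):
        self._work.put(fn)

    def spin(self):
        while True:
            callback = self._work.get()  # spin()'s thread waits for work
            if callback is None:         # sentinel: stop dispatching
                break
            self._pool.submit(callback)  # a pool thread executes it
        self._pool.shutdown(wait=True)
```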
You’ll notice there is nothing special about these executors, and you can create your own, which gives you complete control over how many threads there are and how they are utilized.
That is not the case, and whether or not it would make things performant depends on your definition of performant.
If you want to reduce overhead and latency, then creating a thread for each callback would be very inefficient. If instead you want to utilize multicore systems as much as possible (another definition of performant), then you’ll want at least some threads, but even then you’d ideally have as few threads as possible and reuse them rather than creating them frequently.
Luckily, we give you all the tools required to control threading: you can use a single thread, use multiple threads, or even create your own executor and do whatever you want.
This is where the concept of “callback groups” comes in, I think. The idea of a callback group is that everything with a callback belongs to one (timers, subscriptions, service clients/servers), and the type of callback group determines how the executor will treat them. For instance, two callbacks (say, from two different timers) in a single “mutually exclusive” callback group will never be executed at the same time as one another by the executor. However, if you place them in separate mutually exclusive callback groups, then they could be executed at the same time as one another (and therefore data shared between them needs to be protected). There is also a “reentrant” callback group, which means that not only can a callback be called at the same time as other callbacks, it can also be called multiple times concurrently.
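For example, in rclpy (a minimal sketch using the current rclpy names; the timers and durations are made up for illustration):

```python
import time

import rclpy
from rclpy.callback_groups import MutuallyExclusiveCallbackGroup
from rclpy.executors import MultiThreadedExecutor

rclpy.init()
node = rclpy.create_node('callback_group_demo')

# Two *separate* mutually exclusive groups: a multi-threaded executor
# may run these callbacks concurrently, so shared data needs protection.
# Put both timers in the *same* group and they will never overlap.
group_a = MutuallyExclusiveCallbackGroup()
group_b = MutuallyExclusiveCallbackGroup()

def make_callback(name):
    def callback():
        node.get_logger().info(f'{name} start')
        time.sleep(1.0)  # long enough for the two timers to overlap
        node.get_logger().info(f'{name} end')
    return callback

node.create_timer(0.5, make_callback('timer_1'), callback_group=group_a)
node.create_timer(0.5, make_callback('timer_2'), callback_group=group_b)

executor = MultiThreadedExecutor()
executor.add_node(node)
executor.spin()
```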
The purpose of the callback groups is to let the user describe the synchronization coupling between different callbacks without tying it to the threading model. For instance, you can describe a callback as reentrant, but that does not mean it will be called concurrently: if you use a single-threaded executor it will not be, but if you use a multi-threaded executor it might be (if, for example, there is more than one message to be processed simultaneously).
Like the executors, you can create your own callback groups to express any constraint you might have.
The Python ones are probably the easiest to understand.
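For instance, here’s a hypothetical custom group sketched against the interface the Python callback groups implement (can_execute / beginning_execution / ending_execution), allowing at most a fixed number of its callbacks to run at once:

```python
import threading

from rclpy.callback_groups import CallbackGroup

# Hypothetical: permit at most `limit` callbacks from this group to
# execute concurrently. The executor asks can_execute() before running
# an entity, claims a slot with beginning_execution(), and returns the
# slot with ending_execution().
class LimitedConcurrencyCallbackGroup(CallbackGroup):
    def __init__(self, limit=2):
        super().__init__()
        self._limit = limit
        self._active = 0
        self._lock = threading.Lock()

    def can_execute(self, entity):
        with self._lock:
            return self._active < self._limit

    def beginning_execution(self, entity):
        with self._lock:
            if self._active < self._limit:
                self._active += 1
                return True   # slot claimed; executor may run the callback
            return False      # someone else took the last slot

    def ending_execution(self, entity):
        with self._lock:
            self._active -= 1
```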
Sorry for the lack of documentation, but we’ve been trying to make sure this all makes sense before finishing it off and documenting it thoroughly. Any feedback on the pattern is welcome.
I’m not super familiar with Python threading (or threading in general, frankly), but it seems that if ROS used the “process pool executor” instead of the “thread pool executor”, then callbacks could execute out of sequence, because a process pool is not tied to the Python global interpreter lock.
I’ve tried to find more information on ROS’s use of threading in callbacks, and this discussion is the best I’ve been able to find. Perhaps I should ask this on ROS Answers. But I thought maybe this counts as a discussion of whether the linked file could use the “process pool executor” instead of the “thread pool executor”.
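To illustrate the distinction I mean (plain Python, nothing ROS-specific): CPU-bound work in a thread pool is serialized by the GIL, while a process pool can run it in parallel:

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def busy(n):
    # CPU-bound loop; threads can't run this in parallel under the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(busy, [5_000_000] * 4))
        print(pool_cls.__name__, time.perf_counter() - start)
```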
Do we have documentation on which one ROS 1 uses? The best I could find was