Hello. We are investigating and studying the ROS2 real-time in a team.
In the previous post bellow, we reported about thread priority setting of main-thread and DDS child thread.
In these post we reported that thread proity could affect the jitter of sleep or timing of callback.
But if threads are in dirrefent CPU core, they do not affect each other.
So it seems natural to run callback as thread with specific CPU core and scheduling policy.
In fact, tasks are executed as “thread” in RTOS. By using such a thread, we’ll get
- ROS executor thread can be interrupted (i.e. can read topic) even if callbacks are running
- easily implement preemption
- prevent duplicattion of scheduling procedure in executor and OS, this would make easy to verification
- combination with LET sheduler may be also acceptable
And more, we want to detect deadline miss. By splitting callback thread, existing timer mechanism can be applied to overrun handler (timer can be triggered even if callback is running).
So we implemented PoC, toy code to run callbacks as different thread from main thread with specific CPU core and scheduling policy, and detect overrun.
It’s a PoC (proof of concept) implementation so I implement in user-land, not in ROS 2 layer.
Sample scenario
Consider following situation:
- There is only one CPU core for simplicity.
- There are 3 tasks(callbacks) with different priority, namely TaskA, TaskB, and TaskC.
- Tasks are fired by topic, namely there are 3 paris, PubA-SubA, PubB-SubB, PubC-SubC.
Tasks are fired by SubX as thread. - TaskA has the highest priority and shortest task.
TaskB has middle priority and middle task.
TaskC has the lowest priority and longest task.
Namely TaskA shoud run even if TaskB or TaskC is running.
To illustrate this, see figure below.
- TaskC fires by topic C.
-
means TaskC is running. - When topic B comes, TaskB fires and TaskC is stopped.
O
means TackC is stop
The same is true when topic A comes. - When TaskA is finished(
X
means this), TaskB runs because TaskB has higer priority than TaskC.
The same When TaskB finished.
(In a nutshell preemption.)
priority
^
| TaskA ---X
|
| TaskB --------O ------X
|
| TaskC ------------O ------X
| ^ ^ ^
| | | |
| Topic C B A
|
+----------------------------------------------------------> time
In the PoC code, thread priorities are main thread > DDS thread > TaskA thread > TaskB thread > TaskC thread.
Tasks run in the same core.
Implementation
POSIX thread API enables to set scheduling policy, CPU affinity.
I guess ROS 2 is developed with no (or less) OS restriction, so it may not be desirable to use the POSIX API.
But as far as I know, it’s hard to implement preemption without OS level support.
See ThreadedSubscription in https://github.com/y-okumura-isp/ROS2_ThreadedCallback/blob/master/include/threaded_subscriber.hpp.
You can see following in the constructor ThreadedSubscription
.
- creates callback thread
- and sets up thread by
pthread_setschedparam
andpthread_setaffinity_np
To implement subscription callback and overrun handler, overload on_subscription
and on_overrun
.
ThreadedSubscription
is a helper class, and to use this in node class do following:
- Use
ThreadedSubscription::create_subscription(rclcpp::Node *node, const std::string & topic, const rclcpp::QoS & qos)
when you don’t nedd overrun handler. - Use
ThreadedSubscription::create_subscription(rclcpp::Node *node, const std::string & topic, const rclcpp::QoS & qos, std::chrono::duration<DurationRepT, DurationT> overrun_period)
when you need overrun handler.
It’s PoC, so I think there is more sophisticated API. I want to discuss the necessity of threaded callback rather than API now.
See README.md for detail.
Discussion
What do you think about a executor which uses threads associated to each callbacks i.e. creates a thread when create_subscription
is called.
If you agree and try to implement in ROS layer, we should consider several things:
- do we implement such a mechanism in rcl layer or rclcpp layer? As rcl does not have executor, it may be easy to start with rclcpp.
- Decide what to do for new topic when the callback is already running. Drop? Delay?
Error handling may be the best in some case. So may developer want to select?- If we select delay, we may need to consider Executor
get_next_ready_executable
andwait_for_work
relation. We need to clear event flag, but execute subscription lazily.
- If we select delay, we may need to consider Executor
- As Data writer(executor in this situation) and data reader(subscription callback) run in parallel, we need to prevent data from changing in the middle of the callback.
So it may be good for callback thread to read data i.e. callrcl_take
.
Questions, suggestions and advice are welcome.
Thank you.