In one of the previous ROS 2 TSC meetings it was suggested that we form a Working Group to analyse the current state of ROS 2 and make it real-time.
To date, we have the following material about real-time in ROS 2:
- Original article by Jackie: https://design.ros2.org/articles/realtime_background.html
- ROS 2 ported to some RTOSes (https://www.esol.com/embedded/ros.html, http://blackberry.qnx.com/en/articles/what-adas-market-needs-now)
- Apex.AI article about porting ROS 1 applications to ROS 2 applications: https://www.apex.ai/blog/porting-algorithms-from-ros-1-to-ros-2
- Bosch proposal on how to make the Callback-group-level Executor real-time: https://vimeo.com/292707644
Since real-time is not something that can start and stop within the ROS 2 “borders”, we propose analysing the entire stack, from the hardware platform to the applications written with ROS 2.
There are many details that we could get lost in, but we think that we could start with the following list and elaborate on the items:
- Pick a real-time capable hardware platform. Decide whether we go multi-core, many-core, or uP.
- RTOS (real-time operating system). Decide whether we go POSIX or non-POSIX. Add an adaptive partitioning scheduler.
- Create/get a BSP (board support package) and modify it (e.g. apply the PREEMPT_RT patch for Linux, configure the kernel: isolate CPUs, remove all unwanted drivers, add other applications)
- Explore the use of a real-time hypervisor (QNX has one)
- Use/create static and real-time middleware
- rmw, rcl, rclcpp layers:
- introduce safe data types (bounded, check type integrity)
- perform memory audit (remove unneeded memory allocations)
- split memory allocation in init and runtime phases, avoid memory fragmentation
- remove all blocking calls (or replace with timed calls, e.g. mutex vs timed_mutex)
- implement real-time safe log output handler (no logging to console or file)
- implement real-time pub/sub (either using Waitset or modified Callback/Executor)
- convert ros2 launch to C++
- run tools for static and dynamic code analysis (PC-lint, LDRA, Silexica, LTTng)
- Check everything above in the STL library
- Node architecture for deterministic execution (policy for message aggregation, nodes cohesion, parallelization, …)
- Global error handling (history of failures, core dumps, fail-safe mechanism, …)
- Real-time safety for higher level concepts, e.g.:
  - services
  - parameters
  - actions
- Create reference applications and porting guidelines from ROS 1 to ROS 2: https://www.apex.ai/blog/porting-algorithms-from-ros-1-to-ros-2
- Create CI for RT testing (e.g. https://github.com/ros2/ros2/issues/607#issuecomment-460319513)
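To make a few of the items above more concrete (bounded data types, allocating memory only in the init phase, and replacing blocking calls with timed ones), here is a minimal C++ sketch. `BoundedBuffer`, `try_push` and `try_pop` are hypothetical names invented for this illustration and are not part of rmw, rcl or rclcpp. The sketch only shows the mutex vs timed_mutex idea: a timed lock bounds how long a caller can wait, but it does not by itself address priority inversion, which would need separate treatment.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <mutex>

// Hypothetical bounded buffer (illustration only, not a ROS 2 API).
// The capacity is fixed at compile time and the storage is a std::array,
// so the buffer itself never allocates on the heap at runtime.
template <typename T, std::size_t Capacity>
class BoundedBuffer {
  static_assert(Capacity > 0, "Capacity must be greater than zero");

public:
  // Tries to append a value. Returns false if the lock could not be
  // acquired within `timeout` or if the buffer is already full.
  bool try_push(const T& value, std::chrono::microseconds timeout) {
    // This unique_lock constructor calls timed_mutex::try_lock_for(timeout)
    // instead of blocking indefinitely as a plain lock() would.
    std::unique_lock<std::timed_mutex> lock(mutex_, timeout);
    if (!lock.owns_lock() || size_ == Capacity) {
      return false;
    }
    data_[(head_ + size_) % Capacity] = value;
    ++size_;
    return true;
  }

  // Tries to remove the oldest value into `out`. Returns false if the lock
  // could not be acquired within `timeout` or if the buffer is empty.
  bool try_pop(T& out, std::chrono::microseconds timeout) {
    std::unique_lock<std::timed_mutex> lock(mutex_, timeout);
    if (!lock.owns_lock() || size_ == 0) {
      return false;
    }
    out = data_[head_];
    head_ = (head_ + 1) % Capacity;
    --size_;
    return true;
  }

private:
  std::timed_mutex mutex_;
  std::array<T, Capacity> data_{};  // storage reserved up front
  std::size_t head_ = 0;            // index of the oldest element
  std::size_t size_ = 0;            // number of stored elements
};
```

A caller would construct the buffer during initialisation, for example `BoundedBuffer<int, 64> buf;`, and in the runtime phase handle a `false` return from `try_push` or `try_pop` explicitly (drop the sample, count the miss, and so on) rather than waiting indefinitely.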
We are requesting comments:
- Do you have items to be added/removed from above list?
- Do you want to join this working group? We will form a regular ROS working group that uses Discourse for discussions and holds video and in-person meetings.
We are aiming for our first video meeting next week.
D.