ROS 2 and Real-time

I agree with this, and in line with my paragraph above: we need to specify not only the level of real-time we’re aiming for but also the particular use cases this group is willing to maintain as case studies for the community.

I’m somewhat concerned about the “real-time” capabilities we may end up achieving with micro-ROS’s existing architecture. While the project is still ongoing and much work is left (it is only about halfway through), the architecture of DDS-XRCE requires a “bridge” to transform XRCE’s client/server communication (or peer-to-peer, if that ends up being implemented) into DDSI-RTPS communication (the kind used in common ROS 2). Apart from communication between XRCE-native entities, traffic won’t interoperate directly with the ROS 2 network, and such bridging will introduce a compromise for real-time applications. Including such a bridge as one of the case studies would benefit the micro-ROS project.

From our (Bosch CR) side, we only target the MCU itself as the real-time capable device.

Apart from pragmatic reasons (this makes it much easier), in our current products that’s all we need.

I would guess that this assumption holds for many products, because it is a very common architectural approach to keep the real-time-safe parts a very small part of the overall system. Of course, autonomous driving (AD) is a notable exception, which is probably why Dejan is so interested in a more comprehensive approach :wink:

Last but not least, we’re very interested in wireless communication, and hard real-time is out of reach with that anyway.

It depends on the implementation of the bridge. Assuming the transport layer is real-time capable (e.g., wired), the bridge itself could maintain real-time guarantees.

Very much agreed!

While I can’t vouch for it yet (or say how deterministic it will be), we’ve been exploring some interesting work with a local research group in our area that is extending TSN (Time-Sensitive Networking) to wireless communication. Let me know if you’re interested and I’ll connect you.

Soft real-time guarantees, probably; but hard real-time guarantees require the bridge itself to be hard real-time as well. That applies not only to the transport layer but to everything above it: from the data link layer (OSI layer 2) up to the ROS 2 rcl (OSI layer 7), going through all the layers in between, including the network and transport layers (which we typically refer to as the networking stack), the communication middleware (e.g. DDS), etc.
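To make the “every layer” point concrete: an end-to-end hard guarantee exists only if every layer on the path has a bounded worst-case latency and those bounds add up to less than the deadline. The sketch below assumes the layers are traversed sequentially (so their bounds simply add) and that per-layer worst-case figures are already known; all numbers are hypothetical placeholders, not measurements, and `None` marks a layer with no known bound, such as a best-effort bridge.

```python
from typing import Optional


def end_to_end_wcet(layer_bounds_us: dict[str, Optional[float]]) -> Optional[float]:
    """Sum per-layer worst-case latencies; None means the layer has no known bound."""
    total = 0.0
    for bound in layer_bounds_us.values():
        if bound is None:
            # A single unbounded layer makes the whole path unbounded.
            return None
        total += bound
    return total


def meets_hard_deadline(layer_bounds_us: dict[str, Optional[float]], deadline_us: float) -> bool:
    total = end_to_end_wcet(layer_bounds_us)
    return total is not None and total <= deadline_us


# Hypothetical per-layer worst-case latencies in microseconds (placeholders only).
path = {
    "data link (L2)": 50.0,
    "network + transport": 120.0,
    "middleware (DDS)": 300.0,
    "bridge": 400.0,
    "rmw/rcl": 80.0,
}
print(end_to_end_wcet(path))              # 950.0
print(meets_hard_deadline(path, 1000.0))  # True
path["bridge"] = None                     # a bridge without a worst-case bound
print(meets_hard_deadline(path, 1000.0))  # False
```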

AFAIK, real-time isn’t a goal of the micro-ROS project, but I agree that we should indeed consider it and commit resources to it.

I just want to highlight that having a bridge, as an architectural choice, does not prevent us from achieving real-time capability. Of course, the current implementation of the xrce-dds-bridge we use in micro-ROS is a different story. A lot of work remains to be done there, on all the layers you mentioned and probably internally as well.

I would like to join this working group and contribute.

Ingo, I agree with you that this is a significant effort; however, there are many, many use cases. If you are relying on low-cost camera sensors (either 2D or 3D) for localization and obstacle avoidance, you are most likely going to be piping that data through the CPU. If you want to go faster than a crawl, you’d better have some level of determinism as to when that data is going to be processed and a stop/continue decision is made. Certain types of camera-based navigation just don’t work unless you have deterministic timing, and that timing has to hold between the cameras and other sensors such as the IMU.

I have customers today (for example, in warehouses) who want the robot to go as fast as possible (to increase throughput) but also want to spend as little on it as possible, hence lower-cost sensors. Now, I can easily build a fast, low-cost robot, but unless I use systems outside of ROS, it’s hard for me to ensure that it is safe.

Even getting part of the system to have deterministic timing would be a huge boon. For example, if the safety sensors could pass messages directly to the motor controller over a deterministic data fabric (say, AVB Ethernet), so that the motor controller knows it has to either slow down or stop, that alone would help a lot.

I might start there, with that simple subset: a safety sensor with an RTOS on its microcontroller (uC), a motor controller with a similar RTOS uC, and the two passing DDS messages with deterministic characteristics.
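On the motor-controller side, that subset boils down to a deadline check: if no fresh safety message has arrived within the expected period, or the latest one reports an obstacle too close, command a stop. A minimal sketch; `SafetyWatchdog`, the 50 ms maximum age, and the 0.3 m / 1.0 m thresholds are hypothetical choices, and on a real controller this logic would run in a periodic RTOS task rather than in Python. It only yields a deterministic reaction if the transport actually delivers samples within that period.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Command(Enum):
    CONTINUE = "continue"
    SLOW_DOWN = "slow_down"
    STOP = "stop"


@dataclass
class SafetySample:
    stamp_s: float     # time the sensor published the sample
    distance_m: float  # distance to the nearest obstacle


class SafetyWatchdog:
    """Hypothetical controller-side check: stale or alarming data leads to a stop."""

    def __init__(self, max_age_s: float = 0.05, stop_m: float = 0.3, slow_m: float = 1.0):
        self.max_age_s = max_age_s
        self.stop_m = stop_m
        self.slow_m = slow_m
        self.latest: Optional[SafetySample] = None

    def on_message(self, sample: SafetySample) -> None:
        self.latest = sample

    def decide(self, now_s: float) -> Command:
        # No data yet, or data older than the deadline: fail safe.
        if self.latest is None or now_s - self.latest.stamp_s > self.max_age_s:
            return Command.STOP
        if self.latest.distance_m < self.stop_m:
            return Command.STOP
        if self.latest.distance_m < self.slow_m:
            return Command.SLOW_DOWN
        return Command.CONTINUE


wd = SafetyWatchdog()
print(wd.decide(now_s=0.0).name)   # STOP (no data yet)
wd.on_message(SafetySample(stamp_s=1.00, distance_m=2.0))
print(wd.decide(now_s=1.02).name)  # CONTINUE
print(wd.decide(now_s=1.10).name)  # STOP (sample is 100 ms old)
```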

If we can get that to work then we can move on to the CPU oriented applications of camera based navigation.

David, I’m all sold on real-time. It’s essential for many applications. I hope my comments above didn’t imply otherwise. Real-time is essential for predictability, which is what I think you’re aiming for.

I’m even more sold on determinism. In fact, I’m usually credited with putting it prominently on the ROS map through a talk at ROSCon 2017. That’s not the same as real-time, however.

When I say that those are big efforts, I’m not saying we don’t have to do this. I’m mentioning it because I care about the how.

And in particular, I think that the how depends a lot on the what for. And Dejan didn’t specify that. So, all I’m asking is that people seriously think about the what for before embarking on this journey.

At the very least, these thoughts will give us some requirements and some metrics. And then we can think about how to achieve that in the easiest way.

It will also give us some common ground to speak about.

For me personally, I care about small consumer robots. So, my answers to the stack Dejan posted are:

  1. We use microcontrollers; they are real-time capable and usually single-core (some are multi-core, usually homogeneous).
  2. We use a POSIX RTOS, NuttX
  3. We chose hardware that’s supported by 2. Still a pain though.
  4. Not necessary
  5. We use Micro-XRCE-DDS

  6.
    6.1) don’t care
    6.2) got some ideas, interest to collaborate
    6.3) got some ideas, interest to collaborate
    6.4) unclear…
    6.5) don’t care
    6.6) HERE’S THE BIGGIE
    6.7) don’t care
    6.8) got it
  7. don’t care
  8. working on it, using SPA approach
  9. proposing system modes approach
  10. don’t care
  11. maybe
  12. we use a HIL approach

Now, if you ask Dejan, I’m pretty sure his answers will be very different. That’s because he has a very different application (autonomous driving). Not sure what Victor will say, but he’s working on manipulation, which could again be quite different.

So, from my perspective, we’d like to collaborate on some of the points under bullet point 6, and that’s also the core thing for ROS2. For the rest of the things, we are very likely to have a radically different approach.

Therefore, please, before embarking on a “full stack” approach, get your use cases clear and then maybe people will know what they’re getting into and what the goals are.

Great! Now we are getting some good discussion. Ingo, yes, we’ve dived into actions and solutions before getting clear on the problem statement. We’ve also generated a list of actions that mixes development and process and could be more clearly MECE (mutually exclusive, collectively exhaustive). We are also missing chunks: deterministic message passing requires known bandwidth allocations, which standards like AVB provide, but this is not, so far, included on the list. We also need to talk a little bit about hardware. How about if we reformulate the discussion as follows for everyone to throw rocks at:

Ultimate goal: To enable universal robotic service

Current Situation: ROS robots cannot operate safely in many domains due to a lack of predictability and determinism, which means that sensors can’t be relied on to prevent human injury

Desired Situation: ROS2 robots can guarantee safety in environments with unprotected humans by using sensors and by passing data from those sensors through the ROS2 messaging infrastructure.

Problem Statement: How can we create deterministic responses to sensor inputs in ROS2?

Problem Breakdown

Hardware

  • CPU
  • uC (I assume most everything attaches to a standard uC and is deterministic to the uC) *
  • Communications infrastructure
    • High speed (e.g. AVB Ethernet) *
    • Low speed (e.g. SLIP)

Software

  • uC RTOS (e.g. NuttX) *
    • modifications???
  • CPU RTOS (e.g. a QNX derivative)
    • modifications (e.g. the PREEMPT_RT patch for Linux, etc.)
    • drivers for deterministic communications (e.g. bandwidth allocation on AVB)
  • Middleware (e.g. DDS and Micro-XRCE-DDS)
  • Libraries (rmw, rcl, rclcpp)
    • Cleanup for safe implementation
      • introduce safe data types (bounded, check type integrity)
      • perform memory audit (remove unneeded memory allocations)
      • split memory allocation in init and runtime phases, avoid memory fragmentation
      • implement real-time safe log output handler (no logging to console or file)
      • remove all blocking calls (or replace with timed calls, e.g. mutex vs timed_mutex)
    • Cleanup for desirability
      • convert ros2 launch to C++
    • Implement real-time pub/sub (either using Waitset or modified Callback/Executor) *
      • Real-time pub/sub will need to request bandwidth allocation from shared comms infrastructure (e.g. AVB Ethernet)
      • Define message length standards
      • Other
  • Communications infrastructure firmware *
  • Services
    • Global error handling (history of failures, core dumps, fail-safe mechanism, …)
    • Real-time safety for higher-level concepts (e.g. services, parameters, actions)
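For the real-time-safe log output handler, one common pattern is a fixed-capacity ring buffer: the real-time path writes into preallocated slots and never waits, dropping (and counting) messages when the buffer is full, while a separate non-real-time thread drains it to console or file. This is a single-threaded Python sketch of the data structure only; `RtLogBuffer` is a hypothetical name, Python strings are heap objects, and a production version would use fixed-size slots in a lock-free single-producer/single-consumer queue in C or C++.

```python
class RtLogBuffer:
    """Fixed-capacity ring buffer: slots are allocated once, writes never wait."""

    def __init__(self, capacity: int):
        self._slots = [""] * capacity  # slot array created at init time
        self._capacity = capacity
        self._head = 0                 # index of the oldest unread entry
        self._size = 0
        self.dropped = 0

    def try_log(self, msg: str) -> bool:
        """Real-time side: never blocks; drops and counts the message when full."""
        if self._size == self._capacity:
            self.dropped += 1
            return False
        tail = (self._head + self._size) % self._capacity
        self._slots[tail] = msg
        self._size += 1
        return True

    def drain(self) -> list[str]:
        """Non-real-time side: empties the buffer, e.g. before writing to a file."""
        out = []
        while self._size:
            out.append(self._slots[self._head])
            self._head = (self._head + 1) % self._capacity
            self._size -= 1
        return out


rt_log = RtLogBuffer(capacity=2)
rt_log.try_log("a")
rt_log.try_log("b")
rt_log.try_log("c")                  # buffer full: dropped, not blocked
print(rt_log.drain(), rt_log.dropped)  # ['a', 'b'] 1
```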
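The bandwidth-allocation item can be thought of as admission control on a shared link: a real-time publisher’s stream is admitted only if the total reserved bandwidth stays below a cap. A sketch under stated assumptions: the class names are hypothetical, the 0.75 cap reflects the figure commonly cited as the AVB default for reserved stream traffic but is only a parameter here, and the 42-byte per-frame overhead assumes VLAN-tagged Ethernet (preamble/SFD, MAC header, VLAN tag, FCS, inter-frame gap) while ignoring minimum-frame padding.

```python
from dataclasses import dataclass, field

# preamble+SFD 8, MAC header 14, VLAN tag 4, FCS 4, inter-frame gap 12 (bytes)
ETH_WIRE_OVERHEAD_BYTES = 42


@dataclass
class StreamRequest:
    name: str
    frame_payload_bytes: int  # Ethernet payload per frame (RTPS + UDP/IP headers)
    frames_per_s: float

    def bits_per_s(self) -> float:
        return (self.frame_payload_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8 * self.frames_per_s


@dataclass
class LinkReservations:
    link_bps: float
    max_reserved_fraction: float = 0.75  # assumed cap, see lead-in
    reserved_bps: float = 0.0
    streams: list[str] = field(default_factory=list)

    def request(self, s: StreamRequest) -> bool:
        """Admit the stream only if the total reservation stays under the cap."""
        needed = s.bits_per_s()
        if self.reserved_bps + needed > self.link_bps * self.max_reserved_fraction:
            return False
        self.reserved_bps += needed
        self.streams.append(s.name)
        return True


link = LinkReservations(link_bps=100e6)                   # 100 Mbit/s link
print(link.request(StreamRequest("sonar", 100, 1000)))    # True
print(link.request(StreamRequest("camera", 1500, 7000)))  # False (would exceed the 75 Mbit/s cap)
```

Rejecting a request up front, rather than letting streams contend at runtime, is what keeps the admitted streams’ timing predictable in this model.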

Process

I’d propose to do the items marked with * first, with the initial target state being a sensor (say, a sonar) passing a message to a motor controller for an E-Stop.

As the author of the Ada client library, I’m interested in this topic. Not sure I can contribute much at this time but I will in any case try to keep the Ada library in sync with developments in this area.

Yes, I’d like to join the working group and contribute.

Please count me in for teleconferences!

@LanderU and @abilbaotm, unless I’m traveling I’ll be in the meeting, but would it be possible for you to participate as well and give a short update on the RT build farm we’re building? I think it’s a great opportunity to get input.

I guess some of the people interested in ROS 2 and real-time are aware of the upcoming DDS/TSN mapping, but I did not see any material about it linked in this discussion, so I think it’s worth adding the links here:

I’d also like to attend the working group meetings.

@vmayoral @iluetkeb @davecrawley et al., thank you very much for your feedback, and apologies for not replying earlier.

I am with you, and I agree that we need a use case. Not including an item for deciding on the use case was an oversight on my side. However, I gathered the following use cases from your replies:

  1. consumer robots (Ingo)
  2. warehouse logistics robots (David)
  3. mobile manipulation robots (Victor)
  4. autonomous driving robots/cars (Dejan, Geoff)

If you want to +1 any of the above items or add another use case, please let me know.

Otherwise, it is clear that the four use cases above alone are vastly different and will have different requirements, different HW, different middleware, and probably also different control and data flows (which will result in different node architectures).

That said, I guess we have two options for where to start:

  1. select one of the use cases above and probably lose the interest of people with other use cases.
  2. focus on somewhat generic parts of ROS 2 that will help independently of the selected use case.
    My reading of your comments is that these parts are the following ones:
    1. create rmw layers for static and real-time middleware (RTI Connext Micro, Micro-XRCE-DDS)
    2. perform memory audit in rmw, rcl and rclcpp (remove unneeded memory allocations)
    3. split memory allocation in init and runtime phases, avoid memory fragmentation
    4. remove all blocking calls (or replace them with timed calls, e.g. mutex vs timed_mutex)
    5. implement real-time pub/sub (either using Waitset or modified Callback/Executor)
    6. integrate tools for static and dynamic code analysis (PC-lint, LDRA, Silexica, LTTng)
    7. Create a node architecture for deterministic execution (policy for message aggregation, node cohesion, parallelization, local error handling …)
    8. Create a design for global error handling (history of failures, core dumps, fail-safe mechanism, …)
    9. Create CI for RT testing (e.g. https://github.com/ros2/ros2/issues/607#issuecomment-460319513)
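The memory items (audit, init/runtime split) are often implemented with pools: every buffer a node will need is allocated during initialization, and the runtime phase only borrows and returns buffers, reporting exhaustion instead of allocating new ones. A minimal sketch of the pattern; `MessagePool` and its sizes are hypothetical, and in Python the interpreter still allocates internally, so this shows the structure rather than a real-time guarantee (in C++ the same idea maps onto fixed arrays or custom allocators).

```python
from typing import Optional


class MessagePool:
    """Fixed pool of equally sized buffers: create at init, only borrow/return at runtime."""

    def __init__(self, count: int, max_bytes: int):
        # Init phase: every buffer the node will ever use is created here.
        self._free = [bytearray(max_bytes) for _ in range(count)]
        self.max_bytes = max_bytes

    def acquire(self) -> Optional[bytearray]:
        # Runtime phase: never create a new buffer; report exhaustion instead.
        return self._free.pop() if self._free else None

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)

    def available(self) -> int:
        return len(self._free)


pool = MessagePool(count=2, max_bytes=64)
a = pool.acquire()
b = pool.acquire()
print(pool.acquire())    # None: pool exhausted, no new buffer created
pool.release(a)
print(pool.available())  # 1
```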
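Item 4 replaces unbounded waits with bounded ones. In C++ that is `std::timed_mutex::try_lock_for`; the sketch below shows the same idea with Python’s `threading.Lock.acquire(timeout=...)`. `update_with_timeout` and the `setpoint` field are hypothetical. A bounded wait by itself does not address priority inversion, which typically calls for priority-inheritance mutexes.

```python
import threading


def update_with_timeout(lock: threading.Lock, state: dict, value: float, timeout_s: float) -> bool:
    """Give up after timeout_s instead of waiting forever for the lock."""
    if not lock.acquire(timeout=timeout_s):
        return False  # caller handles the miss, e.g. keeps the last value and flags an error
    try:
        state["setpoint"] = value
        return True
    finally:
        lock.release()


lock = threading.Lock()
state = {"setpoint": 0.0}
print(update_with_timeout(lock, state, 1.5, timeout_s=0.01))  # True
lock.acquire()  # simulate another thread holding the lock
print(update_with_timeout(lock, state, 2.0, timeout_s=0.01))  # False after ~10 ms
lock.release()
print(state["setpoint"])  # 1.5
```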

I am leaning towards the second option.

I’d like to invite you to a meeting next week to decide on the above and to kick off the work. Could you guys meet on

Note that we have interest from across the globe, so finding a meeting-friendly time in every time zone will be impossible.

That starting time is very Europe-friendly, so yes, it works for me :slight_smile:

How long do you expect the meeting to take? More than 90 minutes?

That time works for me.

Thanks for organizing, @Dejan_Pangercic; that works very well for us. Looking forward to it!

Thank you @vmayoral for these interesting links.

@vmayoral do you have a written-down definition of soft real-time and hard real-time? At Apex.AI we have a very informal one for the latter:

In general a hard real-time system can be defined as follows: the entire system must be deterministic, and if a deadline or a data sample is missed the consequence is catastrophic.
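Under this definition, hard and soft real-time differ mainly in how a deadline miss is treated. A sketch assuming per-cycle response times have already been measured (the samples below are made up): hard real-time tolerates zero misses, while soft real-time treats the miss ratio as a quality metric against some tolerated fraction (the 1% default is a hypothetical choice). Passing such a check on measured samples is evidence, not proof; a hard guarantee also needs an analytically bounded worst case.

```python
def deadline_misses(response_times_ms: list[float], deadline_ms: float) -> int:
    return sum(1 for t in response_times_ms if t > deadline_ms)


def hard_rt_ok(response_times_ms: list[float], deadline_ms: float) -> bool:
    # Hard real-time: any single miss counts as a failure.
    return deadline_misses(response_times_ms, deadline_ms) == 0


def soft_rt_ok(response_times_ms: list[float], deadline_ms: float, max_miss_ratio: float = 0.01) -> bool:
    # Soft real-time: occasional misses degrade quality but are tolerated up to a ratio.
    if not response_times_ms:
        return True
    return deadline_misses(response_times_ms, deadline_ms) / len(response_times_ms) <= max_miss_ratio


samples = [0.8] * 199 + [1.4]  # 200 hypothetical cycles, one of them late
print(hard_rt_ok(samples, deadline_ms=1.0))  # False
print(soft_rt_ok(samples, deadline_ms=1.0))  # True (0.5% misses)
```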

@iluetkeb I was asked by the ROS 2 TSC to lead the ROS 2 RT WG. While I currently work in autonomous driving and have insight into this domain, my first priority in this role is to make sure that ROS 2 is a success. Hence, I can also live with the less comprehensive approach if that is what the rest of you think is better.

@davecrawley it seems that your goal and Ingo’s overlap somewhat here.

@iluetkeb is it unclear whether you are interested in this, or do you need more clarification from my side?

@iluetkeb can you elaborate a bit more on what SPA is?

@iluetkeb can you elaborate a bit more on what the system modes approach is?

@davecrawley the problem here is that if the sensor does not speak DDS (and hence we cannot use the DDS data model and data flow), then we are actually pursuing a use-case-specific approach (e.g. is the sensor connected over AVB Ethernet or CAN, which RTOS are we running, do we have a regular network stack or TSN, …). I am not opposed to jumping on this, but let’s first decide whether we do use-case-specific or generic work.

Otherwise, thanks for restructuring this.

@iluetkeb I want to keep it to 60 minutes.