
ROS 2 and Real-time

I just want to highlight that having a bridge, as an architectural choice, does not prevent us from achieving real-time capability. Of course, the current implementation of the xrce-dds-bridge we use in micro-ROS is a different story. A lot of work remains to be done there, on all the layers you mentioned and probably internally as well.

I would like to join this working group and contribute.

Ingo, I agree with you that this is a significant effort. However, there are many, many use cases. If you are relying on low-cost camera sensors (either 2D or 3D) for localization and obstacle avoidance, you are most likely going to be piping that data through the CPU. If you want to go faster than a crawl, you had better have some level of determinism as to when that data is going to be processed and when a stop/continue decision is made. There are certain types of camera-based navigation that just don’t work unless you have deterministic timing, and that deterministic timing needs to hold between the cameras and things like the IMU.
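
To make the speed-versus-determinism point concrete, here is a back-of-the-envelope sketch. It assumes the robot keeps moving at full speed for the worst-case sensing-to-decision latency and then brakes at a constant deceleration. The function names and the constant-deceleration model are illustrative assumptions, not something taken from a real robot.

```cpp
#include <cassert>
#include <cmath>

// Distance covered between an obstacle appearing and the robot standing still:
// full speed during the worst-case latency, then braking at constant deceleration.
double stopping_distance_m(double speed_mps, double worst_case_latency_s, double decel_mps2)
{
  assert(decel_mps2 > 0.0);
  return speed_mps * worst_case_latency_s + (speed_mps * speed_mps) / (2.0 * decel_mps2);
}

// Largest speed whose stopping distance still fits into the sensing range.
// Solves v*t + v^2/(2a) = d for the non-negative root v.
double max_safe_speed_mps(double sensing_range_m, double worst_case_latency_s, double decel_mps2)
{
  assert(decel_mps2 > 0.0);
  const double at = decel_mps2 * worst_case_latency_s;
  return -at + std::sqrt(at * at + 2.0 * decel_mps2 * sensing_range_m);
}
```

With a fixed sensing range and braking capability, a larger worst-case latency lowers the speed the robot can safely drive at, which is why a bound on the latency matters more than its average.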

I have customers today (for example in warehousing) who want the robot to go as fast as possible (to increase throughput) but also want to spend as little on the robot as possible, hence lower-cost sensors. Now, I can easily build a fast, low-cost robot, but unless I am using systems that are outside of ROS, it’s hard for me to ensure it is safe.

Even if I could get part of the system to have deterministic timing, that would be a huge boon. For example, if the safety sensors could pass messages directly to the motor controller over a deterministic data fabric (say AVB Ethernet), so that the motor controller knows it has to either slow down or stop, that would already help a lot.

I might start there, with that simple subset: a safety sensor that has an RTOS uC and a motor controller that has a similar RTOS uC, passing DDS messages to each other with deterministic characteristics.

If we can get that to work, then we can move on to the CPU-oriented applications of camera-based navigation.


David, I’m all sold on real-time. It’s essential for many applications. I hope my comments above are not implying otherwise. Real-time is essential for predictability, which is what I think you’re aiming for.

I’m even more sold on determinism. In fact, I’m usually credited with putting it prominently on the ROS map through a talk at the ROSCon in 2017. That’s not the same as real-time, however.

When I say that those are big efforts, I’m not saying we don’t have to do this. I’m mentioning it because I care about the how.

And in particular, I think that the how depends a lot on the what for. And Dejan didn’t specify that. So, all I’m asking is that people seriously think about the what for before embarking on this journey.

At the very least, these thoughts will give us some requirements and some metrics. And then we can think about how to achieve that in the easiest way.

It will also give us some common ground to speak about.

For me personally, I care about small consumer robots. So, my answers to the stack Dejan posted are:

  1. We use micro-controllers and they are real-time capable and usually single core, some multi-core, usually homogeneous.
  2. We use a POSIX RTOS, NuttX
  3. We chose hardware that’s supported by 2. Still a pain though.
  4. Not necessary
  5. We use Micro-XRCE-DDS

  6.
    6.1) don’t care
    6.2) got some ideas, interest to collaborate
    6.3) got some ideas, interest to collaborate
    6.4) unclear…
    6.5) don’t care
    6.6) HERE’S THE BIGGIE
    6.7) don’t care
    6.8) got it
  7. don’t care
  8. working on it, using SPA approach
  9. proposing system modes approach
  10. don’t care
  11. maybe
  12. we use a HIL approach

Now, if you ask Dejan, I’m pretty sure his answers will be very different. That’s because he has a very different application (autonomous driving). Not sure about what Victor will say, but he’s working on manipulation, which could again be quite different.

So, from my perspective, we’d like to collaborate on some of the points under bullet point 6, and that’s also the core thing for ROS2. For the rest of the things, we are very likely to have a radically different approach.

Therefore, please, before embarking on a “full stack” approach, get your use cases clear and then maybe people will know what they’re getting into and what the goals are.


Great! Now we are getting some good discussion. Ingo, yes, we dived into actions and solutions before getting clear on the problem statement. We’ve also generated a list of actions that mixes development and process and could be more clearly MECE. We are also missing chunks: deterministic message passing requires known allocations of bandwidth, which is available with standards like AVB but is not, so far, included on the list. We also need to talk a little bit about hardware. How about we reformulate the discussion as follows, for everyone to throw rocks at:

Ultimate goal: To enable universal robotic service

Current Situation: ROS robots cannot operate safely in many domains because a lack of predictability and determinism means that sensors can’t be relied on to prevent human injury.

Desired Situation: ROS2 robots can guarantee safety in environments with unprotected humans by using sensors and by passing data from those sensors through the ROS2 messaging infrastructure.

Problem Statement: How can we create deterministic responses to sensor inputs in ROS2?

Problem Breakdown

Hardware

  • CPU
  • uC (I assume most everything attaches to a standard uC and is deterministic to the uC) *
  • Communications infrastructure
    • High speed (e.g. AVB Ethernet) *
    • Low speed (e.g. SLIP)

Software

  • uC RTOS (e.g. NuttX) *
    • modifications???
  • CPU RTOS (e.g. QMX deriv)
    • modifications (e.g. PREEMPT_RT patch for Linux, etc.)
    • drivers for deterministic communications (e.g. bandwidth allocation on AVB)
  • Middleware (e.g. DDS and Micro-XRCE-DDS)
  • Libraries (rmw, rcl, rclcpp)
    • Cleanup for safe implementation
      • introduce safe data types (bounded, check type integrity)
      • perform memory audit (remove unneeded memory allocations)
      • split memory allocation in init and runtime phases, avoid memory fragmentation
      • implement real-time safe log output handler (no logging to console or file)
      • remove all blocking calls (or replace with timed calls, e.g. mutex vs timed_mutex)
    • Cleanup for desirability
      • convert ros2 launch to C++
    • Implement real-time pub/sub (either using Waitset or modified Callback/Executor) *
      • Real-time pub/sub will need to request bandwidth allocation from shared comms infrastructure (e.g. AVB ethernet)
      • Define message length standards
      • Other
  • Communications infrastructure firmware *
  • Services
    • Global error handling (history of failures, core dumps, fail-safe mechanism, …)
    • Real-time safety for higher-level concepts (e.g. services, parameters, actions)
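
The “safe data types” and “split memory allocation in init and runtime phases” items above could look roughly like the following. This is only a sketch under assumptions: `BoundedSequence` is a made-up name, not an existing rmw/rcl/rclcpp type, and it assumes the element type is default-constructible and copyable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical bounded sequence: the capacity is fixed at compile time and the
// storage lives inside the object, so the container itself never touches the
// heap once it has been constructed during the init phase.
template <typename T, std::size_t Capacity>
class BoundedSequence
{
public:
  // Returns false instead of growing when the bound is reached.
  bool push_back(const T & value)
  {
    if (size_ == Capacity) {
      return false;
    }
    data_[size_++] = value;
    return true;
  }

  // Bounds-checked read; returns false for an out-of-range index.
  bool at(std::size_t index, T & out) const
  {
    if (index >= size_) {
      return false;
    }
    out = data_[index];
    return true;
  }

  void clear() { size_ = 0; }
  std::size_t size() const { return size_; }
  static constexpr std::size_t capacity() { return Capacity; }

private:
  std::array<T, Capacity> data_{};
  std::size_t size_ = 0;
};
```

Because the bound is part of the type, the worst-case message size is known up front, which is also what a bandwidth reservation would need.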

Process

I’d propose to do the items marked with * first, with the image of the initial target state being a sensor (say a sonar) passing a message to a motor controller for E-Stop.

As the author of the Ada client library, I’m interested in this topic. Not sure I can contribute much at this time but I will in any case try to keep the Ada library in sync with developments in this area.

Yes, I’d like to join the working group and contribute.

Please count me in for teleconferences!

@LanderU and @abilbaotm, unless I’m traveling, I’ll be in the meeting, but would it be possible for you to participate as well and provide a small update on the RT build farm we’re building? I think it’s a great opportunity to get input.


I guess some of the people interested in ROS 2 and Real-time are aware of the future DDS/TSN mapping, but I did not see any material related to it linked in the discussion, so I think it is worth adding the links here:

I’d also like to attend the working group meetings.

@vmayoral @iluetkeb @davecrawley et al thank you very much for your feedback and also apologies for not replying earlier.

I am with you and I agree that we need a use case. Not listing an item to decide on the use case was an oversight on my side. However I got the following use cases from your replies:

  1. consumer robots (Ingo)
  2. warehouse logistics robots (David)
  3. mobile manipulation robots (Victor)
  4. autonomous driving robots/cars (Dejan, Geoff)

If you want to plus-one any of the above items or add another use case, please let me know.

Otherwise, it is clear that the above 4 use cases alone are vastly different and will have different requirements, different HW, different middleware and probably also different control and data flows (which will result in different node architectures).

That said, I guess we have 2 options to start from here:

  1. select one of the use cases above and probably lose the interest of people who have other use cases.
  2. focus on somewhat generic parts of ROS 2 that will help independently of the selected use case.
    My reading of your comments is that these parts are the following ones:
    1. create rmw layers for static and real-time middleware (RTI Connext Micro, Micro-XRCE-DDS)
    2. perform memory audit in rmw, rcl and rclcpp (remove unneeded memory allocations)
    3. split memory allocation in init and runtime phases, avoid memory fragmentation
    4. remove all blocking calls (or replace with timed calls, e.g. mutex vs timed_mutex )
    5. implement real-time pub/sub (either using Waitset or modified Callback/Executor)
    6. integrate tools for static and dynamic code analysis (PCLint, LDRA, Silexica, LTT-ng)
    7. Create node architecture for deterministic execution (policy for message aggregation, nodes cohesion, parallelization, local error handling …)
    8. Create a design for global error handling (history of failures, core dumps, fail-safe mechanism, …)
    9. Create CI for RT testing (e.g. https://github.com/ros2/ros2/issues/607#issuecomment-460319513 )
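
For the “mutex vs timed_mutex” item, a minimal sketch using only the C++ standard library could look like this. `SharedSetpoint` and `try_update_setpoint` are hypothetical names, and the time budget is whatever the caller’s deadline allows.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical shared state guarded by a timed mutex instead of a plain mutex.
struct SharedSetpoint
{
  std::timed_mutex mutex;
  double value = 0.0;
};

// Tries to update the setpoint but gives up after `budget` instead of blocking
// indefinitely. Returns true only if the update was applied.
bool try_update_setpoint(
  SharedSetpoint & shared, double new_value, std::chrono::microseconds budget)
{
  assert(budget.count() >= 0);
  std::unique_lock<std::timed_mutex> lock(shared.mutex, std::defer_lock);
  if (!lock.try_lock_for(budget)) {
    // Lock not obtained within the budget; the caller decides how to react
    // (skip this cycle, report an error, ...).
    return false;
  }
  shared.value = new_value;
  return true;  // the lock is released when `lock` goes out of scope
}
```

The wait is now bounded, but the caller has to handle the `false` case explicitly. The standard also allows `try_lock_for` to fail spuriously, so a failed attempt should be treated as a normal outcome rather than as proof of contention.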

I am leaning towards the second option.

I’d like to invite you to a meeting next week to decide on the above and to kick off the work. Could you guys meet on

Note that we have interest from across the globe, so finding a meeting-friendly time for every time zone will be impossible.


That starting time is very Europe-friendly, so yes, it works for me 🙂

How much time do you expect the meeting would take? More than 90 minutes?

That time works for me.

Thanks for organizing @Dejan_Pangercic, that works very well for us. Looking forward!

Thank you @vmayoral for these interesting links.

@vmayoral do you have a written-down definition of what soft real-time and hard real-time are? At Apex.AI we have a very informal one for the latter:

In general a hard real-time system can be defined as follows: the entire system must be deterministic, and if a deadline or a data sample is missed the consequence is catastrophic.

@iluetkeb I was asked by the ROS 2 TSC to lead the ROS 2 RT WG. While I do currently work in autonomous driving and have insight into this domain, my first priority in this role is to make sure that ROS 2 is a success. Hence I can also live with the less comprehensive approach if this is what the rest of you think is better.

@davecrawley it seems that your goals and Ingo’s somewhat overlap here.

@iluetkeb is it unclear if you are interested in this or do you need more clarification from my side?

@iluetkeb can you elaborate a bit more on what SPA is?

@iluetkeb can you elaborate a bit more on what the system modes approach is?

@davecrawley the problem here is that if the sensor does not speak DDS (and hence we cannot use DDS’ data model flow), then we are actually solving a use-case-specific problem (e.g. is this sensor connected over AVB Ethernet or CAN, which RTOS are we running, do we have a regular network stack or TSN, …). I am not opposed to jumping on this, but let’s first decide whether we do use-case-specific or generic work.

Otherwise thanks for re-structuring this.

@iluetkeb I want to keep it at 60 minutes.

@Dejan_Pangercic in our view, when speaking about hard real-time systems, missing a deadline implies a system failure (which often, though not always, leads to a catastrophic consequence). We discuss this topic at https://arxiv.org/pdf/1809.02595.pdf.
Summarized, our view is the following:

Real-time systems can be classified depending on how critical it is to meet the corresponding timing constraints. For hard real-time systems, missing a deadline is considered a system failure. Examples of hard real-time systems are anti-lock brakes or aircraft control systems. On the other hand, firm real-time systems are more relaxed. Information or a computation delivered after a missed deadline is considered invalid, but it does not necessarily lead to system failure. In this case, missing deadlines could degrade the performance of the system. In other words, the system can tolerate a certain amount of missed deadlines before failing. Examples of firm real-time systems include most professional and industrial robot control systems such as the control loops of collaborative robot arms, aerial robot autopilots or most mobile robots, including self-driving vehicles. Finally, in the case of soft real-time, results remain useful even if they are delivered after a missed deadline. This implies that soft real-time systems do not necessarily fail due to missed deadlines; instead, missed deadlines degrade the usefulness of the real-time task in execution. Examples of soft real-time systems are telepresence robots of any kind (audio, video, etc.).

Beyond this, there’s also a nice description of the differences between hard, firm and soft at http://design.ros2.org/articles/realtime_background.html

Hard real-time software systems have a set of strict deadlines, and missing a deadline is considered a system failure. Examples of hard real-time systems: airplane sensor and autopilot systems, spacecrafts and planetary rovers.

Soft real-time systems try to reach deadlines but do not fail if a deadline is missed. However, they may degrade their quality of service in such an event to improve responsiveness. Examples of soft real-time systems: audio and video delivery software for entertainment (lag is undesirable but not catastrophic).

Firm real-time systems treat information delivered/computations made after a deadline as invalid. Like soft real-time systems, they do not fail after a missed deadline, and they may degrade QoS if a deadline is missed (1). Examples of firm real-time systems: financial forecast systems, robotic assembly lines (2).
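
Read side by side, the definitions above differ mainly in what happens to a result that arrives after its deadline. A small sketch of that reading follows. The enum and function names are made up for illustration, and real systems rarely reduce to a single switch statement.

```cpp
#include <cassert>

enum class RealTimeClass { Hard, Firm, Soft };

enum class LateResultAction
{
  UseResult,          // deadline met, nothing special to do
  SystemFailure,      // hard: a missed deadline counts as a system failure
  DiscardResult,      // firm: the late result is invalid, the system keeps running
  UseDegradedResult   // soft: the late result is still used, at reduced quality
};

// Maps a completion time and a deadline to the reaction implied by each class.
LateResultAction handle_result(RealTimeClass cls, double completion_ms, double deadline_ms)
{
  assert(deadline_ms > 0.0);
  if (completion_ms <= deadline_ms) {
    return LateResultAction::UseResult;
  }
  switch (cls) {
    case RealTimeClass::Hard:
      return LateResultAction::SystemFailure;
    case RealTimeClass::Firm:
      return LateResultAction::DiscardResult;
    case RealTimeClass::Soft:
      return LateResultAction::UseDegradedResult;
  }
  return LateResultAction::SystemFailure;  // not reached; keeps compilers quiet
}
```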

To me, it is unclear whether blocking calls are an issue currently. For micro-ROS, the main focus is on rmw and rcl, and they don’t have many of those, if any at all. I’ve seen some mutexes being used in rmw layers (e.g., FastRTPS), but since we are not using those implementations for real-time applications (right?), not sure whether they are relevant.

It is a staged execution approach, inspired by the Fawkes sense-plan-act (SPA) pipeline. In practice there are more than just those three stages, but SPA is the basic motivation. Basically, we assign callbacks to stages, and then execute those stages sequentially.

This is supposed to be easier to use than classical priority-based scheduling for the typical robotics guy who isn’t a scheduling expert (which includes me ;-), and still provide deterministic ordering.

See slide 28 from https://micro-ros.github.io/download/2019-05-07_micro-ROS.pdf for an example.

We’re currently writing this up in a bit more detail, but since it hasn’t been fully implemented yet, things are subject to change.
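
As a very rough sketch of the staged idea (this is not the micro-ROS implementation, and `Stage`, `StagedExecutor`, `add_callback` and `spin_once` are hypothetical names): callbacks are registered against a stage, and each cycle runs the stages in a fixed order. Only the three basic stages are modelled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

enum class Stage : std::size_t { Sense = 0, Plan = 1, Act = 2 };

class StagedExecutor
{
public:
  // Registration may allocate, so it is meant to happen during initialization.
  void add_callback(Stage stage, std::function<void()> callback)
  {
    assert(callback != nullptr);
    stages_[static_cast<std::size_t>(stage)].push_back(std::move(callback));
  }

  // Runs one cycle: all Sense callbacks, then all Plan callbacks, then all Act
  // callbacks. Within a stage, callbacks run in registration order.
  void spin_once() const
  {
    for (const auto & callbacks : stages_) {
      for (const auto & callback : callbacks) {
        callback();
      }
    }
  }

private:
  std::array<std::vector<std::function<void()>>, 3> stages_;
};
```

The ordering depends only on the stage a callback was assigned to, not on priorities or on the order in which data happens to arrive.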

See the concept description for the motivation and there is also a modes example.

Most importantly, the concept description makes a connection to error handling, which is the motivation for me mentioning it here.

@Dejan_Pangercic - can you post a new thread with meeting times? I had to search through 17 replies to find this and I would have missed it if you hadn’t mentioned it in the TSC. I’d like to attend and include @mjeronimo and @lbegani also. Is there a Google calendar invite we can be added to?
