
ROS 2 and Real-time

As the author of the Ada client library, I’m interested in this topic. Not sure I can contribute much at this time, but in any case I will try to keep the Ada library in sync with developments in this area.

Yes, I’d like to join the working group and contribute.

Please count me in for teleconferences!

@LanderU and @abilbaotm, unless I’m traveling I’ll be in the meeting, but would it be possible for you to participate as well and provide a short update on the RT build farm we’re building? I think it’s a great opportunity to get input.


I guess some of the people interested in ROS 2 and real-time are aware of the future DDS/TSN mapping, but I did not see any material related to it linked in this discussion, so I think it is worth adding the links here:

I’d also like to attend the working group meetings.

@vmayoral @iluetkeb @davecrawley et al thank you very much for your feedback and also apologies for not replying earlier.

I am with you and agree that we need a use case. Not listing an item to decide on the use case was an oversight on my side. However, I gathered the following use cases from your replies:

  1. consumer robots (Ingo)
  2. warehouse logistics robots (David)
  3. mobile manipulation robots (Victor)
  4. autonomous driving robots/cars (Dejan, Geoff)

If you want to plus-one any of the above items or add another use case, please let me know.

That said, it is clear that these 4 use cases alone are vastly different and will have different requirements, different HW, different middleware and probably also different control and data flows (which will result in different node architectures).

Having said that, I guess we have 2 options to start from here:

  1. select one of the use cases above and probably lose interest from people with other use cases.
  2. focus on somewhat generic parts of ROS 2 that will help independently of the selected use case.
    My reading of your comments is that these parts are the following:
    1. create rmw layers for static and real-time middleware (RTI Connext Micro, Micro-XRCE-DDS)
    2. perform a memory audit in rmw, rcl and rclcpp (remove unneeded memory allocations)
    3. split memory allocation into init and runtime phases, avoid memory fragmentation
    4. remove all blocking calls (or replace them with timed calls, e.g. mutex vs timed_mutex)
    5. implement real-time pub/sub (either using a Waitset or a modified Callback/Executor)
    6. integrate tools for static and dynamic code analysis (PCLint, LDRA, Silexica, LTTng)
    7. create a node architecture for deterministic execution (policy for message aggregation, node cohesion, parallelization, local error handling, …)
    8. create a design for global error handling (history of failures, core dumps, fail-safe mechanism, …)
    9. create CI for RT testing (e.g. https://github.com/ros2/ros2/issues/607#issuecomment-460319513)
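Items 3 and 4 can be illustrated together with a minimal sketch, written in Python purely for readability: buffers are allocated once during an init phase, and the runtime path only hands out existing buffers and never waits on a lock indefinitely. `PreallocatedPool` and its methods are hypothetical names. Python itself still allocates internally, so this shows the structure rather than actual real-time behavior; in C++ the timed lock would correspond to `std::timed_mutex::try_lock_for`.

```python
import threading
from typing import List, Optional


class PreallocatedPool:
    """Hypothetical pool: every buffer is created in the init phase (item 3)."""

    def __init__(self, count: int, size: int) -> None:
        # Init phase: all allocation happens here, before the system goes live.
        self._free: List[bytearray] = [bytearray(size) for _ in range(count)]
        self._lock = threading.Lock()

    def acquire(self, timeout_s: float) -> Optional[bytearray]:
        """Runtime phase: hand out an existing buffer, or None on timeout/exhaustion."""
        # Item 4: bounded wait instead of an unbounded blocking lock() call.
        if not self._lock.acquire(timeout=timeout_s):
            return None
        try:
            return self._free.pop() if self._free else None
        finally:
            self._lock.release()

    def release(self, buf: bytearray, timeout_s: float) -> bool:
        """Return a buffer to the pool; False if the lock was not obtained in time."""
        if not self._lock.acquire(timeout=timeout_s):
            return False
        try:
            self._free.append(buf)
            return True
        finally:
            self._lock.release()


pool = PreallocatedPool(count=2, size=64)
first = pool.acquire(timeout_s=0.001)
second = pool.acquire(timeout_s=0.001)
third = pool.acquire(timeout_s=0.001)  # pool exhausted; nothing new is allocated
returned = pool.release(first, timeout_s=0.001)
```

Returning `None` or `False` forces the caller to decide what a timeout means (skip the cycle, raise an error flag), which connects items 3 and 4 to the error-handling items 7 and 8.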

I am leaning towards the second option.

I’d like to invite you to a meeting next week to decide on the above and to kick off the work. Could you guys meet on

Note that we have interest from across the globe, so finding a meeting-friendly time in every time zone will be impossible.


That starting time is very Europe-friendly, so yes, it works for me :slight_smile:

How much time do you expect the meeting would take? More than 90 minutes?

That time works for me.

Thanks for organizing @Dejan_Pangercic, that works just fine for us. Looking forward to it!

Thank you @vmayoral for these interesting links.

@vmayoral do you have a written-down definition of soft real-time and hard real-time? At Apex.AI we have a very informal one for the latter:

In general a hard real-time system can be defined as follows: the entire system must be deterministic, and if a deadline or a data sample is missed the consequence is catastrophic.

@iluetkeb I was asked by the ROS 2 TSC to lead the ROS 2 RT WG. While I currently work in autonomous driving and have insight into this domain, my first priority in this role is to make sure that ROS 2 is a success. Hence I can also live with the less comprehensive approach if that is what the rest of you think is better.

@davecrawley it seems that your goal and Ingo’s overlap somewhat here.

@iluetkeb, it is unclear to me whether you are interested in this. Or do you need more clarification from my side?

@iluetkeb can you elaborate a bit more what SPA is?

@iluetkeb can you elaborate a bit more what system modes approach is?

@davecrawley the problem here is that if the sensor does not speak DDS (and hence we cannot use DDS’ data model flow), then we are actually pursuing a use-case-specific approach (e.g. is this sensor connected over AVB Ethernet or CAN, which RTOS are we running, do we have a regular network stack or TSN, …). I am not opposed to jumping on this, but let’s first decide whether we do use-case-specific or generic work.

Otherwise thanks for re-structuring this.

@iluetkeb I want to keep it at 60 minutes.

@Dejan_Pangercic in our view, when speaking about hard real-time systems, missing a deadline implies a system failure (which can often lead to catastrophic consequences, though not necessarily always). We discuss this topic at https://arxiv.org/pdf/1809.02595.pdf.
In summary, our view is the following:

Real-time systems can be classified depending on how critical it is to meet the corresponding timing constraints. For hard real-time systems, missing a deadline is considered a system failure. Examples of hard real-time systems are anti-lock brakes or aircraft control systems. On the other hand, firm real-time systems are more relaxed. Information or a computation delivered after a missed deadline is considered invalid, but it does not necessarily lead to system failure. In this case, missing deadlines could degrade the performance of the system. In other words, the system can tolerate a certain number of missed deadlines before failing. Examples of firm real-time systems include most professional and industrial robot control systems, such as the control loops of collaborative robot arms, aerial robot autopilots or most mobile robots, including self-driving vehicles. Finally, in the case of soft real-time, results delivered after a missed deadline remain useful. This implies that soft real-time systems do not necessarily fail due to missed deadlines; instead, missed deadlines degrade the usefulness of the real-time task in execution. Examples of soft real-time systems are telepresence robots of any kind (audio, video, etc.).

Beyond this, there’s also a nice description of the differences between hard, firm and soft at http://design.ros2.org/articles/realtime_background.html

Hard real-time software systems have a set of strict deadlines, and missing a deadline is considered a system failure. Examples of hard real-time systems: airplane sensor and autopilot systems, spacecraft and planetary rovers.

Soft real-time systems try to reach deadlines but do not fail if a deadline is missed. However, they may degrade their quality of service in such an event to improve responsiveness. Examples of soft real-time systems: audio and video delivery software for entertainment (lag is undesirable but not catastrophic).

Firm real-time systems treat information delivered/computations made after a deadline as invalid. Like soft real-time systems, they do not fail after a missed deadline, and they may degrade QoS if a deadline is missed (1). Examples of firm real-time systems: financial forecast systems, robotic assembly lines (2).
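The definitions quoted in this thread differ mainly in what happens to a result that arrives after its deadline. A minimal sketch of that distinction follows; `RtClass` and `late_result_outcome` are illustrative names only, not part of any ROS 2 API.

```python
from enum import Enum


class RtClass(Enum):
    HARD = "hard"
    FIRM = "firm"
    SOFT = "soft"


def late_result_outcome(rt_class: RtClass, finish_time: float, deadline: float) -> str:
    """Classify what happens to a result under the hard/firm/soft definitions."""
    if finish_time <= deadline:
        return "use"  # deadline met: every class uses the result
    if rt_class is RtClass.HARD:
        return "system failure"  # any missed deadline is a failure
    if rt_class is RtClass.FIRM:
        return "discard"  # late result is invalid, but the system keeps running
    return "use degraded"  # soft: late result is still useful, with reduced quality


on_time = late_result_outcome(RtClass.HARD, finish_time=1.0, deadline=2.0)
late = {c: late_result_outcome(c, finish_time=3.0, deadline=2.0) for c in RtClass}
```

The firm case in the arXiv definition additionally allows a bounded number of misses before the system fails; tracking that would need a miss counter, which this sketch omits.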

To me, it is unclear whether blocking calls are currently an issue. For micro-ROS, the main focus is on rmw and rcl, and they don’t have many of those, if any at all. I’ve seen some mutexes being used in rmw layers (e.g., FastRTPS), but since we are not using those implementations for real-time applications (right?), I’m not sure whether they are relevant.

It is a staged execution approach, inspired by the sense-plan-act (SPA) pipeline in Fawkes. In practice there are more than just those three stages, but SPA is the basic motivation. Basically, we assign callbacks to stages and then execute those stages sequentially.

This is supposed to be easier to use than classical priority-based scheduling for the typical robotics guy who isn’t a scheduling expert (which includes me ;-), while still providing deterministic ordering.

See slide 28 from https://micro-ros.github.io/download/2019-05-07_micro-ROS.pdf for an example.

We’re currently writing this up in a bit more detail, but since it hasn’t been fully implemented yet, things are subject to change.
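As a rough illustration of the staged idea only (not the micro-ROS implementation, which is still subject to change), callbacks can be registered against named stages, and each cycle runs the stages in a fixed order. `StagedExecutor`, `add_callback` and `spin_once` are hypothetical names.

```python
from typing import Callable, Dict, List


class StagedExecutor:
    """Hypothetical staged executor: stages run in a fixed, user-defined order."""

    def __init__(self, stage_order: List[str]) -> None:
        self.stage_order = list(stage_order)
        self.stages: Dict[str, List[Callable[[], None]]] = {s: [] for s in stage_order}

    def add_callback(self, stage: str, callback: Callable[[], None]) -> None:
        if stage not in self.stages:
            raise KeyError(f"unknown stage: {stage}")
        self.stages[stage].append(callback)

    def spin_once(self) -> None:
        # Stages run sequentially; within a stage, callbacks run in registration order.
        for stage in self.stage_order:
            for callback in self.stages[stage]:
                callback()


log: List[str] = []
executor = StagedExecutor(["sense", "plan", "act"])
executor.add_callback("act", lambda: log.append("drive"))
executor.add_callback("sense", lambda: log.append("read_lidar"))
executor.add_callback("plan", lambda: log.append("plan_path"))
executor.add_callback("sense", lambda: log.append("read_imu"))
executor.spin_once()
```

The order across stages no longer depends on when callbacks were registered or which one has the higher priority, only on the stage list, which is what makes the ordering easy to reason about.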

See the concept description for the motivation and there is also a modes example.

Most importantly, the concept description makes a connection to error handling, which is the motivation for me mentioning it here.

@Dejan_Pangercic - can you post a new thread with meeting times? I had to search through 17 replies to find this and I would have missed it if you hadn’t mentioned it in the TSC. I’d like to attend and include @mjeronimo and @lbegani also. Is there a Google calendar invite we can be added to?


OK, so I like the following target use cases:

  1. consumer robots (Ingo)
  2. warehouse logistics robots (David)
  3. mobile manipulation robots (Victor)
  4. autonomous driving robots/cars (Dejan, Geoff)

They are real and will keep us focused. The requirements of these use cases obviously differ, but there can be a lot of commonality in the hardware implementation. I’d propose that we agree on a model HW architecture that can usefully span all four domains.

All of these systems will basically consist of

a) Sensors
b) uC

  1. uC attached to sensors (I assume that this is always required for any sensor)
  2. uC attached elsewhere in the network (e.g. for bridging one network type to another)
  3. uC attached to actuators

c) Networking infrastructure
d) Main host computer
e) Actuators

For real-time purposes we only really care about b, c and d. So I’d propose we agree on which model HW we’d work towards, something like:

b) uC: STM32 running NuttX
c) 1) AVB Ethernet & 2) point-to-point serial line comms (could be any underlying physical layer; which physical layer is used is not relevant to real time)
d) 1) x86 & 2) ARM

I’d suggest that we target the elements that definitely span all 4 domains in the beginning, namely b and c2. I respect @iluetkeb’s desire not to get into d in the first instance. We’ll probably get more mileage by sorting out b and c2 first, but the way I see it, eventually we are going to need real time in d even for relatively low-cost applications like warehouse logistics.

Not only the transport layer, but all the other layers as well: from the data link layer (OSI layer 2) up to the ROS 2 rcl (OSI layer 7), going through all the middle layers, including the network and transport layers (which we typically refer to as the networking stack) and the communication middleware (e.g. DDS).

The network infrastructure is key. I picked AVB because it is widely used in automotive and widely available. Also, most of its OSI layers are already real-time capable. In particular, you have to have bandwidth guarantees (possible in AVB and implied in any point-to-point protocol), without which I don’t see how we can make an RT system. If you have a sensor that transmits directly onto a broadcast network (e.g. plain vanilla Ethernet) and doesn’t have its own individual bandwidth allocation, I don’t see how you can make that system RT unless you can also guarantee that it is the only thing transmitting on the parts of the network it is using.

I put serial in there because, well, you are always bound to have some kind of serial comms going on somewhere.

This doesn’t prevent us from expanding with other connectivity choices or uCs later on; it just means that we’ll design it with this stuff in mind and can provide a template to build hardware for eventual testing.


Sure! Whatever sensor you hook up will have to connect to a uC that speaks DDS and connects to our RT middleware fabric. I’d propose we create/define a standard setup for that uC but not really get into the connection between the uC and the sensor. I think we have to assume that whatever the connection between the uC and the sensor is, it is deterministic. Most of the sensors I use hook directly into a uC that I control anyway. It only gets messy when you want to connect a sensor that attaches to a shared communications fabric out of the box and doesn’t speak DDS or respect determinism, for example an Ethernet LIDAR. But there is no way around it! Such a sensor injects data with non-deterministic timing into a shared and finite communications fabric, and as such it cannot be deterministic as long as its data commingles with other non-deterministic data. You have to put its data into our real-time middleware layer before it commingles with any other non-real-time data, which means either re-programming whatever uC is on the sensor or using a bridge. That bridge will mostly have the same setup as the standard uC discussed above, so I think we just define/agree on one standard uC setup.
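One way a bridge like this could keep a bursty sensor (such as the Ethernet LIDAR example) within a fixed share of the fabric is to shape its output before forwarding. The sketch below uses a simple token bucket, a generic traffic-shaping technique swapped in here for illustration; AVB itself uses a credit-based shaper, which behaves differently. `TokenBucket` and its parameters are hypothetical.

```python
class TokenBucket:
    """Hypothetical bridge shaper: on average at most rate_bytes_per_s,
    with bursts of up to burst_bytes."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full burst allowance
        self.last_t = 0.0

    def allow(self, size_bytes: int, now_s: float) -> bool:
        """True if the packet may be forwarded now; the caller queues or drops otherwise."""
        # Refill the budget for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now_s - self.last_t) * self.rate)
        self.last_t = now_s
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


shaper = TokenBucket(rate_bytes_per_s=1000.0, burst_bytes=1500.0)
decisions = [
    shaper.allow(1200, now_s=0.0),  # covered by the initial burst allowance
    shaper.allow(1200, now_s=0.1),  # only about 400 bytes of budget left
    shaper.allow(1200, now_s=1.0),  # budget refilled to about 1300 bytes
]
```

Whatever the shaping method, the point is that the bridge, not the sensor, decides when data enters the shared fabric, so its worst-case load becomes a known quantity.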

AVB has the property that it can handle RT and non-RT data streams simultaneously. Obviously we have to figure out how our middleware is going to talk to it though and make sure that it allocates the right amount of bandwidth, for a sensor for example, to ensure the guaranteed quality of service.
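Making sure the middleware allocates the right amount of bandwidth implies some form of admission control: a new RT stream is accepted only if the total reserved bandwidth on a link stays under a limit. A minimal bookkeeping sketch follows; `LinkReservations` is hypothetical, and the 75% cap is an assumption based on the default commonly cited for AVB stream reservation, not something ROS 2 configures.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LinkReservations:
    """Hypothetical per-link bookkeeping for reserved (RT) streams."""

    capacity_bps: float
    max_reserved_fraction: float = 0.75  # assumption: commonly cited AVB default
    streams: Dict[str, float] = field(default_factory=dict)

    def reserved_bps(self) -> float:
        return sum(self.streams.values())

    def request(self, stream_id: str, bandwidth_bps: float) -> bool:
        """Admit the stream only if the total reservation stays within the cap."""
        if stream_id in self.streams or bandwidth_bps <= 0:
            return False
        limit = self.capacity_bps * self.max_reserved_fraction
        if self.reserved_bps() + bandwidth_bps > limit:
            return False  # admitting it would break guarantees for existing streams
        self.streams[stream_id] = bandwidth_bps
        return True

    def release(self, stream_id: str) -> None:
        self.streams.pop(stream_id, None)


link = LinkReservations(capacity_bps=100e6)  # e.g. a 100 Mbit/s link
results = [
    link.request("lidar", 40e6),
    link.request("camera", 30e6),
    link.request("imu", 10e6),  # 80 Mbit/s total would exceed the 75 Mbit/s cap
]
link.release("camera")
retry = link.request("imu", 10e6)
```

The remaining capacity stays available for non-RT traffic, which mirrors the mixed RT/non-RT property described above.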


I would also like to join the meeting on Monday, and would like to propose an item for the agenda that is in particular related to 6.5 (real-time pub/sub).

We have recently investigated the current ROS 2 implementation from a real-time predictability angle. In particular, we have investigated the order in which callbacks are executed. As it turns out, the current implementation has a couple of very surprising properties that result from the interplay between the executor implementation in rclcpp and the rmw API. Overall, the rclcpp executor exhibits a behavior that is somewhere in the middle between FIFO, round robin on the topics, and fixed-priority scheduling. These properties make it very hard to understand and predict the execution order of callbacks, even if the arrival order at the DDS level is known perfectly.

The behavior is described in detail in Section 3 of our paper (which is available as a preprint) and can be experimentally verified using our model validation test.

I think it would be useful to discuss (a) how best to address this issue and (b) which behavior and which ordering guarantees ROS 2 should provide.
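To make the ordering issue concrete, here is a deliberately simplified, hypothetical model of a polling-based executor, not rclcpp code. It assumes, as we read the behavior described in the paper, that at each polling point the executor takes a snapshot of which entities have pending work, handles timers before subscriptions and otherwise follows registration order, and executes at most one item per entity before polling again. It ignores services, clients, arrivals during processing and all timing; `executor_order` is an illustrative name.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Simplifying assumption: timers are handled before subscriptions.
KIND_ORDER = {"timer": 0, "subscription": 1}


def executor_order(registered: List[Tuple[str, str]],
                   arrivals: List[Tuple[str, str]]) -> List[str]:
    """Return the execution order of pending items.

    registered: (name, kind) pairs in registration order.
    arrivals: (name, item) pairs in arrival order, all pending before the first poll.
    """
    queues: Dict[str, Deque[str]] = {name: deque() for name, _ in registered}
    for name, item in arrivals:
        queues[name].append(item)

    # Sort by kind first, then by registration index.
    ordered = sorted(enumerate(registered), key=lambda e: (KIND_ORDER[e[1][1]], e[0]))
    executed: List[str] = []
    while any(queues.values()):
        # Polling point: snapshot which entities currently have pending work.
        ready = [name for _, (name, _) in ordered if queues[name]]
        # Execute at most one item per ready entity, then poll again.
        for name in ready:
            executed.append(queues[name].popleft())
    return executed


registered = [("sub_a", "subscription"), ("sub_b", "subscription"), ("timer_t", "timer")]
arrivals = [("sub_b", "b1"), ("sub_b", "b2"), ("sub_a", "a1"), ("timer_t", "t1"), ("sub_a", "a2")]
order = executor_order(registered, arrivals)
```

Even in this toy model, the arrival order `b1, b2, a1, t1, a2` is executed as `t1, a1, b1, a2, b2`: partly fixed-priority (timer first, `sub_a` before `sub_b`) and partly round robin across topics, but not FIFO.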

1 Like

@nburek @traversaro @mosteo @davecrawley does this time work for you for our first meeting?

@mkhansen I will post a separate thread with the meeting invite.


This time works for me. :+1:

@Dejan_Pangercic
I’d also like to join the meeting; I really appreciate you taking care of this thread.

@iluetkeb
I really like the idea of SPA, since it is what robots (and humans) usually do. Actually, I was thinking more along the lines of Sense (sometimes with an immediate reflex) -> Perception -> Plan -> Action.

@vmayoral
I’m happy that you brought up the Precision Time Protocol here; I think it is mandatory when it comes to orchestrating robots that work together at the same time.
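For reference, the core of PTP (IEEE 1588) clock synchronization is a four-timestamp exchange. Assuming the network delay is the same in both directions, the slave's clock offset and the mean path delay follow directly from the timestamps. The function name below is hypothetical.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes a symmetric path delay between master and slave."""
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2  # mean one-way path delay
    return offset, delay


# Example in microseconds: the slave runs 5 us ahead and the one-way delay is 2 us.
offset, delay = ptp_offset_and_delay(t1=100, t2=107, t3=120, t4=117)
```

Any asymmetry between the two directions shows up directly as an offset error, which is one reason hardware timestamping and PTP-aware switches matter for tight synchronization.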