Latency induced by timer chains: how to avoid it?

Hi,
A common problem I run into is latency induced by chains of timers spanning multiple nodes.
Example:
State Estimator

  • Has a timer computing the current state of the robot (pose, velocity, etc.) every 10 ms. Publishes state data.

Trajectory Follower

  • Has a timer computing the joint cmd every 10 ms. Works on state data, publishes joint cmd.

Joint Driver

  • Has a timer every 10 ms that writes the last received joint cmd to the hardware.

In this example we can end up with a latency of up to 30 ms, just from unlucky relative timing of the timers. E.g. if every timer fires at exactly the same time, each timer works on data from the previous timer in the chain that is already 10 ms old.
To make matters worse, the latency will be different on every startup.
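To make the worst case concrete, here is one way to read the 30 ms figure (assuming all three timers fire at t = 0, 10, 20, … ms and each callback itself finishes almost instantly):

  • t = 0 ms: the state estimator publishes a fresh state, but the trajectory follower and joint driver have effectively already run for this cycle.
  • t = 10 ms: the trajectory follower consumes that state (now 10 ms old) and publishes a joint cmd.
  • t = 20 ms: the joint driver writes that cmd to the hardware; it is based on a state that is now 20 ms old.
  • t = 20–30 ms: the cmd stays applied until the next write, so just before t = 30 ms the hardware is acting on information that is up to 30 ms old.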

I have thought about this for a while and found a few possible solutions so far:

  • Use something like ros2_control and run as much as possible in sequence / remove as many timers as possible and run things callback-based.
  • Increase the frequency of the timers.
  • Sync the timers.

Does anyone know of a further option?

Options 1 and 2 are not practical in my particular use case, therefore I am looking into option 3. So far my best idea has been to measure the worst-case runtime of the timers and extend the rcl timer API to allow setting the 'start time' of a timer. Note that timers always trigger at 'start time' + X * period.

E.g.
The state estimator needs up to 4 ms to compute the state, and msg pub / rec takes around 400 us.
The trajectory follower takes around 2 ms plus 400 us pub / rec.

This would result in a start time config of:

  • State estimator 0 ms
  • Trajectory Follower 5 ms
  • Joint driver 8 ms

I would love to know your thoughts about this, and whether I am missing something obvious that would break the timer sync idea…
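To make the idea a bit more concrete, here is a rough sketch of how the phase offsets could be approximated with today's rclcpp, without touching the rcl timer API: delay the creation of each wall timer until the wall clock reaches the desired offset within the period. The helper name and the offsets are just illustrations taken from the numbers above, not an existing API, and the alignment is only as good as the timer creation overhead and clock drift allow.

```cpp
#include <chrono>
#include <thread>
#include <utility>

#include <rclcpp/rclcpp.hpp>

using namespace std::chrono;

// Hypothetical helper: block until the system clock reaches the requested
// phase offset within the period, then create a normal wall timer. From that
// point on the timer fires at roughly phase + k * period (plus drift).
template <typename CallbackT>
rclcpp::TimerBase::SharedPtr create_phase_aligned_wall_timer(
  rclcpp::Node & node, milliseconds period, milliseconds phase, CallbackT && callback)
{
  const auto now = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
  const auto in_period = now % period;
  const auto wait = (phase >= in_period) ? (phase - in_period) : (period - in_period + phase);
  std::this_thread::sleep_for(wait);

  // The wall timer first fires one period after creation, so its phase is the
  // creation time modulo the period, i.e. approximately the requested offset.
  return node.create_wall_timer(period, std::forward<CallbackT>(callback));
}

// Usage with the offsets from the example above (all periods 10 ms):
//   state estimator:     create_phase_aligned_wall_timer(*this, 10ms, 0ms, cb);
//   trajectory follower: create_phase_aligned_wall_timer(*this, 10ms, 5ms, cb);
//   joint driver:        create_phase_aligned_wall_timer(*this, 10ms, 8ms, cb);
```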

Why not the following:

  • A single timer in your state estimator node,
  • No timer in the trajectory follower node. It performs its calculation and publishes in its subscriber callback,
  • No timer in the joint driver node, it writes the joint cmd to hardware also in its subscriber callback.

That way the timer of the state estimator drives the control loop frequency, and the other nodes process 'as quickly as possible'.
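For illustration, a minimal sketch of the 'no timer' trajectory follower could look like this (the message types, topic names and the compute_cmd placeholder are just assumptions):

```cpp
#include <rclcpp/rclcpp.hpp>

#include <nav_msgs/msg/odometry.hpp>
#include <sensor_msgs/msg/joint_state.hpp>

class TrajectoryFollower : public rclcpp::Node
{
public:
  TrajectoryFollower() : Node("trajectory_follower")
  {
    cmd_pub_ = create_publisher<sensor_msgs::msg::JointState>("joint_cmd", 1);
    state_sub_ = create_subscription<nav_msgs::msg::Odometry>(
      "state", 1,
      [this](nav_msgs::msg::Odometry::ConstSharedPtr state) {
        // No timer: every fresh state immediately produces a fresh command,
        // so the loop rate is dictated by the state estimator's timer.
        cmd_pub_->publish(compute_cmd(*state));
      });
  }

private:
  sensor_msgs::msg::JointState compute_cmd(const nav_msgs::msg::Odometry &)
  {
    return sensor_msgs::msg::JointState{};  // placeholder for the real controller
  }

  rclcpp::Publisher<sensor_msgs::msg::JointState>::SharedPtr cmd_pub_;
  rclcpp::Subscription<nav_msgs::msg::Odometry>::SharedPtr state_sub_;
};
```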

This would be option 1 / 2 that I mentioned above. As I wrote there, this is not practical in our particular use case.

The example above is a toy example to illustrate the problem; the real-world system has multiple inputs to the joint controller, and the state estimator likewise has multiple input sources.

A second problem is that the joint driver needs to run at a stable frequency; in particular, you must never 'burst' write to it. Since the runtimes of the path follower are load dependent in our system (we steer up to 30 'robots' at a time), direct chaining can lead to bursts: if one iteration takes 5 ms and the next takes 1 ms, the joint driver would see a write 6 ms after the last one instead of 10 ms…

Your post reminded me of @Ingo_Lutkebohle's presentation(s) about this/a similar topic.

From ROSCon'17 for instance: Determinism in ROS – or when things break /sometimes/ and how to fix it… (slides, recording).


There were also several talks on executor determinism at ROSCon'21.

I havenā€™t looked at it in detail, but I think the PiCAS executor targets a similar use case.

One easy approach is a sequence of "wait_for_message" calls on all the inputs, and then running the compute step if you have new data on all or some of them. You can optionally wait in there to achieve a target output rate, and this can also be extended for different input rates and to handle message loss.

Naturally, it's less flexible than using an executor, but it's very predictable.
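As a rough illustration of what I mean (rclcpp::wait_for_message exists since Galactic; the topic, message types, time budgets and the reuse/skip policy are just assumptions):

```cpp
#include <chrono>
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/wait_for_message.hpp>

#include <nav_msgs/msg/odometry.hpp>
#include <sensor_msgs/msg/joint_state.hpp>

using namespace std::chrono_literals;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("trajectory_follower");
  auto cmd_pub = node->create_publisher<sensor_msgs::msg::JointState>("joint_cmd", 1);

  while (rclcpp::ok()) {
    const auto cycle_start = std::chrono::steady_clock::now();

    // Block on the input with a budget; on timeout you can reuse the last
    // value, skip the cycle, or mark the data as stale - whatever fits.
    nav_msgs::msg::Odometry state;
    const bool fresh = rclcpp::wait_for_message(state, node, "state", 8ms);

    if (fresh) {
      sensor_msgs::msg::JointState cmd;  // placeholder: fill from the controller
      cmd_pub->publish(cmd);
    }

    // Pad the cycle to hold a roughly fixed output rate (10 ms here).
    std::this_thread::sleep_until(cycle_start + 10ms);
  }

  rclcpp::shutdown();
  return 0;
}
```

One thing to watch out for: the convenience overload above creates and destroys a subscription on every call; for a fast loop, the overload that takes an existing subscription (or a wait set, see below) avoids that overhead.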

I've also seen people try to sync timers through a sync topic, but unless the runtimes of the callbacks are very constant, this can still run into the same "running just before the next update arrives" problem.

btw, in the rclc executor we have support for trigger conditions which can help with such things, but I would not really recommend switching to rclc just for this. Maybe this would be interesting to add to rclcpp at some point.

btw, rclcpp::WaitSet might also be useful for such things.


Following @Ingo_Lutkebohle: using rclcpp::WaitSet with the guide here: Topic message handling guideline - Autoware Documentation (autowarefoundation.github.io)
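A stripped-down version of the pattern from that guide might look roughly like this (topic, message type and timeout are placeholders; the node is deliberately never spun by an executor, so the subscription is only drained manually via the wait set):

```cpp
#include <chrono>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/wait_set.hpp>

#include <nav_msgs/msg/odometry.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("waitset_example");

  // The callback is required by the API but never invoked here, because the
  // node is not added to an executor; messages are taken manually below.
  auto sub = node->create_subscription<nav_msgs::msg::Odometry>(
    "state", 1, [](nav_msgs::msg::Odometry::ConstSharedPtr) {});

  rclcpp::WaitSet wait_set;
  wait_set.add_subscription(sub);

  while (rclcpp::ok()) {
    const auto result = wait_set.wait(std::chrono::milliseconds(100));
    if (result.kind() != rclcpp::WaitResultKind::Ready) {
      continue;  // timed out, no new data
    }

    nav_msgs::msg::Odometry msg;
    rclcpp::MessageInfo info;
    if (sub->take(msg, info)) {
      // Process msg here, at a point in time of your own choosing.
    }
  }

  rclcpp::shutdown();
  return 0;
}
```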


Thanks for all your feedback.

@Ingo_Lutkebohle's talk does indeed touch on a similar issue, but there everything was contained in one process. The talk also briefly mentions the possibility of controlling the start point of timers, which sounds similar to my initial idea.

I am not sold on the "wait_for_message" approach, as it can still introduce large latencies if the inputs are not synced.

I also discussed the problem with some colleagues, and we came to the conclusion that for a 'perfect' solution we would need to know the runtimes of all processes feeding the joint driver. Then we could work backwards and assign the points in time at which each process should provide / process data. This also seems to be the approach of the PiCAS guys.
But as Ingo mentioned, for this our runtimes would need to be either constant or have a known upper bound. (This is sadly not the case for our system.)

Another idea that popped up was to monitor the runtimes / latencies of the nodes dynamically and have some sort of master constantly adjust the data processing time points.

To sum it up, we are still undecided on how we want to solve this problem, but at least we have some new ideas. So thanks for the input.

Glad it helped :slight_smile: Regarding your explanation of the challenges in your system, let me take a somewhat more high-level view of this:

In any active system there is the question of when to update the control signal. ROS provides two relatively reasonable defaults out of the box – on new data, and at a fixed rate – but as we see here, this is not always sufficient.

Whenever I find myself in such a situation, the first thing I search for is a way to simplify the problem. For example, input data is often not synchronized by default, but maybe it can be synchronized. Many multi-camera systems support a trigger signal, on micro-controllers it is often possible to sample inputs at nearly the same instant, sometimes you can use the same hardware clock for multiple free-running devices on the same PCB, or you can sync times using PTP, and so on.

The same thing goes for process run-times. Quite honestly, if you can't put at least a reasonable upper bound on the times of the processes feeding your component, you have a problem no matter what, and I would suggest fixing that. Sometimes real-time scheduling the processes already helps a lot, at other times you may need to use different algorithms, and so on.
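For reference, a minimal Linux-only sketch of what I mean by real-time scheduling the process (the priority value is arbitrary, this needs CAP_SYS_NICE or a suitable rtprio limit, and it is no substitute for bounding the algorithm itself):

```cpp
#include <sched.h>

#include <cstdio>

// Move the calling process into the SCHED_FIFO real-time class so that it is
// preempted less by ordinary workloads. This narrows the runtime spread of a
// control loop but does not bound the algorithm's own worst case.
bool make_realtime(int priority = 50)
{
  sched_param param{};
  param.sched_priority = priority;
  if (sched_setscheduler(0 /* this process */, SCHED_FIFO, &param) != 0) {
    std::perror("sched_setscheduler");
    return false;
  }
  return true;
}
```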

Also keep in mind that maybe you don't need all the information all the time. Depending on the dynamics of your system you may be able to interpolate some of the data, or maybe some information channels can tolerate missing or outdated data more easily than others.

Last but not least, there is the question of how accurate you have to be. Of course we all like better performing systems, but sometimes it's not really necessary. But if it is necessary, e.g. because of safety reasons, then maybe this constraint should drive the overall design of the entire sub-system you are worried about, and then you had better get those inputs synchronized and those run-times constant, or else you will run into trouble with certification anyway.