Latency induced by timer chains: how to avoid it?

Hi,
A common problem I run into is latency induced by chains of timers spanning multiple nodes.
Example:
State Estimator

  • Has a timer computing the current state of the robot (pose, velocity, etc.) every 10 ms. Publishes state data.

Trajectory Follower

  • Has a timer computing the joint cmd every 10 ms. Works on state data, publishes joint cmd.

Joint Driver

  • Has a timer every 10 ms that writes the last received joint cmd to the hardware.

In this example we can end up with a latency of up to 30 ms, just from unlucky relative timing of the timers. E.g. if every timer fires at exactly the same time, each timer works on data from the previous timer in the chain that is already 10 ms old.
To make matters worse, the latency will be different on every startup.
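To make the worst case concrete, here is one way to read the 30 ms figure (assuming all three timers fire at t = 0, 10, 20, … ms and each callback itself finishes almost instantly):

  • t = 0 ms: the state estimator publishes a fresh state, but the trajectory follower and joint driver have effectively already run for this cycle.
  • t = 10 ms: the trajectory follower consumes that state (now 10 ms old) and publishes a joint cmd.
  • t = 20 ms: the joint driver writes that cmd to the hardware; it is based on a state that is now 20 ms old.
  • t = 20–30 ms: the cmd stays applied until the next write, so just before t = 30 ms the hardware is acting on information that is up to 30 ms old.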

I have thought about this for a while and found a few possible solutions so far:

  • Use something like ros2_control and run as much as possible in sequence / remove as many timers as possible and run things callback-based.
  • Increase the frequency of the timers.
  • Sync the timers.

Does anyone know of a further option?

Options 1 and 2 are not practical in my particular use case, therefore I am looking into option 3. So far my best idea has been to measure the worst-case runtime of the timers and extend the rcl timer API to allow setting the 'start time' of a timer. Note that timers always trigger at 'start time' + X * period.

E.g.
The state estimator needs up to 4 ms to compute the state, and msg pub / rec takes around 400 us.
The trajectory follower takes around 2 ms plus 400 us pub / rec.

This would result in a start time config of:

  • State estimator 0 ms
  • Trajectory Follower 5 ms
  • Joint driver 8 ms

I would love to know your thoughts about this, and whether I am missing something obvious that would break the timer sync idea…
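To make the idea a bit more concrete, here is a rough sketch of how the phase offsets could be approximated with today's rclcpp, without touching the rcl timer API: delay the creation of each wall timer until the wall clock reaches the desired offset within the period. The helper name and the offsets are just illustrations taken from the numbers above, not an existing API, and the alignment is only as good as the timer creation overhead and clock drift allow.

```cpp
#include <chrono>
#include <thread>
#include <utility>

#include <rclcpp/rclcpp.hpp>

using namespace std::chrono;

// Hypothetical helper: block until the system clock reaches the requested
// phase offset within the period, then create a normal wall timer. From that
// point on the timer fires at roughly phase + k * period (plus drift).
template <typename CallbackT>
rclcpp::TimerBase::SharedPtr create_phase_aligned_wall_timer(
  rclcpp::Node & node, milliseconds period, milliseconds phase, CallbackT && callback)
{
  const auto now = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
  const auto in_period = now % period;
  const auto wait = (phase >= in_period) ? (phase - in_period) : (period - in_period + phase);
  std::this_thread::sleep_for(wait);

  // The wall timer first fires one period after creation, so its phase is the
  // creation time modulo the period, i.e. approximately the requested offset.
  return node.create_wall_timer(period, std::forward<CallbackT>(callback));
}

// Usage with the offsets from the example above (all periods 10 ms):
//   state estimator:     create_phase_aligned_wall_timer(*this, 10ms, 0ms, cb);
//   trajectory follower: create_phase_aligned_wall_timer(*this, 10ms, 5ms, cb);
//   joint driver:        create_phase_aligned_wall_timer(*this, 10ms, 8ms, cb);
```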

Why not the following:

  • A single timer in your state estimator node,
  • No timer in the trajectory follower node. It performs its calculation and publishes in its subscriber callback,
  • No timer in the joint driver node, it writes the joint cmd to hardware also in its subscriber callback.

That way the timer of the state estimator drives the control loop frequency, and the other nodes process 'as quickly as possible'.
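For illustration, a minimal sketch of the 'no timer' trajectory follower could look like this (the message types, topic names and the compute_cmd placeholder are just assumptions):

```cpp
#include <rclcpp/rclcpp.hpp>

#include <nav_msgs/msg/odometry.hpp>
#include <sensor_msgs/msg/joint_state.hpp>

class TrajectoryFollower : public rclcpp::Node
{
public:
  TrajectoryFollower() : Node("trajectory_follower")
  {
    cmd_pub_ = create_publisher<sensor_msgs::msg::JointState>("joint_cmd", 1);
    state_sub_ = create_subscription<nav_msgs::msg::Odometry>(
      "state", 1,
      [this](nav_msgs::msg::Odometry::ConstSharedPtr state) {
        // No timer: every fresh state immediately produces a fresh command,
        // so the loop rate is dictated by the state estimator's timer.
        cmd_pub_->publish(compute_cmd(*state));
      });
  }

private:
  sensor_msgs::msg::JointState compute_cmd(const nav_msgs::msg::Odometry &)
  {
    return sensor_msgs::msg::JointState{};  // placeholder for the real controller
  }

  rclcpp::Publisher<sensor_msgs::msg::JointState>::SharedPtr cmd_pub_;
  rclcpp::Subscription<nav_msgs::msg::Odometry>::SharedPtr state_sub_;
};
```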

This would be option 1 / 2 that I mentioned above. As I wrote there, this is not practical in our particular use case.

The example above is a toy example to illustrate the problem; the real-world system has multiple inputs to the joint controller, and the state estimator likewise has multiple input sources.

A second problem is that the joint driver needs to run at a stable frequency; in particular, you must never 'burst' write to it. Since the runtimes of the path follower are load dependent in our system (we steer up to 30 'robots' at a time), direct chaining can lead to bursts: if one iteration takes 5 ms and the next takes 1 ms, the joint driver would see a write 6 ms after the last one instead of 10 ms…

Your post reminded me of @Ingo_Lutkebohle's presentation(s) about this/a similar topic.

From ROSCon'17 for instance: Determinism in ROS – or when things break /sometimes/ and how to fix it… (slides, recording).


There were also several talks on executor determinism at ROSCon'21.

I havenā€™t looked at it in detail, but I think the PiCAS executor targets a similar use case.

One easy approach is a sequence of "wait_for_message" calls on all the inputs, and then running the compute step if you have new data on all or some of them. You can optionally wait in there to achieve a target output rate, and this can also be extended for different input rates and to handle message loss.

Naturally, it's less flexible than using an executor, but it's very predictable.
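As a rough illustration of what I mean (rclcpp::wait_for_message exists since Galactic; the topic, message types, time budgets and the reuse/skip policy are just assumptions):

```cpp
#include <chrono>
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/wait_for_message.hpp>

#include <nav_msgs/msg/odometry.hpp>
#include <sensor_msgs/msg/joint_state.hpp>

using namespace std::chrono_literals;

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("trajectory_follower");
  auto cmd_pub = node->create_publisher<sensor_msgs::msg::JointState>("joint_cmd", 1);

  while (rclcpp::ok()) {
    const auto cycle_start = std::chrono::steady_clock::now();

    // Block on the input with a budget; on timeout you can reuse the last
    // value, skip the cycle, or mark the data as stale - whatever fits.
    nav_msgs::msg::Odometry state;
    const bool fresh = rclcpp::wait_for_message(state, node, "state", 8ms);

    if (fresh) {
      sensor_msgs::msg::JointState cmd;  // placeholder: fill from the controller
      cmd_pub->publish(cmd);
    }

    // Pad the cycle to hold a roughly fixed output rate (10 ms here).
    std::this_thread::sleep_until(cycle_start + 10ms);
  }

  rclcpp::shutdown();
  return 0;
}
```

One thing to watch out for: the convenience overload above creates and destroys a subscription on every call; for a fast loop, the overload that takes an existing subscription (or a wait set, see below) avoids that overhead.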

I've also seen people try to sync timers through a sync topic, but unless the runtimes of the callbacks are very constant, this can still run into the same "running just before the next update arrives" problem.

btw, in the rclc executor we have support for trigger conditions which can help with such things, but I would not really recommend switching to rclc just for this. Maybe this would be interesting to add to rclcpp at some point.

btw, rclcpp::WaitSet might also be useful for such things.


Following @Ingo_Lutkebohle: using rclcpp::WaitSet with the guide here: Topic message handling guideline - Autoware Documentation (autowarefoundation.github.io)
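A stripped-down version of the pattern from that guide might look roughly like this (topic, message type and timeout are placeholders; the node is deliberately never spun by an executor, so the subscription is only drained manually via the wait set):

```cpp
#include <chrono>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/wait_set.hpp>

#include <nav_msgs/msg/odometry.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("waitset_example");

  // The callback is required by the API but never invoked here, because the
  // node is not added to an executor; messages are taken manually below.
  auto sub = node->create_subscription<nav_msgs::msg::Odometry>(
    "state", 1, [](nav_msgs::msg::Odometry::ConstSharedPtr) {});

  rclcpp::WaitSet wait_set;
  wait_set.add_subscription(sub);

  while (rclcpp::ok()) {
    const auto result = wait_set.wait(std::chrono::milliseconds(100));
    if (result.kind() != rclcpp::WaitResultKind::Ready) {
      continue;  // timed out, no new data
    }

    nav_msgs::msg::Odometry msg;
    rclcpp::MessageInfo info;
    if (sub->take(msg, info)) {
      // Process msg here, at a point in time of your own choosing.
    }
  }

  rclcpp::shutdown();
  return 0;
}
```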


Thanks for all your feedback.

@Ingo_Lutkebohle's talk does indeed touch on a similar issue, but there everything was contained in one process. The talk also briefly mentions the possibility of controlling the start point of timers, which sounds similar to my initial idea.

I am not sold on the "wait_for_message" approach, as it can still introduce large latencies if the inputs are not synced.

I also discussed the problem with some colleagues, and we came to the conclusion that for a 'perfect' solution we would need to know the runtimes of all processes feeding the joint driver. Then we could work backwards and assign the points in time at which each process should provide / process data. This also seems to be the approach of the PiCAS guys.
But as Ingo mentioned, for this our runtimes would need to be either constant or have a known upper bound. (This is sadly not the case for our system.)

Another idea that popped up was to monitor the runtimes / latencies of the nodes dynamically and have some sort of master constantly adjust the data processing time points.

To sum it up, we are still undecided on how we want to solve this problem, but at least we have some new ideas. So thanks for the input.

Glad it helped :slight_smile: Regarding your explanation of the challenges in your system, let me take a somewhat more high-level view of this:

In any active system there is the question of when to update the control signal. ROS provides two relatively reasonable defaults out of the box – on new data, and at a fixed rate – but as we see here, this is not always sufficient.

Whenever I find myself in such a situation, the first thing I search for is a way to simplify the problem. For example, input data is often not synchronized by default, but maybe it can be synchronized. Many multi-camera systems support a trigger signal, on micro-controllers it is often possible to sample inputs at nearly the same instant, sometimes you can use the same hardware clock for multiple free-running devices on the same PCB, or you can sync times using PTP, and so on.

The same thing goes for process run-times. Quite honestly, if you can't put at least a reasonable upper bound on the times of the processes feeding your component, you have a problem no matter what, and I would suggest fixing that. Sometimes real-time scheduling the processes already helps a lot, at other times you may need to use different algorithms, and so on.
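For reference, a minimal Linux-only sketch of what I mean by real-time scheduling the process (the priority value is arbitrary, this needs CAP_SYS_NICE or a suitable rtprio limit, and it is no substitute for bounding the algorithm itself):

```cpp
#include <sched.h>

#include <cstdio>

// Move the calling process into the SCHED_FIFO real-time class so that it is
// preempted less by ordinary workloads. This narrows the runtime spread of a
// control loop but does not bound the algorithm's own worst case.
bool make_realtime(int priority = 50)
{
  sched_param param{};
  param.sched_priority = priority;
  if (sched_setscheduler(0 /* this process */, SCHED_FIFO, &param) != 0) {
    std::perror("sched_setscheduler");
    return false;
  }
  return true;
}
```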

Also keep in mind that maybe you don't need all the information all the time. Depending on the dynamics of your system you may be able to interpolate some of the data, or maybe some information channels can tolerate missing or outdated data more easily than others.

Last but not least, there is the question of how accurate you have to be. Of course we all like better performing systems, but sometimes it's not really necessary. But if it is necessary, e.g. because of safety reasons, then maybe this constraint should drive the overall design of the entire sub-system you are worried about, and then you had better get those inputs synchronized and those run-times constant, or else you will run into trouble with certification anyway.