Latency and throughput in ROS2

fkromer · April 1, 2018, 10:42am

The paper Maruyama, Yuya et al. “Exploring the performance of ROS2.” 2016 International Conference on Embedded Software (EMSOFT) (2016): 1-10. evaluates the latency of ROS2 (different node configurations, different DDS implementations, different QoS policies, etc.).

Does someone know if these results have been updated by some more recent work(s)?

Dejan_Pangercic · April 27, 2018, 7:07am

We will release a plugin-based tool that will let you evaluate latency and memory usage while varying the number of pubs/subs, QoS, RT settings, large and small message types, publishing frequency, security settings, inter- and intra-process communication, etc. We need 2-3 more weeks, stay tuned.

vmayoral · April 27, 2018, 10:36am

That’s fantastic @Dejan_Pangercic, looking forward to having a look.

On our side, we recently released a first tech report (https://arxiv.org/abs/1804.07643) that’s part of a series that will hopefully better characterize the latencies and throughput in ROS 2 while taking real-time aspects into consideration. This first tech report covers OSI layers 1 and 2.

Dejan_Pangercic · June 18, 2018, 5:53pm

Hi all, we released the performance_test tool: https://github.com/ApexAI/performance_test to e.g. benchmark latency, jitter, lost samples, etc. in different DDS implementations.
It currently supports benchmarking the following communication means:

FastRTPS directly
ROS 2 rmw layer (and thus any supported rmw_* implementation)
Connext DDS Micro directly

We plan to extend it to use ROS1 comms as well.

Hope you find it useful; any feedback is more than welcome.

D.

astralien3000 · June 19, 2018, 3:03pm

Hello !

Nice tool @Dejan_Pangercic !

What are the requirements on the RMW layer to be tested ?
I have a non-DDS-based RMW, but it currently only supports simple pub/sub.
It would be interesting to measure performance with your tool and compare it to DDS implementations.

Also (for latency), is it only measuring the end-to-end latency, or can it be more comprehensive ? For example measuring the time spent in each layer (RCL/RMW/DDS/Network).

andreaspasternak · June 19, 2018, 10:31pm

Hi @astralien3000,

I am the maintainer of the performance test tool.

If you already have an RMW implementation that supports pub/sub, you should be able to test your communication means directly without any additional work. You just need to set the proper environment variable, RMW_IMPLEMENTATION=rmw_ndn, before starting the tool.
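
As a minimal sketch of that step, the variable can also be set from a small Python launcher. The helper names are hypothetical and the command to run is left as a parameter, since it depends on your installation; only the RMW_IMPLEMENTATION variable and the rmw_ndn value come from the text above.

```python
import os
import subprocess


def env_with_rmw(rmw_name, base_env=None):
    """Return a copy of the environment with RMW_IMPLEMENTATION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = rmw_name
    return env


def run_with_rmw(command, rmw_name):
    """Run `command` (a list of arguments) with the chosen RMW implementation.

    `command` is whatever invocation you normally use to start the tool;
    it is an assumption of this sketch and is not spelled out here.
    """
    return subprocess.run(command, env=env_with_rmw(rmw_name), check=False)
```

Copying the environment instead of mutating os.environ keeps the setting local to the launched process, which makes it easier to run the same benchmark against several RMW implementations in a row.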

Unfortunately, it cannot measure latency per application layer; for that it would have to be invasive in all of these layers.

But you can create a new communication plugin for the NDN transport as I did for FastRTPS here: https://github.com/ApexAI/performance_test/blob/master/performance_test/src/communication_abstractions/fast_rtps_communicator.hpp.

This will allow you to compare only the communication frameworks’ performance and will also give you some insight into the overhead that the various RMW layers introduce.

If you run into issues implementing the plugin I will be glad to support you.

astralien3000 · June 20, 2018, 5:09pm

Thank you for your answer @andreaspasternak,

I was lacking some features (typesupport, proper management of multithreading, etc.), but in the end I was able to test my stack with your package. Also, I already implemented the invasive solution for measuring the latency of each layer. I will ask you if I need help, thank you again.

Dejan_Pangercic · June 20, 2018, 5:45pm

Hi @astralien3000,

Also, I already implemented the invasive solution for measuring the latency of each layer. I will ask you if I need help, thank you again.

Would you mind sharing some more details about your solution for measuring latency in each layer? Maybe we could even integrate it into the performance_test itself.

astralien3000 · June 21, 2018, 9:04am

I didn’t do anything complex, but since it is an invasive way of measuring, I don’t know whether it would be easy to integrate into performance_test. I only printed events with their timestamps to standard output and then post-processed them with a Python script. It is not very precise, but since I don’t need the real latency (I only want to compare the two implementations), it is OK for me. To avoid the extra cost of printing during the experiment, you can record the events and timestamps in a pre-allocated table and print everything at the end.
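
The pre-allocated table idea could look roughly like the sketch below. It is an illustration only, not the code used in the experiment: the class, the function, and the event labels in the usage note are hypothetical, and Python's time.perf_counter_ns stands in for whatever clock the instrumented layers actually use.

```python
import time


class EventLog:
    """Fixed-size event table filled during the run and dumped afterwards."""

    def __init__(self, capacity):
        # Pre-allocate both columns so recording never grows a list.
        self._names = [None] * capacity
        self._stamps = [0] * capacity
        self._count = 0

    def record(self, name, stamp_ns=None):
        """Store one event; returns False (and drops it) once the table is full."""
        if self._count >= len(self._names):
            return False
        self._names[self._count] = name
        self._stamps[self._count] = (
            time.perf_counter_ns() if stamp_ns is None else stamp_ns
        )
        self._count += 1
        return True

    def events(self):
        """Return the recorded (name, timestamp) pairs in insertion order."""
        return list(zip(self._names[: self._count], self._stamps[: self._count]))


def layer_durations(events):
    """Time elapsed between consecutive events, keyed by 'from->to'."""
    return {
        f"{a}->{b}": tb - ta
        for (a, ta), (b, tb) in zip(events, events[1:])
    }
```

For a single message traced through, say, "rcl_publish", "rmw_publish" and "transport_send", layer_durations(log.events()) gives the time spent between each pair of instrumentation points. Printing happens only after the run, so it does not disturb the measurement.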

Devendra_aaru · August 2, 2018, 2:48am

Very nice. I was wondering whether someone could write a ROS message extension that adds the latency at each layer to a private area of the message (encoded as a BLOB), so that it could be retrieved at the subscriber for measurement.
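
One possible encoding for such a private area, sketched under stated assumptions: each layer appends a fixed-size (layer id, timestamp) entry to a byte string that travels with the message. The layer ids, the entry layout, and the function names are all hypothetical; nothing here is an existing ROS 2 feature.

```python
import struct

# One entry = 1-byte layer id + 8-byte unsigned timestamp in nanoseconds,
# little-endian, no padding (9 bytes per entry).
_ENTRY = struct.Struct("<BQ")

# Hypothetical layer ids, chosen only for this illustration.
LAYER_IDS = {"rcl": 1, "rmw": 2, "dds": 3, "network": 4}
LAYER_NAMES = {v: k for k, v in LAYER_IDS.items()}


def append_stamp(blob, layer, stamp_ns):
    """Return `blob` extended with one (layer, timestamp) entry."""
    return blob + _ENTRY.pack(LAYER_IDS[layer], stamp_ns)


def decode_stamps(blob):
    """Decode the blob back into a list of (layer name, timestamp) pairs."""
    return [
        (LAYER_NAMES[layer_id], stamp_ns)
        for layer_id, stamp_ns in _ENTRY.iter_unpack(blob)
    ]
```

The subscriber would call decode_stamps on the received blob and subtract consecutive timestamps. Across machines this only yields meaningful numbers if the clocks are synchronized.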

vmayoral · September 5, 2018, 7:16pm

For those interested, a new technical report studying this topic is available: https://arxiv.org/pdf/1808.10821.pdf.

vmayoral · September 10, 2018, 9:31am

Here’s another update: https://arxiv.org/pdf/1809.02595.pdf

Towards a distributed and real-time framework for robots: evaluation of ROS 2.0 communications for real-time robotic applications

In this work we present an experimental setup to show the suitability of ROS 2.0 for real-time robotic applications. We disclose an evaluation of ROS 2.0 communications in a robotic inter-component (hardware) communication case on top of Linux. We benchmark and study the worst case latencies and missed deadlines to characterize ROS 2.0 communications for real-time applications. We demonstrate experimentally how computation and network congestion impacts the communication latencies and ultimately, propose a setup that, under certain conditions, mitigates these delays and obtains bounded traffic.

Compared to other results:

All the measurements have been made on embedded devices.
We measure latencies in an inter-component scenario. Given the lack of synchronization mechanisms (in this particular work we did not set them up), we measure round-trip (ping-pong) latency.
Previous work focuses on the measurement of local latencies, while we measure distributed ones.
We measure how communications are affected under stressed conditions. This is the best way to show whether the communication stack is well configured for real-time (which connects to the previous work linked in posts #3 and #11 of this thread).
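
A round-trip measurement of this kind can be sketched as follows. Both timestamps are taken on the sending side, which is why no clock synchronization between devices is needed. This is an illustration rather than the setup used in the report: the `ping` callable, the function names, and the deadline value are assumptions.

```python
import time


def measure_round_trips(ping, count):
    """Call `ping()` `count` times and return each round-trip time in ns.

    `ping` must send one message and block until its echo comes back.
    """
    rtts = []
    for _ in range(count):
        start = time.perf_counter_ns()
        ping()
        rtts.append(time.perf_counter_ns() - start)
    return rtts


def summarize(rtts_ns, deadline_ns):
    """Worst case, mean, and number of round trips that missed the deadline."""
    return {
        "max_ns": max(rtts_ns),
        "mean_ns": sum(rtts_ns) / len(rtts_ns),
        "missed_deadlines": sum(1 for rtt in rtts_ns if rtt > deadline_ns),
    }
```

For real-time evaluation the worst case and the missed-deadline count matter more than the mean, so the summary reports all three.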

awesomebytes · September 11, 2018, 3:53am

Hello Victor, thanks a lot for your reports. I’m reading them right now and I came to a sentence that I don’t understand. On the 4th page of “Time-Sensitive Networking for robotics” (arXiv:1804.07643) you say:

Additionally, as Ethernet is asynchronous, the high priority frames sharing the same link can content between them.

What does content mean in that context?

Sorry if it’s a bit of a picky question.

awesomebytes · September 11, 2018, 4:08am

Oops, later on I found the usage of “contend”, which now makes sense to me. I guess that one was a typo.

vmayoral · September 11, 2018, 9:17am

Hey there @awesomebytes! It certainly sounds like a typo. Many thanks for reporting. Let us review it internally and report back if our mistakes go beyond that.

Cheers!

vmayoral · September 20, 2018, 7:50am

And another one:

Time Synchronization in modular collaborative robots, M-cobots

A new generation of robot systems which are modular, flexible and safe for human-robot interaction are needed. Existing cobots seem to meet only the later and require a modular approach to improve their reconfigurability and interoperability. We propose a new sub-class of cobots named M-cobots which tackle these problems. In particular, we discuss the relevance of synchronization for these systems, analyze it and demonstrate how with a properly configured M-cobot, we are able to obtain a) distributed sub-microsecond clock synchronization accuracy among modules, b) timestamping accuracy of ROS 2.0 messages under 100 microseconds and c) millisecond-level end-to-end communication latencies, even when disturbed with networking overloads of up to 90 % of the network capacity.

Read the tech report at https://arxiv.org/pdf/1809.07295.pdf

Anup_Pemmaiah · September 29, 2018, 2:02am

Hi Victor,

Thank you for the reports on latency using RT_PREEMPT Linux and ROS 2 with various network settings. They were very interesting to read.

I have a couple of questions.

1. In the base RT_PREEMPT Linux kernel report (https://arxiv.org/pdf/1808.10821.pdf), I understand that Tables III and IV are what matter. But looking at Table II (round-trip latency results with RT normal), I was curious whether you know why the MAX latency is considerably high, at 25 ms, for TX traffic at 100 Mbps. I would expect latency to be high with RX traffic at 100 Mbps.
2. In the ROS 2 evaluation report (https://arxiv.org/pdf/1809.02595.pdf), in Fig. 5a, when the system is idle, DDS2 has a high MAX latency (4 ms) compared to the others. I was just curious which DDS implementation this is and what the reason might be.
3. In Fig. 6f of the ROS 2 evaluation report, at 80 Mbps, where it cannot meet the deadlines and drops packets, I was curious whether ksoftirqd processing the packets is the primary cause of the latency, or whether the DDS layer could be causing it. Also, regarding the dropped packets, would increasing the size of the kernel socket buffers (net.core.rmem*, net.core.wmem*) help too?

Thanks
Anup

Dejan_Pangercic · October 2, 2018, 2:40am

@vmayoral see 3 questions above by folks from Apex.

carlossv · October 2, 2018, 10:42am

Hi @Anup_Pemmaiah, @Dejan_Pangercic

Thanks for your feedback, I will try to clarify some of your doubts:

1. Both the TX and RX paths suffer from the context switch to the ksoftirqd threads, but in different ways. In the transmission path, both streams go through the same Qdisc queue. When there are packets pending transmission in the Qdisc queue, they are sent from the ksoftirqd context. At some point the fair scheduler decides that the ksoftirqd thread has consumed enough CPU and preempts it. During this time packets accumulate, and we observe high latencies on the order of milliseconds. At 100 Mbps, it looks like packets are processed more efficiently in the RX path. This is probably because the ksoftirqd context is not triggered all the time and part of these packets are processed in the Ethernet IRQ thread, which has real-time priority. However, when we increased the network load of the concurrent traffic (>200 Mbps), we also observed high latencies, even in the RT normal case.
2. For Fig. 5a and 5b we were using the default configuration of each DDS. In the case of that DDS, the default configuration might not be optimized for low bounded latencies but for other purposes. However, for the real-time settings (Fig. 5c and 5d) we customized the configuration of that DDS, and the problem was solved.
3. In this case we had 80 Mbps of non-ROS 2.0 concurrent traffic alongside the ROS 2.0 round-trip traffic. As there is no contention in the DDS layers, the problem was very likely caused at the kernel level. Subsequent analysis tracing the kernel confirmed our suspicions. Changing the socket queues may prevent packet drops but would not solve the root of the problem, which would still cause latency. The real problem is caused by how the network processing is deferred to the ksoftirqd context. For the moment we can only mitigate these problems and expect that this will be solved in new kernel releases.
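
For readers who want to experiment with the socket buffer sizes mentioned in the question, here is a minimal sketch of requesting a larger receive buffer on a UDP socket and reading back what the kernel granted. The function name and the buffer size are assumptions for illustration. As noted above, a larger buffer may reduce drops but does not address the latency caused by deferring network processing to ksoftirqd.

```python
import socket


def request_receive_buffer(size_bytes):
    """Ask for a UDP receive buffer of `size_bytes` and report what we got.

    Returns (requested, effective). The kernel may clamp the request
    (on Linux, to net.core.rmem_max) and may report a larger value than
    requested because it accounts for bookkeeping overhead.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
        effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return size_bytes, effective
```

Comparing the two returned values shows whether the system-wide limit needs raising before a per-socket request can take effect.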

vmayoral · October 2, 2018, 10:59am

My colleague @carlossv just answered @Dejan_Pangercic. Ping me if you guys are around IROS and would like to discuss this face to face.
