If you already have an RMW implementation that supports pub/sub, you should be able to test your communication means directly, without any additional work. You just need to set the environment variable RMW_IMPLEMENTATION=rmw_ndn before starting the tool.
Unfortunately, it cannot measure latency per application layer; for that it would have to be invasive in all of those layers.
This will allow you to compare only the communication frameworks' performance and will also give you some insight into the overhead the various RMW layers introduce.
If you run into issues implementing the plugin I will be glad to support you.
I was lacking some features (typesupport, proper management of multithreading, etc.), but in the end I was able to test my stack with your package. Also, I already implemented the invasive solution for measuring the latency of each layer. I will ask you if I need help. Thank you again.
Would you mind sharing some more details about your solution for measuring latency in each layer? Maybe we could even integrate it into the performance_test itself.
I didn’t do anything complex, but since it is an invasive way of measuring, I don’t know whether it is easy to integrate into the performance_test. I only printed events with their timestamps to stdout and then post-processed the output with a Python script. It is not very precise, but since I don’t need the real latency (I only want to compare the two implementations), it is OK for me. To avoid the extra cost of printing during the experiment, you can record each event and its timestamp in a pre-allocated table and print everything at the end.
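The pre-allocated table idea can be sketched as follows. This is only an illustration in Python, not the code used in the experiment described above: the `EventLog` class, the `layer_deltas` helper, and the three event ids are hypothetical names, and an instrumented RMW layer would more likely do the same thing in C or C++.

```python
import time
from array import array


class EventLog:
    """Fixed-capacity log of (event id, timestamp) pairs, filled without printing."""

    def __init__(self, capacity):
        # Allocate both columns up front so recording never grows a container.
        self.event_ids = array("q", [0]) * capacity
        self.stamps_ns = array("q", [0]) * capacity
        self.count = 0

    def record(self, event_id):
        i = self.count
        if i < len(self.stamps_ns):  # events are dropped once the table is full
            self.event_ids[i] = event_id
            self.stamps_ns[i] = time.perf_counter_ns()
            self.count = i + 1

    def rows(self):
        return [(self.event_ids[i], self.stamps_ns[i]) for i in range(self.count)]


def layer_deltas(rows):
    """Elapsed nanoseconds between each pair of consecutive events."""
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(rows, rows[1:])]


# Hypothetical event ids for three layers of a publish path.
APP_PUBLISH, RMW_PUBLISH, TRANSPORT_SEND = 1, 2, 3

log = EventLog(capacity=1024)
for event in (APP_PUBLISH, RMW_PUBLISH, TRANSPORT_SEND):
    log.record(event)

# Print only after the experiment is over.
for src, dst, delta_ns in layer_deltas(log.rows()):
    print(f"{src} -> {dst}: {delta_ns} ns")
```

Because printing happens only after the run, the cost paid inside the measured path is one clock read and two array writes per event.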
Very nice. I was wondering whether someone could write a ROS message extension that records the latency at each layer in a private area of the message (encoded as a BLOB), so that it could be retrieved at the subscriber and used for measurement.
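To make the BLOB idea concrete, here is a rough sketch of how per-layer timestamps might be appended on the way down and decoded at the subscriber. Everything in it is an assumption for illustration: the 9-byte entry layout, the layer ids, and the function names are hypothetical, and no such ROS message extension is known to exist in this form.

```python
import struct
import time

# Hypothetical wire layout: one unsigned byte for the layer id followed by an
# unsigned 64-bit timestamp in nanoseconds, little-endian, without padding.
ENTRY = struct.Struct("<BQ")


def append_stamp(blob, layer_id, stamp_ns=None):
    """Return the blob extended with one (layer id, timestamp) entry."""
    if stamp_ns is None:
        stamp_ns = time.monotonic_ns()
    return blob + ENTRY.pack(layer_id, stamp_ns)


def decode_stamps(blob):
    """Decode every (layer id, timestamp) entry stored in the blob."""
    return list(ENTRY.iter_unpack(blob))


def per_layer_latency(stamps):
    """Elapsed nanoseconds between consecutive layers."""
    return [(a[0], b[0], b[1] - a[1]) for a, b in zip(stamps, stamps[1:])]


# Fixed timestamps are used here so the arithmetic is easy to follow.
blob = b""
blob = append_stamp(blob, 1, 1_000)  # application layer (hypothetical id)
blob = append_stamp(blob, 2, 1_500)  # RMW layer (hypothetical id)
blob = append_stamp(blob, 3, 4_000)  # transport layer (hypothetical id)

print(per_layer_latency(decode_stamps(blob)))
```

Differences between entries written on different machines are only meaningful if the clocks of those machines are synchronized.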
Towards a distributed and real-time framework for robots: evaluation of ROS 2.0 communications for real-time robotic applications
In this work we present an experimental setup to show the suitability of ROS 2.0 for real-time robotic applications. We disclose an evaluation of ROS 2.0 communications in a robotic inter-component (hardware) communication case on top of Linux. We benchmark and study the worst case latencies and missed deadlines to characterize ROS 2.0 communications for real-time applications. We demonstrate experimentally how computation and network congestion impact the communication latencies and ultimately propose a setup that, under certain conditions, mitigates these delays and obtains bounded traffic.
Compared to other results:
All the measurements have been made in embedded devices.
We measure latencies in an inter-component scenario. Given the lack of synchronization mechanisms (in this particular work we did not set them up), we use round-trip (ping-pong) measurements.
Previous work focuses on the measurement of local latencies while we measure distributed ones.
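For readers unfamiliar with the round-trip approach, the sketch below shows why it needs no clock synchronization: both timestamps are taken on the sending side. It is not the test used in the paper. It uses plain UDP sockets on the loopback interface instead of ROS 2, and `echo_server`, `ping_pong`, `summarize`, and the deadline value are hypothetical names and parameters.

```python
import socket
import threading
import time


def echo_server(sock, count):
    """Pong side: send every received datagram straight back to its sender."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def ping_pong(count=100, payload=b"x" * 256, timeout_s=1.0):
    """Return round-trip times in seconds, all measured with one local clock."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    thread = threading.Thread(target=echo_server, args=(server, count), daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout_s)
    rtts = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            client.sendto(payload, server.getsockname())
            client.recvfrom(2048)
            rtts.append(time.perf_counter() - start)
    finally:
        client.close()
        thread.join(timeout=timeout_s)
        server.close()
    return rtts


def summarize(rtts, deadline_s):
    """Worst case, mean, and number of round trips that missed the deadline."""
    return {
        "max": max(rtts),
        "mean": sum(rtts) / len(rtts),
        "missed": sum(1 for r in rtts if r > deadline_s),
    }
```

Calling `summarize(ping_pong(), deadline_s=0.001)` would report the worst-case round trip and the number of missed deadlines for a 1 ms deadline. On loopback these numbers say little about a real network, so treat them only as a check that the method works.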
Hey there @awesomebytes! It certainly sounds like a typo. Many thanks for reporting. Let us review it internally and report back if our mistakes go beyond that.
Time Synchronization in modular collaborative robots, M-cobots
A new generation of robot systems which are modular, flexible and safe for human-robot interaction are needed. Existing cobots seem to meet only the latter and require a modular approach to improve their reconfigurability and interoperability. We propose a new sub-class of cobots named M-cobots which tackle these problems. In particular, we discuss the relevance of synchronization for these systems, analyze it and demonstrate how with a properly configured M-cobot, we are able to obtain a) distributed sub-microsecond clock synchronization accuracy among modules, b) timestamping accuracy of ROS 2.0 messages under 100 microseconds and c) millisecond-level end-to-end communication latencies, even when disturbed with networking overloads of up to 90 % of the network capacity.
Thank you for the reports on latency using RT_PREEMPT Linux and ROS 2 with various network settings. They were very interesting to read.
I have a couple of questions.
In the base RT_PREEMPT Linux kernel report (https://arxiv.org/pdf/1808.10821.pdf), I understand that Table-III and Table-IV are what matter. But while looking at Table-II (Roundtrip latency results with RT normal), I was curious whether you know why the MAX latency is considerably high, at 25 ms, when the TX traffic is at 100 Mbps. I would have expected the latency to be high when the RX traffic is at 100 Mbps.
In the ROS 2 evaluation report (https://arxiv.org/pdf/1809.02595.pdf), in Fig 5-a, when the system is idle, DDS2 has a high MAX latency (4 ms) compared to the others. Which DDS implementation is this, and what might be the reason?
In Fig 6-f of the ROS 2 evaluation report, at 80 Mbps, where it cannot meet the deadlines and drops packets, is ksoftirqd processing the packets the primary cause of the latency, or could the DDS layer be causing it? Also, regarding the dropped packets, would setting the size of the kernel socket buffers (net.core.rmem*, net.core.wmem*) help too?
Thanks for your feedback, I will try to clarify some of your doubts:
Both the TX and RX paths suffer from the context switch to the ksoftirqd threads, but in different ways. In the transmission path both streams go through the same Qdisc queue. When there are packets pending transmission in the Qdisc queue, they are sent from the ksoftirqd context. At some point the fair scheduler decides that the ksoftirqd thread has consumed enough CPU and preempts it. During this time packets accumulate, and we observe high latencies in the order of milliseconds. At 100 Mbps it looks like packets on the RX path are processed more efficiently. This is probably because the ksoftirqd context is not triggered all the time and part of these packets are processed in the Ethernet IRQ thread, which has real-time priority. However, when we increased the network load of the concurrent traffic (>200 Mbps), we also observed high latencies, even in the RT normal case.
For fig 5a and 5b we were using the default configuration of each DDS. In the case of that DDS, the default configuration might not be optimized for low bounded latencies but for other purposes. However, for the real-time settings (fig 5c and 5d) we customized the configuration of that DDS and the problem was solved.
In this case we had 80 Mbps of non-ROS 2.0 concurrent traffic alongside the ROS 2.0 round-trip traffic. As there is no contention in the DDS layers, the problem was very likely caused at the kernel level. Later analysis, tracing the kernel, confirmed our suspicions. Changing the socket queues may prevent packet drops but would not solve the root of the problem, which would still cause latency. The real problem is caused by how the network processing is deferred to the ksoftirqd context. For the moment we can only mitigate these problems and hope this is solved in new kernel releases.
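As a small illustration of the socket-buffer knob discussed above, the sketch below requests a larger receive buffer on a plain UDP socket and reads back what the kernel granted. It is not taken from the experiments: the `request_receive_buffer` helper and the 4 MiB figure are hypothetical, and a DDS implementation would normally set its socket buffers through its own configuration.

```python
import socket


def request_receive_buffer(sock, size_bytes):
    """Ask for a receive buffer of size_bytes and return what the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
requested = 4 * 1024 * 1024  # hypothetical size, chosen only for the example
granted = request_receive_buffer(sock, requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
sock.close()
```

On Linux the request is capped by net.core.rmem_max, and the value read back is double the amount that was set because the kernel reserves space for bookkeeping. As noted above, a larger buffer can absorb bursts and reduce drops, but it does not remove the delay introduced when processing is deferred to ksoftirqd.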
Hi @vmayoral, would it be possible to provide the source code for the ping-pong test in the paper "Towards a distributed and real-time framework for robots: Evaluation of ROS 2.0 communications for real-time robotic applications"?
We recently measured latency with different parameter settings (number of nodes, frequency, payload) and submitted a paper on that. You can find a preprint on arXiv: https://arxiv.org/pdf/2101.02074.pdf
Hi @urczf, this warrants its own thread instead of being buried at the bottom of a two-year-old conversation. If you have original and relevant work, please feel free to post it and link back to older threads that are relevant.