Minimising ROS2 Discovery Traffic

Hello everyone,

I wanted to let you know that we just posted the results of some experiments we did measuring how much we could reduce ROS2/DDS discovery traffic using zenoh (http://zenoh.io). The findings are quite promising, as the Zenoh-based solution (1) drastically reduces DDS discovery overhead – a 97% to 99.9% traffic reduction in the tested scenarios, (2) allows for peer-to-peer communication when useful, (3) enables efficient Internet-scale routing when necessary, and (4) does not require any changes to your existing ROS2 systems; in other words, it is completely transparent to your ROS2 applications.

The full details are available at Minimizing Discovery Overhead in ROS2 · zenoh

Please feel free to suggest other experiments you'd like us to try out, or to ask questions and request clarifications.

Take Care!


Hello again,

As I mentioned above regarding the interoperability between zenoh and ROS2, here is a demo from @JEnoch that shows how to drive turtlesim from a Rust zenoh application. In the same repo you'll also see how to do the same through a web page using the zenoh REST API.
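
To give a rough idea of what such a Rust zenoh application can look like, here is a minimal sketch (not the demo's actual code): it CDR-encodes a geometry_msgs/Twist and publishes it for the bridge to forward to /turtle1/cmd_vel. It assumes a recent zenoh crate together with the cdr and serde crates, and it assumes the bridge maps the topic to the key expression "rt/turtle1/cmd_vel" – please refer to the demo repo for the exact API and key mapping used there.

```rust
// Minimal sketch (not the demo's code): publish a CDR-encoded geometry_msgs/Twist
// through zenoh so that zenoh-plugin-dds can forward it to turtlesim.
// Assumed dependencies: zenoh = "1", cdr, serde, tokio (with the "full" feature).
use cdr::{CdrLe, Infinite};
use serde::Serialize;

#[derive(Serialize)]
struct Vector3 { x: f64, y: f64, z: f64 }

#[derive(Serialize)]
struct Twist { linear: Vector3, angular: Vector3 }

#[tokio::main]
async fn main() {
    let session = zenoh::open(zenoh::Config::default()).await.unwrap();
    let twist = Twist {
        linear: Vector3 { x: 1.0, y: 0.0, z: 0.0 },  // forward
        angular: Vector3 { x: 0.0, y: 0.0, z: 0.5 }, // slight turn
    };
    // The DDS side expects CDR-encoded data, hence the cdr serialization.
    let payload = cdr::serialize::<_, _, CdrLe>(&twist, Infinite).unwrap();
    // Assumed key expression: check the bridge's mapping for /turtle1/cmd_vel.
    session.put("rt/turtle1/cmd_vel", payload).await.unwrap();
    session.close().await.unwrap();
}
```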

Once more, your comments and feedback are very welcome.

Take Care!


I’d actually be interested in getting some idea of the latency of a setup including the Zenoh-DDS bridge to bridge topics from Zenoh to ROS 2.

Perhaps some round-trip and throughput measurements for increasing sizes of message payloads? And jitter?

The demo sends out geometry_msgs/Twist messages: what's the delay incurred by bridging? Is this kind of setup suitable for use at high frequencies?


Hello @gavanderhoorn,

What we can easily provide are the throughput and RTT (latency = RTT/2) measurements for the zenoh master branch. The graph below shows the throughput we get on an AMD Ryzen workstation. As you can see, the peak throughput is around 60 GBps. Please notice that this is the throughput while going through the loopback interface over TCP/IP.

The two graphs show the throughput for the zenoh-net API and the higher-level zenoh API. It may be insightful to look into the code of the examples we use for measuring throughput – please see zenoh/zn_pub_thr.rs at master · eclipse-zenoh/zenoh · GitHub, zenoh/zn_sub_thr.rs at master · eclipse-zenoh/zenoh · GitHub, zenoh/z_sub_thr.rs at master · eclipse-zenoh/zenoh · GitHub, and zenoh/z_put_thr.rs at master · eclipse-zenoh/zenoh · GitHub.
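
If you prefer not to dig through those files right away, the pattern they follow is roughly the one sketched below: the publisher blasts a fixed-size payload in a tight loop while the subscriber counts samples per unit of time. This is only a sketch written against a recent zenoh Rust API (the linked examples target the zenoh-net and zenoh APIs of the master branch, so the exact calls differ), and the key expression "test/thr" and the message count are arbitrary.

```rust
// Rough sketch of the pub/sub throughput pattern used by the linked examples.
// Assumed dependencies: zenoh = "1", tokio = { version = "1", features = ["full"] }.
use std::time::Instant;

#[tokio::main]
async fn main() {
    let mode = std::env::args().nth(1).unwrap_or_else(|| "pub".into());
    let session = zenoh::open(zenoh::Config::default()).await.unwrap();

    if mode == "pub" {
        // Publisher: send a fixed-size payload as fast as possible.
        let payload = vec![0u8; 1024]; // vary this to reproduce the payload-size sweep
        let publisher = session.declare_publisher("test/thr").await.unwrap();
        loop {
            publisher.put(payload.clone()).await.unwrap();
        }
    } else {
        // Subscriber: count received samples and print the rate every N messages.
        let subscriber = session.declare_subscriber("test/thr").await.unwrap();
        const N: usize = 100_000;
        let mut count = 0usize;
        let mut start = Instant::now();
        while let Ok(_sample) = subscriber.recv_async().await {
            count += 1;
            if count == N {
                let elapsed = start.elapsed().as_secs_f64();
                println!("{:.0} msg/s", N as f64 / elapsed);
                count = 0;
                start = Instant::now();
            }
        }
    }
}
```

Run one process with `pub` and another with `sub` (on localhost or across hosts) to get a rough msg/s figure for a given payload size.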

If you skim through the code you'll notice that the code used to test performance does not try to play tricks or take shortcuts. This is the code as you'd write it in your application after having looked at the Getting Started guide. In other words, we try to make performance as accessible as possible.

If you wonder how we behave across the network: when we measure throughput over a 10Gbps network, the only difference we see from localhost is that the throughput saturates at 10Gbps. Otherwise the picture remains the same: we saturate a 1Gbps network at a 128-byte payload and a 10Gbps network at about 1024 bytes. We are writing a blog post on performance where we'll share all this data along with the performance of our zero-copy support. If you can wait a bit, we'll share a pretty thorough analysis in a week or so.

As for the Round Trip Time (RTT), we usually measure it for a fixed payload size, 64 bytes, at increasing publication frequencies.

We think this is more relevant than the usual RTT test shown in performance evaluations, which is essentially the same as the "inf" point on our x-axis (meaning publishing as fast as you can). In essence, by looking at latency at different publication periods you can more clearly see how caches impact the actual latency experienced by your application – also notice that real applications rarely write as fast as possible.
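
To make that methodology concrete, the ping side of such a test looks roughly like the sketch below: a fixed 64-byte payload is published at a configurable period and the time until the echo comes back is recorded. This is only an illustration against a recent zenoh Rust API, not the code behind the graph; the key expressions "test/ping" and "test/pong" and the matching pong process (which simply subscribes to "test/ping" and republishes on "test/pong") are assumptions.

```rust
// Hypothetical ping side of an RTT test at a fixed publication period.
// Assumed dependencies: zenoh = "1", tokio = { version = "1", features = ["full"] }.
use std::time::{Duration, Instant};

#[tokio::main]
async fn main() {
    // Publication period in microseconds; 0 approximates the publish-as-fast-as-you-can case.
    let period_us: u64 = std::env::args().nth(1).and_then(|s| s.parse().ok()).unwrap_or(1000);

    let session = zenoh::open(zenoh::Config::default()).await.unwrap();
    let publisher = session.declare_publisher("test/ping").await.unwrap();
    let subscriber = session.declare_subscriber("test/pong").await.unwrap();
    let payload = vec![0u8; 64]; // fixed 64-byte payload, as in the measurements above

    loop {
        let start = Instant::now();
        publisher.put(payload.clone()).await.unwrap();
        // Wait for the echo, then report the round trip time (latency ~= RTT / 2).
        if subscriber.recv_async().await.is_ok() {
            println!("RTT: {} us", start.elapsed().as_micros());
        }
        tokio::time::sleep(Duration::from_micros(period_us)).await;
    }
}
```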

Anyway, as you can see from the graph, the RTT gets down to 40 microseconds quite rapidly – in other words, 20 microseconds of latency.

In conclusion, these are the raw zenoh performance figures, which hopefully give you an idea of the overhead zenoh may add when bridging DDS data over the network.

Let me know if you have further questions.

Take Care!


Hi guys,

Just worth noting that ROS2 Foxy already ships with a solution for scenarios with many nodes or no multicast: the Discovery Server:

  • Zero configuration effort: Just an environment variable.
  • Fully documented
  • Reduces DDS Discovery traffic
  • No Bridges required, no penalty in performance.
  • Redundancy available.
  • Follows the DDS Standard.
  • Already adopted by many users.

Perhaps a comparison of this vs the Discovery Server might be in order? That would show the threshold at which people should start considering a switch from the Discovery Server to something that's more difficult to set up. It could also show other pros and cons of each system, so that users of ROS 2 (both old and new) can make the most informed decisions they can about their systems' needs.

Also, does Zenoh require CycloneDDS? Or can it be made to work with other DDS vendors? Forgive me if that’s an ignorant question…


Zenoh is a communications middleware independent of DDS, so it can be used on its own if that’s what you want to do. The Zenoh-to/from-DDS thing should work with any DDS vendor that complies with the specification.


Thanks @gbiggs for helping out with the clarification. Indeed, the zenoh-plugin-dds works with any DDS implementation, as long as that implementation complies with the DDS specification.

@jhdcs, one big difference between what we reported and the Discovery Server – besides the further reduction in bandwidth – is that zenoh does not require a service deployed somewhere on the network. You still retain peer-to-peer communication. Just drop in the zenoh-plugin-dds and you are in business.

Take Care!
