Zero-copy DDS Performance Comparison for RMW Providers in ROS 2

Hello ROS community!

Following the previous DDS middleware comparison on Windows platforms, today we bring you the performance comparison using zero-copy delivery.

This benchmark uses the Apex performance test tool to measure latency and maximum throughput for inter-process communication, using the Fast DDS and Cyclone DDS RMW implementations with zero-copy capabilities activated:

  • Fast DDS using data-sharing delivery.
  • Cyclone DDS using Iceoryx.

Testing limitations

Only results for Linux with reliable reliability QoS are compared, since Cyclone+Iceoryx is not supported on Windows, and the RMW implementation of Cyclone DDS does not seem to support zero-copy for best-effort reliability. Note that these limitations do not apply to Fast DDS zero-copy.

Cyclone+Iceoryx has severe issues with data rates over 1,000 messages per second. Shortly after launching the processes, it complains that the mempool has no more available chunks. We tried increasing the number of chunks, with no improvement. The result is that the maximum throughput is very poor compared to Fast DDS, especially for large data sizes.

The Iceoryx daemon iox-roudi crashed several times during the batch execution of the tests, invalidating the remaining test cases of the batch. This was easily resolved by repeating the affected tests, but it illustrates the problem of having a single point of failure that completely disables zero-copy delivery.

Latency:

Throughput:

Conclusion

These results confirm that the Fast DDS implementation of zero-copy delivery has the best performance and robustness among the open source DDS implementations for ROS 2! Especially with large data sizes, it performs considerably better in both latency and throughput.

eProsima will shortly publish the complete workbench description and results on its website, so stay tuned!

We tried increasing the number of chunks up to 50,000 on iceoryx, but it did not help. That many chunks will already cause allocation problems for 4MB messages on most systems, but even in the 256B case it complained that no free chunks were available.
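To put the 50,000-chunk figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the pool footprint is simply the chunk count times the chunk size, ignoring per-chunk headers and alignment, and it treats 4MB as 4 MiB. The function and constant names are made up for illustration and are not part of iceoryx.

```python
# Rough footprint of a fixed-size chunk pool.
# Assumption: footprint ~= chunk_count * chunk_size (no headers, no alignment).

def pool_bytes(chunk_count: int, chunk_size: int) -> int:
    """Return the approximate number of bytes a pool of equal-size chunks needs."""
    return chunk_count * chunk_size


CHUNKS = 50_000  # chunk count mentioned in the post
SIZES = {
    "256B": 256,
    "4MB": 4 * 1024 * 1024,  # assumption: 4MB means 4 MiB
}


def report() -> dict:
    """Map each payload size label to the approximate pool size in bytes."""
    return {name: pool_bytes(CHUNKS, size) for name, size in SIZES.items()}


if __name__ == "__main__":
    for name, total in report().items():
        print(f"{name}: {total / 1024**2:.1f} MiB ({total / 1024**3:.2f} GiB)")
```

Under these assumptions the 256B pool needs only about 12 MiB, while the 4MB pool would need roughly 195 GiB of shared memory, which is why that configuration is not expected to fit on most machines.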

I added the zero-copy feature of Fast DDS 2.3.x to the performance_test repo, which is hosted at GitHub - ZhenshengLee/performance_test: Github repo for apex.ai performance_test for more middlewares. The test results are at ROS 2 Jetson benchmarks | Site to host ROS 2 Jetson benchmark results.

Results show that the iceoryx chunk problem occurs often, as you said.

Even in UDP mode, rmw_cyclonedds sometimes simply fails to send and receive ordinary messages!

Hi again,

I recently read the OSRF TSC report on the performance of Fast DDS and Cyclone DDS, and especially the responses from vendors such as your company.

I have a small question about the Fast DDS response: TSC-RMW-Reports/eProsima-response.md at main · ros2middleware/TSC-RMW-Reports (github.com)

To answer the question about performance when transmitting 4K camera images, the benchmark uses a 2MB message size. Why 2MB? A 4K image is (24/8) * 3840 * 2160 bytes, or about 24.9MB, right?
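For reference, the arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes an uncompressed frame with 24 bits per pixel and no row padding, and the function name is made up for illustration.

```python
# Size of one uncompressed frame.
# Assumptions: 24 bits per pixel (RGB888) and no row padding or stride.

def raw_frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


FRAME_4K = raw_frame_bytes(3840, 2160, 24)

if __name__ == "__main__":
    print(f"{FRAME_4K} bytes")
    print(f"{FRAME_4K / 1e6:.1f} MB (decimal)")
    print(f"{FRAME_4K / 2**20:.1f} MiB (binary)")
    # How many 2MB benchmark messages one such frame would correspond to
    print(f"{FRAME_4K / 2e6:.1f} x the 2MB message size")
```

So one raw 4K RGB frame is 24,883,200 bytes, roughly 12 times the 2MB message size used in the benchmark.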

I also created an issue about this: [QST] Why using 2MB/4MB msg size data in benchmark to demostrate the performance in transimitting ~4K camera images? · Issue #105 · osrf/TSC-RMW-Reports (github.com)

Thanks!

Perhaps this question is about the optimal DDS setup. In practice, however, buffer size is not the low-hanging fruit for improving the performance of camera image transfer and processing.

A 4K camera image (~24MB for RGB888) over a wire between computer systems is best transferred with lossy compression such as H.264 for >=10x reduction in data footprint, as Chris Lalancette suggested in the reply.
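As a rough illustration of why compression matters on the wire, the sketch below estimates the link bandwidth for raw 4K RGB888 frames and for a 10x reduction. The 30 frames per second rate is purely an assumption for the example, the 10x ratio is just the lower bound mentioned above, and real H.264 ratios depend on content and encoder settings. The function names are hypothetical.

```python
# Estimated link bandwidth for a stream of 4K RGB888 frames.
# Assumptions: 30 frames per second (illustrative only), 10x compression.

def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second needed to send every frame uncompressed."""
    return width * height * bits_per_pixel * fps


def compressed_bitrate_bps(raw_bps: int, ratio: float) -> float:
    """Bits per second after reducing the data footprint by `ratio`."""
    return raw_bps / ratio


if __name__ == "__main__":
    raw = raw_bitrate_bps(3840, 2160, 24, fps=30)
    compressed = compressed_bitrate_bps(raw, ratio=10.0)
    print(f"raw:        {raw / 1e9:.2f} Gbit/s")
    print(f"compressed: {compressed / 1e9:.2f} Gbit/s")
```

At the assumed frame rate the raw stream needs close to 6 Gbit/s, more than a 1 Gbit/s Ethernet link can carry, while the 10x-reduced stream comes in at about 0.6 Gbit/s.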

If the 4K camera image is processed locally on the computer, it's best to remain in-process and leave the image on the hardware accelerator for processing rather than place the burden on the CPU. REP-2007 does this to spare DDS this demanding workload.

Thanks
