This benchmark uses the Apex performance test tool to get latency results and maximum throughput in inter-process communications using Fast DDS and Cyclone DDS RMW implementations with zero-copy capabilities activated:
Fast DDS using data-sharing delivery.
Cyclone DDS using Iceoryx.
Testing limitations
Only results for Linux with the reliable reliability QoS are compared, since Cyclone+Iceoryx is not supported on Windows, and the Cyclone DDS RMW implementation does not seem to support zero-copy with best-effort reliability. Note that these limitations do not apply to Fast DDS zero-copy.
Cyclone+Iceoryx has severe issues with data rates over 1,000 messages per second. Shortly after launching the processes, it complains that the mempool has no more available chunks. We tried increasing the number of chunks, with no improvement. The result is that the maximum throughput is very poor compared to Fast DDS, especially for large data sizes.
The Iceoryx daemon iox-roudi crashed several times during the batch execution of the tests, invalidating the remaining test cases of the batch. This was easily resolved by repeating the affected tests, but it illustrates the problem of having a single point of failure that completely disables zero-copy delivery.
These results confirm that the Fast DDS implementation of zero-copy delivery has the best performance and robustness among the open source DDS implementations for ROS 2! Especially with large data sizes, it performs considerably better in both latency and throughput.
eProsima will shortly publish the complete workbench description and results on its website, so stay tuned!
We tried increasing the number of chunks up to 50,000 in Iceoryx, but it did not help. That many chunks already causes allocation problems with 4 MB messages on most systems, but even in the 256 B case it complained that there were no free chunks available.
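To put the 50,000-chunk figure in perspective, here is a rough back-of-the-envelope sketch of the shared memory such a pool would need. It assumes one pool whose chunks all have the same payload size and ignores per-chunk headers and alignment, so the real Iceoryx footprint would be somewhat larger; `pool_bytes` is a hypothetical helper, not an Iceoryx API.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def pool_bytes(chunk_count: int, chunk_payload_bytes: int) -> int:
    """Lower bound on pool memory: chunk count times payload size.

    Assumption: headers, alignment and management overhead are ignored.
    """
    return chunk_count * chunk_payload_bytes


if __name__ == "__main__":
    # The two message sizes mentioned above, with 50,000 chunks each.
    for label, payload in (("256 B", 256), ("4 MiB", 4 * MIB)):
        total = pool_bytes(50_000, payload)
        print(f"{label} chunks: {total / GIB:.2f} GiB")
```

With 4 MiB payloads this comes to roughly 195 GiB, which is why allocation fails on most systems, while the 256 B pool needs only about 12 MB.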
To answer the question about the performance of transmitting 4K camera images, the benchmark chose the 2 MB message size. Why 2 MB? A 4K image is (24/8) * 3840 * 2160 bytes, or about 24.9 MB, right?
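The arithmetic in the question can be checked with a short sketch. It assumes an uncompressed frame with 24 bits per pixel and no row padding; `raw_frame_bytes` is a hypothetical helper name used only for illustration.

```python
def raw_frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of an uncompressed frame in bytes, assuming no row padding."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    size = raw_frame_bytes(3840, 2160, 24)
    print(f"{size} bytes")
    print(f"{size / 1e6:.1f} MB (decimal)")
    print(f"{size / 2**20:.1f} MiB (binary)")
```

The result is 24,883,200 bytes, which is about 24.9 MB in decimal units or about 23.7 MiB in binary units, so a raw 4K RGB frame is indeed more than ten times larger than a 2 MB message.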
Perhaps this question is focused on the optimal setup for the DDS; in practice, however, buffer size is not the low-hanging fruit for improving the performance of camera image transfer and processing.
A 4K camera image (~24 MB for RGB888) sent over a wire between computer systems is best transferred with lossy compression such as H.264, for a >=10x reduction in data footprint, as Chris Lalancette suggested in the reply.
If the 4K camera image is processed locally on the computer, it's best to stay in process and leave the image on the hardware accelerator for processing rather than place the burden on the CPU. REP-2007 does this to spare the DDS this demanding workload.