ROS2-Galactic DDS Problems

Hey guys,
We are a student team at Siemens in Germany focusing on ROS2 under the supervision of @flo. We recently stumbled across some quirks and problems when using DDS as the RMW with ROS2 Galactic applications.

Our attempt to get a smooth camera stream running, even over Ethernet at 480p resolution, has failed so far on a computer with limited resources.
The connection had a very high latency of roughly 2-3 seconds and wasn’t reliable at all due to constant freezes. For our use case of steering a mobile platform remotely, this was simply not sufficient.
Now we want to investigate why this problem occurs and how to fix it. While evaluating the current state of the transmissions and the network, we discovered some helpful tools for tracking these metrics:

Unfortunately, some of the tools (except the ddsperf CLI) don’t work out of the box, and the documentation on how to set them up is very limited. The ddsperf CLI is included by default in the ROS2 Galactic installation with Cyclone DDS, but the scripts for plotting and tracking a remote connection provided in https://github.com/eclipse-cyclonedds/cyclonedds/tree/master/examples/perfscript won’t run. (Tools tested so far are ddsperf and apexai.)

Our setup:

I’ve read some other threads about problems getting DDS to work as expected, but this still seems to be a big problem for many ROS2 users running it outside of turtlesim and the examples.

The first step to overcome this is to measure the current performance of the setup and then start tuning DDS to our needs. The last step would be a comparison with other DDS implementations. If you have any suggestions regarding this procedure, or tips and tricks for the DDS setup, please feel free to share.

Our first, very rough findings for the performance on different devices are as follows:
Packet loss:

  • 0.03% on a native Ubuntu 20.04 machine with ROS2 Galactic
  • 10% on an IOT2050 with a Docker container and the osrf:ros2-galactic image
  • 20% on a VMware VM running Ubuntu 20.04 with ROS2 Galactic

The test was conducted with the subscriber and publisher both on the tested machine, at 100 Hz with a 1 MB payload.

$ ddsperf pub 100Hz size 1MB -u -k 1 
$ ddsperf sub -u -k 1
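To put the loss figures into context, here is a rough back-of-the-envelope calculation of the payload data rate this test asks for. It is only a sketch: it assumes "1MB" means 10^6 bytes and ignores all protocol overhead, and the function name is made up for illustration.

```python
# Rough payload data rate for the ddsperf run above (100 Hz, 1 MB samples).
# Assumptions: "1MB" is taken as 10**6 bytes; RTPS/UDP/IP overhead is ignored.

def offered_load_mbit_per_s(rate_hz: float, payload_bytes: int) -> float:
    """Return the payload data rate in Mbit/s (hypothetical helper)."""
    return rate_hz * payload_bytes * 8 / 1e6


if __name__ == "__main__":
    load = offered_load_mbit_per_s(100, 1_000_000)
    print(f"Offered payload load: {load:.0f} Mbit/s")
```

So the test pushes on the order of 800 Mbit/s of payload through the stack, which is a heavy load for a small device even when publisher and subscriber run on the same machine.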

Some other related threads:

Keep in mind that usb_cam is inherently very inefficient if you aren’t using the compressed image topic, regardless of the RMW. This is because each sensor_msgs/Image is a raw, uncompressed image, with no compression inside the frame. Even if you do enable compression on usb_cam, you still won’t get compression from frame to frame like you get with H.264 or another codec. If you send raw video frames over the network at 25 fps, performance is very likely to be poor. Additionally, without compression enabled, your CPU has to do the work of decoding the stream from the camera, which is most likely compressed (Oak-D-lite says it supports H.264, H.265, and MPEG codecs, all of which are compressed). I don’t think it’s a DDS limitation, but rather a matter of the CPU doing unnecessary work and of data being sent over the network in an inefficient format.
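To make the raw-frame cost concrete, here is a small sketch of the arithmetic. The numbers are assumptions for illustration: 640x480 as the "480p" resolution, 3 bytes per pixel (e.g. an rgb8 or bgr8 encoding), and 25 fps. Message headers and DDS overhead are ignored.

```python
# Estimated data rate of an uncompressed image stream.
# Assumptions: 640x480 frames, 3 bytes per pixel, 25 fps; overhead ignored.

def raw_image_rate_mbit_per_s(width: int, height: int,
                              bytes_per_pixel: int, fps: float) -> float:
    """Return the raw pixel data rate in Mbit/s (hypothetical helper)."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * 8 / 1e6


if __name__ == "__main__":
    rate = raw_image_rate_mbit_per_s(640, 480, 3, 25)
    print(f"Raw stream: {rate:.2f} Mbit/s")
```

Under these assumptions a single raw stream needs roughly 184 Mbit/s, before any DDS fragmentation or retransmission.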


Since you mentioned my very simple performance test, rex-schilasky/ros2_latency_ipc: at least that one should be easy to set up and use.

You need to source your ROS2 distro, run colcon build, start the receiving node, and finally start the sending node with parameters for the number of runs, the delay between two publications, and the message size.

The log result is easy to interpret:

--------------------------------------------
Messages received             : 1000
Message size received         : 1 kB
Message average latency       : 366 us
Message min latency           : 140 us @ 355
Message max latency           : 402 us @ 150
Throughput                    : 2730 kB/s
                              : 2 MB/s
                              : 2730 Msg/s
--------------------------------------------
--------------------------------------------
Messages received             : 1000
Message size received         : 2 kB
Message average latency       : 375 us
Message min latency           : 168 us @ 99
Message max latency           : 409 us @ 73
Throughput                    : 5332 kB/s
                              : 5 MB/s
                              : 2666 Msg/s
--------------------------------------------
--------------------------------------------
Messages received             : 1000
Message size received         : 4 kB
Message average latency       : 379 us
Message min latency           : 139 us @ 177
Message max latency           : 442 us @ 790
Throughput                    : 10528 kB/s
                              : 10 MB/s
                              : 2632 Msg/s
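As a reading aid for the log, the kB/s line is simply the message rate multiplied by the message size. A tiny sketch, with the values copied from the three runs above and a helper name made up for illustration:

```python
# Cross-check of the throughput lines in the log above:
# throughput [kB/s] = message rate [Msg/s] * message size [kB].

def throughput_kb_per_s(msgs_per_s: int, size_kb: int) -> int:
    """Return throughput in kB/s (hypothetical helper)."""
    return msgs_per_s * size_kb


# (message size in kB, message rate in Msg/s, reported kB/s), taken from the log
RUNS = [(1, 2730, 2730), (2, 2666, 5332), (4, 2632, 10528)]

if __name__ == "__main__":
    for size_kb, rate, reported in RUNS:
        computed = throughput_kb_per_s(rate, size_kb)
        print(f"{size_kb} kB: computed {computed} kB/s, reported {reported} kB/s")
```

The message rate stays nearly constant across the three sizes, so throughput grows almost linearly with message size in this range.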

@aposhian Thank you very much for your input. We will check that and keep you posted!
@rex-schilasky Thanks for the advice on how to run the tool. My text was misleading; we haven’t tested all the tools yet. But your tool also looks very promising.

Hi, you could check the computer’s resource usage, especially the CPU. You could also change the image size. I think you already know the reason, so try to control the message size and publishing rate of the communication program.