ROS2 speed

Hi Bernd,

Thank you for reporting this. I’m sorry things weren’t working well in this case, even after all of the things you tried. Thank you for taking the time and effort to create the careful writeup and the minimal test case that demonstrates the problem. That helps so much.

I was able to replicate your results, and then spent some time trying other approaches. The following sequence of actions seems to make it work nicely on Cyclone DDS with its out-of-the-box default configuration:

  1. change from a “row-major” message layout to a “column-major” one. In other words, instead of a list of small messages, try a single message with a “column” for each primitive, like uint16[] x, uint16[] y, etc. The reasoning is the same as between PointCloud and PointCloud2 in ROS: it simplifies things a lot for the serialization and deserialization systems.
  2. increase the maximum and default UDP buffer sizes for both the write and read buffers. This is generally required for DDS, because it hammers UDP very hard when dealing with large and/or fast messages. It’s good you already did this, because performance crashes at high data rates when those buffers fill up.
  3. use a C++ subscriber for the performance measurement rather than the Python-based system provided by the ros2 CLI tool.
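
As a rough illustration of step 1, here is a minimal sketch in plain C++ (no ROS dependencies) of turning a list of small records into per-field columns. The names Sample, SampleColumns, and to_columns are hypothetical and not taken from the test-case repository; the two-field layout is just an assumption based on the uint16[] x, uint16[] y example above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical "row-major" element: one small record per sample.
struct Sample {
  std::uint16_t x;
  std::uint16_t y;
};

// Hypothetical "column-major" message: one array per primitive field,
// mirroring a message definition with `uint16[] x` and `uint16[] y`.
struct SampleColumns {
  std::vector<std::uint16_t> x;
  std::vector<std::uint16_t> y;
};

// Convert a list of small records into a single message with per-field columns.
SampleColumns to_columns(const std::vector<Sample>& rows) {
  SampleColumns cols;
  cols.x.reserve(rows.size());
  cols.y.reserve(rows.size());
  for (const Sample& s : rows) {
    cols.x.push_back(s.x);
    cols.y.push_back(s.y);
  }
  return cols;
}
```

Each column is then a flat array of one primitive type, which is the property that makes it cheap for the serialization layer to handle compared with a sequence of nested messages.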

With those three steps, it now shows 100% message delivery at the full speed of 1000 Hz for messages up through 20,000 elements on my machine. When sending 5000-element messages, top shows CPU usage of 13% on the publisher and 25% on the subscriber. I am sure that with extra effort to increase fragment sizes and other DDS configuration games, things could get much better, but I didn’t try any of that.

You could also try using fixed-size arrays instead of dynamic-length arrays inside the message. Although that would make the downstream code a bit more complex to ignore the “unused” part of the buffer, some RMW implementations will then be able to use additional techniques to speed things up.
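
A minimal sketch of what that downstream handling might look like, again in plain C++ with no ROS dependencies. The capacity of 20000, the explicit count field, and the names FixedColumns and sum_valid_x are all assumptions made for illustration; in a message definition the fixed-size form would be written like uint16[20000] x.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical upper bound on the number of elements per message.
constexpr std::size_t kCapacity = 20000;

// Hypothetical fixed-size message: the buffers are always kCapacity long,
// and `count` says how many leading elements are actually valid.
struct FixedColumns {
  std::uint32_t count = 0;
  std::array<std::uint16_t, kCapacity> x{};
  std::array<std::uint16_t, kCapacity> y{};
};

// Example consumer: sum only the valid part of x, ignoring the unused tail.
std::uint64_t sum_valid_x(const FixedColumns& msg) {
  // Clamp in case a malformed message claims more elements than fit.
  const std::size_t n = std::min<std::size_t>(msg.count, kCapacity);
  const auto end = msg.x.begin() + static_cast<std::ptrdiff_t>(n);
  return std::accumulate(msg.x.begin(), end, std::uint64_t{0});
}
```

The extra complexity is limited to carrying the count along and clamping it before every read of the buffers.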

You could also try to compose the driver node in the same process as whatever will receive the messages and process them, because then you can bypass a lot of comms layers.

I’ll make a PR to your test-case repository with my hacky implementation of the ideas of (1) and (3) above.

Cheers!
Morgan
