
ROS 2 for Consumer Robotics

Hi all,
I’m Alberto, from iRobot.
As you may know, my colleague Juan and I gave a presentation at ROSCon about using ROS 2 in consumer robotics applications.
If you missed it, you can find it here

The purpose of our talk was to discuss a performance gap in ROS 2 that was preventing it from being used effectively on a particular class of platforms:
we thought that Linux-based, low-cost, embedded platforms (such as the Raspberry Pi) were not being given enough consideration.

During our talk we highlighted the most relevant problems of the current (Dashing) release and we suggested some improvements.
All the experiments presented in the talk and in this blog post have been performed using our open-source performance evaluation framework. It's a very simple framework that measures some fundamental performance metrics; its distinguishing feature is that it lets you simulate an arbitrarily complex ROS 2 system described in a JSON file.
You can find it here
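
To give a rough idea of what describing a system through a JSON file can look like, here is a minimal Python sketch. The node names, topic names and field names (`publishers`, `period_ms`, `msg_size_bytes`, ...) are assumptions made up for illustration; they are not the framework's actual schema, so check its README for the real format.

```python
import json

# Hypothetical topology description. All names and fields below are
# illustrative assumptions, not the real schema of the framework.
TOPOLOGY_JSON = """
{
  "nodes": [
    {"name": "sensor",
     "publishers": [{"topic": "scan", "period_ms": 10, "msg_size_bytes": 4096}]},
    {"name": "planner",
     "subscribers": [{"topic": "scan"}],
     "publishers": [{"topic": "cmd", "period_ms": 50, "msg_size_bytes": 64}]},
    {"name": "driver",
     "subscribers": [{"topic": "cmd"}]}
  ]
}
"""


def summarize(topology):
    """Return (node count, set of topic names, published bytes per second)."""
    topics = set()
    bytes_per_second = 0.0
    for node in topology["nodes"]:
        for pub in node.get("publishers", []):
            topics.add(pub["topic"])
            # A publisher with period P ms sends 1000 / P messages per second.
            bytes_per_second += pub["msg_size_bytes"] * 1000.0 / pub["period_ms"]
        for sub in node.get("subscribers", []):
            topics.add(sub["topic"])
    return len(topology["nodes"]), topics, bytes_per_second


topology = json.loads(TOPOLOGY_JSON)
node_count, topics, rate = summarize(topology)
print(node_count, sorted(topics), rate)
```

The point of keeping the topology in data rather than in code is that the same benchmark binary can be rerun against a different system just by swapping the file.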

We want to thank all of you for the interest you showed in the problems that we presented and we are extremely happy to see all the work that is going on to address these issues.

First of all, we can have a look at the default performance of the current ROS 2 master.

Performance is already a lot better than what we measured for Dashing: CPU usage and latency have improved, and RAM usage is down by ~40 MB. We have no more "too late" or "lost" messages (we are in a single-process scenario, so this is highly desirable!)

However, what is really interesting is all the work that is currently going on and will be part of the next releases:

  • CycloneDDS
    Refactor of the serialization procedure: this made the code cleaner and it reduced latency and CPU usage, especially for big messages. (results here)
    This has been recently merged to master.
    Improving the deserialization procedure will be the next step and many more features are coming since this DDS has been added to the list of ROS 2 stable sources.

  • FastRTPS
    Substantial improvements on the memory side: previously, most memory allocation happened at startup to target real-time use cases, while now the default behavior will be more memory-efficient in scenarios where real-time constraints are not present.
    The RAM usage in our benchmark goes from 116 MB to 33 MB.
    Part of this is already available in master, while the rest will be ready for the upcoming Eloquent release.

  • RMW Iceoryx
    A new RMW, completely based on a shared memory layer in order to ensure the best performance for inter-process communication.
    This is already available and it's receiving a lot of interest and support.

  • Static Executor
    A first step towards the refactor of executors in rclcpp. This new executor performs best when most of the nodes are already available at startup, but it is still able to recognize new publishers and subscriptions.
    Unfortunately, it has not been updated to the current master, so I haven't been able to test it extensively, but the CPU usage definitely looks better!
    It’s targeting the next ROS 2 release, Foxy.
    SingleThreadedExecutor creates a high CPU overhead in ROS 2

With all this work going on, I think we can say that if Dashing and Eloquent were the releases that added most of the required features to ROS 2, then the next release, Foxy, will bring a big improvement in ROS 2 performance!

Let’s keep going in this direction, to make ROS 2 a success!


Hi Alberto,

As you said, Fast RTPS and rmw_fastrtps default settings are designed for real-time behaviour:

1.- Static allocations: We allocate some memory at startup to avoid dynamic allocations. While this is good for many applications, it is not if you are looking for minimum memory usage.

2.- Async Publishing: The user thread does not publish the message directly; it copies the data to a buffer, and a middleware thread does the actual publication. This is good for real-time determinism, as the user thread returns immediately, but it is not good if you are looking for minimum latency.
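
To make the sync/async trade-off in point 2 concrete, here is a small Python sketch of the two publishing modes. It is a conceptual model only: the class names and the `send` helper are hypothetical and do not correspond to the Fast RTPS API, and the `sleep` call merely stands in for the cost of the real transport write.

```python
import queue
import threading
import time


def send(message, delivered):
    """Stand-in for the transport write; the sleep models its cost."""
    time.sleep(0.01)
    delivered.append(message)


class SyncPublisher:
    """The caller's thread performs the send itself."""

    def __init__(self, delivered):
        self.delivered = delivered

    def publish(self, message):
        send(message, self.delivered)  # returns only after the send is done


class AsyncPublisher:
    """The caller copies the data into a buffer; a worker thread sends it."""

    def __init__(self, delivered):
        self.delivered = delivered
        self.buffer = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def publish(self, message):
        self.buffer.put(bytes(message))  # copy, then return immediately

    def _run(self):
        while True:
            message = self.buffer.get()
            if message is None:  # sentinel used by close()
                break
            send(message, self.delivered)

    def close(self):
        self.buffer.put(None)
        self.worker.join()


sync_out, async_out = [], []
sync_pub = SyncPublisher(sync_out)
async_pub = AsyncPublisher(async_out)

sync_pub.publish(b"hello")
sync_done_on_return = len(sync_out) == 1  # already delivered when publish returns

async_pub.publish(b"hello")  # delivery happens later, on the worker thread
async_pub.close()            # wait for the worker to drain the buffer
print(sync_done_on_return, sync_out, async_out)
```

In the async case the caller pays only for the copy and the enqueue, which keeps its timing predictable, but every message takes an extra hop through the buffer and a thread switch before it reaches the wire. That extra hop is the latency cost mentioned above.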

Also, the Fast RTPS version used in Dashing did not have an intra-process mechanism.

Therefore in your initial comparison, you were comparing:

  • Fast RTPS: Static Allocations + Async Pub + Loopback (Dashing)
  • Cyclone DDS: Dynamic Allocations + Sync Pub + Intraprocess (Latest)

But if you set up Fast RTPS with the recommended settings for this case and use the latest version, you get:

This experiment includes the whitelist mechanism you recommended in your presentation (10 MB less memory usage). A lot better, and matching your requirements.

The documentation on how to set up rmw_fastrtps is in the README of the rmw:

It is also worth noting that ROS2 is creating a DDS participant per publication or subscription, leading to a lot of participants. That is not the recommended use of DDS. For the next ROS2 release this behaviour is going to be fixed:

We did an experiment using your framework with an equivalent topology, but just a single DDS participant. In that case:

With 1-to-1 mapping, both the memory and the CPU usage improve a lot, as it is the recommended way to use DDS.

How to Reproduce the results:

Also, to easily reproduce these results and the different experiments, we created a complete Rosject:

Important Note: the iRobot scenario uses a Raspberry Pi with Raspbian. The rosject creates a cloud instance running Linux, where the memory page size is bigger than on a Raspberry Pi, so the memory measurements are higher for every configuration, but it is still useful for understanding the differences.
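
As a rough illustration of why page size shifts the memory numbers, the Python sketch below rounds an allocation up to whole pages. The 4 KiB and 64 KiB page sizes and the 5000-byte allocation are example values chosen for illustration; they are not measurements from the Raspberry Pi or from the cloud instance.

```python
import mmap


def pages_needed(num_bytes, page_size):
    """Number of whole pages required to hold num_bytes (ceiling division)."""
    return -(-num_bytes // page_size)


def resident_bytes(num_bytes, page_size):
    """Memory actually occupied once the allocation is rounded up to pages."""
    return pages_needed(num_bytes, page_size) * page_size


# Page size of the machine running this script.
page_size = mmap.PAGESIZE
print("page size on this machine:", page_size)

# The same 5000-byte allocation under two example page sizes.
for example_page_size in (4 * 1024, 64 * 1024):
    print(example_page_size, resident_bytes(5000, example_page_size))
```

Because every touched page counts in full, a process that makes many small allocations looks heavier on a system with larger pages even though it is running the same code.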

In conclusion, ROS2 with Fast RTPS is highly configurable, making ROS2 fast and reliable for very different use cases. For Foxy we will focus on performance, so, as you predict, it will be even better.

More Resources:

For a how-to on changing the basic settings: (readme)
And for detailed documentation of all the available options, see: