ROS 2 for Consumer Robotics

Hi all,
I’m Alberto, from iRobot.
As you may know, my colleague Juan and I gave a presentation at ROSCon about using ROS 2 in consumer robotics application.
If you missed it, you can find it here iRobot-ROSCON.pdf | DocDroid

The purpose of our talk was to discuss about a performance gap in ROS 2 that was preventing it from being effectively used on a particular class of platforms:
we thought that Linux-based, low cost, embedded platforms (as RaspberryPi) were not taken enough into consideration.

During our talk we highlighted the most relevant problems of the current (Dashing) release and we suggested some improvements.
All the experiments presented in the talk or in this blog post have been performed using our open-source performance evaluation framework: it’s a very simple framework that measures some fundamental performance metrics; its peculiarity is that it allows to simulate any type of arbitrarily complex ROS 2 system through a JSON file.
You can find it here GitHub - irobot-ros/ros2-performance: Framework to evaluate peformance of ROS 2.

We want to thank all of you for the interest you showed in the problems that we presented and we are extremely happy to see all the work that is going on to address these issues.

First of all, we can have a look at the default performance of the current ROS 2 master.

The performance are already a lot better than what we measured for Dashing, in particular the CPU usage and the latency improved and we have a reduction of ~40 Mb of RAM. We have no more “too late” or “lost” messages (we are in a single process scenario, so this is highly desirable!)

However, what is really interesting is all the work that is currently going on and will be part of the next releases

With all this work going on, I think that we can say that if Dashing and Eloquent were the releases that added to ROS 2 most of the required features, with the next release, Foxy, the ROS 2 performances will get a big improvement!

Let’s keep going in this direction, to make ROS 2 a success!

17 Likes

Hi Alberto,

As you said, Fast RTPS and rmw_fastrtps default settings are designed for real-time behaviour:

1.- Static allocations: We allocate some memory at startup to avoid dynamic allocations. While this is good for many applications, it is not if you are looking for minimum memory usage.

2.- Async Publishing: The user thread is not publishing directly the message but copies the data to a buffer, and a middleware thread does the actual publication. This is good for real-time determinism as the user thread returns immediately. But is not good if you are looking for minimum latency.

Also, the Fast RTPS for dashing didn’t count with an intra-process mechanism.

Therefore in your initial comparison, you were comparing:

  • Fast RTPS: Static Allocations + Async Pub + Loopback (Dashing)
  • Cyclone DDS: Dynamic Allocations + Sync Pub + Intraprocess (Latest)

But, if you setup Fast RTPS with the recommended settings for this case and use the latest version, you get:

This experiment includes the whitelist mechanism you recommended in your presentation (10 mb less of memory usage). A lot better, and matching your requirements.

The documentation to setup the rmw of fast RTPS is in the readme of the rmw: GitHub - ros2/rmw_fastrtps: Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.

Also is worth to comment ROS2 is creating a DDS participant per publication or subscription, leading to a lot of participants. That is not the recommended use of DDS, and in this case leads to a lot of participants. For the next ROS2 release this behaviour is going to be fixed:

We did an experiment using your framework with an equivalent topology, but just a single DDS participant. In that case:

With 1-to-1 mapping, both the memory and the CPU usage improve a lot, as it is the recommended way to use DDS.

How to Reproduce the results:

Also, to easily reproduce these results and the different experiments, we created a complete Rosject:

https://rds.theconstructsim.com/l/f196e4b/

Important Note: iRobot scenario uses a Raspberry Pi with raspbian. The rosject creates a cloud instance using linux, and the memory page size is bigger than in a raspberry PI, so the memory measures are higher for any configuration, but it is useful to understand the differences.

As conclusion, ROS2 with Fast RTPS is highly configurable, making ROS2 fast and reliable for very different cases, and for Foxy we will focus on performance, so as you predict, it will be even better.

More Resources:

For a how-to on how to change the basics:
GitHub - ros2/rmw_fastrtps: Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS. (readme)
And for detailed documentation of all the available options, see:
https://fast-rtps.docs.eprosima.com/en/latest/xmlprofiles.html#

4 Likes