ROS2 Memory Usage (no DDS)

For resource-constrained systems, using ROS2 could be challenging. An application willing to use ROS2 must link against ROS2 libraries and afford the memory usage by entities like nodes/pubs/subs, and DDS entities.

The intention is to benchmark memory usage of ROS2 applications, and based on results, discuss strategies to reduce the memory usage.

To better isolate ROS2 from the DDS memory usage, we’ll be using StubDDS which has stubbed RMW APIS and should have a minimal memory footprint, while keeping basic functionality (like pub/sub with intra-process communication).

Objetive

We want to identify the ROS2 memory usage by:

  • ROS2 shared libraries: RSS memory usage at startup.
  • Memory footprint of ROS2 entities.
  • How does memory scale with:
    • Variable number of nodes
    • Variable number of pubs/subs, using same or different messages
    • Different messages sizes
  • Analyze ways to reduce memory usage.

Test platform specs: Quad-core ARM 1.2 MHz - 433Mb RAM

Memory usage depends on the platform tested. x86 has the biggest memory usage. In this post, results belong to the platform specified above which had the lowest memory usage, so we can consider results as a the minimum memory ROS2 will use.

Memory tool used: pmap

Example usage: $ pmap -x <PID of application>
This tool provides information about memory metrics like RSS, VSZ used by each shared library loaded by the application.

ROS2 shared libraries: RSS memory usage

Base RSS memory

We want to obtain the RSS memory usage from ROS2 shared libraries at startup. But before we have to run an application not using ROS2, like the classic HelloWorld program, and measure the memory usage for the simplest app:

Memory usage by non-ROS2 shared libraries: ~1Mb

The total memory usage is ~1Mb, which is the “base” RSS memory usage and should not be taken into account when trying to get specific ROS2 memory usage.

The memory usage by these libraries doesn’t almost increase when adding ROS2 elements and creating bigger applications.

ROS2 base RSS memory

In some architectures memory is loaded from shared libraries into process space when they are first used. So to the helloworld previously used we’ll add just rclcpp::init() to make sure RCLCPP is loaded (and its dependencies).

  • Helloworld (no link with ROS2 libs): ~1Mb.
  • Application with only rclcpp::init(): ~3Mb (So, ~2Mb due to ROS2 shared libs).
  • librclcpp.so maps 792 Kb into process memory. Biggest impact on memory (25% of total RSS).

We can see in the pie graph that 52 shared libraries are added by ROS2, most of them mapping 8Kb each into process space.

Minimum system: node with publisher and subscriber

  • RSS=4Mb (~1Mb increase against app with rclcpp::init() only)
  • librclcpp.so maps 1196 Kb into process memory. Biggest impact on memory (28.6% of total RSS).
  • librclcpp.so sees an increase of 400Kb when creating entities for first time. It remains constant after, if the system gets bigger (same for all other libs).
  • Memory used by the new entities is included in anon (anonymous memory) and as the process name memory_benchmark.

With this benchmark we can say that the base RSS memory for a ROS2 app is ~4Mb since this value, i.e. memory used by ROS2 and non-ROS2 shared libs, is maintained
when increasing the size of the system adding more nodes/pubs/subs.

When bigger systems are created, the extra memory belongs to the application and also shows in “anon” (anonymous) memory. Extra memory depends on the size and number of ROS2 entities.

The base memory will grow considerably if using a real DDS as CycloneDDS or FastRTPS, which is not taken into account here.

Short analysis about librclcpp.so

Clearly RCLCPP has a big impact on memory usage. A comparision on librclcpp.so section sizes compared to librcl.so shows why such an impact, the library is quite big and loading it into process space will have a big impact.

$ size -A librclcpp.so # This command shows section sizes of the library

Sections such as the .gnu.hash, .dynsym, .dynstr, .rela.dyn, .gnu.version, .gnu.version_r and .init must be available at run-time and loaded in physical memory partially or entirely by the run-time linker (corresponds to 215Kb on librclcpp.so ). The other sections are loaded when needed (lazy allocation).

Memory footprint of ROS2 entities

Here we analyze the memory usage by single entities:

  • Node
  • Publishers
  • Subscription
  • Messages

Procedure to obtain values and comments

By default when creating a node, 6 services/1pub/1sub are created whithin the node (used by ROS2 params). We modified rclcpp code to not create them:

Parameter services and parameter event publisher - Parameter event subscription

The approach taken to get memory usage by different entities consists on creating a large number of those, and measure the app memory usage after creation of each new entity. Allocators could allocate way more memory than needed for each entity, so it requires the creation of multiple entities until we see a new increase on RSS. Creating a large number of them allow us to obtain average values.

The memory usage by entity seems to be constant regardless of the amount of entities created (linear memory increase).


image

With these values we can get a formula for an approximate minimum usage of memory on a ROS2 system:

For example a system of 10 nodes like:

  • Node
    • 3 Pubs → MsgA ← 3 Subs
    • 2 Pubs → MsgB ← 2 Subs

So there are 2 diff messages used in subscriptions, 2 diff messages used in publishers.

Nnodes = 10 ; Nsubs = 50 ; Npubs = 50 ; Nmsgs_used_by_subs = 2 ; Nmsgs_used_by_pubs = 2;

RSS ~= 4Mb +10 * 5Kb + 50 * 7.5Kb + 50 * 2.3Kb + 2 * 11.5Kb + 2 * 4.13Kb = 4667 Kb

Node size not constant when enabling ROS2 params

As mentioned before, we’ve removed the six services, publisher and subscription used by ROS2 parameters when creating every node:

Parameter services and parameter event publisher - Parameter event subscription

Without these entities, the average node size seems to be constant regardless of the amount of nodes (~5Kb), so the increase in memory is linear with the amount of nodes.

After re-enabling the creation of these entities, new nodes additions have a non-linear increase of memory.

The first nodes being created take ~50Kb each, but after creating 100 nodes, a new node uses almost 100Kb. As average, every new 60 nodes the node size increases 50Kb.

In the left plot below, we compare the total system RSS creating nodes with and without the parameter event entities. In the plot on the right, the average node size with respect to amount of nodes created.

Using different messages sizes

In the previous tests we used a small message (just an int) to be used on publishers and subscriptions. An average node with a pub/sub took ~15Kb RSS.

If we add 1Kb data field to the message, the average memory taken when creating a node/pub/sub was 20Kb.

MsgA: int;
MsgB: int + 1Kb array

  • Node: Pub → MsgA ← Sub : ~15Kb average memory per node
  • Node: Pub → MsgB ← Sub : ~20Kb average memory per node

So we can say that memory used by publishers/subscriptions also depend on the message being used.

Stub vs FastRTPS entiities
Of course the memory usage by entities is different if we use a real DDS (like FastRTPS) instead of the Stub DDS:

ROS2 memory when system scales

Here we compare:

  • Shared libs RSS for only a HelloWorld
  • Shared libs RSS when only rclcpp::init
  • Shared libs RSS of a single node pub/sub system
  • Shared libs RSS of 100 nodes each with a pub/sub, all using same message type.
  • Shared libs RSS of 100 nodes each with a pub/sub using a different message type

The bars graph shows the total memory usage (RSS) by the different applications.


The pie graphs show the porportional RSS from each shared library for each of the tests.

The memory usage in the case of a single node/pub/sub system is what’s considered the base memory usage of a ROS2 app (middle pie graph). The size of the shared libraries RSS doesn’t increase when increasing the size of the system (adding nodes,pubs,subs,etc).
Increasing the size of the system (without creating new messages) as seen in the 4th pie, shows that extra memory goes to “anon” (memory allocated for nodes/pubs/subs/msgs).

Increasing the size of the system but creating also new messages as seen in the 5th pie, shows that much of the extra memory goes to the memory_benchmark application, since now has to include each new message header file and they are used as template arguments for pubs/subs.

Possible approaches to reduce memory usage

We can divide efforts in reducing memory usage in two main areas:

  • Reduce memory used by shared libraries
  • Reduce memory used by ROS2 entities

Focusing on one or the other depends on the size of the system trying to fit in some platform. If the system is made up by hundreds of entities, better to focus on reducing size of entities.

On the contrary if the system is small, the focus should be better on reducing the “base” ROS2 memory usage.

Reduce memory used by shared libraries

Some ideas to reduce memory used by shared libraries are:

  • Reduce size of librclcpp.so (since has the biggest footprint), by dividing it into modules (shared libs) which can be used or not at user will.
  • Remove not needed dependencies: Some platforms mark as NEEDED many shared libs that are not really used at run-time.
  • Link all libs statically, to reduce the start-up/run-time overhead and bookkeeping required to load shared libraries symbols at runtime. From the pie graphs we can see a good memory usage by dozens of small libraries (which at least use 4Kb each due minimum page size). Estimated 12% RSS reduction.

Reduce memory used by ROS2 entities

  • Remove libstatistics, tracetools, etc. with compile flags. These libraries add extra pubs/subs to every entity created. Same with ROS2 params.
  • Review what are entities made up of and see if they can be optimized.
  • Messages seem to have an impact on memory usage. Review use of messages, see if can be optimized.
25 Likes

I completely agree about wanting to be able to remove or disable these extra messages/pubs/subs/etc. for resource-constrained systems.

Note that tracetools does not use ROS 2 entities. It exists for this exact purpose: its main goal is to be able to offer similar-ish execution information with way lower/minimal overhead thanks to tracing. Also, you can already completely remove it with --cmake-args -DTRACETOOLS_DISABLED=ON :smiley: (although I realize that the README doesn’t explicitly say that)

6 Likes

Thanks for the report @mauropasse, I have found this super interesting.

In the micro-ROS team, we usually deal with this kind of memory management situation.

In our case, we have developed a ROS2 C99 user API: RCLC. This layer exists on top of RCL and its main objective is to be memory-aware and real-time (CC: @JanStaschulat). Also, this is the default API for micro-ROS, so we usually built it in MCU with less than 150 KB of program memory and around 64 KB of RAM.

Some memory profiling of the whole micro-ROS stack can be found here, but I think that it can be interesting to profile the RCLC using StubDDS to isolate our user layer (CC: @mamerlan).

Also, regarding the possibility of disabling ROS2 features we use:

Most of those patches are micro-ROS specific but in our opinion, some of them could be considered to be integrated into ROS2 mainline of RCL and RCUtils repos (CC: @ralph-lange).


  • Link all libs statically, to reduce the start-up/run-time overhead and bookkeeping required to load shared libraries symbols at runtime. From the pie graphs we can see a good memory usage by dozens of small libraries (which at least use 4Kb each due minimum page size). Estimated 12% RSS reduction.

Just as a comment for this, we have prepared a fork of rosidl_typesupport_c that allows using multiple supports but having all of them statically linked. Just tunning the typesupport dispatcher and generating the symbols array at “code generation time”. This is useful for embedded and micro-ROS but maybe also for environments where static linkage is required.

8 Likes

I think most of these can be disabled via NodeOptions. I have disabled the ones related with parameters in the past.

1 Like

Thanks @mauropasse, that’s a very interesting analysis!

One thing which would be very interesting to know is how much of this memory usage is shared between nodes. Usually, shared libraries have the advantage that they need to be loaded only once and much of their data can be shared between processes. Of course, all data that is specific to your application cannot be shared. Therefore, seeing how much is shared and how much counts again for each node you start would be very helpful to know.

2 Likes

I’m creating a “Foxy” ROS2bot on a Raspberry Pi3B+ (64-bit ARM Quad-Core 1.4GHz 1GB RAM) which has 680MB free RAM running Ubuntu 20.04 64-bit Server.

I expect to have Nodes for:

  • Left/Right Motor
  • Left/Right Motor Encoders
  • BNO055 IMU
  • ToF Distance Sensor
  • Pan Servo for ToF Distance Sensor
  • YDLIDAR
  • PiCamera (only publish 320x240 images for off-board resources - no onboard image subscribers)
  • Health Node pubs: (battery voltage, cpu load, cpu temp …)
  • Baggers for sensor and health data

And nodes not characterized in “ROS2 Memory Usage”:

  • Encoder/IMU fusion localization service
  • Path to Goal w/obstacle avoidance re-routing action
  • Stationary LIDAR scan localization against reference map (w/o map development)

Do I read the charts and the report correctly, that my bot with 680MB free RAM should not be memory constrained?

(I’m guessing the bot will be processing constrained though.)