For resource-constrained systems, using ROS2 could be challenging. An application willing to use ROS2 must link against ROS2 libraries and afford the memory usage by entities like nodes/pubs/subs, and DDS entities.
The intention is to benchmark memory usage of ROS2 applications, and based on results, discuss strategies to reduce the memory usage.
To better isolate ROS2 from the DDS memory usage, we’ll be using StubDDS which has stubbed RMW APIS and should have a minimal memory footprint, while keeping basic functionality (like pub/sub with intra-process communication).
Objetive
We want to identify the ROS2 memory usage by:
- ROS2 shared libraries: RSS memory usage at startup.
- Memory footprint of ROS2 entities.
- How does memory scale with:
- Variable number of nodes
- Variable number of pubs/subs, using same or different messages
- Different messages sizes
- Analyze ways to reduce memory usage.
Test platform specs: Quad-core ARM 1.2 MHz - 433Mb RAM
Memory usage depends on the platform tested. x86 has the biggest memory usage. In this post, results belong to the platform specified above which had the lowest memory usage, so we can consider results as a the minimum memory ROS2 will use.
Memory tool used: pmap
Example usage: $ pmap -x <PID of application>
This tool provides information about memory metrics like RSS, VSZ used by each shared library loaded by the application.
ROS2 shared libraries: RSS memory usage
Base RSS memory
We want to obtain the RSS memory usage from ROS2 shared libraries at startup. But before we have to run an application not using ROS2, like the classic HelloWorld program, and measure the memory usage for the simplest app:
Memory usage by non-ROS2 shared libraries: ~1Mb
The total memory usage is ~1Mb, which is the “base” RSS memory usage and should not be taken into account when trying to get specific ROS2 memory usage.
The memory usage by these libraries doesn’t almost increase when adding ROS2 elements and creating bigger applications.
ROS2 base RSS memory
In some architectures memory is loaded from shared libraries into process space when they are first used. So to the helloworld previously used we’ll add just rclcpp::init() to make sure RCLCPP is loaded (and its dependencies).
- Helloworld (no link with ROS2 libs): ~1Mb.
- Application with only rclcpp::init(): ~3Mb (So, ~2Mb due to ROS2 shared libs).
- librclcpp.so maps 792 Kb into process memory. Biggest impact on memory (25% of total RSS).
We can see in the pie graph that 52 shared libraries are added by ROS2, most of them mapping 8Kb each into process space.
Minimum system: node with publisher and subscriber
- RSS=4Mb (~1Mb increase against app with rclcpp::init() only)
- librclcpp.so maps 1196 Kb into process memory. Biggest impact on memory (28.6% of total RSS).
- librclcpp.so sees an increase of 400Kb when creating entities for first time. It remains constant after, if the system gets bigger (same for all other libs).
- Memory used by the new entities is included in anon (anonymous memory) and as the process name memory_benchmark.
With this benchmark we can say that the base RSS memory for a ROS2 app is ~4Mb since this value, i.e. memory used by ROS2 and non-ROS2 shared libs, is maintained
when increasing the size of the system adding more nodes/pubs/subs.
When bigger systems are created, the extra memory belongs to the application and also shows in “anon” (anonymous) memory. Extra memory depends on the size and number of ROS2 entities.
The base memory will grow considerably if using a real DDS as CycloneDDS or FastRTPS, which is not taken into account here.
Short analysis about librclcpp.so
Clearly RCLCPP has a big impact on memory usage. A comparision on librclcpp.so section sizes compared to librcl.so shows why such an impact, the library is quite big and loading it into process space will have a big impact.
$ size -A librclcpp.so # This command shows section sizes of the library
Sections such as the .gnu.hash, .dynsym, .dynstr, .rela.dyn, .gnu.version, .gnu.version_r and .init must be available at run-time and loaded in physical memory partially or entirely by the run-time linker (corresponds to 215Kb on librclcpp.so ). The other sections are loaded when needed (lazy allocation).
Memory footprint of ROS2 entities
Here we analyze the memory usage by single entities:
- Node
- Publishers
- Subscription
- Messages
Procedure to obtain values and comments
By default when creating a node, 6 services/1pub/1sub are created whithin the node (used by ROS2 params). We modified rclcpp code to not create them:
Parameter services and parameter event publisher - Parameter event subscription
The approach taken to get memory usage by different entities consists on creating a large number of those, and measure the app memory usage after creation of each new entity. Allocators could allocate way more memory than needed for each entity, so it requires the creation of multiple entities until we see a new increase on RSS. Creating a large number of them allow us to obtain average values.
The memory usage by entity seems to be constant regardless of the amount of entities created (linear memory increase).
With these values we can get a formula for an approximate minimum usage of memory on a ROS2 system:
For example a system of 10 nodes like:
- Node
- 3 Pubs → MsgA ← 3 Subs
- 2 Pubs → MsgB ← 2 Subs
So there are 2 diff messages used in subscriptions, 2 diff messages used in publishers.
Nnodes = 10 ; Nsubs = 50 ; Npubs = 50 ; Nmsgs_used_by_subs = 2 ; Nmsgs_used_by_pubs = 2;
RSS ~= 4Mb +10 * 5Kb + 50 * 7.5Kb + 50 * 2.3Kb + 2 * 11.5Kb + 2 * 4.13Kb = 4667 Kb
Node size not constant when enabling ROS2 params
As mentioned before, we’ve removed the six services, publisher and subscription used by ROS2 parameters when creating every node:
Parameter services and parameter event publisher - Parameter event subscription
Without these entities, the average node size seems to be constant regardless of the amount of nodes (~5Kb), so the increase in memory is linear with the amount of nodes.
After re-enabling the creation of these entities, new nodes additions have a non-linear increase of memory.
The first nodes being created take ~50Kb each, but after creating 100 nodes, a new node uses almost 100Kb. As average, every new 60 nodes the node size increases 50Kb.
In the left plot below, we compare the total system RSS creating nodes with and without the parameter event entities. In the plot on the right, the average node size with respect to amount of nodes created.
Using different messages sizes
In the previous tests we used a small message (just an int) to be used on publishers and subscriptions. An average node with a pub/sub took ~15Kb RSS.
If we add 1Kb data field to the message, the average memory taken when creating a node/pub/sub was 20Kb.
MsgA: int;
MsgB: int + 1Kb array
- Node: Pub → MsgA ← Sub : ~15Kb average memory per node
- Node: Pub → MsgB ← Sub : ~20Kb average memory per node
So we can say that memory used by publishers/subscriptions also depend on the message being used.
Stub vs FastRTPS entiities
Of course the memory usage by entities is different if we use a real DDS (like FastRTPS) instead of the Stub DDS:
ROS2 memory when system scales
Here we compare:
- Shared libs RSS for only a HelloWorld
- Shared libs RSS when only rclcpp::init
- Shared libs RSS of a single node pub/sub system
- Shared libs RSS of 100 nodes each with a pub/sub, all using same message type.
- Shared libs RSS of 100 nodes each with a pub/sub using a different message type
The bars graph shows the total memory usage (RSS) by the different applications.
The pie graphs show the porportional RSS from each shared library for each of the tests.
The memory usage in the case of a single node/pub/sub system is what’s considered the base memory usage of a ROS2 app (middle pie graph). The size of the shared libraries RSS doesn’t increase when increasing the size of the system (adding nodes,pubs,subs,etc).
Increasing the size of the system (without creating new messages) as seen in the 4th pie, shows that extra memory goes to “anon” (memory allocated for nodes/pubs/subs/msgs).
Increasing the size of the system but creating also new messages as seen in the 5th pie, shows that much of the extra memory goes to the memory_benchmark
application, since now has to include each new message header file and they are used as template arguments for pubs/subs.
Possible approaches to reduce memory usage
We can divide efforts in reducing memory usage in two main areas:
- Reduce memory used by shared libraries
- Reduce memory used by ROS2 entities
Focusing on one or the other depends on the size of the system trying to fit in some platform. If the system is made up by hundreds of entities, better to focus on reducing size of entities.
On the contrary if the system is small, the focus should be better on reducing the “base” ROS2 memory usage.
Reduce memory used by shared libraries
Some ideas to reduce memory used by shared libraries are:
- Reduce size of librclcpp.so (since has the biggest footprint), by dividing it into modules (shared libs) which can be used or not at user will.
- Remove not needed dependencies: Some platforms mark as NEEDED many shared libs that are not really used at run-time.
- Link all libs statically, to reduce the start-up/run-time overhead and bookkeeping required to load shared libraries symbols at runtime. From the pie graphs we can see a good memory usage by dozens of small libraries (which at least use 4Kb each due minimum page size). Estimated 12% RSS reduction.
Reduce memory used by ROS2 entities
- Remove libstatistics, tracetools, etc. with compile flags. These libraries add extra pubs/subs to every entity created. Same with ROS2 params.
- Review what are entities made up of and see if they can be optimized.
- Messages seem to have an impact on memory usage. Review use of messages, see if can be optimized.