Hi,
Recently I have been investigating the performance of ROS2.
What concerns me the most is its memory usage and how it has been increasing across distributions.
I would like to discuss these results with you and hear what you think of them.
TL;DR
ROS2 nodes always take at least 20 MB of RAM in Crystal, and this amount has increased compared to previous distributions.
Moreover, the virtual memory is extremely high in almost all cases.
FastRTPS behaves really well in single-node cases, but OpenSplice scales better with more nodes in the same process.
I’m measuring memory using the following commands (ps reports both values in KiB):
Physical RAM ---> ps -o rss --pid $PROCESS_ID
Virtual RAM ---> ps -o vsz --pid $PROCESS_ID
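For anyone who wants to repeat the measurement, here is a minimal sketch that wraps the two ps calls above in Python. The function names (parse_kib, kib_to_mb, sample) are made up for this example, and it assumes a procps-style ps (Linux) where a trailing "=" after the field name suppresses the header line. It also assumes 1 MB = 1024 KiB when converting, which may not match exactly how the figures in this post were rounded.

```python
import subprocess
import sys


def parse_kib(ps_output: str) -> int:
    """Parse the single number printed by `ps -o rss=` or `ps -o vsz=`."""
    return int(ps_output.strip())


def kib_to_mb(kib: int) -> float:
    """Convert KiB to MB (assumption: 1 MB = 1024 KiB)."""
    return kib / 1024.0


def sample(pid: int) -> dict:
    """Return physical (rss) and virtual (vsz) memory of a process in MB."""
    usage = {}
    for field in ("rss", "vsz"):
        # Same command as in the post; "=" drops the column header.
        out = subprocess.run(
            ["ps", "-o", f"{field}=", "--pid", str(pid)],
            capture_output=True, text=True, check=True,
        ).stdout
        usage[field] = kib_to_mb(parse_kib(out))
    return usage


# Only runs when called as: python3 script.py <pid>
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    usage = sample(int(sys.argv[1]))
    print(f"physical: {usage['rss']:.1f} MB, virtual: {usage['vsz']:.1f} MB")
```

Pass the PID of the running node as the only argument. The values are a single snapshot, so it is worth sampling after the node has finished discovery and settled.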
In an old thread from April 2017, "Fast-RTPS Memory Usage - Please Help", the amount of memory used by simple nodes was reported as:
talker: 4.7MB
talker with 40 publishers: 8.7MB
If I run some simple nodes, I get the following results:
| Node | Distribution | DDS | Physical RAM | Virtual RAM |
|---|---|---|---|---|
| publisher_lambda | Bouncy | FastRTPS | 17MB | 320MB |
| publisher_lambda | Crystal | FastRTPS | 20MB | 420MB |
| publisher_lambda | Crystal | OpenSplice | 24MB | 1.7GB |
The first thing we note is that the amount of physical memory used has increased a lot since that old Beta release: from under 5MB to between 17MB and 24MB.
Moreover, the virtual memory requirement is extremely high.
When I add publishers/subscriptions inside the same node, memory usage seems to scale pretty well.
| Scenario | Distribution | DDS | Physical RAM | Virtual RAM |
|---|---|---|---|---|
| 1 node subscribing to 1 header topic | Crystal | FastRTPS | 22MB | 500MB |
| 1 node subscribing to 2 header topics | Crystal | FastRTPS | 22MB | 500MB |
| 1 node subscribing to 40 header topics | Crystal | FastRTPS | 24MB | 500MB |
| 1 node subscribing to 1 header topic | Crystal | OpenSplice | 27MB | 1.8GB |
| 1 node subscribing to 2 header topics | Crystal | OpenSplice | 27MB | 1.8GB |
| 1 node subscribing to 40 header topics | Crystal | OpenSplice | 29MB | 1.8GB |
Adding more publishers/subscribers of the same type to a node makes almost no difference.
All these nodes were using very small messages. How do things change if we use big messages?
For example, with 2MB or 4MB fields? Or with subscriptions to two different big topics?
| Scenario | Distribution | DDS | Physical RAM | Virtual RAM |
|---|---|---|---|---|
| 1 node subscribing to one 4MB topic | Crystal | FastRTPS | 22MB | 1.1GB |
| 1 node subscribing to a 4MB and a 2MB topic | Crystal | FastRTPS | 36MB | 1.38GB |
| 1 node subscribing to 40 4MB topics | Crystal | FastRTPS | 40MB | 31GB |
| 1 node subscribing to one 4MB topic | Crystal | OpenSplice | 27MB | 2.1GB |
| 1 node subscribing to a 4MB and a 2MB topic | Crystal | OpenSplice | 36MB | 2.1GB |
| 1 node subscribing to 40 4MB topics | Crystal | OpenSplice | 29MB | 16GB |
With both DDS implementations tested, the message type and size do not noticeably affect the physical memory allocated for a single subscription.
On the other hand, the virtual memory is much larger than with small messages.
Adding a subscription to a different topic increases the allocated physical memory. OpenSplice does not require any additional virtual memory for it.
The physical memory overhead is about 64% for FastRTPS (22MB to 36MB) and about 33% for OpenSplice (27MB to 36MB).
We see differences when looking at nodes creating 40 subscriptions to different big topics.
In this case FastRTPS almost doubles its physical memory (22MB to 40MB), while OpenSplice has the same overhead as with 40 header topics, as shown in the previous table.
As for virtual memory, FastRTPS shows a whopping 31GB, but OpenSplice is also very high at 16GB.
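To put these virtual memory figures in perspective, a 32-bit pointer can address at most 2^32 bytes (4 GiB), and the part usable by a process is often smaller because the kernel reserves some of it. The sketch below simply compares the table values against that 4 GiB ceiling. It assumes the "GB" figures can be read as GiB, and it says nothing about whether a 32-bit build would actually try to reserve the same amount; the names are made up for this example.

```python
# Virtual memory per scenario, copied from the large-message table (Crystal).
# Assumption: the "GB" figures are treated as GiB for this comparison.
VIRTUAL_GIB = {
    ("FastRTPS", "one 4MB topic"): 1.1,
    ("FastRTPS", "4MB and 2MB topics"): 1.38,
    ("FastRTPS", "40 4MB topics"): 31.0,
    ("OpenSplice", "one 4MB topic"): 2.1,
    ("OpenSplice", "4MB and 2MB topics"): 2.1,
    ("OpenSplice", "40 4MB topics"): 16.0,
}

# Upper bound of a 32-bit address space: 2**32 bytes expressed in GiB.
LIMIT_32BIT_GIB = 2**32 / 2**30


def exceeds_32bit(virtual_gib: float) -> bool:
    """True if the reservation cannot fit in a 32-bit address space."""
    return virtual_gib > LIMIT_32BIT_GIB


# Scenarios whose measured virtual size is above the 4 GiB ceiling.
too_big = [key for key, gib in VIRTUAL_GIB.items() if exceeds_32bit(gib)]

for dds, scenario in too_big:
    print(f"{dds}, {scenario}: {VIRTUAL_GIB[(dds, scenario)]} GiB > "
          f"{LIMIT_32BIT_GIB} GiB")
```

With the numbers above, only the two 40-topic scenarios go over the limit; every other row stays below 4 GiB.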
Next, I tried creating systems with more than one node.
| Scenario | Distribution | DDS | Physical RAM | Virtual RAM |
|---|---|---|---|---|
| 1 node publishing 1 header topic | Crystal | FastRTPS | 22MB | 500MB |
| 20 nodes publishing 1 header topic | Crystal | FastRTPS | 354MB | 3.4GB |
| 1 node publishing 1 header topic | Crystal | OpenSplice | 25MB | 1.8GB |
| 2 nodes publishing 1 header topic | Crystal | OpenSplice | 28MB | 2.1GB |
| 3 nodes publishing 1 header topic | Crystal | OpenSplice | 29MB | 2.17GB |
| 10 nodes publishing 1 header topic | Crystal | OpenSplice | 40MB | 2.24GB |
| 20 nodes publishing 1 header topic | Crystal | OpenSplice | 57MB | 2.33GB |
Here we see big differences between the two DDS implementations.
FastRTPS uses less memory in a system with a single node.
However, OpenSplice scales much better, using only about 1/6 of the physical memory (57MB vs 354MB) when we have 20 nodes.
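The scaling difference is easier to see as a cost per additional node. The short sketch below derives it from the 1-node and 20-node rows of the table; the helper name is made up for this example, and it assumes memory grows roughly linearly between those two points, which the OpenSplice rows for 2, 3 and 10 nodes only loosely support.

```python
def per_extra_node_mb(one_node_mb: float, total_mb: float, nodes: int) -> float:
    """Average physical memory added by each node after the first one."""
    return (total_mb - one_node_mb) / (nodes - 1)


# Physical RAM figures taken from the multi-node table (Crystal).
fastrtps = per_extra_node_mb(one_node_mb=22, total_mb=354, nodes=20)
opensplice = per_extra_node_mb(one_node_mb=25, total_mb=57, nodes=20)

# Total physical memory ratio between the two DDS at 20 nodes.
ratio_at_20 = 354 / 57

print(f"FastRTPS:   {fastrtps:.1f} MB per extra node")    # 17.5
print(f"OpenSplice: {opensplice:.1f} MB per extra node")  # 1.7
print(f"FastRTPS uses {ratio_at_20:.1f}x the memory at 20 nodes")  # 6.2
```

So each extra FastRTPS node costs nearly as much as the first one, while each extra OpenSplice node adds less than 2MB on average.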
What do you think of these results?
Will the memory requirements of ROS2 keep increasing in the next releases?
In your opinion, how does this fit with the use of ROS2 on embedded platforms, which may be restricted to a 32-bit address space?