Fast-RTPS Memory Usage - Please Help

Hello,

I am building a DDS node that will interact with a ROS2 node. I have found that the Fast-RTPS DDS middleware is using what I consider a very excessive amount of memory.

Running the HelloWorld example allocated 404,340 KB of memory. This seems extremely large for such a simple example.

USER   PID %CPU %MEM    VSZ   RSS TTY   STAT START TIME COMMAND
tco  29836  0.0  0.1 404340 10320 pts/0 Sl   15:41 0:01 ./HelloWorldExample publisher

When I build out my actual application the VSZ gets as large as 3 Gigs. Each publisher and subscriber that my application creates adds Megs to the memory usage even if the actual data is only a single integer.

Is this level of memory usage expected, or is it possible to get Fast-RTPS running in under 40 megs with approximately 20 topics involved? My embedded target has 256 Megs of RAM total.

Thank you for any help or advice you can provide.

nm.

@NotMe that sounds like a bug, but I’m unable to reproduce it, at least on my macOS machine (I haven’t tried Linux yet).

If I run talker or listener I get a steady 4.7 MB of memory usage for both. If I modify them to create 40 publishers or subscribers each (each on a different topic), I still only see about 8.4 MB of memory usage for each.
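As a rough back-of-the-envelope reading of those two figures, here is a minimal sketch of the per-endpoint cost. It assumes memory grows linearly with the number of endpoints, which nothing in this thread verifies, and that the macOS numbers carry over to Linux. The function name and the 20-topic extrapolation are only illustrative.

```python
def per_endpoint_mb(baseline_mb, loaded_mb, endpoints):
    """Average extra memory per publisher or subscriber, assuming linear growth."""
    return (loaded_mb - baseline_mb) / endpoints


# Figures quoted above: 4.7 MB for the unmodified talker or listener,
# 8.4 MB after creating 40 endpoints.
cost = per_endpoint_mb(4.7, 8.4, 40)

# Hypothetical extrapolation to the 20 topics mentioned in the original question.
estimate_20 = 4.7 + 20 * cost

print(f"~{cost:.3f} MB per endpoint, ~{estimate_20:.2f} MB for 20 endpoints")
```

These are usage figures from a desktop measurement, not VSZ, so they say nothing about the address-space numbers reported by `ps`.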

Maybe you could provide a similarly minimal example which has this problem? We haven’t gotten to the point where we are profiling to look for unintended memory usage, so there might be a bug in there.

Also, I’m using the master branch, which version of ROS2 are you using? That might affect your Fast-RTPS version and maybe the issue you’re seeing is already addressed on master.

Quick remark: I think we are talking about different examples here. It looks like the example you are referring to is a Fast-RTPS example that uses different QoS settings (amongst other things) than the ROS2 talker / listener example that @wjwwood is referring to. @NotMe, do you see the same behavior with the ROS2 talker / listener examples?

Thank you both for the very quick response.

Before my first post I was running the HelloWorld example from the eProsima Fast-RTPS. Specifically it was this commit:
"
tco@osboxes:~/Fast-RTPS-git$ git show
commit 2c86bb2d42e414007c2962942e321e82ff43ae9a
Merge: 87881ec b285f9b
Author: JavierIH javierisabel@eprosima.com
Date: Thu Apr 20 08:08:08 2017 +0200

Merge branch 'master' of git.sambaserver.eprosima.com:rtps/rtps

"

This morning I followed the instructions here:

https://github.com/ros2/ros2/wiki/Linux-Install-Binary

And installed this:

ros2-beta1-package-linux-fastrtps.tar.bz2

Running the ‘talker’ and ‘listener’ examples in separate terminals I see that each has allocated over 540 megs according to the ‘VSZ’ column in ‘ps aux’.

I attached two screenshots to help communicate exactly what I used for the test before my first post and the test that I just ran this morning.

As before, any help would be greatly appreciated.

nm.

Thanks for trying out our examples too.

Can you try a more recent binary, like this one:

http://ci.ros2.org/view/packaging/job/packaging_linux/lastSuccessfulBuild/artifact/ws/ros2-package-linux-x86_64.tar.bz2

And see if the issue persists. Likewise I’ll try to reproduce what you’ve described in my Linux VM tonight.

@wjwwood Thank you for the continuing help.

I ran the ‘talker’ example from the ‘lastSuccessfulBuild’ link that you sent. The virtual memory usage is still in the hundreds of megabytes range. Screenshot is attached.

tco 7359 0.0 0.1 323340 10756 pts/0 Sl+ 16:24 0:00 bin/talker

Are you aware of any fixed-sized buffer allocations that would account for the ~323 Megs used for this simple example?

nm.

The “virtual memory size” (VSZ) listed in ps isn’t what you think it is. That is the total size of the address space assigned to the executable, which includes code, data, memory-mapped files, shared libraries, swap usage, and pages that have been allocated but not actually used yet. It’s not uncommon for that to be very large; I’ve got a virtual machine running right now whose VSZ is about 46 GB…

You should actually pay attention to the resident set size (RSS), which is how much physical memory the task is actually using. In your case, 10,320 kB seems more reasonable.
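The distinction can be observed directly on Linux. The following is a minimal sketch, not taken from Fast-RTPS or ROS 2: it reads `VmSize` and `VmRSS` from `/proc/self/status`, reserves an anonymous mapping, and only then touches the pages. The helper names and the 64 MiB size are made up for illustration, and the exact numbers printed will vary by system.

```python
import mmap


def read_vm_kib(status_text):
    """Extract VmSize and VmRSS (in KiB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])
    return fields


def current_vm_kib():
    """Return this process's VmSize/VmRSS, or {} where /proc is unavailable."""
    try:
        with open("/proc/self/status") as f:
            return read_vm_kib(f.read())
    except OSError:
        return {}


def touch_every_page(buf, page_size=mmap.PAGESIZE):
    """Write one byte per page so the kernel has to back each page with RAM."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
        touched += 1
    return touched


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MiB, chosen only for illustration
    print("start:      ", current_vm_kib())
    region = mmap.mmap(-1, size)  # anonymous mapping: reserved, not yet used
    print("after mmap: ", current_vm_kib())
    touch_every_page(region)
    print("after touch:", current_vm_kib())
    region.close()
```

On a typical Linux system, `VmSize` should jump right after the mapping is created, while `VmRSS` should only follow once the pages have been written.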

@preed

Thank you for bringing this up as a point of clarification. I am aware of the difference between virtual memory size and resident memory size. I have attached a screenshot below showing how the virtual memory becomes resident when an application uses the memory and not immediately when malloc is called.

The issue in my case is that I do not have any way to provide swap space to the OS on my embedded target. Initially everything works functionally since the resident size of all my processes is less than the physical memory size (256 megs). After adding publishers and subscribers for all the necessary topics, the resident size exceeds the physical RAM size and Linux kills a process, seemingly at random, as described in the following link:

https://serverfault.com/questions/141988/avoid-linux-out-of-memory-application-teardown/142017

The only way I can ensure that this will not happen in the released software is to enforce that the virtual memory size never exceeds the physical memory available.
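One per-process way to express that kind of constraint on Linux is the `RLIMIT_AS` resource limit, which caps a process's address space (roughly what `ps` reports as VSZ). The sketch below is an illustration only, not something used elsewhere in this thread: it assumes Linux, the helper name is hypothetical, and the 256 MiB cap simply mirrors the RAM size of the target described above.

```python
import mmap
import resource


def allocation_fails_under_cap(cap_bytes, request_bytes):
    """Temporarily cap the address space, attempt one mapping, then restore the limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the hard limit.
    new_soft = cap_bytes if hard == resource.RLIM_INFINITY else min(cap_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    try:
        # Anonymous private mapping; the pages are reserved but never touched.
        region = mmap.mmap(-1, request_bytes, flags=mmap.MAP_PRIVATE)
        region.close()
        return False
    except (OSError, MemoryError):
        return True
    finally:
        # Put the original limit back so the rest of the process is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


if __name__ == "__main__":
    mib = 1024 * 1024
    # 256 MiB mirrors the RAM of the embedded target described above.
    print("512 MiB request refused:", allocation_fails_under_cap(256 * mib, 512 * mib))
```

The same limit can be set from a shell with `ulimit -v` before launching a program. A process that reserves far more address space than it uses, as in the numbers above, would then fail to start or fail at allocation time rather than being killed later.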

My embedded target is running on an SD card and I do not think it is a good idea to add a swap partition due to the wear that occurs on this type of media over numerous write cycles. If I am wrong and it is acceptable to use an SD card partition as swap space then that is very good news for me. Do you know if this is acceptable?

I created this topic in an effort to confirm that the behavior I see on my system is expected and to ask if there is any way to configure Fast-RTPS to allocate significantly less memory. Can you please provide confirmation that the behavior I see is expected? Are you aware of any configuration to reduce the memory allocation?

For a point of reference I have an evaluation copy of a commercial DDS middleware and the VSZ for a simple ‘hello world’ application is only 4 megabytes. This is a big difference from 300 megs.

Thank you again for taking the time to respond to my post.

nm

I’m not familiar with Fast-RTPS; my point is just that VSZ can be deceptive because many things add to it that don’t affect resident memory at all. If Fast-RTPS uses memory-mapped files or has large shared libraries, it could easily balloon far beyond the size of your physical memory without having any negative impact. If the actual RSS value is growing enough that processes are being killed, you might want to look into using a memory profiling tool like Google perftools or valgrind massif on your program to see what’s allocating so much.

As far as SD cards go, how much wear it can handle depends largely on the card. There are industrial SD cards that supposedly can handle hundreds of TB of writes before they fail, so they would last for years even if you’re writing hundreds of GB per day.
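To put rough numbers on that, here is a small sketch of the arithmetic. The 300 TB rating and the 200 GB per day write rate are hypothetical values picked from the ranges mentioned above, not specifications of any real card, and the estimate ignores write amplification inside the card.

```python
def endurance_years(rated_tb_written, gb_written_per_day):
    """Years until the rated write endurance is used up (decimal units, 1 TB = 1000 GB)."""
    days = rated_tb_written * 1000 / gb_written_per_day
    return days / 365


# Hypothetical card rated for 300 TB written, with 200 GB of writes per day.
years = endurance_years(300, 200)
print(f"{years:.1f} years")  # prints "4.1 years"
```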

@preed

Thank you again. I will look into an industrial SD card.

Before posting here I spent several days reading the Fast-RTPS code and running valgrind’s Massif tool on the sample applications. I was not able to find anything that could be easily reduced or eliminated.

It seems to me that the code is memory intensive as it is implemented and intended to run on systems with several gigs of RAM and plenty of swap space.

I hoped someone here could tell me that I was wrong and that there was a network buffer size or something else that I could configure to be smaller.

nm.

Yeah, I think the VSZ is not out of the normal. By comparison, my dockerd and gnome-terminal instances in my Linux VM have over 600 MB each in VSZ. I have no doubt that this might be contributing to your OOM killer issues, but it doesn’t seem to be unique to talker (so it’s probably not a bug). In fact I think the vast majority of that VSZ is shared amongst all of them.

Not that ROS 1 is the standard-bearer, but out of curiosity I had a look at the talker from roscpp_tutorials:

% ps aux | grep talker
william  22046  0.7  0.2 397428 11896 pts/4    Sl+  13:47   0:00 /opt/ros/kinetic/lib/roscpp_tutorials/talker

And it is quite similar, with ~397 MB of VSZ and ~12 MB of RSS.

That being said, I think Fast-RTPS is mostly designed for “typical” desktop machines with at least a few Gb of memory. There might be simple things we can do to reduce the memory pressure these programs put on the system, but after reading around on the internet, looking at pmap and a cursory look at the output of massif, I don’t see any low hanging fruit.

I still haven’t tried Connext or OpenSplice yet, but they might perform better in your use case until we can improve Fast-RTPS for this case. Part of the idea behind the middleware abstraction is that we can use different implementations with different intrinsic qualities where applicable.

We can almost certainly fine-tune the existing software to use less memory, but we just haven’t gotten to that level of refinement yet. If anyone has any recommendations (for the ROS 2 code or Fast-RTPS), we’re happy to try them out to improve this.

@wjwwood

Thank you. Your confirmation is very helpful.

It seems I will need to continue down the path of a commercial middleware. I will let you know if I find something specific to improve or recommend in ROS 2 or Fast-RTPS.

nm.

Sure, the guys at eProsima might be able to help you get it working as you like too. I’ll make sure they’re aware of this thread.

You might also consider CoreDX DDS for your middleware layer. The ROS2 integration code can be found at https://github.com/tocinc/rmw_coredx

The rmw_coredx integration layer is relatively new, so there hasn’t been much optimization done yet; but in general CoreDX DDS does well in resource constrained environments.

–clark

Thanks @wjwwood for making me aware of this thread.

After checking what @NotMe has detected, I have been investigating why Fast RTPS requires so much virtual memory. Thanks to this link I’ve discovered that each pthread thread requires 72 MB. This has been the case since a change in glibc 2.15.

I compared a small program that uses pthread threads with the same program using std::thread. A std::thread only requires 8 MB of virtual memory.
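A similar measurement can be sketched without Fast-RTPS at all. The snippet below is an assumption-laden illustration rather than the test program mentioned above: it uses Python threads (which are OS threads on Linux), reads `VmSize` from `/proc/self/status`, and reports the average growth per idle thread. The function names are made up, and the figures from an interpreter will not match those of a C++ program. One commonly cited explanation for a 72 MB figure, which this thread does not confirm, is a default 8 MB stack plus a 64 MB per-thread malloc arena reserved by glibc.

```python
import threading


def vm_size_kib():
    """Return VmSize from /proc/self/status in KiB, or None where it is unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmSize:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def vsz_growth_per_thread_kib(thread_count):
    """Start idle threads and report the average VmSize growth per thread."""
    before = vm_size_kib()
    if before is None:
        return None
    release = threading.Event()
    threads = [threading.Thread(target=release.wait) for _ in range(thread_count)]
    for t in threads:
        t.start()
    # All threads are alive and blocked here, so their stacks are mapped.
    after = vm_size_kib()
    release.set()
    for t in threads:
        t.join()
    return (after - before) // thread_count


if __name__ == "__main__":
    print("KiB of VSZ per thread:", vsz_growth_per_thread_kib(8))
```

Running the same kind of before/after comparison around thread creation in a C++ test program would show whether the per-thread cost on a given target is closer to 8 MB or 72 MB.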

Fast RTPS uses the Asio library. On my system, Asio uses pthreads. I was able to configure Asio to use std::thread by changing src/cpp/CMakeLists.txt. I replaced line 183 with this new line:

add_definitions(-D${PROJECT_NAME_UPPER}_SOURCE -DASIO_HAS_STD_THREAD -DASIO_DISABLE_THREADS)

After this change the HelloWorldExample uses less virtual memory:

ricardo@ricardodesktop ~/workspace/desarrollo/proyectos/fastrtps > ps aux | grep Hello
 ricardo  12502  0.0  0.0  73620 10160 pts/2    Sl+  13:04   0:00 ./examples/C++/HelloWorldExample/HelloWorldExample publisher

Are these good numbers for you?
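For reference, the `ps` line from the first post and the one above can be compared with a few lines of Python. This is only a sketch of the arithmetic; the helper name is made up, and the two lines come from different machines, so the difference is indicative at best.

```python
def vsz_rss_kib(ps_line):
    """Return (VSZ, RSS) in KiB from one line of `ps aux` output."""
    fields = ps_line.split()
    # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
    return int(fields[4]), int(fields[5])


# Lines copied from the posts in this thread.
before = "tco 29836 0.0 0.1 404340 10320 pts/0 Sl 15:41 0:01 ./HelloWorldExample publisher"
after = (
    "ricardo 12502 0.0 0.0 73620 10160 pts/2 Sl+ 13:04 0:00 "
    "./examples/C++/HelloWorldExample/HelloWorldExample publisher"
)

vsz_before, rss_before = vsz_rss_kib(before)
vsz_after, rss_after = vsz_rss_kib(after)

saved_kib = vsz_before - vsz_after
print(f"VSZ reduced by {saved_kib} KiB (~{saved_kib / 1024:.0f} MiB)")
print(f"RSS changed by {rss_after - rss_before} KiB")
```

VSZ drops by 330720 KiB (about 323 MiB), while RSS stays at roughly 10 MB in both runs.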

Hi @NotMe,

I think it would be useful to have a meeting with you to better understand your requirements and suggest potential solutions. Could you please send me an email ( jaimemartin@eprosima.com ) with your contact details so we can organize a conference call?