ROS2 memory usage in large networks

It seems to me that the decentralized approach of ROS2 has a hard time scaling to larger networks (100 to 10000 nodes). For comparison, I implemented the most minimal node possible in both ROS1 and ROS2:

ROS 1 Minimal Node

#include <ros/ros.h>

class MinimalNode
{
public:
  MinimalNode()
  {
    ROS_INFO("Hello");
  }
};

int main(int argc, char** argv)
{
  ros::init(argc, argv, "min_node");

  MinimalNode node;
  ros::spin();
  return 0;
}

ROS 2 Minimal Node

#include <rclcpp/rclcpp.hpp>

class MinimalNode : public rclcpp::Node
{
public:
  MinimalNode() : rclcpp::Node("minimal") { RCLCPP_INFO(this->get_logger(), "Hello"); }
};

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::Node::SharedPtr node = std::make_shared<MinimalNode>();
  rclcpp::spin(node);
  return 0;
}

I launched multiple copies of these nodes and recorded the physical memory usage of each node process via ps.

                        1 Node       100 Nodes       200 Nodes       1000 Nodes
ROS1                      11MB            11MB            11MB             11MB
ROS2
   FastRTPS               11MB            36MB            44MB             ran out of memory
   Connext                33MB            58MB            failed           -
   OpenSplice             26MB            memleak?        -                -

With RTI Connext I began seeing the following error after starting roughly 100 nodes, which is why I labeled attempts to spawn more than that as ‘failed’:

[min_node-94] [D0013|ENABLE]DDS_DomainParticipantPresentation_reserve_participant_index_entryports:!enable reserve participant index
[min_node-94] [D0013|ENABLE]DDS_DomainParticipant_enableI:Automatic participant index failed to initialize.PLEASE VERIFY CONSISTENT TRANSPORT / DISCOVERY CONFIGURATION. 

With OpenSplice, my nodes seem to suffer from a memory leak. The leak starts once OpenSplice prints the following warning:

[min_node-22] Report      : API_INFO
[min_node-22] Date        : 2020-02-21T14:33:49+0100
[min_node-22] Description : The number of samples '5000' has surpassed the warning level of '5000' samples.
[min_node-22] Node        : <hostname>
[min_node-22] Process     : min_node <22398>
[min_node-22] Thread      : dq.builtins 7f5ec5269700
[min_node-22] Internals   : 6.9.190705OSS///v_checkMaxSamplesWarningLevel/v_kernel.c/1940/769 /1582292029.991222504/13

With fewer nodes (50 is fine, for example) I do not see this warning, and no memory leak occurs either.

Among the three middlewares tested, only FastRTPS appears able to support larger networks of nodes. Even then, however, memory usage ramps up quickly and effectively caps the feasible network size.

Since this considers only the most bare-bones node one could implement, I have concerns about how well ROS2 can scale to networks of hundreds of nodes. Given that the overhead of even a trivial node can easily reach hundreds of MB as the network grows, this puts further constraints on embedded devices running such a node in a large network.

I would be very interested in hearing what kind of strategic choices have been made with respect to a large number of nodes. Are networks of >1000 nodes considered outside the scope of ROS2?

Furthermore, what techniques exist to cope with this memory consumption?


We did testing on the CPU side of things, but this may be related: Reconsidering 1-to-1 mapping of ROS nodes to DDS participants. I can imagine that this mapping also influences the “size” of nodes. If the DDS part of the application is duplicated for every single node (each node gets its own participant), I suppose this will increase the application’s memory footprint as well.

Did you manage to conclude anything else from your research that might allow for optimization?

@fabmene

You could try using rmw_cyclonedds. As far as we know, it has a smaller memory footprint and lower CPU consumption, and the number of threads stays steady even when we create multiple nodes. (We could be wrong; we analyzed this on Dashing with a Raspberry Pi 3+.)
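
For reference, the RMW implementation is normally selected at runtime through the RMW_IMPLEMENTATION environment variable (for example RMW_IMPLEMENTATION=rmw_cyclonedds_cpp). Below is a minimal sketch, not from the original reply, for double-checking which implementation a process actually loaded; it assumes the executable is built in a package that depends on rclcpp, so the rmw shim library is available at link time.

#include <cstdio>
#include <rmw/rmw.h>

int main()
{
  // Ask the rmw layer which implementation was loaded; with
  // RMW_IMPLEMENTATION=rmw_cyclonedds_cpp set, this should report Cyclone DDS.
  std::printf("active rmw: %s\n", rmw_get_implementation_identifier());
  return 0;
}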

@MartinCornelis

I guess you are right. Remapping nodes to DDS participants via the context is under development; using the same context for multiple nodes can reduce the memory footprint and CPU consumption.
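
To illustrate the idea with a minimal sketch (not the exact mechanism the remapping work will use): several nodes can already be co-located in one process and spun from a single executor, so they share process-wide infrastructure instead of each paying the full per-process cost; once participants are mapped to contexts, nodes sharing a context would also share a single DDS participant. The MinimalNode below is a slight variation of the one in the original post, taking a name parameter so the node names stay unique.

#include <memory>
#include <string>
#include <vector>
#include <rclcpp/rclcpp.hpp>

class MinimalNode : public rclcpp::Node
{
public:
  explicit MinimalNode(const std::string & name) : rclcpp::Node(name)
  {
    RCLCPP_INFO(this->get_logger(), "Hello");
  }
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // All nodes live in the same process and share the default context.
  rclcpp::executors::SingleThreadedExecutor executor;
  std::vector<std::shared_ptr<MinimalNode>> nodes;
  for (int i = 0; i < 100; ++i) {
    auto node = std::make_shared<MinimalNode>("minimal_" + std::to_string(i));
    executor.add_node(node);
    nodes.push_back(node);
  }

  executor.spin();
  rclcpp::shutdown();
  return 0;
}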