It seems to me that the decentralized approach of ROS2 has a hard time scaling to larger networks (100 to 10000 nodes), presumably because every DDS participant has to discover, and keep state for, every other participant, whereas in ROS1 a central master holds the graph. For comparison, I implemented a minimal node in both ROS1 and ROS2:
ROS 1 Minimal Node
#include <ros/ros.h>

class MinimalNode
{
public:
  MinimalNode()
  {
    ROS_INFO("Hello");
  }
};

int main(int argc, char** argv)
{
  ros::init(argc, argv, "min_node");
  MinimalNode node;
  ros::spin();
  return 0;
}
ROS 2 Minimal Node
#include <memory>

#include <rclcpp/rclcpp.hpp>

class MinimalNode : public rclcpp::Node
{
public:
  MinimalNode() : rclcpp::Node("minimal")
  {
    RCLCPP_INFO(this->get_logger(), "Hello");
  }
};

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::Node::SharedPtr node = std::make_shared<MinimalNode>();
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
I launched multiple copies of these nodes and recorded each process's physical memory use (RSS) via `ps`:
| | 1 Node | 100 Nodes | 200 Nodes | 1000 Nodes |
|---|---|---|---|---|
| ROS1 | 11MB | 11MB | 11MB | 11MB |
| ROS2 (FastRTPS) | 11MB | 36MB | 44MB | ran out of memory |
| ROS2 (Connext) | 33MB | 58MB | failed | - |
| ROS2 (OpenSplice) | 26MB | memleak? | - | - |
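As a sanity check on the `ps` numbers, each node could also report its own resident set size. A minimal, Linux-specific sketch (reading /proc/self/statm; this is only an illustration, not what produced the table above):

#include <cstdio>
#include <fstream>
#include <unistd.h>

// Read this process's resident set size from /proc/self/statm.
// The second field of statm is the number of resident pages.
long rss_kb()
{
  std::ifstream statm("/proc/self/statm");
  long total_pages = 0;
  long resident_pages = 0;
  statm >> total_pages >> resident_pages;
  return resident_pages * sysconf(_SC_PAGESIZE) / 1024;
}

int main()
{
  std::printf("RSS: %ld kB\n", rss_kb());
  return 0;
}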
Under RTI Connext I began seeing the following error after having started about 100 nodes, which is why I labeled the attempts to spawn more than that as 'failed'. (If I understand the default DDS-RTPS port mapping correctly, a domain spans 250 ports and each participant claims 2 unicast ports, so only around 120 participants fit into a single domain on one host; that would match this limit.)
[min_node-94] [D0013|ENABLE]DDS_DomainParticipantPresentation_reserve_participant_index_entryports:!enable reserve participant index
[min_node-94] [D0013|ENABLE]DDS_DomainParticipant_enableI:Automatic participant index failed to initialize.PLEASE VERIFY CONSISTENT TRANSPORT / DISCOVERY CONFIGURATION.
With OpenSplice, my nodes seem to suffer from a memory leak. The leak sets in once OpenSplice emits the following warning:
[min_node-22] Report : API_INFO
[min_node-22] Date : 2020-02-21T14:33:49+0100
[min_node-22] Description : The number of samples '5000' has surpassed the warning level of '5000' samples.
[min_node-22] Node : <hostname>
[min_node-22] Process : min_node <22398>
[min_node-22] Thread : dq.builtins 7f5ec5269700
[min_node-22] Internals : 6.9.190705OSS///v_checkMaxSamplesWarningLevel/v_kernel.c/1940/769 /1582292029.991222504/13
With fewer nodes (50, for example, is fine), the warning does not appear and no memory leak occurs either.
Among the three tested middlewares, only FastRTPS appears able to support larger networks of nodes. Even then, however, memory usage ramps up quickly and effectively caps the feasible network size.
And this considers only the most bare-bones node one could implement, so I have concerns about how well ROS2 can scale to networks of hundreds of nodes. If the overhead of even a trivial node can reach hundreds of MB as the network grows (already at the 200-node data point, 200 nodes at 44MB each would come to almost 9GB on a single machine), this puts further constraints on embedded devices running such a node in a large network.
I would be very interested in hearing what kind of strategic choices have been made with respect to large numbers of nodes. Are networks of more than 1000 nodes considered outside the scope of ROS2?
Furthermore, what techniques exist to cope with this appetite for memory?
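One workaround I am aware of is composing many nodes into a single process and spinning them on one executor, so the per-process overhead is paid only once; whether the RMW then also shares lower-level resources (participants, discovery state) between those nodes is exactly the kind of thing I would like to know. A rough sketch of what I mean (the node count and names are made up for illustration):

#include <memory>
#include <string>
#include <vector>

#include <rclcpp/rclcpp.hpp>

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::SingleThreadedExecutor executor;

  // Instantiate many nodes inside one process instead of one process
  // per node; all of them are spun by a single executor thread.
  std::vector<rclcpp::Node::SharedPtr> nodes;
  for (int i = 0; i < 100; ++i) {
    auto node = std::make_shared<rclcpp::Node>("minimal_" + std::to_string(i));
    executor.add_node(node);
    nodes.push_back(node);  // keep the nodes alive while spinning
  }

  executor.spin();
  rclcpp::shutdown();
  return 0;
}

Is this kind of composition the intended mitigation, and does it actually reduce the middleware's footprint, or only the per-process overhead?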