Reconsidering 1-to-1 mapping of ROS nodes to DDS participants

ishugoel · July 25, 2019, 3:13pm

Hello,

We are working on running ROS 2 on an embedded board and found that ROS 2 has high CPU usage because every ROS node is mapped to a DDS participant. We have performed some tests to investigate the issue; the tests and the results can be found at this link: GitHub - nobleo/ros2_performance.

The ROS 2 roadmap mentions “Reconsider 1-to-1 mapping of ROS nodes to DDS participants” https://index.ros.org/doc/ros2/Roadmap/. We would like to see this happen sooner rather than later. We already observe that the current mapping leads to high CPU usage and can constrain people in their freedom to design the architecture of a robotic system. The ROS 2 middleware should allow a setting where everything is grouped into a single DDS participant, for people who want to use nodes for modularity at the top level but don’t want the code fragmented at the bottom level. There are many use cases where one would like to create multiple nodes that all run on the same hardware. This is especially important because intra-process communication does not work effectively at the time of writing.

Is anyone else facing the same kind of problem?
We would like to discuss the idea of reconsidering the 1-to-1 mapping of ROS nodes to DDS participants here. We would like the current 1-to-1 mapping implementation to change, and we are willing to contribute to that change if possible.

Thank you,
Ishu Goel

dirk-thomas · July 25, 2019, 3:33pm

Instead of introducing an option, the current idea is to associate the DDS participant with the context created during rmw_init. That would imply that common applications using a single init / context will only use a single DDS participant - even if they are composed of multiple ROS nodes.

ishugoel · July 25, 2019, 3:45pm

Thanks for your reply @dirk-thomas. When can we expect this functionality to become available? Is there anything we can do to help?

dirk-thomas · July 25, 2019, 4:00pm

If it gets implemented in time, it will be available in the next ROS 2 release, Eloquent, in November 2019.

Any help is appreciated. It will likely start with a design article to discuss the side effects of the intended change. For example, the ROS node name is currently used as the DDS participant name. When that mapping goes away, there needs to be a replacement mechanism to communicate the node name.
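To make the proposal concrete, here is a toy Python model of the two mappings. It is not the rcl/rmw API: `Participant`, `Context`, `create_node_one_per_node` and `create_node_one_per_context` are hypothetical names used only for illustration. It also shows why the node name needs a new channel: a participant shared by several nodes cannot be named after any one of them, so the sketch simply assumes (again hypothetically) that each participant advertises the list of node names it hosts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Participant:
    """Stand-in for a DDS domain participant (hypothetical, not a real DDS API)."""
    name: str
    # Assumption: node names are advertised per participant, because a shared
    # participant's name can no longer identify a single node.
    hosted_nodes: list[str] = field(default_factory=list)


@dataclass
class Context:
    """Stand-in for the context created during rmw_init (hypothetical)."""
    participant: Participant | None = None


def create_node_one_per_node(ctx: Context, node_name: str,
                             registry: list[Participant]) -> Participant:
    """Current mapping: each node gets its own participant, named after the node.
    The context is ignored here."""
    participant = Participant(name=node_name, hosted_nodes=[node_name])
    registry.append(participant)
    return participant


def create_node_one_per_context(ctx: Context, node_name: str,
                                registry: list[Participant]) -> Participant:
    """Proposed mapping: the first node in a context creates its participant,
    and every later node in the same context reuses it."""
    if ctx.participant is None:
        ctx.participant = Participant(name="context_participant")
        registry.append(ctx.participant)
    ctx.participant.hosted_nodes.append(node_name)
    return ctx.participant


def count_participants(
        factory: Callable[[Context, str, list[Participant]], Participant],
        n_nodes: int) -> int:
    """Create n_nodes nodes in a single context and count the participants."""
    ctx = Context()
    registry: list[Participant] = []
    for i in range(n_nodes):
        factory(ctx, f"node_{i}", registry)
    return len(registry)


# The 10-node setup used in the benchmarks discussed in this thread:
print(count_participants(create_node_one_per_node, 10))     # 10
print(count_participants(create_node_one_per_context, 10))  # 1
```

With a single init / context, the number of participants no longer grows with the number of nodes, which is the point of the proposal; the cost is that node identity has to travel some other way.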

christophebedard · July 26, 2019, 8:40am

After the initial ROS Answers post, we’ve also looked into this. In your answer, you mentioned that part of the CPU usage is caused by the executor itself. However, we found that the executor is really the main cause, rather than the DDS participant mapping. Therefore, if we want to lower the overhead (in relation to DDS), we should be looking at the executor as a whole.

Also, as a caution, we should not overlook the overhead that profiling adds, and how much it can really skew the results!

I’m working on a more in-depth analysis of the CPU usage of different parts of the executor. I’ll provide some results sometime next week.

MartinCornelis · July 26, 2019, 9:04am

Hi @christophebedard,

As mentioned in our research, both the SingleThreadedExecutor and the 1-to-1 mapping of nodes to DDS participants appear to contribute to the large CPU overhead. We are planning to open a separate Discourse discussion for the SingleThreadedExecutor optimization. This way the discussions don’t mix and both “problems” can be addressed. The link to the SingleThreadedExecutor discussion will appear on our GitHub page soon.

We look forward to reading your findings.

christophebedard · July 26, 2019, 9:51am

Yeah, I wanted to make sure we didn’t forget about the executor! Glad to hear that you’re planning to open a separate discussion for it.

scg · July 26, 2019, 2:12pm

It depends a bit on the use case. With this test (10 nodes, 20 topics/publishers/timers and 200 subscribers), perf shows roughly a 50/50 split between the two causes. Changing these numbers will also change that split.
Also have a look at https://github.com/ros2/rclcpp/pull/778, which bypasses DDS entirely for intra-process communication.

alsora · July 26, 2019, 3:52pm

Hi, I’m the author of the intra-process communication PR mentioned by @scg.

First of all, thank you for sharing your results.

Using 1 participant per process (or context) is definitely an interesting idea, especially with Fast-RTPS, since it would also greatly reduce memory usage.

Fast-RTPS does not implement shared-memory transport yet, but it recognizes “local publications”, i.e., messages where the publisher and the subscription belong to the same participant. In this case the message is not sent over the network but passed directly to the subscription.

I ran your performance tests with the new intra-process communication enabled.

Binary       | CPU usage
rosonenode   | 12%
rtps         | 8%

This still does not match the results of Fast-RTPS alone for this particular example.
Note that the new intra-process implementation adds an extra entity to the nodes' wait set for each subscription (possibly slowing down the SingleThreadedExecutor).

Other RMW implementations can also reduce CPU usage, for example CycloneDDS https://github.com/eboasson/cyclonedds, which already implements a form of intra-process communication.
CycloneDDS performs slightly worse than the rclcpp PR, because in rclcpp all the subscriptions are known in advance, which makes it easy to skip serialization and save some copies.

kyrofa · July 26, 2019, 8:03pm

Not trying to debate the performance gains of this approach, but I think it’s worth pointing out that SROS 2 (and indeed DDS-Security in general) currently only supports security at the domain participant level. Using the same participant for multiple nodes will make security very difficult, as all nodes will have effectively the same identity and thus the same access control.
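A toy Python illustration of why this matters, under a deliberately simplified, hypothetical permission model where each grant is an (action, topic) pair (this is not the SROS 2 or DDS-Security permissions format, and the node and topic names are invented). Because access control is attached to the participant, a participant hosting several nodes must be granted the union of their permissions, so each node effectively inherits the others' rights.

```python
from __future__ import annotations

# Hypothetical permissions each node needs, as (action, topic) pairs.
# Node and topic names are invented for this example.
NODE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "teleop": {("publish", "/cmd_vel")},
    "camera": {("publish", "/image")},
}


def participant_permissions(hosted_nodes: list[str]) -> set[tuple[str, str]]:
    """Access control is enforced per participant, so a participant hosting
    several nodes must be granted the union of their permissions."""
    granted: set[tuple[str, str]] = set()
    for node in hosted_nodes:
        granted |= NODE_PERMISSIONS[node]
    return granted


def effective_permissions(node: str,
                          participants: dict[str, list[str]]) -> set[tuple[str, str]]:
    """A node can do whatever its participant is allowed to do."""
    for hosted_nodes in participants.values():
        if node in hosted_nodes:
            return participant_permissions(hosted_nodes)
    raise KeyError(node)


one_per_node = {"participant_1": ["teleop"], "participant_2": ["camera"]}
shared = {"participant_1": ["teleop", "camera"]}

print(sorted(effective_permissions("camera", one_per_node)))
# [('publish', '/image')]
print(sorted(effective_permissions("camera", shared)))
# [('publish', '/cmd_vel'), ('publish', '/image')]
```

In the shared case the camera node ends up allowed to publish velocity commands, even though it never needed that right.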

tomoyafujita · July 29, 2019, 2:51am

@kyrofa

Using the same participant for multiple nodes will make security very difficult

good point, thanks for this comment.

@dirk-thomas

applications using a single init / context will only use a single DDS participant - even if they are composed of multiple ROS nodes.

Does that mean it will always be one participant per process space?

@ishugoel

thanks for bringing up this issue. Could you let us know once the issue is registered?

ishugoel · July 29, 2019, 11:43am

Hi @tomoyafujita,
Thank you for your comment. Should I raise an issue for this on the rmw GitHub page: https://github.com/ros2/rmw/issues? Or what is the usual way of registering an issue? And can I link to our GitHub repository or this Discourse thread in the issue?

tomoyafujita · July 30, 2019, 9:41am

@ishugoel

That works for me, at least.

We are going to set up a Raspberry Pi 3 Model B+ with Ubuntu 18.04 / Dashing to reproduce this problem on our side.

thanks,
Tomoya

MartinCornelis · July 31, 2019, 11:40am

@tomoyafujita I opened the issue here https://github.com/ros2/rmw/issues/180

tomoyafujita · July 31, 2019, 11:55am

@MartinCornelis

thanks, we will share updates via the issue.

tomoya

tomoyafujita · August 9, 2019, 9:36am

Just FYI,

[Environment]
Raspberry Pi 3 Model B+ with Ubuntu 18.04 / Dashing

Binary     | Publishers | Subscribers | ROS | ROS nodes | ROS timers | DDS participants | CPU usage [%]
ros        | 20         | 200         | yes | 10        | 10         | 10               | 255.1
rosonenode | 20         | 200         | yes | 1         | 1          | 1                | 139.1
nopub      | 0          | 0           | yes | 10        | 10         | 10               | 23.6
rtps       | 20         | 200         | no  | 0         | 0          | 1                | 51.3
noros      | 20*        | 200*        | no  | 0         | 0          | 0                | 1.5
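A quick reading of the table above, using only the reported numbers: going from `ros` to `rosonenode` cuts CPU usage by roughly 45%, but `rosonenode` still uses about 2.7 times the CPU of the DDS-only `rtps` binary. Note that this pair changes the number of nodes, timers and participants at the same time, so it cannot attribute the savings to the participant mapping alone. (Values above 100% are presumably summed over the Pi's four cores.)

```python
# CPU usage [%] from the table above (Raspberry Pi 3 Model B+, Ubuntu 18.04 / Dashing).
cpu = {
    "ros": 255.1,         # 10 nodes, 10 timers, 10 DDS participants
    "rosonenode": 139.1,  # 1 node, 1 timer, 1 DDS participant
    "nopub": 23.6,        # 10 nodes, no publishers/subscribers
    "rtps": 51.3,         # no ROS, 1 DDS participant
    "noros": 1.5,         # no ROS, no DDS
}

# Relative saving from collapsing everything into one node / one participant.
reduction = (cpu["ros"] - cpu["rosonenode"]) / cpu["ros"]
# Remaining ROS overhead compared with plain DDS for the same pub/sub load.
overhead_vs_rtps = cpu["rosonenode"] / cpu["rtps"]

print(f"ros -> rosonenode: {reduction:.1%} less CPU")  # 45.5% less CPU
print(f"rosonenode vs rtps: {overhead_vs_rtps:.2f}x")  # 2.71x
```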

ishugoel · August 12, 2019, 12:53pm

Hi @tomoyafujita,

Thanks a lot for sharing your results.

ivanpauno · August 26, 2019, 9:16pm

There’s a design document with a proposal about this change: https://github.com/ros2/design/pull/250.

ishugoel · August 28, 2019, 9:02am

Hi @ivanpauno,

Thanks a lot for your efforts. It is a nice document.

Regards,
Ishu

MartinCornelis · August 29, 2019, 8:58am

@ivanpauno what is your expected timeline for these changes?
For someone who is new to this process (getting changes implemented into the core of ROS2) it is hard to judge how long implementing something like this could/would take.

I’m also very interested in getting a high level description of the entire process if this is possible.
The developer guide https://index.ros.org//doc/ros2/Contributing/Developer-Guide is written more with “normal” packages in mind. I don’t think a major overhaul like this (or other changes to rclcpp, rcl and rmw that can have far-reaching impact) is simply handled by creating multiple disjointed pull requests.
From watching on the sidelines, what I’ve seen so far is:

A discussion is started on Discourse / an issue is raised on the GitHub repository of the package
A design document is written together with community members (and members of the TSC)
The document is reviewed by (community members and) members of the TSC
?

This leads me to the following questions that maybe you or another member of the community could answer:
After the design document is written, how are the next steps decided?
Who does the actual implementation, how is this decided?
How does one keep track of all the activities in multiple layers between multiple people?

Do multiple interested parties just respond to the issue/design document and figure something out from there? Is there a certain structure to this? Who is “responsible” for the final outcome?

Thanks in advance to anyone that can give me some clarity on the process.
Also thanks for all the hard work everyone has been putting in so far!

Greetings,

Martin
