Just wanted to highlight a paper that was recently accepted and will appear in IEEE Robotics and Automation Letters (RA-L) discussing the impact of ROS 2 composition for system designers. It was written by @alsora, @mjcarroll, and myself to help make clear the value of composition, the selection of executors and containers, and the best practices around them.
The experiments showcase the most recent behavior of ROS 2 Humble, both at the unit-node level to understand the characteristics of ROS 2 and at the system level of a full-fledged autonomous robot running Nav2 – with insights provided by iRobot on their commercial needs on extremely resource-constrained devices.
A key outcome is that composition alone can save 28% CPU and 33% RAM when running a run-of-the-mill AMR stack using Nav2. This massive resource saving is what actually motivated me to start working on this paper, so that I could grab every person on the street and scream in their face like a madman that if they’re not using composition, they’re the crazy one.
S. Macenski, A. Soragna, M. Carroll, Z. Ge, “Impact of ROS 2 Node Composition in Robotic Systems”, IEEE Robotics and Automation Letters (RA-L), 2023.
All the props to Alberto and Michael as the experts
I quickly read through the paper, it is very well-researched, thank you for sharing.
It seems to me that the main cause of the increase in memory usage is the use of a DDS-based RMW implementation. Shared memory-based RMW implementations were dismissed due to stability issues.
With these preconditions, the results are not very surprising (although it is good to have the exact numbers from the paper): they are a consequence of how DDS works and the need for serialization and deserialization instead of simply passing pointers, as intra-process communication does.
I agree that it is a good practice to develop all nodes as components, so component composition can be used without additional effort.
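For anyone who hasn’t written one yet, the pattern is small. A minimal sketch of a node written as a component (the package and class names here are placeholders, and it assumes a package depending on rclcpp and rclcpp_components):

```cpp
// Hypothetical composable node: "my_pkg" and "MyNode" are placeholder names.
#include <rclcpp/rclcpp.hpp>
#include <rclcpp_components/register_node_macro.hpp>

namespace my_pkg
{
class MyNode : public rclcpp::Node
{
public:
  // Taking NodeOptions in the constructor is what makes the class composable.
  explicit MyNode(const rclcpp::NodeOptions & options)
  : Node("my_node", options) {}
};
}  // namespace my_pkg

// Registers the class as a plugin that component containers can discover
// and load at runtime; the same class can still be wrapped in a main()
// for standalone use.
RCLCPP_COMPONENTS_REGISTER_NODE(my_pkg::MyNode)
```

Writing nodes this way costs almost nothing up front, and it leaves the process-layout decision to deployment time rather than baking it into the code.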
However, I am not sure I agree with the final recommendation of composing many components into a single process. Doesn’t this practice reduce the reliability and fault tolerance of the system? E.g. if the whole Nav2 stack is composed into a single process, then a crash in any one of those components will bring down the whole stack. This might be acceptable in some cases (like a very resource-constrained system) but may not be in others.
What do you think? Do you see this as a risk of using composition?
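One middle ground I could imagine is grouping components into a few per-subsystem containers instead of one process for everything, trading some of the resource savings for fault isolation between subsystems. A hypothetical XML launch file sketching this (the perception package and plugin are made up; the Nav2 plugin names follow the usual `package::ClassName` convention):

```xml
<!-- Hypothetical layout: one container per subsystem, so a crash is
     contained within that subsystem's process. -->
<launch>
  <node_container pkg="rclcpp_components" exec="component_container"
                  name="nav_container" namespace="">
    <composable_node pkg="nav2_controller"
                     plugin="nav2_controller::ControllerServer"
                     name="controller_server"/>
    <composable_node pkg="nav2_planner"
                     plugin="nav2_planner::PlannerServer"
                     name="planner_server"/>
  </node_container>
  <node_container pkg="rclcpp_components" exec="component_container"
                  name="perception_container" namespace="">
    <!-- "my_perception_pkg" / "my_perception::Detector" are placeholders -->
    <composable_node pkg="my_perception_pkg"
                     plugin="my_perception::Detector"
                     name="detector"/>
  </node_container>
</launch>
```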
For the long term, I would rather see more investment in alternative ROS 2 backends where the issues outlined in the paper can be fixed:
Do not use network-protocol-based RMW backends for intra-host communication; instead, use protocols that work without serialization / deserialization. A shared memory-based / zero-copy transport could be a good fit. Besides standard POSIX shared memory, I would also look at Android’s Binder framework as an implementation: we know it has pretty good performance (billions of phones use it for almost every IPC need), and it also has nice security features (e.g. both identity- and token-based authentication).
For inter-host communication, a protocol like Zenoh (https://zenoh.io) could be adopted. In the Zenoh model, we can have a single router / gateway process on each host that is in charge of the inter-host communication and the translation from intra-host communication to a network protocol. The protocol itself has less overhead than DDS, and it was already evaluated by the ROS TSC.
I agree, it’s not surprising that putting resources closer together without serialization improves things. I’m just floored at how much it improves things to simply compose them together even without intra-process communication (IPC) or shared memory enabled, as the Nav2 experiments show. It could improve even more if IPC supported multiple QoS settings so we could use it within Nav2 as well.
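For context, opting into intra-process communication is just a per-node option in rclcpp. A hedged sketch (node and topic names are illustrative) of a component that enables it and publishes by transferring ownership, so a subscriber in the same process receives the same allocation:

```cpp
// Illustrative only: "Talker" and "chatter" are placeholder names.
#include <memory>
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

class Talker : public rclcpp::Node
{
public:
  explicit Talker(const rclcpp::NodeOptions & options)
  : Node("talker", rclcpp::NodeOptions(options).use_intra_process_comms(true))
  {
    pub_ = create_publisher<std_msgs::msg::String>("chatter", 10);
    auto msg = std::make_unique<std_msgs::msg::String>();
    msg->data = "hello";
    // Publishing a unique_ptr hands the message over by pointer inside
    // the process; no serialization is involved on that path.
    pub_->publish(std::move(msg));
  }

private:
  rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
};
```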
I believe we touch upon this in the paper:
For systems under development, components are often placed in separate processes so that failures do not disturb the larger system and analysis tools can be run in a well-defined scope. As programs become more mature, they may be grouped into processes to reduce latency or share resources.
IMO, a production-quality system shouldn’t be crashing, and I would treat any report of a crash not caused by a user-generated plugin as of the utmost importance to resolve. If you have any fear that it might crash and you can recover from it as a non-critical system via a respawn, I think that’s a fine way to go if you can accept the increased network effects. Crashes shouldn’t be happening in a production system, though – practice defensive programming.
I don’t have the deepest experience with this, but from my understanding most shared memory transport systems aren’t particularly fast; rather, they guarantee stable performance. I don’t disagree that having that as a stable option in the pool of options has obvious value. I think a mixture of eProsima, Cyclone, and iceoryx are working on that, but there’s still some way to go before it works well.
I generally agree. The metrics I see look quite promising to have as a Tier 1 option. I don’t know if there’s serious work towards that in progress somewhere, but I’d be supportive of it and be happy to beta test it.
Hi Steve, thanks for posting this. The information comes at a good time for me as I’m rethinking the way I orchestrate my nodes. I do have the requirement of recording the data (in bags) for later introspection… and I’m guessing that running rosbag2 in the usual inter-process way would undermine the performance improvements you observe when using composition (because of all those subscriptions coming from outside the composed node).
If so, would a good solution to this be to create a node using the C++ rosbag2 API, and have each “composed subsystem” handle its own logging?
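Something like the following is what I have in mind – a hedged sketch of a per-subsystem recording component, assuming Humble’s `rosbag2_cpp::Writer` API (the node name, bag URI, and topic are illustrative):

```cpp
// Illustrative only: a component that records its own subsystem's topic.
#include <rclcpp/rclcpp.hpp>
#include <rosbag2_cpp/writer.hpp>
#include <std_msgs/msg/string.hpp>

class SubsystemRecorder : public rclcpp::Node
{
public:
  explicit SubsystemRecorder(const rclcpp::NodeOptions & options)
  : Node("subsystem_recorder", options)
  {
    writer_.open("subsystem_bag");  // URI of the bag directory to create
    sub_ = create_subscription<std_msgs::msg::String>(
      "chatter", 10,
      [this](std_msgs::msg::String::ConstSharedPtr msg) {
        // The templated write() serializes and appends to the bag.
        writer_.write(*msg, "chatter", now());
      });
  }

private:
  rosbag2_cpp::Writer writer_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
};
```

I believe newer rosbag2 releases also expose the recorder itself as a composable node (`rosbag2_transport::Recorder`), which might make a custom writer unnecessary – worth checking what your distro ships.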