This post is specifically intended to critique software design. I have nothing but appreciation for the people who work on open source robotics. I’m also relatively new to the community, so it’s perfectly possible that I’m just misunderstanding something.
While trying to understand the architecture of the ros2_control package, I was wondering why it doesn’t use the node and message-passing system already provided by ROS 2 and DDS. Basic usage gave me the impression that the control library was over-abstracting, when the same level of modularity and interoperability could be achieved by defining both controllers and hardware as regular nodes.
For example, a specific PID controller would subscribe to /measurement and /setpoint topics and publish to a /voltage topic, with user-configured message types. On the receiving end, a piece of hardware would subscribe to /voltage and publish /temperature. This seems more straightforward and benefits from relying on ROS infrastructure for message passing, logging, etc.
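To make that concrete, here is a minimal sketch of the kind of controller node I have in mind (the /measurement, /setpoint, and /voltage topic names come from the example above; the std_msgs/Float64 message type and the fixed time step are placeholder assumptions):

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float64


class PidControllerNode(Node):
    """PID controller as an ordinary ROS 2 node: subscribe to the
    measurement and setpoint, publish the command."""

    def __init__(self):
        super().__init__('pid_controller')
        # Gains as parameters so the node stays generic across robots.
        self.declare_parameter('kp', 1.0)
        self.declare_parameter('ki', 0.0)
        self.declare_parameter('kd', 0.0)
        self.setpoint = 0.0
        self.integral = 0.0
        self.prev_error = None
        self.create_subscription(Float64, '/setpoint', self.on_setpoint, 10)
        self.create_subscription(Float64, '/measurement', self.on_measurement, 10)
        self.cmd_pub = self.create_publisher(Float64, '/voltage', 10)

    def on_setpoint(self, msg):
        self.setpoint = msg.data

    def on_measurement(self, msg):
        # One PID update per incoming measurement; dt fixed for simplicity.
        dt = 0.01
        error = self.setpoint - msg.data
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        kp = self.get_parameter('kp').value
        ki = self.get_parameter('ki').value
        kd = self.get_parameter('kd').value
        self.cmd_pub.publish(Float64(data=kp * error + ki * self.integral + kd * derivative))


def main():
    rclpy.init()
    rclpy.spin(PidControllerNode())


if __name__ == '__main__':
    main()
```

Swapping in a different controller would then just mean launching a different node that speaks the same topics.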
This past thread had an inconclusive discussion on the topic.
The main feature a native ROS controller implementation would not provide is requiring controllers to own their hardware resources. However, I question the necessity of that, since it seems like a lot of work to circumvent one specific potential issue.
You already found the ROS Answers post where I asked/suggested something similar. I still believe it would be a very interesting approach for the kinds of applications ros2_control (and its predecessor, ros_control) tries to make possible.
AFAIU and IIRC, non-determinism in (among other parts) the RMW layer and the executor(s) complicated going that way with the design. Perhaps using components could help mitigate some of that.
It’s possible though, as OROCOS implements the approach you describe (and it comes with ROS 2 integration so you could already use this to implement a control system while keeping ROS compatibility).
Although the linked post is two years old, the arguments of @destogl are still valid IMHO.
Keep in mind that one focus of ros2_control was real-time-safe control loops, and they can run without any ROS networking magic (e.g., while the joint_trajectory_controller is executing an action goal, there is no ROS node involved anymore). This was successfully tested with control loops faster than 1 kHz, where you will have problems with most DDS implementations if you don’t know exactly how to configure them (latency, jitter, …).
About the use case you are describing with temperature as a measured state: you won’t need fast control loops, and you are perfectly fine with ROS nodes without ros2_control. Additionally, you don’t get any benefit from the simulator integrations, because you won’t simulate temperature in Gazebo.
There are actually two parts in there, as I see it, and I will answer each of them separately.
1. Why ros2_control doesn’t use nodes/components
There are mostly historical reasons for it. We at Stogl Robotics (my company) have already tested splitting ros2_control into separate nodes across the network, running distributed control in real time. The test was successful, but if you want to achieve hard real-time (jitter under 1% of the control cycle time), as expected in industrial applications, we would need to extend this even further. In this extreme case, there is also the question of why one would then use ROS 2 and DDS rather than EtherCAT or something similar that already solves many real-time-related issues.
Looking at the soft real-time behavior (jitter up to 50% of the control cycle time) that many ROS 2 users expect, our multi-node approach could be usable out of the box with some minor polishing. Still, integrating this nicely into the current architecture (keeping everything working) is a project of 6 to 12 months for one FTE.
Also, on the note of components: I often find that they do not work on a real robot as one would expect, so some work is needed there too to make this bulletproof.
2. Abstraction of ros2_control
Even if we go with a node-based structure for ros2_control, I would argue for keeping the same amount of abstraction as there is now. This is not uncommon in other ROS 2 libraries. The point is that one would have communication over topics, but the structure should basically stay the same, to keep enough flexibility for exchanging controllers and HW interfaces.
Regarding ownership of HW resources: this is not a bug, but a feature. In a control system, you want deterministic behavior about who is talking to the hardware, when, and with what. Right now these mechanisms are all implemented through internal functions; in the node case, we would have to mediate access through services or actions.
To wrap up, I am very open to exploring the design with nodes and components - the only requirement I have is that someone commits to working on this until it is done.
I came across the Topic Based Ros2 Control package by PickNik Robotics. It integrates ros2_control with topic-based communication. It might be relevant to the discussion on modularity and integration.
I’d be interested to hear your thoughts on how such an approach might fit into the goals of ros2_control, particularly regarding the balance between real-time performance and modular design.
ros2_control is modular. What you have referenced is not a topic-based ros2_control, but only a topic-based hardware interface.
There is already one experimental application of it, but more with the goal of making the control system distributed across multiple computers. Doing this right takes a lot of effort, and there are not many use cases. Therefore, I don’t expect anything in this direction anytime soon.
Using topics instead of shared memory for communication is achievable, also in terms of real-time, but as I said, I don’t see anyone willing to invest 1-2 years of work into it.
This approach could indeed simplify modularity and interoperability while reducing the need for extra layers of abstraction.
The primary reason for ros2_control’s architecture, I think, largely comes down to real-time performance requirements. Using shared memory within a single process allows for low-latency, high-throughput communication between controllers and hardware. DDS-based topic communication, although real-time capable, can introduce more latency than direct memory access.
The current design enforces that a controller manages its hardware resources to avoid conflicts and ensure deterministic performance. However, in more distributed, non-time-critical systems, this strict ownership model might not be as necessary.
I doubt ros2_control was developed for this. There are more than enough solutions for “distributed, non-time-critical systems”. If your application requires such systems, maybe consider something else that isn’t ros2_control.
You can consider integrating drivers for industrial networks such as CC-Link IE, SLMP, Modbus TCP, MQTT, etc., instead of reinventing the wheel. If your application is for monitoring purposes with some amount of “pushing a button to turn on something”, OPC UA might even work for you. APIs/documentation for these networks are freely available on the internet.
The ros2_control system was originally the pr2_controller_manager, which predates ROS 2, and even the 1.0 release of ROS 1. It was made only a few months before the PR2, so it was “designed quickly.” Robot arms need a pretty fast control loop (1 kHz for the PR2), and some robot arms can damage themselves if you fail to deliver control commands quickly enough. To keep up with this requirement, the pr2_controller_manager was designed to be hard-realtime, though in retrospect, the PR2 hardware was very robust and only soft-realtime was needed.
Communicating over sockets is too unpredictable for fast realtime systems, so writing the controller manager on top of ROS wasn’t going to work. Switching between threads can be too slow for certain realtime systems, so the control loop was single-threaded, and injecting and extracting data from the external ROS environment was atomic. Even memory allocation can be too slow (or have unpredictable timing bounds) for certain realtime systems. The pr2_controller_manager and the original controllers took the realtime constraint over-seriously, and preallocated all the memory for the control loop.
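To illustrate the pattern (a conceptual sketch, written in Python for brevity even though the actual code is C++; it is similar in spirit to the RealtimeBox in the realtime_tools package): the control loop only ever try-locks, so the non-realtime ROS side can never block it.

```python
import threading


class RealtimeBox:
    """Latest-value mailbox between the non-realtime ROS side and the
    single-threaded control loop."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def set(self, value):
        # Called from non-realtime ROS callbacks; may block briefly.
        with self._lock:
            self._value = value

    def try_get(self):
        # Called from the control loop; never blocks.
        if self._lock.acquire(blocking=False):
            try:
                return self._value
            finally:
                self._lock.release()
        return None  # contention: caller keeps the previous command
```

The loop keeps using the previous command whenever the handoff is contended, which is what lets it meet its deadline regardless of what the ROS side is doing.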
ros2_control inherits its architecture from the original design, though with a much better structure for handling different types of robots.
Now, could we do better?
Probably not by building on top of ROS 2 or DDS. Once you start using sockets, you will be too slow (or have too much unpredictable delay) to hit 1 ms deadlines reliably. Maybe you could do it with shared-memory transports. However, even then you are dealing with multiple threads or processes making up your control loop, which is complicated to get right.
It would be interesting to have a system with a similar nodes/topics model inside the realtime component. It would be more composable and maybe easier to reason about and debug. That could be a fun thing to build, though getting the design simple enough that anyone will want to use it is very challenging.
I personally think the distinct terminology and extra conceptual model in ros2_control add a lot that’s important on top of the collection of components. Command interfaces are claimed by their active controller, which excludes other controllers from using them.
It’s nice for reasoning about which part of the control system can act on which part of your robot. You’ve got active and inactive controllers, and if you want to switch out the command interfaces, you can deactivate and activate controllers as necessary.
I’ve built topic-based control loops for the sake of simplicity. I was in a time crunch and didn’t understand ros2_control at the time. There’s nothing really keeping anyone from doing that. Works pretty okay.
But I seem to remember that one of the first things I had to debug was that somewhere in my launch file I was launching two systems that published commands to the same topic. I was interleaving the desired command data from one with zeros from the other. The robot would stutter a bit and then go into an overtorque fault condition.
So I wouldn’t want to see command interface ownership leave ros2_control. I agree with @destogl here:
Regarding ownership of HW resources, this is not a bug, but a feature.
Ownership and similar features add extra semantics to particular topics and nodes, and it all needs language to describe it. State interfaces for read-only topics that can be safely shared arbitrarily, command interfaces for read/write topics that need some kind of broker or manager to manage access. I guess we could call them state and command topics, but the extra idea is still there.
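A toy sketch of that extra idea (the names here are invented for illustration and are not the actual ros2_control API): the manager grants each command interface to at most one controller at a time and refuses activation otherwise.

```python
class CommandInterfaceBroker:
    """Hypothetical broker: command interfaces are exclusive,
    state interfaces are freely readable."""

    def __init__(self, command_interfaces):
        self._claims = {name: None for name in command_interfaces}

    def claim(self, controller, interface):
        owner = self._claims[interface]
        if owner is not None and owner != controller:
            raise RuntimeError(
                f'{interface} is already claimed by {owner}; '
                f'deactivate it before activating {controller}')
        self._claims[interface] = controller

    def release_all(self, controller):
        for name, owner in self._claims.items():
            if owner == controller:
                self._claims[name] = None
```

With something like that in place, my double-publisher launch file mistake above would have been a loud activation error instead of a silent stutter.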
As far as nodes go, aren’t ros2_control controllers and components now lifecycle nodes? Maybe they always have been? They also require implementing certain additional methods. But the lifecycle state machine is there, and it’s nice for hardware. It makes it easier to understand how to orchestrate their configuration and activation to bring the system into a correct working state.
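For comparison, a plain rclpy lifecycle node already gives you that orchestration (a minimal sketch; the hardware-bridge framing and log messages are hypothetical):

```python
import rclpy
from rclpy.lifecycle import Node, State, TransitionCallbackReturn


class HardwareBridge(Node):
    """Lifecycle node: connect in on_configure, act only when active."""

    def on_configure(self, state: State) -> TransitionCallbackReturn:
        # Open the device here rather than in __init__, so a missing
        # device leaves the node safely in the unconfigured state.
        self.get_logger().info('connecting to hardware...')
        return TransitionCallbackReturn.SUCCESS

    def on_activate(self, state: State) -> TransitionCallbackReturn:
        self.get_logger().info('hardware active, commands accepted')
        return super().on_activate(state)

    def on_deactivate(self, state: State) -> TransitionCallbackReturn:
        self.get_logger().info('hardware inactive, commands ignored')
        return super().on_deactivate(state)


def main():
    rclpy.init()
    rclpy.spin(HardwareBridge('hardware_bridge'))


if __name__ == '__main__':
    main()
```

You can then drive the transitions from the command line with `ros2 lifecycle set /hardware_bridge configure` followed by `activate`.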
One of the things ros2_control gives us now is that many of the preparation and initialization and mode switching activities required to bring up and alter the state of the system are well thought-out. There are lots of things that feel like a headache for a “simple” first application in the sense that there are a lot of methods that you need to implement for a new custom controller and transitions you need to handle.
However, I think a lot of those are probably features that SHOULD be implemented anyway for robust real-life applications.
There are a few things about ros2_control that I think make it tough.
Not only is it a conceptually complex framework, it’s a conceptually complex framework implemented as dynamic libraries written in C++.
Hardly ros2_control’s fault, but there are a lot of moving parts to get the skeleton of a basic controller in place and loaded: the basic code files, CMakeLists.txt, and package.xml of typical ROS nodes, but also getting the pluginlib definition right in your .cpp file, exporting the pluginlib .xml with the correct names, and getting your YAML files right.
All those names tend to be similar but distinct (namespaces, class names, etc.), and if you confuse them you get an obscure runtime error message that doesn’t tell you which file has the problem, or maybe you just crash /controller_manager.
Then there are headaches like debugging plugins, terrifying compile-time error messages, terrible runtime error messages, again, mostly just C++ headaches.
ros2_control somehow feels like the ROS 2 mandatory way to interface with hardware.
I see new ROS 2 users in forums asking questions about ros2_control hardware interfaces for their first project when it’s not clear if they’ve ever hooked a piece of hardware to a laptop and gotten the two to talk back and forth. The same goes for Micro-ROS and stuff like that.
Writing an ordinary Python node that does basic comms over a serial port to shuttle data and commands between ROS topics and an Arduino will do the trick for most learning explorations.
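Something along these lines (a sketch assuming pyserial, a newline-delimited number protocol, and made-up /cmd and /sensor topic names):

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float64
import serial  # pyserial


class SerialBridge(Node):
    """Shuttle commands and readings between ROS topics and an Arduino."""

    def __init__(self):
        super().__init__('serial_bridge')
        self.ser = serial.Serial('/dev/ttyACM0', 115200, timeout=0)
        self.buf = ''
        self.create_subscription(Float64, '/cmd', self.on_cmd, 10)
        self.sensor_pub = self.create_publisher(Float64, '/sensor', 10)
        # Poll for newline-terminated readings at 100 Hz.
        self.create_timer(0.01, self.poll_serial)

    def on_cmd(self, msg):
        self.ser.write(f'{msg.data}\n'.encode())

    def poll_serial(self):
        self.buf += self.ser.read(self.ser.in_waiting).decode(errors='replace')
        while '\n' in self.buf:
            line, self.buf = self.buf.split('\n', 1)
            try:
                self.sensor_pub.publish(Float64(data=float(line)))
            except ValueError:
                self.get_logger().warning(f'bad reading: {line!r}')


def main():
    rclpy.init()
    rclpy.spin(SerialBridge())


if __name__ == '__main__':
    main()
```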
So ros2_control probably shouldn’t be people’s first stab at implementation of their idea. (Also, IMO, doing a PID loop over socket-based pub/sub topics is a good way to appreciate ROS 2 control, even better if you DO implement it in Python and it’s NOT something like a thermal control system).
ros2_control feels very nice when you actually start to feel like you need it.
The documentation used to be lacking.
I feel like there’s been a big push to improve the docs greatly in the past couple of years, but in 2022 or so when I was looking at it deeply for the first time, I was having a tough time.
I think the documentation provided by the C++ comments was pretty good even back then, but trying to navigate the C++ code of an unfamiliar framework can be its own challenge.
This is less of an issue for parts of the ROS 2 codebase that aren’t tied to something that feels really basic like hooking a piece of hardware to your ROS 2 computer.
There’s a lot about the ros2_control design that feels like a well-thought-out articulation of what hardware is, what it does, what it provides to the rest of the system, what it needs from the system to function, and how its state needs to be monitored and prepared over time and compared with the software system’s expectations and predictions for what it was supposed to do. So it feels necessarily complex to me, full of a lot of features that would otherwise need to be user-implemented for mature robots anyway.
It’d be nice if the ros2_control effort had always been able to work with an ordinary ROS 2 node graph with a transparent and as-performant-as-possible data transport that provided some latency guarantees on a realtime system. I find the MARA project discussed in the ROS Answers thread @AngleSideAngle posted very compelling. But we still don’t have that kind of transport readily available without significant individual project effort as far as I know.
Thanks for appreciating the effort to improve the documentation. However, it still lacks a bit of general information for new users: what ros2_control is, what it isn’t, and do’s and don’ts. You have listed valid points here which would be worth adding there. Would you like to contribute and submit a PR?