Needed: ROS2 Node Management Tool

How do you recommend launching a prototype robotics system on multiple PCs (minimum of an operations computer and a robot computer)? There has been past discussion on this, and there are some great ROS1 tools, but I have yet to see a ROS2 tool that sufficiently fits the bill without also being overly challenging to install and configure (possible exception, though perhaps too soon to say).

It feels like this should be a common ROS2 tool, but it appears that the mindset of assigning this design work to roslaunch alone has hampered efforts to make something simple and robust. We are thus left with many bespoke solutions developed by the community for their specific purposes (docker compose, ansible, systemd, etc.).

From my perspective, rosmon (ROS1) probably came closest to a must-have, reliable tool for remote node management (though I mostly used node manager on the projects I was on), and we need some ROS2 equivalent node management tool. Does anyone have insight or recommendations? A few requirements/desirements would be:

  1. Manually start / stop nodes on multiple PCs at any time (perhaps with a requirement to start some daemon first)
  2. See status (alive/dead) of each node (or component set)
  3. View individual terminals for each node at any time
  4. View parameters for each node (yes, there are other ways to do this already — e.g. the CLI calls after this list)
  5. Launch debug sessions (gdb or similar) on any running node (rosmon does this)
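
For context, pieces of 2 and 4 already exist as one-off ros2cli commands on a single machine; what is missing is the unified, multi-PC view. A quick sketch with standard ros2 CLI commands (/my_node and the parameter name are placeholders):

```bash
# Standard ros2cli calls that cover fragments of requirements 2 and 4 today.
ros2 node list                        # enumerate reachable nodes (rough liveness check)
ros2 param list /my_node              # list the parameters of one node (placeholder name)
ros2 param get /my_node use_sim_time  # read a single parameter
```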
  1. GitHub - swri-robotics/swri_console: Replacement for rqt_console
  2. GitHub - ros-visualization/rqt_reconfigure at rolling
  3. gdb -p PID?

For 1 and 2 we used rosmon in ROS1, but I don’t know of an alternative for ROS2. I would be interested to hear if one exists.

  1. This looks really useful, thanks! One thing that happens with some nodes, especially those that interact with drivers, is that their output does not go to /rosout and is instead pushed straight to the console. So while using this or rqt_console is helpful, it is not complete.
  2. Yeah, probably what I’ll use for now; an all-in-one node manager would be nice, though. We often end up with lots of windows with ROS.
  3. I’m guessing this is what rosmon was doing, with the magic being that it automatically tracked PIDs locally and remotely — presumably automating something like the sketch below.
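
A minimal sketch of the manual version, assuming the node executable is named my_node (a placeholder) and you have ptrace permission on that host; a management tool would track the PIDs for you:

```bash
# Find the PID of a running node by matching its command line
# ('my_node' is a placeholder executable name).
pid=$(pgrep -f my_node)

# Attach gdb to the live process. On Ubuntu this may require sudo or
# kernel.yama.ptrace_scope=0. For a remote PC, run the same over ssh
# (or use gdbserver).
gdb -p "$pid"
```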

The multi-agent suite is the closest I’ve seen to cover all these requirements for ROS2; I’m just worried about complexity being high and documentation that is not there yet. The other option that solves 1. is something like tmuxp, tmuxinator, or catmux, but these require a lot more scripting overhead for every configuration and don’t solve the other requirements. I’m debating rolling my own tool.
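
To make that overhead concrete, here is a minimal tmuxp session file; the session, window, package, launch-file, and host names are all placeholders:

```yaml
# tmuxp session sketch: one window per subsystem, one pane per command.
session_name: robot_bringup
windows:
  - window_name: drivers
    panes:
      - ros2 launch my_robot_bringup drivers.launch.py
  - window_name: navigation
    panes:
      - ros2 launch my_robot_bringup nav.launch.py
  - window_name: robot_pc
    panes:
      # starting nodes on the other PC means wrapping everything in ssh
      - ssh robot@robot-pc 'ros2 launch my_robot_bringup robot.launch.py'
```

Loading it with tmuxp load robot_bringup.yaml gets you requirement 1 and a weak version of 3 (the panes), but nothing for status, parameters, or debugging.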

@swan @Rayman

How about using Kubernetes, agnostic of the application framework? Overkill? :sweat_smile:

You can do the following with simple commands; these are just a few of many examples (the tutorials have visual aids showing what it can do).

  1. You can deploy your application wherever you want, with connectivity.
  2. A dashboard or CNI tooling can be used to observe the cluster.

  3. You can jump into a terminal in any pod in the cluster via kubectl exec ... (a minimal example follows this list).
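
A minimal sketch of what 1. and 3. look like in practice; every name, image, and command below is a placeholder, not something from ros_k8s:

```yaml
# Pod sketch: pin one ROS 2 node to a chosen machine.
apiVersion: v1
kind: Pod
metadata:
  name: talker
spec:
  nodeSelector:
    kubernetes.io/hostname: robot-pc   # schedule onto the robot computer
  hostNetwork: true                    # DDS discovery behaves as on bare metal
  containers:
    - name: talker
      image: my-registry/my-ros-image:latest   # placeholder image
      command: ["ros2", "run", "my_package", "my_node"]
```

Then jumping into its terminal is kubectl exec -it talker -- /bin/bash.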

The pain point is that we need to manage the cluster… I think creating a Kubernetes cluster can already be a barrier for ROS users.
We have an internal proprietary system service that uses mDNS and the Raft consensus protocol with user-configured candidate physical machines to set up the cluster dynamically and robustly, along with static services such as a dashboard, but that is not open source.

I am thinking that if we can set up the cluster with a systemd service (which would be a static cluster api-server; see support systemd kubelet to start the cluster automatically · Issue #28 · fujitatomoya/ros_k8s · GitHub), that would be helpful?

Or, if you are looking for ROS-specific tools only, consider this just FYI.

thanks,
Tomoya


I really appreciate seeing these different approaches; this is great.

I already use docker for dev containers (which we happen to use on the robot as well, for maintenance convenience) and have put together simple Kubernetes clusters, but I’m hesitant to run a cluster for my robotics stack when I think about the possible extra maintenance, networking challenges, runtime overhead, and side effects. How is this working out for you? Are you creating a separate image per package, or just mounting executables into some base image and running them? I assume you have CI for all your nodes/packages? Are all your pods running with host network and fully privileged?

This is working okay, especially for development (we do not have any plans for a production environment yet). I would love to get feedback and use cases if anyone is interested :smile:

Images really depend on the packages or application. This is about security (fail independently; should it auto-heal per pod?), performance (applications can use true zero-copy if they are in the same pod), and so on.

Yes, we have a full internal CI/CD pipeline, running daily and per commit.

Sometimes using the host network with privileges is the straightforward choice, especially for development, but that is something we can configure when we deploy the application (see the snippet below). Our plan is to use the Cilium CNI, empowered by eBPF (with WireGuard VPN and encryption), instead of the host network, so that we can even go beyond NAT in the network layer without an application-layer proxy or bridge anymore. (This is not ROS development, but we are actively working on it, and it is verified with AWS and Huawei Cloud.) Besides that, we can rely on full observability and security features such as Runtime Enforcement.
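
The decision in question is made per pod at deploy time. A complete minimal sketch using standard PodSpec fields (the pod name and image are placeholders):

```yaml
# The host-network / privilege choice is a per-pod deployment setting.
apiVersion: v1
kind: Pod
metadata:
  name: ros-node-dev
spec:
  hostNetwork: true          # set false once a CNI such as Cilium carries the traffic
  containers:
    - name: ros-node
      image: my-registry/my-ros-image:latest   # placeholder image
      securityContext:
        privileged: true     # development convenience, e.g. for device access
```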

The thing is, the Cilium CNI does not fully support multicast yet (I am not sure multicast would be a major use case, because we can have a discovery service in the cluster with auto-healing and backup endpoints), so we are still working on some CLI extensions and utilities to enable it in Cilium development (see ros_k8s/docs/Setup_Kubernetes_Cluster.md at main · fujitatomoya/ros_k8s · GitHub for more details).

Regarding the runtime overhead, maybe KubeEdge meets Cilium !!! | KubeEdge would be interesting.

I am probably not going to go into more detail, since this is not really about ROS-application-specific architecture, but if you are interested in this architecture with ROS, I am happy to discuss and chat more! Please let me know :grin: (JFYI, I will be talking about this at the next KubeCon NA 2024; robotics can be one of the major use cases, I believe :crossed_fingers:)

thanks,
Tomoya

We’re a docker/compose shop also, with the general approach being to use a container per major system, e.g. control, commanding, path planning, comms, perception, etc. Taking advantage of restart: unless-stopped lets us make sure the robot is up and ready to use right after bootup (a sketch of the layout is below). I particularly like using docker because each container can be a different image. One drawback (for us anyway) is that the images can get quite large and be cumbersome to update.
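
A minimal sketch of that compose layout — the per-subsystem split and restart policy are as described above, while the image names, launch commands, and host networking are placeholder assumptions:

```yaml
# docker-compose.yml sketch: one container per major system,
# each from its own image, restarted across reboots.
services:
  control:
    image: my-registry/control:latest        # placeholder image
    command: ros2 launch control_bringup control.launch.py
    restart: unless-stopped
    network_mode: host                       # common choice for DDS discovery
  perception:
    image: my-registry/perception:latest     # placeholder image
    command: ros2 launch perception_bringup perception.launch.py
    restart: unless-stopped
    network_mode: host
  comms:
    image: my-registry/comms:latest          # placeholder image
    command: ros2 launch comms_bringup comms.launch.py
    restart: unless-stopped
    network_mode: host
```

After one docker compose up -d, the restart policy brings everything back on boot, provided the Docker daemon itself starts at boot.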

In terms of monitoring, we do monitor onboard, but anything from robot to console has been largely built in house, as we have significantly lower bandwidth available than most (subsea robots communicating over acoustic links). I’d definitely be interested in hearing other folks’ approaches, especially in communications-starved situations.
