ROS Cross-Distribution Communication

Just out of curiosity, I happened to check ROS cross-distribution communication.
Topics and services were verified using the latest containers with host networking.

ROS 1

Scenario          ROS Master   Can communicate?
kinetic/melodic   kinetic      YES
melodic/noetic    kinetic      YES
noetic/kinetic    kinetic      YES
kinetic/melodic   melodic      YES
melodic/noetic    melodic      YES
noetic/kinetic    melodic      YES
kinetic/melodic   noetic       YES
melodic/noetic    noetic       YES
noetic/kinetic    noetic       YES
  • The ros-ROSDISTRO-roscpp-tutorials package was used for verification.
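For anyone who wants to reproduce the ROS 1 matrix, here is a rough Python sketch that enumerates the nine scenarios and builds docker run commands for each. It assumes the official ros:kinetic/ros:melodic/ros:noetic images with roscpp_tutorials installed, host networking, and roscore on its default port 11311; the --rm flag and the command layout are illustrative choices, not necessarily what was used above.

```python
from itertools import product

# Node distro pairs and master distros, in the same order as the table.
NODE_PAIRS = [("kinetic", "melodic"), ("melodic", "noetic"), ("noetic", "kinetic")]
MASTERS = ["kinetic", "melodic", "noetic"]
# Assumption: roscore listens on its default port on the shared host network.
MASTER_URI = "http://localhost:11311"


def docker_cmd(distro, *command):
    """Build a docker run argument list for one container on the host network."""
    return [
        "docker", "run", "--rm", "--net=host",
        "-e", f"ROS_MASTER_URI={MASTER_URI}",
        f"ros:{distro}", *command,
    ]


def scenario_commands(talker, listener, master):
    """Commands for one scenario: roscore first, then talker and listener.

    Assumes roscpp_tutorials is installed in the talker/listener images.
    """
    return [
        docker_cmd(master, "roscore"),
        docker_cmd(talker, "rosrun", "roscpp_tutorials", "talker"),
        docker_cmd(listener, "rosrun", "roscpp_tutorials", "listener"),
    ]


# Nine scenarios: every node pair against every master, master-major order.
SCENARIOS = [
    (talker, listener, master)
    for master, (talker, listener) in product(MASTERS, NODE_PAIRS)
]

if __name__ == "__main__":
    for talker, listener, master in SCENARIOS:
        print(f"{talker}/{listener}, master on {master}:")
        for cmd in scenario_commands(talker, listener, master):
            print("  " + " ".join(cmd))
```

Each scenario needs three containers; start roscore first so the talker and listener can register with it.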

ROS 2 (Updated)

Scenario                  Can communicate?
foxy/galactic             YES
galactic/rolling(humble)  YES
rolling(humble)/foxy      YES
  • export RMW_IMPLEMENTATION=rmw_fastrtps_cpp is always set to select the RMW implementation.
  • export ROS_DOMAIN_ID=5 is set.
  • Docker command: docker run -it --privileged --name ros2_distro ros:distro (replace distro with foxy, galactic, humble or rolling).
  • Each container gets its own IPC namespace and network interface (neither --net=host nor --ipc=host is passed to docker).
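A minimal Python sketch of that ROS 2 setup, building the docker run arguments per distribution. It assumes the two environment variables are passed with docker run -e rather than exported inside the container, and the ros2 run command in the usage part (demo_nodes_cpp talker/listener) is only a hypothetical example.

```python
from typing import List

DISTROS = ["foxy", "galactic", "humble", "rolling"]

# Same values as the bullets above, passed via -e instead of export (assumption).
ROS_ENV = {
    "RMW_IMPLEMENTATION": "rmw_fastrtps_cpp",
    "ROS_DOMAIN_ID": "5",
}


def ros2_container(distro: str, *command: str) -> List[str]:
    """Build a docker run argument list for one isolated ROS 2 container."""
    if distro not in DISTROS:
        raise ValueError(f"unexpected distro: {distro}")
    args = ["docker", "run", "-it", "--privileged", "--name", f"ros2_{distro}"]
    for key, value in ROS_ENV.items():
        args += ["-e", f"{key}={value}"]
    # Deliberately no --net=host / --ipc=host: Docker then gives each container
    # its own network interface and IPC namespace.
    args.append(f"ros:{distro}")
    args.extend(command)
    return args


if __name__ == "__main__":
    print(" ".join(ros2_container("foxy", "ros2", "run", "demo_nodes_cpp", "talker")))
    print(" ".join(ros2_container("humble", "ros2", "run", "demo_nodes_cpp", "listener")))
```

Leaving out the host namespaces is what forces the RMW to go over the (bridged) network instead of any shared-memory shortcut.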

This might depend on the RMW implementation, but cross-distribution communication is not officially supported; see “Can nodes from different ROS 2 distributions communicate compatibly?” on ROS Answers: Open Source Q&A Forum.

I’ve performed similar experiments in the past and I’m surprised by your results for ROS 2.

At least for me, forcing all distributions to use the same RMW resulted in successful communication between nodes from different distributions. I also used the demo / tutorial nodes.

How did you define compatibility exactly? Only in case of changes to msg/srv/action definitions did things not work for me when I tested.

(inter-RMW compatibility is something which isn’t obvious to many ROS 2 users I’ve noticed. Especially with galactic<->x (where x ∈ [foxy, humble, rolling]), but essentially in any mixed-RMW setup, ros2/rmw_cyclonedds#184 complicates things a bit)

At least for me, forcing all distributions to use the same RMW resulted in successful communication between nodes from different distributions. I also used the demo / tutorial nodes.

Yeah, I was expecting this; I remembered some combinations working before.

Confirming that was one of my reasons for checking cross-distribution communication.

But APIs/ABIs can change between distributions during development, which is one reason these things may not work across distributions. (I guess this is the bottom line…)

Did you happen to use containers, e.g. Docker?

I think binding the host network interface into the containers could be the reason for my result.

It depends on how each RMW implementation detects and tries to use localhost inter-process communication when supported, but it seems worth checking with a different network interface for each container. (I will try this and share the result later.)

How did you define compatibility exactly? Only in case of changes to msg/srv/action definitions did things not work for me when I tested.

I just used container images from Docker Hub (e.g. ros:humble and so on), so I think the standard message definitions are all the same.

(inter-RMW compatibility is something which isn’t obvious to many ROS 2 users I’ve noticed. Especially with galactic<->x (where x ∈ [foxy, humble, rolling]), but essentially in any mixed-RMW setup, ros2/rmw_cyclonedds#184 complicates things a bit)

Agreed, I believe this is one such case. If we work with third-party applications, sometimes we cannot control the whole distributed system or application.

We are likely to use a specific RMW implementation on a platform because of the special features and affinities it provides, such as zero copy or a small memory footprint, but at the same time that platform sometimes needs to communicate with other endpoints over the network.

This is probably not a common case, but it would be nice to have interoperability between some RMW implementations over the network.

I think this is part of the out-of-the-box user experience of ROS 2.

(I say “some” rather than “Tier 1”, because requiring it of Tier 1 sounds like it would push the DDS protocol onto every Tier 1 implementation, at least given the current status; I am not sure about this…)

As mentioned in this issue, topics should be no problem between Tier 1 implementations, but services do not work because of how services are identified.

I’ve always been under the impression that trying to bridge two different ROS distros is an anti-pattern, and that the official recommendation is that people shouldn’t do this because the results will be questionable at best, and catastrophic at worst.

I just re-ran all my tests (all possible pairs from [foxy, galactic, humble, rolling] with demo_nodes_cpp[listener, talker], demo_nodes_cpp[add_two_ints_{client, server}] and action_tutorials_cpp[fibonacci_action_{client, server}]) and everything still works for me.
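The test matrix above can be sketched in Python roughly as follows. This is a hypothetical reconstruction, not the actual test scripts; it assumes ordered pairs, so each distribution runs both the publisher/server side and the subscriber/client side against every other one (4 × 3 = 12 pairs, 36 runs).

```python
from itertools import permutations

DISTROS = ["foxy", "galactic", "humble", "rolling"]
# (package, publisher/server executable, subscriber/client executable)
TESTS = [
    ("demo_nodes_cpp", "talker", "listener"),
    ("demo_nodes_cpp", "add_two_ints_server", "add_two_ints_client"),
    ("action_tutorials_cpp", "fibonacci_action_server", "fibonacci_action_client"),
]


def build_matrix():
    """Return (server_distro, client_distro, package, server_exe, client_exe) tuples.

    Uses ordered pairs (an assumption), so foxy->galactic and galactic->foxy
    are both exercised.
    """
    return [
        (server, client, package, server_exe, client_exe)
        for server, client in permutations(DISTROS, 2)
        for package, server_exe, client_exe in TESTS
    ]


if __name__ == "__main__":
    for server, client, package, server_exe, client_exe in build_matrix():
        print(f"{server}: ros2 run {package} {server_exe}  <->  "
              f"{client}: ros2 run {package} {client_exe}")
```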

Yes, all tests were container<->container.

I’ve only tested Fast-RTPS / Fast-DDS.

That needs some care: with --net=host, Fast-DDS will assume the SHM transport can be used. Adding --ipc=host can work, but sometimes produces errors. I believe it will then fall back to non-SHM (and will actually still work). For testing I disable SHM (I was more interested in binary message compatibility than performance) using something like this, and then things work with or without --net=host.
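The SHM-disabling configuration referred to isn’t shown, so here is a hedged sketch: a Python snippet that writes a Fast DDS XML profile using only a user-defined UDPv4 transport, with the built-in transports (which include SHM) turned off. The element names follow eProsima’s XML profile format as I understand it; the profile and transport names are hypothetical, so check the Fast DDS documentation for your version.

```python
import xml.etree.ElementTree as ET

NS = "http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles"


def udp_only_profile(transport_id="udp_only_transport"):
    """Return a Fast DDS profile (as XML text) that uses UDPv4 only."""
    profiles = ET.Element("profiles", xmlns=NS)

    # Declare a single UDPv4 transport (the id is a hypothetical name).
    descriptors = ET.SubElement(profiles, "transport_descriptors")
    descriptor = ET.SubElement(descriptors, "transport_descriptor")
    ET.SubElement(descriptor, "transport_id").text = transport_id
    ET.SubElement(descriptor, "type").text = "UDPv4"

    # Default participant profile that uses only that transport.
    participant = ET.SubElement(
        profiles, "participant",
        profile_name="udp_only_participant", is_default_profile="true",
    )
    rtps = ET.SubElement(participant, "rtps")
    user_transports = ET.SubElement(rtps, "userTransports")
    ET.SubElement(user_transports, "transport_id").text = transport_id
    # Without this, the built-in transports (UDP + SHM) would be added as well.
    ET.SubElement(rtps, "useBuiltinTransports").text = "false"
    return ET.tostring(profiles, encoding="unicode")


if __name__ == "__main__":
    print(udp_only_profile())
```

Save the output to a file inside each container and point FASTRTPS_DEFAULT_PROFILES_FILE at it (e.g. docker run -e FASTRTPS_DEFAULT_PROFILES_FILE=/path/to/profile.xml …).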

I’ll see if I can make the test scripts available (it’s just a set of Bash scripts using the standard ros Docker images).

In most cases this is true. Between Foxy and Galactic however, at least control_msgs changed and that’s definitely incompatible and will lead to problems (as in: inability to communicate, no exploding robots or anything like that).
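Whether both sides agree on a definition can be checked mechanically. ROS 1 did something comparable by comparing an MD5 sum of each message definition when connecting; the Python sketch below only illustrates the idea with a simplified normalization and SHA-256. The two definitions are hypothetical, not the actual control_msgs changes.

```python
import hashlib


def normalize(definition):
    """Drop comments and blank lines and collapse whitespace.

    Simplification: a '#' inside a string constant would be treated as a comment.
    """
    lines = []
    for raw in definition.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(" ".join(line.split()))
    return "\n".join(lines)


def fingerprint(definition):
    """Hash the normalized definition; differing hashes flag a mismatch."""
    return hashlib.sha256(normalize(definition).encode("utf-8")).hexdigest()


# Hypothetical definitions of the "same" message in two distributions.
OLD = """
# distro A
float64 position
float64 velocity
"""
NEW = """
float64 position   # comment changes are ignored
float64 velocity
float64 effort     # field added in distro B
"""

if __name__ == "__main__":
    if fingerprint(OLD) != fingerprint(NEW):
        print("message definitions differ: expect the two sides not to match")
```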

I believe this is, or will sooner rather than later become, a very common use case, as I’ve already come across packages / nodes which don’t work with one RMW but do with another (well, behave better). If a vendor, for instance, only supports using Cyclone (because that’s what they’ve developed and tested against), using Fast-DDS is not an option.

Another example (which is one of the contexts in which I ran into ros2/rmw_cyclonedds#184) would be micro-ROS: it’s essentially only compatible with Fast-DDS, which, if we’re not supposed to mix-and-match RMWs within a single ROS 2 application / node graph, would require the entire application to be run with Fast-DDS.

In principle I would agree, but it’s naive.

In real-world scenarios it’s really difficult to completely control every package and every node (or at least it incurs significant development/maintenance overhead). For organisations with legacy systems, for example (and I’ve encountered companies still on Dashing), you can’t just say: “well … akshually, you’re not supposed to do that, so tough luck for you and your anti-pattern”.

And as I wrote above, there are situations where external factors prevent you from controlling which RMW gets used by a certain node or set of nodes.

The same goes for which ROS 2 versions are supported by certain packages, especially if a package is not available as open source.

That’s a bit dramatic, I feel. I would not be comfortable doing it in general, but for certain use cases and with (extensive) testing, it should be OK. Same as in ROS 1.

Would ROS 2 inform me about a message definition mismatch? In release mode? By failure ideally, not by console message with debug level?

Sorry for the confusion, everyone; here is the result I have now:
with Fast-DDS, cross-distribution communication works over the network.

Although the basic examples work, I often found strange quirks and problems when having applications using different ROS 2 distributions try to communicate.
Note that I was using the same RMW, regardless of what the default was.

I think the problem arises when people using legacy/mixed setups then ask the community to help debug and fix their use case.
I really don’t think that the ROS community has the bandwidth to actively support all permutations and ensure that they are working fine.
We already see that non-default RMWs or non-default settings often have bugs that are not noticed during standard CI/testing phases.

I think that ROS 2 should definitely show errors/issues whenever you are using different versions of the RMW libraries.
This would not help people who can’t upgrade, but it would definitely simplify life for people who are using two different versions by mistake.

I’d just like to mention that in the past, I have also encountered issues with colliding GUIDs for various DDS vendors when spawning multiple ROS 2 nodes from different Docker containers with various network interface types. Duplicate GUIDs can result in discovery issues in DDS; see my own Q&A here:
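To illustrate how containers can collide: a DDS GUID prefix is 12 bytes, and implementations often derive part of it from host and process information. The layout below (vendor id, host id, process id, participant counter) is a simplified, hypothetical one, not any vendor’s exact scheme, but it shows how two containers that share a host-derived id (e.g. with --net=host) and happen to run their nodes with the same PID end up with identical prefixes.

```python
import struct
from collections import Counter


def guid_prefix(vendor_id, host_id, pid, participant_counter):
    """Pack a simplified 12-byte GUID prefix (hypothetical layout).

    2-byte vendor id, 2-byte host id, 4-byte process id, 4-byte counter.
    """
    return struct.pack(">HHII", vendor_id, host_id, pid, participant_counter)


def duplicate_prefixes(prefixes):
    """Return the prefixes that appear more than once."""
    return [p for p, n in Counter(prefixes).items() if n > 1]


if __name__ == "__main__":
    VENDOR_ID = 0x010F  # example value
    HOST_ID = 0x0A01    # hypothetical id derived from a shared host IP (--net=host)
    # Two containers, each running its node as PID 1 in its own PID namespace,
    # plus one container where the node got a different PID.
    participants = [
        guid_prefix(VENDOR_ID, HOST_ID, 1, 0),
        guid_prefix(VENDOR_ID, HOST_ID, 1, 0),
        guid_prefix(VENDOR_ID, HOST_ID, 42, 0),
    ]
    for dup in duplicate_prefixes(participants):
        print("duplicate GUID prefix:", dup.hex())
```

Separate network namespaces help here only if the host-derived part actually differs per container, which depends on how each vendor computes it.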

DDS does support detecting matching failures, both in terms of QOS policy mismatches and type consistency mismatches. This information is exposed via the Status API for datawriters/datareaders (RTI doc examples listed below, as a reference).

https://community.rti.com/static/documentation/connext-dds/6.0.1/doc/api/connext_dds/api_cpp2/classdds_1_1core_1_1status_1_1InconsistentTopicStatus.html
https://community.rti.com/static/documentation/connext-dds/6.0.1/doc/api/connext_dds/api_cpp2/classdds_1_1core_1_1status_1_1OfferedIncompatibleQosStatus.html

It’s worth noting that these status APIs can often tell you, in aggregate, which policies and which types are mismatched, but the user has to dig deeper to figure out which individual participant and DataReader/DataWriter is at fault, by looking up GUIDs and some of the information obtained through discovery. If you are familiar with the RTI Admin Console tool, almost anything it can display or deduce can also be deduced by the user (apart from specific monitoring plugins), though this probably varies across DDS implementations.

All of this to say, there are high-level APIs to alert you of some of these problems, but I’m not sure how much of this is exposed by the RMW at this time, or how granular that information is.
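As a rough illustration of the aggregate nature described above, here is a simplified Python model of an incompatible-QoS status with total_count, total_count_change and last_policy_id fields. It mirrors the shape of the DDS status, not any vendor’s actual API; policy names are plain strings here rather than QosPolicyId values.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class IncompatibleQosStatus:
    """Simplified model of a DDS incompatible-QoS status (not a vendor API)."""

    total_count: int = 0
    total_count_change: int = 0
    last_policy_id: Optional[str] = None
    policies: Counter = field(default_factory=Counter)

    def on_incompatible_match(self, policy: str) -> None:
        """Record one failed match attributed to `policy`."""
        self.total_count += 1
        self.total_count_change += 1
        self.last_policy_id = policy
        self.policies[policy] += 1

    def read(self) -> Tuple[int, int, Optional[str]]:
        """Return the counters; total_count_change resets after each read."""
        snapshot = (self.total_count, self.total_count_change, self.last_policy_id)
        self.total_count_change = 0
        return snapshot


if __name__ == "__main__":
    status = IncompatibleQosStatus()
    for policy in ["RELIABILITY", "DURABILITY", "RELIABILITY"]:
        status.on_incompatible_match(policy)
    total, change, last = status.read()
    print(total, change, last, status.policies.most_common(1))
```

Nothing in such a status says which remote reader or writer caused the mismatch; that is the extra digging through discovery data mentioned above.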
