ROS 2 Alternative middleware report

Timple · October 5, 2023, 5:22am

We typically solve this situation as follows:

Either the publisher is periodic, so missing the first few messages isn’t that bad.
Or the publisher is latched, so late subscribers still get the last message.
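
A minimal sketch of the latching behaviour, in plain Python rather than any ROS 2 API. `LatchedTopic` and the message value are hypothetical names made up for illustration. In ROS 2 itself, the closest equivalent is setting transient-local durability QoS on both the publisher and the subscription.

```python
from typing import Any, Callable, List


class LatchedTopic:
    """Toy in-process topic that replays the last message to late subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Any], None]] = []
        self._last: Any = None
        self._has_last = False  # distinguishes "nothing published" from a None message

    def publish(self, msg: Any) -> None:
        # Remember the message so that future subscribers can still receive it.
        self._last = msg
        self._has_last = True
        for callback in self._subscribers:
            callback(msg)

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)
        # Late joiner: immediately deliver the latched message, if there is one.
        if self._has_last:
            callback(self._last)


topic = LatchedTopic()
topic.publish("map_v1")          # published before anyone is listening
received: List[Any] = []
topic.subscribe(received.append)  # late subscriber still gets "map_v1"
print(received)
```

Only the most recent message is kept, so this covers state-like topics (a map, a robot description) but not event streams where every message matters.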

JM_ROS · October 5, 2023, 9:10am

Can we just use ROS 2 graph? we can get node name, namespace, topic type and endpoint information as well.

Yes, one could do this, but as long as you have a distributed, self-discovering RMW in the background, it’s all a bit racy and you don’t have any guarantees.

How can the publisher know all concerned subscriptions are ready w/o having a priori knowledge? Can you share the example here what you are looking for?

If I understood it correctly, the new RMW will have a centralized broker. We can therefore ask the broker for certain information, such as how many active clients are connected to topic X and what their IPs are. Armed with this information, we would initiate the connection from the publisher side.
This is of course still racy if multiple nodes start up at the same time, but as long as you ensure a defined startup order (e.g. by using lifecycle nodes), you get the guarantee that all connections are established once all nodes are configured.
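
A rough sketch of that idea, in plain Python with threads standing in for nodes. `DiscoveryRegistry`, its methods, the topic name and the endpoint strings are all hypothetical; this is not the API of any existing RMW or of zenohd. The publisher asks the central registry to block until the expected number of subscribers has registered, and only then starts publishing.

```python
import threading
from collections import defaultdict
from typing import Dict, Set


class DiscoveryRegistry:
    """Toy central registry that tracks which endpoints subscribe to which topic."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._subs: Dict[str, Set[str]] = defaultdict(set)

    def register_subscriber(self, topic: str, endpoint: str) -> None:
        with self._cond:
            self._subs[topic].add(endpoint)
            self._cond.notify_all()  # wake any publisher waiting on this registry

    def wait_for_subscribers(self, topic: str, expected: int, timeout: float) -> bool:
        """Block until `expected` subscribers are registered; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._subs[topic]) >= expected, timeout=timeout
            )


registry = DiscoveryRegistry()
registry.register_subscriber("/scan", "10.0.0.2:7447")

# Second subscriber comes up slightly later, as it would with a staged startup.
late = threading.Timer(
    0.05, registry.register_subscriber, args=("/scan", "10.0.0.3:7447")
)
late.start()

ready = registry.wait_for_subscribers("/scan", expected=2, timeout=2.0)
late.join()
print("safe to publish:", ready)
```

The publisher still needs to know the expected subscriber count up front, which is exactly the a priori knowledge asked about above; the registry only makes the check cheap and free of polling.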

I guess this boils down to the question: with the new RMW, can we expect guarantees for certain operations? For example:

Will it be somehow deterministic how long it takes to register a node with the network, and how fast other nodes discover it?
Might it be possible to ensure that any message published will be received by every subscriber that was registered beforehand?

clalancette · October 5, 2023, 1:57pm

No, almost certainly not. A broker architecture involves additional latency and data copies, which would not work well for large data. Data connections between nodes will likely be peer-to-peer.

However, we are considering whether to have zenohd running all the time as a discovery service. If we did that, then the nodes would discover each other through the discovery service, but would still make peer-to-peer data connections for efficiency.

JM_ROS · October 5, 2023, 3:22pm

This is exactly what I meant when I wrote ‘centralized broker’

tomoyafujita · October 5, 2023, 3:26pm

@JM_ROS

Thanks for the explanation, now I see that this is about the discovery implementation.
As you mentioned, a client/server design is still racy, but I think it can be cost effective, which is one of the requirements for this RMW alternative described in the doc.

Satco · October 5, 2023, 7:18pm

Sorry if this is ignorant, I’ve only been reading in passing via email. But how is this different from rosmaster in ROS 1?

kisg · October 5, 2023, 7:24pm

@clalancette

However, we are considering whether to have zenohd running all the time as a discovery service. If we did that, then the nodes would discover each other through the discovery service, but would still make peer-to-peer data connections for efficiency.

I think this is a good idea, especially when considering the new ultra-low latency options (e.g. shared memory), which will probably need some kind of discovery/broker service anyway.

clalancette · October 5, 2023, 7:37pm

It is similar in concept. However, it would differ in two major respects:

We would probably launch zenohd in the background (so you don’t have to run the equivalent to roscore by hand). If we go this route, we’ll have to add in some configuration variables so people can disable that if they want to run their own.
There are some features of zenohd that we could leverage to potentially make the system self-healing in case of failure. That is, if zenohd crashed, we could potentially detect it, automatically restart it, and recover the whole graph (with some delay). While I don’t expect we’ll implement this in time for Jazzy, our research suggests that it is possible to do.

kydos · October 8, 2023, 5:00pm

The zenohd could also be used to deal with R2X communication as that would give quite a bit of control on what gets out and how it gets out. In other terms, finely controlling the information that “should” vs “should not” flow out of the robot, along with pacing of information (as an example). We’ve seen in several real-world use cases (Indy Autonomous Challenge being one of those), that the data-update rates required by on-robot nodes and for R2X are usually quite different. Thus being able to pace this data saves bandwidth (and CPU on the receiving robots).
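
To make the pacing idea concrete, here is a minimal sketch in plain Python. It is not Zenoh configuration or API; `pace`, the message labels and the rates are hypothetical. It forwards a message only if at least a minimum period has passed since the last forwarded one, using integer nanosecond stamps to avoid floating-point drift.

```python
from typing import Any, Iterable, Iterator, Tuple

Stamped = Tuple[int, Any]  # (timestamp in nanoseconds, message)


def pace(samples: Iterable[Stamped], min_period_ns: int) -> Iterator[Stamped]:
    """Yield only samples spaced at least `min_period_ns` apart."""
    last_sent_ns = None
    for stamp_ns, msg in samples:
        if last_sent_ns is None or stamp_ns - last_sent_ns >= min_period_ns:
            last_sent_ns = stamp_ns
            yield stamp_ns, msg


# On-robot stream: 100 Hz for one second (one sample every 10 ms).
on_robot = [(i * 10_000_000, f"pose{i}") for i in range(100)]

# Robot-to-anything stream: paced down to at most 10 Hz (100 ms period).
r2x = list(pace(on_robot, min_period_ns=100_000_000))
print(len(on_robot), "->", len(r2x))  # 100 -> 10
```

On-robot consumers would keep receiving the full-rate stream; only the outgoing copy is thinned, which is where the bandwidth and CPU savings come from.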

–kydos

Felix_Xu · October 9, 2023, 12:38am

I’d like to be clear, the target is to

1) add a new rmw implementation (no DDS),
2) refactor the rclcpp

Is it 1), or 2), or both?

ZhenshengLee · October 9, 2023, 11:38am

I think in the short term it should be 1)
in the long term it should be both.

clalancette · October 9, 2023, 12:30pm

The goal of this work is to add a new RMW implementation based on Zenoh.

There are no current plans to refactor rclcpp. We will continue to improve rclcpp as necessary, both in terms of performance and features.

gbiggs · October 10, 2023, 1:00am

In addition to what other people have pointed out about zenohd, it’s worth noting that Zenoh is capable of both that phonebook-style discovery and distributed discovery similar to what DDS uses (although it is typically more efficient about it). So you can have the best of both worlds and use whichever is most appropriate for your application.

ZhenshengLee · October 10, 2023, 6:31am

During the implementation of rmw_zenoh, it would be possible to change the rcl API in order to get the best performance, am I right?

clalancette · October 10, 2023, 12:42pm

Certainly we can make changes to the RCL/RMW layer to improve performance as necessary.

caioaamaral · October 11, 2023, 12:18am

I’m curious about why a broker would be an undesired feature here. In the “OPC UA versus ROS, DDS, and MQTT” benchmark, OPC UA performed better than DDS (in this case FastRTPS was the implementation used).

I’m not an expert in this field, but it seems that a brokered architecture would avoid the dreaded discovery storm that comes with larger DDS networks.

Anyway, even if an unbrokered architecture is the way to go for the alternative middleware, I don’t think OPC UA should be immediately discarded. The specification itself shows an example of “broker-less” UDP (for the pub/sub version) that at least resembles what a common DDS implementation does during the discovery step.

Also, the same spec makes it clear that it “does not define a Message Oriented Middleware”, so one is free even to use DDS as the transport protocol within an OPC UA implementation. As far as I remember, the same applies to OROCOS (not completely sure about this last part, but I remember seeing something about a DDS orb somewhere).

I’m very excited for what is coming in the future

gbiggs · October 11, 2023, 7:32am

Brokered can be reliable and work well for small messages, but when the message size grows very large, as it does with images and point clouds, a broker can quickly become a bottleneck.

BastianLampe · October 11, 2023, 10:40am

We benchmarked a brokered transmission of point clouds in a 5G network in one of our papers. For anyone interested in the results, feel free to check out our GitHub repository, where you will find code and instructions to reproduce the results or to run your own experiments with different configurations. You can find the paper here: [2209.03630] Enabling Connectivity for Automated Mobility: A Novel MQTT-based Interface Evaluated in a 5G Case Study on Edge-Cloud Lidar Object Detection

peci1 · October 11, 2023, 11:45am

Hi, I see some confusion here. I hope nobody wants to implement a fully brokered rmw (all data through a central element). The discussion here was about having a brokered discovery service (ROS 1 rosmaster style) or decentralized (ROS 2/DDS style).

clalancette · October 11, 2023, 11:57am

As others have pointed out, we need to be careful to define our terms here.

When we talk about “brokered” in the paper and above, we are using the term as defined in the Wikipedia article on the Broker pattern. In particular, a broker is responsible for all communication between peers. That includes initial discovery, as well as ongoing data over topics and requests/replies for services. This notion of brokering is undesirable in ROS 2, as it increases latency and CPU usage (due to the additional copies from the publisher to the broker, and from the broker to the subscribers).

The topics and services in ROS 1, for instance, are not “brokered” in my understanding. Instead, it is a centralized discovery service so that peers can find each other. Once peers know how to contact each other, they connect to each other directly to exchange data.

It is this latter notion that we are considering introducing into ROS 2 via the zenohd router.
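
The distinction can be sketched in a few lines of plain Python. This is a toy model only: `DiscoveryService`, `Broker`, the topic name and the payload size are hypothetical and do not reflect how zenohd or rosmaster are implemented. The discovery service only answers "who subscribes to this topic?", while the broker additionally sits in the data path and handles every payload byte.

```python
from typing import Callable, Dict, List

Handler = Callable[[bytes], None]


class DiscoveryService:
    """Phonebook only: maps topics to subscriber endpoints, carries no payloads."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, List[Handler]] = {}
        self.bytes_relayed = 0  # only a full broker ever increments this

    def advertise_subscriber(self, topic: str, handler: Handler) -> None:
        self._endpoints.setdefault(topic, []).append(handler)

    def lookup(self, topic: str) -> List[Handler]:
        return list(self._endpoints.get(topic, []))


class Broker(DiscoveryService):
    """Full broker: publishers hand every payload to it for forwarding."""

    def relay(self, topic: str, payload: bytes) -> None:
        self.bytes_relayed += len(payload)  # extra hop through the central element
        for handler in self.lookup(topic):
            handler(payload)


point_cloud = bytes(1_000_000)  # stand-in for a large sensor message

# Centralized discovery, peer-to-peer data: look up once, then send directly.
direct_rx: List[bytes] = []
discovery = DiscoveryService()
discovery.advertise_subscriber("/points", direct_rx.append)
for deliver in discovery.lookup("/points"):
    deliver(point_cloud)

# Brokered: the same payload passes through the central element.
brokered_rx: List[bytes] = []
broker = Broker()
broker.advertise_subscriber("/points", brokered_rx.append)
broker.relay("/points", point_cloud)

print(discovery.bytes_relayed, broker.bytes_relayed)  # 0 1000000
```

Both subscribers receive the message, but only in the brokered case does the central element handle the payload, which is the extra latency and copying described above.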
